TY - GEN
T1 - On the DCJ median problem
AU - Shao, Mingfu
AU - Moret, Bernard M.E.
PY - 2014
Y1 - 2014
N2 - As many whole genomes are sequenced, comparative genomics is moving from pairwise comparisons to multiway comparisons framed within a phylogenetic tree. A central problem in this process is the inference of data for internal nodes of the tree from data given at the leaves. When phrased as an optimization problem, this problem reduces to computing a median of three genomes under the operations (evolutionary changes) of interest. We focus on the universal rearrangement operation known as double-cut-and join (DCJ) and present three contributions to the DCJ median problem. First, we describe a new strategy to find so-called adequate subgraphs in the multiple breakpoint graph, using a seed genome. We show how to compute adequate subgraphs w.r.t. this seed genome using a network flow formulation. Second, we prove that the upper bound of the median distance computed from the triangle inequality is tight. Finally, we study the question of whether the median distance can reach its lower and upper bounds. We derive a necessary and sufficient condition for the median distance to reach its lower bound and a necessary condition for it to reach its upper bound and design algorithms to test for these conditions.
AB - As many whole genomes are sequenced, comparative genomics is moving from pairwise comparisons to multiway comparisons framed within a phylogenetic tree. A central problem in this process is the inference of data for internal nodes of the tree from data given at the leaves. When phrased as an optimization problem, this problem reduces to computing a median of three genomes under the operations (evolutionary changes) of interest. We focus on the universal rearrangement operation known as double-cut-and join (DCJ) and present three contributions to the DCJ median problem. First, we describe a new strategy to find so-called adequate subgraphs in the multiple breakpoint graph, using a seed genome. We show how to compute adequate subgraphs w.r.t. this seed genome using a network flow formulation. Second, we prove that the upper bound of the median distance computed from the triangle inequality is tight. Finally, we study the question of whether the median distance can reach its lower and upper bounds. We derive a necessary and sufficient condition for the median distance to reach its lower bound and a necessary condition for it to reach its upper bound and design algorithms to test for these conditions.
UR - http://www.scopus.com/inward/record.url?scp=84958534648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958534648&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-07566-2_28
DO - 10.1007/978-3-319-07566-2_28
M3 - Conference contribution
AN - SCOPUS:84958534648
SN - 9783319075655
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 273
EP - 282
BT - Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Proceedings
PB - Springer Verlag
T2 - 25th Annual Symposium on Combinatorial Pattern Matching, CPM 2014
Y2 - 16 June 2014 through 18 June 2014
ER -