TY - GEN
T1 - A fast and exact algorithm for the exemplar breakpoint distance
AU - Shao, Mingfu
AU - Moret, Bernard M.E.
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - A fundamental problem in comparative genomics is to compute the distance between two genomes. For two genomes without duplicate genes, we can easily compute a variety of distance measures in linear time, but the problem is NP-hard under most models when genomes contain duplicate genes. Sankoff proposed the use of exemplars to tackle the problem of duplicates genes and gene families: each gene family is represented by a single gene (the exemplar for that family), chosen so as to optimize some metric. Unfortunately, choosing exemplars is itself an NP-hard problem. In this paper, we propose a very fast and exact algorithm to compute the exemplar breakpoint distance, based on new insights in the underlying structure of genome rearrangements and exemplars. We evaluate the performance of our algorithm on simulation data and compare its performance to the best effort to date (a divideand- conquer approach), showing that our algorithm runs much faster and scales much better. We also devise a new algorithm for the generalized breakpoint distance problem, which can then be applied to assign orthologs. We compare our algorithm with the state-of-the-art method MSOAR by assigning orthologs among five well annotated mammalian genomes, showing that our algorithm runs much faster and is slightly more accurate than MSOAR.
AB - A fundamental problem in comparative genomics is to compute the distance between two genomes. For two genomes without duplicate genes, we can easily compute a variety of distance measures in linear time, but the problem is NP-hard under most models when genomes contain duplicate genes. Sankoff proposed the use of exemplars to tackle the problem of duplicates genes and gene families: each gene family is represented by a single gene (the exemplar for that family), chosen so as to optimize some metric. Unfortunately, choosing exemplars is itself an NP-hard problem. In this paper, we propose a very fast and exact algorithm to compute the exemplar breakpoint distance, based on new insights in the underlying structure of genome rearrangements and exemplars. We evaluate the performance of our algorithm on simulation data and compare its performance to the best effort to date (a divideand- conquer approach), showing that our algorithm runs much faster and scales much better. We also devise a new algorithm for the generalized breakpoint distance problem, which can then be applied to assign orthologs. We compare our algorithm with the state-of-the-art method MSOAR by assigning orthologs among five well annotated mammalian genomes, showing that our algorithm runs much faster and is slightly more accurate than MSOAR.
UR - http://www.scopus.com/inward/record.url?scp=84926352653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926352653&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-16706-0_31
DO - 10.1007/978-3-319-16706-0_31
M3 - Conference contribution
AN - SCOPUS:84926352653
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 309
EP - 322
BT - Research in Computational Molecular Biology - 19th Annual International Conference, RECOMB 2015, Proceedings
A2 - Przytycka, Teresa M.
PB - Springer Verlag
T2 - 19th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2015
Y2 - 12 April 2015 through 15 April 2015
ER -