TY - GEN
T1 - An exact algorithm to compute the DCJ distance for genomes with duplicate genes
AU - Shao, Mingfu
AU - Lin, Yu
AU - Moret, Bernard
PY - 2014
Y1 - 2014
N2 - Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this paper, we propose an ILP (integer linear programming) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse and rat genomes, where once again our method outperforms MSOAR.
AB - Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this paper, we propose an ILP (integer linear programming) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse and rat genomes, where once again our method outperforms MSOAR.
UR - http://www.scopus.com/inward/record.url?scp=84958532388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958532388&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-05269-4_22
DO - 10.1007/978-3-319-05269-4_22
M3 - Conference contribution
AN - SCOPUS:84958532388
SN - 9783319052687
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 280
EP - 292
BT - Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
PB - Springer Verlag
T2 - 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014
Y2 - 2 April 2014 through 5 April 2014
ER -