TY - GEN
T1 - A divide-and-conquer implementation of three sequence alignment and ancestor inference
AU - Yue, Feng
AU - Tang, Jijun
PY - 2007
Y1 - 2007
N2 - In this paper, we present an algorithm to simultaneously align three biological sequences with affine gap model and infer their common ancestral sequence. Our algorithm can be further extended to perform tree alignment for more sequences, and eventually unify the two procedures of phylogenetic reconstruction and sequence alignment. The novelty of our algorithm is: it applies the divide-and-conquer strategy so that the memory usage is reduced from O(n3) to O(n2), while at the same time, it is based on dynamic programming and optimal alignment is guaranteed. Traditionally, three sequence alignment is limited by the huge demand of memory space and can only handle sequences less than two hundred characters long. With the new improved algorithm, we can produce the optimal alignment of sequences of several thousand characters long. We implemented our algorithm as a C program package MSAM. It has been extensively tested with BAliBASE, a real manually refined multiple sequence alignment database, as well as simulated datasets generated by Rose (Random Model of Sequence Evolution). We compared our results with those of other popular multiple sequence alignment tools, including the widely used programs such as ClustalW and T-Coffee. The experiment shows that MSAM produces not only better alignment, but also better ancestral sequence. The software can be downloaded for free at http://www.cse.sc.edu/phylo/MSAM.html.
AB - In this paper, we present an algorithm to simultaneously align three biological sequences with affine gap model and infer their common ancestral sequence. Our algorithm can be further extended to perform tree alignment for more sequences, and eventually unify the two procedures of phylogenetic reconstruction and sequence alignment. The novelty of our algorithm is: it applies the divide-and-conquer strategy so that the memory usage is reduced from O(n3) to O(n2), while at the same time, it is based on dynamic programming and optimal alignment is guaranteed. Traditionally, three sequence alignment is limited by the huge demand of memory space and can only handle sequences less than two hundred characters long. With the new improved algorithm, we can produce the optimal alignment of sequences of several thousand characters long. We implemented our algorithm as a C program package MSAM. It has been extensively tested with BAliBASE, a real manually refined multiple sequence alignment database, as well as simulated datasets generated by Rose (Random Model of Sequence Evolution). We compared our results with those of other popular multiple sequence alignment tools, including the widely used programs such as ClustalW and T-Coffee. The experiment shows that MSAM produces not only better alignment, but also better ancestral sequence. The software can be downloaded for free at http://www.cse.sc.edu/phylo/MSAM.html.
UR - http://www.scopus.com/inward/record.url?scp=49049095503&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49049095503&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2007.40
DO - 10.1109/BIBM.2007.40
M3 - Conference contribution
AN - SCOPUS:49049095503
SN - 0769530311
SN - 9780769530314
T3 - Proceedings - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007
SP - 143
EP - 150
BT - Proceedings - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007
T2 - 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007
Y2 - 2 November 2007 through 4 November 2007
ER -