TY - JOUR
T1 - Simultaneous phylogeny reconstruction and multiple sequence alignment
AU - Yue, Feng
AU - Shi, Jian
AU - Tang, Jijun
N1 - Funding Information:
The authors were supported by the US National Institutes of Health (NIH grant number R01 GM078991). F.Y. is also supported by the laboratory of gene regulation, Ludwig Institute for Cancer Research, UCSD School of Medicine. All experiments were conducted on a 128-core shared-memory computer funded by the US National Science Foundation (NSF grant number CNS 0708391).
PY - 2009/1/30
Y1 - 2009/1/30
AB - Background: A phylogeny is the evolutionary history of a group of organisms. To date, sequence data remain the most widely used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve the quality of the multiple alignment. Results: We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can use a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method performs well in terms of both phylogeny accuracy and alignment quality. Conclusion: We present an algorithm, based on an efficient solution to the median problem for three sequences, that aligns multiple sequences and reconstructs the phylogenies that minimize the alignment score. Our extensive experiments suggest that this method is very promising and can produce high-quality phylogenies and alignments.
UR - http://www.scopus.com/inward/record.url?scp=60849092251&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=60849092251&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-10-S1-S11
DO - 10.1186/1471-2105-10-S1-S11
M3 - Article
C2 - 19208110
AN - SCOPUS:60849092251
SN - 1471-2105
VL - 10
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 1
M1 - S11
ER -