TY - JOUR
T1 - PHYRN
T2 - A robust method for phylogenetic analysis of highly divergent sequences
AU - Bhardwaj, Gaurav
AU - Ko, Kyung Dae
AU - Hong, Yoojin
AU - Zhang, Zhenhai
AU - Ho, Ngai Lam
AU - Chintapalli, Sree V.
AU - Kline, Lindsay A.
AU - Gotlin, Matthew
AU - Hartranft, David Nicholas
AU - Patterson, Morgen E.
AU - Dave, Foram
AU - Smith, Evan J.
AU - Holmes, Edward C.
AU - Patterson, Randen L.
AU - van Rossum, Damian B.
PY - 2012/4/13
Y1 - 2012/4/13
AB - Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.
UR - http://www.scopus.com/inward/record.url?scp=84859708669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859708669&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0034261
DO - 10.1371/journal.pone.0034261
M3 - Article
C2 - 22514627
AN - SCOPUS:84859708669
SN - 1932-6203
VL - 7
JO - PLoS ONE
JF - PLoS ONE
IS - 4
M1 - e34261
ER -