TY - JOUR
T1 - Robustness to divergence time underestimation when inferring species trees from estimated gene trees
AU - DeGiorgio, Michael
AU - Degnan, James H.
N1 - Funding Information:
This work was supported by National Science Foundation grants [DBI-1103639] and [DBI-1146722]. J.H.D. was additionally funded by the New Zealand Marsden Fund and the National Institute for Mathematical and Biological Synthesis, an institute sponsored by the National Science Foundation, the U.S. Department of Homeland Security, and the U.S. Department of Agriculture through NSF Award #EF-0832858, with additional support from the University of Tennessee, Knoxville.
PY - 2014/1
Y1 - 2014/1
N2 - To infer species trees from gene trees estimated from phylogenomic data sets, tractable methods are needed that can handle dozens to hundreds of loci. We examine several computationally efficient approaches (MP-EST, STAR, STEAC, STELLS, and STEM) for inferring species trees from gene trees estimated using maximum likelihood (ML) and Bayesian approaches. Among the methods examined, we found that topology-based methods often performed better using ML gene trees and methods employing coalescent times typically performed better using Bayesian gene trees, with MP-EST, STAR, STEAC, and STELLS outperforming STEM under most conditions. We examine why the STEM tree (also called GLASS or Maximum Tree) is less accurate on estimated gene trees by comparing estimated and true coalescence times, performing species tree inference using simulations, and analyzing a great ape data set keeping track of false positive and false negative rates for inferred clades. We find that although true coalescence times are more ancient than speciation times under the multispecies coalescent model, estimated coalescence times are often more recent than speciation times. This underestimation can lead to increased bias and lack of resolution with increased sampling (either alleles or loci) when gene trees are estimated with ML. The problem appears to be less severe using Bayesian gene-tree estimates.
AB - To infer species trees from gene trees estimated from phylogenomic data sets, tractable methods are needed that can handle dozens to hundreds of loci. We examine several computationally efficient approaches (MP-EST, STAR, STEAC, STELLS, and STEM) for inferring species trees from gene trees estimated using maximum likelihood (ML) and Bayesian approaches. Among the methods examined, we found that topology-based methods often performed better using ML gene trees and methods employing coalescent times typically performed better using Bayesian gene trees, with MP-EST, STAR, STEAC, and STELLS outperforming STEM under most conditions. We examine why the STEM tree (also called GLASS or Maximum Tree) is less accurate on estimated gene trees by comparing estimated and true coalescence times, performing species tree inference using simulations, and analyzing a great ape data set keeping track of false positive and false negative rates for inferred clades. We find that although true coalescence times are more ancient than speciation times under the multispecies coalescent model, estimated coalescence times are often more recent than speciation times. This underestimation can lead to increased bias and lack of resolution with increased sampling (either alleles or loci) when gene trees are estimated with ML. The problem appears to be less severe using Bayesian gene-tree estimates.
UR - http://www.scopus.com/inward/record.url?scp=84891354563&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84891354563&partnerID=8YFLogxK
U2 - 10.1093/sysbio/syt059
DO - 10.1093/sysbio/syt059
M3 - Article
C2 - 23988674
AN - SCOPUS:84891354563
SN - 1063-5157
VL - 63
SP - 66
EP - 82
JO - Systematic Biology
JF - Systematic Biology
IS - 1
ER -