TY - JOUR
T1 - The mega-matrix tree of life
T2 - Using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life
AU - Lienau, E. Kurt
AU - DeSalle, Rob
AU - Allard, Marc
AU - Brown, Eric W.
AU - Swofford, David
AU - Rosenfeld, Jeffrey A.
AU - Sarkar, Indra N.
AU - Planet, Paul J.
PY - 2011
Y1 - 2011/08//
N2 - Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323404 translated open reading frames arranged into 12381 gene homologue groups coded as amino acid data and 63336, 64105, 65153, 66922 and 67109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4867184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination.
UR - http://www.scopus.com/inward/record.url?scp=79960152084&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960152084&partnerID=8YFLogxK
U2 - 10.1111/j.1096-0031.2010.00337.x
DO - 10.1111/j.1096-0031.2010.00337.x
M3 - Article
AN - SCOPUS:79960152084
SN - 0748-3007
VL - 27
SP - 417
EP - 427
JO - Cladistics
JF - Cladistics
IS - 4
ER -