Using homolog groups to create a whole-genomic tree of free-living organisms: An update

Christopher H. House, Sorel T. Fitz-Gibbon

Research output: Contribution to journalArticlepeer-review

79 Scopus citations


Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor the specific alignment used to compare gene sequences because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.

Original languageEnglish (US)
Pages (from-to)539-547
Number of pages9
JournalJournal Of Molecular Evolution
Issue number4
StatePublished - 2002

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics


Dive into the research topics of 'Using homolog groups to create a whole-genomic tree of free-living organisms: An update'. Together they form a unique fingerprint.

Cite this