Long Reads Enable Accurate Estimates of Complexity of Metagenomes

Anton Bankevich, Pavel Pevzner

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Scopus citation


Although reduced microbiome diversity has been linked to various diseases, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. We describe the first analysis of microbial diversity using long reads that makes no assumption about the frequencies of genomes within a metagenome (as parametric methods do) and does not require a large database covering the total diversity (as non-parametric methods do). Long-read technologies provide new insights into the diversity of metagenomes by interrogating rare species that remained below the radar of previous approaches based on short reads. We present a novel approach for estimating the diversity of metagenomes based on joint analysis of short and long reads and benchmark it on various datasets. We estimate that the genomes comprising a human gut metagenome have a total length varying from 1.3 to 3.5 billion nucleotides, with the genomes responsible for 50% of total abundance having a total length of only 40 to 60 million nucleotides. In contrast, the genomes comprising an aquifer sediment metagenome have a total length more than two orders of magnitude larger (≈ 840 billion nucleotides).

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 22nd Annual International Conference, RECOMB 2018, Proceedings
Editors: Benjamin J. Raphael
Publisher: Springer Verlag
Number of pages: 20
ISBN (Print): 9783319899282
State: Published - 2018
Event: 22nd International Conference on Research in Computational Molecular Biology, RECOMB 2018 - Paris, France
Duration: Apr 21 2018 to Apr 24 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10812 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349



All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

