Joint Analysis of Long and Short Reads Enables Accurate Estimates of Microbiome Complexity

Anton Bankevich, Pavel A. Pevzner

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Reduced microbiome diversity has been linked to several diseases. However, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. Here, we describe an algorithm for estimating the microbial diversity in a metagenomic sample based on a joint analysis of short and long reads. Unlike previous approaches, the algorithm does not make any assumptions about the distribution of the frequencies of genomes within a metagenome (as in parametric methods) and does not require a large database that covers the total diversity (as in non-parametric methods). We estimate that the genomes comprising a human gut metagenome have a total length ranging from 1.3 to 3.5 billion nucleotides, with the genomes responsible for 50% of total abundance having a total length ranging from only 25 to 61 million nucleotides. In contrast, the genomes comprising an aquifer sediment metagenome have a total length more than two orders of magnitude larger (≈840 billion nucleotides). We present a method for estimating the diversity of metagenomic samples that combines short and long sequencing reads. We show that our method is capable of capturing rare species and apply it to analyze the diversity of the human gut and aquifer sediment metagenomes.

Original language: English (US)
Pages (from-to): 192-200.e3
Journal: Cell Systems
Issue number: 2
State: Published - Aug 22 2018

All Science Journal Classification (ASJC) codes

  • Pathology and Forensic Medicine
  • Histology
  • Cell Biology

