TY - JOUR
T1 - Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads
AU - Bankevich, Anton
AU - Bzikadze, Andrey V.
AU - Kolmogorov, Mikhail
AU - Antipov, Dmitry
AU - Pevzner, Pavel A.
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature America, Inc.
PY - 2022/7
Y1 - 2022/7
AB - Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
UR - http://www.scopus.com/inward/record.url?scp=85125338574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125338574&partnerID=8YFLogxK
U2 - 10.1038/s41587-022-01220-6
DO - 10.1038/s41587-022-01220-6
M3 - Article
C2 - 35228706
AN - SCOPUS:85125338574
SN - 1087-0156
VL - 40
SP - 1075
EP - 1081
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 7
ER -