TY - GEN
T1 - On the representation of de bruijn graphs
AU - Chikhi, Rayan
AU - Limasset, Antoine
AU - Jackman, Shaun
AU - Simpson, Jared T.
AU - Medvedev, Paul
PY - 2014
Y1 - 2014
N2 - The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitations of these types of approaches. We further design and implement a general data structure (dbgfm) and demonstrate its use on a human whole-genome dataset, achieving space usage of 1.5 GB and a 46% improvement over previous approaches. As part of dbgfm, we develop the notion of frequency-based minimizers and show how it can be used to enumerate all maximal simple paths of the de Bruijn graph using only 43 MB of memory. Finally, we demonstrate that our approach can be integrated into an existing assembler by modifying the ABySS software to use dbgfm.
AB - The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitations of these types of approaches. We further design and implement a general data structure (dbgfm) and demonstrate its use on a human whole-genome dataset, achieving space usage of 1.5 GB and a 46% improvement over previous approaches. As part of dbgfm, we develop the notion of frequency-based minimizers and show how it can be used to enumerate all maximal simple paths of the de Bruijn graph using only 43 MB of memory. Finally, we demonstrate that our approach can be integrated into an existing assembler by modifying the ABySS software to use dbgfm.
UR - http://www.scopus.com/inward/record.url?scp=84958532975&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958532975&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-05269-4_4
DO - 10.1007/978-3-319-05269-4_4
M3 - Conference contribution
AN - SCOPUS:84958532975
SN - 9783319052687
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 35
EP - 55
BT - Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
PB - Springer Verlag
T2 - 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014
Y2 - 2 April 2014 through 5 April 2014
ER -