TY - GEN
T1 - Paired de Bruijn Graphs
T2 - 15th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2011
AU - Medvedev, Paul
AU - Pham, Son
AU - Chaisson, Mark
AU - Tesler, Glenn
AU - Pevzner, Pavel
N1 - Funding Information:
Acknowledgements. Glenn Tesler and Paul Medvedev were supported in part by NIH grant 3P41RR024851-02S1.
Publisher Copyright:
© 2011, Springer-Verlag Berlin Heidelberg.
PY - 2011
Y1 - 2011
N2 - The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated error-free data, we argue that this can effectively improve the contig sizes in assembly.
UR - http://www.scopus.com/inward/record.url?scp=79953218019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953218019&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-20036-6_22
DO - 10.1007/978-3-642-20036-6_22
M3 - Conference contribution
AN - SCOPUS:79953218019
SN - 9783642200359
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 238
EP - 251
BT - Research in Computational Molecular Biology - 15th Annual International Conference, RECOMB 2011, Proceedings
A2 - Bafna, Vineet
A2 - Sahinalp, S. Cenk
PB - Springer Verlag
Y2 - 28 March 2011 through 31 March 2011
ER -