TY - GEN
T1 - Safe and complete contig assembly via omnitigs
AU - Tomescu, Alexandru I.
AU - Medvedev, Paul
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
AB - Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs: a set of strings that are promised to appear in any genome that could have generated the reads. Since the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g., a de Bruijn graph or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial-time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.
UR - http://www.scopus.com/inward/record.url?scp=84964009569&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964009569&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-31957-5_11
DO - 10.1007/978-3-319-31957-5_11
M3 - Conference contribution
AN - SCOPUS:84964009569
SN - 9783319319568
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 152
EP - 163
BT - Research in Computational Molecular Biology - 20th Annual Conference, RECOMB 2016, Proceedings
A2 - Singh, Mona
PB - Springer Verlag
T2 - 20th Annual Conference on Research in Computational Molecular Biology, RECOMB 2016
Y2 - 17 April 2016 through 21 April 2016
ER -