Safe and complete contig assembly via omnitigs

Alexandru I. Tomescu, Paul Medvedev

Research output: Chapter in Book/Report/Conference proceedingConference contribution

12 Citations (SciVal)


Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs-a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.

Original languageEnglish (US)
Title of host publicationResearch in Computational Molecular Biology - 20th Annual Conference, RECOMB 2016, Proceedings
EditorsMona Singh
PublisherSpringer Verlag
Number of pages12
ISBN (Print)9783319319568
StatePublished - 2016
Event20th Annual Conference on Research in Computational Molecular Biology, RECOMB 2016 - Santa Monica, United States
Duration: Apr 17 2016Apr 21 2016

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


Other20th Annual Conference on Research in Computational Molecular Biology, RECOMB 2016
Country/TerritoryUnited States
CitySanta Monica

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science


Dive into the research topics of 'Safe and complete contig assembly via omnitigs'. Together they form a unique fingerprint.

Cite this