Computability of models for sequence assembly

Paul Medvedev, Konstantinos Georgiou, Gene Myers, Michael Brudno

Research output: Chapter in Book/Report/Conference proceedingConference contribution

116 Scopus citations

Abstract

Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest doublestranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.

Original languageEnglish (US)
Title of host publicationAlgorithms in Bioinformatics - 7th International Workshop, WABI 2007, Proceedings
PublisherSpringer Verlag
Pages289-301
Number of pages13
ISBN (Print)9783540741251
DOIs
StatePublished - 2007
Event7th International Workshop on Algorithms in Bioinformatics, WABI 2007 - PhiIadelphia, PA, United States
Duration: Sep 8 2007Sep 9 2007

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume4645 LNBI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other7th International Workshop on Algorithms in Bioinformatics, WABI 2007
Country/TerritoryUnited States
CityPhiIadelphia, PA
Period9/8/079/9/07

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Computability of models for sequence assembly'. Together they form a unique fingerprint.

Cite this