TY - GEN
T1 - Computability of models for sequence assembly
AU - Medvedev, Paul
AU - Georgiou, Konstantinos
AU - Myers, Gene
AU - Brudno, Michael
PY - 2007
Y1 - 2007
N2 - Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest doublestranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
AB - Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest doublestranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
UR - http://www.scopus.com/inward/record.url?scp=37249080658&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=37249080658&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-74126-8_27
DO - 10.1007/978-3-540-74126-8_27
M3 - Conference contribution
AN - SCOPUS:37249080658
SN - 9783540741251
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 289
EP - 301
BT - Algorithms in Bioinformatics - 7th International Workshop, WABI 2007, Proceedings
PB - Springer Verlag
T2 - 7th International Workshop on Algorithms in Bioinformatics, WABI 2007
Y2 - 8 September 2007 through 9 September 2007
ER -