TY - GEN
T1 - Ab initio whole genome shotgun assembly with mated short reads
AU - Medvedev, Paul
AU - Brudno, Michael
PY - 2008
Y1 - 2008
N2 - Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown whether it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy counts with extremely high accuracy, while assembling long contigs.
AB - Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown whether it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy counts with extremely high accuracy, while assembling long contigs.
UR - http://www.scopus.com/inward/record.url?scp=47249147947&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47249147947&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-78839-3_5
DO - 10.1007/978-3-540-78839-3_5
M3 - Conference contribution
AN - SCOPUS:47249147947
SN - 3540788387
SN - 9783540788386
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 50
EP - 64
BT - Research in Computational Molecular Biology - 12th Annual International Conference, RECOMB 2008, Proceedings
T2 - 12th Annual International Conference on REsearch in COmputational Molecular Biology, RECOMB 2008
Y2 - 30 March 2008 through 2 April 2008
ER -