An Eulerian Path Approach to Global Multiple Alignment for DNA Sequences

Yu Zhang, Michael S. Waterman

Research output: Contribution to journalArticlepeer-review

22 Scopus citations

Abstract

With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing that transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to a Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm with almost linear computational speed with respect to the total size (number of letters) of sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and as low as 70% pairwise identity) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project is used to demonstrate the performance.

Original languageEnglish (US)
Pages (from-to)803-819
Number of pages17
JournalJournal of Computational Biology
Volume10
Issue number6
DOIs
StatePublished - 2003

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'An Eulerian Path Approach to Global Multiple Alignment for DNA Sequences'. Together they form a unique fingerprint.

Cite this