Noise-cancelling repeat finder: Uncovering tandem repeats in error-prone long-read sequencing data

Robert S. Harris, Monika Cechova, Kateryna D. Makova

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.
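
To illustrate what locating tandem repeats of a specified motif in a noisy read involves, here is a minimal Python sketch. It is not NCRF's algorithm, which the abstract does not describe. It assumes a substitution-only error model (no insertions or deletions, which real long reads contain), and the function names, the window size (`copies`), and the tolerance (`max_rate`) are hypothetical choices made for illustration only.

```python
def best_mismatch_rate(segment: str, motif: str) -> float:
    """Lowest fraction of mismatches between `segment` and a perfect tandem
    array of `motif`, taken over every rotation (phase) of the motif."""
    k = len(motif)
    best = len(segment)
    for phase in range(k):
        mismatches = sum(
            1 for i, base in enumerate(segment) if base != motif[(i + phase) % k]
        )
        best = min(best, mismatches)
    return best / len(segment)


def find_noisy_repeats(read: str, motif: str, copies: int = 4,
                       max_rate: float = 0.2) -> list[tuple[int, int]]:
    """Slide a window of `copies` motif lengths along `read` and merge the
    windows whose mismatch rate is at most `max_rate` into half-open
    (start, end) intervals."""
    window = copies * len(motif)
    intervals: list[tuple[int, int]] = []
    for start in range(len(read) - window + 1):
        if best_mismatch_rate(read[start:start + window], motif) <= max_rate:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals
```

Because the window tolerates some mismatches, a reported interval can extend a few bases into the flanking sequence. Handling insertions and deletions would require an alignment-based approach rather than this position-by-position comparison.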

Original language: English (US)
Pages (from-to): 4809-4811
Number of pages: 3
Journal: Bioinformatics
Volume: 35
Issue number: 22
State: Published - Nov 1 2019

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
