On de novo Bridging Paired-end RNA-seq Data

Xiang Li, Mingfu Shao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

High-throughput short-read RNA-seq protocols often produce paired-end reads, leaving the middle portion of each fragment unsequenced. We explore whether the full-length fragments can be computationally reconstructed from the two sequenced ends in the absence of a reference genome, a problem we refer to here as de novo bridging. Solving this problem provides longer, more informative RNA-seq reads and benefits downstream RNA-seq analyses such as transcript assembly, expression quantification, and differential splicing analysis. However, de novo bridging is a challenging task owing to alternative splicing, transcript noise, and sequencing errors. It remains unclear whether the data provide sufficient information for accurate bridging, let alone whether efficient algorithms exist to determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of a reference genome (called reference-based bridging), but these algorithms are far from scaling to de novo bridging, as the underlying compacted de Bruijn graph (cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra's algorithm for this problem and proposed a novel algorithm that reuses the shortest-path tree to avoid running the truncated Dijkstra's algorithm from scratch for every vertex, yielding a further speedup. These innovative techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at https://github.com/Shao-Group/rnabridge-denovo.
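To make the idea of a truncated shortest-path search concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes that "truncated" means abandoning any path whose length exceeds a bound (for example, a maximum plausible fragment length), and it runs on a toy weighted graph rather than a real cdBG. The names `truncated_dijkstra`, `graph`, and `max_dist` are hypothetical and chosen for illustration only.

```python
import heapq


def truncated_dijkstra(graph, source, max_dist):
    """Shortest distances from `source` to every vertex within `max_dist`.

    `graph` maps each vertex to a list of (neighbor, weight) pairs with
    non-negative weights. This is an illustrative distance-bounded search,
    assumed here as one reading of "truncated"; it is not the paper's code.
    """
    best = {source: 0}        # best tentative distance seen so far
    settled = {}              # final distances of vertices within the bound
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:      # stale heap entry, already finalized
            continue
        settled[u] = d
        for v, w in graph.get(u, []):
            nd = d + w
            if nd > max_dist:  # truncation: never expand beyond the bound
                continue
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return settled


# Toy graph (hypothetical): weights stand in for sequence lengths.
toy_graph = {
    "a": [("b", 2), ("c", 5)],
    "b": [("c", 1), ("d", 10)],
    "c": [("d", 4)],
    "d": [],
}
reachable = truncated_dijkstra(toy_graph, "a", 6)
```

With a bound of 6, vertex `d` is never settled because its shortest distance from `a` is 7, so the search touches only the part of the graph that could contain a valid bridge. Bounding the search this way keeps the work per source vertex local, which matters when the graph has millions of vertices.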

Original language: English (US)
Title of host publication: ACM-BCB 2023 - 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400701269
State: Published - Sep 3 2023
Event: 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2023 - Houston, United States
Duration: Sep 3 2023 to Sep 6 2023

Publication series

Name: ACM-BCB 2023 - 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2023
Country/Territory: United States
City: Houston
Period: 9/3/23 to 9/6/23

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics
