Abstract
Motivation: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full- length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations. Results: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a "bridging"system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages "supporting"information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch's significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%-62.1% and PsiCLASS by 23.0%-175.5% on human datasets.
Original language | English (US) |
---|---|
Pages (from-to) | i307-i317 |
Journal | Bioinformatics |
Volume | 40 |
DOIs | |
State | Published - Jul 1 2024 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics