TY - JOUR
T1 - Counting pseudoalignments to novel splicing events
AU - Borozan, Luka
AU - Ringeling, Francisca Rojas
AU - Kao, Shao Yen
AU - Nikonova, Elena
AU - Monteagudo-Mesas, Pablo
AU - Matijević, Domagoj
AU - Spletter, Maria L.
AU - Canzar, Stefan
N1 - Funding Information:
This work was supported by the Deutsche Forschungsgemeinschaft [417912216 to M.L.S. and SFB-TRR 338/1 2021-452881907 to S.C.]; Deutsche Gesellschaft für Muskelkranke (8225310 to M.L.S.); start-up funding from the University of Missouri Kansas City (to M.L.S.); the International Max Planck Research School for Molecular and Cellular Life Sciences (E.N.); and a Deutsche Forschungsgemeinschaft fellowship through the Graduate School of Quantitative Biosciences Munich (P.M.M.).
Publisher Copyright:
© The Author(s) 2023.
PY - 2023/7/1
Y1 - 2023/7/1
AB - Motivation: Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. Results: Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto’s equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align and count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila.
UR - http://www.scopus.com/inward/record.url?scp=85164845629&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164845629&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btad419
DO - 10.1093/bioinformatics/btad419
M3 - Article
C2 - 37432342
AN - SCOPUS:85164845629
SN - 1367-4803
VL - 39
JO - Bioinformatics
JF - Bioinformatics
IS - 7
M1 - btad419
ER -