TY - JOUR
T1 - Improved placement of multi-mapping small RNAs
AU - Johnson, Nathan R.
AU - Yeoh, Jonathan M.
AU - Coruh, Ceyda
AU - Axtell, Michael J.
N1 - Publisher Copyright:
© 2016 Johnson et al.
PY - 2016/7/1
Y1 - 2016/7/1
N2 - High-throughput sequencing of small RNAs (sRNA-seq) is a popular method used to discover and annotate microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs), and Piwi-associated RNAs (piRNAs). One of the key steps in sRNA-seq data analysis is alignment to a reference genome. sRNA-seq libraries often have a high proportion of reads that align to multiple genomic locations, which makes determining their true origins difficult. Commonly used sRNA-seq alignment methods result in either very low precision (choosing an alignment at random), or sensitivity (ignoring multi-mapping reads). Here, we describe and test an sRNA-seq alignment strategy that uses local genomic context to guide decisions on proper placements of multi-mapped sRNA-seq reads. Tests using simulated sRNA-seq data demonstrated that this local-weighting method outperforms other alignment strategies using three different plant genomes. Experimental analyses with real sRNA-seq data also indicate superior performance of localweighting methods for both plant miRNAs and heterochromatic siRNAs. The local-weighting methods we have developed are implemented as part of the sRNA-seq analysis program ShortStack, which is freely available under a general public license. Improved genome alignments of sRNA-seq data should increase the quality of downstream analyses and genome annotation efforts.
AB - High-throughput sequencing of small RNAs (sRNA-seq) is a popular method used to discover and annotate microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs), and Piwi-associated RNAs (piRNAs). One of the key steps in sRNA-seq data analysis is alignment to a reference genome. sRNA-seq libraries often have a high proportion of reads that align to multiple genomic locations, which makes determining their true origins difficult. Commonly used sRNA-seq alignment methods result in either very low precision (choosing an alignment at random), or sensitivity (ignoring multi-mapping reads). Here, we describe and test an sRNA-seq alignment strategy that uses local genomic context to guide decisions on proper placements of multi-mapped sRNA-seq reads. Tests using simulated sRNA-seq data demonstrated that this local-weighting method outperforms other alignment strategies using three different plant genomes. Experimental analyses with real sRNA-seq data also indicate superior performance of localweighting methods for both plant miRNAs and heterochromatic siRNAs. The local-weighting methods we have developed are implemented as part of the sRNA-seq analysis program ShortStack, which is freely available under a general public license. Improved genome alignments of sRNA-seq data should increase the quality of downstream analyses and genome annotation efforts.
UR - http://www.scopus.com/inward/record.url?scp=84978395230&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978395230&partnerID=8YFLogxK
U2 - 10.1534/g3.116.030452
DO - 10.1534/g3.116.030452
M3 - Article
C2 - 27175019
AN - SCOPUS:84978395230
SN - 2160-1836
VL - 6
SP - 2103
EP - 2111
JO - G3: Genes, Genomes, Genetics
JF - G3: Genes, Genomes, Genetics
IS - 7
ER -