Dataset Information

Improved Placement of Multi-mapping Small RNAs.


ABSTRACT: High-throughput sequencing of small RNAs (sRNA-seq) is a popular method used to discover and annotate microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs), and Piwi-associated RNAs (piRNAs). One of the key steps in sRNA-seq data analysis is alignment to a reference genome. sRNA-seq libraries often have a high proportion of reads that align to multiple genomic locations, which makes determining their true origins difficult. Commonly used sRNA-seq alignment methods result in either very low precision (choosing an alignment at random) or very low sensitivity (ignoring multi-mapping reads). Here, we describe and test an sRNA-seq alignment strategy that uses local genomic context to guide decisions on proper placements of multi-mapped sRNA-seq reads. Tests using simulated sRNA-seq data demonstrated that this local-weighting method outperforms other alignment strategies using three different plant genomes. Experimental analyses with real sRNA-seq data also indicate superior performance of local-weighting methods for both plant miRNAs and heterochromatic siRNAs. The local-weighting methods we have developed are implemented as part of the sRNA-seq analysis program ShortStack, which is freely available under a general public license. Improved genome alignments of sRNA-seq data should increase the quality of downstream analyses and genome annotation efforts.
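
The local-weighting idea in the abstract can be illustrated with a minimal Python sketch. This is not ShortStack's implementation: the function names, the 50-nt window, the use of uniquely mapped read counts as the weight, and the random fall-back when no candidate has local support are all assumptions made for illustration.

```python
import random

def local_weights(candidates, unique_hits, window=50):
    """Count uniquely mapped reads near each candidate (chrom, pos).

    `candidates` are the possible placements of one multi-mapping read;
    `unique_hits` are (chrom, pos) positions of uniquely mapped reads.
    The window size is a hypothetical parameter.
    """
    weights = []
    for chrom, pos in candidates:
        nearby = sum(
            1 for c, p in unique_hits
            if c == chrom and abs(p - pos) <= window
        )
        weights.append(nearby)
    return weights

def place_read(candidates, unique_hits, window=50, rng=None):
    """Pick one placement with probability proportional to local support."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    weights = local_weights(candidates, unique_hits, window)
    if sum(weights) == 0:
        # No local context at any candidate: fall back to a random choice
        # (assumed behavior for this sketch only).
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With two candidate placements, one surrounded by uniquely mapped reads and one with none, the read is always assigned to the supported locus, whereas random placement would pick it only about half the time.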

SUBMITTER: Johnson NR 

PROVIDER: S-EPMC4938663 | biostudies-literature | 2016 Jul

REPOSITORIES: biostudies-literature

Publications

Improved Placement of Multi-mapping Small RNAs.

Johnson NR, Yeoh JM, Coruh C, Axtell MJ

G3 (Bethesda, Md.), 2016-07-07, issue 7


Similar Datasets

| S-EPMC6171561 | biostudies-literature
| S-EPMC2427294 | biostudies-literature
| S-EPMC6899541 | biostudies-literature
| S-EPMC7002195 | biostudies-literature
2016-05-25 | GSE76281 | GEO
| S-EPMC6704100 | biostudies-literature
2018-08-17 | GSE114680 | GEO
2018-08-17 | GSE118369 | GEO
| S-EPMC7946837 | biostudies-literature
| S-EPMC4231740 | biostudies-literature