
Dataset Information


RNASequel: accurate and repeat tolerant realignment of RNA-seq reads.


ABSTRACT: RNA-seq is a key technology for understanding the biology of the cell because it can profile transcriptional and post-transcriptional regulation at single-nucleotide resolution. Compared with DNA sequencing alignment algorithms, RNA-seq alignment algorithms are less able to accurately detect and map base-pair substitutions, gaps, discordant pairs and repetitive regions. These shortcomings adversely affect experiments that require a high degree of accuracy, notably the detection of RNA editing. We have developed RNASequel, a software package that runs as a post-processing step alongside an RNA-seq aligner and systematically corrects common alignment artifacts. Its key innovations are a two-pass splice junction alignment system that includes de novo splice junctions, and the use of an empirically determined estimate of the fragment size distribution when resolving read pairs. Using two simulated datasets, we demonstrate that RNASequel produces improved alignments when used in conjunction with STAR or TopHat2. We then show that RNASequel improves the identification of adenosine-to-inosine (A-to-I) RNA editing sites in biological datasets. This software will be useful in applications requiring the accurate identification of variants in RNA sequencing data, the discovery of RNA editing sites and the analysis of alternative splicing.
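The abstract mentions resolving read pairs with an empirically determined fragment size distribution. The record does not describe how RNASequel actually does this, so the following is only a minimal illustrative sketch of the general idea, under several simplifying assumptions. A histogram of fragment sizes from confidently mapped pairs serves as the distribution. Candidate alignments are hypothetical `(chrom, start, end)` tuples in genomic coordinates, with introns ignored. Among all same-chromosome pairings of a read's candidate mate alignments, the pairing whose implied fragment size is most probable is kept. The function names `empirical_fragment_dist` and `resolve_pair` are hypothetical and are not part of RNASequel.

```python
from collections import Counter
from itertools import product


def empirical_fragment_dist(fragment_sizes, pseudocount=1e-6):
    """Hypothetical helper: build P(fragment size) from observed sizes.

    Sizes never observed get a small pseudocount, so they are unlikely
    but not impossible.
    """
    counts = Counter(fragment_sizes)
    total = sum(counts.values())

    def prob(size):
        return counts[size] / total if size in counts else pseudocount

    return prob


def resolve_pair(mate1_hits, mate2_hits, prob):
    """Hypothetical helper: choose the most plausible pairing of mate alignments.

    Each hit is an assumed (chrom, start, end) tuple in genomic coordinates;
    splicing is ignored when computing the implied fragment size.
    Returns ((hit1, hit2), probability), or (None, 0.0) if no pairing fits.
    """
    best, best_p = None, 0.0
    for h1, h2 in product(mate1_hits, mate2_hits):
        if h1[0] != h2[0]:
            continue  # mates on different chromosomes: discordant, skip
        size = max(h1[2], h2[2]) - min(h1[1], h2[1])
        p = prob(size)
        if p > best_p:
            best, best_p = (h1, h2), p
    return best, best_p


# Usage: mate 1 maps to two repeat copies; the fragment size picks one.
prob = empirical_fragment_dist([200] * 50 + [250] * 30 + [300] * 20)
mate1 = [("chr1", 1000, 1100), ("chr5", 5000, 5100)]
mate2 = [("chr1", 1100, 1200), ("chr5", 5300, 5400)]
best, p = resolve_pair(mate1, mate2, prob)
print(best, p)
```

In this toy example, the chr1 pairing implies a 200 bp fragment, which is common in the empirical histogram. The chr5 pairing implies 400 bp, which was never observed. The chr1 placement is therefore preferred even though both mates of mate 1 map equally well on their own.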

SUBMITTER: Wilson GW 

PROVIDER: S-EPMC4605292 | biostudies-literature | 2015 Oct

REPOSITORIES: biostudies-literature


Publications

RNASequel: accurate and repeat tolerant realignment of RNA-seq reads.

Gavin W. Wilson, Lincoln D. Stein

Nucleic Acids Research, 2015 Jun 16, issue 18



Similar Datasets

| S-EPMC2952873 | biostudies-other
| S-EPMC4615873 | biostudies-literature
| S-EPMC4889935 | biostudies-literature
| S-EPMC6659269 | biostudies-literature
| S-EPMC4908361 | biostudies-literature
| S-EPMC4970289 | biostudies-literature
| S-EPMC4064318 | biostudies-literature
| S-EPMC5550947 | biostudies-other
| S-EPMC6701478 | biostudies-literature
| S-EPMC4111418 | biostudies-literature