APERO: a genome-wide approach for identifying bacterial small RNAs from RNA-Seq data.
ABSTRACT: Small non-coding RNAs (sRNAs) regulate numerous cellular processes in all domains of life. Several approaches have been developed to identify them from RNA-seq data; these are efficient for eukaryotic sRNAs but remain inaccurate for the longer and highly structured bacterial sRNAs. We present APERO, a new algorithm to detect small transcripts from paired-end bacterial RNA-seq data. In contrast to previous approaches that start from the read coverage distribution, APERO analyzes the boundaries of individual sequenced fragments to infer the 5' and 3' ends of all transcripts. Since sRNAs are about the same size as individual fragments (50-350 nucleotides), this algorithm provides significantly higher accuracy and robustness, e.g., with respect to spontaneous internal breaking sites. To demonstrate this improvement, we develop a comparative assessment on datasets from Escherichia coli and Salmonella enterica, based on experimentally validated sRNAs. We also identify the small transcript repertoire of Dickeya dadantii, including putative intergenic RNAs, 5' UTR- or 3' UTR-derived RNA products, and antisense RNAs. Comparisons to annotations as well as RACE-PCR experimental data confirm the precision of the detected transcripts. Altogether, APERO outperforms all existing methods in terms of sRNA detection and boundary precision, which is crucial for comprehensive genome annotations. It is freely available as an open-source R package at https://github.com/Simon-Leonard/APERO.
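The idea of inferring transcript ends from fragment boundaries rather than from read coverage can be illustrated with a minimal Python sketch. This is not APERO's implementation (APERO is an R package, and its actual procedure is not described in this record). The function names `fragment_ends` and `call_ends`, the `(start, end, strand)` fragment tuples, and the `min_count` and `merge_dist` thresholds are all assumptions made for illustration only.

```python
from collections import Counter

def fragment_ends(fragments, strand="+"):
    """Split paired-end fragments into lists of 5' and 3' end positions.

    Each fragment is a hypothetical (start, end, strand) tuple with
    start <= end in genome coordinates.
    """
    five_prime, three_prime = [], []
    for start, end, frag_strand in fragments:
        if frag_strand != strand:
            continue
        # On the minus strand the 5' end is the rightmost coordinate.
        five, three = (start, end) if frag_strand == "+" else (end, start)
        five_prime.append(five)
        three_prime.append(three)
    return five_prime, three_prime

def call_ends(positions, min_count=3, merge_dist=5):
    """Group nearby positions and keep well-supported groups.

    Returns (representative_position, total_support) pairs, where the
    representative is the most frequent position in the group.
    Thresholds are illustrative assumptions, not APERO defaults.
    """
    counts = Counter(positions)
    clusters, current = [], []
    for pos in sorted(counts):
        # Start a new cluster when the gap to the previous position is too large.
        if current and pos - current[-1] > merge_dist:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    called = []
    for cluster in clusters:
        total = sum(counts[p] for p in cluster)
        if total >= min_count:
            best = max(cluster, key=lambda p: (counts[p], -p))
            called.append((best, total))
    return called

# Toy data: four fragments from one short transcript, one of them broken
# internally at position 180, plus an unrelated singleton fragment.
fragments = [
    (100, 250, "+"), (101, 250, "+"), (100, 249, "+"),
    (100, 180, "+"), (500, 600, "+"),
]
starts, ends = fragment_ends(fragments)
print(call_ends(starts))
print(call_ends(ends))
```

In this toy input, the single fragment ending at position 180 stands in for a spontaneous internal break: it has too little support to be called as a 3' end, whereas the boundaries shared by several fragments are retained.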
SUBMITTER: Leonard S
PROVIDER: S-EPMC6735904 | biostudies-literature | 2019 Sep
REPOSITORIES: biostudies-literature