Dataset Information

A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential.


ABSTRACT: The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential despite being trained with less data and with no prior concept of what features define mRNAs. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data.
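To make the abstract's central idea concrete, the following is a minimal sketch of how a gated recurrent unit (GRU) can read a transcript one nucleotide at a time and emit a single coding-potential score. It is not the authors' mRNN implementation: the function names (`one_hot`, `gru_step`, `coding_score`), the hidden size, the example sequence, and the randomly initialized, untrained weights are all assumptions made for illustration, so the score it produces carries no biological meaning.

```python
import math
import random

NUCLEOTIDES = "ACGU"


def one_hot(base):
    """Encode one nucleotide as a 4-element vector (T is treated as U)."""
    base = "U" if base == "T" else base
    # Unknown symbols such as N become an all-zero vector.
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate, row by row."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def init_params(hidden, inputs=4, seed=0):
    """Hypothetical random weights; a real model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("z", "r", "h"):
        params[gate] = (mat(hidden, inputs), mat(hidden, hidden), [0.0] * hidden)
    params["out"] = ([rng.uniform(-0.5, 0.5) for _ in range(hidden)], 0.0)
    return params


def gru_step(params, x, h):
    """One GRU update: gates decide how much of the old state to keep."""
    z = [sigmoid(v) for v in affine(*params["z"], x, h)]  # update gate
    r = [sigmoid(v) for v in affine(*params["r"], x, h)]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(*params["h"], x, gated_h)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def coding_score(params, sequence):
    """Run the GRU over the whole sequence, then squash the final state."""
    h = [0.0] * len(params["out"][0])
    for base in sequence.upper():
        h = gru_step(params, one_hot(base), h)
    w, b = params["out"]
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


params = init_params(hidden=8)
score = coding_score(params, "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
print(f"untrained coding score: {score:.3f}")
```

Because the hidden state is carried across every position, a trained network of this shape can in principle relate a codon to context far upstream or downstream, which is the property the abstract credits for learning long-range patterns in full-length transcripts.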

SUBMITTER: Hill ST 

PROVIDER: S-EPMC6144860 | biostudies-literature | 2018 Sep

REPOSITORIES: biostudies-literature

Publications

A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential.

Steven T. Hill, Rachael Kuintzle, Amy Teegarden, Erich Merrill, Padideh Danaee, David A. Hendrix

Nucleic Acids Research, 1 September 2018, issue 16


Similar Datasets

S-EPMC6508593 | biostudies-literature
S-EPMC8149884 | biostudies-literature
S-EPMC6689867 | biostudies-literature
S-EPMC7843526 | biostudies-literature
S-EPMC6122789 | biostudies-literature
S-EPMC6407788 | biostudies-literature
S-EPMC7862614 | biostudies-literature
S-EPMC9209415 | biostudies-literature
S-EPMC6023031 | biostudies-literature
S-EPMC7813825 | biostudies-literature