Dataset Information

Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence.


ABSTRACT: Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family-guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes to both control for and leverage evolutionary divergence during the training process. The two approaches were explored and validated within the context of mRNA expression level prediction and achieved area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Inspection of the model weights showed biologically interpretable patterns, leading to the hypothesis that the 3' UTR is more important for fine-tuning mRNA abundance levels, while the 5' UTR is more important for large-scale changes.
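
The gene-family-guided splitting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the general idea of assigning whole gene families to either the training or the testing set so that related genes never appear in both. The function name `family_guided_split`, the `test_fraction` parameter, and the toy gene and family identifiers are all hypothetical.

```python
import random
from collections import defaultdict


def family_guided_split(gene_to_family, test_fraction=0.2, seed=0):
    """Split genes into train/test lists so that no gene family spans both.

    gene_to_family: dict mapping a gene ID to its gene-family ID
    (hypothetical input format, assumed for this sketch).
    """
    # Group genes by family.
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)

    # Shuffle the family IDs reproducibly.
    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    n_total = len(gene_to_family)
    train, test = [], []
    for family in family_ids:
        # Assign whole families to the test set until it reaches the
        # target size; every remaining family goes to the training set.
        target = test if len(test) < test_fraction * n_total else train
        target.extend(families[family])
    return train, test


# Toy data (hypothetical identifiers): five families of two genes each.
gene_to_family = {
    "geneA1": "fam1", "geneA2": "fam1",
    "geneB1": "fam2", "geneB2": "fam2",
    "geneC1": "fam3", "geneC2": "fam3",
    "geneD1": "fam4", "geneD2": "fam4",
    "geneE1": "fam5", "geneE2": "fam5",
}

train, test = family_guided_split(gene_to_family, test_fraction=0.2)
train_families = {gene_to_family[g] for g in train}
test_families = {gene_to_family[g] for g in test}
print("families shared by train and test:", train_families & test_families)
```

A random per-gene split would instead allow paralogs from the same family to land on both sides, which is the kind of evolutionary dependency the abstract warns can inflate apparent performance.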

SUBMITTER: Washburn JD 

PROVIDER: S-EPMC6431157 | biostudies-literature | 2019 Mar

REPOSITORIES: biostudies-literature

Publications

Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence.

Jacob D. Washburn, Maria Katherine Mejia-Guerra, Guillaume Ramstein, Karl A. Kremling, Ravi Valluru, Edward S. Buckler, Hai Wang

Proceedings of the National Academy of Sciences of the United States of America, 2019-03-06, issue 12


Similar Datasets

2021-06-02 | GSE175942 | GEO
| S-EPMC6129267 | biostudies-literature
| S-EPMC8501764 | biostudies-literature
2023-07-10 | GSE221870 | GEO
| S-EPMC4768299 | biostudies-literature
| S-EPMC6746221 | biostudies-literature
| S-EPMC9046255 | biostudies-literature
| S-EPMC5125009 | biostudies-literature
| S-EPMC10868333 | biostudies-literature
| S-EPMC3091630 | biostudies-literature