Dataset Information

Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings.


ABSTRACT: Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking of their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters, is lacking. We used paired whole-genome sequencing and gene expression data from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation in the ability of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learnt sequence motif grammar, and suggest new model training strategies to improve performance.

SUBMITTER: Sasse A 

PROVIDER: S-EPMC10055057 | biostudies-literature | 2023 Sep

REPOSITORIES: biostudies-literature

Publications

Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings.

Sasse A, Ng B, Spiro AE, Tasaki S, Bennett DA, Gaiteri C, De Jager PL, Chikina M, Mostafavi S

bioRxiv: the preprint server for biology, 2023-09-28

Similar Datasets

S-EPMC8188889 | biostudies-literature
S-EPMC4725912 | biostudies-other
S-EPMC5773911 | biostudies-literature
S-EPMC9436379 | biostudies-literature
S-EPMC6929458 | biostudies-literature
S-EPMC10634041 | biostudies-literature
S-EPMC7797176 | biostudies-literature
S-EPMC7657843 | biostudies-literature
S-EPMC5995439 | biostudies-literature
S-EPMC7324158 | biostudies-literature