
Dataset Information


A self-supervised deep learning method for data-efficient training in genomics.


ABSTRACT: Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
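
The abstract mentions two ingredients: reverse-complement sequences and prediction targets of different lengths. The sketch below illustrates only those two ideas on plain strings. It is not taken from the Self-GenomeNet code; the function names `reverse_complement` and `split_context_target`, and the way a sequence is cut into a context and a target, are assumptions made for illustration.

```python
# Illustrative sketch only; not the authors' implementation.
# Both helper names below are hypothetical.

# Map each base to its complement; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_context_target(seq: str, target_len: int) -> tuple[str, str]:
    """Split seq into a context prefix and a target suffix of target_len bases.

    Assumption: the target is simply the tail of the sequence. The abstract
    does not specify how targets are actually constructed.
    """
    if not 0 < target_len < len(seq):
        raise ValueError("target_len must be between 1 and len(seq) - 1")
    return seq[:-target_len], seq[-target_len:]


if __name__ == "__main__":
    read = "ATGCGTAC"
    print(reverse_complement("ATGC"))  # GCAT
    # Targets of different lengths taken from the same read.
    for target_len in (2, 4):
        print(split_context_target(read, target_len))
```

Varying `target_len` on the same read gives short and long targets, which is one simple way to picture "targets of different lengths"; the published method may differ.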

SUBMITTER: Gunduz HA 

PROVIDER: S-EPMC10495322 | biostudies-literature | 2023 Sep

REPOSITORIES: biostudies-literature


Publications

A self-supervised deep learning method for data-efficient training in genomics.

Gündüz HA, Binder M, To XY, Mreches R, Bischl B, McHardy AC, Münch PC, Rezaei M

Communications Biology, 2023-09-11 (issue 1)



Similar Datasets

| S-EPMC10493897 | biostudies-literature
| S-EPMC8977509 | biostudies-literature
| S-EPMC6550282 | biostudies-literature
| S-EPMC11227494 | biostudies-literature
| S-EPMC10440826 | biostudies-literature
| S-EPMC11359982 | biostudies-literature
| S-EPMC10529705 | biostudies-literature
| S-EPMC10516353 | biostudies-literature
| S-EPMC7592391 | biostudies-literature
| S-EPMC7248915 | biostudies-literature