Dataset Information

A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations.

ABSTRACT:

Background

Understanding the functional effects of non-coding variants is important because they are often associated with altered gene expression and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the scarcity of experimentally annotated data makes this prediction intrinsically difficult, so the algorithms still need improvement. In this work, we propose a novel method that employs a semi-supervised deep-learning model with pseudo labels and learns from both experimentally annotated and unannotated data.
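
The pseudo-labeling idea described above can be illustrated with a minimal Python sketch. It is not the authors' model: a nearest-centroid classifier stands in for the deep network, the data are toy two-dimensional points, and the names `pseudo_label_fit`, `threshold`, and `rounds` are hypothetical. The sketch assumes exactly two classes. It shows only the loop itself: fit on annotated data, label the unannotated points the model is confident about, and refit on the enlarged set.

```python
from statistics import mean

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(points, labels):
    """Stand-in model: the mean feature vector of each class."""
    model = {}
    for cls in sorted(set(labels)):
        members = [p for p, y in zip(points, labels) if y == cls]
        model[cls] = [mean(column) for column in zip(*members)]
    return model

def classify(model, point):
    """Return (label, confidence); confidence is a distance margin in [0, 1]."""
    ranked = sorted(model, key=lambda cls: squared_distance(point, model[cls]))
    near = squared_distance(point, model[ranked[0]])
    far = squared_distance(point, model[ranked[1]])  # assumes two classes
    total = near + far
    confidence = (far - near) / total if total > 0 else 0.0
    return ranked[0], confidence

def pseudo_label_fit(labeled_x, labeled_y, unlabeled_x, threshold=0.5, rounds=3):
    """Hypothetical helper: grow the training set with confident pseudo labels."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = fit_centroids(train_x, train_y)
    for _ in range(rounds):
        remaining = []
        for point in pool:
            label, confidence = classify(model, point)
            if confidence >= threshold:
                # Confident prediction: treat it as a (pseudo) label.
                train_x.append(point)
                train_y.append(label)
            else:
                remaining.append(point)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled, so stop early
        pool = remaining
        model = fit_centroids(train_x, train_y)
    return model, pool

# Toy data, invented for illustration only.
labeled_x = [[0.0, 0.0], [1.0, 1.0]]
labeled_y = [0, 1]
unlabeled_x = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]]

model, leftover = pseudo_label_fit(labeled_x, labeled_y, unlabeled_x)
```

In this toy run the ambiguous point `[0.5, 0.5]` never reaches the confidence threshold, so it stays in `leftover` instead of receiving a pseudo label. Keeping low-confidence points out of training is the usual guard against reinforcing the model's own mistakes.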

Results

We prepared known functional non-coding variants together with histone marks, DNA accessibility, and sequence context in the GM12878, HepG2, and K562 cell lines. Applied to this dataset, our method showed strong performance compared with existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained on data from one cell line is unlikely to succeed in other cell lines, which implies that the effects of non-coding variants are cell-type-specific. Remarkably, we found that DNA accessibility contributes significantly to the functional consequence of variants, which suggests that an open chromatin conformation is an important prerequisite for the interaction of non-coding variants with gene regulation.
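
The cross-cell-line comparison mentioned above (train on one cell line, test on another) can be sketched as a small evaluation grid. This is an illustration under stated assumptions, not the authors' protocol: the one-feature threshold classifier, the function names, and the datasets `cell_a` and `cell_b` are all hypothetical, and the toy values are constructed so that the feature has opposite meaning in the two cell lines.

```python
def fit_threshold(xs, ys):
    """Hypothetical one-feature classifier: split at the midpoint of class means."""
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    pos_mean = sum(positives) / len(positives)
    neg_mean = sum(negatives) / len(negatives)
    direction = 1 if pos_mean > neg_mean else -1
    return (pos_mean + neg_mean) / 2, direction

def predict_threshold(model, x):
    cutoff, direction = model
    return 1 if direction * (x - cutoff) > 0 else 0

def cross_cell_line_accuracy(datasets, fit, predict):
    """Train on each cell line, then score on every cell line."""
    table = {}
    for train_name, (train_x, train_y) in datasets.items():
        model = fit(train_x, train_y)
        for test_name, (test_x, test_y) in datasets.items():
            hits = sum(predict(model, x) == y for x, y in zip(test_x, test_y))
            table[(train_name, test_name)] = hits / len(test_y)
    return table

# Toy data, invented for illustration: the same feature values carry
# opposite labels in the two hypothetical cell lines.
datasets = {
    "cell_a": ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    "cell_b": ([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0]),
}

table = cross_cell_line_accuracy(datasets, fit_threshold, predict_threshold)
```

The diagonal entries here are scored on the training data itself, which is acceptable only for a toy. A real evaluation would hold out variants within each cell line before comparing within-cell-line and cross-cell-line scores.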

Conclusions

The semi-supervised deep learning model coupled with pseudo labeling is advantageous when working with limited datasets, which are not unusual in biology. Our study provides an effective approach for finding non-coding mutations potentially associated with various biological phenomena, including human diseases.

SUBMITTER: Jia H

PROVIDER: S-EPMC8171027 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature

Publications

A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations.

Jia H, Park SJ, Nakai K

BMC Bioinformatics, 2021 Jun 2, Suppl 6

PMID: 34078253

Similar Datasets

Solo: doublet identification via semi-supervised deep learning
2019-11-13 | GSE140262 | GEO

A semi-supervised approach for predicting cell-type specific functional consequences of non-coding variation using MPRAs.
| S-EPMC6281617 | biostudies-literature

Solo: doublet identification via semi-supervised deep learning
| PRJNA589061 | ENA

Deep echocardiography: data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease.
| S-EPMC6550282 | biostudies-literature

NLLSS: Predicting Synergistic Drug Combinations Based on Semi-supervised Learning.
| S-EPMC4945015 | biostudies-literature

Developing Sustainable Classification of Diseases via Deep Learning and Semi-Supervised Learning.
| S-EPMC7551840 | biostudies-literature

Semi-Supervised, Attention-Based Deep Learning for Predicting TMPRSS2:ERG Fusion Status in Prostate Cancer Using Whole Slide Images.
| S-EPMC10985477 | biostudies-literature

Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning.
| S-EPMC8611875 | biostudies-literature

scSemiAE: a deep model with semi-supervised learning for single-cell transcriptomics.
| S-EPMC9069784 | biostudies-literature

Deep Cerebellar Nuclei Segmentation via Semi-Supervised Deep Context-Aware Learning from 7T Diffusion MRI.
| S-EPMC7351101 | biostudies-literature