Dataset Information

DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays.

ABSTRACT:

Motivation

Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection, producing low-resolution annotations with superfluous regions and reducing the statistical power for downstream analyses (e.g. causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework for enhancer localization with precise boundary detection (to a 10 bp resolution) using Gradient-weighted Class Activation Mapping.
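The localization step described above can be illustrated with a minimal sketch of Gradient-weighted Class Activation Mapping applied along one sequence axis. This is not the DECODE source code: the function names, the toy arrays, the 10 bp bin size per activation position, and the relative threshold of 0.5 are all assumptions chosen for illustration. Only the Grad-CAM rule itself (average the gradients per channel, take the weighted sum of feature maps, apply ReLU) follows the standard definition.

```python
import numpy as np

def grad_cam_1d(feature_maps, gradients):
    """Standard Grad-CAM rule on (channels, positions) arrays."""
    weights = gradients.mean(axis=1)               # per-channel average gradient
    return np.maximum(weights @ feature_maps, 0.0)  # ReLU of the weighted channel sum

def refine_boundaries(cam, region_start, bin_size=10, rel_threshold=0.5):
    """Hypothetical helper: keep runs of bins whose activation is at least
    rel_threshold * max(cam) and report them as (start, end) coordinates."""
    if cam.max() <= 0:
        return []
    keep = cam >= rel_threshold * cam.max()
    intervals, run_start = [], None
    for i, flag in enumerate(keep):
        if flag and run_start is None:
            run_start = i                           # a high-activation run begins
        elif not flag and run_start is not None:
            intervals.append((region_start + run_start * bin_size,
                              region_start + i * bin_size))
            run_start = None
    if run_start is not None:                       # run extends to the last bin
        intervals.append((region_start + run_start * bin_size,
                          region_start + len(keep) * bin_size))
    return intervals

# Toy inputs (assumed values): 2 channels, 6 bins of 10 bp each.
feature_maps = np.array([[0, 0, 1, 2, 1, 0],
                         [1, 1, 0, 0, 0, 1]], dtype=float)
gradients = np.array([[0.5] * 6,
                      [-0.5] * 6])

cam = grad_cam_1d(feature_maps, gradients)
intervals = refine_boundaries(cam, region_start=1000)
print(intervals)  # [(1020, 1050)]
```

In this toy case a 60 bp candidate region is condensed to the 30 bp interval where the activation map is highest. A real pipeline would take the feature maps and gradients from a trained network rather than from hand-written arrays.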

Results

Our DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization.

Availability and implementation

DECODE source code and pre-processing scripts are available at decode.gersteinlab.org.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Chen Z

PROVIDER: S-EPMC8275369 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature


Publications

DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays.

Zhanlin Chen, Jing Zhang, Jason Liu, Yi Dai, Donghoon Lee, Martin Renqiang Min, Min Xu, Mark Gerstein

Bioinformatics (Oxford, England), 2021 Jul 1, Suppl_1


PMID: 34252960

Similar Datasets

Project description: Purpose: The curation of images using human resources is time intensive but an essential step for developing artificial intelligence (AI) algorithms. Our goal was to develop and implement an AI algorithm for image curation in a high-volume setting. We also explored AI tools that will assist in deploying a tiered approach, in which the AI model labels images and flags potential mislabels for human review. Design: Implementation of an AI algorithm. Participants: Seven-field stereoscopic images from multiple clinical trials. Methods: The 7-field stereoscopic image protocol includes 7 pairs of images from various parts of the central retina along with images of the anterior part of the eye. All images were labeled for field number by reading center graders. The model output included classification of the retinal images into 8 field numbers. Probability scores (0-1) were generated to identify misclassified images, with 1 indicating a high probability of a correct label. Main outcome measures: Agreement of AI prediction with grader classification of field number and the use of probability scores to identify mislabeled images. Results: The AI model was trained and validated on 17 529 images and tested on 3004 images. The pooled agreement of field numbers between grader classification and the AI model was 88.3% (kappa, 0.87). The pooled mean probability score was 0.97 (standard deviation [SD], 0.08) for images for which the graders agreed with the AI-generated labels and 0.77 (SD, 0.19) for images for which the graders disagreed with the AI-generated labels (P < 0.0001). Using receiver operating characteristic curves, a probability score of 0.99 was identified as a cutoff for distinguishing mislabeled images. A tiered workflow using a probability score of < 0.99 as a cutoff would include 27.6% of the 3004 images for human review and reduce the error rate from 11.7% to 1.5%. Conclusions: The implementation of AI algorithms requires measures in addition to model validation. Tools to flag potential errors in the labels generated by AI models will reduce inaccuracies, increase trust in the system, and provide data for continuous model development.
