Dataset Information

Verifying explainability of a deep learning tissue classifier trained on RNA-seq data.

ABSTRACT: For complex machine learning (ML) algorithms to gain widespread acceptance in decision making, we must be able to identify the features driving the predictions. Explainability models allow transparency of ML algorithms; however, their reliability within high-dimensional data is unclear. To test the reliability of the explainability model SHapley Additive exPlanations (SHAP), we developed a convolutional neural network to predict tissue classification from Genotype-Tissue Expression (GTEx) RNA-seq data representing 16,651 samples from 47 tissues. Our classifier achieved an average F1 score of 96.1% on held-out GTEx samples. Using SHAP values, we identified the 2,423 most discriminatory genes, of which 98.6% were also identified by differential expression analysis across all tissues. The SHAP genes reflected expected biological processes involved in tissue differentiation and function. Moreover, SHAP genes clustered tissue types with superior performance when compared to all genes, genes detected by differential expression analysis, or random genes. We demonstrate the utility and reliability of SHAP to explain a deep learning model and highlight the strengths of applying ML to transcriptome data.
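
The abstract describes ranking genes by their SHAP values. As a rough illustration of that idea only, the sketch below computes exact Shapley values by enumerating feature coalitions and then ranks features by mean absolute value across samples. It is not the authors' pipeline: the three-feature linear scorer `toy_score`, its `WEIGHTS`, the all-zero baseline, and the sample values are hypothetical stand-ins for a trained classifier and real expression data, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(masked)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy "tissue score": a linear model over three made-up genes.
WEIGHTS = [2.0, -1.0, 0.0]


def toy_score(expr):
    return sum(w * e for w, e in zip(WEIGHTS, expr))


def rank_features(samples, baseline):
    """Rank features by mean absolute Shapley value across samples."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(toy_score, x, baseline)):
            totals[i] += abs(p)
    means = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: means[i], reverse=True)
    return order, means


# Made-up expression values for two samples and an all-zero baseline.
samples = [[1.0, 2.0, 5.0], [3.0, 0.0, 1.0]]
baseline = [0.0, 0.0, 0.0]
order, means = rank_features(samples, baseline)
```

In this toy setting the third feature has zero weight, so it receives zero attribution and ranks last however large its input values are. For a model with thousands of genes, an approximate explainer would be needed in place of the enumeration used here.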

SUBMITTER: Yap M

PROVIDER: S-EPMC7846764 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

Publications

Verifying explainability of a deep learning tissue classifier trained on RNA-seq data.

Melvyn Yap, Rebecca L. Johnston, Helena Foley, Samual MacDonald, Olga Kondrashova, Khoa A. Tran, Katia Nones, Lambros T. Koufariotis, Cameron Bean, John V. Pearson, Maciej Trzaskowski, Nicola Waddell

Scientific Reports, 2021-01-29, issue 1

PMID: 33514769

Similar Datasets

SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data. | S-EPMC7523643 | biostudies-literature

Deep single-cell RNA-seq data clustering with graph prototypical contrastive learning. | S-EPMC10246584 | biostudies-literature

Deep learning-based cancer survival prognosis from RNA-seq data: approaches and evaluations. | S-EPMC7118823 | biostudies-literature

scDLC: a deep learning framework to classify large sample single-cell RNA-seq data. | S-EPMC9281153 | biostudies-literature

Deep learning using bulk RNA-seq data expands cell landscape identification in tumor microenvironment. | S-EPMC8890395 | biostudies-literature

A deep-learning-based RNA-seq germline variant caller. | S-EPMC10320079 | biostudies-literature

Deep-learning augmented RNA-seq analysis of transcript splicing. | S-EPMC7605494 | biostudies-literature

Deep learning generates synthetic cancer histology for explainability and education. | S-EPMC10227067 | biostudies-literature

Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data. | S-EPMC9618578 | biostudies-literature

Deep annotation of long noncoding RNAs by assembling RNA-seq and small RNA-seq data. | S-EPMC10498003 | biostudies-literature