Dataset Information

Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data.

ABSTRACT:

Background

Unsupervised machine learning methods (deep learning) have shown their usefulness with noisy single cell mRNA-sequencing data (scRNA-seq), where the models generalize well, despite the zero-inflation of the data. A class of neural networks, namely autoencoders, has been useful for denoising of single cell data, imputation of missing values and dimensionality reduction.

Results

Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: with specialized training, the autoencoder is not only able to generalize over the data, but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. From scRNA-seq data, our model can delineate the biologically meaningful modules that govern a dataset and indicate which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets.
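To make the idea of a representation layer concrete, the following is a minimal sketch, not the authors' model or their specialized training procedure. It assumes a single hidden ReLU layer, a squared-error loss, plain gradient descent, and a made-up toy matrix with two gene groups and random dropout zeros; the name train_autoencoder and all parameter values are hypothetical.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    X is a (cells, genes) matrix with non-negative values.
    Returns the encoder weights, decoder weights, and per-epoch loss.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    # Small non-negative initial weights (an assumption of this sketch)
    # keep the ReLU units active at the start for non-negative input.
    W_enc = rng.uniform(0.0, 0.1, size=(n_genes, n_hidden))
    W_dec = rng.uniform(0.0, 0.1, size=(n_hidden, n_genes))
    losses = []
    for _ in range(epochs):
        H = np.maximum(X @ W_enc, 0.0)   # representation layer
        X_hat = H @ W_dec                # decoded data
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of 0.5 * sum(err**2) / n_cells
        grad_dec = H.T @ err / n_cells
        grad_H = (err @ W_dec.T) * (H > 0)
        grad_enc = X.T @ grad_H / n_cells
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Toy data (fabricated for illustration only): 60 cells, 8 genes,
# genes 0-3 expressed in the first 30 cells, genes 4-7 in the rest.
rng = np.random.default_rng(1)
active = np.zeros((60, 8))
active[:30, :4] = 1.0
active[30:, 4:] = 1.0
X = active * rng.uniform(0.5, 1.0, size=(60, 8))
X[rng.random((60, 8)) < 0.3] = 0.0      # mimic zero-inflation

W_enc, W_dec, losses = train_autoencoder(X, n_hidden=2)
H = np.maximum(X @ W_enc, 0.0)          # per-cell activity of each hidden unit
```

Each row of H gives one cell's activation of the hidden units, which is the kind of per-cell readout the abstract refers to; whether a hidden unit corresponds to a biological module depends on the data and the training scheme, which this sketch does not reproduce.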

Conclusions

We discover that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. By comparison with gene signatures of canonical pathways, we see that the modules are directly interpretable. This discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. Compared with other dimensionality reduction methods, or with supervised models for classification, our approach has two benefits: it handles the zero-inflated nature of scRNA-seq well, and it validates that the model captures relevant information by establishing a link between input and decoded data. Looking ahead, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.
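One common way to compare a module's genes with a reference gene signature is a one-sided hypergeometric overlap test. The sketch below illustrates that general technique under stated assumptions; the abstract does not say which comparison the authors used, and the function name overlap_pvalue, the gene names, and the universe size are all hypothetical placeholders.

```python
from math import comb

def overlap_pvalue(module_genes, gene_set, universe_size):
    """One-sided hypergeometric p-value for the overlap of two gene lists.

    Returns the overlap size k and P(overlap >= k) when len(module_genes)
    genes are drawn at random from a universe of universe_size genes that
    contains len(gene_set) reference genes.
    """
    module, ref = set(module_genes), set(gene_set)
    k = len(module & ref)
    n, K, N = len(module), len(ref), universe_size
    total = comb(N, n)
    p = sum(comb(K, i) * comb(N - K, n - i)
            for i in range(k, min(n, K) + 1)) / total
    return k, p

# Placeholder gene names, not real gene-set members.
module = ["g1", "g2", "g3"]
reference = ["g1", "g2", "g3"]
unrelated = ["g7", "g8", "g9"]

k_hit, p_hit = overlap_pvalue(module, reference, universe_size=10)
k_miss, p_miss = overlap_pvalue(module, unrelated, universe_size=10)
```

A small p-value suggests that the module shares more genes with the reference set than chance would explain; in practice the result would need correction for testing many gene sets.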

SUBMITTER: Kinalis S

PROVIDER: S-EPMC6615267 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data.

Kinalis S, Nielsen FC, Winther O, Bagger FO

BMC Bioinformatics. 2019 Jul 8; issue 1

PMID: 31286861

Similar Datasets

Project description: In this study we used the maize (Zea mays) inflorescence to investigate gene networks that modulate determinacy, specifically the decision to allow branch growth. We characterized developmental transitions by associating spatiotemporal expression profiles with morphological changes resulting from genetic perturbations that disrupt steps in a pathway controlling branching. These are the RNA-seq datasets used in this study. We profiled changes in gene expression during normal maize ear and tassel development and in developing maize ear primordia upon genetic perturbation of the RAMOSA branching pathway.

For the wild-type ear and tassel developmental series, greenhouse-grown B73 inbred plants were used. Ears of 10 mm were collected and sectioned as follows from tip to base along the developmental gradient: tip 1 mm sampled (tip; Inflorescence Meristem/Spikelet Pair Meristem), next 1 mm discarded, next 1 mm sampled (mid; Spikelet Meristem), next 2 mm discarded, next 2 mm sampled (base; Floral Meristem), and immediately frozen in liquid nitrogen. Sections from ~30 sampled ears were pooled for each of two biological replicates to represent tip, mid, and base stages. Tassels were hand-dissected, measured, separated by stage: 1-2 mm (stg1), 3-4 mm (stg2), and 5-7 mm (stg3), and immediately frozen in liquid nitrogen. For each stage, ~20-30 tassels were pooled for each of two biological replicates.

For the ramosa mutant series, segregating families (1:1) of ra1-R, ra2-R, and ra3-fea1 mutant alleles, all introgressed at least six times into the B73 inbred background, were grown at CSHL Uplands Farm. Field-grown plants were genotyped and collected 6-7 weeks after germination (V7-V8 stage). First and second ear primordia were immediately hand-dissected, measured, and frozen in liquid nitrogen. For ra1, ra2 and ra3 mutants and wild-type controls, ears were pooled into two size classes: 1) the 1 mm class included ears of 0.7-1.5 mm, and nine ears were pooled for each of two biological replicates; 2) the 2 mm class included ears of 1.8-2.5 mm, and six ears were pooled for each of three biological replicates. Wild-type samples were proportional mixtures of heterozygote siblings segregating in ra1, ra2, and ra3 populations. Variability factors (e.g. ear size within class, ear rank on the plant, and time of collection) were distributed evenly across pooled samples.
