Dataset Information

Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data.

ABSTRACT: In many research areas scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsupervised clustering problems and show its effectiveness in the field of single-cell RNA sequencing (scRNA-Seq). The goal of scRNA-Seq experiments is often the definition and cataloguing of cell types from the transcriptional output of individual cells. To improve the clustering of small disease- or tissue-specific datasets, for which the identification of rare cell types is often problematic, we propose a transfer learning method to utilize large and well-annotated reference datasets, such as those produced by the Human Cell Atlas. Our approach modifies the dataset of interest while incorporating key information from the larger reference dataset via Non-negative Matrix Factorization (NMF). The modified dataset is subsequently provided to a clustering algorithm. We empirically evaluate the benefits of our approach on simulated scRNA-Seq data as well as on publicly available datasets. Finally, we present results for the analysis of a recently published small dataset and find improved clustering when transferring knowledge from a large reference dataset. Implementations of the method are available at https://github.com/nicococo/scRNA.
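
The workflow described in the abstract (factorize the reference dataset with NMF, use that factorization to modify the dataset of interest, then cluster the result) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the repository linked above. The function names (`nmf`, `project`, `transfer`), the mixing weight `theta`, the convex-combination blending step, and the toy Poisson data are all assumptions made for illustration, and the NMF uses plain multiplicative updates rather than any particular solver from the paper.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def nmf(X, k, n_iter=200, seed=0):
    """Factorize non-negative X (genes x cells) as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + EPS
    H = rng.random((k, X.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def project(X, W, n_iter=200, seed=0):
    """Fit coefficients H for new data X while keeping the reference dictionary W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
    return H


def transfer(X_target, W_ref, theta=0.5):
    """Blend the target data with its reconstruction from the reference dictionary.

    theta is a hypothetical mixing weight: 0 keeps the target data unchanged,
    1 replaces it entirely with the reference-based reconstruction.
    """
    H_target = project(X_target, W_ref)
    return (1.0 - theta) * X_target + theta * (W_ref @ H_target)


# Toy data (assumed, not from the paper): 50 genes, 300 reference cells, 20 target cells.
rng = np.random.default_rng(42)
X_ref = rng.poisson(5.0, size=(50, 300)).astype(float)
X_target = rng.poisson(5.0, size=(50, 20)).astype(float)

W_ref, _ = nmf(X_ref, k=4)                       # learn gene-level structure from the reference
X_mixed = transfer(X_target, W_ref, theta=0.5)   # modified dataset of interest
```

In this sketch `X_mixed` plays the role of the modified dataset, which would then be handed to a clustering algorithm of choice. Both datasets must share the same gene ordering for the fixed dictionary `W_ref` to be meaningful.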

SUBMITTER: Mieth B

PROVIDER: S-EPMC6937257 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data.

Bettina Mieth, James R. F. Hockley, Nico Görnitz, Marina M.-C. Vidovic, Klaus-Robert Müller, Alex Gutteridge, Daniel Ziemek

Scientific Reports, 2019-12-30, issue 1

PMID: 31889137

Similar Datasets

Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis.
| S-EPMC8009055 | biostudies-literature

scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning.
| S-EPMC9186323 | biostudies-literature

Prior Knowledge Transfer Across Transcriptional Datasets Using Compositional Statistics
2016-11-08 | GSE73638 | GEO

Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis.
| S-EPMC7214470 | biostudies-literature

SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data.
| S-EPMC6477982 | biostudies-literature

Prior Knowledge Transfer Across Transcriptional Datasets Using Compositional Statistics [Tumor]
2016-11-08 | GSE73551 | GEO

MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering.
| S-EPMC9262626 | biostudies-literature

Single-cell RNA-seq clustering: datasets, models, and algorithms.
| S-EPMC7549635 | biostudies-literature

SC3: consensus clustering of single-cell RNA-seq data.
| S-EPMC5410170 | biostudies-literature

Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation.
| S-EPMC7397036 | biostudies-literature