
Dataset Information


Random forest based similarity learning for single cell RNA sequencing data.


ABSTRACT:

Motivation

Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around the discovery or characterization of cell types or sub-types, so obtaining accurate cell-cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches explicitly address this task. Furthermore, the abundance and type of noise present in scRNA-seq datasets suggest that applying generic methods, or methods developed for bulk RNA-seq data, is likely suboptimal.

Results

Here, we present RAFSIL, a random forest based approach to learn cell-cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data.
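The two-step idea above (feature construction, then random-forest similarity learning, then clustering) can be illustrated with a minimal Python sketch. This is not RAFSIL's implementation, which is an R package. It assumes a log-transform plus PCA as a stand-in feature step and uses Breiman-style unsupervised random-forest proximities (real cells versus a column-permuted synthetic copy). The function names `build_features`, `rf_similarity` and `cluster_cells`, along with the simulated two-group count data, are hypothetical and exist only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier


def build_features(counts, n_components=10, seed=0):
    """Hypothetical feature step (assumption, not RAFSIL's): log1p of a
    cells x genes count matrix followed by PCA."""
    logged = np.log1p(counts)
    k = min(n_components, *logged.shape)
    return PCA(n_components=k, random_state=seed).fit_transform(logged)


def rf_similarity(features, n_trees=300, seed=0):
    """Unsupervised random-forest proximity (Breiman-style synthetic contrast).

    Real cells are labelled 1. A synthetic copy, with each feature column
    permuted independently (destroying between-feature dependence), is
    labelled 0. Similarity(i, j) is the fraction of trees in which real
    cells i and j land in the same leaf.
    """
    rng = np.random.default_rng(seed)
    n, p = features.shape
    synthetic = np.column_stack([rng.permutation(features[:, j]) for j in range(p)])
    X = np.vstack([features, synthetic])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    leaves = forest.apply(features)  # shape (n_cells, n_trees): leaf index per tree
    sim = np.zeros((n, n))
    for t in range(leaves.shape[1]):  # accumulate tree by tree to limit memory use
        sim += leaves[:, t][:, None] == leaves[:, t][None, :]
    return sim / leaves.shape[1]


def cluster_cells(sim, n_clusters):
    """Average-linkage hierarchical clustering on the dissimilarity 1 - sim."""
    dist = squareform(1.0 - sim, checks=False)  # condensed distance vector
    return fcluster(linkage(dist, method="average"), t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.gamma(2.0, 1.0, size=200)  # simulated mean expression per gene
    shifted = base.copy()
    shifted[:50] *= 8                     # 50 hypothetical marker genes, up in group B
    counts = np.vstack([rng.poisson(base, size=(40, 200)),
                        rng.poisson(shifted, size=(40, 200))])
    sim = rf_similarity(build_features(counts))
    print(cluster_cells(sim, n_clusters=2))
```

The resulting similarity matrix can feed any method that accepts precomputed (dis)similarities, such as hierarchical clustering here, or embeddings for visualization. Accumulating co-occurrence one tree at a time keeps memory at O(n²) rather than O(n² × trees).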

Availability and implementation

The RAFSIL R package is available at www.kostkalab.net/software.html.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Pouyan MB 

PROVIDER: S-EPMC6022547 | biostudies-literature | 2018 Jul

REPOSITORIES: biostudies-literature


Publications

Random forest based similarity learning for single cell RNA sequencing data.

Maziyar Baran Pouyan, Dennis Kostka

Bioinformatics (Oxford, England), 2018-07-01, issue 13



Similar Datasets

| S-EPMC10504339 | biostudies-literature
| S-EPMC8420858 | biostudies-literature
| S-EPMC10071725 | biostudies-literature
| S-EPMC10647110 | biostudies-literature
| S-EPMC8186186 | biostudies-literature
| S-EPMC6377168 | biostudies-literature
| S-EPMC3516432 | biostudies-literature
| S-EPMC7763177 | biostudies-literature
| S-EPMC5737094 | biostudies-literature
| S-EPMC10558043 | biostudies-literature