Dataset Information

Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data.


ABSTRACT: We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.

SUBMITTER: Prabhakaran S 

PROVIDER: S-EPMC6004614 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data.

Sandhya Prabhakaran, Elham Azizi, Ambrose Carr, Dana Pe'er

JMLR Workshop and Conference Proceedings, 2016-01-01



Similar Datasets

| S-EPMC5583037 | biostudies-literature
| S-EPMC9002799 | biostudies-literature
| S-EPMC6454475 | biostudies-literature
| S-EPMC6157162 | biostudies-literature
| S-EPMC3812957 | biostudies-literature
| S-EPMC9364382 | biostudies-literature
| S-EPMC8041769 | biostudies-literature
| S-EPMC9869330 | biostudies-literature
| S-EPMC4550296 | biostudies-literature
| S-EPMC4225571 | biostudies-literature