Dataset Information

PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells.

ABSTRACT: MOTIVATION: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. RESULTS: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Stassen SV

PROVIDER: S-EPMC7203756 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells.

Shobana V. Stassen, Dickson M. D. Siu, Kelvin C. M. Lee, Joshua W. K. Ho, Hayden K. H. So, Kevin K. Tsia

Bioinformatics (Oxford, England), 2020-05-01, issue 9

PMID: 31971583

Similar Datasets

Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data.
| S-EPMC9754601 | biostudies-literature

CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data.
| S-EPMC5371246 | biostudies-literature

SC3s: efficient scaling of single cell consensus clustering to millions of cells.
| S-EPMC9743492 | biostudies-literature

VPAC: Variational projection for accurate clustering of single-cell transcriptomic data.
| S-EPMC6509870 | biostudies-literature

Clustering millions of tandem mass spectra.
| S-EPMC2533155 | biostudies-literature

Accurate estimation of pathway activity in single cells for clustering and differential analysis.
| S-EPMC11293543 | biostudies-literature

Accurate Single-Cell Clustering through Ensemble Similarity Learning.
| S-EPMC8623803 | biostudies-literature

jSRC: a flexible and accurate joint learning algorithm for clustering of single-cell RNA-sequencing data.
| S-EPMC7953970 | biostudies-literature

scSMD: a deep learning method for accurate clustering of single cells based on auto-encoder.
| S-EPMC11780796 | biostudies-literature

scEFSC: Accurate single-cell RNA-seq data analysis via ensemble consensus clustering based on multiple feature selections.
| S-EPMC9108753 | biostudies-literature