ABSTRACT:
Motivation
New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods do not scale adequately to large datasets and therefore cannot uncover the complex cellular heterogeneity.
Results
We introduce PARC (Phenotyping by Accelerated Refined Community-partitioning), a highly scalable graph-based clustering algorithm for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC, without subsampling of cells, consistently outperforms state-of-the-art clustering algorithms, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis.
Availability and implementation
https://github.com/ShobiStassen/PARC
Supplementary information
Supplementary data are available at Bioinformatics online.
SUBMITTER: Stassen SV
PROVIDER: S-EPMC7203756 | biostudies-literature | 2020 May
REPOSITORIES: biostudies-literature

Bioinformatics (Oxford, England) 20200501 9