Dataset Information

Automated quality control and cell identification of droplet-based single-cell data using dropkick.

ABSTRACT: A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in data sets with disparate library sizes confounded by high technical noise (i.e., batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining data set-specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against conventional thresholding approaches and EmptyDrops, a popular computational method, showing greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low- and high-background data sets that dropkick's weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to data set-specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell Python packages.
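The workflow the abstract outlines (heuristic training labels, then a gene-based model that outputs a per-barcode cell probability) can be illustrated with a toy sketch. This is not dropkick's implementation or API. The three-gene simulation, the fixed `min_total` cutoff, the plain L2-regularized logistic regression, and every function name below are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_barcodes(n_cells, n_empty, rng):
    """Return (counts, true_is_cell) for a made-up 3-gene data set.

    Genes 0 and 1 stand in for cell-expressed genes; gene 2 stands in for an
    ambient-enriched gene. All count ranges are hypothetical.
    """
    counts, truth = [], []
    for _ in range(n_cells):
        counts.append([rng.randint(40, 80), rng.randint(40, 80), rng.randint(0, 5)])
        truth.append(1)
    for _ in range(n_empty):
        counts.append([rng.randint(0, 3), rng.randint(0, 3), rng.randint(3, 8)])
        truth.append(0)
    return counts, truth


def weak_labels(counts, min_total):
    # Heuristic training labels: 1 if a barcode's total counts reach the cutoff.
    return [1 if sum(row) >= min_total else 0 for row in counts]


def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    # L2-regularized logistic regression fitted by batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def cell_probability(X, w, b):
    # One probability score per barcode.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


rng = random.Random(0)
counts, true_is_cell = simulate_barcodes(50, 50, rng)
X = [[math.log1p(c) for c in row] for row in counts]  # log-transformed gene counts
labels = weak_labels(counts, min_total=30)            # hypothetical fixed cutoff
w, b = fit_logistic(X, labels)
scores = cell_probability(X, w, b)
```

In this toy data the heuristic labels happen to match the simulated truth exactly, so the sketch shows only the mechanics of scoring barcodes from gene-level features, not the recovery of borderline cells that the abstract reports.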

SUBMITTER: Heiser CN

PROVIDER: S-EPMC8494217 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature

Publications

Automated quality control and cell identification of droplet-based single-cell data using dropkick.

Heiser CN, Wang VM, Chen B, Hughey JJ, Lau KS

Genome Research, 2021-04-09, issue 10

PMID: 33837131

Similar Datasets

ASHLEYS: automated quality control for single-cell Strand-seq data.
| S-EPMC8504637 | biostudies-literature

Automated Chemotactic Sorting and Single-cell Cultivation of Microbes using Droplet Microfluidics.
| S-EPMC4831006 | biostudies-literature

Functional single-cell hybridoma screening using droplet-based microfluidics.
| S-EPMC3406880 | biostudies-other

Single-cell analysis and sorting using droplet-based microfluidics.
| S-EPMC4128248 | biostudies-literature

Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data.
| S-EPMC8913782 | biostudies-literature

seqQscorer: automated quality control of next-generation sequencing data using machine learning.
| S-EPMC7934511 | biostudies-literature

Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments.
| S-EPMC3570210 | biostudies-literature

Multiplexing droplet-based single cell RNA-sequencing using genetic barcodes
2017-06-13 | GSE96583 | GEO

SpatialQC: automated quality control for spatial transcriptome data.
| S-EPMC11333854 | biostudies-literature

Normalizing and denoising protein expression data from droplet-based single cell profiling.
| S-EPMC9018908 | biostudies-literature