Dataset Information

Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.

ABSTRACT:

Motivation

While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, secondary analysis of the feature-barcode matrix is usually done with custom scripts. No fully automated pipeline in the R statistical environment follows current best programming practices and meets the requirements for reproducibility.

Results

We have developed scdrake, a fully automated workflow for secondary analysis of scRNA-seq data, implemented entirely in the R language and built within the drake framework. The pipeline includes quality control, cell and gene filtering, normalization, detection of highly variable genes, dimensionality reduction, clustering, cell type annotation, detection of marker genes, differential expression analysis, and integration of multiple samples. The pipeline is reproducible and scalable, executes efficiently, is easy to extend, gives access to intermediate results, and outputs rich HTML reports. Scdrake is distributed as a Docker image, which simplifies setup and enhances reproducibility.

Availability and implementation

The source code and documentation are available under the MIT license at https://github.com/bioinfocz/scdrake and https://bioinfocz.github.io/scdrake, respectively.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.

SUBMITTER: Kubovciak J

PROVIDER: S-EPMC10351969 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature

Publications

Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.

Kubovčiak J, Kolář M, Novotný J

Bioinformatics Advances, 2023-07-06 (1)

PMID: 37465398

Similar Datasets

Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge.
| S-EPMC6022691 | biostudies-literature

MetaPro: a scalable and reproducible data processing and analysis pipeline for metatranscriptomic investigation of microbial communities.
| S-EPMC10294448 | biostudies-literature

Ultra-fast scalable estimation of single-cell differentiation potency from scRNA-Seq data.
| S-EPMC8275983 | biostudies-literature

Vulture: cloud-enabled scalable mining of microbial reads in public scRNA-seq data.
| S-EPMC10776309 | biostudies-literature

Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results.
| S-EPMC7718033 | biostudies-literature

RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data.
| S-EPMC8344557 | biostudies-literature

TAGADA: a scalable pipeline to improve genome annotations with RNA-seq data.
| S-EPMC10578202 | biostudies-literature

Metacell-2: a divide-and-conquer metacell algorithm for scalable scRNA-seq analysis.
| S-EPMC9019975 | biostudies-literature

Novel insights through scRNA-seq analysis
| S-BSST858 | biostudies-other

scGraphformer: unveiling cellular heterogeneity and interactions in scRNA-seq data using a scalable graph transformer network.
| S-EPMC11543810 | biostudies-literature