ABSTRACT:
Motivation: Single-cell RNA sequencing (scRNA-seq) studies provide more granular biological information than bulk RNA sequencing, but bulk RNA sequencing has remained popular because of its lower cost per sample, which has allowed investigators to process more biological replicates and design more powerful studies. As scRNA-seq costs have decreased, collecting data from more than one biological replicate has become more feasible, but careful modeling of different layers of biological variation remains challenging for many users. Here, we propose a statistical model for scRNA-seq gene counts, describe a simple method for estimating model parameters, and show that failing to account for additional biological variation in scRNA-seq studies can inflate false discovery rates of statistical tests.
Results: In a simulation study, we show that when the gene expression distribution of a population of cells varies between subjects, a naïve approach to differential expression analysis will inflate the false discovery rate. We also compare multiple differential expression testing methods on scRNA-seq data sets from human samples and from animal models. These analyses suggest that a naïve approach to differential expression testing could lead to many false discoveries; in contrast, an approach based on pseudobulk counts has better false discovery rate control.
Availability: A software package, aggregateBioVar, is freely available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/aggregateBioVar.html) for compatibility with upstream and downstream methods in scRNA-seq data analysis pipelines.
Supplementary information: Raw gene-by-cell count matrices for pig scRNA-seq data are available as GEO accession GSE150211. Supplementary data are available at Bioinformatics online.
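The pseudobulk approach mentioned in the abstract can be illustrated with a minimal sketch. It assumes that pseudobulk counts are formed by summing each gene's counts over all cells from the same subject, so that the subject, not the cell, becomes the unit of replication. This is not the aggregateBioVar API (that package is written in R); the function name `pseudobulk`, the gene and subject labels, and the toy counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pseudobulk(counts, cell_subject):
    """Sum per-cell gene counts within each subject.

    counts: dict mapping gene name -> list of counts, one per cell
    cell_subject: list of subject labels, one per cell (same cell order)
    Returns a dict mapping gene name -> {subject: summed count}.
    """
    result = {}
    for gene, row in counts.items():
        if len(row) != len(cell_subject):
            raise ValueError(f"{gene}: expected {len(cell_subject)} cells, got {len(row)}")
        totals = defaultdict(int)
        for count, subject in zip(row, cell_subject):
            totals[subject] += count
        result[gene] = dict(totals)
    return result


# Hypothetical toy data: 2 genes, 6 cells, 3 subjects with 2 cells each.
counts = {
    "GENE_A": [3, 0, 2, 5, 1, 4],
    "GENE_B": [0, 1, 0, 2, 2, 0],
}
cell_subject = ["s1", "s1", "s2", "s2", "s3", "s3"]

bulk = pseudobulk(counts, cell_subject)
print(bulk["GENE_A"])  # {'s1': 3, 's2': 7, 's3': 5}
print(bulk["GENE_B"])  # {'s1': 1, 's2': 2, 's3': 2}
```

The resulting gene-by-subject matrix has one column per biological replicate, so it can in principle be passed to differential expression tools designed for bulk RNA-seq. Treating each of the six cells as an independent sample instead would overstate the effective sample size, which is the situation the abstract describes as inflating false discoveries.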
SUBMITTER: Thurman AL
PROVIDER: S-EPMC8504643 | biostudies-literature |
REPOSITORIES: biostudies-literature