Dataset Information

Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size.

ABSTRACT:

Motivation

RNA-seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size, the per-gene estimates of the dispersion parameter are unreliable.

Method

We propose a simple and effective approach for estimating the dispersions. First, we obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute and is compatible with the exact test of differential expression.
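The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the sSeq implementation: it assumes counts are already normalized for library size, uses the common Negative Binomial parameterization Var = mu + phi * mu^2, and applies a linear shrinkage rule with a fixed weight. The weight, the candidate grid, the toy counts and all function names are hypothetical choices made for the example.

```python
from statistics import mean, variance


def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene, assuming Var = mu + phi * mu**2."""
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # unbiased sample variance
    # Truncate at zero: a sample variance below the mean gives a negative estimate.
    return max(0.0, (s2 - m) / (m * m))


def shrink(initial, target, weight):
    """Move every initial estimate toward a common target (assumed linear rule)."""
    return [(1.0 - weight) * phi + weight * target for phi in initial]


def average_squared_difference(initial, shrunk):
    """Average squared difference between initial and shrunken estimates."""
    return mean((a - b) ** 2 for a, b in zip(initial, shrunk))


def choose_target(initial, candidates, weight):
    """Pick the candidate target that gives the smallest average squared difference."""
    return min(
        candidates,
        key=lambda t: average_squared_difference(initial, shrink(initial, t, weight)),
    )


# Toy data: three genes, three replicates each (made-up numbers).
counts_by_gene = {
    "geneA": [10, 12, 9],
    "geneB": [100, 160, 70],
    "geneC": [5, 30, 1],
}

initial = [mom_dispersion(c) for c in counts_by_gene.values()]
candidates = [i / 100 for i in range(0, 301)]  # hypothetical grid of targets
target = choose_target(initial, candidates, weight=0.5)
shrunk = shrink(initial, target, weight=0.5)
```

With a fixed weight, the selected target is simply the grid point closest to the mean of the initial estimates. The published method may select the weight and the target differently, so treat this only as a picture of the idea of pulling noisy per-gene estimates toward a shared value.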

Results

We evaluated the proposed approach, sSeq, on 10 simulated and experimental datasets and compared its performance with that of the currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. On these datasets, sSeq performed favorably for experiments with small sample size in terms of sensitivity, specificity and computational time.

Availability

http://www.stat.purdue.edu/~ovitek/Software.html and Bioconductor.

Contact

ovitek@purdue.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Yu D

PROVIDER: S-EPMC3654711 | biostudies-literature | 2013 May

REPOSITORIES: biostudies-literature

Publications

Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size.

Danni Yu, Wolfgang Huber, Olga Vitek

Bioinformatics (Oxford, England), 2013-04-14, issue 10

PMID: 23589650

Similar Datasets

Project description:

Background: RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and the negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated.

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.

Conclusions: We have developed a new classifier using the negative binomial model for RNA-Seq data classification. Our simulation results show that the proposed classifier performs better than existing methods and can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA.
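To make the idea of a Bayes-rule classifier built on a negative binomial model concrete, here is a minimal Python sketch. It is not the NBLDA code linked above. It assumes that class means, class priors and per-gene dispersions have already been estimated, and it ignores sample-specific size factors. All names and numbers are hypothetical.

```python
from math import lgamma, log


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion phi (variance mu + phi * mu**2)."""
    r = 1.0 / phi  # the "size" parameter
    return (
        lgamma(y + r) - lgamma(r) - lgamma(y + 1)
        + r * log(r / (r + mu))
        + y * log(mu / (r + mu))
    )


def classify(sample, class_means, class_priors, dispersions):
    """Assign the class with the largest log prior plus summed per-gene log-likelihood."""
    scores = {}
    for label, means in class_means.items():
        score = log(class_priors[label])
        for y, mu, phi in zip(sample, means, dispersions):
            score += nb_log_pmf(y, mu, phi)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical plug-in estimates for three genes and two classes.
class_means = {
    "control": [10.0, 200.0, 50.0],
    "treated": [40.0, 190.0, 5.0],
}
class_priors = {"control": 0.5, "treated": 0.5}
dispersions = [0.1, 0.05, 0.2]

label, scores = classify([35, 210, 8], class_means, class_priors, dispersions)
```

As the dispersions approach zero, the negative binomial log-likelihood approaches the Poisson one, which is one way to see the relationship between the two classifiers mentioned in the description.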
