Dataset Information

Super-delta: a new differential gene expression analysis procedure with robust data normalization.

ABSTRACT:

BACKGROUND: Normalization is an important data preparation step in gene expression analyses, designed to remove various kinds of systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which inevitably introduces some bias. This bias typically inflates the type I error rate and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived from asymptotic theory for hypothesis testing and pairs suitably with the proposed robust normalization.

RESULTS: We first compared super-delta with four commonly used normalization methods (global, median-IQR, quantile, and cyclic loess normalization) in simulation studies. Super-delta was shown to have better statistical power with tighter control of the type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets on breast cancer patients who received neoadjuvant chemotherapy. While there is a substantial overlap among the DEGs identified by all of them, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigation of the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than those identified by other methods.

CONCLUSIONS: As a new pipeline, super-delta provides new insights into the area of differential gene expression analysis. A solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Application to real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential to be extended to other data types and/or more general between-group comparison problems.
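The abstract names several baseline normalizations and argues that borrowing information across all genes, DEGs included, biases the result. The following is a minimal NumPy sketch of three of those baselines (global, median-IQR, and quantile normalization) plus a small simulation of that bias under global normalization. It is not the authors' super-delta procedure or code: the function names, the simulated data, and all parameter values (gene count, effect size, noise levels) are assumptions chosen for illustration, and the input is assumed to be a genes-by-samples matrix of log-expression values.

```python
import numpy as np

def global_normalize(x):
    """Global normalization: subtract each sample's (column's) mean log-expression."""
    return x - x.mean(axis=0, keepdims=True)

def median_iqr_normalize(x):
    """Median-IQR normalization: center each sample at its median, scale by its IQR."""
    med = np.median(x, axis=0, keepdims=True)
    q75, q25 = np.percentile(x, [75, 25], axis=0)
    return (x - med) / (q75 - q25)[None, :]

def quantile_normalize(x):
    """Quantile normalization: give every sample the same empirical distribution.
    Simplified sketch: ties are not averaged."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # rank of each gene within its sample
    mean_sorted = np.sort(x, axis=0).mean(axis=1)       # reference distribution
    return mean_sorted[ranks]

# Illustrative simulation (all values are assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n_genes, n_per_group, n_deg, effect = 1000, 10, 100, 2.0
true_signal = rng.normal(8.0, 1.0, size=(n_genes, 1))          # gene-specific baseline
noise = rng.normal(0.0, 0.3, size=(n_genes, 2 * n_per_group))  # biological noise
sample_shift = rng.normal(0.0, 1.0, size=(1, 2 * n_per_group)) # sample-level technical noise
x = true_signal + noise + sample_shift
x[:n_deg, n_per_group:] += effect   # first 100 genes are up-regulated in group B

g = global_normalize(x)
# Non-DEGs should show no group difference, but the sample means in group B
# were inflated by the DEGs, so subtracting them pushes the null genes down
# by roughly (n_deg / n_genes) * effect.
null_diff = g[n_deg:, n_per_group:].mean() - g[n_deg:, :n_per_group].mean()
print(f"mean group difference among non-DEGs after global normalization: {null_diff:.3f}")
```

In this sketch the sample-level shift is removed as intended, yet the non-DEGs acquire a spurious negative group difference, which is the kind of DEG-induced bias the abstract says a robust normalization step is designed to minimize.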

SUBMITTER: Liu Y

PROVIDER: S-EPMC5740711 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature

Publications

Super-delta: a new differential gene expression analysis procedure with robust data normalization.

Liu Yuhang, Zhang Jinfeng, Qiu Xing

BMC Bioinformatics, 2017-12-21, issue 1

PMID: 29268715

Similar Datasets

A new normalization for Nanostring nCounter gene expression data. | S-EPMC6614807 | biostudies-literature

Robust Normalization of Luciferase Reporter Data. | S-EPMC6789503 | biostudies-literature

A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq Data. | S-EPMC6686202 | biostudies-literature

Accurate Classification of Differential Expression Patterns in a Bayesian Framework With Robust Normalization for Multi-Group RNA-Seq Count Data. | S-EPMC6614939 | biostudies-literature

Robust modeling of differential gene expression data using normal/independent distributions: a Bayesian approach. | S-EPMC4409222 | biostudies-literature

NORMA-Gene: a simple and robust method for qPCR normalization based on target gene data. | S-EPMC3223928 | biostudies-literature

A scaling normalization method for differential expression analysis of RNA-seq data. | S-EPMC2864565 | biostudies-literature

Robust phenotype prediction from gene expression data using differential shrinkage of co-regulated genes. | S-EPMC5775343 | biostudies-other

Iterative rank-order normalization of gene expression microarray data. | S-EPMC3651355 | biostudies-literature

Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data. | S-EPMC8721966 | biostudies-literature