Dataset Information

How to do quantile normalization correctly for gene expression data analyses.

ABSTRACT: Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly to whole datasets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class labels before performing quantile normalization independently on each split ("Class-specific"). Via simulations with both real and simulated batch effects, we demonstrate that the "Class-specific" strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied to whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, then we recommend using the "Class-specific" strategy.
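The "Class-specific" idea described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation, which this record does not show. The function names `quantile_normalize` and `class_specific_quantile_normalize` are hypothetical, the matrix layout (features in rows, samples in columns) is an assumption, and ties are broken by order of appearance rather than averaged.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of a features x samples matrix.

    Simplified sketch: ties are broken by order of appearance, not averaged.
    """
    X = np.asarray(X, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(X, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(X, order, axis=0)
    # Reference distribution: mean across samples at each rank.
    reference = sorted_vals.mean(axis=1)
    out = np.empty_like(X)
    # Write the reference value for rank i back to the position that held rank i.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

def class_specific_quantile_normalize(X, labels):
    """Split samples by class label, then normalize each split independently."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(X)
    for cls in np.unique(labels):
        cols = labels == cls  # boolean mask of samples in this class
        out[:, cols] = quantile_normalize(X[:, cols])
    return out

# Toy data (made up for illustration): 5 features, 2 classes of 2 samples each.
toy = np.array([
    [1.0, 2.0, 11.0, 12.0],
    [3.0, 1.0, 13.0, 10.0],
    [2.0, 4.0, 12.0, 14.0],
    [5.0, 3.0, 15.0, 13.0],
    [4.0, 5.0, 14.0, 11.0],
])
toy_labels = ["A", "A", "B", "B"]
whole_data = quantile_normalize(toy)
per_class = class_specific_quantile_normalize(toy, toy_labels)
```

In this sketch, whole-data normalization forces every sample onto one shared distribution, so a global shift between classes is flattened, whereas the per-class version only equalizes distributions among samples that share a label.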

SUBMITTER: Zhao Y

PROVIDER: S-EPMC7511327 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

How to do quantile normalization correctly for gene expression data analyses.

Zhao Y, Wong L, Goh WWB

Scientific Reports, 2020 Sep 23 (issue 1)

PMID: 32968196

Similar Datasets

Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data. | S-EPMC5972664 | biostudies-literature

Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data. | S-EPMC4896498 | biostudies-literature

Smooth quantile normalization. | S-EPMC5862355 | biostudies-literature

Removing technical variability in RNA-seq data using conditional quantile normalization. | S-EPMC3297825 | biostudies-literature

The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis. | S-EPMC3660216 | biostudies-literature

Iterative rank-order normalization of gene expression microarray data. | S-EPMC3651355 | biostudies-literature

A new normalization for Nanostring nCounter gene expression data. | S-EPMC6614807 | biostudies-literature

Subset quantile normalization using negative control features. | S-EPMC3122888 | biostudies-literature

A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. | S-EPMC2671163 | biostudies-literature

MatchMixeR: a cross-platform normalization method for gene expression data integration. | S-EPMC7868049 | biostudies-literature