Dataset Information

Smooth quantile normalization.

ABSTRACT: Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced only by technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, while allowing the distributions to differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
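
To make the contrast in the abstract concrete, the following is a minimal sketch in Python with NumPy of the two extremes it describes: classic quantile normalization, which forces every sample onto one shared distribution, and quantile normalization applied separately within each biological group. This is not the qsmooth algorithm and is not taken from the qsmooth package; the function names `quantile_normalize` and `groupwise_quantile_normalize` are hypothetical, and the input is assumed to be a features-by-samples matrix without missing values.

```python
import numpy as np


def quantile_normalize(x):
    """Classic quantile normalization of a features-by-samples matrix.

    Every column (sample) is mapped onto one reference distribution:
    the mean of the sorted columns. Ties are broken by order of
    appearance, which is a simplification.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")      # sort order per sample
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each value in its column
    reference = np.sort(x, axis=0).mean(axis=1)       # mean of the k-th smallest values
    return reference[ranks]


def groupwise_quantile_normalize(x, groups):
    """Quantile-normalize samples separately within each group.

    Illustrative helper (an assumption, not the qsmooth method):
    distributions become identical within a group but are allowed
    to differ between groups.
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        cols = groups == g                 # boolean mask of samples in group g
        out[:, cols] = quantile_normalize(x[:, cols])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Six samples, two groups; group "b" has a globally shifted distribution.
    data = np.hstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
    labels = ["a", "a", "a", "b", "b", "b"]
    print(quantile_normalize(data).mean(axis=0))
    print(groupwise_quantile_normalize(data, labels).mean(axis=0))
```

In this sketch the first call removes the global shift between the two groups, while the second preserves it. The abstract positions qsmooth as a generalization of quantile normalization for the case where such between-group differences are biological rather than technical.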

SUBMITTER: Hicks SC

PROVIDER: S-EPMC5862355 | biostudies-other | 2018 Apr

REPOSITORIES: biostudies-other

Publications

Smooth quantile normalization.

Stephanie C Hicks, Kwame Okrah, Joseph N Paulson, John Quackenbush, Rafael A Irizarry, Héctor Corrada Bravo

Biostatistics (Oxford, England), 2018 Apr 1, issue 2

PMID: 29036413

Similar Datasets

Smooth density spatial quantile regression.
| S-EPMC8725653 | biostudies-literature

Subset quantile normalization using negative control features.
| S-EPMC3122888 | biostudies-literature

Removing technical variability in RNA-seq data using conditional quantile normalization.
| S-EPMC3297825 | biostudies-literature

SWAN: Subset-quantile within array normalization for Illumina Infinium HumanMethylation450 BeadChips.
| S-EPMC3446316 | biostudies-literature

How to do quantile normalization correctly for gene expression data analyses.
| S-EPMC7511327 | biostudies-literature

Quantile normalization of single-cell RNA-seq read counts without unique molecular identifiers.
| S-EPMC7333325 | biostudies-literature

Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data.
| S-EPMC5972664 | biostudies-literature

The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis.
| S-EPMC3660216 | biostudies-literature

Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data.
| S-EPMC4896498 | biostudies-literature

A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data.
| S-EPMC3546795 | biostudies-literature