Dataset Information

A Distribution-Free Convolution Model for background correction of oligonucleotide microarray data.

ABSTRACT:

Introduction

Affymetrix GeneChip high-density oligonucleotide arrays are widely used in biological and medical research because of production reproducibility, which facilitates the comparison of results between experiment runs. In order to obtain high-level classification and cluster analysis that can be trusted, it is important to perform various pre-processing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed preprocessing methods are parametric, in that they assume that the background noise generated by microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions.

Results

We propose a Distribution-Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating the distribution assumptions of parametric methods. Knowledge of array structure and the biological function of the probes indicates that the intensities of mismatch (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent of PM intensities, and use these intensities to estimate background.
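The two-stage selection described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the default values of `q1` and `q2`, the floor-based percentile cut, and the use of the median and sample standard deviation as the background summary are all assumptions, since the abstract does not state which estimator is applied to the selected intensities.

```python
import math
import statistics


def lowest_percent_indices(values, percent):
    """Return indices of the smallest `percent` percent of `values` (at least one)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    k = max(1, math.floor(len(values) * percent / 100))
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[:k]


def select_background_mm(pm, mm, q1, q2):
    """Pick the MM intensities used for background estimation.

    pm, mm: paired perfect-match / mismatch intensities (same length).
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must be paired sequences of equal length")
    # Stage 1: probe pairs whose PM intensity is among the lowest q1 percent.
    low_pm_idx = lowest_percent_indices(pm, q1)
    mm_low = [mm[i] for i in low_pm_idx]
    # Stage 2: of the associated MM intensities, keep the smallest q2 percent.
    keep = lowest_percent_indices(mm_low, q2)
    return sorted(mm_low[i] for i in keep)


def estimate_background(pm, mm, q1=30, q2=90):
    """Summarize the selected MM intensities.

    Assumption: median and sample standard deviation stand in for the
    background estimator; q1=30 and q2=90 are placeholder defaults.
    """
    selected = select_background_mm(pm, mm, q1, q2)
    return statistics.median(selected), statistics.stdev(selected)
```

With 100 probe pairs, `q1=30` keeps the 30 pairs with the lowest PM values and `q2=90` then keeps the 27 smallest MM values among them. Note that `statistics.stdev` needs at least two selected values.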

Conclusion

Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments typically is not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. Therefore, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC), than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for preprocessing of Affymetrix microarray data. These results hold for the two spike-in data sets and one real data set that were analyzed, and the comparisons with other methods show that our nonparametric method is a superior alternative for background correction of Affymetrix data.
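For readers unfamiliar with the AUC comparison mentioned above, the following sketch computes the area under the ROC curve from per-probe-set scores and known spike-in labels. It is illustrative only: the function name, the choice of score (for example, an absolute log fold change), and the label encoding are assumptions and are not taken from the paper.

```python
def roc_auc(scores, is_spiked):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    scores: one differential-expression score per probe set (hypothetical).
    is_spiked: truthy where the probe set is a known spike-in.
    Returns the probability that a randomly chosen spiked-in probe set
    scores higher than a randomly chosen non-spiked one (ties count half).
    """
    if len(scores) != len(is_spiked):
        raise ValueError("scores and is_spiked must have equal length")
    pos = [s for s, y in zip(scores, is_spiked) if y]
    neg = [s for s, y in zip(scores, is_spiked) if not y]
    if not pos or not neg:
        raise ValueError("need at least one spiked and one non-spiked probe set")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every spiked-in probe set outranks every non-spiked one, while 0.5 corresponds to random ordering. The pairwise loop is quadratic and is meant for clarity, not for large arrays.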

SUBMITTER: Chen Z

PROVIDER: S-EPMC2709262 | biostudies-literature | 2009 Jul

REPOSITORIES: biostudies-literature

Publications

A Distribution-Free Convolution Model for background correction of oligonucleotide microarray data.

Chen Z, McGee M, Liu Q, Kong M, Deng Y, Scheuermann RH

BMC Genomics, 2009 Jul 7

PMID: 19594878

Similar Datasets

Microarray background correction: maximum likelihood estimation for the normal-exponential convolution.
| S-EPMC2648902 | biostudies-literature

Correction of scaling mismatches in oligonucleotide microarray data.
| S-EPMC1508160 | biostudies-literature

Linear model for fast background subtraction in oligonucleotide microarrays.
| S-EPMC2785812 | biostudies-literature

Correction of spatial bias in oligonucleotide array data.
| S-EPMC3610395 | biostudies-literature

A Distribution-Free Model for Longitudinal Metagenomic Count Data.
| S-EPMC9316307 | biostudies-literature

Filtering genes to improve sensitivity in oligonucleotide microarray data analysis.
| S-EPMC2018638 | biostudies-literature

DBNorm: normalizing high-density oligonucleotide microarray data based on distributions.
| S-EPMC5706403 | biostudies-literature

Statistical methods of background correction for Illumina BeadArray data.
| S-EPMC2654805 | biostudies-literature

Integration of pre-normalized microarray data using quantile correction.
| S-EPMC3044426 | biostudies-literature

Evaluating supervised and unsupervised background noise correction in human gut microbiome data.
| S-EPMC8853548 | biostudies-literature