Dataset Information

Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

ABSTRACT: The paper addresses a common problem in the analysis of high-dimensional, high-throughput "omics" data: parameter estimation across multiple variables in a data set where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, and (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, than regular common value-shrinkage estimators, or than statistics that ignore the information contained in the sample mean. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
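The idea of pooling means and variances jointly within clusters of similar variables can be illustrated with a rough sketch. This is not the MVR package's algorithm: plain k-means with a fixed number of clusters stands in for the paper's "similarity statistic"-based clustering, the t-like statistic is a simplified one-sample version, and all function names (variable_moments, kmeans_labels, regularize, regularized_t) are hypothetical.

```python
import math
import random
import statistics


def variable_moments(data):
    """Return per-variable (mean, sample standard deviation) pairs."""
    return [(statistics.fmean(row), statistics.stdev(row)) for row in data]


def kmeans_labels(points, k, n_iter=50):
    """Plain k-means on (mean, sd) points with deterministic seeding.

    Assumption: k-means is a stand-in here, not the clustering used in the paper.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    # Seed centroids at evenly spaced positions of the sorted points.
    centroids = [points[order[(2 * c + 1) * len(points) // (2 * k)]] for c in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(statistics.fmean(col) for col in zip(*members))
    return labels


def regularize(data, k):
    """Replace each variable's mean and sd by cluster-pooled values."""
    moments = variable_moments(data)
    labels = kmeans_labels(moments, k)
    pooled = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        pooled_mean = statistics.fmean([moments[i][0] for i in idx])
        # Pool variances (not sds), then take the square root.
        pooled_sd = math.sqrt(statistics.fmean([moments[i][1] ** 2 for i in idx]))
        pooled[c] = (pooled_mean, pooled_sd)
    return [pooled[lab] for lab in labels], labels


def regularized_t(data, k):
    """Simplified one-sample t-like statistic with a cluster-pooled sd."""
    pooled, _ = regularize(data, k)
    n = len(data[0])
    return [
        statistics.fmean(row) / (sd / math.sqrt(n))
        for row, (_, sd) in zip(data, pooled)
    ]


if __name__ == "__main__":
    # Simulated data: many variables, few samples, sd growing with the mean.
    rng = random.Random(0)
    data = [
        [rng.gauss(mu, 0.5 * mu) for _ in range(4)]
        for mu in (1.0, 10.0)
        for _ in range(20)
    ]
    pooled, labels = regularize(data, k=2)
    print("distinct pooled (mean, sd) pairs:", sorted(set(pooled)))
    print("first three t-like statistics:", regularized_t(data, k=2)[:3])
```

With only four samples per variable, each variable's own standard deviation is very noisy; the pooled value borrows strength from every variable in the same cluster, which is the intuition behind the regularized statistics described in the abstract.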

SUBMITTER: Dazard JE

PROVIDER: S-EPMC3375876 | biostudies-literature | 2012 Jul

REPOSITORIES: biostudies-literature

Publications

Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

Dazard JE, Rao JS

Computational Statistics & Data Analysis, 2012-07-01, issue 7

PMID: 22711950

Similar Datasets

A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models.
| S-EPMC7425805 | biostudies-literature

A robust mean and variance test with application to high-dimensional phenotypes.
| S-EPMC9187575 | biostudies-literature

Regularization method for predicting an ordinal response using longitudinal high-dimensional genomic data.
| S-EPMC4454613 | biostudies-literature

A family-based joint test for mean and variance heterogeneity for quantitative traits.
| S-EPMC4275359 | biostudies-literature

Robust network-based regularization and variable selection for high-dimensional genomic data in cancer prognosis.
| S-EPMC6446588 | biostudies-literature

GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA.
| S-EPMC4654965 | biostudies-literature

The joint lasso: high-dimensional regression for group structured data.
| S-EPMC7868060 | biostudies-literature

Mean-variance portfolio analysis data for optimizing community-based photovoltaic investment.
| S-EPMC4749943 | biostudies-literature

Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes.
| S-EPMC5434559 | biostudies-literature

Survival Analysis with High-Dimensional Omics Data Using a Threshold Gradient Descent Regularization-Based Neural Network Approach.
| S-EPMC9498566 | biostudies-literature