
Dataset Information


Incorporating auxiliary information for improved prediction in high-dimensional datasets: an ensemble of shrinkage approaches.


ABSTRACT: With advances in genomic technologies, it is common for two high-dimensional datasets to be available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers which is the best available measure of the underlying biological process. This same biological process may also be measured by W, which comes from a prior technology but is correlated with X. On a moderately sized sample, we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost the prediction of Y by X. When p is large and the subsample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, our goal is not the typical one of calibrating W for use in future studies. We propose to shrink the regression coefficients β of Y on X toward different targets that use information derived from W in the larger dataset. We compare these proposals with the classical ridge regression of Y on X, which does not use W. We also unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator which is a linear combination of multiple estimators of β. With an optimal choice of weights, the hybrid estimator balances efficiency and robustness in a data-adaptive way to theoretically yield a smaller prediction error than any of its constituents. The methods, including a fully Bayesian alternative, are evaluated via simulation studies. We also apply them to a gene-expression dataset: mRNA expression measured via quantitative real-time polymerase chain reaction is used to predict survival time in lung cancer patients, with auxiliary information from microarray technology available on a larger sample.
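
The idea of shrinking β toward a target, and of combining several estimators linearly, can be illustrated with a minimal sketch. It uses the generic closed form for a ridge penalty centered at a target vector, which is an assumption here and not necessarily the exact estimators of the paper. The function names `targeted_ridge` and `hybrid`, the simulated data, the penalty value, and the equal weights are all hypothetical; in particular, the target below is a noisy stand-in for one derived from W, and the weights are fixed rather than chosen optimally.

```python
import numpy as np


def targeted_ridge(X, y, lam, target=None):
    """Minimize ||y - X b||^2 + lam * ||b - target||^2 in closed form.

    With target=None this reduces to ordinary ridge (shrinkage toward zero).
    """
    p = X.shape[1]
    if target is None:
        target = np.zeros(p)
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * target)


def hybrid(estimates, weights):
    """Linear combination of several coefficient estimates."""
    weights = np.asarray(weights, dtype=float)
    return np.column_stack(estimates) @ weights


# Simulated p > n setting (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, p = 20, 50
beta_true = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Hypothetical stand-in for a target built from W on the larger sample.
target = beta_true + 0.1 * rng.normal(size=p)

b_ridge = targeted_ridge(X, y, lam=5.0)                   # ignores auxiliary info
b_target = targeted_ridge(X, y, lam=5.0, target=target)   # shrinks toward target
b_hybrid = hybrid([b_ridge, b_target], [0.5, 0.5])        # fixed, non-optimal weights
```

As the penalty grows, the targeted estimate approaches the target itself, which is why the quality of the target matters: a good target improves efficiency, while combining with ordinary ridge guards against a poor one.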

SUBMITTER: Boonstra PS 

PROVIDER: S-EPMC3590922 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8182952 | biostudies-literature
| S-EPMC8281968 | biostudies-literature
| S-EPMC7513067 | biostudies-literature
| S-EPMC8069634 | biostudies-literature
| S-EPMC7455041 | biostudies-literature
| S-EPMC4058572 | biostudies-literature
| S-EPMC6734180 | biostudies-literature