
Dataset Information


Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model.


ABSTRACT: With the development of high-throughput technologies, principal component analysis (PCA) in the high-dimensional regime is of great interest. Most of the existing theoretical and methodological results for high-dimensional PCA are based on the spiked population model in which all the population eigenvalues are equal except for a few large ones. Due to the presence of local correlation among features, however, this assumption may not be satisfied in many real-world datasets. To address this issue, we investigate the asymptotic behavior of PCA under the generalized spiked population model. Based on our theoretical results, we propose a series of methods for the consistent estimation of population eigenvalues, angles between the sample and population eigenvectors, correlation coefficients between the sample and population principal component (PC) scores, and the shrinkage bias adjustment for the predicted PC scores. Using numerical experiments and real data examples from the genetics literature, we show that our methods can greatly reduce bias and improve prediction accuracy.
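The abstract's core point is that in high dimensions, sample eigenvalues and eigenvectors are biased and the PC scores of new samples shrink toward zero. This can be illustrated in the simpler setting that the paper generalizes. The sketch below assumes the standard spiked model: one spike ell, all other population eigenvalues equal to 1, and gamma = p/n. It uses the classical limits for that model, not the generalized-spiked-model estimators developed in the paper. The function names, simulation settings, and the plug-in rescaling of test scores are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def spiked_limits(ell, gamma):
    """Limits for the leading PC under the *standard* spiked model (assumption).

    Assumes Sigma = I + (ell - 1) v v^T, gamma = p / n, and ell > 1 + sqrt(gamma).
    Returns (limit of the top sample eigenvalue, limit of cos^2 of the angle
    between the sample and population eigenvectors).
    """
    d = ell * (1.0 + gamma / (ell - 1.0))
    cos2 = (1.0 - gamma / (ell - 1.0) ** 2) / (1.0 + gamma / (ell - 1.0))
    return d, cos2


def invert_sample_eigenvalue(d, gamma):
    """Solve ell + gamma * ell / (ell - 1) = d for ell (larger quadratic root).

    Only meaningful when d lies above the bulk edge (1 + sqrt(gamma))^2.
    """
    b = d + 1.0 - gamma
    return (b + np.sqrt(b * b - 4.0 * d)) / 2.0


def shrinkage_demo(n=400, p=2000, ell=10.0, n_test=400, seed=0):
    """Hypothetical simulation: bias of the top PC and shrinkage of test scores."""
    rng = np.random.default_rng(seed)
    gamma = p / n
    # Population covariance diag(ell, 1, ..., 1): the spike direction is e_1.
    scale = np.ones(p)
    scale[0] = np.sqrt(ell)
    X = rng.standard_normal((n, p)) * scale
    X_new = rng.standard_normal((n_test, p)) * scale

    # Leading sample eigenpair of X^T X / n (data have mean zero, so no centering).
    _, s, vt = np.linalg.svd(X / np.sqrt(n), full_matrices=False)
    d_hat, u_hat = s[0] ** 2, vt[0]

    # Plug-in estimates that use only d_hat and gamma.
    ell_hat = invert_sample_eigenvalue(d_hat, gamma)
    _, cos2_hat = spiked_limits(ell_hat, gamma)

    train_sd = np.sqrt(np.mean((X @ u_hat) ** 2))  # equals sqrt(d_hat)
    test_scores = X_new @ u_hat
    test_sd = np.sqrt(np.mean(test_scores ** 2))

    # Var(x_new^T u_hat | u_hat) = (ell - 1) cos^2 + 1, versus d_hat for training scores.
    factor = np.sqrt(((ell_hat - 1.0) * cos2_hat + 1.0) / d_hat)
    adjusted_sd = np.sqrt(np.mean((test_scores / factor) ** 2))

    return {
        "d_hat": d_hat,
        "d_limit": spiked_limits(ell, gamma)[0],
        "ell_hat": ell_hat,
        "cos2_observed": u_hat[0] ** 2,
        "cos2_hat": cos2_hat,
        "shrinkage_observed": test_sd / train_sd,
        "shrinkage_predicted": factor,
        "adjusted_ratio": adjusted_sd / train_sd,
    }


if __name__ == "__main__":
    for key, value in shrinkage_demo().items():
        print(f"{key:>20s}: {value:.3f}")
```

With gamma = 5 and ell = 10, the classical limits put the top sample eigenvalue near 15.6, an upward bias of about 56%, and cos^2 near 0.60. Test-set scores therefore have roughly 0.64 times the spread of training scores before adjustment. The generalized model studied in the paper replaces the "all other eigenvalues equal" assumption, so these closed forms no longer apply there.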

SUBMITTER: Dey R 

PROVIDER: S-EPMC7441582 | biostudies-literature | 2019 Sep

REPOSITORIES: biostudies-literature


Publications

Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model.

Rounak Dey, Seunggeun Lee

Journal of Multivariate Analysis, 2019-02-19



Similar Datasets

S-EPMC4479975 | biostudies-literature
S-EPMC6152949 | biostudies-literature
S-EPMC7868054 | biostudies-literature
GSE31375 | GEO | 2011-08-15
S-EPMC7307984 | biostudies-literature
S-EPMC9793858 | biostudies-literature
S-EPMC8559726 | biostudies-literature
S-EPMC9525499 | biostudies-literature
S-EPMC2835171 | biostudies-literature