Dataset Information

Integrative sparse principal component analysis of gene expression data.


ABSTRACT: Dimension reduction techniques have been extensively adopted in the analysis of gene expression data. The most popular is perhaps principal component analysis (PCA). To generate more reliable and more interpretable results, sparse PCA (SPCA) has been developed. Because gene expression data have a "small sample size, high dimensionality" characteristic, results generated from a single dataset are often unsatisfactory. In contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform single-dataset analysis as well as "classic" meta-analysis and other multi-dataset techniques. In this study, we conduct integrative analysis by developing the integrative SPCA (iSPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance.
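
The group-penalty idea in the abstract can be illustrated with a minimal sketch. This is not the authors' iSPCA algorithm: it assumes a simple rank-one alternating scheme with a group-lasso style shrinkage applied gene by gene across datasets, and it omits the contrasted penalties entirely. The function names `group_soft_threshold` and `integrative_sparse_pc` and the tuning parameter `lam` are hypothetical, chosen only for illustration.

```python
import numpy as np

def group_soft_threshold(Z, lam):
    """Shrink each row of Z as a group (one row per gene, one column per dataset).

    A row whose Euclidean norm is at most lam is set to zero in every dataset
    at once, which is what makes the selected genes shared across datasets.
    """
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return Z * scale

def integrative_sparse_pc(datasets, lam, n_iter=100):
    """Illustrative first sparse loading vector for each dataset.

    datasets: list of (n_m, p) arrays measuring the same p genes.
    Returns a (p, M) array; column m holds the loadings for dataset m.
    Assumption: rank-one alternating updates, not the published estimator.
    """
    # Center each gene within each dataset.
    Xs = [X - X.mean(axis=0) for X in datasets]
    # Start each left vector from the leading left singular vector.
    us = [np.linalg.svd(X, full_matrices=False)[0][:, 0] for X in Xs]
    for _ in range(n_iter):
        # Unpenalized loadings, stacked as a p x M matrix.
        Z = np.column_stack([X.T @ u for X, u in zip(Xs, us)])
        # Group penalty: shrink each gene's loadings jointly across datasets.
        V = group_soft_threshold(Z, lam)
        # Refresh each dataset's left vector from its sparse loadings.
        for m, X in enumerate(Xs):
            s = X @ V[:, m]
            nrm = np.linalg.norm(s)
            if nrm > 0:
                us[m] = s / nrm
    # Scale each nonzero loading vector to unit length.
    col_norms = np.linalg.norm(V, axis=0)
    col_norms[col_norms == 0] = 1.0
    return V / col_norms
```

Calling `integrative_sparse_pc([X1, X2, X3], lam=8.0)` on datasets that share the same gene columns returns one loading vector per dataset, with each gene either kept in all datasets or dropped from all of them. The value of `lam` here is arbitrary; in practice it would need to be tuned.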

SUBMITTER: Liu M 

PROVIDER: S-EPMC5912177 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature

Publications

Integrative sparse principal component analysis of gene expression data.

Mengque Liu, Xinyan Fan, Kuangnan Fang, Qingzhao Zhang, Shuangge Ma

Genetic Epidemiology, 2017-11-08, issue 8


Similar Datasets

| S-EPMC4032817 | biostudies-literature
| S-EPMC4394907 | biostudies-literature
| S-EPMC7663540 | biostudies-literature
| S-EPMC7449232 | biostudies-literature
| S-EPMC3601508 | biostudies-literature
| S-EPMC9793858 | biostudies-literature
| S-EPMC2902448 | biostudies-literature
| S-EPMC7868054 | biostudies-literature
| S-EPMC2992445 | biostudies-literature
| S-EPMC3215429 | biostudies-literature