Dataset Information

Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach.


ABSTRACT: In recent years, technological advances have produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an expanding range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in genetic neuroimaging. To handle high-dimensional problems, dimension reduction can be implemented as a pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into PLSC to efficiently solve high-dimensional multimodal problems such as genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole-genome SNP measures as genotypes and whole-brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is independent of data type. Therefore, PLSC-RP opens up a wide range of possible applications: it can be used for any integrative analysis that combines information from multiple sources.
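
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes PLSC is computed as the singular value decomposition of the cross-product matrix of column-centered data, and it assumes a dense Gaussian random matrix for the projection. The function names, the toy dimensions, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def plsc(X, Y):
    """PLSC sketch: SVD of the cross-product matrix of column-centered X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Cross-product matrix has shape (q, p); its SVD yields the saliences.
    U, s, Vt = np.linalg.svd(Yc.T @ Xc, full_matrices=False)
    return U, s, Vt.T

def random_projection(X, k, rng):
    """Project the p columns of X onto k dimensions with a Gaussian matrix.

    Entries are drawn from N(0, 1/k), an assumed (common) scaling under which
    inner products are preserved in expectation.
    """
    p = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(p, k))
    return X @ R, R

# Toy data: few samples, many variables in X (hypothetical sizes).
rng = np.random.default_rng(0)
n, p, q, k = 20, 2000, 30, 500
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))

# Full PLSC on the original data.
U_full, s_full, V_full = plsc(X, Y)

# PLSC after reducing X from p to k columns.
X_red, R = random_projection(X, k, rng)
U_red, s_red, V_red = plsc(X_red, Y)

# Relative difference of the leading singular value between the two analyses.
rel_err = abs(s_red[0] - s_full[0]) / s_full[0]
print(f"cross-product shape: full {(q, p)}, reduced {(q, k)}")
print(f"relative error of leading singular value: {rel_err:.3f}")
```

In this sketch the decomposition operates on a q-by-k matrix instead of a q-by-p matrix, which is where the saving in computation would come from when p is very large. How closely the reduced result matches the full one depends on k relative to the number of samples.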

SUBMITTER: Grellmann C 

PROVIDER: S-EPMC4894907 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach.

Grellmann C, Neumann J, Bitzer S, Kovacs P, Tönjes A, Westlye LT, Andreassen OA, Stumvoll M, Villringer A, Horstmann A

Frontiers in Genetics, 2016-06-07


Similar Datasets

| S-EPMC2894507 | biostudies-literature
| S-EPMC8276768 | biostudies-literature
| S-EPMC5862274 | biostudies-literature
| S-EPMC3418398 | biostudies-literature
| S-EPMC8764132 | biostudies-literature
| S-EPMC4684346 | biostudies-literature
| S-EPMC5561646 | biostudies-other
| S-EPMC6470876 | biostudies-literature