
Dataset Information


Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.


ABSTRACT: The large p small n problem is a challenge for which no de facto standard method exists. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism applied to multiomics datasets in which the number of features exceeds 100,000 whereas the number of samples is only about 100, a typical large p small n problem. When applied to synthetic datasets, the proposed TD-based unsupervised FE outperformed three conventional supervised feature selection methods (random forest; categorical regression, also known as analysis of variance or ANOVA; and penalized linear discriminant analysis) and two unsupervised methods (multiple non-negative matrix factorization and principal component analysis (PCA)-based unsupervised FE). When applied to multiomics datasets, it outperformed the four methods other than PCA-based unsupervised FE. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to the measured tissues and transcription factors. TD-based unsupervised FE was thus shown to be not only a superior feature selection method but also one that can select biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of such a variety of multiomics measurements.
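The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea behind TD-based unsupervised FE, run on a small synthetic tensor rather than the prostate cancer data. Everything specific here is an assumption, not taken from this record: the layout features × samples × omics layers, a higher-order SVD (HOSVD) as the decomposition, picking the sample-mode singular vector most correlated with class labels, treating the first omics-mode vector as the one shared across layers, and scoring features by assuming the chosen feature-mode singular vector is normally distributed, so that its squared standardized components follow a χ² distribution with one degree of freedom, followed by Benjamini-Hochberg correction at 0.01. The helper names `unfold`, `hosvd_factors`, `benjamini_hochberg`, and `td_unsupervised_fe` are hypothetical.

```python
import math

import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_factors(tensor):
    """HOSVD factor matrices: left singular vectors of each mode-n unfolding."""
    return [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
            for m in range(tensor.ndim)]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values."""
    n = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def td_unsupervised_fe(X, labels, threshold=0.01):
    """Hypothetical sketch: X has shape (features, samples, omics layers)."""
    U1, U2, U3 = hosvd_factors(X)  # the decomposition never sees the labels
    labels = np.asarray(labels, dtype=float)
    # Labels are used only afterwards, to pick the sample-mode vector to interpret.
    corrs = [abs(np.corrcoef(U2[:, l], labels)[0, 1]) for l in range(U2.shape[1])]
    l2 = int(np.argmax(corrs))
    l3 = 0  # assumption: first omics-mode vector is common to all layers
    # Core-tensor slice G[:, l2, l3] = U1^T (X x_2 u_l2 x_3 u_l3).
    projected = np.einsum("ijk,j,k->i", X, U2[:, l2], U3[:, l3])
    l1 = int(np.argmax(np.abs(U1.T @ projected)))
    u = U1[:, l1]
    # Assumed normality: (u_i / sigma)^2 ~ chi-squared with 1 d.o.f.,
    # whose upper tail is erfc(sqrt(x / 2)).
    z2 = (u / u.std()) ** 2
    pvals = np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])
    return np.flatnonzero(benjamini_hochberg(pvals) < threshold)


# Synthetic large p small n example: 1000 features, 20 samples, 3 omics layers.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(1000, 20, 3))
X[:50, labels == 1, :] += 3.0  # only the first 50 features carry a class signal
selected = td_unsupervised_fe(X, labels)
print(len(selected), selected[:10])
```

Computing only the core-tensor slice that is needed, rather than the full core tensor, keeps the cost linear in the number of features, which matters when p is in the hundreds of thousands.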

SUBMITTER: Taguchi YH 

PROVIDER: S-EPMC7763286 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature


Publications

Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.

Taguchi Y-H, Turki T

Genes, 11 December 2020, issue 12



Similar Datasets

| S-EPMC8468466 | biostudies-literature
| S-EPMC5571984 | biostudies-literature
| S-EPMC6761323 | biostudies-literature
| S-EPMC7394334 | biostudies-literature
| S-EPMC5763504 | biostudies-literature
| S-EPMC9580456 | biostudies-literature
| S-EPMC5998888 | biostudies-literature
| S-EPMC8076323 | biostudies-literature
| S-EPMC7469919 | biostudies-literature