
Dataset Information


Application of kernel principal component analysis and computational machine learning to exploration of metabolites strongly associated with diet.


ABSTRACT: Computer-based technological innovation has advanced sophisticated and diverse analytical instruments, enabling massive amounts of data to be collected with relative ease. This is accompanied by a fast-growing demand for progress in data mining methods for analyzing big data derived from chemical and biological systems. From this perspective, relying on conventional "linear" multivariate analysis alone limits interpretation, because metabolic data from living organisms contain "non-linear" variations. Here we describe a kernel principal component analysis (KPCA)-incorporated analytical approach for extracting useful information from metabolic profiling data. To overcome the difficulty of determining important variables (metabolites), we incorporated a random forest conditional variable importance measure into our KPCA-based approach to demonstrate the relative importance of metabolites. Market basket analysis showed that hippurate, the most important variable detected by the importance measure, was associated with high levels of some vitamins and minerals present in foods eaten the previous day, suggesting a relationship between increased hippurate and intake of a wide variety of vegetables and fruits. Therefore, the KPCA-incorporated analytical approach described herein enabled us to capture input-output responses, and should be useful not only for metabolic profiling but also for profiling in other areas of biological and environmental systems.
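The abstract names kernel PCA as the core of the approach. As a rough illustration of what that step does, the sketch below implements textbook KPCA with NumPy: build a kernel matrix, center it in feature space, and take its leading eigenvectors as non-linear components. It is an assumption-laden sketch, not the authors' code: the radial basis function (RBF) kernel, the `gamma` value, and the random "samples x metabolites" matrix are all hypothetical choices for illustration, and the paper's actual kernel and parameters are not stated here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Pairwise squared Euclidean distances between rows (samples) of X
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negatives from rounding
    # RBF (Gaussian) kernel: an assumed kernel choice for this sketch
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space: Kc = K - 1K - K1 + 1K1
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Scores of the training samples: Kc @ v / sqrt(lambda) = sqrt(lambda) * v
    # for a unit-norm eigenvector v with eigenvalue lambda
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return scores, eigvals

# Hypothetical data: 30 samples x 5 metabolite intensities
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
scores, eigvals = kernel_pca(X, n_components=2, gamma=0.1)
print(scores.shape)  # (30, 2)
```

The `scores` columns play the role of principal component scores, but in the kernel-induced feature space, so they can separate samples along non-linear trends that linear PCA would miss. In practice the kernel width (`gamma` here) strongly affects the result and would need to be tuned.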

SUBMITTER: Shiokawa Y 

PROVIDER: S-EPMC5821832 | biostudies-other | 2018 Feb

REPOSITORIES: biostudies-other


Publications

Application of kernel principal component analysis and computational machine learning to exploration of metabolites strongly associated with diet.

Shiokawa Yuka, Date Yasuhiro, Kikuchi Jun

Scientific Reports, 2018-02-21, issue 1



Similar Datasets

S-EPMC3441747 | biostudies-literature
S-EPMC3176196 | biostudies-literature
S-EPMC5590884 | biostudies-literature
S-EPMC5054124 | biostudies-literature
S-EPMC10530774 | biostudies-literature
GSE31375 | GEO | 2011-08-15
S-EPMC7304872 | biostudies-literature
S-EPMC10373489 | biostudies-literature
S-EPMC8166023 | biostudies-literature
S-EPMC6137445 | biostudies-other