Dataset Information

Strategy for improved characterisation of human metabolic phenotypes using a COmbined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS).


ABSTRACT:

Motivation

Large-scale population omics data can provide insight into associations between gene-environment interactions and disease. However, existing dimension reduction modelling techniques are often inefficient for extracting detailed information from these complex datasets.

Results

Here we present an interactive software pipeline for exploratory analyses of population-based nuclear magnetic resonance spectral data using a COmbined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS) within the framework of the R library hastaLaVista. Principal component analysis models are generated for a sequential series of spectral regions (blocks) to provide more granular detail defining sub-populations within the dataset. Molecular identification of key differentiating signals is achieved by implementing statistical correlation spectroscopy (STOCSY) on the full spectral data to define feature patterns. Finally, the distribution of the cross-correlation of the reference patterns across the spectral dataset is used to provide population statistics for identifying underlying features arising from drug intake, latent diseases and diet. COMPASS thus provides an efficient semi-automated approach for screening population datasets.
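As a rough illustration of two of the steps described above (partitioning the spectrum into sequential blocks, and a STOCSY-style correlation of one driver variable against the full spectrum), the following minimal Python sketch may help. It is not the COMPASS code and does not use the hastaLaVista API; the function names `split_into_blocks`, `pearson` and `stocsy`, the block size, and the toy intensity matrix are all hypothetical and chosen only for illustration. The per-block principal component analysis is omitted here.

```python
import math


def split_into_blocks(n_vars, block_size):
    """Return (start, stop) index pairs for consecutive spectral blocks.

    Hypothetical helper: a per-block PCA would be fitted on the columns
    start..stop-1 of the intensity matrix.
    """
    return [(s, min(s + block_size, n_vars)) for s in range(0, n_vars, block_size)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def stocsy(spectra, driver):
    """Correlate the driver variable with every spectral variable across samples.

    spectra: list of samples, each a list of intensities (rows = samples).
    driver:  column index of the signal of interest.
    """
    columns = list(zip(*spectra))  # transpose: one tuple per spectral variable
    return [pearson(columns[driver], col) for col in columns]


# Toy data (invented): 4 samples x 6 spectral variables.
# Variables 1 and 4 are built to covary, as two peaks of one molecule would.
spectra = [
    [1.0, 2.0, 0.5, 3.0, 4.0, 1.0],
    [1.2, 4.0, 0.4, 1.0, 8.0, 1.0],
    [0.9, 6.0, 0.6, 2.0, 12.0, 1.0],
    [1.1, 8.0, 0.5, 2.5, 16.0, 1.0],
]

blocks = split_into_blocks(n_vars=6, block_size=4)
correlations = stocsy(spectra, driver=1)
```

In this sketch, variables whose correlation with the driver is close to 1 would be candidates for belonging to the same molecule; the threshold used to accept them is a design choice that the abstract does not specify.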

Availability and implementation

Source code is available at https://github.com/cheminfo/COMPASS.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Loo RL 

PROVIDER: S-EPMC7850059 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

Strategy for improved characterization of human metabolic phenotypes using a COmbined Multi-block Principal components Analysis with Statistical Spectroscopy (COMPASS).

Loo Ruey Leng, Chan Queenie, Antti Henrik, Li Jia V, Ashrafian H, Elliott Paul, Stamler Jeremiah, Nicholson Jeremy K, Holmes Elaine, Wist Julien

Bioinformatics (Oxford, England). 2021-01-01; 21


Similar Datasets

| S-EPMC8086023 | biostudies-literature
| S-EPMC6456307 | biostudies-literature
| S-EPMC1448747 | biostudies-other
| S-EPMC3193798 | biostudies-literature
2013-03-14 | E-GEOD-26520 | biostudies-arrayexpress
2013-03-14 | GSE26520 | GEO
| S-EPMC3699599 | biostudies-literature
2018-09-27 | GSE107744 | GEO
| S-EPMC7083277 | biostudies-literature
| S-EPMC3392282 | biostudies-literature