Dataset Information

Composite score analysis for unsupervised comparison and network visualization of metabolomics data.

ABSTRACT: Metabolomics-based approaches are becoming increasingly popular to interrogate the chemical basis for phenotypic differences in biological systems. Successful metabolomics studies employ multivariate data analysis to compare large and highly complex datasets. A primary tool for unsupervised statistical analyses, principal component analysis (PCA), relies on selecting a subset of at most three components from a larger model to visually represent similarity. The use of only three principal components limits the comprehensiveness of the model and can mask discrimination between samples. We have developed a new statistical metric, the composite score (CS), a univariate statistic that incorporates multiple principal components to calculate a correlation matrix, enabling quantitative comparisons of similarity between samples within one dataset based upon measured metabolome profiles. Composite score values were tabulated using profiles of complex extracts of dietary supplements from the plant Hydrastis canadensis (goldenseal) as a case study. Several outliers were unambiguously identified, and a PCA composite score network was developed to provide a graphical representation of the composite score matrix. Comparison with visualization using PCA score plots or dendrograms from hierarchical clustering analysis (HCA) demonstrates the utility of the composite score as a tool for metabolomics studies that seek to quantify similarity among samples. An R script for the calculation of the composite score has been made available.
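The idea in the abstract, using more than three principal components to build a sample-by-sample similarity matrix, can be illustrated with a minimal Python sketch. This is not the authors' R script and not the published composite score formula: the function names `pca_scores` and `similarity_matrix`, the variance weighting, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def pca_scores(X, n_components):
    """Mean-center X (samples x features) and return the PCA scores plus
    the fraction of total variance explained by each retained component."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return scores, explained


def similarity_matrix(scores, explained):
    """Hypothetical stand-in for a composite score (assumption, not the
    published formula): Pearson correlation between the variance-weighted
    score vectors of every pair of samples."""
    weighted = scores * explained
    return np.corrcoef(weighted)  # rows are samples -> samples x samples


# Simulated data (assumption): two groups of three samples, 20 features each.
rng = np.random.default_rng(0)
base_a = rng.normal(size=20)
base_b = rng.normal(size=20)
X = np.vstack(
    [base_a + 0.05 * rng.normal(size=20) for _ in range(3)]
    + [base_b + 0.05 * rng.normal(size=20) for _ in range(3)]
)

scores, explained = pca_scores(X, n_components=3)
cs = similarity_matrix(scores, explained)
print(np.round(cs, 2))
```

A matrix of this kind could then be thresholded to draw a network in which edges connect highly similar samples, which is the general shape of the network visualization the abstract describes; the threshold and layout would be further choices not specified here.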

SUBMITTER: Kellogg JJ

PROVIDER: S-EPMC6948848 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature


Publications

Composite score analysis for unsupervised comparison and network visualization of metabolomics data.

Kellogg JJ, Kvalheim OM, Cech NB

Analytica Chimica Acta, 2019 Oct 16


PMID: 31864629

Similar Datasets

Project description: Covering: 2014 to 2023 for metabolomics, 2002 to 2023 for information visualization. LC-MS/MS-based untargeted metabolomics is a rapidly developing research field spawning increasing numbers of computational metabolomics tools assisting researchers with their complex data processing, analysis, and interpretation tasks. In this article, we review the entire untargeted metabolomics workflow from the perspective of information visualization, visual analytics and visual data integration. Data visualization is a crucial step at every stage of the metabolomics workflow, where it provides core components of data inspection, evaluation, and sharing capabilities. However, due to the large number of available data analysis tools and corresponding visualization components, it is hard for both users and developers to get an overview of what is already available and which tools are suitable for their analysis. In addition, there is little cross-pollination between the fields of data visualization and metabolomics, leaving visual tools to be designed in a secondary and mostly ad hoc fashion. With this review, we aim to bridge the gap between the fields of untargeted metabolomics and data visualization. First, we introduce data visualization to the untargeted metabolomics field as a topic worthy of its own dedicated research, and provide a primer on cutting-edge visualization research into data visualization for both researchers as well as developers active in metabolomics. We extend this primer with a discussion of best practices for data visualization as they have emerged from data visualization studies. Second, we provide a practical roadmap to the visual tool landscape and its use within the untargeted metabolomics field. Here, for several computational analysis stages within the untargeted metabolomics workflow, we provide an overview of commonly used visual strategies with practical examples. In this context, we will also outline promising areas for further research and development. We end the review with a set of recommendations for developers and users on how to make the best use of visualizations for more effective and transparent communication of results.

Project description: Membranous glomerulonephritis (MGN) is one of the most frequent causes of nephrotic syndrome in adults. It is characterized by the thickening of the glomerular basement membrane in the renal tissue. The current diagnosis of MGN is based on renal biopsy and the detection of antibodies to the few podocyte antigens. Due to the limitations of the current diagnostic methods, including invasiveness and the lack of sensitivity of the current biomarkers, there is a requirement to identify more applicable biomarkers. The present study aimed to identify diagnostic metabolites that are involved in the development of the disease using topological features in the component‑reaction‑enzyme‑gene (CREG) network for MGN. Significant differential metabolites in MGN compared with healthy controls were identified using proton nuclear magnetic resonance and gas chromatography‑mass spectrometry techniques, and multivariate analysis. The CREG network for MGN was constructed, and metabolites with a high centrality and a striking fold‑change in patients, compared with healthy controls, were introduced as putative diagnostic biomarkers. In addition, a protein‑protein interaction (PPI) network, which was based on proteins associated with MGN, was built and analyzed using PPI analysis methods, including molecular complex detection and ClueGene Ontology. A total of 26 metabolites were identified as hub nodes in the CREG network, 13 of which had salient centrality and fold‑changes: dopamine, carnosine, fumarate, nicotinamide D‑ribonucleotide, adenosine monophosphate, pyridoxal, deoxyguanosine triphosphate, L‑citrulline, nicotinamide, phenylalanine, deoxyuridine, tryptamine and succinate. A total of 13 subnetworks were identified using PPI analysis. In total, two of the clusters contained seed proteins (phenylalanine‑4‑hydroxylase and cystathionine γ‑lyase) that were associated with MGN based on the CREG network. The following biological processes associated with MGN were identified using gene ontology analysis: 'pyrimidine‑containing compound biosynthetic process', 'purine ribonucleoside metabolic process', 'nucleoside catabolic process', 'ribonucleoside metabolic process' and 'aromatic amino acid family metabolic process'. The results of the present study may be helpful in the diagnostic and therapeutic procedures of MGN. However, validation is required in the future.
