Dataset Information

CANTARE: finding and visualizing network-based multi-omic predictive models.

ABSTRACT:

Background

One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be "ome aware." Thus, some researchers analyze data from one ome at a time and then combine predictions across omes. Others resort to correlation studies that catalog pairwise relationships but lack an obvious way to summarize these catalogs cohesively and interpretably.

Methods

We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting "top table" of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements).
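The network-encoding step can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the shape of the top table (analyte pair plus a p-value), the 0.05 threshold, the analyte names, and the choice of first-order neighbors as the "neighborhood" are all assumptions made for illustration. The analytes returned for a neighborhood would then serve as candidate predictors for a logistic regression model.

```python
from collections import defaultdict


def build_network(top_table, p_threshold=0.05):
    """Keep analyte pairs whose pairwise-model p-value is below the
    threshold and store them as an undirected adjacency map.
    The threshold value is an assumption, not taken from the paper."""
    adjacency = defaultdict(set)
    for analyte_a, analyte_b, p_value in top_table:
        if p_value < p_threshold:
            adjacency[analyte_a].add(analyte_b)
            adjacency[analyte_b].add(analyte_a)
    return dict(adjacency)


def neighborhood(adjacency, center):
    """Return the center analyte plus its first-order neighbors, sorted.
    Using first-order neighbors is an illustrative assumption."""
    return sorted({center} | adjacency.get(center, set()))


# Hypothetical top table: (analyte, analyte, p-value of the pairwise model).
# Names carry an ome prefix so a neighborhood can span several omes.
top_table = [
    ("microbe:A", "metabolite:X", 0.001),
    ("microbe:A", "enzyme:E1", 0.020),
    ("metabolite:X", "enzyme:E1", 0.300),  # filtered out by the threshold
    ("microbe:B", "metabolite:Y", 0.004),
]

network = build_network(top_table)
candidates = neighborhood(network, "microbe:A")
print(candidates)
```

Here the neighborhood of the hypothetical analyte `microbe:A` spans three omes, which is the property that lets a downstream model draw predictors from more than one ome.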

Results

We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compare the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10^-5). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User's Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/ .
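For readers unfamiliar with the AUC figures quoted above, the following minimal sketch computes AUC as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, with ties counted as one half. The labels and scores are made-up values for illustration, not data from the study, and the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of
    positive (label 1) and negative (label 0) scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = case, 0 = control; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.20, 0.05]
model_auc = auc(labels, scores)
print(round(model_auc, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for a worked example; rank-based formulations give the same value more efficiently on large datasets.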

Conclusion

CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.

SUBMITTER: Siebert JC

PROVIDER: S-EPMC7896366 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

CANTARE: finding and visualizing network-based multi-omic predictive models.

Janet C Siebert, Martine Saint-Cyr, Sarah J Borengasser, Brandie D Wagner, Catherine A Lozupone, Carsten Görg

BMC Bioinformatics, 2021-02-19, issue 1

PMID: 33607938

Similar Datasets

Project description: To better understand dynamic disease processes, integrated multi-omic methods are needed, yet comparing different types of omic data remains difficult. Integrative solutions benefit experimenters by eliminating potential biases that come with single-omic analysis. We have developed the methods needed to explore whether a relationship exists between co-expression network models built from transcriptomic and proteomic data types, and whether this relationship can be used to improve the disease signature discovery process. A naïve, correlation-based method is used for comparison. Using publicly available infectious disease time series data, we analyzed the related co-expression structure of the transcriptome and proteome in response to SARS-CoV infection in mice. Transcript and peptide expression data were filtered using quality scores and subset by taking the intersection on mapped Entrez IDs. Using this data set, independent co-expression networks were built. The networks were integrated by constructing a bipartite module graph based on module member overlap, module summary correlation, and correlation to phenotypes of interest. Compared to the module-level results, the naïve approach is hindered by a lack of correlation across data types, less significant enrichment results, and little functional overlap across data types. Our module graph approach avoids these problems, resulting in an integrated omic signature of disease progression, which allows prioritization across data types for downstream experiment planning. Integrated modules exhibited related functional enrichments and could suggest novel interactions in response to infection. These disease- and platform-independent methods can be used to realize the full potential of multi-omic network signatures. The data (experiment SM001) are publicly available through the NIAID Systems Virology (https://www.systemsvirology.org) and PNNL (http://omics.pnl.gov) web portals. Phenotype data are found in the supplementary information. The ProCoNA package is available as part of Bioconductor 2.13.
