Dataset Information

Multi-omic modelling of inflammatory bowel disease with regularized canonical correlation analysis.

ABSTRACT:

Background

Personalized medicine requires finding relationships between variables that influence a patient's phenotype and predicting an outcome. Sparse generalized canonical correlation analysis identifies relationships between different groups of variables. This method requires establishing a model of the expected interaction between those variables. Describing these interactions is challenging when the relationship is unknown or when there is no pre-established hypothesis. Thus, our aim was to develop a method to find the relationships between microbiome and host transcriptome data and the relevant clinical variables in a complex disease, such as Crohn's disease.

Results

We present here a method to identify interactions based on canonical correlation analysis. Using a dataset of Crohn's disease patients with longitudinal sampling, we show that the model is the most important factor in identifying relationships between blocks. First, the analysis was tested on two previously published datasets, a glioma dataset and a Crohn's disease and ulcerative colitis dataset, where we describe how to select the optimum parameters. Using these parameters, we analyzed our Crohn's disease dataset. We selected the model with the highest inner average variance explained to identify relationships between transcriptome, gut microbiome and clinically relevant variables. Adding the clinically relevant variables improved the average variance explained by the model compared to multiple co-inertia analysis.

Conclusions

The methodology described herein provides a general framework for identifying interactions between sets of omic data and clinically relevant variables. Following this method, we found genes and microorganisms that were related to each other independently of the model, while others were specific to the model used. Thus, model selection proved crucial to finding the existing relationships in multi-omics datasets.
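
The Results mention selecting the model with the highest inner average variance explained (AVE). As a rough illustration only, and assuming the definition commonly used with regularized generalized canonical correlation analysis (a design-weighted mean of squared correlations between block components), the criterion might be sketched in Python as below. The function name `inner_ave`, the toy components and the two design matrices are hypothetical and are not taken from the study; in a real analysis the components would be re-estimated for every candidate design before comparing scores.

```python
import numpy as np


def inner_ave(components, design):
    """Design-weighted mean of squared correlations between block components.

    components: list of 1-D arrays, one component per block (same samples).
    design: symmetric matrix; design[j, k] weights the link between blocks j and k.
    """
    Y = np.column_stack(components)              # one column per block
    r2 = np.corrcoef(Y, rowvar=False) ** 2       # squared pairwise correlations
    C = np.asarray(design, dtype=float)
    upper = np.triu_indices_from(C, k=1)         # count each block pair once
    total = C[upper].sum()
    if total == 0:
        raise ValueError("design matrix has no connections between blocks")
    return float((C[upper] * r2[upper]).sum() / total)


# Toy components (hypothetical): blocks 1 and 2 agree, block 3 is unrelated.
y1 = np.array([1.0, 2.0, 3.0, 4.0])
y2 = np.array([2.0, 4.0, 6.0, 8.0])
y3 = np.array([1.0, -1.0, -1.0, 1.0])

# Two hypothetical designs: every block linked, or only blocks 1 and 2 linked.
full_design = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
pair_design = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])

for name, design in [("full", full_design), ("pair", pair_design)]:
    print(name, inner_ave([y1, y2, y3], design))
```

With fixed components, a design that links only strongly correlated blocks scores higher, which is why the choice of model matters for this criterion.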

SUBMITTER: Revilla L

PROVIDER: S-EPMC7870068 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Multi-omic modelling of inflammatory bowel disease with regularized canonical correlation analysis.

Revilla L, Mayorgas A, Corraliza AM, Masamunt MC, Metwaly A, Haller D, Tristán E, Carrasco A, Esteve M, Panés J, Ricart E, Lozano JJ, Salas A

PLoS One, 2021-02-08 (2)

PMID: 33556098

Similar Datasets

Project description: Inflammatory bowel diseases (IBDs), including ulcerative colitis and Crohn's disease, affect several million individuals worldwide. These diseases are heterogeneous at the clinical, immunological and genetic levels and result from complex host and environmental interactions. Investigating drug efficacy for IBD can improve our understanding of why treatment response can vary between patients. We propose an explainable machine learning (ML) approach that combines bioinformatics and domain insight to integrate multi-modal data and predict inter-patient variation in drug response. Using model explanations, we interpret the ML models' predictions to infer unique combinations of important features associated with pharmacological responses obtained during preclinical testing of drug candidates in ex vivo patient-derived fresh tissues. Our inferred multi-modal features that are predictive of drug efficacy include multi-omic data (genomic and transcriptomic), demographic, medicinal and pharmacological data. Our aim is to understand variation in patient responses before a drug candidate moves forward to clinical trials. As a pharmacological measure of drug efficacy, we measured the reduction in the release of the inflammatory cytokine TNFα from the fresh IBD tissues in the presence/absence of test drugs. We initially explored the effects of a mitogen-activated protein kinase (MAPK) inhibitor; however, we later showed our approach can be applied to other targets, test drugs or mechanisms of interest. Our best model predicted TNFα levels from demographic, medicinal and genomic features with an error of only 4.98% on unseen patients. We incorporated transcriptomic data to validate insights from genomic features. Our results showed variations in drug effectiveness (measured by ex vivo assays) between patients that differed in gender, age or condition and linked new genetic polymorphisms to patient response variation to the anti-inflammatory treatment BIRB796 (Doramapimod). Our approach models IBD drug response while also identifying its most predictive features as part of a transparent ML precision medicine strategy.

Project description: Aim: To study the association between inflammatory bowel disease (IBD) and genetic variations in eosinophil protein X (EPX) and eosinophil cationic protein (ECP). Methods: DNA was extracted from ethylenediaminetetraacetic acid (EDTA) blood of 587 patients with Crohn's disease (CD), 592 with ulcerative colitis (UC) and 300 healthy subjects. The EPX405 (G > C, rs2013109), ECP434 (G > C, rs2073342) and ECP562 (G > C, rs2233860) gene polymorphisms were analysed by the 5'-nuclease allelic discrimination assay. To determine the intracellular content of EPX and ECP in granulocytes, 39 blood samples were collected and extracted with a buffer containing cetyltrimethylammonium bromide. The intracellular content of EPX was analysed using an enzyme-linked immunosorbent assay. The intracellular content of ECP was analysed with the UniCAP® system as described by the manufacturer. The statistical tests used were the χ² test, Fisher's exact test, ANOVA, the Student-Newman-Keuls test, and Kaplan-Meier survival curves with the log-rank test for trend; P < 0.05 was considered statistically significant. Results: The genotype frequencies for males with UC and an age of disease onset of ≥ 45 years (n = 57) were GG = 37%, GC = 60%, CC = 4% for ECP434 and GG = 51%, GC = 49%, CC = 0% for ECP562. These were significantly different from the healthy subjects' genotype frequencies for ECP434 (GG = 57%, GC = 38%, CC = 5%; P = 0.010) and ECP562 (GG = 68%, GC = 29%, CC = 3%; P = 0.009). The genotype frequencies for females with CD and an age of disease onset of ≥ 45 years (n = 62) were GG = 37%, GC = 52%, CC = 11% for ECP434 and GG = 48%, GC = 47%, CC = 5% for ECP562. These were also statistically different from healthy controls for both ECP434 (P = 0.010) and ECP562 (P = 0.013). The intracellular protein concentration of EPX and ECP was calculated in μg/10⁶ eosinophils and then correlated with the EPX405 genotypes. The protein content of EPX was highest in the patients with the CC genotype of EPX405 (GG = 4.65, GC = 5.93, and CC = 6.57) and that of ECP in the patients with the GG genotype of EPX405 (GG = 2.70, GC = 2.47 and CC = 1.90). ANOVA demonstrated a difference in intracellular protein content for EPX (P = 0.009) and ECP (P = 0.022). The age of disease onset was linked to haplotypes of the EPX405, ECP434 and ECP562 genotypes. Kaplan-Meier curves showed a difference between haplotype distributions for the females with CD (P = 0.003). The highest age of disease onset was seen in females with the EPX405CC, ECP434GC, ECP562CC haplotype (34 years) and the lowest in females with the EPX405GC, ECP434GC, ECP562GG haplotype (21 years). For males with UC there was also a difference between the highest and lowest age of disease onset (EPX405CC, ECP434CC, ECP562CC, mean 24 years vs EPX405GC, ECP434GC, ECP562GG, mean 34 years, P = 0.0009). The relative risk for UC patients with ECP434 or ECP562 GC/CC genotypes to develop dysplasia/cancer was 2.5 (95% CI: 1.2-5.4, P = 0.01) and 2.5 (95% CI: 1.1-5.4, P = 0.02) respectively, compared to patients carrying the GG genotypes. Conclusion: Polymorphisms of EPX and ECP are associated with IBD in an age- and gender-dependent manner, suggesting an essential role of eosinophils in the pathophysiology of IBD.

Project description: Background: Inflammation is a core element of many different systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and related mechanisms could be investigated through the multi-omic comparative analysis of many inflammatory diseases. Therefore, it is important to use molecular data to study the genesis of the diseases. Here we propose a new methodology to study the relationships between inflammatory diseases and signalling molecules whose dysregulation at the molecular level could lead to the systemic pathological events observed in inflammatory diseases. Results: We first perform an exploratory analysis of gene expression data from a number of diseases that involve a strong inflammatory component. The comparison of gene expression between disease and healthy samples reveals the importance of members of gene families coding for signalling factors. Next, we focus on signalling gene families of interest and a subset of inflammation-related diseases with multi-omic features including both gene expression and DNA methylation. We introduce a phylogenetic-based multi-omic method to study the relationships between multi-omic features of inflammation-related diseases by integrating gene expression and DNA methylation through the sequence-based phylogeny of the signalling gene families. The models of adaptation between gene expression and DNA methylation can be inferred from the pre-estimated evolutionary relationships of a gene family. Members of the gene family whose expression or methylation levels significantly deviate from the model are considered potential disease-associated genes. Conclusions: Applying the methodology to four gene families (the chemokine receptor family, the TNF receptor family, the TGF-β gene family and the IL-17 gene family) in nine inflammation-related diseases, we identify disease-associated genes that exhibit significant dysregulation in gene expression or DNA methylation in these diseases, which provides clues for functional associations between the diseases.

Project description: Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic systems-biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features (referred to as canonical variables, CVs) within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment for blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance: 39.0% to 50.0% of the variation in JHS and 38.9% to 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA to various cohorts would help identify cohort-agnostic, biologically meaningful relationships between multi-omics data and phenotypic traits.
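
The description above mentions using the Gram-Schmidt (GS) algorithm to improve orthogonality among CVs. How GS is wired into SMCCA is not stated here, so the following Python sketch only illustrates the generic step, assuming the CVs are available as columns of a samples-by-components matrix. The function name `gram_schmidt` and the simulated matrix are hypothetical.

```python
import numpy as np


def gram_schmidt(cvs, tol=1e-12):
    """Orthonormalize the columns of `cvs` in order (modified Gram-Schmidt)."""
    V = np.asarray(cvs, dtype=float)
    Q = np.zeros_like(V)
    for j in range(V.shape[1]):
        v = V[:, j].copy()
        # Remove the part of this CV already explained by earlier CVs.
        for i in range(j):
            v -= (Q[:, i] @ v) * Q[:, i]
        norm = np.linalg.norm(v)
        if norm < tol:
            raise ValueError(f"column {j} is linearly dependent on earlier columns")
        Q[:, j] = v / norm
    return Q


# Simulated CVs (hypothetical): the second is deliberately correlated with the first.
rng = np.random.default_rng(0)
cvs = rng.normal(size=(50, 3))
cvs[:, 1] += 0.8 * cvs[:, 0]

Q = gram_schmidt(cvs)
# Off-diagonal entries of Q.T @ Q should be numerically zero after the step.
print(np.round(Q.T @ Q, 6))
```

The first CV keeps its direction and each later CV retains only the variation not already captured by the earlier ones, so the order of the columns matters.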
