Project description: Rationale: GWAS (Genome-Wide Association Studies) have identified hundreds of genetic loci associated with atrial fibrillation (AF). However, these loci explain only a small proportion of AF heritability. Objective: To develop an approach to identify additional AF-related genes by integrating multiple omics data. Methods and Results: Three types of omics data were integrated: (1) summary statistics from the AFGen 2017 GWAS; (2) a whole blood EWAS (Epigenome-Wide Association Study) of AF; and (3) a whole blood TWAS (Transcriptome-Wide Association Study) of AF. The variant-level GWAS results were collapsed into gene-level associations using fast set-based association analysis. The CpG-level EWAS results were also collapsed into gene-level associations by an adapted SNP-set Kernel Association Test approach. Both GWAS and EWAS gene-based associations were then meta-analyzed with TWAS using a fixed-effects model weighted by the sample size of each data set. A tissue-specific network was subsequently constructed using NetWAS (Network-Wide Association Study). The identified genes were then compared with the AFGen 2018 GWAS, which contained more than triple the number of AF cases compared with the AFGen 2017 GWAS. We observed that the multiomics approach identified many more relevant AF-related genes than using the AFGen 2018 GWAS alone (1931 versus 206 genes). Many of these genes are involved in the development and regulation of heart- and muscle-related biological processes. Moreover, the gene set identified by the multiomics approach explained much more AF variance than that identified by GWAS alone (10.4% versus 3.5%). Conclusions: We developed a strategy to integrate multiple omics data to identify AF-related genes. Our integrative approach may be useful to improve the power of traditional GWAS, which might be particularly useful for rare traits and diseases with limited sample size.
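The sample-size-weighted meta-analysis step described above can be illustrated with a weighted Stouffer Z combination. This is a minimal sketch under stated assumptions, not the study's actual pipeline: the function name `weighted_z_meta`, the one-sided p-value convention, the square-root-of-sample-size weights, and the example numbers are all hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def weighted_z_meta(p_values, sample_sizes):
    """Combine one-sided gene-level p-values from several data sets.

    Each p-value is converted to a Z-score, weighted by the square root
    of its data set's sample size (an assumed weighting scheme), and
    combined into a single Z-score and one-sided p-value.
    """
    norm = NormalDist()
    # Small p-values map to large positive Z-scores.
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    p_combined = 1.0 - norm.cdf(z_combined)
    return z_combined, p_combined


if __name__ == "__main__":
    # Hypothetical gene-level p-values for one gene from GWAS, EWAS, and TWAS,
    # with hypothetical sample sizes for each data set.
    z, p = weighted_z_meta([0.01, 0.20, 0.03], [100000, 2000, 2500])
    print(f"combined Z = {z:.3f}, combined p = {p:.4g}")
```

Because the weights scale with sample size, the largest data set dominates the combined statistic, which is the usual motivation for this kind of weighting when the data sets differ greatly in size.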
Project description: BACKGROUND: Alzheimer's disease (AD) is a major neurodegenerative disorder leading to amnesia, cognitive impairment and dementia in the elderly. Lesions of this type usually result from dysfunctional protein cooperation in biological pathways. In addition, AD progression is known to occur in different brain regions with particular features. Thus, identifying and analyzing crosstalk among dysregulated pathways, and identifying their clusters in various diseased brain regions, are expected to provide deep insights into the pathogenetic mechanism. RESULTS: Here we propose a network-based systems biology approach to detect crosstalk among AD-related pathways, as well as their dysfunctions in six brain regions of AD patients. By constructing a network of pathways, the relationships between the AD pathway and its neighboring pathways are systematically investigated and visually presented by their intersections. We found that the degree of significance of pathways related to the fatal disorders, together with the strength of pathway overlap, can indicate the impact of these neighboring pathways on AD development. Furthermore, the crosstalk among pathways provides some evidence that the neighboring pathways of the AD pathway closely cooperate and play important roles in AD progression. CONCLUSIONS: Our study identifies the common and distinct features of dysfunctional pathway crosstalk in various AD brain regions. The global pathway crosstalk network and the clusters of relevant AD pathways provide evidence of cooperativity among pathways in the potential pathogenesis of this complex neuronal disease.
Project description: Background: Resistance to chemotherapy severely limits the effectiveness of chemotherapy drugs in treating cancer. Still, the mechanisms and critical pathways that contribute to chemotherapy resistance are relatively unknown. This study elucidates the chemoresistance-associated pathways retrieved from the integrated biological interaction networks and identifies signature genes relevant for chemotherapy resistance. Methods: An integrated network was constructed by collecting multiple metabolic interactions from public databases, and the k-shortest path algorithm was implemented to identify chemoresistance-related pathways. The identified pathways were then scored using differential expression values from microarray data in chemosensitive and chemoresistant ovarian and lung cancers. Finally, another pathway database, Reactome, was used to evaluate the significance of genes within each filtered pathway based on topological characteristics. Results: By this method, we discovered pathways specific to chemoresistance. Many of these pathways were consistent with or supported by known involvement in chemotherapy. Experimental results also indicated that integration of pathway structure information with gene differential expression analysis can identify dissimilar modes of gene reactions between chemosensitivity and chemoresistance. Several identified pathways can increase the development of chemotherapeutic resistance, and the predicted signature genes are involved in drug resistance during chemotherapy. In particular, we observed that some genes were key factors for joining two or more metabolic pathways and passing down signals, which may be potential key targets for treatment. Conclusions: This study is expected to identify targets for chemoresistance and highlights the interconnectivity of chemoresistance mechanisms. The experimental results not only offer insights into the mode of biological action of drug resistance but also provide information on potential key targets (new biological hypotheses) for further drug-development efforts.
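The k-shortest path step mentioned in the methods can be sketched as a best-first enumeration of simple paths over a weighted interaction network. The abstract does not state which k-shortest path variant was used, so this is only an illustrative stand-in: the function name `k_shortest_paths`, the adjacency-dictionary format, and the toy node names and weights are hypothetical. The approach assumes non-negative edge weights and is meant for small graphs, since it can expand many partial paths.

```python
import heapq


def k_shortest_paths(graph, source, target, k):
    """Return up to k loop-free paths from source to target, cheapest first.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    Partial paths are expanded in order of accumulated cost, so complete
    paths leave the heap in non-decreasing order of total cost.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found


if __name__ == "__main__":
    # Hypothetical toy network; node names and weights are placeholders.
    toy_network = {
        "A": {"B": 1.0, "C": 2.0},
        "B": {"C": 1.0, "D": 3.0},
        "C": {"D": 1.0},
        "D": {},
    }
    for cost, path in k_shortest_paths(toy_network, "A", "D", k=3):
        print(cost, " -> ".join(path))
```

For larger networks, a dedicated algorithm such as Yen's would normally be preferred, because it avoids expanding every partial path.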
Project description: The central dogma of molecular biology delineates a unidirectional causal flow, i.e., DNA → RNA → protein → trait. Genome-wide association studies, next-generation sequencing association studies, and their meta-analyses have successfully identified ~12,000 susceptibility genetic variants that are associated with a broad array of human physiological traits. However, such conventional association studies ignore the intermediate causal factors (i.e., RNA, protein) and the unidirectional causal pathway. Such studies may not be ideally powerful, and the genetic variants identified may not necessarily be genuine causal variants. In this article, we model the central dogma by a mediate causal model and analytically prove that the more remote an omics level is from a physiological trait, the smaller the magnitude of their correlation. Under both random and extreme sampling schemes, we numerically demonstrate that the proteome-trait correlation test is more powerful than the transcriptome-trait correlation test, which in turn is more powerful than the genotype-trait association test. In conclusion, integrating RNA and protein expression with DNA data, together with causal inference, is necessary to gain a full understanding of how genetic causal variants contribute to phenotype variations.
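The attenuation of correlation along the DNA → RNA → protein → trait chain can be illustrated with a small simulation. This is a sketch under simplifying assumptions that may differ from the article's model: each level is a linear function of the previous one plus independent Gaussian noise, and the function names, effect size, allele frequency, and sample size are hypothetical. Under such a linear chain, the correlation between two levels equals the product of the correlations along the links between them, so it shrinks with distance.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_chain(n=20000, maf=0.3, effect=0.5, seed=1):
    """Simulate a linear DNA -> RNA -> protein -> trait chain.

    Returns the correlation of each omics level with the trait.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Genotype coded as 0, 1, or 2 copies of the minor allele.
    dna = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    rna = [effect * g + rng.gauss(0.0, 1.0) for g in dna]
    protein = [effect * r + rng.gauss(0.0, 1.0) for r in rna]
    trait = [effect * p + rng.gauss(0.0, 1.0) for p in protein]
    return {
        "dna": pearson(dna, trait),
        "rna": pearson(rna, trait),
        "protein": pearson(protein, trait),
    }


if __name__ == "__main__":
    for level, r in simulate_chain().items():
        print(f"corr({level}, trait) = {r:.3f}")
```

With these settings the genotype-trait correlation is the weakest of the three, which is consistent with the ordering of test power reported in the abstract.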
Project description: Background: There is still more to learn about the pathobiology of COVID-19. A multi-omic approach offers a holistic view to better understand the mechanisms of COVID-19. We used state-of-the-art statistical learning methods to integrate genomics, metabolomics, proteomics, and lipidomics data obtained from 123 patients experiencing COVID-19 or COVID-19-like symptoms for the purpose of identifying molecular signatures and corresponding pathways associated with the disease. Results: We constructed and validated molecular scores and evaluated their utility beyond clinical factors known to impact disease status and severity. We identified inflammation- and immune response-related pathways, and other pathways, providing insights into possible consequences of the disease. Conclusions: The molecular scores we derived were strongly associated with disease status and severity and can be used to identify individuals at a higher risk for developing severe disease. These findings have the potential to provide further, and needed, insights into why certain individuals develop worse outcomes.
Project description: Background: Chronic obstructive pulmonary disease (COPD) is a highly morbid and heterogeneous disease. While COPD is defined by spirometry, many COPD characteristics are seen in cigarette smokers with normal spirometry. The extent to which COPD and COPD heterogeneity are captured in omics of lung tissue is not known. Methods: We clustered gene expression and methylation data in 78 lung tissue samples from former smokers with normal lung function or severe COPD. We applied two integrative omics clustering methods: (1) Similarity Network Fusion (SNF) and (2) Entropy-Based Consensus Clustering (ECC). Results: SNF clusters were not significantly different by the percentage of COPD cases (48.8% vs. 68.6%, p = 0.13), though they differed according to median forced expiratory volume in one second (FEV1) % predicted (82 vs. 31, p = 0.017). In contrast, the ECC clusters showed stronger evidence of separation by COPD case status (48.2% vs. 81.8%, p = 0.013) and similar stratification by median FEV1 % predicted (82 vs. 30.5, p = 0.0059). ECC clusters using both gene expression and methylation were identical to the ECC clustering solution generated using methylation data alone. Both methods selected clusters with differentially expressed transcripts enriched for interleukin signaling and immunoregulatory interactions between lymphoid and non-lymphoid cells. Conclusions: Unsupervised clustering analysis of integrated gene expression and methylation data in lung tissue resulted in clusters with modest concordance with COPD, though the clusters were enriched in pathways potentially contributing to COPD-related pathology and heterogeneity.
Project description: Motivation: In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. Results: Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. Availability and implementation: DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameter choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description: In solid-organ transplantation, microRNAs (miRNAs) have emerged as key players in the regulation of allograft cell function in response to injury. To gain insight into the role of miRNAs in antibody-mediated rejection, a rejection phenotype histologically defined by microvascular inflammation, kidney allograft biopsies were subjected to both miRNA and messenger RNA (mRNA) profiling. Using a unique multistep selection process specific to the BIOMARGIN study (discovery cohort, N=86; selection cohort, N=99; validation cohort, N=298), six differentially expressed miRNAs were consistently identified: miR-139-5p (down) and miR-142-3p/150-5p/155-5p/222-3p/223-3p (up). Their expression level gradually correlated with microvascular inflammation intensity. The cell specificity of miRNA target genes was investigated by integrating their in vivo mRNA targets with single-cell RNA sequencing from an independent allograft biopsy cohort. Endothelial-derived miR-139-5p expression correlated negatively with MHC-related gene expression. Conversely, epithelial-derived miR-222-3p overexpression was strongly associated with degraded renal electrolyte homeostasis and repressed immune-related pathways. In immune cells, miR-150-5p regulated NF-κB activation in T lymphocytes whereas miR-155-5p regulated mRNA splicing in antigen-presenting cells. Altogether, integrated omics enabled us to unravel new pathways involved in microvascular inflammation and suggest that metabolism modifications in tubular epithelial cells occur as a consequence of antibody-mediated rejection, beyond the nearby endothelial compartment.
Project description: Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, CKD onset and progression are still not fully understood at the molecular level. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from the available literature. We established the Chronic Kidney Disease database (CKDdb), an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders, by performing literature data mining and manual curation. The CKDdb database contains differential expression data from 49395 molecule entries (redundant), of which 16885 are unique molecules (non-redundant), from 377 manually curated studies of 230 publications. This database was intentionally built to allow disease pathway analysis through a systems approach in order to yield biological meaning by integrating all existing information, and it therefore has the potential to provide an in-depth understanding of the key molecular events that modulate CKD pathogenesis.
Project description: Background: Cancer develops when pathways controlling cell survival, cell fate or genome maintenance are disrupted by the somatic alteration of key driver genes. Understanding how pathway disruption is driven by somatic alterations is thus essential for an accurate characterization of cancer biology and identification of therapeutic targets. Unfortunately, current cancer pathway analysis methods fail to fully model the relationship between somatic alterations and pathway activity. Results: To address these limitations, we developed a multi-omics method for identifying biologically important pathways and genes in human cancer. Our approach combines single-sample pathway analysis with multi-stage, lasso-penalized regression to find pathways whose gene expression can be explained largely in terms of gene-level somatic alterations in the tumor. Importantly, this method can analyze case-only data sets, does not require information regarding pathway topology and supports personalized pathway analysis using just somatic alteration data for a limited number of cancer-associated genes. The practical effectiveness of this technique is illustrated through an analysis of data from The Cancer Genome Atlas using gene sets from the Molecular Signatures Database. Conclusions: Novel insights into the pathophysiology of human cancer can be obtained from statistical models that predict expression-based pathway activity in terms of non-silent somatic mutations and copy number variation. These models enable the identification of biologically important pathways and genes and support personalized pathway analysis in cases where gene expression data is unavailable.
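The idea of explaining a pathway activity score from gene-level somatic alterations with a lasso penalty can be sketched with a small coordinate-descent solver. This is a single-stage toy illustration only and does not reproduce the multi-stage procedure or the single-sample pathway scoring described above; the function names, the objective scaling, the absence of an intercept, and the example data are assumptions made for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, returning 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + penalty * ||b||_1 (no intercept).

    X is a list of rows (samples) of alteration features, y is a list of
    pathway activity scores. Returns the fitted coefficient list.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            norm = 0.0
            for i in range(n):
                # Prediction from all features except feature j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            if norm > 0:
                beta[j] = soft_threshold(rho / n, penalty) / (norm / n)
            else:
                beta[j] = 0.0
    return beta


if __name__ == "__main__":
    # Hypothetical data: two alteration features coded as +1/-1 for four
    # tumors, and a pathway score that depends only on the first feature.
    X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
    y = [2.0, 2.0, -2.0, -2.0]
    print(lasso_coordinate_descent(X, y, penalty=0.5))
```

The penalty sets the coefficients of uninformative alterations exactly to zero, which is what makes lasso regression useful for selecting a small set of genes associated with each pathway.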