Project description:Purpose: Osteoarthritis (OA) is the most prevalent joint disorder, and mitochondrial dysfunction has been linked to its pathogenesis. The main goal of this study is to uncover the pivotal role of mitochondria in the mechanisms driving OA development. Materials and methods: We acquired seven bulk RNA-seq datasets from the Gene Expression Omnibus (GEO) database and examined the expression levels of mitochondria-related differentially expressed genes in OA. We used single-sample gene set enrichment analysis (ssGSEA), gene set enrichment analysis (GSEA), and weighted gene co-expression network analysis (WGCNA) to explore the functional mechanisms associated with these genes. Seven machine learning algorithms were used to identify hub mitochondria-related genes and to develop a predictive model. Based on these hub genes, we further performed pathway enrichment, immune infiltration, and gene-disease relationship analyses and constructed an mRNA-miRNA network. Genome-wide association study (GWAS) analysis was performed using the Gene Atlas database. GSEA, gene set variation analysis (GSVA), protein pathway analysis, and WGCNA were employed to investigate relevant pathways in the identified subtypes. The Harmonizome database was used to analyze the expression of the hub mitochondria-related genes across various human tissues. Single-cell data were analyzed to examine gene expression distribution patterns and pseudo-temporal changes. Additionally, real-time polymerase chain reaction (RT-PCR) was used to validate the expression of the hub mitochondria-related genes. Results: In OA, the mitochondria-related pathway was significantly activated. Nine hub mitochondria-related genes (SIRT4, DNAJC15, NFS1, FKBP8, SLC25A37, CARS2, MTHFD2, ETFDH, and PDK4) were identified. Predictive models built on these genes showed good ability to predict OA. These genes are primarily associated with macrophages.
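ssGSEA, used here to score mitochondria-related activity, turns one sample's expression profile into a single enrichment score for a gene set. The sketch below follows the usual rank-weighted random-walk formulation: genes are ordered by expression, set members step up by their rank raised to alpha = 0.25, non-members step down uniformly, and the walk is summed over all positions. It omits any across-sample normalization that packages such as GSVA may apply. The function name and the toy genes G1 to G5 are illustrative assumptions, not the study's pipeline.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Unnormalized ssGSEA-style enrichment score for one sample.

    expression: dict mapping gene -> expression value in this sample.
    gene_set: iterable of gene names (genes that were not measured are ignored).
    """
    # Order genes from highest to lowest expression.
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    members = set(gene_set) & set(genes)
    if not members or len(members) == n:
        raise ValueError("gene set must be a non-empty proper subset of the measured genes")
    # Rank n for the most highly expressed gene, 1 for the least expressed.
    rank = {g: n - i for i, g in enumerate(genes)}
    hit_total = sum(rank[g] ** alpha for g in members)
    miss_step = 1.0 / (n - len(members))
    hit = miss = score = 0.0
    for g in genes:
        if g in members:
            hit += rank[g] ** alpha / hit_total  # rank-weighted step for set members
        else:
            miss += miss_step  # uniform step for non-members
        score += hit - miss  # ssGSEA sums the walk instead of taking its maximum
    return score


# Toy sample: G1 and G2 are the most highly expressed genes.
sample = {"G1": 10.0, "G2": 9.0, "G3": 1.0, "G4": 0.5, "G5": 0.1}
print(ssgsea_score(sample, {"G1", "G2"}))  # positive: set genes sit at the top
print(ssgsea_score(sample, {"G4", "G5"}))  # negative: set genes sit at the bottom
```

Summing the walk rather than taking its maximum deviation is what distinguishes the single-sample score from classic GSEA's Kolmogorov-Smirnov-like statistic.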
Unsupervised consensus clustering identified two mitochondria-associated subtypes that are primarily associated with metabolism. Single-cell analysis showed that all nine genes were expressed at the single-cell level and that their expression varied with cell differentiation. RT-PCR showed that all of them were significantly expressed in OA. Conclusion: SIRT4, DNAJC15, NFS1, FKBP8, SLC25A37, CARS2, MTHFD2, ETFDH, and PDK4 are potential mitochondrial target genes for studying OA. Classifying patients into mitochondria-associated subtypes could help personalize treatment for OA patients.
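Unsupervised consensus clustering reclusters many resamples of the samples and records how often each pair of samples lands in the same cluster. The resulting consensus matrix is then used to judge cluster stability and choose the number of subtypes. The minimal sketch below covers only the consensus-matrix step. It assumes the per-resample cluster labels have already been produced by some base clustering algorithm, with None marking a sample left out of a resample. The function name is hypothetical.

```python
from itertools import combinations


def consensus_matrix(runs, n):
    """Fraction of resamples in which each pair of samples was co-clustered.

    runs: list of label lists of length n; None marks a sample not drawn in that resample.
    Pairs are only counted in resamples where both samples were drawn.
    """
    together = [[0] * n for _ in range(n)]
    both = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, j in combinations(range(n), 2):
            if labels[i] is None or labels[j] is None:
                continue
            both[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    # Diagonal is 1 by definition; off-diagonal entries are symmetric.
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if both[i][j]:
            M[i][j] = M[j][i] = together[i][j] / both[i][j]
    return M


runs = [[0, 0, 1, 1], [0, 0, 1, None], [1, 1, None, 0], [0, 1, 1, 1]]
for row in consensus_matrix(runs, 4):
    print(row)
```

Entries near 1 or 0 indicate stable co-membership; many intermediate values suggest the chosen number of clusters does not fit the data well.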
Project description:Background: Lung adenocarcinoma is a common malignant tumor that ranks second worldwide and has a high mortality rate. G protein-coupled receptors (GPCRs) have been reported to play an important role in cancer; however, GPCR-associated features have not been adequately investigated. Methods: In this study, GPCR-related genes were screened at the single-cell and bulk transcriptome levels using AUCell, single-sample gene set enrichment analysis (ssGSEA), and weighted gene co-expression network analysis (WGCNA). A new machine learning framework comprising 10 machine learning algorithms and their combinations was then used to construct a consensus G protein-coupled receptor-related signature (GPCRRS). GPCRRS was validated in the training set and an external validation set. We constructed a GPCRRS-integrated nomogram as a clinical prognosis prediction tool. Multi-omics analyses, including genomics, single-cell transcriptomics, and bulk transcriptomics, were performed to gain a more comprehensive understanding of the prognostic features. We assessed the response of the risk subgroups to immunotherapy and screened for personalized drugs targeting specific risk subgroups. Finally, the expression of key GPCRRS genes was verified by RT-qPCR. Results: We identified 10 GPCR-associated genes that were significantly associated with the prognosis of lung adenocarcinoma (LUAD) at the single-cell and bulk transcriptome levels. Univariate and multivariate analyses showed that survival was higher in the low-risk group than in the high-risk group, and indicated that the model was an independent prognostic factor for LUAD. In addition, we observed significant differences in biological function, mutational landscape, and immune cell infiltration in the tumor microenvironment between the high- and low-risk groups. Notably, immunotherapy response also differed between the high- and low-risk groups. Potential drugs targeting specific risk subgroups were also identified.
Conclusion: We constructed and validated a G protein-coupled receptor-related signature for lung adenocarcinoma, which has an important role in predicting prognosis and the effect of immunotherapy. We hypothesize that LDHA, GPX3, and DOCK4 are new potential targets for lung adenocarcinoma that could lead to breakthroughs in prognosis prediction, targeted prevention, and treatment of the disease, and provide important guidance for anti-tumor therapy.
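Prognostic signatures of this kind are commonly applied as a linear risk score. Each signature gene's expression is multiplied by a model coefficient, the products are summed, and patients are split into high- and low-risk groups at the median. The abstract does not state how GPCRRS computes its score, so the sketch below assumes that common form. The gene names GENE_A to GENE_C and their coefficients are made up for illustration.

```python
from statistics import median


def risk_scores(samples, coefficients):
    """Linear risk score per sample: sum of coefficient * expression over signature genes."""
    return [sum(c * s[g] for g, c in coefficients.items()) for s in samples]


def risk_groups(scores):
    """Median split; scores equal to the median are assigned to the low-risk group."""
    cut = median(scores)
    return ["high" if x > cut else "low" for x in scores]


# Hypothetical signature genes and coefficients, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -0.3, "GENE_C": 0.2}
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
    {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.0},
    {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": 1.0},
    {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 1.0},
]
scores = risk_scores(patients, coefs)
print(scores, risk_groups(scores))
```

In a real analysis the coefficients would come from the fitted survival model, and the cut-off would be fixed on the training set before it is applied to validation cohorts.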
Project description:Motivation: Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This makes ML an attractive approach for the integrative joint analysis of the vast amounts of omics data produced by next-generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML multi-omic data integration and quantitatively explored the goals, techniques, and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets. Results: Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count, alongside models that can appropriately handle relatively few samples. Popular techniques include autoencoders, random forests, and support vector machines. We also found that the field is heavily influenced by The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments. Availability and implementation: All data and processing scripts are available at this GitLab repository: https://gitlab.com/polavieja_lab/ml_multi-omics_review/ or in Zenodo: https://doi.org/10.5281/zenodo.7361807. Supplementary information: Supplementary data are available at Bioinformatics online.
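With few samples (n) and many features (p), principal component analysis can be done cheaply by working with the n x n Gram matrix of the centred samples instead of the p x p feature covariance. The sketch below is a plain-Python illustration of that idea, not a method taken from the surveyed studies. It finds the first principal component's sample scores by power iteration on the Gram matrix. The function name and the seeded random start vector are assumptions.

```python
import math
import random


def first_pc_scores(X, n_iter=200, seed=0):
    """First principal-component scores for n samples with p features (p may be >> n).

    Returns (scores, fraction_of_variance_explained).
    """
    n, p = len(X), len(X[0])
    # Centre each feature.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # n x n Gram matrix G = Xc Xc^T; its leading eigenvector gives the PC1 scores.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):  # power iteration towards the leading eigenvector
        w = [sum(G[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance to explain
            return [0.0] * n, 0.0
        u = [x / norm for x in w]
    Gu = [sum(G[i][k] * u[k] for k in range(n)) for i in range(n)]
    lam = max(sum(a * b for a, b in zip(u, Gu)), 0.0)  # Rayleigh quotient = leading eigenvalue
    total = sum(G[i][i] for i in range(n))  # trace of G = total sum of squares
    # Scores are Xc v with v = Xc^T u / sqrt(lam), which simplifies to sqrt(lam) * u.
    return [math.sqrt(lam) * x for x in u], lam / total


X = [[1.0, 2.0, 0.0, 0.0], [2.0, 4.0, 0.0, 0.0], [3.0, 6.0, 0.0, 0.0]]
print(first_pc_scores(X))
```

The cost of building G grows with n squared times p rather than p squared, which is the point when p is in the tens of thousands. The sign of the scores is arbitrary, as with any PCA.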
Project description:Objectives: This study aimed to identify novel biomarkers for osteoarthritis (OA) and explore potential pathological immune cell infiltration. Methods: We identified differentially expressed genes (DEGs) between OA and normal synovial tissues using the limma package in R, and performed enrichment analyses to characterize the functions and enriched pathways of the DEGs. Weighted gene co-expression network analysis (WGCNA) and distinct machine-learning algorithms were then used to identify hub modules and candidate biomarkers. We assessed the diagnostic value of the candidate biomarkers using receiver operating characteristic (ROC) analysis. We then used the CIBERSORT algorithm to analyze immune cell infiltration patterns, and the Wilcoxon test to identify hub immune cells that might affect OA occurrence. Finally, the expression levels of the hub biomarkers were confirmed by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Results: We identified 102 up-regulated genes and 110 down-regulated genes. Functional enrichment analysis showed that the DEGs are enriched mainly in immune response pathways. Combining the results of the algorithms and ROC analysis, we identified GUCA1A and NELL1 as potential diagnostic biomarkers for OA, and validated their diagnostic ability using an external dataset. Construction of a TF-mRNA-miRNA network enabled prediction of potential candidate compounds targeting the hub biomarkers. Immune cell infiltration analyses revealed that the expression of the hub biomarkers was correlated with CD8 T cells, memory B cells, M0/M2 macrophages, resting mast cells, and resting dendritic cells. qRT-PCR results showed that both GUCA1A and NELL1 were significantly increased in OA samples (p < 0.01). All validation results were consistent with the microarray hybridization data, indicating that GUCA1A and NELL1 may be involved in the pathogenesis of OA.
Conclusion: The findings suggest that GUCA1A and NELL1, which are closely related to OA occurrence and progression, represent new OA candidate markers, and that immune cell infiltration plays a significant role in the progression of OA.
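ROC analysis summarizes a biomarker's diagnostic value as the area under the curve (AUC). The AUC equals the probability that a randomly chosen OA sample has a higher marker value than a randomly chosen normal sample, with ties counted as one half. This is the Mann-Whitney U statistic divided by the product of the two group sizes, which links it to the Wilcoxon test also used in this study. A minimal sketch with made-up marker values:

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels: 1 for cases (e.g. OA), 0 for controls; ties between a case and a control count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up marker values: two controls followed by two cases.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of tens to hundreds of samples. For larger data, a rank-based formula gives the same value.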
Project description:Background: Cancer stem cells (CSCs), which are characterized by self-renewal and plasticity, are highly correlated with tumor metastasis and drug resistance. To fully understand the role of CSCs in colorectal cancer (CRC), we evaluated the stemness traits and prognostic value of stemness-related genes in CRC. Methods: In this study, data from 616 CRC patients in The Cancer Genome Atlas (TCGA) were assessed and subtyped based on the mRNA expression-based stemness index (mRNAsi). The correlations of cancer stemness with the immune microenvironment, tumor mutational burden (TMB), and N6-methyladenosine (m6A) RNA methylation regulators were analyzed. Weighted gene co-expression network analysis (WGCNA) was performed to identify the crucial stemness-related genes and modules. Furthermore, a prognostic expression signature was constructed using Lasso-penalized Cox regression analysis. The signature was validated via multiplex immunofluorescence staining of tissue samples in an independent cohort of 48 CRC patients. Results: This study suggests that high mRNAsi scores are associated with poor overall survival in patients with stage IV CRC. Moreover, TMB and the levels of m6A RNA methylation regulators were positively correlated with mRNAsi scores, and low mRNAsi scores were characterized by increased immune activity in CRC. The analysis identified 34 key genes as candidate prognostic biomarkers. Finally, a three-gene prognostic signature (PARPBP, KNSTRN, and KIF2C) was combined with specific clinical features to construct a nomogram, which was successfully validated in an external cohort. Conclusion: There is a unique correlation between CSCs and the prognosis of CRC patients, and the novel biomarkers related to cell stemness could accurately predict the clinical outcomes of these patients.
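WGCNA builds its network from pairwise gene correlations raised to a soft-thresholding power beta. It then ranks genes by connectivity, the sum of their adjacencies to all other genes. The minimal sketch below computes the unsigned adjacency and connectivity only. It leaves out the topological overlap and module-detection steps of the full method. beta = 6 is purely an illustrative choice; in practice beta is picked from a scale-free topology fit. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0


def soft_threshold_network(genes, beta=6):
    """Unsigned soft-threshold adjacency |cor|^beta and per-gene connectivity.

    genes: dict name -> expression vector across the same ordered samples.
    The diagonal is set to 0 so that connectivity excludes each gene's self-adjacency.
    """
    names = list(genes)
    adj = {a: {} for a in names}
    for a in names:
        for b in names:
            adj[a][b] = 0.0 if a == b else abs(pearson(genes[a], genes[b])) ** beta
    connectivity = {a: sum(adj[a].values()) for a in names}
    return adj, connectivity


toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4]}
adj, k = soft_threshold_network(toy)
print(k)  # g1 and g2 are perfectly correlated, so they are the best-connected genes
```

Raising correlations to a power suppresses weak correlations (0.8 becomes about 0.26 at beta = 6) while keeping strong ones near 1. This is how WGCNA emphasizes hub genes without a hard cut-off.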
Project description:Background: Long non-coding RNAs (lncRNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, it is difficult to uncover the true mechanisms linking lncRNAs to complex diseases. The unprecedented abundance of multi-omics data and the rapid development of machine learning technology give us the opportunity to design a machine learning framework to study the relationship between lncRNAs and complex diseases. Results: In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), for predicting disease-related lncRNAs based on multi-omics data, machine learning methods, and neural network neighborhood information aggregation. First, LGDLDA calculates the similarity matrices of lncRNAs, genes, and diseases, respectively. LncRNA similarity is calculated from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix, and the lncRNA-protein interaction matrix. The gene similarity matrix is obtained from the lncRNA-gene association matrix and the gene-disease association matrix, and the disease similarity matrix from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices through nonlinear feature learning with a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. Conclusions: Compared with existing lncRNA-disease prediction methods, our method takes more critical information into account and improves performance on cancer-related lncRNA prediction.
Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP, and NCPHLDA. Results on different simulated datasets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer datasets, covering gastric cancer, colorectal cancer, and breast cancer, to predict potential cancer-related lncRNAs.
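Gaussian interaction profile (GIP) kernel similarity is one of the inputs to LGDLDA's disease similarity matrix. It compares two entities through their binary association profiles, here the rows of an association matrix. The sketch below uses the standard GIP definition:

K(i, j) = exp(-gamma * ||IP_i - IP_j||^2), with gamma = gamma' / mean_k(||IP_k||^2)

It assumes gamma' = 1, because the abstract does not give LGDLDA's exact parameters. The function name is hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a binary association matrix.

    profiles: list of equal-length 0/1 vectors (one interaction profile per entity).
    gamma_prime: bandwidth scale; 1.0 is an assumed default, not taken from LGDLDA.
    """
    n = len(profiles)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            K[i][j] = math.exp(-gamma * d2)
    return K


# Three diseases described by their associations with three miRNAs (toy data).
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
for row in gip_kernel(profiles):
    print([round(x, 3) for x in row])
```

Identical profiles get similarity 1 and disjoint profiles decay towards 0. Normalizing gamma by the mean profile norm keeps the kernel on a comparable scale across association matrices of different density.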
Project description:Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. The integration of biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and more generalizable. However, the availability of machine learning tools capable of incorporating such biological knowledge remains limited. Addressing this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 uniquely leverages biological information to effectively stratify and aggregate high-dimensional biological data in the context of machine learning. In demonstrations with genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids the interpretability of the results but also enables subsequent modular network analysis of these variables, shedding light on the intricate systems-level biology underpinning the predictive outcome. We have proposed a biologically informed multistage machine learning framework termed BioM2 for phenotype prediction based on omics data. BioM2 is available as a CRAN package (https://cran.r-project.org/web/packages/BioM2/index.html).
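The multistage idea can be illustrated with a two-stage stacking sketch. Stage 1 fits one small model per pathway using only that pathway's features, and stage 2 fits a model on the resulting pathway-level scores. This is not BioM2's algorithm or API. Several simplifying assumptions are made: the nearest-centroid scorer, the hypothetical pathway-to-feature mapping, and the use of in-sample stage-1 scores (a real pipeline would use cross-validated predictions to avoid leakage).

```python
def fit_centroid_scorer(X, y):
    """Linear scorer pointing from the class-0 centroid to the class-1 centroid.

    Positive outputs lean towards class 1; the decision boundary is the centroid midpoint.
    """
    pos = [row for row, t in zip(X, y) if t == 1]
    neg = [row for row, t in zip(X, y) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    p = len(X[0])
    mu1 = [sum(r[j] for r in pos) / len(pos) for j in range(p)]
    mu0 = [sum(r[j] for r in neg) / len(neg) for j in range(p)]
    w = [a - b for a, b in zip(mu1, mu0)]
    b = -sum(wj * (a + c) / 2 for wj, a, c in zip(w, mu1, mu0))

    def score(row):
        return sum(wj * xj for wj, xj in zip(w, row)) + b

    return score


def fit_two_stage(X, y, pathways):
    """pathways: dict name -> list of feature indices (hypothetical mapping)."""
    names = sorted(pathways)
    # Stage 1: one model per pathway, trained only on that pathway's features.
    stage1 = {
        name: fit_centroid_scorer([[row[j] for j in pathways[name]] for row in X], y)
        for name in names
    }

    def pathway_scores(row):
        return [stage1[name]([row[j] for j in pathways[name]]) for name in names]

    # Stage 2: model on pathway-level scores (in-sample here; use CV predictions in practice).
    stage2 = fit_centroid_scorer([pathway_scores(row) for row in X], y)
    return lambda row: stage2(pathway_scores(row))


# Toy data: features 0-1 form an informative "pathway", features 2-3 are noise.
X = [[0, 0, 5, 1], [0, 1, 1, 5], [3, 3, 5, 1], [3, 4, 1, 5]]
y = [0, 0, 1, 1]
predict = fit_two_stage(X, y, {"P1": [0, 1], "P2": [2, 3]})
print([predict(row) for row in X])
```

Because stage 2 only sees one score per pathway, the weight it assigns to each pathway score gives a pathway-level view of which biological categories drive the prediction. In this toy data the uninformative pathway P2 receives zero weight.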
Project description:High-throughput screening and gene signature analyses frequently identify lead therapeutic compounds with unknown modes of action (MoAs), and the resulting uncertainties can lead to the failure of clinical trials. We developed a multi-omics approach for uncovering MoAs through an interpretable machine learning model of the effects of compounds on transcriptomic, epigenomic, metabolomic, and proteomic data. We applied this approach to examine compounds with beneficial effects in models of Huntington’s disease, finding common MoAs for previously unrelated compounds that were not predicted based on similarities in the compounds’ structures, connectivity scores, or binding targets. We experimentally validated two such disease-relevant MoAs, autophagy activation and bioenergetics manipulation. This interpretable machine learning approach can be used to find and evaluate MoAs in future drug development efforts.