Project description:Genetics and omics studies of Alzheimer's disease and other dementia subtypes enhance our understanding of the underlying mechanisms and pathways that can be targeted. We identified five key remaining challenges: Can we enhance genetic studies to address missing heritability? Can we identify reproducible omics signatures that differentiate between dementia subtypes? Can high-dimensional omics data identify improved biomarkers? How can genetics inform our understanding of the causal status of dementia risk factors? And which biological processes are altered by dementia-related genetic variation? Artificial intelligence (AI) and machine learning approaches give us powerful new tools to tackle these challenges, and we review possible solutions and examples of best practice. However, their limitations also need to be considered, as well as the need for coordinated multidisciplinary research and diverse, deeply phenotyped cohorts. Ultimately, AI approaches improve our ability to interrogate genetics and omics data for precision dementia medicine. HIGHLIGHTS: We have identified five key challenges in dementia genetics and omics studies. AI can enable detection of undiscovered patterns in dementia genetics and omics data. Enhanced and more diverse genetics and omics datasets are still needed. Multidisciplinary collaborative efforts using AI can boost dementia research.
Project description:Enzyme research is important for the development of various scientific fields such as medicine and biotechnology. Enzyme databases facilitate this research by providing a wide range of information relevant to research planning and data analysis. Over the years, various databases that cover different aspects of enzyme biology (e.g., kinetic parameters, enzyme occurrence, and reaction mechanisms) have been developed. Most of the databases are curated manually, which improves reliability of the information; however, such curation cannot keep pace with the exponential growth in published data. Lack of data standardization is another obstacle for data extraction and analysis. Improving machine readability of databases is especially important in the light of recent advances in deep learning algorithms that require big training datasets. This review provides information regarding the current state of enzyme databases, especially in relation to the ever-increasing amount of generated research data and recent advancements in artificial intelligence algorithms. Furthermore, it describes several enzyme databases, providing the reader with necessary information for their use.
Project description:Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared in their respective research community. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, BioRxiv, MedRxiv and arXiv. Based on keywording, we developed a coding scheme with 10 facets regarding the studies' AI methods, explainability methods and omics data. Our mapping study resulted in 405 included papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic or metabolomic data by means of neural networks, tree-based methods, statistical methods and further AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to the use of transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we deduced eight research directions and discuss their potential for the field. We also provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners.
Project description:Rapid development of biotechnology has led to the generation of vast amounts of multi-omics data, necessitating the advancement of bioinformatics and artificial intelligence to enable computational modeling for diagnosis and prediction of clinical outcomes. Both conventional machine learning and newer deep learning algorithms screen existing data in an unbiased manner to uncover patterns and create models that can be valuable in informing clinical decisions. We summarized published literature on the use of AI models trained on omics datasets, with and without clinical data, to diagnose, risk-stratify, and predict survival of patients with non-malignant liver diseases. A total of 20 different models were tested in the selected studies. Generally, the addition of omics data to regular clinical parameters or individual biomarkers improved AI model performance. For instance, using the NAFLD fibrosis score to distinguish F0-F2 from F3-F4 fibrotic stages, the area under the curve (AUC) was 0.87. When metabolomic data were integrated using a generalized matrix learning vector quantization (GMLVQ) model, the AUC improved markedly to 0.99. In another study, the use of random forest (RF) on multi-omics and clinical data to predict progression of NAFLD to NASH resulted in an AUC of 0.84, compared to 0.82 when using clinical data only. A comparison of RF, support vector machine (SVM) and k-nearest neighbors (kNN) models on genomics data to classify the immune tolerant phase in chronic hepatitis B resulted in AUCs of 0.8793-0.8838, compared to 0.6759-0.7276 when using various serum biomarkers. Overall, the integration of omics data was shown to improve prediction performance compared to models built only on clinical parameters, indicating a potential use for personalized medicine in clinical settings.
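The AUC values compared in the abstract above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The following is a minimal Python sketch of that pairwise definition. The function name `auc` and the toy labels and scores are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 class labels (1 = positive case).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data, invented purely for illustration.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

In this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a change from 0.87 to 0.99 is a large improvement while 0.82 to 0.84 is a modest one.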
Project description:Legumes are a better source of protein and are richer in diverse micronutrients than widely consumed cereals. However, when exposed to a diverse range of abiotic stresses, their overall productivity and quality are severely reduced. Our limited understanding of the genetic determinants and novel variants associated with the abiotic stress response in food legume crops restricts the improvement of their stress tolerance. Therefore, it is imperative to understand the different molecular approaches that can be utilized in crop improvement programs for food legumes to minimize economic loss. 'Omics'-based molecular breeding provides better opportunities than conventional breeding for diversifying the natural germplasm while improving yield and quality parameters. Owing to molecular advancements, the technique is now equipped with novel 'omics' approaches such as ionomics, epigenomics, fluxomics, RNomics, glycomics, glycoproteomics, phosphoproteomics, lipidomics, regulomics, and secretomics. Pan-omics, which utilizes the molecular bases of the stress response to identify genes (genomics), mRNAs (transcriptomics), proteins (proteomics), and biomolecules (metabolomics) associated with stress regulation, has been widely used for abiotic stress amelioration in food legume crops. Integration of pan-omics with novel omics approaches will fast-track legume breeding programs. Moreover, artificial intelligence (AI)-based algorithms can be utilized for simulating crop yield under changing environments, which can help in predicting the genetic gain beforehand. Application of machine learning (ML) in quantitative trait loci (QTL) mining will further help in determining the genetic determinants of abiotic stress tolerance in pulses.
Project description:We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
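The random walk through a network of clips described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' model. The class name `ClipNetwork`, the edge weights `h`, the additive reward update in `reinforce`, and the example clips are all hypothetical. The sketch also omits the dynamic creation of clips through new percepts and compositional principles.

```python
import random


class ClipNetwork:
    """Toy clip network: a weighted directed graph walked at random."""

    def __init__(self, seed=None):
        self.h = {}             # h[src][dst] = edge weight (assumed representation)
        self.actions = set()    # clips that trigger a real action when reached
        self.rng = random.Random(seed)

    def connect(self, src, dst, weight=1.0):
        self.h.setdefault(src, {})[dst] = weight

    def walk(self, percept, max_hops=50):
        """Hop from clip to clip, with probability proportional to edge
        weight, until an action clip is reached. Returns the visited path,
        or None if the walk dead-ends or exceeds max_hops."""
        path = [percept]
        while path[-1] not in self.actions and len(path) <= max_hops:
            edges = self.h.get(path[-1])
            if not edges:
                return None
            targets = list(edges)
            weights = [edges[t] for t in targets]
            path.append(self.rng.choices(targets, weights=weights)[0])
        return path if path[-1] in self.actions else None

    def reinforce(self, path, reward):
        """Assumed learning rule: add the reward to every edge on the path."""
        for src, dst in zip(path, path[1:]):
            self.h[src][dst] += reward


# Hypothetical example: one percept, one intermediate clip, two actions.
net = ClipNetwork(seed=0)
net.actions.update({"stop", "go"})
net.connect("red light", "danger")
net.connect("red light", "go")
net.connect("danger", "stop")
simulated_path = net.walk("red light")
```

Rewarding paths that end in a desired action raises the weights along them, so later walks from the same percept become more likely to reach that action. This is one simple way to picture how simulation and reinforcement learning could be coupled.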
Project description:The growing challenge of antimicrobial resistance (AMR), which affects millions of people worldwide, has drawn attention to marine-derived antimicrobial peptides (AMPs) as a source of innovative solutions. Cnidarians, such as corals, sea anemones, and jellyfish, are a promising source of these bioactive peptides because of their robust innate immune systems, yet they remain poorly explored. Hence, we employed an in silico proteolysis strategy to search for novel AMPs in omics data from 111 Cnidaria species. Millions of peptides were retrieved and screened using shallow- and deep-learning models, prioritizing AMPs with reduced toxicity and structural distinctiveness from characterized AMPs. After complex network analysis, a final dataset of 3130 singular, non-haemolytic and non-toxic Cnidaria AMPs was identified. These unique AMPs were mined for their putative antibacterial activity, revealing 20 favourable candidates for in vitro testing against important ESKAPEE pathogens, offering potential new avenues for antibiotic development.
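The in silico proteolysis step mentioned above can be pictured as cutting each protein sequence at protease-specific sites and keeping fragments in a useful length range. The abstract does not state which proteases or length limits were used, so the Python sketch below assumes a trypsin-like rule (cut after K or R unless the next residue is P) purely for illustration. The function names, the example sequence, and the length bounds are hypothetical.

```python
def digest(sequence, cut_after="KR", no_cut_before="P"):
    """Split a protein sequence after each residue in `cut_after`,
    unless the following residue is in `no_cut_before`.

    The defaults mimic a trypsin-like rule and are an assumption made
    for this sketch, not the protocol of the study.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in cut_after and not is_last and sequence[i + 1] not in no_cut_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing fragment
    return peptides


def filter_by_length(peptides, min_len, max_len):
    """Keep only peptides whose length lies in [min_len, max_len]."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Hypothetical sequence, invented for illustration.
fragments = digest("MAKGRPLCKDE")
candidates = filter_by_length(fragments, min_len=3, max_len=10)
```

Fragments produced this way would then be passed to downstream classifiers, such as the shallow- and deep-learning models the abstract mentions, for activity and toxicity screening.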
Project description:Cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) plays a pivotal role in preventing autoimmunity and fostering anticancer immunity by interacting with the B7 proteins CD80 and CD86. CTLA-4 is the first immune checkpoint targeted with a monoclonal antibody inhibitor. Checkpoint inhibitors have generated durable responses in many cancer patients, representing a revolutionary milestone in cancer immunotherapy. However, therapeutic efficacy is limited to a small proportion of patients, and immune-related adverse events are noteworthy, especially for monoclonal antibodies directed against CTLA-4. Previously, small molecules have been developed to impair the CTLA-4:CD80 interaction; however, they directly targeted CD80 and not CTLA-4. In this study, we performed artificial intelligence (AI)-powered virtual screening of approximately ten million compounds to target CTLA-4. We validated primary hits with biochemical, biophysical, immunological, and experimental animal assays. We then optimized lead compounds and obtained inhibitors with an inhibitory concentration of 1 micromolar in disrupting the interaction between CTLA-4 and CD80. Unlike ipilimumab, these small molecules did not degrade CTLA-4. Several compounds inhibited tumor development prophylactically and therapeutically in syngeneic and CTLA-4-humanized mice. This project supports an AI-based framework in designing small molecules targeting immune checkpoints for cancer therapy.
Project description:The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .
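The tri-amino acid composition descriptors mentioned above are commonly computed as the relative frequency of each of the 8000 possible residue triplets in a sequence. The Python sketch below shows that computation under stated assumptions. It does not reproduce the scripts in the linked repository. The names `TRIPEPTIDES` and `tripeptide_composition`, the handling of non-standard residues, and the example sequence are illustrative choices. Selecting 200 descriptors and training the support vector machine would be separate steps that are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 20**3 = 8000 possible triplets, in a fixed order.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence):
    """Return an 8000-element list of tripeptide frequencies.

    Each overlapping window of three residues is counted, and counts are
    divided by the number of valid windows. Windows containing a
    non-standard residue are skipped (an assumption of this sketch).
    """
    counts = dict.fromkeys(TRIPEPTIDES, 0)
    valid_windows = 0
    for i in range(len(sequence) - 2):
        triplet = sequence[i:i + 3]
        if triplet in counts:
            counts[triplet] += 1
            valid_windows += 1
    if valid_windows == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [counts[t] / valid_windows for t in TRIPEPTIDES]


# Hypothetical sequence, invented for illustration.
features = tripeptide_composition("ACDACD")
```

A feature vector of this kind can be fed to any standard classifier. Because most of the 8000 entries are zero for a typical protein, reducing them to a smaller informative subset, as the abstract describes, is a natural follow-up step.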