Dataset Information

Multi-PLI: interpretable multi-task deep learning model for unifying protein-ligand interaction datasets.

ABSTRACT: The assessment of protein-ligand interactions is critical at the early stage of drug discovery. Computational approaches for efficiently predicting such interactions facilitate drug development. Recently, methods based on deep learning, including structure- and sequence-based models, have achieved impressive performance on several different datasets. However, their application still suffers from a generalizability issue because of insufficient data, especially for structure-based models, as well as a heterogeneity problem because of different label measurements and varying proteins across datasets. Here, we present an interpretable multi-task model to evaluate protein-ligand interaction (Multi-PLI). The model can run classification (binding or not) and regression (binding affinity) tasks concurrently by unifying different datasets. The model outperforms traditional docking and machine learning methods on both binary classification and regression tasks and achieves competitive results compared with some structure-based deep learning methods, even with the same training set size. Furthermore, combined with the proposed occlusion algorithm, the model can predict the important amino acids of proteins that are crucial for binding, thus providing a biological interpretation.

SUBMITTER: Hu F

PROVIDER: S-EPMC8051026 | biostudies-literature |

REPOSITORIES: biostudies-literature
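
The abstract mentions an occlusion algorithm for locating the amino acids that matter for binding but gives no details. As a rough illustration of the general occlusion idea only (not the authors' implementation), the sketch below masks one window of a protein sequence at a time and records how much a model's score drops. The names `occlusion_importance`, `score_fn`, `mask_char`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for a trained predictor.

```python
from typing import Callable, List, Tuple


def occlusion_importance(
    sequence: str,
    score_fn: Callable[[str], float],
    window: int = 1,
    mask_char: str = "X",
) -> List[Tuple[int, float]]:
    """Return (start_index, score_drop) for every occluded window.

    A larger drop means the masked residues contributed more to the score.
    """
    baseline = score_fn(sequence)
    drops = []
    for start in range(len(sequence) - window + 1):
        # Replace the residues in [start, start + window) with the mask symbol.
        occluded = sequence[:start] + mask_char * window + sequence[start + window:]
        drops.append((start, baseline - score_fn(occluded)))
    return drops


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts W and H residues."""
    return float(seq.count("W") + seq.count("H"))


if __name__ == "__main__":
    # Rank positions by how much masking them lowers the toy score.
    ranked = sorted(
        occlusion_importance("MKWLHAG", toy_score),
        key=lambda pair: pair[1],
        reverse=True,
    )
    print(ranked[:2])
```

In practice `score_fn` would wrap the trained model's predicted binding probability or affinity for a fixed ligand, and the window size and masking symbol would be tuning choices.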

Similar Datasets

Biologically interpretable multi-task deep learning pipeline predicts molecular alterations, grade, and prognosis in glioma patients.

Project description:Deep learning models have been developed for various predictions in glioma; yet, they were constrained by manual segmentation, task-specific design, or a lack of biological interpretation. Herein, we aimed to develop an end-to-end multi-task deep learning (MDL) pipeline that can simultaneously predict molecular alterations and histological grade (auxiliary tasks), as well as prognosis (primary task) in gliomas. Further, we aimed to provide the biological mechanisms underlying the model's predictions. We collected multiscale data including baseline MRI images from 2776 glioma patients across two private (FAHZU and HPPH, n = 1931) and three public datasets (TCGA, n = 213; UCSF, n = 410; and EGD, n = 222). We trained and internally validated the MDL model using our private datasets, and externally validated it using the three public datasets. We used the model-predicted deep prognosis score (DPS) to stratify patients into low-DPS and high-DPS subtypes. Additionally, a radio-multiomics analysis was conducted to elucidate the biological basis of the DPS. In the external validation cohorts, the MDL model achieved average areas under the curve of 0.892-0.903, 0.710-0.894, and 0.850-0.879 for predicting IDH mutation status, 1p/19q co-deletion status, and tumor grade, respectively. Moreover, the MDL model yielded a C-index of 0.723 in the TCGA and 0.671 in the UCSF for the prediction of overall survival. The DPS exhibits significant correlations with activated oncogenic pathways, immune infiltration patterns, specific protein expression, DNA methylation, tumor mutation burden, and tumor-stroma ratio. Accordingly, our work presents an accurate and biologically meaningful tool for predicting molecular subtypes, tumor grade, and survival outcomes in gliomas, which provides personalized clinical decision-making in a global and non-invasive manner.

| S-EPMC11329669 | biostudies-literature

Fully interpretable deep learning model of transcriptional control.

Project description:Motivation: The universal expressibility assumption of Deep Neural Networks (DNNs) is the key motivation behind recent works in the systems biology community to employ DNNs to solve important problems in functional genomics and molecular genetics. Typically, such investigations have taken a 'black box' approach in which the internal structure of the model used is set purely by machine learning considerations with little consideration of representing the internal structure of the biological system by the mathematical structure of the DNN. DNNs have not yet been applied to the detailed modeling of transcriptional control in which mRNA production is controlled by the binding of specific transcription factors to DNA, in part because such models are in part formulated in terms of specific chemical equations that appear different in form from those used in neural networks. Results: In this paper, we give an example of a DNN which can model the detailed control of transcription in a precise and predictive manner. Its internal structure is fully interpretable and is faithful to underlying chemistry of transcription factor binding to DNA. We derive our DNN from a systems biology model that was not previously recognized as having a DNN structure. Although we apply our DNN to data from the early embryo of the fruit fly Drosophila, this system serves as a test bed for analysis of much larger datasets obtained by systems biology studies on a genomic scale. Availability and implementation: The implementation and data for the models used in this paper are in a zip file in the supplementary material. Supplementary information: Supplementary data are available at Bioinformatics online.

| S-EPMC7355248 | biostudies-literature

Using interpretable deep learning to model cancer dependencies.

Project description:Motivation: Cancer dependencies provide potential drug targets. Unfortunately, dependencies differ among cancers and even individuals. To this end, visible neural networks (VNNs) are promising due to robust performance and the interpretability required for the biomedical field. Results: We design Biological visible neural network (BioVNN) using pathway knowledge to predict cancer dependencies. Despite having fewer parameters, BioVNN marginally outperforms traditional neural networks (NNs) and converges faster. BioVNN also outperforms an NN based on randomized pathways. More importantly, dependency predictions can be explained by correlating with the neuron output states of relevant pathways, which suggest dependency mechanisms. In feature importance analysis, BioVNN recapitulates known reaction partners and proposes new ones. Such robust and interpretable VNNs may facilitate the understanding of cancer dependency and the development of targeted therapies. Availability and implementation: Code and data are available at https://github.com/LichtargeLab/BioVNN. Supplementary information: Supplementary data are available at Bioinformatics online.

| S-EPMC8428607 | biostudies-literature

Multi-task learning uncovers robust translation cis-regulatory features

Project description:To validate the sequence motifs identified by our multi-task learning model MTtrans, a new 5' UTR library with around 8,000 synthetic 5' UTRs was built to express EGFP. The read count was used as a proxy for translation rate to validate the estimated regulatory effect of the motifs that we inferred from multiple datasets, demonstrating the robustness of the sequence motifs.

2022-04-28 | GSE201766 | GEO

Predicting adverse drug reactions through interpretable deep learning framework.

Project description:BACKGROUND: Adverse drug reactions (ADRs) are unintended and harmful reactions caused by normal uses of drugs. Predicting and preventing ADRs in the early stage of the drug development pipeline can help to enhance drug safety and reduce financial costs. METHODS: In this paper, we developed machine learning models including a deep learning framework which can simultaneously predict ADRs and identify the molecular substructures associated with those ADRs without defining the substructures a priori. RESULTS: We evaluated the performance of our model with ten different state-of-the-art fingerprint models and found that neural fingerprints from the deep learning model outperformed all other methods in predicting ADRs. Via feature analysis on drug structures, we identified important molecular substructures that are associated with specific ADRs and assessed their associations via statistical analysis. CONCLUSIONS: The deep learning model with feature analysis, substructure identification, and statistical assessment provides a promising solution for identifying risky components within molecular structures and can potentially help to improve drug safety evaluation.

| S-EPMC6300887 | biostudies-other

A multi-task, multi-stage deep transfer learning model for early prediction of neurodevelopment in very preterm infants.

Project description:Survivors following very premature birth (i.e., ≤ 32 weeks gestational age) remain at high risk for neurodevelopmental impairments. Recent advances in deep learning techniques have made it possible to aid the early diagnosis and prognosis of neurodevelopmental deficits. Deep learning models typically require training on large datasets, and unfortunately, large neuroimaging datasets with clinical outcome annotations are typically limited, especially in neonates. Transfer learning represents an important step to solve the fundamental problem of insufficient training data in deep learning. In this work, we developed a multi-task, multi-stage deep transfer learning framework using the fusion of brain connectome and clinical data for early joint prediction of multiple abnormal neurodevelopmental (cognitive, language and motor) outcomes at 2 years corrected age in very preterm infants. The proposed framework maximizes the value of both available annotated and non-annotated data in model training by performing both supervised and unsupervised learning. We first pre-trained a deep neural network prototype in a supervised fashion using 884 older children and adult subjects, and then re-trained this prototype using 291 neonatal subjects without supervision. Finally, we fine-tuned and validated the pre-trained model using 33 preterm infants. Our proposed model identified very preterm infants at high-risk for cognitive, language, and motor deficits at 2 years corrected age with an area under the receiver operating characteristic curve of 0.86, 0.66 and 0.84, respectively. Employing such a deep learning model, once externally validated, may facilitate risk stratification at term-equivalent age for early identification of long-term neurodevelopmental deficits and targeted early interventions to improve clinical outcomes in very preterm infants.

| S-EPMC7492237 | biostudies-literature

OmiEmbed: A Unified Multi-Task Deep Learning Framework for Multi-Omics Data.

Project description:High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture them from the genome-wide data, due to the large number of molecular features and small number of available samples, which is also called "the curse of dimensionality" in machine learning. To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with the deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy compared to training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.

| S-EPMC8235477 | biostudies-literature

Improving Protein-Ligand Interaction Modeling with cryo-EM Data, Templates, and Deep Learning in 2021 Ligand Model Challenge.

Project description:Elucidating protein-ligand interaction is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein-ligand interaction is traditionally tackled by molecular docking and simulation, which is based on physical forces and statistical potentials and cannot effectively leverage cryo-EM data and existing protein structural information in the protein-ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein-ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to add ligands. The ligands are then identified and added into the structure to generate a protein-ligand complex structure, which is further refined. The method based on the deep learning prediction and template-based modeling was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. These results demonstrate that the deep learning bioinformatics approach is a promising direction for modeling protein-ligand interactions on cryo-EM data using prior structural information.

| S-EPMC9855343 | biostudies-literature

A Multi-Omics Interpretable Machine Learning Model Reveals Modes of Action of Small Molecules

Project description:This SuperSeries is composed of the SubSeries listed below.

2020-01-22 | GSE129144 | GEO

Highly Scalable Task Grouping for Deep Multi-Task Learning in Prediction of Epigenetic Events.

Project description:DNNs trained for predicting cellular events from DNA sequence have become emerging tools to help elucidate biological mechanisms underlying associations identified in genome-wide association studies. To enhance the training, multi-task learning (MTL) has been commonly exploited in previous works where trained networks were needed for multiple profiles differing in either event modality or cell type. All existing works adopted a simple MTL framework where all tasks share a single feature extraction network. Such a strategy, even though effective to a certain extent, leads to substantial negative transfer, meaning that for a large portion of tasks the models obtained through MTL perform worse than those obtained by single-task learning. Methods have been developed to address such negative transfer in other domains, such as computer vision. However, these methods generally have limited scalability. In this paper, we propose a highly scalable task grouping framework to address negative transfer by only jointly training tasks that are potentially beneficial to each other. The proposed method exploits the network weights associated with task-specific classification heads that can be cheaply obtained by one-time joint training of all tasks. Our results using a dataset consisting of 367 epigenetic profiles demonstrate the effectiveness of the proposed approach and its superiority over baseline methods.

| S-EPMC10779439 | biostudies-literature
