Project description:Target protein-based discovery of drugs and research compounds has been an undeniably successful strategy in life science research, yet many diseases and biological processes lack obvious targets to enable these approaches. Here, to overcome this major challenge, we have developed a deep-learning-based efficacy prediction system (DLEPS) to identify potent agents to treat diverse diseases; DLEPS was trained using chemically induced “changes of transcriptional profiles” (CTP) data from the L1000 project as input. Strikingly, we found that the CTPs for previously unexamined molecules were accurately predicted (Pearson correlation coefficient of 0.74). We used DLEPS to examine 4 disorders, and experimentally validated that perillen, chikusetsusaponin IV, trametinib, and liquiritin confer disease-relevant impacts against obesity, hyperuricemia, NASH, and COVID-19, respectively. Importantly, DLEPS also uncovered the biological insight that the MEK-ERK signaling pathway should be understood as a target for developing anti-NASH agents. Beyond illustrating that DLEPS is an effective tool for drug repurposing and development with diverse diseases (including those lacking targets), our study shows how diverse transcriptomics datasets can be harnessed to identify inhibitor and activator chemicals that can expand the scope of biological investigations well beyond mutant-based analyses.
Project description:Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: despite providing accurate predictions, they cannot describe how they arrived at their predictions. Here, using an “interpretable-by-design” approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed novel components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Project description:Prediction of protein localization plays an important role in understanding protein function and mechanism. A deep learning-based localization prediction tool (“MULocDeep”) that assesses each amino acid’s contribution to the localization process provides insights into the mechanism of protein sorting and into localization motifs. A dataset with 45 sub-organellar localization annotations under 10 major sub-cellular compartments was produced, and the tool was tested on an independent dataset of mitochondrial proteins that were extracted from Arabidopsis thaliana cell cultures, Solanum tuberosum tubers, and Vicia faba roots and analyzed by shotgun mass spectrometry.