Dataset Information

GNL-Scorer: a generalized model for predicting CRISPR on-target activity by machine learning and featurization.

ABSTRACT:

SUBMITTER: Wang J

PROVIDER: S-EPMC7883820 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description:

Background: Discovering possible drug-target interactions (DTIs) is a decisive step in detecting the effects of drugs and in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, prediction accuracy is still low, and protein sequences and structured data are rarely used together in the prediction process.

Methods: This paper presents a DTI prediction model that takes advantage of the structured form of proteins and drugs. Our model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Compared with existing methods under K-fold cross-validation, empirical results show that our model, which is based on ensemble learning algorithms, provides more accurate DTI predictions from both structure and feature data.

Results: The proposed model is applied to two datasets: Benchmark (feature-only) datasets and DrugBank (structure data) datasets. Using structure and feature data, Light-Boost and ExtraTree achieve 98% accuracy and a 0.97 F-score, compared to 94% and 0.92 achieved by the existing methods. Moreover, our model can predict additional, as yet undiscovered interactions, and hence can be used as a practical tool for drug repositioning. In a case study, the prediction model is applied to proteins known to be affected by coronaviruses in order to predict possible interactions between these proteins and existing drugs. The model is also applied to Covid-19-related drugs announced on DrugBank. The results show that some drugs, such as DB00691 and DB05203, are predicted with 100% accuracy to interact with the ACE2 protein. This protein is a self-membrane protein that enables Covid-19 infection. Hence, our model can be used as an effective tool in drug repositioning to predict possible drug treatments for Covid-19.

Project description:

Objective: The safe delivery of electrical current to neural tissue depends on many factors, yet previous methods for predicting tissue damage rely on only a few stimulation parameters. Here, we report the development of a machine learning approach that could lead to a more reliable method for predicting electrical stimulation-induced tissue damage by incorporating additional stimulation parameters.

Approach: A literature search was conducted to build an initial database of tissue response information after electrical stimulation, categorized as either damaging or non-damaging. Subsequently, we used ordinal encoding and random forest for feature selection, and investigated four machine learning models for classification: Logistic Regression, K-nearest Neighbor, Random Forest, and Multilayer Perceptron. Finally, we compared the results of these models against the accuracy of the Shannon equation.

Main results: We compiled a database with 387 unique stimulation parameter combinations collected from 58 independent studies conducted over a period of 47 years, with 195 (51%) categorized as non-damaging and 190 (49%) categorized as damaging. The features selected for building our model with a Random Forest algorithm were: waveform shape, geometric surface area, pulse width, frequency, pulse amplitude, charge per phase, charge density, current density, duty cycle, daily stimulation duration, daily number of pulses delivered, and daily accumulated charge. The Shannon equation yielded an accuracy of 63.9% using a k value of 1.79. In contrast, the Random Forest algorithm was able to robustly predict whether a set of stimulation parameters was classified as damaging or non-damaging with an accuracy of 88.3%.

Significance: This novel Random Forest model can facilitate more informed decision making in the selection of neuromodulation parameters for both research studies and clinical practice. This study represents the first approach to use machine learning in the prediction of stimulation-induced neural tissue damage, and lays the groundwork for neurostimulation driven by machine learning models.
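
The abstract above names the Shannon equation as its baseline but does not write it out. A minimal sketch follows, assuming the commonly cited form log10(D) = k - log10(Q), where D is charge density per phase (assumed units: µC/cm²) and Q is charge per phase (assumed units: µC). Under that assumption, a parameter pair is predicted to be damaging when log10(D) + log10(Q) exceeds k. The function name, argument names, and example values are illustrative and are not taken from the study; only the k value of 1.79 comes from the abstract.

```python
import math

def shannon_predicts_damage(charge_per_phase_uc, charge_density_uc_cm2, k=1.79):
    """Classify a stimulation setting with the Shannon line (assumed form).

    Assumes the boundary log10(D) = k - log10(Q); points above the line
    (log10(D) + log10(Q) > k) are treated as predicted-damaging.
    Both inputs must be positive.
    """
    if charge_per_phase_uc <= 0 or charge_density_uc_cm2 <= 0:
        raise ValueError("charge per phase and charge density must be positive")
    score = math.log10(charge_density_uc_cm2) + math.log10(charge_per_phase_uc)
    return score > k

# Hypothetical settings, chosen only to fall on either side of the line.
print(shannon_predicts_damage(1.0, 100.0))   # log10 sum = 2.0, above k = 1.79
print(shannon_predicts_damage(0.1, 100.0))   # log10 sum = 1.0, below k = 1.79
```

This rule uses only two parameters, whereas the Random Forest model described above draws on twelve selected features.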

OmicsDI is part of the ELIXIR infrastructure
