Dataset Information

Classification of HIV-1 Protease Inhibitors by Machine Learning Methods.

ABSTRACT: HIV-1 protease plays an important role in the process of viral infection and is an effective therapeutic target for the treatment of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease inhibitors (PIs) from ChEMBL. A series of 15 classification models for predicting the active inhibitors were built by machine learning methods, including k-nearest neighbors (K-NN), decision tree (DT), random forest (RF), support vector machine (SVM), and deep neural network (DNN). The molecular structures were characterized by (1) fingerprint descriptors including MACCS fingerprints and PubChem fingerprints and (2) physicochemical descriptors calculated by CORINA Symphony. The prediction accuracies of all of the models are more than 70% on the test set; the best accuracy of 83.07% was obtained by model 4A, which was built by the SVM method based on MACCS fingerprint descriptors. Nine consensus models were built with three different kinds of descriptors, combining all of the machine learning methods using "consensus prediction". Model C3a, developed with MACCS fingerprint descriptors, showed the highest accuracy on both the training set (91.96%) and the test set (83.15%). An external validation set including 35 989 compounds from the DUD database and 239 active inhibitors from the recent literature was used to verify the performance of our model. The best prediction accuracy of 98.37% was obtained by model 3C, which was built by RF based on CORINA Symphony descriptors. In addition, the analysis of molecular descriptors shows that the aromatic system and atoms related to hydrogen bonding provide important contributions to the bioactivity of PIs.
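
The abstract does not spell out how the "consensus prediction" combines the individual classifiers. The following is a minimal sketch assuming a simple majority vote over per-model class labels; the function names, the tie-breaking rule, and the vote data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def consensus_predict(votes):
    """Return the majority label among per-model votes (1 = active, 0 = inactive).

    Assumption for this sketch: ties are resolved toward inactive (0).
    """
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical votes from five models (e.g. K-NN, DT, RF, SVM, DNN),
# one row per compound. These are illustrative values, not data from the paper.
model_votes = [
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1],
]
true_labels = [1, 0, 1, 1]

consensus = [consensus_predict(v) for v in model_votes]
print(consensus, accuracy(true_labels, consensus))
```

With an odd number of models and binary labels, ties cannot occur, so the tie-breaking assumption only matters if a model abstains or an even number of models is combined.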

SUBMITTER: Li Y

PROVIDER: S-EPMC6288788 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature

Publications

Classification of HIV-1 Protease Inhibitors by Machine Learning Methods.

Li Y, Tian Y, Qin Z, Yan A

ACS Omega, 2018-11-21, issue 11

PMID: 30556015

Similar Datasets

Project description: Protein-protein interactions (PPIs) may represent one of the next major classes of therapeutic targets. So far, only a minute fraction of the estimated 650,000 PPIs that comprise the human interactome are known, with a tiny number of complexes being drugged. Such intricate biological systems cannot be cost-efficiently tackled using conventional high-throughput screening methods. Rather, the time has come for designing new strategies that will maximize the chance for hit identification through a rationalization of the PPI inhibitor chemical space and the design of PPI-focused compound libraries (global or target-specific). Here, we train machine-learning-based models, mainly decision trees, using a dataset of known PPI inhibitors and of regular drugs in order to determine a global physico-chemical profile for putative PPI inhibitors. This statistical analysis unravels two important molecular descriptors for PPI inhibitors characterizing specific molecular shapes and the presence of a privileged number of aromatic bonds. The best model has been transposed into a computer program, PPI-HitProfiler, that can output from any drug-like compound collection a focused chemical library enriched in putative PPI inhibitors. Our PPI inhibitor profiler is challenged on the experimental screening results of 11 different PPIs, among which is the p53/MDM2 interaction screened within our own CDithem platform, which, in addition to validating our concept, led to the identification of 4 novel p53/MDM2 inhibitors. Collectively, our tool shows a robust behavior on the 11 experimental datasets by correctly profiling 70% of the experimentally identified hits while removing 52% of the inactive compounds from the initial compound collections. We strongly believe that this new tool can be used as a global PPI inhibitor profiler prior to screening assays to reduce the size of the compound collections to be experimentally screened while keeping most of the true PPI inhibitors.
PPI-HitProfiler is freely available on request from our CDithem platform website, www.CDithem.com.

Project description: Background: Near-infrared indocyanine green angiography allows experienced surgeons to reliably evaluate parathyroid gland vitality during thyroid and parathyroid operations in order to predict postoperative function. To facilitate equal performance between surgeons, we developed an automatic computational quantification method using computer vision that portrays expert interpretation of visualized parathyroid gland near-infrared indocyanine green angiographic fluorescence signals. Methods: Near-infrared indocyanine green-parathyroid gland angiography video recordings (Fluobeam® LX, Fluoptics, Grenoble, part of Getinge, Göteborg) from patients undergoing endocrine cervical surgery in a high-volume unit were used for model development. Computation (MATLAB, Mathworks, Ireland) included segmentation-identification of the parathyroid gland (by autofluorescence), image stabilization (by linear translation) and adjusted time-fluorescence intensity profile generation. Relative upslope and maximum intensity ratios then trained a simple logistic regression model based on expert interpretation and outcome (including hypoparathyroidism), with subsequent unseen testing for validation. Results: The model was trained on 37 patient videos (45 glands, 29 judged well perfused by parathyroid gland angiography experts), achieving feature data separation with 100% accuracy, and tested on 22 unseen videos (27 glands, 15 judged well perfused), including four in real time. Segmentation-guided parathyroid gland detection correctly identified all parathyroid glands during unseen testing along with three additional non-parathyroid gland regions (90% positive predictive value).
Subsequent time-fluorescence intensity profile extraction with vitality prediction was shown feasible in all cases within 5 min, with a 96.3% model accuracy (sensitivity and specificity were 93.3% and 100%, respectively) when compared with expert judgement. Conclusion: Automatic parathyroid gland perfusion quantification using simple machine learning computational methods discriminates parathyroid gland perfusion in concordance with expert surgeon interpretation, providing a means for near-infrared indocyanine green-parathyroid gland signal evaluation.
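
The reported test-set percentages can be checked against the stated totals (27 glands, 15 judged well perfused). The sketch below assumes "well perfused" is the positive class and that the confusion-matrix counts are 14 true positives, 1 false negative, 12 true negatives and 0 false positives. These counts are inferred for illustration because they reproduce the reported figures; they are not given in the description.

```python
def rate(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)

# Assumed counts, chosen to match the reported totals and percentages.
tp, fn = 14, 1   # of the 15 glands judged well perfused by experts
tn, fp = 12, 0   # of the remaining 12 glands

sensitivity = rate(tp, tp + fn)               # 14 / 15
specificity = rate(tn, tn + fp)               # 12 / 12
accuracy = rate(tp + tn, tp + fn + tn + fp)   # 26 / 27

# Detection step: all 27 glands found, plus 3 extra non-gland regions.
detection_ppv = rate(27, 27 + 3)

print(sensitivity, specificity, accuracy, detection_ppv)
```

Under these assumptions the figures come out to 93.3, 100.0, 96.3 and 90.0, consistent with the values in the description.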

Project description: Background: In the future, more medical devices will be based on machine learning (ML) methods. In general, the consideration of risks is a crucial aspect for evaluating medical devices. Accordingly, risks and their associated costs should be taken into account when assessing the performance of ML-based medical devices. This paper addresses the following three research questions towards a risk-based evaluation with a focus on ML-based classification models. Methods: First, we analyzed how often risk-based metrics are currently utilized in the context of ML-based classification models. This was performed using a literature search based on a sample of recent scientific publications. Second, we introduce an approach for evaluating such models where expected risks and associated costs are integrated into the corresponding performance metrics. Additionally, we analyze the impact of different risk ratios on the resulting overall performance. Third, we elaborate how such risk-based approaches relate to regulatory requirements in the field of medical devices. A set of use case scenarios was utilized to demonstrate necessities and practical implications in this regard. Results: First, it was shown that currently most scientific publications do not include risk-based approaches for measuring performance. Second, it was demonstrated that risk-based considerations have a substantial impact on the outcome. The relative increase of the resulting overall risks can go up to 196% when the ratio between different types of risks (false negatives vs. false positives) changes by a factor of 10.0. Third, we elaborated that risk-based considerations need to be included in the assessment of ML-based medical devices, according to the relevant EU regulations and standards.
In particular, this applies when a substantial impact on the clinical outcome, in terms of the risk-benefit relationship, occurs. Conclusion: In summary, we demonstrated the necessity of a risk-based approach for the evaluation of medical devices which include ML-based classification methods. We showed that currently many scientific papers in this area do not include risk considerations. We developed basic steps towards a risk-based assessment of ML-based classifiers and elaborated consequences that could occur when these steps are neglected. We also demonstrated the consistency of our approach with current regulatory requirements in the EU.
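
To illustrate how a risk ratio between false negatives and false positives can change an overall risk figure, here is a minimal sketch assuming a linear cost model (total risk = cost per false negative × false negatives + cost per false positive × false positives). The cost model, the function names, and the error counts are hypothetical; they are not the paper's metric and do not reproduce its 196% result.

```python
def expected_risk(fn, fp, cost_fn, cost_fp):
    """Total risk under an assumed linear cost model."""
    return fn * cost_fn + fp * cost_fp

def relative_increase(baseline, changed):
    """Percentage increase from baseline to changed."""
    return 100.0 * (changed - baseline) / baseline

# Hypothetical error counts for one classifier on one test set.
false_negatives, false_positives = 10, 20

# Equal costs versus a false negative weighted 10 times a false positive.
risk_equal = expected_risk(false_negatives, false_positives, 1.0, 1.0)
risk_fn_heavy = expected_risk(false_negatives, false_positives, 10.0, 1.0)

print(risk_equal, risk_fn_heavy, relative_increase(risk_equal, risk_fn_heavy))
```

The size of the increase depends on the mix of errors: a classifier with few false negatives is affected far less by the same change in the cost ratio, which is why accuracy alone can hide clinically relevant differences between models.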
