Dataset Information

MEMES: Machine learning framework for Enhanced MolEcular Screening†‡
† Dedicated to Professor N. Sathyamurthy on the occasion of his 70th birthday.
‡ Electronic supplementary information (ESI) available: Tables of performance of ExactMEMES and DeepMEMES, performance comparison of MEMES with deep docking, figures of structure of top hits, distribution plots of binding affinities, distributions of molecular clusters, distributions of binding affinities of missed hits, fractions matched against the sampled percentage, protein–ligand complexes and protein–ligand interactions, and supplementary discussions and methods. See DOI: 10.1039/d1sc02783b

ABSTRACT: In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as “hits”. In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties, such as its docking score against a target receptor. In real-life drug discovery, drug libraries are extremely large, yet they still represent only a small fraction of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million molecules while calculating the docking score for only about 6% of the complete library. We believe that such a framework would greatly reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.
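The abstract describes the sampling strategy only at a high level. As a generic illustration, and not the MEMES implementation, the sketch below shows a Bayesian-optimization screening loop: a Gaussian-process surrogate with a fixed RBF kernel is fit to the molecules docked so far, and the next batch is chosen with a lower-confidence-bound acquisition, because lower docking scores are better. The molecular feature vectors, `score_fn`, and every parameter value (`n_init`, `batch`, `rounds`, `kappa`, kernel length scale, noise) are hypothetical placeholders; in a real screen `score_fn` would be the expensive docking call.

```python
import math
import random

def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, Xq, noise=1e-4):
    """GP posterior (mean, std) at each query point; the targets are mean-centred."""
    mu_y = sum(y) / len(y)
    yc = [v - mu_y for v in y]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward(L, forward(L, yc))  # alpha = K^{-1} yc
    out = []
    for q in Xq:
        k = [rbf(q, a) for a in X]
        mean = mu_y + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        out.append((mean, math.sqrt(var)))
    return out

def screen(library, score_fn, n_init=10, batch=5, rounds=6, kappa=2.0, seed=0):
    """Dock a random seed set, then repeatedly dock the molecules with the lowest
    confidence bound (mean - kappa * std). Returns {library index: score}."""
    rng = random.Random(seed)
    idx = list(range(len(library)))
    scores = {i: score_fn(library[i]) for i in rng.sample(idx, n_init)}
    for _ in range(rounds):
        pool = [i for i in idx if i not in scores]
        X = [library[i] for i in scores]
        y = [scores[i] for i in scores]
        post = gp_posterior(X, y, [library[i] for i in pool])
        ranked = sorted(zip(pool, post), key=lambda t: t[1][0] - kappa * t[1][1])
        for i, _ in ranked[:batch]:
            scores[i] = score_fn(library[i])  # the expensive "docking" evaluation
    return scores
```

With the abstract's figures, sampling about 6% of roughly 10^8 molecules still means about 6 × 10^6 docking calculations, so a practical system needs a more scalable surrogate than this exact GP, whose Cholesky step grows cubically with the number of docked molecules.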

SUBMITTER: Mehta S

PROVIDER: S-EPMC8442698 | biostudies-literature |

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: A computational docking strategy using multiple conformations of the target protein is discussed and evaluated. A series of low-molecular-weight, competitive, nonpeptide protein tyrosine phosphatase inhibitors are considered for which the X-ray crystallographic structures in complex with protein tyrosine phosphatase 1B (PTP1B) are known. To obtain a quantitative measure of the impact of conformational changes induced by the inhibitors, these were docked to the active-site region of various structures of PTP1B using the docking program FlexX. First, the inhibitors were docked to a PTP1B crystal structure cocrystallized with a hexapeptide. The estimated binding energies for various docking modes, as well as the RMS differences between the docked compounds and the crystallographic structure, were calculated. In this scenario the estimated binding energies were not predictive, because docking modes with low estimated binding energies corresponded to relatively large RMS differences when aligned with the corresponding crystal structure. Second, the inhibitors were docked to the parent protein structures with which they were cocrystallized. In this case, there was a good correlation between low predicted binding energy and a correct docking mode. Third, to improve the predictability of the docking procedure in the general case, where only a single target protein structure is known, we evaluated an approach that takes possible protein side-chain conformational changes into account. Here, side chains exposed to the active site were considered in their allowed rotamer conformations, and protein models containing all possible combinations of side-chain rotamers were generated. To evaluate which of these modeled active sites is the most likely binding-site conformation for a certain inhibitor, the inhibitors were docked against all active-site models.
The receptor rotamer model corresponding to the lowest estimated binding energy was taken as the top candidate. Using this protocol, correct inhibitor binding modes could be successfully discriminated from incorrect ones. Moreover, the ranking of the estimated ligand binding energies was in good agreement with experimentally observed binding affinities.
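The rotamer-ensemble protocol above amounts to an exhaustive enumeration followed by an argmin. The sketch below illustrates that selection step under stated assumptions: residue names, rotamer labels, and the `dock` callable are hypothetical stand-ins (in the study, each energy would come from a FlexX docking run), and no actual structure modelling is performed.

```python
from itertools import product

def best_receptor_model(rotamers, dock):
    """Enumerate every combination of side-chain rotamers and return the
    receptor model with the lowest estimated binding energy.

    rotamers: dict mapping a (hypothetical) residue name to its allowed rotamer labels.
    dock: hypothetical callable that takes a model (dict residue -> rotamer) and
          returns an estimated binding energy for the ligand against that model.
    """
    residues = sorted(rotamers)
    best_model, best_energy = None, float("inf")
    # product() yields one rotamer choice per residue: all possible combinations
    for combo in product(*(rotamers[r] for r in residues)):
        model = dict(zip(residues, combo))
        energy = dock(model)
        if energy < best_energy:
            best_model, best_energy = model, energy
    return best_model, best_energy
```

The number of receptor models is the product of the rotamer counts of the selected residues, so the exhaustive approach stays practical only when a few active-site side chains are allowed to move.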

Project description: Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, docking scores are not sufficiently precise to represent protein-ligand binding affinity. Here, we developed an efficient computational method for calculating protein-ligand binding affinity, based on molecular mechanics generalized Born/surface area (MM-GBSA) calculations and the Jarzynski identity. The Jarzynski identity is an exact relation between free energy differences and the work done through a non-equilibrium process, and MM-GBSA is a semimacroscopic approach to calculating the potential energy. To calculate the work distribution when a ligand is pulled out of its binding site, multiple protein-ligand conformations are randomly generated as an alternative to performing an explicit single-molecule pulling simulation. We assessed the new method, multiple random conformation/MM-GBSA (MRC-MMGBSA), by evaluating ligand-binding affinities (scores) for four target proteins and comparing these scores with experimental data. The calculated scores were in good qualitative agreement with the experimental binding affinities, and the optimal docking structure could be determined by ranking the scores of the multiple docking poses obtained from molecular docking. Furthermore, the scores showed a strong linear response to experimental binding free energies, so the free energy difference of ligand binding (ΔG) could be calculated by linear scaling of the scores. The error of the calculated ΔG was within ±1.5 kcal mol⁻¹ of the experimental values. In particular, for flexible target proteins, the MRC-MMGBSA scores were more effective in ranking ligands than those generated by the MM-GBSA method using a single protein-ligand conformation.
The results suggest that, owing to its lower computational cost and greater accuracy, MRC-MMGBSA offers an efficient means to rank ligands in the post-docking process according to their binding affinities, and to compare these directly with experimental values.
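The two quantitative ingredients named above can be sketched generically. The Jarzynski identity states exp(-ΔF/kT) = ⟨exp(-W/kT)⟩ over non-equilibrium work values W; the function below evaluates it with a log-sum-exp shift for numerical stability. The linear scaling of scores to ΔG is shown as an ordinary least-squares fit against experimental values, which is an assumption: the description does not say how the scale factors were obtained. How MRC-MMGBSA derives work values from its random conformations is not specified here, so `works` is simply taken as given.

```python
import math

KT_298 = 0.0019872 * 298.15  # k_B * T in kcal/mol at 298.15 K

def jarzynski_free_energy(works, kT=KT_298):
    """Free-energy difference from work values (kcal/mol) via the Jarzynski identity,
    dF = -kT * ln(mean(exp(-W / kT))), computed with a log-sum-exp shift."""
    if not works:
        raise ValueError("need at least one work value")
    m = min(works)
    # Shifting by the minimum work keeps every exponent <= 0 (no overflow)
    s = sum(math.exp(-(w - m) / kT) for w in works)
    return m - kT * math.log(s / len(works))

def fit_linear_scale(scores, dg_exp):
    """Least-squares coefficients (a, b) such that dG ~= a * score + b (assumed fitting scheme)."""
    n = len(scores)
    mx, my = sum(scores) / n, sum(dg_exp) / n
    sxx = sum((x - mx) ** 2 for x in scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, dg_exp))
    a = sxy / sxx
    return a, my - a * mx
```

Because the exponential average is dominated by the lowest work values, the estimate always lies between the minimum work and the mean work, which is why sampling many conformations matters for convergence.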
