Dataset Information

Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity?

ABSTRACT: Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein-ligand complex and its binding affinity. The inherent problem of this approach is in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which are able to exploit effectively much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. The latter resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein-ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.
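As a rough illustration of what a "chemical description" of a complex can look like as input to a machine-learning scoring function, the sketch below counts protein-ligand element pairs that lie within a distance cutoff. This is a minimal example under stated assumptions, not the descriptors or code used in the study: the function name `pair_count_features`, the `(element, (x, y, z))` atom representation, the 4.5 Å cutoff, and the toy coordinates are all hypothetical. A more precise description, in the sense of the abstract, would correspond to finer atom typing or distance binning, which enlarges the feature vector.

```python
from collections import Counter
from math import dist  # Python 3.8+


def pair_count_features(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand element pairs closer than `cutoff` (in angstroms).

    Each atom is a tuple (element, (x, y, z)). The result is a Counter keyed
    by (protein_element, ligand_element); pairs never seen count as 0.
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            if dist(p_xyz, l_xyz) < cutoff:
                counts[(p_elem, l_elem)] += 1
    return counts


# Toy, made-up coordinates (hypothetical, for illustration only).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (4.0, 0.0, 0.0))]

features = pair_count_features(protein, ligand, cutoff=4.5)
```

A regression model would then be fitted on such feature vectors against measured binding affinities, so no functional form for the individual interaction terms has to be written down in advance.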

SUBMITTER: Ballester PJ

PROVIDER: S-EPMC3966527 | biostudies-literature | 2014 Mar

REPOSITORIES: biostudies-literature

Publications

Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity?

Ballester PJ, Schreyer A, Blundell TL

Journal of Chemical Information and Modeling, 2014-02-20, issue 3

PMID: 24528282

Similar Datasets

Project description: Polarization and charge transfer strongly characterize the ligand-receptor interaction when metal atoms are present, as in the Au(I)-biscarbene/DNA G-quadruplex complexes. In a previous work (J Comput Aided Mol Des 2022, 36, 851-866) we used the ab initio FMO2 method at the RI-MP2/6-31G* level of theory with the PCM [1] solvation approach to calculate the binding energy (ΔEFMO) of two Au(I)-biscarbene derivatives, [Au(9-methylcaffein-8-ylidene)2]+ and [Au(1,3-dimethylbenzimidazole-2-ylidene)2]+, which can interact with the DNA G-quadruplex motif. We found that ΔEFMO and the ligand-receptor pair interaction energies (EINT) take very large negative values, which makes direct comparison with experimental data difficult, and we attributed this to overestimation of the embedded charge transfer energy between fragments containing metal atoms. In this work, to improve the accuracy of the FMO method for predicting the binding affinity of metal-based ligands interacting with the DNA G-quadruplex (Gq), we assess the effect of the following computational features: (i) the electron correlation, comparing Hartree-Fock (HF) with a post-HF method, namely RI-MP2; (ii) the two-body (FMO2) and three-body (FMO3) approaches; (iii) the basis set size (polarization functions and double-ζ vs. triple-ζ); and (iv) the embedding electrostatic potential (ESP). Moreover, the partial screening method was systematically adopted to simulate the solvent screening effect in each calculation. We found that using the ESP computed from screened point charges for all atoms (ESP-SPTC) has a critical impact on the accuracy of both ΔEFMO and EINT: it eliminates the overestimation of the charge transfer energy and yields energies whose magnitude is comparable with typical experimental binding energies. With this computational approach, EINT values describe the binding efficiency of metal-based binders to DNA Gq more accurately than ΔEFMO. Therefore, to study the binding of metal-containing systems with the FMO method, the partial screening solvent method combined with ESP-SPTC should be considered. This computational protocol is suggested for FMO calculations on biological systems containing metals, especially when the default ESP treatment leads to questionable results.

Project description: Protein-ligand interactions are increasingly profiled at high-throughput, playing a vital role in lead compound discovery and drug optimization. Accurate prediction of binding pose and binding affinity constitutes a pivotal challenge in advancing our computational understanding of protein-ligand interactions. However, inherent limitations still exist, including high computational cost for conformational search sampling in traditional molecular docking tools, and the unsatisfactory molecular representation learning and intermolecular interaction modeling in deep learning-based methods. Here we propose a geometry-aware attention-based deep learning model, GAABind, which effectively predicts the pocket-ligand binding pose and binding affinity within a multi-task learning framework. Specifically, GAABind comprehensively captures the geometric and topological properties of both binding pockets and ligands, and employs expressive molecular representation learning to model intramolecular interactions. Moreover, GAABind proficiently learns the intermolecular many-body interactions and simulates the dynamic conformational adaptations of the ligand during its interaction with the protein through meticulously designed networks. We trained GAABind on PDBbind v2020 and evaluated it on the CASF-2016 dataset; the results indicate that GAABind achieves state-of-the-art performance in binding pose prediction and shows comparable binding affinity prediction performance. Notably, GAABind achieves a success rate of 82.8% in binding pose prediction, and the Pearson correlation between predicted and experimental binding affinities reaches up to 0.803. Additionally, we assessed GAABind's performance on the severe acute respiratory syndrome coronavirus 2 main protease cross-docking dataset. In this evaluation, GAABind demonstrates a notable success rate of 76.5% in binding pose prediction and achieves the highest Pearson correlation coefficient in binding affinity prediction compared with all baseline methods.
