Dataset Information

Evaluation of QSAR Equations for Virtual Screening.

ABSTRACT: Quantitative Structure-Activity Relationship (QSAR) models can inform on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Due to the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process are related to model performance on the training set, for example, R² and Q²_CV. Similar metrics, calculated on an external set of data (e.g., Q²_F1/F2/F3), are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-"ignorant". In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets, and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by "classical" metrics, e.g., R² and Q²_F1/F2/F3, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable R² and/or Q²_F1/F2/F3 values were unable to pick a single active compound from within the pool, whereas in other cases models with poor R² and/or Q²_F1/F2/F3 values performed well in the context of virtual screening.
We also found no significant correlation between the number of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that, when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across the training, validation and test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
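
The contrast the abstract draws between a "classical" fit metric and an enrichment-based metric can be illustrated with a minimal Python sketch. It is not the authors' Enrichment Optimizer Algorithm. The function names, the top-k enrichment factor definition used here, and the toy scores are all assumptions made for illustration only.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def actives_in_top_k(scores: Sequence[float], is_active: Sequence[bool], k: int) -> int:
    """Count active compounds among the k highest-scoring compounds."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, active in ranked[:k] if active)


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], k: int) -> float:
    """Hit rate in the top-k selection divided by the hit rate in the whole pool."""
    hit_rate_selected = actives_in_top_k(scores, is_active, k) / k
    hit_rate_overall = sum(is_active) / len(is_active)
    return hit_rate_selected / hit_rate_overall


# Hypothetical pool: 2 actives followed by 8 inactive "random" compounds.
is_active = [True, True] + [False] * 8
true_activity = [8.0, 7.5, 4.0, 4.2, 3.9, 4.1, 4.05, 3.8, 4.3, 4.15]

# Two made-up sets of predicted activities (illustrative values only).
model_a_scores = [6.0, 5.9, 5.0, 5.1, 4.9, 5.2, 5.05, 4.8, 5.3, 5.15]
model_b_scores = [4.5, 4.4, 4.0, 4.6, 3.9, 4.1, 4.7, 3.8, 4.3, 4.2]

for name, scores in [("A", model_a_scores), ("B", model_b_scores)]:
    print(
        f"model {name}: R2 = {r_squared(true_activity, scores):.3f}, "
        f"actives in top 2 = {actives_in_top_k(scores, is_active, 2)}, "
        f"EF(top 2) = {enrichment_factor(scores, is_active, 2):.1f}"
    )
```

The fit metric scores how closely predicted values match measured activities, while the enrichment factor depends only on whether actives are ranked ahead of inactives in the selected subset, which is the quantity that matters when picking compounds from a screening pool.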

SUBMITTER: Spiegel J

PROVIDER: S-EPMC7672587 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

Evaluation of QSAR Equations for Virtual Screening.

Spiegel Jacob, Senderowitz Hanoch

International Journal of Molecular Sciences, 2020-10-22, vol. 21

PMID: 33105703

Similar Datasets

Project description:

Background: Despite continued efforts using chemical similarity methods in virtual screening, currently developed approaches suffer from time-consuming multistep procedures and low success rates. We recently developed a machine learning-based chemical binding similarity model that considers common structural features of molecules binding to the same, or evolutionarily related, targets. The chemical binding similarity measures the resemblance of chemical compounds in terms of binding site similarity, to better describe functional similarities that arise from target binding. In this study, we show how the chemical binding similarity can be used in virtual screening together with conventional structure-based methods.

Results: The chemical binding similarity, receptor-based pharmacophore, chemical structure similarity, and molecular docking methods were evaluated to identify an effective virtual screening procedure for desired target proteins. When we tested the chemical binding similarity method with test sets of 51 kinases, it outperformed the traditional structural similarity-based methods as well as structure-based methods, such as molecular docking and receptor-based pharmacophore modeling, in terms of finding active compounds. We further validated the results by performing virtual screening (using the chemical binding similarity and receptor-based pharmacophore methods) against a completely blind dataset for mitogen-activated protein kinase kinase 1 (MEK1), ephrin type-B receptor 4 (EPHB4) and wee1-like protein kinase (WEE1). The in vitro kinase binding assay confirmed that 6 out of 13 (46.2%) for MEK1 and 2 out of 12 (16.7%) for EPHB4 were newly identified only by the chemical binding similarity model.

Conclusions: We report that the virtual screening results could be further improved by combining the chemical binding similarity model with 3D-QSAR pharmacophore and molecular docking models. Not only were new inhibitors identified in this study, but many of the identified molecules also have low structural similarity scores against previously reported inhibitors, which indicates the discovery of novel scaffolds.

OmicsDI is part of the ELIXIR infrastructure
