Dataset Information

Accelerating high-throughput virtual screening through molecular pool-based active learning.

ABSTRACT: Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of 10⁸ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques, previously employed in other scientific discovery problems, can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we explore the application of these techniques to computational docking datasets and assess the impact of surrogate model architecture, acquisition function, and acquisition batch size on optimization performance. We observe significant reductions in computational costs; for example, using a directed-message passing neural network we can identify 94.8% or 89.3% of the top-50 000 ligands in a 100M member library after testing only 2.4% of candidate ligands using an upper confidence bound or greedy acquisition strategy, respectively. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
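
The model-guided search described in the abstract can be sketched as a small pool-based loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the surrogate here is a bootstrap ensemble of ridge regressors (swapped in for the neural network models named above), the "library" is synthetic random features, and all names (`fit_ensemble`, `acquire`, `active_learning`, `beta`) are hypothetical. Scores are treated as "higher is better", as one would get by negating docking scores.

```python
import numpy as np

def fit_ensemble(X, y, n_models=8, ridge=1e-2, rng=None):
    """Fit ridge regressors on bootstrap resamples (a stand-in surrogate)."""
    rng = np.random.default_rng() if rng is None else rng
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(Xb), size=len(Xb))  # bootstrap sample
        A = Xb[idx]
        w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y[idx])
        weights.append(w)
    return np.array(weights)

def predict(weights, X):
    """Return the ensemble mean and standard deviation for each pool member."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T  # shape: (n_points, n_models)
    return preds.mean(axis=1), preds.std(axis=1)

def acquire(mean, std, k, strategy="greedy", beta=2.0):
    """Pick k candidates: greedy uses the mean, UCB adds beta * std."""
    utility = mean if strategy == "greedy" else mean + beta * std
    return np.argpartition(-utility, k - 1)[:k]  # indices of the k largest

def active_learning(X, oracle, init_size, batch_size, n_iters,
                    strategy="greedy", seed=0):
    """Alternate between retraining the surrogate and scoring a new batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    evaluated = {}
    for i in rng.choice(n, size=init_size, replace=False):  # random start
        evaluated[int(i)] = oracle(int(i))
    for _ in range(n_iters):
        idx = np.fromiter(evaluated.keys(), dtype=int)
        y = np.array([evaluated[i] for i in idx])
        weights = fit_ensemble(X[idx], y, rng=rng)
        remaining = np.setdiff1d(np.arange(n), idx)  # unevaluated pool
        mean, std = predict(weights, X[remaining])
        picks = remaining[acquire(mean, std, batch_size, strategy)]
        for i in picks:
            evaluated[int(i)] = oracle(int(i))  # the "expensive" evaluation
    return evaluated

# Synthetic pool: 2000 "molecules" with 16 made-up features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))
scores = X @ rng.normal(size=16) + 0.1 * rng.normal(size=2000)

found = active_learning(X, lambda i: scores[i], init_size=20,
                        batch_size=20, n_iters=5, strategy="ucb")
top_k = set(np.argsort(-scores)[:50].tolist())
recall = len(top_k & set(found)) / 50  # fraction of the true top 50 recovered
print(f"evaluated {len(found)} of {len(X)}; top-50 recall = {recall:.2f}")
```

The loop scores only 120 of the 2000 pool members (6%), which mirrors, at toy scale, the idea of testing 2.4% of a 100M-member library (about 2.4 million docking evaluations). Passing `strategy="greedy"` ranks candidates by predicted mean alone, so the two acquisition strategies mentioned in the abstract can be compared on the same pool.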

SUBMITTER: Graff DE

PROVIDER: S-EPMC8188596 | biostudies-literature | 2021 Apr

REPOSITORIES: biostudies-literature

Publications

Accelerating high-throughput virtual screening through molecular pool-based active learning.

Graff DE, Shakhnovich EI, Coley CW

Chemical Science, 2021-04-29, issue 22

PMID: 34168840

Similar Datasets

Project description: In the race to combat ever-evolving diseases, the drug discovery process often faces the hurdles of high-cost and time-consuming procedures. To tackle these challenges and enhance the efficiency of identifying new therapeutic agents, we introduce VirtuDockDL, which is a streamlined Python-based web platform utilizing deep learning for drug discovery. This pipeline employs a Graph Neural Network to analyze and predict the effectiveness of various compounds as potential drug candidates. During the validation phase, VirtuDockDL was instrumental in identifying non-covalent inhibitors against the VP35 protein of the Marburg virus, a critical target given the virus's high fatality rate and limited treatment options. Further, in benchmarking, VirtuDockDL achieved 99% accuracy, an F1 score of 0.992, and an AUC of 0.99 on the HER2 dataset, surpassing DeepChem (89% accuracy) and AutoDock Vina (82% accuracy). Compared to RosettaVS, MzDOCK, and PyRMD, VirtuDockDL outperformed them by combining both ligand- and structure-based screening with deep learning. While RosettaVS excels in accurate docking but lacks high-throughput screening, and PyRMD focuses on ligand-based methods without AI integration, VirtuDockDL offers superior predictive accuracy and full automation for large-scale datasets, making it ideal for comprehensive drug discovery workflows. These results underscore the tool's capability to identify high-affinity inhibitors accurately across various targets, including the HER2 protein for cancer therapy, TEM-1 beta-lactamase for bacterial infections, and the CYP51 enzyme for fungal infections like Candidiasis. To sum up, VirtuDockDL combines user-friendly interface design with powerful computational capabilities to facilitate rapid, cost-effective drug discovery and development. The integration of AI in drug discovery could potentially transform the landscape of pharmaceutical research, providing faster responses to global health challenges.
VirtuDockDL is available at https://github.com/FatimaNoor74/VirtuDockDL.
