Dataset Information

Virtual screening of bioassay data.

ABSTRACT:

Background

There are three main problems associated with the virtual screening of bioassay data. The first is access to freely available curated data; the second is the number of false positives that occur in the physical primary screening process; and the third is that the data is highly imbalanced, with a low ratio of Active compounds to Inactive compounds. This paper first discusses these three problems and then applies a selection of Weka cost-sensitive classifiers (Naive Bayes, SVM, C4.5 and Random Forest) to a variety of bioassay datasets.

Results

Pharmaceutical bioassay data is not readily available to the academic community. The data held at PubChem is not curated, and there is a lack of detailed cross-referencing between Primary and Confirmatory screening assays. Because of this lack of cross-referencing, the analysis of false positives in the primary screening process has been shallow. In the six cases found, the average percentage of false positives from the High-Throughput Primary screen is 64%. For the cost-sensitive classification, Weka's implementations of the Support Vector Machine and the C4.5 decision tree learner performed relatively well. It was also found that the appropriate setting of the Weka cost matrix depends on the base classifier used and not solely on the ratio of class imbalance.

Conclusions

Understandably, pharmaceutical data is hard to obtain. However, it would be beneficial to both the pharmaceutical industry and academics if curated primary screening data and the corresponding confirmatory data were provided. Two benefits could be gained by applying virtual screening techniques to bioassay data: first, the search space of compounds to be screened could be reduced; second, analysing the false positives that occur in the primary screening process may allow the technology to be improved. The number of false positives arising from primary screening raises the question of whether this type of data should be used for virtual screening. Care is needed when using Weka's cost-sensitive classifiers: across-the-board misclassification costs based on class ratios should not be used when comparing different classifiers on the same dataset.
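
The role of the cost matrix can be illustrated with a minimal Python sketch of a minimum-expected-cost decision for two classes. This is not the paper's code and does not call Weka; the function names, the probabilities and the 99:1 cost are hypothetical values chosen only for illustration. It shows that one fixed cost setting, for example one derived from the class ratio, can flip the label for one classifier's probability score and leave it unchanged for another's.

```python
def min_expected_cost_label(p_active, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_active: a classifier's estimated probability that the compound is Active.
    cost_fn:  cost of calling a truly Active compound Inactive (false negative).
    cost_fp:  cost of calling a truly Inactive compound Active (false positive).
    """
    # Expected cost of predicting Inactive: we are wrong with probability p_active.
    cost_if_inactive = p_active * cost_fn
    # Expected cost of predicting Active: we are wrong with probability 1 - p_active.
    cost_if_active = (1.0 - p_active) * cost_fp
    return "Active" if cost_if_inactive > cost_if_active else "Inactive"


def active_threshold(cost_fn, cost_fp):
    """Probability above which 'Active' becomes the cheaper prediction."""
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    # Hypothetical scores that two different classifiers give the same compound.
    scores = {"classifier_a": 0.05, "classifier_b": 0.004}
    # Hypothetical cost taken straight from a 1:99 Active:Inactive ratio.
    ratio_cost = 99.0
    for name, p in scores.items():
        equal = min_expected_cost_label(p, cost_fn=1.0, cost_fp=1.0)
        weighted = min_expected_cost_label(p, cost_fn=ratio_cost, cost_fp=1.0)
        print(name, "equal costs:", equal, "| ratio-based costs:", weighted)
    print("threshold under ratio-based costs:", active_threshold(ratio_cost, 1.0))
```

With these made-up numbers, the ratio-based cost moves the decision threshold to 0.01, so the first score is relabelled Active while the second stays Inactive. A classifier whose scores are scaled differently may therefore need a different cost setting to reach comparable behaviour.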

SUBMITTER: Schierz AC

PROVIDER: S-EPMC2820499 | biostudies-literature | 2009 Dec

REPOSITORIES: biostudies-literature


Publications

Virtual screening of bioassay data.

Schierz, Amanda C.

Journal of Cheminformatics, 22 December 2009


PMID: 20150999

Similar Datasets

Project description: The transient receptor potential vanilloid type 1 (TRPV1) is a heat-activated cation channel protein that contributes to inflammation and to acute and persistent pain. Antagonists of human TRPV1 (hTRPV1) represent a novel therapeutic approach for the treatment of pain. The development of hTRPV1 antagonists, however, has been hindered by the unavailability of a 3D structure of hTRPV1. Recently, the 3D structures of rat TRPV1 (rTRPV1) in the presence and absence of ligand, as determined by cryo-EM, have been reported. rTRPV1 shares 85.7% sequence identity with hTRPV1. In the present work, we constructed and reported a 3D homology tetramer model of hTRPV1 based on the cryo-EM structures of rTRPV1. Molecular dynamics (MD) simulations, energy minimizations, and a prescreen were applied to select and validate the best model of hTRPV1. The predicted binding pocket of hTRPV1 is formed by two adjacent monomer subunits, which is congruent with the experimental rTRPV1 data and the cryo-EM structures of rTRPV1. The detailed interactions between hTRPV1 and its antagonists or agonists were characterized by molecular docking, which helped us to identify the important residues. Conformational changes of hTRPV1 upon antagonist/agonist binding were also explored by MD simulation. The different movements of compounds led to different conformational changes of the monomers in hTRPV1, indicating that TRPV1 works in a concerted way, resembling some other channel proteins such as aquaporins. We observed that the selective filter was open when hTRPV1 was bound to an agonist during MD simulation. For the lower gate of hTRPV1, we observed large similarities between hTRPV1 bound to an antagonist and hTRPV1 bound to an agonist. A five-point pharmacophore model based on several antagonists was established, and the structural model was used to screen in silico for new antagonists of hTRPV1.
Using the 3D TRPV1 structural model above, pilot in silico screening has begun to yield promising hits with activity as hTRPV1 antagonists, several of which showed substantial potency.
