
Dataset Information


QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.


ABSTRACT: Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we address the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases for QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two databases, one commercial and one freely available: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity, and the current severe limitations of these databases for this purpose. One such limitation is the general lack of complete and semantic/computer-parsable descriptions of assay methodology in these databases, which would allow one to determine the mix-and-matchability of result sets at the assay level.
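
The two ideas in the abstract, flagging inconsistent data and compiling modeling sets per assay rather than per target, can be illustrated with a minimal sketch. This is not the authors' procedure: the record fields (`compound`, `target`, `assay`, `pIC50`), the sample values, and the 1.0 log-unit spread threshold are all assumptions made for illustration, and neither ChEMBL nor Integrity is queried.

```python
from collections import defaultdict

# Hypothetical activity records; field names and values are illustrative
# only and are not taken from ChEMBL or Integrity.
RECORDS = [
    {"compound": "C1", "target": "HIV-1 RT", "assay": "A1", "pIC50": 7.9},
    {"compound": "C1", "target": "HIV-1 RT", "assay": "A2", "pIC50": 5.2},
    {"compound": "C2", "target": "HIV-1 RT", "assay": "A1", "pIC50": 6.4},
    {"compound": "C2", "target": "HIV-1 RT", "assay": "A2", "pIC50": 6.1},
    {"compound": "C3", "target": "HIV-1 RT", "assay": "A1", "pIC50": 8.3},
]


def find_inconsistent(records, max_spread=1.0):
    """Return (compound, target) pairs whose reported activities diverge.

    max_spread is an assumed threshold in log units, not a value from
    the paper.
    """
    values = defaultdict(list)
    for r in records:
        values[(r["compound"], r["target"])].append(r["pIC50"])
    return {key for key, v in values.items() if max(v) - min(v) > max_spread}


def group_by_assay(records):
    """Compile one candidate modeling set per assay identifier."""
    sets = defaultdict(list)
    for r in records:
        sets[r["assay"]].append(r)
    return dict(sets)


inconsistent = find_inconsistent(RECORDS)
per_assay = group_by_assay(RECORDS)
```

Aggregating by target alone would pool all five records into one set, including both conflicting values for `C1`; grouping by assay keeps each measurement with others obtained under the same conditions.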

SUBMITTER: Tarasova OA 

PROVIDER: S-EPMC7738000 | biostudies-literature | 2015 Jul

REPOSITORIES: biostudies-literature


Publications

QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.

Olga A. Tarasova, Aleksandra F. Urusova, Dmitry A. Filimonov, Marc C. Nicklaus, Alexey V. Zakharov, Vladimir V. Poroikov

Journal of Chemical Information and Modeling, 2015-06-29, issue 7



Similar Datasets

| S-EPMC3257145 | biostudies-other
| S-EPMC5655543 | biostudies-literature
2018-12-12 | E-MTAB-7087 | biostudies-arrayexpress
| S-EPMC6158299 | biostudies-literature
| S-EPMC3495901 | biostudies-literature
| S-EPMC7462486 | biostudies-literature
| S-EPMC2573056 | biostudies-literature
| S-EPMC9279714 | biostudies-literature
| S-EPMC3002298 | biostudies-literature
| S-EPMC1635531 | biostudies-literature