
Dataset Information


Benchmarks for interpretation of QSAR models.

ABSTRACT: Interpretation of QSAR models is useful for understanding the complex nature of biological or physicochemical processes, guiding structural optimization, or performing knowledge-based validation of QSAR models. Highly predictive models are usually complex, and their interpretation is non-trivial. This is particularly true for modern neural networks. Various approaches to interpreting these models exist. However, it is difficult to evaluate and compare the performance and applicability of these ever-emerging methods. Herein, we developed several benchmark data sets with end-points determined by predefined patterns. These data sets are intended for evaluating the ability of interpretation approaches to retrieve these patterns. They represent tasks of different complexity levels, from simple atom-based additive properties to pharmacophore hypotheses. We proposed several quantitative metrics of interpretation performance. The applicability of the benchmarks and metrics was demonstrated on a set of conventional models and end-to-end graph convolutional neural networks, interpreted by the previously proposed universal, ML-agnostic approach for structural interpretation. We anticipate that these benchmarks will be useful for evaluating new interpretation approaches and investigating the decision making of complex "black box" models.

SUBMITTER: Matveieva M

PROVIDER: S-EPMC8157407 | biostudies-literature

REPOSITORIES: biostudies-literature
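The abstract describes benchmark data sets whose end-points are generated from predefined patterns (the simplest being atom-based additive properties) and metrics that score how well an interpretation recovers those patterns. The sketch below illustrates that idea under loudly stated assumptions: molecules are reduced to plain lists of atom symbols (no cheminformatics toolkit), the hypothetical end-point is simply the nitrogen count, and the metric is a ROC AUC-style ranking score over atom contributions. The names `additive_endpoint`, `ground_truth_labels`, and `contribution_auc` are illustrative and are not the authors' exact benchmark or metric definitions.

```python
from typing import List, Sequence

# Hypothetical pattern for illustration: every nitrogen atom adds exactly 1
# to the end-point (a simple atom-based additive property).
PATTERN_ATOM = "N"


def additive_endpoint(atoms: Sequence[str]) -> int:
    """End-point fully determined by the predefined pattern (nitrogen count)."""
    return sum(1 for a in atoms if a == PATTERN_ATOM)


def ground_truth_labels(atoms: Sequence[str]) -> List[int]:
    """1 for atoms that carry the pattern, 0 for all other atoms."""
    return [1 if a == PATTERN_ATOM else 0 for a in atoms]


def contribution_auc(contribs: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (pattern atom, other atom) pairs ranked correctly.

    This is the ROC AUC of the atom contributions against the ground-truth
    labels; ties count as 0.5. Returns 0.5 when one class is empty, because
    the ranking is undefined for such a molecule.
    """
    pos = [c for c, lab in zip(contribs, labels) if lab == 1]
    neg = [c for c, lab in zip(contribs, labels) if lab == 0]
    if not pos or not neg:
        return 0.5
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    atoms = ["C", "C", "N", "O", "N"]
    # Contributions as they might come from some interpretation method.
    contribs = [0.1, 0.0, 0.9, 0.2, 0.8]
    print(additive_endpoint(atoms))                                # 2
    print(contribution_auc(contribs, ground_truth_labels(atoms)))  # 1.0
```

A score of 1.0 means every pattern atom received a larger contribution than every other atom; 0.5 corresponds to a random ranking. Averaging such a per-molecule score over a benchmark set is one plausible way to turn it into a data-set-level metric.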


Similar Datasets

Integration of QSAR and SAR methods for the mechanistic interpretation of predictive models for carcinogenicity.

Project description: The knowledge-based Toxtree expert system (SAR approach) was integrated with the statistically based counter propagation artificial neural network (CP ANN) model (QSAR approach) to contribute to a better mechanistic understanding of a carcinogenicity model for non-congeneric chemicals, using Dragon descriptors and carcinogenic potency for rats as the response. The transparency of the CP ANN algorithm was demonstrated using an intrinsic mapping technique, specifically Kohonen maps. Chemical structures were represented by Dragon descriptors that express the structural and electronic features of molecules, such as their shape and the electronic environment related to their reactivity. The study illustrated how the descriptors correlate with particular structural alerts (SAs) for carcinogenicity that have a recognized mechanistic link to carcinogenic activity. Moreover, the Kohonen mapping technique enables one to examine the separation of carcinogens and non-carcinogens (for rats) within a family of chemicals sharing a particular SA for carcinogenicity. The mechanistic interpretation of models is important for evaluating the safety of chemicals.

| S-EPMC3962111 | biostudies-literature
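The carcinogenicity entry above relies on Kohonen maps (self-organizing maps) to make the CP ANN transparent. As a rough illustration of how such a map organizes descriptor vectors, here is a minimal generic 1-D Kohonen map in pure Python. It is an assumption-laden sketch, not the study's CP ANN: the deterministic linear initialization, the linear learning-rate decay, the Gaussian neighbourhood, and all parameter values are illustrative choices, and the counter-propagation output layer is omitted entirely.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights: List[Vector], x: Sequence[float]) -> int:
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: dist2(weights[i], x))


def train_som(data: List[Vector], n_nodes: int, epochs: int = 50,
              lr0: float = 0.5, radius0: float = 2.0) -> List[Vector]:
    """Train a 1-D Kohonen map (n_nodes >= 2) with deterministic initialisation."""
    if n_nodes < 2:
        raise ValueError("n_nodes must be at least 2")
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Spread the initial weights evenly between the per-feature min and max.
    weights = [[lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
               for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D grid of nodes.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights
```

After training, mapping each compound to its best-matching node groups similar descriptor vectors together; overlaying known labels (for example, carcinogen versus non-carcinogen) on the nodes is what makes such a map inspectable.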

Benchmarks of Biomembrane Force Probe Spring Constant Models.

Project description: Not available

| S-EPMC5771216 | biostudies-literature

Performance benchmarks for open source porous electrode theory models.

Project description: The electrochemical response characteristics of existing and emerging porous electrode theory (PET) models were benchmarked to establish a common basis for assessing their physical reach, limitations, and accuracy. Three open-source PET models (dualfoil, MPET, and LIONSIMBA) were used to simulate the discharge of a LiMn2O4-graphite cell, and the simulations were compared against experimental data. For C-rates below 2C, the simulated discharge voltage curves of dualfoil, MPET, and LIONSIMBA matched the experimental data within 4% deviation, while for C-rates above 3C, dualfoil and MPET showed smaller deviations, within 5% of the experiments. Despite showing the same macroscopic voltage response, the electrochemical profiles of the three codes exhibit significant qualitative differences, which can lead the user to different conclusions regarding the battery performance and possible degradation mechanisms of the analyzed system.

| S-EPMC11004710 | biostudies-literature
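The porous electrode entry above reports agreement between simulated and experimental discharge curves as a percentage deviation. The description does not say how that deviation was computed, so the sketch below assumes one common choice: the maximum relative voltage deviation, with the simulated curve linearly interpolated onto the experimental capacity points. The function names `interp` and `max_relative_deviation` are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation of (xs, ys) at x; xs must be strictly increasing.

    Values outside the range of xs are clamped to the end points.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # first index with xs[j] >= x, so j >= 1 here
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def max_relative_deviation(cap_exp: Sequence[float], v_exp: Sequence[float],
                           cap_sim: Sequence[float], v_sim: Sequence[float]) -> float:
    """Largest |V_sim - V_exp| / |V_exp| over the experimental points, in percent."""
    worst = 0.0
    for c, v in zip(cap_exp, v_exp):
        v_model = interp(c, cap_sim, v_sim)
        worst = max(worst, abs(v_model - v) / abs(v))
    return 100.0 * worst
```

A maximum (rather than mean) deviation is the stricter reading of "matched within 4%"; a root-mean-square variant would be an equally reasonable assumption.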

Building and assessing atomic models of proteins from structural templates: learning and benchmarks.

Project description: One approach to predicting a protein fold from a sequence (a target) is based on the structures of related proteins that are used as templates. We present an algorithm that examines a set of template candidates, builds an atomically detailed model from each of the templates, and ranks the models. The algorithm performs a hierarchical selection of the best model using a diverse set of signals. After a quick and suboptimal screening of template candidates from the Protein Data Bank, the method fine-tunes the selection down to a few models. More detailed signals test the compatibility of the sequence with the proposed structures and are merged into a global fitness measure using linear programming. This algorithm is a component of the prediction server LOOPP (http://www.loopp.org). Large-scale training and test sets were designed and are presented. Recent results of the LOOPP server in CASP8 are discussed.

| S-EPMC2719020 | biostudies-literature

A novel automated lazy learning QSAR (ALL-QSAR) approach: method development, applications, and virtual screening of chemical databases using validated ALL-QSAR models.

Project description: A novel automated lazy learning quantitative structure-activity relationship (ALL-QSAR) modeling approach has been developed on the basis of lazy learning theory. The activity of a test compound is predicted from a locally weighted linear regression model using the chemical descriptors and biological activities of the training set compounds most chemically similar to this test compound. The weights with which training set compounds are included in the regression depend on the similarity of those compounds to the test compound. We have applied the ALL-QSAR method to several experimental chemical data sets, including 48 anticonvulsant agents with known ED50 values, 48 dopamine D1-receptor antagonists with known competitive binding affinities (Ki), and a Tetrahymena pyriformis data set containing 250 phenolic compounds with toxicity IGC50 values. When applied to database screening, models developed for anticonvulsant agents identified several known anticonvulsant compounds that were not only absent from the training set but also highly chemically dissimilar to the training set compounds. This initial success indicates that ALL-QSAR can be further exploited as a general tool for accurate bioactivity prediction and database screening in drug design and discovery. Because of its local nature, the ALL-QSAR approach appears to be especially well suited to developing highly predictive models for sparse or unevenly distributed data sets.

| S-EPMC2536695 | biostudies-literature
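The ALL-QSAR entry above predicts each test compound from a locally weighted linear regression over its most similar training compounds. A minimal pure-Python sketch of that idea follows, under assumptions the description does not fix: similarity is taken as Euclidean distance in descriptor space, weights come from a Gaussian kernel, the `k`, `bandwidth`, and `ridge` parameters are illustrative, and `lazy_predict` and `solve` are hypothetical names rather than the published implementation.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lazy_predict(x_query: Sequence[float], x_train: Sequence[Sequence[float]],
                 y_train: Sequence[float], k: int = 5, bandwidth: float = 1.0,
                 ridge: float = 1e-6) -> float:
    """Predict activity from a weighted linear fit to the k nearest neighbours.

    bandwidth should be on the scale of descriptor distances; if it is far too
    small, all weights underflow to zero and the fit degenerates.
    """
    dists = [math.dist(x_query, xt) for xt in x_train]
    idx = sorted(range(len(x_train)), key=lambda i: dists[i])[:k]
    # More similar (closer) compounds get larger weights via a Gaussian kernel.
    w = [math.exp(-(dists[i] / bandwidth) ** 2) for i in idx]
    rows = [[1.0] + list(x_train[i]) for i in idx]  # intercept + descriptors
    p = len(rows[0])
    # Weighted normal equations with a tiny ridge term for numerical stability:
    # (X^T W X + ridge * I) beta = X^T W y
    xtwx = [[sum(w[r] * rows[r][i] * rows[r][j] for r in range(len(idx)))
             + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    xtwy = [sum(w[r] * rows[r][i] * y_train[idx[r]] for r in range(len(idx)))
            for i in range(p)]
    beta = solve(xtwx, xtwy)
    return beta[0] + sum(bj * q for bj, q in zip(beta[1:], x_query))
```

Because a separate model is fitted for every query, nothing is trained in advance; this is the "lazy" part, and it is also why the method adapts to unevenly populated regions of chemical space at the cost of more work per prediction.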

Tuning HERG out: antitarget QSAR models for drug development.

Project description: Several non-cardiovascular drugs have been withdrawn from the market due to their inhibition of hERG K+ channels, which can potentially lead to severe heart arrhythmia and death. As hERG safety testing is a mandatory FDA-required procedure, there is considerable interest in developing predictive computational tools to identify and filter out potential hERG blockers early in the drug discovery process. In this study, we aimed to generate predictive and well-characterized quantitative structure-activity relationship (QSAR) models for hERG blockade using the largest publicly available dataset of 11,958 compounds from the ChEMBL database. The models were developed and validated according to OECD guidelines using four types of descriptors and four different machine-learning techniques. The classification accuracies for discriminating blockers from non-blockers reached 0.83-0.93 on the external set. Model interpretation revealed several SAR rules, which can guide the structural optimization of some hERG blockers into non-blockers. We also applied the generated models to screen the World Drug Index (WDI) database and identify putative hERG blockers and non-blockers among currently marketed drugs. The developed models can reliably identify blockers and non-blockers, which could be useful for the scientific community. A freely accessible web server has been developed that allows users to identify putative hERG blockers and non-blockers in chemical libraries of their interest (http://labmol.farmacia.ufg.br/predherg).

| S-EPMC4593700 | biostudies-literature

Predictive QSAR Models for the Toxicity of Disinfection Byproducts.

Project description: Several hundred disinfection byproducts (DBPs) in drinking water have been identified and are known to have potentially adverse health effects. There are toxicological data gaps for most DBPs, and predictive methods may provide an effective way to address them. In silico models of the toxicological endpoints of DBPs have rarely been developed. The main aim of the present study is to develop predictive quantitative structure-activity relationship (QSAR) models for the reactive toxicities of 50 DBPs in five bioassays: X-Microtox, GSH+, GSH-, DNA+, and DNA-. All-subset regression was used to select the optimal descriptors, and multiple linear regression models were built. The developed QSAR models for the five endpoints satisfied the internal and external validation criteria: coefficient of determination (R²) > 0.7; explained variance in leave-one-out prediction (Q²LOO) and in leave-many-out prediction (Q²LMO) > 0.6; variance explained in external prediction (Q²F1, Q²F2, and Q²F3) > 0.7; and concordance correlation coefficient (CCC) > 0.85. The applicability domains of the QSAR models and the meaning of the selected descriptors were discussed. The obtained QSAR models can be used to predict the toxicities of the 50 DBPs.

| S-EPMC6151816 | biostudies-literature
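The DBP entry above lists external-validation thresholds on Q²F1, Q²F2, Q²F3, and CCC without defining them. The sketch below uses the definitions commonly found in the QSAR literature, which is an assumption about what the study computed: Q²F1 normalizes the external prediction error by deviations from the training-set mean, Q²F2 by deviations from the external-set mean, Q²F3 compares mean squared errors per compound, and CCC is Lin's concordance correlation coefficient with population (1/n) variances.

```python
from statistics import fmean
from typing import Sequence


def _press(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Sum of squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def q2_f1(y_ext, y_hat_ext, y_train) -> float:
    """1 - PRESS / sum over external compounds of (y - training mean)^2."""
    ybar_tr = fmean(y_train)
    return 1 - _press(y_ext, y_hat_ext) / sum((y - ybar_tr) ** 2 for y in y_ext)


def q2_f2(y_ext, y_hat_ext) -> float:
    """1 - PRESS / sum over external compounds of (y - external mean)^2."""
    ybar_ext = fmean(y_ext)
    return 1 - _press(y_ext, y_hat_ext) / sum((y - ybar_ext) ** 2 for y in y_ext)


def q2_f3(y_ext, y_hat_ext, y_train) -> float:
    """1 - (PRESS / n_ext) / (training sum of squares / n_train)."""
    ybar_tr = fmean(y_train)
    num = _press(y_ext, y_hat_ext) / len(y_ext)
    den = sum((y - ybar_tr) ** 2 for y in y_train) / len(y_train)
    return 1 - num / den


def ccc(y, y_hat) -> float:
    """Lin's concordance correlation coefficient (population variances)."""
    my, mp = fmean(y), fmean(y_hat)
    n = len(y)
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, y_hat)) / n
    sx2 = sum((a - my) ** 2 for a in y) / n
    sy2 = sum((b - mp) ** 2 for b in y_hat) / n
    return 2 * sxy / (sx2 + sy2 + (my - mp) ** 2)
```

Unlike a plain correlation coefficient, CCC also penalizes systematic offsets between observed and predicted values (the `(my - mp) ** 2` term), which is why it is used alongside the Q² statistics.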

QSAR models for predicting cathepsin B inhibition by small molecules--continuous and binary QSAR models to classify cathepsin B inhibition activities of small molecules.

Project description: Cathepsin B is a potential target for the development of drugs to treat several important human diseases. A number of inhibitors targeting this protein have been developed in the past several years. Recently, a group of small molecules was identified as having inhibitory activity against cathepsin B through high-throughput screening (HTS) tests. In this study, traditional continuous and binary QSAR models were built from calculated molecular and physicochemical properties to classify the biological activities of the previously identified compounds and to distinguish active from inactive compounds for drug development. Strong correlations were obtained for the continuous QSAR models, with regression correlation coefficients (r(2)) and cross-validated correlation coefficients (q(2)) of 0.77 and 0.61 for all compounds, and 0.82 and 0.68 for the compound set excluding 3 outliers, respectively. The models were further validated through the leave-one-out (LOO) method and the training-test set method. The binary models showed strong predictive ability in distinguishing active from inactive compounds, with accuracies of 0.89 and 0.94 for active and inactive compounds, respectively, in non-cross-validated models. Similar results were obtained for the cross-validated models. Collectively, these results demonstrate the models' ability to discriminate between active and inactive compounds, suggesting that the models may be used to pre-screen compounds to facilitate compound optimization and to design novel inhibitors for drug development.

| S-EPMC2873115 | biostudies-literature
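The cathepsin B entry above reports both r(2) and a cross-validated q(2) obtained with leave-one-out validation. To show how the two numbers relate, here is a sketch for a single-descriptor linear model; the one-descriptor restriction and the function names are assumptions for illustration (the study used multiple calculated properties). q² replaces the fitted residuals with leave-one-out prediction errors (PRESS) and normalizes by the total sum of squares of all observed values.

```python
from statistics import fmean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for one descriptor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r2(x: List[float], y: List[float]) -> float:
    """Coefficient of determination of the fit on all compounds."""
    slope, icpt = fit_line(x, y)
    my = fmean(y)
    ss_res = sum((b - (slope * a + icpt)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def q2_loo(x: List[float], y: List[float]) -> float:
    """Leave-one-out q2 = 1 - PRESS / total sum of squares (needs >= 3 points)."""
    my = fmean(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        slope, icpt = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + icpt)) ** 2
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - press / ss_tot
```

For least-squares models each leave-one-out error is at least as large as the corresponding fitted residual, so q² never exceeds r²; a large gap between them, as between 0.77 and 0.61 in the entry above, indicates how much the fit depends on individual compounds.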

Log Odds and the Interpretation of Logit Models.

Project description: OBJECTIVE: We discuss how to interpret coefficients from logit models, focusing on the importance of the standard deviation (σ) of the error term to that interpretation. STUDY DESIGN: We show how odds ratios are computed, how they depend on the standard deviation (σ) of the error term, and their sensitivity to different model specifications. We also discuss alternatives to odds ratios. PRINCIPAL FINDINGS: There is no single odds ratio; instead, any estimated odds ratio is conditional on the data and the model specification. Odds ratios should not be compared across different studies using different samples from different populations. Nor should they be compared across models with different sets of explanatory variables. CONCLUSIONS: To communicate information regarding the effect of explanatory variables on binary {0,1} dependent variables, average marginal effects are generally preferable to odds ratios, unless the data are from a case-control study.

| S-EPMC5867187 | biostudies-other
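The logit entry above recommends average marginal effects over odds ratios. The sketch below contrasts the two for a continuous regressor, assuming the logit coefficients have already been estimated elsewhere (it does not fit a model). The odds ratio is exp(β_k) regardless of the data, whereas the average marginal effect averages β_k · p_i · (1 − p_i) over the observations, so it depends on where the sample sits on the logistic curve. Function names are illustrative.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def odds_ratio(beta_k: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in x_k."""
    return math.exp(beta_k)


def average_marginal_effect(beta: Sequence[float], rows: List[Sequence[float]],
                            k: int) -> float:
    """Mean of dP(y=1)/dx_k over observations for a continuous regressor x_k.

    beta[0] is the intercept and beta[k + 1] multiplies x_k; each row holds
    the regressors without the constant term.
    """
    total = 0.0
    for row in rows:
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        p = logistic(z)
        total += beta[k + 1] * p * (1.0 - p)
    return total / len(rows)
```

With the same coefficients, a sample centred where the predicted probability is 0.5 yields a much larger average marginal effect than a sample where most predicted probabilities are close to 1, while the odds ratio is identical in both cases. This mirrors the entry's point that an odds ratio alone says little about the effect on the probability scale.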

Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction.

Project description: Recently, imputation techniques have been adapted to predict activity values in sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models can use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two small-scale datasets (12 assays each) and one large-scale dataset with 417 assays. Moreover, we analyzed in detail the improvements shown by the imputation models. We found that test compounds that were dissimilar to the training compounds, as well as test compounds with a large number of experimental values for other assays, showed the largest improvements. We also investigated the impact of sparsity, and of the relatedness of the assays being considered, on the observed improvements. Our results show that even a small amount of additional information can give imputation methods a strong boost in predictive performance over traditional single-task and multi-task predictive models.

| S-EPMC9172131 | biostudies-literature
