Dataset Information

Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

ABSTRACT: In computational chemistry and chemoinformatics, the support vector machine (SVM) algorithm is among the most widely used machine learning methods for the identification of new active compounds. In addition, support vector regression (SVR) has become a preferred approach for modeling nonlinear structure-activity relationships and predicting compound potency values. For the closely related SVM and SVR methods, fingerprints (i.e., bit string or feature set representations of chemical structure and properties) are generally the preferred descriptors. Herein, we have compared SVM and SVR calculations for the same compound data sets to evaluate which features are responsible for predictions. Systematic feature weight analysis yielded rather surprising results. Fingerprint features were frequently identified that contributed differently to the corresponding SVM and SVR models. The overlap between the feature sets determining the predictive performance of SVM and SVR was very small. Furthermore, features were identified that had opposite effects on SVM and SVR predictions. Feature weight analysis in combination with feature mapping also made it possible to interpret individual predictions, thus offsetting the black box character of SVM/SVR modeling.
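The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear kernel, for which a feature weight vector can be written as a sum over support vectors scaled by their signed dual coefficients, and it represents each fingerprint as a set of "on" bit indices. All function names and the toy support vectors and coefficients below are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only (assumption: linear kernel, so w = sum_i c_i * x_i).
# Fingerprints are sets of "on" bit indices; all data below are made up.

def feature_weights(support_vectors, dual_coefs, n_features):
    """Accumulate per-feature weights from support vectors and signed dual coefficients."""
    w = [0.0] * n_features
    for bits, coef in zip(support_vectors, dual_coefs):
        for b in bits:
            w[b] += coef
    return w

def top_features(weights, k):
    """Indices of the k features with the largest weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return set(ranked[:k])

def opposite_sign_features(w_a, w_b):
    """Indices of features whose weights have opposite signs in the two models."""
    return [i for i, (a, b) in enumerate(zip(w_a, w_b)) if a * b < 0]

# Hypothetical classification (SVM) and regression (SVR) models over 4 fingerprint bits.
w_svm = feature_weights([{0, 1}, {1, 2}, {3}], [1.0, 0.5, -1.0], n_features=4)
w_svr = feature_weights([{2, 3}, {0}, {1}], [0.8, -0.4, 0.1], n_features=4)

overlap = top_features(w_svm, 2) & top_features(w_svr, 2)
flipped = opposite_sign_features(w_svm, w_svr)

print("SVM weights:", w_svm)
print("SVR weights:", w_svr)
print("Top-2 overlap:", overlap)
print("Opposite-sign features:", flipped)
```

In this toy case the two top-ranked feature sets do not overlap and two features carry weights of opposite sign, which mirrors the type of observation the abstract reports, not its actual data.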

SUBMITTER: Rodriguez-Perez R

PROVIDER: S-EPMC6045367 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature

Publications

Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

Raquel Rodríguez-Pérez, Martin Vogt, Jürgen Bajorath

ACS Omega, 2017-10-04, issue 10

PMID: 30023518

Similar Datasets

Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis. | S-EPMC10097675 | biostudies-literature

Systematic artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis. | S-EPMC4350943 | biostudies-literature

Multiclass cancer classification by using fuzzy support vector machine and binary decision tree with gene selection. | S-EPMC1184049 | biostudies-literature

Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine. | S-EPMC5013239 | biostudies-literature

Support vector machine classification of streptavidin-binding aptamers. | S-EPMC4057401 | biostudies-literature

Targeted Local Support Vector Machine for Age-Dependent Classification. | S-EPMC4183366 | biostudies-literature

Principal weighted support vector machines for sufficient dimension reduction in binary classification. | S-EPMC5793677 | biostudies-literature

Support vector machine with quantile hyper-spheres for pattern classification. | S-EPMC6377146 | biostudies-literature

Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values. | S-EPMC4873242 | biostudies-other

GISMO--gene identification using a support vector machine for ORF classification. | S-EPMC1802617 | biostudies-literature