
Dataset Information


Systematic artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis.


ABSTRACT: Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has probably become the most popular approach for modeling non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they were found to consistently mispredict highly potent compounds. Hence, in regions of local SAR discontinuity, SVR prediction models displayed clear limitations. Compared to observed activity landscapes of compound data sets, landscapes generated on the basis of SVR potency predictions were partly flattened and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be considered with caution because artificially low predictions are very likely for highly potent candidate compounds, the most important prediction targets.

SUBMITTER: Balfer J 

PROVIDER: S-EPMC4350943 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Systematic artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis.

Jenny Balfer, Jürgen Bajorath

PLoS ONE, 2015-03-05, issue 3



Similar Datasets

| S-EPMC6045367 | biostudies-literature
| S-EPMC7279352 | biostudies-literature
| S-EPMC2909371 | biostudies-literature
| S-EPMC10963322 | biostudies-literature
| S-EPMC3924408 | biostudies-literature
| S-EPMC8062522 | biostudies-literature
| S-EPMC2910724 | biostudies-literature
| S-EPMC5564774 | biostudies-literature
| S-EPMC4669521 | biostudies-literature
| S-EPMC4608539 | biostudies-other