Dataset Information

Explaining Support Vector Machines: A Color Based Nomogram.

ABSTRACT:

Problem setting

Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning thanks its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, the use of SVMs is less supported in areas where interpretability is important and where people are held responsible for the decisions made by models.

Objective

In this work, we investigate whether SVMs using linear, polynomial and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision based on a sum of contributions which depend on one single or at most two input variables.

Results

Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable.

Conclusions

This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package and two apps and a movie are provided to illustrate the possibilities offered by the method.

SUBMITTER: Van Belle V

PROVIDER: S-EPMC5056733 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Explaining Support Vector Machines: A Color Based Nomogram.

Van Belle Vanya V Van Calster Ben B Van Huffel Sabine S Suykens Johan A K JA Lisboa Paulo P

PloS one 20161010 10

<h4>Problem setting</h4>Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning thanks its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, th ...[more]

PMID: 27723811

Similar Datasets

Project description:Human longevity is a complex phenotype that has a significant genetic predisposition. Like other biological processes, ageing process is governed through the regulation of signaling pathways and transcription factors. The DNA damage theory of ageing suggests that ageing is a consequence of un-repaired DNA damage accumulation. Intensive research has been carried out to elucidate the role of DNA repair systems in the ageing process. Decision Trees and Naive Bayesian Algorithm are two data-mining based classification methods for systematically analyzing data about human DNA repair genes. In this paper we develop a linearly combined kernel with Support Vector Machine (SVM) to analyze the ageing related data. The popular supervised learning algorithm enables better discrimination between ageing-related and non-ageing-related DNA repair genes. The linear combination of linear kernel and polynomial kernel of degree 3 in conjunction with SVM allows better classification accuracy in DNA repair gene data set. Compared to Decision Trees and Naive Bayesian Algorithm, SVM with the proposed kernel can achieve 65% AUC (Area Under ROC Curve) values, in contrast to 51.1% and 52.1% respectively. More importantly, we obtain 5 significant ageingrelated genes selected through the training on the whole data set and they are PCNA, PARP, APEX1, MLH1 and XRCC6. Different from the two methods, we can identify another important gene PCNA in the pathways the two methods targeted, while they failed to. And two novel genes PARP, MLH1 are selected as well. The two genes might provide potential insights for biologists in ageing research. SVM is a powerful and robust classification algorithm that can yield higher predictive accuracies. The selection of proper kernel plays a more important role in fulfilling the classification task. The important genes identified not only can target critical pathways related to ageing but also detected genes that may reveal possible related ageing biomarkers.

Dataset Information

Explaining Support Vector Machines: A Color Based Nomogram.

Problem setting

Objective

Results

Conclusions

Publications

Explaining Support Vector Machines: A Color Based Nomogram.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets