Dataset Information

SVM-RFE: selection and visualization of the most relevant features through non-linear kernels.

ABSTRACT:

Background

Support vector machines (SVM) are a powerful tool for analyzing data in which the number of predictors is approximately equal to or larger than the number of observations. Originally, however, the application of SVM to biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables. Building prediction models based on only the most relevant variables is essential in biomedical research. Substantial work has since been done to allow assessment of variable importance in SVM models, but this work has focused on SVM implemented with linear kernels. The power of SVM as a prediction model comes from the flexibility provided by non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis.

Results

The proposed algorithms allow visualization of each of the RFE iterations and, hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. In simulation studies, when the truly most relevant variables were compared with the variable ranks produced by each algorithm, the three proposed algorithms generally performed better than the gold standard RFE for non-linear kernels. Generally, RFE-pseudo-samples outperformed the other three methods across the tested scenarios, even when variables were assumed to be correlated.

Conclusions

The proposed approaches, particularly RFE-pseudo-samples, can be used to accurately select variables and to assess the direction and strength of associations between predictors and outcomes when analyzing biomedical data using SVM for categorical or time-to-event responses. These approaches perform better than the classical RFE of Guyon under realistic scenarios for the structure of biomedical data.
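
For orientation, the classical linear RFE baseline that the abstract compares against can be sketched as below. This is not the authors' non-linear or pseudo-sample method: it is a minimal illustration of recursive elimination using the squared-weight criterion of a linear SVM. The toy data, the function names `train_linear_svm` and `svm_rfe`, the subgradient-descent trainer, and all hyperparameters are assumptions made for this example only.

```python
import random

def train_linear_svm(X, y, features, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    Only the columns listed in `features` are used. Returns (weights, bias).
    The trainer and its hyperparameters are illustrative assumptions.
    """
    w = [0.0] * len(features)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xi[f] for wj, f in zip(w, features)) + b
            if yi * score < 1:  # sample violates the margin
                for k, f in enumerate(features):
                    grad_w[k] -= yi * xi[f] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y):
    """Return feature indices ordered from most to least relevant."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        w, _ = train_linear_svm(X, y, remaining)
        # Linear SVM-RFE criterion: drop the feature with the smallest w_j^2.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    eliminated.append(remaining[0])
    return eliminated[::-1]  # last eliminated = most relevant

# Toy data (assumption): five standard-normal features,
# with the label depending only on features 0 and 1.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if row[0] + row[1] > 0 else -1 for row in X]

ranking = svm_rfe(X, y)
print(ranking)
```

On this toy problem one would expect the two informative features to head the ranking. With a non-linear kernel there is no explicit weight vector to square, which is the gap the approaches described in the abstract address.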

SUBMITTER: Sanz H

PROVIDER: S-EPMC6245920 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature

Publications

SVM-RFE: selection and visualization of the most relevant features through non-linear kernels.

Hector Sanz, Clarissa Valim, Esteban Vegas, Josep M Oller, Ferran Reverter

BMC Bioinformatics, 2018-11-19, issue 1

PMID: 30453885

Similar Datasets

Early selection of task-relevant features through population gating. | S-EPMC10603060 | biostudies-literature

Prediction of drought-resistant genes in Arabidopsis thaliana using SVM-RFE. | S-EPMC3137602 | biostudies-literature

Multivariate classification of smokers and nonsmokers using SVM-RFE on structural MRI images. | S-EPMC5531448 | biostudies-literature

Estimating densities with non-linear support by using Fisher-Gaussian kernels. | S-EPMC9286319 | biostudies-literature

Screening key genes for intracranial aneurysm rupture using LASSO regression and the SVM-RFE algorithm. | S-EPMC11743535 | biostudies-literature

Screening of Biomarkers in Liver Tissue after Bariatric Surgery Based on WGCNA and SVM-RFE Algorithms. | S-EPMC9902125 | biostudies-literature

A novel gene expression test method of minimizing breast cancer risk in reduced cost and time by improving SVM-RFE gene selection method combined with LASSO. | S-EPMC7856389 | biostudies-literature

Predicting linear B-cell epitopes using string kernels. | S-EPMC2683948 | biostudies-literature

Gravimetry through non-linear optomechanics. | S-EPMC6133990 | biostudies-literature

Full Bayesian identification of linear dynamic systems using stable kernels. | S-EPMC10161125 | biostudies-literature