Dataset Information

Intervention in prediction measure: a new approach to assessing variable importance for random forests.

ABSTRACT:

Background

Random forests are popular in many fields because they can be applied successfully to complex data: small sample sizes, complex interactions and correlations, mixed-type predictors, and so on. Furthermore, they provide variable importance measures that aid qualitative interpretation and the selection of relevant predictors. However, most of these measures rely on the choice of a performance measure, and measures of prediction performance are not unique. In some settings, such as multivariate response random forests, they are not even clearly defined.

Methods

A new alternative importance measure, called Intervention in Prediction Measure, is investigated. It depends on the structure of the trees, without depending on performance measures. It is compared with other well-known variable importance measures in different contexts, such as a classification problem with variables of different types, another classification problem with correlated predictor variables, and problems with multivariate responses and predictors of different types.

Results

Several simulation studies are carried out, showing the new measure to be very competitive. In addition, it is applied in two well-known bioinformatics applications previously used in other papers. Using the new measure also improves performance in these applications.

Conclusions

This new measure is expressed as a percentage, which makes it attractive in terms of interpretability. It can be used with new observations. It can be defined globally, for each class (in a classification problem) and case-wise. It can easily be computed for any kind of response, including multivariate responses. Furthermore, it can be used with any algorithm employed to grow each individual tree. It can be used in place of (or in addition to) other variable importance measures.
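The abstract does not spell out how the measure is computed, so the following is only a minimal sketch of one way a structure-based, percentage-valued, case-wise importance could work. It assumes (this is an assumption, not a definition taken from the text above) that for a given case each tree contributes the share of split nodes on the case's root-to-leaf path that use each variable, and that these shares are averaged over trees. The toy tree format (nested dicts with `var`, `thr`, `left`, `right`) and the function names `path_variables` and `path_shares` are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical toy tree format: an internal node is a dict with keys
# "var", "thr", "left", "right"; anything that is not a dict is a leaf.

def path_variables(tree, x):
    """Return the split variables met on x's root-to-leaf path, in order."""
    used = []
    node = tree
    while isinstance(node, dict):
        var = node["var"]
        used.append(var)
        # Go left when the case's value is at or below the threshold.
        node = node["left"] if x[var] <= node["thr"] else node["right"]
    return used

def path_shares(forest, x, variables):
    """Percentage of path splits per variable, averaged over trees.

    ASSUMPTION: this averaging rule is an illustrative guess at a
    structure-based measure; it uses no prediction performance measure.
    """
    totals = {v: 0.0 for v in variables}
    n_trees = 0
    for tree in forest:
        used = path_variables(tree, x)
        if not used:  # a tree that is a single leaf has no splits
            continue
        n_trees += 1
        counts = Counter(used)
        for v in variables:
            totals[v] += 100.0 * counts[v] / len(used)
    if n_trees == 0:
        return totals
    return {v: totals[v] / n_trees for v in variables}

# Two hand-built toy trees and one case.
forest = [
    {"var": "a", "thr": 0.5, "left": 0,
     "right": {"var": "b", "thr": 1.0, "left": 0, "right": 1}},
    {"var": "b", "thr": 2.0,
     "left": {"var": "b", "thr": 0.0, "left": 0, "right": 1},
     "right": 1},
]
x = {"a": 0.9, "b": 1.5, "c": 3.0}

shares = path_shares(forest, x, ["a", "b", "c"])
print(shares)
```

Under these assumptions the shares for a case sum to 100, which is what makes a percentage reading natural. Averaging the case-wise values over all cases, or over the cases of one class, would give global and per-class versions in the same spirit.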

SUBMITTER: Epifanio I

PROVIDER: S-EPMC5414143 | biostudies-literature | 2017 May

REPOSITORIES: biostudies-literature

Publications

Intervention in prediction measure: a new approach to assessing variable importance for random forests.

Epifanio I

BMC Bioinformatics, 2017 May 2, issue 1

PMID: 28464827

Similar Datasets

An AUC-based permutation variable importance measure for random forests. | S-EPMC3626572 | biostudies-literature

Variable importance-weighted Random Forests. | S-EPMC6051549 | biostudies-literature

Surrogate minimal depth as an importance measure for variables in random forests. | S-EPMC6761946 | biostudies-literature

Variable importance for sustaining macrophyte presence via random forests: data imputation and model settings. | S-EPMC6162213 | biostudies-literature

Maximal conditional chi-square importance in random forests. | S-EPMC2832825 | biostudies-literature

r2VIM: A new variable selection method for random forests in genome-wide association studies. | S-EPMC4736152 | biostudies-literature

Thresholding Gini variable importance with a single-trained random forest: An empirical Bayes approach. | S-EPMC10497997 | biostudies-literature

Prediction of glycosylation sites using random forests. | S-EPMC2651179 | biostudies-literature

A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction. | S-EPMC4684346 | biostudies-literature

Exploring the variable importance in random forests under correlations: a general concept applied to donor organ quality in post-transplant survival. | S-EPMC10507897 | biostudies-literature