Dataset Information

Nonparametric variable importance assessment using machine learning techniques.


ABSTRACT: In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often only resort to one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have a different interpretation, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique, and whose interpretation is agnostic to the technique used. This measure is a property of the true data-generating mechanism. Specifically, we discuss a generalization of the analysis of variance variable importance measure and discuss how it facilitates the use of machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. The importance of each feature or group of features in the data can then be described individually, using this measure. We describe how to construct an efficient estimator of this measure as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
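One common way to read an analysis-of-variance style importance measure is as the drop in the proportion of response variance explained when a feature (or group of features) is left out of the regression. The sketch below illustrates that idea on simulated data. It is a naive in-sample plug-in, not the efficient estimator or the confidence interval described in the abstract. Ordinary least squares stands in for "any regression technique", and the function names (`ols_predict`, `r_squared`, `importance`) and the simulated data are assumptions made for illustration only.

```python
import random


def ols_predict(X, y):
    """Fit least squares with an intercept; return in-sample predictions.

    Stand-in for an arbitrary regression technique (an assumption of this sketch).
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations: (Z'Z) beta = Z'y, stored as an augmented matrix.
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(p):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def r_squared(y, fitted):
    """Proportion of the variance of y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, fitted))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst


def importance(X, y, drop):
    """Difference in R^2 between the full fit and a fit without columns in `drop`."""
    full = ols_predict(X, y)
    reduced_X = [[v for j, v in enumerate(r) if j not in drop] for r in X]
    reduced = ols_predict(reduced_X, y)
    return r_squared(y, full) - r_squared(y, reduced)


# Hypothetical data: the response depends on the first feature only.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
y = [2.0 * x1 + rng.gauss(0, 1) for x1, _ in X]

vim_x1 = importance(X, y, {0})  # importance of the first feature
vim_x2 = importance(X, y, {1})  # importance of the second (irrelevant) feature
print(f"importance of x1: {vim_x1:.3f}, importance of x2: {vim_x2:.3f}")
```

In this simulated setup the population value for the first feature is 4/5, since it contributes variance 4 out of a total response variance of 5, and the value for the second feature is 0. Passing a set with several column indices to `drop` gives the importance of a group of features in the same way.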

SUBMITTER: Williamson BD 

PROVIDER: S-EPMC7946807 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature

Publications

Nonparametric variable importance assessment using machine learning techniques.

Williamson BD, Gilbert PB, Carone M, Simon N

Biometrics, 2020 Dec 8, issue 1


Similar Datasets

S-EPMC7820584 | biostudies-literature
S-EPMC6041872 | biostudies-other
S-EPMC6794897 | biostudies-other
S-EPMC6841656 | biostudies-literature
S-EPMC11244764 | biostudies-literature
S-EPMC6719509 | biostudies-literature
S-EPMC6805717 | biostudies-literature
S-EPMC6450320 | biostudies-literature
S-EPMC6153184 | biostudies-literature
GSE267438 | GEO | 2024-05-17