Dataset Information

Variable selection in Logistic regression model with genetic algorithm.

ABSTRACT: Variable or feature selection is one of the most important steps in model specification. Especially in medical decision-making, using a medical database directly, without prior analysis and preprocessing, is often counterproductive. Variable selection is the process of choosing the most relevant attributes from the database in order to build robust learning models and thus improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle and generally produces an acceptable set of variables. Nevertheless, it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in today's clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
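
ILLUSTRATION: The following is a minimal Python sketch of how a genetic algorithm can search over variable subsets. It is not the R code from the paper. Each candidate subset is encoded as a list of 0/1 inclusion flags, and the function name `genetic_select`, its parameters, and the `toy_fitness` function are all hypothetical choices made for this example. In a real analysis the fitness would presumably come from fitting a logistic regression on the selected variables and scoring it with some criterion; here a toy score stands in so the sketch runs with the standard library alone.

```python
import random

def genetic_select(n_vars, fitness, pop_size=20, generations=30,
                   mutation_rate=0.05, seed=0):
    """Search 0/1 inclusion masks of length n_vars (>= 2) for a high-fitness subset."""
    rng = random.Random(seed)
    # Each chromosome is a list of 0/1 flags: 1 means the variable is included.
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random masks becomes a parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_vars)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each flag with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best

# Toy stand-in for a model-fit criterion (hypothetical): reward including the
# "relevant" variables 0, 2 and 5, and charge a small penalty per variable.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in RELEVANT)
    return hits - 0.1 * sum(mask)

best_mask = genetic_select(n_vars=10, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask is copied into every new generation, the best fitness never decreases from one generation to the next. The heuristic still gives no guarantee of reaching the global optimum, which is the trade-off against an exhaustive best subset search.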

SUBMITTER: Zhang Z

PROVIDER: S-EPMC5879502 | biostudies-literature | 2018 Feb

REPOSITORIES: biostudies-literature

Publications

Variable selection in Logistic regression model with genetic algorithm.

Zhang Zhongheng, Trevino Victor, Hoseini Sayed Shahabuddin, Belciug Smaranda, Boopathi Arumugam Manivanna, Zhang Ping, Gorunescu Florin, Subha Velappan, Dai Songshi

Annals of translational medicine, 2018 Feb 1, 3

PMID: 29610737

Similar Datasets

Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates. | S-EPMC5935273 | biostudies-literature

Bayesian variable selection logistic regression with paired proteomic measurements. | S-EPMC6175404 | biostudies-literature

Robust Variable and Interaction Selection for Logistic Regression and General Index Models. | S-EPMC7451675 | biostudies-literature

Fast Model-Fitting of Bayesian Variable Selection Regression Using the Iterative Complex Factorization Algorithm. | S-EPMC6788783 | biostudies-literature

Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection. | S-EPMC5026200 | biostudies-literature

Variable selection in logistic regression for detecting SNP-SNP interactions: the rheumatoid arthritis example. | S-EPMC3786179 | biostudies-literature

Stable variable ranking and selection in regularized logistic regression for severely imbalanced big binary data. | S-EPMC9844919 | biostudies-literature

Accelerating L1-penalized expectation maximization algorithm for latent variable selection in multidimensional two-parameter logistic models. | S-EPMC9844851 | biostudies-literature

Purposeful selection of variables in logistic regression. | S-EPMC2633005 | biostudies-literature

Simultaneous clustering and variable selection: A novel algorithm and model selection procedure. | S-EPMC10439051 | biostudies-literature