Dataset Information

Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes.

ABSTRACT:

Introduction

Identifying predictors of patient outcomes evaluated over time may require modeling interactions among variables while addressing within-subject correlation. Generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs) address within-subject correlation, but identifying interactions can be difficult if not hypothesized a priori. We evaluate the performance of several variable selection approaches for clustered binary outcomes to provide guidance for choosing between the methods.

Methods

We conducted simulations comparing stepwise selection, penalized GLMM, boosted GLMM, and boosted GEE for variable selection considering main effects and two-way interactions in data with repeatedly measured binary outcomes, and we evaluated a two-stage approach to reduce bias and error in parameter estimates. We also compared these approaches in two real data applications: hypothermia during surgery and treatment response in lupus nephritis.

Results

Penalized and boosted approaches recovered correct predictors and interactions more frequently than stepwise selection. Penalized GLMM recovered correct predictors more often than boosting, but included many spurious predictors. Boosted GLMM yielded parsimonious models and identified correct predictors well at large sample and effect sizes, but required excessive computation time. Boosted GEE was computationally efficient and selected relatively parsimonious models, offering a compromise between computation and parsimony. The two-stage approach reduced the bias and error in regression parameters in all approaches.

Conclusion

Penalized and boosted approaches are effective for variable selection in data with clustered binary outcomes. The two-stage approach reduces bias and error and should be applied regardless of method. We provide guidance for choosing the most appropriate method in real applications.
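
To give a sense of the search space involved when selecting over main effects and two-way interactions, the following minimal sketch enumerates the candidate terms for a small set of predictors and builds the matching design-matrix columns for one observation. It is an illustration only: the predictor names and values are hypothetical and are not taken from the study, and the code does not reproduce any of the selection methods compared above.

```python
from itertools import combinations


def candidate_terms(predictors):
    """Return all main effects plus all two-way interactions as tuples of names."""
    main_effects = [(name,) for name in predictors]
    interactions = list(combinations(predictors, 2))
    return main_effects + interactions


def expand_row(row, terms):
    """Build design-matrix columns for one observation.

    `row` maps predictor name -> numeric value; an interaction column is the
    product of its two predictors.
    """
    columns = {}
    for term in terms:
        value = 1.0
        for name in term:
            value *= row[name]
        columns[":".join(term)] = value
    return columns


# Hypothetical predictors, for illustration only.
predictors = ["age", "bmi", "dose", "sex"]
terms = candidate_terms(predictors)

# p main effects plus p * (p - 1) / 2 interactions: 4 + 6 = 10 candidates.
print(len(terms))

row = {"age": 2.0, "bmi": 3.0, "dose": 0.5, "sex": 1.0}
print(expand_row(row, terms)["age:bmi"])
```

With p predictors the candidate set has p + p(p - 1)/2 terms, so it grows quadratically; this is one reason interactions are hard to identify unless they are hypothesized in advance.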

SUBMITTER: Wolf BJ

PROVIDER: S-EPMC8057419 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature

Publications

Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes.

Bethany J Wolf, Yunyun Jiang, Sylvia H Wilson, Jim C Oates

Journal of Clinical and Translational Science. 2020 Nov 16; 1

PMID: 33948279

Similar Datasets

Identifying Gene-Environment Interactions With Robust Marginal Bayesian Variable Selection.
| S-EPMC8693717 | biostudies-literature

Repeatedly measured predictors: a comparison of methods for prediction modeling.
| S-EPMC6460730 | biostudies-literature

Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors.
| S-EPMC4848399 | biostudies-literature

Variable selection methods for predicting clinical outcomes following allogeneic hematopoietic cell transplantation.
| S-EPMC7865009 | biostudies-literature

Randomized Trials With Repeatedly Measured Outcomes: Handling Irregular and Potentially Informative Assessment Times.
| S-EPMC10362939 | biostudies-literature

A comparison of random forest variable selection methods for regression modeling of continuous outcomes.
| S-EPMC11891652 | biostudies-literature

Operator-induced structural variable selection for identifying materials genes.
| S-EPMC11343493 | biostudies-literature

Variable selection for quantile autoregressive model: Bayesian methods versus classical methods.
| S-EPMC11018091 | biostudies-literature

Semiparametric Bayesian variable selection for gene-environment interactions.
| S-EPMC7467082 | biostudies-literature

Lack of identification in semiparametric instrumental variable models with binary outcomes.
| S-EPMC4070936 | biostudies-literature