Dataset Information

Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification.

ABSTRACT:

Background

Statistical model building requires selecting variables for a model according to the model's aim. For descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, regardless of whether data-driven selection procedures identify them. An open question is how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies that may themselves have used inappropriate model-building strategies.

Methods

We conducted a simulation study to assess the influence of treating variables as "known predictors" in model building when the knowledge derived from preceding studies might in fact be insufficient. Model building with variable selection was performed in randomly generated data sets representing the preceding studies. A variable was then considered a "known" predictor if a predefined number of preceding studies identified it as relevant.
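The "known predictor" rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code. The following are all hypothetical choices made here: the data-generating model (true predictors x1 and x3, a noise variable x2 correlated with x1, and an independent noise variable x4), the sample size, the number of preceding studies, the threshold `min_hits`, and every function name. Univariable selection serves as the example of an inappropriate preceding-study strategy, and its p-value uses a normal approximation.

```python
import math
import random

VARS = ("x1", "x2", "x3", "x4")

# Hypothetical data-generating model (illustrative assumption, not the paper's
# design): x1 and x3 truly affect y, x2 is noise correlated with x1 (rho = 0.8),
# x4 is independent noise.
TRUE_BETAS = {"x1": 1.0, "x2": 0.0, "x3": 0.15, "x4": 0.0}


def simulate_study(rng, n):
    """Draw one hypothetical preceding-study data set as a dict of columns."""
    cols = {name: [] for name in VARS + ("y",)}
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.8 * x1 + 0.6 * rng.gauss(0, 1)  # var(x2) = 1, corr(x1, x2) = 0.8
        x3 = rng.gauss(0, 1)
        x4 = rng.gauss(0, 1)
        y = TRUE_BETAS["x1"] * x1 + TRUE_BETAS["x3"] * x3 + rng.gauss(0, 1)
        for name, value in zip(VARS + ("y",), (x1, x2, x3, x4, y)):
            cols[name].append(value)
    return cols


def pearson_r(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def univariable_select(cols, alpha=0.05):
    """Keep each candidate whose marginal correlation with y is 'significant'.

    The t statistic's p-value is approximated by the standard normal, which is
    only reasonable for moderately large n.
    """
    n = len(cols["y"])
    selected = set()
    for name in VARS:
        r = pearson_r(cols[name], cols["y"])
        t = r * math.sqrt((n - 2) / (1 - r * r))
        p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal p-value
        if p < alpha:
            selected.add(name)
    return selected


def known_predictors(n_studies=5, min_hits=3, n=100, seed=1):
    """Label a variable 'known' if at least min_hits preceding studies selected it."""
    rng = random.Random(seed)
    hits = {name: 0 for name in VARS}
    for _ in range(n_studies):
        for name in univariable_select(simulate_study(rng, n)):
            hits[name] += 1
    known = {name for name, count in hits.items() if count >= min_hits}
    return known, hits


if __name__ == "__main__":
    known, hits = known_predictors()
    print("selection counts:", hits)
    print("'known' predictors:", sorted(known))
```

In this setup the pure-noise variable x2 is essentially always labelled "known" because of its correlation with x1. The weak true predictor x3 usually is not. The exact counts depend on the seed.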

Results

Even if several preceding studies identified a variable as a "true" predictor, this classification is often a false positive. Moreover, variables that were not identified might still be truly predictive. This holds especially if the preceding studies used inappropriate selection methods such as univariable selection.
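A back-of-the-envelope calculation shows why requiring several preceding studies offers limited protection. Assume, purely for illustration, that the K preceding studies are independent and that each selects a given variable with the same probability π. The chance that at least m of them select it is then the binomial tail P(known) = sum over k = m..K of C(K, k) π^k (1 - π)^(K - k). The helper `prob_known` and the value π = 0.9 for a noise variable correlated with a true predictor are hypothetical.

```python
from math import comb


def prob_known(pi, n_studies, min_hits):
    """P(at least min_hits of n_studies independent studies select a variable),
    assuming each study selects it with the same probability pi."""
    return sum(
        comb(n_studies, k) * pi**k * (1 - pi) ** (n_studies - k)
        for k in range(min_hits, n_studies + 1)
    )


# Independent noise variable under univariable selection at alpha = 0.05:
print(prob_known(0.05, 5, 3))  # ≈ 0.00116
# Noise variable correlated with a true predictor, selected in (say) 90% of studies:
print(prob_known(0.90, 5, 3))  # ≈ 0.991
```

An independent noise variable almost never reaches a 3-of-5 threshold. A noise variable that is selected in most studies because of its correlation with a true predictor reaches it almost surely, so the threshold does not filter out this kind of false positive.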

Conclusions

The source of "background knowledge" should be evaluated with care. Knowledge generated by preceding studies can cause misspecification.

SUBMITTER: Hafermann L

PROVIDER: S-EPMC8480029 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification.

Lorena Hafermann, Heiko Becher, Carolin Herrmann, Nadja Klein, Georg Heinze, Geraldine Rauch

BMC Medical Research Methodology. 2021 Sep 29; 1

PMID: 34587892

Similar Datasets

Using Background Knowledge from Preceding Studies for Building a Random Forest Prediction Model: A Plasmode Simulation Study.
| S-EPMC9222226 | biostudies-literature

Building a statistical model for predicting cancer genes.
| S-EPMC3499550 | biostudies-literature

Dynamic model updating (DMU) approach for statistical learning model building with missing data.
| S-EPMC8086098 | biostudies-literature

Evaluating Model Misspecification in Independent Component Analysis.
| S-EPMC4309392 | biostudies-literature

Avoiding background knowledge: literature based discovery from important information.
| S-EPMC10013236 | biostudies-literature

Improving structural similarity based virtual screening using background knowledge.
| S-EPMC3928642 | biostudies-literature

Evaluating mixture models for building RNA knowledge-based potentials.
| S-EPMC4038748 | biostudies-literature

Human-computer interaction based on background knowledge and emotion certainty
| S-EPMC10280641 | biostudies-literature

Unifying model for molecular determinants of the preselection Vβ repertoire.
| S-EPMC3752219 | biostudies-literature

Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome's distribution and metric properties.
| S-EPMC10369499 | biostudies-literature