Dataset Information

Comparison of imputation methods for missing laboratory data in medicine.


ABSTRACT:

Objectives

Missing laboratory data is a common issue, but the optimal method of imputing missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for laboratory data that are missing completely at random (MCAR), and to compare the effect of the imputed values on the accuracy of two clinical predictive models.

Design

Retrospective cohort analysis of two large data sets.

Setting

A tertiary level care institution in Ann Arbor, Michigan.

Participants

The Cirrhosis cohort had 446 patients and the Inflammatory Bowel Disease cohort had 395 patients.

Methods

Non-missing laboratory data were randomly removed with varying frequencies from two large data sets, and we then compared the ability of four methods to impute the simulated missing data: missForest, mean imputation, nearest neighbour imputation and multivariate imputation by chained equations (MICE). We characterised the accuracy of the imputation and its effect on predictive ability in both data sets.
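The simulation design described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the study's code: the table is synthetic, the function names (`mask_mcar`, `impute_mean`, `rmse_on_hidden`) and the missingness frequencies (10%, 20%, 30%) are hypothetical, and only mean imputation, the simplest of the four methods, is shown. Values are hidden uniformly at random (that is, completely at random), imputed, and then compared with the known true values.

```python
import random
import statistics

def mask_mcar(rows, frac, rng):
    """Hide about `frac` of all cells, chosen uniformly at random.

    Returns the masked table (hidden cells are None) and the set of
    (row, column) positions that were hidden.
    """
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = set(rng.sample(cells, round(frac * len(cells))))
    masked = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
              for i, row in enumerate(rows)]
    return masked, hidden

def impute_mean(masked):
    """Fill each missing cell with the mean of the observed values in its column."""
    n_cols = len(masked[0])
    col_means = [
        statistics.fmean([row[j] for row in masked if row[j] is not None])
        for j in range(n_cols)
    ]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]

def rmse_on_hidden(truth, imputed, hidden):
    """Root mean squared error, computed only over the cells that were hidden."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return (sum(squared) / len(squared)) ** 0.5

rng = random.Random(0)
# Synthetic stand-in for a laboratory table (made-up values, not study data).
truth = [[rng.gauss(140, 3), rng.gauss(4.0, 0.5), rng.gauss(1.0, 0.3)]
         for _ in range(200)]

results = {}
for frac in (0.1, 0.2, 0.3):  # hypothetical missingness frequencies
    masked, hidden = mask_mcar(truth, frac, rng)
    results[frac] = rmse_on_hidden(truth, impute_mean(masked), hidden)

for frac, err in results.items():
    print(f"missing fraction {frac:.0%}: RMSE of mean imputation = {err:.3f}")
```

Because the true values are known before masking, any other imputation method can be dropped in place of `impute_mean` and scored on exactly the same hidden cells.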

Results

MissForest had the lowest imputation error for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction difference when models used imputed laboratory values. In both data sets, MICE had the second lowest imputation error and prediction difference, followed by nearest neighbour imputation and then mean imputation.
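The abstract does not say how imputation error was measured. As an assumption for illustration only, the sketch below uses two measures commonly reported for this kind of comparison: a normalised root mean squared error for continuous variables and the proportion of misclassified entries for categorical variables. The function names are hypothetical.

```python
import statistics

def nrmse(true_vals, imputed_vals):
    """RMSE divided by the standard deviation of the true values.

    A value of 0 means perfect imputation; imputing the mean of the true
    values everywhere gives a value of 1.
    """
    mse = statistics.fmean([(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)])
    return (mse / statistics.pvariance(true_vals)) ** 0.5

def proportion_misclassified(true_labels, imputed_labels):
    """Fraction of categorical entries whose imputed label differs from the truth."""
    wrong = sum(t != p for t, p in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)

# Toy values, made up for illustration.
true_cont = [1.0, 2.0, 3.0, 4.0]
print(nrmse(true_cont, [1.1, 1.9, 3.2, 3.8]))
print(proportion_misclassified(["a", "b", "a", "c"], ["a", "a", "a", "c"]))
```

Both measures are computed only over the values that were hidden and then imputed, so lower is better in each case.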

Conclusions

MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models.

SUBMITTER: Waljee AK 

PROVIDER: S-EPMC3733317 | biostudies-literature | 2013 Aug

REPOSITORIES: biostudies-literature

Publications

Comparison of imputation methods for missing laboratory data in medicine.

Waljee AK, Mukherjee A, Singal AG, Zhang Y, Warren J, Balis U, Marrero J, Zhu J, Higgins PD

BMJ Open, 1 August 2013, issue 8


Similar Datasets

| S-EPMC6292063 | biostudies-literature
| S-EPMC10870437 | biostudies-literature
| S-EPMC3019210 | biostudies-literature
| S-EPMC9205685 | biostudies-literature
| S-EPMC10169455 | biostudies-literature
| S-EPMC4287494 | biostudies-literature
| S-EPMC11232582 | biostudies-literature
| S-EPMC7744924 | biostudies-literature
| S-EPMC8323724 | biostudies-literature
| S-EPMC7049189 | biostudies-literature