Dataset Information

On Missingness Features in Machine Learning Models for Critical Care: Observational Study.


ABSTRACT:

Background

Missing data in electronic health records are inevitable and considered to be nonrandom. Several studies have found that features indicating patterns of missing data (missingness) encode useful information about a patient's health, and they advocate for including these features in clinical prediction models. However, the effectiveness of such features has not been comprehensively evaluated.

Objective

The goal of this research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and to explore the robustness of these features across patient subgroups and task settings.

Methods

A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, with mortality, length of stay, and sepsis chosen as outcomes. The 2019 dataset was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and to classify or predict the labels of interest. Models were evaluated on several criteria and across population subgroups, assessing discriminative ability and calibration.
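The abstract does not state how the missingness features were constructed. As a minimal sketch of one common representation (an assumption, not necessarily the study's method), each variable in a patient's time series can be accompanied by a binary observed/missing mask and a count of time steps since it was last measured, with missing values forward-filled. The function name `missingness_features` and the default fill value of 0.0 before the first observation are hypothetical choices made for illustration.

```python
import math


def missingness_features(series):
    """Derive illustrative missingness features from one patient's record.

    series: list of time steps, each a list of floats (one per variable);
            NaN marks a measurement that was not taken at that step.
    Returns (filled, mask, delta):
      filled - values with the last observation carried forward
               (0.0 before the first observation; an assumed default)
      mask   - 1 where the variable was observed at that step, else 0
      delta  - number of steps since the variable was last observed
               (counted from the start of the record if never observed)
    """
    n_vars = len(series[0])
    last = [0.0] * n_vars   # last observed value per variable
    since = [0] * n_vars    # steps since last observation per variable
    filled, mask, delta = [], [], []
    for row in series:
        observed = [not math.isnan(v) for v in row]
        last = [v if o else prev for v, o, prev in zip(row, observed, last)]
        since = [0 if o else s + 1 for o, s in zip(observed, since)]
        filled.append(list(last))
        mask.append([1 if o else 0 for o in observed])
        delta.append(list(since))
    return filled, mask, delta


# Toy record (made-up values): 3 time steps, 2 variables.
nan = float("nan")
filled, mask, delta = missingness_features([[1.0, nan], [nan, nan], [3.0, 5.0]])
```

In a setup like this, a sequence model such as a gated recurrent unit could receive the filled values alone (pathological features only) or the filled values concatenated with `mask` and `delta` (pathological plus missingness features), which is the kind of comparison the abstract describes.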

Results

Including missingness features generally improved model performance in retrospective tasks. The extent of improvement depended on the outcome of interest (gains in area under the receiver operating characteristic curve [AUROC] ranged from 1.2% to 7.7%) and even on the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, where they were outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite their leading to earlier detection of disease (true positives), because including these features also led to a concomitant rise in false positive detections.
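For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition in plain Python; the function name `auroc` and the toy labels and scores are made up for illustration and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are
# ranked correctly, so the AUROC is 0.75.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

Because AUROC depends only on how cases are ranked, a model that flags disease earlier can still score lower overall if the same features also raise the scores of many negative cases, which is consistent with the trade-off reported above.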

Conclusions

This study comprehensively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks such as length of stay prediction, where they present the greatest benefit. Although missingness features, which are representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.

SUBMITTER: Singh J 

PROVIDER: S-EPMC8701717 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6245501 | biostudies-other
| S-BSST733 | biostudies-other
| S-EPMC9030498 | biostudies-literature
| S-EPMC10901806 | biostudies-literature
| S-EPMC6354017 | biostudies-literature
| S-EPMC7433773 | biostudies-literature
| S-EPMC7528791 | biostudies-literature
| S-EPMC6474354 | biostudies-other
| S-EPMC7947498 | biostudies-literature
| S-EPMC10317605 | biostudies-literature