Dataset Information

A clustering approach for detecting implausible observation values in electronic health records data.

ABSTRACT:

Background

Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs.

Methods

The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data, as an alternative algorithmic solution to the existing procedures. Our approach is built upon two underlying hypotheses: (i) when there are a large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance.
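The two hypotheses can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain one-dimensional k-means is swapped in here, and the function names, the number of clusters `k`, the `min_share` sparsity cutoff, and the synthetic values are all assumptions made for illustration.

```python
from collections import Counter


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalar values; returns one cluster label per value.

    Assumption: k-means stands in for the unspecified clustering algorithm.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centers evenly spaced across the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each non-empty cluster's center to the mean of its members.
        new_centers = list(centers)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                new_centers[c] = sum(members) / len(members)
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def flag_sparse_clusters(values, k=5, min_share=0.01):
    """Flag every value that falls in a cluster holding less than min_share of the data.

    Both k and min_share are hypothetical tuning parameters.
    """
    labels = kmeans_1d(values, k)
    counts = Counter(labels)
    n = len(values)
    sparse = {c for c, cnt in counts.items() if cnt / n < min_share}
    return [lab in sparse for lab in labels]


# Synthetic example: 995 values in a narrow band plus 5 far-away values.
plausible = [3.5 + (i % 21) * 0.1 for i in range(995)]
implausible = [45.0, 46.0, 90.0, 91.0, 92.0]
flags = flag_sparse_clusters(plausible + implausible)
print(sum(flags), "of", len(flags), "values flagged")
```

In this sketch the dense band forms one large cluster and the far-away values end up in small clusters of their own, which is what marks them as candidates for review. In practice the choice of `k` and of the sparsity cutoff would need to be tuned per laboratory test.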

Results

We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, our proposed clustering approach resulted in a significantly smaller number of false positive cases.
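Sensitivity and specificity here are confusion matrix indices computed against plausibility thresholds. The sketch below shows one way such indices could be derived for a single test. The function name, the `low`/`high` interval, and the example values are hypothetical and are not taken from the study's silver-standard thresholds.

```python
def confusion_indices(values, flagged, low, high):
    """Compare algorithm flags with a reference plausibility interval [low, high].

    A value outside the interval counts as truly implausible (the reference label);
    `flagged` holds the algorithm's decision for each value.
    """
    tp = fp = tn = fn = 0
    for v, is_flagged in zip(values, flagged):
        implausible = v < low or v > high
        if is_flagged and implausible:
            tp += 1
        elif is_flagged and not implausible:
            fp += 1
        elif not is_flagged and implausible:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes so the ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical values, flags, and interval, for illustration only.
values = [4.0, 4.5, 5.0, 50.0, 60.0, 4.2]
flagged = [False, False, True, True, False, False]
print(confusion_indices(values, flagged, low=2.0, high=8.0))
```

Because plausible values vastly outnumber implausible ones in this setting, the false positive count is worth reporting alongside specificity: a small drop in specificity can still mean many records flagged in error.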

Conclusion

Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality. A workflow is therefore needed to complement the algorithm and to initiate the actions required to improve the quality of the data.

SUBMITTER: Estiri H

PROVIDER: S-EPMC6652024 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

A clustering approach for detecting implausible observation values in electronic health records data.

Hossein Estiri, Jeffrey G. Klann, Shawn N. Murphy

BMC Medical Informatics and Decision Making, 2019 Jul 23, issue 1

PMID: 31337390

Similar Datasets

Automated identification of implausible values in growth data from pediatric electronic health records.
| S-EPMC7651915 | biostudies-literature

Deep representation learning for clustering longitudinal survival data from electronic health records.
| S-EPMC11909183 | biostudies-literature

Detecting Miscoded Diabetes Diagnosis Codes in Electronic Health Records for Quality Improvement: Temporal Deep Learning Approach.
| S-EPMC7775195 | biostudies-literature

Unsupervised clustering of longitudinal clinical measurements in electronic health records.
| S-EPMC11478862 | biostudies-literature

Detecting Inappropriate Access to Electronic Health Records Using Collaborative Filtering.
| S-EPMC3967851 | biostudies-literature

Comorbidity analysis and clustering of endometriosis patients using electronic health records.
| S-EPMC11844609 | biostudies-literature

Comorbidity analysis and clustering of endometriosis patients using electronic health records.
| S-EPMC12432376 | biostudies-literature

Bayesian profiling multiple imputation for missing hemoglobin values in electronic health records.
| S-EPMC9600600 | biostudies-literature

Integrating cancer genomic data into electronic health records.
| S-EPMC5081968 | biostudies-literature

Learning About Missing Data Mechanisms in Electronic Health Records-based Research: A Survey-based Approach.
| S-EPMC4666800 | biostudies-literature