Dataset Information

A Data Element-Function Conceptual Model for Data Quality Checks.


ABSTRACT:

Introduction

Existing data quality (DQ) checks are represented in heterogeneous formats, which makes them difficult to compare, categorize, and index. This study contributes a data element-function conceptual model to facilitate the categorization and indexing of DQ checks, and explores the feasibility of using natural language processing (NLP) to acquire, at scale, knowledge of common data elements and functions from DQ check narratives.

Methods

The model defines a "data element", the primary focus of the check, and a "function", the qualitative or quantitative measure over a data element. We applied NLP techniques to extract both from 172 checks for Observational Health Data Sciences and Informatics (OHDSI) and 3,434 checks for Kaiser Permanente's Center for Effectiveness and Safety Research (CESR).
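As an illustration only, a check classified under this model can be thought of as a record holding a data element and a function. The sketch below is not the authors' implementation: the class name `ClassifiedCheck`, its field names, and the narrative strings are hypothetical, while the element and function labels (Person, Count, Medication, Missing) are taken from the pairings reported in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedCheck:
    """One DQ check described by the data element-function model (hypothetical class)."""
    narrative: str     # free-text description of the check (invented wording)
    data_element: str  # primary focus of the check
    function: str      # qualitative or quantitative measure over the data element


def pairing(check: ClassifiedCheck) -> tuple[str, str]:
    """Return the (data element, function) pair used to index a check."""
    return (check.data_element, check.function)


# Invented narratives; the labels mirror pairings named in the abstract.
checks = [
    ClassifiedCheck("Number of persons in the database", "Person", "Count"),
    ClassifiedCheck("Medication records with a missing value", "Medication", "Missing"),
]

pairings = [pairing(c) for c in checks]
print(pairings)
```

Representing each check as a pair in this way is what allows checks written in different formats to be compared and indexed on the same two axes.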

Results

The model was able to classify all checks. A total of 751 unique data elements and 24 unique functions were extracted. The five most frequent data element-function pairings for OHDSI were Person-Count (55 checks), Insurance-Distribution (17), Medication-Count (16), Condition-Count (14), and Observations-Count (13); for CESR, they were Medication-Variable Type (175), Medication-Missing (172), Medication-Existence (152), Medication-Count (127), and Socioeconomic Factors-Variable Type (114).
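A frequency ranking of this kind could be produced by tallying (data element, function) pairs once the checks have been classified. The following is a minimal sketch under that assumption, not the study's actual pipeline; the helper `top_pairings` is hypothetical and the toy input is invented for illustration, not drawn from the OHDSI or CESR check sets.

```python
from collections import Counter


def top_pairings(pairs, n=5):
    """Return the n most frequent (data element, function) pairs with their counts."""
    return Counter(pairs).most_common(n)


# Invented toy input: each tuple stands for one classified check.
toy_pairs = (
    [("Person", "Count")] * 3
    + [("Medication", "Count")] * 2
    + [("Insurance", "Distribution")]
)

top = top_pairings(toy_pairs, n=2)
print(top)
```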

Conclusions

This study shows the efficacy of the data element-function conceptual model for classifying DQ checks, demonstrates the early promise of NLP-assisted knowledge acquisition, and reveals great heterogeneity in the focus of DQ checks, confirming variation in both intrinsic checks and use-case-specific "fitness-for-use" checks.

SUBMITTER: Rogers JR 

PROVIDER: S-EPMC6484368 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature

Publications

A Data Element-Function Conceptual Model for Data Quality Checks.

Rogers JR, Callahan TJ, Kang T, Bauck A, Khare R, Brown JS, Kahn MG, Weng C

EGEMS (Washington, DC), 2019-04-23, 1


Similar Datasets

| S-EPMC5096987 | biostudies-literature
| S-EPMC5489162 | biostudies-literature
| S-EPMC7045642 | biostudies-literature
| S-EPMC3457925 | biostudies-literature
| S-EPMC7708573 | biostudies-literature
| S-EPMC4642215 | biostudies-literature
| S-EPMC4305458 | biostudies-literature
| S-EPMC2280010 | biostudies-literature
| S-EPMC6354027 | biostudies-other
| S-EPMC6380372 | biostudies-literature