Dataset Information

Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network.


ABSTRACT: Objective: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network. Materials and Methods: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules' performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients' quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records through a central broker that matches hash codes with high precision and reasonable recall. Results: We deployed the OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25% to 99.7% and a recall of 75.5%. With the tool, we deduplicated ∼3.5 million (out of ∼15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL by examining different disease profiles of the linked cohorts. Conclusions: Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can affect how an RL solution is constructed in a real-world setting. Nevertheless, RL is a significant task for improving data quality in a network so that reliable scientific discoveries can be drawn from these massive data resources.
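The abstract describes two steps: each partner turns combinations of quasi-identifiers into seeded one-way hash codes, and a central broker links and deduplicates records by matching those codes. The sketch below illustrates that flow under explicit assumptions. The field combinations in `RULES`, the `normalize` cleanup, SHA-256 as the hash, prefixing a shared salt as the "seed", and linking two records when they match on any single rule (with union-find supplying transitive closure) are all hypothetical choices for illustration; they are not the OneFL Deduper's actual rules or code.

```python
import hashlib

# Hypothetical linkage rules (illustrative only; not the OneFL Deduper's
# actual rules). Each rule names the quasi-identifiers that are normalized,
# concatenated, and hashed together.
RULES = {
    "rule_1": ("first_name", "last_name", "dob", "sex"),
    "rule_2": ("first_name", "last_name", "dob", "zip"),
}


def normalize(value: str) -> str:
    """Lowercase and keep only alphanumeric characters (a simplistic cleanup)."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def hash_record(record: dict, salt: str) -> dict:
    """Partner side: return {rule_name: SHA-256 hex digest} for each rule
    whose fields are all present. Only these digests leave the partner."""
    hashes = {}
    for rule_name, fields in RULES.items():
        parts = [normalize(record.get(field, "")) for field in fields]
        if not all(parts):
            continue  # skip a rule if any of its components is missing
        payload = salt + "|".join(parts)  # the salt plays the role of the "seed"
        hashes[rule_name] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return hashes


class UnionFind:
    """Disjoint-set structure so that linked records form transitive clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def broker_link(submissions):
    """Broker side: submissions is a list of (partner, local_id, {rule: digest}).
    Records sharing a digest under the same rule are linked (assumed OR logic
    across rules). Returns {(partner, local_id): cluster_id}."""
    uf = UnionFind()
    first_seen = {}  # (rule_name, digest) -> first record key carrying it
    for partner, local_id, hashes in submissions:
        key = (partner, local_id)
        uf.find(key)  # register the record even if it has no digests
        for rule_name, digest in hashes.items():
            token = (rule_name, digest)
            if token in first_seen:
                uf.union(first_seen[token], key)
            else:
                first_seen[token] = key
    # Assign compact cluster ids in order of first appearance.
    cluster_ids, result = {}, {}
    for partner, local_id, _ in submissions:
        root = uf.find((partner, local_id))
        result[(partner, local_id)] = cluster_ids.setdefault(root, len(cluster_ids))
    return result


SALT = "shared-secret-seed"  # hypothetical seed shared by partners, withheld from the broker
site_a = {"first_name": "Ana", "last_name": "Diaz", "dob": "1980-01-02", "sex": "F", "zip": "32601"}
site_b = {"first_name": "ana ", "last_name": "DIAZ", "dob": "1980-01-02", "sex": "f", "zip": "32608"}
site_c = {"first_name": "Bo", "last_name": "Lee", "dob": "1975-06-30", "sex": "M", "zip": "32601"}

submissions = [
    ("A", "a1", hash_record(site_a, SALT)),
    ("B", "b7", hash_record(site_b, SALT)),
    ("C", "c3", hash_record(site_c, SALT)),
]
links = broker_link(submissions)
print(links)  # {('A', 'a1'): 0, ('B', 'b7'): 0, ('C', 'c3'): 1}
```

Here records `a1` and `b7` differ in ZIP code but agree on the hypothetical `rule_1`, so the broker places them in the same cluster without ever seeing names or birth dates. Linking on any single rule favors recall; requiring every rule to match would be stricter and favor precision. Because quasi-identifiers have low entropy, the seed must stay secret from the broker, otherwise the digests could be reversed by dictionary enumeration; a keyed construction such as `hmac.new(key, msg, hashlib.sha256)` is a common alternative to prefixing a salt.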

SUBMITTER: Bian J 

PROVIDER: S-EPMC6994009 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network.

Jiang Bian, Alexander Loiacono, Andrei Sura, Tonatiuh Mendoza Viramontes, Gloria Lipori, Yi Guo, Elizabeth Shenkman, William Hogan

JAMIA Open, 27 Sep 2019, issue 4

Similar Datasets

| S-EPMC3932473 | biostudies-literature
| S-EPMC10161965 | biostudies-literature
| S-EPMC6180364 | biostudies-other
| S-EPMC9968283 | biostudies-literature
| S-EPMC8761329 | biostudies-literature
| S-EPMC11491627 | biostudies-literature
| S-EPMC8896632 | biostudies-literature
| S-EPMC9499274 | biostudies-literature
| S-EPMC9620597 | biostudies-literature
| S-EPMC7482515 | biostudies-literature