
Dataset Information


A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology.


ABSTRACT:

Objective

The disjointed healthcare system and the nonexistence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid record linkage method in a statewide surveillance system for congenital heart disease.

Materials and methods

Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic methods and 1 probabilistic RL method using first name, last name, social security number, date of birth, and house number were initially implemented independently and then sequentially in a hybrid approach to assess RL performance.
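As a rough illustration of how a sequential hybrid routine can be arranged, the sketch below runs an exact-match (deterministic) pass first and then applies a simple Fellegi-Sunter-style agreement score to pairs the deterministic rule did not link. The five field names mirror the identifiers listed above, but the deterministic rule, the m/u probabilities, and the threshold are hypothetical values chosen for illustration; they are not the rules or parameters used in the study.

```python
from itertools import combinations
from math import log2

FIELDS = ["first_name", "last_name", "ssn", "dob", "house_number"]

# Hypothetical (m, u) probabilities per field: m = P(agree | same person),
# u = P(agree | different people). Assumed values, not taken from the study.
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "ssn": (0.98, 0.0001),
    "dob": (0.97, 0.003),
    "house_number": (0.90, 0.02),
}

def deterministic_match(a, b):
    """Assumed rule: every field is present and agrees exactly."""
    return all(a[f] and a[f] == b[f] for f in FIELDS)

def probabilistic_score(a, b):
    """Sum of log2 agreement/disagreement weights over the fields."""
    score = 0.0
    for f in FIELDS:
        if not a[f] or not b[f]:
            continue  # a missing value contributes no evidence
        m, u = M_U[f]
        score += log2(m / u) if a[f] == b[f] else log2((1 - m) / (1 - u))
    return score

def hybrid_link(records, threshold=10.0):
    """Deterministic pass first; probabilistic scoring for the remaining pairs."""
    links = {}
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if deterministic_match(a, b):
            links[(i, j)] = "deterministic"
        elif probabilistic_score(a, b) >= threshold:
            links[(i, j)] = "probabilistic"
    return links

# Made-up records for demonstration only.
records = [
    {"first_name": "John", "last_name": "Smith", "ssn": "111223333", "dob": "2001-05-04", "house_number": "12"},
    {"first_name": "John", "last_name": "Smith", "ssn": "111223333", "dob": "2001-05-04", "house_number": "12"},
    {"first_name": "Jon", "last_name": "Smith", "ssn": "111223333", "dob": "2001-05-04", "house_number": "12"},
    {"first_name": "Maria", "last_name": "Lopez", "ssn": "999887777", "dob": "1998-11-30", "house_number": "7"},
]
links = hybrid_link(records)
```

In this toy data, the two identical records are linked by the deterministic pass, while the record with the misspelled first name is picked up only by the probabilistic pass, which is the kind of pair a purely deterministic routine would miss.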

Results

16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each pair consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine yielded 6451 linked pairs and 18%-24% more correctly linked pairs than the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.
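The blocking step and the evaluation metrics mentioned above can be sketched as follows. This is a minimal illustration, assuming records are dictionaries with hypothetical `last_name` and `gender` keys and that linked pairs are represented as sets of index tuples; the uppercase normalization is an added assumption, and the example data are made up rather than drawn from the study.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """First 3 characters of last name plus gender (uppercasing is an assumption)."""
    return (record["last_name"][:3].upper(), record["gender"])

def candidate_pairs(records):
    """Only records sharing a blocking key are compared with each other."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs

def precision_recall_f(predicted, truth):
    """Precision, recall, and F-measure (harmonic mean) over sets of linked pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Made-up records for demonstration only.
people = [
    {"last_name": "Johnson", "gender": "F"},
    {"last_name": "Johnston", "gender": "F"},
    {"last_name": "Johnson", "gender": "M"},
    {"last_name": "Smith", "gender": "F"},
]
pairs = candidate_pairs(people)
scores = precision_recall_f({(0, 1), (2, 3)}, {(0, 1), (1, 2)})
```

Blocking trades a small risk of missing true matches (for example, a last name mistyped in its first 3 characters) for far fewer pairwise comparisons, which is why recall is a natural metric to report alongside it.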

Discussion

The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any of the linkage techniques used independently.

Conclusion

When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.

SUBMITTER: Ong TC 

PROVIDER: S-EPMC7647290 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature


Publications

A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology.

Toan C Ong, Lindsey M Duca, Michael G Kahn, Tessa L Crume

Journal of the American Medical Informatics Association: JAMIA, 2020 Apr 1, issue 4



Similar Datasets

| S-EPMC5005943 | biostudies-literature
| S-EPMC3766252 | biostudies-literature
| S-EPMC6326114 | biostudies-literature
| S-EPMC9064948 | biostudies-literature
| S-EPMC6893033 | biostudies-literature
| S-EPMC6596957 | biostudies-literature
| S-EPMC8277618 | biostudies-literature
| S-EPMC2638543 | biostudies-literature
| S-EPMC8730440 | biostudies-literature
| S-EPMC7542414 | biostudies-literature