
Dataset Information


A guide to evaluating linkage quality for the analysis of linked data.


ABSTRACT: Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked dataset in the context of specific research questions. This is compounded by a lack of guidance on how to determine the potential impact of linkage error. We describe how linkage quality can be evaluated and provide widely applicable guidance for both data providers and researchers. Using an illustrative example of a linked dataset of maternal and baby hospital records, we demonstrate three approaches for evaluating linkage quality: applying the linkage algorithm to a subset of gold standard data to quantify linkage error; comparing characteristics of linked and unlinked data to identify potential sources of bias; and evaluating the sensitivity of results to changes in the linkage procedure. These approaches can inform our understanding of the potential impact of linkage error and provide an opportunity to select the most appropriate linkage procedure for a specific analysis. Evaluating linkage quality in this way will improve the quality and transparency of epidemiological and clinical research using linked data.
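
The first approach, quantifying linkage error on a gold-standard subset, comes down to comparing two sets of record pairs. The Python sketch below is illustrative only and is not the authors' code. It assumes that the links produced by the algorithm and the true matches in the gold standard are both available as sets of (mother record ID, baby record ID) pairs, restricted to records in the gold-standard subset. It uses one common definition of the false-match rate (wrong links as a share of all links) and the missed-match rate (true matches left unlinked as a share of all true matches). A second function illustrates the second approach by comparing the prevalence of a binary characteristic between linked and unlinked records. All identifiers, field names and the "preterm" characteristic are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pair representation: (mother_record_id, baby_record_id).
Pair = tuple[str, str]


@dataclass
class LinkageErrors:
    true_matches: int    # links that are also in the gold standard
    false_matches: int   # links that are not in the gold standard
    missed_matches: int  # gold-standard matches the algorithm failed to link

    @property
    def false_match_rate(self) -> float:
        # Share of all links made by the algorithm that are wrong.
        links = self.true_matches + self.false_matches
        return self.false_matches / links if links else 0.0

    @property
    def missed_match_rate(self) -> float:
        # Share of all true matches that the algorithm did not link.
        truth = self.true_matches + self.missed_matches
        return self.missed_matches / truth if truth else 0.0


def evaluate_against_gold_standard(links: set[Pair], gold: set[Pair]) -> LinkageErrors:
    """Classify algorithm links against gold-standard matches (assumed same record subset)."""
    return LinkageErrors(
        true_matches=len(links & gold),
        false_matches=len(links - gold),
        missed_matches=len(gold - links),
    )


def compare_linked_unlinked(records: list[dict], characteristic: str) -> tuple[float, float]:
    """Proportion of records with a binary characteristic, in linked vs unlinked records.

    Assumes each record is a dict with a boolean "linked" flag (hypothetical field).
    Returns NaN for a group with no records.
    """
    linked = [r for r in records if r["linked"]]
    unlinked = [r for r in records if not r["linked"]]

    def proportion(group: list[dict]) -> float:
        if not group:
            return float("nan")
        return sum(bool(r[characteristic]) for r in group) / len(group)

    return proportion(linked), proportion(unlinked)


if __name__ == "__main__":
    gold = {("m1", "b1"), ("m2", "b2"), ("m3", "b3"), ("m4", "b4")}
    links = {("m1", "b1"), ("m2", "b2"), ("m3", "b9")}  # one wrong link, two matches missed
    errors = evaluate_against_gold_standard(links, gold)
    print(f"false-match rate: {errors.false_match_rate:.3f}")
    print(f"missed-match rate: {errors.missed_match_rate:.3f}")

    records = [
        {"linked": True, "preterm": True},
        {"linked": True, "preterm": False},
        {"linked": False, "preterm": True},
    ]
    print(compare_linked_unlinked(records, "preterm"))
```

With four gold-standard matches and three links, two of them correct, the sketch reports a false-match rate of 1/3 and a missed-match rate of 1/2. A marked difference between the linked and unlinked proportions flags a potential source of bias but does not by itself measure its size; the third approach, re-running the analysis under alternative linkage procedures, addresses how much results actually change.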

SUBMITTER: Harron KL 

PROVIDER: S-EPMC5837697 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature


Publications

A guide to evaluating linkage quality for the analysis of linked data.

Katie L. Harron, James C. Doidge, Hannah E. Knight, Ruth E. Gilbert, Harvey Goldstein, David A. Cromwell, Jan H. van der Meulen

International Journal of Epidemiology, 2017 Oct 1, issue 5



Similar Datasets

| S-EPMC4015706 | biostudies-literature
| S-EPMC6945814 | biostudies-literature
| S-EPMC3688507 | biostudies-literature
| S-EPMC7197788 | biostudies-literature
| S-EPMC6696682 | biostudies-literature
| S-EPMC9198588 | biostudies-literature
| S-EPMC8445153 | biostudies-literature
| S-EPMC6949293 | biostudies-literature
| S-EPMC6096346 | biostudies-literature
| S-EPMC2234437 | biostudies-literature