Project description: Background: Linkage errors that occur according to linkage levels can adversely affect the accuracy and reliability of analysis results. This study aimed to identify the differences in results according to personally identifiable information linkage level, sample size, and analysis methods through empirical analysis. Methods: Two linkage levels were compared: indirectly identifiable information (III) linkage, based on name, date of birth, and sex, and directly identifiable information (DII) linkage, based on resident registration number. The datasets linked at each level were named databaseIII (DBIII) and databaseDII (DBDII), respectively. Considering the analysis results of the DII-linked dataset as the gold standard, descriptive statistics, group comparison, incidence estimation, treatment effect, and moderation effect analysis results were assessed. Results: The linkage rates for DBDII and DBIII were 71.1% and 99.7%, respectively. Regarding descriptive statistics and group comparison analysis, the difference in effect in most cases was "none" to "very little." With respect to cervical cancer, which had a relatively small sample size, analysis of DBIII resulted in an underestimation of the incidence in the control group and an overestimation of the incidence in the treatment group (hazard ratio [HR] = 2.62 [95% confidence interval (CI): 1.63-4.23] in DBIII vs. 1.80 [95% CI: 1.18-2.73] in DBDII). Regarding prostate cancer, there was a conflicting tendency, with the treatment effect being over- or underestimated according to the Surveillance, Epidemiology, and End Results (SEER) summary staging (HR = 2.27 [95% CI: 1.91-2.70] in DBIII vs. 1.92 [95% CI: 1.70-2.17] in DBDII for the localized stage; HR = 1.80 [95% CI: 1.37-2.36] in DBIII vs. 2.05 [95% CI: 1.67-2.52] in DBDII for the regional stage). Conclusions: To prevent distortion of the analysis results in health and medical research, it is important to check that the patient population and sample size by each factor of interest (FOI) are sufficient when different data are linked using DBDII. In cases involving a rare disease or with a small sample size for FOI, there is a high likelihood that a DII linkage is unavoidable.
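The contrast between the two linkage levels can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the records, the field names (`rrn`, `name`, `dob`, `sex`, `row`), and the `link` helper are hypothetical, and simple exact-match (deterministic) linkage is assumed. It shows how linking on name, date of birth, and sex can produce a false match that linking on a unique identifier avoids, with the DII result treated as the gold standard.

```python
from collections import defaultdict

def link(left, right, key_fields):
    """Deterministic linkage: pair records whose key fields all agree exactly."""
    index = defaultdict(list)
    for rec in right:
        index[tuple(rec[f] for f in key_fields)].append(rec)
    pairs = []
    for rec in left:
        for match in index.get(tuple(rec[f] for f in key_fields), []):
            pairs.append((rec["row"], match["row"]))
    return pairs

# Hypothetical toy records; "rrn" stands in for a resident registration number.
cohort = [
    {"row": "c1", "rrn": "A1", "name": "Kim Minji", "dob": "1970-01-01", "sex": "F"},
    {"row": "c2", "rrn": "A2", "name": "Kim Minji", "dob": "1970-01-01", "sex": "F"},
    {"row": "c3", "rrn": "A3", "name": "Lee Jiho", "dob": "1982-05-17", "sex": "M"},
]
registry = [
    {"row": "r1", "rrn": "A1", "name": "Kim Minji", "dob": "1970-01-01", "sex": "F"},
    {"row": "r2", "rrn": "A3", "name": "Lee Jiho", "dob": "1982-05-17", "sex": "M"},
]

# DII-level linkage (unique identifier) is treated as the gold standard.
dii_pairs = set(link(cohort, registry, ["rrn"]))
# III-level linkage uses name, date of birth, and sex only.
iii_pairs = set(link(cohort, registry, ["name", "dob", "sex"]))

false_matches = iii_pairs - dii_pairs   # linked under III but not under DII
missed_matches = dii_pairs - iii_pairs  # linked under DII but not under III

print("False matches:", sorted(false_matches))
print("Missed matches:", sorted(missed_matches))
```

In this toy data, two different people share the same name, date of birth, and sex, so the III key links both of them to one registry record. With few true cases, even one such false match can shift an incidence estimate noticeably.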
Project description: Background: Policy makers, clinicians and researchers are demonstrating increasing interest in using data linked from multiple sources to support measurement of clinical performance and patient health outcomes. However, the utility of data linkage may be compromised by sub-optimal or incomplete linkage, leading to systematic bias. In this study, we synthesize the evidence identifying participant or population characteristics that can influence the validity and completeness of data linkage and may be associated with systematic bias in reported outcomes. Methods: A narrative review using structured search methods was undertaken. The key words "data linkage" and the MeSH term "medical record linkage" were applied to the Medline, EMBASE and CINAHL databases between 1991 and 2007. Abstract inclusion criteria were: the article attempted an empirical evaluation of methodological issues relating to data linkage and reported on patient characteristics, the study design included analysis of matched versus unmatched records, and the report was in English. Included articles were grouped thematically according to patient characteristics that were compared between matched and unmatched records. Results: The search identified 1810 articles, of which 33 (1.8%) met inclusion criteria. There was marked heterogeneity in study methods and factors investigated. Characteristics that were unevenly distributed among matched and unmatched records were: age (72% of studies), sex (50% of studies), race (64% of studies), geographical/hospital site (93% of studies), socio-economic status (82% of studies) and health status (72% of studies). Conclusion: A number of relevant patient or population factors may be associated with incomplete data linkage, resulting in systematic bias in reported clinical outcomes. Readers should consider these factors in interpreting the reported results of data linkage studies.
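The comparison of matched versus unmatched records described in this review can be sketched as a per-group match rate. The example below is only illustrative: the records, the `site` field, the `matched` flag, and the `match_rate_by` helper are hypothetical and do not come from any of the reviewed studies.

```python
from collections import Counter

def match_rate_by(records, field):
    """Return the share of successfully linked records within each group of `field`."""
    totals, matched = Counter(), Counter()
    for rec in records:
        totals[rec[field]] += 1
        if rec["matched"]:
            matched[rec[field]] += 1
    return {group: matched[group] / totals[group] for group in totals}

# Hypothetical records: "matched" marks whether linkage succeeded.
records = [
    {"site": "urban", "matched": True},
    {"site": "urban", "matched": True},
    {"site": "urban", "matched": True},
    {"site": "urban", "matched": False},
    {"site": "rural", "matched": True},
    {"site": "rural", "matched": False},
    {"site": "rural", "matched": False},
    {"site": "rural", "matched": False},
]

rates = match_rate_by(records, "site")
for site, rate in sorted(rates.items()):
    print(f"{site}: {rate:.0%} matched")
```

A large gap between groups, as in this toy data, suggests that the linked dataset under-represents one group, which is the kind of uneven distribution the review associates with systematic bias.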
Project description: Introduction: Innovative data platforms (e.g. biobanks, repositories) continually emerge to facilitate data sharing. Extant and emerging data platforms must navigate myriad tensions for successful data sharing and re-use. Two Alberta data platforms navigated such processes and factors regarding administrative, research and nonprofit data: the Child & Youth Data Laboratory (CYDL) and Secondary Analysis to Generate Evidence (SAGE). Objectives: To clarify the social and policy factors that influenced CYDL and SAGE establishment and implementation, and the relationships, if any, between these factors and data type. Methods: This paper involves a qualitative secondary analysis of two developmental evaluations on CYDL and SAGE establishment. Six years post-implementation, the CYDL evaluation entailed document review; website user analysis; interviews (n=30); online stakeholder survey (n=260); and an environmental scan. One year post-implementation, the SAGE evaluation included 15 interviews and document review. We used thematic analysis and comparisons with the literature to identify key factors. Results: Three (not mutually exclusive) categories of social and policy factors influenced the navigation towards CYDL and SAGE realization: trusting relationships; sustainability amidst readiness; and privacy within social context. For these platforms to be able to manage, link or share data, trust had to be fostered and maintained across multiple, dynamic and intersecting relationships between primary data producers, data subjects, secondary users and institutions. Platform sustainability required capacity building and innovation. Privacy and information sharing evolved culturally and correspondingly for these data platforms, which required constant flexibility and awareness. Conclusions: This analysis calls for more empirical research on the value of data re-use or the detriment in not re-using data.
While the culture of information sharing is progressing towards greater openness and capacity for data sharing and re-use, successful data platforms must advocate, facilitate and mobilize analysis and innovation using data re-use while being cognizant of social and policy influences.
Project description: Introduction: Analysis of linked health data can generate important, even life-saving, insights into population health. Yet obstacles both legal and organisational in nature can impede this work. Approach: We focus on three UK infrastructures set up to link and share data for research: the Administrative Data Research Network, NHS Digital, and the Secure Anonymised Information Linkage Databank. Bringing an interdisciplinary perspective, we identify key issues underpinning their challenges and successes in linking health data for research. Results: We identify examples of uncertainty surrounding legal powers to share and link data, and around data protection obligations, as well as systemic delays and historic public backlash. These issues require updated official guidance on the relevant law, approaches to linkage which are planned for impact and ongoing utility, greater transparency between data providers and researchers, and engagement with the patient population which is both high-profile and carefully considered. Conclusions: Health data linkage for research presents varied challenges, to which there can be no single solution. Our recommendations would require action from a number of data providers and regulators to be meaningfully advanced. This illustrates the scale and complexity of the challenge of health data linkage, in the UK and beyond: a challenge which our case studies suggest no single organisation can combat alone. Planned programmes of linkage are critical because they allow time for organisations to address these challenges without adversely affecting the feasibility of individual research projects.
Project description: Background: The SARS-CoV-2 pandemic has highlighted once more the great need for comprehensive access to, and uncomplicated use of, pre-existing patient data for medical research. Enabling secondary research-use of patient-data is a prerequisite for the efficient and sustainable promotion of translation and personalisation in medicine, and for the advancement of public-health. However, balancing the legitimate interests of scientists in broad and unrestricted data-access and the demand for individual autonomy, privacy and social justice is a great challenge for patient-based medical research. Methods: We therefore conducted two questionnaire-based surveys among North-German outpatients (n = 650) to determine their attitude towards data-donation for medical research, implemented as an opt-out-process. Results: We observed a high level of acceptance (75.0%); the most powerful predictor of a positive attitude towards data-donation was the conviction that every citizen has a duty to contribute to the improvement of medical research (> 80% of participants approving data-donation). Interestingly, patients distinguished sharply between research inside and outside the EU. Despite a general awareness that universities and public research institutions cooperate with commercial companies, willingness to allow use of donated data by the latter was very low (7.1% to 29.1%, depending upon location of company). The most popular measures among interviewees to counteract reservations against commercial data-use were regulation by law (61.4%) and stipulating in the process that data are not sold or resold (84.6%). A majority requested control of both the use (46.8%) and the protection (41.5%) of the data by independent bodies. Conclusions: Data-donation for medical research, implemented as a combination of legal entitlement and easy-to-exercise right to opt out, was found to be widely supported by German patients and therefore warrants further consideration for a transposition into national law.
Project description: There is an ongoing challenge as to how best to manage and understand 'big data' in precision medicine settings. This paper describes the potential for a Linked Data approach, using a Resource Description Framework (RDF) model, to combine multiple datasets with temporal and spatial elements of varying dimensionality. This "AVERT model" provides a framework for converting multiple standalone files of various formats, from both clinical and environmental settings, into a single data source. This data source can thereafter be queried effectively, shared with outside parties, and more easily understood by multiple stakeholders, using standardized vocabularies, incorporating provenance metadata and supporting temporo-spatial reasoning. The approach has further advantages in terms of data sharing, security and subsequent analysis. We use a case study relating to anti-Glomerular Basement Membrane (GBM) disease, a rare autoimmune condition, to illustrate a technical proof of concept for the AVERT model.
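The general idea of converting standalone files into a single triple-based data source can be sketched in plain Python. This is not the AVERT model itself: the rows, the `ex:` prefix, the field names, and the `rows_to_triples` and `match` helpers are all hypothetical, and a set of tuples stands in for a real RDF store. The only standard vocabulary term used is `prov:wasDerivedFrom` from PROV-O, here attached to each subject as simple provenance metadata.

```python
def rows_to_triples(rows, id_field, source):
    """Turn tabular rows into (subject, predicate, object) triples with provenance."""
    triples = set()
    for row in rows:
        subject = f"ex:{row[id_field]}"
        # Record which source file the subject came from.
        triples.add((subject, "prov:wasDerivedFrom", source))
        for field, value in row.items():
            if field != id_field:
                triples.add((subject, f"ex:{field}", value))
    return triples

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

# Hypothetical rows from a clinical file and an environmental file.
clinical_rows = [{"id": "visit1", "date": "2015-03-02", "region": "D01"}]
environment_rows = [{"id": "obs1", "date": "2015-03-02", "region": "D01", "pm10": "41.5"}]

# Merge both sources into one graph.
graph = rows_to_triples(clinical_rows, "id", "clinical.csv")
graph |= rows_to_triples(environment_rows, "id", "environment.csv")

# Find every subject, from either source, located in region D01.
in_region = {s for s, _, _ in match(graph, p="ex:region", o="D01")}
print(sorted(in_region))
```

Because both files end up in the same triple shape, one pattern query spans clinical and environmental records, and the provenance triples still show where each subject originated. A production system would more likely use a dedicated RDF library and shared ontologies than this hand-rolled structure.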
Project description: Introduction: Data linkage for health research purposes enables countless new research questions to be answered and is said to be cost-effective and less intrusive than other means of data collection. Nevertheless, health researchers are currently dealing with a complicated, fragmented, and inconsistent regulatory landscape with regard to the processing of data, and progress in health research is hindered. Aim: We designed a qualitative study to assess what different stakeholders perceive as ethical and legal obstacles to data linkage for health research purposes, and how these obstacles could be overcome. Methods: Two focus groups and eighteen semi-structured in-depth interviews were held to collect opinions and insights of various stakeholders. An inductive thematic analysis approach was used to identify overarching themes. Results: This study showed that the participating stakeholders experience the ambiguity regarding the 'correct' interpretation of the law, the fragmentation of policies governing the processing of personal health data, and the demanding nature of legal requirements as impediments to data linkage for research purposes. To remove or reduce these obstacles, authoritative interpretations of the laws and regulations governing data linkage should be issued. The participants furthermore encouraged the harmonisation of data linkage policies, the promotion of trust and transparency, and the enhancement of technical and organisational measures. Lastly, there is a demand for legislative and regulatory modifications amongst the participants. Conclusions: To overcome the obstacles in data linkage for scientific research purposes, perhaps we should shift the focus from adapting the current laws and regulations governing data linkage, or even designing completely new laws, towards creating a more thorough understanding of the law and making better use of the flexibilities within the existing legislation.
Important steps in achieving this shift could be clarification of the legal provisions governing data linkage by issuing authoritative interpretations, as well as the strengthening of ethical-legal oversight bodies.