Dataset Information

Identifying clinical features in primary care electronic health record studies: methods for codelist development.

ABSTRACT: OBJECTIVE: Analysis of routinely collected electronic health record (EHR) data from primary care relies on the creation of codelists to define clinical features of interest. To improve scientific rigour, transparency and replicability, we describe and demonstrate a standardised, reproducible methodology for clinical codelist development. DESIGN: We describe a three-stage process for developing clinical codelists. First, the clinical feature of interest is clearly defined a priori using reliable clinical resources. Second, a list of potential codes is developed using statistical software to search all available codes comprehensively. Third, a modified Delphi process is used to reach consensus between primary care practitioners on the most relevant codes, including the generation of an 'uncertainty' variable to allow sensitivity analysis. SETTING: These methods are illustrated by developing a codelist for shortness of breath in a primary care EHR sample, including modifiable syntax for commonly used statistical software. PARTICIPANTS: The codelist was used to estimate the frequency of shortness of breath in a cohort of 28 216 patients aged over 18 years who received an incident diagnosis of lung cancer between 1 January 2000 and 30 November 2016 in the Clinical Practice Research Datalink (CPRD). RESULTS: Of 78 candidate codes, 29 were excluded as inappropriate. Complete agreement was reached for 44 (90%) of the remaining codes, with partial disagreement over 5 (10%). In total, 13 091 episodes of shortness of breath were identified in the cohort of 28 216 patients. Sensitivity analysis demonstrates that the codes with the greatest uncertainty tend to be rarely used in clinical practice. CONCLUSIONS: Although initially time-consuming, using a rigorous and reproducible method for codelist generation 'future-proofs' findings, and an auditable, modifiable syntax for codelist generation enables sharing and replication of EHR studies.
Published codelists should be badged by quality and report the methods of codelist generation including: definitions and justifications associated with each codelist; the syntax or search method; the number of candidate codes identified; and the categorisation of codes after Delphi review.
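The second and third stages of the codelist method can be illustrated with a short Python sketch. This is not the authors' published syntax (the paper supplies syntax for commonly used statistical software); the code dictionary entries, search patterns, reviewer votes and the simple "any disagreement means uncertain" rule are assumptions made for the example, and a real modified Delphi process would involve more than one round.

```python
import re
from dataclasses import dataclass


@dataclass
class Code:
    code: str         # clinical code identifier (hypothetical values below)
    description: str  # free-text term attached to the code


def find_candidates(dictionary, include_terms, exclude_terms=()):
    """Stage 2: case-insensitive search of every code description.

    A code becomes a candidate if at least one include pattern matches its
    description and no exclude pattern does.
    """
    inc = [re.compile(t, re.IGNORECASE) for t in include_terms]
    exc = [re.compile(t, re.IGNORECASE) for t in exclude_terms]
    return [c for c in dictionary
            if any(p.search(c.description) for p in inc)
            and not any(p.search(c.description) for p in exc)]


def categorise(votes):
    """Stage 3: combine reviewers' 'include'/'exclude' votes per code.

    Unanimous codes keep the shared vote; any disagreement is labelled
    'uncertain' so the code can be varied in a sensitivity analysis.
    """
    return {code: v[0] if len(set(v)) == 1 else "uncertain"
            for code, v in votes.items()}


def codelist(categories, with_uncertain=False):
    """Main analysis uses 'include' codes; sensitivity analysis adds 'uncertain'."""
    keep = {"include", "uncertain"} if with_uncertain else {"include"}
    return sorted(c for c, cat in categories.items() if cat in keep)


# Hypothetical dictionary entries (not real Read or SNOMED CT codes).
dictionary = [
    Code("A1", "Shortness of breath"),
    Code("A2", "Breathlessness on exertion"),
    Code("A3", "Dyspnoea"),
    Code("A4", "No shortness of breath"),
    Code("A5", "Chest pain"),
]
candidates = find_candidates(
    dictionary,
    include_terms=[r"short\w* of breath", r"breathless", r"dyspn"],
    exclude_terms=[r"^no\b"],  # drop negated entries such as "No shortness..."
)
# Hypothetical votes from two reviewers on each candidate code.
votes = {"A1": ["include", "include"],
         "A2": ["include", "include"],
         "A3": ["include", "exclude"]}
categories = categorise(votes)
main_list = codelist(categories)                              # ['A1', 'A2']
sensitivity_list = codelist(categories, with_uncertain=True)  # ['A1', 'A2', 'A3']
```

Keeping the search patterns and the vote table as plain data makes the codelist auditable: both can be published alongside the final list, in line with the reporting items recommended above.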

SUBMITTER: Watson J

PROVIDER: S-EPMC5719324 | biostudies-other | 2017 Nov

REPOSITORIES: biostudies-other

Publications

Identifying clinical features in primary care electronic health record studies: methods for codelist development.

Jessica Watson, Brian D Nicholson, Willie Hamilton, Sarah Price

BMJ Open, 22 November 2017 (issue 11)

PMID: 29170293

Similar Datasets

Project description: Background: Frailty is an especially problematic expression of population ageing. International guidelines recommend routine identification of frailty to provide evidence-based treatment, but currently available tools require additional resources. Objectives: To develop and validate an electronic frailty index (eFI) using routinely available primary care electronic health record data. Study design and setting: Retrospective cohort study. Development and internal validation cohorts were established using a randomly split sample of the ResearchOne primary care database. An external validation cohort was established using the THIN database. Participants: Patients aged 65-95, registered with a ResearchOne or THIN practice on 14 October 2008. Predictors: We constructed the eFI using the cumulative deficit frailty model as our theoretical framework. The eFI score is calculated from the presence or absence of individual deficits as a proportion of the total possible. Categories of fit, mild, moderate and severe frailty were defined using population quartiles. Outcomes: Outcomes were 1-, 3- and 5-year mortality, hospitalisation and nursing home admission. Statistical analysis: Hazard ratios (HRs) were estimated using bivariate and multivariate Cox regression analyses. Discrimination was assessed using receiver operating characteristic (ROC) curves. Calibration was assessed using pseudo-R² estimates. Results: We include data from a total of 931,541 patients. The eFI incorporates 36 deficits constructed using 2,171 CTV3 codes. One-year adjusted HR for mortality was 1.92 (95% CI 1.81-2.04) for mild frailty, 3.10 (95% CI 2.91-3.31) for moderate frailty and 4.52 (95% CI 4.16-4.91) for severe frailty.
Corresponding estimates for hospitalisation were 1.93 (95% CI 1.86-2.01), 3.04 (95% CI 2.90-3.19) and 4.73 (95% CI 4.43-5.06), and for nursing home admission were 1.89 (95% CI 1.63-2.15), 3.19 (95% CI 2.73-3.73) and 4.76 (95% CI 3.92-5.77), with good to moderate discrimination but low calibration estimates. Conclusions: The eFI uses routine data to identify older people with mild, moderate and severe frailty, with robust predictive validity for outcomes of mortality, hospitalisation and nursing home admission. Routine implementation of the eFI could enable delivery of evidence-based interventions to improve outcomes for this vulnerable group.
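The cumulative deficit score described for the eFI can be sketched in a few lines of Python. This is a minimal illustration, not the published eFI implementation: the code-to-deficit lookup and the cohort scores are hypothetical, and the quartile cut points are computed here with Python's `statistics.quantiles`, which may not match the interpolation method the authors used.

```python
from bisect import bisect_right
from statistics import quantiles

N_DEFICITS = 36  # the eFI described above comprises 36 deficits

LABELS = ["fit", "mild", "moderate", "severe"]


def deficits_present(patient_codes, code_to_deficit):
    """Map a patient's recorded codes to the set of deficits they indicate.

    `code_to_deficit` stands in for the eFI's code-to-deficit lookup; the
    entries used below are hypothetical.
    """
    return {code_to_deficit[c] for c in patient_codes if c in code_to_deficit}


def efi_score(deficits, n_deficits=N_DEFICITS):
    """Cumulative deficit score: deficits present / total possible deficits."""
    return len(deficits) / n_deficits


def quartile_cutpoints(scores):
    """Three cut points splitting the population's scores into quartiles."""
    return quantiles(scores, n=4)


def frailty_category(score, cuts):
    """Assign fit/mild/moderate/severe; a score equal to a cut point goes up."""
    return LABELS[bisect_right(cuts, score)]


# Hypothetical lookup: several codes can point at the same deficit.
code_to_deficit = {"X01": "falls", "X02": "falls",
                   "Y10": "heart failure", "Z20": "hypertension"}
patient = ["X01", "X02", "Y10", "Q99"]          # Q99 maps to no deficit
d = deficits_present(patient, code_to_deficit)  # deficits: falls, heart failure
score = efi_score(d)                            # 2 / 36

# Hypothetical cohort scores used to derive the quartile cut points.
population_scores = [k / N_DEFICITS for k in (0, 2, 4, 6, 9, 12, 15)]
cuts = quartile_cutpoints(population_scores)
category = frailty_category(score, cuts)
```

Because each deficit is counted once however many of its codes are recorded, the quality of the underlying codelists directly determines which patients reach each category.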

Project description: Objective: To identify observational studies that used data from more than one primary care electronic health record (EHR) database, and to summarise key characteristics including: objective and rationale for using multiple data sources; methods used to manage, analyse and (where applicable) combine data; and approaches used to assess and report heterogeneity between data sources. Design: A systematic review of published studies. Data sources: PubMed and Embase databases were searched using a list of named primary care EHR databases; the reference lists of studies retained after initial screening were hand-searched as a supplement. Study selection: Observational studies published between January 2000 and May 2018 that included at least two different primary care EHR databases were selected. Results: 6054 studies were identified from database and hand searches, and 109 were included in the final review, the majority published between 2014 and 2018. Included studies used 38 different primary care EHR data sources. Forty-seven studies (44%) were descriptive or methodological. Of 62 analytical studies, 22 (36%) presented separate results from each database, with no attempt to combine them; 29 (48%) combined individual patient data in a one-stage meta-analysis and 21 (34%) combined estimates from each database using two-stage meta-analysis. Discussion and exploration of heterogeneity was inconsistent across studies. Conclusions: Comparing patterns and trends in different populations, or in different primary care EHR databases from the same populations, is important and a common objective for multi-database studies. When combining results from several databases using meta-analysis, provision of separate results from each database is helpful for interpretation. We found that these were often missing, particularly for studies using one-stage approaches, which also often lacked details of any statistical adjustment for heterogeneity and/or clustering.
For two-stage meta-analysis, a clear rationale should be provided for the choice of fixed-effect and/or random-effects or other models.
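To make the two-stage approach concrete, the sketch below shows its second stage in Python: each database contributes one estimate (for example a log hazard ratio) and its standard error, and these are pooled by inverse-variance weighting. The fixed-effect pool, Cochran's Q, I² and the DerSimonian-Laird random-effects estimator are standard textbook choices used here for illustration; the review itself does not prescribe them, and the per-database numbers are hypothetical.

```python
import math


def _weights(ses, tau2=0.0):
    """Inverse-variance weights 1 / (se^2 + tau^2)."""
    return [1.0 / (s ** 2 + tau2) for s in ses]


def _pool(estimates, weights):
    """Weighted mean of the estimates and its standard error."""
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def fixed_effect(estimates, ses):
    """Stage 2, fixed effect: inverse-variance weighted mean and its SE."""
    return _pool(estimates, _weights(ses))


def heterogeneity(estimates, ses):
    """Cochran's Q and the I^2 statistic for between-database heterogeneity."""
    w = _weights(ses)
    pooled, _ = _pool(estimates, w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def dersimonian_laird(estimates, ses):
    """Stage 2, random effects (DerSimonian-Laird moment estimator of tau^2)."""
    w = _weights(ses)
    q, _ = heterogeneity(estimates, ses)
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    pooled, se = _pool(estimates, _weights(ses, tau2))
    return pooled, se, tau2


# Stage 1 output: one log hazard ratio and SE per database (hypothetical).
per_database = {"database A": (0.10, 0.10), "database B": (0.30, 0.10)}
est = [e for e, _ in per_database.values()]
se = [s for _, s in per_database.values()]
fe_pooled, fe_se = fixed_effect(est, se)
q, i2 = heterogeneity(est, se)
re_pooled, re_se, tau2 = dersimonian_laird(est, se)
```

Keeping the stage 1 inputs (`per_database`) alongside the pooled output makes it straightforward to report separate results from each database, which the review found were often missing.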

Project description: BACKGROUND: The increased use of electronic medical records (EMRs) in Canadian primary health care practice has resulted in an expansion of the availability of EMR data. Potential users of these data need to understand their quality in relation to the uses to which they are applied. Herein, we propose a basic model for assessing primary health care EMR data quality, comprising a set of data quality measures within four domains. We describe the process of developing and testing this set of measures, share the results of applying these measures in three EMR-derived datasets, and discuss what this reveals about the measures and EMR data quality. The model is offered as a starting point from which data users can refine their own approach, based on their own needs. METHODS: Using an iterative process, measures of EMR data quality were created within four domains: comparability; completeness; correctness; and currency. We used a series of process steps to develop the measures. The measures were then operationalized and tested within three datasets created from different EMR software products. RESULTS: A set of eleven final measures was created. We were not able to calculate results for several measures in one dataset because of the way the data were collected in that specific EMR. Overall, we found variability in the results of testing the measures (e.g. sensitivity values were highest for diabetes, and lowest for obesity), among datasets (e.g. recording of height), and by patient age and sex (e.g. recording of blood pressure, height and weight). CONCLUSIONS: This paper proposes a basic model for assessing primary health care EMR data quality. We developed and tested multiple measures of data quality, within four domains, in three different EMR-derived primary health care datasets. The results of testing these measures indicated that not all measures could be utilized in all datasets, and illustrated variability in data quality.
This is one step forward in creating a standard set of measures of data quality. Nonetheless, each project has unique challenges, and therefore requires its own data quality assessment before proceeding.
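Two of the kinds of measure mentioned above, completeness of recording and sensitivity of a case definition, can be sketched in Python as follows. The exact definitions used in that work are not given here, so this is an illustration under stated assumptions: sensitivity is taken as the share of patients positive on an assumed reference standard who are also flagged in the EMR, and the patient records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    sex: str
    height_cm: Optional[float]  # None when height was never recorded
    emr_diabetes: bool          # flagged by the EMR case definition
    ref_diabetes: bool          # flagged by an assumed reference standard


def completeness(patients, field):
    """Share of patients with a non-missing value for `field`."""
    if not patients:
        raise ValueError("no patients")
    return sum(getattr(p, field) is not None for p in patients) / len(patients)


def sensitivity(patients, flag, reference):
    """Share of reference-positive patients that the EMR flag also identifies."""
    positives = [p for p in patients if getattr(p, reference)]
    if not positives:
        raise ValueError("no reference-positive patients")
    return sum(bool(getattr(p, flag)) for p in positives) / len(positives)


def stratified(patients, key, measure, *args):
    """Apply `measure` separately within each group defined by `key`."""
    groups = defaultdict(list)
    for p in patients:
        groups[key(p)].append(p)
    return {g: measure(members, *args) for g, members in groups.items()}


# Hypothetical records for illustration only.
patients = [
    Patient("F", 160.0, True, True),
    Patient("F", 165.0, False, True),
    Patient("M", None, False, False),
    Patient("M", None, True, True),
]
overall = completeness(patients, "height_cm")                      # 0.5
by_sex = stratified(patients, lambda p: p.sex, completeness, "height_cm")
# by_sex == {'F': 1.0, 'M': 0.0}
sens = sensitivity(patients, "emr_diabetes", "ref_diabetes")       # 2/3
```

Stratifying by a key such as sex or age band is one way to surface the kind of variation by patient characteristics that the authors report.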
