Dataset Information

Challenges in risk estimation using routinely collected clinical data: The example of estimating cervical cancer risks from electronic health-records.

ABSTRACT: Electronic health-records (EHR) are increasingly used by epidemiologists studying disease following surveillance testing to provide evidence for screening intervals and referral guidelines. Although cost-effective, undiagnosed prevalent disease and interval censoring (in which asymptomatic disease is only observed at the time of testing) raise substantial analytic issues when estimating risk that cannot be addressed using Kaplan-Meier methods. Based on our experience analysing EHR from cervical cancer screening, we previously proposed the logistic-Weibull model to address these issues. Here we demonstrate how the choice of statistical method can impact risk estimates. We use observed data on 41,067 women in the cervical cancer screening program at Kaiser Permanente Northern California, 2003-2013, as well as simulations to evaluate the ability of different methods (Kaplan-Meier, Turnbull, Weibull and logistic-Weibull) to accurately estimate risk within a screening program. Cumulative risk estimates from the statistical methods varied considerably, with the largest differences occurring for prevalent disease risk when baseline disease ascertainment was random but incomplete. Kaplan-Meier underestimated risk at earlier times and overestimated risk at later times in the presence of interval censoring or undiagnosed prevalent disease. Turnbull performed well, though was inefficient and not smooth. The logistic-Weibull model performed well, except when event times didn't follow a Weibull distribution. We have demonstrated that methods for right-censored data, such as Kaplan-Meier, result in biased estimates of disease risks when applied to interval-censored data, such as screening programs using EHR data. The logistic-Weibull model is attractive, but the model fit must be checked against Turnbull non-parametric risk estimates.
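As a minimal sketch of the idea behind the logistic-Weibull model named in the abstract, the following assumes the usual prevalence-incidence mixture form: a logistic component gives the probability of disease already present at baseline, and a Weibull distribution describes the time to incident disease among everyone else. The function name, parameter names, and example values are hypothetical and are not taken from the paper.

```python
import math


def logistic_weibull_risk(t, prevalence_logit, shape, scale):
    """Cumulative risk by time t under an assumed prevalence-incidence mixture.

    prevalence_logit: log-odds of prevalent disease at baseline (hypothetical parameter)
    shape, scale:     Weibull parameters for incident disease (hypothetical parameters)
    """
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0 and shape, scale must be > 0")
    # Logistic part: probability that disease is already present at time 0.
    prevalence = 1.0 / (1.0 + math.exp(-prevalence_logit))
    # Weibull part: probability that incident disease has occurred by time t.
    incident_cdf = 1.0 - math.exp(-((t / scale) ** shape))
    # Mixture: prevalent cases count at every t; incident cases accrue over time.
    return prevalence + (1.0 - prevalence) * incident_cdf


if __name__ == "__main__":
    # Illustrative values only, not estimates from the study.
    for years in (0, 1, 3, 5):
        print(years, round(logistic_weibull_risk(years, -4.0, 1.2, 60.0), 4))
```

Under this form the risk at time zero equals the prevalent-disease probability rather than zero, which is the feature a Kaplan-Meier curve for right-censored data cannot represent.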

SUBMITTER: Landy R

PROVIDER: S-EPMC5930038 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature

Publications

Challenges in risk estimation using routinely collected clinical data: The example of estimating cervical cancer risks from electronic health-records.

Rebecca Landy, Li C. Cheung, Mark Schiffman, Julia C. Gage, Noorie Hyun, Nicolas Wentzensen, Walter K. Kinney, Philip E. Castle, Barbara Fetterman, Nancy E. Poitras, Thomas Lorey, Peter D. Sasieni, Hormuzd A. Katki

Preventive Medicine, 2017-12-06

PMID: 29222045

Similar Datasets

Project description: Background: Routinely-collected healthcare data provide a valuable resource for epidemiological research. Validation studies have shown that for most conditions, simple lists of clinical codes can reliably be used for case finding in primary care; however, studies exploring the robustness of this approach are lacking for diseases such as idiopathic pulmonary fibrosis (IPF), which are largely managed in secondary care. Method: Using the UK's Clinical Practice Research Datalink (CPRD) Aurum dataset, which comprises patient-level primary care records linked to national hospital admissions and cause-of-death data, we compared the positive predictive value (PPV) of eight diagnostic algorithms. Algorithms were developed based on the literature and IPF diagnostic guidelines using combinations of clinical codes in primary and secondary care (SNOMED-CT or ICD-10) with/without additional information. The PPV was estimated for each algorithm using the death record as the gold standard. Utilization of the reviewed codes across the study period was observed to evaluate any change in coding practices over time. Result: A total of 17,559 individuals had at least one record indicative of IPF in one or more of our three linked datasets between 2008 and 2018. The PPV of case-finding algorithms based on clinical codes alone ranged from 64.4% (95% CI: 63.3-65.3) for a "broad" codeset to 74.9% (95% CI: 72.8-76.9) for a "narrow" codeset comprising highly-specific codes. Adding confirmatory evidence, such as a CT scan, increased the PPV of our narrow code-based algorithm to 79.2% (95% CI: 76.4-81.8) but reduced the sensitivity to under 10%. Adding evidence of hospitalisation to the standalone code-based algorithms also improved PPV (PPV = 78.4% vs. 64.4%; sensitivity = 53.5% vs. 38.1%). IPF coding practices changed over time, with the increased use of specific IPF codes. Conclusion: High diagnostic validity was achieved by using a restricted set of IPF codes. While adding confirmatory evidence increased diagnostic accuracy, the benefits of this approach need to be weighed against the inevitable loss of sample size and convenience. We would recommend use of an algorithm based on a broader IPF code set coupled with evidence of hospitalisation.
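For readers unfamiliar with the metric, a PPV and an approximate 95% confidence interval can be computed from counts of confirmed and unconfirmed cases, as in the sketch below. The Wilson score interval is an assumption chosen for illustration; the description above does not say which interval method the authors used, and the function name and example counts are hypothetical.

```python
import math


def ppv_with_wilson_ci(true_pos, false_pos, z=1.96):
    """Return (ppv, lower, upper) using a Wilson score interval.

    true_pos:  algorithm-positive cases confirmed by the gold standard
    false_pos: algorithm-positive cases not confirmed by the gold standard
    z:         normal quantile (1.96 gives roughly a 95% interval)
    """
    n = true_pos + false_pos
    if n <= 0:
        raise ValueError("need at least one algorithm-positive case")
    ppv = true_pos / n
    denom = 1.0 + z * z / n
    centre = (ppv + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(ppv * (1.0 - ppv) / n + z * z / (4.0 * n * n)) / denom
    return ppv, centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts for illustration, not data from the study.
    ppv, lower, upper = ppv_with_wilson_ci(true_pos=749, false_pos=251)
    print(f"PPV = {ppv:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```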

Project description: Importance: Inpatient violence remains a significant problem despite existing risk assessment methods. The lack of robustness and the high degree of effort needed to use current methods might be mitigated by using routinely registered clinical notes. Objective: To develop and validate a multivariable prediction model for assessing inpatient violence risk based on machine learning techniques applied to clinical notes written in patients' electronic health records. Design, Setting, and Participants: This prognostic study used retrospective clinical notes registered in electronic health records during admission at 2 independent psychiatric health care institutions in the Netherlands. No exclusion criteria for individual patients were defined. At site 1, all adults admitted between January 2013 and August 2018 were included, and at site 2 all adults admitted to general psychiatric wards between June 2016 and August 2018 were included. Data were analyzed between September 2018 and February 2019. Main Outcomes and Measures: Predictive validity and generalizability of prognostic models measured using area under the curve (AUC). Results: Clinical notes recorded during a total of 3189 admissions of 2209 unique individuals at site 1 (mean [SD] age, 34.0 [16.6] years; 1536 [48.2%] male) and 3253 admissions of 1919 unique individuals at site 2 (mean [SD] age, 45.9 [16.6] years; 2097 [64.5%] male) were analyzed. Violent outcome was determined using the Staff Observation Aggression Scale-Revised. Nested cross-validation was used to train and evaluate models that assess violence risk during the first 4 weeks of admission based on clinical notes available after 24 hours. The predictive validity of models was measured at site 1 (AUC = 0.797; 95% CI, 0.771-0.822) and site 2 (AUC = 0.764; 95% CI, 0.732-0.797). The validation of pretrained models in the other site resulted in AUCs of 0.722 (95% CI, 0.690-0.753) at site 1 and 0.643 (95% CI, 0.610-0.675) at site 2; the difference in AUCs between the internally trained model and the model trained on other-site data was significant at site 1 (AUC difference = 0.075; 95% CI, 0.045-0.105; P < .001) and site 2 (AUC difference = 0.121; 95% CI, 0.085-0.156; P < .001). Conclusions and Relevance: Internally validated predictions resulted in AUC values with good predictive validity, suggesting that automatic violence risk assessment using routinely registered clinical notes is possible. The validation of trained models using data from other sites corroborates previous findings that violence risk assessment generalizes modestly to different populations.
