Dataset Information

ACCOUNTING FOR DEPENDENT ERRORS IN PREDICTORS AND TIME-TO-EVENT OUTCOMES USING ELECTRONIC HEALTH RECORDS, VALIDATION SAMPLES, AND MULTIPLE IMPUTATION.

ABSTRACT: Data from electronic health records (EHR) are prone to errors, which are often correlated across multiple variables. The error structure is further complicated when analysis variables are derived as functions of two or more error-prone variables. Such errors can substantially impact estimates, yet we are unaware of methods that simultaneously account for errors in covariates and time-to-event outcomes. Using EHR data from 4217 patients, the hazard ratio for an AIDS-defining event associated with a 100 cell/mm³ increase in CD4 count at ART initiation was 0.74 (95% CI: 0.68-0.80) using unvalidated data and 0.60 (95% CI: 0.53-0.68) using fully validated data. Our goal is to obtain unbiased and efficient estimates after validating a random subset of records. We propose fitting discrete failure time models to the validated subsample and then multiply imputing values for unvalidated records. We demonstrate how this approach simultaneously addresses dependent errors in predictors, time-to-event outcomes, and inclusion criteria. Using the fully validated dataset as a gold standard, we compare the mean squared error of our estimates with those from the unvalidated dataset and the corresponding subsample-only dataset for various subsample sizes. With reasonably sized validated subsamples and appropriate imputation models, our approach improved estimation over both the naive analysis and the analysis using only the validation subsample.
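The multiple-imputation step described in the abstract ends by combining the estimates obtained from each imputed dataset. A minimal sketch of that pooling step is given below, assuming the standard Rubin's rules; the abstract does not state which variance estimator the authors use, so this is an illustration rather than their method. The function name `pool_rubin` and the input values are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules (assumed pooling rule).

    estimates: point estimates (e.g. log hazard ratios), one per imputed dataset
    variances: the corresponding squared standard errors
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log hazard ratios from three imputed datasets (illustrative only).
log_hrs = [-0.50, -0.52, -0.48]
ses_squared = [0.004, 0.004, 0.004]
pooled_log_hr, total_var = pool_rubin(log_hrs, ses_squared)
```

The between-imputation term is what carries the extra uncertainty from not having validated every record: the more the imputed datasets disagree, the wider the pooled interval becomes.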

SUBMITTER: Giganti MJ

PROVIDER: S-EPMC7523695 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature


Publications

ACCOUNTING FOR DEPENDENT ERRORS IN PREDICTORS AND TIME-TO-EVENT OUTCOMES USING ELECTRONIC HEALTH RECORDS, VALIDATION SAMPLES, AND MULTIPLE IMPUTATION.

Mark J Giganti, Pamela A Shaw, Guanhua Chen, Sally S Bebawy, Megan M Turner, Timothy R Sterling, Bryan E Shepherd

The Annals of Applied Statistics, 2020-06-29, issue 2


PMID: 32999698

Similar Datasets

Project description:

Objective: To create an electronic frailty index (eFRAGICAP) using electronic health records (EHR) in Catalunya (Spain) and to assess its predictive validity over a two-year follow-up for the outcomes of homecare need, institutionalization, and mortality in the elderly. Additionally, to assess its concurrent validity against other standardized measures: the Clinical Frailty Scale (CFS) and the Risk Instrument for Screening in the Community (RISC).

Methods: The eFRAGICAP was based on the electronic frailty index (eFI) developed in the United Kingdom, and includes 36 deficits identified through clinical diagnoses, prescriptions, physical examinations, and questionnaires registered in the EHR of primary health care centres (PHC). All subjects older than 65 assigned to a PHC in Barcelona on 1 January 2016 were included. Subjects were classified according to their eFRAGICAP index as fit or as having mild, moderate, or severe frailty. Predictive validity was assessed against the following outcomes at 24 months: institutionalization, homecare need, and mortality. Concurrent validation of the eFRAGICAP against the CFS and RISC was performed in a sample of subjects (n = 333) drawn from the global cohort. Discrimination and calibration measures were calculated for the outcomes (institutionalization, homecare need, and mortality) and for the frailty scales.

Results: The eFRAGICAP index was calculated for 253,684 subjects. Mean age was 76.3 years (59.5% women). Of these, 41.1% were classified as fit, 32.2% as having mild frailty, 18.7% moderate, and 7.9% severe. The mean age of the subjects included in the validation subsample (n = 333) was 79.9 years (57.7% women). Of these, 12.6% were classified as fit, 31.5% as having mild frailty, 39.6% moderate, and 16.2% severe. In the outcome analyses, the eFRAGICAP discriminated well between subjects who were and were not institutionalized, who did and did not require homecare assistance, and who did and did not die at 24 months (c-statistics of 0.841, 0.853, and 0.803, respectively). The eFRAGICAP also discriminated frail subjects well when compared with the CFS (AUC 0.821) and the RISC (AUC 0.848).

Conclusion: The eFRAGICAP has good discriminative capacity to identify frail subjects, both when compared with other frailty scales and when assessed against predictive outcomes.


