Dataset Information

Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.

ABSTRACT: Models for predicting the probability of experiencing various health outcomes or adverse events over a certain time frame (e.g., having a heart attack in the next 5 years) based on individual patient characteristics are important tools for managing patient care. Electronic health data (EHD) are appealing sources of training data because they provide access to large amounts of rich individual-level data from present-day patient populations. However, because EHD are derived by extracting information from administrative and clinical databases, some fraction of subjects will not be under observation for the entire time frame over which one wants to make predictions; this loss to follow-up is often due to disenrollment from the health system. For subjects without complete follow-up, whether or not they experienced the adverse event is unknown, and in statistical terms the event time is said to be right-censored. Most machine learning approaches to the problem have been relatively ad hoc; for example, common approaches for handling observations in which the event status is unknown include (1) discarding those observations, (2) treating them as non-events, or (3) splitting each of those observations into two observations: one where the event occurs and one where the event does not. In this paper, we present a general-purpose approach to account for right-censored outcomes using inverse probability of censoring weighting (IPCW). We illustrate how IPCW can easily be incorporated into a number of existing machine learning algorithms used to mine big health care data, including Bayesian networks, k-nearest neighbors, decision trees, and generalized additive models. We then show that our approach leads to better calibrated predictions than the three ad hoc approaches when applied to predicting the 5-year risk of experiencing a cardiovascular adverse event, using EHD from a large U.S. Midwestern healthcare system.
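
The weighting idea described in the abstract can be illustrated with a small sketch. It assumes, as one common simplification, that censoring is independent of patient characteristics, so the censoring survival function G(t) = P(C >= t) can be estimated with a reverse Kaplan-Meier curve. A subject's status at the horizon tau (e.g., 5 years) is known if the event was observed by tau or follow-up lasted at least until tau; such a subject receives weight 1/G(min(T, tau)), while subjects censored before tau get weight 0. The function names (`km_censoring_survival`, `ipcw_weights`, `ipcw_risk`, `weighted_knn_risk`) and the toy data are hypothetical and not taken from the paper; the k-nearest-neighbor step shows one plausible way to plug the weights into an existing learner, not necessarily the authors' exact formulation.

```python
import math
from collections import Counter


def km_censoring_survival(times, events):
    """Reverse Kaplan-Meier estimate of the censoring survival function.

    Censoring (events[i] == 0) is treated as the 'event' of interest.
    Returns G(t), the left limit of the curve, i.e. an estimate of P(C >= t).
    Assumption: censoring is independent of covariates.
    """
    censor_counts = Counter(t for t, e in zip(times, events) if e == 0)
    steps = []  # (censoring time, survival just after that time)
    surv = 1.0
    for s in sorted(censor_counts):
        at_risk = sum(1 for t in times if t >= s)
        surv *= 1.0 - censor_counts[s] / at_risk
        steps.append((s, surv))

    def G(t):
        value = 1.0
        for s, v in steps:
            if s >= t:
                break  # only censoring strictly before t counts
            value = v
        return value

    return G


def ipcw_weights(times, events, tau):
    """Labels and IPCW weights for the binary outcome 'event by time tau'.

    Subjects censored before tau have unknown status: label None, weight 0.
    """
    G = km_censoring_survival(times, events)
    labels, weights = [], []
    for t, e in zip(times, events):
        if e == 1 and t <= tau:      # event observed within the horizon
            labels.append(1)
            weights.append(1.0 / G(t))
        elif t >= tau:               # followed event-free through tau
            labels.append(0)
            weights.append(1.0 / G(tau))
        else:                        # censored before tau: status unknown
            labels.append(None)
            weights.append(0.0)
    return labels, weights


def ipcw_risk(labels, weights):
    """Weighted proportion of events among subjects with known status."""
    known = [(y, w) for y, w in zip(labels, weights) if w > 0]
    return sum(w * y for y, w in known) / sum(w for _, w in known)


def weighted_knn_risk(features, labels, weights, query, k):
    """Hypothetical IPCW-weighted k-NN estimate of P(event by tau | query)."""
    # Only subjects with known status (weight > 0) can be neighbors.
    candidates = sorted(
        (math.dist(x, query), y, w)
        for x, y, w in zip(features, labels, weights) if w > 0
    )
    nearest = candidates[:k]
    return sum(w * y for _, y, w in nearest) / sum(w for _, _, w in nearest)


if __name__ == "__main__":
    # Hypothetical toy data: follow-up time (years), event indicator, age.
    times = [2, 3, 4, 6, 7, 8]
    events = [1, 0, 1, 0, 1, 0]
    ages = [[50], [52], [55], [60], [65], [70]]
    labels, weights = ipcw_weights(times, events, tau=5)
    print(labels, weights)
    print(ipcw_risk(labels, weights))
    print(weighted_knn_risk(ages, labels, weights, query=[53], k=3))
```

Rather than discarding censored subjects or counting them as non-events, the weights shift their contribution onto comparable subjects who remain under follow-up. For the toy data, the weighted event proportion at tau = 5 is 0.375, which coincides with one minus the Kaplan-Meier event-free survival at that horizon.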

SUBMITTER: Vock DM

PROVIDER: S-EPMC4893987 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature

Publications

Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.

Vock DM, Wolfson J, Bandyopadhyay S, Adomavicius G, Johnson PE, Vazquez-Benitez G, O'Connor PJ

Journal of Biomedical Informatics, 2016-03-16

PMID: 26992568

Similar Datasets

Project description:
Background: In the LATITUDE study (ClinicalTrials.gov, NCT01715285), compared with placebo, abiraterone acetate plus prednisone (AAP) with androgen deprivation therapy (ADT) provided a significant overall survival (OS) benefit in patients with high-risk metastatic castration-sensitive prostate cancer (mCSPC). It is controversial whether the survival benefit would remain if all patients in the placebo group subsequently received life-extending therapies.
Objective: To estimate the treatment effect in the case where all patients in the placebo group receive life-extending subsequent therapies.
Design, setting, and participants: A post hoc analysis of LATITUDE final-analysis data was carried out (setting and participants have been reported previously).
Intervention: AAP or placebo plus ADT.
Outcome measurements and statistical analysis: We applied the inverse probability of censoring weighting (IPCW) method to represent the situation in which all patients in the placebo group would have received life-extending subsequent therapies. The OS hazard ratio (HR) of AAP versus placebo and the associated 95% confidence interval (CI) were estimated using a Cox proportional hazards model.
Results and limitations: Of the 581 eligible patients in the placebo group, 237 (40.8%) did not receive life-extending subsequent therapies. In the unadjusted intention-to-treat analysis, the HR for OS for AAP versus placebo was 0.661 (95% CI 0.564-0.775). Using IPCW to adjust for patients in the placebo group without life-extending subsequent therapies, the HR was 0.732 (95% CI 0.604-0.887). A limitation is the lack of proof that the Cox proportional hazards model for the absence of life-extending subsequent therapy is correctly specified for the IPCW method.
Conclusions: Treatment with AAP confers an OS benefit over placebo in high-risk mCSPC patients, regardless of whether life-extending subsequent therapy is given.
Patient summary: In a previous study, high-risk metastatic castration-sensitive prostate cancer patients who received abiraterone acetate plus prednisone (AAP) with androgen deprivation therapy generally survived longer than those given placebo. The benefit of adding AAP continues regardless of whether life-extending subsequent therapy is given.
