Dataset Information

Methodological variations in lagged regression for detecting physiologic drug effects in EHR data.

ABSTRACT: We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined how six methodological variations affect the performance of lagged linear methods in this context: (i) time series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) differencing (lagged rates of change achieved by taking differences between consecutive measurements), (v) explanatory variables, and (vi) regression models. We generated two gold standards (one knowledge-base derived, one expert-curated) for expected pairwise relationships between 7 drugs and 4 labs, and evaluated how well the 64 unique combinations of methodological perturbations reproduce the gold standards. Our 28 cohorts included patients in the Columbia University Medical Center/NewYork-Presbyterian Hospital clinical database, and ranged from 2,820 to 79,514 patients with between 8 and 209 average time points per patient. The most accurate methods achieved an AUROC of 0.794 for the knowledge-base derived gold standard (95% CI [0.741, 0.847]) and 0.705 for the expert-curated gold standard (95% CI [0.629, 0.781]). We observed a mean AUROC of 0.633 (95% CI [0.610, 0.657], expert-curated gold standard) across all methods that re-parameterize time according to sequence and use either a joint autoregressive model with time-series differencing or an independent lag model without differencing. The complement of this set of methods achieved a mean AUROC close to 0.5, indicating the importance of these choices. We conclude that time-series analysis of EHR data will likely rely on some of the beneficial pre-processing and modeling methodologies identified, and will certainly benefit from continued careful analysis of methodological perturbations.
This study found that methodological variations, such as pre-processing and representations, have a large effect on results, exposing the importance of thoroughly evaluating these components when comparing machine-learning methods.
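
To make the pre-processing choices named in the abstract concrete, the following is a minimal Python sketch of a lagged linear regression of a lab value on earlier drug values. It is an illustration under stated assumptions, not the authors' implementation: the function names (`zscore`, `lagged_design`, `fit_lagged_regression`), the number of lags, and the input layout (one equal-length drug array and lab array per patient) are all hypothetical. Time is indexed by sequence position, each patient's series is z-scored on its own, and differencing is optional. The model shown is a joint model over drug lags only; it omits the autoregressive lab terms and the independent-lag variant mentioned in the abstract.

```python
import numpy as np

def zscore(x):
    """Intra-subject normalization: center and scale one patient's series."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()

def lagged_design(drug, lab, n_lags, difference=False):
    """Build (X, y) for one patient, indexing time by sequence position.

    Row t predicts lab[t] from drug[t-1], ..., drug[t-n_lags].
    Assumes drug and lab are aligned arrays of equal length (hypothetical layout).
    """
    drug, lab = zscore(drug), zscore(lab)
    if difference:
        # Differencing: changes between consecutive measurements.
        drug, lab = np.diff(drug), np.diff(lab)
    n = len(lab)
    if n <= n_lags:
        # Too few points to form even one lagged row.
        return np.empty((0, n_lags)), np.empty(0)
    # Column k-1 holds the drug series shifted back by k steps.
    X = np.column_stack([drug[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    return X, lab[n_lags:]

def fit_lagged_regression(patients, n_lags=3, difference=False):
    """Pool all patients and fit ordinary least squares.

    Returns [intercept, coef_lag1, ..., coef_lag_n].
    """
    designs = [lagged_design(d, l, n_lags, difference) for d, l in patients]
    X = np.vstack([X_i for X_i, _ in designs])
    y = np.concatenate([y_i for _, y_i in designs])
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Calling `fit_lagged_regression(patients, n_lags=3, difference=True)` on a list of `(drug, lab)` array pairs returns one coefficient per lag; the sign and size of those coefficients are what a study of this kind would compare against a gold standard.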

SUBMITTER: Levine ME

PROVIDER: S-EPMC6207533 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature

Publications

Methodological variations in lagged regression for detecting physiologic drug effects in EHR data.

Levine ME, Albers DJ, Hripcsak G

Journal of Biomedical Informatics, 2018-08-30

PMID: 30172760

Similar Datasets

Lagged WQS regression for mixtures with many components.
| S-EPMC7489300 | biostudies-literature

An Ordered Lasso and Sparse Time-Lagged Regression.
| S-EPMC10004099 | biostudies-literature

Detecting early physiologic changes through cardiac implantable electronic device data among patients with COVID-19.
| S-EPMC9349024 | biostudies-literature

Cox regression is robust to inaccurate EHR-extracted event time: an application to EHR-based GWAS.
| S-EPMC10060718 | biostudies-literature

Measurement error and outcome distributions: Methodological issues in regression analyses of behavioral coding data.
| S-EPMC4688045 | biostudies-literature

Modeling the health effects of time-varying complex environmental mixtures: Mean field variational Bayes for lagged kernel machine regression.
| S-EPMC6345544 | biostudies-literature

Most published meta-regression analyses based on aggregate data suffer from methodological pitfalls: a meta-epidemiological study.
| S-EPMC8207572 | biostudies-literature

Normalizing Metagenomic Hi-C Data and Detecting Spurious Contacts Using Zero-Inflated Negative Binomial Regression.
| S-EPMC8892984 | biostudies-literature

Detecting copy number variations from array CGH data based on a conditional random field model.
| S-EPMC3326659 | biostudies-literature

HiNT: a computational method for detecting copy number variations and translocations from Hi-C data.
| S-EPMC7087379 | biostudies-literature