Project description:In causal inference, parametric models are usually employed to estimate the effect of interest. However, parametric models rely on the assumption of correct model specification; if it is not met, effect estimates are biased. Correct model specification is challenging, especially in high-dimensional settings. Incorporating Machine Learning (ML) into causal analyses may reduce the bias arising from model misspecification, since ML methods do not require the specification of a functional form of the relationship between variables. However, when ML predictions are directly plugged into a predefined formula for the effect of interest, there is a risk of introducing a "plug-in bias" in the effect measure. To overcome this problem and to achieve useful asymptotic properties, new estimators have been proposed that combine the predictive potential of ML with the ability of traditional statistical methods to make inferences about population parameters. For epidemiologists interested in taking advantage of ML for causal inference investigations, we provide an overview of three estimators that represent the current state of the art, namely Targeted Maximum Likelihood Estimation (TMLE), Augmented Inverse Probability Weighting (AIPW) and Double/Debiased Machine Learning (DML).
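As an illustration of how such an estimator corrects a plug-in estimate, the following is a minimal sketch of the AIPW estimator of the average treatment effect. It assumes the nuisance predictions (propensity scores and outcome predictions under treatment and control) have already been produced by some ML model, ideally out-of-fold; the function name `aipw_ate`, its argument names, and the toy data are hypothetical and are not taken from the abstract.

```python
import math


def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """Return (ATE estimate, standard error) from precomputed nuisance predictions.

    y        observed outcomes
    a        binary treatment indicators (0 or 1)
    e_hat    predicted propensity scores P(A=1 | X), strictly between 0 and 1
    mu1_hat  predicted outcomes under treatment, E[Y | A=1, X]
    mu0_hat  predicted outcomes under control,   E[Y | A=0, X]
    """
    n = len(y)
    scores = []
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, mu1_hat, mu0_hat):
        # Plug-in contrast plus inverse-probability-weighted residual corrections.
        psi = (m1 - m0) + ai * (yi - m1) / ei - (1 - ai) * (yi - m0) / (1 - ei)
        scores.append(psi)
    ate = sum(scores) / n
    # The sample variance of the per-unit scores gives a simple standard error.
    var = sum((s - ate) ** 2 for s in scores) / (n - 1)
    return ate, math.sqrt(var / n)


# Toy data (made up): the true difference between arms is 2.
y = [3.0, 1.0, 3.0, 1.0]
a = [1, 0, 1, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]

# Correct outcome model.
ate_good, se_good = aipw_ate(y, a, e_hat, [3.0] * 4, [1.0] * 4)
# Deliberately wrong outcome model, correct propensity scores.
ate_bad_outcome, se_bad = aipw_ate(y, a, e_hat, [0.0] * 4, [0.0] * 4)
print(ate_good, ate_bad_outcome)
```

In this toy example both calls return an estimate of 2, which illustrates the double robustness of AIPW: the estimate remains correct when the outcome model is wrong as long as the propensity scores are right.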
Project description:Systems models, which by design aim to capture multi-level complexity, are a natural choice of tool for bridging the divide between social epidemiology and causal inference. In this commentary, we discuss the potential uses of complex systems models for improving our understanding of quantitative causal effects in social epidemiology. To put systems models in context, we will describe how this approach could be used to optimise the distribution of COVID-19 response resources to minimise social inequalities during and after the pandemic.
Project description:Purpose of review: We review the application and limitations of two implementations of the "case-only design" in injury epidemiology with example analyses of Fatality Analysis Reporting System data. Recent findings: The term "case-only design" covers a variety of epidemiologic designs; here, two implementations of the design are reviewed: (1) studies to uncover etiological heterogeneity and (2) studies to measure exposure effect modification. These two designs produce results that require different interpretations and rely upon different assumptions. The key assumption of case-only designs for exposure effect modification, the more commonly used of the two designs, does not commonly hold for injuries and so results from studies using this design cannot be interpreted. Case-only designs to identify etiological heterogeneity in injury risk are interpretable but only when the case-series is conceptualized as arising from an underlying cohort. Summary: The results of studies using case-only designs are commonly misinterpreted in the injury literature.
Project description:Calls for the adoption of complex systems approaches, including agent-based modeling, in the field of epidemiology have largely centered on the potential for such methods to examine complex disease etiologies, which are characterized by feedback behavior, interference, threshold dynamics, and multiple interacting causal effects. However, considerable theoretical and practical issues impede the capacity of agent-based methods to examine and evaluate causal effects and thus illuminate new areas for intervention. We build on this work by describing how agent-based models can be used to simulate counterfactual outcomes in the presence of complexity. We show that these models are of particular utility when the hypothesized causal mechanisms exhibit a high degree of interdependence between multiple causal effects and when interference (i.e., one person's exposure affects the outcome of others) is present and of intrinsic scientific interest. Although not without challenges, agent-based modeling (and complex systems methods broadly) represents a promising novel approach to identifying and evaluating complex causal effects, and it is thus well suited to complement other modern epidemiologic methods of etiologic inquiry.
Project description:We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as "universal." The method is very simple and is based on a modified version of the usual likelihood-ratio statistic that we call "the split likelihood-ratio test" (split LRT) statistic. The (limiting) null distribution of the classical likelihood-ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum-likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings where computing the MLE is hard, it is sufficient to upper bound the maximum likelihood for the purpose of constructing valid tests and intervals. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid P values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.
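The abstract does not spell out the statistic, so the following sketch rests on an assumption about its usual form: split the data into two parts, fit any estimator on one part, evaluate the likelihood ratio of that estimate against the null MLE on the other part, and reject when the ratio exceeds 1/alpha. The example tests a point null for the mean of a unit-variance Gaussian; the function names, the fixed (non-random) split, and the toy data are hypothetical.

```python
import math


def gaussian_loglik(data, mu):
    """Log-likelihood of i.i.d. N(mu, 1) observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in data)


def split_lrt(data, mu0=0.0, alpha=0.05):
    """Split likelihood-ratio test of H0: mu = mu0 (sketch, unit variance assumed).

    Returns (reject, log_statistic). The data are split by position here,
    so they should already be in random order; in practice one would shuffle.
    """
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    # Any estimator computed on d1 only; here the sample mean.
    mu_hat1 = sum(d1) / len(d1)
    # Under a point null the null MLE on d0 is simply mu0.
    log_t = gaussian_loglik(d0, mu_hat1) - gaussian_loglik(d0, mu0)
    # Reject when the ratio exceeds 1/alpha, i.e. log T > log(1/alpha).
    return log_t > math.log(1.0 / alpha), log_t


reject_far, log_t_far = split_lrt([2.0] * 10)    # data far from the null
reject_null, log_t_null = split_lrt([0.0] * 10)  # data consistent with the null
print(reject_far, reject_null)
```

The threshold 1/alpha does not depend on the model, which is what removes the need for a tractable null distribution; the price is that the test can be conservative compared with a classical likelihood-ratio test in regular models.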
Project description:Difference-in-differences (DiD) estimators provide unbiased treatment effect estimates when, in the absence of treatment, the average outcomes for the treated and control groups would have followed parallel trends over time. This assumption is implausible in many settings. An alternative assumption is that the potential outcomes are independent of treatment status, conditional on past outcomes. This paper considers three methods that share this assumption: the synthetic control method, a lagged dependent variable (LDV) regression approach, and matching on past outcomes. Our motivating empirical study is an evaluation of a hospital pay-for-performance scheme in England, the best practice tariffs programme. The conclusions of the original DiD analysis are sensitive to the choice of approach. We conduct a Monte Carlo simulation study that investigates these methods' performance. While DiD produces unbiased estimates when the parallel trends assumption holds, the alternative approaches provide less biased estimates of treatment effects when it is violated. In these cases, the LDV approach produces the most efficient and least biased estimates.
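To make the contrast between the two identifying assumptions concrete, the sketch below computes a two-period DiD estimate and an LDV regression estimate on the same toy data. The LDV coefficient is obtained by partialling out the pre-period outcome (the Frisch-Waugh-Lovell result) so that only the standard library is needed. The function names and the data are made up for illustration and are unrelated to the best practice tariffs evaluation.

```python
def _mean(values):
    return sum(values) / len(values)


def did_estimate(y_pre, y_post, treated):
    """Two-period difference-in-differences: difference in mean outcome changes."""
    change_t = [b - a for a, b, d in zip(y_pre, y_post, treated) if d == 1]
    change_c = [b - a for a, b, d in zip(y_pre, y_post, treated) if d == 0]
    return _mean(change_t) - _mean(change_c)


def _residuals(v, x):
    """Residuals from a simple OLS regression of v on x with an intercept."""
    mx, mv = _mean(x), _mean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]


def ldv_estimate(y_pre, y_post, treated):
    """Coefficient on treatment in the regression y_post ~ 1 + treated + y_pre."""
    ry = _residuals(y_post, y_pre)   # post outcome with y_pre partialled out
    rd = _residuals(treated, y_pre)  # treatment with y_pre partialled out
    return sum(d * r for d, r in zip(rd, ry)) / sum(d * d for d in rd)


# Toy data: y_post = 1 + 2 * treated + 0.5 * y_pre, with treated units starting higher.
y_pre = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
y_post = [1.0 + 2.0 * d + 0.5 * p for d, p in zip(treated, y_pre)]

print(did_estimate(y_pre, y_post, treated), ldv_estimate(y_pre, y_post, treated))
```

In this constructed example the groups differ at baseline and outcomes regress toward the mean, so trends are not parallel: the LDV regression recovers the effect of 2 built into the data, while DiD returns 1.5. With data generated under parallel trends the ranking could reverse, which is why the choice of assumption matters.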
Project description:Background: Several approaches are commonly used to estimate the effect of diet on changes in various intermediate disease markers in prospective studies, including "change-score analysis", "concurrent change-change analysis" and "lagged change-change analysis". Although empirical evidence suggests that concurrent change-change analysis is the most robust, consistent, and biologically plausible, an in-depth dissection and comparison of these approaches from a causal inference perspective is lacking. We aim to elucidate and compare the underlying causal model, causal estimand and interpretation of each approach, illustrate them intuitively with directed acyclic graphs (DAGs), and further clarify the strengths and limitations of the recommended concurrent change-change analysis through simulations. Methods: Causal models and DAGs are used to clarify the causal estimand and interpretation of each approach theoretically. Monte Carlo simulation is used to explore the performance of the approaches under different degrees of time-invariant heterogeneity and the performance of concurrent change-change analysis when its causal identification assumptions are violated. Results: Concurrent change-change analysis targets the contemporaneous effect of exposure on outcome (measured at the same survey wave), which is more relevant and plausible when studying the associations of diet and intermediate biomarkers in prospective studies, while change-score analysis and lagged change-change analysis target the effect of exposure on outcome after a one-period timespan (typically several years). Concurrent change-change analysis always yields unbiased estimates even with severe unobserved time-invariant confounding, while the other two approaches are always biased even without time-invariant heterogeneity. However, the estimation bias of concurrent change-change analysis increases almost linearly as violations of its causal identification assumptions become more severe. Conclusions: Concurrent change-change analysis may be the most suitable method for studying diet and intermediate biomarkers in prospective studies, because it targets the most plausible estimand and circumvents the bias from unobserved individual heterogeneity. Importantly, the identification assumptions behind it should be examined carefully before this promising method is applied.
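The following minimal sketch shows why differencing removes time-invariant confounding in a concurrent change-change analysis. It assumes a linear contemporaneous effect and no time-varying confounding; the data-generating model, the function names, and the numbers are made up for illustration and are not taken from the study's simulations.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def change_change_effect(x_wave1, x_wave2, y_wave1, y_wave2):
    """Regress the outcome change on the exposure change over the same interval."""
    dx = [b - a for a, b in zip(x_wave1, x_wave2)]
    dy = [b - a for a, b in zip(y_wave1, y_wave2)]
    return ols_slope(dx, dy)


# Assumed model: Y_it = beta * X_it + U_i, where U_i is an unobserved,
# time-invariant individual factor that is related to baseline exposure.
beta = 0.7
u = [5.0, -3.0, 2.0, 0.0]
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 2.0, 5.0, 3.0]
y1 = [beta * x + ui for x, ui in zip(x1, u)]
y2 = [beta * x + ui for x, ui in zip(x2, u)]

cross_sectional = ols_slope(x1, y1)                  # confounded by U_i
change_change = change_change_effect(x1, x2, y1, y2)  # U_i cancels in the differences
print(cross_sectional, change_change)
```

Because U_i appears in both waves, it drops out of the differences and the change-change slope equals beta here, whereas the wave-1 cross-sectional slope does not. If the assumed model were violated, for example by a time-varying confounder, the differencing would no longer remove the bias.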
Project description:Objective: To evaluate the effects of the parent-reported medical home status on health care utilization, expenditures, and quality for children. Data sources: Medical Expenditure Panel Survey (MEPS) during 2004-2012, including a total of 9,153 children who were followed up for 2 years in the survey. Study design: We took a causal difference-in-differences approach using inverse probability weighting and doubly robust estimators to study how changes in medical home status over a 2-year period affected children's health care outcomes. Our analysis adjusted for children's sociodemographic, health, and insurance statuses. We conducted sensitivity analyses using alternative statistical methods, different approaches to outliers and missing data, and accounting for possible common-method biases. Principal findings: Compared with children whose parents reported having medical homes in both years 1 and 2, those who had medical homes in year 1 but lost them in year 2 had significantly lower parent-reported ratings of health care quality and higher utilization of emergency care. Compared with children whose parents reported having no medical homes in both years, those who did not have medical homes in year 1 but gained them in year 2 had significantly higher ratings of health care quality, but no significant differences in health care expenditures and utilization. Conclusions: Having a medical home may help improve health care quality for children; losing a medical home may lead to higher utilization of emergency care.