AIPW: An R Package for Augmented Inverse Probability-Weighted Estimation of Average Causal Effects.
Ontology highlight
ABSTRACT: An increasing number of recent studies have suggested that doubly robust estimators with cross-fitting should be used when estimating causal effects with machine learning methods. However, not all existing programs that implement doubly robust estimators support machine learning methods and cross-fitting, or provide estimates on multiplicative scales. To address these needs, we developed AIPW, a software package implementing augmented inverse probability weighting (AIPW) estimation of average causal effects in R (R Foundation for Statistical Computing, Vienna, Austria). Key features of the AIPW package include cross-fitting and flexible covariate adjustment for observational studies and randomized controlled trials (RCTs). In this paper, we use a simulated RCT to illustrate implementation of the AIPW estimator. We also perform a simulation study to evaluate the performance of the AIPW package compared with other doubly robust implementations, including CausalGAM, npcausal, tmle, and tmle3. Our simulation showed that the AIPW package yields performance comparable to that of other programs. Furthermore, we also found that cross-fitting substantively decreases the bias and improves the confidence interval coverage for doubly robust estimators fitted with machine learning algorithms. Our findings suggest that the AIPW package can be a useful tool for estimating average causal effects with machine learning methods in RCTs and observational studies.
Project description:Commonly used semiparametric estimators of causal effects specify parametric models for the propensity score (PS) and the conditional outcome. An example is an augmented inverse probability weighting (IPW) estimator, frequently referred to as a doubly robust estimator, because it is consistent if at least one of the two models is correctly specified. However, in many observational studies, the role of the parametric models is often not to provide a representation of the data-generating process but rather to facilitate the adjustment for confounding, making the assumption of at least one true model unlikely to hold. In this paper, we propose a crude analytical approach to study the large-sample bias of estimators when the models are assumed to be approximations of the data-generating process, namely, when all models are misspecified. We apply our approach to three prototypical estimators of the average causal effect, two IPW estimators, using a misspecified PS model, and an augmented IPW (AIPW) estimator, using misspecified models for the outcome regression (OR) and the PS. For the two IPW estimators, we show that normalization, in addition to having a smaller variance, also offers some protection against bias due to model misspecification. To analyze the question of when the use of two misspecified models is better than one we derive necessary and sufficient conditions for when the AIPW estimator has a smaller bias than a simple IPW estimator and when it has a smaller bias than an IPW estimator with normalized weights. If the misspecification of the outcome model is moderate, the comparisons of the biases of the IPW and AIPW estimators show that the AIPW estimator has a smaller bias than the IPW estimators. However, all biases include a scaling with the PS-model error and we suggest caution in modeling the PS whenever such a model is involved. For numerical and finite sample illustrations, we include three simulation studies and corresponding approximations of the large-sample biases. In a dataset from the National Health and Nutrition Examination Survey, we estimate the effect of smoking on blood lead levels.
Project description:Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568-575).
Project description:Modeling recurrent event data with multiple event types has drawn interest in recent biomedical studies due to its flexibility for understanding different risk factors for multiple recurrent event processes. However, in such data type, missing event type appears frequently because of various reasons such as recording ignorance or resource limitation. In this study, we aim to propose an inverse probability weighted estimation that is commonly used in the missing data literature to correct possibly biased estimation by a complete-case analysis. This approach is not limited to a specific form of the recurrent event model. We derive the large sample theory in a general form. We demonstrate that our approach can be applied to either multiplicative or additive rates model with practical sample size via comprehensive simulations. Nonmucoid and mucoid Pseudomonas aeruginosa infections of 14 888 patients in 2016 Cystic Fibrosis Foundation Patient Registry data are analyzed to show that, without including 12% events with missing event type in the analysis, several factors may be misidentified as risk factors for the nonmucoid type of infections.
Project description:Estimating population-level effects of a vaccine is challenging because there may be interference, that is, the outcome of one individual may depend on the vaccination status of another individual. Partial interference occurs when individuals can be partitioned into groups such that interference occurs only within groups. In the absence of interference, inverse probability weighted (IPW) estimators are commonly used to draw inference about causal effects of an exposure or treatment. Tchetgen Tchetgen and VanderWeele proposed a modified IPW estimator for causal effects in the presence of partial interference. Motivated by a cholera vaccine study in Bangladesh, this paper considers an extension of the Tchetgen Tchetgen and VanderWeele IPW estimator to the setting where the outcome is subject to right censoring using inverse probability of censoring weights (IPCW). Censoring weights are estimated using proportional hazards frailty models. The large sample properties of the IPCW estimators are derived, and simulation studies are presented demonstrating the estimators' performance in finite samples. The methods are then used to analyze data from the cholera vaccine study.
Project description:Traditional epidemiologic approaches allow us to compare counterfactual outcomes under 2 exposure distributions, usually 100% exposed and 100% unexposed. However, to estimate the population health effect of a proposed intervention, one may wish to compare factual outcomes under the observed exposure distribution to counterfactual outcomes under the exposure distribution produced by an intervention. Here, we used inverse probability weights to compare the 5-year mortality risk under observed antiretroviral therapy treatment plans to the 5-year mortality risk that would had been observed under an intervention in which all patients initiated therapy immediately upon entry into care among patients positive for human immunodeficiency virus in the US Centers for AIDS Research Network of Integrated Clinical Systems multisite cohort study between 1998 and 2013. Therapy-naïve patients (n = 14,700) were followed from entry into care until death, loss to follow-up, or censoring at 5 years or on December 31, 2013. The 5-year cumulative incidence of mortality was 11.65% under observed treatment plans and 10.10% under the intervention, yielding a risk difference of -1.57% (95% confidence interval: -3.08, -0.06). Comparing outcomes under the intervention with outcomes under observed treatment plans provides meaningful information about the potential consequences of new US guidelines to treat all patients with human immunodeficiency virus regardless of CD4 cell count under actual clinical conditions.
Project description:This article discusses the augmented inverse propensity weighted (AIPW) estimator as an estimator for average treatment effects. The AIPW combines both the properties of the regression-based estimator and the inverse probability weighted (IPW) estimator and is therefore a "doubly robust" method in that it requires only either the propensity or outcome model to be correctly specified but not both. Even though this estimator has been known for years, it is rarely used in practice. After explaining the estimator and proving the double robustness property, I conduct a simulation study to compare the AIPW efficiency with IPW and regression under different scenarios of misspecification. In 2 real-world examples, I provide a step-by-step guide on implementing the AIPW estimator in practice. I show that it is an easily usable method that extends the IPW to reduce variability and improve estimation accuracy.[Box: see text].
Project description:Doubly truncated data arise when event times are observed only if they fall within subject-specific, possibly random, intervals. While non-parametric methods for survivor function estimation using doubly truncated data have been intensively studied, only a few methods for fitting regression models have been suggested, and only for a limited number of covariates. In this article, we present a method to fit the Cox regression model to doubly truncated data with multiple discrete and continuous covariates, and describe how to implement it using existing software. The approach is used to study the association between candidate single nucleotide polymorphisms and age of onset of Parkinson's disease.
Project description:Evaluation of impact of potential uncontrolled confounding is an important component for causal inference based on observational studies. In this article, we introduce a general framework of sensitivity analysis that is based on inverse probability weighting. We propose a general methodology that allows both non-parametric and parametric analyses, which are driven by two parameters that govern the magnitude of the variation of the multiplicative errors of the propensity score and their correlations with the potential outcomes. We also introduce a specific parametric model that offers a mechanistic view on how the uncontrolled confounding may bias the inference through these parameters. Our method can be readily applied to both binary and continuous outcomes and depends on the covariates only through the propensity score that can be estimated by any parametric or non-parametric method. We illustrate our method with two medical data sets.
Project description:BackgroundAlthough many studies have established significant associations between short-term air pollution and the risk of getting cardiovascular diseases, there is a lack of evidence based on causal distributed lag modeling.MethodsInverse probability weighting (ipw) propensity score models along with conditional logistic outcome regression models based on a case-crossover study design were applied to get the causal unconstrained distributed (lag0-lag5) as well as cumulative lag effect of short-term exposure to PM2.5/Ozone on hospital admissions of acute myocardial infarction (AMI), congestive heart failure (CHF) and ischemic stroke (IS) among New England Medicare participants during 2000-2012. Effect modification by gender, race, secondary diagnosis of Chronic Obstructive Pulmonary Diseases (COPD) and Diabetes (DM) was explored.ResultsEach 10 ?g/m3 increase in lag0-lag5 cumulative PM2.5 exposure was associated with an increase of 4.3% (95% confidence interval: 2.2%, 6.4%, percentage change) in AMI hospital admission rate, an increase of 3.9% (2.4%, 5.5%) in CHF rate and an increase of 2.6% (0.4%, 4.7%) in IS rate. A weakened lagging effect of PM2.5 from lag0 to lag5 could be observed. No cumulative short-term effect of ozone on CVD was found. People with secondary diagnosis of COPD, diabetes, female gender and black race are sensitive population.ConclusionsBased on our causal distributed lag modeling, we found that short-term exposure to an increased ambient PM2.5 level had the potential to induce higher risk of CVD hospitalization in a causal way. More attention should be paid to population of COPD, diabetes, female gender and black race.
Project description:In life history studies, interest often lies in the analysis of the interevent, or gap times and the association between event times. Gap time analyses are challenging however, even when the length of follow-up is determined independently of the event process, because associations between gap times induce dependent censoring for second and subsequent gap times. This article discusses nonparametric estimation of the association between consecutive gap times based on Kendall's ? in the presence of this type of dependent censoring. A nonparametric estimator that uses inverse probability of censoring weights is provided. Estimates of conditional gap time distributions can be obtained following specification of a particular copula function. Simulation studies show the estimator performs well and compares favorably with an alternative estimator. Generalizations to a piecewise constant Clayton copula are given. Several simulation studies and illustrations with real data sets are also provided.