Dataset Information

Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research.

ABSTRACT: In comparative effectiveness research, we are often interested in estimating an average causal effect from large observational data (the main study). Often these data do not measure all the necessary confounders. On many occasions, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches for missing data imputation might not be adequate because of the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect (ACE) in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas from Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and it propagates the uncertainty due to missing data imputation and confounder selection when estimating the ACE in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10 396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.

SUBMITTER: Antonelli J

PROVIDER: S-EPMC5862356 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature

Publications

Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research.

Antonelli, Joseph; Zigler, Corwin; Dominici, Francesca

Biostatistics (Oxford, England), 2017-07-01, issue 3

PMID: 28334230

Similar Datasets

Project description: Background: Real-world COVID-19 vaccine effectiveness (VE) studies are investigating exposures of increasing complexity accounting for time since vaccination. These studies require methods that adjust for the confounding that arises when morbidities and demographics are associated with vaccination and the risk of outcome events. Methods based on propensity scores (PS) are well-suited to this when the exposure is dichotomous, but present challenges when the exposure is multinomial. Objective: This simulation study aimed to investigate alternative methods to adjust for confounding in VE studies that have a test-negative design. Methods: Adjustment for a disease risk score (DRS) is compared with multivariable logistic regression. Both stratification on the DRS and direct covariate adjustment of the DRS are examined. Multivariable logistic regression with all the covariates and with a limited subset of key covariates is considered. The performance of VE estimators is evaluated across a multinomial vaccination exposure in simulated datasets. Results: Bias in VE estimates from multivariable models ranged from -5.3% to 6.1% across 4 levels of vaccination. Standard errors of VE estimates were unbiased, and 95% coverage probabilities were attained in most scenarios. The lowest coverage in the multivariable scenarios was 93.7% (95% CI 92.2%-95.2%) and occurred in the multivariable model with key covariates, while the highest coverage in the multivariable scenarios was 95.3% (95% CI 94.0%-96.6%) and occurred in the multivariable model with all covariates. Bias in VE estimates from DRS-adjusted models was low, ranging from -2.2% to 4.2%. However, the DRS-adjusted models underestimated the standard errors of VE estimates, with coverage sometimes below the 95% level. The lowest coverage in the DRS scenarios was 87.8% (95% CI 85.8%-89.8%) and occurred in the direct adjustment for the DRS model. The highest coverage in the DRS scenarios was 94.8% (95% CI 93.4%-96.2%) and occurred in the model that stratified on DRS. Although variation in the performance of VE estimates occurred across modeling strategies, variation in performance was also present across exposure groups. Conclusions: Overall, models using a DRS to adjust for confounding performed adequately but not as well as the multivariable models that adjusted for covariates individually.

Project description: Background: Gene expression profiling studies of mastitis in ruminants have provided key but fragmented knowledge for the understanding of the disease. A systematic combination of different expression profiling studies via meta-analysis techniques has the potential to test the extensibility of conclusions based on single studies. Using the program Pointillist, we performed meta-analysis of transcription-profiling data from six independent studies of infections with mammary gland pathogens, including samples from cattle challenged in vivo with S. aureus, E. coli, and S. uberis, samples from goats challenged in vivo with S. aureus, as well as cattle macrophages and ovine dendritic cells infected in vitro with S. aureus. We combined different time points from those studies, testing different responses to mastitis infection: overall (common signature), early stage, late stage, and cattle-specific. Results: Ingenuity Pathway Analysis of affected genes showed that the four meta-analysis combinations share biological functions and pathways (e.g. protein ubiquitination and polyamine regulation) which are intrinsic to the general disease response. In the overall response, pathways related to immune response and inflammation, as well as biological functions related to lipid metabolism were altered. This latter observation is consistent with the milk fat content depression commonly observed during mastitis infection. Complementarities between early and late stage responses were found, with a prominence of metabolic and stress signals in the early stage and of the immune response related to the lipid metabolism in the late stage; both mechanisms apparently modulated by few genes, including XBP1 and SREBF1. The cattle-specific response was characterized by alteration of the immune response and by modification of lipid metabolism. Comparison of E. coli and S. aureus infections in cattle in vivo revealed that affected genes showing opposite regulation had the same altered biological functions and provided evidence that E. coli caused a stronger host response. Conclusions: This meta-analysis approach reinforces previous findings but also reveals several novel themes, including the involvement of genes, biological functions, and pathways that were not identified in individual studies. As such, it provides an interesting proof of principle for future studies combining information from diverse heterogeneous sources.
