Project description: Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest, such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes increases. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets; in one of them, the genes identified may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
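A minimal sketch of the kind of projection described above, assuming an expression matrix `expr` (genes x samples), a binary clinical variable, and a continuous technical variable; all names and the simulated data are illustrative, and the published method may differ in detail.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_genes, n_samples = 1000, 60
clinical = rng.integers(0, 2, n_samples).astype(float)   # e.g., responder status
technical = 0.6 * clinical + rng.normal(size=n_samples)  # nuisance correlated with outcome
expr = rng.normal(size=(n_genes, n_samples))
expr[:50] += np.outer(rng.normal(size=50), technical)    # 50 genes driven by the nuisance

def corr_with(matrix, v):
    """Pearson correlation of each row of `matrix` with vector `v`."""
    mc = matrix - matrix.mean(axis=1, keepdims=True)
    vc = v - v.mean()
    return mc @ vc / (np.linalg.norm(mc, axis=1) * np.linalg.norm(vc))

r_clin = corr_with(expr, clinical)
r_tech = corr_with(expr, technical)

# Each point is one gene; genes lying along the technical axis are suspect,
# even when they also show a nonzero clinical correlation.
plt.scatter(r_clin, r_tech, s=5, alpha=0.4)
plt.xlabel("correlation with clinical variable")
plt.ylabel("correlation with technical variable")
plt.show()
```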
Project description: Purpose of review: Violence prevention research has enhanced our understanding of individual and community risk and protective factors for aggression and violence. However, our knowledge of risk and protective factors for violence depends heavily on observational studies, since there are few randomized trials of risk and protective factors for violence. Observational studies are susceptible to systematic errors, specifically confounding, and may lack internal validity. Recent findings: Many violence prevention studies use methods that do not correctly identify the set of covariates needed for statistical adjustment. This results in unwarranted matching and restriction, leading to further confounding or selection bias. Covariate adjustment based on purely statistical criteria generates inconsistent results and uncertain conclusions. Summary: Conventional methods used to identify confounding in violence prevention research are often inadequate. Causal diagrams can improve the understanding and identification of potential confounding biases in observational violence prevention studies, and sensitivity analysis using quantitative bias analysis can help to address unmeasured confounding. Violence prevention research should make greater use of these methods.
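One simple, widely used quantitative-bias-analysis tool is the E-value of VanderWeele and Ding (2017); the review above does not prescribe this particular method, so the sketch below should be read as one illustrative example of sensitivity analysis for unmeasured confounding.

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio rr > 1: the minimum strength of
    association (on the risk-ratio scale) that an unmeasured confounder would
    need with both exposure and outcome to fully explain the observed rr.
    (For rr < 1, apply the formula to 1/rr.)"""
    return rr + math.sqrt(rr * (rr - 1))

print(e_value(2.0))  # ~3.41: a confounder at least this strong could explain RR = 2
```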
Project description: Background: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, tests the null hypothesis that the model is unconfounded. Results: The test provides strict control of type I errors and high statistical power, even for nonnormally and nonlinearly dependent predictions, which are common in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail to prevent confounder bias in several cases. Conclusions: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and thereby foster the development of clinically useful machine learning biomarkers.
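A minimal usage sketch for the mlconfound package cited above (https://mlconfound.readthedocs.io). The simulated data and variable names are illustrative, and the exact call signature and result fields should be checked against the package documentation.

```python
import numpy as np
from mlconfound.stats import partial_confound_test

rng = np.random.default_rng(42)
n = 500
c = rng.normal(size=n)                    # candidate confounder
y = rng.normal(size=n) + c                # prediction target
yhat = y + 0.5 * c + rng.normal(size=n)   # predictions partially driven by c

# H0: the model's predictions are not confounded by c.
# The result includes a permutation-based p-value; a small value
# is evidence that c biases the predictions.
res = partial_confound_test(y, yhat, c)
print(res)
```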
Project description: Drawing causal inferences from observational studies is a central pillar of many disciplines. One sufficient condition for identifying the causal effect is that the treatment-outcome relationship is unconfounded conditional on the observed covariates. It is often believed that the more covariates we condition on, the more plausible this unconfoundedness assumption becomes. This belief has had a huge impact on practical causal inference, suggesting that we should adjust for all pretreatment covariates. However, when there is unmeasured confounding between the treatment and outcome, an estimator that adjusts for some pretreatment covariate might have greater bias than one that does not. Such a covariate is called a bias amplifier; examples include instrumental variables, which are independent of the confounder and affect the outcome only through the treatment. Previously, theoretical results for this phenomenon had been established only for linear models. We fill this gap in the literature by providing a general theory, showing that the phenomenon occurs under a wide class of models satisfying certain monotonicity assumptions. We further show that when the treatment follows an additive or multiplicative model conditional on the instrumental variable and the confounder, these monotonicity assumptions can be interpreted as the signs of the arrows in the causal diagram.
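A small simulation of bias amplification in the linear setting, the case where the phenomenon was first established (the paper's contribution is the more general, nonlinear theory). All parameter values are chosen here for illustration; the true treatment effect is zero, and adjusting for the instrument makes the bias worse, not better.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.normal(size=n)                       # instrument: affects outcome only via treatment
u = rng.normal(size=n)                       # unmeasured confounder
t = z + u + rng.normal(size=n)               # treatment
y = u + rng.normal(size=n)                   # outcome; true treatment effect is 0

def ols_coef(y, X):
    """OLS coefficient on the first column of X (intercept added)."""
    X = np.column_stack([X, np.ones(len(y))])
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

print(ols_coef(y, t))                        # unadjusted: biased,   ~1/3
print(ols_coef(y, np.column_stack([t, z])))  # adjusted for z: amplified, ~1/2
```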
Project description: We present results that allow the researcher, in certain cases, to determine the direction of the bias that arises when control for confounding is inadequate. The results are given within the directed acyclic graph causal framework and are stated in terms of signed edges. Rigorous definitions for signed edges are provided. We describe cases in which intuition concerning signed edges fails, and we characterize the directed acyclic graphs that researchers can use to draw conclusions about the sign of the bias of unmeasured confounding. If there is only one unmeasured confounding variable on the graph, then nonincreasing or nondecreasing average causal effects suffice to draw conclusions about the direction of the bias. When there is more than one unmeasured confounding variable, nonincreasing and nondecreasing average causal effects can be used to draw conclusions only if the various unmeasured confounding variables are independent of one another conditional on the measured covariates. When this conditional independence property does not hold, stronger notions of monotonicity are needed to draw conclusions about the direction of the bias.
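A toy illustration of a signed-edge argument, under assumptions chosen here: a single unmeasured confounder U with positive edges into both the exposure A and the outcome Y. In this simplest case, the results above predict that the unadjusted estimate is biased upward, which the simulation reproduces.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
u = rng.normal(size=n)                      # unmeasured confounder
a = (0.8 * u + rng.normal(size=n)) > 0      # U -> A edge is positive
y = 0.5 * a + 0.8 * u + rng.normal(size=n)  # U -> Y edge is positive; true effect 0.5

est = y[a].mean() - y[~a].mean()
print(est)  # > 0.5: the bias direction matches the sign predicted from the edges
```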
Project description: Background: Confounding bias is a common concern in epidemiological research. Its presence is often assessed by comparing exposure effects between univariable and multivariable regression models, using an arbitrary threshold of a 10% difference to indicate confounding bias. However, many clinical researchers are not aware that this change-in-estimate criterion can lead to wrong conclusions when applied to logistic regression coefficients, because of a statistical phenomenon called noncollapsibility, which manifests itself in logistic regression models. This paper aims to clarify the role of noncollapsibility in logistic regression and to provide guidance in determining the presence of confounding bias. Methods: A Monte Carlo simulation study was designed to uncover patterns of confounding bias and noncollapsibility effects in logistic regression. An empirical data example was used to illustrate the inability of the change-in-estimate criterion to distinguish confounding bias from noncollapsibility effects. Results: The simulation study showed that, depending on the sign and magnitude of the confounding bias and the noncollapsibility effect, the difference between the effect estimates from univariable and multivariable regression models may underestimate or overestimate the magnitude of the confounding bias. Because of the noncollapsibility effect, multivariable regression analysis and inverse probability weighting provided different but valid estimates of the confounder-adjusted exposure effect. In our data example, confounding bias was underestimated by the change-in-estimate criterion owing to the presence of a noncollapsibility effect. Conclusion: In logistic regression, the difference between the univariable and multivariable effect estimates may reflect not only confounding bias but also a noncollapsibility effect. Ideally, the set of confounders is determined at the study design phase and based on subject matter knowledge. To quantify confounding bias, one can compare the unadjusted exposure effect estimate with the estimate from an inverse probability weighted model.
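A compact simulation of noncollapsibility, assuming a covariate Z that is predictive of the outcome but independent of the exposure X, so there is no confounding at all; the marginal and conditional odds ratios still differ, which is exactly why the change-in-estimate criterion misleads on the odds-ratio scale.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500_000
x = rng.integers(0, 2, n)              # exposure, effectively randomized
z = rng.normal(size=n)                 # strong risk factor, independent of x
p = 1 / (1 + np.exp(-(-1 + 1.0 * x + 2.0 * z)))
y = rng.binomial(1, p)

def logit_or(y, cols):
    """Odds ratio for the first column from a logistic regression."""
    X = sm.add_constant(np.column_stack(cols))
    fit = sm.Logit(y, X).fit(disp=0)
    return np.exp(fit.params[1])

print(logit_or(y, [x]))      # marginal OR: attenuated toward 1
print(logit_or(y, [x, z]))   # conditional OR: ~exp(1) = 2.72, despite no confounding
```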
Project description: Genetically informative research designs are becoming increasingly popular as a way to strengthen causal inference through their ability to control for genetic and shared environmental confounding. Co-twin control (CTC) models, a special case of these designs using twin samples, decompose the overall effect of exposure on outcome into within- and between-twin-pair terms. Ideally, the within-twin-pair term would estimate the exposure effect controlling for genetic and shared environmental factors, but it is often confounded by factors not shared within a twin pair. Previous simulation work has shown that if twins are less similar on an unmeasured confounder than they are on an exposure, the within-twin-pair estimate will be a biased estimate of the exposure effect, even more biased than the individual, unpaired estimate. The current study uses simulation and analytical derivations to show that while incorporating a covariate related to the nonshared confounder in CTC models always reduces bias in the within-pair estimate, that estimate is less biased than the individual estimate only in a narrow set of circumstances. The best case for bias reduction in the within-pair estimate occurs when the within-twin-pair correlation in exposure is less than the correlation in the confounder and the twin-pair correlation in the covariate is high. Additionally, two forms of covariate inclusion are compared: adjusting for one's own covariate value alone versus adjusting for the deviation of one's own value from the twin-pair mean of the covariate. Results show that adjusting for the deviation from the twin-pair mean yields equal or reduced bias.
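A rough sketch of the scenario described above, with parameter values chosen here so that twins are far more similar on the exposure than on the unmeasured confounder; the true exposure effect is zero, and the within-pair estimate comes out more biased than the individual, unpaired estimate. This is only a caricature of the CTC model, not the study's actual analysis.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000                                   # twin pairs
shared = rng.normal(size=(n, 1))              # makes exposure highly twin-correlated
u = rng.normal(size=(n, 2))                   # nonshared confounder, uncorrelated in pairs
x = 2.0 * shared + 0.5 * u + rng.normal(size=(n, 2))   # exposure
y = 1.0 * u + rng.normal(size=(n, 2))                  # outcome; true exposure effect is 0

def slope(yv, xv):
    """Simple regression slope of yv on xv."""
    return np.cov(xv.ravel(), yv.ravel())[0, 1] / np.var(xv.ravel())

print(slope(y, x))                                    # individual estimate: small bias
print(slope(y[:, 0] - y[:, 1], x[:, 0] - x[:, 1]))    # within-pair estimate: larger bias,
                                                      # since differencing removes the shared
                                                      # factor but not the nonshared confounder
```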
Project description: Recommendations for reporting instrumental variable analyses often include presenting the balance of covariates across levels of the proposed instrument and levels of the treatment. However, such presentation can be misleading, because relatively small imbalances among covariates across levels of the instrument can result in greater bias through bias amplification. We introduce bias plots and bias component plots as alternative tools for understanding biases in instrumental variable analyses. Using previously published data on proposed preference-based, geography-based, and distance-based instruments, we demonstrate why presenting covariate balance alone can be problematic, and how bias component plots can provide more accurate context for the bias from omitting a covariate in an instrumental variable versus a non-instrumental variable analysis. These plots can also provide relevant comparisons of different proposed instruments considered in the same data. Adaptable code is provided for creating the plots.
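A hedged sketch in the spirit of a bias-component comparison (cf. Jackson and Swanson, 2015), not the authors' published code. For each covariate, the non-IV component here is its mean difference across treatment levels, while the IV component is its mean difference across instrument levels divided by the difference in treatment prevalence across instrument levels; the division is what can amplify small instrument-level imbalances. Scaling by an assumed covariate-outcome association would convert these to outcome units.

```python
import numpy as np
import matplotlib.pyplot as plt

def bias_components(X, treat, instr):
    """X: (n, p) covariates; treat, instr: binary arrays of length n."""
    d_treat = X[treat == 1].mean(0) - X[treat == 0].mean(0)
    d_instr = X[instr == 1].mean(0) - X[instr == 0].mean(0)
    compliance = treat[instr == 1].mean() - treat[instr == 0].mean()
    return d_treat, d_instr / compliance

rng = np.random.default_rng(5)
n, p = 10_000, 6
X = rng.normal(size=(n, p))                  # illustrative covariates
instr = rng.integers(0, 2, n)                # proposed instrument
treat = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * instr))))

noniv, iv = bias_components(X, treat, instr)
idx = np.arange(p)
plt.scatter(idx, noniv, label="non-IV component")
plt.scatter(idx, iv, label="IV component")
plt.axhline(0, lw=0.5)
plt.xlabel("covariate")
plt.legend()
plt.show()
```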
Project description: Noncausal associations between exposures and outcomes are a threat to the validity of causal inference in observational studies. Many techniques have been developed for study design and analysis to identify and eliminate such errors. Such problems are not expected to compromise experimental studies, where careful standardization of conditions (for laboratory work) and randomization (for population studies) should, if applied properly, eliminate most such noncausal associations. We argue, however, that a routine precaution taken in the design of biologic laboratory experiments, the use of "negative controls", is designed to detect both suspected and unsuspected sources of spurious causal inference. In epidemiology, analogous negative controls help to identify and resolve confounding as well as other sources of error, including recall bias and analytic flaws. We distinguish two types of negative controls (exposure controls and outcome controls), describe examples of each type from the epidemiologic literature, and identify the conditions under which such negative controls can detect confounding. We conclude that negative controls should be employed more commonly in observational studies, and that additional work is needed to specify the conditions under which negative controls will be sensitive detectors of other sources of error in observational studies.
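A toy demonstration of a negative-control outcome, under assumptions made here for illustration: the exposure E has no effect on the negative-control outcome N, but an unmeasured confounder U drives both, so an observed E-N association signals residual confounding that would also distort the analysis of the real outcome.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
u = rng.normal(size=n)                   # unmeasured confounder
e = (u + rng.normal(size=n)) > 0         # exposure, driven partly by U
nco = u + rng.normal(size=n)             # negative-control outcome: no effect of E

# Clearly nonzero despite no causal effect of E on the control outcome,
# flagging confounding by U:
print(nco[e].mean() - nco[~e].mean())
```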