Dataset Information

Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies.

ABSTRACT: Multiple imputation (MI) is a well-established method for dealing with missing data. MI is computationally intensive when imputing missing covariates with high-dimensional outcome data (e.g., DNA methylation data in epigenome-wide association studies (EWAS)), because every outcome variable must be included in the imputation model to avoid biasing associations towards the null. Instead, EWAS analyses are reduced to only complete cases, limiting statistical power and potentially causing bias. We used simulations to compare 5 MI methods for high-dimensional data under 2 missingness mechanisms. All imputation methods had increased power over complete-case (C-C) analyses. Imputing missing values separately for each variable was computationally inefficient, but dividing sites at random into evenly sized bins improved efficiency and gave low bias. Methods imputing solely using subsets of sites identified by the C-C analysis suffered from bias towards the null. However, if these subsets were added into random bins of sites, this bias was reduced. The optimal methods were applied to an EWAS with missingness in covariates. All methods identified additional sites over the C-C analysis, and many of these sites had been replicated in other studies. These methods are also applicable to other high-dimensional data sets, including the rapidly expanding area of "-omics" studies.
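
The random-binning idea in the abstract can be illustrated with a short sketch. It is not the authors' code: the function name `random_bins`, the bin size, the seed, and the `always_include` argument (standing in for sites flagged by a complete-case analysis) are all assumptions made for illustration. The sketch only builds the bins; each bin would then be passed, together with the incomplete covariates, to whatever imputation routine is in use.

```python
import numpy as np

def random_bins(n_sites, bin_size, always_include=(), seed=0):
    """Split site indices 0..n_sites-1 at random into bins of roughly bin_size.

    Indices in always_include (hypothetical complete-case hits) are added
    to every bin, so each bin's imputation model would contain them.
    """
    rng = np.random.default_rng(seed)
    extra = np.unique(np.asarray(always_include, dtype=int))
    # Shuffle only the sites that are not forced into every bin.
    rest = np.setdiff1d(np.arange(n_sites), extra)
    shuffled = rng.permutation(rest)
    n_bins = max(1, int(np.ceil(len(shuffled) / bin_size)))
    # array_split yields bins whose sizes differ by at most one.
    chunks = np.array_split(shuffled, n_bins)
    return [np.sort(np.concatenate([extra, chunk])) for chunk in chunks]

# Illustrative values only: 1000 sites, bins of about 100, two forced sites.
bins = random_bins(n_sites=1000, bin_size=100, always_include=[3, 42])
```

Adding the same subset to every bin mirrors the abstract's observation that combining complete-case subsets with random bins reduced the bias seen when imputing from those subsets alone.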

SUBMITTER: Mills HL

PROVIDER: S-EPMC6825836 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies.

Mills HL, Heron J, Relton C, Suderman M, Tilling K

American Journal of Epidemiology, 2019 Nov 1; issue 11

PMID: 31504104

Similar Datasets

Updates to data versions and analytic methods influence the reproducibility of results from epigenome-wide association studies. | S-EPMC9601563 | biostudies-literature

A semiparametric missing-data-induced intensity method for missing covariate data in individually matched case-control studies. | S-EPMC2910202 | biostudies-literature

Estimands in epigenome-wide association studies. | S-EPMC8086103 | biostudies-literature

A comparative analysis of cell-type adjustment methods for epigenome-wide association studies based on simulated and real data sets. | S-EPMC6954449 | biostudies-literature

Genome-wide association studies and epigenome-wide association studies go together in cancer control. | S-EPMC5551540 | biostudies-literature

Covariate adaptive familywise error rate control for genome-wide association studies. | S-EPMC8598971 | biostudies-literature

Covariate-modulated local false discovery rate for genome-wide association studies. | S-EPMC4103587 | biostudies-literature

The EpiDiverse Plant Epigenome-Wide Association Studies (EWAS) Pipeline. | S-EPMC8594691 | biostudies-literature

Epigenome-wide association studies: current knowledge, strategies and recommendations. | S-EPMC8645110 | biostudies-literature

Epigenome-wide association studies of allergic disease and the environment. | S-EPMC10564109 | biostudies-literature