Dataset Information

A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank.


ABSTRACT: The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G × E) effects. Compared with marginal genetic association studies, G × E analysis requires more samples and detailed measurements of environmental exposures, which limits the discoveries that can be made. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying G × E effects. However, because of the high computational cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G × E analysis), that is applicable to genome-wide scale phenome-wide G × E studies. To reduce computational cost, SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis. To handle phenotypes with unbalanced case-control ratios, SPAGE uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than Firth's test, and that SPAGE can control type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK Biobank data from 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.
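The saddlepoint calibration mentioned in the abstract can be illustrated with a minimal sketch. This is not the SPAGE implementation: it covers only a simplified marginal score statistic S = sum_i g_i (Y_i - mu_i), where the null probabilities mu_i are assumed to come from a logistic model fitted once without genotype, and it uses a standard Barndorff-Nielsen-type saddlepoint tail formula. The function names, the assumption that dosages g_i lie in [0, 2], and the bracketing limits are all hypothetical choices made for illustration.

```python
import math


def norm_sf(x):
    """Upper-tail probability of the standard normal, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cgf_terms(t, g, mu):
    """Return K(t), K'(t), K''(t) for S = sum_i g_i * (Y_i - mu_i),
    with independent Y_i ~ Bernoulli(mu_i) under the null."""
    k0 = k1 = k2 = 0.0
    for gi, m in zip(g, mu):
        e = m * math.exp(gi * t)
        denom = 1.0 - m + e
        k0 += math.log(denom) - t * gi * m
        k1 += gi * e / denom - gi * m
        k2 += gi * gi * (1.0 - m) * e / denom ** 2
    return k0, k1, k2


def solve_saddlepoint(s, g, mu, t_max=128.0, iters=100):
    """Solve K'(t) = s by bisection; K' is increasing because K'' > 0.
    Assumes dosages in [0, 2] so that exp(g * t) cannot overflow for |t| <= t_max."""
    lo, hi = -1.0, 1.0
    while cgf_terms(lo, g, mu)[1] > s:
        lo *= 2.0
        if lo < -t_max:
            raise ValueError("s is at or below the lower end of the support of S")
    while cgf_terms(hi, g, mu)[1] < s:
        hi *= 2.0
        if hi > t_max:
            raise ValueError("s is at or above the upper end of the support of S")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf_terms(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def spa_upper_tail(s, g, mu):
    """Saddlepoint approximation to P(S >= s)."""
    t = solve_saddlepoint(s, g, mu)
    if abs(t) < 1e-4:
        # Near the mean the saddlepoint formula is numerically unstable,
        # so this sketch falls back to the normal approximation.
        variance = cgf_terms(0.0, g, mu)[2]
        return norm_sf(s / math.sqrt(variance))
    k0, _, k2 = cgf_terms(t, g, mu)
    w = math.copysign(math.sqrt(2.0 * (t * s - k0)), t)
    v = t * math.sqrt(k2)
    return norm_sf(w + math.log(v / w) / w)
```

As a usage example under the same assumptions, `spa_upper_tail(10.0, [1.0] * 1000, [0.01] * 1000)` evaluates the upper tail for a rare phenotype (1% prevalence). In that setting the normal approximation `norm_sf(10.0 / math.sqrt(9.9))` gives a noticeably smaller tail probability, which is the kind of miscalibration under case-control imbalance that the abstract says SPA is meant to correct.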

SUBMITTER: Bi W 

PROVIDER: S-EPMC6904814 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank.

Wenjian Bi, Zhangchen Zhao, Rounak Dey, Lars G Fritsche, Bhramar Mukherjee, Seunggeun Lee

American Journal of Human Genetics, 2019-11-14, issue 6

Similar Datasets

| S-EPMC7413891 | biostudies-literature
| S-EPMC5400281 | biostudies-literature
| S-EPMC6699064 | biostudies-literature
| S-EPMC4925291 | biostudies-literature
| S-EPMC9364550 | biostudies-literature
| S-EPMC10246137 | biostudies-literature
| S-EPMC5837456 | biostudies-literature
| S-EPMC7641476 | biostudies-literature
| S-EPMC7210889 | biostudies-literature
| S-EPMC7853505 | biostudies-literature