Dataset Information

Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

ABSTRACT:

Motivation

Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available.

Results

In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ² association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.
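The idea behind Gaussian imputation from summary statistics can be illustrated with a minimal sketch. It assumes that z-scores at nearby SNPs are jointly normal with covariance given by the reference-panel linkage disequilibrium (LD) matrix, so that untyped z-scores are estimated by a conditional mean. The function name `impute_z`, the ridge term `lam` (standing in for the adjustment for limited reference panel size), and the toy LD values are assumptions made for illustration only; this is not the released software package.

```python
import numpy as np

def impute_z(z_typed, sigma_tt, sigma_it, lam=0.1):
    """Conditional-mean imputation of z-scores under a multivariate normal model.

    z_typed  : z-scores at typed SNPs, shape (n_typed,)
    sigma_tt : reference-panel LD among typed SNPs, shape (n_typed, n_typed)
    sigma_it : reference-panel LD between untyped and typed SNPs,
               shape (n_untyped, n_typed)
    lam      : hypothetical ridge term that damps noise in an LD matrix
               estimated from a small reference panel
    """
    # Regularized LD matrix of the typed SNPs (assumed ridge form).
    a = sigma_tt + lam * np.eye(sigma_tt.shape[0])
    # Weights W = sigma_it @ inv(a); solve() avoids forming the inverse.
    weights = np.linalg.solve(a, sigma_it.T).T
    z_imputed = weights @ z_typed
    # Variance of each imputed z-score under the null, W a W^T (diagonal only).
    var_imputed = np.einsum("ij,jk,ik->i", weights, a, weights)
    return z_imputed, var_imputed

# Toy example: two typed SNPs and one untyped SNP correlated with the first.
z_typed = np.array([2.0, 1.0])
sigma_tt = np.eye(2)
sigma_it = np.array([[0.5, 0.0]])
z_imp, var_imp = impute_z(z_typed, sigma_tt, sigma_it, lam=0.0)
print(z_imp, var_imp)
```

Because the imputed z-score has variance below 1 under the null, a natural follow-up in such a sketch is to divide `z_imp` by `np.sqrt(var_imp)` before computing association statistics.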

Availability and implementation

A software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/.

Contact

bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu

Supplementary information

Supplementary materials are available at Bioinformatics online.

SUBMITTER: Pasaniuc B

PROVIDER: S-EPMC4184260 | biostudies-literature | 2014 Oct

REPOSITORIES: biostudies-literature

Publications

Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

Bogdan Pasaniuc, Noah Zaitlen, Huwenbo Shi, Gaurav Bhatia, Alexander Gusev, Joseph Pickrell, Joel Hirschhorn, David P Strachan, Nick Patterson, Alkes L Price

Bioinformatics (Oxford, England), 2014-07-01, issue 20

PMID: 24990607

Similar Datasets

RAISS: robust and accurate imputation from summary statistics.
| S-EPMC6853677 | biostudies-literature

Accurate and adaptive imputation of summary statistics in mixed-ethnicity cohorts.
| S-EPMC6129295 | biostudies-literature

DISSCO: direct imputation of summary statistics allowing covariates.
| S-EPMC4514926 | biostudies-literature

DIST: direct imputation of summary statistics for unmeasured SNPs.
| S-EPMC3810851 | biostudies-literature

DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts.
| S-EPMC4576696 | biostudies-literature

LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.
| S-EPMC4632058 | biostudies-literature

Deep Learning Enables Fast and Accurate Imputation of Gene Expression.
| S-EPMC8076954 | biostudies-literature

LDAK-GBAT: Fast and powerful gene-based association testing using summary statistics.
| S-EPMC9892699 | biostudies-literature

AMAS: a fast tool for alignment manipulation and computing of summary statistics.
| S-EPMC4734057 | biostudies-literature

FISH: fast and accurate diploid genotype imputation via segmental hidden Markov model.
| S-EPMC4071209 | biostudies-literature