Dataset Information

Estimating effect sizes in genome-wide association studies.

ABSTRACT: Knowledge about the proportion of markers without effects (p(0)) and the effect sizes in large-scale genetic studies is important for understanding the basic properties of the data and for applications such as controlling false discoveries and designing adequately powered replication studies. Many p(0) estimators have been proposed. However, high-dimensional data sets typically comprise a large range of effect sizes, and it is unclear whether the estimated p(0) relates to the whole range, including markers with very small effects, or only to the markers with large effects. In this article, we develop an estimation procedure that can be used in all scenarios where the test statistic distribution under the alternative can be characterized by a single parameter (e.g. the non-centrality parameter of the non-central chi-square or F distribution). The estimation procedure starts by estimating the largest effect in the data set, then the second largest effect, then the third largest, and so on. We stop when the effect sizes become so small that they can no longer be estimated precisely for the given sample size. Once the individual effect sizes are estimated, they can be used to calculate an interpretable estimate of p(0). Thus, by repeatedly estimating a single parameter, our method yields both an interpretable estimate of p(0) and estimates of the effect sizes present in the whole marker set. Simulations suggest that the effects are estimated precisely, with only a small upward bias. The R code that computes the effect estimates is freely downloadable from the website: http://www.people.vcu.edu/~jbukszar/.
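Two ideas from the abstract can be made concrete with a small sketch. The first function is a classic p(0) estimator of the kind the abstract says has been proposed (Storey's tail-proportion estimator, swapped in here purely for illustration; it is not the authors' sequential procedure). The other two show why a single parameter is enough for planning: for a 1-df chi-square test, the non-centrality parameter (ncp) alone determines power, which is what a replication study needs. This is a minimal Python sketch assuming 1-df tests and an ncp that grows linearly with sample size for a fixed effect; all function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of p(0), the proportion of markers without effects.

    Null p-values are Uniform(0, 1), so about m * p0 * (1 - lam) of them are
    expected above `lam`; markers with effects mostly fall below it.
    """
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def chi2_1df_power(ncp, alpha):
    """Power of a 1-df chi-square test with non-centrality parameter `ncp`.

    A 1-df non-central chi-square equals (Z + sqrt(ncp))**2 with Z ~ N(0, 1),
    so P(X > critical value) is a sum of two normal tail probabilities.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # sqrt of the chi-square critical value
    shift = ncp ** 0.5
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def replication_power(ncp_discovery, n_discovery, n_replication, alpha):
    """Power to replicate an effect whose ncp was estimated in a discovery sample.

    Assumption: for a fixed effect size, the ncp scales linearly with sample size.
    """
    ncp_rep = ncp_discovery * n_replication / n_discovery
    return chi2_1df_power(ncp_rep, alpha)


# With no effect (ncp = 0) the "power" is just the type I error rate.
print(round(chi2_1df_power(0.0, 0.05), 6))  # 0.05
# Hypothetical effect with ncp 30 in 5,000 subjects, replicated in 3,000 at 5e-8.
print(replication_power(30.0, 5000, 3000, 5e-8))
```

Because the ncp is the only free parameter here, an estimate of it from the discovery sample translates directly into a power estimate for any planned replication size. Note that the small upward bias the abstract reports for effect estimates would make such power estimates somewhat optimistic.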

SUBMITTER: Bukszar J

PROVIDER: S-EPMC3923086 | biostudies-literature | 2010 May

REPOSITORIES: biostudies-literature


Publications

Estimating effect sizes in genome-wide association studies.

József Bukszár, Edwin J. C. G. van den Oord

Behavior Genetics (2010-01-06), issue 3


PMID: 20052610

Similar Datasets

Project description: Brain-wide association studies (BWAS) are a fundamental tool in discovering brain-behavior associations. Several recent studies showed that thousands of study participants are required to improve the replicability of BWAS because actual effect sizes are much smaller than those reported in smaller studies. Here, we perform analyses and meta-analyses of a robust effect size index (RESI) using 63 longitudinal and cross-sectional magnetic resonance imaging studies (77,695 total scans) to demonstrate that optimizing study design is an important way to improve standardized effect sizes and replicability in BWAS. A meta-analysis of brain volume associations with age indicates that BWAS with larger covariate variance have larger effect size estimates and that the longitudinal studies we examined have systematically larger standardized effect sizes than cross-sectional studies. We propose a cross-sectional RESI to adjust for the systematic difference in effect sizes between cross-sectional and longitudinal studies that allows investigators to quantify the benefit of conducting their study longitudinally. Analyzing age effects on global and regional brain measures in the Lifespan Brain Chart Consortium, we show that modifying longitudinal study design to increase between-subject variability and adding a single additional longitudinal measurement per subject improves effect sizes. However, evaluating these longitudinal sampling schemes on cognitive, psychopathology, and demographic associations with structural and functional brain outcome measures in the Adolescent Brain and Cognitive Development dataset shows that longitudinal studies can, counterintuitively, be detrimental to effect sizes. We demonstrate that the benefit of conducting longitudinal studies depends on the strengths of the between- and within-subject associations of the brain and non-brain measures.
Explicitly modeling between- and within-subject effects avoids conflating the effects and allows optimizing effect sizes for them separately. These findings underscore the importance of considering design features in BWAS and emphasize that increasing sample size is not the only approach to improve the replicability of BWAS.
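The observation that larger covariate variance yields larger standardized effects can be illustrated under a standard large-sample assumption. If a Wald statistic T with m degrees of freedom is approximately non-central chi-square with non-centrality n·S², then E[T] = m + n·S², which gives the moment estimate S = sqrt(max(0, (T - m)/n)). For a single slope in a homoskedastic linear model, S = |β|·sd(x)/σ, so the same β produces a larger S when the covariate is more spread out. This is a minimal Python sketch; the function names are hypothetical and it is not the RESI software used in the study.

```python
import math


def resi_from_chi2(stat, df, n):
    """Moment estimate of a RESI-style effect size S from a Wald chi-square statistic.

    Assumes stat ~ non-central chi-square(df, ncp = n * S**2), so that
    E[stat] = df + n * S**2; negative estimates are truncated at zero.
    """
    return math.sqrt(max(0.0, (stat - df) / n))


def resi_linear_slope(beta, sd_x, sd_resid):
    """Population S for one slope in a homoskedastic linear model: |beta| * sd(x) / sigma."""
    return abs(beta) * sd_x / sd_resid


# Same slope and residual SD; doubling the covariate SD doubles S.
print(resi_linear_slope(0.5, 1.0, 4.0))  # 0.125
print(resi_linear_slope(0.5, 2.0, 4.0))  # 0.25
```

The truncation at zero reflects that a statistic below its degrees of freedom is consistent with no effect; it is one reason such estimators can be biased upward for small effects.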

Project description: Background: Mixed linear models have recently been used to address the issue of "missing" heritability in traditional genome-wide association studies (GWAS). These models assume that all single-nucleotide polymorphisms (SNPs) are associated with the phenotypes of interest. More commonly, however, only a small proportion of SNPs have significant effects on the phenotypes, while most SNPs have no or very small effects. To incorporate this feature, we propose an efficient Hierarchical Bayesian Model (HBM) that extends the existing mixed models to enforce automatic selection of significant SNPs. The HBM models the SNP effects using a mixture distribution of a point mass at zero and a normal distribution, where the point mass corresponds to the non-associated SNPs. Results: We estimate the HBM using Gibbs sampling. The estimation performance of our method is first demonstrated through two simulation studies. We make the simulation setups realistic by using parameters fitted on the Framingham Heart Study (FHS) data. The simulation studies show that our method can accurately estimate the proportion of SNPs associated with the simulated phenotype and identify these SNPs, and that it adapts to certain model misspecifications better than the standard mixed models. In addition, we analyze data from the FHS and the Health and Retirement Study (HRS) to study the association between Body Mass Index (BMI) and SNPs on Chromosome 16, and replicate the identified genetic associations. The analysis of the FHS data identifies 0.3% of SNPs on Chromosome 16 as affecting BMI, including rs9939609 and rs9939973 in the FTO gene. These two SNPs are in strong linkage disequilibrium with rs1558902 (Rsq = 0.901 for rs9939609 and Rsq = 0.905 for rs9939973), which has been reported to be linked with obesity in previous GWAS. We then replicate the findings using the HRS data: the analysis finds 0.4% of SNPs on Chromosome 16 associated with BMI.
Furthermore, around 25% of the genes identified as associated with BMI are common to the two studies. Conclusions: The results demonstrate that the HBM and the associated estimation algorithm offer a powerful tool for identifying significant genetic associations with phenotypes of interest among the large numbers of SNPs typical of modern genetic studies.
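The point-mass-plus-normal ("spike-and-slab") prior described above has a simple closed-form consequence for a single SNP. Given an effect estimate beta_hat with standard error se, a prior probability pi1 of a non-zero effect, and a slab standard deviation tau, the posterior probability of a non-zero effect compares the marginal density of beta_hat under the slab, N(0, se² + tau²), with that under the spike, N(0, se²). Inside a Gibbs sampler, an inclusion indicator is typically drawn from a Bernoulli distribution with a probability of this form, conditional on the other parameters. The sketch below is a simplified single-SNP illustration with fixed, assumed hyperparameters; the function name and the summary-statistic likelihood are assumptions, and it is not the HBM or its sampler.

```python
import math


def _log_normal_pdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def posterior_inclusion_prob(beta_hat, se, pi1, tau):
    """P(non-zero effect | beta_hat) under a point-mass-at-zero / normal mixture prior.

    Prior: beta = 0 with probability 1 - pi1, beta ~ N(0, tau**2) with probability pi1.
    Likelihood (assumed): beta_hat | beta ~ N(beta, se**2). Requires 0 < pi1 < 1.
    """
    log_slab = math.log(pi1) + _log_normal_pdf(beta_hat, se**2 + tau**2)
    log_spike = math.log1p(-pi1) + _log_normal_pdf(beta_hat, se**2)
    d = log_slab - log_spike  # log posterior odds of inclusion
    # Logistic transform of d, written in two branches to avoid overflow in exp.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# A null-looking estimate is pulled below the prior; a large z-score is almost surely included.
print(posterior_inclusion_prob(0.0, 0.1, 0.01, 0.2))
print(posterior_inclusion_prob(1.0, 0.1, 0.01, 0.2))
```

Working on the log scale keeps the calculation stable for the very large z-scores that occur at strongly associated loci, where both densities would otherwise underflow to zero.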

OmicsDI is part of the ELIXIR infrastructure
