Dataset Information

A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait.

ABSTRACT: Several methods have been proposed to estimate the variance in disease liability explained by large sets of genetic markers. However, current methods do not scale up well to large sample sizes. Linear mixed models require solving high-dimensional matrix equations, and methods that use polygenic scores are very computationally intensive. Here we propose a fast analytic method that uses polygenic scores, based on the formula for the non-centrality parameter of the association test of the score. We estimate model parameters from the results of multiple polygenic score tests based on markers with p values in different intervals. We estimate parameters by maximum likelihood and use profile likelihood to compute confidence intervals. We compare various options for constructing polygenic scores, based on nested or disjoint intervals of p values, weighted or unweighted effect sizes, and different numbers of intervals, in estimating the variance explained by a set of markers, the proportion of markers with effects, and the genetic covariance between a pair of traits. Our method provides nearly unbiased estimates and confidence intervals with good coverage, although estimation of the variance is less reliable when jointly estimated with the covariance. We find that disjoint p value intervals perform better than nested intervals, but the weighting did not affect our results. A particular advantage of our method is that it can be applied to summary statistics from single markers, and so can be quickly applied to large consortium datasets. Our method, named AVENGEME (Additive Variance Explained and Number of Genetic Effects Method of Estimation), is implemented in R software.
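
The score-construction options compared in the abstract (nested versus disjoint p value intervals, weighted versus unweighted effect sizes) can be illustrated with a minimal Python sketch. This is not the AVENGEME estimator and is not taken from the authors' R code; the function names `interval_bounds`, `in_interval`, and `polygenic_scores`, the use of allele counts as genotypes, and the use of the sign of the effect as the unweighted weight are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def interval_bounds(thresholds: Sequence[float], disjoint: bool = True) -> List[Tuple[float, float]]:
    """Turn ascending p value thresholds into (lower, upper) pairs.

    disjoint=True  -> (0, t1], (t1, t2], ...   (each marker used once)
    disjoint=False -> (0, t1], (0, t2], ...    (nested intervals)
    """
    pairs = []
    previous = 0.0
    for t in thresholds:
        pairs.append((previous if disjoint else 0.0, t))
        previous = t
    return pairs


def in_interval(p: float, lower: float, upper: float) -> bool:
    # Intervals are (lower, upper]; a lower bound of 0 also admits p == 0.
    return (lower < p <= upper) or (lower == 0.0 and p == 0.0)


def polygenic_scores(
    genotypes: Sequence[Sequence[float]],  # rows: individuals, columns: markers (assumed allele counts)
    effects: Sequence[float],              # per-marker effect estimates from a training sample
    p_values: Sequence[float],             # per-marker association p values from the training sample
    lower: float,
    upper: float,
    weighted: bool = True,
) -> List[float]:
    """Score each individual using only markers whose p value lies in the interval."""
    selected = [j for j, p in enumerate(p_values) if in_interval(p, lower, upper)]

    def weight(j: int) -> float:
        b = effects[j]
        # Assumption: the unweighted score uses only the direction of the effect.
        return b if weighted else float((b > 0) - (b < 0))

    return [sum(weight(j) * row[j] for j in selected) for row in genotypes]


# Hypothetical toy data: 2 individuals, 3 markers.
genotypes = [[0, 1, 2], [2, 0, 1]]
effects = [0.5, -0.2, 0.1]
p_values = [0.001, 0.04, 0.3]

disjoint = interval_bounds([0.01, 0.05, 0.5], disjoint=True)
nested = interval_bounds([0.01, 0.05, 0.5], disjoint=False)

# Weighted score from the second disjoint interval (0.01, 0.05].
middle_scores = polygenic_scores(genotypes, effects, p_values, *disjoint[1])
# Unweighted score from the widest nested interval (0, 0.5].
wide_scores = polygenic_scores(genotypes, effects, p_values, *nested[2], weighted=False)
```

In an analysis along the lines the abstract describes, each interval's score would then be tested for association with the trait, and the resulting test statistics would feed the likelihood; that estimation step is not shown here.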

SUBMITTER: Palla L

PROVIDER: S-EPMC4573448 | biostudies-literature | 2015 Aug

REPOSITORIES: biostudies-literature

Publications

A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait.

Luigi Palla, Frank Dudbridge

American Journal of Human Genetics, 2015 Jul 16, issue 2

PMID: 26189816

Similar Datasets

Project description:
Importance: We investigated the variation in neuropsychological function explained by risk alleles at the psychosis susceptibility gene ZNF804A and its interacting partners using single nucleotide polymorphisms (SNPs), polygenic scores, and epistatic analyses. Of particular importance was the relative contribution of the polygenic score vs epistasis in variation explained.
Objectives: To (1) assess the association between SNPs in ZNF804A and the ZNF804A polygenic score with measures of cognition in cases with psychosis and (2) assess whether epistasis within the ZNF804A pathway could explain additional variation above and beyond that explained by the polygenic score.
Design, setting, and participants: Patients with psychosis (n = 424) were assessed in areas of cognitive ability impaired in schizophrenia including IQ, memory, attention, and social cognition. We used the Psychiatric GWAS Consortium 1 schizophrenia genome-wide association study to calculate a polygenic score based on identified risk variants within this genetic pathway. Cognitive measures significantly associated with the polygenic score were tested for an epistatic component using a training set (n = 170), which was used to develop linear regression models containing the polygenic score and 2-SNP interactions. The best-fitting models were tested for replication in 2 independent test sets of cases: (1) 170 individuals with schizophrenia or schizoaffective disorder and (2) 84 patients with broad psychosis (including bipolar disorder, major depressive disorder, and other psychosis).
Main outcomes and measures: Participants completed a neuropsychological assessment battery designed to target the cognitive deficits of schizophrenia including general cognitive function, episodic memory, working memory, attentional control, and social cognition.
Results: Higher polygenic scores were associated with poorer performance among patients on IQ, memory, and social cognition, explaining 1% to 3% of variation on these scores (range, P = .01 to .03). Using a narrow psychosis training set and independent test sets of narrow phenotype psychosis (schizophrenia and schizoaffective disorder), broad psychosis, and control participants (n = 89), the addition of 2 interaction terms containing 2 SNPs each increased the R² for spatial working memory strategy in the independent psychosis test sets from 1.2% using the polygenic score only to 4.8% (P = .11 and .001, respectively) but did not explain additional variation in control participants.
Conclusions and relevance: These data support a role for the ZNF804A pathway in IQ, memory, and social cognition in cases. Furthermore, we showed that epistasis increases the variation explained above the contribution of the polygenic score.

Project description: Isolation of mutants in populations of microorganisms has been a valuable tool in experimental genetics for decades. The main disadvantage, however, is the inability to isolate mutants in non-selectable polygenic traits. Most traits of organisms, however, are non-selectable and polygenic, including industrially important properties of microorganisms. The advent of powerful technologies for polygenic analysis of complex traits has allowed simultaneous identification of multiple causative mutations among many thousands of irrelevant mutations. We now show that this also applies to haploid strains whose genome has been loaded with induced mutations so as to affect as many non-selectable, polygenic traits as possible. We have introduced about 900 mutations into single haploid yeast strains using multiple rounds of EMS mutagenesis, while maintaining the mating capacity required for genetic mapping. We screened the strains for defects in flavor production, an important non-selectable, polygenic trait in yeast alcoholic beverage production. A haploid strain with multiple induced mutations showing reduced ethyl acetate production in semi-anaerobic fermentation was selected, and the underlying quantitative trait loci (QTLs) were mapped using pooled-segregant whole-genome sequence analysis after crossing with an unrelated haploid strain. Reciprocal hemizygosity analysis and allele exchange identified PMA1 and CEM1 as causative mutant alleles and TPS1 as a causative genetic background allele. The case of CEM1 revealed that relevant mutations without observable effect in the haploid strain with multiple induced mutations (in this case due to defective mitochondria) can be identified by polygenic analysis as long as the mutations have an effect in part of the segregants (in this case those that regained fully functional mitochondria). 
Our results show that genomic saturation mutagenesis combined with complex trait polygenic analysis could be used successfully to identify causative alleles underlying many non-selectable, polygenic traits in small collections of haploid strains with multiple induced mutations.

Project description: In a companion paper Balbona et al. (Behav Genet, in press), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission, respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing [Formula: see text] values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation ([Formula: see text]), standard errors of these standardized estimates are reasonable ([Formula: see text]) for [Formula: see text] trios, and can even be reasonable for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., [Formula: see text]) as well as datasets with genomic and phenotypic information on parents and offspring. 
The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWASs on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring.
