Project description:Hexaploids, a group of organisms containing six complete sets of chromosomes in a single nucleus, are of utmost importance to evolutionary studies and breeding programs. Many studies have focused on hexaploid linkage analysis and QTL mapping in controlled crosses, but little methodology has been developed to reveal how hexaploids diversify and evolve in natural populations. We formulate a general framework for studying the pattern of genetic variation in autohexaploid populations by testing for deviation from Hardy-Weinberg equilibrium (HWE) at individual molecular markers. We confirm that hexaploids cannot reach exact HWE but can approach asymptotic HWE after 8-9 generations of random mating. We derive a statistical algorithm for testing HWE and the occurrence of double reduction, a phenomenon that affects population variation during long evolutionary processes, in autopolyploids. We perform computer simulations to validate the statistical behavior of our test procedure and demonstrate its usefulness by analyzing a real data set for autohexaploid chrysanthemum. When extended to allohexaploids, our test procedure will provide a generic tool for illustrating the genome structure of hexaploids in the quest to infer their evolutionary status and design association studies of complex traits.
Project description:Testing for Hardy-Weinberg equilibrium (HWE) is an important component in almost all analyses of population genetic data. Genetic markers that violate HWE are often treated as special cases; for example, they may be flagged as possible genotyping errors, or they may be investigated more closely for evolutionary signatures of interest. The presence of population structure is one reason why genetic markers may fail a test of HWE. This is problematic because almost all natural populations studied in the modern setting show some degree of structure. Therefore, it is important to be able to detect deviations from HWE for reasons other than structure. To this end, we extend statistical tests of HWE to allow for population structure, which we call a test of "structural HWE." Additionally, our new test allows one to automatically choose tuning parameters and identify accurate models of structure. We demonstrate our approach on several important studies, provide theoretical justification for the test, and present empirical evidence for its utility. We anticipate the proposed test will be useful in a broad range of analyses of genome-wide population genetic data.
Project description:Recently, many case-control studies have been proposed to test for association between haplotypes and disease, and these require the assumption of Hardy-Weinberg equilibrium (HWE) for haplotype frequencies. As such, haplotype inference from unphased genotypes and the development of haplotype-based HWE tests are crucial prior to fine mapping. The goodness-of-fit test is frequently used to test for HWE at multiple tightly linked loci. However, its degrees of freedom increase dramatically with the number of loci, which can reduce the power of the test. Therefore, to improve the power of testing for haplotype-based HWE, we first write out two likelihood functions of the observed data based on Niu's model (NM) and the inbreeding model (IM), respectively, each of which can cause departure from HWE. Then, we use two expectation-maximization algorithms and one expectation-conditional-maximization algorithm to estimate the model parameters under the HWE, IM and NM models, respectively. Finally, we propose the likelihood ratio tests LRT[Formula: see text] and LRT[Formula: see text] for haplotype-based HWE under the NM and IM models, respectively. We simulate the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two LRT tests. The simulation results show that both tests control the type I error rate well when testing for haplotype-based HWE. If the NM model is true, then LRT[Formula: see text] is more powerful, whereas if the true model is the IM model, then LRT[Formula: see text] has better power. Under the population stratification model, LRT[Formula: see text] is still more powerful. Therefore, LRT[Formula: see text] is generally recommended. Application of the proposed methods to a rheumatoid arthritis data set further illustrates their utility for real data analysis.
Project description:Testing for Hardy-Weinberg equilibrium (HWE) is commonly used as a quality control filter in genome-wide scans for markers with experimentally determined genotypes. In contrast, for markers with imputed genotypes, there are post-imputation metrics of quality that can be used as screens but there are no formal tests of deviation from HWE. Similarly, there are no formal tests of deviation from HWE for probabilistic genotypes that are generated by sequencing projects. Here, I describe generalizations of the approximate χ² and exact tests of HWE for use with uncertain genotypes. The tests fully account for the probabilities of all possible genotypes at a marker for each individual. By computer simulation, the approximate and exact tests are shown to maintain valid control of the type I error rate. Calculations of the loss of power as the uncertainty in genotypes increases are illustrated. The tests are compatible with chip-based genotypes for single-nucleotide polymorphisms and copy number polymorphisms, imputed genotypes, and probabilistic assignments of genotype from variable-coverage sequence data.
Project description:Statistical tests for Hardy-Weinberg equilibrium are important elementary tools in genetic data analysis. X-chromosomal variants have long been tested by applying autosomal test procedures to females only, and gender is usually not considered when testing autosomal variants for equilibrium. Recently, we proposed specific X-chromosomal exact test procedures for bi-allelic variants that include the hemizygous males, as well as autosomal tests that consider gender. In this study, we extend that work to variants with multiple alleles. A full enumeration algorithm is used for the exact calculations for tri-allelic variants. For variants with many alternate alleles, we use a permutation test. Some empirical examples with data from the 1000 Genomes Project are discussed.
Project description:The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
Project description:Background: This study is motivated by National Household Surveys that collect genetic data, in which complex samples (e.g. stratified multistage cluster sample), partially from the same family, are selected. In addition to the differential selection probabilities of selecting households and persons within the sampled households, there are two levels of correlations of the collected genetic data in National Genetic Household Surveys (NGHS). The first level of correlation is induced by the hierarchical geographic clustered sampling of households and the second level of correlation is induced by biological inheritances from individuals sampled in the same household. Results: To test for Hardy-Weinberg Equilibrium (HWE) in NGHS, two test statistics, the CCS method [1] and the QS method [2], appear to be the only existing methods that take account of both correlations. In this paper, I evaluate both methods in terms of the test size and power under a variety of complex designs with different weighting schemes and varying magnitudes of the two correlation effects. Both methods are applied to a real data example from the Hispanic Health and Nutrition Examination Survey with simulated genotype data. Conclusions: The QS method maintains the nominal size well and consistently achieves higher power than the CCS method in testing HWE under a variety of sample designs, and therefore is recommended for testing HWE of genetic survey data with complex designs.
Project description:Detecting departures from Hardy-Weinberg equilibrium (HWE) of marker-genotype frequencies is a crucial first step in almost all human genetic analyses. When a sample is stratified by multiple ethnic groups, it is important to allow the marker-allele frequencies to differ over the strata. In this situation, it is common to test for HWE by using an exact test within each stratum and then using the minimum P value as a global test. This approach does not account for multiple testing, and, because it does not combine information over strata, it does not have optimal power. Several approximate methods to combine information over strata have been proposed, but most of them sum over strata a measure of departure from HWE; if the departures are in different directions, then summing can diminish the overall evidence of departure from HWE. An exact stratified test is more appealing because it uses the probability of genotype configurations across the strata as evidence for global departures from HWE. We developed an exact stratified test for HWE for diallelic markers, such as single-nucleotide polymorphisms (SNPs), and an exact test for homogeneity of Hardy-Weinberg disequilibrium. By applying our methods to data from Perlegen and HapMap--a combined total of more than five million SNP genotypes, with three to four strata and strata sizes ranging from 23 to 60 subjects--we illustrate that the exact stratified test provides more-robust and more-powerful results than those obtained by either the minimum of exact test P values over strata or approximate stratified tests that sum measures of departure from HWE. Hence, our new methods should be useful for samples composed of multiple ethnic groups.
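The baseline that the abstract above argues against, a per-stratum exact test followed by the minimum P value, can be sketched as follows. This is a minimal illustration and not the authors' stratified test: it assumes a diallelic marker, uses the standard conditional distribution of the heterozygote count given the allele counts, and the function names `hwe_exact_p` and `min_p_over_strata` are hypothetical.

```python
import math


def _log_prob(n_ab, n_a, n):
    """Log-probability of n_ab heterozygotes given n individuals and n_a copies of allele A."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    lf = math.lgamma  # lgamma(k + 1) == log(k!)
    return (lf(n + 1) - lf(n_aa + 1) - lf(n_ab + 1) - lf(n_bb + 1)
            + n_ab * math.log(2)
            + lf(n_a + 1) + lf(n_b + 1) - lf(2 * n + 1))


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE p-value: total probability of configurations no more likely than the observed one."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    # Possible heterozygote counts share the parity of the allele count n_a.
    het_values = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(_log_prob(h, n_a, n)) for h in het_values}
    observed = probs[n_ab]
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def min_p_over_strata(strata):
    """Minimum per-stratum exact p-value; note that it is NOT corrected for multiple testing."""
    return min(hwe_exact_p(*counts) for counts in strata)
```

Each stratum is passed as a tuple of genotype counts `(n_AA, n_AB, n_BB)`. Because the minimum is taken over several tests without adjustment, the result is anti-conservative as a global P value, which is the weakness the abstract points out.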
Project description:Objective: Departure from Hardy-Weinberg equilibrium (HWE) may occur due to a variety of causes, including purifying selection, inbreeding, population substructure, copy number variation or genotyping error. We searched for specific characteristics of HWE-departure due to genotyping error. Methods: Genotypes of a random set of genetic variants were obtained from the Exome Aggregation Consortium (ExAC) database. Variants with <80% successful genotypes or with minor allele frequency (MAF) <1% were excluded. HWE-departure (d-HWE) was considered significant at p < 10E-05 and classified as d-HWE with loss of heterozygosity (LoH d-HWE) or d-HWE with excess heterozygosity (gain of heterozygosity: GoH d-HWE). Missing genotypes, variant type (single nucleotide polymorphism (SNP) vs. insertion/deletion), MAF, standard deviation (SD) of MAF across populations (MAF-SD) and copy number variation were evaluated for association with HWE-departure. Results: The study sample comprised 3,204 genotype distributions. HWE-departure was observed in 134 variants: LoH d-HWE in 41 (1.3%), GoH d-HWE in 93 (2.9%) variants. LoH d-HWE was more likely in variants located within deletion polymorphisms (p < 0.001) and in variants with higher MAF-SD (p = 0.0077). GoH d-HWE was associated with low genotyping rate, with variants of insertion/deletion type and with high MAF (all at p < 0.001). In a sub-sample of 2,196 variants with genotyping rate >98%, LoH d-HWE was found in 29 (1.3%) variants, but no GoH d-HWE was detected. The findings of the non-random distribution of HWE-violating SNPs along the chromosome, the association with common deletion polymorphisms and indel-variant type, and the finding of excess heterozygotes in genomic regions that are prone to cross-hybridization were confirmed in a large sample of short variants from the 1000 Genomes Project. Conclusions: We differentiated between two types of HWE-departure. GoH d-HWE was suggestive of genotyping error. LoH d-HWE, on the contrary, pointed to natural variability such as population substructure or common deletion polymorphisms.
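The direction of a departure (loss versus gain of heterozygosity) can be read from the sign of the fixation index F = 1 - H_obs / H_exp. The sketch below only illustrates that classification for a bi-allelic variant. It is an assumption about how the direction might be computed, not the study's pipeline, it does not test significance, and the function name `heterozygosity_direction` is hypothetical.

```python
def heterozygosity_direction(n_aa, n_ab, n_bb):
    """Return (F, label), where label is 'LoH', 'GoH' or 'none' for a bi-allelic variant."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    h_exp = 2 * p * (1 - p)               # expected heterozygosity under HWE
    if h_exp == 0:
        raise ValueError("monomorphic variant: direction is undefined")
    h_obs = n_ab / n                      # observed heterozygosity
    f = 1 - h_obs / h_exp                 # F > 0: heterozygote deficit; F < 0: excess
    if f > 0:
        label = "LoH"
    elif f < 0:
        label = "GoH"
    else:
        label = "none"
    return f, label
```

In practice the label would only be reported for variants that first fail an HWE test at the chosen significance threshold.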
Project description:The century-old Hardy-Weinberg law remains fundamental to population genetics. Typically Hardy-Weinberg equilibrium is tested in unrelated individuals using a χ² goodness-of-fit test that compares expected and observed numbers of heterozygotes and homozygotes. In this report, we propose a likelihood ratio test for Hardy-Weinberg equilibrium that accommodates a mixture of pedigree and random sample data. The underlying statistical model depends on a parameter γ determining the ratio of heterozygous genotypes to homozygous genotypes among pedigree founders. As our heterozygous-homozygous test accommodates markers with dominant and recessive alleles, it can handle the phase ambiguities encountered in combining several linked single nucleotide polymorphisms into a single supermarker. No prior haplotyping is necessary. Our experience on real and simulated data suggests that the heterozygous-homozygous test has good type-one error and power.
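For reference, the conventional goodness-of-fit test mentioned in the abstract above can be sketched for a bi-allelic marker in unrelated individuals. This is the textbook three-genotype form with one degree of freedom, not the proposed likelihood ratio test, and the function name `hwe_chi_square` is hypothetical. The p-value uses the identity that the upper tail of a chi-square variable with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE; returns (statistic, p_value) with 1 df."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)       # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic marker: test is undefined")
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes minus one, minus one estimated allele frequency, gives 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

The chi-square approximation is unreliable when expected counts are small, which is one reason exact tests are often preferred for rare alleles.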