Project description: Fetal growth is an important determinant of cardiometabolic disease risk in childhood and adulthood. The genetic architecture of fetal growth remains understudied in ancestrally diverse populations. We conducted a genome-wide admixture mapping scan and an analysis of genetic ancestry among Hispanic American, African American, European American, and Asian American pregnant women to identify genetic loci associated with fetal growth measures from 13 to 40 weeks of gestation. Fetal growth measures were associated with genome-wide average African, European, Amerindigenous, and East Asian ancestry proportions (P ranged from 10⁻³ to 4.8 × 10⁻²). Admixture mapping identified ten African ancestry loci and three Amerindigenous ancestry loci significantly associated with fetal growth measures at Bonferroni-corrected significance levels (P ranged from 2.18 × 10⁻⁸ to 3.71 × 10⁻⁶). At the chr2q23.3-24.2 locus, where higher African ancestry was associated with long bone (femur and humerus) length, the T allele of rs13030825 (GALNT13) was associated with longer humerus length in African Americans (β = 0.44, P = 6.25 × 10⁻⁶ at week 27; β = 0.39, P = 7.72 × 10⁻⁵ at week 40). The rs13030825 SNP accounted for most of the admixture association at the chr2q23.3-24.2 locus and shows a substantial allele frequency difference between African and European reference samples (FST = 0.55, P = 0.03). Regulatory annotation shows that rs13030825 overlaps a binding site of the serum response factor (SRF), a transcription factor previously implicated in postnatal bone development in mice. Overall, we identified ancestry-related maternal genetic loci that influence fetal growth, shedding light on molecular pathways that regulate fetal growth and their potential effects on health across the lifespan. Clinical trials registration: ClinicalTrials.gov, NCT00912132.
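At its core, an admixture mapping test asks whether the number of copies of a given ancestry at a locus (0, 1, or 2) predicts the trait once genome-wide ancestry is accounted for. The Python sketch below shows only that core regression. It assumes a single continuous trait, ordinary least squares, and no other covariates. It is not the study's actual model, which analyzed fetal growth measures across gestation. The function names and example data are hypothetical.

```python
from statistics import fmean


def residuals(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def admixture_mapping_beta(trait, local_ancestry, global_ancestry):
    """Per-copy effect of local ancestry on the trait, adjusted for global ancestry.

    By the Frisch-Waugh-Lovell theorem, regressing the residualized trait on the
    residualized local ancestry gives the same coefficient as the joint model
    trait ~ local_ancestry + global_ancestry.
    """
    r_trait = residuals(trait, global_ancestry)
    r_local = residuals(local_ancestry, global_ancestry)
    return sum(a * b for a, b in zip(r_local, r_trait)) / sum(a * a for a in r_local)


# Hypothetical data: the trait is constructed as 2 * local + 3 * global ancestry.
local = [0, 1, 2, 1, 0, 2, 1, 2]
global_anc = [0.2, 0.5, 0.9, 0.4, 0.1, 0.7, 0.6, 0.8]
trait = [2 * x + 3 * g for x, g in zip(local, global_anc)]
print(round(admixture_mapping_beta(trait, local, global_anc), 6))  # 2.0
```

Adjusting for global ancestry matters because it prevents genome-wide ancestry differences from producing spurious local associations. In a full scan, this test is repeated at every locus, and the resulting P values are compared against a multiple-testing threshold such as the Bonferroni correction mentioned above.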
Project description: The heritability explained by local ancestry markers in an admixed population, $h^2_\gamma$, provides crucial insight into the genetic architecture of a complex disease or trait. Estimates of $h^2_\gamma$ can be biased by population structure within the ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h^2_\gamma$ estimates are approximately unbiased and, compared with existing approaches, robust to ancestral stratification. In the presence of ancestral stratification, we show that a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. Across the 20 phenotypes, $\hat{h}^2_\gamma$ ranges from 0.0025 to 0.033 (mean $\hat{h}^2_\gamma = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes, we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
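The details of HAMSTA's sampling scheme are not given here, so what follows is only a generic illustration of why admixture mapping needs its own significance threshold. Local ancestry is strongly autocorrelated along the genome, which makes the effective number of independent tests much smaller than the number of markers. The Python sketch below makes three assumptions:

- adjacent null test statistics follow a stationary AR(1) process, where `rho` is a hypothetical stand-in for real local ancestry correlation;
- whole null scans can be simulated;
- the 95th percentile of the maximum |z| across a scan serves as a threshold that controls the FWER at 5%.

This is not HAMSTA's method.

```python
import math
import random
from statistics import NormalDist


def null_max_abs_z(n_markers, rho, rng):
    """One null scan: AR(1)-correlated z-scores with unit variance; return max |z|."""
    z = rng.gauss(0.0, 1.0)
    best = abs(z)
    scale = math.sqrt(1.0 - rho * rho)  # keeps each z-score at variance 1
    for _ in range(n_markers - 1):
        z = rho * z + scale * rng.gauss(0.0, 1.0)
        best = max(best, abs(z))
    return best


def simulated_z_threshold(n_markers, rho, n_reps=1000, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the scan-wide maximum |z| under the null."""
    rng = random.Random(seed)
    maxima = sorted(null_max_abs_z(n_markers, rho, rng) for _ in range(n_reps))
    return maxima[math.ceil((1 - alpha) * n_reps) - 1]


def bonferroni_z_threshold(n_markers, alpha=0.05):
    """Two-sided z threshold if all markers were treated as independent tests."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_markers))


print(simulated_z_threshold(500, rho=0.99), bonferroni_z_threshold(500))
```

With strong correlation between adjacent markers, the simulated threshold falls well below the Bonferroni threshold. Bonferroni would therefore be conservative for admixture mapping, while an uncalibrated simulation model could be too lenient. This is the calibration problem the paragraph above addresses.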
Project description: Over the past two decades, genome-wide association studies (GWASs) have greatly advanced our understanding of the genetic basis of complex traits. Despite these successes, most GWAS samples come from European populations, and GWASs are often criticized for their lack of ancestral diversity. Trans-ancestry association mapping (TRAM) offers an opportunity to narrow the disparity in genetic studies between non-European and European populations. Here, we propose a statistical method, LOG-TRAM, that leverages the local genetic architecture for TRAM. Using biobank-scale datasets, we show that LOG-TRAM can greatly improve the statistical power to identify risk variants in under-represented populations while producing well-calibrated p values. We applied LOG-TRAM to GWAS summary statistics of various complex traits and diseases from BioBank Japan, UK Biobank, and African populations, obtaining substantial gains in power and effectively correcting confounding biases in TRAM. Finally, we show that LOG-TRAM can identify ancestry-specific loci and that its output can be used to construct more accurate polygenic risk scores in under-represented populations.
Project description: Population genetic analyses of local ancestry tracts routinely assume that the ancestral admixture process is identical for both parents of an individual, an assumption that may be invalid for recent admixture. Here, we present Parental Admixture Proportion Inference (PAPI), a Bayesian tool for inferring the admixture proportions and admixture times for each parent of a single admixed individual. PAPI analyzes unphased local ancestry tracts and has two components: a binomial model that leverages genome-wide ancestry fractions to infer parental admixture proportions, and a hidden Markov model (HMM) that infers admixture times from tract lengths. Crucially, the HMM accounts for unobserved within-ancestry recombination by approximating the pedigree crossover dynamics, enabling inference of parental admixture times. In simulations, PAPI's admixture proportion estimates deviate from the truth by 0.047 on average, outperforming ANCESTOR and PedMix by 46.0% and 57.6%, respectively. PAPI's admixture time estimates are strongly correlated with the truth (R = 0.76) but have an average downward bias of 1.01 generations, partly attributable to inaccuracies in local ancestry inference. As an illustration of its utility, we ran PAPI on African American genotypes from the PAGE study (N = 5,786) and found strong evidence of assortative mating by ancestry proportion: couples' ancestry proportions are highly correlated (R = 0.87) and are closer to each other than expected under random mating (p < 10⁻⁶). We anticipate that PAPI will be useful for studying the population dynamics of admixture and will also interest individuals seeking to learn about their personal genealogies.
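The Python model below makes the proportion component concrete, but it is a deliberately simplified sketch and not PAPI's implementation. It treats loci as independent, although real tracts are linked. It also assumes that at each locus we observe how many copies (0, 1, or 2) of one ancestry the individual carries.

If parent 1 transmits that ancestry with probability p1 and parent 2 with probability p2, each copy count is the sum of two Bernoulli draws. A grid search over (p1, p2) then recovers the unordered pair. This shows why unphased data can still separate the two parents. The function names and grid resolution are hypothetical.

```python
import math


def copy_number_probs(p1, p2):
    """P(0, 1, 2 copies of ancestry A) when parent 1 transmits A w.p. p1, parent 2 w.p. p2."""
    return (
        (1 - p1) * (1 - p2),
        p1 * (1 - p2) + p2 * (1 - p1),
        p1 * p2,
    )


def log_likelihood(counts, p1, p2):
    """Log-likelihood of observed locus counts (n0, n1, n2), loci treated as independent."""
    ll = 0.0
    for n, q in zip(counts, copy_number_probs(p1, p2)):
        if n == 0:
            continue
        if q <= 0.0:
            return -math.inf  # model forbids an observed category
        ll += n * math.log(q)
    return ll


def fit_parental_proportions(counts, steps=100):
    """Grid-search MLE of (p1, p2) with p1 <= p2, since parent labels are exchangeable."""
    best_ll, best_pair = -math.inf, None
    for i in range(steps + 1):
        for j in range(i, steps + 1):
            p1, p2 = i / steps, j / steps
            ll = log_likelihood(counts, p1, p2)
            if ll > best_ll:
                best_ll, best_pair = ll, (p1, p2)
    return best_pair


# 1,000 hypothetical loci: 160 with 0 copies, 680 with 1 copy, 160 with 2 copies.
print(fit_parental_proportions((160, 680, 160)))  # (0.2, 0.8)
```

In this example, both parents having 50% ancestry would produce the same genome-wide fraction but far fewer heterozygous-ancestry loci. The excess of single-copy loci is what points to two parents with very different proportions.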
Project description: Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. For testing purposes, this method is applied to human genome-wide genotype data from Indonesia, as well as to simulated genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions.
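A plain Haar transform illustrates the wavelet step. The sketch below is a generic Python example, not the adwave R package. It assumes the input is a per-marker ancestry signal along one chromosome (for example, scores on the leading principal component) sampled at a power-of-two number of evenly spaced markers. It reports how much of the signal's energy falls at each scale. Longer ancestry blocks, as expected after more recent admixture, shift energy toward coarser levels.

```python
import math


def haar_energy_by_level(signal):
    """Energy (sum of squared Haar detail coefficients) at each level.

    Level 1 is the finest scale (adjacent markers); len(signal) must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a positive power of two")
    approx = [float(x) for x in signal]
    energies = []
    while len(approx) > 1:
        coarser, details = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            coarser.append((a + b) / math.sqrt(2))   # local average (smooth part)
            details.append((a - b) / math.sqrt(2))   # local difference (detail part)
        energies.append(sum(d * d for d in details))
        approx = coarser
    return energies


# Hypothetical signal: alternating ancestry blocks of 8 markers each, 64 markers total.
blocks = ([1.0] * 8 + [-1.0] * 8) * 4
print([round(e, 6) for e in haar_energy_by_level(blocks)])  # [0.0, 0.0, 0.0, 64.0, 0.0, 0.0]
```

Because the Haar transform is orthonormal, the level energies partition the signal's total energy. The resulting energy profile is a compact summary of block structure that can be compared against simulations, for instance inside an approximate Bayesian computation loop.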
Project description: Admixture, the mixing of genetically distinct populations, is increasingly recognized as a fundamental biological process. One major goal of admixture analyses is to estimate the timing of admixture events. Whereas most current methods can detect only the most recent admixture event, here we present coalescent theory and associated software that can estimate the timing of multiple admixture events in an admixed population. We extensively validate this approach and evaluate the conditions under which it can successfully distinguish one-pulse from two-pulse admixture models. We apply our approach to real and simulated data from Drosophila melanogaster. We find evidence of a single very recent pulse of cosmopolitan ancestry contributing to African populations, as well as evidence for more ancient admixture among genetically differentiated populations in sub-Saharan Africa. These results suggest that our method can quantify complex admixture histories involving genetic material introduced by multiple discrete admixture pulses. The new method facilitates the exploration of admixture and its contribution to adaptation, ecological divergence, and speciation.
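For comparison, the single-pulse case admits a simple estimate from tract lengths alone. This is the baseline model that multi-pulse methods extend. Consider an ancestry that contributed a fraction m of the genome T generations ago. Under a continuous single-pulse approximation, its tracts are roughly exponentially distributed with rate (1 − m)T per Morgan. This approximation is an assumption: real data also involve tract-calling error and censoring at chromosome ends, both ignored here. The maximum-likelihood rate is one over the mean tract length, which gives the Python sketch below.

This is not the coalescent method described above. Distinguishing one pulse from two requires fitting a richer tract-length distribution. The function name is hypothetical.

```python
from statistics import fmean


def single_pulse_time(tract_lengths_morgans, ancestry_proportion):
    """Rough generations since a single admixture pulse.

    Assumes tracts of an ancestry with genome-wide proportion m are exponential
    with rate (1 - m) * T per Morgan, so T = 1 / ((1 - m) * mean tract length).
    """
    if not 0.0 <= ancestry_proportion < 1.0:
        raise ValueError("ancestry proportion must be in [0, 1)")
    mean_length = fmean(tract_lengths_morgans)
    return 1.0 / ((1.0 - ancestry_proportion) * mean_length)


# Hypothetical tracts averaging 0.1 Morgans from an ancestry at 50% of the genome.
print(round(single_pulse_time([0.05, 0.10, 0.15], 0.5), 6))  # 20.0
```

Older pulses leave shorter tracts, so T grows as the mean tract length shrinks. A second, older pulse adds an excess of short tracts that no single exponential fits well.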
Project description: Background: Admixture mapping is a powerful gene mapping approach for an admixed population formed from ancestral populations with different allele frequencies. The power of this method relies on the ability of ancestry informative markers (AIMs) to infer ancestry along the chromosomes of admixed individuals. In this study, more than one million SNPs from HapMap databases and simulated data were interrogated in admixed populations using various measures of ancestry informativeness: Fisher Information Content (FIC), Shannon Information Content (SIC), F statistics (FST), Informativeness for Assignment Measure (In), and the Absolute Allele Frequency Differences (delta, δ). The objectives were to compare these measures of informativeness for selecting SNP markers for ancestry inference, and to determine the accuracy of AIM panels selected by each measure in estimating the contributions of the ancestral populations to the admixed population. Results: FST and In had the highest Spearman correlation and the best agreement, as measured by kappa statistics based on deciles. Although the different measures of marker informativeness performed comparably well, analyses of the top 1 to 10% ranked informative markers in simulated data showed that In was better at estimating ancestry for an admixed population. Conclusions: Although millions of SNPs have been identified, only a small subset needs to be genotyped to predict ancestry accurately, with a minimal error rate, in a cost-effective manner. In this article, we compared various methods for selecting ancestry informative SNPs using simulations as well as SNP genotype data from samples of admixed populations, and showed that the In measure estimates ancestry proportion (in an admixed population) with lower bias and mean squared error.
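Three of these measures have simple closed forms for a biallelic SNP with allele frequencies p1 and p2 in two parental populations. The Python sketch below uses textbook forms:

- δ = |p1 − p2|;
- F_ST = (H_T − H_S)/H_T with equal population weights;
- the informativeness for assignment, I_n = Σ_j [−p̄_j ln p̄_j + Σ_i (p_ij ln p_ij)/K] over alleles j and K populations.

The study may have used different F_ST estimators or sample-size corrections, and FIC and SIC are omitted. Function names and SNP IDs are hypothetical.

```python
import math


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Two-population F_ST as (H_T - H_S) / H_T with equal population weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                         # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2     # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def informativeness_for_assignment(p1, p2):
    """I_n for a biallelic SNP and K = 2 populations, using natural logarithms."""
    total = 0.0
    for a1, a2 in ((p1, p2), (1 - p1, 1 - p2)):  # allele 1, then allele 2
        mean = (a1 + a2) / 2
        total += -xlogx(mean) + (xlogx(a1) + xlogx(a2)) / 2
    return total


# Rank hypothetical SNPs by I_n, as one would when selecting an AIM panel.
snps = {"snp_a": (0.9, 0.1), "snp_b": (0.6, 0.4), "snp_c": (0.5, 0.5)}
ranked = sorted(snps, key=lambda s: informativeness_for_assignment(*snps[s]), reverse=True)
print(ranked)  # ['snp_a', 'snp_b', 'snp_c']
```

All three measures are zero when the populations share allele frequencies and grow as the frequencies diverge. They differ in how they weight intermediate versus extreme frequencies, which is why marker rankings, and hence selected panels, can differ.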
Project description: Multiple sclerosis (MS) is an autoimmune disease with high prevalence among populations of northern European ancestry. Past studies have shown that exposure to ultraviolet radiation could explain the difference in MS prevalence across the globe. In this study, we investigate whether the difference in MS prevalence could be explained by European genetic risk factors. We characterized the ancestry of MS-associated alleles using RFMix, a conditional random field parameterized by random forests, to estimate their local ancestry in the largest assembled admixed population to date, comprising 3,692 African Americans, 4,915 Asian Americans, and 3,777 Hispanics. The majority of MS-associated human leukocyte antigen (HLA) alleles, including the prominent HLA-DRB1*15:01 risk allele, exhibited cosmopolitan ancestry. Ancestry-specific MS-associated HLA alleles were also identified. Analysis of the HLA-DRB1*15:01 risk allele in African Americans revealed that alleles on the European haplotype conferred three times the disease risk of those on the African haplotype. Furthermore, we found evidence that the European and African HLA-DRB1*15:01 alleles exhibit single nucleotide polymorphism (SNP) differences in regions encoding the HLA-DRB1 antigen-binding heterodimer. Additional evidence for increased MS risk conferred by the European haplotype was found for HLA-B*07:02 and HLA-A*03:01 in African Americans. Most of the 200 non-HLA MS SNPs previously established in European populations were not significantly associated with MS in admixed populations, nor did they carry more European ancestry in cases than in controls. Lastly, a genome-wide search for association between European ancestry and MS revealed a region of interest close to the ZNF596 gene on chromosome 8 in Hispanics, where cases had a significantly higher proportion of European ancestry than controls.
In conclusion, our study established that the genetic ancestry of MS-associated alleles is complex and suggests that differences in MS prevalence could be explained by the ancestry of MS-associated alleles.
Project description: It has become clear that hybridization between species is much more common than previously recognized. As a result, we now know that the genomes of many modern species, including our own, are a patchwork of regions derived from past hybridization events. Increasingly, researchers are interested in disentangling which regions of the genome originated from each parental species using local ancestry inference methods. Because admixture has diverse effects, this interest is shared across disparate fields, from human genetics to ecology and evolutionary biology. However, local ancestry inference methods are sensitive to a range of biological and technical parameters that can affect accuracy. Here we present paired simulation and ancestry inference pipelines, mixnmatch and ancestryinfer, to help researchers plan and execute local ancestry inference studies. mixnmatch can simulate arbitrarily complex demographic histories in the parental and hybrid populations, selection on hybrids, and technical variables such as coverage and contamination. ancestryinfer takes sequencing reads from simulated or real individuals as input and implements an efficient local ancestry inference pipeline. We perform a series of simulations with mixnmatch to pinpoint factors that influence local ancestry inference accuracy and highlight useful features of the two pipelines. mixnmatch is a powerful tool for simulating hybridization, while ancestryinfer facilitates local ancestry inference on real or simulated data.
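Many local ancestry tools are built on hidden Markov models in which the hidden state is the ancestry at each position along the genome. The toy Python example below is not how ancestryinfer works, since that pipeline starts from sequencing reads and genotype uncertainty. Instead, it uses Viterbi decoding to find the most likely ancestry path for one phased haploid chromosome from 0/1 alleles. It assumes known reference allele frequencies for two parental populations and a constant per-interval switch probability (`switch_prob`, a hypothetical parameter). Real pipelines also model recombination distance, genotype error, and diploid data.

```python
import math


def viterbi_local_ancestry(alleles, freqs, switch_prob=0.01, prior=(0.5, 0.5)):
    """Most likely ancestry (0 or 1) at each site of one haploid chromosome.

    alleles: observed alleles (0 or 1), one per site.
    freqs: freqs[k][s] is the frequency of allele 1 in reference population k at site s.
    switch_prob: hypothetical constant probability of an ancestry switch between sites.
    """
    n_sites = len(alleles)
    if n_sites == 0:
        return []
    log_stay, log_switch = math.log(1 - switch_prob), math.log(switch_prob)

    def log_emit(k, s):
        p = freqs[k][s] if alleles[s] == 1 else 1 - freqs[k][s]
        return math.log(max(p, 1e-12))  # floor avoids log(0) for fixed alleles

    score = [math.log(prior[k]) + log_emit(k, 0) for k in (0, 1)]
    backpointers = []
    for s in range(1, n_sites):
        new_score, pointers = [], []
        for k in (0, 1):
            # Best way to reach state k at site s: stay in k, or switch from 1 - k.
            stay = score[k] + log_stay
            switch = score[1 - k] + log_switch
            pointers.append(k if stay >= switch else 1 - k)
            new_score.append(max(stay, switch) + log_emit(k, s))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


alleles = [1] * 10 + [0] * 10
freqs = [[0.9] * 20, [0.1] * 20]  # population 0 mostly carries allele 1
print(viterbi_local_ancestry(alleles, freqs))  # ten 0s followed by ten 1s
```

The switch probability governs the trade-off that simulation tools such as mixnmatch help explore. Set too high, it produces spurious short tracts; set too low, it misses short tracts from old admixture.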
Project description: Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation that have been used in human complex trait genetics are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019). As users rather than developers, we sought to directly compare the accuracy, runtime, memory usage, and usability of these software tools to determine which is best suited for incorporation into association study pipelines. We find that in the majority of cases RFMix has the highest median accuracy, with the ranking of the remaining software dependent on the ancestral architecture of the population tested. Additionally, we estimate the asymptotic (big-O) scaling of runtime and memory for each tool and find that, for both time and memory, most tools scale linearly with sample size. The only exception is RFMix, whose runtime scales quadratically and whose memory scales linearly with sample size. Effective local ancestry estimation tools are necessary to increase diversity and prevent population disparities in human genetics studies. RFMix performs best across methods; however, depending on the application, other methods perform just as well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.
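One simple way to summarize how runtime or memory grows with sample size is to fit the slope of log(cost) against log(sample size). A slope near 1 indicates linear scaling and a slope near 2 indicates quadratic scaling. The Python sketch below assumes costs have already been measured at several sample sizes. It is not necessarily how this benchmark estimated complexity, and the example numbers are synthetic, not measurements.

```python
import math


def scaling_exponent(sizes, costs):
    """Least-squares slope of log(cost) on log(size): ~1 linear, ~2 quadratic."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


sizes = [100, 200, 400, 800]
print(round(scaling_exponent(sizes, [0.5 * n * n for n in sizes]), 6))  # 2.0
print(round(scaling_exponent(sizes, [3.0 * n for n in sizes]), 6))      # 1.0
```

Constant factors drop out of the slope, so two tools can share the same exponent while differing greatly in absolute runtime. Both numbers matter when choosing a tool for a large cohort.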