Project description: The vast majority of human populations and individuals have mixed ancestry. Consequently, adjustment for locus-specific ancestry is essential for genetic association studies. To empower association studies for all populations, it is necessary to integrate the effects of locus-specific ancestry and genotype. We developed a joint test of ancestry and association that can be performed with summary statistics, is independent of study design, can take advantage of locus-specific ancestry effects to boost power in association testing, and can use association effects to fine-map admixture peaks. We illustrate the test using the association between serum triglycerides and LPL. By combining data from African Americans, European Americans, and West Africans, we identify three conditionally independent variants with varying degrees of ancestral differentiation in allele frequency. Using out-of-sample data, we demonstrate the improved prediction achievable by accounting for multiple causal variants and locus-specific ancestry effects at a single locus.
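The abstract does not give the form of its joint statistic. As a rough illustration of how an ancestry effect and a genotype effect can be combined from summary statistics alone, the sketch below forms a generic 2-df Wald-type statistic from two z-scores. The function name and the correlation parameter `r` are hypothetical; `r` is assumed to be supplied by the user (for example, from the genotype-ancestry correlation at the locus), and this is not presented as the authors' test.

```python
import math
from typing import Tuple


def joint_chisq_2df(z_ancestry: float, z_genotype: float, r: float = 0.0) -> Tuple[float, float]:
    """Combine an ancestry z-score and a genotype z-score into one 2-df test.

    Assumes that, under the null, the two z-scores are bivariate normal with
    unit variances and correlation r. The value of r is a hypothetical input
    that the user must supply; it is not defined in the abstract.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Quadratic form z' R^{-1} z with R = [[1, r], [r, 1]]
    stat = (z_ancestry ** 2 - 2.0 * r * z_ancestry * z_genotype + z_genotype ** 2) / (1.0 - r ** 2)
    # The survival function of a chi-square with 2 df is exp(-x / 2)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


stat, p = joint_chisq_2df(3.0, 4.0)  # uncorrelated case: stat = 9 + 16
```

With `r = 0` the statistic reduces to the sum of the squared z-scores; a nonzero `r` discounts evidence that the two z-scores share because genotype and local ancestry are correlated.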
Project description:
Background: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, both the status and the nature of UCA have recently been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in contrast to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, that produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set.
Results: For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, such as star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore, the model selection tests work appropriately with the artificial data.
Conclusions: For K&W's artificial protein data, sequence similarity is the predominant factor influencing the preference for common ancestry. In contrast, for the real proteins, model selection tests show that phylogenetic structure is much more important than sequence similarity. Hence, the model selection tests demonstrate that real universally conserved proteins are homologous, a conclusion based primarily on the specific nested patterns of correlations induced in genetically related protein sequences.
Reviewers: This article was reviewed by Rob Knight, Robert Beiko (nominated by Peter Gogarten), and Michael Gilchrist.
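To make "model selection criteria" concrete, the sketch below compares fitted models by the Akaike information criterion (AIC), with BIC shown for reference. It assumes each model is summarized by its maximized log-likelihood and number of free parameters; the model names and numbers are toy values chosen for illustration, not results from the study, and the helper names are hypothetical.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate the preferred model."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; penalizes each parameter by log(n_obs)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def preferred_model(models: Dict[str, Tuple[float, int]]) -> str:
    """Return the model name with the lowest AIC.

    `models` maps a model name to (maximized log-likelihood, number of free parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))


# Toy values for illustration only; they are not taken from the study.
toy = {
    "common_ancestry": (-1000.0, 10),
    "independent_ancestry": (-998.0, 20),
}
print(preferred_model(toy))
```

In this toy case the independent-ancestry model fits slightly better (higher log-likelihood) but pays for ten extra parameters, so AIC prefers common ancestry (2020 versus 2036). This is the kind of trade-off between fit and parsimony that such criteria formalize.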
Project description: Consumer uptake of direct-to-consumer (DTC) DNA ancestry testing is accelerating, yet few empirical studies have examined test impacts on recipients, even though the DTC ancestry industry is two decades old. Participants in a longitudinal cohort study of response to health-related DTC genomic testing also received personal DNA ancestry testing at no additional cost. Baseline survey data from the primary study were analyzed together with responses to an additional follow-up survey focused on the response to ancestry results. Ancestry results were generated for 3466 individuals. Of those, 1317 accessed their results, and 322 completed an ancestry response survey; in other words, approximately one in ten who received ancestry testing responded to the survey. Self-reported race/ethnicity predicted which individuals were most likely to view their results. While 46% of survey responders (N = 147) reported their ancestry results as surprising or unexpected, less than 1% (N = 3) were distressed by them. Importantly, however, 21% (N = 67) reported that their results reshaped their personal identity. Most (81%; N = 260) planned to share results with family, and 12% (N = 39) intended to share results with a healthcare provider. Many (61%; N = 196) reported test benefits (e.g., health insights), while 12% (N = 38) reported negative aspects (e.g., lack of utility). Over half (N = 162) reported being more likely to have other genetic tests in the future. DNA ancestry testing affected individuals with respect to personal identity, intentions to share genetic information with family and healthcare providers, and the likelihood of engaging with other genetic tests in the future. These findings have implications for medical care and research, specifically for provider readiness to engage with genetic ancestry information.
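The reported percentages can be re-derived from the counts in the abstract. The snippet below is only an arithmetic check, not part of the study's analysis. It also makes explicit that "approximately one in ten" refers to the 322 respondents out of all 3466 individuals tested (about 9.3%), not out of the 1317 who viewed their results. The variable names are illustrative.

```python
# Counts reported in the abstract; percentages use the 322 survey respondents.
RESPONDENTS = 322
TESTED = 3466
VIEWED = 1317

reported = {
    "surprised or unexpected": 147,
    "distressed": 3,
    "identity reshaped": 67,
    "plan to share with family": 260,
    "plan to share with provider": 39,
    "reported benefits": 196,
    "reported negatives": 38,
    "more likely to test again": 162,
}


def percent(count: int, total: int = RESPONDENTS) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in reported.items():
    print(f"{label}: {percent(count):.1f}%")

# Survey response relative to everyone tested and to those who viewed results
print(f"responded / tested: {percent(RESPONDENTS, TESTED):.1f}%")
print(f"responded / viewed: {percent(RESPONDENTS, VIEWED):.1f}%")
```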
Project description: The recent discovery of novel alphacoronaviruses (alpha-CoVs) in European and Asian rodents revealed that rodent coronaviruses (CoVs) sampled worldwide formed a discrete phylogenetic group within this genus. To determine the evolutionary history of rodent CoVs in more detail, particularly the relative frequencies of virus-host co-divergence and cross-species transmission, we recovered longer fragments of CoV genomes from previously discovered European rodent alpha-CoVs using a combination of PCR and high-throughput sequencing. In this way, we retrieved the full genome sequence of the UK rat coronavirus, partial genome sequences of the UK field vole and Polish bank vole CoVs, and a short conserved ORF1b fragment of the French rabbit CoV. Genome and phylogenetic analysis showed that, despite their diverse geographic origins, all rodent alpha-CoVs formed a single monophyletic group and shared similar features, such as the same gene constellations, a recombinant beta-CoV spike gene, and similar core transcriptional regulatory sequences (TRS). These data suggest that all rodent alpha-CoVs sampled so far originate from a single common ancestor, and that there has likely been a long-term association between alpha-CoVs and rodents. Despite this likely antiquity, the phylogenetic pattern of the alpha-CoVs also suggests relatively frequent host-jumping among the different rodent species.
Project description: For samples of admixed individuals, it is possible to test for both ancestry effects via admixture mapping and genotype effects via association mapping. Here, we describe a joint test called BMIX that combines admixture and association statistics at single markers. We first perform high-density admixture mapping using local ancestry. We then perform association mapping using stratified regression, in which, at each marker, genotypes are stratified by local ancestry. In both stages, we use generalized linear models, so the joint test can be applied to any phenotype distribution with an appropriate link function. To define the alternative densities for admixture mapping and association mapping, we describe an autocorrelation-based method to empirically estimate the testing burdens of admixture mapping and association mapping. We then describe a joint test that uses the posterior probabilities from admixture mapping as prior probabilities for association mapping, capitalizing on the reduced testing burden of admixture mapping relative to association mapping. By simulation, we show that BMIX is potentially orders of magnitude more powerful than the MIX score, which is currently the most powerful frequentist joint test. We illustrate the gain in power through analysis of fasting plasma glucose among 922 unrelated, non-diabetic, admixed African Americans from the Howard University Family Study. We detected loci at 1q24 and 6q26 as genome-wide significant via admixture mapping; both loci have been independently reported from linkage analysis. Using the association data, we resolved the 1q24 signal into two regions. One region, upstream of the gene FAM78B, contains three binding sites for the transcription factor PPARG and two binding sites for HNF1A, both previously implicated in the pathology of type 2 diabetes.
The fact that both loci showed ancestry effects may provide novel insight into the genetic architecture of fasting plasma glucose in individuals of African ancestry.
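The core idea of reusing the admixture-mapping posterior as the association-mapping prior can be sketched with Bayes factors. This is a simplified illustration under stated assumptions, not the published BMIX statistic: each stage is assumed to be summarized by a Bayes factor, and the stage-1 prior is set to 1 / `n_eff_admixture`, a hypothetical stand-in for the admixture testing burden that the abstract estimates via autocorrelation. Function names are hypothetical.

```python
from typing import Tuple


def posterior_from_bf(bayes_factor: float, prior: float) -> float:
    """Convert a Bayes factor and a prior probability into a posterior probability."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if bayes_factor < 0.0:
        raise ValueError("Bayes factor must be non-negative")
    if prior == 1.0:
        return 1.0  # a certain prior cannot be moved by finite evidence
    posterior_odds = bayes_factor * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)


def two_stage_posterior(bf_admixture: float, bf_association: float,
                        n_eff_admixture: float) -> Tuple[float, float]:
    """Hypothetical two-stage update: admixture posterior becomes the association prior.

    n_eff_admixture is an assumed effective number of admixture tests (> 1).
    """
    # Stage 1: admixture mapping, prior spread over the effective number of tests
    p_admixture = posterior_from_bf(bf_admixture, 1.0 / n_eff_admixture)
    # Stage 2: association mapping, reusing the stage-1 posterior as its prior
    p_joint = posterior_from_bf(bf_association, p_admixture)
    return p_admixture, p_joint
```

Because the update is sequential, the final posterior odds equal the product of the two Bayes factors and the stage-1 prior odds. The sketch therefore mainly shows why a smaller admixture testing burden (a larger stage-1 prior) raises the joint posterior for the same association evidence.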
Project description:
Background: Genetic variants that contribute to asthma susceptibility might be present at varying frequencies in different populations, which is an important consideration, and an advantage, when performing genetic association studies in admixed populations.
Objective: We sought to identify asthma-associated loci in African American subjects.
Methods: We compared local African and European ancestry, estimated from dense single nucleotide polymorphism genotype data, between African American adults with asthma and nonasthmatic control subjects. Allelic tests of association were performed within the candidate regions identified, correcting for local European admixture.
Results: We identified a significant ancestry association peak on chromosome 6q. Allelic tests for association within this region identified a single nucleotide polymorphism (rs1361549) on 6q14.1 that was associated with asthma exclusively in African American subjects with local European admixture (odds ratio, 2.2). The risk allele is common in Europe (42% in the HapMap population of Utah residents with Northern and Western European ancestry from the Centre d'Etude du Polymorphisme Humain collection) but absent in West Africa (0% in the HapMap population of Yorubans in Ibadan, Nigeria), suggesting that the allele is present in African American subjects because of recent European admixture. We replicated our findings in Puerto Rican subjects and similarly found that the signal of association is largely specific to subjects who are heterozygous for African and non-African ancestry at 6q14.1. However, we found no evidence for association in European American or Puerto Rican subjects in the absence of local African ancestry, suggesting that the association with asthma at rs1361549 is due to an environmental or genetic interaction.
Conclusion: We identified a novel asthma-associated locus that is relevant to admixed populations with African ancestry and highlight the importance of considering local ancestry in genetic association studies of admixed populations.
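A stratum-specific effect like the one reported for rs1361549 can be examined by computing an allelic odds ratio separately within each local-ancestry stratum. The sketch below uses a Woolf (log-scale) confidence interval. It is one simple way to look at such data, not necessarily the authors' procedure; the function names, stratum labels, and allele counts are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

OddsRatioCI = Tuple[float, float, float]  # (odds ratio, lower bound, upper bound)


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int,
                       z: float = 1.96) -> OddsRatioCI:
    """Allelic odds ratio with a Woolf (log-scale) 95% confidence interval."""
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("Woolf interval needs all four allele counts > 0")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def odds_ratio_by_local_ancestry(
        strata: Dict[str, Tuple[int, int, int, int]]) -> Dict[str, Optional[OddsRatioCI]]:
    """Compute the allelic OR separately within each local-ancestry stratum.

    Strata with a zero cell (e.g. a risk allele absent on African haplotypes)
    are reported as None rather than forced into an estimate.
    """
    results: Dict[str, Optional[OddsRatioCI]] = {}
    for label, counts in strata.items():
        results[label] = allelic_odds_ratio(*counts) if min(counts) > 0 else None
    return results
```

The `None` case mirrors the abstract's observation: if the risk allele is essentially absent on African-ancestry haplotypes, a stratum with two African local-ancestry alleles carries no information about the allele's effect.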
Project description: We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with high accuracy. It is based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling.
In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.
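The intuition behind Neighborhood Correlation (true homologs share neighbors in the similarity network, whereas pairs sharing only an inserted domain do not) can be sketched as a Pearson correlation between the similarity-score profiles of two sequences across a common set of sequences. This sketch assumes log10-transformed scores and fills missing pairs with a hypothetical `floor` value (which must be positive); the published score may differ in normalization and in how missing scores are handled.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical input: similarity scores (e.g. BLAST bit scores) keyed by sequence pair.
Scores = Dict[Tuple[str, str], float]


def log_score(scores: Scores, a: str, b: str, floor: float) -> float:
    """Log10 similarity of a and b; missing or non-positive scores fall back to floor."""
    s = scores.get((a, b))
    if s is None:
        s = scores.get((b, a))
    if s is None or s <= 0.0:
        s = floor
    return math.log10(s)


def neighborhood_correlation(scores: Scores, x: str, y: str,
                             universe: Sequence[str], floor: float = 1.0) -> float:
    """Pearson correlation of the log-score profiles of x and y over `universe`."""
    vx = [log_score(scores, x, z, floor) for z in universe]
    vy = [log_score(scores, y, z, floor) for z in universe]
    n = len(universe)
    mean_x = sum(vx) / n
    mean_y = sum(vy) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(vx, vy))
    var_x = sum((a - mean_x) ** 2 for a in vx)
    var_y = sum((b - mean_y) ** 2 for b in vy)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a flat profile carries no neighborhood information
    return cov / math.sqrt(var_x * var_y)
```

Two sequences with the same neighbors at similar strengths score near 1, while sequences whose neighborhoods are disjoint score low or negative, even if they share a strong pairwise match through a single domain.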
Project description: Eukaryotic cells are defined by compartments through which the trafficking of macromolecules is mediated by large complexes, such as the nuclear pore, transport vesicles and intraflagellar transport. The assembly and maintenance of these complexes are facilitated by endomembrane coatomers, long suspected to be divergently related on the basis of structural and, more recently, phylogenomic analyses. By performing supervised walks in sequence space across coatomer superfamilies, we uncover subtle sequence patterns that have remained elusive to date, ultimately unifying eukaryotic coatomers by divergent evolution. The conserved residues shared by 3,502 endomembrane coatomer components are mapped onto the solenoid superhelix of nucleoporin and COPII protein structures, thus determining the invariant elements of coatomer architecture. This ancient structural motif can be considered a universal signature connecting eukaryotic coatomers involved in multiple cellular processes across cell physiology and human disease.
Project description: Despite very restricted gene exchange between Escherichia coli and Salmonella typhimurium, both species harbor several of the same classes of insertion sequences. To determine whether the present-day distribution of these transposable elements is due to common ancestry or to horizontal transfer, we determined the sequences of IS1 and IS200 from natural isolates of S. typhimurium and E. coli. One strain of S. typhimurium harbored an IS1 element identical to that originally recovered from E. coli, suggesting that the element was recently transferred between these two species. The level of sequence divergence between copies of IS200 from E. coli and S. typhimurium ranged from 9.5 to 10.7%, indicating that IS200, unlike IS1, has not been repeatedly transferred between these enteric species since E. coli and S. typhimurium diverged from a common ancestor. Levels of variability in IS1 and IS200 for strains of E. coli and S. typhimurium show that each class of insertion sequence has a characteristic pattern of transposition within and among host genomes.
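Percent sequence divergence, such as the 9.5 to 10.7% reported for IS200, is commonly expressed as an uncorrected p-distance between aligned sequences. The abstract does not state which divergence measure was used, so treat this as an assumption. The sketch below computes the p-distance and skips gapped columns; the function name is hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared
```

A p-distance of 0.095 corresponds to 9.5% divergence. By contrast, identical copies such as the shared IS1 element give 0, which is what points to recent transfer rather than long-term vertical inheritance.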