Project description: Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R² > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10⁻⁸) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.
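As a rough guide to what a MAF of ~0.00005 means in practice, the expected number of minor-allele copies among n diploid individuals is 2 × n × MAF. The sketch below applies that formula to the two sample sizes quoted above. The function name is illustrative, and the cohort size is rounded to 500,000 as in the abstract.

```python
def expected_minor_allele_copies(n_individuals: int, maf: float) -> float:
    """Expected minor-allele copies among n diploid individuals (2 alleles each)."""
    return 2 * n_individuals * maf


MAF = 0.00005  # lower frequency bound quoted in the abstract

# Exome-sequenced subset versus the full (rounded) cohort.
sequenced = expected_minor_allele_copies(49_960, MAF)
full_cohort = expected_minor_allele_copies(500_000, MAF)

print(f"sequenced subset: ~{sequenced:.1f} expected copies")
print(f"full cohort:      ~{full_cohort:.1f} expected copies")
```

Under these assumptions a variant at the stated frequency is expected to be carried on about 5 chromosomes in the sequenced subset and about 50 in the full cohort, which illustrates why imputation into the remaining participants adds power.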
Project description: In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddlepoint approximation (SPA)-based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests in which the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times the exome-wide significance level α = 2.5 × 10⁻⁶). Additionally, the proposed method has similar computation time to the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10⁻⁷, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, and this availability can help facilitate the identification of the genetic basis of complex diseases.
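The single-variant idea that the region-based tests build on can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a score statistic S = Σ g_i (Y_i − μ_i) with independent Y_i ~ Bernoulli(μ_i), an intercept-only null model (μ_i equal to the case fraction), no covariates, non-negative genotypes, and the Lugannani-Rice tail formula without a continuity correction. The SKAT/SKAT-O combination and the efficient resampling step are not shown. All function and variable names are made up for the example.

```python
import math
from typing import Sequence


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def spa_upper_tail(g: Sequence[float], mu: Sequence[float], s: float) -> float:
    """Saddlepoint approximation to P(S >= s) for S = sum_i g_i * (Y_i - mu_i).

    Assumes g_i >= 0 and 0 < s < the largest attainable value of S.
    """
    pairs = [(gi, mi) for gi, mi in zip(g, mu) if gi != 0]
    mean = sum(gi * mi for gi, mi in pairs)

    def K(t: float) -> float:  # cumulant generating function of S
        return sum(math.log(1 - mi + mi * math.exp(gi * t)) for gi, mi in pairs) - t * mean

    def K1(t: float) -> float:  # first derivative of K
        total = 0.0
        for gi, mi in pairs:
            e = mi * math.exp(gi * t)
            total += gi * e / (1 - mi + e)
        return total - mean

    def K2(t: float) -> float:  # second derivative of K
        total = 0.0
        for gi, mi in pairs:
            e = mi * math.exp(gi * t)
            total += gi * gi * (1 - mi) * e / (1 - mi + e) ** 2
        return total

    lo, hi = 0.0, 50.0
    if not 0 < s < K1(hi):
        raise ValueError("s must be positive and inside the attainable range")
    # K1 is increasing, so bisection finds the saddlepoint t with K1(t) = s.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    w = math.sqrt(2.0 * (t * s - K(t)))
    v = t * math.sqrt(K2(t))
    return normal_upper_tail(w + math.log(v / w) / w)


# Toy data: 200 samples at a 1:99 case fraction, 20 carriers of a rare allele,
# 3 of whom are cases. These numbers are invented for illustration.
g = [1] * 20 + [0] * 180
mu = [0.01] * 200
s_obs = 3 - sum(gi * mi for gi, mi in zip(g, mu))

p_spa = spa_upper_tail(g, mu, s_obs)
variance = sum(gi * gi * mi * (1 - mi) for gi, mi in zip(g, mu))
p_normal = normal_upper_tail(s_obs / math.sqrt(variance))

print(f"normal approximation p = {p_normal:.3g}")
print(f"saddlepoint p          = {p_spa:.3g}")
```

In this toy setting the normal approximation returns a p value several orders of magnitude smaller than the saddlepoint approximation, whereas the exact binomial tail probability of three or more cases among 20 carriers is roughly 0.001. That gap is the kind of anti-conservative behaviour that inflates type I error when cases are scarce.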
Project description: Background: Cystic fibrosis (CF) is an autosomal recessive disease caused by genetic variants of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. It is a common hereditary disease in Caucasians but rare in the Chinese population. Until now, only 87 Chinese patients have been reported with molecular confirmation. The variant spectrum and clinical features of Chinese CF patients are clearly different from those of Caucasians. Materials and methods: Whole-exome sequencing was applied to three individuals from two consanguineous families whose typical CF phenotype was confined to the respiratory system. Protein domain and structure analyses were used to predict the impact of the variants. Sanger sequencing was used to validate the candidate variants. Results: A previously reported homozygous variant in CFTR (NM_000492.4:c.1000C>T, p.R334W) was identified in proband I. A novel homozygous variant at a polymorphic position (NM_000492.4:c.1409T>A, p.V470E) was identified in two individuals in family II. To the best of our knowledge, this novel variant, which is predicted to be disease-causing, has not previously been reported in CFTR. However, in vitro validation is still needed. Conclusion: Our finding expands the variant spectrum of CFTR, highlights the differences in clinical phenotype and variant spectrum between Chinese and Caucasian CF patients, and contributes to more rapid genetic diagnosis and future genetic counseling.
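The coding (c.) and protein (p.) coordinates of the two variants can be cross-checked with simple arithmetic, because HGVS coding positions count from the A of the ATG start codon (c.1) and each codon spans three bases. The helper below is an illustrative sketch with a made-up name. It only handles positions inside the coding sequence and does not look at the transcript sequence itself.

```python
from typing import Tuple


def cds_position_to_codon(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based HGVS coding position to (codon number, base within codon).

    Both returned values are 1-based; c.1 is the first base of codon 1.
    """
    if cds_pos < 1:
        raise ValueError("only positions inside the coding sequence are supported")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


for label, pos in [("c.1000C>T", 1000), ("c.1409T>A", 1409)]:
    codon, base = cds_position_to_codon(pos)
    print(f"{label}: codon {codon}, base {base} of the codon")
```

Position 1000 falls on the first base of codon 334 and position 1409 on the second base of codon 470, which agrees with the p.R334W and p.V470E annotations given above.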
Project description: Pulmonary function is an indicator of well-being, and pulmonary pathologies are the third major cause of death worldwide. We analysed the UK Biobank genome-wide association summary statistics of pulmonary function for Europeans and individuals of recent African descent to identify variants associated with the trait in the two ancestries. Here, we show 627 variants in Europeans and 3 in Africans associated with three pulmonary function parameters. In addition to the 110 variants in Europeans previously reported to be associated with phenotypes related to pulmonary function, we identify 279 novel loci, including an ISX intergenic variant rs369476290 on chromosome 22 in Africans. Remarkably, we find no shared variants among Africans and Europeans. Furthermore, enrichment analyses of variants separately for each ancestry background reveal significant enrichment for terms related to pulmonary phenotypes in Europeans but not Africans. Further analysis of studies of pulmonary phenotypes reveals that individuals of European background are disproportionately overrepresented in datasets compared to Africans, with the gap widening over the past five years. Our findings extend our understanding of the different variants that modify pulmonary function in Africans and Europeans, a promising finding for future GWASs and medical studies.
Project description: Genome-wide association studies have discovered hundreds of associations between common genotypes and kidney function but cannot comprehensively investigate rare coding variants. Here, we apply a genotype imputation approach to whole exome sequencing data from the UK Biobank to increase sample size from 166,891 to 408,511. We detect 158 rare variants and 105 genes significantly associated with one or more of five kidney function traits, including genes not previously linked to kidney disease in humans. The imputation-powered findings derive support from clinical record-based kidney disease information, such as for a previously unreported splice allele in PKD2, and from functional studies of a previously unreported frameshift allele in CLDN10. This cost-efficient approach boosts statistical power to detect and characterize both known and novel disease susceptibility variants and genes, can be generalized to larger future studies, and generates a comprehensive resource (https://ckdgen-ukbb.gm.eurac.edu/) to direct experimental and clinical studies of kidney disease.
Project description: A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Project description: The human default mode network (DMN) is implicated in several unique mental capacities. In this study, we tested whether brain-wide interregional communication in the DMN can be derived from population variability in intrinsic activity fluctuations, gray-matter morphology, and fiber tract anatomy. In a sample of 10,000 UK Biobank participants, pattern-learning algorithms revealed functional coupling states in the DMN that are linked to connectivity profiles between other macroscopic brain networks. In addition, DMN gray matter volume covaried with white matter microstructure of the fornix. Collectively, functional and structural patterns unmasked a possible division of labor within major DMN nodes: Subregions most critical for cortical network interplay were adjacent to subregions most predictive of fornix fibers from the hippocampus that processes memories and places.
Project description: Introduction: A previous study of 200,000 exome-sequenced UK Biobank participants to test for association of rare coding variants with hypertension implicated two genes at exome-wide significance, DNMT3A and FES. A total of 42 genes had an uncorrected p value < 0.001. These results were followed up in a larger sample of 470,000 exome-sequenced participants. Methods: Weighted burden analysis of rare coding variants in a new sample of 97,050 cases and 172,263 controls was carried out for these 42 genes. Those showing evidence for association were then analysed in the combined sample of 167,127 cases and 302,691 controls. Results: The association of DNMT3A and FES with hypertension was replicated in the new sample and they and the previously implicated gene NPR1, which codes for a membrane-bound guanylate cyclase, were all exome-wide significant in the combined sample. Also exome-wide significant as risk genes for hypertension were GUCY1A1, ASXL1, and SMAD6, while GUCY1B1 had a nominal p value of < 0.0001. GUCY1A1 and GUCY1B1 code for subunits of a soluble guanylate cyclase. For two genes, DBH, which codes for dopamine beta hydroxylase, and INPPL1, rare coding variants predicted to impair gene function were protective against hypertension, again with exome-wide significance. Conclusion: The findings offer new insights into biological risk factors for hypertension which could be the subject of further investigation. In particular, genetic variants predicted to impair the function of either membrane-bound guanylate cyclase, activated by natriuretic peptides, or soluble guanylate cyclase, activated by nitric oxide, increase risk of hypertension. Conversely, variants impairing the function of dopamine beta hydroxylase, responsible for the synthesis of norepinephrine, reduce hypertension risk.
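The general idea behind a weighted burden score can be sketched as follows: each participant's score for a gene is the weighted sum of the rare alleles they carry, with larger weights for variants predicted to be more damaging. The weights, consequence categories and genotypes below are invented for illustration and are not the scheme used in the study, which the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical weights per predicted consequence; not the study's values.
WEIGHTS: Dict[str, float] = {
    "synonymous": 0.0,
    "missense": 5.0,
    "loss_of_function": 20.0,
}


def burden_score(genotypes: List[int], consequences: List[str]) -> float:
    """Weighted sum of rare-allele counts for one participant in one gene."""
    if len(genotypes) != len(consequences):
        raise ValueError("one consequence label is needed per variant")
    return sum(WEIGHTS[c] * g for g, c in zip(genotypes, consequences))


# Three variants in a gene, with allele counts (0, 1 or 2) per participant.
consequences = ["missense", "loss_of_function", "synonymous"]
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
controls = [[0, 0, 0], [1, 0, 0], [0, 0, 1]]

case_scores = [burden_score(g, consequences) for g in cases]
control_scores = [burden_score(g, consequences) for g in controls]

print("mean case score:   ", sum(case_scores) / len(case_scores))
print("mean control score:", sum(control_scores) / len(control_scores))
```

A real analysis would not compare raw means; it would more plausibly test the per-participant score for association with case status in a regression model that includes covariates, which is outside the scope of this sketch.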
Project description: Copy number variants are duplications and deletions of the genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested Simulator of Exome Copy Number Variants (SECNVs), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
Project description: Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data [1,2]. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank [3]. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.