Project description:Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes¹. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
Project description:Identifying regions of the genome that are depleted of mutations can distinguish potentially deleterious variants. Short tandem repeats (STRs), also known as microsatellites, are among the largest contributors of de novo mutations in humans. However, per-locus studies of STR mutations have been limited to highly ascertained panels of several dozen loci. Here we harnessed bioinformatics tools and a novel analytical framework to estimate mutation parameters for each STR in the human genome by correlating STR genotypes with local sequence heterozygosity. We applied our method to obtain robust estimates of the impact of local sequence features on mutation parameters and used these estimates to create a framework for measuring constraint at STRs by comparing observed versus expected mutation rates. Constraint scores identified known pathogenic variants with early-onset effects. Our metric will provide a valuable tool for prioritizing pathogenic STRs in medical genetics studies.
Project description:Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. 
These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary constraint.
Project description:Despite enormous body plan variation, genes regulating embryonic development are highly conserved. Here, we probe the mechanisms that predispose ancient regulatory genes to reutilization and diversification rather than evolutionary loss. The Hox gene fushi tarazu (ftz) arose as a homeotic gene but functions as a pair-rule segmentation gene in Drosophila. ftz shows extensive variation in expression and protein coding regions but has managed to elude loss from arthropod genomes. We asked what properties prevent this loss by testing the importance of different protein motifs and partners in the developing CNS, where ftz expression is conserved. Drosophila Ftz proteins with mutated protein motifs were expressed under the control of a neurogenic-specific ftz cis-regulatory element (CRE) in a ftz mutant background rescued for segmentation defects. Ftz CNS function did not require the variable motifs that mediate differential cofactor interactions involved in homeosis or segmentation, which vary in arthropods. Rather, CNS function did require the shared DNA-binding homeodomain, which plays less of a role in Ftz segmentation activity. The Antennapedia homeodomain substituted for Ftz homeodomain function in the Drosophila CNS, but full-length Antennapedia did not rescue CNS defects. These results suggest that a core CNS function retains ftz in arthropod genomes. Acquisition of a neurogenic CRE led to ftz expression in unique CNS cells, differentiating its role from neighboring Hox genes, rendering it nonredundant. The inherent flexibility of modular CREs and protein domains allows for stepwise acquisition of new functions, explaining broad retention of regulatory genes during animal evolution.
Project description:Transcript abundance was measured in whole-body virgin male Drosophila serrata from 41 inbred lines that had diverged through 27 generations of mutation accumulation. Pleiotropic mutations are the ultimate source of genetic variation in complex traits, including many human diseases. However, the nature and extent of mutational pleiotropy remain largely unknown. Here, we investigate the variation in 11,604 gene expression traits among 41 mutation accumulation lines of Drosophila serrata, which had diverged for 27 generations. We detected significant mutational variance in 4.6% of ESTs, but 70% of ESTs were invariant among lines, allowing us to reject a null hypothesis of phenome-wide universal pleiotropy. Mutational covariance among ESTs was detected at a frequency of only 1 in 193 random pairs of variable ESTs, but was detected among random combinations of five ESTs in 1 in 5 cases, revealing that mutational covariance among multiple ESTs was common. The observed frequency of significant multivariate covariance among random ESTs implied that a substantial number of ESTs (>70) must be pleiotropically affected by at least some mutations. We measured gene expression of male Drosophila serrata from 41 mutation accumulation lines (whole-body). Data from two replicates for each line are presented.
Project description:Rothmund-Thomson syndrome (RTS) is an autosomal recessive disorder caused by deleterious mutations in the RECQL4 gene on chromosome 8. The RECQL4 gene structure is unusual because it contains many small introns <100 bp. We describe a proband with RTS who has a novel 11-bp intronic deletion, and we show that this mutation results in a 66-bp intron too small for proper splicing. Constraint on intron size may represent a general mutational mechanism, since human-genome analysis reveals that approximately 15% of genes have introns <100 bp and are therefore susceptible to size constraint. Thus, monitoring of intron size may allow detection of mutations missed by exon-by-exon approaches.
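The intron-size monitoring idea in the RECQL4 description above could be sketched roughly as follows. This is a minimal illustration, not the authors' method: the `Intron` class, the helper names, the coordinates, and the 70-bp cutoff are all hypothetical. The abstract establishes only that a 66-bp intron (77 bp before the 11-bp deletion) was too small for proper splicing, so the exact minimum size is an assumption.

```python
from dataclasses import dataclass

# Hypothetical cutoff (assumption): the abstract only reports that a 66-bp
# intron failed to splice; it does not state an exact minimum size.
MIN_SPLICEABLE_INTRON_BP = 70


@dataclass
class Intron:
    gene: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def intron_length_after_deletion(intron: Intron, del_start: int, del_end: int) -> int:
    """Return the intron length once deleted bases overlapping it are removed."""
    overlap_start = max(intron.start, del_start)
    overlap_end = min(intron.end, del_end)
    overlap = max(0, overlap_end - overlap_start + 1)
    return intron.length - overlap


def is_size_constrained(intron: Intron, del_start: int, del_end: int,
                        min_bp: int = MIN_SPLICEABLE_INTRON_BP) -> bool:
    """Flag a deletion that shrinks the intron below the assumed minimum size."""
    return intron_length_after_deletion(intron, del_start, del_end) < min_bp


# Illustrative coordinates only: a 77-bp intron with an 11-bp internal deletion.
example = Intron(gene="RECQL4", start=1001, end=1077)
remaining = intron_length_after_deletion(example, 1030, 1040)
flagged = is_size_constrained(example, 1030, 1040)
```

A check like this inspects intronic deletions directly, which is the kind of mutation the description says exon-by-exon approaches can miss.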
Project description:Quantification of the tolerance of protein sites to genetic variation has become a cornerstone of variant interpretation. We hypothesize that the constraint on missense variation at individual amino acid sites is largely shaped by direct interactions with 3D neighboring sites. To quantify this constraint, we introduce a framework called COntact Set MISsense tolerance (or COSMIS) and comprehensively map the landscape of 3D mutational constraint on 6.1 million amino acid sites covering 16,533 human proteins. We show that 3D mutational constraint is pervasive and that the level of constraint is strongly associated with disease relevance both at the site and the protein level. We demonstrate that COSMIS performs significantly better at variant interpretation tasks than other population-based constraint metrics while also providing structural insight into the functional roles of constrained sites. We anticipate that COSMIS will facilitate the interpretation of protein-coding variation in evolution and prioritization of sites for mechanistic investigation.
Project description:Novel target discovery is warranted to improve treatment in adult T-cell acute lymphoblastic leukemia (T-ALL) patients. We provide a comprehensive study on mutations to enhance the understanding of therapeutic targets and studied 81 adult T-ALL patients. NOTCH1 exhibited the highest mutation rate (53%). Mutation frequencies of FBXW7 (10%), WT1 (10%), JAK3 (12%), PHF6 (11%), and BCL11B (10%) were in line with previous reports. We identified recurrent alterations in transcription factors DNM2, and RELN, the WNT pathway associated cadherin FAT1, and in epigenetic regulators (MLL2, EZH2). Interestingly, we discovered novel recurrent mutations in the DNA repair complex member HERC1, in NOTCH2, and in the splicing factor ZRSR2. A frequently affected pathway was the JAK/STAT pathway (18%) and a significant proportion of T-ALL patients harboured mutations in epigenetic regulators (33%), both predominantly found in the unfavourable subgroup of early T-ALL. Importantly, adult T-ALL patients not only showed a highly heterogeneous mutational spectrum, but also variable subclonal allele frequencies implicated in therapy resistance and evolution of relapse. In conclusion, we provide novel insights into genetic alterations of signalling pathways (e.g. druggable by γ-secretase inhibitors, JAK inhibitors or EZH2 inhibitors), present in over 80% of all adult T-ALL patients, that could guide novel therapeutic approaches.