DNA variants affecting the expression of numerous genes in trans have diverse mechanisms of action and evolutionary histories
ABSTRACT: DNA variants that alter gene expression contribute to variation in many phenotypic traits. In particular, trans-acting variants, which are often located on different chromosomes from the genes they affect, are an important source of heritable gene expression variation. However, our knowledge about the identity and mechanism of causal trans-acting variants remains limited. Here, we developed a fine-mapping strategy called CRISPR-Swap and dissected three expression quantitative trait locus (eQTL) hotspots known to alter the expression of multiple genes in trans in the yeast Saccharomyces cerevisiae. Causal variants were identified by engineering recombinant alleles and quantifying the effects of these alleles on the expression of a green fluorescent protein-tagged gene affected by the given locus in trans. We validated the effect of each variant on the expression of multiple genes by RNA-sequencing. The three variants were strikingly different in their molecular mechanism, the type of genes they reside in, and their distribution in natural populations. While a missense leucine-to-serine variant at position 63 in the transcription factor Oaf1 (L63S) was almost exclusively present in the reference laboratory strain, the two other variants were frequent among S. cerevisiae isolates. A causal missense variant in the glucose receptor Rgt2 (V539I) occurred at a poorly conserved amino acid residue and its effect was strongly dependent on the concentration of glucose in the culture medium. A noncoding variant in the conserved fatty acid regulated (FAR) element of the OLE1 promoter influenced the expression of the fatty acid desaturase Ole1 in cis and, by modulating the level of this essential enzyme, other genes in trans. The OAF1 and OLE1 variants showed a non-additive genetic interaction, and affected cellular lipid metabolism. 
These results revealed remarkable diversity in the molecular basis of trans-regulatory variation, highlighting the challenges in predicting which natural genetic variants affect gene expression.
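The non-additive interaction between the OAF1 and OLE1 variants can be made concrete with a small calculation. The sketch below is illustrative only and is not the authors' analysis: it assumes expression is compared on a log2 scale, and the function name and all numeric values are hypothetical. It computes how far a double-variant strain departs from the sum of the two single-variant effects; a value near zero indicates additivity.

```python
import math


def interaction_term(wild_type, variant_a, variant_b, double_variant):
    """Return the log2-scale deviation of the double-variant strain from
    the additive expectation built from the two single-variant effects.

    All inputs are positive expression levels (arbitrary units).
    Hypothetical helper for illustration; not taken from the study.
    """
    base = math.log2(wild_type)
    effect_a = math.log2(variant_a) - base   # effect of variant A alone
    effect_b = math.log2(variant_b) - base   # effect of variant B alone
    expected = base + effect_a + effect_b    # additive expectation
    return math.log2(double_variant) - expected


# Hypothetical expression levels, chosen only to show the arithmetic.
additive = interaction_term(100.0, 200.0, 400.0, 800.0)
non_additive = interaction_term(100.0, 200.0, 400.0, 1600.0)
print(f"additive case: {additive:.3f}")
print(f"non-additive case: {non_additive:.3f}")
```

In practice the interaction term would be estimated from replicate measurements with a statistical model that includes an interaction coefficient, rather than from single values as here.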
Project description:Ubiquitin-proteasome system (UPS) protein degradation regulates protein abundance and eliminates misfolded and damaged proteins from eukaryotic cells. Variation in UPS activity influences numerous cellular and organismal phenotypes. However, to what extent such variation results from individual genetic differences is almost entirely unknown. Here, we developed a statistically powerful mapping approach to characterize the genetic basis of variation in UPS activity. Using the yeast Saccharomyces cerevisiae, we systematically mapped genetic influences on the N-end rule, a UPS pathway that recognizes N-degrons, degradation-promoting signals in protein N-termini. We identified 149 genomic loci that influence UPS activity across the complete set of N-degrons. Resolving four loci to individual causal nucleotides identified regulatory and missense variants in ubiquitin system genes whose products process (NTA1), recognize (UBR1 and DOA10), and ubiquitinate (UBC6) cellular proteins. Each of these genes contained multiple causal variants and several individual variants had substrate-specific effects on UPS activity. A cis-acting promoter variant that modulates UPS activity by altering UBR1 expression also alters the abundance of 36 proteins without affecting levels of the corresponding mRNAs. Our results demonstrate that natural genetic variation shapes the full sequence of molecular events in protein ubiquitination and implicate genetic influences on the UPS as a prominent source of post-translational variation in gene expression.
Project description:Cancer genomes are rife with genetic variants; one key outcome of this variation is gain of cysteine, the amino acid most frequently acquired through missense variants in COSMIC. Acquired cysteines are also both driver mutations and sites targeted by precision therapies. However, despite their ubiquity, nearly all acquired cysteines remain uncharacterized. Here, we pair cysteine chemoproteomics (a technique that enables proteome-wide pinpointing of functional, redox-sensitive, and potentially druggable residues) with genomics to reveal the hidden landscape of cysteine acquisition. For both cancer and healthy genomes, we find that cysteine acquisition is a ubiquitous consequence of genetic variation that is further elevated in the context of decreased DNA repair. Our chemoproteogenomics platform integrates chemoproteomic, whole exome, and RNA-seq data with a customized proteomic search that uses two-stage false discovery rate (FDR) error control and runs through a user-friendly FragPipe interface. By deploying chemoproteogenomics across 11 cell lines, we identify acquired cysteines, including those liganded by electrophilic druglike molecules. Reference cysteines proximal to missense variants were also found to be pervasive, pointing to untapped opportunities for proteoform-specific chemical probe development campaigns. Many of the identified acquired cysteines, particularly those in mismatch repair deficient (dMMR) cell lines, were predicted to be highly deleterious, providing evidence for likely functional and therapeutic relevance. The sample-matched combinatorial variant databases built into chemoproteogenomics afforded enhanced coverage and enabled identification of highly reactive and ligandable cysteine residues in the highly polymorphic MHC-Class I complexes, including for pathogenic alleles.
Project description:Determining the pathogenicity of human genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes will likely require generalizable, scalable assays. Here we describe Variant Abundance by Massively Parallel Sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance in a single experiment. We applied VAMP-seq to quantify the abundance of many thousands of single amino acid variants of two proteins, PTEN and TPMT, in which functional variants are clinically actionable.
Project description:Despite widespread advances in DNA sequencing in the past decade, the functional consequences of most rare genetic variants remain poorly understood, severely limiting our ability to connect variants to their consequences on protein function, identify biochemical mechanisms by which variation causes disease, and interpret variant pathogenicity. Multiplexed Assays of Variant Effect (MAVEs), which can measure the function of tens of thousands of variants, are beginning to address this problem. However, existing MAVEs cannot be applied to the approximately 10% of human genes encoding secreted proteins, about a quarter of which are associated with disease. We developed a flexible and scalable human cell surface display method, Multiplexed Surface Tethering of Extracellular Proteins (MultiSTEP), that can simultaneously measure the functional effects of tens of thousands of variants in secreted proteins. We used MultiSTEP to study the consequences of missense variation in coagulation factor IX (FIX), a vitamin K-dependent plasma serine protease where variation can cause FIX deficiency and the bleeding disorder hemophilia B. We used a panel of antibodies to detect FIX secretion or FIX post-translational modification, measuring a total of 45,024 effects for 9,007 variants. 43.8% of all possible F9 missense variants impact FIX secretion, post-translational modification, or both. We also identify new signals of functional constraint on secretion, including within the signal peptide, within folded domains, and for nearly all variants that caused gain or loss of cysteine. FIX secretion scores correlate strongly with FIX levels in patient plasma and also reveal that most F9 missense variants causing severe hemophilia do so by profoundly impacting secretion. We integrate the secretion and post-translational modification data to develop an F9 variant classifier that can identify loss-of-function variants with high specificity.
We use the resulting classifications to reinterpret and upgrade 62 of 97 F9 variants of uncertain significance (VUS) in the MyLifeOurFuture hemophilia genotyping project to likely pathogenic. Lastly, we show that MultiSTEP can be applied to a wide variety of secreted proteins, ranging from small signaling proteins like insulin to large proteins like factor VIII. Thus, we establish a multiplexed, multimodal, and generalizable method for systematically assessing variant effects for secreted proteins at scale, paving the way for improved understanding of biochemical mechanisms of disease and clinical variant interpretation.
Project description:Missense variants that change the amino acid sequences of proteins cause one third of human genetic diseases. Tens of millions of missense variants exist in the current human population, with the vast majority having unknown functional consequences. Here we present the first large-scale experimental analysis of human missense variants. Using DNA synthesis and cellular selection experiments we quantify the impact of >500,000 variants on the abundance of >500 human protein domains. This dataset, Domainome 1.0, reveals that >60% of disease-causing variants destabilize proteins. The contribution of stability to protein fitness varies across proteins and diseases, and is particularly important in recessive disorders. Combining experimental stability measurements with large language models we annotate functionally important sites across domains. Fitting energy models to the data demonstrates the conservation of mutation effects in homologous domains and allows stability to be accurately predicted for entire domain families. Domainome 1.0 demonstrates the feasibility of assaying human protein variant effects at scale and provides a large consistent reference dataset for clinical variant interpretation and the training and benchmarking of computational methods.
Project description:Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5,832 natural DNA variants in the promoters of 2,503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, consistent with the action of negative selection. Causal variants were enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
Project description:Humans co-existed and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,224 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 969 exonic splicing mutations (ESMs) that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than in Neanderthals. Adaptively introgressed variants were enriched for moderate effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a novel tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a novel Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide novel insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Project description:Genome-directed oncology has the potential to revolutionize patient treatment, but is limited by an abundance of rare, uncharacterized and therapeutically uninformative somatic variants. To accelerate characterization of the “long tail” of rare somatic variants, we quantified the activity and drug responsiveness of virtually all possible (99.84%) missense variants in the Ser/Thr kinase MAPK1/ERK2. We identified recurrent and rare hypermorphic and loss-of-function alleles, revealing that variant activity is uncorrelated with mutational frequency. Somatic ERK2 variants displayed variable responses to RAF-, MEK- and ERK-directed therapies, potentially informing clinical treatment strategies for patients whose tumors harbor these alterations. A subset of recurrent and rare somatic variants co-localized on ERK2 protein-protein interfaces, yet engendered contrasting phenotypes based on their specific sub-domain localization. The approach presented here represents an allele-characterization framework that complements existing computational efforts and supports current and future somatic variant discovery efforts, advancing the promise of genome-guided treatment strategies.
Project description:CYP2C9 encodes a cytochrome P450 enzyme responsible for metabolizing up to 15% of small molecule drugs, and CYP2C9 variants can alter the safety and efficacy of these therapeutics. In particular, the anticoagulant warfarin is prescribed to over 15 million people annually, and polymorphisms in CYP2C9 can affect patient response, leading to an increased risk of hemorrhage. We developed Click-seq, a pooled yeast-based activity assay to test thousands of variants. Using Click-seq, we measured the activity of 6,142 missense variants expressed in yeast. We also measured the steady-state cellular abundance of 6,370 missense variants expressed in a human cell line using Variant Abundance by Massively Parallel Sequencing (VAMP-seq). These data revealed that almost two-thirds of CYP2C9 variants showed decreased activity, and that protein abundance accounted for half of the variation in CYP2C9 function. We also measured activity scores for 319 previously unannotated human variants, many of which may have clinical relevance.