Project description: Missense variants that change the amino acid sequences of proteins cause one third of human genetic diseases. Tens of millions of missense variants exist in the current human population, with the vast majority having unknown functional consequences. Here we present the first large-scale experimental analysis of human missense variants. Using DNA synthesis and cellular selection experiments, we quantify the impact of >500,000 variants on the abundance of >500 human protein domains. This dataset, Domainome 1.0, reveals that >60% of disease-causing variants destabilize proteins. The contribution of stability to protein fitness varies across proteins and diseases, and is particularly important in recessive disorders. Combining experimental stability measurements with large language models, we annotate functionally important sites across domains. Fitting energy models to the data demonstrates the conservation of mutation effects in homologous domains and allows stability to be accurately predicted for entire domain families. Domainome 1.0 demonstrates the feasibility of assaying human protein variant effects at scale and provides a large consistent reference dataset for clinical variant interpretation and the training and benchmarking of computational methods.
Project description: To illuminate the extent and roles of exonic sequences in the splicing of human RNA transcripts, we conducted saturation mutagenesis of a 51 nt internal exon in a 3-exon minigene. All possible single and tandem dinucleotide substitutions were surveyed. Using high throughput genetics, 5560 minigene molecules were assayed for splicing in HEK293 cells. Over 70% of mutations produced substantial (>2X) phenotypes of either increased or decreased splicing. Of all predicted secondary structural elements, only a single 15 nt stem-loop showed a strong correlation with splicing, acting negatively. The in vitro formation of exon-protein complexes between the mutant molecules and proteins associated with spliceosome formation (U2AF35, U2AF65, U1Aa, and U1-70K) correlated with splicing efficiencies, suggesting exon definition as the step affected by most mutations. The measured relative binding affinities of dozens of human RNA binding protein domains, as reported in the CISBP-RNA database, were found to correlate either positively or negatively with splicing efficiency; these are more domains than could fit on the 51 nt test exon simultaneously. Surprisingly, such correlations extended to weak relative protein-sequence affinities. These myriad protein binding correlations point to a dynamic and heterogeneous population of pre-mRNA molecules, each responding to a particular collection of binding proteins.
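As a rough illustration of what "all possible single and tandem dinucleotide substitutions" means for a 51 nt exon, the following sketch enumerates both variant classes. It assumes that a tandem dinucleotide substitution changes both of two adjacent positions; the exon sequence is a placeholder, not the one used in the study, and the function names are hypothetical. The sketch does not attempt to reproduce the 5560 assayed molecules reported above.

```python
from itertools import product

BASES = "ACGT"

def single_substitutions(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield seq[:i] + alt + seq[i + 1:]

def tandem_dinucleotide_substitutions(seq):
    """Yield every sequence in which two adjacent positions are both changed.

    Assumption: "tandem dinucleotide substitution" means both neighbouring
    bases differ from the reference.
    """
    for i in range(len(seq) - 1):
        for a, b in product(BASES, repeat=2):
            if a != seq[i] and b != seq[i + 1]:
                yield seq[:i] + a + b + seq[i + 2:]

# Placeholder 51 nt sequence (hypothetical, for counting only).
exon = "ACGT" * 12 + "ACG"

singles = list(single_substitutions(exon))
tandems = list(tandem_dinucleotide_substitutions(exon))

print(len(exon), len(singles), len(tandems))
```

Under this definition a sequence of length n has 3n single substitutions and 9(n - 1) tandem dinucleotide substitutions, so the counts depend only on the exon length and not on its sequence.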
Project description: The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we performed saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitution and deletion mutations. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or various integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and also comprise a gold-standard dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.
Project description: The CRISPR-Cas9 system enables efficient sequence-specific mutagenesis for creating germline mutants of model organisms. Key constraints in vivo remain the expression and delivery of active Cas9-guideRNA ribonucleoprotein complexes (RNPs) with minimal toxicity, variable mutagenesis efficiencies depending on targeting sequence, and high mutation mosaicism. Here, we established in vitro-assembled, fluorescent Cas9-sgRNA RNPs in stabilizing salt solution to achieve maximal mutagenesis efficiency in zebrafish embryos. Sequence analysis of targeted loci in individual embryos reveals highly efficient bi-allelic mutagenesis that reaches saturation at several tested gene loci. Such virtually complete mutagenesis reveals preliminary loss-of-function phenotypes for candidate genes in somatic mutant embryos for subsequent generation of stable germline mutants. We further show efficient targeting of functional non-coding elements in gene-regulatory regions using saturating mutagenesis towards uncovering functional control elements in transgenic reporters and endogenous genes. Our results suggest that in vitro-assembled, fluorescent Cas9-sgRNA RNPs provide a rapid reverse-genetics tool for direct and scalable loss-of-function studies beyond zebrafish applications.
Project description: BACKGROUND: Various DNA manipulation methods have been developed to prepare mutant genes for protein engineering. However, more efficient and convenient methods are still needed. Homologous DNA assembly methods, which do not depend on restriction enzymes, have been used as convenient tools for cloning and have recently been applied to site-directed mutagenesis. This study describes an optimized homologous DNA assembly method, termed multiple patch cloning (MUPAC), for multiple site-directed and saturation mutagenesis. RESULTS: To demonstrate MUPAC, we introduced five back mutations to a mutant green fluorescent protein (GFPuv) with five deleterious mutations at specific sites and transformed Escherichia coli (E. coli) with the plasmids obtained. We observed that over 90% of the resulting colonies possessed plasmids containing the reverted GFPuv gene and exhibited fluorescence. We extended the test to introduce up to nine mutations in Moloney Murine Leukemia Virus reverse transcriptase (M-MLV RT) by assembling 11 DNA fragments using MUPAC. Analysis of the cloned plasmids by electrophoresis and DNA sequencing revealed that approximately 30% of colonies had the intended mutant M-MLV RT gene. Furthermore, we used this method to prepare a library of mutant GFPuv genes containing saturation mutations at five specific sites, and we found that MUPAC successfully introduced NNK codons at all five sites, whereas other sites remained intact. CONCLUSIONS: MUPAC could efficiently introduce various mutations at multiple specific sites within a gene. Furthermore, it could facilitate the preparation of experimental gene materials important to molecular and synthetic biology research.
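To make the NNK saturation scheme mentioned above concrete, the following sketch enumerates NNK codons (N = A, C, G or T; K = G or T, per IUPAC ambiguity codes) and translates them with the standard genetic code. It is only an illustration of the degenerate-codon arithmetic, not the authors' library design procedure; the variable names are hypothetical, and the five-site count is a theoretical codon-level figure rather than a reported library size.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# at each position (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]

encoded = {CODON_TABLE[c] for c in nnk_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

# Theoretical number of distinct codon combinations when five sites
# are randomized independently with NNK.
n_sites = 5
library_size = len(nnk_codons) ** n_sites

print(len(nnk_codons), len(amino_acids), stop_codons, library_size)
```

Restricting the third position to G or T halves the codon count relative to NNN while still covering every amino acid and leaving a single stop codon, which is the usual reason NNK is chosen for saturation libraries.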