Project description:Predicting the effects of mutations is a fundamental challenge in biotechnology and biomedicine. State-of-the-art computational methods have demonstrated the benefits of incorporating semantically rich representations learned from protein sequences, but they do not account for structural constraints. Here we developed the Protein Mutational Effect Predictor (ProMEP), a general, multimodal deep representation learning method that simultaneously learns sequence context and structural constraints from proteins at the scale of evolution. ProMEP markedly outperforms current leading methods and enables accurate zero-shot prediction of mutational effects across a variety of deep mutational scanning experiments. Applying ProMEP to engineer the transposon-associated TnpB enzyme further demonstrates its ability to explore protein space at high throughput. Without prior knowledge of TnpB, ProMEP accurately identified, from among millions of variants, multiple mutations that significantly improve editing efficiency.
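The scoring rule behind zero-shot prediction is not described in this entry. A common convention for protein sequence models, assumed here for illustration and not confirmed as ProMEP's rule, is to score a substitution as the log-ratio of the model's predicted probabilities for the mutant and wild-type residues at the mutated position. In the minimal sketch below, the input format `probs_by_pos` (1-based position mapped to per-residue probabilities), the function names and the toy probabilities are all hypothetical.

```python
import math


def log_odds_score(position_probs, wt, mut):
    """Log-ratio of mutant vs wild-type probability at one position.

    Negative values mean the model finds the mutant residue less likely
    than the wild type, which is often read as a likely deleterious change.
    """
    return math.log(position_probs[mut]) - math.log(position_probs[wt])


def score_variant(probs_by_pos, variant):
    """Score a variant written like 'A12G' or 'A12G:L30P' (1-based positions).

    Assumption: per-site scores are summed for multi-mutants, which
    ignores epistasis between sites.
    """
    total = 0.0
    for sub in variant.split(":"):
        wt, pos, mut = sub[0], int(sub[1:-1]), sub[-1]
        total += log_odds_score(probs_by_pos[pos], wt, mut)
    return total


# Toy probabilities, invented for illustration only (not model output).
probs = {12: {"A": 0.5, "G": 0.25}, 30: {"L": 0.4, "P": 0.4}}
print(round(score_variant(probs, "A12G"), 3))       # -0.693
print(round(score_variant(probs, "A12G:L30P"), 3))  # -0.693
```

Ranking variants by such a score and comparing that ranking with measured deep mutational scanning fitness values, for example by Spearman correlation, is one way zero-shot predictors are typically benchmarked.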
Project description:Insertion sequences (IS) are compact and pervasive transposable elements found in bacteria, which encode only the genes necessary for their mobilization and maintenance. IS200/IS605 elements undergo ‘peel-and-paste’ transposition catalyzed by a TnpA transposase, but intriguingly, they also encode diverse TnpB-family genes that are evolutionarily related to the CRISPR-associated effectors Cas9 and Cas12. Recent studies demonstrated that TnpB-family enzymes function as RNA-guided DNA endonucleases, but the broader biological role of this activity has remained enigmatic. Here we show that IscB and TnpB are essential for preventing loss of the donor IS element, and potential transposon extinction, as a consequence of the TnpA transposition mechanism. We first performed phylogenetic analysis of IscB/TnpB proteins and selected a family of related IS elements from Geobacillus stearothermophilus that we predicted would be mobilized by a common TnpA homolog. After reconstituting transposition using a heterologous expression system in E. coli, we found that IS elements were readily lost from the donor site because TnpA rejoins the flanking sequences upon excision. However, these IS elements also encode non-coding RNAs that guide TnpB and IscB nucleases to precisely recognize and cleave these excision products, leading either to elimination of the excision product or to re-installation of the transposon through recombination. Indeed, under experimental conditions in which TnpA and TnpB-RNA complexes were co-expressed with a genomically integrated IS element, transposon retention was significantly increased relative to conditions expressing TnpA alone. Remarkably, TnpA and TnpB recognize the same AT-rich transposon-adjacent motif (TAM) during transposon excision and RNA-guided DNA cleavage, respectively, revealing a striking convergence in the evolution of DNA sequence specificity between transposase and nuclease.
Collectively, our study reveals that RNA-guided DNA cleavage is a primal biochemical activity that arose to bias the selfish inheritance of transposable elements, which was later co-opted during the evolution of CRISPR-Cas adaptive immunity for antiviral defense.
Project description:Transposon-encoded tnpB and iscB genes encode RNA-guided DNA nucleases that promote their own selfish spread through targeted DNA cleavage and homologous recombination. These widespread gene families were repeatedly domesticated over evolutionary timescales, leading to the emergence of diverse CRISPR-associated nucleases including Cas9 and Cas12. We set out to test the hypothesis that TnpB nucleases may have also been repurposed for novel, unexpected functions other than CRISPR-Cas. Here, using phylogenetics, structural predictions, comparative genomics, and functional assays, we uncover multiple instances of programmable transcription factors that we name TnpB-like nuclease-dead repressors (TldR). These proteins employ naturally occurring guide RNAs to specifically target conserved promoter regions of the genome, leading to potent gene repression in a mechanism akin to CRISPRi technologies invented by humans. Focusing on a TldR clade found broadly in Enterobacteriaceae, we discover that bacteriophages exploit the combined action of TldR and an adjacently encoded phage gene to alter the expression and composition of the host flagellar assembly, a transformation with the potential to impact motility, phage susceptibility, and host immunity. Collectively, this work showcases the diverse molecular innovations that were enabled through repeated exaptation of transposon-encoded genes, and reveals the evolutionary trajectory of diverse RNA-guided transcription factors.
Project description:TnpB nucleases represent the evolutionary precursors to CRISPR-Cas12 and are widespread in all domains of life. IS605-family TnpB homologs function in bacteria as programmable RNA-guided homing endonucleases that drive transposon maintenance through double-strand break (DSB)-stimulated homologous recombination. Here we uncover molecular mechanisms of the transposition lifecycle of IS607-family elements that, remarkably, also encode group I introns. We identify molecular features of a candidate ‘IStron’ from Clostridium botulinum that allow the element to carefully control the relative levels of spliced products versus functional guide RNAs. Our results suggest that IStron transcripts have evolved a sensitive equilibrium to balance competing and mutually exclusive activities that promote transposon maintenance while limiting adverse fitness costs on the host. Collectively, this work highlights molecular innovation in the multi-functional utility of transposon-encoded noncoding RNAs.
Project description:Somatic transposon mutagenesis in mice is an efficient strategy to investigate the genetic mechanisms of tumorigenesis. Identifying tumor-driving transposon insertions has traditionally required generating large tumor cohorts to obtain information about common insertion sites. Tumor-driving insertions are also characterized by their clonal expansion in tumor tissue, a phenomenon facilitated by the slow and evolving transformation process of transposon mutagenesis. We describe here an improved approach for detecting tumor-driving insertions that assesses their clonal expansion by quantifying the relative proportion of sequence reads obtained in individual tumors. To this end, we developed a protocol for insertion site sequencing that uses acoustic shearing of tumor DNA and Illumina sequencing. We analyzed various solid tumors generated by PiggyBac mutagenesis and obtained, for each tumor, >10^6 reads corresponding to >10^4 insertion sites. In each tumor, 9 to 25 insertions stood out by their enriched sequence read frequencies compared with frequencies obtained from tail DNA controls. These enriched insertions are potentially clonally expanded tumor-driving insertions and thus identify candidate cancer genes. The candidate cancer genes in our study included many established cancer genes, but also novel candidates such as Mastermind-like1 (Mamld1) and Diacylglycerol kinase delta (Dgkd). We show that clonal expansion analysis by high-throughput sequencing is a robust approach for identifying candidate cancer genes in insertional mutagenesis screens at the level of individual tumors. Solid tumors in mice were generated by somatic transposon mutagenesis with a PiggyBac transposon system. Insertion sites of transposons in 11 tumors and 6 non-cancerous tail controls were determined by Illumina high-throughput sequencing.
Insertions were determined on both the 5' and 3' sides of the transposon (PB5 and PB3, respectively). Quantitative analysis of read numbers revealed enrichment of certain insertions in tumors, but not in controls, and these enriched insertions identify candidate cancer genes.
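The clonal-expansion idea can be sketched as a comparison of per-insertion read proportions between a tumor and the tail controls. The criterion below is an assumption, not the study's exact statistics: a tumor insertion is flagged when its share of the tumor's reads exceeds the largest share reached by any single insertion in any control, with a hypothetical floor `min_fraction`. Site labels such as `"chr1:1000:+"` and all read counts are invented.

```python
def read_fractions(read_counts):
    """Convert raw read counts per insertion site into relative proportions."""
    total = sum(read_counts.values())
    return {site: n / total for site, n in read_counts.items()}


def enriched_insertions(tumor_counts, control_counts_list, min_fraction=0.01):
    """Return tumor insertion sites with unusually high read proportions.

    Assumed criterion: the tumor proportion must exceed both the largest
    single-insertion proportion seen in any control library and the
    hypothetical floor min_fraction. Hits are sorted by proportion.
    """
    tumor = read_fractions(tumor_counts)
    control_max = max(
        max(read_fractions(counts).values()) for counts in control_counts_list
    )
    cutoff = max(control_max, min_fraction)
    hits = [site for site, frac in tumor.items() if frac > cutoff]
    return sorted(hits, key=lambda site: tumor[site], reverse=True)


# Invented example: two dominant insertions in a tumor library.
tumor = {"chr1:1000:+": 600, "chr4:2200:-": 300, "chr7:50:+": 50, "chr9:75:-": 50}
controls = [
    {f"s{i}": 1 for i in range(20)},  # max proportion 0.05
    {f"t{i}": 2 for i in range(10)},  # max proportion 0.10
]
print(enriched_insertions(tumor, controls))  # ['chr1:1000:+', 'chr4:2200:-']
```

Using proportions rather than raw counts makes tumors with different sequencing depths comparable, which is the point of quantifying relative read frequencies per tumor.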
Project description:We present LoxTnSeq, a new methodology to generate and catalogue libraries of genome reduction mutants. LoxTnSeq combines random integration of Lox sites by transposon mutagenesis with the generation of mutants by Cre recombinase, and catalogues the resulting mutants by deep sequencing. When LoxTnSeq was applied to the naturally genome-reduced bacterium Mycoplasma pneumoniae, we obtained a mutant pool containing 285 unique deletions. These deletions ranged in size from >50 bp to 28 kb and together represent 21% of the total genome. LoxTnSeq also highlighted large regions of non-essential genes that could be removed simultaneously, as well as other similar regions that could not, providing a guide for future genome reductions.
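Cre recombination between two Lox sites in the same (direct) orientation excises the DNA between them, whereas sites in opposite orientations lead to an inversion instead. The sketch below uses that rule to enumerate candidate deletions from transposon-delivered Lox sites, and computes the fraction of a genome covered by a set of deletions, the kind of quantity behind a figure such as "21% of the total genome". It assumes linear coordinates (the circular chromosome is ignored); positions and the genome length in the example are hypothetical.

```python
from itertools import combinations


def candidate_deletions(lox_sites):
    """Return (start, end) pairs for directly repeated Lox sites.

    lox_sites: list of (position, orientation) with orientation '+' or '-'.
    Pairs in opposite orientations are skipped because Cre would invert,
    not delete, the intervening segment.
    """
    ordered = sorted(lox_sites)
    return [
        (a, b)
        for (a, strand_a), (b, strand_b) in combinations(ordered, 2)
        if strand_a == strand_b
    ]


def merge_intervals(deletions):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(deletions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def genome_fraction_deleted(deletions, genome_length):
    """Fraction of the genome covered by the union of all deletions."""
    covered = sum(end - start for start, end in merge_intervals(deletions))
    return covered / genome_length


print(candidate_deletions([(100, "+"), (250, "-"), (400, "+")]))  # [(100, 400)]
print(genome_fraction_deleted([(100, 200), (150, 300), (500, 600)], 1000))  # 0.3
```

Not every candidate yields a viable mutant: deletions that remove essential genes are lost from the pool, which is why the recovered deletions map regions that can or cannot be removed together.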
Project description:We obtained genome-wide digital gene expression tag profiles within the first three days of protoplast reprogramming in the moss Physcomitrella patens (P. patens). At four time points during protoplast reprogramming, the transcript levels of 4827 genes changed more than four-fold, and their expression correlated with the reprogramming phase. Gene ontology (GO) and pathway enrichment analysis of differentially expressed genes (DEGs) identified a set of significantly enriched GO terms and pathways, most of which were associated with photosynthesis, protein synthesis and stress responses. Using a K-means clustering algorithm, DEGs were grouped into six clusters with specific expression patterns. An investigation of the functions and expression patterns of these genes identified a number of key candidate genes and pathways in the early stages of protoplast reprogramming, providing important clues to the molecular mechanisms responsible for protoplast reprogramming. We identified genes that show highly dynamic changes in expression during protoplast reprogramming into stem cells in P. patens. These genes are potential targets for further functional characterization and should be valuable for exploring the mechanisms of stem cell reprogramming. In particular, our data provide evidence that P. patens protoplasts are an ideal model system for elucidating the molecular mechanisms underlying the reprogramming of differentiated plant cells. Four different sampling times were examined.
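The filtering and clustering steps can be sketched as follows. The fold-change rule here (maximum over minimum expression across time points, with a pseudocount) is an assumption, since the comparison behind "more than four-fold" is not stated, and plain Lloyd's k-means seeded with the first k profiles is a simple illustrative choice rather than the study's exact procedure. Gene names and tag counts are invented; the study used six clusters, while the toy example uses k = 2 and needs at least k profiles.

```python
import math


def passes_fold_change(counts, min_fold=4.0, pseudocount=1.0):
    """True if expression varies more than min_fold across time points."""
    values = [c + pseudocount for c in counts]
    return max(values) / min(values) > min_fold


def log2_profile(counts, pseudocount=1.0):
    """Log-transform one gene's tag counts across the sampling times."""
    return [math.log2(c + pseudocount) for c in counts]


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(profiles, k, iterations=100):
    """Lloyd's k-means; the first k profiles seed the centroids."""
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(iterations):
        # Assign each profile to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented tag counts at four sampling times.
genes = {
    "g1": [1, 10, 50, 100],
    "g2": [100, 50, 10, 1],
    "g3": [5, 5, 5, 6],
    "g4": [2, 12, 60, 120],
    "g5": [120, 60, 12, 2],
}
degs = {g: c for g, c in genes.items() if passes_fold_change(c)}
labels = kmeans([log2_profile(c) for c in degs.values()], k=2)
print(dict(zip(degs, labels)))  # {'g1': 0, 'g2': 1, 'g4': 0, 'g5': 1}
```

In practice, expression profiles are often standardized per gene before clustering so that genes group by the shape of their temporal pattern rather than by absolute level; that step is omitted here.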