Project description:This study aims to predict the activity and specificity of CRISPR/Cas9 at genome scale across different cell lines using deep learning. We adopted and modified a system for evaluating on-target and off-target SpCas9 activity in a high-throughput manner, using >1,000,000 guide RNAs (gRNAs) in synthetic constructs that cover ~20,000 protein-coding genes and ~10,000 non-coding genes. Using deep learning, we constructed three prediction models with the best generalization performance to date: Aidit_Cas9-ON, Aidit_Cas9-OFF, and Aidit_Cas9-DSB. Moreover, by systematically investigating the influence of diverse cellular environments on gRNA activity and specificity, we found that the H1 cell line favors distinct features for on-target activity compared with the other two cell lines, and that the overall distribution of repair outcomes differs markedly across the three cell lines, especially in Jurkat. Finally, we identify DNTT as a key protein that strongly influences editing outcomes induced by CRISPR/Cas9. We expect that this study will greatly facilitate CRISPR-based genome editing.
Project description:In this study, we used a surrogate lentivirus library to capture CRISPR editing outcomes in HEK cells. The dataset includes quantification of indel frequencies for SpCas9 gRNAs at 12,000 surrogate sites. After filtering out low-quality sites, high-quality SpCas9 gRNA activities from a total of 10,592 sites were used to develop an improved deep learning-based prediction model, CRISPRon (https://rth.dk/resources/crispr/).
Project description:CRISPR interference (CRISPRi), the targeting of a catalytically dead Cas protein to block transcription, is the leading technique to silence gene expression in bacteria. However, design rules for CRISPRi remain poorly defined, limiting predictable design for gene interrogation, pathway manipulation, and high-throughput screens. Here we develop a best-in-class prediction algorithm for guide silencing efficiency by systematically investigating factors influencing guide depletion in multiple genome-wide essentiality screens, with the surprising discovery that gene-specific features such as transcriptional activity substantially impact prediction of guide activity. Accounting for these features as part of algorithm development allowed us to develop a mixed-effect random forest regression model that provides better estimates of guide efficiency than existing methods, as demonstrated in an independent saturating screen. We further applied methods from explainable AI to extract interpretable design rules from the model, such as sequence preferences in the vicinity of the PAM distinct from those previously described for genome engineering applications. Our approach provides a blueprint for the development of predictive models for CRISPR technologies where only indirect measurements of guide activity are available.
Project description:Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate spatiotemporal activation of their target genes. A single enhancer, of a few hundred base pairs in length, can autonomously and independently of its location and orientation drive cell-type-specific expression of a gene or transgene. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand the details of how spatiotemporal gene expression is encoded in an enhancer sequence. Recently, deep learning models have yielded unprecedented insight into the enhancer code, and well-trained models are reaching a level of understanding that may be close to complete. As a consequence, we hypothesized that deep learning models can be used to guide the directed design of synthetic, cell-type-specific enhancers, and that this process would allow for a detailed tracing of all enhancer features at nucleotide-level resolution. Here we implemented and compared three different design strategies, each built on a deep learning model: (1) directed sequence evolution; (2) directed iterative motif implanting; and (3) generative design. We evaluated the function of fully synthetic enhancers to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We then exploited this concept further by creating “dual-code” enhancers that target two cell types, and minimal enhancers smaller than 50 base pairs that are fully functional. By examining the trajectories followed during state space searches towards functional enhancers, we could accurately define the enhancer code as the optimal strength, combination, and relative distance of TF activator motifs, and the absence of TF repressor motifs. Finally, we applied the same three strategies to successfully design human enhancers, finding design principles highly similar to those in Drosophila.
In conclusion, enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
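The first design strategy named above, directed sequence evolution, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: the search is a greedy walk that applies the single best one-base substitution per step, and `toy_score` is a hypothetical stand-in for a trained deep learning model that rewards similarity to a placeholder activator motif and penalizes similarity to a placeholder repressor motif. The motif strings, function names, and parameters are all illustrative.

```python
BASES = "ACGT"


def best_match(seq, motif):
    """Highest number of positions matching `motif` over all windows of `seq`."""
    width = len(motif)
    return max(
        sum(a == b for a, b in zip(seq[i:i + width], motif))
        for i in range(len(seq) - width + 1)
    )


def toy_score(seq, activator="GATAA", repressor="CACGTG"):
    """Hypothetical scorer standing in for a trained model's prediction.

    Both motifs are arbitrary placeholders, not motifs from the study.
    """
    return best_match(seq, activator) - best_match(seq, repressor)


def evolve(seq, score_fn, n_steps=10):
    """Greedy in silico evolution (an assumed scheme, not the authors' code).

    At each step, score every single-base substitution and keep the one
    that raises the score the most; stop early if none improves it.
    Returns the trajectory as a list of (sequence, score) pairs.
    """
    trajectory = [(seq, score_fn(seq))]
    for _ in range(n_steps):
        current, current_score = trajectory[-1]
        best_seq, best_score = current, current_score
        for pos in range(len(current)):
            for base in BASES:
                if base == current[pos]:
                    continue
                mutant = current[:pos] + base + current[pos + 1:]
                mutant_score = score_fn(mutant)
                if mutant_score > best_score:
                    best_seq, best_score = mutant, mutant_score
        if best_seq == current:
            break  # local optimum: no single substitution helps
        trajectory.append((best_seq, best_score))
    return trajectory


if __name__ == "__main__":
    for step, (s, sc) in enumerate(evolve("A" * 20, toy_score)):
        print(step, s, sc)
```

Recording the full trajectory rather than only the final sequence mirrors the idea of examining the path taken through sequence space, since each step shows which substitution gained activator similarity or removed repressor similarity.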