Project description:SYBA uses a fragment-based approach to classify whether a molecule is easy or hard to synthesize, and it can also be used to analyze the contribution of individual fragments to the total synthetic accessibility. The easy-to-synthesize dataset is an extract of the ZINC purchasable compounds, and the hard-to-synthesize dataset is generated using a Nonpher approach (introducing small molecular perturbations to transform molecules into more complex compounds). The fragments are calculated with ECFP8 descriptors, and independence between fragments is assumed.
Model Type: Predictive machine learning model.
Model Relevance: Prediction of synthetic accessibility
Model Encoded by: Miquel Duran-Frigola (Ersilia)
Metadata Submitted in BioModels by: Zainab Ashimiyu-Abdusalam
Implementation of this model code by Ersilia is available here:
https://github.com/ersilia-os/eos7pw8
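The fragment-independence assumption described above can be illustrated with a minimal sketch of a Bernoulli naive Bayes score. This is not the SYBA or Ersilia implementation: the fragment identifiers, the toy counts, the add-one smoothing and the function names are all assumptions made for illustration. In practice, the fragments would come from ECFP8-style descriptors.

```python
import math

def fragment_log_odds(easy_count, hard_count, n_easy, n_hard, present):
    """Log-odds contribution of one fragment (positive favours 'easy')."""
    # Add-one smoothing (an assumption here) keeps probabilities off 0 and 1.
    p_easy = (easy_count + 1) / (n_easy + 2)
    p_hard = (hard_count + 1) / (n_hard + 2)
    if present:
        return math.log(p_easy / p_hard)
    return math.log((1 - p_easy) / (1 - p_hard))

def score_molecule(fragments, counts, n_easy, n_hard):
    """Sum per-fragment contributions, treating fragments as independent."""
    contributions = {
        frag: fragment_log_odds(easy, hard, n_easy, n_hard, frag in fragments)
        for frag, (easy, hard) in counts.items()
    }
    return sum(contributions.values()), contributions

# Hypothetical counts: fragment -> (molecules in easy set, molecules in hard set)
toy_counts = {"frag_a": (900, 100), "frag_b": (50, 600), "frag_c": (500, 500)}

total, per_fragment = score_molecule({"frag_a", "frag_c"}, toy_counts, 1000, 1000)
hard_total, _ = score_molecule({"frag_b"}, toy_counts, 1000, 1000)
```

Because the total is a plain sum, the `per_fragment` dictionary shows directly how much each fragment pushes the molecule towards the easy or the hard class.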
Project description:Small RNA sequencing of trophoblast debris samples was employed to profile the small RNA content of normotensive and preeclamptic trophoblast debris. We identified 1278 miRNAs and 2646 non-miRNA small RNA fragments across all trophoblast debris samples. Differential expression analysis was performed with the iSRAP small RNA sequencing analysis pipeline, and we identified 16 miRNAs, 5 tRNA fragments from 3 different tRNAs, 13 snRNA fragments and 85 rRNA fragments that were differentially abundant between preeclamptic and normotensive trophoblast debris.
Project description:Mapping DNase I hypersensitive sites (DHSs) within nuclear chromatin is a traditional and powerful method of identifying genetic regulatory elements. DHSs have been mapped by capturing the ends of long DNase I-cut fragments (>100,000 bp), or 100-1200 bp DNase I double-cleavage fragments (also called double-hit fragments). However, next-generation sequencing requires a DNA library containing DNA fragments of 100-500 bp. Therefore, we have modified the double-hit method and use short DNA fragments to generate DNA libraries for next-generation sequencing. We call this method the Short DHS assay (Short DNase I Hypersensitive Site assay). The short segments are 100-300 bp and can be directly cloned and used for high-throughput sequencing. We identified 83,897 DHSs in 2,343,479 tags across the human genome. Our results indicate that the DHSs identified by the Short DHS assay are consistent with those identified by longer fragments in previous studies.
Project description:ChIP-seq has become the method of choice for studying functional DNA-protein interactions on a genome-wide scale. The method is based on co-immunoprecipitation of DNA-binding proteins with formaldehyde cross-linked DNA, followed by deep sequencing of immunoprecipitated chromatin fragments, allowing for the identification of binding sites with high accuracy [1-5]. Traditionally, genome-wide tiling path microarrays were used for the detection of specifically immunoprecipitated DNA fragments, but current next-generation sequencers now generate sufficient data to assay multiple samples in a single sequencing run, making this method more time and cost effective than microarray-based approaches [2]. However, due to limitations in the read length of next-generation sequencing technologies (50-76 bp), it is not possible to sequence the complete length of immunoprecipitated DNA fragments, which can be up to 2 kb long, reflecting the biology in the case of large protein complexes. As a consequence, only the ends of the immunoprecipitated DNA fragments are sequenced. Deconvolution of sequencing reads mapping to the positive and negative strands is required to identify the real DNA binding site [4-9]. This limitation is most obvious in the case of large complex regions where multiple binding sites of various regulatory elements are clustered close to each other. Deconvolution of such regions and the exact identification of individual binding positions is very challenging [4]. Similar problems can occur when studying histone positioning in cases where the length of ChIP fragments is greater than the average distance between two neighboring nucleosomes. In addition, frequently only a narrow size range of immunoprecipitated fragments is selected for sequencing, and thus a bias towards binding regions within the selected range can be expected through the loss of larger fragments, possibly originating from protein complexes interacting with DNA.
To circumvent these complications, we modified the procedure for the preparation of ChIP-seq samples by introducing an additional extensive fragmentation round after isolation of immunoprecipitated chromatin to generate 70-110 bp long DNA fragments. For testing, we chose the Tcf4 protein, a well-studied downstream element of the Wnt pathway for which the genome-wide binding site profile was already known from ChIP-on-chip experiments [10]. Using the modified approach, we were able to identify Tcf4 binding sites at near-nucleotide resolution without computational deconvolution. Furthermore, high-resolution information on genome-wide binding site regions allowed for the identification of potential novel Tcf4 co-factors as well as target genes. 5 samples + 3 input samples
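The strand deconvolution that conventional ChIP-seq requires, and that the shorter fragments are meant to avoid, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `max_shift` limit and the toy read positions are assumptions. The idea is to find the offset at which forward-strand read starts best line up with reverse-strand read starts, then place the binding site halfway along that offset.

```python
def estimate_strand_shift(forward_starts, reverse_starts, max_shift=300):
    """Return the offset (bp) that best aligns forward and reverse 5' ends."""
    reverse_set = set(reverse_starts)
    best_shift, best_overlap = 0, -1
    for shift in range(max_shift + 1):
        # Count forward reads that have a reverse partner exactly `shift` bp away.
        overlap = sum(1 for pos in forward_starts if pos + shift in reverse_set)
        if overlap > best_overlap:
            best_shift, best_overlap = shift, overlap
    return best_shift

def binding_site_estimates(forward_starts, shift):
    """Place each site at the midpoint between the paired fragment ends."""
    return [pos + shift // 2 for pos in forward_starts]

# Toy 5' read positions (hypothetical): fragments of about 150 bp
forward = [100, 200, 300]
reverse = [250, 350, 450]

shift = estimate_strand_shift(forward, reverse)
sites = binding_site_estimates(forward, shift)
```

With fragments of only 70-110 bp, the offset between the two strands becomes small, which is why the modified protocol can localize binding sites without this step.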
Project description:Cis-regulatory elements (CREs) control how genes respond to external signals, but the principles governing their structure and function remain poorly understood. While differential transcription factor binding is known to regulate gene expression, how CREs integrate the amount and combination of inputs to secure precise spatiotemporal profiles of gene expression remains unclear. Here, we developed a high-throughput combinatorial screening strategy, which we term NeMECiS, to investigate signal-dependent synthetic CREs (synCREs) in differentiating mammalian stem cells. By concatenating fragments of functional CREs from genes that respond to Sonic Hedgehog in the developing vertebrate neural tube, we found that CRE activity follows hierarchical design rules. While individual 200-base-pair fragments showed minimal activity, their combinations generated thousands of functional signal-responsive synCREs, many exceeding the activity of natural sequences. Statistical modelling revealed that CRE function can be decomposed into specific quantitative contributions in which sequence fragments combine through a multiplicative rule, tuned by their relative positioning and spacing. These findings provide a predictive framework for CRE redesign, which we used to engineer synthetic CREs that alter the pattern of motor neuron differentiation in neural tissue. These findings establish quantitative principles for engineering synthetic regulatory elements with programmable signal responses to rewire genetic circuits and control stem cell differentiation, providing a basis for understanding developmental gene regulation and designing therapeutic gene expression systems.
Project description:An ability to map the global interactions of a chemical entity with chromatin genome-wide could provide new insights into the mechanisms by which a small molecule perturbs cellular functions. We developed a method that uses chemical derivatives and massively parallel DNA sequencing (Chem-Seq) to identify the sites bound by small chemical molecules throughout the human genome. We developed in vivo and in vitro Chem-Seq protocols with biotinylated derivatives of small molecules. In the in vivo protocol, cells were treated with biotinylated ligand and cross-linked with formaldehyde at the same time. Cells were then lysed and sonicated to shear the DNA, and streptavidin beads were used to isolate biotinylated ligand and associated chromatin fragments. We then used massively parallel sequencing to identify the enriched DNA fragments, and mapped these sequences to the genome. In the in vitro protocol, MM1.S cells were fixed and the derived sonicated lysate was incubated with biotinylated drug to enrich for bound chromatin regions in vitro. We then used massively parallel sequencing to identify the enriched DNA fragments, and mapped these sequences to the genome.
Project description:Extracellular RNAs (exRNAs) in blood and other biofluids have attracted great interest as potential biomarkers in liquid biopsy applications, as well as for their potential biological functions. Whereas it is well-established that extracellular microRNAs are present in human blood circulation, the degree to which messenger RNAs (mRNA) and long noncoding RNAs (lncRNA) are represented in plasma is less clear. Here we report that mRNA and lncRNA species are present as small fragments in plasma that are not detected by standard small RNA-seq methods, because they lack 5’-phosphorylation or carry 3’-phosphorylation. We developed a modified sequencing protocol (termed “phospho-sRNA-seq”) that incorporates upfront RNA treatment with T4 polynucleotide kinase (which also has 3’ phosphatase activity) and compared it to a standard small RNA-seq protocol, using as input both a pool of synthetic RNAs with diverse 5’ and 3’ end chemistries and exRNA isolated from human blood plasma. This series uses the synthetic pools of small RNAs to demonstrate the efficacy of phospho-sRNA-seq in capturing sRNAs lacking a 5’ phosphate and/or having a 3’ phosphate.
Project description:To study the evolution of nucleosome positioning, we mapped nucleosome positions in two yeast species. Identified differences in nucleosome positioning were classified into cis-based changes or trans-based changes based on the pattern of nucleosomes in the hybrid. This analysis was performed for wild-type strains as well as for strains deleted of 5 chromatin regulators, allowing us to examine their roles in determining nucleosome positioning. Illumina sequencing of mono-nucleosome fragments isolated by MNase digestion. Samples include pooled DNA fragments of S. cerevisiae and S. paradoxus or DNA fragments of the interspecific hybrid. Experiments were performed for WT strains as well as strains deleted of 5 chromatin regulators.
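The cis/trans classification described above can be sketched as follows. This is an illustrative reading of the general hybrid logic, not the authors' procedure: it assumes nucleosome centres are given in aligned coordinates, and the `min_shift` threshold and function name are hypothetical. A difference between the parental species that persists between the two alleles inside the hybrid is attributed to cis effects, since both alleles share the same trans environment. A difference that disappears in the hybrid is attributed to trans effects.

```python
def classify_shift(parent_a, parent_b, hybrid_a, hybrid_b, min_shift=10):
    """Classify one nucleosome from its centre position (bp) in four contexts.

    parent_a / parent_b: position in each parental species.
    hybrid_a / hybrid_b: position on each species' allele within the hybrid.
    min_shift: hypothetical threshold below which positions count as equal.
    """
    parental_diff = parent_a - parent_b
    hybrid_diff = hybrid_a - hybrid_b
    if abs(parental_diff) < min_shift:
        return "conserved"
    if abs(hybrid_diff) < min_shift:
        # The difference vanishes once both alleles share one nucleus.
        return "trans"
    if parental_diff * hybrid_diff > 0:
        # The difference persists in the same direction on the hybrid alleles.
        return "cis"
    return "other"

calls = [
    classify_shift(1000, 1002, 1001, 1003),  # no parental difference
    classify_shift(1000, 1040, 1001, 1038),  # difference kept in hybrid
    classify_shift(1000, 1040, 1020, 1022),  # difference lost in hybrid
]
```

A real analysis would also need to handle partial effects, where the hybrid difference is smaller than the parental one but not zero. The sketch only separates the clear-cut cases.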
Project description:We performed the cleavage under targets and tagmentation (CUT&Tag) assay, followed by sequencing of the enriched DNA fragments, to reveal the direct downstream targets of Pbx1. First, we overexpressed Pbx1b with Pbx1b-IRES-GFP retrovirus in murine peripheral B cells to ensure sufficient yields of DNA fragments. CUT&Tag libraries were generated following the manufacturer’s protocol (Vazyme; cat TD901-01), and the Pbx1 antibody (CST; cat 4342) was used for signal enrichment.