Project description:Transcriptional regulatory elements (TREs), including enhancers and promoters, determine the transcription levels of associated genes. We have recently shown that global run-on and sequencing (GRO-seq) with enrichment for 5'-capped RNAs reveals active TREs with high accuracy. Here, we demonstrate that active TREs can be identified by applying sensitive machine-learning methods to standard GRO-seq data. This approach allows TREs to be assayed together with gene expression levels and other transcriptional features in a single experiment. Our prediction method, called discriminative Regulatory Element detection from GRO-seq (dREG), summarizes GRO-seq read counts at multiple scales and uses support vector regression to identify active TREs. The predicted TREs are more strongly enriched for several marks of transcriptional activation, including eQTL, GWAS-associated SNPs, H3K27ac, and transcription factor binding than those identified by alternative functional assays. Using dREG, we survey TREs in eight human cell types and provide new insights into global patterns of TRE function. We analyzed GRO-seq or PRO-seq data from eight human cell lines. Please note that this study comprises new sample data plus reanalysis of old Sample data submitted by another user. Existing PRO-seq or GRO-seq data was combined as detailed in the GSE66031_readme.txt. See GSM1613181 and GSM1613182 Sample records for data processing information.
Project description:Transitions between pluripotent stem cells and differentiated cells are executed by key transcription regulators. Comparative measurements of RNA polymerase distribution over the genome's primary transcription units in different cell states can identify the genes and steps in the transcription cycle that are regulated during such transitions. To identify the complete transcriptional profiles of RNA polymerases with high sensitivity and resolution, as well as the critical regulated steps upon which regulatory factors act, we used genome-wide, nuclear run-on (GRO-seq) to map the density and orientation of transcriptionally-engaged RNA polymerases in mouse embryonic stem cells (ESCs) and embryonic fibroblasts (MEFs). In both cell types, progression of a promoter-proximal, paused RNA polymerase II (Pol II) into productive elongation is a rate-limiting step in transcription of ~40% of mRNA-encoding genes. Importantly, quantitative comparisons between cell types reveal that transcription is controlled frequently at paused Pol II's entry into elongation. Furthermore, "bivalent" ESC genes (exhibiting both active and repressive histone modifications) bound by Polycomb Group Complexes PRC1 and PRC2 show dramatically reduced levels of paused Pol II at promoters relative to an average gene. In contrast, bivalent promoters bound by only PRC2 allow Pol II pausing, but it is confined to extremely 5' proximal regions. Altogether, these findings identify rate-limiting targets for transcription regulation during cell differentiation. Mapping engaged RNA polymerase density in two cell types by sequencing run-on transcripts. SUPPLEMENTARY FILES: All fastq files have sanger-fastq format q values. Alignments were generated with eland and the mm9 mouse genome assembly. Reads aligning to regions annotated as similar to rRNA by RepeatMasker were then removed.
Wiggle files are in units of RPKM (reads per kilobase per million aligned reads) and are broken up by cell type and chromosome to aid in uploading to UCSC. Each file furthermore contains two tracks - one for each strand. As in the published paper, plus strand RPKM densities are in red with positive values and minus strand RPKM densities are in blue with negative values.
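As an illustration of the units described above, the following is a minimal sketch of how a strand-signed RPKM value could be computed for one region. The function names and the region-based inputs are assumptions made for this example; the submitters' actual pipeline is not described in the record.

```python
def rpkm(read_count, region_length_bp, total_aligned_reads):
    """Reads per kilobase per million aligned reads for one region."""
    if region_length_bp <= 0 or total_aligned_reads <= 0:
        raise ValueError("region length and total aligned reads must be positive")
    kilobases = region_length_bp / 1e3
    millions_aligned = total_aligned_reads / 1e6
    return read_count / kilobases / millions_aligned


def signed_rpkm(read_count, region_length_bp, total_aligned_reads, strand):
    """Follow the track convention: plus strand positive, minus strand negative."""
    value = rpkm(read_count, region_length_bp, total_aligned_reads)
    if strand == "+":
        return value
    if strand == "-":
        return -value
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # 50 reads over a 1 kb region in a library of 10 million aligned reads.
    print(signed_rpkm(50, 1000, 10_000_000, "+"))
    print(signed_rpkm(50, 1000, 10_000_000, "-"))
```

Negating the minus-strand value reproduces the sign convention of the two-track wiggle files, so both strands can be drawn around a shared zero line.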
Project description:Production of mRNA depends critically on the rate of RNA polymerase II (Pol II) elongation. To dissect Pol II dynamics in mouse ES cells, we inhibited Pol II transcription at either initiation or promoter-proximal pause escape with Triptolide or Flavopiridol, and tracked Pol II kinetically using GRO-seq. Both inhibitors block transcription of more than 95% of genes, showing that pause escape, like initiation, is a ubiquitous and crucial step within the transcription cycle. Moreover, paused Pol II is relatively stable, as evidenced from half-life measurements at ~3200 genes. Finally, tracking the progression of Pol II after drug treatment establishes Pol II elongation rates at over 1,000 genes. Notably, Pol II accelerates dramatically while transcribing through genes, but slows at exons. Furthermore, intergenic variance in elongation rates is substantial, and is influenced by a positive effect of H3K79me2 and negative effects of exon density and CG content within genes. We isolated replicates of nuclei of untreated mESCs and cells treated for 2, 5, 12.5, 25 and 50 min with 300nM flavopiridol, as well as nuclei treated for 12.5, 25, and 50 min with 500nM triptolide and performed GRO-seq with these.
Project description:Tau (MAPT) is a microtubule-associated protein causing frequent neurodegenerative diseases or inherited frontotemporal lobar degenerations. Emerging evidence for non-canonical functions of Tau in DNA protection and P53 regulation suggests its involvement in cancer. Indeed, Tau expression correlates with cancer-specific survival or response to microtubule therapeutics. These data may imply common molecular pathways involved in the pathogenesis of neurodegenerative disorders and cancer. To bring new evidence that Tau represents a key protein in cancer, we present an in silico pan-cancer analysis of MAPT transcriptomic profile in over 11000 clinical samples and over 1300 pre-clinical samples provided by the TCGA and the DEPMAP datasets respectively. We completed this analysis by exploring a possible interplay of MAPT with wild-type or mutated P53. Then, we calculated the impact of MAPT expression on clinical outcome and drug response. Overall, the results support a relevant role of the MAPT gene in several cancer types, although the contribution of Tau to cancer appears to very much depend on the cellular context.
Project description:Annexin A1 (ANXA1) is a Ca2+-binding protein involved in pancreatic cancer (PC) progression. It mediates cytoskeletal organization, maintaining a malignant phenotype. ANXA1 knock-out (KO) MIA PaCa-2 cells partially lost their migratory and invasive capabilities, and the metastatic process is also affected in vivo. Here, we investigated the microRNA (miRNA) profile in ANXA1 KO cells. Analysis of the changes in miRNA expression highlighted the significant involvement of ANXA1 in PC progression. In this study, we focused on miR-196a, a well-known oncogenic factor in several tumour models, which appeared down-modulated in the absence of ANXA1. Furthermore, both ANXA1 and miR-196a are able to trigger the mechanisms of the epithelial-to-mesenchymal transition (EMT). Our results show that reintroducing miR-196a through its mimic sequence restored the early aggressive phenotype of MIA PaCa-2 cells. Thus, ANXA1 seems to support the expression of miR-196a and its role. In turn, this miRNA is able to mediate some of the protein's functions in PC progression. This work elucidates the correlation between ANXA1 and specific miRNA sequences, particularly miR-196a, and provides new knowledge about the protein's intracellular role.
Project description:To study target sequence specificity, selectivity, and reaction kinetics of Streptococcus pyogenes Cas9 activity, we challenged libraries of random variant targets with purified Cas9::guide RNA complexes in vitro. Cleavage kinetics were nonlinear, with a burst of initial activity followed by slower sustained cleavage. Consistent with other recent analyses of Cas9 sequence specificity, we observe considerable (albeit incomplete) impairment of cleavage for targets mutated in the PAM sequence or in "seed" sequences matching the proximal 8 bp of the guide. A second target region requiring close homology was located at the other end of the guide::target duplex (positions 13-18 relative to the PAM). Strikingly, a subset of variants which broke homology in the intervening region consistently increased the capacity of Cas9 to cleave in extended reactions. Sequences flanking the guide+PAM region had measurable (albeit modest) effects on cleavage. Taken together, these studies provide both a basis for predicting effective cleavage targets and a basis for potential optimization of guide RNAs to yield efficiency beyond that of the simple perfect-match guides. 118 samples analyzed. Controls have "con" in the sample name. To quantitatively measure the cleavage efficiency of a single gRNA, we created a population of random variant target sequences for each of two gRNA targets. The targets used were "unc-22A" (a sequence from the well-characterized unc-22 gene of Caenorhabditis elegans) and "protospacer 4" (ps4), a previously characterized sequence from a natural spacer from S. pyogenes MGAS10750. Using custom mixtures of oligonucleotide precursors for each base during chemical synthesis, a set of polymorphic target libraries ("Random Variant Libraries") was designed to have a baseline variation rate at each position. On each side of the gRNA homology and PAM regions, 6 bp of random sequence were added. The first base of intended gRNA homology is designated base 1.
The entire 35 bp random variant library mixture was cloned into a standard plasmid vector (pHRL-TK). Several thousand colonies from plates were washed in pools and prepared by standard plasmid preparation methods. The complexity of the libraries was estimated based on Illumina sequencing of the uncut libraries and filtering for the minimum representation expected from the pooling. Approximately 1500-3000 unique species were obtained in the unc-22A libraries and 5000 unique sequences in the ps4 library (see Materials and Methods). To assay cleavage, purified Cas9 was first incubated with gRNA, followed by incubation with the variant library for various time points and under various conditions. DNA template is among the conditions varied in the experiments. After protein removal, flanking sequences outside of the target region were used for PCR amplification, and plasmid cleavage was measured through loss of PCR products that span the region of interest. A set of perfectly matched targets and highly mutated versions present in the random variant library served as internal positive and negative controls, respectively. A log retention score for each sequence in each experiment was calculated by quantifying the representation of each sequence before and after addition of the Cas9 protein. Two approaches were used for normalization. First, we used a population of ps4 targets "spiked" into the library as an uncleaved control. Second, we used a population of unc-22A targets with large numbers of variations from the perfect target (between 4 and 7), which are hence likely to undergo limited cleavage, if any. Equivalent results are obtained with these two normalization approaches (see Computational Methods for details). Retention scores are expressed as the log2 of the normalized ratio, so that a more negative retention score indicates more efficient cleavage of substrate while a less negative score indicates less cleavage. Templates which are uncleaved will yield a retention score at or near zero.
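The retention score described above can be sketched as follows. This is only an illustrative reading of "log2 of the normalized ratio": the exact normalization is given in the publication's Computational Methods, and the function name, argument names, and the optional pseudocount are assumptions introduced for this example.

```python
import math


def retention_score(before, after, control_before, control_after, pseudocount=0.0):
    """log2 of a sequence's after/before read ratio, normalized to an uncleaved control.

    `before` and `after` are read counts for one target sequence without and with
    Cas9; the control counts come from a population assumed to be uncleaved
    (for example, spiked-in targets). The pseudocount is an assumption, not
    something stated in the record.
    """
    seq_before = before + pseudocount
    seq_after = after + pseudocount
    ctl_before = control_before + pseudocount
    ctl_after = control_after + pseudocount
    if min(seq_before, seq_after, ctl_before, ctl_after) <= 0:
        raise ValueError("all counts must be positive (consider a pseudocount)")
    sequence_ratio = seq_after / seq_before
    control_ratio = ctl_after / ctl_before
    return math.log2(sequence_ratio / control_ratio)


if __name__ == "__main__":
    # A target that keeps a quarter of its reads while the control is unchanged.
    print(retention_score(1000, 250, 2000, 2000))
    # A target that drops only as much as the control does: no apparent cleavage.
    print(retention_score(1000, 500, 2000, 1000))
```

Dividing by the control ratio removes changes in sequencing depth between the uncut and cut samples, so an uncleaved template lands at or near zero, as the text states.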
Comparisons between multiple experiments indicate strong correlation between independent retention measurements. GSM1410678-GSM1410761: the AF_SOL*.dat files contain the calculated final retentions for each experiment. Each experiment is labeled "AF_SOL_###_t###", where "AF_SOL_###" is the experiment run ID and "t###" is the incubation time of the experiment. For example, AF_SOL_513_t360 corresponds to experiment 513 on the protospacer 4 guide and DNA target, with an incubation time of 360 min. The experimental conditions and ID can be found in the associated publication. GSM1544297-GSM1544332: each unc*.dat file is a tab-delimited file of all sequences considered in that experiment. The names of the files and the AF_SOL_# run number can be found in the associated publication (Supplementary Materials) with the details of the conditions. Each filename starts with the type of gRNA used (either unc-22WT or the mutant version unc22C11G). The next number (#min) indicates the incubation time for the experiment, and it is followed by either #pcr_AF_SOL_# or just AF_SOL_#. If #pcr is present, it indicates the number of PCR cycles used in the experiment. Finally, AF_SOL_# denotes the sequencing run ID number.
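For the "AF_SOL_###_t###" labels, a small parser along the following lines could recover the run ID and incubation time. It is a sketch under stated assumptions: the helper name `parse_run_label` is hypothetical, the pattern covers only the AF_SOL labels, and the optional `.dat` suffix is inferred from the AF_SOL*.dat file pattern. The unc* filenames are not parsed here because their exact layout is only given in the publication.

```python
import re
from typing import NamedTuple


class RunLabel(NamedTuple):
    run_id: int           # experiment (sequencing run) ID, e.g. 513
    incubation_min: int   # incubation time in minutes, e.g. 360


# Matches labels such as "AF_SOL_513_t360", optionally followed by ".dat".
_LABEL_PATTERN = re.compile(r"^AF_SOL_(\d+)_t(\d+)(?:\.dat)?$")


def parse_run_label(label):
    """Split an AF_SOL_###_t### label into run ID and incubation time."""
    match = _LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not an AF_SOL_###_t### label: {label!r}")
    return RunLabel(run_id=int(match.group(1)), incubation_min=int(match.group(2)))


if __name__ == "__main__":
    parsed = parse_run_label("AF_SOL_513_t360")
    print(parsed.run_id, parsed.incubation_min)
```

Labels that do not fit the pattern raise an error instead of being guessed at, which is the safer choice when mapping files back to the conditions listed in the publication.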