Project description: Background: Post-transcriptional gene regulation controls the amount of protein produced from an individual mRNA by altering rates of decay and translation. Many sequence elements that direct post-transcriptional regulation have been found; in mammals, most such elements are located within the 3' untranslated regions (3'UTRs). Comparative genomic studies demonstrate that mammalian 3'UTRs contain extensive conserved sequence tracts, yet only a small fraction corresponds to recognized elements, implying that many additional novel elements exist. Despite a variety of computational, molecular, and biochemical approaches, identifying functional 3'UTR elements remains difficult. Results: We created a high-throughput cell-based screen that enables identification of functional post-transcriptional 3'UTR regulatory elements. Our system exploits integrated single-copy reporters, which are expressed and processed as endogenous genes. We screened many thousands of short random sequences for their regulatory potential. Control sequences with known effects were captured effectively using our approach, establishing that our methodology was robust. We found hundreds of functional sequences, which we validated in traditional reporter assays, including verifying their regulatory impact in native sequence contexts. Although 3'UTRs are typically considered repressive, most of the functional elements were activating, including ones that were preferentially conserved. Additionally, we adapted our screening approach to examine the effect of elements on RNA abundance, revealing that most elements act by altering mRNA stability. Conclusions: We developed and used a high-throughput approach to discover hundreds of post-transcriptional cis-regulatory elements. These results imply that most human 3'UTRs contain many previously unrecognized cis-regulatory elements, many of which are activating, and that the post-transcriptional fate of an mRNA is largely due to the actions of many individual cis-regulatory elements within its 3'UTR.
Project description: Post-transcriptional gene regulation controls the amount of a protein produced from an individual mRNA transcript by altering mRNA decay and translation rates. Many putative post-transcriptional cis-regulatory elements have been identified from computational, molecular biology, and biochemical studies; however, identifying which sequence elements are sufficient to regulate expression remains challenging. We created a high-throughput, cell-based screen that tested the post-transcriptional regulatory potential of thousands of short sequence elements. Sequences with known effects performed as expected in this screen, showing that the methodology is robust. Hundreds of novel short sequences were found to alter gene expression, either increasing or decreasing protein production from the fluorescent reporter, and we validated the effects for fifty of these sequences. Importantly, sequences discovered in this screen are conserved in human 3′UTRs, and furthermore, the sequences can regulate expression in the context of those endogenous 3′UTRs. Hundreds of previously unknown post-transcriptional cis-regulatory elements exist, many of which increase gene expression. These results suggest that each human 3′UTR has many small cis-regulatory elements that interact with RNA binding proteins, and these interactions control the fate of an mRNA transcript.
Project description: Growing evidence suggests that functional cis-regulatory elements (cis-REs) exist not only in epigenetically marked sites of the human genome but also in unmarked ones. While it is already difficult to identify cis-REs in the epigenetically marked sites, interrogating cis-REs residing within the unmarked sites is even more challenging. Here, we report adapting Reel-seq, an in vitro high-throughput (HTP) technique, to fine-map cis-REs at high resolution over a large region of the human genome in a systematic and continuous manner. Using Reel-seq, as a proof-of-principle, we identified 408 candidate cis-REs by mapping a 58 kb core region on the aging-related CDKN2A/B locus that harbors p16INK4a. By coupling Reel-seq with FREP-MS, a proteomics analysis technique, we characterized two cis-REs, one in an epigenetically marked site and the other in an epigenetically unmarked site. These elements are shown to regulate p16INK4a expression over a distance of ∼100 kb by recruiting the poly(A) binding protein PABPC1 and the transcription factor FOXC2. Downregulation of either PABPC1 or FOXC2 in human endothelial cells (ECs) can induce p16INK4a-dependent cellular senescence. Thus, we confirmed the utility of Reel-seq and FREP-MS analyses for the systematic identification of cis-REs at high resolution over a large region of the human genome.
Project description: Background: Despite substantial progress in mosquito genomic and genetic research, few cis-regulatory elements (CREs), DNA sequences that control gene expression, have been identified in mosquitoes or other non-model insects. Formaldehyde-assisted isolation of regulatory elements paired with DNA sequencing, FAIRE-seq, is emerging as a powerful new high-throughput tool for global CRE discovery. FAIRE results in the preferential recovery of open chromatin DNA fragments that are not bound by nucleosomes, an evolutionarily conserved indicator of regulatory activity, which are then sequenced. Despite the power of the approach, FAIRE-seq has not yet been applied to the study of non-model insects. In this investigation, we utilized FAIRE-seq to profile open chromatin and identify likely regulatory elements throughout the genome of the human disease vector mosquito Aedes aegypti. We then assessed genetic variation in the regulatory elements of dengue virus susceptible (Moyo-S) and refractory (Moyo-R) mosquito strains. Results: Analysis of sequence data obtained through next generation sequencing of FAIRE DNA isolated from A. aegypti embryos revealed >121,000 FAIRE peaks (FPs), many of which clustered in the 1 kb 5' upstream flanking regions of genes known to be expressed at this stage. As expected, known transcription factor consensus binding sites were enriched in the FPs, and of these FoxA1, Hunchback, Gfi, Klf4, MYB/ph3 and Sox9 are most predominant. All of the elements tested in vivo were confirmed to drive gene expression in transgenic Drosophila reporter assays. Of the >13,000 single nucleotide polymorphisms (SNPs) recently identified in dengue virus-susceptible and refractory mosquito strains, 3365 were found to map to FPs. Conclusion: FAIRE-seq analysis of open chromatin in A. aegypti permitted genome-wide discovery of CREs. The results of this investigation indicate that FAIRE-seq is a powerful tool for identification of regulatory DNA in the genomes of non-model organisms, including human disease vector mosquitoes.
Project description: Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays. Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller. Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. Supplementary information: Supplementary data are available at Bioinformatics online.
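The alignment-free screening step in the abstract above (IUPAC words scored for conservation by branch length) can be illustrated with a small Python sketch. This is not BLSSpeller's Java/MapReduce implementation. It assumes the commonly used definition of the branch length score: the summed length of the minimal subtree connecting the species in which a word occurs, divided by the total tree length. The tree, the species labels `A`, `B`, `C`, and the promoter sequences are hypothetical toy data.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Hypothetical toy tree: child -> (parent, branch length).
TREE = {
    "A": ("X", 0.1),
    "B": ("X", 0.2),
    "X": ("root", 0.3),
    "C": ("root", 0.4),
}


def iupac_to_regex(word):
    """Compile an IUPAC word such as 'ACRTG' into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in word))


def branch_length_score(tree, present):
    """Fraction of total tree length spanned by the minimal subtree
    connecting the leaves in `present` (assumed BLS definition)."""
    n = len(present)
    below = {child: 0 for child in tree}  # present leaves below each edge
    for leaf in present:
        node = leaf
        while node in tree:  # walk up until the root is reached
            below[node] += 1
            node = tree[node][0]
    # An edge belongs to the connecting subtree only if it separates
    # the present leaves into two non-empty groups.
    spanned = sum(tree[c][1] for c, k in below.items() if 0 < k < n)
    total = sum(length for _, length in tree.values())
    return spanned / total


def word_conservation(word, promoters, tree):
    """Score one IUPAC word across orthologous promoters, without alignment."""
    pattern = iupac_to_regex(word)
    present = {sp for sp, seq in promoters.items() if pattern.search(seq)}
    return branch_length_score(tree, present)


# Hypothetical orthologous promoter fragments, one per species.
promoters = {"A": "TTACGTGA", "B": "GGACATGC", "C": "TTTTTTTT"}
score = word_conservation("ACRTG", promoters, TREE)
```

Because each species is searched independently, the word does not need to sit at the same position in every promoter, which is the property that makes an alignment-free screen tolerant of misaligned elements.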
Project description:Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair-level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.
Project description:The discovery and mapping of cis-regulatory elements is important for understanding regulation of gene transcription in mosquito vectors of human diseases. Genome sequence data are available for 3 species, Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus (Diptera: Culicidae), representing 2 subfamilies (Culicinae and Anophelinae) that are estimated to have diverged 145 to 200 million years ago. Comparative genomics tools were used to screen genomic DNA fragments located in the 5'-end flanking regions of orthologous genes. These analyses resulted in the identification of 137 sequences, designated "mosquito motifs," 7 to 9 nucleotides in length, representing 18 families of putative cis-regulatory elements conserved significantly among the 3 species when compared to the fruit fly, Drosophila melanogaster. Forty-one of the motifs were implicated previously in experiments as sites for binding transcription factors or functioning in the regulation of mosquito gene expression. Further analyses revealed associations between specific motifs and expression profiles, particularly in those genes that show increased or decreased mRNA abundance in females following a blood meal, and those accumulating transcription products exclusively or preferentially in the midgut, fat bodies, or ovaries. These results validate the methodology and support a relationship between the discovered motifs and the conservation of hematophagy in mosquitoes.