Project description:Eucalyptus species are widely used in the forestry industry, and a significant increase in the number of sequences available in database repositories has been observed for these species. In proteomics, a protein is identified by correlating the theoretical fragmentation spectrum derived from genomic/transcriptomic data against the experimental fragmentation mass spectrum acquired from large-scale analysis of protein mixtures. Proteogenomics is an alternative approach that can identify novel proteins encoded by regions previously considered as non-coding. This study aimed to confidently identify and confirm the existence of previously unknown protein-coding sequences in the Eucalyptus grandis genome.
Project description:Despite knowledge of complex prokaryotic transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have played a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ~64% of all genes including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction datasets revealed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes - events usually considered spurious or non-functional. With experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. This SuperSeries is composed of the SubSeries listed below.
Project description:We describe a refined approach to identify new human RNA-protein interactions. In vitro transcribed labeled RNA is bound to ~9,400 human recombinant proteins spotted on protein microarrays. This approach identified 137 RNA-protein interactions for 10 human coding and non-coding RNAs, including an interaction between Staufen 1 protein and TP53 mRNA that promoted the latter’s stability. RNA hybridization to protein microarrays allows rapid identification of human RNA-protein interactions on a large scale. Sense and antisense strands for 10 RNA transcripts representing protein coding RNAs TP53, HRAS, MYC, BCL2 and non-coding sequences PWRN1, SOX2OT, OCC1, IGF2RNC, lncRBM26 and DLEU1 were in vitro transcribed, labeled with Cy5 and independently hybridized on human protein microarrays. The labeling process was optimized to achieve ~3 pmol dye per microgram of RNA, with an average efficacy of 1 dye molecule for approximately every 850 bp of RNA, in order to minimally influence native RNA structure while still yielding signal intensities that were readily visualized.
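The two labeling figures quoted above (one dye per roughly 850 bases, and about 3 pmol dye per microgram of RNA) can be checked against each other with a short calculation. The sketch below is only a back-of-the-envelope check: the average nucleotide mass of 340 g/mol is an assumed round figure that does not come from the description, and the function name is hypothetical.

```python
# Rough consistency check between "1 dye per ~850 bases" and "~3 pmol dye per µg RNA".
# ASSUMPTION: an average RNA nucleotide mass of ~340 g/mol (not stated in the source).
AVG_NT_MASS_G_PER_MOL = 340.0


def dye_pmol_per_ug(nt_per_dye: float,
                    nt_mass: float = AVG_NT_MASS_G_PER_MOL) -> float:
    """Return pmol of dye carried by 1 µg of RNA at the given labeling density."""
    grams_rna = 1e-6                       # 1 microgram of RNA
    mol_nucleotides = grams_rna / nt_mass  # moles of nucleotides in that mass
    mol_dye = mol_nucleotides / nt_per_dye # one dye molecule per nt_per_dye bases
    return mol_dye * 1e12                  # convert mol to pmol


if __name__ == "__main__":
    print(f"{dye_pmol_per_ug(850):.2f} pmol dye per µg RNA")
```

Under that assumption the result falls between 3 and 4 pmol per microgram, the same order as the ~3 pmol quoted; a somewhat lighter assumed nucleotide mass moves the figure only slightly.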
Project description:This SuperSeries is composed of the following subset Series: GSE12923: Halobacterium salinarum NRC-1 growth curve, tiling arrays; GSE12977: Halobacterium salinarum NRC-1 growth curve; GSE13108: Halobacterium salinarum NRC-1 conditional ChIP-chip for transcription initiation factor IIB 4 (TFBd); GSE7045: ChIP-Chip of General Transcription factors in Halobacterium NRC-1; GSE15786: Halobacterium sp. NRC-1 ChIP-chip for TFBa, TFBd and TFBf, high resolution array; GSE15788: Halobacterium salinarum NRC-1 total RNA hybridization of TFBd overexpression versus Reference sample. Despite knowledge of complex prokaryotic transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have played a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ~64% of all genes including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction datasets revealed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes - events usually considered spurious or non-functional. With experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. Refer to individual Series.
Project description:In the yeast genome, a large proportion of nucleosomes occupy well-defined positions. While the contribution of chromatin remodelers and DNA binding proteins to maintain this organization is well established, the relevance of the DNA sequence to nucleosome positioning in the genomic context remains controversial. Through genome-wide, quantitative analysis of nucleosome positioning and high-resolution mutagenesis of mononucleosomal DNA, we show that sequence changes distort the nucleosomal pattern at the level of individual nucleosomes. This effect is equally detected in transcribed and non-transcribed regions, suggesting the existence of sequence elements contributing to positioning. To identify such elements, we incorporated information from nucleosomal signatures into synthetic DNA molecules and found that they generated regular nucleosomal arrays indistinguishable from those of endogenous sequences. Strikingly, this information is species-specific and can be combined with coding information through the use of synonymous codons such that genes from one species can be engineered to adopt the nucleosomal organization of another. These findings open up the possibility of designing coding and non-coding DNA molecules capable of directing their own nucleosomal organization.
Project description:This is a Random Forest-based machine learning model that distinguishes lncRNAs from coding mRNAs in plant transcriptomic data. The model assigns label 1 to coding sequences and label 2 to long non-coding sequences. Predictions use a combination of Open Reading Frame (ORF)-based, sequence-based and codon-bias features. To make predictions on Zea mays sequence data, users need to download the curated ONNX model and convert the sequences into a feature matrix as described in the PLIT paper (Deshpande et al. 2019).
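To make the feature-matrix step more concrete, the following is a minimal sketch of how a few ORF-based and sequence-based features might be derived from a transcript, and how the 1/2 output labels could be decoded. It is illustrative only: the feature names, the forward-strand ORF definition, and the helper functions are assumptions and do not reproduce the actual PLIT feature set, which is defined in the cited paper.

```python
from typing import Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Label convention taken from the description: 1 = coding, 2 = long non-coding.
LABELS = {1: "coding", 2: "long non-coding"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG..stop ORF.

    Simplification (an assumption of this sketch): only the three
    forward-strand reading frames are scanned.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None                   # close it and keep scanning
    return best


def toy_features(seq: str) -> Dict[str, float]:
    """Return a small, hypothetical feature vector for one transcript."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    orf = longest_orf_length(seq)
    gc = (seq.count("G") + seq.count("C")) / n if n else 0.0
    return {
        "length": float(n),
        "longest_orf": float(orf),
        "orf_coverage": orf / n if n else 0.0,
        "gc_content": gc,
    }


def decode_label(label: int) -> str:
    """Map the model's integer output to a readable class name."""
    return LABELS[label]


if __name__ == "__main__":
    print(toy_features("ATGAAATTTGGGTAA"))
    print(decode_label(2))
```

In practice, rows of features like these, computed as the PLIT paper specifies, would be stacked into a numeric matrix and passed to the downloaded ONNX model through an ONNX-compatible runtime.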
Project description:Mature messenger RNAs (mRNAs) consist of coding sequence (CDS) and 5’ and 3’ untranslated regions, typically expected to show similar abundance within a given neuron. Examining mRNA from defined neurons we unexpectedly show extremely common unbalanced expression of cognate 3’ UTR and CDS sequences, observing many genes with high UTR relative to CDS, and others with high CDS to UTR. By in situ hybridization 19 of 19 genes examined show a broad range of UTR to CDS expression ratios in different neurons and other tissues. These ratios may be spatially graded or change with developmental age, but are consistent across animals. Further, for two genes examined, a UTR to CDS ratio above a particular threshold in any given neuron correlated with reduced or undetectable protein expression. Our findings raise questions about the role of isolated UTR sequences in regulation of protein expression, and highlight the importance of separately examining UTR and CDS sequences in gene expression analyses.
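As an illustration of examining UTR and CDS signals separately, the sketch below computes a log2 UTR-to-CDS ratio for a gene and bins it against a cutoff. The pseudocount, the cutoff of 1.0 (a two-fold difference), the category names, and the function names are all hypothetical choices for this example; the description does not state the threshold or the quantification method used in the study.

```python
import math


def utr_cds_log2_ratio(utr_signal: float, cds_signal: float,
                       pseudocount: float = 1.0) -> float:
    """log2 ratio of 3' UTR signal to CDS signal for one gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when one region has no detectable signal.
    """
    return math.log2((utr_signal + pseudocount) / (cds_signal + pseudocount))


def classify_balance(log2_ratio: float, cutoff: float = 1.0) -> str:
    """Bin a log2 ratio using a hypothetical symmetric cutoff."""
    if log2_ratio > cutoff:
        return "UTR-high"
    if log2_ratio < -cutoff:
        return "CDS-high"
    return "balanced"


if __name__ == "__main__":
    # Invented numbers, purely to exercise the functions.
    ratio = utr_cds_log2_ratio(utr_signal=7.0, cds_signal=1.0)
    print(ratio, classify_balance(ratio))
```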