Project description:The noncoding genome plays an important role in de novo gene birth and the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the structural diversity and evolution of proteins remains unclear. Here, we investigated the potential of the noncoding genome of yeast to produce novel protein bricks that can give rise to novel genes or be integrated in pre-existing proteins, thus participating in protein structure evolution and diversity. Combining different bioinformatics approaches, we showed that intergenic ORFs of yeast encompass the large structural diversity of canonical proteins with the majority encoding peptides predicted as foldable. Then, we investigated the early stages of de novo gene birth with Ribosome Profiling and systematic reconstruction of yeast de novo gene ancestral sequences. We highlighted sequence and structural factors determining de novo gene birth and protein evolution. Finally, we showed a strong correlation between the fold potential of de novo genes and their ancestral ORFs reflecting the relationship between the noncoding genome and the protein structure universe.
Project description:The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
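As a rough illustration of what enumerating the ORFs of an intergenic region can involve, the sketch below scans the three forward reading frames of a DNA string for stop-free codon runs. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `stop_free_stretches`, the 60-nt default for `min_len`, and the choice to keep runs truncated by the sequence ends are all hypothetical and do not come from the project description.

```python
# Standard stop codons (DNA alphabet).
STOPS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq, min_len=60):
    """Yield (frame, start, end) for stop-free codon runs on the forward strand.

    Coordinates are 0-based and `end` is exclusive. `min_len` is the minimum
    run length in nucleotides (an assumed threshold, not taken from the source).
    Runs cut off by the start or end of the sequence are kept.
    """
    seq = seq.upper()
    for frame in range(3):
        run_start = frame
        # Walk codon by codon through this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if i - run_start >= min_len:
                    yield frame, run_start, i
                run_start = i + 3  # next run begins after the stop codon
        # Handle the run that reaches the last complete codon of the frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - run_start >= min_len:
            yield frame, run_start, end


if __name__ == "__main__":
    toy = "ATGAAATAAGGGCCCTAG"  # toy sequence, not real yeast data
    for frame, start, end in stop_free_stretches(toy, min_len=6):
        print(frame, start, end, toy[start:end])
```

The reverse strand can be covered by running the same function on the reverse complement of the region.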
Project description:This data set represents the results of two reverse-labeled experiments on wild-type and RRP6delta S. cerevisiae, hybridized to arrays containing PCR products for ORFs and intergenic features. Keywords: genetic modification
Project description:We used a microarray covering the whole genome of R. conorii to check whether intergenic sequences are transcribed. We compared the expression signals of probes corresponding to spacers with those of probes corresponding to open reading frames (ORFs). We extracted total RNA from R. conorii XTC cultures, performed cDNA synthesis, and then carried out hybridizations. The hybridizations were repeated four times, and the data were compared to check reproducibility.
Project description:We conducted a genome-wide placental transcriptome study to identify functional pathways representing the molecular link between maternal pre-pregnancy BMI and fetal growth. We used RNA microarray (Agilent 8 X 60 K), medical records, and questionnaire data from 183 mother-newborn pairs from the ENVIRONAGE birth cohort study (Flanders, Belgium). We applied a weighted gene co-expression network analysis (WGCNA) and identified gene modules and hub genes that were associated with maternal BMI as well as newborn birth weight. Modules of interest were further characterized by gene ontology (GO) and pathway enrichment analyses. We assessed the mediating effects of modules and hub genes in the association between maternal BMI and newborn weight.
Project description:Preterm birth, defined as birth <37 weeks of gestation, is a leading cause of infant morbidity and mortality. In the United States, approximately 12% of all births are preterm [1]. Despite decades of research, there has been little progress in developing effective interventions to prevent preterm birth. In fact, the rate of preterm birth has increased slightly over the last several decades [2]. The ultimate goal of the Genomic and Proteomic Network for Preterm Birth Research (GPN-PBR) is to identify possible biomarkers that could predict the susceptibility to spontaneous preterm birth (SPTB) as well as to shed light on the molecular mechanisms involved in its etiologies. Understanding those mechanisms will help us predict SPTB and may facilitate the introduction of more effective prevention and treatment strategies.
Project description:Identification of the coding elements in the genome is a fundamental step to understanding the building blocks of living systems. Short peptides (< 100 aa) have emerged as important regulators of development and physiology, but their identification has been limited by their size. We have leveraged the periodicity of ribosome movement on the mRNA to define actively translated ORFs by ribosome footprinting. This approach identifies several hundred translated small ORFs in zebrafish and human. Computational prediction of small ORFs from codon conservation patterns corroborates and extends these findings and identifies conserved sequences in zebrafish and human, suggesting functional peptide products (micropeptides). These results identify micropeptide‐encoding genes in vertebrates, providing an entry point to define their function in vivo.
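The periodicity idea can be illustrated with a small sketch: if an ORF is actively translated, footprint positions should fall mostly in one codon frame instead of being spread evenly over all three. This is a minimal sketch under stated assumptions, not the method used in the study: the function names, the use of precomputed P-site positions, and the `min_reads` and `min_frame0` thresholds are hypothetical.

```python
from collections import Counter


def frame_fractions(psite_positions, orf_start, orf_end):
    """Return the fraction of footprint positions in codon frames 0, 1 and 2.

    Positions and ORF bounds are 0-based transcript coordinates; `orf_end`
    is exclusive. Positions outside the ORF are ignored.
    """
    counts = Counter(
        (pos - orf_start) % 3
        for pos in psite_positions
        if orf_start <= pos < orf_end
    )
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


def looks_translated(psite_positions, orf_start, orf_end,
                     min_reads=10, min_frame0=0.6):
    """Crude call: enough reads in the ORF and most of them in frame 0.

    Both thresholds are illustrative assumptions, not published values.
    """
    in_orf = sum(1 for pos in psite_positions if orf_start <= pos < orf_end)
    fractions = frame_fractions(psite_positions, orf_start, orf_end)
    return in_orf >= min_reads and fractions[0] >= min_frame0


if __name__ == "__main__":
    periodic = [100 + 3 * k for k in range(11)] + [101]  # toy data
    uniform = list(range(100, 130))                      # toy data
    print(frame_fractions(periodic, 100, 160), looks_translated(periodic, 100, 160))
    print(frame_fractions(uniform, 100, 160), looks_translated(uniform, 100, 160))
```

A real analysis would also need read-length-specific P-site offsets and a statistical test in place of fixed cutoffs; those are left out here to keep the sketch short.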
Project description:Contains a row for each intergenic region in each sample, indicating whether our analysis called a window within the region "present" or "absent" in the sample. Intergenic regions are regions containing neither annotated genes nor intergenic clusters. Region coordinates refer to version 3 of the TIGR annotation as submitted to GenBank (GI numbers 22330780, 22326553, 22331929, 22329272, 22328163). Keywords: other
Project description:Contains a row for each intergenic region in each sample, indicating whether our analysis called a window within the region "present" or "absent" in the sample. Intergenic regions are regions containing neither annotated genes nor intergenic clusters. Region coordinates refer to version 3 of the TIGR annotation as submitted to GenBank (GI numbers 22330780, 22326553, 22331929, 22329272, 22328163).