Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution
ABSTRACT: The noncoding genome plays an important role in de novo gene birth and the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the structural diversity and evolution of proteins remains unclear. Here, we investigated the potential of the noncoding genome of yeast to produce novel protein bricks that can give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure evolution and diversity. Combining different bioinformatics approaches, we showed that yeast intergenic ORFs encompass the large structural diversity of canonical proteins, with the majority encoding peptides predicted to be foldable. We then investigated the early stages of de novo gene birth using Ribosome Profiling and a systematic reconstruction of the ancestral sequences of yeast de novo genes. We highlighted the sequence and structural factors that determine de novo gene birth and protein evolution. Finally, we showed a strong correlation between the fold potential of de novo genes and that of their ancestral ORFs, reflecting the relationship between the noncoding genome and the protein structure universe.
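To give a rough idea of what scanning intergenic sequence for ORFs involves, the sketch below finds forward-strand ORFs in all three frames of a DNA string. It is a hypothetical helper and not the authors' pipeline. The ATG start codon, the standard stop codons, the minimum length `min_codons` and the forward-strand-only scan are all simplifying assumptions.

```python
# Hypothetical sketch, not the study's method: forward-strand ORF scan in
# all three frames. Assumptions: ORFs start at ATG, end at the first in-frame
# stop codon, and must contain at least `min_codons` codons (start included,
# stop excluded).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) tuples; 0-based, end exclusive, stop included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC")` returns `[(2, 2, 14)]`: an ATG at position 2 (frame 2) that runs to a TAG stop. A genome-wide scan would presumably also cover the reverse complement and filter out regions overlapping annotated genes.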
Project description: Pervasive translation is a widespread phenomenon that plays an important role in de novo gene birth; however, its underlying mechanisms remain unclear. Based on multiple Ribosome Profiling datasets, we investigated the translational landscape of coding and noncoding regions of yeast. To this end, we developed a new representation framework that allows a visual and comprehensive representation of the diversity of translation behaviors in yeast coding and noncoding regions. We show that while coding regions are restricted to specific areas of the translation landscape, noncoding regions are associated with a wide diversity of translation behaviors and populate the entire yeast translational landscape. In particular, we reveal that noncoding regions are associated with canonical translation signals but also with novel categories of translation events that are absent from coding regions and seem to be a hallmark of pervasive translation. Notably, we report thousands of translated noncoding ORFs, of which 256 led to products detectable by Mass Spectrometry and were characterized by both canonical and non-canonical translation signals. Finally, we show that the translation behavior of noncoding ORFs is not explained by features related to the emergence of function, but is instead determined by the translation start codon and the codon distribution in the three competing RNA frames. Overall, our results enable us to propose a topology of the pervasive translation landscape of a species and open the way to future comparative analyses of this landscape under different conditions.
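Ribosome Profiling signal is commonly summarized by how reads distribute across the three reading frames of an ORF. The sketch below shows one simple way to compute that distribution. The function name `frame_fractions`, the 0-based coordinates and the assumption that P-site offsets have already been applied are illustrative choices, not details of the representation framework described above.

```python
from collections import Counter


def frame_fractions(p_sites, orf_start, orf_end):
    """Fraction of P-site positions in frames 0, 1 and 2 of an ORF.

    Hypothetical helper. Assumes 0-based positions on the ORF's strand,
    P-site offsets already applied, and orf_end exclusive. Frame 0 is
    the ORF's own reading frame.
    """
    # Keep only reads inside the ORF and tally their frame relative to the start.
    counts = Counter((p - orf_start) % 3 for p in p_sites if orf_start <= p < orf_end)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)  # no reads: avoid division by zero
    return tuple(counts[f] / total for f in range(3))
```

Here `frame_fractions([100, 103, 106, 107, 200], 100, 130)` returns `(0.75, 0.25, 0.0)`; the read at position 200 lies outside the ORF and is ignored. A profile dominated by frame 0 is usually read as translation of the ORF's own frame, whereas substantial signal in frames 1 or 2 suggests competing frames.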
Project description: Increasing numbers of small proteins with diverse physiological roles are being identified and characterized in both prokaryotic and eukaryotic systems, but the origins and evolution of these proteins remain unclear. Recent genomic sequence analyses in several organisms suggest that new functions encoded by small open reading frames (sORFs) may emerge de novo from noncoding sequences. However, experimental data demonstrating whether and how randomly generated sORFs can confer beneficial effects on cells are limited. Here we show that by up-regulating hisB expression, de novo small proteins (≤ 50 amino acids in length) selected from random sequence libraries can rescue Escherichia coli cells that lack the conditionally essential SerB enzyme. The recovered small proteins are hydrophobic and confer their rescue effect by binding to the 5’ end regulatory region of the his operon mRNA, suggesting that protein binding promotes structural rearrangements of the RNA that allow increased hisB expression. This study adds RNA regulatory elements as another interacting partner for de novo proteins isolated from random sequence libraries, and provides further experimental evidence that small proteins with selective benefits can originate from the expression of nonfunctional sequences.
Project description: We combined multi-omics approaches, including de novo transcriptome assembly, ribosome profiling and MS-based peptidomics, to study the global role of mRNA translation and small ORFs (sORFs) in a herbicide-resistant rice mutant.
Project description: One of the central goals of evolutionary biology is to explain and predict the molecular basis of adaptive evolution. We studied the evolution of genetic networks in Saccharomyces cerevisiae (budding yeast) populations propagated for more than 200 generations in different nitrogen-limiting conditions. We find that rapid adaptive evolution in nitrogen-poor environments is dominated by the de novo generation and selection of copy number variants (CNVs), a large fraction of which contain genes encoding specific nitrogen transporters, including PUT4, DUR3 and DAL4. The large fitness increases associated with these alleles limit the genetic heterogeneity of adapting populations, even in environments with multiple nitrogen sources. Complete identification of acquired point mutations, in individual lineages and entire populations, revealed heterogeneity at the level of genetic loci but common themes at the level of functional modules, including genes controlling phosphatidylinositol-3-phosphate metabolism and vacuole biogenesis. Adaptive strategies shared with other nutrient-limited environments point to selection of genetic variation in the TORC1 and Ras/PKA signaling pathways as a general mechanism underlying improved growth in nutrient-limited environments. Within a single population, we observed the repeated independent selection of a multi-locus genotype composed of the functionally related genes GAT1, MEP2 and LST4. By studying the fitness of individual alleles and their combinations, as well as the evolutionary history of the evolving population, we find that the order in which these mutations are acquired is constrained by epistasis. The identification of repeatedly selected variation at functionally related loci that interact epistatically suggests that gene network polymorphisms (GNPs) may be a frequent outcome of adaptive evolution.
Our results provide insight into the mechanistic basis by which cells adapt to nutrient-limited environments and suggest that knowledge of the selective environment and of the regulatory mechanisms important for growth and survival in that environment greatly increases the predictability of adaptive evolution. mRNA from each evolved clone or from the ancestral strain growing in the specified nitrogen-limited condition was co-hybridized with mRNA from the ancestral strain grown in ammonium-limited media.
Project description: Cytosine methylation of DNA is a widespread modification that plays numerous critical roles, yet it has been lost many times in diverse eukaryotic lineages. In the yeast Cryptococcus neoformans, CG methylation occurs in transposon-rich repeats and requires the DNA methyltransferase Dnmt5. We show that Dnmt5 displays exquisite maintenance-type specificity in vitro and in vivo and uses in vivo cofactors similar to those of the metazoan maintenance methylase Dnmt1. Remarkably, phylogenetic and functional analyses revealed that the ancestral species lost the gene for a de novo methylase, DnmtX, between 50 and 150 MYA. We examined how methylation has persisted since the ancient loss of DnmtX. Experimental and comparative studies reveal efficient replication of methylation patterns in C. neoformans, rare stochastic methylation loss and gain events, and the action of natural selection. We propose that an epigenome has been propagated for >50 MY through a process analogous to the Darwinian evolution of the genome.
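Why rare loss and gain events matter for long-term persistence can be illustrated with a toy two-state model of a single site: per generation, a methylated site is lost with probability `loss`, and an unmethylated site gains methylation with probability `gain`. The model and both rates are hypothetical assumptions for illustration; they are not the study's model or measured values, and the sketch ignores selection.

```python
def methylated_fraction(m0, loss, gain, generations):
    """Expected methylated fraction after `generations` rounds.

    Toy two-state model (hypothetical, no selection):
    m_{t+1} = m_t * (1 - loss) + (1 - m_t) * gain
    """
    m = m0
    for _ in range(generations):
        m = m * (1 - loss) + (1 - m) * gain
    return m


def equilibrium(loss, gain):
    """Steady state of the recursion above: gain / (loss + gain)."""
    return gain / (loss + gain)
```

In this toy model, with `gain = 0` (no de novo methylase), the methylated fraction decays geometrically as `(1 - loss) ** t`. Any nonzero `gain` instead pulls it toward `gain / (loss + gain)`. Under these assumptions, long-term persistence therefore depends on very efficient maintenance, occasional gain events, or selection, which is the combination the description points to.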
Project description: Directed evolution (DE) is a process of mutation and iterative artificial selection to breed biomolecules with new or improved activity. DE platforms are primarily prokaryotic or yeast-based, and stable highly mutagenic mammalian systems have been challenging to establish and apply. To this end, we developed PROTein Evolution Using Selection (PROTEUS), a new platform that uses chimeric virus-like vesicles (VLVs) to enable extended mammalian DE campaigns without loss of system integrity. This platform, consisting of a minimal modified Semliki Forest virus genome controlling expression of the Indiana vesiculovirus G coat protein, is stable and can generate sufficient diversity for DE in mammalian systems. Using PROTEUS, we altered the doxycycline responsiveness of tetracycline-controlled transactivators, generating a more sensitive TetON-4G tool for gene regulation. PROTEUS is also compatible with intracellular nanobody evolution, and we use it to design a novel DNA damage-responsive anti-p53 nanobody. Overall, PROTEUS is a robust, efficient, and stable platform to direct evolution of biomolecules within mammalian cells.
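The mutate-and-select cycle that defines directed evolution can be sketched as a toy loop over protein sequences. Everything here is a hypothetical abstraction, not how PROTEUS works: the uniform per-residue mutation `rate`, the `library_size`, and the user-supplied `fitness` function stand in for mutagenesis and the selection step.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(seq, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def directed_evolution(start, fitness, rounds=50, library_size=100, rate=0.05, seed=0):
    """Toy DE loop: build a mutant library, keep the fittest, repeat.

    Hypothetical sketch; the parent stays in the pool (elitism), so the
    best fitness never decreases between rounds.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    best = start
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        best = max(library + [best], key=fitness)
    return best
```

With a hypothetical target-matching score, for example `fitness = lambda s: sum(a == b for a, b in zip(s, "MKTAYIAKQR"))`, calling `directed_evolution("A" * 10, fitness)` climbs toward the target. Keeping the parent in each selection pool is a design choice that trades diversity for monotonic progress.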
Project description: New genes contribute substantially to adaptive evolutionary innovation, but the functional evolution of new mammalian genes has been little explored at a broad scale. Previous work established mRNA-derived gene duplicates, known as retrocopies, as useful models for the study of new gene origination. Here we combine extensive mammalian transcriptomic and epigenomic data to unveil the processes underlying the evolution of stripped-down retrocopies into complex new genes. We show that although some robustly expressed retrocopies are transcribed from preexisting promoters, the majority evolved new promoters from scratch or recruited proto-promoters in their genomic vicinity. In particular, many retrocopy promoters emerged from ancestral enhancers or bivalent regulatory elements, as well as from CpG islands not associated with other genes. Altogether, these mechanisms facilitated the birth of up to 280 retrogenes in each therian species. Furthermore, the regulatory evolution of the originally monoexonic retrocopies was frequently accompanied by exon gain, which facilitated the cooption of distant promoters and in many cases allowed the expression of alternative isoforms. While young retrogenes are often initially expressed in the testis, increased regulatory and structural complexity allowed retrogenes to functionally diversify and evolve somatic organ functions, sometimes as complex as those of their parents. Thus, some retrogenes evolved the capacity to temporarily substitute for their parents during the process of male (meiotic) X inactivation, while others even rendered parental functions completely superfluous, allowing for parental gene loss. Overall, our reconstruction of the complete "life history" of mammalian retrogenes highlights the usefulness of retroposition as a general model for understanding new gene birth and functional evolution.
Assembly and expression of vertebrate retrogene transcripts