Dataset Information

RRE-Finder: a Genome-Mining Tool for Class-Independent RiPP Discovery.

ABSTRACT: Many ribosomally synthesized and posttranslationally modified peptide classes (RiPPs) are reliant on a domain called the RiPP recognition element (RRE). The RRE binds specifically to a precursor peptide and directs the posttranslational modification enzymes to their substrates. Given its prevalence across various types of RiPP biosynthetic gene clusters (BGCs), the RRE could theoretically be used as a bioinformatic handle to identify novel classes of RiPPs. In addition, due to the high affinity and specificity of most RRE-precursor peptide complexes, a thorough understanding of the RRE domain could be exploited for biotechnological applications. However, sequence divergence of RREs across RiPP classes has precluded automated identification based solely on sequence similarity. Here, we introduce RRE-Finder, a new tool for identifying RRE domains with high sensitivity. RRE-Finder can be used in precision mode to confidently identify RREs in a class-specific manner or in exploratory mode to assist in the discovery of novel RiPP classes. RRE-Finder operating in precision mode on the UniProtKB protein database retrieved ∼25,000 high-confidence RREs spanning all characterized RRE-dependent RiPP classes, as well as several yet-uncharacterized RiPP classes that require future experimental confirmation. Finally, RRE-Finder was used in precision mode to explore a possible evolutionary origin of the RRE domain. The results suggest RREs originated from a co-opted DNA-binding transcriptional regulator domain. Altogether, RRE-Finder provides a powerful new method to probe RiPP biosynthetic diversity and delivers a rich data set of RRE sequences that will provide a foundation for deeper biochemical studies into this intriguing and versatile protein domain.

IMPORTANCE: Bioinformatics-powered discovery of novel ribosomal natural products (RiPPs) has historically been hindered by the lack of a common genetic feature across RiPP classes. Herein, we introduce RRE-Finder, a method for identifying RRE domains, which are present in a majority of prokaryotic RiPP biosynthetic gene clusters (BGCs). RRE-Finder identifies RRE domains 3,000 times faster than current methods, which rely on time-consuming secondary structure prediction. Depending on user goals, RRE-Finder can operate in precision mode to accurately identify RREs present in known RiPP classes or in exploratory mode to assist with novel RiPP discovery. Employing RRE-Finder on the UniProtKB database revealed several high-confidence RREs in novel RiPP-like clusters, suggesting that many new RiPP classes remain to be discovered.

SUBMITTER: Kloosterman AM

PROVIDER: S-EPMC7470986 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature


Publications

RRE-Finder: a Genome-Mining Tool for Class-Independent RiPP Discovery.

Kloosterman AM, Shelton KE, van Wezel GP, Medema MH, Mitchell DA

mSystems, 2020 Sep 1, 5


PMID: 32873609

Similar Datasets

Project description: Natural products (NPs) and their derivatives are a major contributor to modern medicine. Historically, microorganisms such as bacteria and fungi have been instrumental in generating drugs and lead compounds because of the ease of culturing and genetically manipulating them. However, the ever-increasing demand for novel drugs highlights the need to bioprospect previously unexplored taxa for their biosynthetic potential. Next-generation sequencing technologies have expanded the range of organisms that can be explored for their biosynthetic content, as these technologies can provide a glimpse of an organism's entire biosynthetic landscape without the need for cultivation. The entirety of biosynthetic genes can be compared to genes of known function to identify the gene clusters potentially coding for novel products. In this study, we mine the genomes of nine lichen-forming fungal species of the genus Umbilicaria for biosynthetic genes, and categorize the biosynthetic gene clusters (BGCs) as "associated product structurally known" or "associated product putatively novel". Although lichen-forming fungi have been suggested to be a rich source of NPs, it is not known how their biosynthetic diversity compares to that of bacteria and non-lichenized fungi. We found that 25%-30% of biosynthetic genes are divergent as compared to the global database of BGCs, which comprises 1,200,000 characterized biosynthetic genes from plants, bacteria, and fungi. Out of 217 BGCs, 43 were highly divergent, suggesting that they potentially encode structurally and functionally novel NPs. Clusters encoding the putatively novel metabolic diversity comprise polyketide synthases (30), non-ribosomal peptide synthetases (12), and terpenes (1). Our study emphasizes the utility of genomic data in bioprospecting microorganisms for their biosynthetic potential and in advancing the industrial application of unexplored taxa. We highlight the untapped structural metabolic diversity encoded in the lichenized fungal genomes. To the best of our knowledge, this is the first investigation identifying genes coding for NPs with potentially novel properties in lichenized fungi.

Project description: Copy Number Variations (CNVs) play pivotal roles in the etiology of complex diseases and are variable across diverse populations. Understanding the association between CNVs and disease susceptibility is of significant importance in disease genetics research and often requires analysis of large sample sizes. One of the most cost-effective and scalable methods for detecting CNVs is based on normalized signal intensity values, such as Log R Ratio (LRR) and B Allele Frequency (BAF), from Illumina genotyping arrays. In this study, we present CNV-Finder, a novel pipeline integrating deep learning techniques on array data, specifically a Long Short-Term Memory (LSTM) network, to expedite the large-scale identification of CNVs within predefined genomic regions. This facilitates the efficient prioritization of samples for subsequent, costly analyses such as short-read and long-read whole genome sequencing. We focus on five genes, Parkin (PRKN), Leucine Rich Repeat And Ig Domain Containing 2 (LINGO2), Microtubule Associated Protein Tau (MAPT), alpha-Synuclein (SNCA), and Amyloid Beta Precursor Protein (APP), which may be relevant to neurological diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), or related disorders such as essential tremor (ET). By training our models on expert-annotated samples and validating them across diverse cohorts, including those from the Global Parkinson's Genetics Program (GP2) and additional dementia-specific databases, we demonstrate the efficacy of CNV-Finder in accurately detecting deletions and duplications. Our pipeline outputs app-compatible files for visualization within CNV-Finder's interactive web application. This interface enables researchers to review predictions and filter displayed samples by model prediction values, LRR range, and variant count in order to explore or confirm results. Our pipeline integrates this human feedback to enhance model performance and reduce false positive rates. Through a series of comprehensive analyses and validations using both short-read and long-read sequencing data, we demonstrate the robustness and adaptability of CNV-Finder in identifying CNVs in regions of varied sparsity, noise, and size. Our findings highlight the significance of contextual understanding and human expertise in enhancing the precision of CNV identification, particularly in complex genomic regions like 17q21.31. The CNV-Finder pipeline is a scalable, publicly available resource for the scientific community, available on GitHub (https://github.com/GP2code/CNV-Finder; DOI 10.5281/zenodo.14182563). CNV-Finder not only expedites accurate candidate identification but also significantly reduces the manual workload for researchers, enabling future targeted validation and downstream analyses in regions or phenotypes of interest.

