Dataset Information

SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria.

ABSTRACT:

Summary

Nonribosomally synthesized peptides (NRPs) are natural products with widespread applications in medicine and biotechnology. Many algorithms have been developed to predict the substrate specificities of nonribosomal peptide synthetase adenylation (A) domains from DNA sequences. These predictions support prioritization, dereplication, and integration with other data types in discovery efforts. However, insufficient training data and a lack of clarity about prediction quality have impeded their optimal use. Here, we introduce prediCAT, a new phylogenetics-inspired algorithm that quantitatively estimates how predictable each A-domain is. We then systematically benchmarked all algorithms on a newly gathered, independent test set of 434 A-domain sequences and found that active-site-motif-based algorithms outperform whole-domain-based methods. Next, we developed SANDPUMA, an ensemble algorithm built on newly trained versions of all high-performing algorithms, which significantly outperforms the individual methods. Finally, we applied SANDPUMA in a systematic investigation of 7635 Actinobacteria genomes, which suggests that NRP chemical diversity is much higher than previously estimated. SANDPUMA has been integrated into the widely used antiSMASH biosynthetic gene cluster analysis pipeline and is also available as an open-source, standalone tool.
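To illustrate the general idea of an ensemble substrate predictor, the sketch below combines per-method substrate calls for a single A-domain by majority vote and abstains when there is no clear consensus. This is a generic technique chosen for illustration, not SANDPUMA's actual combination rule. The method names, the `min_agree` threshold, and the substrate labels are hypothetical.

```python
from collections import Counter
from typing import Optional


def ensemble_vote(predictions: dict[str, Optional[str]], min_agree: int = 2) -> Optional[str]:
    """Combine per-method substrate calls for one A-domain by majority vote.

    Illustrative sketch only (not SANDPUMA's method). `predictions` maps a
    hypothetical method name to its predicted substrate, or None when that
    method makes no call. Returns the consensus substrate, or None to abstain.
    """
    # Ignore methods that abstained on this domain
    calls = [p for p in predictions.values() if p is not None]
    if not calls:
        return None
    counts = Counter(calls)
    best, n = counts.most_common(1)[0]
    # Abstain on ties for first place or when too few methods agree
    tied = [s for s, c in counts.items() if c == n]
    if n < min_agree or len(tied) > 1:
        return None
    return best


# Hypothetical calls from four methods for one A-domain
domain = {"method_a": "leu", "method_b": "leu", "method_c": "val", "method_d": None}
print(ensemble_vote(domain))  # leu
```

Abstaining instead of forcing a call reflects the abstract's emphasis on knowing when a prediction can be trusted; a trained ensemble would replace the fixed vote threshold with learned weights or rules.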

Availability and implementation

SANDPUMA is freely available at https://bitbucket.org/chevrm/sandpuma and as a Docker image at https://hub.docker.com/r/chevrm/sandpuma/ under the GNU General Public License v3 (GPLv3).

Contact

chevrette@wisc.edu or marnix.medema@wur.nl.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Chevrette MG

PROVIDER: S-EPMC5860034 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature


Publications

SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria.

Marc G. Chevrette, Fabian Aicheler, Oliver Kohlbacher, Cameron R. Currie, Marnix H. Medema

Bioinformatics (Oxford, England), 1 October 2017, issue 20


PMID: 28633438

Similar Datasets

Project description: Polyketides (PKs) and nonribosomal peptides (NRPs) are two microbial secondary metabolite (SM) families known for their variety of functions, including antimicrobials, siderophores, and others. Despite their involvement in bacterium-bacterium and bacterium-plant interactions, root-associated SMs are largely unexplored due to the limited cultivability of bacteria. Here, we analyzed the diversity and expression of SM-encoding biosynthetic gene clusters (BGCs) in root microbiomes by culture-independent amplicon sequencing, shotgun metagenomics, and metatranscriptomics. Roots (tomato and lettuce) harbored distinct compositions of nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) relative to the adjacent bulk soil, and specific BGC markers were both enriched and highly expressed in the root microbiomes. While several of the highly abundant and expressed sequences were distantly related to known BGCs, their low similarity to characterized genes suggests potential novelty. Low-similarity genes were screened against a large set of soil-derived cosmid libraries, from which five whole BGCs of unknown function were retrieved. Three clusters were taxonomically affiliated with Actinobacteria, while the remaining two were not associated with known bacteria. One Streptomyces-derived BGC was predicted to encode a polyene with potential antifungal activity, while the others were too novel for their chemical structures to be predicted. Screening against a suite of metagenomic data sets revealed higher abundances of the retrieved clusters in root and soil samples. In contrast, they were almost completely absent from aquatic and gut environments, supporting the notion that they might play an important role in root ecosystems. Overall, our results indicate that root microbiomes harbor a specific assemblage of undiscovered SMs.

IMPORTANCE: We identified distinct secondary-metabolite-encoding genes that are enriched (relative to adjacent bulk soil) and expressed in root ecosystems yet almost completely absent from human gut and aquatic environments. Several of the genes were distantly related to genes encoding antimicrobials and siderophores, and their high sequence variability relative to known sequences suggests that they may encode novel metabolites and may have unique ecological functions. This study demonstrates that plant roots harbor a diverse array of unique secondary-metabolite-encoding genes that are highly enriched and expressed in the root ecosystem. The secondary metabolites encoded by these genes might help the bacteria that produce them colonize and persist in the root environment. To explore this hypothesis, future investigations should assess their potential role in interbacterial and bacterium-plant interactions.
