Dataset Information

Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture.

ABSTRACT: Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, sequence database selection represents a major challenge. This work assessed the impact of different databases in metaproteomic investigations by using a mock microbial mixture including nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Then, both the microbial mixture and the single microorganisms were subjected to next generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely, NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, a quantitative comparison in terms of number and overlap of peptide identifications was carried out among all databases. As a result, only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications compared to databases with generic taxonomy, while the metagenomic database enabled a slight increment with respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to their counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives.
This study confirms that database selection has a significant impact in metaproteomics, and provides critical indications for improving depth and reliability of metaproteomic results. Specifically, the use of iterative searches and of suitable filters for taxonomic assignments is proposed with the aim of increasing coverage and trustworthiness of metaproteomic data.
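
The lowest common ancestor approach and the peptide-count threshold mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the implementation used by MEGAN or Unipept, and it does not reproduce the study's thresholds: the toy lineages, the function names and the `min_peptides` parameter are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical toy lineages (root -> species); not taken from the study.
LINEAGES = {
    "Escherichia coli": ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"],
    "Escherichia fergusonii": ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia fergusonii"],
    "Bacillus subtilis": ["Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"],
}


def lowest_common_ancestor(taxa):
    """Return the deepest taxon shared by all lineages, or "root" if none."""
    lineages = [LINEAGES[t] for t in taxa]
    lca = "root"
    # Walk the lineages rank by rank and stop at the first disagreement.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def assign_taxa(peptide_hits, min_peptides):
    """Count peptides per LCA taxon and keep taxa that meet the threshold.

    peptide_hits maps a peptide sequence to the species whose proteins
    contain it; min_peptides is an assumed, user-chosen cutoff.
    """
    counts = defaultdict(int)
    for taxa in peptide_hits.values():
        counts[lowest_common_ancestor(taxa)] += 1
    return {taxon: n for taxon, n in counts.items() if n >= min_peptides}
```

For example, a peptide matching both Escherichia species is assigned to the genus Escherichia, while a peptide shared by Escherichia coli and Bacillus subtilis falls back to Bacteria. Raising `min_peptides` discards taxa supported by only a few peptides, which is the kind of filter the abstract proposes for limiting false-positive assignments.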

SUBMITTER: Tanca A

PROVIDER: S-EPMC3857319 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture.

Tanca A, Palomba A, Deligios M, Cubeddu T, Fraumene C, Biosa G, Pagnozzi D, Addis MF, Uzzau S

PLoS ONE, 2013-12-09, issue 12

PMID: 24349410

Similar Datasets

Project description: Background: Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. Methods: We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. Results: We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers.
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes within databases. Conclusions: In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, SPAdes assembler and MetaBat binning tools reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct taxa and low taxonomic diversity result in the highest-quality metagenome-assembled genomes.

Project description: Analysis of the metaproteome of microbial communities is important to provide insight into community physiology and pathogenicity. This study evaluated the metaproteome of endodontic infections associated with acute apical abscesses and asymptomatic apical periodontitis lesions. Proteins persisting or expressed after root canal treatment were also evaluated. Finally, human proteins associated with these infections were identified. Samples were taken from root canals of teeth with asymptomatic apical periodontitis before and after chemomechanical treatment using either NaOCl or chlorhexidine as the irrigant. Samples from abscesses were taken by aspiration of the purulent exudate. Clinical samples were processed for analysis of the exoproteome by using two complementary mass spectrometry platforms: nanoflow liquid chromatography coupled with linear ion trap quadrupole Velos Orbitrap and liquid chromatography-quadrupole time-of-flight. A total of 308 proteins of microbial origin were identified. The number of proteins in abscesses was higher than in asymptomatic cases. In canals irrigated with chlorhexidine, the number of identified proteins decreased substantially, while in the NaOCl group the number of proteins increased. The large majority of microbial proteins found in endodontic samples were related to metabolic and housekeeping processes, including protein synthesis, energy metabolism and DNA processes. Moreover, several other proteins related to pathogenicity and resistance/survival were found, including proteins involved with adhesion, biofilm formation and antibiotic resistance, stress proteins, exotoxins, invasins, proteases and endopeptidases (mostly in abscesses), and an archaeal protein linked to methane production. The majority of human proteins detected were related to cellular processes and metabolism, as well as immune defense.
Interrogation of the metaproteome of endodontic microbial communities provides information on the physiology and pathogenicity of the community at the time of sampling. There is a growing need for expanded and more curated protein databases that permit more accurate identifications of proteins in metaproteomic studies.
