Dataset Information

Direct mapping of peptide-spectral-matches to genome information facilitates qualifying proteomics information

ABSTRACT: The dataset consists of three sources:
1) All files named ecoli_* are derived from a pure culture of Escherichia coli K-12 (MG1655).
2) All files named SIHUMI_standard_* are derived from a mixed culture of eight bacteria (SIHUMIx): Anaerostipes caccae (DSMZ 14662), Bacteroides thetaiotaomicron (DSMZ 2079), Bifidobacterium longum (NCC 2705), Blautia producta (DSMZ 2950), Clostridium butyricum (DSMZ 10702), Clostridium ramosum (DSMZ 1402), Escherichia coli K-12 (MG1655) and Lactobacillus plantarum (DSMZ 20174). A standard proteomic protocol was used for purification.
3) All files named SIHUMI_small_* are derived from the same bacterial culture as the second source; in contrast, a variety of proteomic protocols was used to enrich small proteins (<100 amino acids).
The goal of the project was to design a workflow for quickly prioritizing novel protein candidates. The workflow was designed to be robust in a meta-omics context and to facilitate the integration of transcriptomic and other information at the genomic level. The MS data from the first source were used to test the workflow under well-controlled conditions, namely a pure culture with near-complete annotation. The data from the second source were used to check whether the workflow produces good results in a mixed culture. To increase the chances of finding novel proteins, we also incorporated the data from the third source.
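Because the three data sources are distinguished only by filename prefix, a downloaded file list can be sorted into sources mechanically. The following is a minimal sketch, assuming the prefixes are exactly ecoli_, SIHUMI_standard_ and SIHUMI_small_ as stated in the abstract; the functions source_of and count_by_source and the label strings are hypothetical, not part of the dataset or of the authors' workflow.

```python
from collections import Counter

# Hypothetical mapping from the filename prefixes named in the abstract to
# short labels for the three data sources (labels are illustrative only).
SOURCES = {
    "ecoli_": "E. coli K-12 MG1655 pure culture",
    "SIHUMI_standard_": "SIHUMIx, standard protocol",
    "SIHUMI_small_": "SIHUMIx, small-protein enrichment",
}


def source_of(filename: str) -> str:
    """Return the source label for a file name, or 'unknown' if no prefix matches."""
    for prefix, label in SOURCES.items():
        if filename.startswith(prefix):
            return label
    return "unknown"


def count_by_source(filenames) -> Counter:
    """Count how many files belong to each data source."""
    return Counter(source_of(name) for name in filenames)


if __name__ == "__main__":
    listed = [
        "SIHUMI_small_Beads_purple_inSolution_HP.mzML",
        "SIHUMI_small_Beads_white_inSolution_HP.mzML",
    ]
    print(count_by_source(listed))
```

The prefixes do not overlap (SIHUMI_standard_ and SIHUMI_small_ diverge after "SIHUMI_s"), so the order of checks does not matter here; any file that matches none of them is reported as "unknown" rather than guessed.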

INSTRUMENT(S): Q Exactive

ORGANISM(S): Erysipelatoclostridium ramosum DSM 1402; Blautia producta ATCC 27340 = DSM 2950; Clostridium butyricum DSM 10702; Anaerostipes caccae DSM 14662; Bifidobacterium longum NCC2705; Lactobacillus plantarum subsp. plantarum ATCC 14917 = JCM 1149 = CGMCC 1.2437; Escherichia coli str. K-12 substr. MG1655; Bacteroides thetaiotaomicron (strain ATCC 29148 / DSM 2079 / NCTC 10582 / E50 / VPI-5482)

TISSUE(S): Cell In Vitro

SUBMITTER: John Anders

LAB HEAD: Prof. Dr. Martin von Bergen

PROVIDER: PXD023243 | Pride | 2021-09-09

REPOSITORIES: Pride

Dataset's files (1 - 5 of 520 shown)

	File	Format
	SIHUMI_small_Beads_purple_inSolution_HP.mzML	mzML
	SIHUMI_small_Beads_purple_inSolution_HP.pep.xml	pepXML
	SIHUMI_small_Beads_purple_inSolution_HP.raw	RAW
	SIHUMI_small_Beads_white_inSolution_HP.mzML	mzML
	SIHUMI_small_Beads_white_inSolution_HP.pep.xml	pepXML
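The listed files suggest that each MS run appears in up to three formats (vendor .raw, converted .mzML, and .pep.xml search results), although this excerpt shows only 5 of 520 files and does not confirm that every run has all three. A minimal sketch for grouping a file list by run, assuming these three extensions are the only ones present; split_run and group_by_run are hypothetical helpers.

```python
from collections import defaultdict

# The two-part extension is checked first so that "run.pep.xml" yields the
# stem "run" rather than "run.pep". Extensions are assumed from the listing.
EXTENSIONS = (".pep.xml", ".mzML", ".raw")


def split_run(filename: str):
    """Split a file name into (run stem, extension); extension is '' if unrecognized."""
    for ext in EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)], ext
    return filename, ""


def group_by_run(filenames):
    """Map each run stem to the set of file extensions present for it."""
    runs = defaultdict(set)
    for name in filenames:
        stem, ext = split_run(name)
        runs[stem].add(ext)
    return dict(runs)


if __name__ == "__main__":
    listed = [
        "SIHUMI_small_Beads_purple_inSolution_HP.mzML",
        "SIHUMI_small_Beads_purple_inSolution_HP.pep.xml",
        "SIHUMI_small_Beads_purple_inSolution_HP.raw",
        "SIHUMI_small_Beads_white_inSolution_HP.mzML",
        "SIHUMI_small_Beads_white_inSolution_HP.pep.xml",
    ]
    for stem, exts in group_by_run(listed).items():
        print(stem, sorted(exts))
```

Runs whose extension set is incomplete (such as the "white" run on this page, whose .raw file is simply not among the first five entries) can then be flagged for a second look rather than assumed missing.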

Publications

A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations.

Anders John, Hannes Petruschke, Nico Jehmlich, Sven-Bastiaan Haange, Martin von Bergen, Peter F. Stadler

BMC Bioinformatics (published 2021-05-26), issue 1

Background: Small proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins, because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small prote ...

PMID: 34039272

Similar Datasets

Project description: The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, in modulating immunity and in regulating metabolic processes. We use the simplified human intestinal microbiota (SIHUMIx), consisting of eight bacterial species, to study potential synergistic effects, with a particular focus on detecting novel proteins of fewer than 100 amino acids (sProteins), some of which may help regulate the simplified human intestinal microbiota. Although several studies have shown that sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. In this study, we created a multi-species integrated proteogenomics search database (multi-species iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were first sequenced and de novo assembled. Several proteomics approaches, including two previously optimized sProtein enrichment strategies, were applied prior to mass spectrometric analysis to specifically increase the chances of novel sProtein discovery. Searching the MS/MS data against the multi-species iPtgxDB enabled us to identify 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. Importantly, when comparing the expression of these novel sProteins in single-strain cultivations to the multi-species community grown in a bioreactor, we found that six of them were only identified in the SIHUMIx community, suggesting a possibly important role of sProteins in the organization of microbial communities. Furthermore, in silico prediction suggested that two of these novel sProteins have a potential antimicrobial function.
We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in microbial communities that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the multi-species community is expected to enable new insights into the structure, regulation and function of bacterial communities such as those of the human intestinal tract.