Dataset Information

Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage.

ABSTRACT: Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation.

IMPORTANCE: Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.

SUBMITTER: Willems P

PROVIDER: S-EPMC7593589 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage.

Willems P, Fijalkowski I, Van Damme P

mSystems, 2020 Oct 27, 5

PMID: 33109751

Similar Datasets

Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics. | S-EPMC2040164 | biostudies-literature

"Lost and Found": snoRNA Annotation in the Xenopus Genome and Implications for Evolutionary Studies. | S-EPMC6984369 | biostudies-literature

Improving HIV proteome annotation: new features of BioAfrica HIV Proteomics Resource. | S-EPMC4834208 | biostudies-literature

A story of data won, data lost and data re-found: the realities of ecological data preservation. | S-EPMC6235994 | biostudies-literature

Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage. | S-EPMC2643335 | biostudies-other

Proteome driven re-evaluation and functional annotation of the Streptococcus pyogenes SF370 genome. | S-EPMC3224786 | biostudies-literature

A Streamlined High-Throughput Plasma Proteomics Platform for Clinical Proteomics with Improved Proteome Coverage, Reproducibility, and Robustness. | S-EPMC10080683 | biostudies-literature

IceR improves proteome coverage and data completeness in global and single-cell proteomics. | S-EPMC8352929 | biostudies-literature

Lost and found IV.: Merulicium fusisporum | PRJEB63641 | ENA

Increasing Coverage of Proteome Identification of the Fruiting Body of Agaricus bisporus by Shotgun Proteomics. | S-EPMC7278689 | biostudies-literature