Dataset Information

Published and perished? The influence of the searched protein database on the long-term storage of proteomics data.

ABSTRACT: In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or even be deleted over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data, we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database as of November 2010. To map each submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second (called the logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier. Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnology Information nr database (NCBI nr), and Ensembl) with respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries deleted compared with the logical mapping algorithm. We found several cases where experiments contained more than 10% deleted identifiers already at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB, using two releases per year from 2005 onward. This analysis showed for the first time the true effect of changing protein identifiers on proteomics data. Based on these findings, UniProtKB seems the best database for applications that rely on the long-term storage of proteomics data.
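The kind of bookkeeping the abstract describes (classifying reported identifiers as active, replaced, or deleted, and checking whether identified peptides still fit the current protein sequence) can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not call PICR or any source database. The identifiers, the status table, the sequences, and the function names `status_fractions` and `peptides_still_fitting` are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical lookup: reported identifier -> current status in its source database.
# A real analysis would query the source database; this table is invented.
EXAMPLE_STATUS = {
    "EX0001": "active",
    "EX0002": "deleted",
    "EX0003": "replaced",
    "EX0004": "active",
}


def status_fractions(identifiers, status_table):
    """Return the fraction of identifiers per status.

    Identifiers missing from the table are counted as 'unknown'.
    """
    total = len(identifiers)
    if total == 0:
        return {}
    counts = Counter(status_table.get(i, "unknown") for i in identifiers)
    return {status: n / total for status, n in counts.items()}


def peptides_still_fitting(peptides, current_sequence):
    """Return the fraction of peptides found as exact substrings
    of the current protein sequence."""
    if not peptides:
        return 0.0
    hits = sum(1 for p in peptides if p in current_sequence)
    return hits / len(peptides)


if __name__ == "__main__":
    reported = ["EX0001", "EX0002", "EX0003", "EX0004"]
    print(status_fractions(reported, EXAMPLE_STATUS))
    # Invented peptides checked against an invented current sequence.
    print(peptides_still_fitting(["PEPT", "IDEK", "XXXX"], "MPEPTIDEK"))
```

Exact substring matching is the simplest possible reading of "still fitted"; the study may have applied different criteria.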

SUBMITTER: Griss J

PROVIDER: S-EPMC3186200 | biostudies-literature | 2011 Sep

REPOSITORIES: biostudies-literature

Publications

Published and perished? The influence of the searched protein database on the long-term storage of proteomics data.

Griss J, Côté RG, Gerner C, Hermjakob H, Vizcaíno JA

Molecular & cellular proteomics : MCP, 2011-06-23, issue 9

PMID: 21700957

Similar Datasets

Long-term data storage in diamond.
| S-EPMC5091352 | biostudies-literature

RiceProteomeDB (RPDB): a user-friendly database for proteomics data storage, retrieval, and analysis.
| S-EPMC10864295 | biostudies-literature

MitoMiner, an integrated database for the storage and analysis of mitochondrial proteomics data.
| S-EPMC2690483 | biostudies-literature

Long-term storage of surface-adsorbed protein machines.
| S-EPMC3104519 | biostudies-literature

The plant phenological online database (PPODB): an online database for long-term phenological data.
| S-EPMC3745622 | biostudies-literature

Histone Lysine Crotonylation Regulates Long-Term Memory Storage
2024-11-13 | GSE281007 | GEO

Long term conservation of DNA at ambient temperature. Implications for DNA data storage.
| S-EPMC8585539 | biostudies-literature

Long-Term Storage of Human Cells at Ambient Temperature
2005-08-01 | GSE1364 | GEO

Data collection and storage in long-term ecological and evolutionary studies: The Mongoose 2000 system.
| S-EPMC5760034 | biostudies-other

Barley seeds miRNome stability during long-term storage and aging
2021-01-11 | GSE164512 | GEO