Dataset Information

The standard protein mix database: a diverse data set to assist in the production of improved Peptide and protein identification software tools.

ABSTRACT: Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities to MS/MS spectrum-to-sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a reference data set covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra from more than 150 replicate runs across these instruments. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.
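The abstract mentions estimating the correctness of spectrum-to-sequence assignments from reverse or decoy databases. The following is a minimal Python sketch of that idea, assuming a concatenated target/decoy search in which decoy proteins are the reversed target sequences labelled with a hypothetical `rev_` prefix, and assuming each match carries a single higher-is-better score. The `PSM` record, `make_reversed_decoys`, and `estimate_fdr` are illustrative names, not part of SEQUEST, PeptideProphet, or the published data set. It uses one common estimator (decoy hits divided by target hits above a score cutoff) and is not PeptideProphet's statistical model.

```python
from dataclasses import dataclass

# Hypothetical naming convention marking decoy (reversed) protein entries.
DECOY_PREFIX = "rev_"


@dataclass
class PSM:
    """Minimal peptide-spectrum match record (illustrative, not a real file format)."""
    spectrum_id: str
    protein: str
    score: float  # assumed higher-is-better search score


def make_reversed_decoys(proteins: dict[str, str]) -> dict[str, str]:
    """Return one reversed-sequence decoy entry per target protein."""
    return {DECOY_PREFIX + name: seq[::-1] for name, seq in proteins.items()}


def is_decoy(psm: PSM) -> bool:
    """A match is a decoy hit if its protein carries the decoy prefix."""
    return psm.protein.startswith(DECOY_PREFIX)


def estimate_fdr(psms: list[PSM], threshold: float) -> float:
    """Estimate FDR among target matches scoring >= threshold as decoys / targets.

    Returns 0.0 when no target match passes (no discoveries); capped at 1.0.
    """
    passing = [p for p in psms if p.score >= threshold]
    decoys = sum(1 for p in passing if is_decoy(p))
    targets = len(passing) - decoys
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    # Toy sequence and scores, invented for illustration only.
    print(make_reversed_decoys({"PROT_A": "PEPTIDEK"}))  # {'rev_PROT_A': 'KEDITPEP'}
    psms = [
        PSM("s1", "PROT_A", 4.1),
        PSM("s2", "PROT_A", 3.5),
        PSM("s3", "rev_PROT_A", 3.2),
        PSM("s4", "PROT_A", 2.9),
        PSM("s5", "rev_PROT_A", 1.0),
    ]
    print(estimate_fdr(psms, 3.0))  # 0.5 (1 decoy / 2 targets)
```

Lowering the cutoff typically admits more target matches but also more decoys, so a common practice is to pick the lowest cutoff whose estimated FDR stays below a chosen level.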

SUBMITTER: Klimek J

PROVIDER: S-EPMC2577160 | biostudies-literature | 2008 Jan

REPOSITORIES: biostudies-literature

Publications

The standard protein mix database: a diverse data set to assist in the production of improved Peptide and protein identification software tools.

Klimek J, Eddes JS, Hohmann L, Jackson J, Peterson A, Letarte S, Gafken PR, Katz JE, Mallick P, Lee H, Schmidt A, Ossola R, Eng JK, Aebersold R, Martin DB

Journal of Proteome Research, issue 1 (published online 2007-08-21)

PMID: 17711323

Similar Datasets

Improved MeSH analysis software tools for farm animals.
| S-EPMC9300174 | biostudies-literature

XLink-DB: database and software tools for storing and visualizing protein interaction topology data.
| S-EPMC3744611 | biostudies-literature

A gold standard set of mechanistically diverse enzyme superfamilies.
| S-EPMC1431709 | biostudies-literature

Contributing to agricultural mix: analysis of the living standard measurement study - Integrated survey on agriculture data set.
| S-EPMC6082989 | biostudies-literature

LocText: relation extraction of protein localizations to assist database curation.
| S-EPMC5773052 | biostudies-literature

NIST Mass Spectrometry Data Center standard reference libraries and software tools: Application to seized drug analysis.
| S-EPMC10517720 | biostudies-literature

BLAST: at the core of a powerful and diverse set of sequence analysis tools.
| S-EPMC441573 | biostudies-literature

The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.
| S-EPMC3531112 | biostudies-literature

MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences.
| S-EPMC4987941 | biostudies-literature

Structure Activity Relationships (SARs) Using a Structurally Diverse Drug Database: Validating Success of Predictor Tools.
| S-EPMC4605434 | biostudies-literature