Dataset Information

Novel peptide identification from tandem mass spectra using ESTs and sequence database compression.

ABSTRACT: Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. Traditional search engines, which match peptide sequences with tandem mass spectra to identify the samples' proteins, use protein sequence databases to suggest peptide candidates for consideration. Although the acquisition of tandem mass spectra is not biased toward well-understood protein isoforms, this computational strategy is failing to identify peptides from alternative splicing and coding SNP protein isoforms despite the acquisition of good-quality tandem mass spectra. We propose, instead, that expressed sequence tags (ESTs) be searched. Ordinarily, such a strategy would be computationally infeasible due to the size of EST sequence databases; however, we show that a sophisticated sequence database compression strategy, applied to human ESTs, reduces the sequence database size approximately 35-fold. Once compressed, our EST sequence database is comparable in size to other commonly used protein sequence databases, making routine EST searching feasible. We demonstrate that our EST sequence database enables the discovery of novel peptides in a variety of public data sets.
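The abstract does not spell out the compression algorithm. The sketch below is naive and labeled as such. It illustrates the general idea of shrinking a redundant sequence database without losing any peptide candidates. It assumes the ESTs have already been translated into amino-acid sequences. The function names `compress`, `remove_contained`, `overlap` and `peptides_up_to`, and the `max_peptide_len` parameter, are hypothetical. The greedy overlap merging is a swapped-in technique, not the paper's method.

The sketch works in two steps. It first drops duplicate sequences and sequences contained in other sequences. It then merges pairs only when they overlap by at least `max_peptide_len - 1` residues. With that minimum overlap, every substring of length up to `max_peptide_len` in the output also occurs in some input, and vice versa.

```python
def remove_contained(seqs: list[str]) -> list[str]:
    """Drop exact duplicates and any sequence that is a substring of another."""
    # Longest first (ties broken alphabetically) so containers are kept before the sequences they contain.
    ordered = sorted(set(seqs), key=lambda s: (-len(s), s))
    kept: list[str] = []
    for s in ordered:
        if not any(s in k for k in kept):
            kept.append(s)
    return kept


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that equals a prefix of `b`; 0 if shorter than `min_len`."""
    # Full-length overlaps cannot occur once contained sequences have been removed.
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def compress(seqs: list[str], max_peptide_len: int) -> list[str]:
    """Greedily merge overlapping sequences while preserving all peptides up to max_peptide_len."""
    if max_peptide_len < 2:
        raise ValueError("max_peptide_len must be at least 2")
    # An overlap of >= max_peptide_len - 1 guarantees no new short peptides appear at a junction.
    min_overlap = max_peptide_len - 1
    pool = remove_contained(seqs)
    while True:
        best_ov, best_i, best_j = 0, -1, -1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    ov = overlap(a, b, min_overlap)
                    if ov > best_ov:
                        best_ov, best_i, best_j = ov, i, j
        if best_ov == 0:
            return pool
        merged = pool[best_i] + pool[best_j][best_ov:]
        rest = [s for k, s in enumerate(pool) if k not in (best_i, best_j)]
        pool = remove_contained(rest + [merged])


def peptides_up_to(seqs: list[str], max_len: int) -> set[str]:
    """All substrings of length 1..max_len, i.e. every candidate peptide of bounded length."""
    return {
        s[i:i + n]
        for s in seqs
        for n in range(1, max_len + 1)
        for i in range(len(s) - n + 1)
    }


if __name__ == "__main__":
    ests = ["PEPTIDEK", "TIDEKAAR", "PEPTIDEK", "TIDE", "GGGG"]
    out = compress(ests, max_peptide_len=4)
    print(out)  # ['PEPTIDEKAAR', 'GGGG']
    print(sum(map(len, ests)), "->", sum(map(len, out)))  # 32 -> 15
    assert peptides_up_to(out, 4) == peptides_up_to(ests, 4)
```

Each merge pass compares all pairs, so this sketch scales quadratically. It would be impractical at EST scale. Index-based approaches (for example, suffix structures) are the usual way to find overlaps efficiently, but which one the paper uses is not stated here.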

SUBMITTER: Edwards NJ

PROVIDER: S-EPMC1865584 | biostudies-literature | 2007

REPOSITORIES: biostudies-literature


Publications

Novel peptide identification from tandem mass spectra using ESTs and sequence database compression.

Edwards, Nathan J.

Molecular Systems Biology, 2007-04-17


PMID: 17437027

Similar Datasets

Peptide identification from mixture tandem mass spectra.
| S-EPMC2938093 | biostudies-literature

Faster SEQUEST searching for peptide identification from tandem mass spectra.
| S-EPMC3166376 | biostudies-literature

Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra.
| S-EPMC2621003 | biostudies-literature

Tandem mass spectrometry with ultrahigh mass accuracy clarifies peptide identification by database retrieval.
| S-EPMC2753674 | biostudies-literature

Peptide de novo sequencing of mixture tandem mass spectra.
| S-EPMC5297990 | biostudies-literature

Improved sequence tag generation method for peptide identification in tandem mass spectrometry.
| S-EPMC3744226 | biostudies-literature

ScanRanker: Quality assessment of tandem mass spectra via sequence tagging.
| S-EPMC3128668 | biostudies-literature

SPEQ: Quality Assessment of Peptide Tandem Mass Spectra with Deep Learning.
| S-EPMC8896601 | biostudies-literature

A Tandem Mass Spectrometry Sequence Database Search Method for Identification of O-Fucosylated Proteins by Mass Spectrometry.
| S-EPMC6445572 | biostudies-literature

Identification of ultramodified proteins using top-down tandem mass spectra.
| S-EPMC3905687 | biostudies-literature