Dataset Information

MsmsEval: tandem mass spectral quality assignment for high-throughput proteomics.

ABSTRACT: BACKGROUND: In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow however, computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable. RESULTS: We describe an application, msmsEval, that builds on previous work by statistically modeling the spectral quality discriminant function using a Gaussian mixture model. This allows a researcher to filter spectra based on the probability that a spectrum will ultimately be identified by database searching. We show that spectra that are predicted by msmsEval to be of high quality, yet remain unidentified in standard database searches, are candidates for more intensive search strategies. Using a well studied public dataset we also show that a high proportion (83.9%) of the spectra predicted by msmsEval to be of high quality but that elude standard search strategies, are in fact interpretable. CONCLUSION: msmsEval will be useful for high-throughput proteomics projects and is freely available for download from http://proteomics.ucd.ie/msmseval. Supports Windows, Mac OS X and Linux/Unix operating systems.

SUBMITTER: Wong JW

PROVIDER: S-EPMC1803797 | biostudies-other | 2007

REPOSITORIES: biostudies-other

ACCESS DATA

Publications

msmsEval: tandem mass spectral quality assignment for high-throughput proteomics.

Wong Jason W H JW Sullivan Matthew J MJ Cartwright Hugh M HM Cagney Gerard G

BMC bioinformatics 20070209

<h4>Background</h4>In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow however, computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus f ...[more]

PMID: 17291342

Similar Datasets

Project description:Glycoprotein changes occur in not only protein abundance but also the occupancy of each glycosylation site by different glycoforms during biological or pathological processes. Recent advances in mass spectrometry instrumentation and techniques have facilitated analysis of intact glycopeptides in complex biological samples by allowing the users to generate spectra of intact glycopeptides with glycans attached to each specific glycosylation site. However, assigning these spectra, leading to identification of the glycopeptides, is challenging. Here, we report an algorithm, named GPQuest, for site-specific identification of intact glycopeptides using higher-energy collisional dissociation (HCD) fragmentation of complex samples. In this algorithm, a spectral library of glycosite-containing peptides in the sample was built by analyzing the isolated glycosite-containing peptides using HCD LC-MS/MS. Spectra of intact glycopeptides were selected by using glycan oxonium ions as signature ions for glycopeptide spectra. These oxonium-ion-containing spectra were then compared with the spectral library generated from glycosite-containing peptides, resulting in assignment of each intact glycopeptide MS/MS spectrum to a specific glycosite-containing peptide. The glycan occupying each glycosite was determined by matching the mass difference between the precursor ion of intact glycopeptide and the glycosite-containing peptide to a glycan database. Using GPQuest, we analyzed LC-MS/MS spectra of protein extracts from prostate tumor LNCaP cells. Without enrichment of glycopeptides from global tryptic peptides and at a false discovery rate of 1%, 1008 glycan-containing MS/MS spectra were assigned to 769 unique intact N-linked glycopeptides, representing 344 N-linked glycosites with 57 different N-glycans. Spectral library matching using GPQuest assigns the HCD LC-MS/MS generated spectra of intact glycopeptides in an automated and high-throughput manner. Additionally, spectral library matching gives the user the possibility of identifying novel or modified glycans on specific glycosites that might be missing from the predetermined glycan databases.

Dataset Information

MsmsEval: tandem mass spectral quality assignment for high-throughput proteomics.

Publications

msmsEval: tandem mass spectral quality assignment for high-throughput proteomics.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets