Proteome-wide identification of proteins and their modifications with decreased ambiguities and improved false discovery rates using unique sequence tags.
ABSTRACT: Identifying proteins and their modification states with known levels of confidence remains a significant challenge for proteomics. Random or decoy peptide databases are increasingly used to estimate the false discovery rate (FDR), e.g., from liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of tryptic digests. We show that this approach can significantly underestimate the FDR, and we describe an approach for more confident protein identification that uses unique partial sequences derived from a combination of database searching and amino acid residue sequencing of high-accuracy MS/MS data. Applied to a Saccharomyces cerevisiae tryptic digest, the approach provided 3,132 confident peptide identifications (approximately 5% modified in some fashion), covering 575 proteins with an estimated zero FDR. The conventional approach provided 3,359 peptide identifications and 656 proteins with 0.3% FDR based upon a decoy database analysis. However, the present approach revealed approximately 5% of the 3,359 identifications to be incorrect and many more to be potentially ambiguous (e.g., because certain amino acid substitutions and modifications were not considered). In addition, 677 peptides and 39 proteins were identified that had been missed by conventional analysis, including nontryptic peptides, peptides with a variety of expected and unexpected chemical modifications, known and unknown post-translational modifications, single nucleotide polymorphisms or gene encoding errors, and multiple modifications of individual peptides.
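The "conventional approach" referred to above estimates FDR from a decoy database search. A minimal sketch of the simplest form of that estimate follows, assuming a separate (non-concatenated) decoy search of the same size as the target search, so that each decoy match scoring above a threshold stands in for one false target match. The function name `decoy_fdr` and the toy scores are hypothetical illustrations, not data or code from the study.

```python
from typing import Sequence


def decoy_fdr(target_scores: Sequence[float],
              decoy_scores: Sequence[float],
              threshold: float) -> float:
    """Estimate FDR at a score threshold as (decoy hits) / (target hits).

    Assumes a separate decoy search of equal size to the target search,
    so decoy matches above the threshold approximate false target matches.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        # No accepted identifications, so there is nothing to be false.
        return 0.0
    return n_decoy / n_target


# Hypothetical search scores, for illustration only.
targets = [9.1, 8.7, 7.9, 6.5, 5.2, 4.8, 3.9, 3.1]
decoys = [5.0, 3.5, 2.9, 2.2, 1.8, 1.1, 0.9, 0.4]

print(round(decoy_fdr(targets, decoys, 4.5), 3))  # 1 decoy / 6 targets -> 0.167
print(decoy_fdr(targets, decoys, 6.0))            # no decoys above 6.0 -> 0.0
```

Because the decoy estimate only models matches to random sequences, it cannot register a wrong but real sequence (for example, one differing by an unconsidered substitution or modification) that outscores the correct assignment, which is consistent with the underestimation the abstract reports. For scale, 0.3% of 3,359 identifications is about 10 expected false matches, whereas approximately 5% is on the order of 170.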
SUBMITTER: Shen Y
PROVIDER: S-EPMC2600587 | biostudies-literature | 2008 Mar
REPOSITORIES: biostudies-literature