Dataset Information

Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics.

ABSTRACT: Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confidently predicted missed cleavage sites need not be considered as cut sites during in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF dataset of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
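To make the search parameter described in the abstract concrete, here is a minimal sketch of in silico tryptic digestion with a configurable number of missed cleavages and an optional set of "masked" sites. It is not the authors' program or their information-theoretic predictor. The cleavage rule (cut after K or R unless followed by P) is a common simplification assumed here, and the function names and the `masked` parameter are hypothetical.

```python
def cleavage_sites(seq):
    """Return indices i where trypsin is assumed to cut after seq[i]."""
    # Assumed simplified rule: cut C-terminal to K or R unless the next residue is P.
    return [i for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq, max_missed=1, masked=frozenset()):
    """Enumerate peptides containing up to max_missed uncut sites.

    masked: site indices treated as never cleaved, standing in for the
    output of a missed-cleavage predictor (hypothetical input).
    """
    sites = [i for i in cleavage_sites(seq) if i not in masked]
    # Fragment boundaries: protein start, each cut point, protein end.
    bounds = [0] + [i + 1 for i in sites] + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptides.append(seq[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    protein = "AKRPGKDEK"  # toy sequence, not real data
    print(digest(protein, max_missed=0))
    print(digest(protein, max_missed=1))
    # Masking site 1 (the K in "AK") removes it from the set of cut points.
    print(digest(protein, max_missed=0, masked={1}))
```

In this sketch, masking a site reduces the number of candidate peptides a search engine would have to consider compared with allowing a missed cleavage at every site, which is the intuition behind masking the database.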

SUBMITTER: Siepen JA

PROVIDER: S-EPMC2664920 | biostudies-literature | 2007 Jan

REPOSITORIES: biostudies-literature

Publications

Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics.

Siepen JA, Keevil EJ, Knight D, Hubbard SJ

Journal of Proteome Research, 2007 Jan 1 (issue 1)

PMID: 17203985

Similar Datasets

Project description: On-tissue digestion has become the preferred method to identify proteins in mass spectrometry (MS) imaging. In this study, we report advances in data acquisition and protein identification for MS imaging after on-tissue digestion. Tryptic peptides in a coronal mouse brain section were measured at 50 μm pixel size and revealed detailed histological structures, e.g., the ependyma (consisting of one to two cell layers), which was confirmed by H&E staining. This demonstrates that MS imaging of tryptic peptides at or close to cellular resolution is within reach. We also describe a detailed identification workflow which resulted in the identification of 99 proteins (with 435 corresponding peptides), based on comparison with LC-MS/MS data and in silico digest. These results were obtained with stringent parameters, including high mass accuracy in imaging mode (RMSE < 3 ppm) and at least two unique peptides per protein showing consistent spatial distribution. We identified almost 50% of proteins with at least four corresponding peptides. As there is no agreed approach for identification of proteins after on-tissue digestion yet, we discuss our workflow in detail and make the corresponding mass spectral data available as "open data" via ProteomeXchange (identifier PXD003172). With this, we would like to contribute to a more effective discussion and the development of new approaches for tryptic peptide identification in MS imaging. From an experimental point of view, we demonstrate the improvement due to the combination of high spatial resolution and high mass resolution/mass accuracy on a measurement at 25 μm pixel size in mouse cerebellum tissue. A whole body section of a mouse pup imaged at 50 μm pixel size (40 GB, 230,000 spectra) demonstrates the stability of our protocol. For this data set, we developed a workflow that is based on conversion to the common data format imzML and sequential application of freely available software tools.
In combination, the presented results for spatial resolution, protein identification, and data processing constitute significant improvements for the field of on-tissue digestion. Graphical abstract MS imaging of coronal mouse brain cerebellum with a pixel size of 25 μm: A Optical image, B myelin staining, C H&E staining, and D MS image overlay (RGB) of tryptic peptides m/z = 726.4045 ± 0.005, HGFLPR + H+ (red), m/z = 536.3173 ± 0.005, AKPAK + Na+ (green), and m/z = 994.5436 ± 0.005, WRQLIEK + Na+ (blue).

Project description: Removal of moderately oxidized proteins is mainly carried out by the proteasome, while highly modified proteins are no longer degradable. However, in the case of proteins modified by nitration of tyrosine residues to 3-nitrotyrosine (NO2Y), the role of the proteasome remains to be established. For this purpose, degradation assays and mass spectrometry analyses were performed using isolated proteasome and purified fractions of native cytochrome c (Cyt c) and tyrosine nitrated proteoforms (NO2Y74-Cyt c and NO2Y97-Cyt c). While Cyt c treated under mild conditions with hydrogen peroxide was preferentially degraded by the proteasome, NO2Y74- and NO2Y97-Cyt c species did not show an increased degradation rate with respect to native Cyt c. Peptide mapping analysis confirmed a decreased chymotrypsin-like cleavage at the C-terminal side of NO2Y sites within the protein, with respect to unmodified Y residues. Additionally, studies with the proteasome substrate suc-LLVY-AMC (Y-AMC) and its NO2Y-containing analog, suc-LLVNO2Y-AMC (NO2Y-AMC) were performed, both using isolated 20S-proteasome and astrocytoma cell lysates as the proteasomal source. Comparisons of both substrates showed a significantly decreased proteasome activity towards NO2Y-AMC. Moreover, NO2Y-AMC, but not Y-AMC degradation rates, were largely diminished by increasing the reaction pH, suggesting an inhibitory influence of the additional negative charge contained in NO2Y-AMC secondary to nitration. The mechanism of slowing of proteasome activity in NO2Y-containing peptides was further substantiated in studies using the phenylalanine and nitro-phenylalanine peptide analog substrates. Finally, degradation rates of Y-AMC and NO2Y-AMC with proteinase K were the same, demonstrating the selective inability of the proteasome to readily cleave at nitrotyrosine sites.
Altogether, data indicate that the proteasome has a decreased capability to cleave at C-terminal of NO2Y residues in proteins with respect to the unmodified residues, making this a possible factor that decreases the turnover of oxidized proteins, if they are not unfolded, and facilitating the accumulation of nitrated proteins.

Project description: The chondroitin sulfate proteoglycan versican is important for embryonic development and several human disorders. The versican V1 splice isoform is widely expressed and cleaved by ADAMTS proteases at a well-characterized site, Glu441-Ala442. Since ADAMTS proteases cleave the homologous proteoglycan aggrecan at multiple sites, we hypothesized that additional cleavage sites existed within versican. We report a quantitative label-free approach that ranks abundance of liquid chromatography-tandem mass spectrometry (LC-MS/MS)-identified semi-tryptic peptides after versican digestion by ADAMTS1, ADAMTS4 and ADAMTS5 to identify site-specific cleavages. Recombinant purified versican V1 constructs were digested with the recombinant full-length proteases, using catalytically inactive mutant proteases in control digests. Semi-tryptic peptide abundance ratios determined by LC-MS/MS in ADAMTS:control digests were compared to the mean of all identified peptides to obtain a z-score by which outlier peptides were ranked, using semi-tryptic peptides identifying Glu441-Ala442 cleavage as the benchmark. Tryptic peptides with higher abundance in control digests supported cleavage site identification. We identified several novel cleavage sites supporting the ADAMTS1/4/5 cleavage site preference for a P1-Glu residue in proteoglycan substrates. Digestion of proteins in vitro and application of this z-score approach is potentially widely applicable for mapping protease cleavage sites using label-free proteomics. SIGNIFICANCE: Versican abundance and turnover are relevant to the pathogenesis of several human disorders. Versican is cleaved by A Disintegrin-like And Metalloprotease with Thrombospondin type 1 motifs (ADAMTS) family members at Glu441-Ala442, generating a bioactive proteoform called versikine, but additional cleavage sites and the site-specificity of individual ADAMTS proteases are unexplored.
Here, we used a label-free proteomics strategy to identify versican cleavage sites for 3 ADAMTS proteases, applying a novel z-score-based statistical approach to compare the protease digests of versican to controls (digests with inactive protease) using the known protease cleavage site as a benchmark. We identified 21 novel cleavage sites that had a comparable z-score to the benchmark. Given the functional significance of versikine, they represent potentially significant cleavages and helped to refine a substrate site preference for each protease. The z-score approach is potentially widely applicable for discovery of site-specific cleavages within a purified protein or small ensemble of proteins using any protease.
