Project description: The recent emergence of new mass spectrometry techniques (e.g., electron transfer dissociation, ETD) and the improved availability of additional proteases (e.g., Lys-N) for protein digestion in high-throughput experiments have raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision-induced dissociation (CID) of tryptic peptides. Their CID-specific scoring functions are largely based on expert knowledge about the fragmentation of tryptic peptides rather than on machine learning techniques. As a result, these algorithms perform suboptimally for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g., CID, ETD), and present MS-GFDB, a new database search tool based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra and for peptides digested with Lys-N. For example, for ETD spectra, MS-GFDB identified 2.7 times as many tryptic peptides and 2.6 times as many Lys-N peptides as Mascot. Moreover, even after a decade of Mascot development for analyzing CID spectra of tryptic peptides, MS-GFDB (which is not specifically tailored to CID spectra or tryptic peptides) yielded a 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g., CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.
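The generating function idea behind MS-GF can be illustrated with a small dynamic program: rather than scoring every peptide of a given precursor mass one by one, it propagates a score distribution over integer prefix masses and then reads off the probability that a random peptide scores at least as well as a threshold (the spectral probability). The sketch below is a minimal illustration under stated assumptions: integer residue masses, equal amino acid frequencies, and a hypothetical per-mass score vector `scores`. It is not MS-GF's actual scoring model.

```python
from collections import defaultdict
from typing import Dict, List


def spectral_probability(scores: List[int], aa_masses: List[int], threshold: int) -> float:
    """Probability that a random peptide of integer mass M = len(scores) - 1
    reaches a total score >= threshold.

    scores[m] is a (hypothetical) integer score awarded whenever a peptide's
    prefix mass equals m; each amino acid (positive integer mass) is drawn
    with equal frequency.
    """
    precursor_mass = len(scores) - 1
    freq = 1.0 / len(aa_masses)
    # dist[m] maps a cumulative score to the total probability of all
    # prefixes of integer mass m that reach exactly that score.
    dist: List[Dict[int, float]] = [defaultdict(float) for _ in range(precursor_mass + 1)]
    dist[0][0] = 1.0
    for m in range(1, precursor_mass + 1):
        for a in aa_masses:
            if a > m:
                continue
            # Extend every prefix of mass m - a by one residue of mass a.
            for score, prob in dist[m - a].items():
                dist[m][score + scores[m]] += prob * freq
    return sum(p for s, p in dist[precursor_mass].items() if s >= threshold)
```

The cost grows with precursor mass times the number of amino acids times the score range, not with the (exponential) number of candidate peptides, which is what makes exact p-value-like statistics tractable.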
Project description: Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., generated using multiple enzymatic digests). We combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and to take advantage of corroborating b/y/c/z ions in CID/HCD/ETD spectra. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences with an average length of 70 AA, and as long as 200 AA, at up to 99% sequencing accuracy.
Project description: An amine-specific peptide derivatization strategy involving novel isobaric stable isotope encoded 'fixed charge' sulfonium ion reagents, coupled with an analysis strategy employing capillary HPLC, ESI-MS, and automated data-dependent ion trap CID-MS/MS, -MS3, and/or ETD-MS/MS, has been developed for improved quantitative analysis of protein phosphorylation and for identification and characterization of the site(s) of modification. Derivatization of 50 synthetic phosphopeptides with S,S'-dimethylthiobutanoylhydroxysuccinimide ester iodide (DMBNHS), followed by analysis using capillary HPLC-ESI-MS, yielded an average 2.5-fold increase in ionization efficiency and a significant increase in the presence and/or abundance of higher charge state precursor ions compared with the non-derivatized phosphopeptides. Notably, 44% of the phosphopeptides (22 of 50) in their underivatized states yielded precursor ions with a maximum charge state of +2, whereas only 8% (4 of 50) remained at this maximum charge state following DMBNHS derivatization. Quantitative analysis was achieved by measuring the abundances of the diagnostic product ions corresponding to the neutral losses of 'light' (S(CH3)2) and 'heavy' (S(CD3)2) dimethylsulfide, which are formed exclusively upon CID-MS/MS of the isobaric stable isotope labeled forms of the DMBNHS-derivatized phosphopeptides. Under these conditions, the phosphate group remained intact. Because derivatization produced higher charge state precursor ions, a greater number of peptides became accessible to automated data-dependent CID-MS3 or ETD-MS/MS analysis, enhancing phosphopeptide sequence identification and phosphorylation site characterization. Importantly, improved sequence coverage was observed using ETD-MS/MS following introduction of the sulfonium ion fixed charge, with no detrimental effect on ETD fragmentation efficiency.
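The light/heavy quantitation step amounts to locating two product ions, each shifted from the precursor by the mass of the dimethylsulfide that is lost, and comparing their abundances. The sketch below assumes standard monoisotopic atomic masses and that the neutral loss leaves the precursor charge unchanged; the helper names are hypothetical, and this is an illustration rather than the authors' processing software.

```python
# Monoisotopic masses (Da) of the atoms involved in the neutral losses.
H = 1.007825
D = 2.014102
C = 12.0
S = 31.972071

LIGHT_LOSS = S + 2 * C + 6 * H  # S(CH3)2, about 62.0190 Da
HEAVY_LOSS = S + 2 * C + 6 * D  # S(CD3)2, about 68.0567 Da


def neutral_loss_mz(precursor_mz: float, charge: int, loss: float) -> float:
    """m/z of the product ion formed when a neutral of mass `loss` leaves a
    precursor of the given charge (the charge itself is retained)."""
    return precursor_mz - loss / charge


def light_to_heavy_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance of the 'light'- versus 'heavy'-labeled sample."""
    if heavy_intensity <= 0:
        raise ValueError("heavy product ion not detected")
    return light_intensity / heavy_intensity
```

For a 3+ precursor, the two diagnostic product ions differ by 6 × (D − H) / 3, about 2.01 m/z, so both fall in the same MS/MS spectrum of the co-isolated isobaric precursors.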
Project description: MS dissociation methods, including collision-induced dissociation (CID), high-energy collision dissociation (HCD), and electron transfer dissociation (ETD), can each contribute distinct peptidome identifications when conventional peptide identification methods are used (Shen et al. J. Proteome Res. 2011), but such samples still pose significant informatics challenges. In this work, we explored the use of high-accuracy fragment ion mass measurements, in this case provided by Fourier transform MS/MS, to increase the size and consistency of peptidome peptide data sets relative to conventional descriptive and probabilistic scoring methods. For example, using high-accuracy fragment ion information at the same false discovery rate (FDR), we identified 20-40% more peptides from CID, HCD, and ETD spectra than the SEQUEST, Mascot, and MS_GF scoring methods. The identified species covered >90% of the collective identifications obtained using various conventional peptide identification methods, which substantially addresses the common issue of different data analysis methods generating different peptide data sets. The choice among the peptide dissociation and high-precision measurement-based identification methods presently available for degradomic-peptidomic analyses should be based on the coverage and confidence (or specificity) afforded by each method, as well as on practical issues (e.g., throughput). By using accurate fragment information, >1000 peptidome components can be identified from a single human blood plasma analysis with low peptide-level FDRs (e.g., 0.6%), providing an improved basis for investigating potential disease-related peptidome components.
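Much of the benefit of high-accuracy fragment masses comes from tighter matching windows: a tolerance expressed in parts per million (ppm) admits far fewer chance matches than a wide low-accuracy window. The following minimal sketch counts theoretical fragment m/z values that are matched by an observed peak within a ppm tolerance. The function name and tolerance values are illustrative assumptions, not the scoring method used in the study.

```python
from bisect import bisect_left
from typing import Iterable, List


def count_matched_fragments(observed_mzs: List[float],
                            theoretical_mzs: Iterable[float],
                            tol_ppm: float) -> int:
    """Number of theoretical fragments with at least one observed peak
    within +/- tol_ppm (relative to the theoretical m/z)."""
    peaks = sorted(observed_mzs)
    matched = 0
    for theo in theoretical_mzs:
        tol = theo * tol_ppm * 1e-6  # convert ppm into an absolute m/z window
        i = bisect_left(peaks, theo - tol)  # first peak not below the window
        if i < len(peaks) and peaks[i] <= theo + tol:
            matched += 1
    return matched
```

At 10 ppm, a fragment at m/z 1000 is matched only by peaks within ±0.01 m/z, a window 50 times narrower than ±0.5 m/z, so random peaks are far less likely to inflate a score.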
Project description: We report on the effectiveness of CID, HCD, and ETD for LC-FT MS/MS analysis of peptides using a tandem linear ion trap-Orbitrap mass spectrometer. A range of software tools and analysis parameters were employed to explore the use of CID, HCD, and ETD to identify peptides (isolated from human blood plasma) without the use of specific "enzyme rules". In the evaluation of an FDR-controlled SEQUEST scoring method, using accurate fragment masses increased the number of identified peptides by ~50% compared with conventional low-accuracy fragment mass information, and CID contributed the most to the identified peptide data sets compared with HCD and ETD. The FDR-controlled Mascot scoring method provided significantly fewer peptide identifications than SEQUEST (1.3- to 2.3-fold fewer), and CID, HCD, and ETD provided similar contributions to the identified peptides. Evaluation of de novo sequencing and the UStags method for more intense fragment ions revealed that HCD afforded more contiguous residues (e.g., ≥ 7 amino acids) than either CID or ETD. The peptide data sets from both the FDR-controlled SEQUEST and Mascot scoring methods were affected by the decoy database used and the mass tolerances applied (e.g., the overlap of identical peptides between data sets could be as low as ~70%), whereas the UStags method provided the most consistent peptide data sets (>90% overlap). The m/z ranges in which CID, HCD, and ETD contributed the largest number of peptide identifications overlapped substantially. This work suggests that the three peptide ion fragmentation methods are complementary and that maximizing the number of peptide identifications benefits significantly from carefully matching them with the informatics tools and methods applied. These results also suggest that the decoy strategy may inaccurately estimate identification FDRs.
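Because FDR-controlled results depend on the decoy strategy, it helps to see how a target-decoy score threshold is typically chosen. This minimal sketch estimates the FDR at a threshold as (decoy hits) / (target hits) from separate target and decoy searches, then picks the lowest threshold that keeps the estimate at or below a chosen limit. The function name and this particular estimator are assumptions; other variants (e.g., concatenated target-decoy searches) differ in detail.

```python
from typing import Optional, Sequence


def target_decoy_threshold(target_scores: Sequence[float],
                           decoy_scores: Sequence[float],
                           max_fdr: float) -> Optional[float]:
    """Lowest score threshold whose estimated FDR,
    (#decoys >= t) / (#targets >= t), does not exceed max_fdr.
    Returns None if no threshold qualifies."""
    best = None
    for t in sorted(set(target_scores), reverse=True):
        n_target = sum(1 for s in target_scores if s >= t)  # >= 1, since t is a target score
        n_decoy = sum(1 for s in decoy_scores if s >= t)
        if n_decoy / n_target <= max_fdr:
            best = t  # remember the lowest qualifying threshold seen so far
    return best
```

Since the estimate depends entirely on how many decoys score above the threshold, swapping in a different decoy database can change the accepted peptide set even when the target search is unchanged.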
Project description: CID has become a routine method for fragmenting peptides in shotgun proteomics, whereas electron transfer dissociation (ETD) has been described as a preferred method for peptides carrying labile PTMs. Although both fragmentation techniques have obvious advantages, each also has drawbacks. Because the two techniques produce complementary fragment ions, combining data from CID and ETD fragmentation can potentially overcome some of these disadvantages. To evaluate alternating CID and ETD fragmentation, we analyzed a complex mixture of phosphopeptides on an LTQ-Orbitrap mass spectrometer. When the CID- and ETD-derived spectra were searched separately, we observed 2504, 491, 2584, and 3249 phosphopeptide-spectrum matches from CID alone, ETD alone, decision tree-based CID/ETD, and alternating CID and ETD, respectively. Intuitively, combining CID and ETD spectra prior to database searching should be superior to either method alone. However, when spectra from the alternating CID and ETD method were merged prior to database searching, we observed a reduction in the number of phosphopeptide-spectrum matches. The poorer identification rates observed after merging CID and ETD spectra reflect a lack of search algorithms optimized for such searches, and perhaps inherent weaknesses of this approach. Thus, although alternating CID and ETD experiments are desirable for increasing the confidence of phosphopeptide identifications, merging spectra prior to database searching needs further careful evaluation in the context of the various algorithms before it is adopted as a routine strategy.
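The abstract does not specify how the CID and ETD spectra were merged. One naive strategy, shown here purely as an assumption for illustration, pools both peak lists and collapses peaks that fall within a fixed m/z tolerance into a single intensity-weighted peak. The resulting list mixes b/y and c/z ion series in one spectrum, which a scoring model built for a single activation type may not expect.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def merge_spectra(cid: List[Peak], etd: List[Peak], tol: float = 0.02) -> List[Peak]:
    """Pool two peak lists and collapse peaks closer than `tol` (in m/z)
    into one peak with summed intensity and intensity-weighted m/z.
    Assumes all intensities are positive."""
    merged: List[Peak] = []
    for mz, intensity in sorted(cid + etd):
        if merged and mz - merged[-1][0] <= tol:
            prev_mz, prev_int = merged[-1]
            total = prev_int + intensity
            merged[-1] = ((prev_mz * prev_int + mz * intensity) / total, total)
        else:
            merged.append((mz, intensity))
    return merged
```

The hypothetical `tol` default of 0.02 m/z assumes high-resolution fragment spectra; with ion trap spectra a wider value would be needed, which in turn merges more unrelated peaks.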
Project description: Comprehensive analysis of the ubiquitylome is a prerequisite for fully understanding the regulatory role of ubiquitylation. However, the impact of key mass spectrometry parameters on ubiquitylome analyses has not been fully explored. In this study, we show that using electron transfer dissociation (ETD) fragmentation, either exclusively or as part of a decision tree method, leads to an approximately 2-fold increase in ubiquitylation site identifications in K-ε-GG peptide-enriched samples over traditional collision-induced dissociation (CID) or higher-energy collision dissociation (HCD) methods. Precursor ions were predominantly observed with charge states of 3+ or higher and in the range 300-1200 m/z. N-ethylmaleimide was used as an alkylating agent to reduce false positive identifications resulting from overalkylation with haloacetamides. These results demonstrate that applying ETD fragmentation, together with narrowing the mass range and using N-ethylmaleimide, yields more high-confidence ubiquitylation site identifications than conventional CID and HCD analysis.
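A data-dependent decision tree chooses the activation method for each precursor from its charge state and m/z. The rule below is a hypothetical illustration: it sends precursors of charge 3+ or higher within the 300-1200 m/z window, where precursors were predominantly observed here, to ETD and everything else to CID. The thresholds and the function name are assumptions and do not reproduce the decision tree used in the study.

```python
from typing import Tuple


def choose_activation(charge: int, precursor_mz: float,
                      min_etd_charge: int = 3,
                      mz_window: Tuple[float, float] = (300.0, 1200.0)) -> str:
    """Hypothetical decision rule: ETD for higher-charge precursors inside
    the m/z window, CID otherwise."""
    low, high = mz_window
    if charge >= min_etd_charge and low <= precursor_mz <= high:
        return "ETD"
    return "CID"
```

Exposing the thresholds as parameters makes it easy to compare rule variants against the identification rates of ETD-only and CID-only runs.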
Project description: Cystine knots and nested disulfides are structurally difficult to characterize, despite current technological advances in peptide mapping with high-resolution liquid chromatography coupled with mass spectrometry (LC-MS). Recombinant human arylsulfatase A (rhASA) contains one cystine knot near the C-terminus, a pair of nested disulfides in the middle, and two of its three unpaired cysteines in the N-terminal region. The statuses of these cysteines are critical structural attributes for rhASA function and stability and require precise examination. We used a unique approach to determine the status and linkage of each cysteine in rhASA. The approach comprised multi-enzyme digestion strategies (using Lys-C, trypsin, Asp-N, pepsin, and PNGase F) and multiple fragmentation methods in mass spectrometry: electron transfer dissociation (ETD), collision-induced dissociation (CID), and CID with MS3 (after ETD). In addition to generating enzymatic peptides of suitable length for effective fragmentation, the digestion pH was optimized to minimize disulfide scrambling. The disulfide linkages (including the cystine knot and the pair of nested disulfides), the unpaired cysteines, and the post-translational modification of a cysteine to formylglycine were all determined. The assigned disulfide linkages were Cys138-Cys154, Cys143-Cys150, Cys282-Cys396, Cys470-Cys482, Cys471-Cys484, and Cys475-Cys481. Of the unpaired cysteines, Cys20 and Cys276 were free cysteines, and Cys51 was largely converted to formylglycine (>70%). The methodology developed here can be used routinely to determine such difficult-to-resolve disulfide linkages, helping to ensure drug function and stability.
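Assigning a disulfide linkage from LC-MS data starts from a simple mass calculation: each disulfide bond removes two hydrogen atoms from the combined mass of the linked peptides. The sketch below uses standard monoisotopic residue masses and also encodes the Cys-to-formylglycine mass shift (loss of S and 2 H, gain of O), the kind of shift one would look for at a converted cysteine such as Cys51. The helper names are hypothetical, and the example peptides in the test are not rhASA sequences.

```python
from typing import List

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
HYDROGEN = 1.007825
PROTON = 1.007276
OXYGEN = 15.994915
SULFUR = 31.972071
# Cys -> formylglycine: lose S and 2 H, gain O (about -17.9928 Da).
CYS_TO_FGLY = OXYGEN - SULFUR - 2 * HYDROGEN


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def linked_mass(peptides: List[str], n_disulfides: int) -> float:
    """Neutral mass of peptides joined by n disulfide bonds (2 H lost per bond)."""
    return sum(peptide_mass(p) for p in peptides) - 2 * HYDROGEN * n_disulfides


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge
```

For example, a peptide complex held together by all three cystine-knot disulfides would be evaluated with `n_disulfides=3`, whatever the number of peptide chains involved.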
Project description: We describe a strategy for de novo peptide sequencing based on matched pairs of tandem mass spectra (MS/MS) obtained by collision-induced dissociation (CID) and 351 nm ultraviolet photodissociation (UVPD). Each precursor ion is isolated twice, with the mass spectrometer switching between CID and UVPD activation modes to obtain a complementary MS/MS pair. To interpret these paired spectra, we modified the UVnovo de novo sequencing software to automatically learn from and interpret fragmentation spectra, given a representative set of training data. This machine learning procedure, using random forests, synthesizes information from one or more complementary spectra, such as the CID/UVPD pairs, into peptide fragmentation site predictions. In doing so, the burden of defining the fragmentation model shifts from programmer to machine, opening up the model parameter space to nonobvious features and interactions. This spectral synthesis also transforms distinct types of spectra into a common representation for subsequent activation-independent processing steps. Then, independent of precursor activation constraints, UVnovo's de novo sequencing procedure generates and scores sequence candidates for each precursor. We demonstrate the combined experimental and computational approach for de novo sequencing using whole-cell E. coli lysate. In benchmarks on the CID/UVPD data, UVnovo assigned correct full-length sequences to 83% of the spectral pairs of doubly charged ions with high-confidence database identifications. Considering only top-ranked de novo predictions, 70% of the pairs were deciphered correctly. This de novo sequencing performance exceeds that of PEAKS and PepNovo on the CID spectra and that of UVnovo on CID or UVPD spectra alone. As presented here, the methods for paired CID/UVPD spectral acquisition and interpretation constitute a powerful workflow for high-throughput, accurate de novo peptide sequencing.
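UVnovo's random forests operate on per-cleavage-site information drawn from one or more spectra of the same precursor. The sketch below shows only a hypothetical feature-extraction step: for each candidate prefix mass, it records the intensities at the singly charged b- and y-ion positions in every spectrum of a CID/UVPD pair, producing one vector that a classifier (for example, scikit-learn's `RandomForestClassifier`) could be trained on. Restricting features to b/y ions, the 0.5 m/z tolerance, and all names are assumptions; UVPD also produces other ion types, and UVnovo's actual feature set is not described here.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

PROTON = 1.007276
Peak = Tuple[float, float]  # (m/z, intensity), peaks sorted by m/z


def max_intensity_near(peaks: Sequence[Peak], target: float, tol: float) -> float:
    """Largest intensity within +/- tol of target, or 0.0 if no peak is there."""
    mzs = [mz for mz, _ in peaks]
    i = bisect_left(mzs, target - tol)
    best = 0.0
    while i < len(peaks) and peaks[i][0] <= target + tol:
        best = max(best, peaks[i][1])
        i += 1
    return best


def cleavage_features(prefix_mass: float, peptide_mass: float,
                      spectra: Sequence[Sequence[Peak]],
                      tol: float = 0.5) -> List[float]:
    """Feature vector for one candidate cleavage site: the singly charged
    b- and y-ion intensities observed in each spectrum of the set."""
    b_mz = prefix_mass + PROTON                 # prefix residues + proton
    y_mz = peptide_mass - prefix_mass + PROTON  # suffix residues + water + proton
    features: List[float] = []
    for peaks in spectra:
        features.append(max_intensity_near(peaks, b_mz, tol))
        features.append(max_intensity_near(peaks, y_mz, tol))
    return features
```

Because every spectrum contributes a fixed-length block of features, the same downstream model can accept a single spectrum or a pair, which mirrors the "common representation" idea in the description.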
Project description: Untargeted global metabolic profiling by liquid chromatography-mass spectrometry generates numerous signals that are due to unknown compounds, whose identification poses an important challenge. The analysis of metabolite fragmentation patterns following collision-induced dissociation provides a valuable tool for identification, but can be severely impeded by close chromatographic coelution of distinct metabolites. We propose a new algorithm for identifying related parent-fragment pairs and distinguishing them from signals due to unrelated compounds. Unlike existing methods, our approach addresses the problem by means of a hypothesis test based on the distribution of the recorded ion counts, and thereby provides a statistically rigorous measure of the uncertainty involved in the classification. Because of technological constraints, the test is primarily useful at low and intermediate ion counts; above these, detector saturation substantially biases the recorded ion counts. The validity of the test is demonstrated by applying it to pairs of coeluting isotopologues and to known parent-fragment pairs, which yields test statistics consistent with the null distribution. The test was compared with a commonly used Pearson correlation approach and found to perform considerably better (e.g., a false positive rate of 6.25%, compared with 50% for the correlation approach, for perfectly coeluting ions). Because the algorithm may be used to analyze high-mass compounds in addition to metabolic data, we expect it to facilitate the analysis of fragmentation patterns for a wide range of analytical problems.
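The abstract does not give the exact form of the test. As a stand-in that captures the same idea, the sketch below assumes Poisson-distributed ion counts. Under that assumption, if a putative fragment truly derives from a parent, the fragment's share of the combined counts in each scan follows a binomial distribution with one common proportion, and a Pearson chi-square test of homogeneity can check this. The function name is hypothetical, and this is not claimed to be the authors' statistic.

```python
from typing import Sequence, Tuple


def homogeneity_statistic(parent_counts: Sequence[int],
                          fragment_counts: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for the null
    hypothesis that the fragment's share of (parent + fragment) counts is
    the same in every scan. Scans with no counts are skipped."""
    pairs = [(a, b) for a, b in zip(parent_counts, fragment_counts) if a + b > 0]
    total = sum(a + b for a, b in pairs)
    if total == 0:
        raise ValueError("no ion counts recorded")
    p = sum(b for _, b in pairs) / total  # pooled fragment proportion
    if not 0.0 < p < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    stat = 0.0
    for a, b in pairs:
        n = a + b
        # Equals the sum of (O - E)^2 / E over both cells of this scan.
        stat += (b - n * p) ** 2 / (n * p * (1.0 - p))
    return stat, len(pairs) - 1
```

A statistic that is large relative to a chi-square distribution with `df` degrees of freedom (e.g., above about 3.84 for df = 1 at the 5% level) argues against a common parent. Unlike a correlation coefficient, it accounts for how much scatter the counts alone are expected to produce, which matters most at the low ion counts where the test is intended to be used.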