Dataset Information

Considerations for peptide and protein error rate control in large-scale targeted DIA analyses: plasma analysis

ABSTRACT: Liquid chromatography coupled to tandem mass spectrometry has become the main method for high-throughput identification and quantification of peptides and the inferred proteins. Discovery proteomics commonly employs data-dependent acquisition in combination with spectrum-centric analysis. The accumulation of data generated from thousands of samples by this method has approached saturation coverage of different proteomes. Recently, as a result of technological advances, methods based on data acquisition strategies compatible with peptide-centric scoring have also reached similar proteome coverage in individual runs, and scalability. This is exemplified by SWATH-MS, which combines data-independent acquisition (DIA) with targeted data extraction of groups of transitions uniquely detecting a peptide. As the data matrices generated by these experiments continue to grow with respect to both the number of peptides identified per sample and the number of samples analyzed per study, challenges for error rate control have emerged. Here, we discuss the adaptation of statistical concepts developed for discovery proteomics based on spectrum-centric scoring to large-scale DIA experiments analyzed with peptide-centric scoring strategies, and provide some guidance on their application. We propose that, in order to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported at each level as we progress from spectral evidence to identified or detected peptides and inferred proteins. These confidence criteria should equally be applied to proteomic analyses based on spectrum- and peptide-centric scoring strategies.

INSTRUMENT(S):

ORGANISM(S): Homo sapiens (human)

TISSUE(S): Blood Plasma

SUBMITTER: Isabell Bludau

LAB HEAD: Ruedi Aebersold

PROVIDER: PXD006625 | PRIDE | 2017-08-03

REPOSITORIES: PRIDE

ACCESS DATA

Dataset's files

File	Type
CAL_nonparam_experimentwide_scored_pep001_concatenated.txt	Txt
CAL_nonparam_global_protein_list_unique.txt	Txt
CAL_nonparam_global_transition_group_list_unique.txt	Txt
CAL_nonparam_report_grouped_by_FullPeptideName.pdf	Pdf
CAL_nonparam_report_grouped_by_ProteinName.pdf	Pdf

Showing files 1-5 of 26.

Publications

Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses.

Rosenberger G, Bludau I, Schmitt U, Heusel M, Hunter CL, Liu Y, MacCoss MJ, MacLean BX, Nesvizhskii AI, Pedrioli PGA, Reiter L, Röst HL, Tate S, Ting YS, Collins BC, Aebersold R

Nature Methods, 2017-08-21, issue 9

Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics ...

PMID: 28825704

Publication: 1/2

Similar Datasets

Project description: Spectral library search (SLS) is a major approach for peptide identification from tandem mass spectrometry data, offering a complementary approach to conventional database search. Moreover, with the emergence of spectrum prediction models, proteomics database search is progressively becoming more like spectral library search of predicted peptide spectra. The performance of peptide identification algorithms thus frequently depends on how well the underlying Spectrum-Spectrum Matching (SSM) scoring functions distinguish true and false positive matches. However, detailed comparative studies evaluating the performance of SSM scoring functions remain limited by the absence of comprehensive benchmark datasets. We propose new methods to build benchmarks that assess the effectiveness and robustness of SSM scoring functions. The resulting benchmark dataset is composed of (i) a set of 476,063 precursors used to construct 8 query spectrum sets with different levels of noise added to "ideal" and real experimental spectra, and (ii) three spectral libraries with different spectra for the same 3,065,819 precursors: experimental spectra, annotated/de-noised spectra and predicted spectra. The benchmark set was then used to evaluate 9 common spectrum preprocessing scenarios, followed by the evaluation of 3 standard SSM scoring functions, Cosine, Projected-Cosine (commonly used for the analysis of chimeric/mixture spectra), and Jensen-Shannon divergence, and 2 additional scoring functions used in state-of-the-art SLS tools: SpectraST and EntropyScore. The results revealed that scoring spectrum-spectrum matches is still an important open problem, with the best recall for typical SLS searches still poor at just ~70% at the typical 1% error rate. Overall, SpectraST performed best for spectra with little-to-no noise, but Jensen-Shannon divergence performed better in some cases, as it was found to be the most resistant to noise. Conversely, the performance of Cosine and EntropyScore was generally lower than previously reported, and Projected-Cosine performed especially poorly in most cases. The performance of the SSM scoring functions also depended strongly on the minimum number of matching peaks required for each SSM: benchmark results show that both the performance and the relative ranking of the scoring functions change substantially with this parameter. The resulting benchmark dataset can be used to test and support the development of SSM scoring functions, and the proposed benchmark construction approach provides a foundation that can be extended to additional types of spectrum-spectrum matching.

Project description: Sepsis is the major cause of mortality across intensive care units globally, yet details of the accompanying pathological molecular events are unclear. This has resulted in ineffective development of sepsis-specific biomarkers and therapies, and in suboptimal treatment regimens for preventing and reversing organ damage. Here, we used pharmacoproteomics to score treatment effects in a preclinical Escherichia coli sepsis model based on changes in the organ, cell, and plasma proteome landscapes. A combination of pathophysiological read-outs and time-resolved proteome maps of organs and blood enabled the definition of time-dependent and organotypic proteotypes of dysfunction and damage, which were used to guide the administration of several combinations of the beta-lactam antibiotic meropenem and the immunomodulatory glucocorticoid methylprednisolone. The proteome-based scoring strategy revealed three distinct response patterns, defined as intervention-specific reversions, non-reversions, and specific intervention-induced effects, showing that the intervention effects depended on the underlying proteotype and varied significantly between organs. In the later stages of the disease, administration of glucocorticoids accentuated some of the positive effects exerted by meropenem, leading to superior reduction of the inflammatory response in the kidneys and partial restoration of the metabolic dysfunction instigated by sepsis. Unexpectedly, antibiotics introduced sepsis-independent perturbations in the mitochondrial proteome that were to some degree counteracted by glucocorticoids. In summary, this study provides a pharmacoproteomic resource describing the time-resolved septic organ failure landscape across organs and the blood compartment, and a novel scoring strategy that captures unintended secondary drug effects as an important criterion to consider when assessing therapeutic efficacy. Such information is critically important to enable a quantitative, objective and organotypic assessment of treatment benefits and unintended effects, of drug synergies of candidate treatments, and of the effect of dose and time in murine sepsis models.