Dataset Information

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

ABSTRACT: The two key steps for analyzing proteomic data generated by high-resolution MS are database searching and postprocessing. While the two steps are interrelated, studies on their combinatory effects and the optimization of these procedures have not been adequately conducted. Here, we investigated the performance of three popular search engines (SEQUEST, Mascot, and MS Amanda) in conjunction with five filtering approaches, including respective score-based filtering, a group-based approach, local false discovery rate (LFDR), PeptideProphet, and Percolator. A total of eight data sets from various proteomes (e.g., E. coli, yeast, and human) produced by various instruments with high-accuracy survey scan (MS1) and high- or low-accuracy fragment ion scan (MS2) (LTQ-Orbitrap, Orbitrap-Velos, Orbitrap-Elite, Q-Exactive, Orbitrap-Fusion, and Q-TOF) were analyzed. It was found combinations involving Percolator achieved markedly more peptide and protein identifications at the same FDR level than the other 12 combinations for all data sets. Among these, combinations of SEQUEST-Percolator and MS Amanda-Percolator provided slightly better performances for data sets with low-accuracy MS2 (ion trap or IT) and high accuracy MS2 (Orbitrap or TOF), respectively, than did other methods. For approaches without Percolator, SEQUEST-group performs the best for data sets with MS2 produced by collision-induced dissociation (CID) and IT analysis; Mascot-LFDR gives more identifications for data sets generated by higher-energy collisional dissociation (HCD) and analyzed in Orbitrap (HCD-OT) and in Orbitrap Fusion (HCD-IT); MS Amanda-Group excels for the Q-TOF data set and the Orbitrap Velos HCD-OT data set. Therefore, if Percolator was not used, a specific combination should be applied for each type of data set. Moreover, a higher percentage of multiple-peptide proteins and lower variation of protein spectral counts were observed when analyzing technical replicates using Percolator-associated combinations; therefore, Percolator enhanced the reliability for both identification and quantification. The analyses were performed using the specific programs embedded in Proteome Discoverer, Scaffold, and an in-house algorithm (BuildSummary). These results provide valuable guidelines for the optimal interpretation of proteomic results and the development of fit-for-purpose protocols under different situations.

SUBMITTER: Tu C

PROVIDER: S-EPMC4859434 | biostudies-literature | 2015 Nov

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

Tu Chengjian C Sheng Quanhu Q Li Jun J Ma Danjun D Shen Xiaomeng X Wang Xue X Shyr Yu Y Yi Zhengping Z Qu Jun J

Journal of proteome research 20150930 11

The two key steps for analyzing proteomic data generated by high-resolution MS are database searching and postprocessing. While the two steps are interrelated, studies on their combinatory effects and the optimization of these procedures have not been adequately conducted. Here, we investigated the performance of three popular search engines (SEQUEST, Mascot, and MS Amanda) in conjunction with five filtering approaches, including respective score-based filtering, a group-based approach, local fa ...[more]

PMID: 26390080

Similar Datasets

Project description:ImportanceInternet-based search engine and social media data may provide a novel complementary source for better understanding the epidemiologic factors of infectious eye diseases, which could better inform eye health care and disease prevention.ObjectiveTo assess whether data from internet-based social media and search engines are associated with objective clinic-based diagnoses of conjunctivitis.Design, setting, and participantsData from encounters of 4143 patients diagnosed with conjunctivitis from June 3, 2012, to April 26, 2014, at the University of California San Francisco (UCSF) Medical Center, were analyzed using Spearman rank correlation of each weekly observation to compare demographics and seasonality of nonallergic conjunctivitis with allergic conjunctivitis. Data for patient encounters with diagnoses for glaucoma and influenza were also obtained for the same period and compared with conjunctivitis. Temporal patterns of Twitter and Google web search data, geolocated to the United States and associated with these clinical diagnoses, were compared with the clinical encounters. The a priori hypothesis was that weekly internet-based searches and social media posts about conjunctivitis may reflect the true weekly clinical occurrence of conjunctivitis.Main outcomes and measuresWeekly total clinical diagnoses at UCSF of nonallergic conjunctivitis, allergic conjunctivitis, glaucoma, and influenza were compared using Spearman rank correlation with equivalent weekly data on Tweets related to disease or disease-related keyword searches obtained from Google Trends.ResultsSeasonality of clinical diagnoses of nonallergic conjunctivitis among the 4143 patients (2364 females [57.1%] and 1776 males [42.9%]) with 5816 conjunctivitis encounters at UCSF correlated strongly with results of Google searches in the United States for the term pink eye (ρ, 0.68 [95% CI, 0.52 to 0.78]; P < .001) and correlated moderately with Twitter results about pink eye (ρ, 0.38 [95% CI, 0.16 to 0.56]; P < .001) and with clinical diagnosis of influenza (ρ, 0.33 [95% CI, 0.12 to 0.49]; P < .001), but did not significantly correlate with seasonality of clinical diagnoses of allergic conjunctivitis diagnosis at UCSF (ρ, 0.21 [95% CI, -0.02 to 0.42]; P = .06) or with results of Google searches in the United States for the term eye allergy (ρ, 0.13 [95% CI, -0.06 to 0.32]; P = .19). Seasonality of clinical diagnoses of allergic conjunctivitis at UCSF correlated strongly with results of Google searches in the United States for the term eye allergy (ρ, 0.44 [95% CI, 0.24 to 0.60]; P < .001) and eye drops (ρ, 0.47 [95% CI, 0.27 to 0.62]; P < .001).Conclusions and relevanceInternet-based search engine and social media data may reflect the occurrence of clinically diagnosed conjunctivitis, suggesting that these data sources can be leveraged to better understand the epidemiologic factors of conjunctivitis.

Dataset Information

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

Publications

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets