Dataset Information

Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.

ABSTRACT: Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, still suffer from unbalanced datasets due to missing peptide intensities, outlying peptide intensities and overfitting. Here, we further improve upon peptide-based models by three modular extensions: ridge regression, improved variance estimation by borrowing information across proteins with empirical Bayes and M-estimation with Huber weights. We illustrate our method on the CPTAC spike-in study and on a study comparing wild-type and ArgP knock-out Francisella tularensis proteomes. We show that the fold change estimates of our robust approach are more precise and more accurate than those from state-of-the-art summarization-based methods and peptide-based regression models, which leads to an improved sensitivity and specificity. We also demonstrate that ionization competition effects come already into play at very low spike-in concentrations and confirm that analyses with peptide-based regression methods on peptide intensity values aggregated by charge state and modification status (e.g. MaxQuant's peptides.txt file) are slightly superior to analyses on raw peptide intensity values (e.g. MaxQuant's evidence.txt file).

SUBMITTER: Goeminne LJ

PROVIDER: S-EPMC4739679 | biostudies-literature | 2016 Feb

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.

Goeminne Ludger J E LJ Gevaert Kris K Clement Lieven L

Molecular & cellular proteomics : MCP 20151113 2

Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, stil ...[more]

PMID: 26566788

Similar Datasets

Project description:BackgroundAlthough a great deal of rice proteomic research has been conducted, there are relatively few studies specifically addressing the rice grain proteome. The existing rice grain proteomic researches have focused on the identification of differentially expressed proteins or monitoring protein expression patterns during grain filling stages.ResultsProteins were extracted from rice grains 10, 20, and 30 days after flowering, as well as from fully mature grains. By merging all of the identified proteins in this study, we identified 4,172 non-redundant proteins with a wide range of molecular weights (from 5.2 kDa to 611 kDa) and pI values (from pH 2.9 to pH 12.6). A Genome Ontology category enrichment analysis for the 4,172 proteins revealed that 52 categories were enriched, including the carbohydrate metabolic process, transport, localization, lipid metabolic process, and secondary metabolic process. The relative abundances of the 1,784 reproducibly identified proteins were compared to detect 484 differentially expressed proteins during rice grain development. Clustering analysis and Genome Ontology category enrichment analysis revealed that proteins involved in the metabolic process were enriched through all stages of development, suggesting that proteome changes occurred even in the desiccation phase. Interestingly, enrichments of proteins involved in protein folding were detected in the desiccation phase and in fully mature grain.ConclusionThis is the first report conducting comprehensive identification of rice grain proteins. With a label free shotgun proteomic approach, we identified large number of rice grain proteins and compared the expression patterns of reproducibly identified proteins during rice grain development. Clustering analysis, Genome Ontology category enrichment analysis, and the analysis of composite expression profiles revealed dynamic changes of metabolisms during rice grain development. Interestingly, we detected that proteins involved in glycolysis, TCA-cycle, lipid metabolism, and proteolysis accumulated at higher levels in fully mature grain compared to grain developing stages, suggesting that the accumulation of these proteins during the desiccation stage may be associated with the preparation of proteins required in germination.

Dataset Information

Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.

Publications

Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets