An integrated MS data processing strategy for fast identification, in-depth and reproducible quantification of protein O-glycosylation in large cohorts of human urine samples
ABSTRACT: Protein O-glycosylation has long been recognized as closely associated with many diseases, particularly with tumor proliferation, invasion and metastasis. The ability to efficiently profile variation in O-glycosylation across large-scale clinical samples provides an important route to developing biomarkers for cancer diagnosis and therapeutic response evaluation. Mass spectrometry (MS)-based techniques for high-throughput, in-depth and reliable elucidation of protein O-glycosylation in large clinical cohorts are therefore in high demand. However, the abundance of serine and threonine residues in the proteome and the tens of mammalian O-glycan types lead to an extremely large search space, composed of millions of theoretical peptide and O-glycan combinations, for intact O-glycopeptide database searching. As a result, database searching takes an exceptionally long time, which is a major obstacle in O-glycoproteome studies of large clinical cohorts. More importantly, owing to the low abundance and poor ionization of intact O-glycopeptides and the stochastic nature of data-dependent MS2 acquisition, the level of missing data inevitably rises substantially as the number of samples increases, which undermines quantitative comparison across samples. We therefore report a new MS data processing strategy that integrates glycoform-specific database searching, reference library-based MS1 feature matching and MS2 identification propagation for fast identification and in-depth, reproducible label-free quantification of O-glycosylation of human urinary proteins. This strategy increases database searching speed by up to 20-fold and improves intact O-glycopeptide quantification in individual samples by 30-40%, with clearly improved reproducibility. In total, we obtained quantitative information for 1068 intact O-glycopeptides across 36 healthy human urine samples, with a 30-40% reduction in the amount of missing data. This is currently the largest dataset of the urinary O-glycoproteome and demonstrates the application potential of this new strategy in large-scale clinical investigations.
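The idea behind reference library-based MS1 feature matching with identification propagation can be illustrated with a minimal Python sketch. This is not the submitters' implementation: the class names, the glycopeptide labels, the example values, and the tolerances (10 ppm, 1 minute) are all assumptions chosen for illustration. An identification from a reference library is transferred to an unidentified MS1 feature in a sample when the m/z and retention time agree within tolerance, so that the glycopeptide can still be quantified in a run where no MS2 spectrum was acquired for it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LibraryEntry:
    """A previously identified intact O-glycopeptide (hypothetical structure)."""
    glycopeptide: str
    mz: float
    rt_min: float


@dataclass(frozen=True)
class Ms1Feature:
    """An MS1 feature detected in one sample, with no MS2 identification."""
    mz: float
    rt_min: float
    intensity: float


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def propagate_identifications(
    library: List[LibraryEntry],
    features: List[Ms1Feature],
    ppm_tol: float = 10.0,      # assumed tolerance, not from the source
    rt_tol_min: float = 1.0,    # assumed tolerance, not from the source
) -> Dict[str, Optional[float]]:
    """Map each library glycopeptide to a matched MS1 intensity, or None."""
    quant: Dict[str, Optional[float]] = {}
    for entry in library:
        candidates = [
            f for f in features
            if abs(ppm_error(f.mz, entry.mz)) <= ppm_tol
            and abs(f.rt_min - entry.rt_min) <= rt_tol_min
        ]
        # If several features fall inside the window, keep the most intense one.
        best = max(candidates, key=lambda f: f.intensity, default=None)
        quant[entry.glycopeptide] = best.intensity if best is not None else None
    return quant


# Illustrative values only; none of these come from the dataset.
library = [
    LibraryEntry("PEPTIDE_A+HexNAc(1)Hex(1)", 800.4000, 25.0),
    LibraryEntry("PEPTIDE_B+HexNAc(1)", 650.3000, 40.0),
]
features = [
    Ms1Feature(800.4040, 25.3, 1.2e6),  # about 5 ppm off, RT within 1 min
    Ms1Feature(800.4200, 25.1, 9.0e6),  # about 25 ppm off, rejected
    Ms1Feature(650.3000, 48.0, 5.0e5),  # exact m/z but RT 8 min off, rejected
]
result = propagate_identifications(library, features)
```

In this sketch a glycopeptide with no matching feature stays `None`, which corresponds to a remaining missing value for that sample; a real workflow would typically also align retention times across runs and control the false match rate.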
INSTRUMENT(S): Q Exactive
ORGANISM(S): Homo sapiens (human)
TISSUE(S): Urine
SUBMITTER: Xinyuan Zhao
LAB HEAD: Weijie Qin
PROVIDER: PXD015987 | Pride | 2020-02-04
REPOSITORIES: Pride