Dataset Information

A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs.


ABSTRACT: Protein O-GlcNAcylation, the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid understanding of how O-GlcNAc contributes to diverse cellular processes. Motivated by the increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we set out to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is now freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
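
The sensitivity, specificity, and accuracy figures quoted in the abstract are standard confusion-matrix measures computed over cross-validation folds. The following is a minimal sketch of how such measures and a five-fold split could be computed. It is not the authors' code: the function names (`evaluate`, `fold_indices`), the interleaved fold assignment, and the toy labels are all assumptions made for illustration, and the label 1 is assumed to mark an O-GlcNAcylation site.

```python
from typing import Dict, List, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for binary labels.

    Assumption: 1 marks a positive (modified) site, 0 a negative site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Assign indices 0..n-1 to k folds by interleaving (hypothetical scheme).

    Each fold serves once as the held-out test set; the remaining folds
    form the training set for that round.
    """
    return [list(range(i, n, k)) for i in range(k)]


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the functions.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(evaluate(truth, predicted))
    print(fold_indices(len(truth)))
```

In a real evaluation, the predictions passed to `evaluate` would come from a model trained on the other four folds, and the per-fold measures would typically be averaged or pooled; the abstract does not state which of these the authors used.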

SUBMITTER: Kao HJ 

PROVIDER: S-EPMC4682369 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs.

Kao Hui-Ju, Huang Chien-Hsun, Bretaña Neil Arvin, Lu Cheng-Tsung, Huang Kai-Yao, Weng Shun-Long, Lee Tzong-Yi

BMC Bioinformatics, 2015-12-09


Similar Datasets

| S-EPMC9283943 | biostudies-literature
| S-EPMC4433402 | biostudies-literature
| S-EPMC7296736 | biostudies-literature
| S-EPMC5698155 | biostudies-literature
| S-EPMC3040809 | biostudies-literature
2020-08-06 | GSE150880 | GEO
2019-07-24 | GSE132205 | GEO
| S-EPMC6851338 | biostudies-literature
| S-EPMC4979681 | biostudies-literature
| S-EPMC5493779 | biostudies-literature