Dataset Information

Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data.


ABSTRACT:

Background

To date, no one has claimed to have found a consensus sequon for O-glycosylation. Thus, predicting the likelihood of O-glycosylation from sequence and structural information with classical regression analysis is quite difficult. In particular, if a binary response is used to distinguish O-glycosylated from non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find.
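The difficulty above comes from fitting a binary (0/1) response. A linear probability model (LPM) regresses that 0/1 response directly on the explanatory variables and reads the fitted value as a probability. Ridge regression adds an L2 penalty λ‖β‖² to stabilize the coefficients when predictors are correlated. The sketch below is a minimal, dependency-free illustration assuming the textbook closed-form ridge estimate β = (XᵀX + λI)⁻¹Xᵀy on centered data with an unpenalized intercept. The function names, toy data, and λ values are hypothetical; this is not the authors' code, and their feature encoding and choice of λ may differ.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (guaranteed below whenever lam > 0).
    """
    n = len(b)
    # Augmented matrix so row operations update a and b together.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge_lpm(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float
) -> Tuple[float, List[float]]:
    """Hypothetical ridge-penalized LPM with an unpenalized intercept.

    X: one row of numeric features per sequence.
    y: 1 for O-glycosylated, 0 for non-O-glycosylated sequences.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and response so the intercept is not shrunk.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    # Ridge normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc.
    xtx = [
        [sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(r[i] * yk for r, yk in zip(Xc, yc)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, x_mean))
    return intercept, beta


def predict_lpm(intercept: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """Fitted LPM value: a linear score that can fall outside [0, 1]."""
    return intercept + sum(bj * xj for bj, xj in zip(beta, x))
```

With λ = 0 this reduces to ordinary least squares: on the toy data x = 0, 1, 2, 3 with y = 0, 0, 1, 1 the slope is 0.4 and the intercept is -0.1, while λ = 1 shrinks the slope to 1/3. Because the model is linear, the λ = 0 fit at x = 3 gives 1.1, which illustrates why LPM outputs are best read as scores rather than strict probabilities.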

Results

Sequences from three similar post-translational modifications (PTMs) of proteins occurring at, or very near, the S/T-site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. The results include:

1) The consensus composite sequon for O-glycosylation is ~(W-S/T-W), where "~" denotes the "not" operator.
2) The consensus sequon for phosphorylation is ~(W-S/T/Y/H-W), although W-S/T/Y/H-W is not an absolute inhibitor of phosphorylation.
3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations of non-O-glycosylatable sequences, although N-~P-S/T is not an absolute inhibitor of O-glycosylation.
4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins.
5) Some N-glycosylated sequences are also phosphorylated at the S/T-site of the N-~P-S/T sequon.
6) Accessible surface area (ASA) values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences.
7) Structural attributes (beta turn II, II´, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation. The LPM with both sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%.
8) With only sequence data, the KS statistic erodes to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not glycosylated. The 95% confidence interval around this misprediction rate is 16% to 26%.
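Items 7 and 8 summarize model quality with a Kolmogorov-Smirnov (KS) statistic and a misprediction rate with a 95% confidence interval. A minimal sketch, assuming the common two-sample definition of KS (the largest gap between the empirical distribution functions of the model scores for glycosylated and non-glycosylated sequences, reported above as a percentage) and a normal-approximation (Wald) interval for the rate. The abstract does not say which interval the authors used, and the function names and counts here are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_statistic(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_pos(t) - F_neg(t)| over thresholds t.

    Multiply by 100 for the percentage form (e.g. 0.99 -> 99%).
    """
    pos = sorted(pos_scores)
    neg = sorted(neg_scores)
    best = 0.0
    # Both empirical CDFs are step functions, so the gap can only
    # change at an observed score; checking those points is enough.
    for t in pos + neg:
        f_pos = bisect_right(pos, t) / len(pos)
        f_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(f_pos - f_neg))
    return best


def misprediction_rate_ci(
    mispredicted: int, total: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Point estimate and Wald (normal-approximation) CI for a proportion.

    z = 1.96 gives an approximate 95% interval.
    """
    p = mispredicted / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # The Wald interval can spill outside [0, 1]; clip it.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For example, a hypothetical 50 mispredictions out of 250 out-of-sample sequences gives a rate of 20% with a Wald interval of roughly 15.0% to 25.0%. Perfectly separated score distributions give a KS statistic of 1.0 (100%).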

Conclusions

The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation.
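The sequon rules stated in the results can be applied as simple pattern checks over a one-letter amino-acid sequence. A minimal sketch, assuming that "~(W-S/T-W)" means an S or T residue is ruled out only when tryptophan (W) sits immediately on both sides of it, and that N-~P-S/T is the motif N, then any residue except proline (P), then S or T. The function names and example sequences are hypothetical. Because the results note that some of these motifs are not absolute inhibitors, a returned position marks a candidate site, not a prediction that the residue is modified.

```python
import re
from typing import List


def n_glyc_sequons(seq: str) -> List[int]:
    """0-based positions of N in N-X-S/T motifs where X is not proline."""
    # Zero-width lookahead so overlapping motifs (e.g. "NNST") are all found.
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", seq)]


def allowed_sites(seq: str, residues: str) -> List[int]:
    """Positions of `residues` that are NOT flanked by W on both sides.

    Encodes ~(W-X-W): a site is excluded only when seq[i-1] and seq[i+1]
    are both W; sites at either end of the sequence are never excluded.
    """
    sites = []
    for i, aa in enumerate(seq):
        if aa not in residues:
            continue
        left = seq[i - 1] if i > 0 else ""
        right = seq[i + 1] if i + 1 < len(seq) else ""
        if not (left == "W" and right == "W"):
            sites.append(i)
    return sites


def o_glyc_candidates(seq: str) -> List[int]:
    """Candidate O-glycosylation sites under ~(W-S/T-W)."""
    return allowed_sites(seq, "ST")


def phospho_candidates(seq: str) -> List[int]:
    """Candidate phosphorylation sites under ~(W-S/T/Y/H-W)."""
    return allowed_sites(seq, "STYH")
```

For the hypothetical sequence AWSWNGTPY, the S at position 2 is excluded by its W-S-W flank, so `o_glyc_candidates` returns [6], `phospho_candidates` returns [6, 8], and `n_glyc_sequons` returns [4]. A regular-expression lookahead is used for the N-glycosylation motif so that overlapping sequons are not skipped.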

SUBMITTER: Gana R 

PROVIDER: S-EPMC6599295 | biostudies-literature | 2019 Jun

REPOSITORIES: biostudies-literature

Publications

Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data.

Gana Rajaram R, Vasudevan Sona S

BMC Molecular and Cell Biology, 2019 Jun 28, issue 1


Similar Datasets

| S-EPMC7245687 | biostudies-literature
| S-EPMC7204127 | biostudies-literature
| S-EPMC7702219 | biostudies-literature
| S-EPMC2819990 | biostudies-literature
| S-EPMC4377081 | biostudies-other
| S-EPMC5662232 | biostudies-literature
| S-EPMC2144550 | biostudies-other
| S-EPMC10950455 | biostudies-literature
| S-EPMC9807218 | biostudies-literature
| S-EPMC3429469 | biostudies-literature