Dataset Information

LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine.


ABSTRACT: Background: Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, is considered one of the most important processes in biology. Identifying lysine lipoylation sites is the key to a comprehensive understanding of the regulatory mechanism of lysine lipoylation. Because experimental methods are expensive and laborious, computational methods for predicting lipoylation sites are urgently needed. Methodology: In this work, a predictor named LipoSVM is developed to predict lipoylation sites accurately. To address the imbalance between positive and negative samples, the synthetic minority over-sampling technique (SMOTE) is used to balance the two classes. Furthermore, training sets with different ratios of positive to negative samples are evaluated. Results: After comparing five encoding schemes and five classification algorithms, the final LipoSVM model was built from a training set with a 1:1 ratio of positive to negative samples, combining position-specific scoring matrix (PSSM) encoding with a support vector machine (SVM). The best model achieves an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. The AUC on the independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. Analysis of lipoylated and non-lipoylated lysine fragments shows statistically significant differences. Conclusion: A good predictor for lysine lipoylation is built on the position-specific scoring matrix and the support vector machine. Meanwhile, LipoSVM, an online webserver, can be freely downloaded from https://github.com/stars20180811/LipoSVM.
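
The SMOTE balancing step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, the toy two-dimensional feature vectors, and the assumption that positive (lipoylated) samples are the minority class are all hypothetical choices made for illustration. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE idea)."""
    if n_new == 0:
        return []
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` by Euclidean distance, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(positives, negatives, k=5, seed=0):
    """Oversample positives to a 1:1 ratio (assumes positives are the minority)."""
    deficit = max(0, len(negatives) - len(positives))
    return positives + smote(positives, deficit, k=k, seed=seed), negatives


# Toy feature vectors (hypothetical; real inputs would be PSSM-derived features)
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
negatives = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0],
             [6.0, 6.0], [7.0, 5.0], [5.0, 7.0]]
balanced_pos, balanced_neg = balance(positives, negatives, k=2)
print(len(balanced_pos), len(balanced_neg))
```

In this sketch the original positive samples are kept unchanged and only synthetic ones are appended, so the balanced set can then be passed to any classifier. Oversampling would normally be applied to the training folds only, so that synthetic points do not leak into the data used for evaluation.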

SUBMITTER: Wu M 

PROVIDER: S-EPMC7235397 | biostudies-literature | 2019 Aug

REPOSITORIES: biostudies-literature

Publications

LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine.

Wu Meiqi, Lu Pengchao, Yang Yingxi, Liu Liwen, Wang Hui, Xu Yan, Chu Jixun

Current Genomics, 2019-08-01, issue 5


Similar Datasets

| S-EPMC4789356 | biostudies-literature
| S-EPMC2648769 | biostudies-literature
| S-EPMC5588793 | biostudies-literature
| S-EPMC4352747 | biostudies-other
| S-EPMC3924408 | biostudies-literature
| S-EPMC7283444 | biostudies-literature
| S-EPMC2220009 | biostudies-literature
| S-EPMC1594580 | biostudies-literature
| S-EPMC6854775 | biostudies-literature
| S-EPMC6580503 | biostudies-literature