Dataset Information

Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework.

ABSTRACT: As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. 
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
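
The abstract outlines a workflow of encoding residue windows around candidate lysines, evaluating models with randomized 10-fold cross-validation, and combining single-method models in an ensemble. A minimal sketch of those steps follows, using only the Python standard library. The abstract does not name the 11 encodings or the ensemble rule, so the binary (one-hot) encoding, the flank size, the `X` padding symbol, the probability averaging and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for windows that run past the termini


def lysine_windows(sequence, flank=3):
    """Yield (1-based position, window) for every lysine in the sequence."""
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string the lysine sits at index i + flank,
            # so the slice below is centred on it.
            yield i + 1, padded[i:i + 2 * flank + 1]


def one_hot(window):
    """Binary encoding: 20 indicator values per residue, all zeros for padding."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def soft_vote(probabilities):
    """Average the per-model probabilities predicted for one site."""
    return sum(probabilities) / len(probabilities)


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, sum(one_hot(window)))
```

In practice each fold would serve once as the held-out set while the remaining folds train the individual classifiers, and `soft_vote` would be applied to their predicted probabilities for each candidate site.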

SUBMITTER: Zhang Y

PROVIDER: S-EPMC6954445 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework.

Zhang Yanju, Xie Ruopeng, Wang Jiawei, Leier André, Marquez-Lago Tatiana T, Akutsu Tatsuya, Webb Geoffrey I, Chou Kuo-Chen, Song Jiangning

Briefings in Bioinformatics, 2019 Nov 1, issue 6

PMID: 30351377

Similar Datasets

Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration : Malonylation site prediction.
| S-EPMC7682087 | biostudies-literature

Mal-Lys: prediction of lysine malonylation sites in proteins integrated sequence-based features with mRMR feature selection.
| S-EPMC5133563 | biostudies-literature

Lysine succinylation and lysine malonylation in histones.
| S-EPMC3418837 | biostudies-literature

Incorporating hybrid models into lysine malonylation sites prediction on mammalian and plant proteins.
| S-EPMC7324624 | biostudies-literature

Metabolic Regulation by Lysine Malonylation, Succinylation, and Glutarylation.
| S-EPMC4563717 | biostudies-literature

SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome.
| S-EPMC6796762 | biostudies-literature

Global Proteomic Analysis of Lysine Malonylation in Toxoplasma gondii.
| S-EPMC7198775 | biostudies-literature

Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning.
| S-EPMC7521029 | biostudies-literature

Lysine malonylation and propionylation are prevalent in human lens proteins.
| S-EPMC6957740 | biostudies-literature

Systematic analysis of lysine malonylation in Streptococcus mutans.
| S-EPMC9742479 | biostudies-literature