Dataset Information

Prediction of protein cleavage site with feature selection by random forest.


ABSTRACT: Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can be either non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identifying these targets is extremely challenging. Current computational approaches for predicting cleavage sites are limited because they mainly represent amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, with features ranked by the maximum relevance minimum redundancy (mRMR) method and then chosen by incremental feature selection (IFS). Features describing physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure, and solvent accessibility were used to represent the peptides concerned. We compared our predictor with existing tools for predicting possible cleavage sites in candidate substrates. The comparison shows that our method makes much more reliable predictions in terms of overall prediction accuracy. In addition, the predictor can be applied to a wide range of proteinases.
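
The mRMR ranking and IFS steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy feature columns, the function names, and the `lookup_accuracy` scorer are all hypothetical, and the scorer is a simple majority-vote lookup standing in for the Random Forest evaluation the abstract describes. The sketch also assumes discrete features and the common "relevance minus mean redundancy" form of the mRMR criterion.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(columns, labels):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance to the labels minus mean redundancy with the features
    already picked."""
    relevance = {f: mutual_information(col, labels)
                 for f, col in columns.items()}
    ranked, remaining = [], list(columns)

    def score(f):
        if not ranked:
            return relevance[f]
        redundancy = sum(mutual_information(columns[f], columns[s])
                         for s in ranked) / len(ranked)
        return relevance[f] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """IFS: score the top-1, top-2, ... prefixes of the ranked list and
    keep the shortest prefix that reaches the best score."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        current = evaluate(ranked[:k])
        if current > best_score:
            best_subset, best_score = ranked[:k], current
    return best_subset, best_score


# Hypothetical toy data: 8 peptides, 1 = cleavage site, 0 = not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "f_a":      [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    "f_a_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # redundant duplicate of f_a
    "f_b":      [0, 0, 0, 1, 0, 0, 0, 0],  # complements f_a
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the labels
}


def lookup_accuracy(subset):
    """Stand-in scorer (assumption): training accuracy of a majority-vote
    lookup table. A real pipeline would use a cross-validated Random
    Forest score here instead."""
    groups = {}
    for i, y in enumerate(labels):
        key = tuple(columns[f][i] for f in subset)
        groups.setdefault(key, Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)


ranked = mrmr_rank(columns, labels)
best_subset, best_score = incremental_feature_selection(ranked, lookup_accuracy)
print(ranked, best_subset, best_score)
```

On this toy data the redundancy term pushes the duplicated feature below the weaker but complementary one, and IFS stops growing the subset once adding features no longer improves the score.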

SUBMITTER: Li BQ 

PROVIDER: S-EPMC3445488 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Prediction of protein cleavage site with feature selection by random forest.

Li Bi-Qing, Cai Yu-Dong, Feng Kai-Yan, Zhao Gui-Jun

PLoS One, 2012-09-18, issue 9


Similar Datasets

S-EPMC4206426 | biostudies-literature
S-EPMC5775496 | biostudies-literature
S-EPMC3530872 | biostudies-other
S-EPMC7508310 | biostudies-literature
E-GEOD-37858 | biostudies-arrayexpress | 2012-05-09