
Dataset Information


Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.


ABSTRACT: Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
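As a rough illustration of the evaluation described in the abstract, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from confusion-matrix counts and runs a generic incremental feature selection (IFS) loop over a pre-ranked feature list. It is a minimal sketch, not the authors' implementation: the function names, the `evaluate` callback (which would stand in for training and cross-validating a Random Forest on a feature subset), and the choice of a single score as the selection criterion are all assumptions made for illustration. The feature ranking is assumed to come from an mRMR step that is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified residues."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow the feature set one ranked feature at a time and keep the
    prefix with the best score (hypothetical helper; ties keep the
    smaller subset)."""
    best_score = float("-inf")
    best_subset: List[str] = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)  # e.g. cross-validated MCC (assumption)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
```

In practice `evaluate` would wrap whatever classifier and validation scheme is in use; here it is left as a callback so the loop stays independent of any particular library.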

SUBMITTER: Li BQ 

PROVIDER: S-EPMC3429425 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.

Li Bi-Qing, Feng Kai-Yan, Chen Lei, Huang Tao, Cai Yu-Dong

PLoS ONE, 2012-08-28, issue 8



Similar Datasets

| S-EPMC4413511 | biostudies-literature
2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress
2012-05-10 | GSE37858 | GEO
2022-05-16 | GSE189510 | GEO
| S-EPMC3530872 | biostudies-other
| S-EPMC2777180 | biostudies-literature
| S-EPMC4145740 | biostudies-literature
| S-EPMC10335767 | biostudies-literature
| S-EPMC2621338 | biostudies-literature
| S-EPMC3235115 | biostudies-literature