
Dataset Information


LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone.


ABSTRACT: BACKGROUND: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods still search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available. RESULTS: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows when encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding and non-ligand-binding sites, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. CONCLUSIONS: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
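
The pipeline outlined in the abstract (residue windows, several balanced data sets, one classifier per set, an ensemble over them) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the balanced sets are built or how window sizes are combined, so the sketch assumes simple undersampling of the majority class and averaging of classifier scores. The names `sliding_windows`, `balanced_subsets` and `ensemble_score` are hypothetical, and each classifier is represented by a plain callable standing in for a trained random forest.

```python
import random


def sliding_windows(sequence, half_width, pad="X"):
    """Return one window of 2*half_width + 1 residues centred on each position.

    The sequence ends are padded with a placeholder residue (an assumption).
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def balanced_subsets(positives, negatives, seed=0):
    """Build balanced data sets by undersampling the majority class.

    Assumption: negatives (non-binding residues) outnumber positives. The
    shuffled negatives are cut into chunks the size of the positive set, and
    each chunk is paired with all positives. Leftover negatives are dropped.
    """
    if not positives:
        raise ValueError("at least one positive sample is required")
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    n = len(positives)
    subsets = []
    for start in range(0, len(shuffled) - n + 1, n):
        subsets.append((list(positives), shuffled[start:start + n]))
    return subsets


def ensemble_score(classifiers, sample):
    """Average the scores of the per-subset classifiers.

    Each classifier is a callable returning a score in [0, 1]; in a real
    pipeline this would be a random forest trained on one balanced subset.
    """
    scores = [clf(sample) for clf in classifiers]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(sliding_windows("ACDE", 1))
    subsets = balanced_subsets(["p1", "p2"], ["n%d" % i for i in range(7)])
    print(len(subsets), "balanced subsets")
```

In practice each balanced subset would be passed to a random forest trainer, and windows of several sizes could be encoded and combined before training. Both steps are left out here because the abstract does not specify them.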

SUBMITTER: Chen P 

PROVIDER: S-EPMC4271564 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone.

Chen Peng, Huang Jianhua Z, Gao Xin

BMC Bioinformatics, 2014-12-03



Similar Datasets

S-EPMC2709252 | biostudies-literature
S-EPMC6823902 | biostudies-literature
S-EPMC4329842 | biostudies-literature
S-EPMC5132331 | biostudies-literature
S-EPMC9941877 | biostudies-literature
S-EPMC2638931 | biostudies-literature
S-EPMC3530872 | biostudies-literature
S-EPMC7988984 | biostudies-literature
S-EPMC8413034 | biostudies-literature
S-EPMC5881105 | biostudies-literature