
Dataset Information


A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction.


ABSTRACT: Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, in which binding residues are far fewer than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of machine-learning-based predictors on class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. Experimental results on protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.

SUBMITTER: Hu J 

PROVIDER: S-EPMC4168127 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction.

Hu Jun, He Xue, Yu Dong-Jun, Yang Xi-Bei, Yang Jing-Yu, Shen Hong-Bin

PLoS One, 2014-09-17; 9



Similar Datasets

| S-EPMC1847833 | biostudies-other
| S-EPMC1993824 | biostudies-literature
| S-EPMC3189935 | biostudies-literature
| S-EPMC1630469 | biostudies-literature
| S-EPMC297010 | biostudies-literature
| S-EPMC4221654 | biostudies-literature
| S-EPMC2804298 | biostudies-literature
| S-EPMC2896077 | biostudies-literature
| S-EPMC4266947 | biostudies-literature
| S-EPMC5018369 | biostudies-literature