Dataset Information

Probing an optimal class distribution for enhancing prediction and feature characterization of plant virus-encoded RNA-silencing suppressors.


ABSTRACT: To counter the host RNA silencing defense mechanism, many plant viruses encode RNA silencing suppressor proteins. These proteins share very low sequence and structural similarity with one another, which hampers their annotation by sequence similarity-based search methods. Machine learning-based methods are a suitable alternative, but their performance is affected by various factors such as class imbalance, incomplete learning, and selection of inappropriate features. In this paper, we propose a novel approach to the class imbalance problem: finding the optimal class distribution to enhance the prediction accuracy for RNA silencing suppressors. The optimal class distribution was identified by applying different resampling techniques at class distributions ranging from the natural distribution to the ideal, i.e., equal, distribution. The experimental results show that the optimal class distribution plays an important role in achieving near-perfect learning. The best prediction results were obtained with the Sequential Minimal Optimization (SMO) learning algorithm. We achieved a sensitivity of 98.5%, a specificity of 92.6%, and an overall accuracy of 95.3% in tenfold cross-validation, and these results were further validated with a leave-one-out cross-validation test. We also observed that machine learning models trained on training sets oversampled with the synthetic minority oversampling technique (SMOTE) performed better than models trained on either randomly undersampled or imbalanced training data sets. Further, we characterized the important discriminatory sequence features that distinguish RNA-silencing suppressors from other protein families.
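The resampling scheme described in the abstract (SMOTE oversampling and random undersampling at class ratios between the natural and the equal distribution, scored by sensitivity, specificity and accuracy) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' pipeline: the function names `smote`, `oversample_to_ratio`, `undersample_to_ratio` and `sens_spec_acc` are hypothetical, samples are assumed to be numeric feature vectors compared by Euclidean distance with k = 5 neighbours, and the SMO classifier itself is not reproduced.

```python
import math
import random

# Hypothetical helpers illustrating the resampling scheme; not the authors' code.
# Samples are lists of numeric features; Euclidean distance is an assumption.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic samples, each a random point on the line
    segment between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if n_new > 0 and len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: euclidean(base, s))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

def oversample_to_ratio(majority, minority, ratio, k=5, rng=None):
    """Add SMOTE samples until len(minority) is about ratio * len(majority)."""
    n_new = max(0, round(ratio * len(majority)) - len(minority))
    return majority, minority + smote(minority, n_new, k, rng)

def undersample_to_ratio(majority, minority, ratio, rng=None):
    """Randomly keep majority samples until len(minority) is about ratio * len(majority)."""
    rng = rng or random.Random(0)
    keep = min(len(majority), round(len(minority) / ratio))
    return rng.sample(majority, keep), minority

def sens_spec_acc(y_true, y_pred, positive=1):
    """Sensitivity TP/(TP+FN), specificity TN/(TN+FP), accuracy (TP+TN)/N."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)

if __name__ == "__main__":
    rng = random.Random(42)
    # Toy imbalanced data: 100 majority vs 20 minority two-feature samples.
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    natural = len(minority) / len(majority)
    # Sweep from the natural distribution up to the equal (1:1) distribution.
    for ratio in (natural, 0.5, 0.75, 1.0):
        _, over_min = oversample_to_ratio(majority, minority, ratio, rng=rng)
        under_maj, _ = undersample_to_ratio(majority, minority, ratio, rng=rng)
        print(f"ratio {ratio:.2f}: SMOTE -> {len(over_min)} minority, "
              f"undersampling -> {len(under_maj)} majority")
```

In a full experiment, each resampled training set would typically be used to train a classifier (the paper reports SMO) and scored with sensitivity, specificity and accuracy under cross-validation; the ratio giving the best scores plays the role of the optimal class distribution. For real work, the imbalanced-learn library provides maintained `SMOTE` and `RandomUnderSampler` implementations that accept a `sampling_strategy` ratio.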

SUBMITTER: Nath A 

PROVIDER: S-EPMC4801844 | biostudies-other | 2016 Jun

REPOSITORIES: biostudies-other

Publications

Probing an optimal class distribution for enhancing prediction and feature characterization of plant virus-encoded RNA-silencing suppressors.

Abhigyan Nath, Karthikeyan Subbiah

3 Biotech, 21 March 2016, issue 1


Similar Datasets

| S-EPMC4020838 | biostudies-literature
| S-EPMC3682945 | biostudies-literature
| S-EPMC524217 | biostudies-literature
| S-EPMC5599751 | biostudies-literature
| S-EPMC2800190 | biostudies-literature
| S-EPMC4049077 | biostudies-literature
| S-EPMC3911637 | biostudies-literature
| S-EPMC7924679 | biostudies-literature
| S-EPMC3737526 | biostudies-literature
| S-EPMC3773815 | biostudies-literature