Dataset Information

Prediction of nuclear proteins using SVM and HMM models.


ABSTRACT:

Background

The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods for predicting nuclear proteins have been developed in the past. The aim of the present study is to develop a new method that predicts nuclear proteins with higher accuracy.

Results

All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. Firstly, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions; they achieved a Matthews correlation coefficient (MCC) of 0.59 and 0.61, respectively. Secondly, we developed SVM modules using split amino acid composition (SAAC) and achieved a maximum MCC of 0.66. Thirdly, a hidden Markov model (HMM) based module/profile was developed for searching for exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile; it achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than the existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins to be nuclear proteins in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins: http://www.imtech.res.in/raghava/nppred/.
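The MCC and accuracy figures reported above are both derived from the counts of a binary confusion matrix. As a minimal sketch, assuming nuclear proteins are the positive class, the two measures can be computed as follows. The function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the denominator vanishes),
    which is a common convention and an assumption made here.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    tp, tn, fp, fn = 90, 80, 20, 10
    print("MCC: %.3f" % matthews_corrcoef(tp, tn, fp, fn))
    print("Accuracy: %.2f%%" % (100 * accuracy(tp, tn, fp, fn)))
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it is less likely to look favorable when one class dominates the dataset.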

Conclusion

This study describes a highly accurate method for predicting nuclear proteins. For the first time, an SVM module for predicting nuclear proteins has been developed using SAAC, in which the amino acid compositions of the N-terminus and of the remaining protein are computed separately. In addition, our study is the first to identify exclusively nuclear and non-nuclear domains and to use them for predicting nuclear proteins. The performance of the method improved further when both approaches were combined.
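The split amino acid composition idea can be illustrated with a short sketch: the composition of the N-terminal segment and that of the remaining sequence are computed separately and concatenated into one feature vector. The abstract does not state the N-terminal length, so `n_term_len=25` below is a hypothetical placeholder, and the function names and the percentage scaling are likewise assumptions rather than the authors' implementation.

```python
# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Percentage of each standard amino acid in seq (20 values).

    Non-standard residues (e.g. X) are ignored; an empty segment
    yields a vector of zeros.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def split_amino_acid_composition(seq, n_term_len=25):
    """Concatenate the composition of the N-terminus and of the remainder.

    n_term_len is a hypothetical cut-off chosen for illustration only.
    Returns a 40-dimensional feature vector.
    """
    n_terminus = seq[:n_term_len]
    remainder = seq[n_term_len:]
    return amino_acid_composition(n_terminus) + amino_acid_composition(remainder)


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    features = split_amino_acid_composition("MKRKRAAAAGGGGLLLL", n_term_len=5)
    print(len(features))
```

A vector of this kind could then be passed to any SVM implementation; keeping the two segments separate preserves N-terminal signal that would be averaged out in a whole-sequence composition.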

SUBMITTER: Kumar M 

PROVIDER: S-EPMC2632991 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC4045734 | biostudies-literature
| S-EPMC1891633 | biostudies-literature
| S-EPMC1564421 | biostudies-literature
| S-EPMC3124801 | biostudies-literature
| S-EPMC2586131 | biostudies-literature
| S-EPMC3602657 | biostudies-literature
| S-EPMC2254373 | biostudies-literature
| S-EPMC3521467 | biostudies-literature
| S-EPMC2837750 | biostudies-literature
| S-EPMC3236849 | biostudies-literature