
Dataset Information


Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation.


ABSTRACT:

BACKGROUND: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identifying DNA-binding proteins is one of the major challenges in genome annotation. Several computational methods have been proposed in the literature for this task. However, most of them do not provide a valuable knowledge base for understanding DNA-protein interactions.

RESULTS: We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting representations are fed into an SVM classifier, which predicts whether a sequence binds DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on the recently constructed independent dataset PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, outperforming some existing state-of-the-art methods.

CONCLUSIONS: The experimental results demonstrate that PSSM Distance Transformation is an effective protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
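
To illustrate how a distance transformation can turn PSSM profiles of different lengths into a uniform numeric representation, here is a minimal Python sketch. The abstract does not give the formula, so the definition used here is an assumption: for each distance d and each ordered pair of PSSM columns (a, b), the feature is the average of the product of the score in column a at position j and the score in column b at position j + d. The function name `pssm_distance_transform` and the parameter `max_distance` are hypothetical and are not taken from the SVM-PSSM-DT software.

```python
from typing import List

# Rows are sequence positions; columns are amino acid types (20 in a real PSSM).
Matrix = List[List[float]]


def pssm_distance_transform(pssm: Matrix, max_distance: int) -> List[float]:
    """Map an L x A PSSM to a fixed-length vector of A * A * max_distance values.

    Assumed definition (not stated in the abstract): for each distance d and
    each ordered column pair (a, b), average pssm[j][a] * pssm[j + d][b]
    over all positions j with j + d < L.
    """
    length = len(pssm)
    if length <= max_distance:
        raise ValueError("sequence must be longer than max_distance")
    n_cols = len(pssm[0])
    features: List[float] = []
    for d in range(1, max_distance + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(pssm[j][a] * pssm[j + d][b] for j in range(length - d))
                features.append(total / (length - d))
    return features


# Toy example: 3 positions, 2 columns instead of 20, to keep the numbers readable.
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_features = pssm_distance_transform(toy_pssm, max_distance=1)
```

Under this assumption the output length depends only on the number of columns and on `max_distance`, not on the sequence length, which is what allows a fixed-input classifier such as an SVM to consume the vectors.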

SUBMITTER: Xu R 

PROVIDER: S-EPMC4331676 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation.

Xu Ruifeng, Zhou Jiyun, Wang Hongpeng, He Yulan, Wang Xiaolong, Liu Bin

BMC Systems Biology, 2015-02-06



Similar Datasets

| S-EPMC3787635 | biostudies-literature
| S-EPMC5618505 | biostudies-literature
| S-EPMC4057401 | biostudies-literature
| S-EPMC5519637 | biostudies-literature
| S-EPMC4097812 | biostudies-other
| S-EPMC5409512 | biostudies-literature
| S-EPMC4143758 | biostudies-literature
| S-EPMC1764469 | biostudies-literature
| S-EPMC5998058 | biostudies-literature
| S-EPMC1906837 | biostudies-other