
Dataset Information


PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context.


ABSTRACT: Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of the target residue for prediction. In the present study, for each target residue, we used both the spatial context and the sequence context to construct the feature space. Latent Semantic Analysis (LSA) was then applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining the two yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighboring binding sites are important for protein-DNA recognition and binding ability. A comparison between PDNAsite and existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for the predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community.
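
The distinction the abstract draws between sequence context and spatial context can be illustrated with a minimal sketch. This is not the PDNAsite implementation: the function names, the window half-width, the padding character, the nearest-neighbour definition of spatial context, and the toy coordinates are all assumptions made for illustration, and the abstract does not say how the authors obtain or define the spatial neighbourhood.

```python
import math

def sequence_context(seq, i, half_width=2, pad="X"):
    """Residues within half_width positions of index i along the chain.

    Hypothetical helper: positions past either end are filled with pad.
    """
    padded = pad * half_width + seq + pad * half_width
    return padded[i : i + 2 * half_width + 1]

def spatial_context(coords, i, k=2):
    """Indices of the k residues closest to residue i in 3D space.

    Hypothetical helper: assumes one (x, y, z) coordinate per residue.
    """
    others = [j for j in range(len(coords)) if j != i]
    others.sort(key=lambda j: math.dist(coords[i], coords[j]))
    return others[:k]

# Toy chain: residue 4 is far from residue 0 in sequence but close in space.
seq = "MKRLA"
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0.5, 0.5, 0)]

print(sequence_context(seq, 0))   # XXMKR
print(spatial_context(coords, 0)) # [4, 1]
```

In this toy example the sequence window around residue 0 never reaches residue 4, while the spatial neighbourhood does, which is the kind of extra information the abstract attributes to spatial context.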

SUBMITTER: Zhou J 

PROVIDER: S-EPMC4901350 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature


Publications

PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context.

Zhou Jiyun, Xu Ruifeng, He Yulan, Lu Qin, Wang Hongpeng, Kong Bing

Scientific Reports, 2016-06-10



Similar Datasets

| S-EPMC5815098 | biostudies-literature
| S-EPMC3259773 | biostudies-literature
| S-EPMC3156709 | biostudies-literature
| S-EPMC5930480 | biostudies-literature
2013-05-25 | GSE46611 | GEO
2013-05-25 | E-GEOD-46611 | biostudies-arrayexpress
| S-EPMC3478195 | biostudies-literature
| S-EPMC3818907 | biostudies-literature
| S-EPMC2777810 | biostudies-literature
| S-EPMC1144969 | biostudies-other