
Dataset Information


Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences.


ABSTRACT:

Background

DNA-binding proteins play important roles in many biological processes. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA) and are accordingly categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences helps to annotate protein functions and to understand binding specificity. In this study, we systematically consider several schemes for representing protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles, and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs.
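As an illustration of the composition-based representations named above, the following is a minimal sketch, not the authors' implementation. The function names, the handling of non-standard residues, and the terminal segment length used for the split composition are assumptions made for this example; PSSM profiles are omitted because they require an external alignment tool.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def overall_composition(seq):
    """Return 20 fractions: the share of each standard amino acid in seq."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions: the share of each adjacent residue pair in seq.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def split_composition(seq, terminal_len=25):
    """Return 60 fractions: composition of the N-terminal, middle and
    C-terminal segments. terminal_len is a hypothetical choice, not a
    value taken from the paper.
    """
    n_term = seq[:terminal_len]
    middle = seq[terminal_len:len(seq) - terminal_len]
    c_term = seq[len(seq) - terminal_len:]
    features = []
    for segment in (n_term, middle, c_term):
        features.extend(overall_composition(segment))
    return features
```

Concatenating the three vectors gives a 480-dimensional feature vector per protein, which could then be passed to any standard classifier.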

Results

Our results suggest that some sequence features can significantly differentiate DSBs from SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing.
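The evaluation protocol mentioned here (10-fold cross-validation scored by accuracy and AUC) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the folds are not stratified by class, and the classifier itself is left out, so the code does not reproduce the reported figures.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once; each sample lands in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive (label 1) scores higher than a random negative (label 0),
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, a classifier would be trained on each `train` split and scored on the matching `test` split, with the metrics computed over the pooled or averaged held-out predictions.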

Conclusions

Using various sequence-derived features, we propose a novel method that accurately distinguishes DSBs from SSBs. The method also explores novel features that could help to reveal the binding specificity of DNA-binding proteins.

SUBMITTER: Wang W 

PROVIDER: S-EPMC5469069 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature


Publications

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences.

Wang W, Sun L, Zhang S, Zhang H, Shi J, Xu T, Li K

BMC Bioinformatics 2017-06-12 1



Similar Datasets

| S-EPMC4243121 | biostudies-literature
| S-EPMC8687298 | biostudies-literature
| S-EPMC6982935 | biostudies-literature
| S-EPMC5587031 | biostudies-literature
| S-EPMC2248737 | biostudies-literature
| S-EPMC4176320 | biostudies-literature
| S-EPMC3688935 | biostudies-literature
| S-EPMC3033795 | biostudies-literature
| S-EPMC3226063 | biostudies-literature
| S-EPMC9496475 | biostudies-literature