
Dataset Information


Protein Secondary Structure Prediction Based on Data Partition and Semi-Random Subspace Method.


ABSTRACT: Protein secondary structure prediction is one of the most important and challenging problems in bioinformatics. Machine learning techniques have been applied to this problem with substantial success; however, there is still room for improvement toward the theoretical limit. In this paper, we present a novel method for protein secondary structure prediction based on data partition and a semi-random subspace method (PSRSM). Data partitioning is central to the method. First, the protein training dataset was partitioned into several subsets according to protein sequence length. Then, on each subset, base classifiers were trained on subspace data generated by the semi-random subspace method and combined into an ensemble classifier by majority vote. This yields one ensemble classifier per subset, and the secondary structure of each protein is predicted by the classifier that corresponds to its sequence length. Experiments were performed on the 25PDB, CB513, CASP10, CASP11, CASP12, and T100 datasets, with performance of 86.38%, 84.53%, 85.51%, 85.89%, 85.55%, and 85.09%, respectively. The experimental results show that our method outperforms other state-of-the-art methods.
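The abstract describes the pipeline only at a high level: partition by sequence length, train base classifiers on semi-random feature subspaces, combine them by majority vote within each subset, and route each query protein to the ensemble for its length. The Python sketch below illustrates those stages under explicit assumptions. The length boundaries (`LENGTH_BOUNDS`), the reading of "semi-random" as a fixed core of features plus randomly drawn extras, and the nearest-centroid base learner are all hypothetical placeholders, not the authors' settings or code. Per-residue feature vectors are taken as given.

```python
import random
from bisect import bisect_right
from collections import Counter

# HYPOTHETICAL length boundaries: proteins shorter than 100 residues go to
# subset 0, 100-199 residues to subset 1, and 200 or more to subset 2.
LENGTH_BOUNDS = [100, 200]


def length_bin(seq_len, bounds=LENGTH_BOUNDS):
    """Index of the length-based subset a protein of seq_len residues falls into."""
    return bisect_right(bounds, seq_len)


def semi_random_subspace(n_features, core, k_random, rng):
    """ASSUMED semi-random subspace: always keep the 'core' feature indices and
    add k_random indices drawn at random from the remaining features."""
    rest = [i for i in range(n_features) if i not in core]
    return sorted(set(core) | set(rng.sample(rest, k_random)))


class NearestCentroid:
    """Placeholder base classifier (not necessarily the learner used in PSRSM)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


class SubspaceEnsemble:
    """Base classifiers trained on different feature subspaces, combined by majority vote."""

    def __init__(self, n_members, core, k_random, seed=0):
        self.n_members, self.core, self.k_random = n_members, core, k_random
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n_features = len(X[0])
        self.members = []
        for _ in range(self.n_members):
            idx = semi_random_subspace(n_features, self.core, self.k_random, self.rng)
            clf = NearestCentroid().fit([[x[i] for i in idx] for x in X], y)
            self.members.append((idx, clf))
        return self

    def predict_one(self, x):
        votes = Counter(clf.predict_one([x[i] for i in idx]) for idx, clf in self.members)
        return votes.most_common(1)[0][0]  # majority vote


class LengthPartitionedPredictor:
    """One ensemble per length subset; each query is routed by its protein length."""

    def __init__(self, **ensemble_kwargs):
        self.kwargs = ensemble_kwargs

    def fit(self, samples):
        # samples: iterable of (feature_vector, ss_label, protein_length), one per residue
        groups = {}
        for x, label, length in samples:
            X, y = groups.setdefault(length_bin(length), ([], []))
            X.append(x)
            y.append(label)
        self.models = {b: SubspaceEnsemble(**self.kwargs).fit(X, y)
                       for b, (X, y) in groups.items()}
        return self

    def predict_one(self, x, protein_length):
        b = length_bin(protein_length)
        if b not in self.models:  # no training data in this bin: use the nearest trained bin
            b = min(self.models, key=lambda k: abs(k - b))
        return self.models[b].predict_one(x)


# Toy usage with synthetic, well-separated classes (not real protein features).
data = []
for length in (50, 150, 250):
    for label, center in (("H", 1.0), ("E", 5.0), ("C", 9.0)):
        for offset in (-0.2, 0.2):
            data.append(([center + offset] * 4, label, length))

model = LengthPartitionedPredictor(n_members=5, core=[0], k_random=2).fit(data)
print(model.predict_one([5.1, 4.9, 5.0, 5.2], protein_length=180))  # -> E
```

Keeping a fixed core of features in every subspace (here index 0) is one plausible way to make the subspaces only partly random; with a sliding-window encoding, the features of the central residue would be a natural choice for that core, but this too is an assumption.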

SUBMITTER: Ma Y 

PROVIDER: S-EPMC6026213 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature


Publications

Protein Secondary Structure Prediction Based on Data Partition and Semi-Random Subspace Method.

Ma Yuming, Liu Yihui, Cheng Jinyong

Scientific Reports, 2018 Jun 29; issue 1



Similar Datasets

| S-EPMC1780123 | biostudies-literature
| S-EPMC8240957 | biostudies-literature
| S-EPMC1479840 | biostudies-literature
| S-EPMC4896422 | biostudies-literature
| S-EPMC3524942 | biostudies-literature
| S-EPMC4759688 | biostudies-literature
| S-EPMC3679165 | biostudies-literature
| S-EPMC3473038 | biostudies-literature
| S-EPMC4931104 | biostudies-literature
| S-EPMC5586386 | biostudies-literature