
Dataset Information


PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.


ABSTRACT: The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
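The abstract does not define the n-star consensus rule, so the following is only a minimal sketch under a stated assumption: a residue is labeled as linker when at least n of the classifiers vote linker. The function names, the 0/1 encoding (1 = linker, 0 = domain), and the choice of linker as the positive class are illustrative assumptions, not taken from the PDP-CON source code. The helper for accuracy and F-measure uses the standard definitions of those metrics.

```python
from typing import List, Sequence, Tuple


def n_star_consensus(votes: Sequence[Sequence[int]], n: int) -> List[int]:
    """Combine per-residue votes from several classifiers.

    votes[c][i] is 1 if classifier c labels residue i as linker, else 0.
    ASSUMPTION: a residue is called linker when at least n classifiers agree.
    """
    if not votes:
        return []
    length = len(votes[0])
    if any(len(v) != length for v in votes):
        raise ValueError("all classifiers must label the same number of residues")
    if not 1 <= n <= len(votes):
        raise ValueError("n must be between 1 and the number of classifiers")
    return [1 if sum(v[i] for v in votes) >= n else 0 for i in range(length)]


def accuracy_and_f_measure(predicted: Sequence[int],
                           actual: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F-measure), treating label 1 as the positive class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one residue is required")
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    accuracy = (tp + tn) / len(actual)
    # F-measure is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f_measure = (2 * tp / denominator) if denominator else 0.0
    return accuracy, f_measure


# Hypothetical votes from six classifiers over a five-residue sequence.
votes = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
consensus = n_star_consensus(votes, n=3)
```

Raising n makes the consensus stricter: with n equal to the number of classifiers, a residue is labeled linker only when every classifier agrees, while n = 1 accepts any single vote.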

SUBMITTER: Chatterjee P 

PROVIDER: S-EPMC4788683 | biostudies-literature | 2016 Apr

REPOSITORIES: biostudies-literature


Publications

PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

Piyali Chatterjee, Subhadip Basu, Julian Zubek, Mahantapas Kundu, Mita Nasipuri, Dariusz Plewczynski

Journal of Molecular Modeling, 2016-03-11, issue 4



Similar Datasets

| S-EPMC4290662 | biostudies-literature
| S-EPMC5741869 | biostudies-literature
| S-EPMC3710640 | biostudies-literature
| S-EPMC4908364 | biostudies-literature
| S-EPMC5737734 | biostudies-literature
| S-EPMC3397139 | biostudies-literature
| S-EPMC3978449 | biostudies-literature
| S-EPMC137409 | biostudies-literature
| S-EPMC4529986 | biostudies-literature
| S-EPMC3380730 | biostudies-literature