Dataset Information

FIEFDom: a transparent domain boundary recognition system using a fuzzy mean operator.


ABSTRACT: Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator assigns each residue of the query sequence a likelihood value of belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS, and on a database whose domain definitions differ from those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.
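The abstract describes assigning each residue a boundary likelihood and then reading off contiguous boundary regions. A minimal sketch of that last step only is given below. It is an illustration under stated assumptions, not the FIEFDom implementation: the function name `boundary_regions`, the `threshold` and `min_length` parameters, and the example scores are all hypothetical, and the fuzzy mean operator that would produce the likelihoods is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple


def boundary_regions(
    likelihoods: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    min_length: int = 1,     # hypothetical minimum run length, in residues
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, 0-based and inclusive, for each
    contiguous run of residues whose likelihood is >= threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the run currently open
    for i, value in enumerate(likelihoods):
        if value >= threshold:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            # The run ended at residue i - 1; keep it if it is long enough.
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(likelihoods) - start >= min_length:
        regions.append((start, len(likelihoods) - 1))
    return regions


# Made-up per-residue scores for a 10-residue toy sequence.
scores = [0.1, 0.2, 0.7, 0.8, 0.6, 0.3, 0.1, 0.9, 0.2, 0.55]
print(boundary_regions(scores))                # [(2, 4), (7, 7), (9, 9)]
print(boundary_regions(scores, min_length=2))  # [(2, 4)]
```

Raising `min_length` discards isolated high-scoring residues, which is one simple way to favor contiguous regions over single-residue spikes. Whether FIEFDom applies a comparable filter is not stated in this record.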

SUBMITTER: Bondugula R 

PROVIDER: S-EPMC2632928 | biostudies-literature | 2009 Feb

REPOSITORIES: biostudies-literature

Publications

FIEFDom: a transparent domain boundary recognition system using a fuzzy mean operator.

Bondugula R, Lee MS, Wallqvist A

Nucleic Acids Research, 2008 Dec 4, issue 2


Similar Datasets

| S-EPMC7274753 | biostudies-literature
| S-EPMC8791938 | biostudies-literature
| S-EPMC7303698 | biostudies-literature
| S-EPMC4609292 | biostudies-literature
| S-EPMC2926617 | biostudies-literature
| S-EPMC6134161 | biostudies-literature
| S-EPMC6189833 | biostudies-other
| S-EPMC1764483 | biostudies-literature
| S-EPMC5996312 | biostudies-literature
| S-EPMC126071 | biostudies-literature