Dataset Information

Effective moment feature vectors for protein domain structures.


ABSTRACT: Image processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances between residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are then extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100-400) or complicated mathematical operations, so finding fewer but more effective features is desirable. In this paper, based on some key observations on DMs, we decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features from the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors represent domain structures effectively in three respects. (1) They give a higher accuracy for domain classification. (2) They give a clear and consistent distribution of domains in our proposed structural vector space. (3) They allow us to cluster the domains according to our moment features and to demonstrate a relationship between structural variation and functional diversity.
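The two building blocks the abstract names, a residue distance matrix and image moments of a binary image, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy coordinates, and the 6.0 distance cutoff used to binarize the matrix are all assumptions made for illustration, and the paper's decomposition into four binary images and its 45 moment features are not reproduced here.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (N x 3)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_moment(image, p, q):
    """Raw image moment M_pq = sum over pixels of row^p * col^q * I(row, col)."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    return float((rows ** p * cols ** q * image).sum())


def central_moment(image, p, q):
    """Central moment mu_pq, taken about the image centroid."""
    image = np.asarray(image, dtype=float)
    m00 = raw_moment(image, 0, 0)
    if m00 == 0:
        return 0.0  # empty image: no centroid is defined
    row_bar = raw_moment(image, 1, 0) / m00
    col_bar = raw_moment(image, 0, 1) / m00
    rows, cols = np.indices(image.shape)
    return float(((rows - row_bar) ** p * (cols - col_bar) ** q * image).sum())


# Toy coordinates for four residues (hypothetical, not from the dataset).
coords = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.6, 0.0, 0.0],
    [7.6, 3.8, 0.0],
])
dm = distance_matrix(coords)

# Hypothetical cutoff: mark residue pairs closer than 6.0 as "on" pixels.
binary = (dm < 6.0).astype(int)

area = raw_moment(binary, 0, 0)       # number of "on" pixels
mu20 = central_moment(binary, 2, 0)   # spread along rows
mu02 = central_moment(binary, 0, 2)   # spread along columns
```

Because a distance matrix is symmetric, the binary image is symmetric too, so `mu20` and `mu02` coincide in this toy case; moment features of this kind summarize the shape of the "on" region in a handful of numbers rather than hundreds of raw pixels.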

SUBMITTER: Shi JY 

PROVIDER: S-EPMC3877117 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Effective moment feature vectors for protein domain structures.

Shi Jian-Yu, Yiu Siu-Ming, Zhang Yan-Ning, Chin Francis Yuk-Lun

PLoS ONE, 2013-12-31, issue 12

Similar Datasets

| S-EPMC2832692 | biostudies-literature
| S-EPMC4452346 | biostudies-literature
| S-EPMC2098860 | biostudies-literature
| S-EPMC4547178 | biostudies-literature
| S-EPMC2666633 | biostudies-literature
| S-EPMC10833351 | biostudies-literature
| S-EPMC1635331 | biostudies-literature
| S-EPMC3376124 | biostudies-other
| S-EPMC7924679 | biostudies-literature
| S-EPMC4731830 | biostudies-literature