
Dataset Information


Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues.


ABSTRACT: The emergence of structural genomics presents significant challenges in the annotation of biologically uncharacterized proteins. Unfortunately, our ability to analyze these proteins is restricted by the limited catalog of known molecular functions and their associated 3D motifs. In order to identify novel 3D motifs that may be associated with molecular functions, we employ an unsupervised, two-phase clustering approach that combines k-means and hierarchical clustering with knowledge-informed cluster selection and annotation methods. We applied the approach to approximately 20,000 cysteine-based protein microenvironments (3D regions 7.5 Å in radius) and identified 70 interesting clusters, some of which represent known motifs (e.g. metal binding and phosphatase activity), and some of which are novel, including several zinc binding sites. Detailed annotation results are available online for all 70 clusters at http://feature.stanford.edu/clustering/cys. The use of microenvironments instead of backbone geometric criteria enables flexible exploration of protein function space, and detection of recurring motifs that are discontinuous in sequence and diverse in structure. Clustering microenvironments may thus help to functionally characterize novel proteins and better understand the protein structure-function relationship.
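The two-phase idea named in the abstract (k-means followed by hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature vectors below are synthetic 2D points rather than real microenvironment descriptors, the function names (`kmeans`, `merge_centroids`, `two_phase`) are hypothetical, the choice of merging by distance between group centroids is an assumption (the abstract does not state the linkage), and the knowledge-informed cluster selection and annotation steps are omitted entirely.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(points, k, iters=50, seed=0):
    """Phase 1 (assumed form): Lloyd's k-means, deliberately over-segmenting."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def merge_centroids(centroids, n_final):
    """Phase 2 (assumed form): agglomerative merging of the k-means centroids."""
    groups = [[i] for i in range(len(centroids))]
    while len(groups) > n_final:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = sq_dist(mean([centroids[i] for i in groups[a]]),
                            mean([centroids[i] for i in groups[b]]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    return groups


def two_phase(points, k, n_final, seed=0):
    """Map each point to a coarse cluster via its fine k-means cluster."""
    centroids, labels = kmeans(points, k, seed=seed)
    groups = merge_centroids(centroids, n_final)
    fine_to_coarse = {i: g for g, members in enumerate(groups) for i in members}
    return [fine_to_coarse[lab] for lab in labels]


# Synthetic stand-in data: three noisy 2D blobs (NOT real microenvironments).
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(30)]
labels = two_phase(points, k=9, n_final=3)
print(sorted(set(labels)))
```

Over-segmenting with k-means first keeps the quadratic-cost hierarchical step small, since it operates on k centroids rather than on every point; that is one plausible motivation for a two-phase design on roughly 20,000 items, though the record above does not spell out the authors' reasoning.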

SUBMITTER: Wu S 

PROVIDER: S-EPMC2833161 | biostudies-literature | 2010 Feb

REPOSITORIES: biostudies-literature


Publications

Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues.

Shirley Wu, Tianyun Liu, Russ B. Altman

BMC Structural Biology, 2010-02-02



Similar Datasets

| S-EPMC5227694 | biostudies-literature
| S-EPMC3092797 | biostudies-literature
2009-05-05 | GSE12262 | GEO
| S-EPMC5321175 | biostudies-other
| S-EPMC2777906 | biostudies-literature
| S-EPMC6376049 | biostudies-literature
| S-EPMC8900821 | biostudies-literature
| S-EPMC3536310 | biostudies-literature
| S-EPMC6410768 | biostudies-literature
| S-EPMC4359970 | biostudies-literature