
Dataset Information


Clustering of protein domains for functional and evolutionary studies.


ABSTRACT: BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than around a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
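
As an illustration only, the clustering step described in the abstract can be approximated by a generic deterministic annealing EM over motif columns. The sketch below is not the authors' implementation: the function name `daem_cluster`, the per-column categorical mixture model, the pseudocount, the temperature schedule `betas`, and the toy motifs are all assumptions made for the example. The abstract does not define the split statistic or the specificity score, so the maximum posterior probability stands in for a specificity-like score.

```python
import math
import random


def daem_cluster(motifs, k=2, betas=(0.5, 0.75, 1.0), iters=25, pseudo=0.1, seed=0):
    """Hypothetical sketch: deterministic-annealing EM for a mixture of
    per-column categorical distributions over equal-length motif strings.
    Returns (labels, scores), where each score is the largest posterior
    probability of a sequence, used here as a stand-in specificity score."""
    rng = random.Random(seed)
    n, width = len(motifs), len(motifs[0])
    alphabet = sorted({ch for m in motifs for ch in m})

    # Start from slightly asymmetric soft assignments to break symmetry.
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.5 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for beta in betas:  # beta is the inverse temperature; 1.0 is ordinary EM
        for _ in range(iters):
            # M-step: mixture weights and smoothed per-column residue frequencies.
            weights = [max(sum(r[j] for r in resp) / n, 1e-12) for j in range(k)]
            freqs = []
            for j in range(k):
                columns = []
                for c in range(width):
                    counts = {a: pseudo for a in alphabet}
                    for i in range(n):
                        counts[motifs[i][c]] += resp[i][j]
                    total = sum(counts.values())
                    columns.append({a: v / total for a, v in counts.items()})
                freqs.append(columns)

            # E-step: posteriors tempered by beta (the annealing part of DAEM).
            for i in range(n):
                logp = [
                    beta * (math.log(weights[j])
                            + sum(math.log(freqs[j][c][motifs[i][c]])
                                  for c in range(width)))
                    for j in range(k)
                ]
                top = max(logp)
                expd = [math.exp(v - top) for v in logp]
                total = sum(expd)
                resp[i] = [e / total for e in expd]

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    scores = [max(r) for r in resp]
    return labels, scores


# Toy motifs (invented for illustration): three selected columns per sequence.
toy_motifs = ["AKD", "AKD", "AKE", "GRS", "GRS", "GRT"]
labels, scores = daem_cluster(toy_motifs)
```

In this sketch, a sequence whose motif fits both subtypes about equally well would receive a score near 0.5, which mirrors the abstract's observation that false assignments usually had a low specificity score.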

SUBMITTER: Goldstein P 

PROVIDER: S-EPMC2770074 | biostudies-literature | 2009

REPOSITORIES: biostudies-literature


Publications

Clustering of protein domains for functional and evolutionary studies.

Pavle Goldstein, Jurica Zucko, Dusica Vujaklija, Anita Krisko, Daslav Hranueli, Paul F. Long, Catherine Etchebest, Bojan Basrak, John Cullum

BMC Bioinformatics, 2009-10-15



Similar Datasets

| S-EPMC2394772 | biostudies-literature
| S-EPMC4760082 | biostudies-literature
| S-EPMC4256011 | biostudies-literature
| S-EPMC8670047 | biostudies-literature
| S-EPMC3501052 | biostudies-literature
| S-EPMC1669739 | biostudies-literature
| S-EPMC33283 | biostudies-literature
| S-EPMC99888 | biostudies-literature
| S-EPMC3429928 | biostudies-literature
| S-EPMC8409312 | biostudies-literature