Dataset Information

Top-down clustering for protein subfamily identification.


ABSTRACT: We propose a novel method for the task of protein subfamily identification; that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and hence more accurate subfamily identification, while also indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis. The novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
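
The top-down idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the split criterion (reduction in total pairwise Hamming distance), the min_size stopping rule used in place of the post-pruning step, and all function names (hamming, best_split, build_tree, classify) are assumptions made for illustration only. It assumes the input is a list of equal-length aligned sequences.

```python
from itertools import combinations


def hamming(a, b):
    """Number of alignment columns where two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def total_pairwise_distance(seqs):
    """Sum of Hamming distances over all pairs in a cluster."""
    return sum(hamming(a, b) for a, b in combinations(seqs, 2))


def best_split(seqs, min_size):
    """Find the test 'sequence has `residue` at column `pos`' that most
    reduces within-cluster distance (an assumed heuristic).
    Returns (pos, residue, yes_group, no_group) or None."""
    base = total_pairwise_distance(seqs)
    best, best_gain = None, 0
    for pos in range(len(seqs[0])):
        for residue in sorted({s[pos] for s in seqs}):
            yes = [s for s in seqs if s[pos] == residue]
            no = [s for s in seqs if s[pos] != residue]
            if len(yes) < min_size or len(no) < min_size:
                continue
            gain = base - total_pairwise_distance(yes) - total_pairwise_distance(no)
            if gain > best_gain:
                best, best_gain = (pos, residue, yes, no), gain
    return best


def build_tree(seqs, min_size=2):
    """Recursively divide the alignment top-down; each internal node
    records the (column, residue) test that defines the division."""
    split = best_split(seqs, min_size)
    if split is None:
        return {"leaf": seqs}
    pos, residue, yes, no = split
    return {
        "test": (pos, residue),
        "yes": build_tree(yes, min_size),
        "no": build_tree(no, min_size),
    }


def classify(tree, seq):
    """Assign a new aligned sequence to a cluster by following the tests."""
    while "leaf" not in tree:
        pos, residue = tree["test"]
        tree = tree["yes"] if seq[pos] == residue else tree["no"]
    return tree["leaf"]
```

Because each internal node stores a single column/residue test, a new aligned sequence can be placed in a cluster by checking only those columns, which is the property the abstract refers to when it mentions classifying new sequences from mutations alone.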

SUBMITTER: Costa EP 

PROVIDER: S-EPMC3653887 | biostudies-other | 2013

REPOSITORIES: biostudies-other

Publications

Top-down clustering for protein subfamily identification.

Costa EP, Vens C, Blockeel H

Evolutionary Bioinformatics Online, 2013-05-06

Similar Datasets

S-EPMC6309053 | biostudies-literature
67720 | ecrin-mdr-crc
S-EPMC8130575 | biostudies-literature
S-EPMC2704838 | biostudies-literature
S-EPMC5397417 | biostudies-literature
S-EPMC6519736 | biostudies-literature
S-EPMC5825287 | biostudies-literature
S-EPMC4922564 | biostudies-literature
S-EPMC3905687 | biostudies-literature
S-EPMC3247958 | biostudies-literature