
Dataset Information


Large scale hierarchical clustering of protein sequences.


ABSTRACT: BACKGROUND: Searching a biological sequence database with a query sequence to find homologues has become a routine operation in computational biology. Despite the high degree of sophistication of currently available search routines, it is still virtually impossible to identify quickly and clearly the group of sequences to which a given query sequence belongs. RESULTS: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences, resulting in a hierarchical clustering that is being made available for querying and browsing at http://systers.molgen.mpg.de/. CONCLUSIONS: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.

SUBMITTER: Krause A 

PROVIDER: S-EPMC547898 | biostudies-literature | 2005

REPOSITORIES: biostudies-literature


Publications

Large scale hierarchical clustering of protein sequences.

Krause Antje, Stoye Jens, Vingron Martin

BMC Bioinformatics, 2005-01-22



Similar Datasets

| S-EPMC3892691 | biostudies-literature
| S-EPMC3535721 | biostudies-literature
| S-EPMC2147039 | biostudies-literature
| S-EPMC3443659 | biostudies-literature
| S-EPMC1409676 | biostudies-literature
| S-EPMC1366497 | biostudies-literature
| S-EPMC3218420 | biostudies-other
| S-EPMC9013733 | biostudies-literature
| S-EPMC7334672 | biostudies-literature
| S-EPMC9381292 | biostudies-literature