
Dataset Information


Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes.


ABSTRACT: Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods.
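The clustering step named in the abstract (UPGMA applied to all-against-all similarity data) can be illustrated with a minimal sketch. This is not the DomClust implementation: it shows only plain average-linkage merging with a similarity cutoff and leaves out the domain splitting and the paralog-driven tree splitting the abstract describes. The function name `upgma_groups`, the `cutoff` parameter, the gene identifiers, the scores, and the choice to count missing pairs as zero similarity are all assumptions made for illustration.

```python
from itertools import combinations


def upgma_groups(genes, scores, cutoff):
    """Average-linkage (UPGMA) grouping on pairwise similarity scores.

    genes  : list of gene identifiers (hypothetical names)
    scores : dict mapping frozenset({a, b}) -> similarity (higher = closer)
    cutoff : stop merging once the best average similarity drops below this
    """
    clusters = [(g,) for g in genes]

    def avg_sim(c1, c2):
        # UPGMA: unweighted mean over all cross-cluster gene pairs.
        # Pairs absent from `scores` count as 0 (assumption of this sketch).
        total = sum(scores.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: avg_sim(clusters[ij[0]], clusters[ij[1]]),
        )
        if avg_sim(clusters[i], clusters[j]) < cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy all-against-all scores for two genes in each of two genomes.
genes = ["ecoA", "bsuA", "ecoB", "bsuB"]
scores = {
    frozenset(("ecoA", "bsuA")): 90.0,
    frozenset(("ecoB", "bsuB")): 80.0,
    frozenset(("ecoA", "ecoB")): 30.0,
    frozenset(("ecoA", "bsuB")): 20.0,
    frozenset(("bsuA", "ecoB")): 25.0,
    frozenset(("bsuA", "bsuB")): 20.0,
}
groups = upgma_groups(genes, scores, cutoff=50.0)
print(groups)  # [['bsuA', 'ecoA'], ['bsuB', 'ecoB']]
```

With the cutoff at 50, the two cross-genome pairs merge first and the final merge is rejected because the average similarity between the two groups is 23.75, so the within-genome paralogs end up in separate groups.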

SUBMITTER: Uchiyama I 

PROVIDER: S-EPMC1351371 | biostudies-literature | 2006

REPOSITORIES: biostudies-literature


Publications

Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes.

Uchiyama Ikuo I  

Nucleic Acids Research, 2006-01-25, issue 2



Similar Datasets

| S-EPMC3218420 | biostudies-other
| S-EPMC5509293 | biostudies-literature
| S-EPMC3443659 | biostudies-literature
| S-EPMC5870696 | biostudies-literature
| S-EPMC3519615 | biostudies-literature
| S-EPMC3544860 | biostudies-literature
| S-EPMC10589539 | biostudies-literature
| S-EPMC6415607 | biostudies-literature
| S-EPMC2922894 | biostudies-literature
| S-EPMC7480017 | biostudies-literature