Dataset Information

Phylogeny determined by protein domain content.


ABSTRACT: A simple classification scheme that uses only the presence or absence of a protein domain architecture has been used to determine the phylogeny of 174 complete genomes. The method correctly divides the 174 taxa into Archaea, Bacteria, and Eukarya and satisfactorily sorts most of the major groups within these superkingdoms. The most challenging problem involved 119 Bacteria, many of which have reduced genomes. When a weighting factor was used that takes account of difference in genome size (number of considered folds), small-genome taxa were mostly grouped with their full-sized counterparts. Although not every organism appears exactly at its classical phylogenetic position in these trees, the agreement appears comparable with the efforts of others by using sophisticated sequence analysis and/or combinations of gene content and gene order. During the course of the study, it emerged that there is a core set of approximately 50 folds that is found in all 174 genomes and a single fold diagnostic of all Archaea.
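The presence/absence comparison described in the abstract can be illustrated with a small sketch. The abstract does not give the distance formula or the exact weighting factor, so the following are assumptions made only for illustration: the unweighted distance is a Jaccard distance over fold sets, and the size-weighted variant normalizes the shared-fold count by the fold count of the smaller genome. The genome names and fold identifiers are hypothetical toy data, not taken from the dataset.

```python
from itertools import combinations


def fold_distance(a, b, size_weighted=False):
    """Distance between two genomes, each given as a set of fold identifiers.

    Assumption: the unweighted form is a Jaccard distance; the weighted form
    divides by the smaller genome's fold count so that a reduced genome is
    not penalized for folds it has lost. Neither is confirmed by the source.
    """
    shared = len(a & b)
    denom = min(len(a), len(b)) if size_weighted else len(a | b)
    return 0.0 if denom == 0 else 1.0 - shared / denom


def distance_table(genomes, size_weighted=False):
    """Pairwise distances keyed by (name_x, name_y), names in sorted order."""
    return {
        (x, y): fold_distance(genomes[x], genomes[y], size_weighted)
        for x, y in combinations(sorted(genomes), 2)
    }


def core_folds(genomes):
    """Folds present in every genome (empty set if no genomes are given)."""
    if not genomes:
        return set()
    return set.intersection(*genomes.values())


# Hypothetical toy data: three genomes with made-up fold identifiers.
genomes = {
    "full_bacterium": {"f1", "f2", "f3", "f4", "f5", "f6"},
    "reduced_bacterium": {"f1", "f2", "f3"},
    "archaeon": {"f1", "f7", "f8", "f9"},
}

if __name__ == "__main__":
    for weighted in (False, True):
        print("size_weighted =", weighted)
        for pair, d in distance_table(genomes, weighted).items():
            print("  ", pair, round(d, 3))
    print("core folds:", sorted(core_folds(genomes)))
```

In this toy case the reduced genome sits at distance 0.5 from its full-sized counterpart without weighting and at 0.0 with it, which mirrors the grouping effect the abstract attributes to the weighting factor. A distance table of this kind would then be passed to a tree-building method, which the abstract does not specify.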

SUBMITTER: Yang S 

PROVIDER: S-EPMC540256 | biostudies-literature | 2005 Jan

REPOSITORIES: biostudies-literature

Publications

Phylogeny determined by protein domain content.

Song Yang, Russell F. Doolittle, Philip E. Bourne

Proceedings of the National Academy of Sciences of the United States of America, 2005-01-03, issue 2


Similar Datasets

| S-EPMC25829 | biostudies-literature
2014-08-18 | GSE60404 | GEO
2014-08-18 | E-GEOD-60404 | biostudies-arrayexpress
| S-EPMC3063783 | biostudies-literature
| S-EPMC54485 | biostudies-other
| S-EPMC3562099 | biostudies-literature
| S-EPMC3539735 | biostudies-literature
| S-EPMC6171248 | biostudies-literature
| S-EPMC4422434 | biostudies-literature
| S-EPMC4576195 | biostudies-literature