
Dataset Information


Prokaryotic phylogenies inferred from whole-genome sequence and annotation data.


ABSTRACT: Phylogenetic trees represent the evolutionary relationships among groups of species. This paper proposes a novel method, CGCPhy, for inferring prokaryotic phylogenies from multiple kinds of genomic information. The method is based on a distance matrix of orthologous gene clusters computed between whole-genome pairs and comprises four main steps. First, orthologous genes are determined from sequence similarity, genomic function, and genomic structure information. Second, genes involved in potential horizontal gene transfer (HGT) events are eliminated; these are taken to be genes that are highly conserved across different species and genes located on fragments with an abnormal genome barcode. Third, the distance between the orthologous gene clusters of each genome pair is calculated from the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is used to construct phylogenetic trees across the species. CGCPhy was evaluated on datasets drawn from 617 complete single-chromosome prokaryotic genomes and, measured by quartet topologies, achieved practically useful accuracy on different species sets in agreement with Bergey's taxonomy. Simulation results show that CGCPhy achieves high average accuracy with a low standard deviation across datasets, which suggests it has practical potential for phylogenetic analysis.
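The third and fourth steps of the abstract (a pairwise distance from shared orthologous genes, then neighbor-joining) can be illustrated with a minimal sketch. The abstract does not give CGCPhy's actual distance formula, so `cluster_distance` below is a hypothetical stand-in (one minus the shared fraction), and the genome names, sizes, and shared counts are invented for illustration. The `neighbor_joining` function follows the standard neighbor-joining pair-selection criterion and returns only the unrooted topology, without branch lengths; it is not the authors' implementation.

```python
from itertools import combinations


def cluster_distance(shared, size_a, size_b):
    """Hypothetical distance (an assumption, not CGCPhy's formula):
    1 minus the fraction of orthologous genes shared in conserved clusters."""
    return 1.0 - shared / min(size_a, size_b)


def neighbor_joining(names, dist):
    """Standard neighbor-joining, topology only.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a tuple of the three subtrees meeting at the final internal node;
    each joined pair is represented as a nested 2-tuple.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all other nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (a, b)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d[frozenset((a, b))]
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    return tuple(nodes)


# Invented example data: four genomes of 100 genes each.
sizes = {"A": 100, "B": 100, "C": 100, "D": 100}
shared = {
    ("A", "B"): 90, ("C", "D"): 90,
    ("A", "C"): 50, ("A", "D"): 50, ("B", "C"): 50, ("B", "D"): 50,
}
dist = {
    frozenset(pair): cluster_distance(count, sizes[pair[0]], sizes[pair[1]])
    for pair, count in shared.items()
}
tree = neighbor_joining(list(sizes), dist)
print(tree)
```

With these invented counts, A and B share far more genes with each other than with C or D, so the resulting quartet groups A with B and C with D. Comparing such quartet groupings against a reference taxonomy is the kind of agreement check the abstract refers to.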

SUBMITTER: Du W 

PROVIDER: S-EPMC3773407 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature


Publications

Prokaryotic phylogenies inferred from whole-genome sequence and annotation data.

Du Wei, Cao Zhongbo, Wang Yan, Sun Ying, Blanzieri Enrico, Liang Yanchun

BioMed Research International, 2013-08-29



Similar Datasets

| S-EPMC1564419 | biostudies-literature
| S-EPMC5680185 | biostudies-literature
| S-EPMC3995342 | biostudies-literature
| S-EPMC6485071 | biostudies-literature
| S-EPMC5001611 | biostudies-literature
| S-EPMC4201946 | biostudies-literature
| S-EPMC4161746 | biostudies-literature
| S-EPMC7848924 | biostudies-literature
| S-EPMC2553438 | biostudies-literature
| S-EPMC10726699 | biostudies-literature