
Dataset Information


Ortholog-Finder: A Tool for Constructing an Ortholog Data Set.


ABSTRACT: Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present among the predicted orthologs. We developed a program, "Ortholog-Finder," to obtain ortholog data sets for phylogenetic analysis from all open-reading frame data of the species. The program includes five processes for minimizing the effects of HGT and out-paralogs in phylogeny construction:
1) HGT filtering: Genes derived from HGT are detected and deleted from the initial sequence data set by examining their base compositions.
2) Out-paralog filtering: Out-paralogs are detected and deleted from the data set based on sequence similarity.
3) Classification of phylogenetic trees: Phylogenetic trees generated for ortholog candidates are classified as monophyletic or polyphyletic trees.
4) Tree splitting: Polyphyletic trees are bisected to obtain monophyletic trees and remove HGT genes and out-paralogs.
5) Threshold changing: Out-paralogs are further excluded from the data set based on the difference in the similarity scores of genuine orthologs and out-paralogs.
Using simulation data, we examined how out-paralogs and HGTs affected species phylogenetic trees constructed from ortholog data sets obtained by Ortholog-Finder, and we determined the effects of confounding factors. We then used Ortholog-Finder to construct a phylogeny for 12 Gram-positive bacteria from two phyla and validated each node of the constructed tree by comparison with individually constructed ortholog trees.
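As a rough illustration of process 1 (HGT filtering by base composition), the sketch below flags genes whose GC content is atypical for their genome. The abstract does not say which composition statistic or cutoff Ortholog-Finder uses, so the GC-content z-score, the function names `gc_content` and `flag_atypical_genes`, and the `z_cutoff` parameter are all assumptions made for this example, not the program's actual method.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> set[str]:
    """Flag genes whose GC content deviates strongly from the genome-wide mean.

    Assumption: a simple z-score on GC content stands in for the
    base-composition test; the real tool may use a different statistic.
    """
    values = {name: gc_content(seq) for name, seq in genes.items()}
    gc_values = list(values.values())
    mu = mean(gc_values)
    sigma = pstdev(gc_values)
    if sigma == 0:
        # All genes share the same composition, so nothing stands out.
        return set()
    return {name for name, gc in values.items() if abs(gc - mu) / sigma > z_cutoff}


if __name__ == "__main__":
    # Hypothetical toy genome: nine genes at 50% GC and one at 100% GC.
    toy_genes = {f"gene{i}": "ATGC" * 10 for i in range(9)}
    toy_genes["foreign"] = "GGCC" * 10
    print(flag_atypical_genes(toy_genes))  # prints {'foreign'}
```

Genes flagged this way would be dropped before ortholog candidates are clustered. A composition test of this kind can miss transfers between genomes with similar base composition, which is one reason the later tree-based processes are still needed.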

SUBMITTER: Horiike T 

PROVIDER: S-EPMC4779612 | biostudies-literature | 2016 Jan

REPOSITORIES: biostudies-literature


Publications

Ortholog-Finder: A Tool for Constructing an Ortholog Data Set.

Tokumasa Horiike, Ryoichi Minai, Daisuke Miyata, Yoji Nakamura, Yoshio Tateno

Genome Biology and Evolution, 2016-01-18 (2)



Similar Datasets

Accession | Repository
S-EPMC7912148 | biostudies-literature
PRJEB39078 | ENA
S-EPMC5705340 | biostudies-literature
S-EPMC441490 | biostudies-literature
S-EPMC5945774 | biostudies-literature
S-EPMC7470986 | biostudies-literature
S-EPMC2685110 | biostudies-literature
S-EPMC8269244 | biostudies-literature
S-EPMC3691714 | biostudies-literature
S-EPMC4085541 | biostudies-literature