
Dataset Information


RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.


ABSTRACT: Phylogenetic analyses of nucleotide sequences widely use 'homogeneous' substitution models, which assume that base composition is stationary across the tree, even though individual sequences may have distinctive base frequencies. In the worst case, an analysis based on a homogeneous model can artifactually unite two distantly related sequences that acquired similar base frequencies in parallel. Two approaches can counter this difficulty: 'RY-coding' and 'non-homogeneous' models. RY-coding recodes the four bases as purines (R) and pyrimidines (Y) to normalize base frequencies across the tree, whereas non-homogeneous models explicitly incorporate the heterogeneity in base frequency. Both approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined in previous simulation studies. Here, we assessed the performance of maximum-likelihood analyses incorporating RY-coding or a non-homogeneous model (hereafter, RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both analyses outperformed homogeneous model-based analyses. Curiously, the performance of the RY-coding analysis appeared to be significantly more affected by the settings of the substitution process used for sequence simulation than that of the non-homogeneous analysis. The performance of the non-homogeneous analysis was also validated on a real-world sequence data set with significant base-composition heterogeneity.
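As an illustration of the RY-coding step described in the abstract, the following minimal Python sketch recodes purines (A, G) as R and pyrimidines (C, T, U) as Y and compares symbol frequencies before and after recoding. The function names `ry_code` and `composition` and the two toy sequences are assumptions made for this example; they are not taken from the study, and the sketch does not reproduce the authors' simulation or maximum-likelihood pipeline.

```python
from collections import Counter

# Purines (A, G) become R; pyrimidines (C, T, U) become Y.
# Any other character (gaps, ambiguity codes) is left unchanged.
RY_TABLE = str.maketrans("AGCTUagctu", "RRYYYrryyy")


def ry_code(seq):
    """Return the RY-coded version of a nucleotide sequence."""
    return seq.translate(RY_TABLE)


def composition(seq):
    """Return symbol frequencies, ignoring gaps ('-', '?') and 'N'."""
    counts = Counter(c for c in seq.upper() if c not in "-?N")
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical toy sequences: one GC-rich, one AT-rich.
    gc_rich = "GCGCCGGCAT"
    at_rich = "ATATTAATGC"
    for name, seq in (("gc_rich", gc_rich), ("at_rich", at_rich)):
        print(name, composition(seq), composition(ry_code(seq)))
```

In this toy case the two sequences differ sharply in their four-base frequencies but have identical R and Y frequencies after recoding, which is the normalization effect that RY-coding aims for. Real alignments will generally not balance this exactly.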

SUBMITTER: Ishikawa SA 

PROVIDER: S-EPMC3394461 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.

Ishikawa SA, Inagaki Y, Hashimoto T

Evolutionary Bioinformatics Online, 2012-06-25


Similar Datasets

2024-03-20 | GSE261769 | GEO
| S-EPMC3151265 | biostudies-literature
| S-EPMC2792768 | biostudies-literature
| S-EPMC1156870 | biostudies-literature
| S-EPMC5751574 | biostudies-literature
| S-EPMC5102470 | biostudies-literature
| S-EPMC5907366 | biostudies-literature
| S-EPMC4838635 | biostudies-literature
| S-EPMC5947773 | biostudies-literature
| S-EPMC4168704 | biostudies-literature