Dataset Information

Dynamic programming procedure for searching optimal models to estimate substitution rates based on the maximum-likelihood method.


ABSTRACT: The substitution rate in a gene can provide valuable information for understanding its functionality and evolution. A widely used method to estimate substitution rates is the maximum-likelihood method implemented in the CODEML program in the PAML package. Typically, a limited number of branch models, chosen based on a priori information or an interest in particular lineages, are tested, whereas a large number of potential models are neglected. A complementary approach is therefore needed that tests all, or a large number of, possible models to search for the globally optimal model(s) of maximum likelihood. However, the computational time for this search becomes impractically long even for a small number of sequences. Thus, it is desirable to explore the most probable regions of the model space when searching for the optimal models. Using dynamic programming techniques, we developed a simple computational method for searching the most probable optimal branch-specific models in a practically feasible computational time. We propose three search methods to find the optimal models, which explore O(n) (method 1) to O(n^2) (methods 2 and 3) models when the given phylogeny has n branches. In addition, we derived a formula to calculate the number of all possible models, revealing the complexity of finding the optimal branch-specific model. We show that in a reanalysis of over 50 previously published studies, the vast majority obtained better models with significantly higher likelihoods than the conventional hypothesis model methods.
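To give a feel for why exhaustive search is impractical, the sketch below counts candidate models under one stated assumption: that a branch model is any grouping of the n branches into unlabeled rate classes, in which case the count is the Bell number B(n). The formula derived in the paper is not reproduced in this record and may differ; the function names `bell_numbers` and `compare_search_sizes` are hypothetical and used for illustration only.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] computed with the Bell triangle.

    ASSUMPTION: B(n) is used here as a stand-in for "the number of all
    possible branch models" on n branches; the paper's own formula is
    not given in this record.
    """
    bells = [1]   # B(0) = 1
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry is the left neighbour plus the entry above-left.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def compare_search_sizes(n):
    """Return (n, n**2, B(n)) to contrast O(n) and O(n^2) with the full space."""
    return n, n * n, bell_numbers(n)[n]


if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        linear, quadratic, exhaustive = compare_search_sizes(n)
        print(f"n={n:2d}  O(n)~{linear:3d}  O(n^2)~{quadratic:4d}  all~{exhaustive}")
```

Under this assumption, the full model space grows far faster than the O(n) or O(n^2) subsets that the three search methods are described as exploring, which is consistent with the abstract's motivation for restricting the search.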

SUBMITTER: Zhang C 

PROVIDER: S-EPMC3093512 | biostudies-literature | 2011 May

REPOSITORIES: biostudies-literature

Publications

Dynamic programming procedure for searching optimal models to estimate substitution rates based on the maximum-likelihood method.

Zhang Chengjun, Wang Jia, Xie Weibo, Zhou Gang, Long Manyuan, Zhang Qifa

Proceedings of the National Academy of Sciences of the United States of America, 2011-04-26, issue 19


Similar Datasets

| S-EPMC8599758 | biostudies-literature
| S-EPMC3649670 | biostudies-literature
| S-EPMC4833081 | biostudies-literature
2024-03-20 | GSE261769 | GEO
| S-EPMC7986967 | biostudies-literature
| S-EPMC2077249 | biostudies-literature
| S-EPMC3427362 | biostudies-other
| S-EPMC6800798 | biostudies-literature
| S-EPMC3173750 | biostudies-literature
| S-EPMC5850866 | biostudies-literature