
Dataset Information


Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.


ABSTRACT: Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
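
The idea of recycling posterior samples can be illustrated with plain importance reweighting: draws obtained under a vague prior are reweighted by the ratio of an informative prior to the vague one, which approximates the posterior under the informative prior without rerunning the sampler. The sketch below is only an illustration under stated assumptions and is not the dynamic iteratively reweighted MCMC algorithm described in the abstract. The function names, the normal priors, and the simulated draws standing in for real MCMC output are all hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution (used here for both priors)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def reweight(samples, log_old_prior, log_new_prior):
    """Self-normalized importance weights proportional to new_prior / old_prior."""
    log_w = [log_new_prior(s) - log_old_prior(s) for s in samples]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def weighted_mean(samples, weights):
    return sum(s * w for s, w in zip(samples, weights))


def weighted_sd(samples, weights):
    mu = weighted_mean(samples, weights)
    return math.sqrt(sum(w * (s - mu) ** 2 for s, w in zip(samples, weights)))


rng = random.Random(0)

# ASSUMPTION: simulated draws stand in for posterior samples of one model
# parameter (for example an indel parameter) obtained under a vague prior.
samples = [rng.gauss(1.0, 0.5) for _ in range(20000)]

# ASSUMPTION: hypothetical priors. In the setting of the abstract, the
# informative prior would be built from the additional datasets.
vague_prior = lambda x: normal_logpdf(x, 0.0, 100.0)
informative_prior = lambda x: normal_logpdf(x, 0.5, 0.5)

weights = reweight(samples, vague_prior, informative_prior)
post_mean = weighted_mean(samples, weights)
post_sd = weighted_sd(samples, weights)
unweighted_sd = weighted_sd(samples, [1.0 / len(samples)] * len(samples))
```

In this toy setting the reweighted standard deviation `post_sd` comes out smaller than `unweighted_sd`, which is the sense in which an informative prior reduces posterior uncertainty. For these particular normal choices the conjugate calculation gives a posterior mean of about 0.75, so `post_mean` can be checked against it.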

SUBMITTER: Liang LJ 

PROVIDER: S-EPMC2800350 | biostudies-literature | 2009 Oct

REPOSITORIES: biostudies-literature


Publications

Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

Li-Jung Liang, Robert E Weiss, Benjamin Redelings, Marc A Suchard

Bioinformatics (Oxford, England), 2009-08-06, issue 19



Similar Datasets

S-EPMC3677258 | biostudies-literature
S-EPMC7745353 | biostudies-literature
S-EPMC2375134 | biostudies-literature
S-EPMC8615982 | biostudies-literature
S-EPMC9226685 | biostudies-literature
S-EPMC2824819 | biostudies-literature
S-EPMC7493370 | biostudies-literature
S-EPMC4645715 | biostudies-literature
S-EPMC3477105 | biostudies-literature
S-EPMC3013781 | biostudies-literature