Dataset Information

A Bayesian sampler for optimization of protein domain hierarchies.

ABSTRACT: The process of identifying and modeling functionally divergent subgroups for a specific protein domain class and arranging these subgroups hierarchically has, thus far, largely been done via manual curation. How to accomplish this automatically and optimally is an unsolved statistical and algorithmic problem that is addressed here via Markov chain Monte Carlo sampling. Taking as input a (typically very large) multiple-sequence alignment, the sampler creates and optimizes a hierarchy by adding and deleting leaf nodes, by moving nodes and subtrees up and down the hierarchy, by inserting or deleting internal nodes, and by redefining the sequences and conserved patterns associated with each node. All such operations are based on a probability distribution that models the conserved and divergent patterns defining each subgroup. When we view these patterns as sequence determinants of protein function, each node or subtree in such a hierarchy corresponds to a subgroup of sequences with similar biological properties. The sampler can be applied either de novo or to an existing hierarchy. When applied to 60 protein domains from multiple starting points in this way, it converged on similar solutions with nearly identical log-likelihood ratio scores, suggesting that it typically finds the optimal peak in the posterior probability distribution. Similarities and differences between independently generated, nearly optimal hierarchies for a given domain help distinguish robust from statistically uncertain features. Thus, a future application of the sampler is to provide confidence measures for various features of a domain hierarchy.
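As an illustration only, the following minimal Python sketch shows the general idea of Markov chain Monte Carlo sampling from multiple starting points, as the abstract describes. It is not the published sampler. It assumes a flat partition of sequences into at most three subgroups rather than a full hierarchy, a Dirichlet-multinomial column score with a pseudocount of 0.5, a single "reassign one sequence" move, and a toy DNA-like alignment. All function names (column_log_marginal, log_score, sample, as_partition) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"  # toy alphabet (assumption); real input would be protein sequences


def column_log_marginal(counts, alpha=0.5):
    """Dirichlet-multinomial log marginal likelihood of one alignment column."""
    n = sum(counts)
    a_total = alpha * len(counts)
    value = math.lgamma(a_total) - math.lgamma(n + a_total)
    for c in counts:
        value += math.lgamma(c + alpha) - math.lgamma(alpha)
    return value


def log_score(assign, seqs, n_groups):
    """Sum the column scores over every non-empty subgroup."""
    total = 0.0
    for g in range(n_groups):
        members = [s for s, z in zip(seqs, assign) if z == g]
        if not members:
            continue
        for col in zip(*members):
            total += column_log_marginal([col.count(a) for a in ALPHABET])
    return total


def sample(seqs, n_groups=3, n_iter=3000, seed=0):
    """Metropolis sampler over subgroup assignments; returns the best state seen."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_groups) for _ in seqs]  # random starting point
    current = log_score(assign, seqs, n_groups)
    best, best_assign = current, list(assign)
    for _ in range(n_iter):
        i = rng.randrange(len(seqs))
        old = assign[i]
        assign[i] = rng.randrange(n_groups)  # symmetric proposal: new label for one sequence
        proposed = log_score(assign, seqs, n_groups)
        # Metropolis rule: always accept improvements, otherwise accept with
        # probability exp(proposed - current).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best, best_assign = current, list(assign)
        else:
            assign[i] = old  # reject and restore the previous label
    return best, best_assign


def as_partition(assign):
    """Label-independent view of an assignment: a set of frozensets of indices."""
    groups = {}
    for i, z in enumerate(assign):
        groups.setdefault(z, set()).add(i)
    return {frozenset(m) for m in groups.values()}


# Two toy families that differ in every column.
seqs = ["AAGG"] * 6 + ["CCTT"] * 6

# Independent runs from different random starting points.
runs = [sample(seqs, seed=s) for s in range(4)]
for score, assign in runs:
    print(round(score, 3), sorted(sorted(g) for g in as_partition(assign)))
```

Comparing the best scores and partitions across the independent runs mirrors, in miniature, the convergence check described in the abstract. A hierarchy-aware version would need additional move types (moving subtrees, inserting or deleting internal nodes) and a score defined over the whole tree, none of which this sketch attempts.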

SUBMITTER: Neuwald AF

PROVIDER: S-EPMC3948484 | biostudies-literature | 2014 Mar

REPOSITORIES: biostudies-literature

Publications

A Bayesian sampler for optimization of protein domain hierarchies.

Andrew F. Neuwald

Journal of Computational Biology: a journal of computational molecular cell biology, 2014 Feb 4, issue 3

PMID: 24494927

Similar Datasets

A Bayesian generative model for learning semantic hierarchies.
| S-EPMC4033064 | biostudies-literature

The Bayesian Mutation Sampler Explains Distributions of Causal Judgments.
| S-EPMC10320818 | biostudies-literature

Cortical hierarchies perform Bayesian causal inference in multisensory perception.
| S-EPMC4339735 | biostudies-literature

Bayesian Active Learning for Optimization and Uncertainty Quantification in Protein Docking.
| S-EPMC7429362 | biostudies-literature

Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution.
| S-EPMC8246133 | biostudies-literature

Bayesian optimization for seed germination.
| S-EPMC6487520 | biostudies-literature

Bayesian optimization for conformer generation.
| S-EPMC6528340 | biostudies-literature

Cost-informed Bayesian reaction optimization.
| S-EPMC11465108 | biostudies-literature

Bayesian optimization for demographic inference.
| S-EPMC10320152 | biostudies-literature

A multi-objective optimization approach accurately resolves protein domain architectures.
| S-EPMC4734041 | biostudies-literature