
Dataset Information


Probabilistic grammatical model for helix-helix contact site classification.


ABSTRACT:

BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires at least the expressive power of context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited.

RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The best-performing classifier reached an AUC ROC of 0.70. Analysis of the grammar parse trees revealed that they can represent structural features of helix-helix contact sites.

CONCLUSIONS: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily modeling long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus, they could provide biologically meaningful information for molecular biologists.

SUBMITTER: Dyrka W 

PROVIDER: S-EPMC3892132 | biostudies-literature | 2013 Dec

REPOSITORIES: biostudies-literature


Publications

Probabilistic grammatical model for helix-helix contact site classification.

Witold Dyrka, Jean-Christophe Nebel, Malgorzata Kotulska

Algorithms for Molecular Biology : AMB, 2013-12-18, 1



Similar Datasets

| S-EPMC1764471 | biostudies-literature
| S-EPMC8337008 | biostudies-literature
| S-EPMC3705624 | biostudies-literature
| S-EPMC4339708 | biostudies-literature
| S-EPMC3219965 | biostudies-literature
| S-EPMC5476507 | biostudies-literature
| S-EPMC6510497 | biostudies-literature
| S-EPMC6428041 | biostudies-literature
2024-10-03 | GSE252687 | GEO
2024-10-03 | GSE252642 | GEO