Dataset Information

TRILOGY: Discovery of sequence-structure patterns across diverse proteins.


ABSTRACT: We describe a new computer program, TRILOGY, for the automated discovery of sequence-structure patterns in proteins. TRILOGY implements a pattern discovery algorithm that begins with an exhaustive analysis of flexible three-residue patterns; a subset of these patterns is selected as seeds for an extension process in which longer patterns are identified. A key feature of the method is its explicit treatment of both the sequence and structure components of these motifs: each TRILOGY pattern is a pair consisting of a sequence pattern and a structure pattern. Matches to the two component patterns are identified independently, which allows the program to assign each sequence-structure pattern a significance score that assesses the degree of correlation between the corresponding sequence and structure motifs. TRILOGY identifies several thousand high-scoring patterns that occur across protein families. These include both previously identified and potentially novel motifs. We expect that these sequence-structure patterns will be useful in predicting protein structure from sequence, annotating newly determined protein structures, and identifying novel motifs of potential functional or structural significance. Further details on 7,768 significant patterns identified by TRILOGY can be found at http://theory.lcs.mit.edu/trilogy.
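
The abstract says that sequence matches and structure matches are found independently and then scored for correlation, but it does not give the scoring formula. The sketch below illustrates one common way to quantify such a correlation, a one-sided hypergeometric tail test on the overlap between the two match sets. This choice of test, the function names, and the parameter names are all assumptions made for illustration; they are not taken from the record and should not be read as the program's actual scoring method.

```python
from math import comb, log10


def hypergeom_tail(n_total, n_seq, n_struct, n_both):
    """P(X >= n_both) for X ~ Hypergeometric.

    Hypothetical setup: n_total candidate sites, n_seq of which match the
    sequence pattern; n_struct sites match the structure pattern; n_both
    sites match both. If the two kinds of match were unrelated, the overlap
    would follow a hypergeometric distribution.
    """
    upper = min(n_seq, n_struct)
    favourable = sum(
        comb(n_seq, k) * comb(n_total - n_seq, n_struct - k)
        for k in range(n_both, upper + 1)
    )
    return favourable / comb(n_total, n_struct)


def correlation_score(n_total, n_seq, n_struct, n_both):
    """Illustrative score: -log10 of the tail probability (higher = stronger)."""
    p = hypergeom_tail(n_total, n_seq, n_struct, n_both)
    return float("inf") if p == 0 else -log10(p)


if __name__ == "__main__":
    # Made-up counts, for demonstration only.
    print(correlation_score(n_total=10_000, n_seq=120, n_struct=80, n_both=25))
```

Under this assumed scheme, a pattern whose sequence and structure matches coincide far more often than chance would predict receives a large score, while an overlap no larger than expected scores near zero.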

SUBMITTER: Bradley P 

PROVIDER: S-EPMC124288 | biostudies-literature | 2002 Jun

REPOSITORIES: biostudies-literature

Publications

TRILOGY: Discovery of sequence-structure patterns across diverse proteins.

Philip Bradley, Peter S. Kim, Bonnie Berger

Proceedings of the National Academy of Sciences of the United States of America, 2002-06-01, issue 13


Similar Datasets

| S-EPMC10104781 | biostudies-literature
| S-EPMC8769551 | biostudies-literature
| S-EPMC10789314 | biostudies-literature
| S-EPMC7980514 | biostudies-literature
| S-EPMC2253189 | biostudies-literature
| S-EPMC9802472 | biostudies-literature
| S-EPMC9524835 | biostudies-literature
2010-04-16 | E-GEOD-21242 | biostudies-arrayexpress
| S-EPMC4519301 | biostudies-literature
| S-EPMC3154205 | biostudies-literature