
Dataset Information


Sequence variations within protein families are linearly related to structural variations.


ABSTRACT: It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This recognition, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences that are both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
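The sequence measure described in the abstract (a mean distance between two collections of designed sequences) and the reported linear relation can be illustrated with a minimal sketch. The abstract does not state which sequence distance the authors use, so the fraction of differing positions (normalized Hamming distance) is an assumption here, as are the function names and the toy sequences, which are invented for illustration and are not data from the paper.

```python
from itertools import product


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ.

    Assumption: a stand-in distance; the abstract does not specify the metric.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_set_distance(set_a, set_b):
    """Mean distance over all cross pairs of sequences from two collections."""
    pairs = list(product(set_a, set_b))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy collections standing in for sequences designed for two structures
# (hypothetical, not taken from the paper).
designed_a = ["ACDE", "ACDF"]
designed_b = ["ACGE", "TCGF"]
sequence_distance = mean_set_distance(designed_a, designed_b)

# Toy points only to exercise the fit; in practice each point would pair a
# sequence distance with the cRMS distance for one pair of family members.
slope, intercept = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

In such a setup, a linear relation would show up as a good straight-line fit between the per-pair sequence distances and the corresponding cRMS values within one family.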

SUBMITTER: Koehl P 

PROVIDER: S-EPMC2692051 | biostudies-literature | 2002 Oct

REPOSITORIES: biostudies-literature


Publications

Sequence variations within protein families are linearly related to structural variations.

Koehl Patrice, Levitt Michael

Journal of Molecular Biology, 2002-10-01, issue 3



Similar Datasets

| S-EPMC1560931 | biostudies-literature
| S-EPMC3069468 | biostudies-literature
2022-03-07 | PXD031606 | Pride
| S-EPMC6422139 | biostudies-literature
| S-EPMC3953544 | biostudies-literature
| S-EPMC6114881 | biostudies-literature
| S-EPMC403780 | biostudies-literature
| S-EPMC7570326 | biostudies-literature
| S-EPMC9252788 | biostudies-literature
| S-EPMC4209131 | biostudies-literature