Dataset Information

Sequence-similar, structure-dissimilar protein pairs in the PDB.

ABSTRACT: It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when a template structure for homology modeling of a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to creating subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach used to calculate the differences; in particular, sequence-based structural superposition will identify a larger number of structurally dissimilar pairs than geometry-based structural alignment. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superposition provides a meaningful measure of structural differences. This approach and geometry-based structural alignment reveal somewhat different information, and one or the other might be preferable in a given application.
Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
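
To illustrate the concern about sequence-based culling raised in the abstract, here is a minimal Python sketch of a greedy redundancy filter. It is not the authors' procedure and not any particular culling tool: the entry names, the sequences, the 90% threshold, and the identity measure (position-by-position comparison of sequences assumed to be pre-aligned to equal length) are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (hypothetical entry name, pre-aligned sequence)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned (non-gap) positions.

    Assumption: both sequences are already aligned to equal length,
    with '-' marking gaps. Real pipelines compute an alignment first.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


def cull_by_sequence(entries: List[Entry], threshold: float = 90.0) -> List[Entry]:
    """Greedy culling: keep an entry only if its identity to every
    previously kept entry is below the threshold."""
    kept: List[Entry] = []
    for name, seq in entries:
        if all(percent_identity(seq, kept_seq) < threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Hypothetical entries: the first two share a sequence but stand for
# two different conformations of the same protein.
entries = [
    ("protA_open", "MKTAYIAKQRQISFVKSHFS"),
    ("protA_closed", "MKTAYIAKQRQISFVKSHFS"),
    ("protB", "GSHMLEDPVDAFKLWNQGRT"),
]

kept = cull_by_sequence(entries)
print([name for name, _ in kept])
```

In this toy case the filter discards `protA_closed` because its sequence is identical to that of `protA_open`, even though the two entries are meant to represent different conformations. This is the kind of structural information that a purely sequence-based criterion cannot see.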

SUBMITTER: Kosloff M

PROVIDER: S-EPMC2673347 | biostudies-literature | 2008 May

REPOSITORIES: biostudies-literature

Publications

Sequence-similar, structure-dissimilar protein pairs in the PDB.

Mickey Kosloff, Rachel Kolodny

Proteins, 2008 May 1, issue 2

PMID: 18004789

Similar Datasets

Persistently conserved positions in structurally similar, sequence dissimilar proteins: roles in preserving protein fold and function.
| S-EPMC2373454 | biostudies-literature

Use of residue pairs in protein sequence-sequence and sequence-structure alignments.
| S-EPMC2144723 | biostudies-other

Sequence-structure mapping errors in the PDB: OB-fold domains.
| S-EPMC2279972 | biostudies-literature

Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive.
| S-EPMC5823500 | biostudies-literature

Pre-calculated protein structure alignments at the RCSB PDB website.
| S-EPMC3003546 | biostudies-literature

Similar sequences but dissimilar biological functions of GDF11 and myostatin.
| S-EPMC8080601 | biostudies-literature

Genetic robustness of let-7 miRNA sequence-structure pairs.
| S-EPMC6859847 | biostudies-literature

NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation.
| S-EPMC4574126 | biostudies-literature

A modified amino acid network model contains similar and dissimilar weight.
| S-EPMC3549380 | biostudies-literature

The protein structure prediction problem could be solved using the current PDB library.
| S-EPMC545829 | biostudies-literature