Dataset Information

Sequence statistics of tertiary structural motifs reflect protein stability.

ABSTRACT: The Protein Data Bank (PDB) has been a key resource for learning general rules of sequence-structure relationships in proteins. Quantitative insights have been gained by defining geometric descriptors of structure (e.g., distances, dihedral angles, solvent exposure, etc.) and observing their distributions and sequence preferences. Here we argue that as the PDB continues to grow, it may become unnecessary to reduce structure into a set of elementary descriptors. Instead, it could be possible to deduce quantitative sequence-structure relationships in the context of precisely-defined complex structural motifs by mining the PDB for closely matching backbone geometries. To validate this idea, we turned to the task of predicting changes in protein stability upon amino-acid substitution, a difficult problem of broad significance. We defined non-contiguous tertiary motifs (TERMs) around a protein site of interest and extracted sequence preferences from ensembles of closely-matching substructures in the PDB to predict mutational stability changes at the site, ΔΔGm. We demonstrate that these ensemble statistics predict ΔΔGm on par with state-of-the-art statistical and machine-learning methods on large thermodynamic datasets, and outperform these, along with a leading structure-based modeling approach, when tested in the context of unbiased diverse mutations. Further, we show that the performance of the TERM-based method is directly related to the amount of available relevant structural data, automatically improving with the growing PDB. This enables a means of estimating prediction accuracy. Our results clearly demonstrate that: 1) statistics of non-contiguous structural motifs in the PDB encode fundamental sequence-structure relationships related to protein thermodynamic stability, and 2) the PDB is now large enough that such statistics are already useful in practice, with their accuracy expected to continue increasing as the database grows. These observations suggest new ways of using structural data towards addressing problems of computational structural biology.
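
The idea of turning sequence preferences from an ensemble of matching substructures into a stability estimate can be illustrated with a simple log-odds calculation. The sketch below is not the scoring scheme used in the paper: it assumes a hypothetical list of amino acids observed at the site of interest across structural matches, a hypothetical background frequency table, and a pseudocount chosen arbitrarily. It only shows how relative amino-acid frequencies could be converted into a ΔΔG-like score for a substitution.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_energies(observed, background, pseudocount=1.0):
    """Convert amino-acid counts at one site into log-odds pseudo-energies.

    observed:   iterable of one-letter codes seen at the site across the
                ensemble of structural matches (hypothetical input).
    background: dict mapping each amino acid to its background frequency
                (hypothetical input; must cover all 20 amino acids).
    """
    counts = Counter(observed)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    energies = {}
    for aa in AMINO_ACIDS:
        # Smoothed frequency of this amino acid in the matched ensemble.
        p_obs = (counts[aa] + pseudocount) / total
        # Lower energy means the amino acid is enriched relative to background.
        energies[aa] = -math.log(p_obs / background[aa])
    return energies


def ddg_like_score(energies, wild_type, mutant):
    """Unitless score; positive values suggest the mutant is less favored."""
    return energies[mutant] - energies[wild_type]


# Illustrative usage with made-up data: a site dominated by leucine.
uniform_background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
site_energies = pseudo_energies("LLLLLLLLAV", uniform_background)
score_l_to_a = ddg_like_score(site_energies, "L", "A")
```

In this toy setting the score for L to A is positive because leucine is strongly enriched in the made-up ensemble. The score is unitless; relating it to experimental ΔΔGm values would require calibration that is outside the scope of this sketch.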

SUBMITTER: Zheng F

PROVIDER: S-EPMC5446159 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

Sequence statistics of tertiary structural motifs reflect protein stability.

Zheng F, Grigoryan G

PLoS One, 2017-05-26, issue 5

PMID: 28552940

Similar Datasets

Protein language models learn evolutionary statistics of interacting sequence motifs.
| S-EPMC11551344 | biostudies-literature

Discovery of RNA secondary structural motifs using sequence-ordered thermodynamic stability and comparative sequence analysis.
| S-EPMC10336498 | biostudies-literature

Structural and sequence motifs of protein (histone) methylation enzymes.
| S-EPMC2733851 | biostudies-literature

Mining tertiary structural motifs for assessment of designability.
| S-EPMC4222026 | biostudies-literature

Sequence, structure, and cooperativity in folding of elementary protein structural motifs.
| S-EPMC4538684 | biostudies-literature

Protein-Protein Interactions Mediated by Helical Tertiary Structure Motifs.
| S-EPMC4577960 | biostudies-literature

PSSweb: protein structural statistics web server.
| S-EPMC4987900 | biostudies-literature

Tertiary Structural Motif Sequence Statistics Enable Facile Prediction and Design of Peptides that Bind Anti-apoptotic Bfl-1 and Mcl-1.
| S-EPMC6447450 | biostudies-literature

Neural assemblies uncovered by generative modeling explain whole-brain activity statistics and reflect structural connectivity.
| S-EPMC9940913 | biostudies-literature

Recurrent structural RNA motifs, Isostericity Matrices and sequence alignments.
| S-EPMC1087784 | biostudies-literature