Dataset Information

Sampling of structure and sequence space of small protein folds.


ABSTRACT: Nature only samples a small fraction of the sequence space that can fold into stable proteins. Furthermore, small structural variations in a single fold, sometimes only a few amino acids, can define a protein's molecular function. Hence, to design proteins with novel functionalities, such as molecular recognition, methods to control and sample shape diversity are necessary. To explore this space, we developed and experimentally validated a computational platform that can design a wide variety of small protein folds while sampling shape diversity. We designed and evaluated stability of about 30,000 de novo protein designs of eight different folds. Among these designs, about 6,200 stable proteins were identified, including some predicted to have a first-of-its-kind minimalized thioredoxin fold. Obtained data revealed protein folding rules for structural features such as helix-connecting loops. Beyond serving as a resource for protein engineering, this massive and diverse dataset also provides training data for machine learning. We developed an accurate classifier to predict the stability of our designed proteins. The methods and the wide range of protein shapes provide a basis for designing new protein functions without compromising stability.

SUBMITTER: Linsky TW 

PROVIDER: S-EPMC9684540 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature

Publications

Sampling of structure and sequence space of small protein folds.

Thomas W. Linsky, Kyle Noble, Autumn R. Tobin, Rachel Crow, Lauren Carter, Jeffrey L. Urbauer, David Baker, Eva-Maria Strauch

Nature Communications, 22 Nov 2022, issue 1


Similar Datasets

S-EPMC8286341 | biostudies-literature
S-EPMC2373757 | biostudies-literature
S-EPMC148146 | biostudies-other
S-EPMC4908355 | biostudies-literature
S-EPMC2323961 | biostudies-literature
S-EPMC2732808 | biostudies-literature
S-EPMC5738032 | biostudies-literature
S-EPMC9904845 | biostudies-literature
S-EPMC11008089 | biostudies-literature
S-EPMC7553338 | biostudies-literature