Dataset Information

Deciphering protein evolution and fitness landscapes with latent space models.


ABSTRACT: Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low-dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low-dimensional space. Moreover, the model is useful for predicting protein mutational stability landscapes and for quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that latent space models learned using variational auto-encoders provide a mechanism for exploring the rich information that protein sequences contain about evolution, fitness, and stability, and are therefore well suited to help guide protein engineering efforts.
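The abstract pairs latent coordinates with experimental fitness data through Gaussian process regression. The sketch below illustrates only that general technique under stated assumptions; it is not the authors' code. It assumes each sequence has already been mapped to a 2-D latent point by some trained encoder. The squared-exponential kernel, the hyperparameter values, and the helper names `rbf_kernel` and `gp_posterior` are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel between rows of a (n x d) and b (m x d).
    # This kernel choice is an assumption for illustration only.
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clip tiny negative distances caused by floating-point rounding.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(z_train, y_train, z_query, length_scale=1.0, variance=1.0, noise=1e-2):
    # Hypothetical helper: GP regression of fitness y on latent coordinates z.
    # Returns the posterior mean and variance of the latent fitness at z_query.
    y_mean = y_train.mean()                      # use the empirical mean as prior mean
    k_tt = rbf_kernel(z_train, z_train, length_scale, variance)
    k_tt = k_tt + noise * np.eye(len(z_train))   # observation noise on the diagonal
    k_qt = rbf_kernel(z_query, z_train, length_scale, variance)
    chol = np.linalg.cholesky(k_tt)              # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)            # clip rounding-induced negatives
```

Far from every measured sequence, the prediction reverts to the prior (the mean fitness, with the full prior variance). Near measured sequences, the variance shrinks. This is one reason a continuous latent space is convenient for proposing where to measure next.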

SUBMITTER: Ding X 

PROVIDER: S-EPMC6904478 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Deciphering protein evolution and fitness landscapes with latent space models.

Xinqiang Ding, Zhengting Zou, Charles L. Brooks III

Nature Communications, 2019 Dec 10; 10(1)

Similar Datasets

| S-EPMC2997618 | biostudies-literature
| S-EPMC10113739 | biostudies-literature
| S-EPMC5869684 | biostudies-literature
| S-EPMC5927604 | biostudies-literature
| S-EPMC2734178 | biostudies-literature
| S-EPMC5714467 | biostudies-literature
| S-EPMC4989103 | biostudies-literature
| S-EPMC5548793 | biostudies-literature
| S-EPMC4712201 | biostudies-literature
| S-EPMC6031050 | biostudies-literature