Dataset Information

Protein Ensemble Generation Through Variational Autoencoder Latent Space Sampling.


ABSTRACT: Mapping the ensemble of protein conformations that contribute to function and can be targeted by small-molecule drugs remains an outstanding challenge. Here, we explore the use of variational autoencoders for reducing the challenge of dimensionality in the protein structure ensemble generation problem. We convert high-dimensional protein structural data into a continuous, low-dimensional representation, carry out a search in this space guided by a structure quality metric, and then use RoseTTAFold guided by the sampled structural information to generate 3D structures. We use this approach to generate ensembles for the cancer-relevant protein K-Ras, train the VAE on a subset of the available K-Ras crystal structures and MD simulation snapshots, and assess the extent of sampling close to crystal structures withheld from training. We find that our latent space sampling procedure rapidly generates ensembles with high structural quality and is able to sample within 1 Å of held-out crystal structures, with a consistency higher than that of MD simulation or AlphaFold2 prediction. The sampled structures sufficiently recapitulate the cryptic pockets in the held-out K-Ras structures to allow for small-molecule docking.
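The "within 1 Å of held-out crystal structures" criterion can be illustrated with a short sketch. The abstract does not say how the distance was measured, so this example assumes a coordinate RMSD after optimal rigid-body superposition (the Kabsch algorithm) over matched atoms. The function names `kabsch_rmsd` and `fraction_within`, and the use of NumPy, are illustrative choices and not part of the published pipeline.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumption: rows of P and Q are matched atoms (e.g. the same C-alpha
    atoms in the same order). The source does not specify the atom set.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip one axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q (coordinates are row vectors, hence R.T).
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


def fraction_within(ensemble, reference, cutoff=1.0):
    """Fraction of ensemble members whose RMSD to the reference is <= cutoff.

    The default cutoff of 1.0 (in Å) mirrors the threshold quoted in the
    abstract; applying it as an RMSD cutoff is an assumption.
    """
    hits = [kabsch_rmsd(member, reference) <= cutoff for member in ensemble]
    return sum(hits) / len(hits)
```

Given a list of sampled coordinate arrays and one held-out structure with the same atom ordering, `fraction_within(samples, held_out)` gives a rough measure of how consistently an ensemble reaches that structure.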

SUBMITTER: Mansoor S 

PROVIDER: S-EPMC11008089 | biostudies-literature | 2024 Apr

REPOSITORIES: biostudies-literature

Publications

Protein Ensemble Generation Through Variational Autoencoder Latent Space Sampling.

Mansoor Sanaa, Baek Minkyung, Park Hahnbeom, Lee Gyu Rie, Baker David

Journal of Chemical Theory and Computation, 2024-03-28, issue 7


Similar Datasets

| S-EPMC8633506 | biostudies-literature
| S-EPMC10782437 | biostudies-literature
| S-EPMC8906577 | biostudies-literature
| S-EPMC10654724 | biostudies-literature
| S-EPMC7450509 | biostudies-literature
| S-EPMC8842480 | biostudies-literature
| S-EPMC6316879 | biostudies-literature
| S-EPMC11862945 | biostudies-literature
| S-EPMC9904845 | biostudies-literature
| S-EPMC11661288 | biostudies-literature