Dataset Information

Deep2Full: Evaluating strategies for selecting the minimal mutational experiments for optimal computational predictions of deep mutational scan outcomes.

ABSTRACT: Performing a complete deep mutational scan with all single point mutations may not be practical, and may not even be required, especially if predictive computational models can be developed. Computational models are, however, naive to the cellular response under the myriad assay conditions. In a realistic paradigm of assay-context-aware predictive hybrid models that combine minimal experimental data from deep mutational scans with structure, sequence information and computational models, we define and evaluate different strategies for choosing this minimal set. We evaluated the trivial strategy of systematically reducing the number of mutational studies from 85% to 15%, along with several others concerning the choice of mutation types, such as random versus site-directed, at the same 15% data completeness. Interestingly, the predictive capabilities obtained by training on a random set of mutations and by training on a systematic substitution of all amino acids to alanine, asparagine and histidine (ANH) were comparable. Another strategy we explored, augmenting the training data with measurements of the same mutants under multiple assay conditions, did not improve the prediction quality. For the six proteins we analyzed, the bin-wise error in prediction is optimal when 50-100 mutations per bin are used in training the computational model, suggesting that good prediction quality may be achieved with a library of 500-1000 mutations.
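The two selection strategies compared in the abstract (a random subset versus systematic ANH substitution) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tuple layout and the 10-residue wild-type sequence are hypothetical, chosen only to show how the two training sets would be drawn from the full list of single point mutations.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def all_single_mutants(wild_type):
    """Enumerate every single point mutation as (position, wt_residue, new_residue)."""
    return [
        (pos, wt, aa)
        for pos, wt in enumerate(wild_type, start=1)
        for aa in AMINO_ACIDS
        if aa != wt
    ]


def random_subset(mutants, fraction, seed=0):
    """Pick a random fraction of the mutants (hypothetical helper, seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(mutants, round(fraction * len(mutants)))


def anh_subset(mutants, targets="ANH"):
    """Keep only substitutions to alanine, asparagine or histidine."""
    return [m for m in mutants if m[2] in targets]


wild_type = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
mutants = all_single_mutants(wild_type)
random_train = random_subset(mutants, 0.15)
anh_train = anh_subset(mutants)

print(len(mutants), len(random_train), len(anh_train))
```

Because three of the 19 possible substitutions at each position are to A, N or H, the ANH set covers roughly 15% of all single mutants, which is why it can be compared with a random subset at the same data completeness.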

SUBMITTER: Sruthi CK

PROVIDER: S-EPMC6954071 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Deep2Full: Evaluating strategies for selecting the minimal mutational experiments for optimal computational predictions of deep mutational scan outcomes.

Sruthi CK, Prakash M

PLoS One, 2020-01-10, issue 1


PMID: 31923916

Similar Datasets

Project description:Rational drug design focuses on the explanation and prediction of complex formation between therapeutic targets and small-molecule ligands. As a third and often overlooked interacting partner, water molecules play a critical role in the thermodynamics of protein-ligand binding, impacting both the entropy and enthalpy components of the binding free energy and by extension, on-target affinity and bioactivity. The community has realized the importance of binding site waters, as evidenced by the number of computational tools to predict the structure and thermodynamics of their networks. However, quantitative experimental characterization of relevant protein-ligand-water systems, and consequently the validation of these modeling methods, remains challenging. Here, we investigated the impact of solvent exchange from light (H2O) to heavy water (D2O) to provide complete thermodynamic profiling of these ternary systems. Utilizing the solvent isotope effects, we gain a deeper understanding of the energetic contributions of various components. Specifically, we conducted isothermal titration calorimetry experiments on trypsin with a series of p-substituted benzamidines, as well as carbonic anhydrase II (CAII) with a series of aromatic sulfonamides. Significant differences in binding enthalpies found between light vs heavy water indicate a substantial role of the binding site water network in protein-ligand binding. Next, we challenged two conceptually distinct modeling methods, the grid-based WaterFLAP and the molecular dynamics-based MobyWat, by predicting and scoring relevant water networks. The predicted water positions accurately reproduce those in available high-resolution X-ray and neutron diffraction structures of the relevant protein-ligand complexes. Estimated energetic contributions of the identified water networks were corroborated by the experimental thermodynamics data. 
Besides providing a direct validation for the predictive power of these methods, our findings confirmed the importance of considering binding site water networks in computational ligand design.


