Energy-entropy prediction of octanol-water logP of SAMPL7 N-acyl sulfonamide bioisosters.
ABSTRACT: Partition coefficients quantify a molecule's distribution between two immiscible liquid phases. While there are many methods to compute them, there is not yet a method based on the free energy of each system in terms of energy and entropy, where entropy depends on the probability distribution of all quantum states of the system. Here we test a method in this class called Energy Entropy Multiscale Cell Correlation (EE-MCC) for the calculation of octanol-water logP values for 22 N-acyl sulfonamides in the SAMPL7 Physical Properties Challenge (Statistical Assessment of the Modelling of Proteins and Ligands). EE-MCC logP values have a mean error of 1.8 logP units versus experiment and a standard error of the mean of 1.0 logP units for three separate calculations. These errors are primarily due to getting sufficiently converged energies to give accurate differences of large numbers, particularly for the large-molecule solvent octanol. However, this is also an issue for entropy, and approximations in the force field and MCC theory also contribute to the error. Unique to MCC is that it explains the entropy contributions over all the degrees of freedom of all molecules in the system. A gain in orientational entropy of water is the main favourable entropic contribution, supported by small gains in solute vibrational and orientational entropy but offset by unfavourable changes in the orientational entropy of octanol, the vibrational entropy of both solvents, and the positional and conformational entropy of the solute.
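As a rough illustration of how a logP value follows from energy and entropy terms, the sketch below combines an energy (enthalpy) change and an entropy change into a solvation free energy for each phase and converts the water-to-octanol difference into logP. The function names, the units, and the 298.15 K default are assumptions made for this example; this is not the EE-MCC implementation, which evaluates the entropy over all degrees of freedom of all molecules in the system.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def solvation_free_energy(delta_h, delta_s, temperature=298.15):
    """Return dG = dH - T*dS (hypothetical helper).

    delta_h is in kJ/mol, delta_s in kJ/(mol K), temperature in K.
    """
    return delta_h - temperature * delta_s


def log_p(dg_water, dg_octanol, temperature=298.15):
    """Return the octanol-water logP from solvation free energies in kJ/mol.

    logP = (dG_water - dG_octanol) / (R T ln 10): a solute that is
    solvated more favourably in octanol has a positive logP.
    """
    rt_ln10 = R_KJ_PER_MOL_K * temperature * math.log(10)
    return (dg_water - dg_octanol) / rt_ln10
```

Because logP is a small difference between two large free energies, an error of a few kJ/mol in either term already shifts the result by about one log unit, which is consistent with the convergence issue described in the abstract.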
Project description: We predicted water-octanol partition coefficients for the molecules in the SAMPL7 challenge with explicit-solvent classical molecular dynamics (MD) simulations. Hydration free energies and octanol solvation free energies were calculated with a windowed alchemical free energy approach. Three commonly used force fields (AMBER GAFF, CHARMM CGenFF, OPLS-AA) were tested. Special emphasis was placed on converging all simulations, using a criterion developed for the SAMPL6 challenge. In aggregate, over 1000 μs of simulations were performed, with some free energy windows remaining not fully converged even after 1 μs of simulation time. Nevertheless, the amount of sampling produced log P estimates with a precision of 0.1 log units or better for converged simulations. Although the simulations were probably as fully sampled as can be expected and is feasible, the agreement with experiment remained modest for all force fields, with no force field achieving a root mean squared error better than 1.6. Overall, our results indicate that a large amount of sampling is necessary to produce precise log P predictions for the SAMPL7 compounds and that high precision does not necessarily lead to high accuracy. Thus, fundamental problems remain to be solved for physics-based log P predictions.
Project description: Theoretical approaches for predicting physicochemical properties are valuable tools for accelerating the drug discovery process. In this work, quantum chemical methods are used to predict water-octanol partition coefficients as part of the SAMPL6 blind challenge. The SMD continuum solvent model was employed with MP2 and eight DFT functionals in conjunction with correlation-consistent basis sets to determine the water-octanol transfer free energy. Several tactics for improving the predictions of the partition coefficient were examined, including increasing the quality of basis sets, considering tautomerization, and accounting for inhomogeneities in the water and n-octanol phases. Evaluation of these various schemes highlights the impact of modeling approaches across different methods. With the inclusion of tautomers and adjustments to the permittivity constants, the best predictions were obtained with smaller basis sets and the O3LYP functional, which yielded an RMSE of 0.79 logP units. The results presented correspond to the SAMPL6 logP submission IDs: DYXBT, O7DJK, and AHMTF.
Project description: A multiple linear regression model called MLR-3 is used for predicting the experimental n-octanol/water partition coefficient (log P_N) of 22 N-sulfonamides proposed by the organizers of the SAMPL7 blind challenge. The MLR-3 method was trained with 82 molecules, including drug-like sulfonamides and small organic molecules, which resembled the main functional groups present in the challenge dataset. Our model, submitted as "TFE-MLR", gave a root-mean-square error of 0.58 and a mean absolute error of 0.41 in log P units, the highest accuracy among the empirical methods and among all ranked submissions. Overall, the results support the appropriateness of the multiple linear regression approach MLR-3 for computing the n-octanol/water partition coefficient of sulfonamide-bearing compounds. In this context, the outstanding performance of empirical methodologies, with 75% of the ranked submissions achieving root-mean-square errors < 1 log P unit, supports the suitability of these strategies for obtaining accurate and fast predictions of physicochemical properties such as partition coefficients of bioorganic compounds.
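The two accuracy metrics quoted above, root-mean-square error and mean absolute error, can be computed as in the minimal sketch below. The function names are chosen for this example and the input values in the usage note are invented placeholders, not SAMPL7 data.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, experimental):
    """Mean absolute error between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)
```

For example, `rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])` weights the single large deviation more heavily than `mae` does on the same inputs, which is why the RMSE (0.58) is larger than the MAE (0.41) whenever the errors are not all equal in magnitude.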
Project description: Within the scope of the SAMPL7 challenge for predicting physical properties, the Integral Equation Formalism of the Miertus-Scrocco-Tomasi (IEFPCM/MST) continuum solvation model has been used for the blind prediction of n-octanol/water partition coefficients and acidity constants of a set of 22 and 20 sulfonamide-containing compounds, respectively. The log P and pKa were computed using the B3LYP/6-31G(d) parametrized version of the IEFPCM/MST model. For partition coefficients, our method yielded a root-mean-square error of 1.03 log P units, placing it among the most accurate theoretical approaches both in the global comparison (rank 8th) and among physical methods (rank 2nd). On the other hand, the deviation between predicted and experimental pKa values was 1.32 log units, making this the second best-ranked submission. Though this highlights the reliability of the IEFPCM/MST model for predicting the partitioning and the acid dissociation constant of drug-like compounds, the results are discussed to identify potential weaknesses and improve the performance of the method.
Project description: Partition coefficients describe the equilibrium partitioning of a single, defined charge state of a solute between two liquid phases in contact, typically a neutral solute. Octanol-water partition coefficients (P), or their logarithms (log P), are frequently used as a measure of lipophilicity in drug discovery. The partition coefficient is a physicochemical property that captures the thermodynamics of relative solvation between aqueous and nonpolar phases, and therefore provides an excellent test for physics-based computational models that predict properties of pharmaceutical relevance such as protein-ligand binding affinities or hydration/solvation free energies. The SAMPL6 Part II octanol-water partition coefficient prediction challenge used a subset of kinase inhibitor fragment-like compounds from the SAMPL6 pKa prediction challenge in a blind experimental benchmark. Following experimental data collection, the partition coefficient dataset was kept blinded until all predictions were collected from participating computational chemistry groups. A total of 91 submissions were received from 27 participating research groups. This paper presents the octanol-water log P dataset for this SAMPL6 Part II partition coefficient challenge, which consisted of 11 compounds (six 4-aminoquinazolines, two benzimidazoles, one pyrazolo[3,4-d]pyrimidine, one pyridine, and one 2-oxoquinoline substructure-containing compound) with log P values in the range of 1.95-4.09. We describe the potentiometric log P measurement protocol used to collect this dataset using a Sirius T3, discuss the limitations of this experimental approach, and share suggestions for future log P data collection efforts for the evaluation of computational methods.
Project description: The octanol-water partition coefficient (logPow) is an important index for measuring solubility, membrane permeability, and bioavailability in the drug discovery field. In this paper, the logPow values of 58 compounds were predicted by alchemical free energy calculation using molecular dynamics simulation. In free energy calculations, the atomic charges of the compounds are always fixed. However, they must be recalculated for each solvent. Therefore, three different sets of atomic charges were tested using quantum chemical calculations, taking into account vacuum, octanol, and water environments. The calculated atomic charges in the different environments do not necessarily influence the correlation between calculated and experimentally measured ΔGwater values. The largest correlation coefficient values of the solvation free energy in water and octanol were 0.93 and 0.90, respectively. On the other hand, the correlation coefficient of logPow values calculated from free energies, the largest of which was 0.92, was sensitive to the combination of the solvation free energies calculated from the calculated atomic charges. These results reveal that the solvent assumed in the atomic charge calculation is an important factor determining the accuracy of predicted logPow values.
Project description: Inspired by the successful application of the embedded cluster reference interaction site model (EC-RISM), a combination of quantum-mechanical calculations with three-dimensional RISM theory, to predict Gibbs energies of species in solution within the SAMPL6.1 (acidity constants, pKa) and SAMPL6.2 (octanol-water partition coefficients, log P) challenges, the methodology was applied to the recent SAMPL7 physical property challenge on aqueous pKa and octanol-water log P values. We also computed distribution coefficients log D7.4, which were not part of the challenge but were provided by the organizers, from predicted pKa and log P data. While macroscopic pKa predictions compared very favorably with experimental data (root mean square error, RMSE 0.72 pK units), the performance of the log P model (RMSE 1.84) fell behind expectations from the SAMPL6.2 challenge, leading to reasonable log D7.4 predictions (RMSE 1.69) from combining the independent calculations. In the post-submission phase, conformations generated by a different methodology yielded results that did not significantly improve the original predictions. While overall satisfactory compared to previous log D challenges, the predicted data suggest that further effort is needed for optimizing the robustness of the partition coefficient model within EC-RISM calculations and for shaping the agreement between experimental conditions and the corresponding model description.
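To illustrate how a distribution coefficient can be obtained from pKa and log P, the sketch below uses the textbook relation for a monoprotic acid. It assumes that only the neutral species partitions into octanol and that a single acidic group is involved. These simplifications are made for this example and are not necessarily the procedure used with EC-RISM; the function name is hypothetical.

```python
import math


def log_d_monoprotic_acid(log_p, pka, ph=7.4):
    """Return log D at the given pH for a monoprotic acid.

    Assumes only the neutral form partitions into octanol, so
    log D = log P - log10(1 + 10**(pH - pKa)).
    """
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))
```

When the pKa lies well above the pH, the solute is almost entirely neutral and log D approaches log P. When the pKa equals the pH, half of the solute is ionized and log D is lower than log P by log10(2), about 0.3 units. Errors in predicted pKa and log P therefore both propagate into log D.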
Project description: Estimation of the energy from a given Boltzmann sample is straightforward since one just has to average the contribution of the individual configurations. On the other hand, calculation of the absolute entropy, S (hence the absolute free energy F) is difficult because it depends on the entire (unknown) ensemble. We have developed a new method called "the hypothetical scanning molecular dynamics" (HSMD) for calculating the absolute S from a given sample (generated by any simulation technique). In other words, S (like the energy) is "written" on the sample configurations, where HSMD provides a prescription of how to "read" it. In practice, each sample conformation, i, is reconstructed with transition probabilities, and their product leads to the probability of i, hence to the entropy. HSMD is an exact method where all interactions are considered, and the only approximation is due to insufficient sampling. In previous studies HSMD (and HS Monte Carlo, HSMC) has been extended systematically to systems of increasing complexity, where the most recent is the seven-residue mobile loop, 304-310 (Gly-His-Gly-Ala-Gly-Gly-Ser) of the enzyme porcine pancreatic alpha-amylase modeled by the AMBER force field and AMBER with the implicit solvation GB/SA (paper I, Cheluvaraja, S.; Meirovitch, H. J. Chem. Theory Comput. 2008, 4, 192). In the present paper we go a step further and extend HSMD to the same loop capped with TIP3P explicit water at 300 K. As in paper I, we are mainly interested in entropy and free energy differences between the free and bound microstates of the loop, which are obtained from two separate MD samples of these microstates. The contribution of the loop to S and F is calculated by HSMD and that of water by a particular thermodynamic integration procedure. As expected, the free microstate is more stable than the bound microstate by a total free energy difference, F_free − F_bound = −4.8 ± 1 kcal/mol, as compared to −25.5 kcal/mol obtained with GB/SA.
We find that relatively large systematic errors in the loop entropies, Sfree(loop) and Sbound(loop) are cancelled in their difference which is thus obtained efficiently and with high accuracy, i.e., with a statistical error of 0.1 kcal/mol. This cancellation, which has been observed in previous HSMD studies, is in accord with theoretical arguments given in paper I.
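The reconstruction idea described above can be written compactly as follows. This is a schematic reading of the abstract, not the authors' exact formulation: the symbols p_k^{(i)} (the k-th transition probability of conformation i), N (the number of reconstruction steps) and n (the sample size) are notation introduced here, and for continuous coordinates the probabilities would be probability densities.

```latex
P_i = \prod_{k=1}^{N} p_k^{(i)}, \qquad
S \approx -\frac{k_\mathrm{B}}{n} \sum_{i=1}^{n} \ln P_i, \qquad
F = \langle E \rangle - T S
  \approx \langle E \rangle + \frac{k_\mathrm{B} T}{n} \sum_{i=1}^{n} \ln P_i .
```

The sample average of −k_B ln P_i estimates the ensemble entropy −k_B Σ_i P_i ln P_i because the conformations are drawn with probability P_i. A systematic error in ln P_i that is similar for the free and bound microstates would then cancel in S_free − S_bound, in line with the cancellation noted above.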
Project description: Partition coefficients describe how a solute is distributed between two immiscible solvents. They are used in drug design as a measure of a solute's hydrophobicity and a proxy for its membrane permeability. We calculate partition coefficients from transfer free energies using molecular dynamics simulations in explicit solvent. Setup is done by our new Solvation Toolkit, which automates the process of creating input files for any combination of solutes and solvents for many popular molecular dynamics software packages. We calculate partition coefficients between octanol/water and cyclohexane/water with the Generalized AMBER Force Field (GAFF) and the Dielectric Corrected GAFF (GAFF-DC). With similar methods in the past we found a root-mean-squared error (RMSE) of 6.3 kJ/mol in hydration free energies, which would correspond to an error of around 1.6 log units in partition coefficients if solvation free energies in both solvents were estimated with comparable accuracy. Here we find an overall RMSE of about 1.2 log units with both force fields. Results from GAFF and GAFF-DC seem to exhibit systematic biases in opposite directions for calculated cyclohexane/water partition coefficients.
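The conversion from 6.3 kJ/mol to roughly 1.6 log units can be reproduced with a short calculation. The sketch below assumes that the errors in the two solvation free energies are independent and of equal size, so that they add in quadrature, and that the temperature is 298.15 K. The temperature is not stated in the text, and the function name is hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def transfer_error_in_log_units(rmse_kj_per_mol, temperature=298.15):
    """Convert a per-solvent free energy RMSE into a partition coefficient error.

    Two independent, equal errors combine as sqrt(2) * RMSE, and the
    resulting transfer free energy error is divided by R T ln 10 to
    express it in log units.
    """
    combined = math.sqrt(2.0) * rmse_kj_per_mol
    return combined / (R_KJ_PER_MOL_K * temperature * math.log(10))


error = transfer_error_in_log_units(6.3)  # about 1.56, i.e. around 1.6 log units
```

At 298.15 K one log unit corresponds to about 5.7 kJ/mol, so the combined error of about 8.9 kJ/mol maps to approximately 1.6 log units, matching the figure quoted above.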