Project description:Free-energy calculations play an important role in the application of computational chemistry to a range of fields, including protein biochemistry, rational drug design, or materials science. Importantly, the free-energy difference is directly related to experimentally measurable quantities such as partition and adsorption coefficients, water activity, and binding affinities. Among several techniques aimed at predicting free-energy differences, perturbation approaches, involving the alchemical transformation of one molecule into another through intermediate states, stand out as rigorous methods based on statistical mechanics. However, despite the importance of free-energy calculations, the applicability of the perturbation approaches is still largely impeded by a number of challenges, including the definition of the perturbation path, i.e., alchemical changes leading to the transformation of one molecule to the other. To address this, an automatic perturbation topology builder based on a graph-matching algorithm is developed, which can identify the maximum common substructure (MCS) of two or multiple molecules and provide the perturbation topologies suitable for free-energy calculations using the GROMOS and the GROMACS simulation packages. Various MCS search options are presented leading to alternative definitions of the perturbation pathway. Moreover, perturbation topologies generated using the default multistate MCS search are used to calculate the changes in free energy between lysine and its two post-translational modifications, 3-methyllysine and acetyllysine. The pairwise free-energy calculations performed on this test system led to a cycle closure of 0.5 ± 0.3 and 0.2 ± 0.2 kJ mol⁻¹, with GROMOS and GROMACS simulation packages, respectively. The same relative free energies between the three states are obtained by employing the enveloping distribution sampling (EDS) approach when compared to the pairwise perturbations.
Importantly, this toolkit is made available online as an open-source Python package (https://github.com/drazen-petrov/SMArt).
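The cycle closure quoted above is the residual left when the pairwise free-energy differences around a closed thermodynamic cycle are summed; for a perfectly converged calculation it would be zero. A minimal sketch of that bookkeeping follows. The function name `cycle_closure` and the numerical values are hypothetical illustrations, not data from the study, and the uncertainty is combined in quadrature on the assumption that the legs are statistically independent.

```python
import math

def cycle_closure(legs):
    """Sum free-energy differences around a closed cycle.

    legs: list of (delta_g, std_err) tuples in kJ/mol, ordered so that
    the end state of one leg is the start state of the next
    (e.g. A->B, B->C, C->A).
    Returns (closure, error), with the error propagated in quadrature
    under the assumption of independent legs.
    """
    closure = sum(dg for dg, _ in legs)
    error = math.sqrt(sum(se ** 2 for _, se in legs))
    return closure, error

# Hypothetical three-state cycle (values invented for illustration only)
legs = [(10.0, 0.3), (-6.0, 0.4), (-3.6, 1.2)]
closure, error = cycle_closure(legs)
print(f"cycle closure = {closure:.1f} +/- {error:.1f} kJ/mol")
```

A closure that is small relative to its propagated error suggests the three pairwise calculations are mutually consistent.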
Project description:We introduce an alternative method to perform free energy calculations for mixtures at multiple temperatures and pressures from a single simulation, by combining umbrella sampling and the continuous fractional component Monte Carlo method. One can perform a simulation of a mixture at a certain pressure and temperature and accurately compute the chemical potential at other pressures and temperatures close to the simulation conditions. This method has the following advantages: (1) Accurate estimates of the chemical potential as a function of pressure and temperature are obtained from a single state simulation without additional postprocessing. This can potentially reduce the number of simulations of a system for free energy calculations for a specific temperature and/or pressure range. (2) Partial molar volumes and enthalpies are obtained directly from the estimated chemical potentials. We tested our method for a Lennard-Jones system, aqueous mixtures of methanol at T = 298 K and P = 1 bar, and a mixture of ammonia, nitrogen, and hydrogen at T = 573 K and P = 800 bar. For pure methanol (N = 410 molecules), we observed that the estimated chemical potentials from umbrella sampling are in excellent agreement with the reference values obtained from independent simulations, for ΔT = ±15 K and ΔP = 100 bar (with respect to the simulated system). For larger systems, this range becomes smaller because the relative fluctuations of energy and volume become smaller. Without sufficient overlap, the performance of the method will become poor especially for nonlinear variations of the chemical potential.
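The idea of estimating properties at nearby temperatures and pressures from one isothermal-isobaric simulation can be illustrated with generic Boltzmann reweighting. The sketch below is not the authors' implementation: the function `reweight_npt`, its argument names, and the unit choices (energies in kJ/mol, volumes in nm^3, pressures in bar) are assumptions made for illustration. Each sampled configuration with energy U and volume V receives the weight exp[-(β' - β)U - (β'P' - βP)V], which is why sufficient overlap of the energy and volume distributions is needed.

```python
import numpy as np

KB = 0.0083144626        # Boltzmann constant in kJ mol^-1 K^-1
BAR_NM3 = 0.0602214076   # converts bar * nm^3 to kJ/mol

def reweight_npt(obs, energy, volume, t_sim, p_sim, t_new, p_new):
    """Reweight an NPT ensemble average to nearby conditions.

    obs, energy, volume: per-configuration arrays (energy in kJ/mol,
    volume in nm^3). Temperatures in K, pressures in bar.
    Returns the estimate of <obs> at (t_new, p_new).
    """
    obs = np.asarray(obs, dtype=float)
    energy = np.asarray(energy, dtype=float)
    volume = np.asarray(volume, dtype=float)
    b_sim = 1.0 / (KB * t_sim)
    b_new = 1.0 / (KB * t_new)
    log_w = (-(b_new - b_sim) * energy
             - (b_new * p_new - b_sim * p_sim) * BAR_NM3 * volume)
    log_w -= log_w.max()          # guard against overflow in exp
    w = np.exp(log_w)
    return float(np.sum(w * obs) / np.sum(w))
```

In a fractional-component simulation, `obs` could be an indicator for a bin of the coupling parameter, so that the reweighted occupancies give the chemical potential at the new conditions; that mapping is stated here only as a plausible use, not as the published procedure.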
Project description:All-atom free-energy methods offer a promising alternative to kinetic molecular mechanics simulations of protein folding and association. Here we report an accurate, transferable all-atom biophysical force field (PFF02) that stabilizes the native conformation of a wide range of proteins as the global optimum of the free-energy landscape. For 32 proteins of the ROSETTA decoy set and six proteins that we have previously folded with PFF01, we find near-native conformations with an average backbone RMSD of 2.14 Å to the native conformation and an average Z-score of -3.46 to the corresponding decoy set. We used nonequilibrium sampling techniques starting from completely extended conformations to exhaustively sample the energy surface of three nonhomologous hairpin-peptides, a three-stranded beta-sheet, the all-helical 40 amino-acid HIV accessory protein, and a zinc-finger beta beta alpha motif, and find near-native conformations for the minimal energy for each protein. Using a massively parallel evolutionary algorithm, we also obtain a near-native low-energy conformation for the 54 amino-acid engrailed homeodomain. Our force field thus stabilized near-native conformations for a total of 20 proteins of all structure classes with an average RMSD of only 3.06 Å to their respective experimental conformations.
Project description:Physicochemical characterization of multimeric biomacromolecule assembly and disassembly processes is key to understanding the mechanisms of biological phenomena at the molecular level. Mass spectrometry (MS) and structural bioinformatics (SB) approaches can now identify the subcomplexes involved in assembly and disassembly, but they cannot provide the atomic information needed for free-energy calculations that characterize the transition mechanism between two different sets of subcomplexes. To combine observations derived from MS and SB approaches with conventional free-energy calculation protocols, we designed a new reaction pathway sampling method based on a hybrid configuration bias Monte Carlo/molecular dynamics (hcbMC/MD) scheme and applied it to simulate the disassembly process of the serum amyloid P component (SAP) pentamer. The results are consistent with those of the earlier MS and SB studies with respect to the SAP subcomplex species and the initial stage of the SAP disassembly process. Furthermore, we observed a novel dissociation event, a ring-opening reaction of the SAP pentamer. By combining free-energy calculations with the hcbMC/MD reaction pathway trajectories, we also obtained experimentally testable predictions on (1) the reaction time of the ring-opening reaction and (2) the importance of Asp42 and Lys117 for stable formation of the SAP oligomer.
Project description:Free energy simulations are an established computational tool in modelling chemical change in the condensed phase. However, sampling of kinetically distinct substates remains a challenge to these approaches. As a route to addressing this, we link the methods of thermodynamic integration (TI) and swarm-enhanced sampling molecular dynamics (sesMD), where simulation replicas interact cooperatively to aid transitions over energy barriers. We illustrate the approach by using alchemical alkane transformations in solution, comparing them with the multiple independent trajectory TI (IT-TI) method. Free energy changes for transitions computed by using IT-TI grew increasingly inaccurate as the intramolecular barrier was heightened. By contrast, swarm-enhanced sampling TI (sesTI) calculations showed clear improvements in sampling efficiency, leading to more accurate computed free energy differences, even in the case of the highest barrier height. The sesTI approach, therefore, has potential in addressing chemical change in systems where conformations exist in slow exchange.
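Thermodynamic integration obtains a free energy change by integrating the ensemble average of dH/dλ over the coupling parameter λ from 0 to 1. The sketch below shows only that final quadrature step, with the per-λ averages first pooled over several independent trajectories as in an IT-TI style analysis. The function name, the trapezoidal rule, and the toy numbers are assumptions for illustration; they do not reproduce the sesTI coupling between replicas.

```python
def ti_free_energy(lambdas, dhdl_per_replica):
    """Trapezoidal thermodynamic integration.

    lambdas: increasing coupling-parameter values from 0 to 1.
    dhdl_per_replica: one list per independent trajectory, holding the
    time-averaged dH/dlambda at each lambda value.
    Returns the free energy change in the units of dH/dlambda.
    """
    n_rep = len(dhdl_per_replica)
    # Average each lambda window over the independent trajectories
    mean_dhdl = [sum(rep[i] for rep in dhdl_per_replica) / n_rep
                 for i in range(len(lambdas))]
    delta_f = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        delta_f += 0.5 * width * (mean_dhdl[i] + mean_dhdl[i + 1])
    return delta_f

# Toy data: three replicas scattered around dH/dlambda = 2 * lambda
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
replicas = [[2 * l for l in lambdas],
            [2 * l + 0.2 for l in lambdas],
            [2 * l - 0.2 for l in lambdas]]
print(ti_free_energy(lambdas, replicas))
```

If individual trajectories are trapped in different conformational substates, the per-window averages differ systematically between replicas, which is the sampling problem the abstract describes.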
Project description:A Gaussian accelerated molecular dynamics (GaMD) approach for simultaneous enhanced sampling and free energy calculation of biomolecules is presented. By constructing a boost potential that follows a Gaussian distribution, accurate reweighting of the GaMD simulations is achieved using cumulant expansion to the second order. Here, GaMD is demonstrated on three biomolecular model systems: alanine dipeptide, chignolin folding, and ligand binding to T4 lysozyme. Without the need to set predefined reaction coordinates, GaMD enables unconstrained enhanced sampling of these biomolecules. Furthermore, the free energy profiles obtained from reweighting of the GaMD simulations allow us to identify distinct low-energy states of the biomolecules and characterize the protein-folding and ligand-binding pathways quantitatively.
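Reweighting with a second-order cumulant expansion replaces the exponential average of the boost potential ΔV in each bin of a chosen coordinate with exp(β⟨ΔV⟩ + β²σ²/2), where σ² is the variance of ΔV in that bin; the approximation is exact when ΔV is Gaussian distributed. The following is a minimal sketch of that step, assuming per-frame arrays of a coordinate and of ΔV are already available. The function names and the binning scheme are hypothetical and are not taken from any GaMD software.

```python
import numpy as np

def log_boost_factor(dv, beta):
    """Second-order cumulant estimate of ln<exp(beta * dV)>."""
    dv = np.asarray(dv, dtype=float)
    return beta * dv.mean() + 0.5 * beta ** 2 * dv.var()

def reweighted_pmf(coord, dv, edges, beta):
    """Free energy profile along `coord` after removing the boost bias.

    coord, dv: per-frame coordinate values and boost potentials.
    edges: increasing bin edges. Empty bins are reported as +inf.
    The profile is shifted so its minimum is zero; at least one bin
    must be populated.
    """
    coord = np.asarray(coord, dtype=float)
    dv = np.asarray(dv, dtype=float)
    idx = np.digitize(coord, edges) - 1   # bin index of each frame
    n_bins = len(edges) - 1
    pmf = np.full(n_bins, np.inf)
    for j in range(n_bins):
        mask = idx == j
        if mask.any():
            log_p = np.log(mask.sum() / len(coord))  # biased population
            pmf[j] = -(log_p + log_boost_factor(dv[mask], beta)) / beta
    return pmf - pmf[np.isfinite(pmf)].min()
```

Bins with very few frames give noisy variance estimates, so in practice a minimum population per bin is usually enforced before trusting the profile.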
Project description:Free energy calculations are used to study how strongly potential drug molecules interact with their target receptors. The accuracy of these calculations depends on the accuracy of the molecular dynamics (MD) force field as well as proper sampling of the major conformations of each molecule. However, proper sampling of ligand conformations can be difficult when there are large barriers separating the major ligand conformations. An example of this is for ligands with an asymmetrically substituted phenyl ring, where the presence of protein loops hinders the proper sampling of the different ring conformations. These ring conformations become more difficult to sample when the size of the functional groups attached to the ring increases. The Adaptive Integration Method (AIM) has been developed, which adaptively changes the alchemical coupling parameter λ during the MD simulation so that conformations sampled at one λ can aid sampling at the other λ values. The Accelerated Adaptive Integration Method (AcclAIM) builds on AIM by lowering potential barriers for specific degrees of freedom at intermediate λ values. However, these methods may not work when there are very large barriers separating the major ligand conformations. In this work, we describe a modification to AIM that improves sampling of the different ring conformations, even when there is a very large barrier between them. This method combines AIM with conformational Monte Carlo sampling, giving improved convergence of ring populations and the resulting free energy. This method, called AIM/MC, is applied to study the relative binding free energy for a pair of ligands that bind to thrombin and a different pair of ligands that bind to aspartyl protease β-APP cleaving enzyme 1 (BACE1). These protein-ligand binding free energy calculations illustrate the improvements in conformational sampling and the convergence of the free energy compared to both AIM and AcclAIM.
Project description:Free-energy perturbation (FEP) methods are commonly used in drug design to calculate relative binding free energies of different ligands to a common host protein. Alchemical ligand transformations are usually performed in multiple steps, which need to be chosen carefully to ensure sufficient phase-space overlap between neighboring states. With one-step or single-step FEP techniques, a single reference state is designed that samples phase space not only representative of a full transformation but also ideally resembling multiple ligand end states, and hence allows for efficient multistate perturbations. Enveloping distribution sampling (EDS) is one example of such a method, in which the reference state is created by a mathematical combination of the different ligand end states based on solid statistical mechanics. We have recently proposed a novel approach to EDS which enables efficient barrier crossing between the different end states, termed accelerated EDS (A-EDS). In this work, we further simplify the parametrization of the A-EDS reference state and demonstrate the automated calculation of multiple free-energy differences between different ligands from a single simulation in three different well-described drug design model systems.
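In standard EDS the reference-state energy is a smoothed minimum over the end-state energies, V_R = -(1/(βs)) ln Σ_i exp[-βs(V_i - E_i)], with a smoothness parameter s and per-state energy offsets E_i. Free-energy differences between end states then follow from one-step perturbation out of the reference ensemble. The sketch below illustrates these two expressions only; the function names are hypothetical, and the acceleration scheme and simplified parametrization of A-EDS described above are not represented.

```python
import numpy as np

def eds_reference_energy(v_states, offsets, beta, s=1.0):
    """EDS reference energy for each frame.

    v_states: array of shape (n_frames, n_states) with end-state energies.
    offsets: array of shape (n_states,) with energy offsets E_i.
    """
    x = -beta * s * (np.asarray(v_states, float) - np.asarray(offsets, float))
    m = x.max(axis=-1, keepdims=True)              # stabilise the log-sum-exp
    lse = m[..., 0] + np.log(np.exp(x - m).sum(axis=-1))
    return -lse / (beta * s)

def _log_mean_exp(y):
    m = y.max()
    return m + np.log(np.mean(np.exp(y - m)))

def eds_free_energy_difference(v_a, v_b, v_ref, beta):
    """Free energy of state B minus state A from reference-state frames."""
    v_a, v_b, v_ref = (np.asarray(a, float) for a in (v_a, v_b, v_ref))
    log_b = _log_mean_exp(-beta * (v_b - v_ref))
    log_a = _log_mean_exp(-beta * (v_a - v_ref))
    return -(log_b - log_a) / beta
```

For two states with equal energies and zero offsets, the reference energy lies ln(2)/β below either state, which shows how the reference state envelops all end states at once.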
Project description:Potential of mean force (PMF) calculations are used to characterize the free energy landscape of protein-lipid and protein-protein association within membranes. Coarse-grained simulations allow binding free energies to be determined with reasonable statistical error. This accuracy relies on defining a good collective variable to describe the binding and unbinding transitions, and upon criteria for assessing the convergence of the simulation toward representative equilibrium sampling. As examples, we calculate protein-lipid binding PMFs for ANT/cardiolipin and Kir2.2/PIP2, using umbrella sampling on a distance coordinate. These highlight the importance of replica exchange between windows for convergence. The use of two independent sets of simulations, initiated from bound and unbound states, provides strong evidence for simulation convergence. For a model protein-protein interaction within a membrane, center-of-mass distance is shown to be a poor collective variable for describing transmembrane helix-helix dimerization. Instead, we employ an alternative intermolecular distance matrix RMS (DRMS) coordinate to obtain converged PMFs for the association of the glycophorin transmembrane domain. While the coarse-grained force field gives a reasonable Kd for dimerization, the majority of the bound population is revealed to be in a near-native conformation. Thus, the combination of a refined reaction coordinate with improved sampling reveals previously unnoticed complexities of the dimerization free energy landscape. We propose the use of replica-exchange umbrella sampling starting from different initial conditions as a robust approach for calculation of the binding energies in membrane simulations.
Project description:Background: A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. Results: We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips. Conclusions: ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.