Dataset Information

The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix.

ABSTRACT: BACKGROUND: The accuracy of 3D-QSAR, pharmacophore, and 3D-similarity-based chemometric target-fishing models is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers; however, model-building methods rely on an explicit number of common conformers. RESULTS: In this work, we attempted to build clustering algorithms that automatically find a reasonable number of representative conformer ensembles from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the key descriptor (variable): each column of the N × N matrix was treated as one of N variables describing the relationship (network) between the conformer in a given row and the other conformers. This approach was used to compare the performance of well-known clustering algorithms at generating representative conformer ensembles, and to test their stability under different matrix transformation functions. In the network, the representative conformer group could be resampled by four kinds of algorithms with implicit parameters. The directed dissimilarity matrix is the only input to the clustering algorithms. CONCLUSIONS: The Dunn index, Davies-Bouldin index, eta-squared, and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also covered the reduction (abstraction) rate of the data, the correlation between population and sample sizes, computational complexity, and memory usage. Every algorithm found representative conformers automatically without any user intervention and reduced the data to 14-19% of its original size in at most 1.13 s per sample. The clustering methods are simple and practical because they are fast and require no explicit parameters.
Of the four algorithms, RCDTC gave the highest Dunn and omega-squared values, as well as a consistent reduction rate between population size and sample size. The performance of the clustering algorithms was consistent across different transformation functions. Moreover, the clustering method can also be applied to the results of molecular dynamics sampling simulations.
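
To make two of the evaluation measures above concrete, the following is a minimal sketch, not the authors' implementation, that computes the Dunn index directly from a precomputed conformer RMSD matrix and the one-way ANOVA effect sizes eta-squared and omega-squared for a per-conformer variable grouped by cluster. Several choices are assumptions: the asymmetric matrix is symmetrized by averaging it with its transpose; cluster separation is taken as the smallest between-cluster RMSD and cluster diameter as the largest within-cluster RMSD; and the variable passed to `eta_omega_squared` (for example, one column of the RMSD matrix) is a hypothetical choice. All function names are illustrative.

```python
import numpy as np


def symmetrize(d):
    """Average an asymmetric dissimilarity matrix with its transpose (assumed symmetrization)."""
    d = np.asarray(d, dtype=float)
    return (d + d.T) / 2.0


def dunn_index(d, labels):
    """Dunn index from a precomputed (possibly asymmetric) RMSD matrix.

    Separation = smallest RMSD between members of different clusters;
    diameter = largest RMSD between members of the same cluster.
    Higher values indicate compact, well-separated clusters.
    """
    d = symmetrize(d)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diameter = 0.0
    min_separation = np.inf
    for i, a in enumerate(clusters):
        in_a = labels == a
        max_diameter = max(max_diameter, d[np.ix_(in_a, in_a)].max())
        for b in clusters[i + 1:]:
            in_b = labels == b
            min_separation = min(min_separation, d[np.ix_(in_a, in_b)].min())
    if max_diameter == 0.0:
        return np.inf  # every cluster contains a single (or identical) conformer
    return float(min_separation / max_diameter)


def eta_omega_squared(values, labels):
    """One-way ANOVA effect sizes of a per-conformer variable grouped by cluster."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    k, n = groups.size, values.size
    grand_mean = values.mean()
    ss_total = float(((values - grand_mean) ** 2).sum())
    ss_between = float(sum(
        (labels == g).sum() * (values[labels == g].mean() - grand_mean) ** 2
        for g in groups
    ))
    ms_within = (ss_total - ss_between) / (n - k)
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


# Toy example: four conformers forming two clusters.
rmsd = np.array([
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 2.0],
    [6.0, 5.0, 2.0, 0.0],
])
labels = [0, 0, 1, 1]
print(dunn_index(rmsd, labels))                          # 2.5
print(eta_omega_squared([1.0, 2.0, 3.0, 4.0], labels))  # approximately (0.8, 0.636)
```

Averaging with the transpose is only one way to handle a directed RMSD matrix; an element-wise maximum would give more conservative diameters, and the abstract does not say which treatment suits these measures.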

SUBMITTER: Kim H

PROVIDER: S-EPMC5364127 | biostudies-literature | 2017 Mar

REPOSITORIES: biostudies-literature

Publications

The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix.

Kim Hyoungrae, Jang Cheongyun, Yadav Dharmendra K, Kim Mi-Hyun

Journal of Cheminformatics, 2017 Mar 23, 1

PMID: 29086188

Similar Datasets

Project description: Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA), and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development, and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph-theoretical methods are recent additions to the toolset for the global analysis and decomposition of microarray co-expression matrices and have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph-theoretical clustering algorithms are compared using well-characterized, genome-scale transcriptomic data from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins; the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based on their positive match to known pathways, which produces a readily interpretable ranking of the relative effectiveness of clustering on the genes.
Methods were also tested to determine whether they identified clusters consistent with those identified by other clustering methods. Conclusions: Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
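
The BAT5 scoring described in this project can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code. Clusters and annotation sets are plain Python sets of gene identifiers. The size cut-offs that separate small, medium, and large bins (`small_max`, `medium_max`) are hypothetical placeholders, because the description does not give them. "Best" is read here as the highest per-bin average across the parameter settings tried for one method. All names are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|; defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_score(cluster: Set[str], annotations: Dict[str, Set[str]]) -> float:
    """Highest Jaccard score of a cluster against every annotation set (e.g. GO or KEGG terms)."""
    return max(jaccard(cluster, genes) for genes in annotations.values())


def size_bin(size: int, small_max: int, medium_max: int) -> str:
    """Assign a cluster to a size bin; the cut-offs are hypothetical parameters."""
    if size <= small_max:
        return "small"
    return "medium" if size <= medium_max else "large"


def average_top5(clusters: Iterable[Set[str]], annotations: Dict[str, Set[str]],
                 small_max: int = 10, medium_max: int = 100) -> Dict[str, Optional[float]]:
    """Mean score of the five best-scoring clusters in each size bin (None if a bin is empty)."""
    scores: Dict[str, List[float]] = {"small": [], "medium": [], "large": []}
    for cluster in clusters:
        scores[size_bin(len(cluster), small_max, medium_max)].append(
            cluster_score(cluster, annotations))
    result: Dict[str, Optional[float]] = {}
    for name, values in scores.items():
        top = sorted(values, reverse=True)[:5]
        result[name] = sum(top) / len(top) if top else None
    return result


def bat5(runs: Iterable[Iterable[Set[str]]], annotations: Dict[str, Set[str]],
         small_max: int = 10, medium_max: int = 100) -> Dict[str, Optional[float]]:
    """Best per-bin average over several clustering runs (parameter settings) of one method."""
    best: Dict[str, Optional[float]] = {"small": None, "medium": None, "large": None}
    for clusters in runs:
        for name, value in average_top5(clusters, annotations, small_max, medium_max).items():
            if value is not None and (best[name] is None or value > best[name]):
                best[name] = value
    return best


annotations = {"pathway_A": {"g1", "g2", "g3"}, "pathway_B": {"g4", "g5"}}
runs = [
    [{"g1", "g2"}, {"g4", "g5", "g6"}],  # parameter setting 1
    [{"g1", "g2", "g3"}, {"g4", "g6"}],  # parameter setting 2
]
print(bat5(runs, annotations, small_max=2, medium_max=3))
# {'small': 0.6666666666666666, 'medium': 1.0, 'large': None}
```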

Project description: Identifying bioactive conformations of small molecules is an essential process for virtual screening applications relying on three-dimensional structure such as molecular docking. For most small molecules, conformer generators retrieve at least one bioactive-like conformation, with an atomic root-mean-square deviation (ARMSD) lower than 1 Å, among the set of low-energy conformers generated. However, there is currently no general method to prioritise these likely target-bound conformations in the ensemble. In this work, we trained atomistic neural networks (AtNNs) on 3D information of generated conformers of a curated subset of PDBbind ligands to predict the ARMSD to their closest bioactive conformation, and evaluated the early enrichment of bioactive-like conformations when ranking conformers by AtNN prediction. AtNN ranking was compared with bioactivity-unaware baselines such as ascending Sage force field energy ranking, and a slower bioactivity-based baseline ranking by ascending Torsion Fingerprint Deviation to the Maximum Common Substructure to the most similar molecule in the training set (TFD2SimRefMCS). On test sets from random ligand splits of PDBbind, ranking conformers using ComENet, the AtNN encoding the most 3D information, leads to early enrichment of bioactive-like conformations with a median BEDROC of 0.29 ± 0.02, outperforming the best bioactivity-unaware Sage energy ranking baseline (median BEDROC of 0.18 ± 0.02), and performing on a par with the bioactivity-based TFD2SimRefMCS baseline (median BEDROC of 0.31 ± 0.02). The improved performance of the AtNN and TFD2SimRefMCS baseline is mostly observed on test set ligands that bind proteins similar to proteins observed in the training set. On a more challenging subset of flexible molecules, the bioactivity-unaware baselines showed median BEDROCs up to 0.02, while AtNNs and TFD2SimRefMCS showed median BEDROCs between 0.09 and 0.13.
When performing rigid ligand re-docking of PDBbind ligands with GOLD using the top-ranked 1% of conformers, ComENet-ranked conformers showed a higher docking success rate than bioactivity-unaware baselines (0.48 ± 0.02 versus 0.39 ± 0.02 for the CSD probability baseline). Similarly, in a pharmacophore searching experiment, selecting the top 20% of conformers ranked by ComENet gave a higher hit rate than the baselines. Hence, the approach presented here successfully uses AtNNs to focus conformer ensembles towards bioactive-like conformations, offering an opportunity to reduce computational expense in virtual screening applications on known targets that require input conformations.
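
BEDROC, the early-enrichment metric quoted in this project, rescales the robust initial enhancement (RIE) of a ranked list so that a ranking with every active item (here, every bioactive-like conformer) at the top scores 1 and one with every active at the bottom scores 0. The sketch below follows the Truchon and Bayly definition, written as min-max normalized RIE over 1-based ranks. It is an illustration, not the code used in the study. The early-recognition parameter `alpha = 20` is an assumed default, because the description does not state the value used.

```python
import math
from typing import Sequence


def rie(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """Robust initial enhancement: exponentially weighted sum over the actives'
    1-based ranks, divided by its expectation under a uniformly random ranking."""
    ra = len(active_ranks) / n_total
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks)
    expected = ra * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    return observed / expected


def bedroc(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """BEDROC as RIE rescaled between its minimum (actives ranked last)
    and its maximum (actives ranked first)."""
    n = len(active_ranks)
    if not 0 < n < n_total:
        raise ValueError("need at least one active and at least one inactive")
    ra = n / n_total
    norm = ra * (1.0 - math.exp(-alpha))
    rie_max = (1.0 - math.exp(-alpha * ra)) / norm
    rie_min = (math.exp(-alpha * (1.0 - ra)) - math.exp(-alpha)) / norm
    return (rie(active_ranks, n_total, alpha) - rie_min) / (rie_max - rie_min)


# 100 ranked conformers, 5 of them bioactive-like.
print(round(bedroc([1, 2, 3, 4, 5], 100), 3))  # approximately 1.0 (perfect early enrichment)
print(round(bedroc([10, 30, 50, 70, 90], 100), 3))  # actives spread through the list
```

The min-max form is algebraically equivalent to the closed sinh/cosh expression given by Truchon and Bayly. It was chosen here because the rescaling from RIE is easier to read.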

Project description: Capillary vibrating sharp-edge spray ionization (cVSSI) has been used to control the droplet charging of nebulized microdroplets and to monitor effects on protein ion conformation makeup as determined by mass spectrometry (MS). Here, it is observed that applying a voltage produces noticeable differences in the charge state distributions (CSDs) of ubiquitin ions. The data can be described most generally in three distinct voltage regions. Under low-voltage conditions (<+200 V, LV regime), low charge states (2+ to 4+ ions) dominate the mass spectra. Under midvoltage conditions (+200 to +600 V, MV regime), higher charge states (7+ to 12+ ions) are observed. Under high-voltage conditions (>+600 V, HV regime), the "nano-electrospray ionization (nESI)-type distribution" is achieved, in which the 6+ and 5+ species are observed as the dominant ions. Analysis of these results suggests that different pathways to progeny nanodroplet production give rise to the observed ions. In the LV regime, aerodynamic breakup leads to low-charge progeny droplets that are selective for the native solution conformation ensemble of ubiquitin (minus multimeric species). In the MV regime, the large droplets persist for longer periods of time, leading to droplet heating and a shift in the conformation ensemble toward partially unfolded species. In the HV regime, droplets access progeny nanodroplets faster, leading to native conformation ensemble sampling, as indicated by the observed nESI-type CSD. The notably limited multimer and adduct ion formation in the LV regime is hypothesized to result from aerodynamic droplet breakup, which partitions protein and charge carriers among the sampled progeny droplets. The tunable droplet charging afforded by cVSSI presents opportunities to study the effects of droplet charge, droplet size, and mass spectrometer inlet temperature on the conformer ensemble sampled by the mass spectrometer.
Additionally, the approach may provide a tool for rapid comparison of protein stabilities.

OmicsDI is part of the ELIXIR infrastructure