Dataset Information

Prediction of pKa values using the PM6 semiempirical method.

ABSTRACT: The PM6 semiempirical method and the dispersion and hydrogen bond-corrected PM6-D3H+ method are used together with the SMD and COSMO continuum solvation models to predict pKa values of pyridines, alcohols, phenols, benzoic acids, and carboxylic acids using isodesmic reactions, and the results are compared to published ab initio results. The pKa values of pyridines, alcohols, phenols, and benzoic acids considered in this study can generally be predicted with PM6 and ab initio methods to within the same overall accuracy, with average mean absolute differences (MADs) of 0.6-0.7 pH units. For carboxylic acids, the accuracy (0.7-1.0 pH units) is also comparable to ab initio results if a single outlier is removed. For primary, secondary, and tertiary amines the accuracy is, respectively, similar (0.5-0.6), slightly worse (0.5-1.0), and worse (1.0-2.5), provided that di- and tri-ethylamine are used as reference molecules for secondary and tertiary amines. When applied to a drug-like molecule where an empirical pKa predictor exhibits a large (4.9 pH unit) error, we find that the errors for PM6-based predictions are roughly the same in magnitude but opposite in sign. As a result, most of the PM6-based methods predict the correct protonation state at physiological pH, while the empirical predictor does not. The computational cost is around 2-5 min per conformer per processor core, making PM6-based pKa prediction computationally efficient enough to be used for high-throughput screening on the order of 100 processor cores.
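As a rough illustration of the isodesmic approach mentioned in the abstract, the sketch below assumes the common form pKa = pKa(ref) + ΔG / (RT ln 10), where ΔG is the computed free energy change for transferring a proton from the molecule of interest to a chemically similar reference with a known experimental pKa. The function names, the kcal/mol units, the 298.15 K default, and the pH 7.4 value for physiological pH are assumptions made for this example; they are not taken from the dataset record.

```python
import math

# Gas constant in kcal/(mol*K); units chosen for this sketch (an assumption).
R_KCAL = 1.98720425864083e-3


def isodesmic_pka(pka_ref, dg_kcal, temperature=298.15):
    """Estimate a pKa relative to a reference molecule.

    dg_kcal is the free energy change (kcal/mol) of the proton-exchange
    reaction HA + Ref(-) -> A(-) + HRef. A positive value means the
    molecule is a weaker acid than the reference.
    """
    return pka_ref + dg_kcal / (R_KCAL * temperature * math.log(10))


def protonated_fraction(pka, ph=7.4):
    """Henderson-Hasselbalch fraction of the protonated form at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical inputs: reference pKa of 10.0 and a computed dG of -1.5 kcal/mol.
    pka = isodesmic_pka(10.0, -1.5)
    print(f"estimated pKa: {pka:.2f}")
    print(f"fraction protonated at pH 7.4: {protonated_fraction(pka):.3f}")
```

Because RT ln 10 is about 1.36 kcal/mol at room temperature, an error of that size in ΔG shifts the predicted pKa by roughly one unit, which gives a sense of scale for the MADs quoted above.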

SUBMITTER: Kromann JC

PROVIDER: S-EPMC4991863 | biostudies-other | 2016

REPOSITORIES: biostudies-other


Publications

Prediction of pKa values using the PM6 semiempirical method.

Jimmy C. Kromann, Frej Larsen, Hadeel Moustafa, Jan H. Jensen

PeerJ, 2016-08-11


PMID: 27602298

Similar Datasets

Project description:

Background: The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.

Methods: The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure-activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN).

Results: The three methods delivered comparable performances on the training and test sets with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R²) around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and performance of our models compared favorably to the commercial products.

Conclusions: This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
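For readers comparing the error measures quoted in these records, the following is a minimal sketch of how RMSE and the coefficient of determination are conventionally computed from experimental and predicted pKa values. It uses the standard textbook definitions; the function names and the sample numbers are hypothetical and are not drawn from either study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical experimental and predicted pKa values, for illustration only.
    experimental = [4.2, 9.9, 5.2, 10.6, 3.1]
    predicted = [4.8, 9.1, 5.0, 11.5, 3.9]
    print(f"RMSE: {rmse(experimental, predicted):.2f}")
    print(f"R2:   {r_squared(experimental, predicted):.2f}")
```

RMSE weights large errors more heavily than a mean absolute difference does, so an RMSE and a MAD reported for different methods are not directly interchangeable.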
