
Dataset Information


Enhancing Carbon Acid pKa Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values.


ABSTRACT: Predicting the aqueous pKa of carbon acids with Quantitative Structure-Property Relationship (QSPR) or cheminformatics-based methods is a difficult problem. The main obstacle is that there are too few high-quality experimental data points, measured under homogeneous conditions, to generate a good global model. In our computationally efficient pKa prediction method, we generate an atom-type feature vector, called a distance spectrum, from the assigned ionisation atom, and learn a coefficient for each atom type that quantifies its impact on the pKa of the ionisable centre. In the current work, we augment our dataset with pKa values from a series of high-performing local models derived from the Ab Initio Bond Lengths (AIBL) method. We find that distilling the knowledge from multiple models into one general model reduces the prediction error on an external test set compared with using literature experimental data alone.
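The abstract does not define the distance spectrum or the fitting procedure in detail, so the following is only a hypothetical sketch of the general idea. It assumes that the spectrum counts atoms of each type at each topological (bond-count) distance from the ionisation atom, that the per-feature coefficients are learned by ridge-regularised linear least squares, and that experimental values take priority over AIBL-derived values when a compound appears in both sets. The names `distance_spectrum`, `augment`, `fit_ridge` and `predict` are illustrative and are not the authors' code.

```python
from collections import deque


def distance_spectrum(atom_types, bonds, ion_atom, max_dist=3):
    """Hypothetical featurisation: count atoms of each type at each
    topological distance (1..max_dist bonds) from the ionisation atom."""
    adj = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    # Breadth-first search gives shortest bond-count distances.
    dist = {ion_atom: 0}
    queue = deque([ion_atom])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    spectrum = {}
    for atom, d in dist.items():
        if 1 <= d <= max_dist:
            key = (atom_types[atom], d)
            spectrum[key] = spectrum.get(key, 0) + 1
    return spectrum


def augment(experimental, aibl):
    """Merge {molecule_id: pKa} dicts; experimental values win on overlap
    (an assumption, not stated in the abstract)."""
    merged = dict(aibl)
    merged.update(experimental)
    return merged


def vectorise(spectra):
    """Turn spectra into design-matrix rows with a leading intercept column."""
    vocab = sorted({k for s in spectra for k in s})
    rows = [[1.0] + [float(s.get(k, 0)) for k in vocab] for s in spectra]
    return vocab, rows


def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a x = b."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(rows, y, lam=1e-3):
    """Solve (X^T X + lam*I) w = X^T y; the intercept is not penalised."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict(weights, vocab, spectrum):
    """Intercept plus weighted feature counts; unseen features are ignored."""
    return weights[0] + sum(w * spectrum.get(k, 0)
                            for w, k in zip(weights[1:], vocab))
```

For example, for an acetone-like fragment with atoms `["C", "C", "O", "C"]`, bonds `[(0, 1), (1, 2), (1, 3)]` and the methyl carbon 0 as the ionisation atom, `distance_spectrum` returns `{("C", 1): 1, ("O", 2): 1, ("C", 2): 1}`. A pure-Python solver is used here only to keep the sketch dependency-free; in practice a library such as scikit-learn's `Ridge` would be the more natural choice.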

SUBMITTER: Plante J 

PROVIDER: S-EPMC7922142 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature


Publications

Enhancing Carbon Acid pKa Prediction by Augmentation of Sparse Experimental Datasets with Accurate AIBL (QM) Derived Values.

Jeffrey J. Plante, Beth A. Caine, Paul L. A. Popelier

Molecules (Basel, Switzerland), 2021-02-17, issue 4



Similar Datasets

| S-EPMC2203286 | biostudies-literature
| S-EPMC1538816 | biostudies-literature
| S-EPMC3048411 | biostudies-literature
| S-EPMC3261180 | biostudies-literature
| S-EPMC5380004 | biostudies-literature
| S-EPMC8574648 | biostudies-literature
| S-EPMC3414879 | biostudies-other
| S-EPMC4404502 | biostudies-literature
| S-EPMC4090387 | biostudies-literature
| S-EPMC4991863 | biostudies-other