
Dataset Information


Machine learning with physicochemical relationships: solubility prediction in organic solvents and water.


ABSTRACT: Solubility prediction remains a critical challenge in drug development, synthetic route and chemical process design, extraction and crystallisation. Here we report a successful approach to solubility prediction in organic solvents and water using a combination of machine learning (ANN, SVM, RF, ExtraTrees, Bagging and GP) and computational chemistry. Rational interpretation of the dissolution process as a numerical problem led to a small set of selected descriptors and subsequent predictions which are independent of the applied machine learning method. These models gave significantly more accurate predictions compared to benchmarked open-access and commercial tools, achieving accuracy close to the expected level of noise in the training data (LogS ± 0.7). Finally, they reproduced the physicochemical relationships between solubility and molecular properties in different solvents, which led to rational approaches to improve the accuracy of each model.
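As a rough illustration of what accuracy "close to LogS ± 0.7" can mean in practice, the sketch below scores a set of predictions against observed LogS values. It is not the authors' evaluation code: the function names `rmse` and `fraction_within`, the choice of these two metrics, and the sample values are all assumptions made for this example.

```python
import math


def _check(observed, predicted):
    # Both sequences must be non-empty and aligned one-to-one.
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted LogS values."""
    _check(observed, predicted)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def fraction_within(observed, predicted, tolerance=0.7):
    """Share of predictions whose absolute error is at most `tolerance` log units."""
    _check(observed, predicted)
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


# Hypothetical LogS values, for illustration only (not taken from the dataset).
observed = [-2.0, -3.5, -1.0, -4.0]
predicted = [-2.5, -3.0, -2.0, -4.0]

print(f"RMSE: {rmse(observed, predicted):.3f}")
print(f"Within ±0.7 LogS: {fraction_within(observed, predicted):.0%}")
```

The default tolerance of 0.7 mirrors the noise level quoted in the abstract; any other threshold can be passed through the `tolerance` argument.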

SUBMITTER: Boobier S 

PROVIDER: S-EPMC7666209 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature


Publications

Machine learning with physicochemical relationships: solubility prediction in organic solvents and water.

Samuel Boobier, David R. J. Hose, A. John Blacker, Bao N. Nguyen

Nature Communications, 2020-11-13, issue 1



Similar Datasets

| S-EPMC2702786 | biostudies-literature
| S-EPMC10583449 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC10574143 | biostudies-literature
| S-EPMC5445544 | biostudies-literature
| S-EPMC10797585 | biostudies-literature
| S-EPMC7017869 | biostudies-literature
| S-EPMC8304713 | biostudies-literature
| S-EPMC2629997 | biostudies-literature
2013-01-01 | GSE29210 | GEO