Dataset Information

Improved understanding of aqueous solubility modeling through topological data analysis.


ABSTRACT: Topological data analysis (TDA) is a family of recent mathematical techniques that seek to understand the 'shape' of data, and has been used to understand the structure of the descriptor space produced by a standard cheminformatics software package from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, the analysis also reveals an unexpected and interesting correlation between chlorine content and rings, together with their implications for solubility prediction. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes, allows the conversion of discrete shape descriptors to continuous ones, and a perspective on the application of these descriptors to quantitative structure-property relations is presented.
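The idea of turning a discrete persistence diagram into a continuous descriptor via a landscape norm can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `landscape_layers` and `landscape_norm`, the grid resolution, the number of layers `k_max`, and the toy diagram in the usage note are all assumptions made for illustration, and the integral is approximated by a plain Riemann sum. It assumes a persistence diagram is already available as (birth, death) pairs; building one from a molecular graph is not shown.

```python
import numpy as np

def landscape_layers(diagram, grid, k_max=3):
    """Evaluate the first k_max persistence landscape layers on a grid.

    diagram: array-like of (birth, death) pairs (hypothetical input).
    Each pair defines a 'tent' function max(0, min(t - b, d - t));
    layer k at t is the k-th largest tent value at t.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))
    # Sort tent values in descending order at every grid point.
    tents = -np.sort(-tents, axis=0)
    layers = np.zeros((k_max, grid.size))
    n = min(k_max, tents.shape[0])
    layers[:n] = tents[:n]
    return layers

def landscape_norm(diagram, p=2, num=2001, k_max=3):
    """Approximate the L^p norm of the landscape (illustrative only)."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    if diagram.shape[0] == 0:
        return 0.0  # an empty diagram has a zero landscape
    grid = np.linspace(diagram[:, 0].min(), diagram[:, 1].max(), num)
    layers = landscape_layers(diagram, grid, k_max)
    dx = grid[1] - grid[0]
    # Riemann-sum approximation of the integral over all layers.
    integral = (np.abs(layers) ** p).sum() * dx
    return float(integral ** (1.0 / p))
```

As a usage example, `landscape_norm([(0.0, 2.0), (0.5, 1.0)], p=2)` returns a single real number for a made-up two-feature diagram. Shifting a birth or death value slightly changes that number only slightly, which is the sense in which such a norm gives a continuous descriptor rather than a discrete count of features.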

SUBMITTER: Pirashvili M 

PROVIDER: S-EPMC6755597 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature

Publications

Improved understanding of aqueous solubility modeling through topological data analysis.

Pirashvili M, Steinberg L, Belchi Guillamon F, Niranjan M, Frey JG, Brodzki J

Journal of Cheminformatics, 2018 Nov 20 (1)


Similar Datasets

| S-EPMC9611068 | biostudies-literature
| S-EPMC4027754 | biostudies-literature
| S-EPMC5554032 | biostudies-other
| S-EPMC8587167 | biostudies-literature
| S-EPMC5810985 | biostudies-literature
| S-EPMC4988722 | biostudies-literature
| S-EPMC7026387 | biostudies-literature
| S-EPMC9963121 | biostudies-literature
| S-EPMC3690275 | biostudies-literature
| S-EPMC7867001 | biostudies-literature