Dataset Information

Using Machine Learning to Parse Chemical Mixture Descriptions.


ABSTRACT: Chemical mixtures have recently become a focus of open standards and data structures for capturing machine-readable descriptions for informatics use. At present, essentially all information about mixtures is transmitted as short text descriptions that only trained scientists can read, and there are no accessible repositories of marked-up mixture data. We have designed a machine learning tool that can interpret mixture descriptions and upgrade them to the high-level Mixfile format, which can in turn be used to generate Mixtures InChI notation. The interpretation achieves a high success rate and can be used at scale to mark up large catalogs and inventories, with some expert checking to catch edge cases. The training data accumulated during the project is made openly available, along with previously released mixture editing tools and utilities.

SUBMITTER: Clark AM 

PROVIDER: S-EPMC8412965 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC8479812 | biostudies-literature
2023-06-01 | GSE193400 | GEO
| S-EPMC2832677 | biostudies-other
| S-EPMC6240814 | biostudies-other
| S-EPMC7329181 | biostudies-literature
| S-EPMC6567068 | biostudies-literature
2021-06-02 | GSE175942 | GEO
| S-EPMC6846180 | biostudies-literature
| S-EPMC9966999 | biostudies-literature