Direct Prediction of Physicochemical Properties and Toxicities of Chemicals from Analytical Descriptors by GC-MS.
ABSTRACT: With advances in machine learning (ML) techniques, the quantitative structure-activity relationship (QSAR) approach is becoming popular for evaluating chemicals. However, the QSAR approach requires that the chemical structure of the target compound be known and convertible to molecular descriptors. These requirements limit the prediction of properties and toxicities for chemicals distributed in the environment: in the PubChem database, for example, structural information is available for only 14% of compounds. This study proposes a new ML-based QSAR approach that predicts the properties and toxicities of compounds from analytical descriptors, namely the mass spectrum and retention index obtained via gas chromatography-mass spectrometry (GC-MS), without requiring exact structural information. The model was developed with the XGBoost ML method. The root-mean-square errors (RMSEs) for log Kow (octanol-water partition coefficient), log (molecular weight), melting point, boiling point, log (vapor pressure), log (water solubility), log (LD50) (rat, oral), and log (LD50) (mouse, oral) are 0.97, 0.052, 51, 23, 0.74, 1.1, 0.74, and 0.6, respectively. The model performed well on a measurement of a chemical standard mixture, with results similar to those of the model validation. It also performed well on a measurement of contaminated oil processed with spectral deconvolution. These results indicate that the model is suitable for investigating chemicals of unknown structure detected in measurements. Any online user can run the model through a web application named Detective-QSAR (http://www.mixture-platform.net/Detective_QSAR_Med_Open/). The analytical descriptor-based approach is expected to create new opportunities for evaluating the unknown chemicals around us.
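To make the idea of an analytical descriptor and the reported error metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes the mass spectrum is binned at unit m/z and scaled to the base peak, then concatenated with the retention index; the function names (`spectrum_to_vector`, `analytical_descriptor`, `rmse`) and the `max_mz` cutoff are hypothetical choices made for illustration. A regressor such as XGBoost would be trained on vectors of this kind, and its predictions scored with the RMSE.

```python
import math


def spectrum_to_vector(peaks, max_mz=500):
    """Bin (m/z, intensity) peaks at unit m/z and scale to the base peak.

    Assumption: unit-mass binning up to max_mz; the source does not
    specify the preprocessing actually used.
    """
    vec = [0.0] * max_mz
    for mz, intensity in peaks:
        idx = int(round(mz)) - 1  # m/z 1 maps to index 0
        if 0 <= idx < max_mz:
            vec[idx] += intensity
    top = max(vec)
    if top > 0:
        vec = [v / top for v in vec]
    return vec


def analytical_descriptor(peaks, retention_index, max_mz=500):
    """Concatenate the binned spectrum with the retention index (hypothetical layout)."""
    return spectrum_to_vector(peaks, max_mz) + [float(retention_index)]


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only, not data from the study.
features = analytical_descriptor([(43.0, 100.0), (57.1, 50.0)], retention_index=1000)
error = rmse([1.0, 2.0], [2.0, 3.0])
```

Because the feature vector is built only from measured quantities, the same pipeline applies to a peak whose structure is unknown, which is the point the abstract makes.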
SUBMITTER: Zushi Y
PROVIDER: S-EPMC9246259 | biostudies-literature
REPOSITORIES: biostudies-literature