Dataset Information

Prediction of bioconcentration factors in fish and invertebrates using machine learning.

ABSTRACT: The application of machine learning has recently gained interest from ecotoxicological fields for its ability to model and predict chemical and/or biological processes, such as the prediction of bioconcentration. However, comparison of different models and the prediction of bioconcentration in invertebrates has not been previously evaluated. A comparison of 24 linear and machine learning models is presented herein for the prediction of bioconcentration in fish and important factors that influenced accumulation identified. R² and root mean square error (RMSE) for the test data (n?=?110 cases) ranged from 0.23-0.73 and 0.34-1.20, respectively. Model performance was critically assessed with neural networks and tree-based learners showing the best performance. An optimised 4-layer multi-layer perceptron (14 descriptors) was selected for further testing. The model was applied for cross-species prediction of bioconcentration in a freshwater invertebrate, Gammarus pulex. The model for G. pulex showed good performance with R² of 0.99 and 0.93 for the verification and test data, respectively. Important molecular descriptors determined to influence bioconcentration were molecular mass (MW), octanol-water distribution coefficient (logD), topological polar surface area (TPSA) and number of nitrogen atoms (nN) among others. Modelling of hazard criteria such as PBT, showed potential to replace the need for animal testing. However, the use of machine learning models in the regulatory context has been minimal to date and is critically discussed herein. The movement away from experimental estimations of accumulation to in silico modelling would enable rapid prioritisation of contaminants that may pose a risk to environmental health and the food chain.

SUBMITTER: Miller TH

PROVIDER: S-EPMC6234108 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Prediction of bioconcentration factors in fish and invertebrates using machine learning.

Miller Thomas H TH Gallidabino Matteo D MD MacRae James I JI Owen Stewart F SF Bury Nicolas R NR Barron Leon P LP

The Science of the total environment 20180810

The application of machine learning has recently gained interest from ecotoxicological fields for its ability to model and predict chemical and/or biological processes, such as the prediction of bioconcentration. However, comparison of different models and the prediction of bioconcentration in invertebrates has not been previously evaluated. A comparison of 24 linear and machine learning models is presented herein for the prediction of bioconcentration in fish and important factors that influenc ...[more]

PMID: 30114591

Similar Datasets

Project description:BACKGROUND:Prognostication is an essential tool for risk adjustment and decision making in the intensive care unit (ICU). Research into prognostication in ICU has so far been limited to data from admission or the first 24 hours. Most ICU admissions last longer than this, decisions are made throughout an admission, and some admissions are explicitly intended as time-limited prognostic trials. Despite this, temporal changes in prognostic ability during ICU admission has received little attention to date. Current predictive models, in the form of prognostic clinical tools, are typically derived from linear models and do not explicitly handle incremental information from trends. Machine learning (ML) allows predictive models to be developed which use non-linear predictors and complex interactions between variables, thus allowing incorporation of trends in measured variables over time; this has made it possible to investigate prognosis throughout an admission. METHODS AND FINDINGS:This study uses ML to assess the predictability of ICU mortality as a function of time. Logistic regression against physiological data alone outperformed APACHE-II and demonstrated several important interactions including between lactate & noradrenaline dose, between lactate & MAP, and between age & MAP consistent with the current sepsis definitions. ML models consistently outperformed logistic regression with Deep Learning giving the best results. Predictive power was maximal on the second day and was further improved by incorporating trend data. Using a limited range of physiological and demographic variables, the best machine learning model on the first day showed an area under the receiver-operator characteristic curve (AUC) of 0.883 (σ = 0.008), compared to 0.846 (σ = 0.010) for a logistic regression from the same predictors and 0.836 (σ = 0.007) for a logistic regression based on the APACHE-II score. Adding information gathered on the second day of admission improved the maximum AUC to 0.895 (σ = 0.008). Beyond the second day, predictive ability declined. CONCLUSION:This has implications for decision making in intensive care and provides a justification for time-limited trials of ICU therapy; the assessment of prognosis over more than one day may be a valuable strategy as new information on the second day helps to differentiate outcomes. New ML models based on trend data beyond the first day could greatly improve upon current risk stratification tools.

Dataset Information

Prediction of bioconcentration factors in fish and invertebrates using machine learning.

Publications

Prediction of bioconcentration factors in fish and invertebrates using machine learning.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets