Dataset Information

Machine learning-based estimation of riverine nutrient concentrations and associated uncertainties caused by sampling frequencies.

ABSTRACT: Accurate and sufficient water quality data are essential for watershed management and sustainability. Machine learning models have shown great potential for estimating water quality as online sensors have developed. However, accurate estimation is challenging because of uncertainties related to the models used and the input data. In this study, random forest (RF), support vector machine (SVM), and back-propagation neural network (BPNN) models are developed with datasets at three sampling frequencies (i.e., 4-hourly, daily, and weekly) and five conventional indicators (i.e., water temperature (WT), hydrogen ion concentration (pH), electrical conductivity (EC), dissolved oxygen (DO), and turbidity (TUR)) as surrogates to individually estimate riverine total phosphorus (TP), total nitrogen (TN), and ammonia nitrogen (NH4+-N) in a small-scale coastal watershed. The results show that the RF model outperforms the SVM and BPNN models in estimation performance, explaining much of the variation in TP (79 ± 1.3%), TN (84 ± 0.9%), and NH4+-N (75 ± 1.3%) when using the 4-hourly dataset. A higher sampling frequency gave the RF model significantly better R² and NSE values for all three nutrients (4-hourly > daily > weekly). WT, EC, and TUR were the three key input indicators for nutrient estimation in RF. Our study highlights the importance of high-frequency data as input to machine learning model development. The RF model is shown to be viable for riverine nutrient estimation in small-scale watersheds that are important for local water security.
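As a rough illustration of two ideas in the abstract, the sketch below computes the Nash-Sutcliffe efficiency (NSE) in its standard form and thins a 4-hourly series to daily and weekly spacing. It is not the authors' code: the function names `nse` and `thin`, the 14-day placeholder series, and the choice to subsample by keeping every n-th record are all assumptions made for illustration, and the study may have derived its daily and weekly datasets differently.

```python
from statistics import mean


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 - SSE / squared spread of the observations.

    1.0 is a perfect match; 0.0 means the estimates are no better than
    always predicting the observed mean.
    """
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    obs_mean = mean(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    spread = sum((o - obs_mean) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse / spread


def thin(records, step):
    """Keep every step-th record (hypothetical subsampling rule)."""
    return records[::step]


# Placeholder index values standing in for 14 days of 4-hourly samples
# (6 samples per day). These are not measurements from the dataset.
four_hourly = list(range(14 * 6))
daily = thin(four_hourly, 6)        # one sample per day
weekly = thin(four_hourly, 6 * 7)   # one sample per week

print(len(four_hourly), len(daily), len(weekly))
print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

The thinning step makes the data-volume gap concrete: the weekly series keeps only one record in 42 from the 4-hourly series, which is one plausible reason a model trained on it would score lower.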

SUBMITTER: Chen S

PROVIDER: S-EPMC9278742 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

Machine learning-based estimation of riverine nutrient concentrations and associated uncertainties caused by sampling frequencies.

Chen Shengyue, Zhang Zhenyu, Lin Juanjuan, Huang Jinliang

PLoS One, 2022-07-13, issue 7

PMID: 35830456

Similar Datasets

Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory.
| S-EPMC12041501 | biostudies-literature

Land use shapes riverine nutrient and sediment concentrations on Moorea, French Polynesia.
| S-EPMC12314069 | biostudies-literature

Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations.
| S-EPMC8901043 | biostudies-literature

Machine learning based variance estimation under two phase sampling using health and education sector data.
| S-EPMC12948996 | biostudies-literature

Mitigating crop modeling uncertainties through machine learning in drylands.
| S-EPMC12663468 | biostudies-literature

Neglecting uncertainties biases house-elevation decisions to manage riverine flood risks.
| S-EPMC7588474 | biostudies-literature

Multiscale Enhanced Sampling Using Machine Learning.
| S-EPMC8540671 | biostudies-literature

Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose.
| S-EPMC6950225 | biostudies-literature

Uncertainties of soil organic carbon stock estimation caused by paleoclimate and human footprint on the Qinghai Plateau.
| S-EPMC9134640 | biostudies-literature

A machine learning approach for computation of cardiovascular intrinsic frequencies.
| S-EPMC10602266 | biostudies-literature