Dataset Information

An Ensemble Learning Approach for Estimating High Spatiotemporal Resolution of Ground-Level Ozone in the Contiguous United States.

ABSTRACT: In this paper, we integrated multiple types of predictor variables and three types of machine learners (neural network, random forest, and gradient boosting) into a geographically weighted ensemble model to estimate the daily maximum 8 h O₃ with high resolution over both space (at 1 km × 1 km grid cells covering the contiguous United States) and time (daily estimates between 2000 and 2016). We further quantify monthly model uncertainty for our 1 km × 1 km gridded domain. The results demonstrate high overall model performance with an average cross-validated R² (coefficient of determination) against observations of 0.90 and 0.86 for annual averages. Overall, the model performance of the three machine learning algorithms was quite similar. The overall model performance from the ensemble model outperformed those from any single algorithm. The East North Central region of the United States had the highest R², 0.93, and performance was weakest for the western mountainous regions (R² of 0.86) and New England (R² of 0.87). For the cross validation by season, our model had the best performance during summer with an R² of 0.88. This study can be useful for the environmental health community to more accurately estimate the health impacts of O₃ over space and time, especially in health studies at an intra-urban scale.
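The R² values quoted in the abstract compare model predictions with monitor observations. The sketch below shows one common way to compute a coefficient of determination from paired values. It assumes the held-out (cross-validated) predictions are already available and uses the 1 - SS_res/SS_tot definition; the record does not state which definition the authors used, and the function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: `predicted` holds out-of-sample predictions paired
    one-to-one with `observed` (for example, from held-out folds).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared gaps between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared gaps between observation and its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - ss_res / ss_tot
```

Under this definition a perfect prediction scores 1.0, and predicting the observed mean everywhere scores 0.0.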

SUBMITTER: Requia WJ

PROVIDER: S-EPMC7498146 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

An Ensemble Learning Approach for Estimating High Spatiotemporal Resolution of Ground-Level Ozone in the Contiguous United States.

Weeberb J Requia, Qian Di, Rachel Silvern, James T Kelly, Petros Koutrakis, Loretta J Mickley, Melissa P Sulprizio, Heresh Amini, Liuhua Shi, Joel Schwartz

Environmental Science & Technology, 2020-09-01, issue 18

PMID: 32808786

Similar Datasets

Project description: Regional air quality models are widely used to understand the spatial extent and magnitude of the ozone non-attainment problem and to design the emission control strategies needed to comply with the relevant ozone standard through direct emission perturbations. In this study, we examine the manageable portion of ground-level ozone using two simulations of the Community Multiscale Air Quality (CMAQ) model for the year 2010 and a probabilistic analysis approach involving 29 years (1990-2018) of historical ozone observations. The modeling results reveal that the reduction in peak ozone levels from total elimination of anthropogenic emissions within the model domain is around 13-21 ppb for the 90th-100th percentile range of the daily maximum 8-hr ozone concentrations across the contiguous United States (CONUS). Large reductions in the 4th highest 8-hr ozone are seen in the regions of West (interquartile range (IQR) of 17-33%), South (IQR 22-34%), Central (IQR 19-31%), Southeast (IQR 25-34%), and Northeast (IQR 24-37%). However, sites in the western portion of the domain generally show smaller reductions even when all anthropogenic emissions are removed, possibly due to the strong influence of global background ozone, including sources such as intercontinental ozone transport, stratospheric ozone intrusions, wildfires, and biogenic precursor emissions. Probabilistic estimates of the exceedances for several hypothetical thresholds of the 4th highest 8-hr ozone indicate that, in some areas, exceedances of such thresholds may occur even with no anthropogenic emissions, owing to the ever-present atmospheric stochasticity and the current global tropospheric ozone burden.

Implications: Because air pollution is intricately linked to adverse health effects, National Ambient Air Quality Standards (NAAQS) have been established for criteria pollutants to safeguard human health and the environment. Areas not in compliance with the relevant standards are required to develop plans and policies to reduce their air pollution levels. Regional-scale air quality models are routinely used to inform policy by identifying the emissions reductions required to meet and maintain the NAAQS throughout the country. This paper examines whether the 4th highest 8-hr ozone, which is used to derive the ozone design value for the NAAQS, can feasibly comply with various current and hypothetical 8-hr ozone thresholds over CONUS, based on the information embedded in 29 years of historical ozone observations and two modeling scenarios with and without anthropogenic emissions loading.

Project description: Various approaches have been proposed to model PM2.5 in the past decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as the major predictor variables. Our study used an ensemble model that integrated multiple machine learning algorithms and predictor variables to estimate daily PM2.5 at a resolution of 1 km × 1 km across the contiguous United States. We used a generalized additive model that accounted for geographic difference to combine PM2.5 estimates from neural network, random forest, and gradient boosting. The three machine learning algorithms were based on multiple predictor variables, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and others. The model training results from 2000 to 2015 indicated good model performance with a 10-fold cross-validated R² of 0.86 for daily PM2.5 predictions. For annual PM2.5 estimates, the cross-validated R² was 0.89. Our model demonstrated good performance up to 60 μg/m³. Using the trained PM2.5 model and predictor variables, we predicted daily PM2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous United States. We also used localized land-use variables within 1 km × 1 km grids to downscale PM2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty, we used meteorological variables, land-use variables, and elevation to model the monthly standard deviation of the difference between daily monitored and predicted PM2.5 for every 1 km × 1 km grid cell. This PM2.5 prediction dataset, including the downscaled and uncertainty predictions, allows epidemiologists to accurately estimate the adverse health effects of PM2.5. Compared with the performance of the individual base learners, an ensemble model would achieve a better overall estimation. It is worth exploring other ensemble model formats to synthesize estimations from different models or from different groups to improve overall performance.
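The uncertainty target described above, the monthly standard deviation of the difference between daily monitored and predicted PM2.5, can be illustrated with a short sketch for a single location. It covers only that target quantity, not the authors' later step of modeling it from meteorological, land-use, and elevation variables. The function name `monthly_residual_sd`, the `(date, monitored, predicted)` record layout, and the use of the sample standard deviation are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import stdev


def monthly_residual_sd(records):
    """Sample standard deviation of (monitored - predicted), per month.

    Assumption: `records` is an iterable of (date, monitored, predicted)
    tuples for one location, with `date` an ISO string 'YYYY-MM-DD'.
    """
    by_month = defaultdict(list)
    for day, monitored, predicted in records:
        # The first seven characters of an ISO date are 'YYYY-MM'.
        by_month[day[:7]].append(monitored - predicted)
    # A standard deviation needs two or more differences; skip sparse months.
    return {m: stdev(d) for m, d in by_month.items() if len(d) >= 2}
```

Months with fewer than two paired days are omitted from the result rather than reported as zero.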

Project description: A dynamic evaluation of ozone simulations from the fully coupled Weather Research and Forecasting (WRF)-Community Multiscale Air Quality (CMAQ) model over the contiguous United States (CONUS), using two decades of simulations covering the period from 1990 to 2010, is conducted to assess how well the model simulates the changes in observed ozone air quality. The changes induced by variations in meteorology and/or emissions are also evaluated during the same timeframe using spectral decomposition of observed and modeled ozone time series, with the aim of identifying the underlying forcing mechanisms that control ozone exceedances and making informed recommendations for the optimal use of regional-scale air quality models. The evaluation focuses on the warm season's (i.e., May-September) daily maximum 8-hr (DM8HR) ozone concentrations, the 4th highest (4th) and the average of the top 10 DM8HR ozone values (top10), as well as the spectrally decomposed components of the DM8HR ozone time series obtained with the Kolmogorov-Zurbenko (KZ) filter. Results of the dynamic evaluation are presented for six regions in the U.S., consistent with the National Oceanic and Atmospheric Administration (NOAA) climatic regions. During the earlier 11-yr period (1990-2000), the simulated and observed trends are not statistically significant. During the more recent 2000-2010 period, all trends are statistically significant and WRF-CMAQ captures the observed trend in most regions. Given the large number of sites for the 2000-2010 period, the model captures the observed trends in the Southwest (SW) and MW but has a significantly different trend from that seen in observations for the other regions. Observational analysis reveals that it is the long-term forcing that dictates how high the ozone exceedances will be; there is a strong linear relationship between the long-term forcing and the 4th highest or the average of the top10 ozone concentrations in both observations and model output. This finding indicates that improving the model's ability to reproduce the long-term component will also enable better simulation of the ozone extreme values that are of interest to regulatory agencies.
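The Kolmogorov-Zurbenko filter mentioned above is, in its basic form, a centered moving average of odd width m applied k times in succession. The following is a minimal sketch of that idea. The function names, the truncated windows at the series edges, and the handling of missing values as `None` are assumptions for illustration and are not taken from the study.

```python
def moving_average(series, window):
    """Centered moving average; windows are truncated at the series edges.

    Missing values (None) are ignored within each window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        values = [v for v in series[lo:hi] if v is not None]
        smoothed.append(sum(values) / len(values) if values else None)
    return smoothed


def kz_filter(series, window, iterations):
    """KZ(window, iterations): apply the moving average repeatedly."""
    result = list(series)
    for _ in range(iterations):
        result = moving_average(result, window)
    return result
```

Subtracting the output of a wide filter from the original series leaves the shorter-term variability, which is one way a time series can be split into components of different time scales.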
