Dataset Information

Constrained Mixed-Effect Models with Ensemble Learning for Prediction of Nitrogen Oxides Concentrations at High Spatiotemporal Resolution.

ABSTRACT: Spatiotemporal models that estimate ambient exposures at high spatiotemporal resolution are crucial for large-scale air pollution epidemiological studies that follow participants over extended periods. Previous models typically relied on central-site monitoring data and/or covered short periods, limiting their application to long-term cohort studies. Here we developed a spatiotemporal model that can reliably predict nitrogen oxides concentrations at high spatiotemporal resolution over a long time span (>20 years). Leveraging spatially extensive, highly clustered exposure data from short-term measurement campaigns spanning 1-2 years, together with long-term central-site monitoring data from 1992-2013, we developed an integrated mixed-effect model with uncertainty estimates. Our statistical model incorporated nonlinear and spatial effects to reduce bias. Important predictors identified included temporal basis predictors, traffic indicators, population density, and subcounty-level mean pollutant concentrations. Substantial spatial autocorrelation (11-13%) was observed between neighboring communities. Ensemble learning and constrained optimization were used to improve the reliability of estimates over a large metropolitan area and a long period. The ensemble predictions of biweekly concentrations achieved an R² of 0.85 (RMSE: 4.7 ppb) for NO₂ and 0.86 (RMSE: 13.4 ppb) for NOₓ. Ensemble learning and constrained optimization generated stable time series and notably improved the results compared with those from the initial mixed-effects models.
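
The abstract does not spell out how the ensemble weights or the optimization constraints were defined. As a minimal sketch, assuming two hypothetical base models whose predictions are blended as a convex combination (a weight w constrained to lie between 0 and 1), the constrained least-squares weight has a closed form, and R² and RMSE of the kind reported above can then be computed from the blended predictions. The two-model setup, the data and the function names constrained_weight, blend and r2_rmse are illustrative assumptions, not the authors' method.

```python
import math


def constrained_weight(y, pred_a, pred_b):
    """Least-squares weight w in [0, 1] for the blend w*a + (1 - w)*b."""
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.5  # the two models agree everywhere, so any w gives the same fit
    # Unconstrained minimizer of sum((y - b - w*(a - b))**2)
    w = sum((yi - b) * d for yi, b, d in zip(y, pred_b, diff)) / denom
    # The objective is a convex quadratic in w, so clipping to [0, 1]
    # yields the constrained optimum.
    return min(1.0, max(0.0, w))


def blend(pred_a, pred_b, w):
    """Ensemble prediction as a convex combination of two base models."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def r2_rmse(y, pred):
    """Coefficient of determination (1 - SSres/SStot) and root mean squared error."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical biweekly observations (ppb) and two base-model predictions
obs = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 21.0, 31.0, 41.0]  # biased high by 1 ppb
model_b = [9.0, 19.0, 29.0, 39.0]   # biased low by 1 ppb
w = constrained_weight(obs, model_a, model_b)
print(w, r2_rmse(obs, blend(model_a, model_b, w)))  # prints 0.5 (1.0, 0.0)
```

Here the opposite biases cancel at w = 0.5. With more than two base models the same idea becomes a quadratic program over the probability simplex, which needs a numerical solver rather than a closed form.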

SUBMITTER: Li L

PROVIDER: S-EPMC5609852 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature

Publications

Constrained Mixed-Effect Models with Ensemble Learning for Prediction of Nitrogen Oxides Concentrations at High Spatiotemporal Resolution.

Li Lianfa, Lurmann Fred, Habre Rima, Urman Robert, Rappaport Edward, Ritz Beate, Chen Jiu-Chiuan, Gilliland Frank D, Wu Jun

Environmental Science & Technology, 2017 Aug 11, issue 17

PMID: 28727456

Similar Datasets

Project description:
Background: Accurate estimation of nitrogen dioxide (NO₂) and nitrogen oxides (NOₓ) concentrations at high spatiotemporal resolution is crucial for improving the evaluation of their health effects, particularly with respect to short-term exposures and acute health outcomes. For estimation over large regions such as California, high-spatial-density field campaign measurements can be combined with sparser routine monitoring network measurements to capture the spatiotemporal variability of NO₂ and NOₓ concentrations. However, monitors in spatially dense field sampling are often highly clustered, and their uneven distribution makes such combined use challenging. Furthermore, heterogeneities due to seasonal patterns of meteorology and source mixtures between sub-regions (e.g., southern vs. northern California) need to be addressed.
Objectives: In this study, we aim to develop highly accurate and adaptive machine learning models to predict high-resolution NO₂ and NOₓ concentrations over large geographic regions using measurements from different sources that contain samples with heterogeneous spatiotemporal distributions and clustering patterns.
Methods: We used a comprehensive Kruskal-K-means method to cluster the measurement samples from multiple heterogeneous sources. Spatiotemporal cluster-based bootstrap aggregating (bagging) of the base mixed-effects models was then applied, leveraging the clusters to obtain balanced and less correlated training samples that reduce bias and improve generalization. Further, we used grid search, a machine learning technique, to find the optimal interaction of temporal basis functions and the scale of spatial effects, which, together with spatiotemporal covariates, adequately captured spatiotemporal variability in NO₂ and NOₓ at the state and local levels.
Results: We found an optimal combination of four temporal basis functions and 200 m scale spatial effects for the base mixed-effects models. With cluster-based bagging of the base models, we obtained robust predictions with an ensemble cross-validation R² of 0.88 for both NO₂ and NOₓ [RMSE (RMSEIQR): 3.62 ppb (0.28) and 9.63 ppb (0.37), respectively]. In independent tests with random sampling, our models achieved similarly strong performance (R² of 0.87-0.90; RMSE of 3.97-9.69 ppb; RMSEIQR of 0.21-0.27), indicating minimal over-fitting.
Conclusions: Our approach has important implications for fusing data from highly clustered and heterogeneous measurement samples from multiple data sources to produce highly accurate concentration estimates of air pollutants such as NO₂ and NOₓ at high resolution over a large region.
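
Cluster-based bagging resamples whole clusters rather than individual samples when building each bootstrap training set, so correlated samples from the same cluster enter or leave a resample together; the ensemble prediction is the average over the base models. The sketch below is a minimal illustration under stated assumptions: each sample already carries a cluster label (standing in for the Kruskal-K-means clusters), a one-predictor least-squares fit substitutes for the base mixed-effects model, and the balancing scheme used in the study is not reproduced. The functions fit_line, cluster_bagging and bagged_predict are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in base model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # degenerate resample: fall back to the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def cluster_bagging(samples, n_bags=25, seed=0):
    """Fit one base model per bootstrap resample of whole clusters.

    samples: list of (cluster_id, x, y) tuples.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for cluster_id, x, y in samples:
        by_cluster[cluster_id].append((x, y))
    ids = sorted(by_cluster)
    models = []
    for _ in range(n_bags):
        # Draw as many clusters as exist, with replacement; every sample in a
        # drawn cluster enters the resample together.
        drawn = [rng.choice(ids) for _ in ids]
        pairs = [p for c in drawn for p in by_cluster[c]]
        models.append(fit_line([x for x, _ in pairs], [y for _, y in pairs]))
    return models


def bagged_predict(models, x):
    """Ensemble prediction: average of the base-model predictions."""
    return mean(intercept + slope * x for intercept, slope in models)


# Hypothetical data: three clusters of sites with y = 2 + 3x
data = [(c, x, 2 + 3 * x) for c in ("A", "B", "C") for x in (0.0, 1.0, 2.0)]
models = cluster_bagging(data)
print(round(bagged_predict(models, 5.0), 6))  # 17.0
```

Because the toy data are exactly linear, every base model recovers the same line; with real, noisy data the spread of the base-model predictions also gives a rough measure of uncertainty.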

Project description: Drought is a natural hazard that results from a prolonged shortage of precipitation, high temperatures and changes in weather patterns. Drought harms society, the economy and the natural environment, but it is difficult to identify and characterize. Many areas of Pakistan have suffered severe droughts over the last three decades due to changes in weather patterns. A drought analysis incorporating climate information has not yet been undertaken in this study region. Here, we propose an ensemble approach for monthly drought prediction and for defining and examining wet/dry events. First, drought events were identified using the short-term Standardized Precipitation Index (SPI-3). Drought was predicted with three ensemble models: Equal Ensemble Drought Prediction (EEDP), Weighted Ensemble Drought Prediction (WEDP) and Conditional Ensemble Drought Prediction (CEDP). In addition, two weighting procedures, Traditional Weighting (TW) and Weighted Bootstrap Resampling (WBR), were used to distribute weights in the WEDP model. Four copula families (Frank, Clayton, Gumbel and Joe) were used to describe the dependence between climate indices and precipitation in the CEDP model. Among the four copula families, the Joe copula was found to be suitable in most cases. The CEDP model provided better accuracy and lower uncertainty than the other ensemble models at all meteorological stations. The performance of the CEDP model indicates that the climate indices are correlated with the weather patterns at the four meteorological stations. The percentage occurrences of extreme drought events at Multan, Bahawalpur, Barkhan and Khanpur are 1.44%, 0.57%, 2.59% and 1.71%, respectively, whereas those of extremely wet events are 2.3%, 1.72%, 0.86% and 2.86%, respectively.
Understanding drought patterns by including climate information can inform future agriculture and water resource management.
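
SPI-3 is built from 3-month precipitation accumulations. The standard index fits a gamma distribution to these totals (usually separately for each calendar month) and maps the cumulative probabilities onto a standard normal scale. The sketch below replaces that fitting step with a plain z-score, so it is only a rough stand-in for SPI, not the procedure used in the study. It then reports the percentage of months at or beyond ±2, the thresholds commonly used for extremely dry and extremely wet conditions. The functions rolling_sums, approx_standardized_index and percent_extremes are hypothetical.

```python
from statistics import mean, pstdev


def rolling_sums(values, window=3):
    """Accumulations over `window` consecutive months (3 for SPI-3)."""
    return [sum(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def approx_standardized_index(precip, window=3):
    """Z-scores of the rolling totals; a crude stand-in for gamma-based SPI."""
    totals = rolling_sums(precip, window)
    mu, sd = mean(totals), pstdev(totals)
    if sd == 0:
        return [0.0] * len(totals)
    return [(t - mu) / sd for t in totals]


def percent_extremes(index, threshold=2.0):
    """Percentage of months that are extremely dry (<= -threshold) or wet (>= threshold)."""
    n = len(index)
    dry = 100 * sum(v <= -threshold for v in index) / n
    wet = 100 * sum(v >= threshold for v in index) / n
    return dry, wet


# Hypothetical monthly precipitation totals (mm)
monthly = [0, 0, 0, 3, 3, 3]
idx = approx_standardized_index(monthly)
print([round(v, 3) for v in idx])  # [-1.342, -0.447, 0.447, 1.342]
print(percent_extremes(idx))       # (0.0, 0.0)
```

A z-score assumes the 3-month totals are roughly symmetric; precipitation totals are usually right-skewed, which is why the gamma-fitting step matters for real SPI values.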

Project description: Elevated levels of ambient air pollution have been implicated as a major risk factor for morbidity and premature mortality in India, with particularly high concentrations of particulate matter in the Indo-Gangetic Plain. High-resolution spatiotemporal estimates of such exposures are critical for assessing health effects at the individual level. This article retrospectively assesses daily average PM2.5 exposure on 1 km × 1 km grids in Delhi, India, from 2010-2016, using multiple data sources and ensemble averaging approaches. We used a multi-stage modeling exercise involving satellite data, land use variables, reanalysis-based meteorological variables and population density. A calibration regression was used to model the PM2.5:PM10 relationship to counter the sparsity of ground monitoring data. The relationship between PM2.5 and its spatiotemporal predictors was modeled using six learners: generalized additive models, elastic net, support vector regression, random forests, neural networks and extreme gradient boosting. These predictions were then combined in a generalized additive model framework using tensor-product-based spatial smoothing. The overall cross-validated prediction accuracy of the model was 80% over the study period, with high spatial accuracy, and predicted annual average concentrations ranged from 87 to 138 μg/m³. Annual average root mean squared errors for the ensemble-averaged predictions ranged from 39.7 to 62.7 μg/m³, with prediction bias between 4.6 and 11.2 μg/m³. In addition, tree-based learners such as random forests and extreme gradient boosting outperformed the other algorithms. Our findings indicate important seasonal and geographical differences in particulate matter concentrations within Delhi over a substantial period, with meteorological and land use features distinguishing the most and least polluted regions.
This exposure assessment can be used to estimate dose-response relationships more accurately over a wide range of particulate matter concentrations.
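
The calibration regression is described only briefly. Assuming it is an ordinary least-squares fit of PM2.5 on PM10 at co-located monitors, which is then used to impute PM2.5 where only PM10 was measured, a minimal sketch might look like the following. The record layout and the functions fit_calibration and fill_pm25 are hypothetical, and the study may have used a different (for example, ratio-based or seasonally varying) calibration.

```python
from statistics import mean


def fit_calibration(pm10, pm25):
    """OLS fit of PM2.5 on PM10 at co-located monitors; returns (intercept, slope)."""
    mx, my = mean(pm10), mean(pm25)
    sxx = sum((x - mx) ** 2 for x in pm10)
    if sxx == 0:
        raise ValueError("PM10 values must vary to fit a slope")
    slope = sum((x - mx) * (y - my) for x, y in zip(pm10, pm25)) / sxx
    return my - slope * mx, slope


def fill_pm25(records, intercept, slope):
    """Keep measured PM2.5; impute it from PM10 where only PM10 is available."""
    filled = []
    for rec in records:
        pm25, pm10 = rec.get("pm25"), rec.get("pm10")
        if pm25 is None and pm10 is not None:
            pm25 = intercept + slope * pm10
        filled.append(pm25)
    return filled


# Hypothetical co-located daily means (ug/m3)
b0, b1 = fit_calibration([100, 200, 300], [50, 100, 150])
print(b0, b1)  # 0.0 0.5
print(fill_pm25([{"pm10": 400}, {"pm10": 250, "pm25": 140}, {}], b0, b1))
# [200.0, 140, None]
```

Measured PM2.5 values are left untouched, and records with neither pollutant stay missing rather than being filled.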

