Dataset Information

Impact of preferential sampling on exposure prediction and health effect inference in the context of air pollution epidemiology.

ABSTRACT: Preferential sampling has been defined in the context of geostatistical modeling as the dependence between the sampling locations and the process that describes the spatial structure of the data. It can occur when networks are designed to find high values. For example, in networks based on the U.S. Clean Air Act, monitors are sited to determine whether air quality standards are exceeded. We study the impact of the design of monitor networks in the context of air pollution epidemiology studies. The effect of preferential sampling has been illustrated in the literature by highlighting its impact on spatial predictions. In this paper, we use these predictions as input in a second stage analysis, and we assess how they affect health effect inference. Our work is motivated by data from two United States regulatory networks and health data from the Multi-Ethnic Study of Atherosclerosis and Air Pollution. The two networks were designed to monitor air pollution in urban and rural areas respectively, and we found that the health analysis results based on the two networks can lead to different scientific conclusions. We use preferential sampling to gain insight into these differences. We designed a simulation study, and found that the validity and reliability of the health effect estimate can be greatly affected by how we sample the monitor locations. To better understand its effect on second stage inference, we identify two components of preferential sampling that shed light on how preferential sampling alters the properties of the health effect estimate.
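The idea of preferential sampling described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' simulation design: the one-dimensional moving-average field, the exponential siting weights, the parameter `beta`, and all function names are assumptions made purely for illustration. It compares the average monitored concentration with the true field average when monitors are sited uniformly (`beta = 0`) versus preferentially at high values (`beta > 0`).

```python
import math
import random


def simulate_field(n_sites, rng, smooth=25):
    """Return a smooth 1-D 'pollution surface': a moving average of white noise.

    Hypothetical stand-in for a spatial process; each value has unit variance.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_sites + smooth)]
    return [sum(noise[i:i + smooth]) / math.sqrt(smooth) for i in range(n_sites)]


def sample_monitors(field, n_monitors, beta, rng):
    """Pick monitor sites with probability proportional to exp(beta * field).

    beta = 0 gives uniform (non-preferential) siting; beta > 0 favours
    high-pollution sites. Sites are drawn with replacement for simplicity.
    """
    weights = [math.exp(beta * value) for value in field]
    return rng.choices(range(len(field)), weights=weights, k=n_monitors)


def mean_bias(beta, n_reps=200, n_sites=500, n_monitors=30, seed=0):
    """Average of (monitor mean - true field mean) over repeated simulations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        field = simulate_field(n_sites, rng)
        sites = sample_monitors(field, n_monitors, beta, rng)
        monitor_mean = sum(field[i] for i in sites) / n_monitors
        total += monitor_mean - sum(field) / n_sites
    return total / n_reps


if __name__ == "__main__":
    print("uniform siting bias:     ", round(mean_bias(beta=0.0), 3))
    print("preferential siting bias:", round(mean_bias(beta=1.0), 3))
```

Under these assumptions, uniform siting should give a bias near zero, while preferential siting should overstate the average concentration. The sketch covers only the first-stage exposure summary; it does not reproduce the second-stage health effect analysis studied in the paper.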

SUBMITTER: Lee A

PROVIDER: S-EPMC5863931 | biostudies-literature | 2015 Jun

REPOSITORIES: biostudies-literature

Publications

Impact of preferential sampling on exposure prediction and health effect inference in the context of air pollution epidemiology.

Lee A, Szpiro A, Kim SY, Sheppard L

Environmetrics, 2015 Mar 5 (issue 4)

PMID: 29576734

Similar Datasets

Project description: Background: Two distinctly different types of measurement error are Berkson and classical. Impacts of measurement error in epidemiologic studies of ambient air pollution are expected to depend on error type. We characterize measurement error due to instrument imprecision and spatial variability as multiplicative (i.e. additive on the log scale) and model it over a range of error types to assess impacts on risk ratio estimates both on a per measurement unit basis and on a per interquartile range (IQR) basis in a time-series study in Atlanta. Methods: Daily measures of twelve ambient air pollutants were analyzed: NO2, NOx, O3, SO2, CO, PM10 mass, PM2.5 mass, and PM2.5 components sulfate, nitrate, ammonium, elemental carbon and organic carbon. Semivariogram analysis was applied to assess spatial variability. Error due to this spatial variability was added to a reference pollutant time-series on the log scale using Monte Carlo simulations. Each of these time-series was exponentiated and introduced to a Poisson generalized linear model of cardiovascular disease emergency department visits. Results: Measurement error resulted in reduced statistical significance for the risk ratio estimates for all amounts (corresponding to different pollutants) and types of error. When modelled as classical-type error, risk ratios were attenuated, particularly for primary air pollutants, with average attenuation in risk ratios on a per unit of measurement basis ranging from 18% to 92% and on an IQR basis ranging from 18% to 86%. When modelled as Berkson-type error, risk ratios per unit of measurement were biased away from the null hypothesis by 2% to 31%, whereas risk ratios per IQR were attenuated (i.e. biased toward the null) by 5% to 34%. For CO modelled error amount, a range of error types were simulated and effects on risk ratio bias and significance were observed. Conclusions: For multiplicative error, both the amount and type of measurement error impact health effect estimates in air pollution epidemiology. By modelling instrument imprecision and spatial variability as different error types, we estimate direction and magnitude of the effects of error over a range of error types.

Project description: The era of big data has enabled sophisticated models to predict air pollution concentrations over space and time. Historically these models have been evaluated using overall metrics that measure how close predictions are to monitoring data. However, overall methods are not designed to distinguish error at timescales most relevant for epidemiologic studies, such as day-to-day errors that impact studies of short-term health associations. We introduce frequency band model performance, which quantifies health estimation capacity of air quality prediction models for time series studies of air pollution and health. Frequency band model performance uses a discrete Fourier transform to evaluate prediction models at timescales of interest. We simulated fine particulate matter (PM2.5), with errors at timescales varying from acute to seasonal, and health time series data. To compare evaluation approaches, we use correlations and root mean squared error (RMSE). Additionally, we assess health estimation capacity through bias and RMSE in estimated health associations. We apply frequency band model performance to PM2.5 predictions at 17 monitors in 8 US cities. In simulations, frequency band model performance rates predictions better (lower RMSE, higher correlation) when there is no error at a particular timescale (e.g., acute) and worse when error is added to that timescale, compared to overall approaches. Further, frequency band model performance is more strongly associated (R² = 0.95) with health association bias compared to overall approaches (R² = 0.57). For PM2.5 predictions in Salt Lake City, UT, frequency band model performance better identifies acute error that may impact estimated short-term health associations. For epidemiologic studies, frequency band model performance provides an improvement over existing approaches because it evaluates models at the timescale of interest and is more strongly associated with bias in estimated health associations. Evaluating prediction models at timescales relevant for health studies is critical to determining whether model error will impact estimated health associations.

Project description: Purpose: Vaginal microbial communities can be dominated by anaerobic (community state type IV, CST IV) or Lactobacillus (other CSTs) species. CST IV is a risk factor for spontaneous preterm birth (sPTB) and is more common among Black than White populations. In the US, average air pollution exposures are higher among Black compared to White people and exert systemic health effects. We sought to (1) quantify associations of air pollution, specifically particulate matter <2.5 μm in diameter (PM2.5), with CST IV and (2) explore the extent to which racial disparities in PM2.5 exposure might explain racial differences in the prevalence of CST IV. Design/Methods: We performed a secondary analysis of 566 participants of the Motherhood & Microbiome study. PM2.5 exposures were derived from a machine learning model integrating NASA satellite and EPA ground monitor data. Previously, cervicovaginal swabs from 15 to 20 weeks' gestation were analyzed using 16S rRNA sequencing and hierarchical clustering assigned CSTs. Multivariable logistic regression models calculated adjusted odds ratios of CST IV (vs. other CSTs) per interquartile range (IQR) increment of PM2.5. Race-stratified and mediation analyses were performed. Results: Higher PM2.5 exposure was associated with CST IV (aOR 1.39, 95% CI: 1.02-1.91). Further adjustment for race/ethnicity attenuated the association (aOR 1.34, 95% CI: 0.97-1.83). Black participants (vs. White) had higher median PM2.5 exposure (10.6 vs. 9.6 μg/m³, P < 0.001) and higher prevalence of CST IV (47% vs. 11%, P < 0.001). Mediation analysis revealed that higher PM2.5 exposure may explain 3.9% (P = 0.038) and 3.3% (P = 0.15) of the Black-White disparity in CST IV in unadjusted and adjusted models, respectively. Conclusion: PM2.5 was associated with CST IV, a risk factor for sPTB. Additionally, PM2.5 exposure may partially explain racial differences in the prevalence of CST IV. Further research is warranted to discover how environmental exposures affect microbial composition and perpetuate racial health disparities.
