Dataset Information

Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes.

ABSTRACT: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003). Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies focusing on spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should consider carefully the study aims.
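
The contrast between fixed and random allocation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tract names, the weights, and the function names are hypothetical, and it assumes that "fixed" allocation means assigning every case in a ZIP code to the tract with the largest weight, while "random" allocation draws a tract for each case with probability proportional to its weight.

```python
import random
from collections import Counter

# Hypothetical example: one ZIP code overlapping three census tracts.
# Each weight is the tract's assumed share of the ZIP code's population
# (made-up values, not taken from the study).
TRACT_WEIGHTS = {"tract_A": 0.6, "tract_B": 0.3, "tract_C": 0.1}


def fixed_allocation(n_cases, weights):
    """Deterministic sketch: every case goes to the highest-weight tract."""
    best = max(weights, key=weights.get)
    return [best] * n_cases


def random_allocation(n_cases, weights, rng):
    """Stochastic sketch: each case is drawn with probability proportional to weight."""
    tracts = list(weights)
    return rng.choices(tracts, weights=[weights[t] for t in tracts], k=n_cases)


def cases_per_tract(assignments, weights):
    """Count imputed cases per tract, including tracts that received none."""
    counts = Counter(assignments)
    return {tract: counts.get(tract, 0) for tract in weights}


rng = random.Random(0)  # seeded so the random draw is reproducible
fixed = cases_per_tract(fixed_allocation(10, TRACT_WEIGHTS), TRACT_WEIGHTS)
rand = cases_per_tract(random_allocation(10, TRACT_WEIGHTS, rng), TRACT_WEIGHTS)

print("fixed: ", fixed)
print("random:", rand)
```

Under these assumptions, the fixed rule places all ten cases in a single tract, which mirrors the artificial clustering the abstract attributes to fixed methods, whereas the random rule can spread cases across tracts in rough proportion to the weights.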

SUBMITTER: Hibbert JD

PROVIDER: S-EPMC2763852 | biostudies-other | 2009 Oct

REPOSITORIES: biostudies-other

Publications

Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes.

Hibbert JD, Liese AD, Lawson A, Porter DE, Puett RC, Standiford D, Liu L, Dabelea D

International Journal of Health Geographics, 2009-10-08

PMID: 19814809

Similar Datasets

Project description: Background: Three-level data arising from repeated measures on individuals who are clustered within larger units are common in health research studies. Missing data are prominent in such longitudinal studies and multiple imputation (MI) is a popular approach for handling missing data. Extensions of joint modelling and fully conditional specification MI approaches based on multilevel models have been developed for imputing three-level data. Alternatively, it is possible to extend single- and two-level MI methods to impute three-level data using dummy indicators (DI) and/or by analysing repeated measures in wide format. However, most implementations, evaluations and applications of these approaches focus on the context of incomplete two-level data. It is currently unclear which approach is preferable for imputing three-level data. Methods: In this study, we investigated the performance of various MI methods for imputing three-level incomplete data when the target analysis model is a three-level random effects model with a random intercept for each level. The MI methods were evaluated via simulations and illustrated using empirical data, based on a case study from the Childhood to Adolescence Transition Study, a longitudinal cohort collecting repeated measures on students who were clustered within schools. In our simulations we considered a number of different scenarios covering a range of different missing data mechanisms, missing data proportions and strengths of level-2 and level-3 intra-cluster correlations. Results: We found that all of the approaches considered produced valid inferences about both the regression coefficient corresponding to the exposure of interest and the variance components under the various scenarios within the simulation study. In the case study, all approaches led to similar results. Conclusion: Researchers may use extensions to the single- and two-level approaches, or the three-level approaches, to adequately handle incomplete three-level data. The two-level MI approaches with the DI extension or the MI approaches based on three-level models will be required in certain circumstances such as when there are longitudinal data measured at irregular time intervals. However, the single- and two-level approaches with the DI extension should be used with caution as the DI approach has been shown to produce biased parameter estimates in certain scenarios.

Project description: Exposure to PM2.5 is associated with hundreds of premature deaths every year in New York City (NYC). Current air quality and health impact assessment tools provide county-wide estimates but are inadequate for assessing health benefits at neighborhood scales, especially for evaluating policy options related to energy efficiency or climate goals. We developed a new ZIP Code-Level Air Pollution Policy Assessment (ZAPPA) tool for NYC by integrating two reduced-form models, Community Air Quality Tools (C-TOOLS) and the Co-Benefits Risk Assessment Health Impacts Screening and Mapping Tool (COBRA), that propagate emissions changes to estimate air pollution exposures and health benefits. ZAPPA leverages custom higher-resolution inputs for emissions, health incidences, and population. It then enables rapid policy evaluation with localized ZIP code tabulation area (ZCTA)-level analysis of potential health and monetary benefits stemming from air quality management decisions. We evaluated the modeled 2016 PM2.5 values against observed values at EPA and NYCCAS monitors, finding good model performance (FAC2 = 1; NMSE = 0.05). We then applied ZAPPA to assess PM2.5 reduction-related health benefits from five illustrative policy scenarios in NYC focused on (1) commercial cooking, (2) residential and commercial building fuel regulations, (3) fleet electrification, (4) congestion pricing in Manhattan, and (5) these four combined as a "citywide sustainable policy implementation" scenario. The citywide scenario estimates an average reduction in PM2.5 of 0.9 μg/m³. This change translates to avoiding 210-475 deaths, 340 asthma emergency department visits, and monetized health benefits worth $2B to $5B annually, with significant variation across NYC's 192 ZCTAs. ZCTA-level assessments can help prioritize interventions in neighborhoods that would see the most health benefits from air pollution reduction. ZAPPA can provide quantitative insights on health and monetary benefits for future sustainability policy development in NYC.
