Approximate median regression for complex survey data with skewed response.
Ontology highlight
ABSTRACT: The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)'based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.
SUBMITTER: Fraser RA
PROVIDER: S-EPMC5055849 | biostudies-literature | 2016 Dec
REPOSITORIES: biostudies-literature
ACCESS DATA