Random responses inflate statistical estimates in heavily skewed addictions data.
ABSTRACT:

BACKGROUND: Some respondents may respond at random to self-report surveys, rather than responding conscientiously (Meade and Craig, 2012), and this has only recently come to the attention of researchers in the addictions field (Godinho et al., 2016). Almost no research in the published addictions literature has reported screening for random responses. We illustrate how random responses can bias statistical estimates using simulated and real data, and how this is especially problematic in skewed data, as is common with substance use outcomes.

METHOD: We first tested the effects of varying amounts and types of random responses on covariance-based statistical estimates in distributions with varying amounts of skew. We replicated these findings in correlations from a real dataset (Add Health) by replacing varying amounts of real data with simulated random responses.

RESULTS: Skew and the proportion of random responses influenced the amount and direction of bias. When the data were not skewed, uniformly random responses deflated estimates, while long-string random responses inflated estimates. As the distributions became more skewed, all types of random responses began to inflate estimates, even at very small proportions. We observed similar effects in the Add Health data.

CONCLUSIONS: Failing to screen for random responses in survey data produces biased statistical estimates, and data with only 2.5% random responses can inflate covariance-based estimates (i.e., correlations, Cronbach's alpha, regression coefficients, factor loadings, etc.) when data are heavily skewed. Screening for random responses can substantially improve data quality, reliability, and validity.
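The inflation mechanism described in the RESULTS can be illustrated with a minimal simulation. This is a hypothetical sketch, not the authors' actual simulation design: two independent, heavily right-skewed items on an assumed 0-10 response scale stand in for substance use outcomes, and 2.5% of respondents are replaced with uniformly random responses. Because random responders land far from the cluster of real responses on both items simultaneously, they act as leverage points and induce a spurious positive correlation:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical skewed items (assumption: 0-10 scale, exponential-like
# skew): the two items are generated INDEPENDENTLY, so their true
# correlation is zero and most responses cluster near the minimum.
x = np.minimum(rng.exponential(scale=0.5, size=n), 10.0)
y = np.minimum(rng.exponential(scale=0.5, size=n), 10.0)

r_clean = np.corrcoef(x, y)[0, 1]  # near 0: the items are unrelated

# Replace 2.5% of respondents with uniformly random responses that
# span the whole 0-10 scale on both items (the paper's smallest
# contamination proportion).
k = int(0.025 * n)
idx = rng.choice(n, size=k, replace=False)
x[idx] = rng.uniform(0.0, 10.0, size=k)
y[idx] = rng.uniform(0.0, 10.0, size=k)

# Random responders sit high on both items relative to the skewed
# bulk, adding a positive between-group covariance term.
r_contaminated = np.corrcoef(x, y)[0, 1]

print(f"r without random responders: {r_clean:+.3f}")
print(f"r with 2.5% random responders: {r_contaminated:+.3f}")
```

With truly uncorrelated skewed items, even this small contamination fraction pushes the observed correlation well above zero, matching the paper's conclusion that heavily skewed data are vulnerable at very small proportions of random responding.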
SUBMITTER: King KM
PROVIDER: S-EPMC5803341 | biostudies-literature | 2018 Feb
REPOSITORIES: biostudies-literature