Dataset Information

Random responses inflate statistical estimates in heavily skewed addictions data.

ABSTRACT:

Background

Some respondents may respond at random to self-report surveys, rather than responding conscientiously (Meade and Craig, 2012), and this has only recently come to the attention of researchers in the addictions field (Godinho et al., 2016). Almost no research in the published addictions literature has reported screening for random responses. We illustrate how random responses can bias statistical estimates using simulated and real data, and how this is especially problematic in skewed data, as is common with substance use outcomes.

Method

We first tested the effects of varying amounts and types of random responses on covariance-based statistical estimates in distributions with varying amounts of skew. We replicated these findings in correlations from a real dataset (Add Health) by replacing varying amounts of real data with simulated random responses.

Results

Skew and the proportion of random responses influenced the amount and direction of bias. When the data were not skewed, uniformly random responses deflated estimates, while long-string random responses inflated estimates. As the distributions became more skewed, all types of random responses began to inflate estimates, even at very small proportions. We observed similar effects in the Add Health data.
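The pattern described above can be reproduced in a small simulation. The following is a minimal sketch, not the authors' simulation design: it assumes two 5-point items (coded 0 to 4) obtained by thresholding correlated latent normal scores, and it only covers uniformly random responses. The threshold sets, the latent correlation of 0.3, the sample size, and all function names are illustrative assumptions.

```python
import random
from math import sqrt

# Hypothetical cut points on a latent normal score (assumptions, not from the paper).
SYMMETRIC = (-1.5, -0.5, 0.5, 1.5)   # responses centred on the scale midpoint
SKEWED = (1.0, 1.5, 2.0, 2.5)        # most conscientious responses fall at 0


def categorize(z, thresholds):
    """Map a latent score to an ordinal response 0..len(thresholds)."""
    return sum(z > t for t in thresholds)


def simulate_pairs(n, rho, thresholds, rng):
    """Draw n conscientious response pairs whose latent scores correlate at rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((categorize(z1, thresholds), categorize(z2, thresholds)))
    return pairs


def contaminate(pairs, proportion, rng, n_categories=5):
    """Replace a proportion of respondents with uniformly random responses."""
    out = list(pairs)
    n_random = round(proportion * len(out))
    for i in rng.sample(range(len(out)), n_random):
        out[i] = (rng.randrange(n_categories), rng.randrange(n_categories))
    return out


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_before_and_after(thresholds, proportion, n=20000, rho=0.3, seed=0):
    """Return (clean r, contaminated r) for one threshold set and proportion."""
    rng = random.Random(seed)
    clean = simulate_pairs(n, rho, thresholds, rng)
    dirty = contaminate(clean, proportion, rng)
    return pearson(*zip(*clean)), pearson(*zip(*dirty))


if __name__ == "__main__":
    for label, thresholds in (("symmetric", SYMMETRIC), ("skewed", SKEWED)):
        for proportion in (0.025, 0.05, 0.10):
            clean_r, dirty_r = correlation_before_and_after(thresholds, proportion)
            print(f"{label:9s} p={proportion:.3f} clean r={clean_r:.3f} contaminated r={dirty_r:.3f}")
```

In this sketch the mechanism is a mean shift: when conscientious responses pile up near 0, random responders score higher on both items at once, which adds shared between-group covariance. When the conscientious responses are centred on the scale midpoint there is no such shift, so the random rows only add noise.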

Conclusions

Failing to screen for random responses in survey data produces biased statistical estimates. When data are heavily skewed, as few as 2.5% random responses can inflate covariance-based estimates (e.g., correlations, Cronbach's alpha, regression coefficients, and factor loadings). Screening for random responses can substantially improve data quality, reliability, and validity.
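One simple screen for the long-string pattern mentioned in the Results is the length of the longest run of identical consecutive answers per respondent. The sketch below is illustrative only: the abstract does not say which screening indices the authors used, and the function names and the `max_run` cut-off are hypothetical choices that would need to be set for the instrument at hand.

```python
from itertools import groupby


def longest_run(responses):
    """Length of the longest run of identical consecutive responses."""
    return max((len(list(group)) for _, group in groupby(responses)), default=0)


def flag_long_string(respondents, max_run):
    """Flag respondents whose longest run exceeds max_run (an assumed cut-off)."""
    return [longest_run(responses) > max_run for responses in respondents]


if __name__ == "__main__":
    survey = [
        [0, 0, 0, 0, 0, 0],  # same answer throughout
        [0, 1, 0, 2, 0, 1],  # varied answers
    ]
    print(flag_long_string(survey, max_run=5))
```

A flag from this index is only a prompt for closer inspection. In heavily skewed substance use data, a conscientious non-user may legitimately give the same answer to every item.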

SUBMITTER: King KM

PROVIDER: S-EPMC5803341 | biostudies-literature | 2018 Feb

REPOSITORIES: biostudies-literature


Publications

Random responses inflate statistical estimates in heavily skewed addictions data.

King KM, Kim DS, McCabe CJ

Drug and Alcohol Dependence, 2017-12-09


PMID: 29245102


Similar Datasets

Population structure can inflate SNP-based heritability estimates. | S-EPMC3135810 | biostudies-literature

Characterising Uncertainty in Expert Assessments: Encoding Heavily Skewed Judgements. | S-EPMC4627781 | biostudies-literature

Sequence differences at orthologous microsatellites inflate estimates of human-chimpanzee differentiation. | S-EPMC4253012 | biostudies-literature

A statistical framework for assessing pharmacological responses and biomarkers using uncertainty estimates. | S-EPMC7746236 | biostudies-literature

Statistical aspects of omics data analysis using the random compound covariate. | S-EPMC3524312 | biostudies-literature

A genomic random interval model for statistical analysis of genomic lesion data. | S-EPMC3740633 | biostudies-literature

Random sampling of skewed distributions implies Taylor's power law of fluctuation scaling. | S-EPMC4485080 | biostudies-literature

Using Random Effect Models to Produce Robust Estimates of Death Rates in COVID-19 Data. | S-EPMC9690214 | biostudies-literature

Sleep as a random walk: a super-statistical analysis of EEG data across sleep stages. | S-EPMC8664947 | biostudies-literature

Bayesian additive regression trees for multivariate skewed responses. | S-EPMC9851978 | biostudies-literature