Dataset Information

Using Machine Learning to Uncover Hidden Heterogeneities in Survey Data.


ABSTRACT: Survey responses in public health surveys are heterogeneous. The quality of a respondent's answers depends on many factors, including cognitive abilities, interview context, and whether the interview is in person or self-administered. A largely unexplored issue is how the language used for public health survey interviews is associated with the survey response. We introduce a machine learning approach, Fuzzy Forests, which we use for model selection. We use the 2013 California Health Interview Survey (CHIS) as our training sample and the 2014 CHIS as the test sample. We found that non-English language survey responses differ substantially from English responses in reported health outcomes. We also found heterogeneity among the Asian languages, suggesting that caution should be used when interpreting results that compare across these languages. The Fuzzy Forests model trained on the 2013 data also correctly predicted 86% of good health outcomes in the 2014 test set. We show that the Fuzzy Forests methodology is potentially useful for screening for and understanding other types of survey response heterogeneity. This is especially true in high-dimensional and complex surveys.
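
The evaluation design in the abstract (fit on one survey year, score on the next) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record fields `year` and `good_health`, both function names, and the toy records are hypothetical, and reading "correctly predicted 86% of good health outcomes" as the share of true good-health cases that the model recovers is an assumption. No Fuzzy Forests model is fitted here; the predictions are supplied by hand.

```python
# Hypothetical sketch: field names, function names, and toy records are
# illustrative only and are not taken from CHIS or from the authors' code.


def split_by_year(records, train_year, test_year):
    """Split survey records into a training year and a held-out test year."""
    train = [r for r in records if r["year"] == train_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def share_of_positives_recovered(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were also predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        raise ValueError("no positive cases in y_true")
    hits = sum(1 for p in predictions_for_positives if p == positive)
    return hits / len(predictions_for_positives)


# Synthetic toy records (1 = good health reported, 0 = otherwise).
records = [
    {"year": 2013, "good_health": 1},
    {"year": 2013, "good_health": 0},
    {"year": 2014, "good_health": 1},
    {"year": 2014, "good_health": 1},
    {"year": 2014, "good_health": 0},
]

train, test = split_by_year(records, train_year=2013, test_year=2014)
y_true = [r["good_health"] for r in test]

# Placeholder predictions standing in for a model fitted on `train`.
y_pred = [1, 0, 0]

rate = share_of_positives_recovered(y_true, y_pred)
print(f"Share of good-health cases recovered on the test year: {rate:.2f}")
```

Splitting by survey year rather than at random keeps the test set strictly out of sample in time, which matches the 2013/2014 design the abstract describes.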

SUBMITTER: Ramirez CM 

PROVIDER: S-EPMC6831673 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

Using Machine Learning to Uncover Hidden Heterogeneities in Survey Data.

Ramirez, Christina M; Abrajano, Marisa A; Alvarez, R Michael

Scientific Reports, 2019-11-05, issue 1


Similar Datasets

| S-EPMC11356363 | biostudies-literature
| S-EPMC10868336 | biostudies-literature
| S-EPMC8960822 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
2019-07-18 | GSE134056 | GEO
2019-07-18 | GSE134052 | GEO
2023-06-01 | GSE193400 | GEO
| S-EPMC6610706 | biostudies-literature
| S-EPMC7439143 | biostudies-literature
| S-EPMC9236253 | biostudies-literature