Dataset Information

Lifestyle Disease Surveillance Using Population Search Behavior: Feasibility Study.

ABSTRACT:
BACKGROUND: As the process of producing official health statistics for lifestyle diseases is slow, researchers have explored using Web search data as a proxy for lifestyle disease surveillance. Existing studies, however, are prone to at least one of the following issues: ad hoc keyword selection, overfitting, insufficient predictive evaluation, lack of generalization, and failure to compare against trivial baselines.
OBJECTIVE: The aims of this study were to (1) employ a corrective approach improving previous methods; (2) study the key limitations in using Google Trends for lifestyle disease surveillance; and (3) test the generalizability of our methodology to other countries beyond the United States.
METHODS: For each of the target variables (diabetes, obesity, and exercise), prevalence rates were collected. After a rigorous keyword selection process, data from Google Trends were collected. These data were denormalized to form spatio-temporal indices. L1-regularized regression models were trained to predict prevalence rates from denormalized Google Trends indices. Models were tested on a held-out set and compared against baselines from the literature as well as a trivial "last year equals this year" baseline. A similar analysis was done using a multivariate spatio-temporal model where the previous year's prevalence was included as a covariate. This model was modified to create a time-lagged regression analysis framework. Finally, a hierarchical time-lagged multivariate spatio-temporal model was created to account for subnational trends in the data. The model trained on US data was then applied in a transfer learning framework to Canada.
RESULTS: In the US context, our proposed models beat the performance of prior work, as well as the trivial baselines. In terms of the mean absolute error (MAE), the best of our proposed models yields 24% improvement (0.72-0.55; P<.001) for diabetes; 18% improvement (1.20-0.99; P=.001) for obesity; and 34% improvement (2.89-1.95; P<.001) for exercise. Our proposed across-country transfer learning framework also shows promising results with an average Spearman and Pearson correlation of 0.70 for diabetes and 0.90 and 0.91 for obesity, respectively.
CONCLUSIONS: Although our proposed models beat the baselines, we find the modeling of lifestyle diseases to be a challenging problem, one that requires an abundance of data as well as creative modeling strategies. In doing so, this study shows a low-to-moderate validity of Google Trends in the context of lifestyle disease surveillance, even when applying novel corrective approaches, including a proposed denormalization scheme. We envision qualitative analyses to be a more practical use of Google Trends in the context of lifestyle disease surveillance. For the quantitative analyses, the highest utility of using Google Trends is in the context of transfer learning where low-resource countries could benefit from high-resource countries by using proxy models.
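The evaluation described in the abstract (a model scored by MAE against a trivial "last year equals this year" baseline) can be illustrated with a minimal sketch. The prevalence values and model outputs below are hypothetical and are not taken from the study; the helper names are likewise illustrative. The last lines assume that each pair quoted in the abstract, such as 0.72-0.55, gives the baseline MAE followed by the model MAE.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def relative_improvement(baseline_mae, model_mae):
    """Fractional reduction in MAE relative to the baseline."""
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical prevalence rates (%) for one region over consecutive years.
prevalence = [9.0, 9.4, 9.5, 10.1, 10.3]

# Trivial baseline: predict each year with the previous year's observed value.
targets = prevalence[1:]
baseline_predictions = prevalence[:-1]

# Hypothetical model outputs for the same target years (not study results).
model_predictions = [9.3, 9.6, 9.9, 10.4]

baseline_mae = mean_absolute_error(targets, baseline_predictions)
model_mae = mean_absolute_error(targets, model_predictions)
improvement = relative_improvement(baseline_mae, model_mae)
print(f"baseline MAE={baseline_mae:.3f}, model MAE={model_mae:.3f}, "
      f"improvement={improvement:.1%}")

# Assumption: the abstract's "0.72-0.55" is (baseline MAE, model MAE) for diabetes.
diabetes_gain = relative_improvement(0.72, 0.55)
print(f"diabetes improvement={diabetes_gain:.0%}")
```

Under that reading, the diabetes pair reproduces the reported 24% after rounding. The published percentages were presumably computed from unrounded MAE values, so recomputing the other pairs from the rounded figures may differ by a point.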

SUBMITTER: Memon SA

PROVIDER: S-EPMC7011125 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Lifestyle Disease Surveillance Using Population Search Behavior: Feasibility Study.

Memon, Shahan Ali; Razak, Saquib; Weber, Ingmar

Journal of Medical Internet Research, 2020-01-27, issue 1

PMID: 32012050

Similar Datasets

Project description: Background: The rising incidence of chronic diseases is a growing concern, especially in Singapore, which is one of the high-income countries with the highest prevalence of diabetes. Interventions that promote healthy lifestyle behavior changes have been proven to be effective in reducing the progression of prediabetes to diabetes, but their in-person delivery may not be feasible on a large scale. Novel technologies such as conversational agents are a potential alternative for delivering behavioral interventions that promote healthy lifestyle behavior changes to the public. Objective: The aim of this study is to assess the feasibility and acceptability of using a conversational agent promoting healthy lifestyle behavior changes in the general population in Singapore. Methods: We performed a web-based, single-arm feasibility study. The participants were recruited through Facebook over 4 weeks. The Facebook Messenger conversational agent was used to deliver the intervention. The conversations focused on diet, exercise, sleep, and stress and aimed to promote healthy lifestyle behavior changes and improve the participants' knowledge of diabetes. Messages were sent to the participants four times a week (once for each of the 4 topics of focus) for 4 weeks. We assessed the feasibility of recruitment, defined as at least 75% (150/200) of our target sample of 200 participants in 4 weeks, as well as retention, defined as 33% (66/200) of the recruited sample completing the study. We also assessed the participants' satisfaction with, and usability of, the conversational agent. In addition, we performed baseline and follow-up assessments of quality of life, diabetes knowledge and risk perception, diet, exercise, sleep, and stress. Results: We recruited 37.5% (75/200) of the target sample size in 1 month. Of the 75 eligible participants, 60 (80%) provided digital informed consent and completed baseline assessments. Of these 60 participants, 56 (93%) followed the study through to completion. Retention was high at 93% (56/60), along with engagement, denoted by 50% (30/60) of the participants communicating with the conversational agent at each interaction. Acceptability, usability, and satisfaction were generally high. Preliminary efficacy of the intervention showed no definitive improvements in health-related behavior. Conclusions: The delivery of a conversational agent for healthy lifestyle behavior change through Facebook Messenger was feasible and acceptable. We were unable to recruit our planned sample solely using the free options in Facebook. However, participant retention and conversational agent engagement rates were high. Our findings provide important insights to inform the design of a future randomized controlled trial.

Project description: Background: Internet search query trends have been shown to correlate with incidence trends for select infectious diseases and countries. Herein, the first use of Google search queries for malaria surveillance is investigated. The research focuses on Thailand where real-time malaria surveillance is crucial as malaria is re-emerging and developing resistance to pharmaceuticals in the region. Methods: Official Thai malaria case data were acquired from the World Health Organization (WHO) from 2005 to 2009. Using Google Correlate, an openly available online tool, and by surveying Thai physicians, search queries potentially related to malaria prevalence were identified. Four linear regression models were built from different sub-sets of malaria-related queries to be used in future predictions. The models' accuracies were evaluated by their ability to predict the malaria outbreak in 2009, their correlation with the entire available malaria case data, and by Akaike information criterion (AIC). Results: Each model captured the bulk of the variability in officially reported malaria incidence. Correlation in the validation set ranged from 0.75 to 0.92 and AIC values ranged from 808 to 586 for the models. While models using malaria-related and general health terms were successful, one model using only microscopy-related terms obtained equally high correlations to malaria case data trends. The model built strictly of queries provided by Thai physicians was the only one that consistently captured the well-documented second seasonal malaria peak in Thailand. Conclusions: Models built from Google search queries were able to adequately estimate malaria activity trends in Thailand, from 2005 to 2010, according to official malaria case counts reported by WHO. While presenting their own limitations, these search queries may be valid real-time indicators of malaria incidence in the population, as correlations were on par with those of related studies for other infectious diseases. Additionally, this methodology provides a cost-effective description of malaria prevalence that can act as a complement to traditional public health surveillance. This and future studies will continue to identify ways to leverage web-based data to improve public health.

Project description: Background: The use of internet search data has been demonstrated to be effective at predicting influenza incidence. This approach may be more successful for dengue which has large variation in annual incidence and a more distinctive clinical presentation and mode of transmission. Methods: We gathered freely-available dengue incidence data from Singapore (weekly incidence, 2004-2011) and Bangkok (monthly incidence, 2004-2011). Internet search data for the same period were downloaded from Google Insights for Search. Search terms were chosen to reflect three categories of dengue-related search: nomenclature, signs/symptoms, and treatment. We compared three models to predict incidence: a step-down linear regression, generalized boosted regression, and negative binomial regression. Logistic regression and Support Vector Machine (SVM) models were used to predict a binary outcome defined by whether dengue incidence exceeded a chosen threshold. Incidence prediction models were assessed using r² and Pearson correlation between predicted and observed dengue incidence. Logistic and SVM model performance were assessed by the area under the receiver operating characteristic curve (AUC). Models were validated using multiple cross-validation techniques. Results: The linear model selected by AIC step-down was found to be superior to other models considered. In Bangkok, the model has an r² = 0.943, and a correlation of 0.869 between fitted and observed. In Singapore, the model has an r² = 0.948, and a correlation of 0.931. In both Singapore and Bangkok, SVM models outperformed logistic regression in predicting periods of high incidence. The AUC for the SVM models using the 75th percentile cutoff is 0.906 in Singapore and 0.960 in Bangkok. Conclusions: Internet search terms predict incidence and periods of large incidence of dengue with high accuracy and may prove useful in areas with underdeveloped surveillance systems. The methods presented here use freely available data and analysis tools and can be readily adapted to other settings.

Project description: Background: Dabbing is an emerging method of marijuana ingestion. However, little is known about dabbing owing to limited surveillance data on dabbing. Objective: The aim of the study was to analyze Google search data to assess the scope and breadth of information seeking on dabbing. Methods: Google Trends data about dabbing and related topics (eg, electronic nicotine delivery system [ENDS], also known as e-cigarettes) in the United States between January 2004 and December 2015 were collected by using relevant search terms such as "dab rig." The correlation between dabbing (including topics: dab and hash oil) and ENDS (including topics: vaping and e-cigarette) searches, the regional distribution of dabbing searches, and the impact of cannabis legalization policies on geographical location in 2015 were analyzed. Results: Searches regarding dabbing increased in the United States over time, with 1,526,280 estimated searches during 2015. Searches for dab and vaping have very similar temporal patterns, where the Pearson correlation coefficient (PCC) is .992 (P<.001). Similar phenomena were also obtained in searches for hash oil and e-cigarette, in which the corresponding PCC is .931 (P<.001). Dabbing information was searched more in some western states than other regions. The average dabbing searches were significantly higher in the states with medical and recreational marijuana legalization than in the states with only medical marijuana legalization (P=.02) or the states without medical and recreational marijuana legalization (P=.01). Conclusions: Public interest in dabbing is increasing in the United States. There are close associations between dabbing and ENDS searches. The findings suggest greater popularity of dabs in the states that legalized medical and recreational marijuana use. This study proposes a novel and timely way of cannabis surveillance, and these findings can help enhance the understanding of the popularity of dabbing and provide insights for future research and informed policy making on dabbing.

Project description: Background: Anxiety disorders are the most prevalent mental disorders globally, with a substantial impact on quality of life. The prevalence of anxiety disorders has increased substantially following the COVID-19 pandemic, and it is likely to be further affected by a global economic recession. Understanding anxiety themes and how they change over time and across countries is crucial for preventive and treatment strategies. Objective: The aim of this study was to track the trends in anxiety themes between 2004 and 2020 in the 50 most populous countries with high volumes of internet search data. This study extends previous research by using a novel search-based methodology and including a longer time span and more countries at different income levels. Methods: We used a crowdsourced questionnaire, alongside Bing search query data and Google Trends search volume data, to identify themes associated with anxiety disorders across 50 countries from 2004 to 2020. We analyzed themes and their mutual interactions and investigated the associations between countries' socioeconomic attributes and anxiety themes using time-series linear models. This study was approved by the Microsoft Research Institutional Review Board. Results: Query volume for anxiety themes was highly stable in countries from 2004 to 2019 (Spearman r=0.89) and moderately correlated with geography (r=0.49 in 2019). Anxiety themes were predominantly long-term and personal, with "having kids," "pregnancy," and "job" the most voluminous themes in most countries and years. In 2020, "COVID-19" became a dominant theme in 27 countries. Countries with a constant volume of anxiety themes over time had lower fragile state indexes (P=.007) and higher individualism (P=.003). An increase in the volume of the most searched anxiety themes was associated with a reduction in the volume of the remaining themes in 13 countries and an increase in 17 countries, and these 30 countries had a lower prevalence of mental disorders (P<.001) than the countries where no correlations were found. Conclusions: Internet search data could be a potential source for predicting the country-level prevalence of anxiety disorders, especially in understudied populations or when an in-person survey is not viable.

Project description: BACKGROUND: Genome-wide association (GWA) using large numbers of single nucleotide polymorphisms (SNPs) is now a powerful, state-of-the-art approach to mapping human disease genes. When a GWA study detects association between a SNP and the disease, this signal usually represents association with a set of several highly correlated SNPs in strong linkage disequilibrium (LD). The challenge we address is to distinguish among these correlated loci to highlight potential functional variants and prioritize them for follow-up. RESULTS: We implemented a systematic method for testing association across diverse population samples having differing histories and LD patterns, using a logistic regression framework. The hypothesis is that important underlying biological mechanisms are shared across human populations, and we can filter correlated variants by testing for heterogeneity of genetic effects in different population samples. This approach formalizes the descriptive comparison of p-values that has typified similar cross-population fine-mapping studies to date. We applied this method to correlated SNPs in the cholinergic nicotinic receptor gene cluster CHRNA5-CHRNA3-CHRNB4, in a case-control study of cocaine dependence composed of 504 European-American and 583 African-American samples. Of the 10 SNPs genotyped in the r² ≥ 0.8 bin for rs16969968, three demonstrated significant cross-population heterogeneity and are filtered from priority follow-up; the remaining SNPs include rs16969968 (heterogeneity p = 0.75). Though the power to filter out rs16969968 is reduced due to the difference in allele frequency in the two groups, the results nevertheless focus attention on a smaller group of SNPs that includes the non-synonymous SNP rs16969968, which retains a similar effect size (odds ratio) across both population samples. CONCLUSION: Filtering out SNPs that demonstrate cross-population heterogeneity enriches for variants more likely to be important and causative. Our approach provides an important and effective tool to help interpret results from the many GWA studies now underway.
