Project description: Data on the number of people who have committed suicide tends to be reported with a substantial time lag of around two years. We examine whether online activity measured by Google searches can help us improve estimates of the number of suicide occurrences in England before official figures are released. Specifically, we analyse how data on the number of Google searches for the terms 'depression' and 'suicide' relate to the number of suicides between 2004 and 2013. We find that estimates drawing on Google data are significantly better than estimates using previous suicide data alone. We show that a greater number of searches for the term 'depression' is related to fewer suicides, whereas a greater number of searches for the term 'suicide' is related to more suicides. Data on suicide-related search behaviour can be used to improve current estimates of the number of suicide occurrences. Electronic Supplementary Material: The online version of this article (doi:10.1140/epjds/s13688-016-0094-0) contains supplementary material.
Project description: Online activity-based data can be used to aid infectious disease forecasting. Our aim was to exploit the converging nature of the tuberculosis (TB) and diabetes epidemics to forecast TB case numbers. To this end, we extended TB prediction models based on traditional data with diabetes-related Google searches. We obtained data on the weekly case numbers of TB in Germany from June 8th, 2014, to May 5th, 2019. Internet search data were obtained from a Google Trends (GTD) search for 'diabetes' over the corresponding interval. A seasonal autoregressive integrated moving average (SARIMA) model (0,1,1)(1,0,0)[52] was selected to describe the weekly TB case numbers with and without GTD as an external regressor. We cross-validated the SARIMA models to obtain the root mean squared errors (RMSE). We repeated this procedure with autoregressive feed-forward neural network (NNAR) models using 5-fold cross-validation. To simulate a data-poor surveillance setting, we also tested traditional and GTD-extended models against a hold-out dataset, using a shortened 52-week training period with missing values. Cross-validation resulted in an RMSE of 20.83 for the traditional model and 18.56 for the GTD-extended model. Cross-validation of the NNAR models showed a mean RMSE of 19.49 for the traditional model and 18.99 for the GTD-extended model. When we tested the models trained on the shortened dataset with missing values, the GTD-extended models achieved significantly better prediction than the traditional models (p < 0.001). The GTD-extended models outperformed the traditional models in all assessed model evaluation parameters. Using online activity-based data regarding diabetes can improve TB forecasting, but further validation is warranted.
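For readers unfamiliar with the notation, the SARIMA (0,1,1)(1,0,0)[52] specification with an external regressor can be written out as follows. This is one common reading (a regression with SARIMA errors), not necessarily the exact parameterisation the authors fitted. The symbols are assumptions introduced here: y_t is the weekly TB case count, x_t the Google Trends value for 'diabetes', B the backshift operator, and ε_t white noise. Sign conventions for the moving-average term vary between software packages.

```latex
% Regression on the search series, with SARIMA(0,1,1)(1,0,0)[52] errors
y_t = \beta x_t + \eta_t,
\qquad
(1 - \Phi_1 B^{52})\,(1 - B)\,\eta_t = (1 + \theta_1 B)\,\varepsilon_t
% (1 - B):              one non-seasonal difference    (d = 1)
% (1 + \theta_1 B):     one non-seasonal MA term       (q = 1)
% (1 - \Phi_1 B^{52}):  one seasonal AR term at a lag of 52 weeks (P = 1)
```

The "traditional" model in this reading corresponds to dropping the βx_t term, so that the case series alone follows the SARIMA process.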
Project description: We developed a dynamic forecasting model for Zika virus (ZIKV), based on real-time online search data from Google Trends (GTs). It was designed to provide health departments with Zika virus disease (ZVD) surveillance and detection, along with predicted numbers of infection cases, allowing them sufficient time to implement interventions. In this study, we found a strong correlation between Zika-related GTs and the cumulative numbers of reported cases (confirmed, suspected and total cases; p < 0.001). We then used the correlation data from Zika-related online searches in GTs and ZIKV epidemics between 12 February and 20 October 2016 to construct an autoregressive integrated moving average (ARIMA) model (0,1,3) for the dynamic estimation of ZIKV outbreaks. The forecasting results indicated that the predictions of the ARIMA model, which used the online search data as an external regressor alongside the historical epidemic data to improve prediction quality, were quite similar to the actual data during the ZIKV epidemic in early November 2016. Integer-valued autoregression provides a useful base predictive model for ZVD cases. This is enhanced by the incorporation of GTs data, confirming the prognostic utility of search query-based surveillance. This accessible and flexible dynamic forecast model could be used in the monitoring of ZVD to provide advance warning of future ZIKV outbreaks.
Project description: Crises in financial markets affect humans worldwide. Detailed market data on trading decisions reflect some of the complex human behavior that has led to these crises. We suggest that massive new data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. By analyzing changes in Google query volumes for search terms related to finance, we find patterns that may be interpreted as "early warning signs" of stock market moves. Our results illustrate the potential that combining extensive behavioral data sets offers for a better understanding of collective human behavior.
Project description: Background: The outbreak of coronavirus disease 2019 (COVID-19) has placed stress on the health and well-being of both Chinese people and the public worldwide. Global public interest in this new issue largely reflects people's attention to COVID-19 and their willingness to take precautionary actions. This study aimed to examine global public awareness of COVID-19 using Google Trends. Methods: Using Google Trends, we retrieved public query data for the terms "2019-nCoV + SARS-CoV-2 + novel coronavirus + new coronavirus + COVID-19 + Corona Virus Disease 2019" between 31st December 2019 and 24th February 2020 in six major English-speaking countries: the USA, the UK, Canada, Ireland, Australia, and New Zealand. Dynamic series analysis was used to show the overall trend in relative search volume (RSV) for the topic of COVID-19. We compared the top-ranking related queries and the sub-regional distribution of RSV about COVID-19 across countries. The correlation between daily search volumes on the topic of COVID-19 and the daily number of people infected with SARS-CoV-2 was analyzed. Results: The overall RSV for COVID-19 increased during the early part of the observation period and reached its first peak on 31st January 2020. A shorter response time and a longer duration of public attention to COVID-19 were observed among the public in the USA, the UK, Australia, and Canada than in Ireland and New Zealand. A slightly positive correlation between daily RSV about COVID-19 and the daily number of confirmed cases was observed (P < 0.05). Interest in COVID-19, as measured by RSV, varied across countries, and public awareness of COVID-19 differed between sub-regions within countries. Conclusions: The results suggest that public response time to COVID-19 differed across countries and that the overall duration of public attention was short. The current study reminds us that governments should strengthen publicity about COVID-19 nationally, strengthen the public's vigilance and sensitivity to COVID-19, inform the public of the importance of protecting themselves with sufficient precautionary measures, and ultimately control the spread of COVID-19 globally.
Project description: While incomplete non-medical data has been integrated into prediction models for epidemics, the accuracy and the generalizability of the data are difficult to guarantee. To comprehensively evaluate the ability and applicability of using social media data to predict the development of COVID-19, an algorithm for predicting new confirmed cases, called Weibo COVID-19 Trends (WCT), is established; it improves on the Google Flu Trends algorithm and is based on the dataset of posts generated by all users in Wuhan on Sina Weibo. A genetic algorithm is designed to select the keyword set for filtering COVID-19-related posts. WCT consistently outperforms the highest average test score obtained in the training set, as measured between daily new confirmed case counts and the prediction results. It continues to produce the best prediction results among the compared algorithms as the forecast horizon increases from one to eight days, with the highest correlation score ranging from 0.98 (P < 0.01) to 0.86 (P < 0.01) over the whole analysis period. Additionally, WCT effectively mitigates the Google Flu Trends algorithm's tendency to overestimate the epidemic peak value. This study offers a highly adaptive approach for feature engineering of third-party data in epidemic prediction, providing useful insights for the prediction of newly emerging infectious diseases at an early stage.
Project description: Although acute respiratory infections are a leading cause of mortality in sub-Saharan Africa, surveillance of diseases such as influenza is mostly neglected. Evaluating the usefulness of influenza-like illness (ILI) surveillance systems and developing approaches for forecasting future trends is important for pandemic preparedness. We applied and compared a range of robust statistical and machine learning models, including random forest (RF) regression, support vector machine (SVM) regression, multivariable linear regression and ARIMA models, to forecast 2012 to 2018 trends of reported ILI cases in Cameroon, using Google searches for influenza symptoms, treatments, and natural or traditional remedies, as well as infectious diseases with a high burden (i.e., AIDS, malaria, tuberculosis). The R2 and RMSE (root mean squared error) were statistically similar across most of the methods; however, RF and SVM had the highest average R2 (0.78 and 0.88, respectively) for predicting ILI per 100,000 persons at the country level. This study demonstrates the need for developing contextualized approaches when using digital data for disease surveillance and the usefulness of search data for monitoring ILI in sub-Saharan African countries.
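The two evaluation metrics named in the abstract above, R2 and RMSE, can be computed as in the following minimal sketch. It assumes the usual definitions (R2 as one minus the ratio of residual to total sum of squares); the abstract does not say which variant of R2 the authors used. The function names `rmse` and `r_squared` are hypothetical, chosen here for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observed values are constant")
    return 1 - ss_res / ss_tot
```

Under this definition, a model that always predicts the mean of the observed values scores an R2 of 0, and a perfect model scores 1 with an RMSE of 0.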
Project description: Background: Norovirus is a contagious disease that spreads quickly and easily in various ways. Because effective methods to prevent or treat norovirus have not been discovered, it is important to rapidly recognize and report norovirus outbreaks in the early phase. Internet search has been a useful method for people to access information immediately. With the precise record of internet search trends, internet search has been a useful tool for revealing infectious disease outbreaks. Objective: In this study, we sought to identify the correlation between internet search terms and norovirus infection. Methods: The internet search trend data for norovirus were obtained from Google Trends. We used cross-correlation analysis to identify the temporal correlation between norovirus and other terms. We also used multiple linear regression with the stepwise method to identify the most important predictors of norovirus internet search trends. In addition, we evaluated the temporal correlation between actual norovirus cases and internet search terms in New York, California, and the United States as a whole. Results: Some Google search terms, such as gastroenteritis, watery diarrhea, and stomach bug, coincided with norovirus Google Trends. Some, such as contagious, travel, and party, appeared earlier than norovirus Google Trends. Others, such as dehydration, bar, and coronavirus, appeared several months later than norovirus Google Trends. We found that fever, gastroenteritis, poison, cruise, wedding, and watery diarrhea were important factors correlated with norovirus Google Trends. For actual norovirus cases in New York, California, and the United States as a whole, some Google search terms appeared at the same time as, earlier than, or later than the cases. Conclusions: Our study provides novel evidence, based on internet search strategies, regarding the epidemiology of norovirus.
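The cross-correlation analysis mentioned in the Methods above can be illustrated with a minimal sketch: compute the Pearson correlation between two equally spaced series at a range of lags and report the lag with the strongest positive correlation. This is a generic illustration, not the authors' procedure. The function names and the sign convention (a positive lag means the first series leads the second) are assumptions made here, and the series in the usage note are made-up numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def cross_correlation(x, y, max_lag):
    """Return {lag: r}. A lag k > 0 pairs x[t] with y[t + k], i.e. x leads y by k steps."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    if max_lag > n - 2:
        raise ValueError("max_lag leaves fewer than two overlapping points")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result


def best_lag(x, y, max_lag):
    """Lag at which the correlation between x and y is highest."""
    cc = cross_correlation(x, y, max_lag)
    return max(cc, key=lambda k: cc[k])
```

For example, with a hypothetical weekly search series `term = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]` and a second series `other = [0, 0, 0, 1, 0, 0, 2, 0, 0, 1]` that repeats the same pattern two weeks later, `best_lag(term, other, 4)` returns 2, meaning `term` leads `other` by two weeks under the convention above.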