Dataset Information

Improving Google Flu Trends for COVID-19 estimates using Weibo posts


ABSTRACT: Incomplete non-medical data has been integrated into epidemic prediction models, but the accuracy and generalizability of such data are difficult to guarantee. To comprehensively evaluate the ability and applicability of social media data for predicting the development of COVID-19, a new confirmed-case prediction algorithm that improves on the Google Flu Trends algorithm, called Weibo COVID-19 Trends (WCT), is established based on the dataset of posts generated by all users in Wuhan on Sina Weibo. A genetic algorithm is designed to select the keyword set used to filter COVID-19-related posts. WCT consistently outperforms the highest average test score in the training set, computed between daily new confirmed case counts and the prediction results. It continues to produce the best prediction results among the compared algorithms as the forecast horizon increases from one to eight days, with the highest correlation score going from 0.98 (P < 0.01) to 0.86 (P < 0.01) over the entire analysis period. Additionally, WCT effectively mitigates the Google Flu Trends algorithm's shortcoming of overestimating the epidemic peak value. This study offers a highly adaptive approach to feature engineering of third-party data in epidemic prediction, providing useful insights for predicting newly emerging infectious diseases at an early stage.
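The abstract mentions a genetic algorithm for keyword selection but does not specify its encoding, fitness function, or operators. The following is a minimal illustrative sketch, not the authors' implementation. It assumes each candidate keyword has a daily post-count series, encodes a keyword set as a bit mask, and uses the Pearson correlation between the summed daily counts and daily new confirmed cases as the fitness. All names (`pearson`, `fitness`, `select_keywords`) and all parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def fitness(mask, keyword_counts, cases):
    """Assumed fitness: correlation between summed counts of selected keywords and cases."""
    selected = [series for series, bit in zip(keyword_counts, mask) if bit]
    if not selected:
        return -1.0  # an empty keyword set is the worst possible candidate
    daily_total = [sum(day) for day in zip(*selected)]
    return pearson(daily_total, cases)


def select_keywords(keyword_counts, cases, pop_size=30, generations=40,
                    mutation_rate=0.05, seed=0):
    """Return a bit mask over candidate keywords (needs at least two keywords)."""
    rng = random.Random(seed)
    n = len(keyword_counts)

    def score(mask):
        return fitness(mask, keyword_counts, cases)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    population[0] = [1] * n  # baseline candidate: keep every keyword
    best = max(population, key=score)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        if score(ranked[0]) > score(best):
            best = ranked[0]
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [best[:]]               # elitism: carry the best mask forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)      # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children

    return max(population + [best], key=score)
```

Calling `select_keywords(keyword_counts, cases)` with one count series per candidate keyword returns a list of 0/1 flags marking the keywords to keep. Because the all-keywords mask is seeded into the initial population and the best mask is always carried forward, the result is never worse, under this assumed fitness, than using every keyword.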

SUBMITTER: Guo S 

PROVIDER: S-EPMC8280378 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC4281210 | biostudies-literature
| S-EPMC6693776 | biostudies-literature
| S-EPMC9173636 | biostudies-literature
| S-EPMC7692493 | biostudies-literature
| S-EPMC10113719 | biostudies-literature
| S-EPMC7703221 | biostudies-literature
| S-EPMC11252505 | biostudies-literature
| S-EPMC9357377 | biostudies-literature
| S-EPMC7968661 | biostudies-literature
| S-EPMC7790734 | biostudies-literature