Dataset Information

Annotated dataset of history-related tweets.


ABSTRACT: In this article, we present a dataset containing history-related content obtained from social media. It contains hashtags and the tweets that include them, the results of third-party tools applied to the tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags we used to crawl the tweets. We collected the tweets from Twitter's official API using hashtag-based crawling. The crawling was performed from March 2016 to July 2018. During the crawling, we applied a bootstrapping approach, an iterative process of collecting tweets using a small set of seed hashtags and manually inspecting newly acquired hashtags that co-occur with the seed hashtags, so as to include those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. After manually investigating the collected hashtags, we defined 6 categories for them. The presented dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories.
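
The bootstrapping approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' crawler: it runs over an in-memory list of tweet texts instead of the Twitter API, and the function names (`extract_hashtags`, `bootstrap_hashtags`), the `is_history_related` callback standing in for the manual inspection step, and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercase hashtags (without '#') in a tweet text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def bootstrap_hashtags(tweets, seeds, is_history_related, max_rounds=10):
    """Iteratively grow a hashtag set from seed hashtags.

    tweets: iterable of tweet texts (stand-in for an API crawl).
    seeds: initial hashtags, without '#'.
    is_history_related: callback standing in for manual inspection;
        takes a hashtag and returns True to accept it.
    """
    tweets = list(tweets)
    accepted = {s.lower() for s in seeds}
    rejected = set()

    for _ in range(max_rounds):
        # "Crawl": keep tweets that contain at least one accepted hashtag.
        matched = [t for t in tweets if extract_hashtags(t) & accepted]

        # Count co-occurring hashtags that have not been reviewed yet.
        candidates = Counter()
        for text in matched:
            for tag in extract_hashtags(text) - accepted - rejected:
                candidates[tag] += 1

        if not candidates:
            break  # nothing new to review, so the process has converged

        # Review the most frequent candidates first.
        for tag, _count in candidates.most_common():
            if is_history_related(tag):
                accepted.add(tag)
            else:
                rejected.add(tag)

    collected = [t for t in tweets if extract_hashtags(t) & accepted]
    return accepted, collected


# Hypothetical sample data, not taken from the dataset.
sample_tweets = [
    "Today in 1969 #onthisday #history",
    "#history #ww2 remembered",
    "#ww2 #dday landing",
    "lunch #food #onthisday",
    "#food #yum",
]
known_history_tags = {"onthisday", "ww2", "dday"}

hashtags, collected = bootstrap_hashtags(
    sample_tweets,
    seeds=["history"],
    is_history_related=lambda tag: tag in known_history_tags,
)
```

In this sketch, each round widens the crawl with the hashtags accepted in the previous round, and rejected hashtags are remembered so they are not reviewed twice. A real crawl would replace the in-memory filter with API queries and the callback with a human reviewer.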

SUBMITTER: Sumikawa Y 

PROVIDER: S-EPMC8427230 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC10716781 | biostudies-literature
| S-EPMC9897868 | biostudies-literature
| S-EPMC7924541 | biostudies-literature
| S-EPMC10562141 | biostudies-literature
| S-EPMC10481170 | biostudies-literature
| S-EPMC8661480 | biostudies-literature
| S-EPMC8328063 | biostudies-literature
| S-EPMC10634335 | biostudies-literature
| S-EPMC10382471 | biostudies-literature
| S-EPMC10641131 | biostudies-literature