Dataset Information

The False positive problem of automatic bot detection in social science research.

ABSTRACT: The identification of bots is an important and complicated task. The bot classifier "Botometer" was successfully introduced as a way to estimate the number of bots in a given list of accounts and, as a consequence, has been frequently used in academic publications. Given its relevance for academic research and our understanding of the presence of automated accounts in any given Twitter discourse, we are interested in Botometer's diagnostic ability over time. To do so, we collected the Botometer scores for five datasets (three verified as bots, two verified as human; n = 4,134) in two languages (English/German) over three months. We show that the Botometer scores are imprecise when it comes to estimating bots; especially in a different language. We further show in an analysis of Botometer scores over time that Botometer's thresholds, even when used very conservatively, are prone to variance, which, in turn, will lead to false negatives (i.e., bots being classified as humans) and false positives (i.e., humans being classified as bots). This has immediate consequences for academic research as most studies in social science using the tool will unknowingly count a high number of human users as bots and vice versa. We conclude our study with a discussion about how computational social scientists should evaluate machine learning systems that are developed for identifying bots.
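The threshold effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code and it does not call the Botometer API. The `Account` class, the `confusion_counts` function, and every score in `sample` are hypothetical values invented for illustration, and the assumption that a score at or above the threshold means "bot" is ours.

```python
from dataclasses import dataclass


@dataclass
class Account:
    score: float   # hypothetical bot score in [0, 1], not real Botometer output
    is_bot: bool   # verified ground-truth label


def confusion_counts(accounts, threshold):
    """Count outcomes when every score >= threshold is labelled a bot."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for account in accounts:
        predicted_bot = account.score >= threshold
        if predicted_bot and account.is_bot:
            counts["tp"] += 1   # bot correctly flagged
        elif predicted_bot and not account.is_bot:
            counts["fp"] += 1   # human classified as bot
        elif not predicted_bot and account.is_bot:
            counts["fn"] += 1   # bot classified as human
        else:
            counts["tn"] += 1   # human correctly passed
    return counts


# Invented sample: three verified bots and three verified humans.
sample = [
    Account(0.92, True), Account(0.40, True), Account(0.76, True),
    Account(0.81, False), Account(0.10, False), Account(0.55, False),
]

for threshold in (0.5, 0.76, 0.9):
    print(threshold, confusion_counts(sample, threshold))
```

In this toy sample, raising the threshold removes false positives but adds false negatives, which is the trade-off the abstract refers to when it says that even conservative thresholds misclassify accounts.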

SUBMITTER: Rauchfleisch A

PROVIDER: S-EPMC7580919 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

The False positive problem of automatic bot detection in social science research.

Adrian Rauchfleisch, Jonas Kaiser

PLoS ONE, 2020-10-22, issue 10


PMID: 33091067

Similar Datasets

Project description:

Background: India contributes ~60% to the global leprosy burden. The country periodically implements 14-day community-based leprosy case detection campaigns (LCDCs) in all high-endemic states. Paramedical staff screen the population, and medical officers of primary health centres (PHCs) diagnose and treat leprosy cases. Several new cases were detected during the two LCDCs held in September 2016 and February 2018. Following these LCDCs, an independent expert group conducted a validation exercise in 8 PHCs of 4 districts in Bihar State to assess the correctness of case diagnosis. Just before the February 2018 LCDC, we conducted an "appreciative inquiry" (AI) involving the health care staff of these 8 PHCs using the 4-D framework (Discovery-Dream-Design-Destiny).

Objectives: To assess whether incorrect case diagnosis (false positive diagnosis) decreased as a result of AI in the 8 PHCs between the two LCDCs conducted in September 2016 and February 2018.

Methodology/principal findings: A three-phase quantitative-qualitative-quantitative mixed methods study (embedded design), with the two validation exercises conducted after the September 2016 and February 2018 LCDCs as the quantitative phases and AI as the qualitative phase. In the September 2016 LCDC, 303 new leprosy cases were detected, of which 196 cases were validated and 58 (29.6%) were false positive diagnoses. In the February 2018 LCDC, 118 new leprosy cases were detected, of which 96 cases were validated and 22 cases (23.4%) were false positive diagnoses. After adjusting for age, gender, type of case and individual PHC fixed effects, the proportion of false positive diagnoses changed by -9% [95% confidence interval (95% CI): -20.2% to 1.7%, p = 0.068].

Conclusion: False positive diagnosis is a major issue during LCDCs. Though the decline in false positive diagnosis is not statistically significant, the findings are encouraging and indicate that appreciative inquiry can be used to address this deficiency in programme implementation.


