Dataset Information

Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore.


ABSTRACT:

Introduction

Substandard medicines are medicines that fail to meet their quality standards and/or specifications, and they can lead to serious safety issues affecting public health. With the increasing number of pharmaceuticals and the complexity of the pharmaceutical manufacturing supply chain, monitoring for substandard medicines via manual environmental scanning can be laborious and time-consuming.

Methods

A web crawler was developed to automatically detect and extract alerts on substandard medicines published on the Internet by regulatory agencies. The crawled data were labelled as related to substandard medicines or not. An expert-derived keyword-based classification algorithm was compared against machine learning algorithms for identifying substandard medicine alerts on two validation datasets (n = 4920 and n = 2458) drawn from a later time period than the training data. Models were compared on recall, precision and F1 score (the harmonic mean of precision and recall).
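As a rough illustration of the evaluation described above, the following sketch scores a keyword-based classifier against labelled alerts using precision, recall and F1. The keyword list, the example alerts and the function names are hypothetical and are not taken from the study; the expert-derived keywords themselves are not listed in this abstract.

```python
import re

# Hypothetical keywords for illustration only; not the study's expert-derived list.
KEYWORDS = ("recall", "contamination", "out of specification", "impurity")


def keyword_classify(text, keywords=KEYWORDS):
    """Flag an alert if any keyword appears as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 (harmonic mean of the two)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labelled alerts (text, is_substandard_medicine_alert); invented examples.
alerts = [
    ("Recall of tablets due to impurity above limit", True),
    ("Agency publishes annual report", False),
    ("Product recall: mislabelled food supplement", False),
    ("Sterility failure found in injection batch", True),
]

y_true = [label for _, label in alerts]
y_pred = [keyword_classify(text) for text, _ in alerts]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The toy data include one false positive (a non-medicine recall) and one false negative (an alert with no listed keyword), which are the kinds of errors that precision and recall respectively penalise.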

Results

The web crawler routinely extracted alerts from 46 web pages belonging to nine regulatory agencies. From October 2019 to May 2020, 12,156 unique alerts were crawled, of which 7378 (60.7%) were set aside for validation; these contained 1160 substandard medicine alerts (15.7%). An ensemble approach combining machine learning and keywords achieved the best recall (94% and 97%), precision (85% and 80%) and F1 scores (89% and 88%) on temporal validation.
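The reported figures can be checked for internal consistency with a few lines of arithmetic. The sketch below recomputes the percentages and the F1 scores from the rounded values given above. It assumes that the first and second figures in each pair refer to the first and second validation datasets respectively, and because the inputs are already rounded the recomputed F1 values are only approximate.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Counts reported in the abstract.
total_alerts = 12156
validation_sizes = (4920, 2458)
validation_total = sum(validation_sizes)
substandard_in_validation = 1160

# Share of alerts held out for validation, and prevalence of positives in it.
validation_share = round(100 * validation_total / total_alerts, 1)
positive_share = round(100 * substandard_in_validation / validation_total, 1)

# (precision, recall) pairs as reported; pairing by dataset is an assumption.
reported = {
    "validation set 1": (0.85, 0.94),
    "validation set 2": (0.80, 0.97),
}
f1_percent = {
    name: round(100 * f1_score(p, r)) for name, (p, r) in reported.items()
}

print(validation_total, validation_share, positive_share, f1_percent)
```

Under these assumptions the recomputed values agree with the reported 7378 alerts, 60.7%, 15.7% and F1 scores of 89% and 88%.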

Conclusions

Combining robust web crawler programmes with rigorously tested filtering algorithms based on machine learning and keyword models can automate and expand horizon scanning capabilities for issues relating to substandard medicines.

SUBMITTER: Ang PS 

PROVIDER: S-EPMC8214454 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC9896464 | biostudies-literature
| S-EPMC7450367 | biostudies-literature
| S-EPMC10533092 | biostudies-literature
| S-EPMC7482564 | biostudies-literature
| S-EPMC3894151 | biostudies-literature
| S-EPMC7414401 | biostudies-literature
| S-EPMC7093327 | biostudies-literature
| S-EPMC9880377 | biostudies-literature
| S-EPMC9058924 | biostudies-literature
| S-EPMC6693329 | biostudies-literature