ABSTRACT:
SUBMITTER: Tovo A
PROVIDER: S-EPMC8248688 | biostudies-literature | 2021
REPOSITORIES: biostudies-literature
AUTHORS: Anna Tovo, Samuele Stivanello, Amos Maritan, Samir Suweis, Stefano Favaro, Marco Formentin
PUBLISHED: PLoS ONE, 2021-07-01, issue 7
Big data require new techniques to handle the information they come with. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, we infer the number of senders, hashtags and words of the whole dataset and how their abundances (i.e., the popularity of a hashtag) change through scales from a small sample of sent emails per sender, posts per ...