Dataset Information

Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype


ABSTRACT: Many applications in text data analysis use Latent Dirichlet Allocation (LDA), one of the most popular methods in topic modeling. Although the instability of LDA is sometimes mentioned, it is usually not considered systematically. Instead, a single LDA is often selected from a small set of LDAs using heuristics or human coding, and conclusions are then drawn from this somewhat arbitrarily selected model. We present the novel method LDAPrototype, which takes the instability of LDA into account, and show that systematically selecting an LDA improves the reliability of the conclusions drawn from the result and thus provides better reproducibility. The improvement resulting from this selection criterion is demonstrated by applying the proposed methods to an example corpus of texts published in a German quality newspaper over one month.

SUBMITTER: Metais E 

PROVIDER: S-EPMC7298183 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC7652301 | biostudies-literature
S-EPMC10249879 | biostudies-literature
S-EPMC6853711 | biostudies-literature
S-EPMC4240467 | biostudies-literature
S-EPMC7861422 | biostudies-literature
S-EPMC7600398 | biostudies-literature
S-EPMC7862749 | biostudies-literature
S-EPMC2995118 | biostudies-literature
S-EPMC9387650 | biostudies-literature
S-EPMC6026534 | biostudies-literature