
Dataset Information


Soft document clustering using a novel graph covering approach.


ABSTRACT:

Background

In text mining, document clustering refers to assigning unstructured documents to clusters, which usually correspond to topics. Clustering is widely used in science for data retrieval and organisation.

Results

In this paper we present and discuss a novel graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known partition of a graph into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows us to perform soft clustering as well as hard clustering. The software is freely available on GitHub.
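To make the hard-clustering baseline concrete, here is a minimal Python sketch. It is not the authors' implementation. Documents become term-count vectors, and pairs whose cosine similarity reaches an assumed threshold of 0.3 are joined by an edge. A greedy heuristic then partitions the resulting similarity graph into cliques, which is the same as partitioning its complement graph into stable sets. The tokenizer, the threshold and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    # Hypothetical preprocessing: lowercase alphabetic tokens, raw term counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(docs, threshold=0.3):
    # Undirected graph as an adjacency dict; the threshold is an assumption.
    vecs = [term_vector(d) for d in docs]
    adj = {i: set() for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_clique_partition(adj):
    # Hard clustering: every vertex ends up in exactly one clique.
    # Vertices are taken in order of decreasing degree; each unassigned
    # vertex seeds a new clique that absorbs every unassigned vertex
    # adjacent to all current members.
    unassigned = sorted(adj, key=lambda v: -len(adj[v]))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        clique = [seed]
        for v in list(unassigned):
            if all(v in adj[u] for u in clique):
                clique.append(v)
                unassigned.remove(v)
        clusters.append(sorted(clique))
    return clusters


docs = [
    "graph clustering of text documents",
    "clustering text documents with graph methods",
    "protein folding simulation",
    "protein structure simulation",
]
print(greedy_clique_partition(similarity_graph(docs)))  # [[0, 1], [2, 3]]
```

Because each document lands in exactly one clique, this yields a hard clustering; a soft clustering needs a covering in which a document may belong to several groups.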

Conclusions

The presented integer linear programming approach and the greedy approach for this NP-complete problem lead to valuable results on random instances and on some real-world data for different similarity measures. We show that PS-Document Clustering is a remarkable approach to document clustering that opens the complete toolbox of graph theory to this field.
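The following is a rough illustration of how a greedy heuristic can produce overlapping (soft) clusters. It is a minimal Python sketch under stated assumptions. It uses ordinary maximal cliques rather than the paper's pseudocliques, whose exact definition is not given in this abstract. The function name is hypothetical, and the graph is assumed to have no self-loops.

```python
def greedy_soft_clique_cover(adj):
    # Soft clustering sketch (NOT the paper's pseudoclique method):
    # cover every vertex by at least one maximal clique. Unlike a
    # partition, a vertex may appear in several cliques.
    # Assumes an undirected adjacency dict without self-loops.
    order = sorted(adj, key=lambda v: -len(adj[v]))
    covered = set()
    clusters = []
    for seed in order:
        if seed in covered:
            continue
        clique = [seed]
        # Grow to a maximal clique using all vertices, covered or not,
        # which is what allows clusters to overlap.
        for v in order:
            if v != seed and all(v in adj[u] for u in clique):
                clique.append(v)
        covered.update(clique)
        clusters.append(sorted(clique))
    return clusters


# Document 1 is similar to both 0 and 2, but 0 and 2 are not similar.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(greedy_soft_clique_cover(path))  # [[0, 1], [1, 2]]
```

On this small path graph, a clique partition would have to place document 1 in a single cluster. The cover lets it belong to both topics, which is the behaviour soft clustering is meant to capture.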

SUBMITTER: Dörpinghaus J

PROVIDER: S-EPMC6047369 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature


Publications

Soft document clustering using a novel graph covering approach.

Dörpinghaus J, Schaaf S, Jacobs M

BioData Mining, 2018-06-14



Similar Datasets

| S-EPMC8289385 | biostudies-literature
| S-EPMC4706235 | biostudies-literature
| S-EPMC8349163 | biostudies-literature
| S-EPMC3030539 | biostudies-literature
| S-EPMC6465635 | biostudies-literature
| S-EPMC10871265 | biostudies-literature
| S-EPMC5870858 | biostudies-literature
| S-EPMC7458061 | biostudies-literature
| S-EPMC11232587 | biostudies-literature
| S-EPMC5271336 | biostudies-literature