Dataset Information

Topic2features: a novel framework to classify noisy and sparse textual data using LDA topic distributions.


ABSTRACT: In supervised machine learning, and in classification tasks in particular, selecting and analyzing the feature vector is one of the most important steps toward better results. Traditional methods, such as comparing the cosine similarity of features or manually exploring the dataset to check which feature vector is suitable, are relatively time-consuming. Many classification tasks fail to achieve good results because of poor feature vector selection and data sparseness. In this paper, we propose a novel framework, topic2features (T2F), that handles short and sparse data by gathering the topic distributions of hidden topics from the dataset and converting them into feature vectors for building a supervised classifier. To this end, we leverage the unsupervised topic modelling approach LDA (latent Dirichlet allocation) to retrieve the topic distributions used by the supervised learning algorithms. We use labelled data and the topic distributions of hidden topics generated from that data. We explore how the topic-based representation affects classification performance by applying supervised classification algorithms. Additionally, we carefully evaluate the framework on two types of datasets and compare it with baseline approaches without topic distributions and with other comparable methods. The results show that our framework performs significantly better in terms of classification performance than the baseline (without T2F) approaches and also improves the F1 score relative to the other approaches compared.
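
The general idea described in the abstract, using per-document LDA topic distributions as feature vectors for a supervised classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the collapsed Gibbs sampler, the hyperparameters (alpha, beta, iters), the nearest-centroid classifier, the function names and the toy documents are all assumptions made for illustration, and inferring topics for unseen documents is left out.

```python
import random
from collections import defaultdict


def lda_topic_features(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                     # all tokens in topic k
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = vocab[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed document-topic distributions serve as the feature vectors.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def fit_centroids(features, labels):
    """Average the topic feature vectors of each class (stand-in classifier)."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )


# Hypothetical toy corpus of short, sparse, already tokenized texts.
docs = [
    ["goal", "match", "team"], ["team", "score", "goal"], ["match", "score"],
    ["virus", "vaccine", "dose"], ["dose", "patient", "virus"], ["vaccine", "patient"],
]
labels = ["sports", "sports", "sports", "health", "health", "health"]

features = lda_topic_features(docs, num_topics=2)
centroids = fit_centroids(features, labels)
print(predict(centroids, features[0]))
```

Each feature vector has one entry per topic and sums to 1, so its dimensionality is set by the number of topics rather than by the vocabulary size, which is what makes this representation attractive for short and sparse texts. Any supervised classifier could replace the nearest-centroid step.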

SUBMITTER: Wahid JA

PROVIDER: S-EPMC8372003 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8880936 | biostudies-literature
| S-EPMC3983526 | biostudies-literature
| S-EPMC4768174 | biostudies-literature
| S-EPMC9930816 | biostudies-literature
| S-EPMC8053978 | biostudies-literature
| S-EPMC5077690 | biostudies-literature
| S-EPMC7277719 | biostudies-literature
| S-EPMC4718600 | biostudies-literature
| S-EPMC10198462 | biostudies-literature
| S-EPMC3283562 | biostudies-literature