Dataset Information

Click-words: learning to predict document keywords from a user perspective.

ABSTRACT:

Motivation

Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by statistical measures. As an alternative, we propose using the popularity of a document's words in user queries to identify click-words, a set of words that are prominent from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords.

Results

We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part of speech tag, term frequency-inverse document frequency (TF-IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF-IDF weight, click-words tend to be biomedical entities, to exist in article titles, and to occur repeatedly in article abstracts. Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks.
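The per-word representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tokenizer, the unsmoothed TF-IDF formula, and the choice of features (TF-IDF weight, presence in the title, repetition count and relative first position in the abstract) are assumptions made for illustration, and the semantic type and part-of-speech features mentioned in the abstract are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_features(title, abstract, corpus):
    """Return {word: feature dict} for each distinct word in title + abstract.

    `corpus` is a list of other documents' texts, used only to estimate
    document frequency. All names here are hypothetical.
    """
    title_tokens = set(tokenize(title))
    abstract_tokens = tokenize(abstract)
    doc_tokens = tokenize(title) + abstract_tokens

    term_freq = Counter(doc_tokens)
    abstract_counts = Counter(abstract_tokens)

    # The document itself is included, so document frequency is never zero.
    docs = [set(tokenize(d)) for d in corpus] + [set(doc_tokens)]
    n_docs = len(docs)

    features = {}
    for word, count in term_freq.items():
        doc_freq = sum(1 for d in docs if word in d)
        idf = math.log(n_docs / doc_freq)  # assumed: plain, unsmoothed IDF
        if word in abstract_counts:
            first = abstract_tokens.index(word) / len(abstract_tokens)
        else:
            first = None  # the word occurs only in the title
        features[word] = {
            "tfidf": count * idf,
            "in_title": word in title_tokens,
            "abstract_count": abstract_counts[word],
            "first_position": first,
        }
    return features
```

Feature dictionaries of this kind could then be passed to any standard classifier trained on labels derived from click-through logs; the abstract does not specify which learner was used, so none is assumed here.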

SUBMITTER: Islamaj Dogan R

PROVIDER: S-EPMC2958742 | biostudies-literature | 2010 Nov

REPOSITORIES: biostudies-literature

Publications

Click-words: learning to predict document keywords from a user perspective.

Islamaj Doğan R, Lu Z

Bioinformatics (Oxford, England), 2010-09-01, issue 21

PMID: 20810602

Similar Datasets

Document vectorization method using network information of words.
| S-EPMC6638850 | biostudies-literature

Arabic handwritten alphabets, words and paragraphs per user (AHAWP) dataset.
| S-EPMC8866147 | biostudies-literature

Legal document similarity: a multi-criteria decision-making perspective.
| S-EPMC7924540 | biostudies-literature

Quantifying the Beauty of Words: A Neurocognitive Poetics Perspective.
| S-EPMC5742167 | biostudies-literature

PSE: a tool for browsing a large amount of MEDLINE/PubMed abstracts with gene names and common words as the keywords.
| S-EPMC1326231 | biostudies-literature

Touch or click friendly: Towards adaptive user interfaces for complex applications.
| S-EPMC10843409 | biostudies-literature

Mark my words: High frequency marker words impact early stages of language learning.
| S-EPMC6746567 | biostudies-literature

Judgments of Learning for Words in Vertical Space.
| S-EPMC5131559 | biostudies-literature

Analyzing user interactions with biomedical ontologies: A visual perspective.
| S-EPMC5895104 | biostudies-literature

Semantic Factors Predict the Rate of Lexical Replacement of Content Words.
| S-EPMC4731055 | biostudies-literature