
Dataset Information


deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.


ABSTRACT: OBJECTIVE: In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm is needed to prevent downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform unsupervised, semisupervised, and knowledge-based methods; however, they train one separate classifier for each ambiguous term, which requires a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable.

MATERIALS AND METHODS: Built on recent advances in deep learning, our deepBioWSD model leverages a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed from their text definitions; then, after the network is initialized with these embeddings, it is trained on all available training data collectively. The method also introduces a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner.

RESULTS: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD, achieving state-of-the-art performance of 96.82% macro accuracy.

CONCLUSIONS: Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably less expert-labeled data, as it learns the target and the context terms jointly. These merits make deepBioWSD conveniently deployable in real-time biomedical applications.
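The abstract names macro and micro accuracy as its evaluation metrics without defining them. The sketch below shows one common reading, which is an assumption here and not taken from the record: micro accuracy is the fraction of all test instances labeled correctly, while macro accuracy is the unweighted mean of the per-term accuracies. The function name `macro_micro_accuracy`, the record layout, and the toy sense labels (`S1`, `S2`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def macro_micro_accuracy(records):
    """Return (macro, micro) accuracy for WSD predictions.

    records: iterable of (ambiguous_term, gold_sense, predicted_sense).
    Assumed definitions (not confirmed by the source record):
      micro = correct instances / all instances
      macro = mean over terms of that term's accuracy
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for term, gold, predicted in records:
        total[term] += 1
        correct[term] += int(gold == predicted)
    if not total:
        raise ValueError("no records to evaluate")

    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[t] / total[t] for t in total) / len(total)
    return macro, micro


# Toy data with hypothetical sense labels: "cold" has 4 instances
# (3 correct), "BAT" has 1 instance (0 correct).
toy_records = [
    ("cold", "S1", "S1"),
    ("cold", "S1", "S1"),
    ("cold", "S2", "S2"),
    ("cold", "S2", "S1"),
    ("BAT", "S1", "S2"),
]

macro, micro = macro_micro_accuracy(toy_records)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

On the toy data the two metrics diverge because the rarely seen term counts as much as the frequent one under macro averaging, which is why reports on per-term benchmarks often give both figures.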

SUBMITTER: Pesaranghader A 

PROVIDER: S-EPMC7787358 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature


Publications

deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.

Ahmad Pesaranghader, Stan Matwin, Marina Sokolova, Ali Pesaranghader

Journal of the American Medical Informatics Association (JAMIA), 2019-05-01, issue 5



Similar Datasets

| S-EPMC2663782 | biostudies-literature
| S-EPMC1550263 | biostudies-literature
| S-EPMC6301655 | biostudies-literature
| S-EPMC6658868 | biostudies-literature
| S-EPMC3123611 | biostudies-literature
| S-EPMC9710686 | biostudies-literature
| S-EPMC7647812 | biostudies-literature
| S-EPMC8519453 | biostudies-literature
| S-EPMC5783222 | biostudies-literature
| S-EPMC5042555 | biostudies-literature