
Dataset Information


Cross-lingual Unified Medical Language System entity linking in online health communities.


ABSTRACT:

OBJECTIVE: In Hebrew online health communities, participants commonly write medical terms as transliterated forms of an English source term. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce this variability, medical terms must be normalized, for example by linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities.

MATERIALS AND METHODS: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder that transliterates words and maps UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by first producing a high-recall list of candidate medical terms in the corpus and then filtering the candidates down to relevant medical terms.

RESULTS: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases.

CONCLUSIONS: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
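The two-stage linking step in the abstract (a high-recall candidate list, followed by a filter) can be illustrated with a minimal sketch. This is not the MDTEL implementation: the lexicon, the placeholder concept identifiers, the function names, and both thresholds are hypothetical, Latin-script strings stand in for Hebrew surface forms, and a plain string-similarity cutoff is swapped in for whatever relevance filter the authors actually use.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: surface form -> placeholder concept ID.
# In the described method the surface forms would be Hebrew transliterations
# or translations of UMLS terms; these Latin strings and IDs are stand-ins.
LEXICON = {
    "insulin": "CUI-0001",
    "metformin": "CUI-0002",
    "multiple sclerosis": "CUI-0003",
}


def similarity(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def ngrams(tokens, max_n):
    """Yield every contiguous span of 1..max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def generate_candidates(text, lexicon, loose=0.6, max_n=2):
    """Stage 1 (high recall): keep any span loosely similar to a lexicon term."""
    tokens = text.lower().split()
    candidates = []
    for span in ngrams(tokens, max_n):
        for term, cui in lexicon.items():
            score = similarity(span, term)
            if score >= loose:
                candidates.append((span, term, cui, score))
    return candidates


def filter_candidates(candidates, strict=0.9):
    """Stage 2 (precision): keep the best match per span above a strict cutoff."""
    best = {}
    for span, term, cui, score in candidates:
        if score >= strict and score > best.get(span, (None, 0.0))[1]:
            best[span] = (cui, score)
    return {span: cui for span, (cui, _) in best.items()}


post = "started metformine after insulin"
candidates = generate_candidates(post, LEXICON)
linked = filter_candidates(candidates)
print(linked)
```

The loose threshold deliberately admits noisy spans such as "after insulin", and the strict pass discards them. The design point is that the cheap first stage should miss as little as possible, leaving precision to the second stage.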

SUBMITTER: Bitton Y 

PROVIDER: S-EPMC7566404 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

Cross-lingual Unified Medical Language System entity linking in online health communities.

Yonatan Bitton, Raphael Cohen, Tamar Schifter, Eitan Bachmat, Michael Elhadad, Noémie Elhadad

Journal of the American Medical Informatics Association (JAMIA), 2020-10-01, issue 10



Similar Datasets

| S-EPMC7566540 | biostudies-literature
| S-EPMC3428652 | biostudies-literature
| S-EPMC6860381 | biostudies-literature
| S-EPMC8237322 | biostudies-literature
| S-EPMC6729118 | biostudies-literature
| S-EPMC8678140 | biostudies-literature
| S-EPMC8497335 | biostudies-literature
| S-EPMC4959361 | biostudies-literature
| S-EPMC7647370 | biostudies-literature
| S-EPMC165595 | biostudies-literature