Dataset Information

Distant supervision for medical concept normalization.


ABSTRACT: We consider the task of Medical Concept Normalization (MCN), which aims to map informal medical phrases such as "loosing weight" to formal medical concepts such as "Weight loss". Deep learning models have shown high performance across various MCN datasets that contain a small number of target concepts and an adequate number of training examples per concept. However, scaling these models to millions of medical concepts requires much larger datasets, which are costly and effort-intensive to create. Recent work has shown that training MCN models on automatically labeled examples extracted from medical knowledge bases partially alleviates this problem. We extend this idea by computationally creating a distant dataset from patient discussion forums. We extract informal medical phrases from these forums using a synthetically trained classifier, and medical concepts using an off-the-shelf medical entity linker. We then use pretrained sentence encoding models to find the k nearest phrases for each medical concept. These mappings are combined with the examples obtained from medical knowledge bases to train an MCN model. Our approach outperforms the previous state of the art by 15.9% and 17.1% in classification accuracy on two datasets while avoiding manual labeling.
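
The k-nearest-phrase step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a pretrained sentence encoder, and the phrases and concepts from the forum classifier and entity linker.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def k_nearest_phrases(concept_vecs, phrase_vecs, k):
    """Map each concept to its k most similar phrases (hypothetical helper)."""
    result = {}
    for concept, cvec in concept_vecs.items():
        ranked = sorted(
            phrase_vecs,
            key=lambda p: cosine(cvec, phrase_vecs[p]),
            reverse=True,
        )
        result[concept] = ranked[:k]
    return result

# Toy embeddings (assumed values, standing in for sentence-encoder output).
concept_vecs = {
    "Weight loss": [1.0, 0.0, 0.1],
    "Insomnia": [0.0, 1.0, 0.1],
}
phrase_vecs = {
    "loosing weight": [0.9, 0.1, 0.0],
    "dropped a few pounds": [0.8, 0.0, 0.3],
    "can't sleep at night": [0.1, 0.9, 0.0],
    "awake all night": [0.2, 0.7, 0.3],
}

# Each (phrase, concept) pair becomes one automatically labeled training example.
distant_pairs = k_nearest_phrases(concept_vecs, phrase_vecs, k=2)
for concept, phrases in distant_pairs.items():
    print(concept, "<-", phrases)
```

The resulting phrase-to-concept pairs play the role of the distant labels; per the abstract, they are combined with knowledge-base examples before training the MCN model.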

SUBMITTER: Pattisapu N 

PROVIDER: S-EPMC7415240 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

Distant supervision for medical concept normalization.

Nikhil Pattisapu, Vivek Anand, Sangameshwar Patil, Girish Palshikar, Vasudeva Varma

Journal of Biomedical Informatics, 2020-08-09


Similar Datasets

| S-EPMC7148018 | biostudies-literature
| S-EPMC8570806 | biostudies-literature
| S-EPMC5730334 | biostudies-literature
| S-EPMC4150992 | biostudies-literature
| S-EPMC5338769 | biostudies-literature
| S-EPMC7936394 | biostudies-literature
| S-EPMC5686421 | biostudies-literature
| S-EPMC3465209 | biostudies-literature
| S-EPMC6956794 | biostudies-literature
| S-EPMC7706181 | biostudies-literature