Dataset Information

Comparison of UMLS terminologies to identify risk of heart disease using clinical notes.


ABSTRACT: The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely available resources from the community. With an F1 score of 90.7, significantly higher than the median (F1=87.20) and close to the top-performing system (F1=92.8), it was the best rule-based system among all submissions to the challenge. We also used this system to evaluate the utility of different terminologies in the UMLS for the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on performance, identifying negations and the experiencer of the medical event yields negligible gain.

SUBMITTER: Shivade C 

PROVIDER: S-EPMC4973866 | biostudies-other | 2015 Dec

REPOSITORIES: biostudies-other

Publications

Comparison of UMLS terminologies to identify risk of heart disease using clinical notes.

Shivade C, Malewadkar P, Fosler-Lussier E, Lai AM

Journal of Biomedical Informatics, 2015-09-12


Similar Datasets

S-EPMC3846296 | biostudies-literature
S-EPMC7309261 | biostudies-literature
S-EPMC6961332 | biostudies-literature
S-EPMC6404882 | biostudies-other
S-EPMC3904866 | biostudies-literature