Ontology highlight
ABSTRACT: Objective
Coordination ellipsis is a linguistic phenomenon abound in medical text and is challenging for concept normalization because of difficulty in recognizing elliptical expressions referencing 2 or more entities accurately. To resolve this bottleneck, we aim to contribute a generalizable method to reconstruct concepts from medical coordinated elliptical expressions in a variety of biomedical corpora.Materials and methods
We proposed a graph-based representation model and built a pipeline to reconstruct concepts from coordinated elliptical expressions in medical text (RECEEM). There are 4 modules: (1) identify all possible candidate conjunct pairs from original coordinated elliptical expressions, (2) calculate coefficients for candidate conjuncts using the embedding model, (3) select the most appropriate decompositions by global optimization, and (4) rebuild concepts based on a pathfinding algorithm. We evaluated the pipeline's performance on 2658 coordinated elliptical expressions from 3 different medical corpora (ie, biomedical literature, clinical narratives, and eligibility criteria from clinical trials). Precision, recall, and F1 score were calculated.Results
The F1 scores for biomedical publications, clinical narratives, and research eligibility criteria were 0.862, 0.721, and 0.870, respectively. RECEEM outperformed 2 previously released methods. By incorporating RECEEM into 2 existing NLP tools, the F1 scores increased from 0.248 to 0.460 and from 0.287 to 0.630 on concept mapping of 1125 coordination ellipses.Conclusions
RECEEM improves concept normalization for medical coordinated elliptical expressions in a variety of biomedical corpora. It outperformed existing methods and significantly enhanced the performance of 2 notable NLP systems for mapping coordination ellipses in the evaluation. The algorithm is open sourced online (https://github.com/chiyuan1126/RECEEM).
SUBMITTER: Yuan C
PROVIDER: S-EPMC7647336 | biostudies-literature |
REPOSITORIES: biostudies-literature