Dataset Information

Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach.


ABSTRACT:

Background

The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype that uses lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, that work included no evaluation by domain experts. In this paper, we enhance the hybrid approach with a novel lexical feature: the roots of noun chunks within concept names. We also perform a formal evaluation of the enhanced approach.

Method

We first compute all non-lattice subgraphs in the NCI Thesaurus. We model each concept using its role definitions together with the words and roots of noun chunks within its own name and its ancestors' names. We then perform subsumption testing on candidate concept pairs in the non-lattice subgraphs to automatically detect potentially missing IS-A relations. Domain experts evaluated the validity of these relations.
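As a rough illustration of the subsumption test described above, the sketch below models each concept as a set of lexical features plus a set of role definitions, and suggests a candidate IS-A relation when one concept carries every feature of the other and at least one more. The `Concept` class, the `suggests_is_a` function, the toy concepts and the exact containment criterion are assumptions made for illustration, not the authors' implementation. Noun chunk extraction and non-lattice subgraph computation are omitted, and the feature sets are written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    # Lexical features: words and noun-chunk roots taken from the concept's
    # name and its ancestors' names (hand-written, hypothetical values).
    lexical: frozenset = frozenset()
    # Role definitions as (role, target) pairs (hypothetical values).
    roles: frozenset = frozenset()


def suggests_is_a(child: Concept, parent: Concept) -> bool:
    """Return True if `child` has every feature of `parent` plus at least one more."""
    lexical_ok = parent.lexical <= child.lexical
    roles_ok = parent.roles <= child.roles
    # Identical feature sets give no evidence of a more specific concept.
    differs = (child.lexical, child.roles) != (parent.lexical, parent.roles)
    return lexical_ok and roles_ok and differs


# Toy candidate pair; names and roles are invented for illustration only.
general = Concept(
    "Skin Carcinoma",
    frozenset({"skin", "carcinoma"}),
    frozenset({("has_site", "Skin")}),
)
specific = Concept(
    "Skin Basal Cell Carcinoma",
    frozenset({"skin", "basal", "cell", "carcinoma"}),
    frozenset({("has_site", "Skin"), ("has_cell", "Basal Cell")}),
)

if suggests_is_a(specific, general):
    print(f"candidate: {specific.name} IS-A {general.name}")
```

In this sketch a suggestion is only a candidate. As in the workflow above, each one would still need review by a domain expert before being accepted.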

Results

We applied our approach to the 19.08d version of the NCI Thesaurus. It identified a total of 55 potentially missing IS-A relations, all of which were reviewed by domain experts. Of these 55, 29 were confirmed as valid and have been incorporated into newer versions of the NCI Thesaurus, and 7 further revealed incorrect existing IS-A relations in the NCI Thesaurus.
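The counts above can be turned into proportions with a few lines of arithmetic. The variable names and the `share` helper are hypothetical, and the abstract does not state whether the 7 relations overlap with the 29 confirmed ones, so the two figures are computed independently.

```python
reviewed = 55           # potentially missing IS-A relations reviewed by experts
confirmed_valid = 29    # confirmed as valid missing IS-A relations
revealed_incorrect = 7  # revealed incorrect existing IS-A relations


def share(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


print(share(confirmed_valid, reviewed))     # 52.7
print(share(revealed_incorrect, reviewed))  # 12.7
```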

Conclusions

The results show that our hybrid approach, which leverages lexical features and role definitions, is effective in identifying potentially missing IS-A relations in the NCI Thesaurus.

SUBMITTER: Zheng F 

PROVIDER: S-EPMC7737275 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature

Publications

Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach.

Fengbo Zheng, Rashmie Abeysinghe, Nicholas Sioutos, Lori Whiteman, Lyubov Remennik, Licong Cui

BMC Medical Informatics and Decision Making, 2020 Dec 15, Suppl 10


Similar Datasets

| S-EPMC1988837 | biostudies-literature
| S-EPMC6080685 | biostudies-literature
| S-EPMC8892251 | biostudies-literature
| S-EPMC4547405 | biostudies-literature
| S-EPMC9933066 | biostudies-literature
| S-EPMC5737412 | biostudies-literature
| S-EPMC5964258 | biostudies-literature
| S-EPMC4795618 | biostudies-literature
| S-EPMC4602280 | biostudies-literature
| S-EPMC7664296 | biostudies-literature