Dataset Information

Combining lexical and context features for automatic ontology extension.


ABSTRACT: BACKGROUND: Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development more efficient.

RESULTS: We developed a method that uses machine learning and word embeddings to identify words and phrases that are used to refer to an ontology class in biomedical full-text articles from Europe PMC. Once the labels and synonyms of a class are known, we use machine learning to identify the super-classes of that class. For this purpose, we identify lexical term variants, use word embeddings to capture context information, and rely on automated reasoning over ontologies to generate features; an artificial neural network serves as the classifier. We demonstrate the utility of our approach by identifying terms that refer to diseases in the Human Disease Ontology and by distinguishing between different types of diseases.

CONCLUSIONS: Our method can discover labels that refer to a class in an ontology but are not yet present in that ontology, and it can identify whether a class should be a subclass of certain high-level ontology classes. Our approach can therefore be used for the semi-automatic extension and quality control of ontologies. The algorithm, corpora and evaluation datasets are available at https://github.com/bio-ontology-research-group/ontology-extension.
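The abstract describes combining lexical features of a candidate phrase with context features derived from word embeddings and passing both to a neural network classifier. The sketch below illustrates that idea under stated assumptions only. The suffix and head-word cues, the two-dimensional toy embeddings, the context window of two tokens, and all helper names (`lexical_features`, `context_features`, `features`, `train`, `predict`) are hypothetical. A single logistic neuron stands in for the paper's artificial neural network, and the ontology-reasoning features are omitted. This is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical lexical cues for illustration; not the paper's actual feature set.
DISEASE_SUFFIXES = ("itis", "oma", "osis", "pathy")
DISEASE_HEAD_WORDS = ("disease", "syndrome", "disorder")


def lexical_features(phrase: str) -> List[float]:
    """Surface-form features of a candidate phrase."""
    tokens = phrase.lower().split()
    last = tokens[-1] if tokens else ""
    return [
        1.0 if last.endswith(DISEASE_SUFFIXES) else 0.0,  # medical suffix on the head word
        1.0 if last in DISEASE_HEAD_WORDS else 0.0,        # generic disease head noun
        float(len(tokens)),                                # phrase length in tokens
    ]


def context_features(tokens: Sequence[str], start: int, end: int,
                     embeddings: Dict[str, List[float]], dim: int,
                     window: int = 2) -> List[float]:
    """Mean embedding of the known tokens within `window` positions of tokens[start:end]."""
    context = list(tokens[max(0, start - window):start]) + list(tokens[end:end + window])
    vectors = [embeddings[t] for t in context if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def features(tokens, start, end, embeddings, dim):
    """Concatenate lexical and context features for the span tokens[start:end]."""
    phrase = " ".join(tokens[start:end])
    return lexical_features(phrase) + context_features(tokens, start, end, embeddings, dim)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.5, epochs=200):
    """Fit a single logistic neuron with per-example gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p  # gradient of the log-likelihood w.r.t. the pre-activation
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def predict(w, b, x) -> int:
    """Return 1 if the span is classified as a disease term, else 0."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Toy 2-d "word embeddings" (hypothetical): dim 0 ~ clinical context, dim 1 ~ molecular context.
EMB = {
    "patients": [1.0, 0.0], "diagnosed": [1.0, 0.0],
    "gene": [0.0, 1.0], "expression": [0.0, 1.0], "protein": [0.0, 1.0],
}
# (sentence tokens, span start, span end, label: 1 = disease term)
TRAIN = [
    ("patients diagnosed with hepatitis".split(), 3, 4, 1),
    ("gene expression in liver".split(), 3, 4, 0),
    ("patients with huntington disease".split(), 2, 4, 1),
    ("protein binding site".split(), 1, 3, 0),
]
X = [features(t, s, e, EMB, 2) for t, s, e, _ in TRAIN]
Y = [label for *_, label in TRAIN]
W, B = train(X, Y)

new = "patients with cardiomyopathy".split()
print(predict(W, B, features(new, 2, 3, EMB, 2)))  # → 1
```

Averaging the context embeddings keeps the context feature length fixed regardless of how many neighbouring words are found, so lexical and context features can simply be concatenated. In a realistic setting, embeddings trained on a large corpus such as the Europe PMC articles would replace the toy vectors, and a multi-layer network could replace the single neuron.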

SUBMITTER: Althubaiti S 

PROVIDER: S-EPMC6958746 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Combining lexical and context features for automatic ontology extension.

Sara Althubaiti, Şenay Kafkas, Marwa Abdelhakim, Robert Hoehndorf

Journal of Biomedical Semantics, 13 January 2020, issue 1


Similar Datasets

| S-EPMC3098080 | biostudies-literature
| S-EPMC7725650 | biostudies-literature
| S-EPMC3217896 | biostudies-other
| S-EPMC6176043 | biostudies-other
| S-EPMC2224899 | biostudies-literature
| S-EPMC8620237 | biostudies-literature
| S-EPMC9116247 | biostudies-literature
| S-EPMC517493 | biostudies-literature
| S-EPMC6764083 | biostudies-literature
| S-EPMC6134475 | biostudies-literature