ABSTRACT:
Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient because of information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse.
Findings: This work proposes a Named Entity Linking framework called Relation Extraction for Entity Linking (REEL), which uses automatically extracted relations to overcome this limitation. The method builds a disambiguation graph in which the nodes are the ontology candidates for the entities and the edges are added according to the relations expressed in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset of the BC5CDR corpus with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). REEL achieved F1-scores of 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming the baseline approaches.
Conclusions: We demonstrated that Relation Extraction (RE) tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to enrich the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as an ontology or other knowledge base is available.
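The disambiguation step described in the Findings can be illustrated with a minimal sketch. This is not the authors' implementation: the identifiers (`X:lead_atom` and so on), the helper names, the damping value and the choice to multiply the PPR score by the information content are all assumptions made for illustration. The abstract states only that PPR and information content are both applied.

```python
from collections import defaultdict


def personalized_pagerank(edges, restart_weights, damping=0.85, iterations=100):
    """Power-iteration Personalized PageRank on an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(neighbors) | set(restart_weights)
    total = sum(restart_weights.values())
    restart = {n: restart_weights.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Rank held by isolated nodes is sent back to the restart distribution.
        dangling = sum(rank[n] for n in nodes if not neighbors[n])
        new_rank = {}
        for n in nodes:
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            new_rank[n] = ((1 - damping) * restart[n]
                           + damping * (incoming + dangling * restart[n]))
        rank = new_rank
    return rank


def disambiguate(candidates, relations, information_content):
    """Pick one ontology candidate per mention.

    candidates: mention -> list of candidate ontology identifiers
    relations: pairs of candidate identifiers related in the text (graph edges)
    information_content: candidate identifier -> information content value
    """
    all_candidates = {c for cs in candidates.values() for c in cs}
    rank = personalized_pagerank(relations, {c: 1.0 for c in all_candidates})
    # Assumed scoring rule: PPR score weighted by information content.
    return {
        mention: max(cs, key=lambda c: rank[c] * information_content[c])
        for mention, cs in candidates.items()
    }


# Toy data with made-up identifiers (hypothetical, not real ontology IDs).
candidates = {
    "lead": ["X:lead_atom", "X:lead_role"],
    "poisoning": ["X:poisoning"],
}
relations = [("X:lead_atom", "X:poisoning")]  # one relation extracted from text
information_content = {"X:lead_atom": 1.0, "X:lead_role": 1.0, "X:poisoning": 1.0}

linked = disambiguate(candidates, relations, information_content)
print(linked)
```

In this toy graph the extracted relation is the only edge, so the candidate it connects accumulates more rank than the isolated one. This mirrors the idea that text-derived relations help when the graph would otherwise be too sparse to discriminate between candidates.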
SUBMITTER: Ruas P
PROVIDER: S-EPMC7507273 | biostudies-literature | 2020 Sep
REPOSITORIES: biostudies-literature
Pedro Ruas, Andre Lamurias, Francisco M Couto
Journal of Cheminformatics, 21 Sep 2020, issue 1