Dataset Information

Sense identification data: A dataset for lexical semantics.

ABSTRACT: Sense identification is a newly proposed task. When considering a pair of terms to assess their conceptual similarity, human raters are postulated to first select a sense pair; the senses in this pair are the ones actually subject to the similarity rating. The sense identification task consists in finding the senses selected during the similarity rating. The task is important for investigating the strategies and sense inventories underlying human lexical access, and it is a relevant complement to the semantic similarity task. Identifying which senses are involved in the similarity rating is also crucial for fully assessing those ratings: if we have no idea which two senses were retrieved, on what basis can we assess the score expressing their semantic proximity? The Sense Identification Dataset (SID) has been built to provide a common experimental ground for systems and approaches dealing with the sense identification task. It is the first dataset specifically designed for experimenting on this task. The SID dataset was created by manually annotating with sense identifiers the term pairs from an existing dataset, the SemEval-2017 Task 2 English dataset. That dataset was conceived for experimenting on the semantic similarity task, and it contains a score expressing the human similarity rating for each term pair. For each such term pair we added a pair of annotated senses; in particular, senses were annotated so that they are compatible with (explicative of) the existing similarity ratings. The SID dataset contains BabelNet sense identifiers. This sense inventory is a broadly adopted 'naming convention' for word senses, and such identifiers can be easily mapped onto further resources such as WordNet and WikiData, thereby enabling further processing tasks and usages in the Natural Language Processing pipeline.
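As an illustration of the record structure the abstract describes (a term pair, its human similarity score, and one annotated sense per term), here is a minimal Python sketch. The tab-separated layout, the field order, the names `SidRecord` and `parse_record`, and the identifier pattern (`bn:` followed by eight digits and a part-of-speech letter) are assumptions made for this example; they are not taken from the dataset documentation, and the identifiers shown are placeholders rather than real annotations.

```python
import re
from dataclasses import dataclass

# Assumed shape of a BabelNet synset identifier: "bn:" + 8 digits + POS letter.
BABELNET_ID = re.compile(r"bn:\d{8}[nvar]")


@dataclass(frozen=True)
class SidRecord:
    """Hypothetical in-memory form of one annotated term pair."""
    term1: str
    term2: str
    similarity: float  # human similarity rating inherited from the source dataset
    sense1: str        # sense identifier annotated for term1
    sense2: str        # sense identifier annotated for term2


def parse_record(line: str, sep: str = "\t") -> SidRecord:
    """Parse one line of an assumed 5-column layout into a SidRecord."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    term1, term2, score, sense1, sense2 = fields
    for sense in (sense1, sense2):
        if not BABELNET_ID.fullmatch(sense):
            raise ValueError(f"not a BabelNet-style identifier: {sense!r}")
    return SidRecord(term1, term2, float(score), sense1, sense2)


if __name__ == "__main__":
    # Placeholder identifiers, not real SID annotations.
    print(parse_record("car\tautomobile\t3.9\tbn:00000001n\tbn:00000002n"))
```

Keeping the similarity score and the sense pair in the same record reflects the abstract's point that a rating can only be assessed once the two rated senses are known.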

SUBMITTER: Colla D

PROVIDER: S-EPMC7494475 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

Sense identification data: A dataset for lexical semantics.

Colla D, Mensa E, Radicioni DP

Data in Brief, 2020-09-03


PMID: 32984463

Similar Datasets

Project description:
Background: The increasing use of common data elements (CDEs) in numerous research projects and clinical applications has made it imperative to create an effective classification scheme for the efficient management of these data elements. We applied high-level integrative modeling of entire clinical documents from real-world practice to create the Clinical MetaData Ontology (CMDO) for the appropriate classification and integration of CDEs that are in practical use in current clinical documents.
Methods: CMDO was developed using the General Formal Ontology method with a manual iterative process comprising five steps: (1) defining the scope of CMDO by conceptualizing its first-level terms based on an analysis of clinical-practice procedures, (2) identifying CMDO concepts for representing clinical data of general CDEs by examining how and what clinical data are generated with flows of clinical care practices, (3) assigning hierarchical relationships for CMDO concepts, (4) developing CMDO properties (e.g., synonyms, preferred terms, and definitions) for each CMDO concept, and (5) evaluating the utility of CMDO.
Results: We created CMDO comprising 189 concepts under the 4 first-level classes of Description, Event, Finding, and Procedure. CMDO has 256 definitions that cover the 189 CMDO concepts, with 459 synonyms for 139 (74.0%) of the concepts. All of the CDEs extracted from 6 HL7 templates, 25 clinical documents of 5 teaching hospitals, and 1 personal health record specification were successfully annotated by 41 (21.9%), 89 (47.6%), and 13 (7.0%) of the CMDO concepts, respectively. We created a CMDO Browser to facilitate navigation of the CMDO concept hierarchy and a CMDO-enabled CDE Browser for displaying the relationships between CMDO concepts and the CDEs extracted from the clinical documents that are used in current practice.
Conclusions: CMDO is an ontology and classification scheme for CDEs used in clinical documents. Given the increasing use of CDEs in many studies and real-world clinical documentation, CMDO will be a useful tool for integrating numerous CDEs from different research projects and clinical documents. The CMDO Browser and CMDO-enabled CDE Browser make it easy to search, share, and reuse CDEs, and also effectively integrate and manage CDEs from different studies and clinical documents.

