Dataset Information

Can Unified Medical Language System-based semantic representation improve automated identification of patient safety incident reports by type and severity?

ABSTRACT:

Objective

The study sought to evaluate the feasibility of using Unified Medical Language System (UMLS) semantic features for automated identification of reports about patient safety incidents by type and severity.

Materials and methods

Binary support vector machine (SVM) classifier ensembles were trained and validated using balanced datasets of critical incident report texts (n_type = 2860, n_severity = 1160) collected from a state-wide reporting system. Generalizability was evaluated on a different, independent hospital-level reporting system. Concepts were extracted from report narratives using the UMLS Metathesaurus, and their relevance and frequency were used as semantic features. Performance was evaluated by F-score, Hamming loss, and exact match score and was compared with SVM ensembles using bag-of-words (BOW) features on 3 testing datasets (type/severity: n_benchmark = 286/116, n_original = 444/4837, n_independent = 6000/5950).
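The three evaluation measures named above can be illustrated with a minimal Python sketch. It assumes each report is represented as a row of 0/1 labels, one per binary classifier in the ensemble. The toy matrices `Y_TRUE` and `Y_PRED` and all function names are hypothetical, and the study may have aggregated its F-scores differently (for example, averaged across incident types).

```python
from typing import List

Labels = List[List[int]]  # one row per report, one 0/1 column per incident type


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of reports whose whole label row is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


def f_score(y_true: Labels, y_pred: Labels, label: int) -> float:
    """F1 score (harmonic mean of precision and recall) for one label column."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[label], pred_row[label]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration): 4 reports, 3 incident types.
Y_TRUE = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
Y_PRED = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 0]]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Exact match:", exact_match(Y_TRUE, Y_PRED))
    print("F-score per type:", [f_score(Y_TRUE, Y_PRED, k) for k in range(3)])
```

Hamming loss counts every wrong label decision, whereas exact match credits a report only when all of its labels are right, so the two measures can diverge noticeably on the same predictions.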

Results

SVMs using semantic features met or outperformed those based on BOW features to identify 10 different incident types (F-score [semantics/BOW]: benchmark = 82.6%/69.4%; original = 77.9%/68.8%; independent = 78.0%/67.4%) and extreme-risk events (F-score [semantics/BOW]: benchmark = 87.3%/87.3%; original = 25.5%/19.8%; independent = 49.6%/52.7%). For incident type, the exact match score for semantic classifiers was consistently higher than BOW across all test datasets (exact match [semantics/BOW]: benchmark = 48.9%/39.9%; original = 57.9%/44.4%; independent = 59.5%/34.9%).

Discussion

BOW representations are not ideal for the automated identification of incident reports because they do not account for text semantics. UMLS semantic representations are likely to better capture the information in report narratives, which may explain their superior performance.
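The contrast between the two representations can be sketched with a toy example. The lookup table `TOY_CONCEPTS` and its concept identifiers are made-up placeholders, not entries from the UMLS Metathesaurus, and the sketch counts concept frequency only; the relevance weighting used in the study is not modelled.

```python
import re
from collections import Counter

# Hypothetical phrase-to-concept lookup. The identifiers are placeholders,
# NOT real UMLS concept identifiers.
TOY_CONCEPTS = {
    "fell": "CONCEPT_FALL",
    "fall": "CONCEPT_FALL",
    "slipped": "CONCEPT_FALL",
    "wrong dose": "CONCEPT_MEDICATION_ERROR",
    "incorrect dose": "CONCEPT_MEDICATION_ERROR",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bow_features(text: str) -> Counter:
    """Bag-of-words: count every surface token."""
    return Counter(tokenize(text))


def concept_features(text: str) -> Counter:
    """Count concepts, preferring a two-word phrase match over a one-word match."""
    tokens = tokenize(text)
    counts = Counter()
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in TOY_CONCEPTS:
            counts[TOY_CONCEPTS[bigram]] += 1
            i += 2
        elif tokens[i] in TOY_CONCEPTS:
            counts[TOY_CONCEPTS[tokens[i]]] += 1
            i += 1
        else:
            i += 1  # token carries no known concept in the toy lookup
    return counts


def shared_keys(a: Counter, b: Counter) -> set:
    """Feature names present in both representations."""
    return set(a) & set(b)


# Two made-up reports that describe the same kind of event in different words.
REPORT_A = "Patient fell beside the bed"
REPORT_B = "Resident slipped near bathroom"

if __name__ == "__main__":
    print(shared_keys(bow_features(REPORT_A), bow_features(REPORT_B)))
    print(shared_keys(concept_features(REPORT_A), concept_features(REPORT_B)))
```

In this toy case the two reports share no bag-of-words features, yet map to the same concept, which is the kind of overlap a classifier trained on concept features can exploit.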

Conclusions

UMLS-based semantic classifiers were effective in identifying incidents by type and extreme-risk events, providing better generalizability than classifiers using BOW.

SUBMITTER: Wang Y

PROVIDER: S-EPMC7566533 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

Can Unified Medical Language System-based semantic representation improve automated identification of patient safety incident reports by type and severity?

Wang Ying, Coiera Enrico, Magrabi Farah

Journal of the American Medical Informatics Association (JAMIA), 2020-10-01, issue 10


PMID: 32574362

Similar Datasets

Project description: Background: For cancer domains such as acute myeloid leukemia (AML), a large set of data elements is obtained from different institutions with heterogeneous data definitions within one patient course. The lack of clinical data harmonization impedes cross-institutional electronic data exchange and future meta-analyses. Objective: This study aimed to identify and harmonize a semantic core of common data elements (CDEs) in clinical routine and research documentation, based on a systematic metadata analysis of existing documentation models. Methods: Lists of relevant data items were collected and reviewed by hematologists from two university hospitals regarding routine documentation and several case report forms of clinical trials for AML. In addition, existing registries and international recommendations were included. Data items were coded to medical concepts via the Unified Medical Language System (UMLS) by a physician and reviewed by another physician. On the basis of the coded concepts, the data sources were analyzed for concept overlaps and identification of the most frequent concepts. The most frequent concepts were then implemented as data elements in the standardized format of the Operational Data Model by the Clinical Data Interchange Standards Consortium. Results: A total of 3265 medical concepts were identified, of which 1414 were unique. Among the 1414 unique medical concepts, the 50 most frequent ones cover 26.98% of all concept occurrences within the collected AML documentation. The top 100 concepts represent 39.48% of all concept occurrences. An implementation of the CDEs is available on a European research infrastructure and can be downloaded in different formats for reuse in different electronic data capture systems. Conclusions: Information management is a complex process for research-intense disease entities such as AML, which is associated with a large set of lab-based diagnostics and different treatment options. Our systematic UMLS-based analysis revealed the existence of a core data set, and an exemplary reusable implementation for harmonized data capture is available on an established metadata repository.
