Dataset Information

UMLS concept indexing for production databases: a feasibility study.

ABSTRACT: OBJECTIVES:To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts) and false negatives (concepts missed by the identification process). METHODS:Using the 1999 UMLS Metathesaurus, the authors processed a training set of 100 documents (50 discharge summaries, 50 surgical notes) with a concept-identification program, whose output was manually analyzed. They flagged concepts that were erroneously identified and added new concepts that were not identified by the program, recording the reason for failure in such cases. After several refinements to both their algorithm and the UMLS subset on which it operated, they deployed the program on a test set of 24 documents (12 of each kind). RESULTS:Of 8,745 matches in the training set, 7,227 (82.6 percent ) were true positives, whereas of 1,701 matches in the test set, 1, 298 (76.3 percent) were true positives. Matches other than true positive indicated potential problems in production-mode concept indexing. Examples of causes of problems were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors. CONCLUSIONS:The error rate was too high for concept indexing to be the only production-mode means of preprocessing medical narrative. Considerable curation needs to be performed to define a UMLS subset that is suitable for concept matching.

SUBMITTER: Nadkarni P

PROVIDER: S-EPMC134593 | biostudies-literature | 2001 Jan-Feb

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

UMLS concept indexing for production databases: a feasibility study.

Nadkarni P P Chen R R Brandt C C

Journal of the American Medical Informatics Association : JAMIA 20010101 1

<h4>Objectives</h4>To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts) and false negatives (concepts missed by the identification process).<h4>Methods</h4>Using the 1999 UMLS Metathesaurus, the au ...[more]

PMID: 11141514

Similar Datasets

Project description:OBJECTIVES:To test the hypothesis that most instances of negated concepts in dictated medical documents can be detected by a strategy that relies on tools developed for the parsing of formal (computer) languages-specifically, a lexical scanner ("lexer") that uses regular expressions to generate a finite state machine, and a parser that relies on a restricted subset of context-free grammars, known as LALR(1) grammars. METHODS:A diverse training set of 40 medical documents from a variety of specialties was manually inspected and used to develop a program (Negfinder) that contained rules to recognize a large set of negated patterns occurring in the text. Negfinder's lexer and parser were developed using tools normally used to generate programming language compilers. The input to Negfinder consisted of medical narrative that was preprocessed to recognize UMLS concepts: the text of a recognized concept had been replaced with a coded representation that included its UMLS concept ID. The program generated an index with one entry per instance of a concept in the document, where the presence or absence of negation of that concept was recorded. This information was used to mark up the text of each document by color-coding it to make it easier to inspect. The parser was then evaluated in two ways: 1) a test set of 60 documents (30 discharge summaries, 30 surgical notes) marked-up by Negfinder was inspected visually to quantify false-positive and false-negative results; and 2) a different test set of 10 documents was independently examined for negatives by a human observer and by Negfinder, and the results were compared. RESULTS:In the first evaluation using marked-up documents, 8,358 instances of UMLS concepts were detected in the 60 documents, of which 544 were negations detected by the program and verified by human observation (true-positive results, or TPs). Thirteen instances were wrongly flagged as negated (false-positive results, or FPs), and the program missed 27 instances of negation (false-negative results, or FNs), yielding a sensitivity of 95.3 percent and a specificity of 97.7 percent. In the second evaluation using independent negation detection, 1,869 concepts were detected in 10 documents, with 135 TPs, 12 FPs, and 6 FNs, yielding a sensitivity of 95.7 percent and a specificity of 91.8 percent. One of the words "no," "denies/denied," "not," or "without" was present in 92.5 percent of all negations. CONCLUSIONS:Negation of most concepts in medical narrative can be reliably detected by a simple strategy. The reliability of detection depends on several factors, the most important being the accuracy of concept matching.

Project description:MotivationThe BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production usage of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA that are designed to find matches when query and database subsequences are highly similar.ResultsWe developed a new version of the MegaBLAST module of BLAST that does the initial phase of finding short seeds for matches by searching a database index. We also developed a program makembindex that preprocesses the database into a data structure for rapid seed searching. We show that the new 'indexed MegaBLAST' is faster than the 'non-indexed' version for most practical uses. We show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI'sWeb BLAST service, the storage of databases and the queueing mechanism were modified, so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases.AvailabilityThe code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007. The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast [corrected]Supplementary informationSupplementary data are available at Bioinformatics online.

Project description:The two event-related potentials (ERP) studies investigated how verbs and nouns were processed in different music priming conditions in order to reveal whether the motion concept via embodiment can be stimulated and evoked across categories. Study 1 (Tasks 1 and 2) tested the processing of verbs (action verbs vs. state verbs) primed by two music types, with tempo changes (accelerating music vs. decelerating music) and without tempo changes (fast music vs. slow music) while Study 2 (Tasks 3 and 4) tested the processing of nouns (animate nouns vs. inanimate nouns) in the same priming condition as adopted in Study 1. During the experiments, participants were required to hear a piece of music prior to judging whether an ensuing word (verb or noun) is semantically congruent with the motion concept conveyed by the music. The results show that in the priming condition of music with tempo changes, state verbs and inanimate nouns elicited larger N400 amplitudes than action verbs and animate nouns, respectively in the anterior regions and anterior to central regions, whereas in the priming condition of music without tempo changes, action verbs elicited larger N400 amplitudes than state verbs and the two categories of nouns revealed no N400 difference, unexpectedly. The interactions between music and words were significant only in Tasks 1, 2, and 3. Taken together, the results demonstrate that firstly, music with tempo changes and music without tempo prime verbs and nouns in different fashions; secondly, action verbs and animate nouns are easier to process than state verbs and inanimate nouns when primed by music with tempo changes due to the shared motion concept across categories; thirdly, bodily experience differentiates between music and words in coding (encoding and decoding) fashion but the motion concept conveyed by the two categories can be subtly extracted on the metaphorical basis, as indicated in the N400 component. Our studies reveal that music tempos can prime different word classes, favoring the notion that embodied motion concept exists across domains and adding evidence to the hypothesis that music and language share the neural mechanism of meaning processing.

Project description:BackgroundEfficient management of major incidents involves triage, treatment and transport. In the absence of a standardised interdisciplinary major incident management approach, the Norwegian Air Ambulance Foundation developed Interdisciplinary Emergency Service Cooperation Course (TAS). The TAS-program was established in 1998 and by 2009, approximately 15 500 emergency service professionals have participated in one of more than 500 no-cost courses. The TAS-triage concept is based on the established triage Sieve and Paediatric Triage Tape models but modified with slap-wrap reflective triage tags and paediatric triage stretchers. We evaluated the feasibility and accuracy of the TAS-triage concept in full-scale simulated major incidents.MethodsThe learners participated in two standardised bus crash simulations: without and with competence of TAS-triage and access to TAS-triage equipment. The instructors calculated triage accuracy and measured time consumption while the learners participated in a self-reported before-after study. Each question was scored on a 7-point Likert scale with points labelled "Did not work" (1) through "Worked excellent" (7).ResultsAmong the 93 (85%) participating emergency service professionals, 48% confirmed the existence of a major incident triage system in their service, whereas 27% had access to triage tags. The simulations without TAS-triage resulted in a mean over- and undertriage of 12%. When TAS-Triage was used, no mistriage was found. The average time from "scene secured to all patients triaged" was 22 minutes (range 15-32) without TAS-triage vs. 10 minutes (range 5-21) with TAS-triage. The participants replied to "How did interdisciplinary cooperation of triage work?" with mean 4,9 (95% CI 4,7-5,2) before the course vs. mean 5,8 (95% CI 5,6-6,0) after the course, p < 0,001.ConclusionsOur modified triage Sieve tool is feasible, time-efficient and accurate in allocating priority during simulated bus accidents and may serve as a candidate for a future national standard for major incident triage.

Dataset Information

UMLS concept indexing for production databases: a feasibility study.

Publications

UMLS concept indexing for production databases: a feasibility study.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets