Dataset Information

An efficient prototype method to identify and correct misspellings in clinical text.

ABSTRACT:

Objective

Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that combines Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype to process two corpora extracted from Veterans Health Administration resources: surgical pathology reports, and emergency department progress and visit notes. We evaluated performance by measuring positive predictive value and by performing an error analysis of false positive output using four classifications. We also analyzed the spelling errors in each corpus using common error classifications.
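
As a rough illustration of how the four ingredients named above could fit together, the following Python sketch filters candidate corrections for a term. It is not the authors' implementation: the abstract does not say how the components are combined, so the function names, the `max_distance=2` threshold, the rule that a correction must be more frequent than the misspelling, and the toy data are all assumptions made for this example. The `neighbors` list stands in for the nearest terms a Word2Vec model would return; no actual model is trained here.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest_correction(term, neighbors, lexicon, freqs, max_distance=2):
    """Return a suggested correction for `term`, or None.

    Assumed rules (hypothetical, for illustration only):
      * a term already in the lexicon is treated as correctly spelled;
      * a candidate must be in the lexicon, within `max_distance` edits,
        and more frequent in the corpus than `term`;
      * ties are broken by smaller edit distance, then higher frequency.
    """
    if term in lexicon:
        return None
    candidates = [
        n for n in neighbors
        if n in lexicon
        and levenshtein(term, n) <= max_distance
        and freqs[n] > freqs[term]
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (levenshtein(term, n), -freqs[n]))


# Toy data (invented for this sketch, not taken from the study).
lexicon = {"carcinoma", "sarcoma", "biopsy"}
freqs = Counter({"carcinoma": 950, "sarcoma": 120, "carcinma": 3})
# Stand-in for Word2Vec nearest neighbours of "carcinma".
neighbors = ["carcinoma", "sarcoma", "adenocarcinma"]

suggestion = suggest_correction("carcinma", neighbors, lexicon, freqs)
print(suggestion)
```

In this sketch the embedding neighbours supply context-similar candidates, while the edit distance, lexicon, and frequency checks prune those that are unlikely to be the intended spelling.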

Results

In this small-scale study of 76,786 clinical notes in total, the prototype method achieved positive predictive values of 0.9057 for the surgical pathology reports and 0.8979 for the emergency department progress and visit notes in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar between the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results suggest that this method could also perform sufficiently well in identifying misspellings in other clinical document types.
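
For readers unfamiliar with the metric, positive predictive value is the fraction of the method's outputs that are correct: true positives divided by the sum of true positives and false positives. The short Python sketch below shows the calculation. The counts in it are invented for illustration and are not the study's counts, which this record does not report.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP); undefined when the method produced no output."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("PPV is undefined when there are no positive outputs")
    return true_positives / total


# Hypothetical counts: 90 correct misspelling/correction pairs, 10 incorrect ones.
ppv = positive_predictive_value(90, 10)
print(round(ppv, 4))
```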

SUBMITTER: Workman TE

PROVIDER: S-EPMC6339425 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature

Publications

An efficient prototype method to identify and correct misspellings in clinical text.

Workman TE, Shao Y, Divita G, Zeng-Treitler Q

BMC Research Notes. 2019-01-18; (1)

PMID: 30658682

Similar Datasets

A nonparametric updating method to correct clinical prediction model drift.
| S-EPMC6857513 | biostudies-literature

An efficient method to identify differentially expressed genes in microarray experiments.
| S-EPMC3607310 | biostudies-literature

HiCuT: an efficient and low input method to identify protein-centric chromatin interactions
2022-03-16 | GSE186011 | GEO

An efficient and effective method to identify significantly perturbed subnetworks in cancer.
| S-EPMC10284573 | biostudies-literature

An efficient proteomics method to identify the cellular targets of protein kinase inhibitors.
| S-EPMC307585 | biostudies-literature

HiCuT: An efficient and low input method to identify protein-directed chromatin interactions.
| S-EPMC8979432 | biostudies-literature

Sentimental text mining based on an additional features method for text classification.
| S-EPMC6550425 | biostudies-literature

Efficient Approach to Correct Read Alignment for Pseudogene Abundance Estimates.
| S-EPMC5514313 | biostudies-literature

Using text analysis to identify functionally coherent gene groups.
| S-EPMC187532 | biostudies-literature