
Dataset Information


Development of an Open-Source Annotated Glaucoma Medication Dataset From Clinical Notes in the Electronic Health Record.


ABSTRACT:

Purpose

To describe the processing methods and characteristics of an open dataset of clinical notes from the electronic health record (EHR) annotated for glaucoma medications.

Methods

In this study, 480 clinical notes from office visits, along with medical record numbers (MRNs), visit identification numbers, provider names, and billing codes, were extracted for 480 patients seen for glaucoma by a comprehensive or glaucoma ophthalmologist from January 1, 2019, to August 31, 2020. MRNs and all visit data were de-identified using a hash function with salt from the deidentifyr package. All progress notes were annotated for glaucoma medication name, route, frequency, dosage, and drug use using an open-source annotation tool, Doccano. Annotations were saved separately. All protected health information (PHI) in progress notes and annotated files was de-identified using the published de-identification algorithm Philter. All progress notes and annotations were manually validated by two ophthalmologists to ensure complete de-identification.
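The salted hashing of MRNs described above can be illustrated with a minimal Python sketch. This is not the deidentifyr implementation (deidentifyr is an R package); the choice of SHA-256, the function names `make_salt` and `pseudonymize`, and the sample MRNs are all assumptions made for illustration only.

```python
import hashlib
import secrets


def make_salt(n_bytes: int = 16) -> str:
    """Generate a random salt as a hex string (hypothetical helper)."""
    return secrets.token_hex(n_bytes)


def pseudonymize(identifier: str, salt: str) -> str:
    """Return a salted SHA-256 digest of an identifier.

    SHA-256 is an assumption; the source does not name the hash function.
    """
    digest = hashlib.sha256((salt + identifier).encode("utf-8"))
    return digest.hexdigest()


# One salt is reused for the whole extract so that the same MRN always
# maps to the same pseudonym, which keeps records linkable across tables.
salt = make_salt()

# Made-up MRNs, not taken from the dataset.
mrns = ["00012345", "00067890", "00012345"]
hashed = [pseudonymize(mrn, salt) for mrn in mrns]
```

Keeping the salt secret and discarding it after processing is what would prevent someone from re-deriving the pseudonyms by hashing a list of candidate MRNs.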

Results

The final dataset contained 5520 annotated sentences, including those with and without medications, from 480 clinical notes. Manual validation revealed 10 remaining instances of PHI, which were manually corrected.

Conclusions

Annotated free-text clinical notes can be de-identified for upload as an open dataset. As data availability increases with the adoption of EHRs, free-text open datasets will become increasingly valuable for "big data" research and artificial intelligence development. This dataset is published online and publicly available at https://github.com/jche253/Glaucoma_Med_Dataset.

Translational relevance

This open-access medication dataset may serve as a source of raw data for future big data and artificial intelligence research using free text.

SUBMITTER: Chen JS 

PROVIDER: S-EPMC9710490 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature


Publications

Development of an Open-Source Annotated Glaucoma Medication Dataset From Clinical Notes in the Electronic Health Record.

Jimmy S. Chen, Wei-Chun Lin, Sen Yang, Michael F. Chiang, Michelle R. Hribar

Translational Vision Science & Technology, 2022-11-01, 11



Similar Datasets

| S-EPMC9508434 | biostudies-literature
| S-EPMC5346165 | biostudies-literature
| S-EPMC9047064 | biostudies-literature
| S-EPMC11564094 | biostudies-literature
| S-EPMC10904777 | biostudies-literature
| S-EPMC7651918 | biostudies-literature
| S-EPMC5943623 | biostudies-literature
| S-EPMC10514685 | biostudies-literature
| S-EPMC9950090 | biostudies-literature
| S-EPMC9292762 | biostudies-literature