
Dataset Information


De-identification of electronic health record using neural network.


ABSTRACT: According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). One of the most common types of EHR data is unstructured text, and unlocking the details hidden in this data is critical for improving current medical practice and research. However, this text contains sensitive information, which could compromise privacy. Medical text therefore cannot be released publicly without first undergoing privacy-protective measures. De-identification is the process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. Over the last decade, there have been several proposals to de-identify textual data using manual, rule-based, and machine learning methods. In this article, we propose new methods to de-identify textual data based on the self-attention mechanism and a stacked recurrent neural network. To the best of our knowledge, we are the first to employ these techniques. Experimental results on three different datasets show that our model performs better than all state-of-the-art mechanisms irrespective of the dataset. Additionally, our proposed method is significantly faster than the existing techniques. Finally, we introduce three utility metrics to judge the quality of the de-identified data.

SUBMITTER: Ahmed T 

PROVIDER: S-EPMC7596089 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

De-identification of electronic health record using neural network.

Tanbir Ahmed, Md Momin Al Aziz, Noman Mohammed

Scientific Reports, 2020-10-29, issue 1



Similar Datasets

| S-EPMC7183252 | biostudies-literature
| S-EPMC4002671 | biostudies-literature
| 2424890 | ecrin-mdr-crc
| S-EPMC8423426 | biostudies-literature
| S-EPMC6135013 | biostudies-literature
| S-EPMC5177541 | biostudies-literature
| S-EPMC8352066 | biostudies-literature
| S-EPMC8449299 | biostudies-literature
| S-EPMC10716428 | biostudies-literature
| S-EPMC9744270 | biostudies-literature