Dataset Information

The OpenDeID corpus for patient de-identification.

ABSTRACT: For research purposes, protected health information is often redacted from unstructured electronic health records to preserve patient privacy and confidentiality. The OpenDeID corpus is designed to assist the development of automatic methods for redacting sensitive information from unstructured electronic health records. We retrieved 4,548 unstructured surgical pathology reports from four urban Australian hospitals. The corpus was developed by two annotators under three experimental settings: serial annotation, parallel annotation, and pre-annotation. The quality of the annotations was evaluated for each setting. Our results suggest that the pre-annotation approach is less reliable in terms of quality than serial annotation but can drastically reduce annotation time. The OpenDeID corpus comprises 2,100 pathology reports from 1,833 cancer patients, with an average of 737.49 tokens and 7.35 protected health information entities annotated per report. The overall inter-annotator agreement and deviation scores are 0.9464 and 0.9726, respectively. Realistic surrogates were also generated to make the corpus suitable for distribution to other researchers.

SUBMITTER: Jonnagaddala J

PROVIDER: S-EPMC8497517 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description:

Background: UK government guidelines and initiatives emphasise equity in delivery of care, shared decision-making, and patient-centred care. This includes sharing information with patients as partners in health decisions and empowering them to manage their health effectively. In the UK, general practitioners (GPs) routinely receive hospital discharge letters; while patients receiving copies of such letters is seen as "good practice" and recommended, it is not standardised. The effects and consequences of whether or not this happens remain unclear. The aim of this study (one of three forming the Discharge Communication Study) was to explore patient perspectives on receiving discharge letters and their views on how this could be improved in order to optimise patient experience and outcomes.

Methods: Semi-structured interviews were conducted with a diverse sample of 50 patients recruited from 17 GP surgeries within the West Midlands, UK. All participants were adults with a recent episode of general hospital inpatient or outpatient care. Data were audio recorded, transcribed and analysed using mixed methods corpus linguistics techniques.

Results: Participants reported inconsistent access to discharge letters. Most wanted to receive a copy of their discharge letter, although some expressed reservations. Perceived benefits included increased understanding of their condition and treatment, reduced anxiety, and increased satisfaction. Consequences where participants had not received letters included: letter inaccuracies being overlooked, missed follow up actions, failure to fully remember diagnosis, treatment, or self-management or recommendations, and confusion and anxiety at what occurred and what will happen next. Participants felt the usefulness of receiving copies of letters could be increased by including a patient information section, avoiding acronyms, and explaining jargon or technical terms in lay language.

Conclusions: Most patients value receiving copies of hospital discharge letters, and should be consistently offered them. Patients' preferences for letter receipt could be logged in their health records. To enable positive outcomes, letters should have a clear and accessible format that reflects the priorities and information needs of patients. Patients appear not to be receiving or being offered copies of letters consistently despite UK policies and guidelines supporting this practice; this suggests a need for greater standardisation of practice.

OmicsDI is part of the ELIXIR infrastructure
