
Dataset Information


Automatic detection of protected health information from clinic narratives.


ABSTRACT: This paper presents a natural language processing (NLP) system that was designed to participate in the 2014 i2b2 de-identification challenge. The challenge task aims to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. A hybrid model is proposed that combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our proposed approaches exploit a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of various PHI categories. Our system achieved promising accuracy on the challenge test data, with an overall micro-averaged F-measure of 93.6%, and was the winning system of this de-identification challenge.
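The abstract reports an overall micro-averaged F-measure. As a hedged illustration of what "micro-averaged" means, the sketch below pools true positives (TP), false positives (FP) and false negatives (FN) across all PHI categories before computing precision, recall and F1 once, and contrasts this with a macro average over categories. It assumes span matching has already produced per-category counts; the category names and counts are hypothetical and are not results from the 2014 i2b2 challenge or the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # predicted PHI spans matching a gold span of this category
    fp: int  # predicted spans with no matching gold span of this category
    fn: int  # gold spans of this category the system missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_prf(per_category: dict[str, Counts]) -> tuple[float, float, float]:
    """Micro average: sum TP/FP/FN over categories, then compute P/R/F1 once."""
    tp = sum(c.tp for c in per_category.values())
    fp = sum(c.fp for c in per_category.values())
    fn = sum(c.fn for c in per_category.values())
    return prf(tp, fp, fn)


def macro_f1(per_category: dict[str, Counts]) -> float:
    """Macro average: compute F1 per category, then take the unweighted mean."""
    scores = [prf(c.tp, c.fp, c.fn)[2] for c in per_category.values()]
    return sum(scores) / len(scores)


# Hypothetical counts for illustration only; NOT the i2b2 2014 results.
counts = {
    "DATE": Counts(tp=90, fp=5, fn=5),
    "NAME": Counts(tp=40, fp=10, fn=10),
}

p, r, f = micro_prf(counts)
print(f"micro P={p:.3f} R={r:.3f} F={f:.3f}")  # micro P=0.897 R=0.897 F=0.897
print(f"macro F={macro_f1(counts):.3f}")       # macro F=0.874
```

Because the micro average weights every PHI instance equally, frequent categories (here the hypothetical DATE counts) dominate it, whereas the macro average gives each category equal weight regardless of how often it occurs.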

SUBMITTER: Yang H 

PROVIDER: S-EPMC4989090 | biostudies-literature | 2015 Dec

REPOSITORIES: biostudies-literature


Publications

Automatic detection of protected health information from clinic narratives.

Yang H, Garibaldi JM

Journal of Biomedical Informatics, 2015-07-29



Similar Datasets

| S-EPMC6231811 | biostudies-other
| S-EPMC9799318 | biostudies-literature
| S-EPMC7307763 | biostudies-literature
| S-EPMC5758434 | biostudies-other
| S-EPMC3855581 | biostudies-literature
| S-EPMC7775604 | biostudies-literature
| S-EPMC7774955 | biostudies-literature
| S-EPMC7037767 | biostudies-literature
| S-EPMC7156708 | biostudies-literature
| S-EPMC8220305 | biostudies-literature