Dataset Information

Using phrases and document metadata to improve topic modeling of clinical reports.


ABSTRACT: Probabilistic topic models provide an unsupervised method for analyzing unstructured text and have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata in a patient's medical history and frequently contain multi-word concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams, and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of the resulting topics illustrate the output of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and it captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports.
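
The abstract states that a Dirichlet hyper-parameter is weighted by document-level and patient-level context but does not give the functional form. The sketch below illustrates one common way to do this, a log-linear prior of the kind used in Dirichlet-multinomial regression; it is an assumption for illustration and not necessarily the authors' formulation. The feature encoding, the weight values, and the function names `document_prior` and `expected_topic_mix` are all hypothetical.

```python
import math


def document_prior(features, weights):
    """Return one Dirichlet concentration value per topic for one document.

    features: metadata values for the document (hypothetical encoding)
    weights:  one weight vector per topic, same length as features (hypothetical)
    Assumed form: alpha_k = exp(weights_k . features)
    """
    return [
        math.exp(sum(w * x for w, x in zip(topic_weights, features)))
        for topic_weights in weights
    ]


def expected_topic_mix(alpha):
    """Mean of Dirichlet(alpha): the topic proportions expected a priori."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical metadata: [bias, is_radiology_report, patient_has_heart_failure]
features = [1.0, 1.0, 0.0]

# Hypothetical weights for three topics (one row per topic)
weights = [
    [0.0, 1.0, 0.0],  # topic 0 is favored in radiology reports
    [0.0, 0.0, 1.0],  # topic 1 is favored for heart-failure patients
    [0.0, 0.0, 0.0],  # topic 2 ignores metadata
]

alpha = document_prior(features, weights)
mix = expected_topic_mix(alpha)
print(alpha)
print(mix)
```

Under these assumed weights, a radiology report gets a larger prior concentration on topic 0 than on the other two topics before any words are observed, which is the sense in which metadata can make the prior informative.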

SUBMITTER: Speier W 

PROVIDER: S-EPMC4902330 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature

Publications

Using phrases and document metadata to improve topic modeling of clinical reports.

Speier W, Ong MK, Arnold CW

Journal of Biomedical Informatics, 2016-04-21

Similar Datasets

S-EPMC9109990 | biostudies-literature
S-EPMC4801402 | biostudies-literature
S-EPMC6244181 | biostudies-literature
S-EPMC6524712 | biostudies-literature
S-EPMC3236833 | biostudies-other
S-EPMC6235242 | biostudies-literature
S-EPMC9064087 | biostudies-literature
S-EPMC10585361 | biostudies-literature
S-EPMC9930816 | biostudies-literature
S-EPMC10703021 | biostudies-literature