Dataset Information

Mining heterogeneous clinical notes by multi-modal latent topic model.

ABSTRACT: Latent knowledge can be extracted from the electronic notes that are recorded during patient encounters with the health system. Using these clinical notes to decipher a patient's underlying comorbidities, symptom burdens, and treatment courses is an ongoing challenge. A latent topic model, as an efficient Bayesian method, can be used to model each patient's clinical notes as "documents" and the words in the notes as "tokens". However, standard latent topic models assume that all of the notes follow the same topic distribution, regardless of the type of note or the domain expertise of the author (such as doctors or nurses). We propose a novel application of latent topic modeling, using a multi-note topic model (MNTM) to jointly infer distinct topic distributions for notes of different types. We applied our model to clinical notes from the MIMIC-III dataset to infer distinct topic distributions over the physician and nursing note types. Based on manual assessments made by clinicians, we observed a significant improvement in topic interpretability using MNTM over the baseline single-note topic models that ignore the note types. Moreover, our MNTM model led to a significantly higher prediction accuracy for prolonged mechanical ventilation and mortality using only the first 48 hours of patient data. By correlating the patients' topic mixtures with hospital mortality and prolonged mechanical ventilation, we identified several diagnostic topics that are associated with poor outcomes. Because of its elegant and intuitive formulation, we envision a broad application of our approach in mining multi-modality text-based healthcare information that goes beyond clinical notes. Code available at https://github.com/li-lab-mcgill/heterogeneous_ehr.
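
The idea of note-type-specific topic distributions can be illustrated with a toy generative sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, for illustration only, that each patient has one topic mixture shared across note types and that each note type has its own topic-word distributions over its own vocabulary. The function names, hyperparameters (0.5), and the example vocabularies are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_patient_notes(note_types, vocab, n_topics, n_tokens, seed=0):
    """Simulate tokens for one patient under a toy multi-note topic model.

    Assumptions (not taken from the abstract): a single patient-level
    topic mixture `theta` is shared by all note types, while each note
    type `t` has its own topic-word distributions `phi[t][k]`.
    """
    rng = random.Random(seed)

    # One topic-word distribution per (note type, topic) pair.
    phi = {
        t: [sample_dirichlet([0.5] * len(vocab[t]), rng) for _ in range(n_topics)]
        for t in note_types
    }

    # Patient-level topic mixture shared across note types.
    theta = sample_dirichlet([0.5] * n_topics, rng)

    notes = {}
    for t in note_types:
        tokens = []
        for _ in range(n_tokens):
            # Pick a topic from the patient mixture, then a word from
            # that topic's distribution for this note type.
            k = rng.choices(range(n_topics), weights=theta)[0]
            tokens.append(rng.choices(vocab[t], weights=phi[t][k])[0])
        notes[t] = tokens
    return theta, phi, notes


if __name__ == "__main__":
    # Hypothetical vocabularies for two note types.
    vocab = {
        "physician": ["sepsis", "intubated", "pneumonia", "extubation"],
        "nursing": ["turned", "suctioned", "family", "comfort"],
    }
    theta, phi, notes = generate_patient_notes(
        ["physician", "nursing"], vocab, n_topics=3, n_tokens=10
    )
    print(theta)
    print(notes)
```

A single-note baseline in this toy setting would pool both vocabularies and use one set of topic-word distributions for every note, which is the assumption the abstract argues against.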

SUBMITTER: Wen Z

PROVIDER: S-EPMC8031429 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Objective: Existing research on social determinants of health (SDoH) predominantly focuses on physician notes and structured data within electronic medical records. This study posits that social work notes are an untapped, potentially rich source for SDoH information. We hypothesize that clinical notes recorded by social workers, whose role is to ameliorate social and economic factors, might provide a complementary information source of data on SDoH compared to physician notes, which primarily concentrate on medical diagnoses and treatments. We aimed to use word frequency analysis and topic modeling to identify prevalent terms and robust topics of discussion within a large cohort of social work notes including both outpatient and in-patient consultations. Materials and methods: We retrieved a diverse, deidentified corpus of 0.95 million clinical social work notes from 181 644 patients at the University of California, San Francisco. We conducted word frequency analysis related to ICD-10 chapters to identify prevalent terms within the notes. We then applied Latent Dirichlet Allocation (LDA) topic modeling analysis to characterize this corpus and identify potential topics of discussion, which was further stratified by note types and disease groups. Results: Word frequency analysis primarily identified medical-related terms associated with specific ICD-10 chapters, though it also detected some subtle SDoH terms. In contrast, the LDA topic modeling analysis extracted 11 topics explicitly related to social determinants of health risk factors, such as financial status, abuse history, social support, risk of death, and mental health. The topic modeling approach effectively demonstrated variations between different types of social work notes and across patients with different types of diseases or conditions. Discussion: Our findings highlight LDA topic modeling's effectiveness in extracting SDoH-related themes and capturing variations in social work notes, demonstrating its potential for informing targeted interventions for at-risk populations. Conclusion: Social work notes offer a wealth of unique and valuable information on an individual's SDoH. These notes present consistent and meaningful topics of discussion that can be effectively analyzed and utilized to improve patient care and inform targeted interventions for at-risk populations.
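
The word frequency step described above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the tokenizer (lowercase alphabetic runs), the function name, and the grouping key (here a note type, though an ICD-10 chapter would work the same way) are all assumptions.

```python
import re
from collections import Counter, defaultdict


def term_counts_by_note_type(notes, stopwords=frozenset()):
    """Count term frequencies separately for each note type.

    `notes` is an iterable of (note_type, text) pairs. Tokenization is a
    deliberately simple assumption: lowercase runs of ASCII letters.
    """
    counts = defaultdict(Counter)
    for note_type, text in notes:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[note_type].update(tok for tok in tokens if tok not in stopwords)
    return counts


if __name__ == "__main__":
    # Hypothetical example notes; not drawn from any real record.
    example = [
        ("social work", "Patient reports housing insecurity and financial stress."),
        ("physician", "Patient stable. Continue housing referral."),
    ]
    for note_type, counter in term_counts_by_note_type(example).items():
        print(note_type, counter.most_common(3))
```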

Project description:Existing whole-brain models are generally tailored to the modelling of a particular data modality (e.g., fMRI or MEG/EEG). We propose that despite the differing aspects of neural activity each modality captures, they originate from shared network dynamics. Building on the universal principles of self-organising delay-coupled nonlinear systems, we aim to link distinct features of brain activity - captured across modalities - to the dynamics unfolding on a macroscopic structural connectome. To jointly predict connectivity, spatiotemporal and transient features of distinct signal modalities, we consider two large-scale models - the Stuart Landau and Wilson and Cowan models - which generate short-lived 40 Hz oscillations with varying levels of realism. To this end, we measure features of functional connectivity and metastable oscillatory modes (MOMs) in fMRI and MEG signals - and compare them against simulated data. We show that both models can represent MEG functional connectivity (FC), functional connectivity dynamics (FCD) and generate MOMs to a comparable degree. This is achieved by adjusting the global coupling and mean conduction time delay and, in the WC model, through the inclusion of balance between excitation and inhibition. For both models, the omission of delays dramatically decreased the performance. For fMRI, the SL model performed worse for FCD and MOMs, highlighting the importance of balanced dynamics for the emergence of spatiotemporal and transient patterns of ultra-slow dynamics. Notably, optimal working points varied across modalities and no model was able to achieve a correlation with empirical FC higher than 0.4 across modalities for the same set of parameters. Nonetheless, both displayed the emergence of FC patterns that extended beyond the constraints of the anatomical structure. 
Finally, we show that both models can generate MOMs with empirical-like properties such as size (number of brain regions engaging in a mode) and duration (continuous time interval during which a mode appears). Our results demonstrate the emergence of static and dynamic properties of neural activity at different timescales from networks of delay-coupled oscillators at 40 Hz. Given the higher dependence of simulated FC on the underlying structural connectivity, we suggest that mesoscale heterogeneities in neural circuitry may be critical for the emergence of parallel cross-modal functional networks and should be accounted for in future modelling endeavours.

Project description: Background: Prior literature suggests that psychosocial factors adversely impact health and health care utilization outcomes. However, psychosocial factors are typically not captured by the structured data in electronic medical records (EMRs) but are rather recorded as free text in different types of clinical notes. Objective: We here propose a text-mining approach to analyze EMRs to identify older adults with key psychosocial factors that predict adverse health care utilization outcomes, measured by 30-day readmission. The psychological factors were appended to the LACE (Length of stay, Acuity of the admission, Comorbidity of the patient, and Emergency department use) Index for Readmission to improve the prediction of readmission risk. Methods: We performed a retrospective analysis using EMR notes of 43,216 hospitalization encounters in a hospital from January 1, 2017 to February 28, 2019. The mean age of the cohort was 67.51 years (SD 15.87), the mean length of stay was 5.57 days (SD 10.41), and the mean intensive care unit stay was 5% (SD 22%). We employed text-mining techniques to extract psychosocial topics that are representative of these patients and tested the utility of these topics in predicting 30-day hospital readmission beyond the predictive value of the LACE Index for Readmission. Results: The added text-mined factors improved the area under the receiver operating characteristic curve of the readmission prediction by 8.46% for geriatric patients, 6.99% for the general hospital population, and 6.64% for frequent admitters. Medical social workers and case managers captured more of the psychosocial text topics than physicians. Conclusions: The results of this study demonstrate the feasibility of extracting psychosocial factors from EMR clinical notes and the value of these notes in improving readmission risk prediction. Psychosocial profiles of patients can be curated and quantified from text mining clinical notes and these profiles can be successfully applied to artificial intelligence models to improve readmission risk prediction.
