Dataset Information

Interpretable disease prediction using heterogeneous patient records with self-attentive fusion encoder.

ABSTRACT:

Objective

We propose an interpretable disease prediction model that efficiently fuses multiple types of patient records using a self-attentive fusion encoder. We assessed the model's performance in predicting cardiovascular disease events from the records of a general patient population.

Materials and methods

We extracted 7981 cases and 67 623 controls from the sample cohort database and nationwide healthcare claims data of South Korea. Among the information provided, our model used the sequential records of medical codes and patient characteristics, such as demographic profiles and the most recent health examination results. These two types of patient records were combined in our self-attentive fusion module, whereas previously dominant methods aggregated them by simple concatenation. The prediction performance was compared with state-of-the-art recurrent neural network-based approaches and other widely used machine learning approaches.
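The abstract contrasts self-attentive fusion with simple concatenation but does not give the architecture. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' model: the static patient characteristics (demographics, health-examination results) are projected into the code-embedding space and prepended as an extra token, single-head scaled dot-product self-attention mixes that token with the sequence of medical-code embeddings, and the output at the static-token position serves as the patient representation. All function names, shapes, and the random weights are hypothetical placeholders for trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a (T, d) token matrix
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # (T, T), each row sums to 1
    return weights @ v, weights


def concat_fusion(code_embs, static_feats):
    # Baseline: pool the code sequence, then append the static vector
    return np.concatenate([code_embs.mean(axis=0), static_feats])


def self_attentive_fusion(code_embs, static_feats, w_static, w_q, w_k, w_v):
    # Hypothetical fusion: project static features into the embedding space,
    # prepend them as an extra token, and let attention mix both record types
    static_token = static_feats @ w_static          # (d,)
    tokens = np.vstack([static_token, code_embs])   # (T + 1, d)
    mixed, weights = self_attention(tokens, w_q, w_k, w_v)
    return mixed[0], weights  # static-token position read out as patient vector


rng = np.random.default_rng(0)
T, d, s = 5, 8, 3  # visits, embedding size, number of static features
code_embs = rng.normal(size=(T, d))
static_feats = rng.normal(size=s)
w_static = rng.normal(size=(s, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

baseline = concat_fusion(code_embs, static_feats)
fused, attn = self_attentive_fusion(code_embs, static_feats, w_static, w_q, w_k, w_v)
print(baseline.shape, fused.shape, attn.shape)  # (11,) (8,) (6, 6)
```

Unlike the concatenation baseline, which only places the two record types side by side, the attention matrix lets the static token and every visit token weight one another, so the fused vector can reflect interactions between them.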

Results

Our model outperformed all the other compared methods in predicting cardiovascular disease events. It achieved an area under the curve of 0.839, while the other compared methods achieved between 0.741 and 0.830. Moreover, our model consistently outperformed the other methods in a more challenging setting that tested its ability to draw inferences from less obvious, more diverse factors.
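The area under the ROC curve quoted above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that definition follows; the function name and toy data are illustrative, and the abstract does not say which implementation the authors used.

```python
import numpy as np


def auc_rank(y_true, scores):
    # AUC = probability that a random case scores higher than a random control
    # (ties count as half), computed directly over all case/control pairs
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(auc_rank(y, s))  # 7 of 9 case/control pairs ordered correctly
```

The pairwise comparison uses memory proportional to cases × controls, so for a cohort of this size a rank-based formula or a library routine would be the practical choice.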

Discussion

We also interpreted the attention weights provided by our model as the relative importance of each time step in the sequence. By measuring these attention weights, we showed that our model reveals the informative parts of each patient's history.
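The abstract does not specify the form of attention used to score time steps. Assuming additive attention pooling over per-visit hidden states, a common choice in RNN-based EHR models, the weights and a ranking of the most attended visits could be computed as follows. `attention_pool`, `w`, and `v` are hypothetical, and the random matrices stand in for a trained encoder.

```python
import numpy as np


def attention_pool(hidden, w, v):
    # Additive attention: score_t = v . tanh(W h_t), alpha = softmax(score)
    scores = np.tanh(hidden @ w) @ v     # one score per time step, shape (T,)
    scores = scores - scores.max()       # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    context = alpha @ hidden             # attention-weighted sum of time steps
    return context, alpha


rng = np.random.default_rng(1)
T, d, a = 6, 4, 5                  # time steps, hidden size, attention size
hidden = rng.normal(size=(T, d))   # stand-in for per-visit encoder outputs
w = rng.normal(size=(d, a))
v = rng.normal(size=a)

context, alpha = attention_pool(hidden, w, v)
ranking = np.argsort(alpha)[::-1]  # visit indices, most to least attended
print(np.round(alpha, 3), ranking[:3])
```

Because the weights are non-negative and sum to 1, each one can be read as the share of the pooled representation contributed by that visit, which is one reason such weights are used as relative importances.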

Conclusion

We propose an interpretable disease prediction model that efficiently fuses heterogeneous patient records and demonstrates superior disease prediction performance.

SUBMITTER: Kwak H

PROVIDER: S-EPMC8449612 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description:

Importance

Half of the people who die by suicide make a health care visit within 1 month of their death. However, clinicians lack the tools to identify these patients.

Objective

To predict suicide attempts within 1 and 6 months of presentation at an emergency department (ED) for psychiatric problems.

Design, setting, and participants

This prognostic study assessed the 1-month and 6-month risk of suicide attempts among 1818 patients presenting to an ED with psychiatric problems between February 4, 2015, and March 13, 2017. Data analysis was performed from May 1, 2020, to November 19, 2021.

Main outcomes and measures

Suicide attempts 1 and 6 months after presentation to the ED were defined by combining data from electronic health records (EHRs) with patient 1-month (n = 1102) and 6-month (n = 1220) follow-up surveys. Ensemble machine learning was used to develop predictive models and a risk score for suicide.

Results

A total of 1818 patients participated in this study (1016 men [55.9%]; median age, 33 years [IQR, 24-46 years]; 266 Hispanic patients [14.6%]; 1221 non-Hispanic White patients [67.2%]; 142 non-Hispanic Black patients [7.8%]; 64 non-Hispanic Asian patients [3.5%]; and 125 non-Hispanic patients of other race and ethnicity [6.9%]). A total of 137 of 1102 patients (12.9%; weighted prevalence) attempted suicide within 1 month, and 268 of 1220 patients (22.0%; weighted prevalence) attempted suicide within 6 months. Clinicians' assessment alone was little better than chance at predicting suicide attempts, with externally validated areas under the receiver operating characteristic curve (AUC) of 0.67 for the 1-month model and 0.60 for the 6-month model. Prediction accuracy was slightly higher for models based on EHR data (1-month model: AUC, 0.71; 6-month model: AUC, 0.65) and was best using patient self-reports (1-month model: AUC, 0.76; 6-month model: AUC, 0.77), especially when patient self-reports were combined with EHR and/or clinician data (1-month model: AUC, 0.77; 6-month model: AUC, 0.79). A model that used only 20 patient self-report questions and an EHR-based risk score performed similarly well (1-month model: AUC, 0.77; 6-month model: AUC, 0.78). In the best 1-month model, 30.7% (positive predictive value) of the patients classified as having the highest risk (top 25% of the sample) made a suicide attempt within 1 month of their ED visit, accounting for 64.8% (sensitivity) of all 1-month attempts. In the best 6-month model, 46.0% (positive predictive value) of the patients classified as having the highest risk made a suicide attempt within 6 months of their ED visit, accounting for 50.2% (sensitivity) of all 6-month attempts.

Conclusions and relevance

This prognostic study suggests that the ability to identify patients at high risk of suicide attempt after an ED visit for psychiatric problems improved with a combination of patient self-reports and EHR data.
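The top-25% figures in the suicide-attempt study combine two quantities: positive predictive value (the share of flagged patients who made an attempt) and sensitivity (the share of all attempts that fell in the flagged group). A minimal sketch of that computation on made-up data, assuming the flagged group is chosen by ranking predicted risk; unlike the study, it applies no prevalence weighting, and `top_fraction_metrics` is a hypothetical helper.

```python
import numpy as np


def top_fraction_metrics(y_true, risk, fraction=0.25):
    # Flag the highest-risk `fraction` of patients, then report
    # PPV (share of flagged patients with an event) and
    # sensitivity (share of all events that were flagged)
    y_true = np.asarray(y_true, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    n_flag = int(np.ceil(fraction * len(risk)))
    flagged = np.zeros(len(risk), dtype=bool)
    flagged[np.argsort(risk)[::-1][:n_flag]] = True
    tp = (flagged & y_true).sum()
    ppv = tp / flagged.sum()
    sensitivity = tp / y_true.sum()
    return float(ppv), float(sensitivity)


y = [1, 0, 0, 1, 0, 0, 0, 1]
risk = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.15]
print(top_fraction_metrics(y, risk))  # (0.5, 0.3333333333333333)
```

Here the top 25% is 2 of 8 patients; one of them had an event (PPV 0.5), and that event is 1 of the 3 events overall (sensitivity 1/3).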

Project description:

Objective

Preventing suicide in US youth is of paramount concern, with rates increasing by over 50% between 2007 and 2018. Statistical modeling using electronic health records may help identify at-risk youth before a suicide attempt. While electronic health records contain diagnostic information, a known risk factor, they generally lack or poorly document social determinants (e.g., social support), which are also known risk factors. If statistical models incorporate not only diagnostic records but also social determinants measures, additional at-risk youth may be identified before a suicide attempt.

Methods

Suicide attempts were predicted in hospitalized patients aged 10-24 from the State of Connecticut's Hospital Inpatient Discharge Database (HIDD; N = 38943). Predictors included demographic information, diagnosis codes, and social determinants features transferred (fused) through a data fusion framework from an external source of survey data, The National Longitudinal Study of Adolescent to Adult Health (Add Health). Social determinant information for each HIDD patient was generated by averaging values from their most similar Add Health individuals (e.g., top 10), based on matching shared features between datasets (e.g., Pearson's r). Attempts were then modeled using elastic net logistic regression with both HIDD features and fused Add Health features.

Results

The model including fused social determinants outperformed the conventional model (AUC = 0.83 v. 0.82). Sensitivity and positive predictive values at 90% and 95% specificity were almost 10% higher when fused features were included (e.g., sensitivity at 90% specificity = 0.48 v. 0.44). Among social determinants variables, the perception that their mother cares and being non-religious appeared particularly important to the performance improvement.

Discussion

This proof-of-concept study showed that incorporating social determinants measures from an external survey database could improve prediction of youth suicide risk from clinical data using a data fusion framework. While social determinant data collected directly from patients might be ideal, estimating these characteristics via data fusion avoids data collection, which is generally time-consuming, expensive, and subject to non-compliance.
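The fusion step described in the Methods of the youth-suicide study (average the social-determinant values of the most similar Add Health respondents, with similarity measured on shared features, e.g., Pearson's r) can be sketched as a k-nearest-neighbor imputation. This is an illustration under assumptions, not the study's code: the shared features are assumed to be numeric and on comparable scales, similarity is Pearson's r computed across each record's shared features, and `fuse_social_features` and the random arrays are hypothetical.

```python
import numpy as np


def fuse_social_features(target_shared, donor_shared, donor_social, k=10):
    # For each target patient, find the k donor records whose shared features
    # have the highest Pearson correlation with the patient's own, then
    # average those donors' social-determinant features
    def standardize_rows(m):
        # Center each row and scale it to unit length, so that a dot product
        # between two rows equals their Pearson correlation
        m = m - m.mean(axis=1, keepdims=True)
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    t = standardize_rows(np.asarray(target_shared, dtype=float))
    d = standardize_rows(np.asarray(donor_shared, dtype=float))
    corr = t @ d.T                             # (n_target, n_donor) Pearson r
    top_k = np.argsort(-corr, axis=1)[:, :k]   # most similar donors per patient
    return np.asarray(donor_social, dtype=float)[top_k].mean(axis=1)


rng = np.random.default_rng(2)
hidd_shared = rng.normal(size=(4, 6))                  # 4 inpatients, 6 shared features
addhealth_shared = rng.normal(size=(50, 6))            # 50 survey respondents
addhealth_social = rng.integers(0, 2, size=(50, 3))    # 3 binary social items
fused = fuse_social_features(hidd_shared, addhealth_shared, addhealth_social, k=10)
print(fused.shape)  # (4, 3)
```

Rows with zero variance across the shared features would make Pearson's r undefined (a division by zero in `standardize_rows`), so a real pipeline would need to handle them before fusing.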

Project description:

Background

Multimorbidity is an increasingly common problem in the older population and is tightly related to polypharmacy, i.e., the concurrent use of multiple medications by one individual. Detecting polypharmacy from drug prescription records is not only related to multimorbidity but can also point to incorrect use of medicines. In this work, we build models for predicting polypharmacy from drug prescription records for newly diagnosed chronic patients. We evaluate the models' performance with a strong focus on the interpretability of the results.

Methods

A centrally collected nationwide dataset of prescription records was used to perform electronic phenotyping of patients for two chronic conditions: type 2 diabetes mellitus (T2D) and cardiovascular disease (CVD). In addition, a hospital discharge dataset was linked to the prescription records. A regularized regression model was built for 11 different experimental scenarios on two datasets, and the complexity of the model was controlled with a maximum number of dimensions (MND) parameter. Performance and interpretability of the model were evaluated with AUC, AUPRC, calibration plots, and interpretation by a medical doctor.

Results

For the CVD model, AUC and AUPRC values of 0.900 (95% CI, 0.898-0.901) and 0.640 (0.635-0.645) were reached, respectively, while for the T2D model the values were 0.808 (0.803-0.812) and 0.732 (0.725-0.739). Reducing the complexity of the model by 65% for CVD and 48% for T2D resulted in 3% and 4% lower AUC, and 4% and 5% lower AUPRC values, respectively. Calibration plots for our models showed that moderate calibration can be achieved by reducing the models' complexity, without significant loss of predictive performance.

Discussion

In this study, we found that it is possible to use drug prescription data to build a model for polypharmacy prediction in the older population. In addition, the study showed that it is possible to find a balance between good performance and interpretability of the model while achieving acceptable calibration.
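The polypharmacy study says model complexity was controlled with a maximum number of dimensions (MND) parameter but does not state the penalty or how the cap is enforced. One plausible reading, sketched below as an assumption, is an L1-penalized logistic regression whose penalty is relaxed step by step, keeping the weakest penalty whose model still uses at most MND features. `l1_logistic` (fit by proximal gradient descent) and `fit_with_max_dims` are hypothetical helpers, and the data are simulated.

```python
import numpy as np


def l1_logistic(x, y, lam, lr=0.1, n_iter=2000):
    # L1-penalized logistic regression fit by proximal gradient descent (ISTA)
    n, p = x.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad_w = x.T @ (prob - y) / n
        grad_b = (prob - y).mean()
        w = w - lr * grad_w
        # Soft-thresholding is the proximal step for the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b -= lr * grad_b
    return w, b


def fit_with_max_dims(x, y, mnd, lambdas):
    # Walk from strong to weak regularization; keep the weakest penalty whose
    # model still uses at most `mnd` features (nonzero coefficients)
    best = None
    for lam in sorted(lambdas, reverse=True):
        w, b = l1_logistic(x, y, lam)
        if np.count_nonzero(w) > mnd:
            break
        best = (lam, w, b)
    return best


rng = np.random.default_rng(3)
x = rng.normal(size=(300, 10))
true_w = np.array([2.0, -1.5, 1.0] + [0.0] * 7)  # only 3 informative features
y = (rng.random(300) < 1 / (1 + np.exp(-(x @ true_w)))).astype(float)

lam, w, b = fit_with_max_dims(x, y, mnd=3, lambdas=[0.5, 0.2, 0.1, 0.05, 0.02, 0.01])
print(lam, np.flatnonzero(w))
```

Walking the penalty from strong to weak visits the sparsest models first; stopping at the first model that exceeds the cap assumes the number of selected features grows roughly monotonically along the path, which usually but not always holds for the L1 penalty.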