Dataset Information

An evaluation of clinical order patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

ABSTRACT:

Objective

Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

Materials and methods

Inpatient electronic health records from 2010 to 2013 were extracted from a tertiary academic hospital. Clinicians (n = 1822) were stratified into low-mortality (21.8%, n = 397) and high-mortality (6.0%, n = 110) extremes using a two-sided P-value score quantifying deviation of observed vs. expected 30-day patient mortality rates. Three patient cohorts were assembled: patients seen by low-mortality clinicians, high-mortality clinicians, and an unfiltered crowd of all clinicians (n = 1046, 1046, and 5230 post-propensity score matching, respectively). Predicted order lists were automatically generated from recommender system algorithms trained on each patient cohort and evaluated against (i) real-world practice patterns reflected in patient cases with better-than-expected mortality outcomes and (ii) reference standards derived from clinical practice guidelines.
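
The abstract does not give the formula behind the two-sided P-value score, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes each patient has a model-predicted 30-day mortality risk and uses a normal approximation to the sum of independent Bernoulli outcomes. The function names, the "typical" label, and the `alpha` cutoff are hypothetical.

```python
import math


def two_sided_p(observed_deaths, predicted_risks):
    """Return (p_value, z) comparing observed deaths with the expected count.

    Assumption: patient outcomes are independent Bernoulli draws with the
    given predicted risks, and their sum is approximately normal.
    """
    expected = sum(predicted_risks)
    variance = sum(p * (1.0 - p) for p in predicted_risks)
    if variance == 0.0:
        # No uncertainty in the expectation, so no deviation can be scored.
        return 1.0, 0.0
    z = (observed_deaths - expected) / math.sqrt(variance)
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p_value, z


def stratify(observed_deaths, predicted_risks, alpha=0.05):
    """Label a clinician by the direction of a significant deviation.

    alpha is a hypothetical cutoff, not a value taken from the study.
    """
    p_value, z = two_sided_p(observed_deaths, predicted_risks)
    if p_value >= alpha:
        return "typical"
    return "low-mortality" if z < 0 else "high-mortality"
```

For example, a clinician with 200 patients at 10% predicted risk each is expected to see about 20 deaths; under these assumptions, `stratify(8, [0.1] * 200)` falls in the low-mortality extreme and `stratify(35, [0.1] * 200)` in the high-mortality extreme.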

Results

Across six common admission diagnoses, order lists learned from the crowd demonstrated the greatest alignment with guideline references (AUROC range = 0.86-0.91), performing on par with or better than those learned from low-mortality clinicians (0.79-0.84, P < 10^-5) or manually authored hospital order sets (0.65-0.77, P < 10^-3). The same trend was observed in evaluating model predictions against better-than-expected patient cases, with the crowd model (AUROC mean = 0.91) outperforming the low-mortality model (0.87, P < 10^-16) and order set benchmarks (0.78, P < 10^-35).
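
To make the AUROC figures concrete, here is a minimal sketch of how a scored order list could be compared against a reference set. It is an illustration under stated assumptions, not the study's evaluation code: the recommender is assumed to assign one score per candidate order, and the reference is treated as a plain set of order names. The function name and the example orders are hypothetical.

```python
def auroc(scores, reference):
    """AUROC of scored candidate orders against a binary reference set.

    scores: dict mapping order name -> recommender score (higher = more likely)
    reference: set of order names treated as correct (e.g. guideline-derived)

    Computed as the probability that a randomly chosen reference order is
    scored above a randomly chosen non-reference order, with ties counted
    as one half.
    """
    pos = [s for item, s in scores.items() if item in reference]
    neg = [s for item, s in scores.items() if item not in reference]
    if not pos or not neg:
        raise ValueError("need at least one reference and one non-reference order")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scored order list for a pneumonia admission.
example_scores = {
    "blood culture": 0.90,
    "chest x-ray": 0.80,
    "ceftriaxone": 0.70,
    "knee mri": 0.75,
    "colonoscopy": 0.20,
}
example_reference = {"blood culture", "chest x-ray", "ceftriaxone"}
example_auroc = auroc(example_scores, example_reference)
```

An AUROC of 1.0 would mean every reference order is ranked above every non-reference order, and 0.5 corresponds to a random ordering.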

Discussion

Whether machine-learning models are trained on all clinicians or a subset of experts illustrates a bias-variance tradeoff in data usage. Defining robust metrics to assess quality based on internal (e.g. practice patterns from better-than-expected patient cases) or external reference standards (e.g. clinical practice guidelines) is critical to assess decision support content.

Conclusion

Learning relevant decision support content from all clinicians is as robust as, if not more robust than, learning from a select subgroup of clinicians favored by patient outcomes.

SUBMITTER: Wang JK

PROVIDER: S-EPMC6250126 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature


Publications

An evaluation of clinical order patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

Wang JK, Hom J, Balasubramanian S, Schuler A, Shah NH, Goldstein MK, Baiocchi MTM, Chen JH

Journal of Biomedical Informatics, 2018-09-07


PMID: 30195660

Similar Datasets

Project description:
Importance: Goal-concordant care is an ongoing challenge in hospital settings. Identification of high mortality risk within 30 days may call attention to the need to have serious illness conversations, including the documentation of patient goals of care.
Objective: To examine goals of care discussions (GOCDs) in a community hospital setting with patients identified as having a high risk of mortality by a machine learning mortality prediction algorithm.
Design, setting, and participants: This cohort study took place at community hospitals within 1 health care system. Participants included adult patients with a high risk of 30-day mortality who were admitted to 1 of 4 hospitals between January 2 and July 15, 2021. Patient encounters of inpatients in the intervention hospital where physicians were notified of the computed high risk mortality score were compared with patient encounters of inpatients in 3 community hospitals without the intervention (ie, matched control).
Intervention: Physicians of patients with a high risk of mortality within 30 days received notification and were encouraged to arrange for GOCDs.
Main outcomes and measures: The primary outcome was the percentage change of documented GOCDs prior to discharge. Propensity-score matching was completed on a preintervention and postintervention period using age, sex, race, COVID-19 status, and machine learning-predicted mortality risk scores. A difference-in-difference analysis validated the results.
Results: Overall, 537 patients were included in this study with 201 in the preintervention period (94 in the intervention group; 104 in the control group) and 336 patients in the postintervention period. The intervention and control groups included 168 patients per group and were well-balanced in age (mean [SD], 79.3 [9.60] vs 79.6 [9.21] years; standardized mean difference [SMD], 0.03), sex (female, 85 [51%] vs 85 [51%]; SMD, 0), race (White patients, 145 [86%] vs 144 [86%]; SMD, 0.006), and Charlson comorbidities (median [range], 8.00 [2.00-15.0] vs 9.00 [2.00-19.0]; SMD, 0.34). Patients in the intervention group from preintervention to postintervention period were associated with being 5 times more likely to have documented GOCDs (OR, 5.11 [95% CI, 1.93 to 13.42]; P = .001) by discharge compared with matched controls, and GOCD occurred significantly earlier in the hospitalization in the intervention patients as compared with matched controls (median, 4 [95% CI, 3 to 6] days vs 16 [95% CI, 15 to not applicable] days; P < .001). Similar findings were observed for Black patient and White patient subgroups.
Conclusions and relevance: In this cohort study, patients whose physicians had knowledge of high-risk predictions from machine learning mortality algorithms were associated with being 5 times more likely to have documented GOCDs than matched controls. Additional external validation is needed to determine if similar interventions would be helpful at other institutions.

Project description:
Background: Methotrexate (MTX) is the gold-standard first-line disease-modifying anti-rheumatic drug for juvenile idiopathic arthritis (JIA), despite only being either effective or tolerated in half of children and young people (CYP). To facilitate stratified treatment of early JIA, novel methods in machine learning were used to i) identify clusters with distinct disease patterns following MTX initiation; ii) predict cluster membership; and iii) compare clusters to existing treatment response measures.
Methods: Discovery and verification cohorts included CYP who first initiated MTX before January 2018 in one of four UK multicentre prospective cohorts of JIA within the CLUSTER consortium. JADAS components (active joint count, physician (PGA) and parental (PGE) global assessments, ESR) were recorded at MTX start and over the following year. Clusters of MTX 'response' were uncovered using multivariate group-based trajectory modelling separately in discovery and verification cohorts. Clusters were compared descriptively to ACR Pedi 30/90 scores, and multivariate logistic regression models predicted cluster-group assignment.
Findings: The discovery cohorts included 657 CYP and verification cohorts 1241 CYP. Six clusters were identified: Fast Improvers (11%), Slow Improvers (16%), Improve-Relapse (7%), Persistent Disease (44%), Persistent PGA (8%) and Persistent PGE (13%), the latter two characterised by improvement in all features except one. Factors associated with clusters included ethnicity, ILAR category, age, PGE, and ESR scores at MTX start, with predictive model area under the curve values of 0.65-0.71. Singular ACR Pedi 30/90 scores at 6 and 12 months could not capture speeds of improvement, relapsing courses or diverging disease patterns.
Interpretation: Six distinct patterns following initiation of MTX have been identified using methods in artificial intelligence. These clusters demonstrate the limitations in traditional yes/no treatment response assessment (e.g., ACR Pedi 30) and can form the basis of a stratified medicine programme in early JIA.
Funding: Medical Research Council, Versus Arthritis, Great Ormond Street Hospital Children's Charity, Olivia's Vision, and the National Institute for Health Research.

Project description:
Background: This study aims to evaluate the capabilities and limitations of large language models (LLMs) for providing patient education for men undergoing radiotherapy for localized prostate cancer, incorporating assessments from both clinicians and patients.
Methods: Six questions about definitive radiotherapy for prostate cancer were designed based on common patient inquiries. These questions were presented to different LLMs [ChatGPT-4, ChatGPT-4o (both OpenAI Inc., San Francisco, CA, USA), Gemini (Google LLC, Mountain View, CA, USA), Copilot (Microsoft Corp., Redmond, WA, USA), and Claude (Anthropic PBC, San Francisco, CA, USA)] via the respective web interfaces. Responses were evaluated for readability using the Flesch Reading Ease Index. Five radiation oncologists assessed the responses for relevance, correctness, and completeness using a five-point Likert scale. Additionally, 35 prostate cancer patients evaluated the responses from ChatGPT-4 for comprehensibility, accuracy, relevance, trustworthiness, and overall informativeness.
Results: The Flesch Reading Ease Index indicated that the responses from all LLMs were relatively difficult to understand. All LLMs provided answers that clinicians found to be generally relevant and correct. The answers from ChatGPT-4, ChatGPT-4o, and Claude AI were also found to be complete. However, we found significant differences between the performance of different LLMs regarding relevance and completeness. Some answers lacked detail or contained inaccuracies. Patients perceived the information as easy to understand and relevant, with most expressing confidence in the information and a willingness to use ChatGPT-4 for future medical questions. ChatGPT-4's responses helped patients feel better informed, despite the initially standardized information provided.
Conclusion: Overall, LLMs show promise as a tool for patient education in prostate cancer radiotherapy. While improvements are needed in terms of accuracy and readability, positive feedback from clinicians and patients suggests that LLMs can enhance patient understanding and engagement. Further research is essential to fully realize the potential of artificial intelligence in patient education.

Project description:
Background: Hybrid imaging became an instrumental part of medical imaging, particularly cancer imaging processes in clinical routine. To date, several radiomic and machine learning studies investigated the feasibility of in vivo tumor characterization with variable outcomes. This study aims to investigate the effect of recently proposed fuzzy radiomics and compare its predictive performance to conventional radiomics in cancer imaging cohorts. In addition, lesion vs. lesion+surrounding fuzzy and conventional radiomic analysis was conducted.
Methods: Previously published 11C Methionine (MET) positron emission tomography (PET) glioma, 18F-FDG PET/computed tomography (CT) lung, and 68Ga-PSMA-11 PET/magneto-resonance imaging (MRI) prostate cancer retrospective cohorts were included in the analysis to predict their respective clinical endpoints. Four delineation methods including manually defined reference binary (Ref-B), its smoothed, fuzzified version (Ref-F), as well as extended binary (Ext-B) and its fuzzified version (Ext-F) were incorporated to extract imaging biomarker standardization initiative (IBSI)-conform radiomic features from each cohort. Machine learning for the four delineation approaches was performed utilizing a Monte Carlo cross-validation scheme to estimate the predictive performance of the four delineation methods.
Results: Reference fuzzy (Ref-F) delineation outperformed its binary delineation (Ref-B) counterpart in all cohorts within a volume range of 938-354987 mm^3 with relative cross-validation area under the receiver operator characteristics curve (AUC) of +4.7-10.4. Compared to Ref-B, the highest AUC performance difference was observed by the Ref-F delineation in the glioma cohort (Ref-F: 0.74 vs. Ref-B: 0.70) and in the prostate cohort by Ref-F and Ext-F (Ref-F: 0.84, Ext-F: 0.86 vs. Ref-B: 0.80). In addition, fuzzy radiomics decreased feature redundancy by approx. 20%.
Conclusions: Fuzzy radiomics has the potential to increase predictive performance particularly in small lesion sizes compared to conventional binary radiomics in PET. We hypothesize that this effect is due to the ability of fuzzy radiomics to model partial volume effects and delineation uncertainties at small lesion boundaries. In addition, we consider that the lower redundancy of fuzzy radiomic features supports the identification of imaging biomarkers in future studies. Future studies shall consider systematically analyzing lesions and their surroundings with fuzzy and binary radiomics.
