Project description: Introduction: Venous thromboembolism (VTE) risk assessment at admission is of great importance for early screening and timely prophylaxis and management during hospitalization. The purpose of this study is to develop and validate novel risk assessment models at admission based on machine learning (ML) methods. Methods: In this retrospective study, a total of 3078 individuals were included, with their Caprini variables recorded within 24 hours of admission. Several ML models were then built, including logistic regression (LR), random forest (RF), and extreme gradient boosting (XGB). The prediction performance of the ML models and the Caprini risk score (CRS) was then validated and compared through a series of evaluation metrics. Results: The values of AUROC and AUPRC were 0.798 and 0.303 for LR, 0.804 and 0.360 for RF, and 0.796 and 0.352 for XGB, respectively, all significantly outperforming CRS (0.714 and 0.180, P < 0.001). When prediction scores were stratified into three risk levels for application, RF produced more reasonable results than CRS, including fewer false-positive alerts and a larger lower-risk proportion. The improved stratification results were further verified by net reclassification improvement (NRI) analysis. Discussion: This study indicated that machine learning models could improve VTE risk prediction at admission compared with CRS. Among the ML models, RF was found to have superior performance and great potential in clinical practice.
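The three-level stratification and NRI analysis described above can be illustrated with a minimal sketch. It assumes the standard categorical NRI definition (net upward movement among events plus net downward movement among non-events); the cut-offs, function names, and toy data are hypothetical and are not taken from the study.

```python
def stratify(score, low_cut, high_cut):
    """Map a predicted score to risk level 0 (low), 1 (moderate), or 2 (high).

    The cut-offs are hypothetical; the study does not report its thresholds here.
    """
    if score < low_cut:
        return 0
    if score < high_cut:
        return 1
    return 2


def net_reclassification_improvement(old_levels, new_levels, outcomes):
    """Categorical NRI of a new model against an old one.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    """
    events = [(o, n) for o, n, y in zip(old_levels, new_levels, outcomes) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(old_levels, new_levels, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    def frac_up(pairs):
        return sum(n > o for o, n in pairs) / len(pairs)

    def frac_down(pairs):
        return sum(n < o for o, n in pairs) / len(pairs)

    return (frac_up(events) - frac_down(events)) + (
        frac_down(nonevents) - frac_up(nonevents)
    )


if __name__ == "__main__":
    # Toy data: risk levels from a reference score and from a new model.
    old = [1, 1, 2, 1]
    new = [2, 1, 1, 0]
    vte = [1, 1, 0, 0]  # 1 = event occurred, 0 = no event
    print(net_reclassification_improvement(old, new, vte))
```

A positive NRI means the new model moves events to higher levels and non-events to lower levels more often than the reverse.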
Project description: Diagnosis and appropriate intervention for myocardial infarction (MI) are time-sensitive but rely on clinical measures that can be progressive and initially inconclusive, underscoring the need for an accurate and early predictor of MI to support diagnostic and clinical management decisions. The objective of this study was to develop a machine learning algorithm (MLA) to predict MI diagnosis based on electronic health record (EHR) data readily available during Emergency Department assessment. An MLA was developed using retrospective patient data. The MLA used patient data as they became available in the first 3 h of care to predict MI diagnosis (defined by International Classification of Diseases, 10th revision code) at any time during the encounter. The MLA obtained an area under the receiver operating characteristic curve of 0.87, sensitivity of 87% and specificity of 70%, outperforming the comparator scoring systems TIMI and GRACE on all metrics. An MLA can synthesize complex EHR data to serve as a clinically relevant risk stratification tool for MI.
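The metrics reported above (area under the ROC curve, sensitivity, specificity) can be computed from predicted scores and binary labels as in the following sketch. It uses the rank-based (Mann-Whitney) formulation of the AUROC; the function names, the 0.5 threshold, and the toy data are illustrative assumptions, not the study's pipeline.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0, 0, 1]          # toy diagnoses
    scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.4, 0.6]  # toy model outputs
    print(auroc(labels, scores))
    print(sensitivity_specificity(labels, scores, threshold=0.5))
```

Unlike sensitivity and specificity, the AUROC does not depend on the chosen threshold, which is why the three are usually reported together.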
Project description: Background: Patient safety in the intensive care unit (ICU) is one of the most critical issues, and unplanned extubation (UE) is considered the most serious adverse event for patient safety. Prevention and early detection of such an event is an essential but difficult component of quality care. Objective: This study aimed to develop and validate prediction models for UE in ICU patients using machine learning. Methods: This study was conducted in an academic tertiary hospital in Seoul, Republic of Korea. The hospital had approximately 2000 inpatient beds and 120 ICU beds. As of January 2019, the hospital had approximately 9000 outpatients per day. The number of annual ICU admissions was approximately 10,000. We conducted a retrospective study between January 1, 2010, and December 31, 2018. A total of 6914 extubation cases were included. We developed a UE prediction model using machine learning algorithms, which included random forest (RF), logistic regression (LR), artificial neural network (ANN), and support vector machine (SVM). To evaluate the models' performance, we used the area under the receiver operating characteristic curve (AUROC). The sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were also determined for each model. We also used a calibration curve, the Brier score, and the integrated calibration index (ICI) to compare the models. The potential clinical usefulness of the best model at the best threshold was assessed through a net benefit approach using a decision curve. Results: Among the 6914 extubation cases, 248 were UE. In the UE group, there were more males than females, higher use of physical restraints, and fewer surgeries. UE occurred during the night shift more often than planned extubation did. The rate of reintubation within 24 hours and hospital mortality were higher in the UE group.
The UE prediction algorithm was developed, and the AUROC was 0.787 for RF, 0.762 for LR, 0.763 for ANN, and 0.740 for SVM. Conclusions: We successfully developed and validated machine learning-based prediction models to predict UE in ICU patients using electronic health record data. The best AUROC was 0.787 and the sensitivity was 0.949, both obtained using the RF algorithm. The RF model was well calibrated, with a Brier score of 0.129 and an ICI of 0.048. The proposed prediction model uses widely available variables to limit the additional workload on the clinician. Further, this evaluation suggests that the model holds potential for clinical usefulness.
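Two of the evaluation tools named above, the Brier score and the net benefit used in decision curve analysis, have short closed forms. The sketch below assumes the usual definitions (mean squared error of predicted probabilities, and TP/n - FP/n x pt/(1 - pt) at threshold probability pt); the function names, thresholds, and data are hypothetical.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on everyone whose predicted probability >= threshold.

    One point on a decision curve: true positives are credited, false
    positives are penalized by the odds of the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


if __name__ == "__main__":
    outcomes = [1, 0, 0, 0]          # toy outcomes (1 = event)
    probs = [0.9, 0.2, 0.6, 0.1]     # toy predicted probabilities
    print(brier_score(outcomes, probs))
    # A decision curve plots net benefit over a range of thresholds.
    for pt in (0.2, 0.5):
        print(pt, net_benefit(outcomes, probs, pt))
```

Lower Brier scores indicate better overall accuracy of the probabilities; a model is useful at a given threshold when its net benefit exceeds that of treating everyone or no one.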
Project description: Background: Myocardial ischemia is a common early symptom of cardiovascular disease (CVD). Reliable detection of myocardial ischemia using computer-aided analysis of electrocardiograms (ECG) provides an important reference for early diagnosis of CVD. The vectorcardiogram (VCG) could improve the performance of ECG-based myocardial ischemia detection by affording temporal-spatial characteristics related to myocardial ischemia and capturing subtle changes in the ST-T segment across continuous cardiac cycles. We aim to investigate whether the combination of ECG and VCG could improve the performance of machine learning algorithms in automatic myocardial ischemia detection. Methods: The ST-T segments of 20-second, 12-lead ECGs and VCGs were extracted from 377 patients with myocardial ischemia and 52 healthy controls. Then, sample entropy (SampEn, of 12 ECG leads and of three VCG leads), spatial heterogeneity index (SHI, of VCG) and temporal heterogeneity index (THI, of VCG) were calculated. Using a grid search, four SampEn features and two features were selected as input signal features for the ECG-only and VCG-only models based on support vector machine (SVM), respectively. Similarly, three features (S_I, THI, and SHI, where S_I is the SampEn of lead I) were further selected for the ECG + VCG model. 5-fold cross validation was used to assess the performance of the ECG-only, VCG-only, and ECG + VCG models. To fully evaluate the algorithmic generalization ability, the model with the best performance was selected and tested on a third independent dataset of 148 patients with myocardial ischemia and 52 healthy controls. Results: The ECG + VCG model with three features (S_I, THI, and SHI) yields better classification results than the ECG-only and VCG-only models, with an average accuracy of 0.903, sensitivity of 0.903, specificity of 0.905, F1 score of 0.942, and AUC of 0.904, which shows better performance with fewer features compared with existing works.
On the third independent dataset, the testing showed an AUC of 0.814. Conclusion: The SVM algorithm based on the ECG + VCG model could reliably detect myocardial ischemia, providing a potential tool to assist cardiologists in the early diagnosis of CVD in routine screening during primary care services.
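Sample entropy, the main signal feature in the abstract above, can be sketched as follows. This follows the commonly used definition, SampEn(m, r) = -ln(A/B), where B and A count template pairs of length m and m + 1 within Chebyshev distance r; the defaults m = 2 and r = 0.2 x standard deviation are conventional choices assumed here, not parameters reported by the study.

```python
import math


def sample_entropy(series, m=2, r=None):
    """Sample entropy of a 1-D series (self-matches excluded).

    m: template length; r: tolerance (defaults to 0.2 * population SD).
    Returns float('inf') when no matches are found, since the ratio is undefined.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series too short for the chosen template length")
    if r is None:
        mean = sum(series) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
        r = 0.2 * sd

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


if __name__ == "__main__":
    regular = [0.0, 1.0] * 10  # perfectly periodic toy signal
    print(sample_entropy(regular, m=2, r=0.1))
```

Regular, predictable signals give values near zero, while irregular signals give larger values; the quadratic pair loop is adequate for short segments such as a 20-second recording but would need vectorizing for long series.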
Project description: BACKGROUND: Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH) stimulation test or GnRH analogue (GnRHa) stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE: We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS: In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS: Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS: The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
Project description: Objective: Accurate estimations of surgical case durations can lead to the cost-effective utilization of operating rooms. We developed a novel machine learning approach, using both structured and unstructured features as input, to predict a continuous probability distribution of surgical case durations. Materials and methods: The data set consisted of 53 783 surgical cases performed over 4 years at a tertiary-care pediatric hospital. Features extracted included categorical (American Society of Anesthesiologists [ASA] Physical Status, inpatient status, day of week), continuous (scheduled surgery duration, patient age), and unstructured text (procedure name, surgical diagnosis) variables. A mixture density network (MDN) was trained and compared to multiple tree-based methods and a Bayesian statistical method. A continuous ranked probability score (CRPS), a generalized extension of mean absolute error, was the primary performance measure. Pinball loss (PL) was calculated to assess accuracy at specific quantiles. Performance measures were additionally evaluated on common and rare surgical procedures. Permutation feature importance was measured for the best performing model. Results: MDN had the best performance, with a CRPS of 18.1 minutes, compared to tree-based methods (19.5-22.1 minutes) and the Bayesian method (21.2 minutes). MDN had the best PL at all quantiles, and the best CRPS and PL for both common and rare procedures. Scheduled duration and procedure name were the most important features in the MDN. Conclusions: Using natural language processing of surgical descriptors, we demonstrated the use of ML approaches to predict the continuous probability distribution of surgical case durations. The more discerning forecast of the ML-based MDN approach affords opportunities for guiding intelligent schedule design and day-of-surgery operational decisions.
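The pinball loss used above to assess accuracy at specific quantiles can be sketched in a few lines. The definition is the standard quantile loss; the function name and the toy durations are illustrative assumptions, not data from the study.

```python
def pinball_loss(actuals, quantile_preds, tau):
    """Mean pinball (quantile) loss for predictions of the tau-th quantile.

    Under-prediction is weighted by tau and over-prediction by (1 - tau),
    so a well-placed 0.9 quantile is exceeded only about 10% of the time.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(actuals, quantile_preds):
        if y >= q:
            total += tau * (y - q)
        else:
            total += (1.0 - tau) * (q - y)
    return total / len(actuals)


if __name__ == "__main__":
    durations = [100.0, 50.0]   # toy observed case durations, in minutes
    predicted = [80.0, 60.0]    # toy predicted quantiles, in minutes
    print(pinball_loss(durations, predicted, tau=0.9))
    print(pinball_loss(durations, predicted, tau=0.5))
```

At tau = 0.5 the pinball loss equals half the mean absolute error, which connects it to the CRPS described above as a generalization of mean absolute error.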
Project description: Background: The construction of a robust healthcare information system is fundamental to enhancing countries' capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China's rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help towards the World Health Organization's (WHO) HBV infection elimination goals of reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China. Methods: Relevant data records extracted from the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital's Hospital Information System were structured using state-of-the-art Natural Language Processing techniques. Several ML models were used to develop HBV risk assessment models. The performance of the ML model was then interpreted using the Shapley value (SHAP) and validated using cohort data randomly divided at a ratio of 2:1 using a five-fold cross-validation framework. Results: The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified model for HBV using patients' physical complaints and parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness of fit test p-value >0.05). Conclusions: Suspected case detection models of HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in primary care settings in China.
Project description: Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. These data were used to build a machine learning classifier for Estrogen Receptor (ER) status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.
Project description: The Coronavirus Disease 2019 (COVID-19) is transitioning into the endemic phase. Nonetheless, it is crucial to remain mindful that pandemics related to infectious respiratory diseases (IRDs) can emerge unpredictably. Therefore, we aimed to develop and validate a severity assessment model for IRDs, including COVID-19, influenza, and novel influenza, using CT images on a multi-centre data set. Of the 805 COVID-19 patients collected from a single centre, 649 were used for training and 156 were used for internal validation (D1). Additionally, three external validation sets were obtained from 7 cohorts: 1138 patients with COVID-19 (D2), and 233 patients with influenza and novel influenza (D3). A hybrid model, referred to as Hybrid-DDM, was constructed by combining two deep learning models and a machine learning model. Across datasets D1, D2, and D3, the Hybrid-DDM exhibited significantly improved performance compared to the baseline model. The areas under the receiver operating characteristic curves (AUCs) were 0.830 versus 0.767 (p = 0.036) in D1, 0.801 versus 0.753 (p < 0.001) in D2, and 0.774 versus 0.668 (p < 0.001) in D3. This study indicates that the Hybrid-DDM model, trained using COVID-19 patient data, is effective and may also be applicable to patients with other types of viral pneumonia.
Project description: BACKGROUND: During atherosclerosis, the narrowing of the arterial lumen is observed through the accumulation of bio compounds and the formation of plaque within artery walls. A non-linear optical imaging modality (NLOM), coherent anti-Stokes Raman scattering (CARS) microscopy, can be used to image lipid-rich structures commonly found in atherosclerotic plaques. By matching the lipid's molecular vibrational frequencies (CH bonds), it is possible to map the accumulation of lipid-rich structures without the need for exogenous labelling and/or processing of the samples. CARS allows for the visualization of the morphological features of plaque. In combination with supervised machine learning, CARS-imaged morphological features can be used to characterize the progression of atherosclerotic plaques. RESULTS: Based on a set of label-free CARS images of atherosclerotic plaques (i.e. foam cell clusters) from a Watanabe heritable hyperlipidemic rabbit model, we developed an automated pipeline to classify atherosclerotic lesions based on their major morphological features. Our method uses image preprocessing to first improve the quality of the CARS-imaged plaque, followed by the segmentation of the plaque using Otsu thresholding, marker-controlled watershed, K-means segmentation and a novel independent foam cell thresholding segmentation. To define relevant morphological features, 27 quantitative features were extracted and further refined by a novel coefficient of variation feature refinement method in accordance with filter-type feature selection. Refined morphological features were supplied into three supervised machine learning algorithms: K-nearest neighbour, support vector machine and decision tree classifier.
The classification pipeline showcased the ability to exploit relevant plaque morphological features to accurately classify 3 pre-defined stages of atherosclerosis: early fatty streak development (EFS) and advancing atheroma (AA) with greater than 85% class accuracy. CONCLUSIONS: Through the combination of CARS microscopy and computational methods, a powerful classification tool was developed to identify the progression of atherosclerotic plaque in an automated manner. Using a curated dataset, the classification pipeline demonstrated the ability to differentiate between EFS, EF and AA. This presents the opportunity to classify the onset of atherosclerosis at an earlier stage of development.
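Of the segmentation steps listed above, Otsu thresholding is the simplest to illustrate. The sketch below implements the standard between-class variance criterion on a flat list of integer grey levels; it is not the authors' pipeline, and the function name, the 256-level assumption, and the toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    Expects integer intensities in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_bg = 0     # number of background pixels so far
    sum_bg = 0        # sum of background intensities so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue              # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                 # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:    # keep the first (lowest) maximizer
            best_var, best_t = between, t
    return best_t


if __name__ == "__main__":
    # Toy bimodal "image": dark background and bright lipid-rich region.
    image = [10] * 50 + [200] * 50
    t = otsu_threshold(image)
    mask = [p > t for p in image]  # True marks foreground pixels
    print(t, sum(mask))
```

For real images, a library implementation operating on 2-D arrays would normally replace this loop; the sketch only shows the criterion being optimized.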