Project description:
Introduction: The time required to reach clinical remission varies among patients with chronic urticaria (CU). The objective of this study is to develop a predictive model using machine learning to predict time to clinical remission for patients with CU.
Methods: Adults with ≥ 2 relevant ICD-9/10 CU diagnosis codes or CU-related treatments > 6 weeks apart were identified in the Optum deidentified electronic health record dataset (January 2007 to June 2019). Clinical remission was defined as ≥ 12 months without a CU diagnosis or CU-related treatment. A random survival forest was used to predict time from diagnosis to clinical remission for each patient based on clinical and demographic features available at diagnosis. Model performance was assessed using concordance, which indicates the degree of agreement between observed and predicted time to remission. To characterize clinically relevant groups, features were summarized among cohorts defined by quartiles of predicted time to remission.
Results: Among 112,443 patients, 73.5% reached clinical remission, with a median of 336 days from diagnosis. Of 1876 initial features, 176 were retained in the final model, which predicted a median of 318 days to remission. The model showed good performance, with a concordance of 0.62. Patients with a longer predicted time to remission tended to be older, to have a delayed CU diagnosis, and to have more comorbidities, more laboratory tests, a higher body mass index, and polypharmacy during the 12 months before the first CU diagnosis.
Conclusions: Applying machine learning to real-world data enabled accurate prediction of time to clinical remission and identified multiple relevant demographic and clinical variables with predictive value. Ongoing work aims to further validate these findings and integrate them into clinical applications for CU management.
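The concordance metric and the quartile-based cohorts described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code:

- It assumes Harrell's C-index, because the abstract says only "concordance".
- Remission is treated as the event, and patients without remission are treated as censored.
- Ties in observed time are skipped.
- The function names `harrell_c_index` and `quartile_cohort` are hypothetical.

The predicted times could come from any model, such as a random survival forest. The forest itself is not implemented here.

```python
import bisect
import statistics
from typing import List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    predicted_times: Sequence[float]) -> float:
    """Harrell's C-index computed from predicted times to remission.

    A pair (i, j) is comparable when patient i has the shorter observed time
    and actually reached remission (events[i] == 1). The pair is concordant
    when the model also predicts a shorter time for i; tied predictions count
    as half. Pairs with tied observed times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def quartile_cohort(predicted_times: Sequence[float]) -> List[int]:
    """Assign each patient to quartile 1-4 of predicted time to remission."""
    cuts = statistics.quantiles(predicted_times, n=4)  # three cut points
    return [bisect.bisect_left(cuts, t) + 1 for t in predicted_times]


if __name__ == "__main__":
    # Toy data (hypothetical): observed days, remission flag, predicted days.
    times = [100, 200, 300, 400]
    events = [1, 1, 0, 1]
    predicted = [150, 100, 350, 450]
    print(harrell_c_index(times, events, predicted))  # 0.8
    print(quartile_cohort([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 1, 2, 2, 3, 3, 4, 4]
```

On this scale, 0.5 corresponds to a random ordering of patients and 1.0 to a perfect ordering. The reported concordance of 0.62 should be read against that range.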
Project description:
Introduction: Post-stroke depression (PSD) is a serious mental disorder that can follow ischemic stroke, and early detection is important for clinical practice. This research aims to develop machine learning models to predict new-onset PSD using real-world data.
Methods: We collected data on patients with ischemic stroke from multiple medical institutions in Taiwan between 2001 and 2019. We developed the models using 61,460 patients and tested their performance, in terms of specificity and sensitivity, on 15,366 independent patients. The prediction targets were whether PSD occurred at 30, 90, 180, and 365 days post-stroke. We ranked the important clinical features in these models.
Results: In the study's database sample, 1.3% of patients were diagnosed with PSD. Across the four models, the average specificity ranged from 0.83 to 0.91 and the average sensitivity from 0.30 to 0.48. Ten features were listed as important features related to PSD at different time points, namely older age, greater height, lower post-stroke weight, higher post-stroke diastolic blood pressure, no pre-stroke hypertension but post-stroke hypertension (new-onset hypertension), post-stroke sleep-wake disorders, post-stroke anxiety disorders, post-stroke hemiplegia, and lower blood urea nitrogen during stroke.
Discussion: Machine learning models can serve as potential predictive tools for PSD, and the important factors identified may alert clinicians to the need for early detection of depression in high-risk stroke patients.
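The four prediction targets and the sensitivity/specificity evaluation can be illustrated with the following minimal sketch. It rests on several assumptions:

- Each target is a binary label meaning "PSD diagnosed within N days of the stroke".
- Every patient was followed for the full horizon.
- The function names and toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

HORIZONS = (30, 90, 180, 365)  # days post-stroke, as in the study design


def horizon_label(days_to_psd: Optional[int], horizon: int) -> int:
    """1 if PSD was diagnosed within `horizon` days after stroke, else 0.

    `days_to_psd` is None when no PSD diagnosis was recorded. Assumes
    complete follow-up; patients lost before the horizon would need
    separate handling.
    """
    return int(days_to_psd is not None and days_to_psd <= horizon)


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    days = [10, 45, None, 200, 400]  # hypothetical days to PSD diagnosis
    for h in HORIZONS:
        print(h, [horizon_label(d, h) for d in days])
    y_365 = [horizon_label(d, 365) for d in days]  # [1, 1, 0, 1, 0]
    preds = [1, 0, 0, 1, 1]                         # hypothetical model output
    print(sensitivity_specificity(y_365, preds))   # sensitivity 2/3, specificity 1/2
```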
Project description:
Background: Advances in machine learning (ML) provide great opportunities for predicting hospital readmission. This review synthesizes the literature on ML methods and their performance for predicting hospital readmission in the US.
Methods: This review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) Statement. The extraction of items was also guided by the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist. The electronic databases PUBMED, MEDLINE, and EMBASE were systematically searched from January 1, 2015, through December 10, 2019. The articles were imported into the COVIDENCE online software for title/abstract screening and full-text eligibility assessment. Observational studies using ML techniques to predict hospital readmission among US patients were eligible for inclusion. Articles without a full text available in English were excluded. A qualitative synthesis covered study characteristics, the ML algorithms used, and model validation, and a quantitative analysis assessed model performance. Model performance in terms of the Area Under the Curve (AUC) was analyzed using R software. The Quality in Prognosis Studies (QUIPS) tool was used to assess the quality of the reviewed studies.
Results: Of 522 citations reviewed, 43 studies met the inclusion criteria. A majority of the studies used electronic health records (24, 56%), followed by population-based data sources (15, 35%) and administrative claims data (4, 9%). The most common algorithms were tree-based methods (23, 53%), neural networks (NN) (14, 33%), regularized logistic regression (12, 28%), and support vector machines (SVM) (10, 23%). Most of these studies (37, 85%) were of high quality. A majority of these studies (28, 65%) reported ML algorithms with an AUC above 0.70. Reported AUCs varied considerably, with a median of 0.68 (IQR: 0.64-0.76; range: 0.50-0.90).
Conclusions: Tree-based methods, NN, regularized logistic regression, and SVM are the ML algorithms commonly used to predict hospital readmission in the US. Further research is needed to compare the performance of ML algorithms for hospital readmission prediction.
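The quantitative summary of reported AUCs (share above 0.70, median, IQR, and range) can be reproduced on any list of values. The review used R, so the Python sketch below is only an illustrative stand-in, and the AUC values are hypothetical. Quartiles use `statistics.quantiles` with `method="inclusive"`, which applies the same linear interpolation as R's default `quantile()` (type 7). Other quantile definitions can shift the IQR bounds slightly.

```python
import statistics
from typing import Dict, Sequence


def summarize_aucs(aucs: Sequence[float], threshold: float = 0.70) -> Dict[str, float]:
    """Summarize a collection of reported AUCs (share above threshold, median, IQR, range)."""
    if len(aucs) < 2:
        raise ValueError("need at least two AUC values")
    q1, _, q3 = statistics.quantiles(aucs, n=4, method="inclusive")
    return {
        "share_above_threshold": sum(a > threshold for a in aucs) / len(aucs),
        "median": statistics.median(aucs),
        "q1": q1,
        "q3": q3,
        "min": min(aucs),
        "max": max(aucs),
    }


if __name__ == "__main__":
    reported = [0.50, 0.60, 0.70, 0.80, 0.90]  # hypothetical AUCs
    print(summarize_aucs(reported))
```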
Project description:
Lapatinib is used for the treatment of metastatic HER2(+) breast cancer. We aimed to establish a prediction model for lapatinib dose using machine learning and deep learning techniques based on a real-world study. A total of 149 patients with breast cancer were enrolled from July 2016 to June 2017 at Fudan University Shanghai Cancer Center. A sequential forward selection algorithm based on random forest was applied for variable selection. Twelve machine learning and deep learning algorithms were compared in terms of their predictive abilities (logistic regression, SVM, random forest, Adaboost, XGBoost, GBDT, LightGBM, CatBoost, TabNet, ANN, Super TML, and Wide&Deep). TabNet showed the best performance (accuracy = 0.82, AUC = 0.83) and was chosen to construct the prediction model. Four variables that strongly correlated with lapatinib dose were then ranked by importance score as follows: treatment protocol, weight, number of chemotherapy treatments, and number of metastases. Finally, the confusion matrix was used to validate the model for the 1,250 mg lapatinib dose regimen (precision = 81%, recall = 95%) and the 1,000 mg regimen (precision = 87%, recall = 64%). In conclusion, we established a deep learning model that predicts lapatinib dose from important influencing variables selected from real-world evidence, supporting an optimal individualized dose regimen with good predictive performance.
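Sequential forward selection is a greedy wrapper method. It starts from an empty set, repeatedly adds the single variable that most improves a model score, and stops when no addition helps. The study wrapped a random forest. In this minimal sketch, the score function is a hypothetical stand-in (in practice it might be cross-validated random-forest accuracy), and the feature names and gains are toy values.

```python
from typing import Callable, List, Sequence, Tuple

ScoreFn = Callable[[List[str]], float]


def sequential_forward_selection(
    features: Sequence[str], score_fn: ScoreFn, tol: float = 0.0
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Greedily add the feature that most improves score_fn; stop when none does.

    Returns the selected features in the order they were added, plus a
    history of (feature, score after adding it).
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    history: List[Tuple[str, float]] = []
    while remaining:
        # Score every candidate set formed by adding one remaining feature.
        scores = {f: score_fn(selected + [f]) for f in remaining}
        feature = max(scores, key=scores.get)
        if scores[feature] <= best_score + tol:
            break  # no remaining feature improves the score
        selected.append(feature)
        remaining.remove(feature)
        best_score = scores[feature]
        history.append((feature, best_score))
    return selected, history


if __name__ == "__main__":
    # Toy score (hypothetical): each feature adds a fixed gain, each costs 0.05.
    gains = {"x1": 0.30, "x2": 0.20, "x3": 0.10, "x4": 0.06, "x5": 0.01}

    def toy_score(subset: List[str]) -> float:
        return sum(gains[f] for f in subset) - 0.05 * len(subset)

    chosen, steps = sequential_forward_selection(list(gains), toy_score)
    print(chosen)  # ['x1', 'x2', 'x3', 'x4']
```

Because each step rescores every remaining candidate, the cost grows roughly quadratically with the number of variables. This is manageable for a modest candidate pool but expensive when each score is a full cross-validated model fit.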
Project description:
Objective: The goal of this study was to evaluate the performance of machine learning (ML) techniques in predicting survival for patients with chordoma, in comparison with the standard Cox proportional hazards (CoxPH) model.
Methods: In this population-based cohort study, we used the Surveillance, Epidemiology, and End Results (SEER) database of consecutive newly diagnosed chordoma cases between January 2000 and December 2018 to create and validate three ML survival models as well as a traditional CoxPH model. The dataset was randomly divided into training and validation sets. Hyperparameters were tuned on the training set using a 1000-iteration random search with fivefold cross-validation. The concordance index (C-index), Brier score, and integrated Brier score were used to evaluate model performance. Receiver operating characteristic (ROC) curves, calibration curves, and the area under the ROC curve (AUC) were used to assess the reliability of the models in predicting 5- and 10-year survival probabilities.
Results: A total of 724 patients with chordoma were divided into training (n = 508) and validation (n = 216) cohorts. Cox regression identified nine significant prognostic factors (p < 0.05). The ML models outperformed the CoxPH model, with DeepSurv having the highest C-index (0.795) and the best discrimination for 5- and 10-year survival (AUC 0.84 and 0.88, respectively). Calibration curves showed strong agreement between DeepSurv predictions and actual survival. Risk stratification with the DeepSurv model effectively discriminated between high- and low-risk groups (p < 0.01). The optimized DeepSurv model was implemented as a web application for clinical use, available at https://hust-chengp-ml-chordoma-app-19rjyr.streamlitapp.com/.
Conclusion: ML algorithms based on time-to-event outcomes are effective in predicting chordoma survival, with DeepSurv showing the best discrimination and calibration.
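For censored survival data, the Brier score at time t is commonly computed with inverse-probability-of-censoring weights (IPCW). The censoring distribution G is estimated by Kaplan-Meier, and the integrated Brier score averages the score over a time range. The abstract does not say how these metrics were implemented, so the sketch below makes the following assumptions:

- It uses the IPCW formulation described above.
- The function names are hypothetical.
- The integral is approximated with the trapezoidal rule.

```python
from typing import Callable, List, Sequence, Tuple


def km_censoring(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of G(t) = P(censoring time > t).

    Censoring (events == 0) is treated as the 'event' here. The returned
    function gives G(t), or G(t-) (the value just before t) when left=True.
    """
    steps: List[Tuple[float, float]] = []
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        censored = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((t, surv))

    def G(t: float, left: bool = False) -> float:
        value = 1.0
        for ti, si in steps:
            if ti < t or (ti == t and not left):
                value = si
            else:
                break
        return value

    return G


def brier_score(t: float, times: Sequence[float], events: Sequence[int],
                surv_at_t: Sequence[float]) -> float:
    """IPCW Brier score at time t; surv_at_t[i] is the predicted P(T_i > t)."""
    G = km_censoring(times, events)
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei == 1:
            total += s ** 2 / G(ti, left=True)  # died by t: true status is 0
        elif ti > t:
            total += (1.0 - s) ** 2 / G(t)      # still alive at t: true status is 1
        # censored before t: contributes 0 (its weight is redistributed via G)
    return total / len(times)


def integrated_brier(grid: Sequence[float], scores: Sequence[float]) -> float:
    """Trapezoidal average of Brier scores evaluated on an increasing time grid."""
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


if __name__ == "__main__":
    # Hypothetical data: survival times, event flags, predicted P(T > 2.5).
    print(brier_score(2.5, [1, 2, 3], [1, 0, 1], [0.5, 0.5, 0.5]))  # 0.25
```

Without censoring, G equals 1 everywhere, and the score reduces to the mean squared difference between predicted survival probability and observed status at t.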
Project description:
Background: Depression is one of the most widespread, pervasive, and troublesome illnesses in the world and causes dysfunction in various spheres of individual and social life. Regrettably, despite receiving evidence-based antidepressant medication, up to 70% of patients continue to experience troublesome symptoms. Quetiapine, one of the most commonly prescribed antipsychotic medications worldwide, has been reported to be an effective augmentation strategy for antidepressants. Choosing the right quetiapine dose and personalizing quetiapine treatment are frequently challenging for clinicians. This study aimed to identify important variables influencing quetiapine dose by making maximal use of real-world data, and to develop a machine learning model that predicts quetiapine dose to support the selection of treatment regimens.
Methods: The study comprised 308 patients with depression who were treated with quetiapine and hospitalized at the First Hospital of Hebei Medical University from November 1, 2019, to August 31, 2022. A univariate analysis was applied to identify the important variables influencing the dose of quetiapine. The predictive abilities of nine machine learning models (XGBoost, LightGBM, RF, GBDT, SVM, LR, ANN, DT) were compared, and the algorithm with the best performance was chosen to develop the prediction model.
Results: Four predictors were selected from 38 variables by the univariate analysis (p < 0.05): the quetiapine therapeutic drug monitoring (TDM) value, age, mean corpuscular hemoglobin concentration, and total bile acid. XGBoost had the greatest predictive performance of the nine models (accuracy = 0.69) and was used to create the prediction model for quetiapine dose. In the testing cohort (62 cases), the quetiapine dose regimen was correctly predicted for 43 cases. In the dose subgroup analysis, the AUROCs for patients with daily doses of 100 mg, 200 mg, 300 mg, and 400 mg were 0.99, 0.75, 0.93, and 0.86, respectively.
Conclusions: In this work, machine learning techniques were used for the first time to estimate the dose of quetiapine for patients with depression, which is valuable for clinical drug recommendations.
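The per-dose AUROCs above are one-vs-rest values: each dose class is scored against all other classes using that class's predicted probability. The minimal sketch below assumes the model outputs one probability per dose class. The data and function names are hypothetical. The sketch also checks that 43 correct predictions out of 62 test cases gives the reported accuracy of 0.69.

```python
from typing import Dict, Sequence


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    This is the Mann-Whitney formulation; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_doses: Sequence[int],
                    class_probs: Sequence[Dict[int, float]], dose: int) -> float:
    """AUROC for one dose class (positive) against all other classes (negative)."""
    y = [1 if d == dose else 0 for d in true_doses]
    s = [probs[dose] for probs in class_probs]
    return binary_auc(y, s)


if __name__ == "__main__":
    # Overall accuracy from the abstract: 43 correct of 62 test cases.
    print(round(43 / 62, 2))  # 0.69
    # Hypothetical predicted probabilities for three patients and two doses.
    true_doses = [100, 200, 100]
    probs = [{100: 0.7, 200: 0.3}, {100: 0.4, 200: 0.6}, {100: 0.4, 200: 0.6}]
    print(one_vs_rest_auc(true_doses, probs, 100))  # 0.75
```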
Project description:
Background: Machine learning (ML) offers rigorous statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely, given the rapid technological developments of recent years.
Objective: This study aims to synthesize the literature on ML and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD).
Methods: We conducted a scoping review following the Arksey and O'Malley framework to rapidly map research activity in ML for predicting PPD. Two independent researchers searched PsycINFO, PubMed, IEEE Xplore, and the ACM Digital Library in September 2020 to identify relevant publications from the past 12 years. Data on each article's ML model, data type, and study results were extracted.
Results: A total of 14 studies were identified. All studies reported using supervised learning techniques to predict PPD. Support vector machine and random forest were the most commonly used algorithms, alongside Naive Bayes, regression, artificial neural network, decision trees, and XGBoost (Extreme Gradient Boosting). There was considerable heterogeneity in the best-performing ML algorithm across the selected studies. The reported area under the receiver operating characteristic curve values were 0.78-0.86 for support vector machine, 0.88 for random forest, 0.80 for XGBoost, and 0.93 for logistic regression.
Conclusions: ML algorithms can analyze larger data sets and perform more advanced computations, which can significantly improve the early detection of PPD. Further clinical research collaborations are required to fine-tune ML algorithms for prediction and treatment. ML might become part of evidence-based practice alongside clinical knowledge and existing research evidence.
Project description:
OBJECTIVE: To determine how machine learning has been applied to prediction applications in population health contexts; specifically, to describe which outcomes have been studied, which data sources are most widely used, and whether reporting of machine learning predictive models aligns with established reporting guidelines.
DESIGN: A scoping review.
DATA SOURCES: MEDLINE, EMBASE, CINAHL, ProQuest, Scopus, Web of Science, Cochrane Library, INSPEC and ACM Digital Library were searched on 18 July 2018.
ELIGIBILITY CRITERIA: We included English-language articles published between 1980 and 2018 that used machine learning to predict population-health-related outcomes. We excluded studies that only used logistic regression or were restricted to a clinical context.
DATA EXTRACTION AND SYNTHESIS: We summarised findings extracted from published reports, which included general study characteristics, aspects of model development, reporting of results and model discussion items.
RESULTS: Of 22 618 articles found by our search, 231 were included in the review. The USA (n=71, 30.74%) and China (n=40, 17.32%) produced the most studies. Cardiovascular disease (n=22, 9.52%) was the most studied outcome. The median number of observations was 5414 (IQR=16 543.5) and the median number of features was 17 (IQR=31). Health records (n=126, 54.5%) and investigator-generated data (n=86, 37.2%) were the most common data sources. Many studies did not incorporate recommended guidelines on machine learning and predictive modelling. Predictive discrimination was commonly assessed using the area under the receiver operating characteristic curve (n=98, 42.42%), and calibration was rarely assessed (n=22, 9.52%).
CONCLUSIONS: Machine learning applications in population health have concentrated on regions and diseases well represented in traditional data sources and have infrequently used big data. Important aspects of model development were under-reported. Greater use of big data and of reporting guidelines for predictive modelling could improve machine learning applications in population health.
REGISTRATION NUMBER: Registered on the Open Science Framework on 17 July 2018 (available at https://osf.io/rnqe6/).
Project description:
Background: Hepatocellular carcinoma (HCC) is a leading cause of cancer-related deaths worldwide and is often linked to chronic inflammation. Our study aimed to probe inflammation pathways at the genetic level and pinpoint biomarkers linked to HCC patient survival.
Methods: We analyzed gene transcriptome data from 246 patients with resectable stage I or II HCC from The Cancer Genome Atlas (TCGA). After selecting 917 inflammation-related genes (IRGs), we identified 104 differentially expressed genes (DEGs) through differential expression analysis. Two significant prognostic DEGs, S100A9 and PBK, were identified using LASSO and Cox regression and formed the basis of a risk score model. We conducted functional enrichment and immune landscape analyses, validated our findings in 170 patients from the GSE14520 dataset, and performed mutational analysis using TCGA somatic mutation data.
Results: We analyzed 296 samples (246 HCC, 50 normal liver), and survival differed significantly between the high- and low-risk groups defined by our risk score model. Functional enrichment analysis revealed inflammation-associated pathways. Validation in the GSE14520 dataset confirmed the predictive ability of our risk score, and we also explored clinical correlations.
Conclusion: Our study delineates inflammation-related genomic changes in HCC and reveals prognostic biomarkers with potential therapeutic implications. These findings deepen our understanding of HCC molecular mechanisms and may guide personalized therapeutic approaches, ultimately improving patient outcomes.
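A LASSO-Cox risk score of this kind is usually the linear predictor: the sum of each gene's Cox coefficient multiplied by its expression. Patients are then split into high- and low-risk groups at a cutoff. The abstract gives neither the coefficients nor the cutoff. The sketch below therefore uses hypothetical coefficients for S100A9 and PBK and assumes a median split.

```python
from statistics import median
from typing import Dict, List, Sequence


def risk_scores(expression: Sequence[Dict[str, float]],
                coefficients: Dict[str, float]) -> List[float]:
    """Linear predictor per patient: sum over genes of coefficient * expression."""
    return [sum(coef * sample[gene] for gene, coef in coefficients.items())
            for sample in expression]


def stratify(scores: Sequence[float]) -> List[str]:
    """Label scores above the median as 'high' risk, the rest as 'low' (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


if __name__ == "__main__":
    coefs = {"S100A9": 0.5, "PBK": 1.0}  # hypothetical, not the study's values
    samples = [
        {"S100A9": 2.0, "PBK": 1.0},
        {"S100A9": 0.0, "PBK": 0.5},
        {"S100A9": 4.0, "PBK": 3.0},
        {"S100A9": 1.0, "PBK": 0.0},
    ]
    scores = risk_scores(samples, coefs)
    print(scores)            # [2.0, 0.5, 5.0, 0.5]
    print(stratify(scores))  # ['high', 'low', 'high', 'low']
```

To validate on an external cohort such as GSE14520, the coefficients (and, typically, the cutoff) are fixed from the training data and applied unchanged.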
Project description:
Inkjet printing has been extensively explored in recent years to produce personalised medicines due to its low cost and versatility. Pharmaceutical applications have ranged from orodispersible films to complex polydrug implants. However, the multi-factorial nature of the inkjet printing process makes formulation (e.g., composition, surface tension, and viscosity) and printing parameter optimization (e.g., nozzle diameter, peak voltage, and drop spacing) an empirical and time-consuming endeavour. Given the wealth of publicly available data on pharmaceutical inkjet printing, there is potential to develop a predictive model of inkjet printing outcomes. In this study, machine learning (ML) models (random forest, multilayer perceptron, and support vector machine) were developed to predict printability and drug dose using a dataset of 687 formulations, consolidated from in-house and literature-mined data on inkjet-printed formulations. The optimized ML models predicted the printability of formulations with an accuracy of 97.22% and the quality of the prints with an accuracy of 97.14%. This study demonstrates that ML models can feasibly provide predictive insight into inkjet printing outcomes before formulations are prepared, saving resources and time.