Project description: There is increasing recognition that asthma and eczema are heterogeneous diseases. We investigated the predictive ability of a spectrum of machine learning methods to disambiguate clinical sub-groups of asthma, wheeze and eczema, using a large heterogeneous set of attributes in an unselected population. The aim was to identify to what extent such heterogeneous information can be combined to reveal specific clinical manifestations. The study population comprised a cross-sectional sample of adults, and included representatives of the general population enriched by subjects with asthma. Linear and non-linear machine learning methods, from logistic regression to random forests, were fit on a large attribute set including demographic, clinical and laboratory features, genetic profiles and environmental exposures. Outcomes of interest were asthma, wheeze and eczema encoded by different operational definitions. Model validation was performed via bootstrapping. The study population included 554 adults, 42% male, 38% previous or current smokers. The proportions of asthma, wheeze, and eczema diagnoses were 16.7%, 12.3%, and 21.7%, respectively. Models were fit on 223 non-genetic variables plus 215 single nucleotide polymorphisms. In general, non-linear models achieved higher sensitivity and specificity than other methods, especially for asthma and wheeze, less for eczema, with areas under the receiver operating characteristic curve of 84%, 76% and 64%, respectively. Our findings confirm that allergen sensitisation and lung function characterise asthma better in combination than separately. The predictive ability of genetic markers alone is limited. For eczema, new predictors such as bio-impedance were discovered. More usefully-complex modelling is the key to a better understanding of disease mechanisms and personalised healthcare: further advances are likely with the incorporation of more factors/attributes and longitudinal measures.
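The abstract above reports areas under the receiver operating characteristic curve and states that validation used bootstrapping, without giving the procedure. The following is a minimal sketch of one common variant, assuming binary 0/1 labels and already-computed model scores: it resamples subjects with replacement and recomputes the AUC each time. The function names `auc` and `bootstrap_auc` are hypothetical, and the authors may well have refit their models on each resample, which this sketch does not do.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count pairs where the positive case outranks the negative; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Percentile interval: resample subjects with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    # Approximate 2.5th and 97.5th percentiles of the bootstrap distribution.
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]
```

Calling `bootstrap_auc(labels, scores)` returns a lower and an upper bound; a wide interval suggests that a point estimate such as 64% for eczema should be read cautiously.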
Project description: Asthma in children is a heterogeneous disease manifested by various phenotypes and endotypes. The level of disease control, as well as the effectiveness of anti-inflammatory treatment, is variable and inadequate in a significant portion of patients. By applying machine learning algorithms, we aimed to predict the treatment success in a pediatric asthma cohort and to identify the key variables for understanding the underlying mechanisms. We predicted the treatment outcomes in children with mild to severe asthma (N = 365), according to changes in asthma control, lung function (FEV1 and MEF50) and FENO values after 6 months of controller medication use, using Random Forest and AdaBoost classifiers. The highest prediction power was achieved for control- and, to a lesser extent, for FENO-related treatment outcomes, especially in younger children. The most predictive variables for asthma control were related to asthma severity and the total IgE, which were also predictive for FENO-based outcomes. MEF50-related treatment outcomes were better predicted than the FEV1-based response, and one of the best predictive variables for this response was hsCRP, emphasizing the involvement of the distal airways in childhood asthma. Our results suggest that asthma control- and FENO-based outcomes can be more accurately predicted using machine learning than the outcomes according to FEV1 and MEF50. This supports the symptom control-based asthma management approach and its complementary FENO-guided tool in children. T2-high asthma seemed to respond best to the anti-inflammatory treatment. The results of this study in predicting the treatment success will help to enable treatment optimization and to implement the concept of precision medicine in pediatric asthma treatment.
Project description: Asthma is a common disease with profoundly variable natural history and patient morbidity. Heterogeneity has long been appreciated, and much work has focused on identifying subgroups of patients with similar pathobiological underpinnings. Previous studies of the Severe Asthma Research Program (SARP) cohort linked gene expression changes to specific clinical and physiologic characteristics. While invaluable for hypothesis generation, these data include extensive candidate gene lists that complicate target identification and validation. In this analysis, we performed unsupervised clustering of the SARP cohort using bronchial epithelial cell gene expression data, identifying a transcriptional signature for participants suffering exacerbation-prone asthma with impaired lung function. Clinically, participants in this asthma cluster exhibited a mixed inflammatory process and bore transcriptional hallmarks of NF-κB and activator protein 1 (AP-1) activation, despite high corticosteroid exposure. Using supervised machine learning, we found a set of 31 genes that classified patients with high accuracy and could reconstitute clinical and transcriptional hallmarks of our patient clustering in an external cohort. Of these genes, IL18R1 (IL-18 Receptor 1) negatively associated with lung function and was highly expressed in the most severe patient cluster. We validated IL18R1 protein expression in lung tissue and identified downstream NF-κB and AP-1 activity, supporting IL-18 signaling in severe asthma pathogenesis and highlighting this approach for gene and pathway discovery.
Project description: The aim of this observational retrospective study is to improve early risk stratification of hospitalized Covid-19 patients by predicting in-hospital mortality, transfer to intensive care unit (ICU) and mechanical ventilation from electronic health record data of the first 24 h after admission. Our machine learning model predicts in-hospital mortality (AUC = 0.918), transfer to ICU (AUC = 0.821) and the need for mechanical ventilation (AUC = 0.654) from a few laboratory values of the first 24 h after admission. Models based on dichotomous features indicating whether a laboratory value exceeds or falls below a threshold perform nearly as well as models based on numerical features. We devise completely data-driven and interpretable machine-learning models for the prediction of in-hospital mortality, transfer to ICU and mechanical ventilation for hospitalized Covid-19 patients within 24 h after admission. Numerical values of CRP and blood sugar and dichotomous indicators for increased partial thromboplastin time (PTT) and glutamic oxaloacetic transaminase (GOT) are amongst the best predictors.
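The dichotomous features described in the abstract above, which flag whether a laboratory value falls below or exceeds a threshold, can be sketched as follows. This is only an illustration: the field names and the reference ranges in `REFERENCE_RANGES` are hypothetical placeholders, since the abstract does not state the thresholds used, and the handling of missing values is an assumption.

```python
# Hypothetical reference ranges (placeholders, not taken from the study).
REFERENCE_RANGES = {
    "ptt_seconds": (25.0, 35.0),
    "got_units_per_l": (0.0, 50.0),
}

def dichotomize(labs, ranges=REFERENCE_RANGES):
    """Map each numeric lab value to two 0/1 indicators: below and above range."""
    features = {}
    for name, (low, high) in ranges.items():
        value = labs.get(name)
        # Assumption: a missing measurement yields 0 for both indicators.
        features[f"{name}_low"] = int(value is not None and value < low)
        features[f"{name}_high"] = int(value is not None and value > high)
    return features
```

For example, `dichotomize({"ptt_seconds": 40.0})` sets only `ptt_seconds_high` to 1. Indicators of this kind are easier to read off than raw values, which fits the interpretability goal stated in the abstract.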
Project description: Background: There are no objective, biological markers that can robustly predict methylphenidate response in attention deficit hyperactivity disorder. This study aimed to examine whether applying machine learning approaches to pretreatment demographic, clinical questionnaire, environmental, neuropsychological, neuroimaging, and genetic information can predict therapeutic response following methylphenidate administration. Methods: The present study included 83 youths with attention deficit hyperactivity disorder. At baseline, parents completed the ADHD Rating Scale-IV and Disruptive Behavior Disorder rating scale, and participants undertook the continuous performance test, Stroop color word test, and resting-state functional MRI scans. The dopamine transporter gene, dopamine D4 receptor gene, alpha-2A adrenergic receptor gene (ADRA2A) and norepinephrine transporter gene polymorphisms, and blood lead and urine cotinine levels were also measured. The participants were enrolled in an 8-week, open-label trial of methylphenidate. Four different machine learning algorithms were used for data analysis. Results: Support vector machine classification accuracy was 84.6% (area under receiver operating characteristic curve 0.84) for predicting methylphenidate response. Age, weight, ADRA2A MspI and DraI polymorphisms, lead level, Stroop color word test performance, and oppositional symptoms of the Disruptive Behavior Disorder rating scale were identified as the most differentiating subset of features. Conclusions: Our results provide preliminary support to the translational development of support vector machine as an informative method that can assist in predicting treatment response in attention deficit hyperactivity disorder, though further work is required to provide enhanced levels of classification performance.
Project description: Background: Although inhaled corticosteroids (ICS) are the first-line therapy for patients with persistent asthma, many patients continue to have exacerbations. We developed machine learning models to predict the ICS response in patients with asthma. Methods: The subjects included asthma patients of European ancestry (n = 1371; 448 children; 916 adults). A genome-wide association study was performed to identify the SNPs associated with ICS response. Using the SNPs identified, two machine learning models were developed to predict ICS response: (1) least absolute shrinkage and selection operator (LASSO) regression and (2) random forest. Results: The LASSO regression model achieved an AUC of 0.71 (95% CI 0.67-0.76; sensitivity: 0.57; specificity: 0.75) in an independent test cohort, and the random forest model achieved an AUC of 0.74 (95% CI 0.70-0.78; sensitivity: 0.70; specificity: 0.68). The genes contributing to the prediction of ICS response included those associated with ICS responses in asthma (TPSAB1, FBXL16), asthma symptoms and severity (ABCA7, CNN2, PTRN3, and BSG/CD147), airway remodeling (ELANE, FSTL3), mucin production (GAL3ST), leukotriene synthesis (GPX4), allergic asthma (ZFPM1, SBNO2), and others. Conclusions: An accurate risk prediction of ICS response can be obtained using machine learning methods, with the potential to inform personalized treatment decisions. Further studies are needed to examine if the integration of richer phenotype data could improve risk prediction.
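The sensitivity and specificity figures reported above depend on the score cutoff at which each model is read, which the abstract does not state. A minimal sketch of the calculation, assuming binary 0/1 labels, scores in [0, 1], and a hypothetical default cutoff of 0.5:

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive       # responder correctly flagged
            fn += not predicted_positive   # responder missed
        else:
            fp += predicted_positive       # non-responder wrongly flagged
            tn += not predicted_positive   # non-responder correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)
```

Raising the threshold trades sensitivity for specificity, so two models with similar AUC can report quite different pairs of values, as the LASSO and random forest results above do.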
Project description: Background: Unfractionated heparin (UFH) is an anticoagulant drug that is considered a high-risk medication because an excessive dose can cause bleeding, whereas an insufficient dose can lead to a recurrent embolic event. Therapeutic response to the initiation of intravenous UFH is monitored using activated partial thromboplastin time (aPTT) as a measure of blood clotting time. Clinicians iteratively adjust the dose of UFH toward a target, indication-defined therapeutic aPTT range using nomograms, but this process can be imprecise and can take ≥36 hours to achieve the target range. Thus, a more efficient approach is required. Objective: In this study, we aimed to develop and validate a machine learning (ML) algorithm to predict aPTT within 12 hours after a specified bolus and maintenance dose of UFH. Methods: This was a retrospective cohort study of 3019 patient episodes of care from January 2017 to August 2020 using data collected from electronic health records of 5 hospitals in Queensland, Australia. Data from 4 hospitals were used to build and test ensemble models using cross-validation, whereas data from the fifth hospital were used for external validation. We built 2 ML models: a regression model to predict the aPTT value after a UFH bolus dose and a multiclass model to predict the aPTT, classified as subtherapeutic (aPTT <70 seconds), therapeutic (aPTT 70-100 seconds), or supratherapeutic (aPTT >100 seconds). Modeling was performed using Driverless AI (H2O), an automated ML tool, and 17 different experiments were iteratively conducted to optimize model accuracy. Results: In predicting aPTT, the best performing model was an ensemble of 4 LightGBM models with a root mean square error of 31.35 (SD 1.37). In predicting the aPTT class using a repurposed data set, the best performing ensemble model achieved an accuracy of 0.599 (SD 0.0289) and an area under the receiver operating characteristic curve of 0.735. External validation yielded similar results: root mean square error of 30.52 (SD 1.29) for the aPTT prediction model, and accuracy of 0.568 (SD 0.0315) and area under the receiver operating characteristic curve of 0.724 for the aPTT multiclassification model. Conclusions: To the best of our knowledge, this is the first ML model applied to intravenous UFH dosing that has been developed and externally validated in a multisite adult general medical and surgical inpatient setting. We present the processes of data collection, preparation, and feature engineering for replication.
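The three aPTT classes and the two headline metrics in the abstract above can be written down directly. The sketch below takes the class boundaries from the text (<70, 70-100, >100 seconds); the function names are hypothetical, and it is not a reconstruction of the Driverless AI pipeline.

```python
import math

def aptt_class(aptt_seconds):
    """Label an aPTT value using the class boundaries stated in the abstract."""
    if aptt_seconds < 70:
        return "subtherapeutic"
    if aptt_seconds <= 100:
        return "therapeutic"
    return "supratherapeutic"

def rmse(observed, predicted):
    """Root mean square error between observed and predicted aPTT values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def class_accuracy(observed, predicted):
    """Share of cases whose predicted aPTT falls in the observed class."""
    hits = sum(aptt_class(o) == aptt_class(p) for o, p in zip(observed, predicted))
    return hits / len(observed)
```

Deriving the class from a predicted value, as `class_accuracy` does, is one possible design; the study instead trained a separate multiclass model, which is why its accuracy and root mean square error are reported independently.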
Project description: Objective: The Scleroderma: Cyclophosphamide or Transplantation (SCOT) trial demonstrated clinical benefit of haematopoietic stem cell transplant (HSCT) compared with cyclophosphamide (CYC). We mapped PBC (peripheral blood cell) samples from the SCOT clinical trial to scleroderma intrinsic subsets and tested the hypothesis that they predict long-term response to HSCT. Methods: We analysed gene expression from PBCs of SCOT participants to identify differential treatment response. PBC gene expression data were generated from 63 SCOT participants at baseline and follow-up timepoints. Participants who completed the treatment protocol were stratified by intrinsic gene expression subsets at baseline, evaluated for event-free survival (EFS) and analysed for differentially expressed genes (DEGs). Results: Participants from the fibroproliferative subset on HSCT experienced significant improvement in EFS compared with fibroproliferative participants on CYC (p=0.0091). In contrast, EFS did not significantly differ between CYC and HSCT arms for the participants from the normal-like subset (p=0.77) or the inflammatory subset (p=0.1). At each timepoint, we observed considerably more DEGs in the HSCT arm than in the CYC arm, with the HSCT arm showing significant changes in immune response pathways. Conclusions: Participants from the fibroproliferative subset showed the most significant long-term benefit from HSCT compared with CYC. This study suggests that intrinsic subset stratification of patients may be used to identify patients with SSc who receive significant benefit from HSCT.