Project description:There is increasing recognition that asthma and eczema are heterogeneous diseases. We investigated the predictive ability of a spectrum of machine learning methods to disambiguate clinical sub-groups of asthma, wheeze and eczema, using a large heterogeneous set of attributes in an unselected population. The aim was to identify to what extent such heterogeneous information can be combined to reveal specific clinical manifestations.The study population comprised a cross-sectional sample of adults, and included representatives of the general population enriched by subjects with asthma. Linear and non-linear machine learning methods, from logistic regression to random forests, were fit on a large attribute set including demographic, clinical and laboratory features, genetic profiles and environmental exposures. Outcome of interest were asthma, wheeze and eczema encoded by different operational definitions. Model validation was performed via bootstrapping.The study population included 554 adults, 42% male, 38% previous or current smokers. Proportion of asthma, wheeze, and eczema diagnoses was 16.7%, 12.3%, and 21.7%, respectively. Models were fit on 223 non-genetic variables plus 215 single nucleotide polymorphisms. In general, non-linear models achieved higher sensitivity and specificity than other methods, especially for asthma and wheeze, less for eczema, with areas under receiver operating characteristic curve of 84%, 76% and 64%, respectively. Our findings confirm that allergen sensitisation and lung function characterise asthma better in combination than separately. The predictive ability of genetic markers alone is limited. For eczema, new predictors such as bio-impedance were discovered.More usefully-complex modelling is the key to a better understanding of disease mechanisms and personalised healthcare: further advances are likely with the incorporation of more factors/attributes and longitudinal measures.
Project description:Asthma in children is a heterogeneous disease manifested by various phenotypes and endotypes. The level of disease control, as well as the effectiveness of anti-inflammatory treatment, is variable and inadequate in a significant portion of patients. By applying machine learning algorithms, we aimed to predict the treatment success in a pediatric asthma cohort and to identify the key variables for understanding the underlying mechanisms. We predicted the treatment outcomes in children with mild to severe asthma (N = 365), according to changes in asthma control, lung function (FEV1 and MEF50) and FENO values after 6 months of controller medication use, using Random Forest and AdaBoost classifiers. The highest prediction power is achieved for control- and, to a lower extent, for FENO-related treatment outcomes, especially in younger children. The most predictive variables for asthma control are related to asthma severity and the total IgE, which were also predictive for FENO-based outcomes. MEF50-related treatment outcomes were better predicted than the FEV1-based response, and one of the best predictive variables for this response was hsCRP, emphasizing the involvement of the distal airways in childhood asthma. Our results suggest that asthma control- and FENO-based outcomes can be more accurately predicted using machine learning than the outcomes according to FEV1 and MEF50. This supports the symptom control-based asthma management approach and its complementary FENO-guided tool in children. T2-high asthma seemed to respond best to the anti-inflammatory treatment. The results of this study in predicting the treatment success will help to enable treatment optimization and to implement the concept of precision medicine in pediatric asthma treatment.
Project description:Asthma is a common disease with profoundly variable natural history and patient morbidity. Heterogeneity has long been appreciated, and much work has focused on identifying subgroups of patients with similar pathobiological underpinnings. Previous studies of the Severe Asthma Research Program (SARP) cohort linked gene expression changes to specific clinical and physiologic characteristics. While invaluable for hypothesis generation, these data include extensive candidate gene lists that complicate target identification and validation. In this analysis, we performed unsupervised clustering of the SARP cohort using bronchial epithelial cell gene expression data, identifying a transcriptional signature for participants suffering exacerbation-prone asthma with impaired lung function. Clinically, participants in this asthma cluster exhibited a mixed inflammatory process and bore transcriptional hallmarks of NF-κB and activator protein 1 (AP-1) activation, despite high corticosteroid exposure. Using supervised machine learning, we found a set of 31 genes that classified patients with high accuracy and could reconstitute clinical and transcriptional hallmarks of our patient clustering in an external cohort. Of these genes, IL18R1 (IL-18 Receptor 1) negatively associated with lung function and was highly expressed in the most severe patient cluster. We validated IL18R1 protein expression in lung tissue and identified downstream NF-κB and AP-1 activity, supporting IL-18 signaling in severe asthma pathogenesis and highlighting this approach for gene and pathway discovery.
Project description:The aim of this observational retrospective study is to improve early risk stratification of hospitalized Covid-19 patients by predicting in-hospital mortality, transfer to intensive care unit (ICU) and mechanical ventilation from electronic health record data of the first 24 h after admission. Our machine learning model predicts in-hospital mortality (AUC = 0.918), transfer to ICU (AUC = 0.821) and the need for mechanical ventilation (AUC = 0.654) from a few laboratory data of the first 24 h after admission. Models based on dichotomous features indicating whether a laboratory value exceeds or falls below a threshold perform nearly as good as models based on numerical features. We devise completely data-driven and interpretable machine-learning models for the prediction of in-hospital mortality, transfer to ICU and mechanical ventilation for hospitalized Covid-19 patients within 24 h after admission. Numerical values of. CRP and blood sugar and dichotomous indicators for increased partial thromboplastin time (PTT) and glutamic oxaloacetic transaminase (GOT) are amongst the best predictors.
Project description:BackgroundThere are no objective, biological markers that can robustly predict methylphenidate response in attention deficit hyperactivity disorder. This study aimed to examine whether applying machine learning approaches to pretreatment demographic, clinical questionnaire, environmental, neuropsychological, neuroimaging, and genetic information can predict therapeutic response following methylphenidate administration.MethodsThe present study included 83 attention deficit hyperactivity disorder youth. At baseline, parents completed the ADHD Rating Scale-IV and Disruptive Behavior Disorder rating scale, and participants undertook the continuous performance test, Stroop color word test, and resting-state functional MRI scans. The dopamine transporter gene, dopamine D4 receptor gene, alpha-2A adrenergic receptor gene (ADRA2A) and norepinephrine transporter gene polymorphisms, and blood lead and urine cotinine levels were also measured. The participants were enrolled in an 8-week, open-label trial of methylphenidate. Four different machine learning algorithms were used for data analysis.ResultsSupport vector machine classification accuracy was 84.6% (area under receiver operating characteristic curve 0.84) for predicting methylphenidate response. The age, weight, ADRA2A MspI and DraI polymorphisms, lead level, Stroop color word test performance, and oppositional symptoms of Disruptive Behavior Disorder rating scale were identified as the most differentiating subset of features.ConclusionsOur results provide preliminary support to the translational development of support vector machine as an informative method that can assist in predicting treatment response in attention deficit hyperactivity disorder, though further work is required to provide enhanced levels of classification performance.
Project description:ObjectiveThe Scleroderma: Cyclophosphamide or Transplantation (SCOT) trial demonstrated clinical benefit of haematopoietic stem cell transplant (HSCT) compared with cyclophosphamide (CYC). We mapped PBC (peripheral blood cell) samples from the SCOT clinical trial to scleroderma intrinsic subsets and tested the hypothesis that they predict long-term response to HSCT.MethodsWe analysed gene expression from PBCs of SCOT participants to identify differential treatment response. PBC gene expression data were generated from 63 SCOT participants at baseline and follow-up timepoints. Participants who completed treatment protocol were stratified by intrinsic gene expression subsets at baseline, evaluated for event-free survival (EFS) and analysed for differentially expressed genes (DEGs).ResultsParticipants from the fibroproliferative subset on HSCT experienced significant improvement in EFS compared with fibroproliferative participants on CYC (p=0.0091). In contrast, EFS did not significantly differ between CYC and HSCT arms for the participants from the normal-like subset (p=0.77) or the inflammatory subset (p=0.1). At each timepoint, we observed considerably more DEGs in HSCT arm compared with CYC arm with HSCT arm showing significant changes in immune response pathways.ConclusionsParticipants from the fibroproliferative subset showed the most significant long-term benefit from HSCT compared with CYC. This study suggests that intrinsic subset stratification of patients may be used to identify patients with SSc who receive significant benefit from HSCT.
Project description:IntroductionAsthma is a long-term condition with rapid onset worsening of symptoms ('attacks') which can be unpredictable and may prove fatal. Models predicting asthma attacks require high sensitivity to minimise mortality risk, and high specificity to avoid unnecessary prescribing of preventative medications that carry an associated risk of adverse events. We aim to create a risk score to predict asthma attacks in primary care using a statistical learning approach trained on routinely collected electronic health record data.Methods and analysisWe will employ machine-learning classifiers (naïve Bayes, support vector machines, and random forests) to create an asthma attack risk prediction model, using the Asthma Learning Health System (ALHS) study patient registry comprising 500?000 individuals across 75 Scottish general practices, with linked longitudinal primary care prescribing records, primary care Read codes, accident and emergency records, hospital admissions and deaths. Models will be compared on a partition of the dataset reserved for validation, and the final model will be tested in both an unseen partition of the derivation dataset and an external dataset from the Seasonal Influenza Vaccination Effectiveness II (SIVE II) study.Ethics and disseminationPermissions for the ALHS project were obtained from the South East Scotland Research Ethics Committee 02 [16/SS/0130] and the Public Benefit and Privacy Panel for Health and Social Care (1516-0489). Permissions for the SIVE II project were obtained from the Privacy Advisory Committee (National Services NHS Scotland) [68/14] and the National Research Ethics Committee West Midlands-Edgbaston [15/WM/0035]. The subsequent research paper will be submitted for publication to a peer-reviewed journal and code scripts used for all components of the data cleaning, compiling, and analysis will be made available in the open source GitHub website (https://github.com/hollytibble).
Project description:Early detection of severe asthma exacerbations through home monitoring data in patients with stable mild-to-moderate chronic asthma could help to timely adjust medication. We evaluated the potential of machine learning methods compared to a clinical rule and logistic regression to predict severe exacerbations. We used daily home monitoring data from two studies in asthma patients (development: n = 165 and validation: n = 101 patients). Two ML models (XGBoost, one class SVM) and a logistic regression model provided predictions based on peak expiratory flow and asthma symptoms. These models were compared with an asthma action plan rule. Severe exacerbations occurred in 0.2% of all daily measurements in the development (154/92,787 days) and validation cohorts (94/40,185 days). The AUC of the best performing XGBoost was 0.85 (0.82-0.87) and 0.88 (0.86-0.90) for logistic regression in the validation cohort. The XGBoost model provided overly extreme risk estimates, whereas the logistic regression underestimated predicted risks. Sensitivity and specificity were better overall for XGBoost and logistic regression compared to one class SVM and the clinical rule. We conclude that ML models did not beat logistic regression in predicting short-term severe asthma exacerbations based on home monitoring data. Clinical application remains challenging in settings with low event incidence and high false alarm rates with high sensitivity.
Project description:Phospholipidosis is an adverse effect caused by numerous cationic amphiphilic drugs and can affect many cell types. It is characterized by the excess accumulation of phospholipids and is most reliably identified by electron microscopy of cells revealing the presence of lamellar inclusion bodies. The development of phospholipidosis can cause a delay in the drug development process, and the importance of computational approaches to the problem has been well documented. Previous work on predictive methods for phospholipidosis showed that state of the art machine learning methods produced the best results. Here we extend this work by looking at a larger data set mined from the literature. We find that circular fingerprints lead to better models than either E-Dragon descriptors or a combination of the two. We also observe very similar performance in general between Random Forest and Support Vector Machine models.