Project description:Introduction: Because Alzheimer's disease (AD) has significant heterogeneity in encephalatrophy and clinical manifestations, AD research faces two critical challenges: eliminating the impact of natural aging and extracting valuable clinical data for patients with AD. Methods: This study attempted to address these challenges by developing a novel machine-learning model called tensorized contrastive principal component analysis (T-cPCA). The objectives of this study were to predict AD progression and identify clinical subtypes while minimizing the influence of natural aging. Results: We leveraged a clinical variable space of 872 features, including almost all AD clinical examinations, which is the most comprehensive AD feature description in current research. T-cPCA yielded the highest accuracy in predicting AD progression by effectively minimizing the confounding effects of natural aging. Discussion: The representative features and pathogenic circuits of the four primary AD clinical subtypes were discovered. As confirmed by clinicians at Tangdu Hospital, the plaque (18F-AV45) distributions of typical patients in the four clinical subtypes are consistent with the representative brain regions found for the four AD subtypes, which offers further novel insights into the underlying mechanisms of AD pathogenesis.
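The idea of contrasting patients against a natural-aging background can be illustrated with plain contrastive PCA. The sketch below is not the authors' tensorized T-cPCA. It assumes the standard cPCA formulation (leading eigenvector of the target covariance minus alpha times the background covariance), and the function names, the `alpha` value and the synthetic two-feature data are all hypothetical.

```python
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_contrastive_direction(target, background, alpha=1.0, iters=500):
    """Leading eigenvector of C_target - alpha * C_background (plain cPCA)."""
    c_t, c_b = covariance(target), covariance(background)
    d = len(c_t)
    m = [[c_t[i][j] - alpha * c_b[i][j] for j in range(d)] for i in range(d)]
    # The contrast matrix can have negative eigenvalues. Adding a Gershgorin
    # bound to the diagonal makes them all non-negative, so power iteration
    # converges to the direction with the largest (signed) eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in m)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) + shift * v[i]
             for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Synthetic data (assumption): feature 0 mimics variation shared with normal
# aging, feature 1 mimics variation that is larger in the patient group.
rng = random.Random(0)
background = [[rng.gauss(0, 4.0), rng.gauss(0, 0.5)] for _ in range(2000)]
target = [[rng.gauss(0, 4.0), rng.gauss(0, 3.0)] for _ in range(2000)]

plain = top_contrastive_direction(target, background, alpha=0.0)
contrastive = top_contrastive_direction(target, background, alpha=1.0)
print("plain PCA direction:", [round(x, 2) for x in plain])
print("contrastive direction:", [round(x, 2) for x in contrastive])
```

With `alpha=0.0` the function reduces to ordinary PCA on the target group and follows the shared high-variance feature. With `alpha=1.0` the shared variance is subtracted and the direction moves to the feature that is enriched in the target group.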
Project description:To define the target population of patients who have suspicion of sepsis (SOS) and to provide a basis for assessing the burden of SOS, and the evaluation of sepsis guidelines and improvement programmes. Retrospective analysis of routinely collected hospital administrative data. Secondary care, eight National Health Service (NHS) Acute Trusts. Hospital Episode Statistics data for 2013-2014 was used to identify all admissions with a primary diagnosis listed in the 'suspicion of sepsis' (SOS) coding set. The SOS coding set consists of all bacterial infective diagnoses. We identified 47,475 admissions with SOS, equivalent to a rate of 17 admissions per 1000 adults in a given year. The mortality for this group was 7.2% during their acute hospital admission. Urinary tract infection was the most common diagnosis and lobar pneumonia was associated with the most deaths. A short list of 10 diagnoses can account for 85% of the deaths. Patients with SOS can be identified in routine administrative data. It is these patients who should be screened for sepsis and are the target of programmes to improve the detection and treatment of sepsis. The effectiveness of such programmes can be evaluated by examining the outcomes of patients with SOS.
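The reported figures can be combined into a few back-of-envelope quantities. The inputs below come from the abstract. The derived values (implied adult population, approximate death counts) are not stated in the source and are only rough, because the published rate and percentages are rounded.

```python
# Figures reported in the abstract above.
admissions = 47_475      # admissions with suspicion of sepsis (SOS)
rate_per_1000 = 17       # SOS admissions per 1000 adults per year
mortality = 0.072        # in-hospital mortality for the SOS group
top10_share = 0.85       # share of deaths covered by the 10 leading diagnoses

# Derived, approximate quantities (not stated in the abstract).
implied_adults = admissions / rate_per_1000 * 1000
deaths = admissions * mortality
top10_deaths = deaths * top10_share

print(f"implied adult population: {implied_adults:,.0f}")
print(f"approximate in-hospital deaths: {deaths:,.0f}")
print(f"approximate deaths within the 10 leading diagnoses: {top10_deaths:,.0f}")
```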
Project description:Background: Late mortality risk in sepsis-survivors persists for years with high readmission rates and low quality of life. The present study seeks to link the clinical sepsis-survivors heterogeneity with distinct biological profiles at ICU discharge and late adverse events using an unsupervised analysis. Methods: In the original FROG-ICU prospective, observational, multicenter study, intensive care unit (ICU) patients with sepsis on admission (Sepsis-3) were identified (N = 655). Among them, 467 were discharged alive from the ICU and included in the current study. Latent class analysis was applied to identify distinct sepsis-survivors clinical classes using readily available data at ICU discharge. The primary endpoint was one-year mortality after ICU discharge. Results: At ICU discharge, two distinct subtypes were identified (A and B) using 15 readily available clinical and biological variables. Patients assigned to subtype B (48% of the studied population) had more impaired cardiovascular and kidney functions, hematological disorders and inflammation at ICU discharge than subtype A. Sepsis-survivors in subtype B had significantly higher one-year mortality compared to subtype A (respectively, 34% vs 16%, p < 0.001). When adjusted for standard long-term risk factors (e.g., age, comorbidities, severity of illness, renal function and duration of ICU stay), subtype B was independently associated with increased one-year mortality (adjusted hazard ratio (HR) = 1.74 (95% CI 1.16-2.60); p = 0.006). Conclusions: A subtype with sustained organ failure and inflammation at ICU discharge can be identified from routine clinical and laboratory data and is independently associated with poor long-term outcome in sepsis-survivors. Trial registration: NCT01367093; https://clinicaltrials.gov/ct2/show/NCT01367093
Project description:Whole genome expression profiles are widely used to discover molecular subtypes of diseases. A remaining challenge is to identify the correspondence or commonality of subtypes found in multiple, independent data sets generated on various platforms. While model-based supervised learning is often used to make these connections, the models can be biased to the training data set and thus miss inherent, relevant substructure in the test data. Here we describe an unsupervised subclass mapping method (SubMap), which reveals common subtypes between independent data sets. The subtypes within a data set can be determined by unsupervised clustering or given by predetermined phenotypes before applying SubMap. We define a measure of correspondence for subtypes and evaluate its significance building on our previous work on gene set enrichment analysis. The strength of the SubMap method is that it does not impose the structure of one data set upon another, but rather uses a bi-directional approach to highlight the common substructures in both. We show how this method can reveal the correspondence between several cancer-related data sets. Notably, it identifies common subtypes of breast cancer associated with estrogen receptor status, and a subgroup of lymphoma patients who share similar survival patterns, thus improving the accuracy of a clinical outcome predictor.
Project description:Variability in hospital-level sepsis mortality rates may be due to differences in case mix, quality of care, or diagnosis and coding practices. Centers for Disease Control and Prevention's Adult Sepsis Event definition could facilitate objective comparisons of sepsis mortality rates between hospitals but requires rigorous risk-adjustment tools. We developed risk-adjustment models for Adult Sepsis Events using administrative and electronic health record data. Design: Retrospective cohort study. Setting: One hundred thirty-six U.S. hospitals in Cerner HealthFacts (derivation dataset) and 137 HCA Healthcare hospitals (validation dataset). Patients: A total of 95,154 hospitalized adult patients (derivation) and 201,997 patients (validation) meeting Centers for Disease Control and Prevention Adult Sepsis Event criteria. Interventions: None. Measurements and main results: We created logistic regression models of increasing complexity using administrative and electronic health record data to predict in-hospital mortality. An administrative model using demographics, comorbidities, and coded markers of severity of illness at admission achieved an area under the receiver operating characteristic curve of 0.776 (95% CI, 0.770-0.783) in the Cerner cohort, with diminishing calibration at higher baseline risk deciles. An electronic health record-based model that integrated administrative data with laboratory results, vasopressors, and mechanical ventilation achieved an area under the receiver operating characteristic curve of 0.826 (95% CI, 0.820-0.831) in the derivation cohort and 0.827 (95% CI, 0.824-0.829) in the validation cohort, with better calibration than the administrative model. Adding vital signs and Glasgow Coma Score minimally improved performance. Conclusions: Models incorporating electronic health record data accurately predict hospital mortality for patients with Adult Sepsis Events and outperform models using administrative data alone.
Utilizing laboratory test results, vasopressors, and mechanical ventilation without vital signs may achieve a good balance between data collection needs and model performance, but electronic health record-based models must be attentive to potential variability in data quality and availability. With ongoing testing and refinement of these risk-adjustment models, Adult Sepsis Event surveillance may enable more meaningful comparisons of hospital sepsis outcomes and provide an important window into quality of care.
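The discrimination metric quoted above, the area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison. The function name and the four toy predictions are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (assumption): 1 = died in hospital, 0 = survived,
# scores = predicted mortality risk from some model.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auroc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise form is quadratic in the number of patients, so it is only meant to show the definition. For cohorts of the size described here a rank-based implementation would be the usual choice.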
Project description:Multiple sclerosis (MS) can be divided into four phenotypes based on clinical evolution. The pathophysiological boundaries of these phenotypes are unclear, limiting treatment stratification. Machine learning can identify groups with similar features using multidimensional data. Here, to classify MS subtypes based on pathological features, we apply unsupervised machine learning to brain MRI scans acquired in previously published studies. We use a training dataset from 6322 MS patients to define MRI-based subtypes and an independent cohort of 3068 patients for validation. Based on the earliest abnormalities, we define MS subtypes as cortex-led, normal-appearing white matter-led, and lesion-led. People with the lesion-led subtype have the highest risk of confirmed disability progression (CDP) and the highest relapse rate. People with the lesion-led MS subtype show positive treatment response in selected clinical trials. Our findings suggest that MRI-based subtypes predict MS disability progression and response to treatment and may be used to define groups of patients in interventional trials.
Project description:The ability to classify patients with bipolar disorder (BD) is restricted by their heterogeneity, which limits the understanding of their neuropathology. Therefore, we aimed to investigate clinically discernible and neurobiologically distinguishable BD subtypes. T1-weighted and resting-state functional magnetic resonance images of 112 patients with BD were obtained, and patients were segregated according to diagnostic subtype (i.e., types I and II) and clinical patterns, including the number of episodes and hospitalizations and history of suicide and psychosis. For each clinical pattern, the subgroups with fewer and with more occurrences, as well as types I and II, were classified using nested cross-validation for robust performance estimation, with minimum redundancy maximum relevance feature selection. To assess the proportion of variance in cognitive performance explained by the neurobiological markers, multiple linear regression between verbal memory and the selected features was conducted. Satisfactory performance (mean accuracy, 73.60%) in classifying patients with a high or low number of episodes was attained through functional connectivity, mostly from default-mode and motor networks. Moreover, these neurobiological markers explained 62% of the variance in verbal memory. The number of episodes is a potentially critical aspect of the neuropathology of BD. Neurobiological markers can help identify BD neuroprogression.
Project description:Background: Identifying cancer subtypes is an important component of the personalised medicine framework. An increasing number of computational methods have been developed to identify cancer subtypes. However, existing methods rarely use information from gene regulatory networks to facilitate the subtype identification. It is widely accepted that gene regulatory networks play crucial roles in understanding the mechanisms of diseases. Different cancer subtypes are likely caused by different regulatory mechanisms. Therefore, there are great opportunities for developing methods that can utilise network information in identifying cancer subtypes. Results: In this paper, we propose a method, weighted similarity network fusion (WSNF), to utilise the information in the complex miRNA-TF-mRNA regulatory network in identifying cancer subtypes. We first build the regulatory network, where the nodes represent the features, i.e. the microRNAs (miRNAs), transcription factors (TFs) and messenger RNAs (mRNAs), and the edges indicate the interactions between the features. The interactions are retrieved from various interatomic databases. We then use the network information and the expression data of the miRNAs, TFs and mRNAs to calculate the weight of the features, representing the level of importance of the features. The feature weight is then integrated into a network fusion approach to cluster the samples (patients) and thus to identify cancer subtypes. We applied our method to the TCGA breast invasive carcinoma (BRCA) and glioblastoma multiforme (GBM) datasets. The experimental results show that WSNF performs better than the other commonly used computational methods, and the information from miRNA-TF-mRNA regulatory network contributes to the performance improvement. The WSNF method successfully identified five breast cancer subtypes and three GBM subtypes which show significantly different survival patterns.
We observed that the expression patterns of the features in some miRNA-TF-mRNA sub-networks vary across different identified subtypes. In addition, pathway enrichment analyses show that the top pathways involving the most differentially expressed genes in each of the identified subtypes are different. The results would provide valuable information for understanding the mechanisms characterising different cancer subtypes and assist the design of treatment therapies. All datasets and the R scripts to reproduce the results are available online at the website: http://nugget.unisa.edu.au/Thuc/cancersubtypes/.
Project description:Motivation: Cancer subtype classification has the potential to significantly improve disease prognosis and develop individualized patient management. Existing methods are limited by their ability to handle extremely high-dimensional data and by the influence of misleading, irrelevant factors, resulting in ambiguous and overlapping subtypes. Results: To address the above issues, we proposed a novel approach to disentangling and eliminating irrelevant factors by leveraging the power of deep learning. Specifically, we designed a deep-learning framework, referred to as DeepType, that performs joint supervised classification, unsupervised clustering and dimensionality reduction to learn cancer-relevant data representation with cluster structure. We applied DeepType to the METABRIC breast cancer dataset and compared its performance to state-of-the-art methods. DeepType significantly outperformed the existing methods, identifying more robust subtypes while using fewer genes. The new approach provides a framework for the derivation of more accurate and robust molecular cancer subtypes by using increasingly complex, multi-source data. Availability and implementation: An open-source software package for the proposed method is freely available at http://www.acsu.buffalo.edu/~yijunsun/lab/DeepType.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Detection and diagnosis of cancer are especially important for early prevention and effective treatments. Traditional methods of cancer detection are usually time-consuming and expensive. Liquid biopsy, a newly proposed noninvasive detection approach, can improve the accuracy and decrease the cost of detection according to a personalized expression profile. However, few studies have analyzed this type of data, although such analyses could lead to more effective methods for detecting different cancer subtypes. In this study, we applied several reliable machine learning algorithms to analyze data retrieved from patients who had one of six cancer subtypes (breast cancer, colorectal cancer, glioblastoma, hepatobiliary cancer, lung cancer and pancreatic cancer) as well as healthy persons. Quantitative gene expression profiles were used to encode each sample. Then, they were analyzed by the maximum relevance minimum redundancy (mRMR) method. Two feature lists were obtained in which genes were ranked rigorously. The incremental feature selection method was applied to the mRMR feature list to extract the optimal feature subset, which can be used in the support vector machine algorithm to determine the best performance for the detection of cancer subtypes and healthy controls. The ten-fold cross-validation for the constructed optimal classification model yielded an overall accuracy of 0.751. In addition, we extracted the top eighteen features (genes), including TTN, RHOH, RPS20 and TRBC2, from the other feature list, the MaxRel feature list, and performed a detailed analysis of them. The results indicated that these genes could be important biomarkers for discriminating different cancer subtypes and healthy controls.
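The incremental feature selection step described above can be sketched as follows: walk down a ranked feature list, evaluate a classifier on the top k features for each k, and keep the smallest prefix that gives the best accuracy. This is an illustration under stated assumptions, not the study's pipeline. A nearest-centroid classifier with leave-one-out validation stands in for the support vector machine with ten-fold cross-validation, and the function names, the ranking and the toy data are hypothetical.

```python
def loo_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(feature_idx))
            for k, f in enumerate(feature_idx):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        held_out = [X[i][f] for f in feature_idx]

        def sq_dist(label):
            return sum((held_out[k] - sums[label][k] / counts[label]) ** 2
                       for k in range(len(feature_idx)))

        correct += min(sums, key=sq_dist) == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked_features):
    """Return the shortest ranked prefix with the best accuracy, and that accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = loo_accuracy(X, y, ranked_features[:k])
        if acc > best_acc:  # strict: prefer fewer features on ties
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc

# Toy data (assumption): features 0 and 1 are informative, feature 2 is noise.
X = [[0.0, 0.0, 5.0], [0.2, 0.1, -4.0], [0.1, 0.3, 3.0], [0.6, 0.2, -5.0],
     [1.0, 1.0, -3.0], [0.9, 1.2, 4.0], [0.5, 0.8, -4.0], [0.8, 1.1, 5.0]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
ranking = [0, 1, 2]  # hypothetical ranked list, e.g. from an mRMR step

selected, accuracy = incremental_feature_selection(X, y, ranking)
print(selected, accuracy)
```

The strict comparison means that adding a feature is accepted only when it improves accuracy, so the selected subset stays as small as possible.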