Dataset Information

Identifying Subgroups of Patients With Autism by Gene Expression Profiles Using Machine Learning Algorithms.

ABSTRACT: Objectives: The identification of subgroups of autism spectrum disorder (ASD) may partially remedy the problems of clinical heterogeneity to facilitate the improvement of clinical management. The current study aims to use machine learning algorithms to analyze microarray data to identify clusters with relatively homogeneous clinical features. Methods: The whole-genome gene expression microarray data were used to predict communication quotient (SCQ) scores against all probes to select differential expression regions (DERs). Gene set enrichment analysis was performed for DERs with a fold-change >2 to identify hub pathways that play a role in the severity of social communication deficits inherent to ASD. We then used two machine learning methods, random forest classification (RF) and support vector machine (SVM), to identify two clusters using DERs. Finally, we evaluated how accurately the clusters predicted language impairment. Results: A total of 191 DERs were initially identified, and 54 of them with a fold-change >2 were selected for the pathway analysis. Cholesterol biosynthesis and metabolism pathways appear to act as hubs that connect other trait-associated pathways to influence the severity of social communication deficits inherent to ASD. Both RF and SVM algorithms yielded a classification accuracy >90% when all 191 DERs were analyzed. The ASD subtypes defined by the presence of language impairment, a strong indicator for prognosis, can be predicted by transcriptomic profiles associated with social communication deficits and cholesterol biosynthesis and metabolism. Conclusion: The results suggest that both RF and SVM are acceptable options for machine learning algorithms to identify ASD subgroups characterized by clinical homogeneity related to prognosis.
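
The fold-change >2 selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not say how fold-change was computed, so the sketch assumes a ratio of group means on a linear scale, assumes two comparison groups, and treats changes in either direction as passing the threshold. The probe IDs, values, and function names are hypothetical.

```python
from statistics import mean


def fold_change(case_values, control_values):
    """Ratio of mean expression in one group to mean expression in the other."""
    return mean(case_values) / mean(control_values)


def select_by_fold_change(expression, threshold=2.0):
    """Return probe IDs whose fold-change exceeds the threshold in either direction.

    `expression` maps probe ID -> (case_values, control_values), both on a
    linear (not log) scale with positive values. Treating up- and
    down-regulation symmetrically is an assumption of this sketch.
    """
    selected = []
    for probe_id, (cases, controls) in expression.items():
        fc = fold_change(cases, controls)
        if fc > threshold or fc < 1.0 / threshold:
            selected.append(probe_id)
    return selected


# Hypothetical toy data: probe ID -> (group 1 values, group 2 values)
toy = {
    "probe_A": ([8.0, 10.0, 12.0], [4.0, 4.0, 4.0]),  # ratio 2.5, kept
    "probe_B": ([5.0, 5.0, 5.0], [4.0, 5.0, 6.0]),    # ratio 1.0, dropped
    "probe_C": ([1.0, 1.0, 1.0], [3.0, 4.0, 5.0]),    # ratio 0.25, kept
}
selected_probes = select_by_fold_change(toy)
print(selected_probes)
```

In the study, a filter of this kind narrowed 191 DERs to the 54 used for pathway analysis, while the RF and SVM classifiers were run on all 191.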

SUBMITTER: Lin PI

PROVIDER: S-EPMC8149626 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Identifying Subgroups of Patients With Autism by Gene Expression Profiles Using Machine Learning Algorithms.

Lin Ping-I, Moni Mohammad Ali, Gau Susan Shur-Fen, Eapen Valsamma

Frontiers in Psychiatry, 2021-05-12

PMID: 34054599

Similar Datasets

Project description: Objective: The cause and mechanism of non-obstructive azoospermia (NOA) is complicated; therefore, an effective therapy strategy is yet to be developed. This study aimed to analyse the pathogenesis of NOA at the molecular biological level and to identify the core regulatory genes, which could be utilised as potential biomarkers. Methods: Three NOA microarray datasets (GSE45885, GSE108886, and GSE145467) were collected from the GEO database and merged into training sets; a further dataset (GSE45887) was then defined as the validation set. Differential gene analysis, consensus cluster analysis, and WGCNA were used to identify preliminary signature genes; then, enrichment analysis was applied to these previously screened signature genes. Next, 4 machine learning algorithms (RF, SVM, GLM, and XGB) were used to detect potential biomarkers that are most closely associated with NOA. Finally, a diagnostic model was constructed from these potential biomarkers and visualised as a nomogram. The differential expression and predictive reliability of the biomarkers were confirmed using the validation set. Furthermore, the competing endogenous RNA network was constructed to identify the regulatory mechanisms of potential biomarkers; further, the CIBERSORT algorithm was used to calculate immune infiltration status among the samples. Results: A total of 215 differentially expressed genes (DEGs) were identified between NOA and control groups (27 upregulated and 188 downregulated genes). The WGCNA results identified 1123 genes in the MEblue module as target genes that are highly correlated with NOA positivity. The NOA samples were divided into 2 clusters using consensus clustering; further, 1027 genes in the MEblue module, which were screened by WGCNA, were considered to be target genes that are highly correlated with NOA classification. The 129 overlapping genes were then established as signature genes. The XGB algorithm that had the maximum AUC value (AUC=0.946) and the minimum residual value was used to further screen the signature genes. IL20RB, C9orf117, HILS1, PAOX, and DZIP1 were identified as potential NOA biomarkers. This 5 biomarker model had the highest AUC value, of up to 0.982, compared to other single biomarker models; additionally, the results of this biomarker model were verified in the validation set. Conclusions: As IL20RB, C9orf117, HILS1, PAOX, and DZIP1 have been determined to possess the strongest association with NOA, these five genes could be used as potential therapeutic targets for NOA patients. Furthermore, the model constructed using these five genes, which possessed the highest diagnostic accuracy, may be an effective biomarker model that warrants further experimental validation.

Project description: OBJECTIVES: Identifying subgroups of ICU patients with similar clinical needs and trajectories may provide a framework for more efficient ICU care through the design of care platforms tailored around patients' shared needs. However, objective methods for identifying these ICU patient subgroups are lacking. We used a machine learning approach to empirically identify ICU patient subgroups through clustering analysis and evaluate whether these groups might represent appropriate targets for care redesign efforts. DESIGN: We performed clustering analysis using data from patients' hospital stays to retrospectively identify patient subgroups from a large, heterogeneous ICU population. SETTING: Kaiser Permanente Northern California, a healthcare delivery system serving 3.9 million members. PATIENTS: ICU patients 18 years old or older with an ICU admission between January 1, 2012, and December 31, 2012, at one of 21 Kaiser Permanente Northern California hospitals. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: We used clustering analysis to identify putative clusters among 5,000 patients randomly selected from 24,884 ICU patients. To assess cluster validity, we evaluated the distribution and frequency of patient characteristics and the need for invasive therapies. We then applied a classifier built from the sample cohort to the remaining 19,884 patients to compare the derivation and validation clusters. Clustering analysis successfully identified six clinically recognizable subgroups that differed significantly in all baseline characteristics and clinical trajectories, despite sharing common diagnoses. In the validation cohort, the proportion of patients assigned to each cluster was similar and demonstrated significant differences across clusters for all variables. CONCLUSIONS: A machine learning approach revealed important differences between empirically derived subgroups of ICU patients that are not typically revealed by admitting diagnosis or severity of illness alone. Similar data-driven approaches may provide a framework for future organizational innovations in ICU care tailored around patients' shared needs.

Project description: Background: It has been increasingly recognized that adults living alone have a higher likelihood of developing Major Depressive Disorder (MDD) than those living with others. However, there is still no prediction model for MDD specifically designed for adults who live alone. Objective: This study aims to investigate the effectiveness of utilizing personal health data in combination with a stacked ensemble machine learning (SEML) technique to detect MDD among adults living alone, seeking to gain insights into the interaction between personal health data and MDD. Methods: Our data originated from the US National Health and Nutrition Examination Survey (NHANES) spanning 2007 to 2018. We finally selected a set of 30 easily accessible variables encompassing demographic profiles, lifestyle factors, and baseline health conditions. We constructed a SEML model for MDD detection, incorporating three conventional machine learning algorithms as base models and a Neural Network (NN) as the meta-model. Furthermore, SHapley Additive exPlanations (SHAP) analysis was used to explain the impact of each predictor on MDD. Results: The study included 2,642 adult participants who lived alone, of whom 10.6% (279 out of 2,642) had a PHQ-9 score of 10 or above, indicating the presence of MDD. The performance of our SEML model was robust, with an area under the curve (AUC) of 0.85. Further analysis using SHAP revealed positive correlations between the occurrence of MDD and factors such as sleep disorders, number of prescription medications, need for specific walking aids, urine leakage during nonphysical activities, chronic bronchitis, and Healthy Eating Index (HEI) scores for sodium. Conversely, age, the Family Monthly Poverty Level Index (FMMPI), and HEI scores for added sugar showed negative correlations with MDD occurrence. Additionally, a U-shaped relationship was noted between the occurrence of MDD and both sleep duration and Body Mass Index (BMI), as well as HEI scores for dairy. Conclusion: The study has successfully developed a predictive model for MDD, specifically tailored for adults living alone using a stacked ensemble technique, enhancing the identification of MDD and its risk factors among adults living alone.
