Dataset Information

Identification of Diagnostic Markers for Major Depressive Disorder Using Machine Learning Methods.

ABSTRACT:

Background

Major depressive disorder (MDD) is a global health challenge that severely impairs patients' quality of life. The disorder can manifest in many forms with different combinations of symptoms, which makes clinical diagnosis difficult. Robust biomarkers are needed to improve diagnosis and to understand the etiology of the disease. The main purpose of this study was to create a predictive model for MDD diagnosis based on peripheral blood transcriptomes.

Materials and methods

We collected nine RNA expression datasets of MDD patients and healthy controls from the Gene Expression Omnibus database. After a series of quality control and heterogeneity tests, 302 samples from six studies were deemed suitable for analysis. The R package "MetaOmics" was applied for systematic meta-analysis of genome-wide expression data. Receiver operating characteristic (ROC) curve analysis was used to evaluate the diagnostic effectiveness of individual genes. To obtain a better diagnostic model, we also built support vector machine (SVM), random forest (RF), k-nearest neighbors (kNN), and naive Bayes (NB) classifiers, with the RF method being used for feature selection.
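
As an illustration of the single-gene ROC evaluation described above, the area under the curve can be computed as the fraction of patient/control pairs in which the patient has the higher expression value (ties count as one half). This is a minimal sketch, not the authors' code: the function name `auc_from_scores` and the expression values are hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0).
    Ties contribute 0.5, matching the Mann-Whitney formulation."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one gene (not taken from the study).
expression = [2.1, 3.4, 2.9, 3.8, 1.7, 2.5, 3.1, 1.9]
is_mdd = [0, 1, 0, 1, 0, 1, 1, 0]

auc = auc_from_scores(expression, is_mdd)
print(f"AUC = {auc:.4f}")
```

An AUC below 0.5 under this convention would indicate a gene that is lower in patients than in controls; its diagnostic value is then 1 minus the computed AUC.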

Results

Our analysis revealed six genes (AKR1C3, ARG1, KLRB1, MAFG, TPST1, and WWC3) that were differentially expressed between MDD patients and control subjects at a false discovery rate (FDR) < 0.05. We then evaluated the diagnostic ability of these genes individually. Single-gene prediction yielded area under the curve (AUC) values of 0.63 ± 0.04, 0.67 ± 0.07, 0.70 ± 0.11, 0.64 ± 0.08, 0.68 ± 0.07, and 0.62 ± 0.09, respectively. Next, we constructed SVM, RF, kNN, and NB classifiers, which achieved AUCs of 0.84 ± 0.09, 0.81 ± 0.10, 0.73 ± 0.11, and 0.83 ± 0.09, respectively, in validation datasets, suggesting that the SVM classifier might be superior for constructing an MDD diagnostic model. The final SVM classifier, which included 70 feature genes, was capable of distinguishing MDD samples from healthy controls and yielded an AUC of 0.78 in an independent dataset.
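
The abstract does not state which procedure produced the FDR values. One widely used option is the Benjamini-Hochberg adjustment, sketched below purely as an assumption for illustration. The function name `benjamini_hochberg` and the p-values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
q_values = benjamini_hochberg(p_values)

# Genes whose adjusted value falls below the 0.05 threshold used in the text.
significant = [i for i, q in enumerate(q_values) if q < 0.05]
print(significant)
```

In this toy input, the third and fourth genes have raw p-values below 0.05 but do not pass after adjustment, which is the kind of filtering an FDR threshold applies to a genome-wide gene list.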

Conclusion

This study provides new insights into potential biomarkers through meta-analysis of GEO data. Constructing different machine learning models based on these biomarkers could be a valuable approach for diagnosing MDD in clinical practice.

SUBMITTER: Zhao S

PROVIDER: S-EPMC8249859 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature


Publications

Identification of Diagnostic Markers for Major Depressive Disorder Using Machine Learning Methods.

Zhao Shu, Bao Zhiwei, Zhao Xinyi, Xu Mengxiang, Li Ming D, Yang Zhongli

Frontiers in Neuroscience, 2021-06-18


PMID: 34220416

Similar Datasets

Project description: Bipolar disorder (BD) has the highest suicide rate of all psychiatric disorders, and its underlying causative genes and effective treatments remain unclear. During diagnosis, BD is often confused with schizophrenia (SC) and major depressive disorder (MDD), so patients may receive inadequate or inappropriate treatment, which is detrimental to their prognosis. This study aims to establish a diagnostic model to distinguish BD from SC and MDD in multiple public datasets through bioinformatics and machine learning and to provide new ideas for diagnosing BD in the future. Three brain tissue datasets containing BD, SC, and MDD were chosen from the Gene Expression Omnibus database (GEO), and two peripheral blood datasets were selected for validation. Linear Models for Microarray Data (Limma) analysis was carried out to identify differentially expressed genes (DEGs). Functional enrichment analysis and machine learning were utilized to identify. Least absolute shrinkage and selection operator (LASSO) regression was employed for identifying candidate immune-associated central genes, constructing protein-protein interaction networks (PPI), building artificial neural networks (ANN) for validation, and plotting receiver operating characteristic curve (ROC curve) for differentiating BD from SC and MDD and creating immune cell infiltration to study immune cell dysregulation in the three diseases. RBM10 was obtained as a candidate gene to distinguish BD from SC. Five candidate genes (LYPD1, HMBS, HEBP2, SETD3, and ECM2) were obtained to distinguish BD from MDD. The validation was performed by ANN, and ROC curves were plotted for diagnostic value assessment. The outcomes exhibited the prediction model to have a promising diagnostic value. In the immune infiltration analysis, Naive B, Resting NK, and Activated Mast Cells were found to be substantially different between BD and SC. Naive B and Memory B cells were prominently variant between BD and MDD. 
In this study, RBM10 was found as a candidate gene to distinguish BD from SC; LYPD1, HMBS, HEBP2, SETD3, and ECM2 serve as five candidate genes to distinguish BD from MDD. The results obtained from the ANN network showed that these candidate genes could perfectly distinguish BD from SC and MDD (76.923% and 81.538%, respectively).

Project description: Selecting a course of treatment in psychiatry remains a trial-and-error process, and this long-standing clinical challenge has prompted an increased focus on predictive models of treatment response using machine learning techniques. Electroencephalography (EEG) represents a cost-effective and scalable potential measure to predict treatment response in major depressive disorder. We performed separate meta-analyses to determine the ability of models to distinguish between responders and non-responders using EEG across treatments, as well as a subgroup analysis of response to transcranial magnetic stimulation (rTMS) and antidepressants (Registration Number: CRD42021257477) in Major Depressive Disorder by searching PubMed, Scopus, and Web of Science for articles published between January 1960 and February 2022. We included 15 studies that predicted treatment responses among patients with major depressive disorder using machine-learning techniques. Within a random-effects model with a restricted maximum likelihood estimator comprising 758 patients, the pooled accuracy across studies was 83.93% (95% CI: 78.90-89.29), with an Area-Under-the-Curve (AUC) of 0.850 (95% CI: 0.747-0.890), and partial AUC of 0.779. The average sensitivity and specificity across models were 77.96% (95% CI: 60.05-88.70), and 84.60% (95% CI: 67.89-92.39), respectively. In a subgroup analysis, greater performance was observed in predicting response to rTMS (Pooled accuracy: 85.70% (95% CI: 77.45-94.83), Area-Under-the-Curve (AUC): 0.928, partial AUC: 0.844), relative to antidepressants (Pooled accuracy: 81.41% (95% CI: 77.45-94.83), AUC: 0.895, pAUC: 0.821). Furthermore, across all meta-analyses, the specificity (true negatives) of EEG models was greater than the sensitivity (true positives), suggesting that EEG models thus far better identify non-responders than responders to treatment in MDD. 
Studies varied widely in important features across models, although relevant features included absolute and relative power in frontal and temporal electrodes, measures of connectivity, and asymmetry across hemispheres. Predictive models of treatment response using EEG hold promise in major depressive disorder, although there is a need for prospective model validation in independent datasets, and a greater emphasis on replicating physiological markers. Crucially, standardization in cut-off values and clinical scales for defining clinical response and non-response will aid in the reproducibility of findings and the clinical utility of predictive models. Furthermore, several models thus far have used data from open-label trials with small sample sizes and evaluated performance in the absence of training and testing sets, which increases the risk of statistical overfitting. Large consortium studies are required to establish predictive signatures of treatment response using EEG, and better elucidate the replicability of specific markers. Additionally, it is speculated that greater performance was observed in rTMS models, since EEG is assessing neural networks more likely to be directly targeted by rTMS, comprising electrical activity primarily near the surface of the cortex. Prospectively, there is a need for models that examine the comparative effectiveness of multiple treatments across the same patients. However, this will require a thoughtful consideration towards cumulative treatment effects, and whether washout periods between treatments should be utilised. Regardless, longitudinal cross-over trials comparing multiple treatments across the same group of patients will be an important prerequisite step to both facilitate precision psychiatry and identify generalizable physiological predictors of response between and across treatment options.

Project description: Background: Major depressive disorder (MDD) is a severe disease characterized by multiple pathological changes. However, there are no reliable diagnostic biomarkers for MDD. The aim of the current study was to investigate the gene network and biomarkers underlying the pathophysiology of MDD. Methods: In this study, we conducted a comprehensive analysis of the mRNA expression profile of MDD using data from Gene Expression Omnibus (GEO). The MDD dataset (GSE98793) with 128 MDD and 64 control whole blood samples was divided randomly into two non-overlapping groups for cross-validated differential gene expression analysis. The gene ontology (GO) enrichment and gene set enrichment analysis (GSEA) were performed for annotation, visualization, and integrated discovery. Protein-protein interaction (PPI) network was constructed by STRING database and hub genes were identified by the CytoHubba plugin. The gene expression difference and the functional similarity of hub genes were investigated for further gene expression and function exploration. Moreover, the receiver operating characteristic curve was performed to verify the diagnostic value of the hub genes. Results: We identified 761 differentially expressed genes closely related to MDD. The Venn diagram and GO analyses indicated that changes in MDD are mainly enriched in ribonucleoprotein complex biogenesis, antigen receptor-mediated signaling pathway, catalytic activity (acting on RNA), structural constituent of ribosome, mitochondrial matrix, and mitochondrial protein complex. The GSEA suggested that tumor necrosis factor signaling pathway, Toll-like receptor signaling pathway, apoptosis pathway, and NF-kappa B signaling pathway are all crucial in the development of MDD. A total of 20 hub genes were selected via the PPI network. 
Additionally, the identified hub genes were downregulated and show high functional similarity and diagnostic value in MDD. Conclusions: Our findings may provide novel insight into the functional characteristics of MDD through integrative analysis of GEO data, and suggest potential biomarkers and therapeutic targets for MDD.

Project description: Background: It has been increasingly recognized that adults living alone have a higher likelihood of developing Major Depressive Disorder (MDD) than those living with others. However, there is still no prediction model for MDD specifically designed for adults who live alone. Objective: This study aims to investigate the effectiveness of utilizing personal health data in combination with a stacked ensemble machine learning (SEML) technique to detect MDD among adults living alone, seeking to gain insights into the interaction between personal health data and MDD. Methods: Our data originated from the US National Health and Nutrition Examination Survey (NHANES) spanning 2007 to 2018. We finally selected a set of 30 easily accessible variables encompassing demographic profiles, lifestyle factors, and baseline health conditions. We constructed a SEML model for MDD detection, incorporating three conventional machine learning algorithms as base models and a Neural Network (NN) as the meta-model. Furthermore, SHapley Additive exPlanations (SHAP) analysis was used to explain the impact of each predictor on MDD. Results: The study included 2,642 adult participants who lived alone, of whom 10.6% (279 out of 2,642) had a PHQ-9 score of 10 or above, indicating the presence of MDD. The performance of our SEML model was robust, with an area under the curve (AUC) of 0.85. Further analysis using SHAP revealed positive correlations between the occurrence of MDD and factors such as sleep disorders, number of prescription medications, need for specific walking aids, urine leakage during nonphysical activities, chronic bronchitis, and Healthy Eating Index (HEI) scores for sodium. Conversely, age, the Family Monthly Poverty Level Index (FMMPI), and HEI scores for added sugar showed negative correlations with MDD occurrence. 
Additionally, a U-shaped relationship was noted between the occurrence of MDD and both sleep duration and Body Mass Index (BMI), as well as HEI scores for dairy. Conclusion: The study has successfully developed a predictive model for MDD, specifically tailored for adults living alone using a stacked ensemble technique, enhancing the identification of MDD and its risk factors among adults living alone.
