Project description: Objective: The cause and mechanism of non-obstructive azoospermia (NOA) are complicated; therefore, an effective therapeutic strategy has yet to be developed. This study aimed to analyse the pathogenesis of NOA at the molecular biological level and to identify the core regulatory genes, which could be utilised as potential biomarkers. Methods: Three NOA microarray datasets (GSE45885, GSE108886, and GSE145467) were collected from the GEO database and merged into a training set; a further dataset (GSE45887) was then defined as the validation set. Differential gene analysis, consensus cluster analysis, and WGCNA were used to identify preliminary signature genes; then, enrichment analysis was applied to these signature genes. Next, four machine learning algorithms (RF, SVM, GLM, and XGB) were used to detect the potential biomarkers most closely associated with NOA. Finally, a diagnostic model was constructed from these potential biomarkers and visualised as a nomogram. The differential expression and predictive reliability of the biomarkers were confirmed using the validation set. Furthermore, a competing endogenous RNA network was constructed to identify the regulatory mechanisms of the potential biomarkers, and the CIBERSORT algorithm was used to calculate immune infiltration status among the samples. Results: A total of 215 differentially expressed genes (DEGs) were identified between the NOA and control groups (27 upregulated and 188 downregulated genes). The WGCNA results identified 1123 genes in the MEblue module as target genes that are highly correlated with NOA positivity. The NOA samples were divided into 2 clusters using consensus clustering; further, 1027 genes in the MEblue module, which were screened by WGCNA, were considered to be target genes that are highly correlated with NOA classification. The 129 overlapping genes were then established as signature genes. 
The XGB algorithm, which had the maximum AUC value (AUC=0.946) and the minimum residual value, was used to further screen the signature genes. IL20RB, C9orf117, HILS1, PAOX, and DZIP1 were identified as potential NOA biomarkers. This five-biomarker model had the highest AUC value, of up to 0.982, compared with the single-biomarker models; additionally, the results of this biomarker model were verified in the validation set. Conclusions: As IL20RB, C9orf117, HILS1, PAOX, and DZIP1 have been determined to possess the strongest association with NOA, these five genes could be used as potential therapeutic targets for NOA patients. Furthermore, the model constructed using these five genes, which possessed the highest diagnostic accuracy, may be an effective biomarker model that warrants further experimental validation.
Project description: Objective: RNA-binding proteins (RBPs) are essential for most post-transcriptional regulatory events and play critical roles in nearly all aspects of cell biology. Here, characteristic RBPs of IgA nephropathy were determined with multiple machine learning algorithms. Methods: Our study included three gene expression datasets of IgA nephropathy (GSE37460, GSE73953, GSE93798). Differential expression of RBPs between IgA nephropathy and normal samples was analyzed via limma, and hub RBPs were determined through MCODE. Afterwards, three machine learning algorithms (LASSO, SVM-RFE, random forest) were integrated to determine characteristic RBPs, which were verified in the Nephroseq database. Immune cell infiltrations were estimated through CIBERSORT. Utilizing ConsensusClusterPlus, IgA nephropathy samples were classified based on hub RBPs. The potential upstream miRNAs were predicted. Results: Among 388 RBPs with differential expression, 43 hub RBPs were determined. After integration of three machine learning algorithms, three characteristic RBPs were finally identified (DDX27, RCL1, and TFB2M). All of them were down-regulated in IgA nephropathy relative to normal specimens and showed excellent diagnostic efficacy. Additionally, they were significantly linked to immune cell infiltrations, immune checkpoints, and pyroptosis-relevant genes. Based on hub RBPs, IgA nephropathy was stably classified into two subtypes (cluster 1 and 2). Cluster 1 exhibited relatively high expression of pyroptosis-relevant genes and characteristic RBPs. MiR-501-3p, miR-760, miR-502-3p, miR-1224-5p, and miR-107 were potential upstream miRNAs of hub RBPs. Conclusion: Collectively, our findings determine three characteristic RBPs in IgA nephropathy and two RBP-based subtypes, and thus provide a basis for further research on the diagnosis and pathogenesis of IgA nephropathy.
Project description: Introduction: Diabetic nephropathy (DN) is a common diabetes-related complication with unclear underlying pathological mechanisms. Although recent studies have linked glycolysis to various pathological states, its role in DN remains largely underexplored. Methods: In this study, the expression patterns of glycolysis-related genes (GRGs) were first analyzed using the GSE30122, GSE30528, and GSE96804 datasets, followed by an evaluation of the immune landscape in DN. An unsupervised consensus clustering of DN samples from the same dataset was conducted based on differentially expressed GRGs. The hub genes associated with DN and glycolysis-related clusters were identified via weighted gene co-expression network analysis (WGCNA) and machine learning algorithms. Finally, the expression patterns of these hub genes were validated using single-cell sequencing data and quantitative real-time polymerase chain reaction (qRT-PCR). Results: Eleven GRGs showed abnormal expression in DN samples, leading to the identification of two distinct glycolysis clusters, each with its own immune profile and functional pathways. The analysis of the GSE142153 dataset showed that these clusters had specific immune characteristics. Furthermore, the Extreme Gradient Boosting (XGB) model was the most effective in diagnosing DN. The five most significant variables, including GATM, PCBD1, F11, HRSP12, and G6PC, were identified as hub genes for further investigation. Single-cell sequencing data showed that the hub genes were predominantly expressed in proximal tubular epithelial cells. In vitro experiments confirmed the expression pattern in NC. Conclusion: Our study provides valuable insights into the molecular mechanisms underlying DN, highlighting the involvement of GRGs and immune cell infiltration.
Project description: Many potential biomarkers in nephrology have been studied, but few are currently used in clinical practice. One is osteopontin (OPN). We compared urinary OPN concentrations in 80 participants: 67 patients with various biopsy-proven glomerulopathies (GNs), namely immunoglobulin A nephropathy (IgAN, 29), membranous nephropathy (MN, 20) and lupus nephritis (LN, 18), and 13 with no GN. Follow-up included 48 participants. Machine learning was used to correlate OPN with other factors to classify patients by GN type. The resulting algorithm had an accuracy of 87% in differentiating IgAN from other GNs using urinary OPN levels only. A lesser effect for discriminating MN and LN was observed. However, the lower number of patients and the phenotypic heterogeneity of MN and LN might have affected those results. OPN was significantly higher in IgAN at baseline than in other GNs and therefore might be useful for identifying patients with IgAN. That observation applied neither to patients with IgAN at follow-up nor to patients with other GNs. OPN seems to be a valuable biomarker and should be validated in future studies. Machine learning is a powerful tool that, compared with traditional statistical methods, can also be applied to smaller datasets.
Project description: Systemic lupus erythematosus (SLE) is an autoimmune disease involving multiple systems. Its recurrent episodes and fluctuating disease course have a severe impact on patients. Biomarkers to predict disease prognosis and remission are still lacking in SLE. We downloaded the GSE50772 dataset from the Gene Expression Omnibus database and identified differentially expressed genes (DEGs) between SLE patients and healthy controls. Weighted gene co-expression network analysis was used to identify key gene modules and corresponding genes in SLE. The genes shared by the DEGs and the key modules were used as key genes for subsequent analysis. These key genes were analyzed using 3 machine learning algorithms: the least absolute shrinkage and selection operator, support vector machine recursive feature elimination, and random forest algorithms. The genes selected by all 3 algorithms were taken as potential biomarkers, and their possible functions, regulatory mechanisms, diagnostic value, and expression levels were investigated and validated. Finally, we studied the differences between groups in the level of immune cell infiltration and explored the relationship between the potential biomarkers and immunity. A total of 234 genes shared by the DEGs and key modules were used as key genes for subsequent analysis. After taking the intersection of the key genes obtained by the 3 algorithms, we obtained 4 potential biomarkers (ARID2, CYSTM1, DDIT3, and RNASE1) with high diagnostic values. Further immune infiltration analysis showed differences in various immune cells between the SLE and healthy control samples. ARID2, CYSTM1, DDIT3, and RNASE1 can affect the immune function of SLE patients and could be used as immune-related potential biomarkers and therapeutic or diagnostic targets for further research.
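The intersection step described in this abstract (keeping only the genes that every feature-selection algorithm retains) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `consensus_genes`, the dictionary layout, and the placeholder genes `GENE_A`, `GENE_B`, and `GENE_C` are assumptions made for the example, and the per-algorithm lists are invented to show the mechanics only.

```python
from functools import reduce


def consensus_genes(selections):
    """Return the genes picked by every algorithm, sorted alphabetically.

    `selections` maps a (hypothetical) algorithm label to the list of genes it kept.
    """
    gene_sets = [set(genes) for genes in selections.values()]
    if not gene_sets:
        return []
    # Intersect all sets pairwise; a gene survives only if every algorithm chose it.
    return sorted(reduce(set.intersection, gene_sets))


# Illustrative inputs only: these lists are NOT the study's real per-algorithm output.
selections = {
    "lasso": ["ARID2", "CYSTM1", "DDIT3", "RNASE1", "GENE_A"],
    "svm_rfe": ["ARID2", "CYSTM1", "DDIT3", "RNASE1", "GENE_B"],
    "random_forest": ["RNASE1", "DDIT3", "CYSTM1", "ARID2", "GENE_A", "GENE_C"],
}

print(consensus_genes(selections))
```

Requiring agreement across all algorithms is a conservative choice: a gene favoured by only one or two methods (such as the placeholder `GENE_A` here) is dropped.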
Project description: Background: Breast cancer (BC) ranks first in incidence among women, with approximately 2 million new cases per year. Therefore, it is essential to investigate emerging targets for BC patients' diagnosis and prognosis. Methods: We analyzed gene expression data from 99 normal and 1,081 BC tissues in The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) were identified using the "limma" R package, and relevant modules were chosen through Weighted Gene Coexpression Network Analysis (WGCNA). Intersection genes were obtained by matching DEGs to WGCNA module genes. Functional enrichment studies were performed on these genes using Gene Ontology (GO), Disease Ontology (DO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Biomarkers were screened via Protein-Protein Interaction (PPI) networks and multiple machine-learning algorithms. The Gene Expression Profiling Interactive Analysis (GEPIA), The University of ALabama at Birmingham CANcer (UALCAN), and Human Protein Atlas (HPA) databases were employed to examine mRNA and protein expression of eight biomarkers. The Kaplan-Meier mapper tool was used to assess their prognostic capabilities. Key biomarkers were analyzed via single-cell sequencing, and their relationship with immune infiltration was examined using the Tumor Immune Estimation Resource (TIMER) database and the "xCell" R package. Lastly, drug prediction was conducted based on the identified biomarkers. Results: We identified 1,673 DEGs and 542 important genes through differential analysis and WGCNA, respectively. Intersection analysis revealed 76 genes, which play significant roles in immune-related viral infection and IL-17 signaling pathways. 
DIX domain containing 1 (DIXDC1), Dual specificity phosphatase 6 (DUSP6), Pyruvate dehydrogenase kinase 4 (PDK4), C-X-C motif chemokine ligand 12 (CXCL12), Interferon regulatory factor 7 (IRF7), Integrin subunit alpha 7 (ITGA7), NIMA related kinase 2 (NEK2), and Nuclear receptor subfamily 3 group C member 1 (NR3C1) were selected as BC biomarkers using machine-learning algorithms. NEK2 was the most critical gene for diagnosis. Prospective drugs targeting NEK2 include etoposide and lukasunone. Conclusions: Our study identified DIXDC1, DUSP6, PDK4, CXCL12, IRF7, ITGA7, NEK2, and NR3C1 as potential diagnostic biomarkers for BC, with NEK2 having the highest potential to aid in diagnosis and prognosis in clinical settings.
Project description:Hypertrophic cardiomyopathy (HCM) is a relatively common inherited cardiac disease that results in left ventricular hypertrophy. Machine learning uses algorithms to study patterns in data and develop models able to make predictions. The aim of this study is to identify HCM subtypes and examine the mechanisms of HCM using machine learning algorithms. Clinical and laboratory findings of 143 adult patients with a confirmed diagnosis of nonobstructive HCM are analyzed; HCM subtypes are determined by clustering, while the presence of different HCM features is predicted in classification machine learning tasks. Four clusters are determined as the optimal number of clusters for this dataset. Models that can predict the presence of particular HCM features from other genotypic and phenotypic information are generated, and subsets of features sufficient to predict the presence of other features of HCM are determined. This research proposes four subtypes of HCM assessed by machine learning algorithms and based on the overall phenotypic expression of the participants of the study. The identified subsets of features sufficient to determine the presence of particular HCM aspects could provide deeper insights into the mechanisms of HCM.
Project description: Diabetic nephropathy (DN), a multifaceted disease with various contributing factors, presents challenges in understanding its underlying causes. Uncovering biomarkers linked to this condition can shed light on its pathogenesis and support the creation of new diagnostic and treatment methods. Gene expression data were sourced from accessible public databases, and Weighted Gene Co-expression Network Analysis (WGCNA) was employed to pinpoint gene co-expression modules relevant to DN. Subsequently, various machine learning techniques, such as random forest, the least absolute shrinkage and selection operator (LASSO), and support vector machine-recursive feature elimination (SVM-RFE), were utilized for distinguishing DN cases from controls using the identified gene modules. Additionally, functional enrichment analyses were conducted to explore the biological roles of these genes. Our analysis revealed 131 genes showing distinct expression patterns between controlled and uncontrolled groups. During the integrated WGCNA, we identified 61 co-expressed genes encompassing both categories. The enrichment analysis highlighted involvement in various immune responses and complex activities. Random forest, LASSO, and SVM-RFE were applied to pinpoint key hub genes, leading to the identification of VWF and DNASE1L3. In the context of DN, they demonstrated significant consistency in both expression and function. Our research uncovered potential biomarkers for DN through the application of WGCNA and various machine learning methods. The results indicate that these 2 central genes could serve as innovative diagnostic indicators and therapeutic targets for this disease. This discovery offers fresh perspectives on the development of DN and could contribute to the advancement of new diagnostic and treatment approaches.
Project description: Objective: Abdominal aortic aneurysm (AAA) is a life-threatening vascular condition. This study aimed to discover new indicators for the early detection of AAA and explore the possible involvement of immune cell activity in its development. Methods: Sourced from the Gene Expression Omnibus, the AAA microarray datasets GSE47472 and GSE57691 were combined to generate the training set. Additionally, a separate dataset (GSE7084) was designated as the validation set. Enrichment analyses were carried out to explore the underlying biological mechanisms using Disease Ontology, Kyoto Encyclopedia of Genes and Genomes, and Gene Ontology. We then utilized weighted gene co-expression network analysis (WGCNA) along with 3 machine learning techniques (least absolute shrinkage and selection operator, support vector machine-recursive feature elimination, and random forest) to identify feature genes for AAA. Moreover, data were validated using the receiver operating characteristic (ROC) curve, with feature genes defined as those having an area under the curve above 85% and a p-value below 0.05. Finally, the single sample gene set enrichment analysis algorithm was applied to probe the immune landscape in AAA and its connection to the selected feature genes. Results: We discovered 72 differentially expressed genes (DEGs) when comparing healthy and AAA samples, including 36 upregulated and 36 downregulated genes. Functional enrichment analysis revealed that the DEGs associated with AAA are primarily involved in inflammatory regulation and immune response. By intersecting the results of the 3 machine learning algorithms and WGCNA, 3 feature genes were identified: MRAP2, PPP1R14A, and PLN. The diagnostic performance of all these genes was strong, as revealed by the ROC analysis. A significant increase in 15 immune cell types in AAA samples was observed, based on the analysis of immune cell infiltration. 
In addition, the 3 feature genes show a strong linkage with different types of immune cells. Conclusion: Three feature genes (MRAP2, PPP1R14A, and PLN) related to the development of AAA were identified. These genes are linked to immune cell activity and the inflammatory microenvironment, providing potential biomarkers for early detection and a basis for further research into AAA progression.
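The area-under-the-curve cutoff used in this abstract (feature genes required an AUC above 85%) can be illustrated with a small Python sketch. It computes the AUC directly as the probability that a randomly chosen case has a higher expression value than a randomly chosen control, with ties counted as one half. The function names, the expression values, and the decision to mirror AUCs below 0.5 (so that downregulated genes are treated the same as upregulated ones) are assumptions for illustration; the p-value criterion from the abstract is not implemented here.

```python
def roc_auc(case_values, control_values):
    """AUC as P(case > control) over all case/control pairs; ties count 0.5."""
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def passes_auc_cutoff(case_values, control_values, cutoff=0.85):
    """Check the AUC criterion only (the p-value test is deliberately left out)."""
    auc = roc_auc(case_values, control_values)
    # Assumption: a gene that is lower in cases yields AUC < 0.5, so mirror it
    # to judge discriminative power regardless of direction.
    return max(auc, 1.0 - auc) > cutoff


# Hypothetical expression values for one gene (not data from the study).
cases = [5.1, 6.0, 6.3, 7.2]
controls = [4.0, 4.8, 5.5, 5.0]

print(roc_auc(cases, controls))            # 15 of 16 pairs favour the case -> 0.9375
print(passes_auc_cutoff(cases, controls))  # True, since 0.9375 > 0.85
```

The pairwise loop is quadratic in the number of samples, which is acceptable for the small cohorts typical of microarray datasets; a rank-based formulation would be preferable for larger inputs.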
Project description: Background: As the leading cause of chronic kidney disease, diabetic kidney disease (DKD) is an enormous burden for healthcare systems around the world. However, effective methods for its early diagnosis are lacking. Methods: First, gene expression data in the GEO database were extracted, and the differential genes of diabetic tubulopathy were obtained. Immune-related gene sets were generated by WGCNA and immune cell infiltration analyses. Then, differentially expressed immune-related cuproptosis genes (DEICGs) were derived from the intersection of the differential genes and the genes related to cuproptosis and immunity. To investigate the functions of DEICGs, volcano plots and GO term enrichment analysis were performed. Machine learning and protein-protein interaction (PPI) network analysis were then used to screen out hub genes. Their diagnostic efficacy was evaluated by GSEA, receiver operating characteristic (ROC) curves, single-cell RNA sequencing and the Nephroseq website. The expression of the hub genes was further verified at the animal level in STZ-induced and db/db DKD mouse models. Results: Three hub genes, FSTL1, CX3CR1 and AGR2, which were up-regulated in both the test set GSE30122 and the validation set GSE30529, were screened. The areas under the curve (AUCs) of the ROC curves of the hub genes were 0.911, 0.935 and 0.922, respectively, and 0.946 when taken as a whole. Correlation analysis showed that the expression levels of the three hub genes were negatively correlated with GFR, while that of FSTL1 was positively correlated with the level of serum creatinine. GSEA showed enrichment in inflammatory and immune-related pathways. Single-nucleus RNA sequencing indicated the main distribution of FSTL1 in podocytes and mesangial cells, the high expression of CX3CR1 in leukocytes and the main localization of AGR2 in the loop of Henle. 
In mouse models, all three hub genes were increased in both STZ-induced and db/db DKD models. Conclusion: Machine learning was combined with WGCNA, immune cell infiltration and PPI analyses to identify three hub genes associated with cuproptosis, immunity and diabetic nephropathy, all of which have great potential as diagnostic markers for DKD and may even predict disease progression.