Dataset Information

Integrated COVID-19 Predictor: Differential expression analysis to reveal potential biomarkers and prediction of coronavirus using RNA-Seq profile data.

ABSTRACT:

Background

The world has been battling the ongoing COVID-19 pandemic, caused by the SARS-CoV-2 virus, for the last two years. Viral disease prediction has long been a matter of interest in virology and in the study of disease transmission.

Objective

In this study, we aimed to carry out genome association studies using COVID-19 RNA-Seq data, to reveal highly expressed gene biomarkers, and to build a machine learning model for COVID-19 prediction to help combat this pandemic.

Method

We collected RNA-Seq gene count data for both healthy (Control) and non-healthy (Treated) COVID-19 cases. A sequence of bioinformatics strategies and statistical techniques, such as fold-change and adjusted p-value, was applied to identify differentially expressed genes (DEGs). We filtered biomarker sets of high, moderate, and low DEGs using the DESeq2, Limma Trend, and Limma Voom methods, combined them through intersection and union operations, and applied machine learning techniques to predict COVID-19.
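The intersection and union operations over the DEG lists of the three methods can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper name `consensus_sets` and the gene identifiers are hypothetical, and each input is assumed to be a set of genes that one method has already reported as differentially expressed.

```python
def consensus_sets(deseq2, limma_trend, limma_voom):
    """Return (intersection, union) of three sets of DEG identifiers."""
    # Genes reported by all three methods (strict consensus).
    intersection = deseq2 & limma_trend & limma_voom
    # Genes reported by at least one method (permissive consensus).
    union = deseq2 | limma_trend | limma_voom
    return intersection, union


# Hypothetical gene identifiers, for illustration only.
deseq2 = {"GENE_A", "GENE_B", "GENE_C"}
limma_trend = {"GENE_B", "GENE_C", "GENE_D"}
limma_voom = {"GENE_B", "GENE_C", "GENE_E"}

common, combined = consensus_sets(deseq2, limma_trend, limma_voom)
print(sorted(common))
print(sorted(combined))
```

The intersection gives a smaller, higher-confidence biomarker set, while the union keeps every gene flagged by any method.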

Result

Through experimental analysis, 67 potential biomarkers were extracted, comprising 49 up-regulated and 18 down-regulated genes, using statistical techniques and a set-theory consensus strategy. We trained the machine learning models on 12 different biomarker sets and found that the SVM model performed better than the other classifiers with 99.07% classification accuracy for moderate DEGs.

Conclusion

Our study revealed that the differentially expressed genes of the moderate DEGs biomarker set (|log2FC| ≥ 2 with adjusted p-value < 0.05) work well as input features for a kernel-based SVM machine learning model to predict COVID-19.
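The moderate-DEG criterion stated above (|log2FC| ≥ 2 with adjusted p-value < 0.05) can be expressed as a simple filter. The sketch below is illustrative only: the `GeneStat` record, the function name `filter_moderate_degs`, and the sample values are hypothetical and are not taken from the dataset.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str      # gene identifier
    log2fc: float  # log2 fold change (Treated vs. Control)
    padj: float    # adjusted p-value


def filter_moderate_degs(stats, lfc_cutoff=2.0, padj_cutoff=0.05):
    """Split genes passing |log2FC| >= cutoff and padj < cutoff into up/down lists."""
    up, down = [], []
    for s in stats:
        if s.padj < padj_cutoff and abs(s.log2fc) >= lfc_cutoff:
            # Positive fold change means higher expression in the Treated group.
            (up if s.log2fc > 0 else down).append(s.gene)
    return up, down


# Hypothetical per-gene statistics, for illustration only.
stats = [
    GeneStat("GENE_A", 3.1, 0.001),   # passes, up-regulated
    GeneStat("GENE_B", -2.4, 0.01),   # passes, down-regulated
    GeneStat("GENE_C", 2.5, 0.20),    # fails the adjusted p-value cutoff
    GeneStat("GENE_D", 1.2, 0.001),   # fails the fold-change cutoff
]

up_genes, down_genes = filter_moderate_degs(stats)
print(up_genes, down_genes)
```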

SUBMITTER: Iqbal N

PROVIDER: S-EPMC9162937 | biostudies-literature | 2022 Aug

REPOSITORIES: biostudies-literature


Publications

Integrated COVID-19 Predictor: Differential expression analysis to reveal potential biomarkers and prediction of coronavirus using RNA-Seq profile data.

Iqbal N, Kumar P

Computers in Biology and Medicine, 2022-06-03


PMID: 35687925

Similar Datasets

Project description:

Background: Retroperitoneal liposarcoma (RLPS) is a mesenchymal malignant tumor characterized by different degrees of adipocytic differentiation. Well-differentiated liposarcoma (WDLPS) and dedifferentiated liposarcoma (DDLPS) are two of the most common subtypes of RLPS, exhibiting clear differences in biological behaviors and clinical prognosis. The metabolic features and genomic characteristics remain unclear.

Methods: This study employed lipidomic and RNA-seq analyses of RLPS tissues from 19 WDLPS and 29 DDLPS patients. Western blot and immunohistochemistry staining were performed to verify the tumor tissue protein levels of TIMP1, FN1, MMP11, GPNMB, and ECM1. Enzyme-linked immunosorbent assay (ELISA) was performed to evaluate different serum protein levels in 128 blood samples from patients with RLPS. Multivariate analysis was performed to identify the most crucial variables associated with overall survival (OS) and recurrence-free survival (RFS) of the RLPS patients.

Results: Lipidomic analysis revealed a significant difference in lipid metabolism, particularly in phosphatidylcholines and triacylglycerides metabolism. RNA sequencing analysis revealed that 1,630 differentially expressed genes (DEGs) were significantly enriched in lipid metabolism, developmental process, and extracellular matrix (ECM) pathways. Integrated lipidomic and transcriptomic analysis identified 29 genes as potential biomarkers between WDLPS and DDLPS. Among the 29 DEGs, we found that TIMP1, FN1, MMP11, GPNMB, and ECM1 were higher in DDLPS tumor tissues than in WDLPS tumor tissues. The receiver operating characteristic (ROC) curve showed high specificity and sensitivity in diagnosing patients using a five-gene combination (AUC = 0.904). ELISA revealed a significant increase in the serum levels of ECM1 and GPNMB in patients with DDLPS compared to patients with WDLPS. ECM1 increased progressively across different FNCLCC Grades, correlating negatively with RFS (P = 0.043). GPNMB levels showed a negative correlation with OS (P = 0.019).

Conclusions: Our study reveals differences in lipid metabolism and in several transcriptional pathways between WDLPS and DDLPS, and examines several serum markers associated with the prognosis of RLPS. These findings provide a vital basis for future endeavors in diagnosing and predicting the prognosis of retroperitoneal liposarcoma with different differentiations.

Project description:

Background: The application of next-generation sequencing technologies to transcriptomics (RNA-seq) for gene expression profiling is widely used to study different biological conditions, including cancers. However, RNA-seq experiments still have small sample sizes because of their cost. Recently, there has been an increased focus on meta-analysis methods for integrated differential expression analysis to explore potential biomarkers. In this study, we propose a p-value combination method for meta-analysis of multiple independent but related RNA-seq studies that accounts for the sample size of each study and the direction of expression of genes in individual studies.

Results: The proposed method generalizes the inverse-normal method without an increase in statistical or computational complexity and does not pre- or post-hoc filter genes that have conflicting direction of expression in different studies. Thus, the proposed method, as compared to the inverse-normal, has better potential for the discovery of differentially expressed genes (DEGs) with potentially conflicting differential signals from multiple studies related to disease. We demonstrated the use of the proposed method in detection of biologically relevant DEGs in glioblastoma (GBM), the most aggressive brain cancer. Our approach notably enabled the identification of over-expressed tumour suppressor gene RAD51 in GBM compared to healthy controls, which has recently been shown to be a target for inhibition to enhance radiosensitivity of GBM cells during treatment. Pathway analysis identified multiple aberrant GBM related pathways as well as novel regulators such as TCF7L2 and MAPT as important upstream regulators in GBM.

Conclusions: The proposed meta-analysis method generalizes the existing inverse-normal method by providing a way to establish differential expression status for genes with conflicting direction of expression in individual RNA-seq studies. This enables further exploration of such genes as potential biomarkers for the disease.

Project description: Transcription factors (TFs) play critical roles in mediating the plant response to various abiotic stresses, particularly heat stress. Plants respond to elevated temperatures by modulating the expression of genes involved in diverse metabolic pathways, a regulatory process primarily governed by multiple TFs in a networked configuration. Many TFs, such as WRKY, MYB, NAC, bZIP, zinc finger protein, AP2/ERF, DREB, ERF, bHLH, and brassinosteroids, are associated with heat shock factor (Hsf) families, and are involved in heat stress tolerance. These TFs hold the potential to control multiple genes, which makes them ideal targets for enhancing the heat stress tolerance of crop plants. Despite their immense importance, only a small number of heat-stress-responsive TFs have been identified in rice. The molecular mechanisms underpinning the role of TFs in rice adaptation to heat stress still need to be researched. This study identified three TF genes, OsbZIP14, OsMYB2, and OsHSF7, by integrating transcriptomic and epigenetic sequencing data analysis of rice in response to heat stress. Through comprehensive bioinformatics analysis, we demonstrated that OsbZIP14, one of the key heat-responsive TF genes, contained a basic-leucine zipper domain and primarily functioned as a nuclear TF with transcriptional activation capability. By knocking out the OsbZIP14 gene in the rice cultivar Zhonghua 11, we observed that the OsbZIP14 knockout mutant exhibited dwarfism with reduced tillering during the grain-filling stage. Under high-temperature treatment, the expression of the OsbZIP58 gene, a key regulator of rice seed storage protein (SSP) accumulation, was upregulated in the OsbZIP14 mutant. Furthermore, bimolecular fluorescence complementation (BiFC) experiments uncovered a direct interaction between OsbZIP14 and OsbZIP58. Our results suggested that OsbZIP14 acts as a key TF gene through the concerted action of OsbZIP58 and OsbZIP14 during rice grain filling under heat stress. These findings not only provide good candidate genes for genetic improvement of rice but also offer valuable scientific insights into the mechanism of heat stress tolerance in rice.

Project description:

Background: Postmenopausal osteoporosis (PMOP) represents a significant health concern, particularly as the population ages. Currently, there is a paucity of comprehensive descriptions regarding the immunoregulatory mechanisms and early diagnostic biomarkers associated with PMOP. This study aims to examine immune-related differentially expressed genes (IR-DEGs) in the peripheral blood mononuclear cells of PMOP patients to identify immunological patterns and diagnostic biomarkers.

Methods: The GSE56815 dataset from the Gene Expression Omnibus (GEO) database was used as the training group, while the GSE2208 dataset served as the validation group. Initially, differential expression analysis was conducted after data integration to identify IR-DEGs in the peripheral blood mononuclear cells of PMOP. Subsequently, feature selection of these IR-DEGs was performed using RF, SVM-RFE, and LASSO regression models. Additionally, the expression of IR-DEGs in distinct bone marrow cell subtypes was analyzed using single-cell RNA sequencing (scRNA-seq) datasets, allowing the identification of cellular communication patterns within various cell subgroups. Finally, molecular subtypes and diagnostic models for PMOP were constructed based on these selected IR-DEGs. Furthermore, the expression levels of characteristic IR-DEGs were examined in rat osteoporosis (OP) models.

Results: Using machine learning, six IR-DEGs (JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5) were identified. Subsequently, two molecular subtypes of PMOP (subtype 1 and subtype 2) were established, with subtype 1 exhibiting a higher proportion of M1 macrophage infiltration. Analysis of the scRNA-seq dataset revealed 11 distinct cell clusters. It was noted that JUN was significantly overexpressed in M1 macrophages, while HMOX1 showed a marked elevation in endothelial cells and M2 macrophages. Cell communication results suggested that the PMOP microenvironment features increased interactions among M2 macrophages, CD8+ T cells, Tregs, and fibroblasts. The diagnostic model based on these six IR-DEGs demonstrated excellent diagnostic performance (AUC = 0.927). In the OP rat model, the expression of IL1R2 and TNFSF8 was significantly elevated.

Conclusion: JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5 may serve as promising molecular targets for diagnosing and subtyping patients with PMOP. These results offer novel perspectives on the early diagnosis of PMOP and the advancement of personalized immune-based therapies.
