Dataset Information

Machine learning identified genetic features associated with HIV sequences in the monocytes.


SUBMITTER: Peng X

PROVIDER: S-EPMC10752474 | biostudies-literature | 2023 Dec

REPOSITORIES: biostudies-literature


Publications

Machine learning identified genetic features associated with HIV sequences in the monocytes.

Peng Xiaorong, Zhu Biao

Chinese Medical Journal, 2023-11-28, issue 24

PMID: 38018159

Similar Datasets

Project description: Because cost and limited differentiating capability make early diagnosis of Alzheimer's disease (AD) difficult, it is necessary to identify low-cost, accessible, and reliable tools for identifying AD risk in the preclinical stage. We hypothesized that cognitive ability, as expressed in the vocal features of daily conversation, is associated with AD progression. We therefore developed a novel machine learning prediction model to identify AD risk using the rich voice data collected from daily conversations, and evaluated its predictive performance in comparison with a classification method based on the Japanese version of the Telephone Interview for Cognitive Status (TICS-J). We used 1,465 audio data files from 99 healthy controls (HC) and 151 audio data files recorded from 24 AD patients, derived from a dementia prevention program conducted by Hachioji City, Tokyo, between March and May 2020. After extracting vocal features from each audio file, we developed machine-learning models based on extreme gradient boosting (XGBoost), random forest (RF), and logistic regression (LR), using each audio file as one observation. We evaluated the predictive performance of the developed models by plotting the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC), sensitivity, and specificity. Further, we conducted classifications by considering each participant as one observation, computing the average of the predictive values of their audio files, and comparing the result with the predictive performance of the TICS-J based questionnaire. Of 1,616 audio files in total, 1,308 (81.0%) were randomly allocated to the training data and 308 (19.1%) to the validation data. For audio file-based prediction, the AUCs for XGBoost, RF, and LR were 0.863 (95% confidence interval [CI]: 0.794-0.931), 0.882 (95% CI: 0.840-0.924), and 0.893 (95% CI: 0.832-0.954), respectively.
For participant-based prediction, the AUCs for XGBoost, RF, LR, and TICS-J were 1.000 (95% CI: 1.000-1.000), 1.000 (95% CI: 1.000-1.000), 0.972 (95% CI: 0.918-1.000), and 0.917 (95% CI: 0.918-1.000), respectively. The difference in predictive accuracy between XGBoost and TICS-J approached significance (p = 0.065). Our novel prediction model using the vocal features of daily conversations demonstrated the potential to be useful for AD risk assessment.
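
The participant-based evaluation described above (averaging each participant's per-file predictive values, then computing an AUC over participants) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the participant IDs, and all numbers are made up for the example, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than any particular library.

```python
from collections import defaultdict


def participant_scores(file_preds):
    """Average per-file predicted probabilities for each participant.

    file_preds: iterable of (participant_id, predicted_probability) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pid, prob in file_preds:
        sums[pid] += prob
        counts[pid] += 1
    return {pid: sums[pid] / counts[pid] for pid in sums}


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up predictions: (participant, predicted probability of AD).
file_preds = [("A", 0.9), ("A", 0.7),
              ("B", 0.2), ("B", 0.4),
              ("C", 0.8), ("C", 0.1)]
true_label = {"A": 1, "B": 0, "C": 0}  # hypothetical ground truth

# File-level AUC: every audio file is one observation.
file_auc = auc([p for _, p in file_preds],
               [true_label[pid] for pid, _ in file_preds])

# Participant-level AUC: one averaged score per participant.
avg = participant_scores(file_preds)
ids = sorted(avg)
participant_auc = auc([avg[i] for i in ids], [true_label[i] for i in ids])

print(file_auc, participant_auc)
```

In this toy data, one noisy file from a control participant lowers the file-level AUC, while averaging across that participant's files removes the misranking. It only illustrates why the two levels of analysis can give different AUCs.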

Project description:HIV reservoirs persist despite successful antiretroviral therapy (ART) and are a major obstacle to the eradication and cure of HIV. The mature monocyte subset, CD14+CD16+, contributes to viral reservoirs and HIV-associated comorbidities. Only a subset of monocytes harbors HIV (HIV+), while the rest remain uninfected, exposed cells (HIVexp). We developed an innovative single cell RNA sequencing (scRNAseq) pipeline that detects HIV and host transcripts simultaneously, enabling us to examine differences between HIV+ and HIVexp mature monocytes. Using this, we characterized uninfected, HIV+, and HIVexp primary human mature monocytes with and without ART. We showed that HIV+ mature monocytes do not form their own cluster separately from HIVexp but can be distinguished by significant differential gene expression. We found that ART decreased levels of unspliced HIV transcripts potentially by modulating host transcriptional regulators shown to decrease viral infection and replication. We also identified and characterized mature monocyte subpopulations differentially impacted by HIV and ART. We identified genes dysregulated by ART in HIVexp monocytes compared to their uninfected counterpart and, of interest, the junctional protein ALCAM, suggesting that ART impacts monocyte functions. Our data provide a novel method for simultaneous detection of HIV and host transcripts. We identify potential targets, such as those genes whose expression is increased in HIV+ mature monocytes compared to HIVexp, to block their entry into tissues, preventing establishment/replenishment of HIV reservoirs even with ART, thereby reducing and/or eliminating viral burden and HIV-associated comorbidities. 
Our data also highlight the heterogeneity of mature monocyte subsets and their potential contributions to HIV pathogenesis in the ART era. IMPORTANCE: HIV enters tissues early after infection, leading to establishment and persistence of HIV reservoirs despite antiretroviral therapy (ART). Viral reservoirs are a major obstacle to the eradication and cure of HIV. CD14+CD16+ (mature) monocytes may contribute to establishment and reseeding of reservoirs. A subset of monocytes, consisting mainly of CD14+CD16+ cells, harbors HIV (HIV+), while the rest remain uninfected, exposed cells (HIVexp). It is important to identify cells harboring virus to eliminate reservoirs. Using an innovative single-cell RNA sequencing (scRNAseq) pipeline to detect HIV and host transcripts simultaneously, we characterized HIV+ and HIVexp primary human mature monocytes with and without ART. HIV+ mature monocytes are not a unique subpopulation but rather can be distinguished from HIVexp by differential gene expression. We characterized mature monocyte subpopulations differently impacted by HIV and ART, highlighting their potential contributions to HIV-associated comorbidities. Our data propose therapeutic targets to block HIV+ monocyte entry into tissues, preventing establishment and replenishment of reservoirs even with ART.

Project description: Objectives: Post-stroke depression (PSD) is a common and serious psychiatric complication that hinders the functional recovery and social participation of stroke patients. Stroke is characterized by dynamic changes in metabolism and hemodynamics; however, effective and reliable metabolism-associated diagnostic markers and therapeutic targets for PSD are still lacking. Our study was dedicated to the discovery of metabolism-related diagnostic and therapeutic biomarkers for PSD. Methods: Expression profiles of GSE140275, GSE122709, and GSE180470 were obtained from the GEO database. Differentially expressed genes (DEGs) were detected in GSE140275 and GSE122709. Functional enrichment analysis was performed for DEGs in GSE140275. Weighted gene co-expression network analysis (WGCNA) was performed on GSE122709 to identify key module genes. Moreover, correlation analysis was performed to obtain metabolism-related genes. Interaction analysis of key module genes, metabolism-related genes, and DEGs in GSE122709 was performed to obtain candidate hub genes. Two machine learning algorithms, least absolute shrinkage and selection operator (LASSO) and random forest, were used to identify signature genes. Expression of signature genes was validated in GSE140275, GSE122709, and GSE180470. Gene set enrichment analysis (GSEA) was applied to the signature genes. Based on the signature genes, a nomogram model was constructed in our PSD cohort (27 PSD patients vs. 54 controls). ROC curves were used to estimate its diagnostic value. Finally, correlation analysis between expression of signature genes and several clinical traits was performed. Results: Functional enrichment analysis indicated that DEGs in GSE140275 were enriched in metabolic pathways. A total of 8,188 metabolism-associated genes were identified by correlation analysis. WGCNA identified 3,471 key module genes. A total of 557 candidate hub genes were identified by interaction analysis.
Furthermore, two signature genes (SDHD and FERMT3) were selected using LASSO and random forest analysis. GSEA found that the two signature genes had major roles in depression. Subsequently, a PSD cohort was collected to construct a PSD diagnostic nomogram. The nomogram model showed good reliability and validity. The AUC values of the receiver operating characteristic (ROC) curves for SDHD and FERMT3 were 0.896 and 0.964, respectively. ROC curves showed that the two signature genes played a significant role in the diagnosis of PSD. Correlation analysis found that SDHD (r = 0.653, P < 0.001) and FERMT3 (r = 0.728, P < 0.001) were positively related to the Hamilton Depression Rating Scale 17-item (HAMD) score. Conclusion: A total of 557 metabolism-associated candidate hub genes were obtained by interaction of DEGs in GSE122709, key module genes, and metabolism-related genes. Based on machine learning algorithms, two signature genes (SDHD and FERMT3) were identified and shown to be valuable therapeutic and diagnostic biomarkers for PSD. Our findings make early diagnosis and prevention of PSD possible.
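
The gene-filtering logic in the description above can be sketched with plain set operations. This is a hedged illustration only. It assumes that the "interaction analysis" amounts to taking the genes common to all three lists, and that the final signature genes are those chosen by both LASSO and random forest; the abstract does not confirm either assumption. The helper name and every gene identifier below are hypothetical placeholders, not data from the study.

```python
def intersect_all(*gene_lists):
    """Return the genes present in every supplied list (empty set if none given)."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical placeholder gene lists (not real results).
key_module_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
metabolism_related_genes = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
degs = ["GENE_C", "GENE_D", "GENE_F"]

# Assumed step 1: candidate hub genes = genes shared by all three lists.
candidate_hub_genes = intersect_all(key_module_genes,
                                    metabolism_related_genes,
                                    degs)

# Assumed step 2: each algorithm selects a subset of the candidates;
# the signature genes are those selected by both.
lasso_selected = ["GENE_C", "GENE_D"]    # hypothetical LASSO output
forest_selected = ["GENE_D"]             # hypothetical random forest output
signature_genes = intersect_all(lasso_selected, forest_selected)

print(sorted(candidate_hub_genes), sorted(signature_genes))
```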
