Project description:We report the application of single-cell RNA sequencing (scRNA-seq) in mouse monocyte cells by integrating scRNA-seq, transcription factor binding motifs, and ATAC-seq data using machine learning. We generated scRNA-seq data from mouse monocytes treated with PBS, SD-LPS, 4-PBA, and SD-LPS + 4-PBA to understand the gene regulatory networks of monocytes under the low-grade inflammatory condition and the mechanism of action for 4-PBA. We find two novel subpopulations of monocyte cells in response to SD-LPS. We show that 4-PBA potently reprograms an anti-inflammatory monocyte phenotype and masks the effects of subclinical low dose LPS. Together with TF binding motifs and ATAC-seq data, a machine learning method, using guided, regularized random forest (GRRF) and feature selection was developed to select the best candidate TFs that are involved in the activation of monocytes within different clusters. Our results suggest that our new machine learning method can select candidate regulatory genes as potential targets for developing new therapeutics against low-grade inflammation.
Project description:Antioxidant proteins play important roles in many cellular activities. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals), which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and the balance between them is necessary to maintain health. An unhealthy routine or age-related degeneration can break this balance, leading to more ROS than antioxidants and causing damage to health. In general, the antioxidant mechanism is the combination of antioxidant molecules and ROS in a one-electron reaction. Computational models that promptly identify antioxidant candidates are essential for supporting antioxidant detection experiments in the laboratory. In this study, we propose a machine learning-based model for this prediction task, built from a benchmark set of sequencing data. The experiments used 10-fold cross-validation during training and were validated on three different independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest was identified as the best model for identifying antioxidant proteins, with the highest performance. Our optimal model achieved a high accuracy of 84.6%, as well as balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. The results on the independent datasets also showed that our model performs well compared to previously published works on antioxidant protein identification.
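The evaluation protocol described above (10-fold cross-validation, then accuracy, sensitivity, and specificity) can be sketched in plain Python. This is an illustrative sketch only, not the authors' pipeline: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the labels are made up, and no model is trained here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a real run, each `(train_idx, test_idx)` pair would be used to fit a classifier on the training fold and score it on the held-out fold, with the metrics averaged over the ten folds.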
Project description:Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.
Project description:Because early diagnosis of Alzheimer's disease (AD) is difficult owing to cost and limited differentiating capability, low-cost, accessible, and reliable tools are needed to identify AD risk in the preclinical stage. We hypothesized that cognitive ability, as expressed in the vocal features of daily conversation, is associated with AD progression. Thus, we developed a novel machine learning prediction model to identify AD risk using the rich voice data collected from daily conversations, and evaluated its predictive performance in comparison with a classification method based on the Japanese version of the Telephone Interview for Cognitive Status (TICS-J). We used 1,465 audio data files from 99 healthy controls (HC) and 151 audio data files recorded from 24 AD patients, derived from a dementia prevention program conducted by Hachioji City, Tokyo, between March and May 2020. After extracting vocal features from each audio file, we developed machine-learning models based on extreme gradient boosting (XGBoost), random forest (RF), and logistic regression (LR), using each audio file as one observation. We evaluated the predictive performance of the developed models by plotting the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC), sensitivity, and specificity. Further, we conducted classifications by considering each participant as one observation, computing the average of their audio files' predictive values, and comparing the result with the predictive performance of the TICS-J based questionnaire. Of 1,616 audio files in total, 1,308 (81.0%) were randomly allocated to the training data and 308 (19.1%) to the validation data. For audio file-based prediction, the AUCs for XGBoost, RF, and LR were 0.863 (95% confidence interval [CI]: 0.794-0.931), 0.882 (95% CI: 0.840-0.924), and 0.893 (95% CI: 0.832-0.954), respectively.
For participant-based prediction, the AUCs for XGBoost, RF, LR, and TICS-J were 1.000 (95% CI: 1.000-1.000), 1.000 (95% CI: 1.000-1.000), 0.972 (95% CI: 0.918-1.000) and 0.917 (95% CI: 0.918-1.000), respectively. The difference in predictive accuracy between XGBoost and TICS-J approached significance (p = 0.065). Our novel prediction model using the vocal features of daily conversations demonstrated the potential to be useful for AD risk assessment.
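The participant-based step described above, averaging each participant's file-level predictions before scoring, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, participant IDs, and every score are hypothetical, and the AUC is computed with the pairwise rank (Mann-Whitney) formulation instead of a library routine.

```python
from collections import defaultdict


def participant_scores(file_scores):
    """Average file-level predicted probabilities for each participant."""
    grouped = defaultdict(list)
    for participant_id, score in file_scores:
        grouped[participant_id].append(score)
    return {pid: sum(vals) / len(vals) for pid, vals in grouped.items()}


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: (participant_id, predicted AD probability for one audio file).
file_scores = [("p1", 0.9), ("p1", 0.7), ("p2", 0.2), ("p2", 0.4), ("p3", 0.8), ("p3", 0.0)]
diagnosis = {"p1": 1, "p2": 0, "p3": 0}  # 1 = AD, 0 = healthy control (made up)

# File-based evaluation: every audio file is one observation.
file_auc = roc_auc([diagnosis[pid] for pid, _ in file_scores],
                   [score for _, score in file_scores])

# Participant-based evaluation: one averaged score per participant.
averaged = participant_scores(file_scores)
ids = sorted(averaged)
participant_auc = roc_auc([diagnosis[pid] for pid in ids],
                          [averaged[pid] for pid in ids])

print(file_auc, participant_auc)  # 0.875 1.0
```

In this toy data, one outlying file from a healthy participant lowers the file-based AUC, while averaging per participant removes its effect.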
Project description:Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet-lab testing for the identification of umami peptides is time-consuming and expensive. Here, we report iUmami-DRLF, which applies a logistic regression (LR) classifier to features extracted from peptide sequences solely by a pre-trained deep learning model, unified representation (UniRep, based on multiplicative LSTM). The findings demonstrate that deep representation learning significantly enhanced the ability of models to identify umami peptides, and their predictive precision, using peptide sequence information alone. Newly validated taste sequences were also used to test iUmami-DRLF and other predictors, and the results indicate that iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food to satisfy the demand for an umami-flavored diet.
Project description:HIV reservoirs persist despite successful antiretroviral therapy (ART) and are a major obstacle to the eradication and cure of HIV. The mature monocyte subset, CD14+CD16+, contributes to viral reservoirs and HIV-associated comorbidities. Only a subset of monocytes harbors HIV (HIV+), while the rest remain uninfected, exposed cells (HIVexp). We developed an innovative single cell RNA sequencing (scRNAseq) pipeline that detects HIV and host transcripts simultaneously, enabling us to examine differences between HIV+ and HIVexp mature monocytes. Using this, we characterized uninfected, HIV+, and HIVexp primary human mature monocytes with and without ART. We showed that HIV+ mature monocytes do not form their own cluster separately from HIVexp but can be distinguished by significant differential gene expression. We found that ART decreased levels of unspliced HIV transcripts potentially by modulating host transcriptional regulators shown to decrease viral infection and replication. We also identified and characterized mature monocyte subpopulations differentially impacted by HIV and ART. We identified genes dysregulated by ART in HIVexp monocytes compared to their uninfected counterpart and, of interest, the junctional protein ALCAM, suggesting that ART impacts monocyte functions. Our data provide a novel method for simultaneous detection of HIV and host transcripts. We identify potential targets, such as those genes whose expression is increased in HIV+ mature monocytes compared to HIVexp, to block their entry into tissues, preventing establishment/replenishment of HIV reservoirs even with ART, thereby reducing and/or eliminating viral burden and HIV-associated comorbidities. 
Our data also highlight the heterogeneity of mature monocyte subsets and their potential contributions to HIV pathogenesis in the ART era. IMPORTANCE: HIV enters tissues early after infection, leading to establishment and persistence of HIV reservoirs despite antiretroviral therapy (ART). Viral reservoirs are a major obstacle to the eradication and cure of HIV. CD14+CD16+ (mature) monocytes may contribute to establishment and reseeding of reservoirs. A subset of monocytes, consisting mainly of CD14+CD16+ cells, harbors HIV (HIV+), while the rest remain uninfected, exposed cells (HIVexp). It is important to identify cells harboring virus to eliminate reservoirs. Using an innovative single-cell RNA sequencing (scRNAseq) pipeline to detect HIV and host transcripts simultaneously, we characterized HIV+ and HIVexp primary human mature monocytes with and without ART. HIV+ mature monocytes are not a unique subpopulation but rather can be distinguished from HIVexp by differential gene expression. We characterized mature monocyte subpopulations differently impacted by HIV and ART, highlighting their potential contributions to HIV-associated comorbidities. Our data propose therapeutic targets to block HIV+ monocyte entry into tissues, preventing establishment and replenishment of reservoirs even with ART.
Project description:The massive socioeconomic impacts engendered by extreme floods provide a clear motivation for improved understanding of flood drivers. We use self-organizing maps, a type of artificial neural network, to perform unsupervised clustering of climate reanalysis data to identify synoptic-scale atmospheric circulation patterns associated with extreme floods across the United States. We subsequently assess the flood characteristics (e.g., frequency, spatial domain, event size, and seasonality) specific to each circulation pattern. To supplement this analysis, we have developed an interactive website with detailed information for every flood of record. We identify four primary categories of circulation patterns: tropical moisture exports, tropical cyclones, atmospheric lows or troughs, and melting snow. We find that large flood events are generally caused by tropical moisture exports (tropical cyclones) in the western and central (eastern) United States. We identify regions where extreme floods regularly occur outside the normal flood season (e.g., the Sierra Nevada Mountains due to tropical moisture exports) and regions where multiple extreme flood events can occur within a single year (e.g., the Atlantic seaboard due to tropical cyclones and atmospheric lows or troughs). These results provide the first machine-learning-based, near-continental-scale identification of atmospheric circulation patterns associated with extreme floods, with valuable insights for flood risk management.
Project description:Traumatic Brain Injury (TBI) is a frequently occurring condition and approximately 90% of TBI cases are classified as mild (mTBI). However, conventional MRI has limited diagnostic and prognostic value, thus warranting the utilization of additional imaging modalities and analysis procedures. The functional connectomic approach using resting-state functional MRI (rs-fMRI) has shown great potential and promising diagnostic capabilities across multiple clinical scenarios, including mTBI. Additionally, there is increasing recognition of a fundamental role of brain dynamics in healthy and pathological cognition. Here, we undertake an in-depth investigation of mTBI-related connectomic disturbances and their emotional and cognitive correlates. We leveraged machine learning and graph theory to combine static and dynamic functional connectivity (FC) with regional entropy values, achieving classification accuracy up to 75% (77, 74 and 76% precision, sensitivity and specificity, respectively). As compared to healthy controls, the mTBI group displayed hypoconnectivity in the temporal poles, which correlated positively with semantic (r = 0.43, p < 0.008) and phonemic verbal fluency (r = 0.46, p < 0.004), while hypoconnectivity in the right dorsal posterior cingulate correlated positively with depression symptom severity (r = 0.54, p < 0.0006). These results highlight the importance of residual FC in these regions for preserved cognitive and emotional function in mTBI. Conversely, hyperconnectivity was observed in the right precentral and supramarginal gyri, which correlated negatively with semantic verbal fluency (r = -0.47, p < 0.003), indicating a potential ineffective compensatory mechanism. These novel results are promising toward understanding the pathophysiology of mTBI and explaining some of its most lingering emotional and cognitive symptoms.
Project description:Objectives: Post-stroke depression (PSD) is a common and serious psychiatric complication that hinders functional recovery and social participation of stroke patients. Stroke is characterized by dynamic changes in metabolism and hemodynamics; however, there is still a lack of effective and reliable metabolism-associated diagnostic markers and therapeutic targets for PSD. Our study was dedicated to the discovery of metabolism-related diagnostic and therapeutic biomarkers for PSD. Methods: Expression profiles of GSE140275, GSE122709, and GSE180470 were obtained from the GEO database. Differentially expressed genes (DEGs) were detected in GSE140275 and GSE122709. Functional enrichment analysis was performed for DEGs in GSE140275. Weighted gene co-expression network analysis (WGCNA) was performed in GSE122709 to identify key module genes. Moreover, correlation analysis was performed to obtain metabolism-related genes. Interaction analysis of key module genes, metabolism-related genes, and DEGs in GSE122709 was performed to obtain candidate hub genes. Two machine learning algorithms, least absolute shrinkage and selection operator (LASSO) and random forest, were used to identify signature genes. Expression of signature genes was validated in GSE140275, GSE122709, and GSE180470. Gene set enrichment analysis (GSEA) was applied to the signature genes. Based on the signature genes, a nomogram model was constructed in our PSD cohort (27 PSD patients vs. 54 controls). ROC curves were plotted to estimate its diagnostic value. Finally, correlation analysis between expression of signature genes and several clinical traits was performed. Results: Functional enrichment analysis indicated that DEGs in GSE140275 were enriched in metabolism pathways. A total of 8,188 metabolism-associated genes were identified by correlation analysis. WGCNA yielded 3,471 key module genes. A total of 557 candidate hub genes were identified by interaction analysis.
Furthermore, two signature genes (SDHD and FERMT3) were selected using LASSO and random forest analysis. GSEA found that the two signature genes had major roles in depression. Subsequently, a PSD cohort was collected to construct a PSD diagnostic model. The nomogram model showed good reliability and validity. The AUC values of the receiver operating characteristic (ROC) curves of SDHD and FERMT3 were 0.896 and 0.964, respectively. ROC curves showed that the two signature genes played a significant role in the diagnosis of PSD. Correlation analysis found that SDHD (r = 0.653, P < 0.001) and FERMT3 (r = 0.728, P < 0.001) were positively related to the Hamilton Depression Rating Scale 17-item (HAMD) score. Conclusion: A total of 557 metabolism-associated candidate hub genes were obtained by interaction of DEGs in GSE122709, key module genes, and metabolism-related genes. Based on machine learning algorithms, two signature genes (SDHD and FERMT3) were identified and shown to be valuable therapeutic and diagnostic biomarkers for PSD. Our findings make early diagnosis and prevention of PSD possible.
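The candidate hub gene step can be pictured as combining three gene lists. The sketch below assumes that the "interaction analysis" amounts to a plain set intersection, which the abstract does not spell out; the function name and the placeholder gene identifiers (`GENE_A` and so on) are hypothetical and do not correspond to any gene in the study.

```python
def candidate_hub_genes(key_module_genes, metabolism_genes, degs):
    """Return the genes present in all three input collections.

    Assumption: the hub gene step is a set intersection of the three lists.
    """
    return set(key_module_genes) & set(metabolism_genes) & set(degs)


# Placeholder identifiers, for illustration only.
key_module = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
metabolism = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
degs = {"GENE_C", "GENE_D", "GENE_F"}

hubs = candidate_hub_genes(key_module, metabolism, degs)
print(sorted(hubs))  # ['GENE_C', 'GENE_D']
```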
Project description:Angiogenesis is a key process for the proliferation and metastatic spread of cancer cells. Anti-angiogenic peptides (AAPs), with the capability of inhibiting angiogenesis, are promising candidates in cancer treatment. We propose AAPL, a sequence-based predictor to identify AAPs with machine learning models of improved prediction accuracy. Each peptide sequence was transformed to a vector of 4335 numeric values according to 58 different feature types, followed by a heuristic algorithm for feature selection. Next, the hyperparameters of six machine learning models were optimized with respect to the feature subset. We considered two datasets, one with entire peptide sequences and the other with 15 amino acids from peptide N-termini. AAPL achieved Matthews correlation coefficients of 0.671 and 0.756 for independent tests based on the two datasets, respectively, outperforming existing predictors by a range of 5.3% to 24.6%. Further analyses show that AAPL yields higher prediction accuracy for peptides with more hydrophobic residues, and fewer hydrophilic and charged residues. The source code of AAPL is available at https://github.com/yunzheng2002/Anti-angiogenic .
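For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, a minimal sketch of the standard formula from confusion-matrix counts follows. The function name is hypothetical and the counts are made up; this is not taken from the AAPL source code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up confusion-matrix counts, for illustration only.
print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))  # 1.0 (perfect prediction)
print(matthews_corrcoef(tp=5, tn=5, fp=5, fn=5))  # 0.0 (no better than chance)
print(matthews_corrcoef(tp=0, tn=0, fp=5, fn=5))  # -1.0 (total disagreement)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size.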