
Dataset Information


XGBoost-Based Feature Learning Method for Mining COVID-19 Novel Diagnostic Markers.

ABSTRACT: In December 2019, an outbreak of novel coronavirus pneumonia emerged in Wuhan, Hubei Province, China, and then developed into a major global public health event, causing substantial economic losses. We downloaded throat swab expression profiling data from COVID-19-positive and -negative patients in the Gene Expression Omnibus (GEO) database to mine novel diagnostic biomarkers. XGBoost was used to construct the model and select feature genes. We then used machine learning methods to construct COVID-19 classifiers, including MARS, KNN, SVM, MIL, and RF. Using the incremental feature selection (IFS) method, we selected the KNN classifier, which achieved the best Matthews correlation coefficient (MCC) among these classifiers, and identified 24 feature genes. Finally, principal component analysis (PCA) of the samples showed that the 24 feature genes effectively separated COVID-19-positive from COVID-19-negative patients. Additionally, GO and KEGG enrichment analyses were used to explore the biological functions and signaling pathways in which the 24 feature genes may be involved. The results showed that these feature genes were primarily enriched in biological functions such as viral transcription and viral gene expression, and in pathways such as Coronavirus disease - COVID-19. In summary, the 24 feature genes we identified were highly effective in classifying COVID-19-positive and -negative patients and could serve as novel diagnostic markers for COVID-19.
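The abstract describes ranking genes (here with XGBoost) and then adding them one at a time in incremental feature selection (IFS), scoring a classifier by the Matthews correlation coefficient (MCC) at each step. The abstract does not state the validation scheme, the KNN settings, or how the XGBoost ranking was extracted, so the following is only a minimal pure-Python sketch of the IFS + KNN + MCC idea under stated assumptions: a precomputed feature ranking, leave-one-out validation, Euclidean distance, and k = 3 neighbors. The function names and toy data are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is defined as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loocv_mcc(X, y, features, k=3):
    """Leave-one-out MCC of a KNN classifier restricted to the given feature columns."""
    sub = [[row[j] for j in features] for row in X]
    preds = []
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, sub[i], k))
    return mcc(y, preds)


def incremental_feature_selection(X, y, ranked_features, k=3):
    """IFS: score the top-1, top-2, ... features of an assumed precomputed ranking
    and return the shortest prefix with the highest leave-one-out MCC."""
    best_n, best_score = 0, float("-inf")
    for n in range(1, len(ranked_features) + 1):
        score = loocv_mcc(X, y, ranked_features[:n], k)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_n, best_score = n, score
    return ranked_features[:best_n], best_score


if __name__ == "__main__":
    # Hypothetical toy data: column 0 separates the classes, columns 2-3 are noise.
    X = [
        [5.0, 3.0, 9.1, 2.2], [5.2, 2.9, 0.4, 7.7], [4.8, 3.1, 5.5, 0.3], [5.1, 3.2, 1.9, 6.4],
        [0.0, 0.1, 8.8, 1.0], [0.2, -0.1, 0.7, 6.9], [-0.1, 0.0, 4.9, 3.3], [0.1, 0.2, 2.5, 5.1],
    ]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    ranked = [0, 1, 2, 3]  # hypothetical ranking, e.g. by XGBoost feature importance
    selected, score = incremental_feature_selection(X, y, ranked, k=3)
    print(selected, score)  # prints "[0] 1.0"
```

In a real analysis, the ranking would come from a trained model (for example, importance scores from an XGBoost classifier), and the same loop could be repeated for each candidate classifier, keeping the classifier and feature count with the highest MCC.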

SUBMITTER: Song X

PROVIDER: S-EPMC9256927 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

A novel method of literature mining to identify candidate COVID-19 drugs.

Project description:
Summary: COVID-19 is a serious infectious disease that has recently emerged and continues to spread worldwide. Its spreading rate is too high to expect that new specific drugs will be developed in sufficient time. As an alternative, drugs already developed for other diseases have been tested for use in the treatment of COVID-19 (drug repositioning). However, selecting candidate drugs from a large number of compounds requires numerous inhibition assays involving viral infection of cultured cells. For efficiency, it would be useful to narrow the list of candidates using logical considerations before performing these assays. We have developed a powerful tool to predict candidate drugs for the treatment of COVID-19 and other diseases. This tool is based on the concatenation of events/substances, each of which is linked to a KEGG (Kyoto Encyclopedia of Genes and Genomes) code through a relationship obtained by text mining the vast literature in the PubMed database. By analyzing 21 589 326 records with abstracts from PubMed, 98 556 KEGG codes with NAME/DEFINITION fields were connected. Among them, 9799 KEGG drug codes were connected to COVID-19, of which 7492 codes had no direct connection to COVID-19. Although this report focuses on COVID-19, the program developed here can be applied to other infectious diseases and used to quickly identify drug candidates when new infectious diseases appear in the future.
Availability and implementation: The programs and data underlying this article will be shared on reasonable request to the corresponding authors.
Contact: atmuramatsu@g.ecc.u-tokyo.ac.jp, amtanok@mail.ecc.u-tokyo.ac.jp.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.

| S-EPMC9710631 | biostudies-literature

Identification of COVID-19-Specific Immune Markers Using a Machine Learning Method.

Project description: Notably, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is closely related to the immune system. Human resistance to COVID-19 infection comprises two stages: the first is immune defense, and the second is extensive inflammation. The immune defense phase is further divided into innate and adaptive immunity. These two stages involve various immune cells, including CD4+ T cells, CD8+ T cells, monocytes, dendritic cells, B cells, and natural killer cells. Together, these immune cells make up a complex and unique immune response to COVID-19, with characteristics that set it apart from other respiratory infectious diseases. In the present study, we identified cell markers for differentiating COVID-19 from common inflammatory responses, non-COVID-19 severe respiratory diseases, and healthy populations based on single-cell gene expression profiling of six immune cell types, using the Boruta and mRMR feature selection methods. Some features, such as IFI44L in B cells, S100A8 in monocytes, and NCR2 in natural killer cells, are involved in the innate immune response to COVID-19. Other features, such as ZFP36L2 in CD4+ T cells, can regulate the inflammatory process of COVID-19. Subsequently, the IFS method was used to determine the best feature subsets and classifiers for the six immune cell types under two classification algorithms. Furthermore, we established quantitative rules to distinguish disease status. The results of this study can provide theoretical support for a more in-depth investigation of COVID-19 pathogenesis and intervention strategies.

| S-EPMC9344575 | biostudies-literature

A novel hybrid feature combination method for enhanced movie recommendations with user resemblance and attitude mining.

Project description: Most movie recommendation methods use hard clustering and simple collaborative filtering techniques to achieve their results. However, these methods tend to overlook crucial aspects of both users and items. When these methods hard-cluster a movie into one cluster, they ignore the fact that the movie also exhibits some properties of items in other clusters. Recommender systems quickly connect users with relevant items based on their requests and their historical interactions with other customers. Recommendation systems are a crucial part of surfacing items, particularly in streaming services. For movie streaming services such as Netflix, recommendation methods are vital for helping users discover new movies they will enjoy. However, massive amounts of information can limit recommendation accuracy because of diversity and sparsity problems. Our work proposes a hybrid technique that combines collaborative filtering with characteristics of demographic filtering to identify similar users and compare them with one another. The technique is built on an analysis of how to reduce errors in rating estimates based on users' earlier interactions, which improves prediction accuracy relative to other algorithms. Additionally, a feature combination technique is used to improve prediction accuracy. To evaluate our method, we conducted an offline evaluation on the MovieLens 1M dataset using existing evaluation strategies and compared the results to validate the proposed procedure.

| S-EPMC8357110 | biostudies-literature

Potential of vibrational spectroscopy coupled with machine learning as a non-invasive diagnostic method for COVID-19.

Project description:
Background and objective: Efforts to alleviate the ongoing coronavirus disease 2019 (COVID-19) crisis have shown that rapid, sensitive, and large-scale screening is critical for controlling both the current infection and ongoing pandemics.
Methods: Here, we explored the potential of vibrational spectroscopy coupled with machine learning to screen COVID-19 patients at an early stage. We present a hybrid classification model called the grey wolf optimized support vector machine (GWO-SVM). The proposed model was tested and comprehensively compared with other machine learning models using vibrational spectroscopic fingerprints, including a saliva FTIR spectra dataset and a serum Raman scattering spectra dataset.
Results: For unknown vibrational spectra, the GWO-SVM model achieved accuracy, specificity, and F1-score values of 0.9825, 0.9714, and 0.9778, respectively, on the saliva FTIR spectra dataset, and overall accuracy, specificity, and F1-score values of 0.9085, 0.9552, and 0.9036, respectively, on the serum Raman scattering spectra dataset. These results outperformed those of state-of-the-art models, suggesting that the GWO-SVM model is suitable for adoption in clinical settings for initial screening of COVID-19 patients.
Conclusions: Prospectively, the vibrational spectroscopy-based GWO-SVM model could facilitate screening of COVID-19 patients and alleviate the burden on medical services. These proof-of-concept results demonstrate the potential of vibrational spectroscopy coupled with a GWO-SVM model to aid COVID-19 diagnosis, and the approach could be further used for early screening of other infectious diseases.

| S-EPMC9711896 | biostudies-literature

A novel speech emotion recognition method based on feature construction and ensemble learning.

Project description: In the field of human-computer interaction (HCI), speech emotion recognition technology plays an important role. To cope with the small amount of available speech emotion data, this paper proposes a novel speech emotion recognition method based on feature construction and ensemble learning. First, acoustic features are extracted from the speech signal and combined to form different original feature sets. Second, a novel feature selection method named L-SFS is proposed, based on the Light Gradient Boosting Machine (LightGBM) and the Sequential Forward Selection (SFS) method. Next, a softmax regression model automatically learns the weights of four single weak learners: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Extreme Gradient Boosting (XGBoost), and LightGBM. Finally, based on the automatically learned weights and a weighted average probability voting strategy, an ensemble classification model named Sklex is constructed that integrates these four weak learners. In conclusion, the method demonstrates the effectiveness of feature construction and the superiority and stability of ensemble learning, and it achieves good speech emotion recognition accuracy.

| S-EPMC9377622 | biostudies-literature

A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites.

Project description: Knowledge about protein-protein interactions is beneficial for understanding cellular mechanisms. Protein-protein interactions are usually determined by their protein-protein interaction sites. Due to the limitations of current techniques, detecting protein-protein interaction sites remains a challenging task. In this article, we present a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model serves as a feature extractor that removes redundant information from protein sequences, and the Extreme Gradient Boosting algorithm is used to construct a classifier for predicting protein-protein interaction sites. DeepPPISP-XGB achieved an area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and an area under the precision-recall curve of 0.339, which is competitive with state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.

| S-EPMC8576272 | biostudies-literature

PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning.

Project description: The recent outbreak of novel coronavirus disease (COVID-19) was declared a pandemic by the World Health Organization (WHO). Social media platforms have played a vital role in providing and obtaining information about ongoing events. However, consuming vast amounts of online textual data to predict an event's trends can be troublesome. To our knowledge, no previous study has analyzed online news articles together with disease data about coronavirus disease. Therefore, we propose an LDA-based topic model, called PAN-LDA (Pandemic-Latent Dirichlet Allocation), that incorporates COVID-19 case data and news articles into standard LDA to obtain a new set of features. The generated features are supplied as additional features to machine learning (ML) algorithms to improve time series forecasting. We employ collapsed Gibbs sampling (CGS) as the underlying technique for parameter inference. Experimental results suggest that the features obtained from PAN-LDA generate more identifiable topics and empirically add value to the outcome.

| S-EPMC8505021 | biostudies-literature

COVID-19 Diagnostic System Using Medical Image Classification and Retrieval: A Novel Method for Image Analysis

Project description: Not available

| S-EPMC8194842 | biostudies-literature

A machine learning method for the identification and characterization of novel COVID-19 drug targets.

Project description: In addition to vaccines, the World Health Organization sees novel medications as an urgent matter in fighting the ongoing COVID-19 pandemic. One possible strategy is to identify target proteins for which a perturbation by an existing compound is likely to benefit COVID-19 patients. To contribute to this effort, we present GuiltyTargets-COVID-19 ( https://guiltytargets-covid.eu/ ), a machine learning supported web tool to identify novel candidate drug targets. Using six bulk and three single cell RNA-Seq datasets, together with a lung tissue specific protein-protein interaction network, we demonstrate that GuiltyTargets-COVID-19 is capable of (i) prioritizing meaningful target candidates and assessing their druggability, (ii) unraveling their linkage to known disease mechanisms, (iii) mapping ligands from the ChEMBL database to the identified targets, and (iv) pointing out potential side effects in cases where the mapped ligands correspond to approved drugs. Our example analyses identified 4 potential drug targets from the datasets: AKT3 from both the bulk and single cell RNA-Seq data, as well as AKT2, MLKL, and MAPK11 in the single cell experiments. Altogether, we believe that our web tool will facilitate future target identification and drug development for COVID-19, notably in a cell type and tissue specific manner.

| S-EPMC10156718 | biostudies-literature

Prediction of Recombination Spots Using Novel Hybrid Feature Extraction Method via Deep Learning Approach.

Project description: Meiotic recombination is the driving force of evolutionary development and an important source of genetic variation. Meiotic recombination does not occur randomly across a chromosome but takes place in particular regions. Chromosomal regions with a higher rate of meiotic recombination events are considered hotspots, and regions with lower recombination frequencies are called coldspots. Prediction of meiotic recombination spots provides useful information about the basic functionality of inheritance and genome diversity. This study proposes an intelligent computational predictor called iRSpots-DNN for the identification of recombination spots. The predictor is based on a novel feature extraction method and an optimized deep neural network (DNN). The DNN serves as the classification engine, whereas the novel feature extraction method was developed to extract meaningful features for identifying hotspots and coldspots across the yeast genome. Unlike previous algorithms, the proposed feature extraction avoids bias among different selected features and simultaneously preserves the sequence discriminant properties and the sequence-structure information. This study also considered other effective classifiers, namely support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF), to predict recombination spots. Experimental results on a benchmark dataset with 10-fold cross-validation showed that iRSpots-DNN achieved the highest accuracy, 95.81%. Additionally, iRSpots-DNN performs significantly better than existing predictors on a benchmark dataset. The relevant benchmark dataset and source code are freely available at: https://github.com/Fatima-Khan12/iRspot_DNN/tree/master/iRspot_DNN.

| S-EPMC7527634 | biostudies-literature
