Dataset Information

A Hybrid Feature Selection Approach to Screen a Novel Set of Blood Biomarkers for Early COVID-19 Mortality Prediction.


ABSTRACT: The increase in coronavirus disease 2019 (COVID-19) infections caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has placed pressure on healthcare services worldwide. It is therefore crucial to identify critical factors for assessing the severity of COVID-19 infection and optimizing individual treatment strategies. To this end, the present study leverages a dataset of blood samples from 485 COVID-19 individuals in the region of Wuhan, China, to identify essential blood biomarkers that predict mortality. A hybrid feature selection approach combining filter, statistical, and heuristic-based methods was used to select the most informative subset of features. Specifically, minimum redundancy maximum relevance (mRMR), a two-tailed unpaired t-test, and the whale optimization algorithm (WOA) were applied, and three blood biomarkers were eventually selected as the most informative: international normalized ratio (INR), platelet large cell ratio (P-LCR), and D-dimer. In addition, various machine learning (ML) algorithms (random forest (RF), support vector machine (SVM), extreme gradient boosting (EGB), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbor (KNN)) were trained. The performance of the trained models was compared to identify the model that best predicts the mortality of COVID-19 individuals in terms of accuracy, F1 score, and area under the curve (AUC). The best-performing RF-based model, built using the three most informative blood parameters, predicts the mortality of COVID-19 individuals with an accuracy of 0.96 ± 0.062, an F1 score of 0.96 ± 0.099, and an AUC of 0.98 ± 0.024 on the independent test data. Furthermore, the performance of the proposed RF-based model in terms of accuracy, F1 score, and AUC was significantly better than that of the known blood-biomarker-based ML models built using the Pre_Surv_COVID_19 data.
Therefore, the present study provides a novel hybrid approach for screening the most informative blood biomarkers and developing an RF-based model that accurately and reliably predicts in-hospital mortality of confirmed COVID-19 individuals during surge periods. An application based on the proposed model was implemented and deployed on Heroku.
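
The statistical stage of the feature screening described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it covers only an unpaired t-test ranking (the mRMR and WOA stages and the RF classifier are omitted), it assumes the unequal-variance (Welch) form of the test, and the feature names and synthetic data are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Unpaired t-statistic for two samples (Welch form, an assumption here)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def rank_features(rows, labels, names):
    """Rank features by |t| between the two outcome groups, largest first."""
    scores = {}
    for j, name in enumerate(names):
        died = [row[j] for row, y in zip(rows, labels) if y == 1]
        survived = [row[j] for row, y in zip(rows, labels) if y == 0]
        # Using |t| makes the ranking two-tailed: direction of the shift is ignored.
        scores[name] = abs(welch_t(died, survived))
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic, hypothetical data: 200 individuals, 3 made-up blood parameters.
rng = random.Random(0)
names = ["marker_a", "marker_b", "noise"]
labels = [i % 2 for i in range(200)]  # 1 = died, 0 = survived
rows = [
    [rng.gauss(2.0 * y, 1.0), rng.gauss(0.5 * y, 1.0), rng.gauss(0.0, 1.0)]
    for y in labels
]

ranking = rank_features(rows, labels, names)
print(ranking)
```

In a fuller workflow, the top-ranked parameters from a stage like this would be passed on to the remaining selection stages and then to the classifiers being compared.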

SUBMITTER: Syed AH 

PROVIDER: S-EPMC9316550 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC9006223 | biostudies-literature
S-EPMC5769548 | biostudies-literature
S-EPMC3164604 | biostudies-literature
S-EPMC8620196 | biostudies-literature
S-EPMC3769381 | biostudies-literature
S-EPMC7527634 | biostudies-literature
S-EPMC8151160 | biostudies-literature
S-EPMC7287073 | biostudies-literature
S-EPMC6245785 | biostudies-other
S-EPMC6466481 | biostudies-literature