Dataset Information

A descriptive study of random forest algorithm for predicting COVID-19 patients outcome.


ABSTRACT:

Background

The outbreak of coronavirus disease 2019 (COVID-19) that occurred in Wuhan, China, has become a global public health threat. It is necessary to identify indicators that can be used as optimal predictors for clinical outcomes of COVID-19 patients.

Methods

Clinical information from 126 patients diagnosed with COVID-19 was collected from Wuhan Fourth Hospital. Specific clinical characteristics, laboratory findings, treatments and clinical outcomes were analyzed for patients who were hospitalized for treatment from 1 February to 15 March 2020 and subsequently died or were discharged. A random forest (RF) algorithm was used to predict the prognoses of COVID-19 patients and to identify the optimal predictors of patients' clinical prognoses.

Results

Seven of the 126 patients were excluded because their endpoints were lost. Of the remaining 119 patients, 103 were discharged (alive) and 16 died in the hospital. A synthetic minority over-sampling technique (SMOTE) was used to correct the imbalance between the two outcome groups. Recursive feature elimination (RFE) was used to select the optimal feature subset for analysis. Eleven clinical parameters (Myo, CD8, age, LDH, LMR, CD45, Th/Ts, dyspnea, NLR, D-Dimer and CK) were chosen, with an AUC of approximately 0.9905. The RF model was built on this best subset to predict the prognoses of COVID-19 patients, and the area under the ROC curve (AUC) on the test data was 100%. Moreover, two optimal clinical risk predictors, lactate dehydrogenase (LDH) and myoglobin (Myo), were selected based on the Gini index. Univariable logistic analysis revealed a substantial increase in the risk of in-hospital mortality when Myo was higher than 80 ng/ml (OR = 7.54, 95% CI [3.42-16.63]) and when LDH was higher than 500 U/L (OR = 4.90, 95% CI [2.13-11.25]).
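The over-sampling step mentioned above can be illustrated with a minimal sketch. This is a simplified, standard-library version of the SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the implementation used in the study. The function name `smote`, the parameter names and the toy feature values are all assumptions made for illustration; none of the numbers are study data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority-class sample and one of its k nearest minority-class
    neighbours (simplified SMOTE; hypothetical helper, not the study's code)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy, made-up minority-class samples with two features each (not study values).
deaths = [(620.0, 150.0), (540.0, 95.0), (710.0, 210.0), (580.0, 120.0)]
new_points = smote(deaths, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the range already spanned by that class. Over-sampling of this kind is normally applied to the training split only, so that the test data contain real patients alone; the abstract does not state how the split was handled.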

Conclusion

We applied an RF algorithm to predict the mortality of COVID-19 patients with high accuracy and identified LDH higher than 500 U/L and Myo higher than 80 ng/ml as potential risk factors for poor prognosis in the early stage of the disease.
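The odds ratios quoted in the Results come from univariable logistic analysis. For a single binary predictor such as "LDH above 500 U/L", the logistic-regression odds ratio equals the cross-product ratio of the 2x2 table, and a Wald 95% confidence interval follows from the standard error of the log odds ratio. The sketch below shows that arithmetic; the function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: marker above threshold and died      b: marker above threshold and survived
    c: marker below threshold and died      d: marker below threshold and survived
    All four cells must be non-zero. (Hypothetical helper for illustration.)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, chosen only to demonstrate the calculation.
or_, lo, hi = odds_ratio_ci(a=10, b=20, c=5, d=60)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}-{hi:.2f}]")
```

An interval whose lower bound stays above 1 indicates that the elevated marker is associated with higher odds of in-hospital death at the chosen confidence level.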

SUBMITTER: Wang J 

PROVIDER: S-EPMC7486830 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

A descriptive study of random forest algorithm for predicting COVID-19 patients outcome.

Wang Jie, Yu Heping, Hua Qingquan, Jing Shuili, Liu Zhifen, Peng Xiang, Cao Cheng'an, Luo Yongwen

PeerJ, 2020-09-09


Similar Datasets

| S-EPMC9226542 | biostudies-literature
| S-EPMC7302385 | biostudies-literature
2012-05-10 | GSE37858 | GEO
2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress
| S-EPMC2777180 | biostudies-literature
2022-05-16 | GSE189510 | GEO
| S-EPMC7439995 | biostudies-literature
| S-EPMC10929084 | biostudies-literature
| S-EPMC10229444 | biostudies-literature
| S-EPMC8193767 | biostudies-literature