
Dataset Information


Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model.


ABSTRACT: The majority of patients with prostate cancer die of non-cancer causes, so it is important to accurately predict multi-category cause of death (COD) in these patients. Random forest (RF), a popular machine learning model, has been shown to be useful for predicting binary cancer-specific deaths. However, its accuracy for predicting multi-category COD in cancer patients is unclear. We included patients in the Surveillance, Epidemiology, and End Results-18 cancer registry program with prostate cancer diagnosed in 2004 (followed up through 2016). They were randomly divided into training and testing sets of equal size. We evaluated the prediction accuracies of RF and conventional statistical (multinomial) models for 6-category COD, by data-encoding type, using 2-fold cross-validation. Among 49,864 prostate cancer patients, 29,611 (59.4%) were alive at the end of follow-up, 5,448 (10.9%) died of cardiovascular disease, 4,607 (9.2%) of prostate cancer, 3,681 (7.4%) of non-prostate cancer, 717 (1.4%) of infection, and 5,800 (11.6%) of other causes. We predicted 6-category COD among these patients with a mean accuracy of 59.1% (n=240, 95% CI, 58.7%-59.4%) in RF models with one-hot encoding, and 50.4% (95% CI, 49.7%-51.0%) in multinomial models. Tumor characteristics, prostate-specific antigen level, and diagnosis confirmation method were important predictors in both RF and multinomial models. In RF models, no statistically significant differences were found between the accuracies of the training and cross-validation phases, or between those of categorical and one-hot encoding. We report that RF models can outperform multinomial logistic models (absolute accuracy difference, 8.7%) in predicting long-term 6-category COD among prostate cancer patients, while pathology diagnosis itself and tumor pathology remain important factors.
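The figures reported in the abstract can be checked with simple arithmetic. The following is a minimal sketch, not part of the original record: the counts and accuracies are copied from the abstract, while the variable names and the `percent` helper are illustrative assumptions.

```python
# Outcome counts for the 6-category COD, copied from the abstract.
counts = {
    "alive": 29611,
    "cardiovascular disease": 5448,
    "prostate cancer": 4607,
    "non-prostate cancer": 3681,
    "infection": 717,
    "other causes": 5800,
}

# The six categories should add up to the reported cohort size.
total = sum(counts.values())


def percent(n, denominator):
    """Return n as a percentage of denominator, rounded to one decimal."""
    return round(100 * n / denominator, 1)


# Share of each outcome category in the cohort.
shares = {name: percent(n, total) for name, n in counts.items()}

# Mean accuracies reported for the two model families (percent).
rf_accuracy = 59.1
multinomial_accuracy = 50.4

# Absolute difference in accuracy, in percentage points.
accuracy_difference = round(rf_accuracy - multinomial_accuracy, 1)

print(f"total patients: {total}")
for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"absolute accuracy difference: {accuracy_difference} points")
```

The category counts sum to the reported cohort size, the rounded shares match the percentages in the abstract, and the difference between the two mean accuracies matches the reported absolute accuracy difference.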

SUBMITTER: Wang J 

PROVIDER: S-EPMC7269775 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model.

Wang J, Deng F, Zeng F, Shanahan AJ, Li WV, Zhang L

American Journal of Cancer Research, 2020-05-01 (5)



Similar Datasets

| S-EPMC3883111 | biostudies-literature
| S-EPMC7783755 | biostudies-literature
2019-10-29 | GSE127985 | GEO
| S-EPMC6823902 | biostudies-literature
| S-EPMC8012581 | biostudies-literature
| S-EPMC8257600 | biostudies-literature
| S-EPMC8525763 | biostudies-literature
| S-EPMC3163175 | biostudies-literature
| S-EPMC9226542 | biostudies-literature
| S-EPMC8575902 | biostudies-literature