
Dataset Information


Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia.


ABSTRACT: OBJECTIVE: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, in choosing a distance metric that captures biological meaning, and in visualization. We hypothesized that clustering could discover prognostic groups among patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes. METHODS: To address this challenge, we applied k-medoids clustering with 10 distance metrics to 2 experiments ("A" and "B") with mixed clinical features collapsed to binary vectors, and visualized the results with both multidimensional scaling and t-distributed stochastic neighbor embedding. To assess prognostic utility, we performed survival analysis using a Cox proportional hazards model, the log-rank test, and Kaplan-Meier curves. RESULTS: In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age. CONCLUSIONS: This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity.
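The abstract names k-medoids clustering over binary patient vectors but does not list the 10 distance metrics or the exact features used. The sketch below is illustrative only and is not the authors' code: it assumes Jaccard distance as one plausible binary metric, uses a simple alternating (Voronoi-iteration) k-medoids rather than any specific library, and encodes a hypothetical patient table. Because only the `dist` function touches the raw vectors, another metric can be swapped in to compare clusterings.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[int]


def jaccard_distance(a: Vector, b: Vector) -> float:
    """Jaccard distance between two equal-length 0/1 vectors (illustrative metric choice).

    Features absent in both patients (0/0) are ignored; two all-zero
    vectors are treated as identical (distance 0).
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def _assign(dmat: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dmat[i][medoids[c]])
        for i in range(len(dmat))
    ]


def k_medoids(
    points: Sequence[Vector],
    dist: Callable[[Vector, Vector], float],
    init: List[int],
    max_iter: int = 100,
) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids on a precomputed distance matrix.

    `init` holds the starting medoid indices (k = len(init)).
    Returns (labels, medoid_indices).
    """
    n = len(points)
    dmat = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    medoids = list(init)
    for _ in range(max_iter):
        labels = _assign(dmat, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster
            new_medoids.append(min(members, key=lambda m: sum(dmat[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return _assign(dmat, medoids), medoids


# Hypothetical binary features: [female, IGHV mutated, ZAP70 positive, age > 65]
patients = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
labels, medoids = k_medoids(patients, jaccard_distance, init=[0, 3])
print(labels, medoids)  # [0, 0, 0, 1, 1, 1] [0, 3]
```

On this toy table the IGHV-mutated, ZAP70-negative patients fall into one cluster and the rest into the other. In a real analysis the cluster labels would then feed the survival models named above (Cox model, log-rank test, Kaplan-Meier curves), and the initial medoids would be chosen by a proper initialization rather than by hand.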

SUBMITTER: Coombes CE 

PROVIDER: S-EPMC7647286 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature


Publications

Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia.

Coombes Caitlin E, Abrams Zachary B, Li Suli, Abruzzo Lynne V, Coombes Kevin R

Journal of the American Medical Informatics Association : JAMIA 20200701 7



Similar Datasets

| S-EPMC8294525 | biostudies-literature
| S-EPMC9715328 | biostudies-literature
| S-EPMC2805743 | biostudies-literature
| S-EPMC3679784 | biostudies-literature
| S-EPMC6029549 | biostudies-literature
| S-EPMC4007929 | biostudies-literature
| S-EPMC7820637 | biostudies-literature
| S-EPMC8269042 | biostudies-literature
| S-EPMC8295712 | biostudies-literature
| S-EPMC9521409 | biostudies-literature