
Dataset Information


Weakly supervised temporal model for prediction of breast cancer distant recurrence.


ABSTRACT: Predicting cancer recurrence in advance may help recruit high-risk breast cancer patients for clinical trials in time and can guide treatment planning. Several machine learning approaches to recurrence prediction have been developed in previous studies, but most use only structured electronic health records and small training datasets, and have had limited success in clinical application. Although free-text clinic notes may offer the greatest nuance and detail about a patient's clinical status, they have largely been excluded from previous predictive models because they are more complex to process and call for a more complex modeling framework. In this study, we developed a weak-supervision framework for breast cancer recurrence prediction in which we trained a deep learning model on a large sample of free-text clinic notes, using a combination of manually curated labels and imperfect NLP-generated recurrence labels. The model was trained jointly on manually curated data from 670 patients and NLP-curated data from 8062 patients. It was validated on manually annotated data from 224 patients with recurrence and achieved an AUROC of 0.94. This weak-supervision approach allowed us to learn from a larger dataset using imperfect labels and ultimately provided greater accuracy than training on a smaller hand-curated dataset, with less manual effort invested in curation.
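The idea of training jointly on a small hand-labeled set and a larger NLP-labeled set can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Example` record, the `build_training_set` helper, and the `weak_weight` down-weighting of NLP-generated labels are all hypothetical choices made for illustration, and the abstract does not say how the two label sources were combined. The `auroc` function shows the standard rank-based definition of the metric reported above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Example:
    features: List[float]
    label: int     # 1 = recurrence, 0 = no recurrence
    weight: float  # how much the example counts during training


def build_training_set(
    gold: Sequence[Tuple[Sequence[float], int]],
    weak: Sequence[Tuple[Sequence[float], int]],
    weak_weight: float = 0.5,  # assumed value, not taken from the paper
) -> List[Example]:
    """Merge hand-labeled and NLP-labeled examples into one weighted set."""
    # Manually curated labels are trusted fully.
    combined = [Example(list(x), y, 1.0) for x, y in gold]
    # NLP-generated labels are imperfect, so this sketch down-weights them.
    combined += [Example(list(x), y, weak_weight) for x, y in weak]
    return combined


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a positive case outscores a negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The weighted examples could then be passed to any learner that accepts per-sample weights, with `auroc` computed only on the manually annotated validation set so that label noise does not distort the evaluation.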

SUBMITTER: Sanyal J 

PROVIDER: S-EPMC8096809 | biostudies-literature | 2021 May

REPOSITORIES: biostudies-literature


Publications

Weakly supervised temporal model for prediction of breast cancer distant recurrence.

Sanyal J, Tariq A, Kurian AW, Rubin D, Banerjee I

Scientific Reports, 2021-05-04, 1



Similar Datasets

S-EPMC10147732 | biostudies-literature
S-EPMC11759606 | biostudies-literature
S-EPMC8692405 | biostudies-literature
S-EPMC11612040 | biostudies-literature
S-EPMC8164786 | biostudies-literature
S-EPMC11811162 | biostudies-literature
S-EPMC7585122 | biostudies-literature
S-EPMC10036616 | biostudies-literature
S-EPMC5583651 | biostudies-literature
S-EPMC6488210 | biostudies-literature