Dataset Information

Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA.


ABSTRACT:

Background

Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer.

Methods

Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N = 546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validations to assess generalization performance.
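The idea behind a confounder-based cross-validation can be illustrated with a minimal Python sketch. It assumes each sample carries a single confounder label (an institution or sequencing batch, for example) and keeps every label entirely within one test fold, so a model is never evaluated on a group it saw during training. The function name `confounder_folds` and the example labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def confounder_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs in which each confounder group
    appears in exactly one test fold and never in both train and test."""
    # Collect the sample indices that belong to each confounder label.
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Greedy balancing: place the largest groups first, always into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])

    for i in range(n_folds):
        test = sorted(folds[i])
        train = sorted(j for k in range(n_folds) if k != i for j in folds[k])
        yield train, test


# Hypothetical institution labels, one per sample.
institutions = ["A", "A", "B", "B", "B", "C", "C", "D"]
for train_idx, test_idx in confounder_folds(institutions, n_folds=2):
    held_out = sorted({institutions[i] for i in test_idx})
    print("held-out institutions:", held_out, "test samples:", test_idx)
```

Comparing scores from such grouped folds with scores from ordinary k-fold splits is one way to see how much a confounder inflates apparent performance.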

Results

In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91-0.93) with a mean sensitivity of 85% (95% CI 83-86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance.
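The two metrics reported above, AUC and sensitivity at a fixed specificity, can be computed from per-sample classifier scores as in the following sketch. It assumes binary labels (1 = cancer, 0 = control) and a rule that calls a sample positive when its score exceeds a threshold; the function names and the toy scores are illustrative only, and the study's reported values are means over cross-validation folds rather than the output of a single call like this.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen cancer sample scores
    higher than a randomly chosen control (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(labels, scores, target_specificity):
    """Find the lowest threshold whose specificity meets the target and
    return (sensitivity, threshold) at that operating point."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Scan candidate thresholds from lowest to highest; the highest score
    # always gives specificity 1.0, so the loop returns for targets <= 1.
    for threshold in [-math.inf] + sorted(set(scores)):
        specificity = sum(n <= threshold for n in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(p > threshold for p in pos) / len(pos)
            return sensitivity, threshold


# Toy data: four cancer samples followed by four controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(roc_auc(labels, scores))
print(sensitivity_at_specificity(labels, scores, 0.75))
```

Fixing specificity first and then reading off sensitivity mirrors how the result is stated in the abstract (sensitivity at 85% specificity).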

Conclusions

A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.

SUBMITTER: Wan N 

PROVIDER: S-EPMC6708173 | biostudies-literature | 2019 Aug

REPOSITORIES: biostudies-literature

Similar Datasets

2022-09-14 | E-MTAB-11607 | biostudies-arrayexpress
| S-EPMC9427134 | biostudies-literature
| S-EPMC10854986 | biostudies-literature
| S-EPMC11293328 | biostudies-literature
| S-EPMC7693522 | biostudies-literature
| S-EPMC8722762 | biostudies-literature
| S-EPMC6550189 | biostudies-literature
| S-EPMC9646748 | biostudies-literature
| S-EPMC6775242 | biostudies-literature
| S-EPMC5561031 | biostudies-other