Dataset Information

Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data


ABSTRACT: Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1 600 000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix.
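The leave-one-sample-out evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the Ideafix implementation (Ideafix is an R package built on XGBoost and random forest). The toy data, the single feature (variant allele frequency) and the nearest-centroid scorer `centroid_scores` are assumptions made up for illustration only. The sketch shows two ideas: holding out all variants of one sample per fold, and computing AUC as the probability that an artefact is scored higher than a true variant.

```python
from collections import defaultdict


def leave_one_sample_out(sample_ids):
    """Yield (held_out_sample, train_idx, test_idx) once per distinct sample."""
    by_sample = defaultdict(list)
    for i, s in enumerate(sample_ids):
        by_sample[s].append(i)
    for held_out, test_idx in by_sample.items():
        train_idx = [i for i, s in enumerate(sample_ids) if s != held_out]
        yield held_out, train_idx, test_idx


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: higher score = closer to the artefact mean."""
    art = [x for x, y in zip(train_x, train_y) if y == 1]
    real = [x for x, y in zip(train_x, train_y) if y == 0]
    mean_art = sum(art) / len(art)
    mean_real = sum(real) / len(real)
    return [abs(x - mean_real) - abs(x - mean_art) for x in test_x]


# Toy, made-up data: one row per variant (NOT taken from the study).
sample_ids = ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3"]
vaf = [0.02, 0.40, 0.05, 0.03, 0.35, 0.50, 0.45, 0.04, 0.30]
is_artefact = [1, 0, 1, 1, 0, 0, 0, 1, 0]

fold_auc = {}
for held_out, train_idx, test_idx in leave_one_sample_out(sample_ids):
    scores = centroid_scores(
        [vaf[i] for i in train_idx],
        [is_artefact[i] for i in train_idx],
        [vaf[i] for i in test_idx],
    )
    fold_auc[held_out] = roc_auc([is_artefact[i] for i in test_idx], scores)

for sample, auc in fold_auc.items():
    print(f"held-out sample {sample}: AUC = {auc:.2f}")
```

Splitting by sample rather than by variant keeps variants from the same biopsy out of both the training and test folds at once, which is the point of a leave-one-sample-out design.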

SUBMITTER: Tellaetxe-Abete M 

PROVIDER: S-EPMC8557387 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC5907718 | biostudies-literature
| S-EPMC8891360 | biostudies-literature
| S-EPMC4228284 | biostudies-literature
| S-EPMC6555260 | biostudies-literature
| S-EPMC6459541 | biostudies-literature
| S-EPMC10415133 | biostudies-literature
| S-EPMC7202553 | biostudies-literature
| S-EPMC4130321 | biostudies-other
| S-EPMC9614148 | biostudies-literature
| S-EPMC7094160 | biostudies-literature