
Dataset Information


NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer.


ABSTRACT: BACKGROUND: The accurate screening of tumor genomic landscapes for somatic mutations using high-throughput sequencing is a crucial step in precise clinical diagnosis and targeted therapy. However, the complex inherent features of cancer tissue, especially intratumor genetic heterogeneity, coupled with sequencing and alignment artifacts, make somatic variant calling a challenging task. Current variant filtering strategies, such as rule-based filtering and consensus voting across different algorithms, have helped to increase specificity, although at the cost of sensitivity. METHODS: To address this, we developed the NeoMutate framework, which incorporates 7 supervised machine learning (ML) algorithms to exploit the strengths of multiple variant callers, using a non-redundant set of biological and sequence features. We benchmarked NeoMutate by simulating more than 10,000 bona fide cancer-related mutations into three well-characterized Genome in a Bottle (GIAB) reference samples. RESULTS: A robust and exhaustive evaluation of NeoMutate's performance, based on 5-fold cross-validation experiments and 3 independent tests, demonstrated substantially improved variant detection accuracy compared with any of its individual component variant callers and with consensus calling of multiple tools. CONCLUSIONS: We show that integrating multiple tools in an ensemble ML layer optimizes somatic variant detection rates, potentially leading to an improved variant selection framework for the diagnosis and treatment of cancer.
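The trade-off the abstract attributes to consensus voting (higher specificity, lower sensitivity) can be illustrated with a small sketch. This is not NeoMutate's method, which replaces voting with a supervised ensemble ML layer; it only shows the voting baseline. The caller names, variant IDs, and truth set below are hypothetical toy data, not results from the study, and precision is used as a stand-in for specificity because a toy call set has no well-defined true-negative count.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical toy data, invented for illustration only (not from the NeoMutate benchmark).
TRUTH: Set[str] = {"v1", "v2", "v3", "v4"}  # stands in for simulated "bona fide" mutations
CALLS: Dict[str, Set[str]] = {
    "caller_a": {"v1", "v2", "v3", "fp_a"},  # fp_* are false-positive calls
    "caller_b": {"v1", "v2", "fp_b"},
    "caller_c": {"v1", "v4", "fp_c"},
}


def consensus(caller_calls: Dict[str, Set[str]], min_votes: int) -> Set[str]:
    """Keep variants reported by at least `min_votes` callers."""
    votes: Counter = Counter()
    for calls in caller_calls.values():
        votes.update(calls)  # each caller contributes one vote per variant it reports
    return {v for v, n in votes.items() if n >= min_votes}


def sensitivity_precision(calls: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Sensitivity = TP / |truth|; precision = TP / |calls|."""
    tp = len(calls & truth)
    sensitivity = tp / len(truth) if truth else 0.0
    precision = tp / len(calls) if calls else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # min_votes=1 is the union of all callers; higher thresholds are stricter consensus.
    for k in (1, 2, 3):
        s, p = sensitivity_precision(consensus(CALLS, k), TRUTH)
        print(f"min_votes={k}: sensitivity={s:.2f}, precision={p:.2f}")
```

On this toy input, the union of callers recovers every true variant but keeps all three false positives, while requiring two or more votes removes the false positives and also drops v3 and v4, each found by only one caller. A learned ensemble aims to keep such single-caller true positives by weighing per-variant features instead of counting votes.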

SUBMITTER: Anzar I 

PROVIDER: S-EPMC6524241 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature


Publications

NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer.

Irantzu Anzar, Angelina Sverchkova, Richard Stratford, Trevor Clancy

BMC Medical Genomics, 16 May 2019, issue 1



Similar Datasets

| S-EPMC7848170 | biostudies-literature
| S-EPMC7599952 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC6735703 | biostudies-literature
| S-EPMC7969712 | biostudies-literature
| S-EPMC11373136 | biostudies-literature
| S-EPMC5930664 | biostudies-literature
| S-EPMC8901043 | biostudies-literature
2013-01-01 | GSE29210 | GEO
| S-EPMC8687333 | biostudies-literature