Dataset Information

SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach.


ABSTRACT: It is challenging to identify somatic variants from high-throughput sequence reads because of tumor heterogeneity, subclonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets derived from the NA12878 cell line. The results showed that a simple consensus approach can significantly improve performance even with a limited number of callers, and that it is more robust and stable than machine-learning-based ensemble approaches. To fully exploit multiple callers, we also developed a software package, SomaticCombiner, which combines the output of multiple callers and integrates a new variant allele frequency (VAF) adaptive majority voting approach that maintains sensitive detection of variants with low VAFs.
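The abstract names two ideas, a simple consensus (majority voting) across callers and a VAF-adaptive variant of it, without giving their exact rules. The sketch below illustrates the general technique only. It assumes each caller's output has already been reduced to a mapping from a variant key (chromosome, position, ref, alt) to the VAF that caller reported. The adaptive rule used here (accepting low-VAF variants with fewer supporting callers) and its parameters `low_vaf_cutoff` and `low_vaf_min_votes` are hypothetical illustrations, not the published SomaticCombiner rule.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def majority_vote(
    calls: Dict[str, Dict[Variant, float]],
    low_vaf_cutoff: float = 0.1,  # hypothetical threshold, not from the paper
    low_vaf_min_votes: int = 2,   # hypothetical relaxed vote requirement
) -> Set[Variant]:
    """Return variants supported by a majority of callers.

    `calls` maps caller name -> {variant: VAF reported by that caller}.
    Variants whose mean VAF (across supporting callers) is below
    `low_vaf_cutoff` need only `low_vaf_min_votes` supporting callers,
    an illustrative stand-in for a VAF-adaptive voting rule.
    """
    majority = len(calls) // 2 + 1
    # Collect, for every variant, the VAFs from each caller that reported it.
    votes: Dict[Variant, List[float]] = defaultdict(list)
    for per_caller in calls.values():
        for variant, vaf in per_caller.items():
            votes[variant].append(vaf)

    accepted: Set[Variant] = set()
    for variant, vafs in votes.items():
        required = majority
        mean_vaf = sum(vafs) / len(vafs)
        if mean_vaf < low_vaf_cutoff:
            # Relax the vote requirement for low-VAF (e.g. subclonal) variants.
            required = min(majority, low_vaf_min_votes)
        if len(vafs) >= required:
            accepted.add(variant)
    return accepted


# Example with four hypothetical callers (a plain majority needs 3 votes).
calls = {
    "caller1": {("chr1", 1000, "C", "T"): 0.40,
                ("chr1", 2000, "G", "A"): 0.05,
                ("chr2", 500, "A", "G"): 0.30},
    "caller2": {("chr1", 1000, "C", "T"): 0.42,
                ("chr1", 2000, "G", "A"): 0.06,
                ("chr3", 42, "T", "C"): 0.02},
    "caller3": {("chr1", 1000, "C", "T"): 0.38,
                ("chr2", 500, "A", "G"): 0.31},
    "caller4": {},
}
result = majority_vote(calls)
# {("chr1", 1000, "C", "T"), ("chr1", 2000, "G", "A")}
print(sorted(result))
```

In this example, the chr1:1000 variant passes with three votes; the chr1:2000 variant has only two votes but passes under the relaxed low-VAF rule; the chr2:500 variant (two votes, high VAF) and the chr3:42 variant (one vote) are rejected. Relaxing the vote count for low-VAF calls trades some specificity for sensitivity, which is the kind of balance a VAF-adaptive scheme is meant to manage.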

SUBMITTER: Wang M 

PROVIDER: S-EPMC7393490 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach.

Wang Mingyi, Luo Wen, Jones Kristine, Bian Xiaopeng, Williams Russell, Higson Herbert, Wu Dongjing, Hicks Belynda, Yeager Meredith, Zhu Bin

Scientific Reports, 2020 Jul 30; issue 1


Similar Datasets

| S-EPMC5394620 | biostudies-literature
| S-EPMC5800023 | biostudies-literature
| S-EPMC8756192 | biostudies-literature
| S-EPMC7044309 | biostudies-literature
| S-EPMC6428590 | biostudies-literature
| S-EPMC7347135 | biostudies-literature
| S-EPMC10182612 | biostudies-literature
| S-EPMC8458033 | biostudies-literature
| S-EPMC6123722 | biostudies-literature
| S-EPMC6342379 | biostudies-other