
Dataset Information


SomatoSim: precision simulation of somatic single nucleotide variants.


ABSTRACT:

Background

Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The necessity to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, the use of simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking performance. However, many existing somatic variant simulation tools are limited because they rely on generating entirely synthetic reads derived from a reference genome or because they do not allow for the precise customizability that would enable a more focused understanding of single nucleotide variant calling performance.

Results

SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control over the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation; variant simulation, where reads are selected and mutated; and variant evaluation, where SomatoSim summarizes the simulation results.
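The three stages can be illustrated with a small, self-contained sketch. This is not SomatoSim's code or interface: the read representation (plain dictionaries rather than BAM records), the function names `reads_to_mutate` and `simulate_snv`, and the `min_base_quality` threshold are all hypothetical, chosen only to show how a target variant allele fraction (VAF) translates into a number of reads to mutate at a chosen position.

```python
import random


def reads_to_mutate(depth, target_vaf):
    """Number of reads to alter so that alt_reads / depth approximates target_vaf."""
    return round(depth * target_vaf)


def simulate_snv(reads, pos, alt_base, target_vaf, min_base_quality=20, seed=0):
    """Mutate a subset of reads at `pos` and report the achieved VAF.

    `reads` is a list of dicts with keys 'start' (0-based), 'seq', and 'quals'.
    This toy layout is an assumption for illustration, not a BAM record.
    """
    rng = random.Random(seed)

    # Selection: keep reads that cover the position.
    covering = [r for r in reads if r["start"] <= pos < r["start"] + len(r["seq"])]

    # Only reads with adequate base quality that do not already carry the
    # alternate base are eligible for mutation.
    eligible = [
        r for r in covering
        if r["quals"][pos - r["start"]] >= min_base_quality
        and r["seq"][pos - r["start"]] != alt_base
    ]

    # Simulation: mutate a random subset sized to match the target VAF.
    n = min(reads_to_mutate(len(covering), target_vaf), len(eligible))
    for r in rng.sample(eligible, n):
        i = pos - r["start"]
        r["seq"] = r["seq"][:i] + alt_base + r["seq"][i + 1:]

    # Evaluation: summarize depth, alternate-allele reads, and achieved VAF.
    alt_reads = sum(1 for r in covering if r["seq"][pos - r["start"]] == alt_base)
    depth = len(covering)
    return {
        "depth": depth,
        "alt_reads": alt_reads,
        "achieved_vaf": alt_reads / depth if depth else 0.0,
    }
```

Because the number of mutated reads must be an integer, the achieved VAF can differ slightly from the requested one at low depth, which is why a summary step that reports the realized value is useful.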

Conclusions

SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim.

SUBMITTER: Hawari MA 

PROVIDER: S-EPMC7936459 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature


Publications

SomatoSim: precision simulation of somatic single nucleotide variants.

Hawari, Marwan A.; Hong, Celine S.; Biesecker, Leslie G.

BMC Bioinformatics, 2021-03-06, issue 1



Similar Datasets

| S-EPMC4856034 | biostudies-literature
2016-08-23 | E-GEOD-74718 | biostudies-arrayexpress
| S-EPMC7371703 | biostudies-literature
2016-08-23 | GSE74718 | GEO
| S-EPMC8006362 | biostudies-literature
| S-EPMC3922638 | biostudies-literature
| S-EPMC4603299 | biostudies-literature