Dataset Information

VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.


ABSTRACT:

Summary

VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing.
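The idea of comparing variants binned in size ranges can be illustrated with a minimal sketch. This is not VarSim's implementation: the bin edges, the function names (`size_bin`, `variant_length`, `binned_accuracy`) and the exact-match rule for deciding whether a called variant agrees with the truth set are all assumptions made for illustration. A real validator would typically tolerate small breakpoint differences, especially for structural variants.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges in base pairs (assumption; not taken from VarSim).
# They define the ranges 1-9, 10-49, 50-999 and 1000+.
BIN_EDGES = [1, 10, 50, 1000]


def size_bin(length):
    """Return a label for the size range that contains `length` (in bp)."""
    if length < BIN_EDGES[0]:
        raise ValueError("variant length must be at least 1 bp")
    i = bisect_right(BIN_EDGES, length) - 1
    lo = BIN_EDGES[i]
    if i + 1 < len(BIN_EDGES):
        return f"{lo}-{BIN_EDGES[i + 1] - 1}"
    return f"{lo}+"


def variant_length(variant):
    """Length of a (chrom, pos, ref, alt) variant; SNVs count as 1 bp."""
    _, _, ref, alt = variant
    return max(1, abs(len(ref) - len(alt)))


def binned_accuracy(truth, called):
    """Count TP/FP/FN per size bin and derive precision and recall.

    Assumption: a call is correct only if it matches a truth variant exactly.
    """
    truth, called = set(truth), set(called)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for v in called:
        key = "tp" if v in truth else "fp"
        counts[size_bin(variant_length(v))][key] += 1
    for v in truth - called:
        counts[size_bin(variant_length(v))]["fn"] += 1

    report = {}
    for label, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        report[label] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            # None marks a bin where the ratio is undefined.
            "precision": tp / (tp + fp) if tp + fp else None,
            "recall": tp / (tp + fn) if tp + fn else None,
        }
    return report


if __name__ == "__main__":
    truth = [("1", 100, "A", "G"), ("1", 200, "A", "A" + "T" * 12)]
    called = [("1", 100, "A", "G"), ("1", 500, "C", "T")]
    for label, stats in sorted(binned_accuracy(truth, called).items()):
        print(label, stats)
```

Reporting accuracy per bin rather than as one aggregate number keeps the abundant small variants from masking poor performance on the much rarer large ones.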

Availability and implementation

Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim.

Contact

rd@bina.com

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Mu JC 

PROVIDER: S-EPMC4410653 | biostudies-literature | 2015 May

REPOSITORIES: biostudies-literature

Publications

VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

Mu JC, Mohiyuddin M, Li J, Bani Asadi N, Gerstein MB, Abyzov A, Wong WH, Lam HY

Bioinformatics (Oxford, England), 2014-12-17, issue 9


Similar Datasets

| S-EPMC3365218 | biostudies-literature
| S-EPMC3896407 | biostudies-literature
| S-EPMC3271355 | biostudies-literature
| S-EPMC2673065 | biostudies-literature
| S-EPMC4177668 | biostudies-literature
2009-06-02 | E-GEOD-14696 | biostudies-arrayexpress
| S-EPMC3341827 | biostudies-other
| S-EPMC4400759 | biostudies-literature
| S-EPMC6584586 | biostudies-literature