Dataset Information

SAMBLASTER: fast duplicate marking and structural variant read extraction.


ABSTRACT: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times. We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while it dramatically reduces overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results. SAMBLASTER is open-source C++ code and freely available for download from https://github.com/GregoryFaust/samblaster.
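
The idea of marking duplicates in a single streaming pass over read-grouped SAM records can be illustrated with a minimal Python sketch. This is not SAMBLASTER's actual algorithm or code. The function name `mark_duplicates` is hypothetical, and the pair signature used here (reference name, strand and leftmost position of each primary alignment) is a simplifying assumption that ignores clipping and other details a real tool would need to handle. It only shows why input grouped by read name allows duplicates to be flagged without sorting by coordinate.

```python
import sys
from itertools import groupby

# Bits of the SAM FLAG field (defined by the SAM specification).
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def _qname(line):
    """Return the read name, or None for header lines (which start with '@')."""
    return None if line.startswith("@") else line.split("\t", 1)[0]


def mark_duplicates(sam_lines):
    """Yield SAM lines, setting the duplicate bit on repeated read pairs.

    Assumes all alignments of a read pair are adjacent in the input
    (read-grouped order), so each pair is seen exactly once, in one pass.
    """
    seen = set()  # signatures of pairs already encountered
    records = (line.rstrip("\n") for line in sam_lines if line.strip())
    for qname, group in groupby(records, key=_qname):
        rows = [line.split("\t") for line in group]
        if qname is None:  # header block: pass through unchanged
            for row in rows:
                yield "\t".join(row)
            continue
        # Simplified signature: (reference, strand, leftmost position)
        # of every mapped primary alignment in the pair.
        signature = []
        for row in rows:
            flag = int(row[1])
            if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
                continue
            strand = "-" if flag & REVERSE else "+"
            signature.append((row[2], strand, int(row[3])))
        signature = tuple(sorted(signature))
        is_dup = bool(signature) and signature in seen
        if signature:
            seen.add(signature)
        for row in rows:
            if is_dup:
                row[1] = str(int(row[1]) | DUPLICATE)
            yield "\t".join(row)


if __name__ == "__main__":
    # Act as a pipe filter: SAM text in on stdin, flagged SAM text out on stdout.
    for out_line in mark_duplicates(sys.stdin):
        print(out_line)
```

Because the sketch reads standard input and writes standard output, it could sit between an aligner and a BAM compression step in the same way the abstract describes, at the cost of holding one signature per distinct pair in memory.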

SUBMITTER: Faust GG 

PROVIDER: S-EPMC4147885 | biostudies-literature | 2014 Sep

REPOSITORIES: biostudies-literature

Publications

SAMBLASTER: fast duplicate marking and structural variant read extraction.

Faust, Gregory G.; Hall, Ira M.

Bioinformatics (Oxford, England), 2014-05-07, issue 17


Similar Datasets

| S-EPMC10883419 | biostudies-literature
| S-EPMC10724848 | biostudies-literature
| S-EPMC8686677 | biostudies-literature
| S-EPMC6821325 | biostudies-literature
| S-EPMC10079824 | biostudies-literature
| S-EPMC6051193 | biostudies-literature
| S-EPMC3137215 | biostudies-literature
| S-EPMC10946394 | biostudies-literature
| S-EPMC1570150 | biostudies-literature
| S-EPMC3436805 | biostudies-literature