Dataset Information

SAMBLASTER: fast duplicate marking and structural variant read extraction.


ABSTRACT:

Motivation

Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.

Results

We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.
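The streaming idea behind duplicate marking on read-grouped SAM input can be illustrated with a minimal Python sketch. This is not SAMBLASTER's actual C++ implementation; it is a simplified illustration that assumes all alignments of a read pair are adjacent in the input, and it uses the leftmost mapping position and strand as the pair signature (real tools typically work from clipping-adjusted 5' positions). The names `mark_duplicates` and `signature` are hypothetical.

```python
DUP_FLAG = 0x400       # SAM FLAG bit: PCR or optical duplicate
REVERSE_FLAG = 0x10    # SAM FLAG bit: read mapped to the reverse strand
UNMAPPED_FLAG = 0x4    # SAM FLAG bit: read is unmapped


def signature(fields):
    """Simplified key for one alignment: (reference, strand, leftmost POS).

    Assumption: POS is used directly; no soft-clip adjustment is done here.
    """
    flag = int(fields[1])
    strand = "-" if flag & REVERSE_FLAG else "+"
    return (fields[2], strand, int(fields[3]))


def mark_duplicates(lines):
    """Yield SAM lines in input order, flagging repeated pair signatures.

    Works in a single pass: only the set of signatures seen so far is
    kept in memory, so the input never has to be sorted by coordinate.
    """
    seen = set()
    group, group_name = [], None

    def flush(records):
        if not records:
            return
        # Pairs with an unmapped alignment are passed through unmarked.
        mapped = all(not int(f[1]) & UNMAPPED_FLAG for f in records)
        is_dup = False
        if mapped:
            key = tuple(sorted(signature(f) for f in records))
            is_dup = key in seen
            seen.add(key)
        for f in records:
            if is_dup:
                f[1] = str(int(f[1]) | DUP_FLAG)
            yield "\t".join(f)

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):       # header lines pass through unchanged
            yield line
            continue
        fields = line.split("\t")
        if fields[0] != group_name:    # a new read name starts a new group
            yield from flush(group)
            group, group_name = [], fields[0]
        group.append(fields)
    yield from flush(group)
```

Used as a filter, the generator could be driven with `for line in mark_duplicates(sys.stdin): print(line)`, which mirrors the piped post-pass placement between an aligner and BAM compression described above.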

Availability and implementation

SAMBLASTER is open-source C++ code and freely available for download from https://github.com/GregoryFaust/samblaster.

SUBMITTER: Faust GG 

PROVIDER: S-EPMC4147885 | biostudies-literature | 2014 Sep

REPOSITORIES: biostudies-literature

Publications

SAMBLASTER: fast duplicate marking and structural variant read extraction.

Gregory G. Faust, Ira M. Hall

Bioinformatics (Oxford, England), 2014-05-07, issue 17



Similar Datasets

| S-EPMC10883419 | biostudies-literature
| S-EPMC10724848 | biostudies-literature
| S-EPMC8686677 | biostudies-literature
| S-EPMC10079824 | biostudies-literature
| S-EPMC6821325 | biostudies-literature
| S-EPMC6051193 | biostudies-literature
| S-EPMC3137215 | biostudies-literature
| S-EPMC10946394 | biostudies-literature
| S-EPMC1570150 | biostudies-literature
| S-EPMC3436805 | biostudies-literature