
Dataset Information


Halvade: scalable sequence analysis with MapReduce.


ABSTRACT:

Motivation

Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, these computational steps are very time-consuming, even when multithreading is used on a multi-core machine.

Results

We present Halvade, a framework that enables sequencing pipelines to be executed efficiently in parallel on a multi-node and/or multi-core compute infrastructure. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in under 3 h with very high parallel efficiency. Even on a single multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.
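The abstract does not describe Halvade's internals. As a hedged illustration of the general MapReduce pattern named in the title, the sketch below assumes reads have already been aligned and uses a hypothetical record type `AlignedRead` and a hypothetical `REGION_SIZE`. A map step keys each read by a fixed-size genomic region, a shuffle groups the reads by key, and a reduce step processes each region independently. Counting reads stands in for variant calling. This is not Halvade's actual code or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical aligned-read record: (read_id, chromosome, 0-based position).
AlignedRead = Tuple[str, str, int]
RegionKey = Tuple[str, int]

# Hypothetical region length in bp; a real tool would tune this for load balance.
REGION_SIZE = 1_000_000


def map_phase(reads: Iterable[AlignedRead]) -> Iterator[Tuple[RegionKey, AlignedRead]]:
    """Emit each aligned read keyed by the genomic region it falls in."""
    for read in reads:
        _, chrom, pos = read
        yield (chrom, pos // REGION_SIZE), read


def shuffle(pairs: Iterable[Tuple[RegionKey, AlignedRead]]) -> Dict[RegionKey, List[AlignedRead]]:
    """Group values by key, as a MapReduce framework does between the two phases."""
    groups: Dict[RegionKey, List[AlignedRead]] = defaultdict(list)
    for key, read in pairs:
        groups[key].append(read)
    return groups


def reduce_phase(reads: List[AlignedRead]) -> int:
    """Placeholder for per-region variant calling: sort reads by position, count them."""
    ordered = sorted(reads, key=lambda r: r[2])
    return len(ordered)


def run(reads: Iterable[AlignedRead]) -> Dict[RegionKey, int]:
    grouped = shuffle(map_phase(reads))
    # Regions are independent, so each reduce call could run on a different node or core.
    return {region: reduce_phase(rs) for region, rs in grouped.items()}


if __name__ == "__main__":
    demo = [("r1", "chr1", 100), ("r2", "chr1", 1_500_000),
            ("r3", "chr2", 10), ("r4", "chr1", 200)]
    print(run(demo))  # {('chr1', 0): 2, ('chr1', 1): 1, ('chr2', 0): 1}
```

The design point the sketch tries to convey is that keying by genomic region makes the reduce work embarrassingly parallel. In practice, reads near region boundaries and uneven coverage complicate this. Those concerns are outside the scope of this toy example.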

SUBMITTER: Decap D 

PROVIDER: S-EPMC4514927 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC5535306 | biostudies-other
| S-EPMC3247927 | biostudies-other
| S-EPMC3953661 | biostudies-other
| S-EPMC3127959 | biostudies-literature
| S-EPMC6821332 | biostudies-literature
| S-EPMC3952952 | biostudies-literature
| S-EPMC3521391 | biostudies-literature
| S-EPMC3083393 | biostudies-literature
| S-EPMC2682523 | biostudies-literature
| S-EPMC2928508 | biostudies-literature