Dataset Information

Compacting de Bruijn graphs from sequencing data quickly and in low memory.


ABSTRACT:

Motivation

As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph-based algorithms, in which long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem.
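To make the compaction step concrete, here is a minimal sketch in Python. It is not the bcalm 2 algorithm: it assumes a node-centric de Bruijn graph held entirely in memory, ignores reverse complements, and uses hypothetical helper names (`kmers`, `successors`, `predecessors`, `compact`) chosen only for illustration. Each maximal simple path is merged into one sequence.

```python
def kmers(reads, k):
    """Return the set of all k-mers in the reads (the graph's vertices)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(kmer, nodes):
    """k-mers in the graph whose (k-1)-prefix equals this k-mer's (k-1)-suffix."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in nodes]


def predecessors(kmer, nodes):
    """k-mers in the graph whose (k-1)-suffix equals this k-mer's (k-1)-prefix."""
    return [c + kmer[:-1] for c in "ACGT" if c + kmer[:-1] in nodes]


def compact(nodes):
    """Merge every maximal simple path into a single sequence."""
    compacted, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        path = start

        # Extend to the right while the edge is unambiguous in both directions.
        cur = start
        while True:
            succ = successors(cur, nodes)
            if len(succ) != 1:
                break
            nxt = succ[0]
            # Stop at a branch, or when a cycle closes on a visited vertex.
            if nxt in seen or len(predecessors(nxt, nodes)) != 1:
                break
            seen.add(nxt)
            path += nxt[-1]
            cur = nxt

        # Extend to the left under the same rule.
        cur = start
        while True:
            pred = predecessors(cur, nodes)
            if len(pred) != 1:
                break
            prv = pred[0]
            if prv in seen or len(successors(prv, nodes)) != 1:
                break
            seen.add(prv)
            path = prv[0] + path
            cur = prv

        compacted.append(path)
    return sorted(compacted)


if __name__ == "__main__":
    # Two reads that share a prefix and then diverge, with k = 3.
    print(compact(kmers(["ACGTT", "ACGTA"], 3)))
```

In this toy input the vertices `ACG` and `CGT` form a simple path and are merged, while the branch at `CGT` keeps `GTA` and `GTT` as separate vertices.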

Results

We present an algorithm and a tool, bcalm 2, for the compaction of de Bruijn graphs. bcalm 2 is a parallel algorithm that distributes the input based on a minimizer hashing technique, allowing for good balance of memory usage throughout its execution. For human sequencing data, bcalm 2 reduces the computational burden of compacting the de Bruijn graph to roughly an hour and 3 GB of memory. We also applied bcalm 2 to the 22 Gbp loblolly pine and 20 Gbp white spruce sequencing datasets. Compacted graphs were constructed from raw reads in less than 2 days and 40 GB of memory on a single machine. Hence, bcalm 2 is at least an order of magnitude more efficient than other available methods.
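The general idea of distributing k-mers by minimizer can be sketched as follows. This is a generic illustration, not bcalm 2's implementation: the abstract does not specify the minimizer ordering or the bucket layout, so the sketch assumes plain lexicographic order, ignores reverse complements, and uses hypothetical names (`minimizer`, `partition_by_minimizer`, and the minimizer length `m`).

```python
from collections import defaultdict


def minimizer(kmer, m):
    """Return the smallest length-m substring of kmer (lexicographic order assumed)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def partition_by_minimizer(kmers, m):
    """Group k-mers into buckets keyed by their minimizer."""
    buckets = defaultdict(set)
    for kmer in kmers:
        buckets[minimizer(kmer, m)].add(kmer)
    return dict(buckets)


if __name__ == "__main__":
    buckets = partition_by_minimizer({"ACGT", "CGTA", "GTAC"}, 2)
    for key in sorted(buckets):
        print(key, sorted(buckets[key]))
```

Because overlapping k-mers often share a minimizer, k-mers that are adjacent in the graph tend to land in the same bucket, so each bucket can be processed largely on its own and only a bounded amount of data needs to be in memory at a time.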

Availability and implementation

Source code of bcalm 2 is freely available at: https://github.com/GATB/bcalm

Contact

rayan.chikhi@univ-lille1.fr.

SUBMITTER: Chikhi R 

PROVIDER: S-EPMC4908363 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature

Publications

Compacting de Bruijn graphs from sequencing data quickly and in low memory.

Rayan Chikhi, Antoine Limasset, Paul Medvedev

Bioinformatics (Oxford, England), 2016-06-01, issue 12


Similar Datasets

| S-EPMC6122196 | biostudies-literature
| S-EPMC4120145 | biostudies-literature
| S-EPMC5872255 | biostudies-literature
| S-EPMC8275350 | biostudies-literature
| S-EPMC6197042 | biostudies-literature
| S-EPMC6612864 | biostudies-other
| S-EPMC4253301 | biostudies-literature
| S-EPMC8326735 | biostudies-literature
| S-EPMC3421212 | biostudies-literature
| S-EPMC6061703 | biostudies-literature