Dataset Information

Space-efficient and exact de Bruijn graph representation based on a Bloom filter.


ABSTRACT: BACKGROUND: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB). RESULTS: We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. CONCLUSIONS: Minia, an assembler implementing this structure, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
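
The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not Minia's implementation. It assumes that "critical false positives" means k-mers that the Bloom filter wrongly reports as present and that are neighbours of true k-mers, so that storing them in a side set makes traversal from true nodes exact. All names here (BloomFilter, ExactDeBruijn, kmers, neighbors) are hypothetical, reverse complements are ignored, and the side set is built in memory for simplicity.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b digest per hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(seq, k):
    """Set of all k-mers (graph nodes) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbors(kmer):
    """All possible successors and predecessors of a k-mer."""
    for base in "ACGT":
        yield kmer[1:] + base   # successor
        yield base + kmer[:-1]  # predecessor


class ExactDeBruijn:
    """Bloom filter plus a set of critical false positives (assumed definition)."""

    def __init__(self, true_kmers, n_bits, n_hashes):
        self.bloom = BloomFilter(n_bits, n_hashes)
        for km in true_kmers:
            self.bloom.add(km)
        # Neighbours of true k-mers that the filter accepts but that are not real nodes.
        self.cfp = {
            nb
            for km in true_kmers
            for nb in neighbors(km)
            if nb in self.bloom and nb not in true_kmers
        }

    def contains(self, kmer):
        # Exact only for true k-mers and their neighbours, which is
        # what a traversal starting from a true node ever queries.
        return kmer in self.bloom and kmer not in self.cfp

    def successors(self, kmer):
        return [kmer[1:] + b for b in "ACGT" if self.contains(kmer[1:] + b)]


if __name__ == "__main__":
    nodes = kmers("ACGTACGGTCAGTTGACCA", 5)
    # A deliberately tiny filter, so false positives are likely.
    graph = ExactDeBruijn(nodes, n_bits=64, n_hashes=2)
    print(len(nodes), "true k-mers;", len(graph.cfp), "critical false positives stored")
```

The design point the sketch tries to show is that the side set only needs to cover false positives reachable in one step from a true node, not every false positive of the filter, which is why the combined structure can stay small while traversal remains exact.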

SUBMITTER: Chikhi R 

PROVIDER: S-EPMC3848682 | biostudies-literature | 2013 Sep

REPOSITORIES: biostudies-literature

Publications

Space-efficient and exact de Bruijn graph representation based on a Bloom filter.

Rayan Chikhi, Guillaume Rizk

Algorithms for Molecular Biology: AMB, 2013-09-16, 1


Similar Datasets

| S-EPMC5870571 | biostudies-literature
| S-EPMC6022659 | biostudies-literature
| S-EPMC8521641 | biostudies-literature
| S-EPMC5591975 | biostudies-literature
| S-EPMC8025321 | biostudies-literature
| S-EPMC8147420 | biostudies-literature
| S-EPMC3517413 | biostudies-literature
| S-EPMC10541625 | biostudies-literature
| S-EPMC5547447 | biostudies-other
| S-EPMC5114782 | biostudies-literature