Dataset Information

Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.


ABSTRACT: DNA sequencing data continue to progress toward longer reads with progressively lower sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of minimizer-space de Bruijn graphs to enable long-read genome assembly. mdBG achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without compromising accuracy. A human genome is assembled in under 10 min using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 min using 1 GB RAM. In addition, we constructed a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, and successfully searched it for antimicrobial resistance (AMR) genes in 12 min. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics, and pangenomics. Code for constructing mdBGs is freely available for download at https://github.com/ekimb/rust-mdbg/.
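
The abstract names the technique without spelling it out, so the following is a minimal illustrative sketch of what a minimizer-space de Bruijn graph can look like. It is not the rust-mdbg implementation. It assumes that minimizers are l-mers whose hash falls below a density threshold, that each read is reduced to its ordered list of such minimizers, and that graph nodes are runs of k consecutive minimizers joined when they overlap by k-1 minimizers. The function names, the hash choice, and the parameters `k`, `l`, and `density` are hypothetical, and reverse complements and sequencing errors are ignored.

```python
import hashlib


def hash_fraction(lmer):
    """Map an l-mer to a pseudo-random value in [0, 1). The hash is an arbitrary choice."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def to_minimizer_space(read, l, density):
    """Keep, in read order, every l-mer whose hash value is below `density`."""
    return [
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if hash_fraction(read[i:i + l]) < density
    ]


def build_mdbg(reads, k, l, density):
    """Build a toy graph whose nodes are tuples of k consecutive minimizers (k >= 2).

    An edge (u, v) is added when the last k-1 minimizers of u equal
    the first k-1 minimizers of v.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    nodes = set()
    for read in reads:
        minimizers = to_minimizer_space(read, l, density)
        for i in range(len(minimizers) - k + 1):
            nodes.add(tuple(minimizers[i:i + k]))

    # Index nodes by their (k-1)-minimizer prefix to find successors quickly.
    by_prefix = {}
    for node in nodes:
        by_prefix.setdefault(node[:-1], []).append(node)

    edges = {(u, v) for u in nodes for v in by_prefix.get(u[1:], ())}
    return nodes, edges


if __name__ == "__main__":
    # density=1.0 keeps every l-mer, which makes this tiny example deterministic.
    nodes, edges = build_mdbg(["ACGTAC"], k=2, l=2, density=1.0)
    print(len(nodes), "nodes,", len(edges), "edges")
```

In this sketch, lowering `density` keeps fewer l-mers per read, so each node spans a longer stretch of sequence and the graph has far fewer nodes than a base-space de Bruijn graph over the same reads.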

SUBMITTER: Ekim B 

PROVIDER: S-EPMC8562525 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.

Barış Ekim, Bonnie Berger, Rayan Chikhi

Cell Systems, 2021-09-14, issue 10


Similar Datasets

S-EPMC10541625 | biostudies-literature
S-EPMC6612831 | biostudies-literature
S-EPMC5206522 | biostudies-literature
S-EPMC5351550 | biostudies-literature
S-EPMC8521641 | biostudies-literature
S-EPMC8337006 | biostudies-literature
S-EPMC5872255 | biostudies-literature
S-EPMC9528980 | biostudies-literature
S-EPMC4253301 | biostudies-literature
S-EPMC8326735 | biostudies-literature