Dataset Information

Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.


ABSTRACT: DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of minimizer-space de Bruijn graphs to enable long-read genome assembly. mdBG achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without compromising accuracy. A human genome is assembled in under 10 min using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 min using 1 GB RAM. In addition, we constructed a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, and successfully searched it for anti-microbial resistance (AMR) genes in 12 min. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics, and pangenomics. Code for constructing mdBGs is freely available for download at https://github.com/ekimb/rust-mdbg/.

SUBMITTER: Ekim B 

PROVIDER: S-EPMC8562525 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC10541625 | biostudies-literature
| S-EPMC6612831 | biostudies-literature
| S-EPMC5206522 | biostudies-literature
| S-EPMC5351550 | biostudies-literature
| S-EPMC8521641 | biostudies-literature
| S-EPMC5872255 | biostudies-literature
| S-EPMC6612864 | biostudies-other
| S-EPMC4253301 | biostudies-literature
| S-EPMC8326735 | biostudies-literature
| S-EPMC3421212 | biostudies-literature