Dataset Information

Simplitigs as an efficient and scalable representation of de Bruijn graphs.


ABSTRACT: De Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. Using assemblies of model organisms and two bacterial pan-genomes as examples, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs substantially reduce both the cumulative sequence length and the number of sequences. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory usage as well as index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
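
To make the idea of a simplitig representation more concrete, the following is a minimal Python sketch of one greedy way to compute such sequences from a k-mer set. It is an illustration only and should not be read as ProphAsm's actual implementation: the function names (reverse_complement, canonical, kmers_of, extend_right, greedy_simplitigs), the use of canonical k-mers, and the choice of the lexicographically smallest remaining k-mer as the seed are all assumptions made for this example. The sketch starts from a seed k-mer, extends it in both directions for as long as an unused k-mer overlaps by k-1 bases, and repeats until every k-mer has been used exactly once.

```python
# Illustrative sketch only; names and design choices are assumptions,
# not taken from the ProphAsm source code.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmers_of(seq, k):
    """Return the set of canonical k-mers occurring in seq."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def extend_right(sequence, remaining, k):
    """Append bases while some unused k-mer overlaps the last k-1 bases."""
    while True:
        suffix = sequence[-(k - 1):]
        for base in "ACGT":
            candidate = canonical(suffix + base)
            if candidate in remaining:
                remaining.remove(candidate)  # each k-mer is used only once
                sequence += base
                break
        else:
            # No unused k-mer continues the sequence in this direction.
            return sequence


def greedy_simplitigs(kmer_set, k):
    """Greedily cover a set of canonical k-mers with vertex-disjoint paths."""
    if k < 2:
        raise ValueError("k must be at least 2")
    remaining = set(kmer_set)
    result = []
    while remaining:
        seed = min(remaining)  # deterministic seed choice (an assumption)
        remaining.remove(seed)
        sequence = extend_right(seed, remaining, k)
        # Extending the reverse complement to the right is equivalent to
        # extending the original sequence to the left.
        sequence = extend_right(reverse_complement(sequence), remaining, k)
        result.append(sequence)
    return result
```

For instance, calling greedy_simplitigs(kmers_of("ACGGTCA", 3), 3) returns a list of strings that together contain every canonical 3-mer of the input exactly once. Because each k-mer appears in exactly one output sequence, a smaller number of output sequences corresponds directly to a shorter cumulative sequence length.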

SUBMITTER: Brinda K 

PROVIDER: S-EPMC8025321 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8326735 | biostudies-literature
| S-EPMC4015147 | biostudies-literature
| S-EPMC5872255 | biostudies-literature
| S-EPMC6612864 | biostudies-other
| S-EPMC4120145 | biostudies-literature
| S-EPMC4253301 | biostudies-literature
| S-EPMC3848682 | biostudies-literature
| S-EPMC5870571 | biostudies-literature
| S-EPMC3421212 | biostudies-literature
| S-EPMC6061703 | biostudies-literature