Dataset Information

Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification.


ABSTRACT: Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
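The abstract does not spell out how run-block compression works, so the following is only a simplified sketch under stated assumptions: the sequence is cut into fixed-size blocks, a block made of a single repeated character is stored as one symbol, and all other blocks are kept verbatim. The function names, the block size, and the linear-scan rank query are hypothetical illustrations, not Centrifuger's actual data structures; a practical index would use precomputed rank structures instead of scanning.

```python
def run_block_compress(seq, block_size):
    """Split seq into fixed-size blocks; store single-character blocks as one symbol.

    Assumption: a trailing partial block is always kept verbatim.
    """
    is_run = []     # one flag per block: True if the block is one repeated character
    run_chars = []  # the repeated character of each run block, in block order
    plain = []      # contents of all non-run blocks, in block order
    for start in range(0, len(seq), block_size):
        block = seq[start:start + block_size]
        if len(block) == block_size and len(set(block)) == 1:
            is_run.append(True)
            run_chars.append(block[0])
        else:
            is_run.append(False)
            plain.append(block)
    return is_run, run_chars, "".join(plain)


def run_block_decompress(is_run, run_chars, plain, block_size):
    """Rebuild the original sequence exactly (the scheme is lossless)."""
    out = []
    r = p = 0  # cursors into run_chars and plain
    for flag in is_run:
        if flag:
            out.append(run_chars[r] * block_size)
            r += 1
        else:
            out.append(plain[p:p + block_size])
            p += block_size
    return "".join(out)


def rank(is_run, run_chars, plain, block_size, c, i):
    """Count occurrences of c in the first i characters of the original sequence.

    Linear scan over blocks for clarity only; not how a real index would do it.
    """
    full_blocks, offset = divmod(i, block_size)
    count = 0
    r = p = 0
    for flag in is_run[:full_blocks]:
        if flag:
            if run_chars[r] == c:
                count += block_size
            r += 1
        else:
            count += plain[p:p + block_size].count(c)
            p += block_size
    if offset:  # the query position falls inside a block
        if is_run[full_blocks]:
            if run_chars[r] == c:
                count += offset
        else:
            count += plain[p:p + offset].count(c)
    return count
```

In this sketch, long runs (which the Burrows-Wheeler transform tends to produce for similar genomes) collapse to one symbol per block, while the rank query still answers from the compressed form without rebuilding the sequence.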

SUBMITTER: Song L 

PROVIDER: S-EPMC10680779 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature

Publications

Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification.

Li Song, Ben Langmead

bioRxiv: the preprint server for biology, 2023-11-17


Similar Datasets

| S-EPMC10153118 | biostudies-literature
| S-EPMC7250429 | biostudies-literature
| S-EPMC6761962 | biostudies-literature
| S-EPMC7079445 | biostudies-literature
| S-EPMC6104016 | biostudies-literature
| S-EPMC4053813 | biostudies-other
| S-EPMC7657843 | biostudies-literature
| S-EPMC2957682 | biostudies-literature
| S-EPMC3223155 | biostudies-literature
| S-EPMC7218625 | biostudies-literature