Dataset Information

b-move: faster bidirectional character extensions in a run-length compressed index.

ABSTRACT: Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advances in run-length compressed indices, such as Gagie et al.'s r-index and Nishimoto and Tabei's move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.'s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient bidirectional character extensions in run-length compressed space. It achieves bidirectional character extensions up to 8 times faster than the br-index, closing the performance gap with FM-index-based alternatives, while maintaining the br-index's favorable memory characteristics. For example, all available complete E. coli genomes in NCBI's RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Thus, b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.
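The memory argument in the abstract rests on a property of the Burrows-Wheeler transform (BWT): for a highly repetitive collection, the BWT consists of few runs of equal characters relative to the text length, and run-length compressed indices scale with the number of runs rather than the number of characters. The following is a minimal Python sketch of that property only. It is not the b-move implementation (which is in C++), and the function names `bwt` and `count_runs`, the toy sequence, and the quadratic rotation-sorting construction are all assumptions made for illustration.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only).

    Assumption: '$' does not occur in `text` and sorts before every
    character that does (true for upper- and lower-case ASCII letters).
    """
    s = text + "$"  # append the sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Count maximal runs of equal consecutive characters in `s`."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    # Hypothetical toy "pan-genome": many identical copies of one short sequence.
    genome = "ACGTTGCA"
    collection = genome * 32
    transformed = bwt(collection)
    print("text length n =", len(collection))
    print("BWT runs    r =", count_runs(transformed))
```

On such an input the run count stays far below the text length, which is why an index whose size depends on the number of runs can hold large collections of similar genomes. Real genome collections contain variation between copies, so the ratio is less extreme than in this toy case.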

SUBMITTER: Depuydt L

PROVIDER: S-EPMC11160816 | biostudies-literature | 2024 Jun

REPOSITORIES: biostudies-literature

Publications

b-move: faster bidirectional character extensions in a run-length compressed index.

Lore Depuydt, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, Jan Fostier

bioRxiv: the preprint server for biology, 2024-06-02

PMID: 38854079

Similar Datasets

b-move: Faster Lossless Approximate Pattern Matching in a Run-Length Compressed Index.
| S-EPMC11601852 | biostudies-literature

Faster STORM using compressed sensing.
| S-EPMC3477591 | biostudies-literature

How to run 50% faster without external energy.
| S-EPMC7096173 | biostudies-literature

Compressed Sensing 3D-GRASE for faster High-Resolution MRI.
| S-EPMC6619236 | biostudies-literature

Short-lived species move uphill faster under climate change.
| S-EPMC9056483 | biostudies-literature

The sands of time run faster near the end.
| S-EPMC5461489 | biostudies-literature

Faster and less phototoxic 3D fluorescence microscopy using a versatile compressed sensing scheme.
| S-EPMC5499637 | biostudies-literature

The faster, the better? Relationships between run-up speed, the degree of difficulty (D-score), height and length of flight on vault in artistic gymnastics.
| S-EPMC6405201 | biostudies-literature

Optimal designs of the side sensitive synthetic chart for the coefficient of variation based on the median run length and expected median run length.
| S-EPMC8323885 | biostudies-literature

Sigmoni: classification of nanopore signal with a compressed pangenome index.
| S-EPMC10462034 | biostudies-literature