
Dataset Information


REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets.


ABSTRACT:

MOTIVATION: In this work we present REINDEER, a novel computational method that indexes sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

RESULTS: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ~4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries for k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Finally, REINDEER constructs and indexes monotigs, which, in short, are groups of k-mers with similar abundances.

AVAILABILITY AND IMPLEMENTATION: https://github.com/kamimrcht/REINDEER

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
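To make the pipeline summarized above more concrete, here is a toy Python sketch of the same idea: count k-mers per dataset, build a global de Bruijn graph, cut its unitigs into monotigs, and answer abundance and presence/absence queries. It is not REINDEER's implementation, and every function name is hypothetical. Simplifying assumptions, also flagged in the code comments: k-mers are taken on the forward strand only (no reverse complements); the global graph is built directly from the union of all k-mer sets rather than by merging per-dataset compacted graphs; consecutive k-mers are grouped only when their abundance vectors are exactly equal (the abstract speaks of "similar" abundances, so the real tool may group more loosely); and plain dictionaries stand in for a compact on-disk index.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count the k-mers of one dataset (ASSUMPTION: forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unitigs(kmers):
    """Maximal non-branching paths of the de Bruijn graph whose nodes are `kmers`."""
    def succ(x):
        return [x[1:] + c for c in "ACGT" if x[1:] + c in kmers]

    def pred(x):
        return [c + x[:-1] for c in "ACGT" if c + x[:-1] in kmers]

    def next_in_path(x):
        s = succ(x)
        return s[0] if len(s) == 1 and len(pred(s[0])) == 1 else None

    def prev_in_path(x):
        p = pred(x)
        return p[0] if len(p) == 1 and len(succ(p[0])) == 1 else None

    seen, paths = set(), []
    for seed in sorted(kmers):
        if seed in seen:
            continue
        # Walk left to the first k-mer of this unitig (stop if a cycle leads back to seed).
        first = seed
        while (p := prev_in_path(first)) is not None and p != seed:
            first = p
        # Walk right, collecting k-mers until the path branches or closes a cycle.
        path = [first]
        seen.add(first)
        while (n := next_in_path(path[-1])) is not None and n not in seen:
            path.append(n)
            seen.add(n)
        paths.append(path)
    return paths


def build_index(datasets, k):
    """Return (k-mer -> monotig id, monotig sequences, monotig abundance vectors)."""
    per_dataset = [count_kmers(reads, k) for reads in datasets]
    # ASSUMPTION: global graph = union of all k-mer sets (no explicit graph merging).
    global_kmers = set().union(*per_dataset)
    kmer_to_monotig, seqs, vectors = {}, [], []
    for path in unitigs(global_kmers):
        for i, kmer in enumerate(path):
            vec = tuple(counts[kmer] for counts in per_dataset)  # Counter yields 0 if absent
            # ASSUMPTION: a monotig is a run of consecutive k-mers with *identical* vectors.
            if i == 0 or vec != vectors[-1]:
                seqs.append(kmer)       # start a new monotig
                vectors.append(vec)
            else:
                seqs[-1] += kmer[-1]    # extend the current monotig by one base
            kmer_to_monotig[kmer] = len(vectors) - 1
    return kmer_to_monotig, seqs, vectors


def query(index, kmer):
    """Abundance vector of `kmer` across datasets, or None if it occurs in none."""
    kmer_to_monotig, _, vectors = index
    mid = kmer_to_monotig.get(kmer)
    return None if mid is None else vectors[mid]


def present_in(index, kmer):
    """Presence/absence query: indices of the datasets that contain `kmer`."""
    vec = query(index, kmer)
    return [] if vec is None else [i for i, c in enumerate(vec) if c > 0]


if __name__ == "__main__":
    index = build_index([["ACGTTG"], ["ACGT", "TTG"]], k=3)
    print(index[1])                  # ['ACGT', 'GTT', 'TTG']
    print(query(index, "CGT"))       # (1, 1)
    print(present_in(index, "GTT"))  # [0]
    print(query(index, "AAA"))       # None
```

In this example the k-mers ACG and CGT share the abundance vector (1, 1) and collapse into the single monotig ACGT, so the index stores one vector for both instead of one per k-mer. TTG gets its own monotig even though its vector equals the first one, because in this sketch a monotig is a run of consecutive k-mers along a unitig, not an arbitrary set of k-mers with equal counts.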

SUBMITTER: Marchet C 

PROVIDER: S-EPMC7355249 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature


Publications

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets.

Camille Marchet, Zamin Iqbal, Daniel Gautheret, Mikaël Salson, Rayan Chikhi

Bioinformatics (Oxford, England), 2020 Jul 1, Suppl_1



Similar Datasets

| S-EPMC2646250 | biostudies-literature
| S-EPMC6263557 | biostudies-literature
| S-EPMC3618241 | biostudies-literature
| S-EPMC7727328 | biostudies-literature
| S-EPMC6266934 | biostudies-literature
| S-EPMC3368933 | biostudies-literature
| S-EPMC4504488 | biostudies-literature
| S-EPMC5908213 | biostudies-literature
| S-EPMC6969201 | biostudies-literature
| S-EPMC5408915 | biostudies-literature