Unknown

Dataset Information

0

Fast and efficient short read mapping based on a succinct hash index.


ABSTRACT: BACKGROUND:Various indexing techniques have been applied by next generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time. RESULTS:We present the succinct hash index - a novel data structure for read mapping which is a variant of the classical q-gram index with a particularly small memory footprint occupying between 3.5 and 5.3 GB for a human reference genome for typical parameter settings. The succinct hash index features two novel seed selection algorithms (group seeding and variable-length seeding) and an efficient parallel construction algorithm, which we have implemented to design the FEM (Fast(F) and Efficient(E) read Mapper(M)) mapper. FEM can return all read mappings within a given edit distance. Our experimental results show that FEM is scalable and outperforms other state-of-the-art all-mappers in terms of both speed and memory footprint. Compared to Masai, FEM is an order-of-magnitude faster using a single thread and two orders-of-magnitude faster when using multiple threads. Furthermore, we observe an up to 2.8-fold speedup compared to BitMapper and an order-of-magnitude speedup compared to BitMapper2 and Hobbes3. CONCLUSIONS:The presented succinct index is the first feasible implementation of the q-gram index functionality that occupies around 3.5 GB of memory for a whole human reference genome. FEM is freely available at https://github.com/haowenz/FEM .

SUBMITTER: Zhang H 

PROVIDER: S-EPMC5845352 | biostudies-literature | 2018 Mar

REPOSITORIES: biostudies-literature

altmetric image

Publications

Fast and efficient short read mapping based on a succinct hash index.

Zhang Haowen H   Chan Yuandong Y   Fan Kaichao K   Schmidt Bertil B   Liu Weiguo W  

BMC bioinformatics 20180309 1


<h4>Background</h4>Various indexing techniques have been applied by next generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time.<h4>Results</h4>We present the succinct hash index - a novel data structure for read mapping which is a variant of the classical q-gram index with a particularly small memory footprint occupying between 3.5 and 5.3 GB for a human reference genome for typical  ...[more]

Similar Datasets

| S-EPMC3944147 | biostudies-literature
| S-EPMC4464037 | biostudies-literature
| S-EPMC5425171 | biostudies-other
| S-EPMC3669295 | biostudies-literature
| S-EPMC2752123 | biostudies-literature
| S-EPMC2730575 | biostudies-literature
| S-EPMC6913027 | biostudies-literature
| S-EPMC5103424 | biostudies-literature
| S-EPMC7488264 | biostudies-literature
| S-EPMC4804353 | biostudies-literature