Dataset Information

A benchmark study of k-mer counting methods for high-throughput sequencing.

ABSTRACT: The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k in DNA/RNA sequencing reads obtained for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Recently, several techniques have been developed to count k-mers in large sequencing datasets, with a trade-off between the time and memory required to perform this function. We assessed several k-mer counting programs and evaluated their relative performance, primarily on the basis of runtime and memory usage. We also considered additional parameters such as disk usage, accuracy, parallelism, the impact of compressed input, performance in terms of counting large k values and the scalability of the application to larger datasets. We make specific recommendations for the setup of a current state-of-the-art program and suggestions for further development.
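
As a purely illustrative aid, the counting task described in the abstract (tallying every substring of length k across a set of reads) can be sketched in Python as below. This is a naive in-memory approach and is not one of the benchmarked programs; the function name count_kmers, the choice to skip k-mers containing N, and the toy reads are all assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across an iterable of read strings.

    Hypothetical helper for illustration; holds all counts in memory.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: k-mers containing the ambiguous base N are skipped.
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Toy reads, invented for this example only.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)
```

Because every distinct k-mer is kept in a single in-memory table, a sketch like this is only practical for small inputs. The time and memory trade-off mentioned in the abstract arises when the same task is applied to far larger datasets.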

SUBMITTER: Manekar SC

PROVIDER: S-EPMC6280066 | biostudies-other | 2018 Dec

REPOSITORIES: biostudies-other

Publications

A benchmark study of k-mer counting methods for high-throughput sequencing.

Manekar SC, Sathe SR

GigaScience, 2018 Dec 1; (12)

PMID: 30346548
