Dataset Information

CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing.


ABSTRACT:

Motivation

Taxonomic classification of short reads and taxonomic profiling of metagenomic samples are well-studied yet challenging problems. The presence of species belonging to groups without close representation in a reference dataset is particularly challenging. While k-mer-based methods have performed well in terms of running time and accuracy, they tend to have reduced accuracy for such novel species. Thus, there is a growing need for methods that combine the scalability of k-mers with increased sensitivity.

Results

Here, we show that using locality-sensitive hashing (LSH) can increase the sensitivity of the k-mer-based search. Our method, which combines LSH with several heuristic techniques, including soft lowest common ancestor labeling and voting, is more accurate than alternatives in both taxonomic classification of individual reads and abundance profiling.
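As a rough illustration of the idea, the sketch below shows how an LSH index can retrieve reference k-mers that are close to, but not identical with, a query k-mer, and how the resulting matches can vote for a taxon. It is not the CONSULT-II implementation: the bit-sampling hash over k-mer positions, the names `KmerIndex` and `classify_read`, the parameter values, and the distance-weighted vote are all assumptions made for this example, and soft lowest common ancestor labeling is left out entirely.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class KmerIndex:
    """Toy LSH index over reference k-mers (hypothetical, for illustration).

    Each hash table keys a k-mer by the characters at a random subset of
    positions, so k-mers that differ only outside that subset collide.
    """

    def __init__(self, k, num_tables=4, positions_per_hash=8, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.hashes = [
            sorted(rng.sample(range(k), positions_per_hash))
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(set) for _ in self.hashes]

    def _keys(self, kmer):
        return ["".join(kmer[i] for i in pos) for pos in self.hashes]

    def add(self, kmer, taxon):
        for table, key in zip(self.tables, self._keys(kmer)):
            table[key].add((kmer, taxon))

    def query(self, kmer, max_dist):
        """Return {(reference k-mer, taxon): distance} for colliding
        candidates whose Hamming distance is at most max_dist."""
        hits = {}
        for table, key in zip(self.tables, self._keys(kmer)):
            for ref, taxon in table.get(key, ()):
                d = hamming(kmer, ref)
                if d <= max_dist:
                    hits[(ref, taxon)] = d
        return hits


def classify_read(read, index, max_dist):
    """Assign a read to the taxon with the largest total vote.

    Each matched k-mer votes with a weight that shrinks as its distance
    grows; this weighting is an assumption chosen for the example.
    """
    votes = defaultdict(float)
    for i in range(len(read) - index.k + 1):
        kmer = read[i:i + index.k]
        for (_, taxon), d in index.query(kmer, max_dist).items():
            votes[taxon] += (1 - d / index.k) ** index.k
    return max(votes, key=votes.get) if votes else None
```

An exact-match lookup would only find a reference k-mer identical to the query. Here, a query with a few mismatches can still collide with its reference in at least one table, as long as the mismatches fall outside the positions sampled by that table, which is the sense in which LSH trades a small amount of extra work for higher sensitivity.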

Availability and implementation

CONSULT-II is implemented in C++, and the software, together with reference libraries, is publicly available on GitHub at https://github.com/bo1929/CONSULT-II.

SUBMITTER: Şapcı AOB

PROVIDER: S-EPMC10985673 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Publications

CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing.

Şapcı AOB, Rachtman E, Mirarab S

Bioinformatics (Oxford, England), 2024-03-01, issue 4


Similar Datasets

| S-EPMC8340999 | biostudies-literature
| S-EPMC6612865 | biostudies-literature
| S-EPMC10538361 | biostudies-literature
| S-EPMC7669687 | biostudies-literature
| S-EPMC4393915 | biostudies-other
| S-EPMC5773183 | biostudies-literature
| S-EPMC2844998 | biostudies-literature
| S-EPMC9301846 | biostudies-literature
| S-EPMC10311298 | biostudies-literature
| S-EPMC4218995 | biostudies-literature