
Dataset Information


PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing.


ABSTRACT:

Motivation

One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods offer a faster alternative, but at the expense of accuracy. Here, we aim to address this shortcoming by providing software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.
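To make the general idea of an alignment-free, frequency-based comparison concrete, here is a minimal Python sketch. It is not the PaSiT formula and not GenDisCal's C implementation, neither of which is described in this record. It assumes a plain Euclidean distance between relative k-mer frequency vectors, ignores the reverse-complement strand, and uses illustrative function names (`kmer_frequencies`, `euclidean_distance`) that do not come from the software.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in a fixed order.

    Illustrative helper only; k=4 gives tetranucleotides, k=6 hexanucleotides.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two equal-length frequency vectors.

    Assumed stand-in metric, chosen for simplicity, not the published one.
    """
    return sqrt(sum((a - b) ** 2 for a, b in zip(freq_a, freq_b)))


if __name__ == "__main__":
    # Toy sequences; real genomes would be read from FASTA files.
    genome_a = "ACGTACGTGGCCTTAAACGTACGT"
    genome_b = "TTGACCAGTTGACCAGGGTTAACC"
    distance = euclidean_distance(
        kmer_frequencies(genome_a), kmer_frequencies(genome_b)
    )
    print(f"tetranucleotide distance: {distance:.4f}")
```

Because each genome is reduced to a fixed-length vector (256 entries for tetranucleotides, 4096 for hexanucleotides), pairwise comparison cost does not depend on genome length once the vectors are built, which is what makes this family of methods attractive for large-scale analyses.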

Results

Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA for identifying bacterial species and as accurate as average nucleotide identity for identifying strains. Moreover, the lightweight nature of this method makes it applicable to large-scale analyses.

Availability and implementation

The method introduced here was implemented, together with other existing methods, in GenDisCal, a dependency-free program written in C and available as source code from https://github.com/LM-UGent/GenDisCal. The software supports multithreading and has been tested on Windows and Linux (CentOS). A Java-based graphical user interface that acts as a wrapper for the software is also available.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Goussarov G 

PROVIDER: S-EPMC7178395 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature


Publications

PaSiT: a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing.

Gleb Goussarov, Ilse Cleenwerck, Mohamed Mysara, Natalie Leys, Pieter Monsieurs, Guillaume Tahon, Aurélien Carlier, Peter Vandamme, Rob Van Houdt

Bioinformatics (Oxford, England), 2020-04-01, issue 8



Similar Datasets

S-EPMC2289816 | biostudies-literature
S-EPMC9303293 | biostudies-literature
S-EPMC6061775 | biostudies-literature
S-EPMC3532236 | biostudies-literature
S-EPMC5374636 | biostudies-literature
GSE10970 | GEO | 2009-03-05
PRJNA92921 | ENA
S-EPMC2244631 | biostudies-literature
S-EPMC8558511 | biostudies-literature
S-EPMC7204536 | biostudies-literature