Dataset Information

StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.


ABSTRACT:

BACKGROUND: Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees.

RESULTS: A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain.

CONCLUSION: StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
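
The node-specific k-mer idea described in the abstract can be illustrated with a small Python sketch. This is not StrainSeeker's implementation: the function names (`kmers`, `node_specific_kmers`, `observed_fraction`) are hypothetical, the guide tree is simplified to a mapping from node name to the leaves beneath it rather than parsed from Newick, and "node-specific" is assumed here to mean "present in every genome under the node and in no genome outside it". The statistical comparison of observed and expected fractions is omitted; the sketch only computes the observed fraction.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def node_specific_kmers(
    clades: Dict[str, List[str]],  # node name -> leaf genomes under it (assumed tree encoding)
    genomes: Dict[str, str],       # leaf name -> genome sequence
    k: int,
) -> Dict[str, Set[str]]:
    """For each node, keep k-mers shared by all of its leaves and absent elsewhere."""
    genome_kmers = {name: kmers(seq, k) for name, seq in genomes.items()}
    specific: Dict[str, Set[str]] = {}
    for node, leaves in clades.items():
        # k-mers common to every genome below this node (leaves must be non-empty)
        inside = set.intersection(*(genome_kmers[leaf] for leaf in leaves))
        # k-mers seen in any genome that is not below this node
        outside = set().union(
            *(genome_kmers[g] for g in genomes if g not in leaves)
        )
        specific[node] = inside - outside
    return specific


def observed_fraction(node_kmers: Set[str], reads: Iterable[str], k: int) -> float:
    """Fraction of a node's specific k-mers that occur in the raw reads."""
    if not node_kmers:
        return 0.0
    seen: Set[str] = set()
    for read in reads:
        seen |= kmers(read, k) & node_kmers
    return len(seen) / len(node_kmers)


# Toy data (invented for illustration): A and B share a clade, C is an outgroup.
toy_genomes = {"A": "AAAACCCC", "B": "AAAAGGGG", "C": "TTTTGCGC"}
toy_clades = {"AB": ["A", "B"], "A": ["A"], "B": ["B"], "C": ["C"]}
toy_specific = node_specific_kmers(toy_clades, toy_genomes, k=4)
toy_reads = ["AAAACC", "ACCCC"]  # unassembled reads drawn from genome A
toy_fractions = {
    node: observed_fraction(ks, toy_reads, k=4) for node, ks in toy_specific.items()
}
```

In this toy setting, nodes on the path from the root to the sampled isolate should show a high observed fraction and nodes off that path a low one, which is the signal a tree-walking classifier of this kind would follow.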

SUBMITTER: Roosaare M 

PROVIDER: S-EPMC5438578 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.

Märt Roosaare, Mihkel Vaher, Lauris Kaplinski, Märt Möls, Reidar Andreson, Maarja Lepamets, Triinu Kõressaar, Paul Naaber, Siiri Kõljalg, Maido Remm

PeerJ, 2017-05-18


Similar Datasets

| S-EPMC5768271 | biostudies-literature
| S-EPMC7430814 | biostudies-literature
2015-10-01 | E-MTAB-3887 | biostudies-arrayexpress
| S-EPMC6580563 | biostudies-literature
| S-EPMC2893182 | biostudies-literature
| S-EPMC4835549 | biostudies-literature
| PRJNA530507 | ENA
| PRJEB37895 | ENA
| S-EPMC4071206 | biostudies-literature
| S-EPMC5451431 | biostudies-literature