ABSTRACT: Summary
Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. Emerging long-read sequencing technologies can now produce reads tens of kilobases in length, which promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package implementing a new sequence classification model that we have shown to improve classification accuracy compared with other state-of-the-art classification methods. To validate the package and demonstrate its usefulness, we test the combined sequence similarity score classifier (CSSSCL) using three different datasets, including a metagenomic dataset composed of short reads.
Availability and implementation
The package's source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl.
Contact
ivan.borozan@oicr.on.ca
Supplementary information
Supplementary data are available at Bioinformatics online.
SUBMITTER: Borozan I
PROVIDER: S-EPMC4734043 | biostudies-literature | 2016 Feb
REPOSITORIES: biostudies-literature
Borozan Ivan, Ferretti Vincent
Bioinformatics (Oxford, England), 2015-10-09, issue 3