Dataset Information

CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads.

ABSTRACT:

Summary

Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. Emerging long-read sequencing technologies can now produce reads tens of kilobases in length, which promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package that implements a new sequence classification model, which we show improves classification accuracy compared with other state-of-the-art classification methods. For validation, and to demonstrate its usefulness, we test the combined sequence similarity score classifier (CSSSCL) on three different datasets, including a metagenomic dataset composed of short reads.

Availability and implementation

The package's source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl.

Contact

ivan.borozan@oicr.on.ca

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Borozan I

PROVIDER: S-EPMC4734043 | biostudies-literature | 2016 Feb

REPOSITORIES: biostudies-literature

Publications

CSSSCL: a python package that uses combined sequence similarity scores for accurate taxonomic classification of long and short sequence reads.

Ivan Borozan, Vincent Ferretti

Bioinformatics (Oxford, England), 2015 Oct 9, issue 3

PMID: 26454281

Similar Datasets

Rapid and accurate taxonomic classification of cpn60 amplicon sequence variants.
| S-EPMC10362019 | biostudies-literature

Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks.
| S-EPMC9436379 | biostudies-literature

ccbmlib - a Python package for modeling Tanimoto similarity value distributions.
| S-EPMC7050271 | biostudies-literature

Assembling Reads Improves Taxonomic Classification of Species.
| S-EPMC7465921 | biostudies-literature

A statistical framework for accurate taxonomic assignment of metagenomic sequencing reads.
| S-EPMC3462201 | biostudies-literature

HiTaxon: a hierarchical ensemble framework for taxonomic classification of short reads.
| S-EPMC10873905 | biostudies-literature

PySeqLab: an open source Python package for sequence labeling and segmentation.
| S-EPMC5872256 | biostudies-literature

plotnineSeqSuite: a Python package for visualizing sequence data using ggplot2 style.
| S-EPMC10546746 | biostudies-literature

FindAdapt: A python package for fast and accurate adapter detection in small RNA sequencing.
| S-EPMC10833567 | biostudies-literature

SpeciateIT and vSpeciateDB: novel, fast, and accurate per sequence 16S rRNA gene taxonomic classification of vaginal microbiota.
| S-EPMC11437924 | biostudies-literature