
Dataset Information


Starcode: sequence clustering based on all-pairs search.


ABSTRACT: The increasing throughput of sequencing technologies offers new applications and challenges for computational biology. In many of those applications, sequencing errors need to be corrected. This is particularly important when sequencing reads from an unknown reference such as random DNA barcodes. In this case, error correction can be done by performing a pairwise comparison of all the barcodes, which is a computationally complex problem.

Here, we address this challenge and describe an exact algorithm to determine which pairs of sequences lie within a given Levenshtein distance. For error correction or redundancy reduction purposes, matched pairs are then merged into clusters of similar sequences. The efficiency of starcode is attributable to the poucet search, a novel implementation of the Needleman-Wunsch algorithm performed on the nodes of a trie. On the task of matching random barcodes, starcode outperforms sequence clustering algorithms in both speed and precision.

The C source code is available at http://github.com/gui11aume/starcode.
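The core idea in the abstract, finding every pair of sequences within a Levenshtein distance tau by running a unit-cost Needleman-Wunsch alignment along the nodes of a trie, can be illustrated with a minimal Python sketch. It is not starcode's C implementation: it shows only the classical trie-based search in which each trie node holds one dynamic-programming row, so all sequences sharing a prefix share those rows, and a branch is abandoned as soon as every cell of its row exceeds tau. The published poucet search includes optimisations not reproduced here. The merging step uses plain connected components (union-find) as an assumed, simple stand-in and is not necessarily how starcode forms clusters. All names (`build_trie`, `search`, `cluster`, `tau`) are hypothetical.

```python
from collections import defaultdict


class TrieNode:
    """Prefix-tree node; `seq` is set when a stored sequence ends here."""
    __slots__ = ("children", "seq")

    def __init__(self):
        self.children = {}
        self.seq = None


def build_trie(seqs):
    root = TrieNode()
    for s in seqs:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.seq = s
    return root


def search(root, query, tau):
    """Return every stored sequence within Levenshtein distance `tau` of `query`.

    Each trie node gets one DP row aligning `query` against the node's prefix,
    so sequences that share a prefix also share the rows computed for it.
    """
    hits = []
    first_row = list(range(len(query) + 1))  # distances from the empty prefix
    if root.seq is not None and first_row[-1] <= tau:
        hits.append(root.seq)  # empty string stored in the trie

    def walk(node, prev_row):
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,          # gap in the trie prefix
                               prev_row[j] + 1,          # gap in the query
                               prev_row[j - 1] + cost))  # match or mismatch
            if child.seq is not None and row[-1] <= tau:
                hits.append(child.seq)
            # Row minima never decrease with depth, so prune once min > tau.
            if min(row) <= tau:
                walk(child, row)

    walk(root, first_row)
    return hits


def cluster(seqs, tau):
    """Merge matched pairs into clusters.

    Assumption: simple connected-components merging (union-find), not
    starcode's own clustering rule.
    """
    unique = sorted(set(seqs))
    root = build_trie(unique)
    parent = {s: s for s in unique}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for s in unique:
        for t in search(root, s, tau):
            parent[find(s)] = find(t)

    groups = defaultdict(list)
    for s in unique:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())
```

For example, `cluster(["ACGTAC", "ACGTAA", "ACGTTA", "TTTTTT", "TTTTTA"], 1)` returns `[['ACGTAA', 'ACGTAC', 'ACGTTA'], ['TTTTTA', 'TTTTTT']]`. Note that `ACGTAC` and `ACGTTA` end up in the same cluster although they are two edits apart, because connected components merge transitively; this chaining is one reason a real tool may prefer a more conservative merging rule.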

SUBMITTER: Zorita E 

PROVIDER: S-EPMC4765884 | biostudies-literature | 2015 Jun

REPOSITORIES: biostudies-literature


Publications

Starcode: sequence clustering based on all-pairs search.

Eduard Zorita, Pol Cuscó, Guillaume J. Filion

Bioinformatics (Oxford, England), 2015-01-31, issue 12



Similar Datasets

| S-EPMC3035798 | biostudies-literature
| S-EPMC3984869 | biostudies-other
| S-EPMC8034561 | biostudies-literature
| S-EPMC3154205 | biostudies-literature
| S-EPMC2946782 | biostudies-literature
| S-EPMC6853674 | biostudies-literature
| S-EPMC5524321 | biostudies-literature
| S-EPMC9146526 | biostudies-literature
| S-EPMC7275956 | biostudies-literature
| S-EPMC3716875 | biostudies-literature