Dataset Information

Kmer-SSR: a fast and exhaustive SSR search algorithm.


ABSTRACT: One of the main challenges with bioinformatics software is that the size and complexity of datasets necessitate trading speed for accuracy or completeness. To combat this problem of computational complexity, a plethora of heuristic algorithms have arisen that report a 'good enough' solution to biological questions. However, in instances such as Simple Sequence Repeats (SSRs), a 'good enough' solution may not accurately portray results in population genetics, phylogenetics and forensics, which require accurate SSRs to calculate intra- and inter-species interactions. We present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and accurately identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that allow users to easily view a subset of SSRs based on user input. Kmer-SSR, coupled with the filter options, accurately and intuitively identifies SSRs quickly and in a more user-friendly manner than any other SSR identification algorithm. The source code is freely available on GitHub at https://github.com/ridgelab/Kmer-SSR. Contact: perry.ridge@byu.edu.
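To make the idea of an exhaustive SSR search concrete, the following is a minimal brute-force sketch in Python. It is not Kmer-SSR's algorithm or interface: the names `SSR`, `is_primitive`, `find_ssrs`, and `filter_by_length`, and the default period range of 1 to 6, are hypothetical. It tries every repeat period at every position, skips units that are themselves repeats of a shorter unit (such as `AA` or `ATAT`), and reports runs of at least `min_repeats` complete copies. Within a period it jumps past each reported run, so overlapping rotations of the same repeat (for example `ACAC` and `CACA`) are reported once.

```python
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    unit: str     # repeat unit; the period is len(unit)
    repeats: int  # number of consecutive, complete copies of unit


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq: str, min_period: int = 1, max_period: int = 6,
              min_repeats: int = 2) -> List[SSR]:
    """Hypothetical brute-force scan: try every period at every position."""
    seq = seq.upper()
    found: List[SSR] = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period <= len(seq):
            unit = seq[i:i + period]
            if not is_primitive(unit):
                i += 1
                continue
            # Count consecutive complete copies of unit starting at i.
            repeats = 1
            while seq[i + repeats * period:i + (repeats + 1) * period] == unit:
                repeats += 1
            if repeats >= min_repeats:
                found.append(SSR(i, unit, repeats))
                i += repeats * period  # skip past this run for this period
            else:
                i += 1
    return found


def filter_by_length(ssrs: List[SSR], min_length: int) -> List[SSR]:
    """Hypothetical post-filter: keep SSRs spanning at least min_length bases."""
    return [s for s in ssrs if len(s.unit) * s.repeats >= min_length]


if __name__ == "__main__":
    hits = find_ssrs("GATTACACACAGG", max_period=2)
    print(hits)
    print(filter_by_length(hits, 6))
```

Here the period bounds, `min_repeats`, and `filter_by_length` loosely play the role of user-selected filters that narrow the full result set. A brute-force scan like this costs time proportional to the sequence length times the number of periods (times the run length checked at each position), which is the kind of cost a dedicated tool aims to reduce.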

SUBMITTER: Pickett BD 

PROVIDER: S-EPMC5860095 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature

Publications

Kmer-SSR: a fast and exhaustive SSR search algorithm.

Pickett Brandon D, Miller Justin B, Ridge Perry G

Bioinformatics (Oxford, England), 2017 Dec 1, issue 24


Similar Datasets

| S-EPMC5013907 | biostudies-literature
| S-EPMC8342486 | biostudies-literature
| S-EPMC3665501 | biostudies-literature
| S-EPMC3591303 | biostudies-literature
| S-EPMC5727114 | biostudies-literature
| S-EPMC8864561 | biostudies-literature
| S-EPMC2394828 | biostudies-literature
| S-EPMC7669687 | biostudies-literature
| S-EPMC3496572 | biostudies-literature
| S-EPMC5651902 | biostudies-literature