Dataset Information

Improved algorithms for approximate string matching (extended abstract).


ABSTRACT: BACKGROUND: Approximate string matching is important in many areas, including computational biology, text processing and pattern recognition. Considerable effort has gone into designing efficient algorithms for several variants of the problem, including comparing two strings, identifying approximate occurrences of a pattern in a string, and computing the longest common subsequence of two strings. RESULTS: We designed an output-sensitive algorithm that solves the edit distance problem for two strings of lengths n and m in O((s - |n - m|) · min(m, n, s) + m + n) time and linear space, where s is the edit distance between the two strings. This worst-case bound makes the quadratic factor of the running time independent of the length of the longer string and improves on existing theoretical bounds for the problem. Our implementation also performs well in practice, especially when the two strings differ significantly in length. CONCLUSION: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings, with both theoretical and practical implications. Source code of our algorithm is available online.
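
To make the idea of an output-sensitive edit distance computation concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not achieve the bound stated in the abstract. It uses the well-known diagonal-band technique with threshold doubling, which evaluates only cells within k diagonals of the main diagonal and so runs in roughly O(s · min(m, n)) time. The function names `bounded_edit_distance` and `edit_distance` are hypothetical and chosen only for illustration.

```python
def bounded_edit_distance(a, b, k):
    """Return the edit distance of a and b if it is <= k, otherwise None.

    Illustrative sketch only: evaluates the DP cells (i, j) with |i - j| <= k.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # at least |n - m| insertions/deletions are needed
    INF = float("inf")
    width = 2 * k + 1  # diagonals d = j - i in [-k, k], stored at index d + k
    prev = [INF] * width
    for d in range(0, min(k, m) + 1):
        prev[d + k] = d  # row 0: cell (0, j) costs j insertions
    for i in range(1, n + 1):
        cur = [INF] * width
        for d in range(-k, k + 1):
            j = i + d
            if j < 0 or j > m:
                continue
            if j == 0:
                cur[d + k] = i  # column 0: cell (i, 0) costs i deletions
                continue
            # match or substitution, coming from cell (i-1, j-1)
            best = prev[d + k] + (a[i - 1] != b[j - 1])
            if d + 1 <= k:  # deletion, coming from cell (i-1, j)
                best = min(best, prev[d + k + 1] + 1)
            if d - 1 >= -k:  # insertion, coming from cell (i, j-1)
                best = min(best, cur[d + k - 1] + 1)
            cur[d + k] = best
        prev = cur
    result = prev[(m - n) + k]
    return result if result <= k else None


def edit_distance(a, b):
    """Edit distance via threshold doubling over the banded computation."""
    if len(a) > len(b):
        a, b = b, a  # iterate rows over the shorter string
    k = max(1, len(b) - len(a))
    while True:
        d = bounded_edit_distance(a, b, k)
        if d is not None:
            return d
        k *= 2  # band was too narrow; widen and retry
```

Because the band width doubles until it covers the true distance s, the total work is dominated by the last pass, whose band width is within a constant factor of s. Each pass keeps only two rows of the band, so the extra space is proportional to the band width.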

SUBMITTER: Papamichail D 

PROVIDER: S-EPMC2648743 | biostudies-literature | 2009

REPOSITORIES: biostudies-literature

Publications

Improved algorithms for approximate string matching (extended abstract).

Dimitris Papamichail, Georgios Papamichail

BMC Bioinformatics, 2009-01-30


Similar Datasets

| S-EPMC4464037 | biostudies-literature
| S-EPMC9645238 | biostudies-literature
| S-EPMC7824336 | biostudies-literature
| S-EPMC6573793 | biostudies-other
| S-EPMC4182709 | biostudies-literature
| S-EPMC5013917 | biostudies-literature
| S-EPMC10011865 | biostudies-literature
| S-EPMC4916026 | biostudies-literature
| S-EPMC5613400 | biostudies-literature
| S-EPMC3629260 | biostudies-literature