Unknown

Dataset Information

0

A geometric interpretation for local alignment-free sequence comparison.


ABSTRACT: Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant [Formula: see text] as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of points in high dimensions. We introduce a geometric framework that reduces this problem to that of finding the bichromatic closest pair (BCP), allowing the properties of the underlying metric to be leveraged. Local alignment-free sequence comparison can be solved by making a quadratic number of alignment-free substring comparisons. We show both theoretically and through empirical results on simulated data that our approximation algorithm requires a subquadratic number of such comparisons and trades only a small amount of accuracy to achieve this efficiency. Therefore, our algorithm can extend the current usage of alignment-free-based methods and can also be regarded as a substitute for local alignment algorithms in many biological studies.

SUBMITTER: Behnam E 

PROVIDER: S-EPMC3704055 | biostudies-literature | 2013 Jul

REPOSITORIES: biostudies-literature

altmetric image

Publications

A geometric interpretation for local alignment-free sequence comparison.

Behnam Ehsan E   Waterman Michael S MS   Smith Andrew D AD  

Journal of computational biology : a journal of computational molecular cell biology 20130701 7


Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant [Formula: see text] as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of  ...[more]

Similar Datasets

| S-EPMC3799466 | biostudies-literature
| S-EPMC6659240 | biostudies-literature
| S-EPMC3123933 | biostudies-literature
| S-EPMC2818754 | biostudies-literature
| S-EPMC5627421 | biostudies-literature
| S-EPMC4080745 | biostudies-literature
| S-EPMC1559724 | biostudies-literature
| S-EPMC3581251 | biostudies-literature
| S-EPMC6937637 | biostudies-literature
| S-EPMC7963080 | biostudies-literature