
Dataset Information


Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.


ABSTRACT:

Motivation

Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods.

Results

In this article, we use filtered spaced-word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered spaced-word matches are superior to alignments produced with the original Mugsy program, which uses exact word matches to find anchor points.
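
The anchoring idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the +1/-1 match/mismatch scoring, the example pattern and the ungapped form of the X-drop extension are all assumptions chosen to keep the example short.

```python
from collections import defaultdict


def spaced_words(seq, pattern):
    """Yield (key, pos), where key holds the characters of seq found at the
    match positions ('1') of the binary pattern placed at position pos."""
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    for pos in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[pos + i] for i in match_pos), pos


def score(a, b, match=1, mismatch=-1):
    """Similarity of two equal-length segments (assumed +1/-1 scoring)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def filtered_matches(s1, s2, pattern, threshold):
    """Return (pos1, pos2, score) for spaced-word matches scoring above threshold."""
    index = defaultdict(list)
    for key, pos in spaced_words(s1, pattern):
        index[key].append(pos)
    width = len(pattern)
    hits = []
    for key, j in spaced_words(s2, pattern):
        for i in index.get(key, ()):
            # The score covers all positions, including don't-care positions.
            sc = score(s1[i:i + width], s2[j:j + width])
            if sc > threshold:
                hits.append((i, j, sc))
    return hits


def xdrop_extend(s1, s2, i, j, length, x_drop, match=1, mismatch=-1):
    """Ungapped X-drop extension of a seed in both directions (simplified).
    Returns (start1, start2, extended_length)."""
    def extend(step, a, b):
        best = cur = 0
        best_len = n = 0
        while 0 <= a < len(s1) and 0 <= b < len(s2):
            cur += match if s1[a] == s2[b] else mismatch
            n += 1
            if cur > best:
                best, best_len = cur, n
            elif best - cur > x_drop:
                break  # score fell more than x_drop below the best seen
            a += step
            b += step
        return best_len

    right = extend(1, i + length, j + length)
    left = extend(-1, i - 1, j - 1)
    return i - left, j - left, left + length + right


if __name__ == "__main__":
    s1 = "TTACGTAGCATT"
    s2 = "GGACCTAGCAGG"
    pattern = "1101"  # hypothetical pattern: '1' = match, '0' = don't-care
    for i, j, sc in filtered_matches(s1, s2, pattern, threshold=0):
        print((i, j, sc), xdrop_extend(s1, s2, i, j, len(pattern), x_drop=2))
```

In this toy input, the spaced-word match at position 2 of both sequences contains a mismatch at the don't-care position and therefore gets a lower score than the two exact matches; raising the threshold filters it out while the higher-scoring matches remain available as seeds for extension.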

Availability and implementation

http://spacedanchor.gobics.de.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Leimeister CA 

PROVIDER: S-EPMC6330006 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature


Publications

Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.

Leimeister CA, Dencker T, Morgenstern B

Bioinformatics (Oxford, England), 2019-01-01, issue 2



Similar Datasets

| S-EPMC5409309 | biostudies-literature
| S-EPMC4080745 | biostudies-literature
| S-EPMC6528274 | biostudies-literature
| S-EPMC7671388 | biostudies-literature
| S-EPMC1764478 | biostudies-literature
| S-EPMC3880068 | biostudies-literature
| S-EPMC4594664 | biostudies-literature
| S-EPMC403711 | biostudies-literature
| S-EPMC1955456 | biostudies-literature
| S-EPMC10311327 | biostudies-literature