
Dataset Information


Fast and accurate phylogeny reconstruction using filtered spaced-word matches.


ABSTRACT:

Motivation

Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods.

Results

We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don't-care positions, FSWM rapidly identifies spaced-word matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don't-care positions. We then estimate the number of nucleotide substitutions per site from the nucleotides aligned at the don't-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we apply a filtering procedure that discards all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach accurately estimates substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods, and that phylogenetic trees constructed from FSWM distances are of high quality. Running the program on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes.
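The pipeline described above (find spaced-word matches, filter them by a similarity score, count mismatches at don't-care positions, convert to a distance) can be illustrated with a short Python sketch. This is not the FSWM implementation, and several choices are assumptions made for illustration: the pattern `PATTERN` is hypothetical, the score is a toy +1/-1 count over don't-care positions (match positions always agree, so they only add a constant), only the forward strand is searched, and the mismatch fraction is corrected with the Jukes-Cantor formula as one common choice.

```python
import math
from collections import defaultdict

# Hypothetical binary pattern: '1' = match position, '0' = don't-care position.
PATTERN = "1101100101011"


def spaced_word(seq, start, pattern):
    """Return the spaced word at `start`: the nucleotides at match positions only."""
    return "".join(seq[start + k] for k, p in enumerate(pattern) if p == "1")


def score(seg1, seg2, pattern):
    """Toy similarity score (assumption): +1 per match, -1 per mismatch at don't-care positions."""
    return sum(1 if seg1[k] == seg2[k] else -1
               for k, p in enumerate(pattern) if p == "0")


def fswm_distance(s1, s2, pattern=PATTERN, threshold=0):
    """Sketch of a filtered spaced-word-match distance; returns None if no match survives."""
    length = len(pattern)
    # Index every spaced word of s1 by its match-position nucleotides.
    index = defaultdict(list)
    for i in range(len(s1) - length + 1):
        index[spaced_word(s1, i, pattern)].append(i)

    mismatches = 0
    compared = 0
    for j in range(len(s2) - length + 1):
        for i in index.get(spaced_word(s2, j, pattern), []):
            seg1, seg2 = s1[i:i + length], s2[j:j + length]
            if score(seg1, seg2, pattern) < threshold:
                continue  # filter out likely random (background) matches
            # Count mismatches only at the don't-care positions.
            for k, p in enumerate(pattern):
                if p == "0":
                    compared += 1
                    mismatches += int(seg1[k] != seg2[k])

    if compared == 0:
        return None
    p_hat = mismatches / compared
    if p_hat >= 0.75:
        return math.inf  # Jukes-Cantor correction is undefined beyond saturation
    # Jukes-Cantor correction: substitutions per site from the mismatch fraction.
    return -0.75 * math.log(1 - 4 * p_hat / 3)
```

For example, comparing `"A" * 13` with `"AACAAAAAAAAAA"` yields a single spaced-word match with one mismatch among five don't-care positions, so the estimate is the Jukes-Cantor value for a mismatch fraction of 0.2. If three of the five don't-care positions differ instead, the match scores -1 and is discarded by the filter, which is the effect the threshold is meant to have on spurious matches.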

Availability and implementation

The program source code for FSWM, including documentation, as well as the software that we used to generate artificial genome sequences, is freely available at http://fswm.gobics.de/.

Contact

chris.leimeister@stud.uni-goettingen.de.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Leimeister CA 

PROVIDER: S-EPMC5409309 | biostudies-literature | 2017 Apr

REPOSITORIES: biostudies-literature


Publications

Fast and accurate phylogeny reconstruction using filtered spaced-word matches.

Leimeister Chris-André, Sohrabi-Jahromi Salma, Morgenstern Burkhard

Bioinformatics (Oxford, England), 2017-04-01, issue 7



Similar Datasets

| S-EPMC6330006 | biostudies-literature
| S-EPMC7671388 | biostudies-literature
| S-EPMC4080745 | biostudies-literature
| S-EPMC7005598 | biostudies-literature
| S-EPMC2913662 | biostudies-literature
| S-EPMC403711 | biostudies-literature
| S-EPMC9853099 | biostudies-literature
| S-EPMC3002250 | biostudies-literature
| S-EPMC7310859 | biostudies-literature
| S-EPMC4576710 | biostudies-literature