Unknown

Dataset Information

0

Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform.


ABSTRACT: Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the 'over-alignment' bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method's complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique to accelerate the evolutionary indel based alignment. Amino acid sequences are converted to sequences representing their physicochemical properties, and homologous blocks are identified by multi-scale short-time Fourier transform. Three three-dimensional DP matrices are then created under PIP, with homologous blocks defining sparse structures where most cells are excluded from the calculations. The homologous blocks are connected through intermediate 'linking blocks'. The homologous and linking blocks are aligned under PIP as independent DP sub-matrices and their tracebacks merged to yield the final alignment. The new algorithm can largely profit from parallel computing, yielding a theoretical speed-up estimated to be proportional to the cubic power of the number of sub-blocks in the DP matrices. We compare the new method to the original PIP approach and demonstrate it on real data.

SUBMITTER: Maiolo M 

PROVIDER: S-EPMC7671320 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature

altmetric image

Publications

Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform.

Maiolo Massimo M   Ulzega Simone S   Gil Manuel M   Anisimova Maria M  

NAR genomics and bioinformatics 20201106 4


Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the 'over-alignment' bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method's complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique  ...[more]

Similar Datasets

| S-EPMC6151001 | biostudies-literature
| S-EPMC7682253 | biostudies-literature
| S-EPMC3009689 | biostudies-literature
| S-EPMC3495709 | biostudies-literature
| S-EPMC2705234 | biostudies-literature
| S-EPMC4768690 | biostudies-literature
| S-EPMC6202327 | biostudies-literature
| S-EPMC8175789 | biostudies-literature
| S-EPMC3358119 | biostudies-other
| S-EPMC5860160 | biostudies-literature