Dataset Information

FAMSA: Fast and accurate multiple sequence alignment of huge protein families.

ABSTRACT: Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT, for datasets exceeding a few thousand sequences. Quality does not come at the cost of time or memory requirements, which are an order of magnitude lower than those of existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
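The abstract mentions using the longest common subsequence (LCS) as the pairwise similarity measure that drives the progressive alignment. The sketch below illustrates the general idea only: it computes LCS length with the textbook two-row dynamic program and fills a pairwise similarity matrix. The names `lcs_length`, `lcs_similarity` and `similarity_matrix` are hypothetical, and the normalization (LCS length divided by the length of the shorter sequence) is an assumption made for illustration; FAMSA's actual normalization and its optimized LCS computation may differ.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook O(|a| * |b|) dynamic program that keeps only two rows,
    so memory is O(min(|a|, |b|)).
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter sequence to keep rows small
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a residue in a or b
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalized similarity in [0, 1].

    ASSUMPTION: divides by the shorter sequence length; this is an
    illustrative choice, not necessarily the normalization used by FAMSA.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return lcs_length(a, b) / shorter


def similarity_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric all-pairs similarity matrix (diagonal set to 1.0)."""
    n = len(seqs)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = lcs_similarity(seqs[i], seqs[j])
            m[i][j] = m[j][i] = s
    return m


if __name__ == "__main__":
    # Common subsequence "ADFG" has length 4.
    print(lcs_length("ACDEFG", "AXDXFG"))
    print(similarity_matrix(["ACDEFG", "AXDXFG", "MKV"]))
```

In a progressive aligner, a matrix like this would typically be turned into a guide tree that fixes the order in which sequences and profiles are merged. The quadratic dynamic program is used here for readability; a tool aimed at hundreds of thousands of sequences would need a much faster LCS computation and would avoid storing a dense all-pairs matrix.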

SUBMITTER: Deorowicz S

PROVIDER: S-EPMC5037421 | biostudies-literature | 2016 Sep

REPOSITORIES: biostudies-literature

Publications

FAMSA: Fast and accurate multiple sequence alignment of huge protein families.

Sebastian Deorowicz, Agnieszka Debudaj-Grabysz, Adam Gudyś

Scientific Reports, 27 September 2016

PMID: 27670777

Similar Datasets

mTM-align: an algorithm for fast and accurate multiple protein structure alignment.
| S-EPMC5946935 | biostudies-literature

Recursive MAGUS: Scalable and accurate multiple sequence alignment.
| S-EPMC8523058 | biostudies-literature

Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.
| S-EPMC4382095 | biostudies-literature

X-Mapper: fast and accurate sequence alignment via gapped x-mers.
| S-EPMC11755882 | biostudies-literature

Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment Including Long Indels.
| S-EPMC11221656 | biostudies-literature

Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee.
| S-EPMC3303701 | biostudies-literature

Sequence embedding for fast construction of guide trees for multiple sequence alignment.
| S-EPMC2893182 | biostudies-literature

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.
| S-EPMC1955456 | biostudies-literature

SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.
| S-EPMC3389763 | biostudies-literature

ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference.
| S-EPMC7735675 | biostudies-literature