Dataset Information

Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

ABSTRACT: Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can quickly align virtually any number of protein sequences and delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

SUBMITTER: Sievers F

PROVIDER: S-EPMC3261699 | biostudies-literature | 2011 Oct

REPOSITORIES: biostudies-literature

Publications

Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG

Molecular Systems Biology, 2011 Oct 11

PMID: 21988835

Similar Datasets

Simple chained guide trees give high-quality protein multiple sequence alignments.
| S-EPMC4115562 | biostudies-literature

Mirage2's high-quality spliced protein-to-genome mappings produce accurate multiple-sequence alignments of isoforms.
| S-EPMC10166558 | biostudies-literature

Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments.
| S-EPMC7297217 | biostudies-literature

Leveraging protein language models for accurate multiple sequence alignments.
| S-EPMC10538487 | biostudies-literature

PROMALS web server for accurate multiple protein sequence alignments.
| S-EPMC1933189 | biostudies-literature

A statistical score for assessing the quality of multiple sequence alignments.
| S-EPMC1687212 | biostudies-literature

Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments.
| S-EPMC5939968 | biostudies-literature

Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment.
| S-EPMC3585936 | biostudies-literature

Protein language models trained on multiple sequence alignments learn phylogenetic relationships.
| S-EPMC9588007 | biostudies-literature

PROMALS3D web server for accurate multiple protein sequence and structure alignments.
| S-EPMC2447800 | biostudies-literature