Dataset Information

Automatic assessment of alignment quality.


ABSTRACT: Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
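The idea of comparing several alignments of the same sequences can be illustrated with a minimal Python sketch. It is not the MUMSA implementation: the function names (`aligned_pairs`, `overlap`, `average_overlap`, `support_score`) are hypothetical, and the exact formulas are assumptions made for illustration. The sketch treats an alignment as a set of aligned residue pairs, measures pairwise overlap as shared pairs divided by the mean number of pairs, averages that over all alignment pairs, and scores an individual alignment by how often its pairs recur in the other alignments.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of aligned residue pairs in a gapped alignment.

    alignment: list of equal-length strings, '-' marks a gap.
    A pair is ((seq_a, residue_index_a), (seq_b, residue_index_b)).
    """
    counters = [0] * len(alignment)  # residues seen so far in each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def overlap(a, b):
    """Shared pairs divided by the mean pair count (assumed definition)."""
    pa, pb = aligned_pairs(a), aligned_pairs(b)
    total = len(pa) + len(pb)
    return 2 * len(pa & pb) / total if total else 1.0


def average_overlap(alignments):
    """Mean pairwise overlap; low values suggest a difficult case."""
    scores = [overlap(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


def support_score(index, alignments):
    """Mean fraction of the other alignments containing each pair of
    alignments[index] (an illustrative stand-in for a per-alignment score)."""
    own = aligned_pairs(alignments[index])
    others = [aligned_pairs(a) for i, a in enumerate(alignments) if i != index]
    if not own or not others:
        return 1.0
    hits = sum(sum(p in o for o in others) for p in own)
    return hits / (len(own) * len(others))


if __name__ == "__main__":
    a = ["AC-G", "ACTG"]
    b = ["ACG-", "ACTG"]
    candidates = [a, b, a]
    print(average_overlap(candidates))
    print([support_score(i, candidates) for i in range(len(candidates))])
```

Under these assumptions, an alignment whose residue pairs are reproduced by most of the other alignments receives a higher score than one that disagrees with them.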

SUBMITTER: Lassmann T 

PROVIDER: S-EPMC1316116 | biostudies-literature | 2005

REPOSITORIES: biostudies-literature

Publications

Automatic assessment of alignment quality.

Lassmann T, Sonnhammer ELL

Nucleic Acids Research, 2005-12-16, issue 22


Similar Datasets

| S-EPMC4281176 | biostudies-literature
| S-EPMC6939343 | biostudies-literature
| S-EPMC5831847 | biostudies-literature
| S-EPMC2039753 | biostudies-literature
| S-EPMC2952876 | biostudies-literature
| S-EPMC3820657 | biostudies-other
| S-EPMC2853116 | biostudies-literature
2014-02-01 | GSE52393 | GEO
| S-EPMC2440426 | biostudies-literature
2014-02-01 | E-GEOD-52393 | biostudies-arrayexpress