Dataset Information

Protein sequence alignment analysis by local covariation: coevolution statistics detect benchmark alignment errors.


ABSTRACT: The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High-quality alignments are difficult to build, and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate the causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/.

SUBMITTER: Dickson RJ 

PROVIDER: S-EPMC3371027 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Protein sequence alignment analysis by local covariation: coevolution statistics detect benchmark alignment errors.

Dickson RJ, Gloor GB

PLoS ONE, 2012-06-08, issue 6


Similar Datasets

S-EPMC2893159 | biostudies-literature
S-EPMC6311937 | biostudies-literature
S-EPMC1635699 | biostudies-literature
S-EPMC2374782 | biostudies-literature
S-EPMC280650 | biostudies-literature
S-EPMC3042914 | biostudies-literature
S-EPMC4595117 | biostudies-literature
S-EPMC2818754 | biostudies-literature
S-EPMC2940242 | biostudies-literature
S-EPMC1087786 | biostudies-literature