Dataset Information

eCOMPASS: evaluative comparison of multiple protein alignments by statistical score.


ABSTRACT:

Motivation

Detecting subtle biologically relevant patterns in protein sequences often requires the construction of a large and accurate multiple sequence alignment (MSA). Methods for constructing MSAs are usually evaluated using benchmark alignments, which, however, typically contain very few sequences and are therefore inappropriate when dealing with large numbers of proteins.

Results

eCOMPASS addresses this problem using a statistical measure of relative alignment quality based on direct coupling analysis (DCA): to maintain protein structural integrity over evolutionary time, substitutions at one residue position typically result in compensating substitutions at other positions. eCOMPASS computes the statistical significance of the congruence between high scoring directly coupled pairs and 3D contacts in corresponding structures, which depends upon properly aligned homologous residues. We illustrate eCOMPASS using both simulated and real MSAs.
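The idea of scoring congruence between top-ranked coupled pairs and structural contacts can be illustrated with a deliberately simplified sketch. It is not the eCOMPASS statistic: it swaps in a plain hypergeometric tail test, and the function names, the dictionary of pair scores, and the contact set are all hypothetical inputs assumed to come from a separate DCA run and a separate structure-based contact calculation.

```python
from math import comb


def count_contact_hits(pair_scores, contacts, n_top):
    """Count how many of the n_top highest-scoring column pairs are 3D contacts.

    pair_scores: dict mapping (i, j) column pairs to a coupling score
                 (hypothetical output of a DCA run on one alignment).
    contacts:    set of (i, j) pairs in contact in a reference structure
                 (hypothetical, computed elsewhere).
    """
    top_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:n_top]
    return sum(1 for pair in top_pairs if pair in contacts)


def contact_enrichment_pvalue(n_pairs, n_contacts, n_top, n_hits):
    """Hypergeometric upper tail P(X >= n_hits).

    Models the n_top selected pairs as a random draw, without replacement,
    from n_pairs candidate pairs of which n_contacts are true contacts.
    This is an illustrative stand-in, not the statistic used by eCOMPASS.
    """
    if not (0 <= n_contacts <= n_pairs and 0 <= n_top <= n_pairs):
        raise ValueError("n_contacts and n_top must lie between 0 and n_pairs")
    max_hits = min(n_contacts, n_top)
    total = comb(n_pairs, n_top)
    tail = sum(
        comb(n_contacts, k) * comb(n_pairs - n_contacts, n_top - k)
        for k in range(max(n_hits, 0), max_hits + 1)
    )
    return tail / total
```

Under these assumptions, two alignments of the same proteins could be compared by running the same coupling analysis on each, counting hits against the same contact set, and preferring the alignment with the smaller tail probability. A misaligned MSA would be expected to place fewer of its top-scoring pairs on true contacts.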

Availability and implementation

The eCOMPASS executable, C++ open source code and input data sets are available at https://www.igs.umaryland.edu/labs/neuwald/software/compass.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Neuwald AF 

PROVIDER: S-EPMC8545322 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature

Publications

eCOMPASS: evaluative comparison of multiple protein alignments by statistical score.

Neuwald AF, Kolaczkowski BD, Altschul SF

Bioinformatics (Oxford, England), 2021-10-01, issue 20


Similar Datasets

| S-EPMC1687212 | biostudies-literature
| S-EPMC516024 | biostudies-literature
| S-EPMC6693267 | biostudies-literature
| S-EPMC7297217 | biostudies-literature
| S-EPMC1828169 | biostudies-literature
| S-EPMC3394275 | biostudies-literature
| S-EPMC2373449 | biostudies-literature
| S-EPMC1347381 | biostudies-literature
| S-EPMC10538487 | biostudies-literature
| S-EPMC1933189 | biostudies-literature