Dataset Information

In search for more accurate alignments in the twilight zone.


ABSTRACT: A major bottleneck in comparative modeling is the alignment quality; this is especially true for proteins whose distant relationships could be reliably recognized only by recent advances in fold recognition. The best algorithms excel in recognizing distant homologs but often produce incorrect alignments for over 50% of protein pairs in large fold-prediction benchmarks. The alignments obtained by sequence-sequence or sequence-structure matching algorithms differ significantly from the structural alignments. To study this problem, we developed a simplified method to explicitly enumerate all possible alignments for a pair of proteins. This allowed us to estimate the number of significantly different alignments for a given scoring method that score better than the structural alignment. Using several examples of distantly related proteins, we show that for standard sequence-sequence alignment methods, the number of significantly different alignments is usually large, often about 10^10 alternatives. This number decreases when the alignment method is improved, but it is still too large for the brute force enumeration approach. More effective strategies were needed, so we evaluated and compared two well-known approaches for searching the space of suboptimal alignments. We combined their best features and produced a hybrid method, which yielded alignments that surpassed the original alignments for about 50% of protein pairs with minimal computational effort.
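To give a rough sense of why brute-force enumeration of alignments becomes impractical, the sketch below counts every possible global alignment of two sequences with a simple recurrence. It is an illustration only and not the simplified enumeration method described in the abstract: it counts all alignments (not just the significantly different ones that outscore the structural alignment), it treats a gap in one sequence followed by a gap in the other as distinct from the reverse order, and the function name `count_alignments` is hypothetical.

```python
def count_alignments(m: int, n: int) -> int:
    """Count global alignments of two sequences of lengths m and n.

    Assumption (illustrative, not from the source): an alignment is a path
    built from three moves: align two residues, gap in the first sequence,
    or gap in the second sequence. The count then follows the recurrence
    D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1), with D(0, j) = D(i, 0) = 1.
    """
    prev = [1] * (n + 1)  # row for i = 0: only gaps are possible
    for _ in range(m):
        curr = [1]  # column j = 0: only gaps are possible
        for j in range(1, n + 1):
            curr.append(prev[j] + curr[j - 1] + prev[j - 1])
        prev = curr
    return prev[n]


if __name__ == "__main__":
    # Print how quickly the count grows with sequence length.
    for length in (10, 50, 100):
        print(length, count_alignments(length, length))
```

The count grows exponentially with sequence length, which is consistent with the abstract's point that exhaustive enumeration has to be restricted or replaced by a targeted search of suboptimal alignments.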

SUBMITTER: Jaroszewski L 

PROVIDER: S-EPMC2373660 | biostudies-literature | 2002 Jul

REPOSITORIES: biostudies-literature

Publications

In search for more accurate alignments in the twilight zone.

Jaroszewski Lukasz, Li Weizhong, Godzik Adam

Protein Science: a publication of the Protein Society, 2002-07-01, issue 7


Similar Datasets

| S-EPMC3561995 | biostudies-literature
| S-EPMC2876130 | biostudies-literature
| S-EPMC2391167 | biostudies-literature
| S-EPMC5558704 | biostudies-other
| S-EPMC8115882 | biostudies-literature
| S-EPMC3013127 | biostudies-literature
| S-EPMC3573025 | biostudies-literature
| S-EPMC7297217 | biostudies-literature
| S-EPMC4123902 | biostudies-literature
| S-EPMC2527991 | biostudies-literature