
Dataset Information


Fold homology detection using sequence fragment composition profiles of proteins.


ABSTRACT: The effectiveness of sequence alignment in detecting structural homology among protein sequences decreases markedly when pairwise sequence identity is low (the so-called "twilight zone" problem of sequence alignment). Alternative sequence comparison strategies able to detect structural kinship among highly divergent sequences are necessary to address this need. Among them are alignment-free methods, which use global sequence properties (such as amino acid composition) to identify structural homology in a rapid and straightforward way. We explore the viability of using tetramer sequence fragment composition profiles to find structural relationships that go undetected by traditional alignment. We establish a strategy to recast any given protein sequence into a tetramer sequence fragment composition profile, using a series of amino acid clustering steps that have been optimized for mutual information. Our method compresses the set of 160,000 unique tetramers (if using the 20-letter amino acid alphabet) into a more tractable number of reduced tetramers (approximately 15-30), so that a meaningful tetramer composition profile can be constructed. We test remote homology detection at the topology and fold superfamily levels using a comprehensive set of fold homologs that share low pairwise sequence similarity, culled from the CATH database. Using the receiver-operating characteristic measure, we demonstrate a potentially significant improvement from using information-optimized reduced tetramer composition, over methods relying only on the raw amino acid composition or on traditional sequence alignment, for homology detection at or below the "twilight zone".
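The profile construction described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-group alphabet (hydrophobic versus everything else), which is not the mutual-information-optimized clustering from the paper, and it compares profiles with a plain Euclidean distance, also an assumption. The function names `reduce_sequence`, `tetramer_profile`, and `profile_distance` are illustrative only. With 20 letters there are 20^4 = 160,000 possible tetramers; with two groups there are 2^4 = 16, which falls inside the 15-30 range the abstract mentions.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical 2-group alphabet (assumption for illustration only);
# the paper derives its groupings by optimizing mutual information.
HYDROPHOBIC = set("AVLIMFWC")


def reduce_sequence(seq):
    """Map each residue to 'h' (hydrophobic) or 'p' (anything else)."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq.upper())


def tetramer_profile(seq, k=4):
    """Return normalized frequencies of overlapping reduced k-mers."""
    reduced = reduce_sequence(seq)
    keys = ["".join(t) for t in product("hp", repeat=k)]
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k: return an all-zero profile.
        return {key: 0.0 for key in keys}
    return {key: counts[key] / total for key in keys}


def profile_distance(p, q):
    """Euclidean distance between two profiles over the same keys."""
    return math.sqrt(sum((p[key] - q[key]) ** 2 for key in p))
```

Because every profile has the same fixed length regardless of sequence length, two sequences can be compared without alignment by calling `profile_distance(tetramer_profile(a), tetramer_profile(b))`; a smaller distance suggests a more similar fragment composition under this toy alphabet.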

SUBMITTER: Solis AD 

PROVIDER: S-EPMC2933786 | biostudies-literature | 2010 Oct

REPOSITORIES: biostudies-literature


Publications

Fold homology detection using sequence fragment composition profiles of proteins.

Armando D Solis, Shalom R Rackovsky

Proteins, 2010-10-01, issue 13



Similar Datasets

S-EPMC2845653 | biostudies-literature
S-EPMC5793832 | biostudies-literature
S-EPMC2896139 | biostudies-literature
S-EPMC7141871 | biostudies-literature
S-EPMC4703025 | biostudies-literature
S-EPMC2645910 | biostudies-literature
S-EPMC7537947 | biostudies-literature
S-EPMC1794419 | biostudies-literature
S-EPMC2703895 | biostudies-literature
S-EPMC7051832 | biostudies-literature