Dataset Information

Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment.

ABSTRACT:

Motivation

Many proteins with vastly dissimilar sequences are found to share a common fold, as evidenced in the wealth of structures now available in the Protein Data Bank. One idea that has found success in various applications is the concept of a reduced amino acid alphabet, wherein similar amino acids are clustered together. Given the structural similarity exhibited by many apparently dissimilar sequences, we undertook this study looking for improvements in fold recognition by comparing protein sequences written in a reduced alphabet.

Results

We tested over 150 of the amino acid clustering schemes proposed in the literature with all-versus-all pairwise sequence alignments of sequences in the Distance mAtrix aLIgnment (DALI) database. We combined several metrics from information retrieval popular in the literature (mean precision, area under the Receiver Operating Characteristic curve, and recall at a fixed error rate) and found that, in contrast to previous work, reduced alphabets in many cases outperform full alphabets. We find that reduced alphabets can perform at a level comparable to full alphabets in correct pairwise alignment of sequences and can show increased sensitivity to pairs of sequences with structural similarity but low sequence identity. Based on these results, we hypothesize that reduced alphabets may also show performance gains with more sophisticated methods such as profile and pattern searches.
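As a rough illustration of the ideas in this abstract, the sketch below rewrites sequences in a reduced alphabet and scores a ranking with the area under the ROC curve. It is a minimal sketch under stated assumptions, not the study's pipeline: the grouping `EXAMPLE_GROUPS` is hypothetical and is not one of the clustering schemes tested, the helper names are invented for this example, and the identity function assumes the two sequences are already aligned and of equal length.

```python
# Hypothetical six-group clustering, for illustration only.
# It is NOT one of the schemes evaluated in the study.
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map every residue to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown symbols are kept."""
    return "".join(mapping.get(residue, residue) for residue in sequence)


def identity(a, b):
    """Fraction of identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    same-fold pair (label 1) outscores a different-fold pair (label 0);
    ties count as one half."""
    positives = [s for s, label in zip(scores, labels) if label]
    negatives = [s for s, label in zip(scores, labels) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

With this hypothetical grouping, the toy sequences `MKVLA` and `IRLIV` share no identical positions in the full alphabet, yet both reduce to `AKAAA`. That is the kind of hidden similarity a reduced alphabet is meant to expose; whether it helps on real data depends on the clustering scheme and substitution matrix used.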

Availability

A table of results as well as the substitution matrices and residue groupings from this study can be downloaded from http://www.rpgroup.caltech.edu/publications/supplements/alphabets.

SUBMITTER: Peterson EL

PROVIDER: S-EPMC2732308 | biostudies-literature | 2009 Jun

REPOSITORIES: biostudies-literature

Publications

Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment.

Eric L Peterson, Jané Kondev, Julie A Theriot, Rob Phillips

Bioinformatics (Oxford, England), 2009-04-07, issue 11

PMID: 19351620

Similar Datasets

Project description:

Background

In structural genomics, an important goal is the detection and classification of protein-protein interactions, given the structures of the interacting partners. We have developed empirical energy functions to identify native structures of protein-protein complexes among sets of decoy structures. To understand the role of amino acid diversity, we parameterized a series of functions, using a hierarchy of amino acid alphabets of increasing complexity, with 2, 3, 4, 6, and 20 amino acid groups. Compared to previous work, we used the simplest possible functional form, with residue-residue interactions and a stepwise distance-dependence. We used increased computational resources, however, constructing 290,000 decoys for 219 protein-protein complexes, with a realistic docking protocol where the protein partners are flexible and interact through a molecular mechanics energy function. The energy parameters were optimized to correctly assign as many native complexes as possible. To resolve the multiple minimum problem in parameter space, over 64,000 starting parameter guesses were tried for each energy function. The optimized functions were tested by cross validation on subsets of our native and decoy structures, by blind tests on series of native and decoy structures available on the Web, and on models for 13 complexes submitted to the CAPRI structure prediction experiment.

Results

Performance is similar to several other statistical potentials of the same complexity. For example, the CAPRI target structure is correctly ranked ahead of 90% of its decoys in 6 cases out of 13. The hierarchy of amino acid alphabets leads to a coherent hierarchy of energy functions, with qualitatively similar parameters for similar amino acid types at all levels. Most remarkably, the performance with six amino acid classes is equivalent to that of the most detailed, 20-class energy function.

Conclusion

This suggests that six carefully chosen amino acid classes are sufficient to encode specificity in protein-protein interactions, and provide a starting point to develop more complicated energy functions.
