
Dataset Information


Amino acid "little Big Bang": representing amino acid substitution matrices as dot products of Euclidian vectors.


ABSTRACT:

Background

Sequence comparison methods represent amino acids by one-letter codes, with the necessary quantitative information supplied by substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsic properties consistent with the substitution matrices.

Results

We present a Euclidean vector representation of the amino acids, obtained by the singular value decomposition of the substitution matrices. The substitution matrix entries correspond to the dot products of amino acid vectors. We apply this vector encoding to study the relative importance of various amino acid physicochemical properties on the substitution matrices. We also characterize and compare the PAM and BLOSUM series of substitution matrices.
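The idea that a singular value decomposition turns a score matrix into per-residue vectors whose dot products reproduce its entries can be illustrated with a short NumPy sketch. The function name `svd_vectors` and the 3x3 matrix below are hypothetical; the scores are made up and are not taken from PAM or BLOSUM. This is a generic SVD factorization, not necessarily the exact procedure used in the paper.

```python
import numpy as np


def svd_vectors(S):
    """Factor a square score matrix S as A @ B.T using the SVD.

    Row i of A (left vectors) and row j of B (right vectors) satisfy
    S[i, j] == A[i] @ B[j] up to floating-point error. When S is symmetric
    positive semidefinite, A and B coincide, so each letter gets a single
    vector and S[i, j] is an ordinary Euclidean dot product.
    """
    U, sigma, Vt = np.linalg.svd(S)
    root = np.sqrt(sigma)
    A = U * root      # scale column k of U by sqrt(sigma_k)
    B = Vt.T * root   # same scaling for the right singular vectors
    return A, B


# Hypothetical 3-letter alphabet with made-up, positive definite scores.
letters = ["A", "C", "D"]
S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])

A, B = svd_vectors(S)
print(np.allclose(A @ B.T, S))  # True: every entry is a dot product
print(np.allclose(A, B))        # True here because S is positive definite
```

Real substitution matrices need not be positive semidefinite. When S has negative eigenvalues, the left and right vectors differ in the corresponding components, and this sketch does not show how the paper handles that case.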

Conclusions

This vector encoding introduces a Euclidean metric in the amino acid space that is consistent with the substitution matrices. Such a numerical description of the amino acids is useful whenever their intrinsic properties are needed, for instance when building sequence profiles or finding consensus sequences, or as input to machine learning algorithms such as Support Vector Machines and Neural Networks.
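If every matrix entry is a dot product, $S_{ij} = \mathbf{x}_i \cdot \mathbf{x}_j$, the distance between two amino acid vectors can be read directly off the matrix. The derivation below is a minimal sketch assuming a single vector $\mathbf{x}_i$ per amino acid (the positive semidefinite case); the symbols are illustrative and not necessarily the paper's notation.

```latex
\begin{aligned}
\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2
  &= \mathbf{x}_i \cdot \mathbf{x}_i + \mathbf{x}_j \cdot \mathbf{x}_j - 2\,\mathbf{x}_i \cdot \mathbf{x}_j \\
  &= S_{ii} + S_{jj} - 2\,S_{ij}
\end{aligned}
```

Two amino acids whose mutual score is high relative to their self-scores therefore lie close together in the vector space.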

SUBMITTER: Zimmermann K 

PROVIDER: S-EPMC3098074 | biostudies-literature | 2010 Jan

REPOSITORIES: biostudies-literature


Publications

Amino acid "little Big Bang": representing amino acid substitution matrices as dot products of Euclidian vectors.

Zimmermann Karel K, Gibrat Jean-François JF

BMC Bioinformatics, 2010-01-04



Similar Datasets

| S-EPMC307629 | biostudies-literature
| S-EPMC8445205 | biostudies-literature
| S-EPMC9821064 | biostudies-literature
| S-EPMC6841959 | biostudies-literature
| S-EPMC2588515 | biostudies-literature
| S-EPMC3904525 | biostudies-other
| S-EPMC8442976 | biostudies-literature
| S-EPMC11258327 | biostudies-literature
| S-EPMC5932575 | biostudies-literature
| S-EPMC4575589 | biostudies-literature