
Dataset Information


PSI-BLAST pseudocounts and the minimum description length principle.


ABSTRACT: Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
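As a rough illustration of how pseudocounts enter a column score, the following minimal sketch mixes observed residue counts with pseudocounts before taking a log-odds ratio. It is a simplification under stated assumptions, not PSI-BLAST's actual procedure: the pseudocounts here are distributed in proportion to background frequencies, the counts are raw rather than sequence-weighted, and the names `column_scores`, `background`, and `beta` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(column, background, beta):
    """Log-odds scores (in bits) for one alignment column.

    column:     string of residues observed in the column
    background: dict mapping residue -> background probability (sums to 1)
    beta:       total pseudocount weight; must be > 0 so that unobserved
                residues still get a nonzero estimated frequency
    """
    counts = Counter(column)
    n = sum(counts[a] for a in AMINO_ACIDS)  # ignore gaps and unknown symbols
    scores = {}
    for a in AMINO_ACIDS:
        # Assumption: pseudocounts are spread by background frequency.
        # PSI-BLAST itself uses data-dependent pseudocount frequencies.
        q = (counts[a] + beta * background[a]) / (n + beta)
        scores[a] = math.log2(q / background[a])
    return scores


# A perfectly conserved column of four alanines, uniform background.
uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
few = column_scores("AAAA", uniform, beta=4.0)
many = column_scores("AAAA", uniform, beta=40.0)
```

In this toy setting, the smaller `beta` gives the conserved residue a higher score and the unobserved residues a lower one, which matches the abstract's point that fewer pseudocounts increase the contrast of highly conserved columns.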

SUBMITTER: Altschul SF 

PROVIDER: S-EPMC2647318 | biostudies-literature | 2009 Feb

REPOSITORIES: biostudies-literature


Publications

PSI-BLAST pseudocounts and the minimum description length principle.

Stephen F Altschul, E Michael Gertz, Richa Agarwala, Alejandro A Schäffer, Yi-Kuo Yu

Nucleic Acids Research, 2008 Dec 16, issue 3



Similar Datasets

| S-EPMC3608252 | biostudies-literature
| S-EPMC3117365 | biostudies-literature
| S-EPMC146917 | biostudies-other
| S-EPMC1874647 | biostudies-literature
| S-EPMC2881392 | biostudies-literature
| S-EPMC5455086 | biostudies-literature
| S-EPMC2651257 | biostudies-literature
| S-EPMC1450308 | biostudies-literature
| S-EPMC2211314 | biostudies-literature
| S-EPMC6936045 | biostudies-literature