Dataset Information

PHOG-BLAST--a new generation tool for fast similarity search of protein families.


ABSTRACT:

Background

The need to compare protein profiles frequently arises in various protein research areas: comparison of protein families, domain searches, resolution of orthology and paralogy. The existing fast algorithms can only compare a protein sequence with a protein sequence and a profile with a sequence. Algorithms to compare profiles use dynamic programming and complex scoring functions.

Results

We developed a new algorithm, PHOG-BLAST, for fast similarity search of profiles. The algorithm discretizes a profile, converting it to a sequence over a finite alphabet, and uses hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside all clusters. We tested the performance of PHOG-BLAST against PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE, PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10-20 times less computer memory and computation time than PSI-BLAST.
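The two ideas named above, profile discretization and hashed lookup, can be illustrated with a minimal sketch. This is not the authors' implementation: the centroids (`toy_centroids`), the Euclidean distance, the cutoff `max_dist`, the placeholder letter `X`, and the word length `k` are all assumptions chosen for illustration, since the abstract does not specify them. In the paper the 20 clusters come from clustering columns of reliable multiple alignments, not from the synthetic vectors used here.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_centroids():
    """Hypothetical cluster centres: one per residue, 0.81 on that residue
    and 0.01 on each of the other 19 (so every vector sums to 1)."""
    centroids = {}
    for i, aa in enumerate(AMINO_ACIDS):
        vec = [0.01] * 20
        vec[i] = 0.81
        centroids[aa] = vec
    return centroids


def discretize(profile, centroids, max_dist=0.5, unknown="X"):
    """Map each 20-dimensional profile column to the letter of its nearest
    centroid; columns farther than max_dist from every centroid get `unknown`."""
    letters = []
    for column in profile:
        best_letter, best_dist = unknown, max_dist
        for letter, centroid in centroids.items():
            dist = math.dist(column, centroid)  # Euclidean distance (assumed metric)
            if dist < best_dist:
                best_letter, best_dist = letter, dist
        letters.append(best_letter)
    return "".join(letters)


def build_index(named_strings, k=3):
    """Hash every k-letter word of each discretized profile to its positions."""
    index = defaultdict(list)
    for name, text in named_strings.items():
        for pos in range(len(text) - k + 1):
            index[text[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query, index, k=3):
    """Return (target name, query position, target position) for each shared word."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for name, tpos in index.get(query[qpos:qpos + k], ()):
            hits.append((name, qpos, tpos))
    return hits


if __name__ == "__main__":
    centroids = toy_centroids()
    profile = [centroids[c] for c in "CDEF"]      # a synthetic 4-column profile
    query = discretize(profile, centroids)
    index = build_index({"fam1": "ACDEFGH", "fam2": "WWWWW"})
    print(query, find_seeds(query, index))
```

Once profiles are reduced to strings, seed lookup costs one dictionary access per word, which is the kind of shortcut that sequence-based BLAST-style searches rely on; any scoring or extension of the seeds is left out of this sketch.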

Conclusion

Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing the protein complements of different species and the orthologous groups of different taxa on a personal computer in reasonable time. When applied to detect weak similarity between protein families, PHOG-BLAST is less precise than rigorous profile-profile comparison methods, but it runs much faster and can be used as a tool for pre-selecting hits.

SUBMITTER: Merkeev IV 

PROVIDER: S-EPMC1522020 | biostudies-literature | 2006 Jun

REPOSITORIES: biostudies-literature

Publications

PHOG-BLAST--a new generation tool for fast similarity search of protein families.

Igor V. Merkeev, Andrey A. Mironov

BMC Evolutionary Biology, 2006-06-22


Similar Datasets

| S-EPMC3244761 | biostudies-literature
| S-EPMC3113943 | biostudies-literature
| S-EPMC146917 | biostudies-other
| S-EPMC3591303 | biostudies-literature
| S-EPMC1421445 | biostudies-literature
| S-EPMC3125810 | biostudies-literature
| S-EPMC3125779 | biostudies-literature
| S-EPMC4122987 | biostudies-literature
| S-EPMC2194796 | biostudies-literature
| S-EPMC7613299 | biostudies-literature