Dataset Information

A sequence sub-sampling algorithm increases the power to detect distant homologues.


ABSTRACT: Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
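The sub-sampling idea in the abstract can be illustrated with a minimal Python sketch. It only shows how three-sequence subsets might be drawn at random from a parent alignment and how per-subset search results could be merged. The abstract does not state how the HMMER results were combined, so the `combine_hits` rule below (keep the lowest E-value per target) is an assumption made for illustration, not the authors' method. The function names, sequence names, target names, and E-values are all hypothetical, and no HMMER call is made.

```python
import random


def sample_subsets(alignment, n_subsets, subset_size=3, seed=0):
    """Draw random subsets of aligned sequences from a parent alignment.

    `alignment` maps sequence names to aligned sequence strings.
    """
    rng = random.Random(seed)
    names = sorted(alignment)
    return [
        {name: alignment[name] for name in rng.sample(names, subset_size)}
        for _ in range(n_subsets)
    ]


def combine_hits(per_subset_hits):
    """Assumed combination rule: keep the lowest E-value seen for each target."""
    best = {}
    for hits in per_subset_hits:
        for target, evalue in hits.items():
            if target not in best or evalue < best[target]:
                best[target] = evalue
    return best


# Toy parent alignment (hypothetical sequences).
parent = {
    "seqA": "MKV-LTA",
    "seqB": "MKI-LSA",
    "seqC": "MRVQLTA",
    "seqD": "MKV-LTG",
    "seqE": "MKVELSA",
}

subsets = sample_subsets(parent, n_subsets=4)

# Placeholder results: in a real run, each subset would be turned into a
# profile HMM and searched against a database, yielding target -> E-value.
toy_hits = [
    {"target1": 1e-3, "target2": 0.5},
    {"target1": 1e-5},
]
combined = combine_hits(toy_hits)
```

Fixing the seed makes the subsets reproducible; other merging rules (for example, counting how many subsets report a target) would fit the same structure.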

SUBMITTER: Johnston CR 

PROVIDER: S-EPMC1174907 | biostudies-other | 2005

REPOSITORIES: biostudies-other


Publications

A sequence sub-sampling algorithm increases the power to detect distant homologues.

Catrióna R Johnston, Denis C Shields

Nucleic Acids Research, 2005-07-08, issue 12



Similar Datasets

| S-EPMC9333274 | biostudies-literature
| S-EPMC2586129 | biostudies-literature
| S-EPMC7759463 | biostudies-literature
| S-EPMC8081657 | biostudies-literature
| S-EPMC1221988 | biostudies-other
| S-EPMC4765046 | biostudies-literature
| S-EPMC1950820 | biostudies-literature
| S-EPMC5765515 | biostudies-literature
| S-EPMC4031050 | biostudies-literature
2021-05-12 | GSE150601 | GEO