Dataset Information

A sequence sub-sampling algorithm increases the power to detect distant homologues.

ABSTRACT: Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
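The sub-sampling step described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' program: the abstract states that subsets of three sequences are drawn at random from the parent alignment and that the search results are combined, but it does not say how. Keeping the best (lowest) E-value per target is an assumption made here, and `draw_subsets`, `combine_hits`, and the `search` callback (a stand-in for building a profile from a subset and running a HMMER search) are hypothetical names.

```python
import random
from typing import Callable, Dict, List, Sequence


def draw_subsets(alignment: Sequence[str], n_subsets: int,
                 subset_size: int = 3, seed: int = 0) -> List[List[str]]:
    """Draw random subsets of aligned sequences (no repeats within a subset)."""
    if len(alignment) < subset_size:
        raise ValueError("alignment has fewer sequences than subset_size")
    rng = random.Random(seed)
    return [rng.sample(list(alignment), subset_size) for _ in range(n_subsets)]


def combine_hits(subsets: List[List[str]],
                 search: Callable[[List[str]], Dict[str, float]]) -> Dict[str, float]:
    """Merge per-subset search results.

    `search` is a hypothetical callback mapping a subset to {target: E-value}.
    Assumption: the combined score for a target is its lowest E-value
    across all subsets; the source does not specify the combining rule.
    """
    best: Dict[str, float] = {}
    for subset in subsets:
        for target, evalue in search(subset).items():
            if target not in best or evalue < best[target]:
                best[target] = evalue
    return best
```

In practice the `search` callback would wrap an external profile search; here it is left abstract so that the sketch stays independent of any particular tool's interface.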

SUBMITTER: Johnston CR

PROVIDER: S-EPMC1174907 | biostudies-other | 2005

REPOSITORIES: biostudies-other

Publications

A sequence sub-sampling algorithm increases the power to detect distant homologues.

Catrióna R. Johnston, Denis C. Shields

Nucleic Acids Research, 2005-07-08, issue 12

PMID: 16006623

Similar Datasets

Sampling re-design increases power to detect change in the Great Barrier Reef's inshore water quality.
| S-EPMC9333274 | biostudies-literature

Algorithm to find distant repeats in a single protein sequence.
| S-EPMC2586129 | biostudies-literature

Dense sampling of bird diversity increases power of comparative genomics.
| S-EPMC7759463 | biostudies-literature

Author Correction: Dense sampling of bird diversity increases power of comparative genomics.
| S-EPMC8081657 | biostudies-literature

Inhibition of distant caspase homologues by natural caspase inhibitors.
| S-EPMC1221988 | biostudies-other

Modelling local gene networks increases power to detect trans-acting genetic effects on gene expression.
| S-EPMC4765046 | biostudies-literature

Combining evidence of natural selection with association analysis increases power to detect malaria-resistance variants.
| S-EPMC1950820 | biostudies-literature

An algorithm to detect unexpected increases in frequency of reports of adverse events in EudraVigilance.
| S-EPMC5765515 | biostudies-literature

High-resolution modeling of transmembrane helical protein structures from distant homologues.
| S-EPMC4031050 | biostudies-literature

smORFer: a modular algorithm to detect small ORFs in prokaryotes
2021-05-12 | GSE150601 | GEO