Dataset Information

Homologous over-extension: a challenge for iterative similarity searches.


ABSTRACT: We have characterized a novel type of PSI-BLAST error, homologous over-extension (HOE), using embedded Pfam domain queries in searches against a reference library containing Pfam-annotated UniProt sequences and random synthetic sequences. PSI-BLAST makes two types of errors: alignments to non-homologous regions, and HOE alignments that begin in a homologous region but extend beyond the homology into neighboring sequence regions. When the neighboring sequence region contains a non-homologous domain, PSI-BLAST can incorporate the unrelated sequence into its position-specific scoring matrix, which then finds non-homologous proteins with significant expectation values. HOE accounts for the largest fraction of the initial false positive (FP) errors, and the largest fraction of FPs at iteration 5. In searches against complete protein sequences, 5 to 9% of alignments at iteration 5 are non-homologous. HOE frequently begins in a partial protein domain; when partial domains are removed from the library, HOE errors decrease from 16% to 3% of weighted coverage (hard queries; from 35% to 5% for sampled queries) and no-error searches increase from 2% to 58% of weighted coverage (hard queries; from 16% to 78% for sampled queries). When HOE is reduced by not extending previously found sequences, PSI-BLAST specificity improves 4- to 8-fold, with little loss in sensitivity.
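
The distinction the abstract draws between non-homologous alignments and homologous over-extension can be illustrated with a minimal sketch. It assumes the boundaries of the homologous domain on the library sequence are already known (for example, from Pfam annotation) and compares them with the aligned region. The names `Interval` and `classify_alignment`, the label strings, and the 10-residue `tolerance` are hypothetical choices for illustration, not the criteria used in the paper.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """A sequence region in 1-based, inclusive residue coordinates."""
    start: int
    end: int


def overlap(a: Interval, b: Interval) -> int:
    """Return the number of residues shared by two inclusive intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify_alignment(
    aligned: Interval,
    homologous: Optional[Interval],
    tolerance: int = 10,  # hypothetical allowance for boundary imprecision
) -> str:
    """Label an alignment on a library sequence.

    aligned    -- region of the library sequence covered by the alignment
    homologous -- annotated domain homologous to the query, or None if the
                  library sequence contains no such domain
    """
    # No homologous domain, or the alignment misses it entirely.
    if homologous is None or overlap(aligned, homologous) == 0:
        return "non-homologous"

    # Residues aligned outside the homologous domain, on either side.
    overhang = (
        max(0, homologous.start - aligned.start)
        + max(0, aligned.end - homologous.end)
    )
    if overhang > tolerance:
        return "over-extended"
    return "homologous"
```

For instance, with a homologous domain at residues 90 to 260, an alignment covering residues 100 to 400 starts inside the domain but runs 140 residues past its end, so this sketch labels it `"over-extended"`, while an alignment covering residues 300 to 400 never touches the domain and is labeled `"non-homologous"`.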

SUBMITTER: Gonzalez MW 

PROVIDER: S-EPMC2853128 | biostudies-literature | 2010 Apr

REPOSITORIES: biostudies-literature

Publications

Homologous over-extension: a challenge for iterative similarity searches.

Gonzalez MW, Pearson WR

Nucleic Acids Research, 2010-01-11, issue 7


Similar Datasets

| S-EPMC3240574 | biostudies-literature
| S-EPMC1489924 | biostudies-literature
| S-EPMC2441795 | biostudies-literature
| S-EPMC6439793 | biostudies-other
| S-EPMC4375400 | biostudies-literature
| S-EPMC1160184 | biostudies-literature
| S-EPMC107726 | biostudies-literature
| S-EPMC2703971 | biostudies-literature
| S-EPMC3403935 | biostudies-literature
| S-EPMC2756558 | biostudies-other