
Dataset Information


Adjusting scoring matrices to correct overextended alignments.


ABSTRACT: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low-identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions. We measured local alignment start/stop boundary accuracy using a set of queries where the correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequence; they occur most frequently between sequences that are more closely related (>33% identity). Adjusting the scoring matrix to reflect the identity of the homologous sequences can correct overextended boundaries in higher-identity alignments. In addition, the scoring matrix that produced a correct alignment could be reliably predicted from the sequence identity seen in the original BLOSUM62 alignment. Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, resulting in more correct alignments than using BLOSUM62 alone.
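The two ideas in the abstract, measuring how far an alignment runs past known homologous boundaries and choosing a scoring matrix from the identity of a BLOSUM62 alignment, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The matrix labels MATRIX_50, MATRIX_70 and MATRIX_90, all target-identity values (including 30% for BLOSUM62), and the nearest-target selection rule are hypothetical placeholders. Boundaries are assumed to be given in one shared coordinate system on the same sequence. The realignment step itself (rerunning SSEARCH or BLASTP with the chosen matrix) is not shown.

```python
# HYPOTHETICAL table: (matrix label, assumed target percent identity).
# Only BLOSUM62 is a real matrix name here; the other labels and every
# identity value are illustrative placeholders, not values from the paper.
HYPOTHETICAL_MATRICES = [
    ("BLOSUM62", 30.0),
    ("MATRIX_50", 50.0),
    ("MATRIX_70", 70.0),
    ("MATRIX_90", 90.0),
]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns with a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def predict_matrix(identity: float) -> str:
    """Assumed rule: pick the matrix whose target identity is closest to `identity`."""
    return min(HYPOTHETICAL_MATRICES, key=lambda m: abs(m[1] - identity))[0]


def overextension(aln_start: int, aln_end: int,
                  true_start: int, true_end: int) -> tuple:
    """Residues by which an alignment extends past the known homologous
    boundaries, as (left, right); 0 means no overextension on that side."""
    left = max(0, true_start - aln_start)
    right = max(0, aln_end - true_end)
    return left, right


if __name__ == "__main__":
    ident = percent_identity("MKV-LAAGLLAV", "MKVALAAGILAV")
    print(f"{ident:.1f}")                      # 90.9 (10 of 11 ungapped columns)
    print(predict_matrix(ident))               # MATRIX_90
    print(overextension(10, 220, 25, 200))     # (15, 20)
```

In this sketch, an alignment whose BLOSUM62 identity is well above the ~33% range that BLOSUM62 targets is mapped to a hypothetical higher-identity matrix, which mirrors the abstract's observation that overextension is most common between more closely related sequences. A real pipeline would use the matrices and identity cut-offs reported in the paper rather than these placeholders.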

SUBMITTER: Mills LJ 

PROVIDER: S-EPMC3834790 | biostudies-literature | 2013 Dec

REPOSITORIES: biostudies-literature


Publications

Adjusting scoring matrices to correct overextended alignments.

Mills LJ, Pearson WR

Bioinformatics (Oxford, England) 2013 Aug 31; 23



Similar Datasets

S-EPMC11063651 | biostudies-literature
S-EPMC1087784 | biostudies-literature
S-EPMC8388040 | biostudies-literature
S-EPMC7586916 | biostudies-literature
S-EPMC2590594 | biostudies-literature
S-EPMC2373449 | biostudies-literature
S-EPMC5978496 | biostudies-literature
S-EPMC4021105 | biostudies-literature
S-EPMC6149929 | biostudies-literature
S-EPMC6841959 | biostudies-literature