Dataset Information

Increasing sequence search sensitivity with transitive alignments.


ABSTRACT: Sequence alignment is an important bioinformatics tool for identifying homology, but searching against the full set of available sequences is likely to result in many hits to poorly annotated sequences providing very little information. Consequently, we often want alignments against a specific subset of sequences: for instance, we are looking for sequences from a particular species, sequences that have known 3D structures, sequences that have a reliable (curated) function annotation, and so on. Although such subset databases are readily available, they only represent a small fraction of all sequences. Thus, the likelihood of finding close homologs for query sequences is smaller, and the alignments will in general have lower scores. This makes it difficult to distinguish hits to homologous sequences from random hits to unrelated sequences. Here, we propose a method that addresses this problem by first aligning query sequences against a large database representing the corpus of known sequences, and then constructing indirect (or transitive) alignments by combining the results with alignments from the large database against the desired target database. We compare the results to direct pairwise alignments, and show that our method gives us higher-sensitivity alignments against the target database.

SUBMITTER: Malde K 

PROVIDER: S-EPMC3573025 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Increasing sequence search sensitivity with transitive alignments.

Ketil Malde, Tomasz Furmanek

PLoS ONE, 2013-02-14, issue 2


Similar Datasets

| S-EPMC2737730 | biostudies-literature
| S-EPMC2373660 | biostudies-literature
| S-EPMC1538804 | biostudies-literature
| S-EPMC1948021 | biostudies-literature
| S-EPMC7297217 | biostudies-literature
| S-EPMC1933219 | biostudies-literature
| S-EPMC6994045 | biostudies-literature
| S-EPMC3985675 | biostudies-literature
| S-EPMC2850363 | biostudies-literature
| S-EPMC4548304 | biostudies-literature