Dataset Information

Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT.


ABSTRACT: BACKGROUND: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best-annotated genomes for further analyses in comparative genomics and evolution. A proxy for annotation quality is the estimation of overannotation by comparing annotated coding genes against the SwissProt database. NCBI's BLAST (BLAST+) is the usual software of choice for comparing these sequences. Newer programs that run in a fraction of the time required by BLAST+ might miss matches that BLAST+ would find. However, their results might still be useful for calculating overannotation. We thus decided to compare the overannotation estimates obtained with three such programs, UBLAST, LAST and the Blast-Like Alignment Tool (BLAT), and to test non-redundant versions of the SwissProt database to reduce the number of comparisons necessary. FINDINGS: We found that all three programs, UBLAST, LAST and BLAT, tend to produce overannotation estimates similar to those obtained with BLAST+. As would be expected, results varied the most from those obtained with BLAST+ in genomes with fewer proteins matching sequences in the SwissProt database. UBLAST was the fastest-running algorithm and showed the smallest variation from the results obtained using BLAST+. Reduced SwissProt databases did not seem to affect the results much, but the reduction in running time was modest compared to that obtained with UBLAST, LAST, or BLAT. CONCLUSIONS: Although faster programs miss sequence matches otherwise found by NCBI's BLAST, the overannotation estimates are very similar, and thus these programs can be used with confidence for this task.

SUBMITTER: Moreno-Hagelsieb G 

PROVIDER: S-EPMC4180129 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT.

Gabriel Moreno-Hagelsieb, Brigitte Hudy-Yuffa

BMC Research Notes, 2014-09-16


Similar Datasets

| S-EPMC4101998 | biostudies-literature
| S-EPMC5850840 | biostudies-literature
| S-EPMC6334396 | biostudies-literature
| S-EPMC4253983 | biostudies-literature
| S-EPMC6435814 | biostudies-literature
| S-EPMC4094424 | biostudies-literature
| S-EPMC5829572 | biostudies-literature
| S-EPMC6775668 | biostudies-other
| S-EPMC3488263 | biostudies-literature
| S-EPMC3282942 | biostudies-literature