Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT.
Ontology highlight
ABSTRACT: BACKGROUND: As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best annotated genomes for further analyses in comparative genomics and evolution. A proxy to annotation quality is the estimation of overannotation by comparing annotated coding genes against the SwissProt database. NCBI's BLAST (BLAST+) is the common software of choice to compare these sequences. Newer programs that run in a fraction of the time as BLAST+ might miss matches that BLAST+ would find. However, the results might still be useful to calculate overannotation. We thus decided to compare the overannotation estimates yielded using three such programs, UBLAST, LAST and the Blast-Like Alignment Tool (BLAT), and to test non-redundant versions of the SwissProt database to reduce the number of comparisons necessary. FINDINGS: We found that all, UBLAST, LAST and BLAT, tend to produce similar overannotation estimates to those obtained with BLAST+. As would be expected, results varied the most from those obtained with BLAST+ in genomes with fewer proteins matching sequences in the SwissProt database. UBLAST was the fastest running algorithm, and showed the smallest variation from the results obtained using BLAST+. Reduced SwissProt databases did not seem to affect the results much, but the reduction in time was modest compared to that obtained from UBLAST, LAST, or BLAT. CONCLUSIONS: Despite faster programs miss sequence matches otherwise found by NCBI's BLAST, the overannotation estimates are very similar and thus these programs can be used with confidence for this task.
SUBMITTER: Moreno-Hagelsieb G
PROVIDER: S-EPMC4180129 | biostudies-literature | 2014
REPOSITORIES: biostudies-literature
ACCESS DATA