Dataset Information

Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics.


ABSTRACT:

Motivation

Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROC(n)) score, the area under the ROC curve (AUC) of a 'pooled' ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROC(n) score does not faithfully reflect how retrieval algorithms are actually used. Additionally, a pooled ROC(n) score can be very sensitive to the retrieval results from as few as one query.
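To make the sensitivity claim concrete, here is a minimal Python sketch of a pooled ROC(n) score. It assumes the usual normalization (for each of the first n irrelevant hits in the pooled ranking, count the relevant hits ranked above it, sum those counts, and divide by n times the total number of relevant records). The function name `pooled_roc_n`, the handling of lists with fewer than n irrelevant hits, and the toy queries are illustrative assumptions, not data or code from the article.

```python
from typing import List, Tuple

# A hit is (e_value, is_relevant); each query contributes one list of hits.
Hit = Tuple[float, bool]


def pooled_roc_n(queries: List[List[Hit]], n: int, total_relevant: int) -> float:
    """Pool every query's hits into one list ranked by E-value, then sum the
    number of relevant hits ranked above each of the first n irrelevant hits
    and normalize by n * total_relevant (relevant records across all queries)."""
    pooled = sorted((hit for hits in queries for hit in hits), key=lambda h: h[0])
    relevant_seen = 0
    irrelevant_seen = 0
    area = 0
    for _, is_relevant in pooled:
        if is_relevant:
            relevant_seen += 1
        else:
            irrelevant_seen += 1
            area += relevant_seen
            if irrelevant_seen == n:
                break
    # ASSUMPTION: if fewer than n irrelevant hits were retrieved, the missing
    # ones are treated as ranked below everything that was retrieved.
    area += (n - irrelevant_seen) * relevant_seen
    return area / (n * total_relevant)


# Hypothetical toy data: query A has ten relevant hits with tiny E-values;
# query B ranks two irrelevant hits above its only relevant hit.
query_a = [(1e-50, True)] * 10 + [(1.0, False)]
query_b = [(1e-3, False), (1e-3, False), (0.5, True)]
print(pooled_roc_n([query_a, query_b], n=2, total_relevant=11))
```

With n = 2, query B scored on its own gets 0, yet the pooled score is about 0.91, because query A's ten very small E-values fill the top of the pooled ranking and swamp query B's poor result.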

Methods

To replace the pooled ROC(n) score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the average precision widely used in information retrieval, but reflecting how E-values are used in bioinformatics. Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy.
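The abstract does not spell out the formula, so the following is only a minimal sketch of a per-query threshold average precision under an assumed form: take the hits with E-value at or below a threshold, add up the precision at each relevant hit, add the precision at the threshold itself, and divide by the query's total number of relevant records plus one. The rule that turns k into an E-value threshold is not reproduced here; the threshold is passed in directly. The names `threshold_average_precision` and `mean_tap` are hypothetical, and this is not the authors' Perl script.

```python
from typing import List, Sequence, Tuple

# A hit is (e_value, is_relevant).
Hit = Tuple[float, bool]


def threshold_average_precision(hits: Sequence[Hit], e_threshold: float,
                                total_relevant: int) -> float:
    """Per-query TAP under an ASSUMED form: precision summed over the relevant
    hits with E-value <= e_threshold, plus the precision at the threshold
    itself, divided by (total_relevant + 1)."""
    ranked = sorted((h for h in hits if h[0] <= e_threshold), key=lambda h: h[0])
    if not ranked:
        return 0.0  # nothing retrieved at or below the threshold
    relevant_seen = 0
    precision_sum = 0.0
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            relevant_seen += 1
            precision_sum += relevant_seen / rank  # precision at this relevant hit
    terminal_precision = relevant_seen / len(ranked)  # precision at the threshold
    return (precision_sum + terminal_precision) / (total_relevant + 1)


def mean_tap(queries: List[Sequence[Hit]], totals: List[int],
             e_threshold: float) -> float:
    """Average the per-query scores so that every query carries equal weight."""
    scores = [threshold_average_precision(h, e_threshold, t)
              for h, t in zip(queries, totals)]
    return sum(scores) / len(scores)
```

For a query with hits `[(1e-10, True), (1e-5, False), (1e-3, True), (0.5, False)]`, a threshold of 0.01 and three relevant records in total, the sketch returns (1 + 2/3 + 2/3) / 4 = 7/12. Because each per-query score lies between 0 and 1 and the scores are averaged rather than pooled, a single query can move the mean by at most 1/Q for Q queries, which contrasts with the pooled behaviour the abstract criticizes.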

Results

Evaluations of PSI-BLAST, GLOBAL, HMMER and RPS-BLAST illustrate how the TAP-k and pooled ROC(n) scores can be used to assess sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROC(n) score, showing that it can produce evaluations that depart far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy.

Availability and implementation

The TAP-k web server and downloadable Perl script are freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/tap/

SUBMITTER: Carroll HD 

PROVIDER: S-EPMC2894514 | biostudies-literature | 2010 Jul

REPOSITORIES: biostudies-literature

Publications

Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics.

Hyrum D. Carroll, Maricel G. Kann, Sergey L. Sheetlin, John L. Spouge

Bioinformatics (Oxford, England), 2010-05-26, issue 14

Similar Datasets

| S-EPMC6755314 | biostudies-literature
| S-EPMC5714153 | biostudies-literature
| S-EPMC3212578 | biostudies-literature
| S-EPMC5993001 | biostudies-literature
| S-EPMC7231912 | biostudies-literature
| S-EPMC5823756 | biostudies-literature
| S-EPMC8265094 | biostudies-literature
| S-EPMC6884132 | biostudies-literature
| S-EPMC7908712 | biostudies-literature
| S-EPMC5558639 | biostudies-other