Dataset Information

A graph-search framework for associating gene identifiers with documents.


ABSTRACT: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method which combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method. We show that named entity recognition (NER) systems with similar F-measure performance can have significantly different performance when used with a soft dictionary for geneId-ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach. The utility of a named entity recognition (NER) system for geneId-finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be constructed based on easily-available resources, without resorting to problem-specific, engineered components.
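
As a rough illustration of the baseline idea in the abstract (matching NER output against a "soft dictionary" of gene synonyms to produce a ranked list of identifiers), the sketch below may help. It is not the authors' method: token-level Jaccard similarity stands in for the soft match, and summing each gene's best match score across NER systems stands in for combining systems, which the paper does with a graph-based approach. The function names, gene identifiers, and synonyms are all hypothetical.

```python
from collections import defaultdict


def tokens(name):
    """Lowercase a name and split it into a set of whitespace-delimited tokens."""
    return set(name.lower().split())


def jaccard(a, b):
    """Token-level Jaccard similarity, used here as an assumed 'soft' match score."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_gene_ids(mentions_by_system, synonyms):
    """Rank gene identifiers for one article.

    mentions_by_system: dict mapping an NER system name to the list of gene
        mentions it extracted from the article (hypothetical input format).
    synonyms: dict mapping a gene identifier to its list of known synonyms.

    Each system contributes, per gene, the best similarity between any of its
    mentions and any synonym of that gene; contributions are summed.
    """
    scores = defaultdict(float)
    for mentions in mentions_by_system.values():
        for gene_id, names in synonyms.items():
            best = max(
                (jaccard(m, n) for m in mentions for n in names),
                default=0.0,
            )
            scores[gene_id] += best
    # Highest score first; ties broken by identifier for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Drop identifiers that no mention matched at all.
    return [(gene_id, score) for gene_id, score in ranked if score > 0]
```

Called with the mentions from two hypothetical NER systems and a small synonym dictionary, `rank_gene_ids` returns `(identifier, score)` pairs, best first. A gene that several systems match well ranks above one matched partially by a single system, which is the intuition behind combining systems rather than relying on one.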

SUBMITTER: Cohen WW 

PROVIDER: S-EPMC1617121 | biostudies-other | 2006 Oct

REPOSITORIES: biostudies-other

Publications

A graph-search framework for associating gene identifiers with documents.

Cohen WW, Minkov E

BMC Bioinformatics, 2006 Oct 10



Similar Datasets

| S-EPMC6964641 | biostudies-literature
| S-EPMC7570338 | biostudies-literature
| S-EPMC5945037 | biostudies-literature
| S-EPMC522014 | biostudies-literature
| S-EPMC7392341 | biostudies-literature
| S-EPMC6093941 | biostudies-literature
| S-EPMC6231400 | biostudies-literature
| S-EPMC5732329 | biostudies-literature
| S-EPMC6996733 | biostudies-literature
| S-EPMC4447347 | biostudies-literature