
Dataset Information


LINNAEUS: a species name identification system for biomedical literature.


ABSTRACT:

Background

The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles.

Results

In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers.
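The dictionary-based matching step can be illustrated with a small sketch. It is not LINNAEUS's own automaton or API. It is a hypothetical Python example that compiles a toy dictionary of species synonyms into a character trie. A trie is a deterministic automaton over the dictionary, because each state has at most one transition per character. The sketch scans ASCII text for the longest case-insensitive match that starts and ends on a word boundary. The function names (`build_trie`, `find_mentions`) and the toy dictionary are assumptions for illustration. A mention that maps to more than one taxonomy identifier is only flagged as ambiguous. The paper's disambiguation heuristics are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # One automaton state: at most one outgoing edge per character,
    # so following the text through the trie is deterministic.
    edges: dict = field(default_factory=dict)
    ids: tuple = ()  # taxonomy IDs if a dictionary synonym ends at this state


def build_trie(dictionary):
    """Compile a hypothetical {synonym: [taxonomy ids]} dict into a trie."""
    root = TrieNode()
    for name, ids in dictionary.items():
        node = root
        for ch in name.lower():
            node = node.edges.setdefault(ch, TrieNode())
        # Merge IDs if the same synonym is listed more than once.
        node.ids = tuple(sorted(set(node.ids) | set(ids)))
    return root


def find_mentions(text, root):
    """Return (start, end, surface, ids) for the longest match at each start.

    Assumes ASCII input, so lowercasing preserves string length.
    A match must begin and end on a word boundary: a non-alphanumeric
    character or the edge of the string.
    """
    lowered = text.lower()
    mentions = []
    i = 0
    while i < len(text):
        # Only try to start a match at the beginning of a word.
        if i > 0 and lowered[i - 1].isalnum():
            i += 1
            continue
        node, j, best = root, i, None
        while j < len(text) and lowered[j] in node.edges:
            node = node.edges[lowered[j]]
            j += 1
            at_boundary = j == len(text) or not lowered[j].isalnum()
            if node.ids and at_boundary:
                best = (i, j, text[i:j], node.ids)  # keep the longest so far
        if best:
            mentions.append(best)
            i = best[1]  # resume scanning after the matched mention
        else:
            i += 1
    return mentions
```

In the usage below, the IDs are real NCBI Taxonomy identifiers: 9606 for *Homo sapiens*, 10090 for *Mus musculus* and 7227 for *Drosophila melanogaster*. The dictionary itself is a toy example.

```python
toy = {"human": [9606], "mouse": [10090], "d. melanogaster": [7227]}
trie = build_trie(toy)
find_mentions("Human and mouse cells; D. melanogaster flies.", trie)
# [(0, 5, 'Human', (9606,)), (10, 15, 'mouse', (10090,)),
#  (23, 38, 'D. melanogaster', (7227,))]
```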

Conclusions

LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at http://linnaeus.sourceforge.net/.

SUBMITTER: Gerner M 

PROVIDER: S-EPMC2836304 | biostudies-literature | 2010 Feb

REPOSITORIES: biostudies-literature


Publications

LINNAEUS: a species name identification system for biomedical literature.

Martin Gerner, Goran Nenadic, Casey M. Bergman

BMC Bioinformatics, 2010-02-11



Similar Datasets

| S-EPMC5023114 | biostudies-literature
2013-12-23 | E-GEOD-53091 | biostudies-arrayexpress
| S-EPMC5181534 | biostudies-literature
2013-12-23 | GSE53091 | GEO
| S-EPMC6642118 | biostudies-literature
| S-EPMC4107897 | biostudies-literature
| S-EPMC10281857 | biostudies-literature
| S-EPMC3984244 | biostudies-literature
| S-EPMC2277400 | biostudies-literature
| S-EPMC3771067 | biostudies-literature