
Dataset Information


GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.


ABSTRACT: Motivation: Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. Results: We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN based structures to BioNER problems. Availability and implementation: The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. Contact: andyli@ece.ufl.edu or aconesa@ufl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
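The abstract describes labelling each word from local n-gram context extracted by convolution over embeddings. The following is a minimal illustrative sketch of that general idea, not code from the GRAM-CNN repository: the function name `ngram_conv_features`, the kernel widths, the ReLU activation, the zero padding and the toy values are all assumptions made for illustration. It slides kernels of several odd widths over a sequence of token embeddings and keeps one feature vector per token, so that each token's features depend only on its neighbours.

```python
def ngram_conv_features(embeddings, filters):
    """Per-token local-context features from multi-width 1D convolutions.

    embeddings: list of T vectors (each a list of D floats), one per token.
    filters:    dict mapping an odd window width n to a list of kernels;
                each kernel is a list of n weight vectors of length D.
                (Hypothetical layout chosen for this sketch.)
    Returns a list of T feature vectors, one per token, whose length is the
    total number of kernels across all widths.
    """
    T = len(embeddings)
    D = len(embeddings[0])
    zero = [0.0] * D
    features = [[] for _ in range(T)]
    for width, kernels in sorted(filters.items()):
        half = width // 2
        for t in range(T):
            # Window of `width` embeddings centred on token t,
            # zero-padded where it runs past the sentence boundary.
            window = [embeddings[i] if 0 <= i < T else zero
                      for i in range(t - half, t + half + 1)]
            for kernel in kernels:
                activation = sum(w * x
                                 for k_vec, e_vec in zip(kernel, window)
                                 for w, x in zip(k_vec, e_vec))
                # ReLU non-linearity (an assumption of this sketch).
                features[t].append(max(0.0, activation))
    return features


# Toy usage: three tokens with 1-dimensional embeddings, one unigram kernel
# and one trigram kernel. Each token gets two features.
toy_embeddings = [[1.0], [2.0], [3.0]]
toy_filters = {1: [[[1.0]]], 3: [[[1.0], [1.0], [1.0]]]}
toy_features = ngram_conv_features(toy_embeddings, toy_filters)
```

In a full tagger these per-token vectors would feed a sequence-labelling layer; that part is omitted here because the abstract does not specify it.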

SUBMITTER: Zhu Q 

PROVIDER: S-EPMC5925775 | biostudies-literature | 2018 May

REPOSITORIES: biostudies-literature


Publications

GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.

Qile Zhu, Xiaolin Li, Ana Conesa, Cécile Pereira

Bioinformatics (Oxford, England), 2018-05-01, issue 9



Similar Datasets

| S-EPMC7014657 | biostudies-literature
| S-EPMC6956779 | biostudies-literature
| S-EPMC7485218 | biostudies-literature
| S-EPMC6247938 | biostudies-literature
| S-EPMC8242017 | biostudies-literature
| S-EPMC6798575 | biostudies-literature
| S-EPMC7959609 | biostudies-literature
| S-EPMC11373323 | biostudies-literature
| S-EPMC6041968 | biostudies-literature
| S-EPMC7872256 | biostudies-literature