
Dataset Information


A corpus of plant-disease relations in the biomedical domain.


ABSTRACT:

Background

Many new medicines have been derived from natural sources such as plants, which have long been used to treat disease. Their benefits and side effects have therefore been studied, and plant-related information, including plant-disease relations, has accumulated in Medline articles. Because Medline contains a large number of articles written in natural language, text mining is important for extracting this information. However, no corpus of plant-disease relations has been available. We therefore aimed to construct one.

Methods and results

In this study, we designed and annotated a plant-disease relations corpus and proposed a computational model that uses the corpus to predict plant-disease relations. We categorized plant-disease relations into four types: treatments of diseases, causes of diseases, associations, and negative relations. To construct the corpus, we first created annotation guidelines and randomly selected 200 Medline abstracts. From these abstracts, we identified 1,405 plant mentions and 1,755 disease mentions, annotated to 105 unique plant identifiers and 237 unique disease identifiers, respectively. Restricting the data to sentences containing at least one plant mention and one disease mention left 878 plant entities and 1,077 disease entities, which yielded a corpus of 1,309 plant-disease relations from 199 abstracts. To verify the effectiveness of the corpus, we proposed a convolutional neural network model with the shortest dependency path (SDP-CNN) and applied it to the constructed corpus. The micro F-score under ten-fold cross-validation was 0.764. We also applied the proposed SDP-CNN model to all Medline abstracts. On 483 randomly selected sentences in which a plant and a disease co-occur, the model achieved a precision of 0.707.
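As an illustration of the micro F-score reported above, the following is a minimal sketch of micro-averaging over relation labels. It is not the authors' evaluation code. The label strings, the function name `micro_f_score`, and the toy gold and predicted lists are assumptions made for this example. The abstract does not say whether all four relation types enter the average, so the set of scored labels is left as a parameter.

```python
# Hypothetical label names for the four relation types named in the abstract.
RELATION_TYPES = ("treatment", "cause", "association", "negative")


def micro_f_score(gold, predicted, labels=RELATION_TYPES):
    """Return (precision, recall, F) micro-averaged over `labels`.

    Micro-averaging pools true positives, false positives, and false
    negatives across all scored labels before computing the ratios.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1  # predicted this label and it was correct
            elif p == label:
                fp += 1  # predicted this label but gold differs
            elif g == label:
                fn += 1  # gold has this label but it was missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not taken from the corpus).
gold = ["treatment", "cause", "association", "negative", "treatment"]
pred = ["treatment", "association", "association", "negative", "cause"]

p_all, r_all, f_all = micro_f_score(gold, pred)
p_pos, r_pos, f_pos = micro_f_score(
    gold, pred, labels=("treatment", "cause", "association")
)
print(f"all labels: P={p_all:.3f} R={r_all:.3f} F={f_all:.3f}")
print(f"without 'negative': P={p_pos:.3f} R={r_pos:.3f} F={f_pos:.3f}")
```

When every instance carries exactly one label and all labels are scored, the micro F-score equals plain accuracy. Excluding a class such as the negative relation from `labels` breaks that equality, which is why the choice of scored labels matters when comparing reported numbers.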

Conclusion

The plant-disease relations corpus is unique and represents an important resource for biomedical text-mining. The corpus of plant and disease relations is available at http://gcancer.org/pdr/.

SUBMITTER: Kim B 

PROVIDER: S-EPMC6713337 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature


Publications

A corpus of plant-disease relations in the biomedical domain.

Baeksoo Kim, Wonjun Choi, Hyunju Lee

PLoS One, 2019 Aug 28, issue 8



Similar Datasets

| S-EPMC4830473 | biostudies-other
| S-EPMC9135735 | biostudies-literature
| S-EPMC4307891 | biostudies-literature
| S-EPMC3128403 | biostudies-literature
| S-EPMC4602280 | biostudies-literature
| S-EPMC9252824 | biostudies-literature
| S-EPMC3833657 | biostudies-literature
| S-EPMC2774701 | biostudies-literature
| S-EPMC10042099 | biostudies-literature
| S-EPMC10281857 | biostudies-literature