Dataset Information

Discovering microbe-disease associations from the literature using a hierarchical long short-term memory network and an ensemble parser model.


ABSTRACT: With recent advances in biotechnology and sequencing technology, the microbial community has been intensively studied and discovered to be associated with many chronic as well as acute diseases. Even though a tremendous number of studies describing the association between microbes and diseases have been published, text mining methods that focus on such associations have rarely been studied. We propose a framework that combines machine learning and natural language processing methods to analyze the association between microbes and diseases. A hierarchical long short-term memory network was used to detect sentences that describe the association. For the sentences so identified, two different parse tree-based search methods were combined to find the relation-describing word. The ensemble model of constituency parsing for structural pattern matching and dependency-based relation extraction improved the prediction accuracy. By combining deep learning and parse tree-based extractions, our proposed framework could extract the microbe-disease association with higher accuracy. The evaluation results showed that our system achieved F-scores of 0.8764 for binary decisions and 0.8524 for relation-word extraction. As a case study, we performed a large-scale analysis of the association between microbes and diseases. Additionally, a set of common microbes shared by multiple diseases was also identified in this study. This study could provide valuable information about the major microbes that were studied for a specific disease. The code and data are available at https://github.com/DMnBI/mdi_predictor.

SUBMITTER: Park Y 

PROVIDER: S-EPMC7904816 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature

Publications

Discovering microbe-disease associations from the literature using a hierarchical long short-term memory network and an ensemble parser model.

Park Yesol, Lee Joohong, Moon Heesang, Choi Yong Suk, Rho Mina

Scientific reports 20210224 1


Similar Datasets

| S-EPMC1800308 | biostudies-literature
| S-EPMC8571724 | biostudies-literature
| S-EPMC10873905 | biostudies-literature
| S-EPMC3121722 | biostudies-literature
| S-EPMC1386711 | biostudies-literature
| S-EPMC5378308 | biostudies-literature
| S-EPMC6647533 | biostudies-literature
| S-EPMC6679344 | biostudies-literature
| S-EPMC6061985 | biostudies-literature
| S-EPMC4213352 | biostudies-literature