Dataset Information

DTMiner: identification of potential disease targets through biomedical literature mining.


ABSTRACT:

Motivation

Biomedical researchers often search through massive catalogues of literature for potential relationships between genes and diseases. Given the rapid growth of biomedical literature, automatic relation extraction, a crucial technology in biomedical literature mining, has shown great potential to support research on gene-related diseases. Existing work in this field has produced datasets that are limited in both scale and accuracy.

Results

In this study, we propose a reliable and efficient framework that takes large biomedical literature repositories as input, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. The framework incorporates named entity recognition (NER), which identifies occurrences of genes and diseases in texts; association detection, in which we extract and evaluate features from gene-disease pairs; and ranking algorithms that estimate how closely the pairs are related. The F1-score of the NER phase is 0.87, which is higher than that of existing studies. The association detection phase takes drastically less time than previous work while maintaining a comparable F1-score of 0.86. The end-to-end result achieves an F1-score of 0.259 for the top 50 genes associated with a disease, which is better than previous work. In addition, we released a web service for public use of the dataset.
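To make the end-to-end metric concrete, the following is a minimal sketch of how an F1-score over the top-k ranked genes for one disease could be computed. It is not the authors' evaluation code: the function name `f1_at_k`, the placeholder gene identifiers, and the choice to divide precision by the number of genes actually returned are all assumptions made for illustration.

```python
def f1_at_k(ranked_genes, gold_genes, k=50):
    """F1-score of the top-k ranked genes against a gold-standard gene set.

    Hypothetical helper: the abstract does not specify how precision and
    recall are defined; here precision is over the distinct genes returned
    in the top k, and recall is over the whole gold set.
    """
    top_k = set(ranked_genes[:k])
    gold = set(gold_genes)
    true_pos = len(top_k & gold)
    if true_pos == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_pos / len(top_k)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy input with placeholder identifiers (not data from the study).
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gold = {"GENE_A", "GENE_D", "GENE_E"}
score = f1_at_k(ranked, gold, k=2)
```

In this toy case one of the two returned genes is in the three-gene gold set, so precision is 1/2 and recall is 1/3. A score for a whole benchmark would typically aggregate such per-disease values, but the aggregation used in the study is not stated here.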

Availability and implementation

The implementation of the proposed algorithms is publicly available at http://gdr-web.rwebox.com/public_html/index.php?page=download.php

The web service is available at http://gdr-web.rwebox.com/public_html/index.php

CONTACT: jenny.wei@astrazeneca.com or kzhu@cs.sjtu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

SUBMITTER: Xu D 

PROVIDER: S-EPMC5181534 | biostudies-literature | 2016 Dec

REPOSITORIES: biostudies-literature

Publications

DTMiner: identification of potential disease targets through biomedical literature mining.

Dong Xu, Meizhuo Zhang, Yanping Xie, Fan Wang, Ming Chen, Kenny Q Zhu, Jia Wei

Bioinformatics (Oxford, England), 2016-08-09, issue 23


Similar Datasets

| S-EPMC3541249 | biostudies-literature
| S-EPMC6226017 | biostudies-literature
| S-EPMC5657248 | biostudies-literature
| S-EPMC3629260 | biostudies-literature
| S-EPMC7394276 | biostudies-literature
| S-EPMC6236289 | biostudies-literature
| S-EPMC1885641 | biostudies-literature
2013-12-23 | E-GEOD-53091 | biostudies-arrayexpress