
Dataset Information

Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature.


ABSTRACT:

Objective

Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine.

Materials and methods

We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora.
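The abstract names three feature families (statistical, distance, and sentiment) without defining them. As an illustration only, the sketch below computes one toy feature of each kind for a mutation-disease pair in a single abstract. The function names, the negation cue list, and the specific features are assumptions made for this example and are not taken from the paper.

```python
import re

# Hypothetical negation cues standing in for a sentiment lexicon (assumption).
NEGATIVE_CUES = {"no", "not", "without", "absent", "lack"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase_tokens):
    """Return every start index where phrase_tokens occurs in tokens."""
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]


def pair_features(abstract, mutation, disease):
    """Compute toy features for one (mutation, disease) pair in one abstract."""
    mut = tokenize(mutation)
    dis = tokenize(disease)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]

    cooccurring = 0  # statistical: sentences mentioning both entities
    negated = 0      # sentiment-like: mutation sentences containing a negation cue
    for sentence in sentences:
        toks = tokenize(sentence)
        if find_phrase(toks, mut):
            if find_phrase(toks, dis):
                cooccurring += 1
            if NEGATIVE_CUES & set(toks):
                negated += 1

    # distance: smallest token gap between any mutation and disease mention
    all_toks = tokenize(abstract)
    distances = [
        abs(m - d)
        for m in find_phrase(all_toks, mut)
        for d in find_phrase(all_toks, dis)
    ]
    return {
        "cooccurrence_sentences": cooccurring,
        "min_token_distance": min(distances) if distances else None,
        "negated_mutation_sentences": negated,
    }


# Invented toy text; "A123T" is a made-up mutation string, not a real finding.
toy = ("The A123T mutation was found in prostate cancer patients. "
       "A123T was not detected in controls.")
features = pair_features(toy, "A123T", "prostate cancer")
```

Feature dictionaries of this shape could be vectorized and passed to any supervised classifier trained on curated mutation-disease labels. The abstract does not say which classifier the authors used.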

Results

The performance of the proposed approach was assessed using a benchmarking dataset. The results show that the approach significantly improves on the previous state of the art, obtaining F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively.
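For readers unfamiliar with the metric, the F-measure combines precision and recall into one score. The sketch below assumes the balanced form (F1, with beta = 1), which the abstract does not state explicitly. The counts in the usage line are invented for illustration and are not results from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score computed from raw extraction counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
score = f_measure(8, 2, 2)
```

With beta = 1 the score is the harmonic mean of precision and recall, so a system cannot score well by maximizing only one of them.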

Discussion

To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature.

Conclusions

The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale.

SUBMITTER: Singhal A 

PROVIDER: S-EPMC4926749 | biostudies-literature | 2016 Jul

REPOSITORIES: biostudies-literature

Publications

Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature.

Ayush Singhal, Michael Simmons, Zhiyong Lu

Journal of the American Medical Informatics Association (JAMIA), 2016-04-27, issue 4



Similar Datasets

S-EPMC5130168 | biostudies-literature
S-EPMC4457984 | biostudies-literature
S-EPMC7394276 | biostudies-literature
S-EPMC4830514 | biostudies-literature
S-EPMC6692532 | biostudies-literature
S-EPMC7746840 | biostudies-literature
S-EPMC5845379 | biostudies-literature
S-EPMC5042555 | biostudies-literature
S-EPMC3541249 | biostudies-literature
S-EPMC3395893 | biostudies-other