Dataset Information

DiMeX: A Text Mining System for Mutation-Disease Association Extraction.


ABSTRACT: The number of published articles describing associations between mutations and diseases is growing rapidly. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation limits how fast these databases can grow. We have addressed this problem by developing a text-mining system (DiMeX) that extracts mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. When evaluated on three different datasets for mutation-disease associations, DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89. DiMeX includes a separate component that extracts mutation mentions from text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms existing mutation-disease association tools and addresses the low precision that affects most approaches. DiMeX was applied to a large set of Medline abstracts to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
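For readers unfamiliar with the metric, the F-scores quoted in the abstract combine precision and recall into a single number. The abstract does not say which F variant was used, so the sketch below assumes the common balanced F1 (the harmonic mean of precision and recall). The function names and the counts in the usage lines are hypothetical and are not taken from the DiMeX evaluation.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from raw counts.

    tp: extracted associations that are correct (true positives)
    fp: extracted associations that are wrong (false positives)
    fn: correct associations the system missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1.0 gives the balanced F1 (an assumption here)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r = precision_recall(tp=88, fp=12, fn=12)
print(f"precision={p:.2f} recall={r:.2f} F1={f_score(p, r):.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a high F-score requires both precision and recall to be high.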

SUBMITTER: Mahmood AS 

PROVIDER: S-EPMC4830514 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

A S M Ashique Mahmood, Tsung-Jung Wu, Raja Mazumder, K Vijay-Shanker

PLoS ONE, 13 April 2016, issue 4



Similar Datasets

| S-EPMC4583433 | biostudies-literature
| S-EPMC2901371 | biostudies-literature
| S-EPMC3939821 | biostudies-literature
| S-EPMC5870606 | biostudies-literature
| S-EPMC441622 | biostudies-literature
| S-EPMC3475109 | biostudies-literature
| S-EPMC6007211 | biostudies-literature