Dataset Information

Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.


ABSTRACT:

Motivation

A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most disease-related mutational data are currently buried in the biomedical literature in textual form and lack the structure needed for easy retrieval and visualization. We introduce a high-throughput computational method for identifying relevant disease mutations in PubMed abstracts, and apply it to prostate cancer (PCa) and breast cancer (BCa) mutations.

Results

We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool that extracts point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance in extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases.
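To make the extraction step concrete, here is a minimal Python sketch of pattern-based point-mutation extraction followed by a sequence check. It is not EMU's or MutationFinder's actual implementation: the two regular expressions, the function names `extract_point_mutations` and `matches_sequence`, and the toy protein sequence are all assumptions chosen for illustration. The abstract does not describe how the sequence filter works; the check below (keep a mutation only if the protein has the stated wild-type residue at the stated position) is one plausible form of such a filter.

```python
import re

# Three-letter to one-letter amino acid codes (the 20 standard residues).
AA3 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(AA3)

# Illustrative patterns only (assumed, not taken from either tool):
# one-letter form such as "R726L" and three-letter form such as "Gly84Glu".
ONE_LETTER = re.compile(rf"\b([{ONE}])([1-9]\d*)([{ONE}])\b")
THREE_LETTER = re.compile(rf"\b({THREE})([1-9]\d*)({THREE})\b")


def extract_point_mutations(text):
    """Return (wild_type, position, mutant) tuples in order of appearance."""
    found = []
    for m in ONE_LETTER.finditer(text):
        found.append((m.start(), m.group(1), int(m.group(2)), m.group(3)))
    for m in THREE_LETTER.finditer(text):
        found.append(
            (m.start(), AA3[m.group(1)], int(m.group(2)), AA3[m.group(3)])
        )
    found.sort()
    return [(wt, pos, mut) for _, wt, pos, mut in found]


def matches_sequence(mutation, sequence):
    """Hypothetical filter: is the wild-type residue at that 1-based position?"""
    wild_type, position, _ = mutation
    return position <= len(sequence) and sequence[position - 1] == wild_type


if __name__ == "__main__":
    sentence = "The R726L substitution and Gly84Glu were observed in T47D cells."
    print(extract_point_mutations(sentence))
    toy_sequence = "MKTAYIAKQR"  # made-up sequence, for illustration only
    print(matches_sequence(("K", 2, "E"), toy_sequence))
```

Note that the one-letter pattern also matches the cell-line name T47D in the example sentence. False positives of this kind are one reason a downstream filter can raise precision.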

Discussion

Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles.

Availability

Freely available at: http://bioinf.umbc.edu/EMU/ftp.

SUBMITTER: Doughty E 

PROVIDER: S-EPMC3031038 | biostudies-literature | 2011 Feb

REPOSITORIES: biostudies-literature

Publications

Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.

Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG

Bioinformatics (Oxford, England), 2010-12-07, issue 3


Similar Datasets

| S-EPMC2923139 | biostudies-literature
| S-EPMC5338769 | biostudies-literature
| S-EPMC5852055 | biostudies-literature
| S-EPMC2818245 | biostudies-literature
| S-EPMC3681788 | biostudies-literature
| S-EPMC8449627 | biostudies-literature
| S-EPMC8138883 | biostudies-literature
| S-EPMC5156472 | biostudies-literature
| S-EPMC5588695 | biostudies-literature
2013-12-23 | E-GEOD-53091 | biostudies-arrayexpress