
Dataset Information


Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.


ABSTRACT:

Objective

We describe experiments designed to determine the feasibility of distinguishing known from novel associations in a clinical dataset composed of International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients, by comparing them with associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations that appear in the clinical dataset but not in Medline citations are potentially novel.

Methods

Pairwise associations of ICD-9 codes were identified independently in the clinical and Medline datasets, and the two sets of associations were then compared to quantify their overlap. We also manually reviewed a subset of the associations to assess how well MetaMap identified the diagnoses mentioned in the Medline citations on which the Medline associations were based.
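The comparison described above can be illustrated with a minimal sketch. It assumes each patient record or citation has already been reduced to a list of ICD-9 codes, and it uses raw co-occurrence as a stand-in for an association, since the abstract does not state which association measure was used. The function names, the `min_count` threshold, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pairwise_associations(records, min_count=1):
    """Return the unordered code pairs that co-occur in at least min_count records.

    Raw co-occurrence is an assumption made for illustration; the study's
    actual association measure is not given in the abstract.
    """
    counts = Counter()
    for codes in records:
        # Deduplicate and sort so each unordered pair has one canonical form.
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def overlap_fraction(clinical_pairs, medline_pairs):
    """Fraction of clinical pairs that are also present among the Medline pairs."""
    if not clinical_pairs:
        return 0.0
    return len(clinical_pairs & medline_pairs) / len(clinical_pairs)


# Toy data (hypothetical): one list of ICD-9 codes per patient or citation.
clinical_records = [
    ["250.00", "401.9", "272.4"],
    ["250.00", "401.9"],
    ["493.90", "477.9"],
]
medline_records = [
    ["250.00", "401.9"],
    ["250.00", "585.9"],
]

clinical_pairs = pairwise_associations(clinical_records)
medline_pairs = pairwise_associations(medline_records)

# Pairs seen only in the clinical data are the candidate novel associations.
candidate_novel = clinical_pairs - medline_pairs
overlap = overlap_fraction(clinical_pairs, medline_pairs)
```

In this toy example one of the four clinical pairs also appears in the Medline pairs, so `overlap` is 0.25 and three pairs remain as candidates. On real data the same set difference would be taken over millions of pairs, and some statistical filter would probably be applied before treating a pair as an association.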

Results

The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations.

Discussion

Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations.

Conclusions

In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility.

SUBMITTER: Hanauer DA 

PROVIDER: S-EPMC4147617 | biostudies-literature | 2014 Sep-Oct

REPOSITORIES: biostudies-literature


Publications

Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

Hanauer DA, Saeed M, Zheng K, Mei Q, Shedden K, Aronson AR, Ramakrishnan N

Journal of the American Medical Informatics Association: JAMIA, 2014-06-13, issue 5



Similar Datasets

| S-EPMC1380197 | biostudies-literature
2017-05-11 | GSE98739 | GEO
| S-EPMC3465642 | biostudies-literature
| S-EPMC4339517 | biostudies-literature
| S-EPMC6642195 | biostudies-literature
2017-05-11 | GSE98738 | GEO
| S-EPMC3325791 | biostudies-other
| S-EPMC7756745 | biostudies-literature
| S-EPMC4091448 | biostudies-literature
| S-EPMC4120011 | biostudies-literature