Dataset Information

Automatic lymphoma classification with sentence subgraph mining from pathology reports.

ABSTRACT:

Objective

Pathology reports are rich in narrative statements that encode a complex web of relations among medical concepts. Doctors routinely use these relations to reason about diagnoses, but extracting them into prespecified forms for computational disease modeling often requires hand-crafted rules or supervised learning. We aim to automatically capture relations from narrative text without supervision.

Methods

We design a novel framework that translates sentences into graph representations, automatically mines sentence subgraphs, reduces redundancy in mined subgraphs, and automatically generates subgraph features for subsequent classification tasks. To ensure meaningful interpretations over the sentence graphs, we use the Unified Medical Language System Metathesaurus to map token subsequences to concepts, which in turn become sentence graph nodes. We test our system with multiple lymphoma classification tasks that together mimic a pathologist's differential diagnosis. To this end, we prevent our classifiers from looking at explicit mentions or synonyms of lymphomas in the text.
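The four steps named above (sentence graphs, subgraph mining, redundancy reduction, feature generation) can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes each sentence graph is already given as a set of labelled edges, skips the UMLS concept mapping entirely, enumerates connected subgraphs by brute force, and uses a simple "drop a subgraph if a larger one has the same support" rule as a stand-in for redundancy reduction. The relation labels, the example sentences, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: a sentence graph is a set of (head, relation, dependent) edges.
# The labels below are invented for illustration only.
g1 = {("cells", "mod", "large"), ("cells", "express", "CD20")}
g2 = {("cells", "mod", "large"), ("cells", "express", "CD20"),
      ("cells", "express", "CD10")}
g3 = {("cells", "express", "CD10")}


def connected(edges):
    """Return True if the given edges form a single connected component."""
    edges = list(edges)
    seen = set(edges[0][::2])          # head and dependent of the first edge
    remaining = edges[1:]
    changed = True
    while changed and remaining:
        changed = False
        for e in list(remaining):
            if e[0] in seen or e[2] in seen:
                seen.update((e[0], e[2]))
                remaining.remove(e)
                changed = True
    return not remaining


def subgraphs(graph, max_edges=2):
    """Yield every connected subgraph with at most max_edges edges."""
    for k in range(1, max_edges + 1):
        for combo in combinations(sorted(graph), k):
            if connected(combo):
                yield frozenset(combo)


def mine(graphs, min_support=2, max_edges=2):
    """Keep frequent subgraphs, dropping any whose strict supergraph
    is equally frequent (a simple redundancy filter)."""
    support = Counter()
    for g in graphs:
        support.update(set(subgraphs(g, max_edges)))
    frequent = {s: c for s, c in support.items() if c >= min_support}
    kept = [s for s in frequent
            if not any(s < t and frequent[t] == frequent[s] for t in frequent)]
    return sorted(kept, key=lambda s: sorted(s))


def featurize(graph, patterns):
    """Binary feature vector: 1 if the pattern occurs in the graph."""
    return [1 if p <= graph else 0 for p in patterns]


patterns = mine([g1, g2, g3])
features = [featurize(g, patterns) for g in (g1, g2, g3)]
```

The resulting binary vectors could then be passed to any standard classifier. A real system would need a proper frequent subgraph miner, since brute-force enumeration does not scale beyond very small graphs.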

Results and conclusions

We compare our system with three baseline classifiers using standard n-grams, full MetaMap concepts, and filtered MetaMap concepts. Our system achieves high F-measures on multiple binary classifications of lymphoma (Burkitt lymphoma, 0.8; diffuse large B-cell lymphoma, 0.909; follicular lymphoma, 0.84; Hodgkin lymphoma, 0.912). Significance tests show that our system outperforms all three baselines. Moreover, feature analysis identifies subgraph features that contribute to improved performance; these features agree with the state-of-the-art knowledge about lymphoma classification. We also highlight how these unsupervised relation features may provide meaningful insights into lymphoma classification.
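For readers unfamiliar with the metric, the sketch below shows how an F-measure is commonly computed for one binary task, assuming the balanced F1 form (the harmonic mean of precision and recall). The abstract does not state which variant was used, and the counts in the example are hypothetical, not taken from the study.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positives, false positives,
    and false negatives for a single binary classification task."""
    if tp == 0:
        return 0.0                      # no correct positives: F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example_f1 = f_measure(tp=8, fp=2, fn=2)
```

With these made-up counts, precision and recall are both 0.8, so the F-measure is also approximately 0.8.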

SUBMITTER: Luo Y

PROVIDER: S-EPMC4147603 | biostudies-literature | 2014 Sep-Oct

REPOSITORIES: biostudies-literature

Publications

Automatic lymphoma classification with sentence subgraph mining from pathology reports.

Luo Y, Sohani AR, Hochberg EP, Szolovits P

Journal of the American Medical Informatics Association : JAMIA, 2014-01-15, issue 5

PMID: 24431333

Similar Datasets

Grasping frequent subgraph mining for bioinformatics applications. | S-EPMC6122726 | biostudies-literature

Mining subgraph coverage patterns from graph transactions. | S-EPMC8636072 | biostudies-literature

A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports. | S-EPMC10036244 | biostudies-literature

Depression Classification Using Frequent Subgraph Mining Based on Pattern Growth of Frequent Edge in Functional Magnetic Resonance Imaging Uncertain Network. | S-EPMC9106560 | biostudies-literature

Symbolic rule-based classification of lung cancer stages from free-text pathology reports. | S-EPMC2995652 | biostudies-literature

Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology. | S-EPMC8664934 | biostudies-literature

Detection of Complexes in Biological Networks Through Diversified Dense Subgraph Mining. | S-EPMC5610454 | biostudies-literature

Significant subgraph mining for neural network inference with multiple comparisons correction. | S-EPMC10312259 | biostudies-literature

Approximate subgraph matching-based literature mining for biomedical events and relations. | S-EPMC3629260 | biostudies-literature

Analysis of Hormone Receptor Status in Primary and Recurrent Breast Cancer Via Data Mining Pathology Reports. | S-EPMC6401490 | biostudies-literature