Dataset Information

Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research.

ABSTRACT:

Objective

To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.

Methods

Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
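The comparison against the gold standard can be illustrated with a minimal Python sketch for a single yes/no variable such as the presence of carcinoma in situ. The function name `binary_metrics` and the toy labels are hypothetical, not the authors' code or data. The sketch only applies the standard definitions: accuracy = agreements / all reports, positive predictive value (PPV) = TP / (TP + FP), and sensitivity = TP / (TP + FN).

```python
from collections.abc import Sequence


def binary_metrics(nlp: Sequence[bool], gold: Sequence[bool]) -> dict[str, float]:
    """Compare NLP output with the gold standard for one yes/no variable.

    Hypothetical helper for illustration; not the published NLP engine.
    """
    if len(nlp) != len(gold):
        raise ValueError("nlp and gold must have the same length")
    pairs = list(zip(nlp, gold))
    tp = sum(1 for p, g in pairs if p and g)        # NLP yes, gold standard yes
    fp = sum(1 for p, g in pairs if p and not g)    # NLP yes, gold standard no
    fn = sum(1 for p, g in pairs if not p and g)    # NLP no, gold standard yes
    agree = sum(1 for p, g in pairs if p == g)      # any agreement counts toward accuracy
    return {
        "accuracy": agree / len(pairs),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Toy example with five hypothetical reports (True = carcinoma in situ present).
m = binary_metrics([True, True, False, False, True], [True, False, False, True, True])
print(m)
```

Here 3 of 5 reports agree (accuracy 0.6), and 2 true positives against 1 false positive and 1 false negative give a PPV and sensitivity of 2/3 each.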

Results

When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.
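The reason a multi-category variable such as depth of invasion can have a modest overall accuracy alongside acceptable PPVs for individual categories can be sketched in Python. The category names, the function `multiclass_metrics`, and the toy data are hypothetical and are not taken from the study. The point is that accuracy pools every report, whereas the PPV for a category counts only the reports that NLP assigned to that category. Errors concentrated in other categories therefore lower accuracy without lowering that category's PPV.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def multiclass_metrics(
    nlp: Sequence[Hashable], gold: Sequence[Hashable]
) -> tuple[float, dict[Hashable, float]]:
    """Return overall accuracy and per-category PPV (hypothetical helper)."""
    if len(nlp) != len(gold):
        raise ValueError("nlp and gold must have the same length")
    accuracy = sum(p == g for p, g in zip(nlp, gold)) / len(gold)
    predicted = Counter(nlp)                                  # reports NLP put in each category
    hits = Counter(p for p, g in zip(nlp, gold) if p == g)    # of those, how many were correct
    ppv = {label: hits[label] / n for label, n in predicted.items()}
    return accuracy, ppv


# Toy data: NLP misses two invasive tumors by labeling them "none".
gold = ["lamina propria", "lamina propria", "muscularis propria", "muscularis propria",
        "none", "lamina propria", "muscularis propria", "none"]
nlp = ["lamina propria", "lamina propria", "muscularis propria", "muscularis propria",
       "none", "none", "none", "none"]
acc, ppv = multiclass_metrics(nlp, gold)
print(acc, ppv)
```

In this toy case accuracy is 6/8 = 0.75, yet every report that NLP labeled "lamina propria" or "muscularis propria" is correct (PPV 1.0 for each); all of the errors fall into the "none" category (PPV 0.5).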

Conclusion

NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.

SUBMITTER: Schroeck FR

PROVIDER: S-EPMC5696035 | biostudies-literature | 2017 Dec

REPOSITORIES: biostudies-literature

Publications

Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research.

Florian R. Schroeck, Olga V. Patterson, Patrick R. Alba, Erik A. Pattison, John D. Seigne, Scott L. DuVall, Douglas J. Robertson, Brenda Sirovich, Philip P. Goodney

Urology, 2017 Sep 12

PMID: 28916254

Similar Datasets

Automated Data Harmonization in Clinical Research: Natural Language Processing Approach
| S-EPMC12391522 | biostudies-literature

UMLS-based data augmentation for natural language processing of clinical research literature.
| S-EPMC7973470 | biostudies-literature

Natural language processing in Alzheimer's disease research: Systematic review of methods, data, and efficacy.
| S-EPMC11812127 | biostudies-literature

An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports.
| S-EPMC9808011 | biostudies-literature

DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction.
| S-EPMC10187451 | biostudies-literature

Development of a Natural Language Processing Model for Extracting Kidney Biopsy Pathology Diagnoses
| S-EPMC12311501 | biostudies-literature

A natural language processing-driven map of the aging research landscape.
| S-EPMC12705180 | biostudies-literature

Transforming epilepsy research: A systematic review on natural language processing applications.
| S-EPMC10108221 | biostudies-literature

Using Open Geographic Data to Generate Natural Language Descriptions for Hydrological Sensor Networks.
| S-EPMC4541865 | biostudies-literature

Unsupervised learning and natural language processing highlight research trends in a superbug.
| S-EPMC10991725 | biostudies-literature