Dataset Information

Performance of a Natural Language Processing Method to Extract Stone Composition From the Electronic Health Record.

ABSTRACT:

Objectives

To demonstrate the utility of a natural language processing (NLP) algorithm for mining kidney stone composition in a large-scale electronic health records (EHR) repository.

Methods

We developed StoneX, a pattern-matching method for extracting kidney stone composition information from clinical notes. We trained the extraction algorithm on manually annotated text mentions of calcium oxalate monohydrate, calcium oxalate dihydrate, hydroxyapatite, brushite, uric acid, and struvite stones. We then applied StoneX to >125 million notes from our institutional EHR to identify patients with kidney stone composition data. Analyses performed on the extracted patients included stone type conversions over time, survival analysis from a second stone surgery, and disease associations by stone composition to validate the phenotyping method against known associations.
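As a rough illustration of what pattern-matching extraction of stone composition can look like, the following minimal sketch scans a note for the six stone types named above. It is not the StoneX implementation: the abstract does not describe StoneX's actual rules, and the names `STONE_TYPES`, `PATTERNS`, and `extract_stone_mentions`, as well as the sample note, are hypothetical.

```python
import re

# The six stone types listed in the abstract. The matching rules below are
# an assumption for illustration; the real StoneX patterns are not given.
STONE_TYPES = [
    "calcium oxalate monohydrate",
    "calcium oxalate dihydrate",
    "hydroxyapatite",
    "brushite",
    "uric acid",
    "struvite",
]

# One case-insensitive pattern per stone type, tolerating variable
# whitespace between the words of a multi-word name.
PATTERNS = {
    name: re.compile(
        r"\b" + r"\s+".join(re.escape(word) for word in name.split()) + r"\b",
        re.IGNORECASE,
    )
    for name in STONE_TYPES
}


def extract_stone_mentions(note):
    """Return (character offset, stone type) pairs, ordered by offset."""
    mentions = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            mentions.append((match.start(), name))
    return sorted(mentions)


# Hypothetical note text, not taken from the study data.
note = "Stone analysis: 80% Calcium  oxalate monohydrate, 20% uric acid."
for offset, stone_type in extract_stone_mentions(note):
    print(offset, stone_type)
```

A production system would likely also need to handle abbreviations, negation, and percentages, none of which this sketch attempts.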

Results

The NLP algorithm identified 45,235 text mentions corresponding to 11,585 patients. Overall, the system achieved positive predictive value >90% for calcium oxalate monohydrate, calcium oxalate dihydrate, hydroxyapatite, brushite, and struvite; the exception was uric acid (positive predictive value = 87.5%). Survival analysis from a second stone surgery showed statistically significant differences among stone types (P = .03). Several phenotype associations were found: uric acid-type 2 diabetes (odds ratio, OR = 2.69, 95% confidence interval, CI = 1.91-3.79), struvite-neurogenic bladder (OR = 12.27, 95% CI = 4.33-34.79), struvite-urinary tract infection (OR = 7.36, 95% CI = 3.01-17.99), hydroxyapatite-pulmonary collapse (OR = 3.67, 95% CI = 2.10-6.42), hydroxyapatite-neurogenic bladder (OR = 5.23, 95% CI = 2.05-13.36), brushite-calcium metabolism disorder (OR = 4.59, 95% CI = 2.14-9.81), and brushite-hypercalcemia (OR = 4.09, 95% CI = 1.90-8.80).
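For readers unfamiliar with the metrics reported above, the sketch below shows the standard definitions of positive predictive value and of an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. The abstract does not state how the odds ratios were estimated, so the Wald method is an assumption here, and all counts are illustrative rather than taken from the study.

```python
import math


def positive_predictive_value(true_positives, false_positives):
    """PPV (precision): fraction of extracted mentions that are correct."""
    return true_positives / (true_positives + false_positives)


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: patients with the stone type and the phenotype
    b: patients with the stone type, without the phenotype
    c: patients without the stone type, with the phenotype
    d: patients without the stone type, without the phenotype
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only; these are not data from the study.
ppv = positive_predictive_value(true_positives=35, false_positives=5)
odds_ratio, lower, upper = odds_ratio_wald_ci(a=20, b=10, c=10, d=20)
print(f"PPV = {ppv:.3f}")
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as in every association listed above, indicates a statistically significant association at roughly the 5% level.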

Conclusion

NLP extraction of kidney stone composition from large-scale EHRs is feasible with high precision, enabling high-throughput epidemiological studies of kidney stone disease. These tools will enable high-fidelity kidney stone research from the EHR.

SUBMITTER: Bejan CA

PROVIDER: S-EPMC6778032 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Performance of a Natural Language Processing Method to Extract Stone Composition From the Electronic Health Record.

Bejan CA, Lee DJ, Xu Y, Hsi RS

Urology, 2019 Jul 13

PMID: 31310771

Similar Datasets

A Multi-Institutional Natural Language Processing Pipeline to Extract Performance Status From Electronic Health Records. | S-EPMC11369884 | biostudies-literature

Development of a natural language processing algorithm to extract seizure types and frequencies from the electronic health record. | S-EPMC9547963 | biostudies-literature

Development and Validation of a Natural Language Processing Algorithm to Extract Descriptors of Microbial Keratitis From the Electronic Health Record. | S-EPMC8578049 | biostudies-literature

Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives. | S-EPMC12493229 | biostudies-literature

Cerebrovascular disease case identification in inpatient electronic medical record data using natural language processing. | S-EPMC10474977 | biostudies-literature

Prediction of severe chest injury using natural language processing from the electronic health record. | S-EPMC7856032 | biostudies-literature

Prediction of intra-abdominal injury using natural language processing of electronic medical record data. | S-EPMC11330356 | biostudies-literature

Leveraging natural language processing to identify eligible lung cancer screening patients with the electronic health record. | S-EPMC11537206 | biostudies-literature

Using natural language processing to extract mammographic findings. | S-EPMC4408241 | biostudies-literature

Identifying Goals of Care Conversations in the Electronic Health Record Using Natural Language Processing and Machine Learning. | S-EPMC7769906 | biostudies-literature