Dataset Information

Assessment of Deep Natural Language Processing in Ascertaining Oncologic Outcomes From Radiology Reports.

ABSTRACT:

Importance

A rapid learning health care system for oncology will require scalable methods for extracting clinical end points from electronic health records (EHRs). Outside of clinical trials, end points such as cancer progression and response are not routinely encoded into structured data.

Objective

To determine whether deep natural language processing can extract relevant cancer outcomes from radiologic reports, a ubiquitous but unstructured EHR data source.

Design, setting, and participants

A retrospective cohort study evaluated 1112 patients who underwent tumor genotyping for a diagnosis of lung cancer and participated in the Dana-Farber Cancer Institute PROFILE study from June 26, 2013, to July 2, 2018.

Exposures

Patients were divided into curation and reserve sets. Human abstractors applied a structured framework to radiologic reports for the curation set to ascertain the presence of cancer and changes in cancer status over time (ie, worsening/progressing vs improving/responding). Deep learning models were then trained to capture these outcomes from report text and subsequently evaluated in a 10% held-out test subset of curation patients. Cox proportional hazards regression models compared human and machine curations of disease-free survival, progression-free survival, and time to improvement/response in the curation set, and measured associations between report classification and overall survival in the curation and reserve sets.
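The held-out evaluation described above is defined over patients, not individual reports. A minimal sketch of such a patient-level split is shown here; the function name `split_by_patient`, the `(patient_id, text)` tuple layout, and the toy data are all illustrative assumptions and are not taken from the study's code.

```python
import random


def split_by_patient(reports, test_fraction=0.10, seed=0):
    """Split (patient_id, text) tuples so that no patient spans both sets.

    Hypothetical helper: the record only states that a 10% held-out
    subset of curation patients was used for testing.
    """
    patient_ids = sorted({patient_id for patient_id, _ in reports})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Hold out a fraction of patients, not a fraction of reports.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for r in reports if r[0] not in test_ids]
    test = [r for r in reports if r[0] in test_ids]
    return train, test


# Toy data: 20 made-up patients with 5 reports each.
reports = [(f"patient-{i % 20}", f"report {i}") for i in range(100)]
train_reports, test_reports = split_by_patient(reports)
```

Splitting on patient identifiers keeps every report from a held-out patient out of training, which a report-level shuffle would not guarantee.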

Main outcomes and measures

The primary outcome was area under the receiver operating characteristic curve (AUC) for deep learning models; secondary outcomes were time to improvement/response, disease-free survival, progression-free survival, and overall survival.
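For readers unfamiliar with the primary outcome, the AUC can be read as the probability that a randomly chosen positive report receives a higher model score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's implementation or data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: one positive report is scored below one negative report.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is adequate for illustration; a rank-based formulation would be preferable for large report sets.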

Results

A total of 2406 patients were included (mean [SD] age, 66.5 [10.8] years; 1428 female [59.7%]; 2170 [90.2%] white). Radiologic reports (n = 14 230) were manually reviewed for 1112 patients in the curation set. In the test subset (n = 109), deep learning models identified the presence of cancer, improvement/response, and worsening/progression with accurate discrimination (AUC >0.90). Machine and human curation yielded similar measurements of disease-free survival (hazard ratio [HR] for machine vs human curation, 1.18; 95% CI, 0.71-1.95); progression-free survival (HR, 1.11; 95% CI, 0.71-1.71); and time to improvement/response (HR, 1.03; 95% CI, 0.65-1.64). Among 15 000 additional reports for 1294 reserve set patients, algorithm-detected cancer worsening/progression was associated with decreased overall survival (HR for mortality, 4.04; 95% CI, 2.78-5.85), and improvement/response was associated with increased overall survival (HR, 0.41; 95% CI, 0.22-0.77).
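The hazard ratios above can be sanity-checked against their confidence intervals. The sketch below assumes the intervals are Wald-type, that is, symmetric around the estimate on the log scale, which the abstract does not state explicitly; the helper name `log_hr_summary` and the dictionary keys are illustrative.

```python
import math


def log_hr_summary(lower, upper, z=1.96):
    """Summarise a 95% CI for a hazard ratio on the log scale.

    Assumes a Wald-type interval, so the point estimate should sit near
    the geometric midpoint of the two limits.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    midpoint_hr = math.exp((log_lower + log_upper) / 2)
    standard_error = (log_upper - log_lower) / (2 * z)
    excludes_one = lower > 1.0 or upper < 1.0
    return midpoint_hr, standard_error, excludes_one


# (reported HR, CI lower limit, CI upper limit) as given in the Results.
reported = {
    "disease-free survival, machine vs human": (1.18, 0.71, 1.95),
    "progression-free survival, machine vs human": (1.11, 0.71, 1.71),
    "time to improvement/response, machine vs human": (1.03, 0.65, 1.64),
    "overall survival, worsening/progression": (4.04, 2.78, 5.85),
    "overall survival, improvement/response": (0.41, 0.22, 0.77),
}

summaries = {
    name: (hr,) + log_hr_summary(lower, upper)
    for name, (hr, lower, upper) in reported.items()
}
```

Under that assumption, the three machine-versus-human intervals include 1, consistent with the "similar measurements" reading, while both overall survival intervals exclude it.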

Conclusions and relevance

Deep natural language processing appears to speed curation of relevant cancer outcomes and facilitate rapid learning from EHR data.

SUBMITTER: Kehl KL

PROVIDER: S-EPMC6659158 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Assessment of Deep Natural Language Processing in Ascertaining Oncologic Outcomes From Radiology Reports.

Kehl KL, Elmarakeby H, Nishino M, Van Allen EM, Lepisto EM, Hassett MJ, Johnson BE, Schrag D

JAMA Oncology, 2019 Oct 1, issue 10

PMID: 31343664

Similar Datasets

Project description:

Background: Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports.

Methods: We conduct an automated literature search yielding 4836 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.

Results: We present a comprehensive analysis of the 164 publications retrieved with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results.

Conclusions: Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. Our results have significance for researchers in the field providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.

Project description:

Background: Abstraction of critical data from unstructured radiologic reports using natural language processing (NLP) is a powerful tool to automate the detection of important clinical features and enhance research efforts. We present a set of NLP approaches to identify critical findings in patients with acute ischemic stroke from radiology reports of computed tomography (CT) and magnetic resonance imaging (MRI).

Methods: We trained machine learning classifiers to identify categorical outcomes of edema, midline shift (MLS), hemorrhagic transformation, and parenchymal hematoma, as well as rule-based systems (RBS) to identify intraventricular hemorrhage (IVH) and continuous MLS measurements within CT/MRI reports. Using a derivation cohort of 2289 reports from 550 individuals with acute middle cerebral artery territory ischemic strokes, we externally validated our models on reports from a separate institution as well as from patients with ischemic strokes in any vascular territory.

Results: In all data sets, a deep neural network with pretrained biomedical word embeddings (BioClinicalBERT) achieved the highest discrimination performance for binary prediction of edema (area under precision recall curve [AUPRC] > 0.94), MLS (AUPRC > 0.98), hemorrhagic conversion (AUPRC > 0.89), and parenchymal hematoma (AUPRC > 0.76). BioClinicalBERT outperformed lasso regression (p < 0.001) for all outcomes except parenchymal hematoma (p = 0.755). Tailored RBS for IVH and continuous MLS outperformed BioClinicalBERT (p < 0.001) and linear regression, respectively (p < 0.001).

Conclusions: Our study demonstrates robust performance and external validity of a core NLP tool kit for identifying both categorical and continuous outcomes of ischemic stroke from unstructured radiographic text data. Medically tailored NLP methods have multiple important big data applications, including scalable electronic phenotyping, augmentation of clinical risk prediction models, and facilitation of automatic alert systems in the hospital setting.

Project description:

Study aim: To develop and apply a natural language processing algorithm for characterization of patients diagnosed with chronic pancreatitis in a diverse integrated U.S. healthcare system.

Methods: Retrospective cohort study including patients initially diagnosed with chronic pancreatitis (CP) within a regional integrated healthcare system between January 1, 2006 and December 31, 2015. Imaging reports from these patients were extracted from the electronic medical record system and split into training, validation and implementation datasets. A natural language processing (NLP) algorithm was first developed through the training dataset to identify specific features (atrophy, calcification, pseudocyst, cyst and main duct dilatation) from free-text radiology reports. The validation dataset was applied to validate the performance by comparing against the manual chart review. The developed algorithm was then applied to the implementation dataset. We classified patients with calcification(s) or ≥2 radiographic features as advanced CP. We compared etiology, comorbid conditions, treatment parameters as well as survival between advanced CP and others diagnosed during the study period.

Results: 6,346 patients were diagnosed with CP during the study period with 58,085 radiology studies performed. For individual features, NLP yielded sensitivity from 88.7% to 95.3%, specificity from 98.2% to 100.0%. A total of 3,672 patients met cohort inclusion criteria: 1,330 (36.2%) had evidence of advanced CP. Patients with advanced CP had increased frequency of smoking (57.8% vs. 43.0%), diabetes (47.6% vs. 35.9%) and underweight body mass index (6.6% vs. 3.6%), all p<0.001. Mortality from pancreatic cancer was higher in advanced CP (15.3/1,000 person-year vs. 2.8/1,000, p<0.001). Underweight BMI (HR 1.6, 95% CL 1.2, 2.1), smoking (HR 1.4, 95% CL 1.1, 1.7) and diabetes (HR 1.4, 95% CL 1.2, 1.6) were independent risk factors for mortality.

Conclusion: Patients with advanced CP experienced increased disease-related complications and pancreatic cancer-related mortality. Excess all-cause mortality was driven primarily by potentially modifiable risk factors including malnutrition, smoking and diabetes.

Project description:

Background: Automated language analysis of radiology reports using natural language processing (NLP) can provide valuable information on patients' health and disease. With its rapid development, NLP studies should have transparent methodology to allow comparison of approaches and reproducibility. This systematic review aims to summarise the characteristics and reporting quality of studies applying NLP to radiology reports.

Methods: We searched Google Scholar for studies published in English that applied NLP to radiology reports of any imaging modality between January 2015 and October 2019. At least two reviewers independently performed screening and completed data extraction. We specified 15 criteria relating to data source, datasets, ground truth, outcomes, and reproducibility for quality assessment. The primary NLP performance measures were precision, recall and F1 score.

Results: Of the 4,836 records retrieved, we included 164 studies that used NLP on radiology reports. The commonest clinical applications of NLP were disease information or classification (28%) and diagnostic surveillance (27.4%). Most studies used English radiology reports (86%). Reports from mixed imaging modalities were used in 28% of the studies. Oncology (24%) was the most frequent disease area. Most studies had dataset size > 200 (85.4%) but the proportion of studies that described their annotated, training, validation, and test set were 67.1%, 63.4%, 45.7%, and 67.7% respectively. About half of the studies reported precision (48.8%) and recall (53.7%). Few studies reported external validation performed (10.8%), data availability (8.5%) and code availability (9.1%). There was no pattern of performance associated with the overall reporting quality.

Conclusions: There is a range of potential clinical applications for NLP of radiology reports in health services and research. However, we found suboptimal reporting quality that precludes comparison, reproducibility, and replication. Our results support the need for development of reporting standards specific to clinical NLP studies.
