Dataset Information

Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: A retrospective study.

ABSTRACT:

Background

Pneumothorax can precipitate a life-threatening emergency due to lung collapse and respiratory or circulatory distress. Pneumothorax is typically detected on chest X-ray; however, treatment is reliant on timely review of radiographs. Since current imaging volumes may result in long worklists of radiographs awaiting review, an automated method of prioritizing X-rays with pneumothorax may reduce time to treatment. Our objective was to create a large human-annotated dataset of chest X-rays containing pneumothorax and to train deep convolutional networks to screen for potentially emergent moderate or large pneumothorax at the time of image acquisition.

Methods and findings

In all, 13,292 frontal chest X-rays (3,107 with pneumothorax) were visually annotated by radiologists. This dataset was used to train and evaluate multiple network architectures. Images showing large- or moderate-sized pneumothorax were considered positive, and those with trace or no pneumothorax were considered negative. Images showing small pneumothorax were excluded from training. Using an internal validation set (n = 1,993), we selected the 2 top-performing models; these models were then evaluated on a held-out internal test set based on area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV). The final internal test was performed initially on a subset with small pneumothorax excluded (as in training; n = 1,701), then on the full test set (n = 1,990), with small pneumothorax included as positive. External evaluation was performed using the National Institutes of Health (NIH) ChestX-ray14 set, a public dataset labeled for chest pathology based on text reports. All images labeled with pneumothorax were considered positive, because the NIH set does not classify pneumothorax by size. In internal testing, our "high sensitivity model" produced a sensitivity of 0.84 (95% CI 0.78-0.90), specificity of 0.90 (95% CI 0.89-0.92), and AUC of 0.94 for the test subset with small pneumothorax excluded. Our "high specificity model" showed sensitivity of 0.80 (95% CI 0.72-0.86), specificity of 0.97 (95% CI 0.96-0.98), and AUC of 0.96 for this set. PPVs were 0.45 (95% CI 0.39-0.51) and 0.71 (95% CI 0.63-0.77), respectively. Internal testing on the full set showed expected decreased performance (sensitivity 0.55, specificity 0.90, and AUC 0.82 for high sensitivity model and sensitivity 0.45, specificity 0.97, and AUC 0.86 for high specificity model). External testing using the NIH dataset showed some further performance decline (sensitivity 0.28-0.49, specificity 0.85-0.97, and AUC 0.75 for both). 
Due to labeling differences between internal and external datasets, these findings represent a preliminary step towards external validation.
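
The screening metrics named above (sensitivity, specificity, and PPV) can be illustrated with a minimal sketch. The function names and the confusion-matrix counts below are hypothetical and are not taken from the study; the sketch shows only the standard definitions, plus the usual relationship between PPV and the proportion of positive images.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> ScreeningMetrics:
    """Standard definitions from confusion-matrix counts (hypothetical helper)."""
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),  # fraction of positive images flagged
        specificity=tn / (tn + fp),  # fraction of negative images left unflagged
        ppv=tp / (tp + fp),          # fraction of flagged images that are positive
    )


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative counts only; these are NOT the study's data.
example = screening_metrics(tp=80, fp=20, tn=880, fn=20)
```

At a fixed sensitivity and specificity, `ppv_from_rates` returns a lower PPV as the positive class becomes rarer, which is one reason PPV is reported alongside the other two metrics for a screening tool.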

Conclusions

We trained automated classifiers to detect moderate and large pneumothorax in frontal chest X-rays at high levels of performance on held-out test data. These models may provide a high specificity screening solution to detect moderate or large pneumothorax on images collected when human review might be delayed, such as overnight. They are not intended for unsupervised diagnosis of all pneumothoraces, as many small pneumothoraces (and some larger ones) are not detected by the algorithm. Implementation studies are warranted to develop appropriate, effective clinician alerts for the potentially critical finding of pneumothorax, and to assess their impact on reducing time to treatment.

SUBMITTER: Taylor AG

PROVIDER: S-EPMC6245672 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description:

Background: Pneumothorax can lead to a life-threatening emergency. Experienced radiologists can diagnose it precisely from chest radiographs. Localizing pneumothorax lesions helps speed diagnosis, which benefits patients in underdeveloped areas that lack experienced radiologists. In recent years, with the development of large neural network architectures and medical imaging datasets, deep learning methods have become a methodology of choice for analyzing medical images. The objective of this study was to construct convolutional neural networks to localize pneumothorax lesions in chest radiographs.

Methods and findings: We developed a convolutional neural network, called CheXLocNet, for the segmentation of pneumothorax lesions. The SIIM-ACR Pneumothorax Segmentation dataset was used to train and validate CheXLocNets. The training dataset contained 2079 radiographs with annotated lesion areas. We trained six CheXLocNets with various hyperparameters. Another 300 annotated radiographs served as the validation set for selecting the parameters of these CheXLocNets. We determined the optimal parameters by AP50 (average precision at an intersection over union (IoU) of 0.50), a segmentation evaluation metric used by several well-known competitions. The CheXLocNets were then evaluated on a test set (1082 normal radiographs and 290 disease radiographs) using classification metrics (area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV)) and segmentation metrics (IoU and Dice score). For classification, the CheXLocNet with the best sensitivity produced an AUC of 0.87, sensitivity of 0.78 (95% CI 0.73-0.83), and specificity of 0.78 (95% CI 0.76-0.81). The CheXLocNet with the best specificity produced an AUC of 0.79, sensitivity of 0.46 (95% CI 0.40-0.52), and specificity of 0.92 (95% CI 0.90-0.94). For segmentation, the CheXLocNet with the best sensitivity produced an IoU of 0.69 and Dice score of 0.72. The CheXLocNet with the best specificity produced an IoU of 0.77 and Dice score of 0.79. We combined them to form an ensemble CheXLocNet, which produced an IoU of 0.81 and Dice score of 0.82. Our CheXLocNet succeeded in automatically detecting pneumothorax lesions without any human guidance.

Conclusions: In this study, we proposed a deep learning network, called CheXLocNet, for the automatic segmentation of chest radiographs to detect pneumothorax. Our CheXLocNets generated accurate classification results and high-quality segmentation masks for pneumothorax at the same time. This technology has the potential to improve healthcare delivery and increase access to chest radiograph expertise for the detection of diseases. Furthermore, the segmentation results offer comprehensive geometric information about lesions, which can help monitor how lesions develop over time with high accuracy. Thus, CheXLocNets can be further extended into a reliable clinical decision support tool. Although we used transfer learning in training CheXLocNet, its parameter count was still large relative to the radiograph dataset. Further work is necessary to prune CheXLocNet to suit the radiograph dataset.
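
The two segmentation metrics named in this description, IoU and Dice score, can be sketched for a single pair of binary masks as follows. This is a minimal illustration rather than the authors' evaluation code: the function name is hypothetical, masks are plain nested lists of 0/1, and scoring two empty masks as a perfect match is an assumed convention that the description does not specify.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels; both masks must have the same shape


def iou_and_dice(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) for one predicted mask and one reference mask."""
    inter = union = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            inter += p and t      # pixel marked in both masks
            union += p or t       # pixel marked in at least one mask
            pred_area += p
            truth_area += t
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_area + truth_area)


# Toy masks for illustration only.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
iou, dice = iou_and_dice(pred_mask, true_mask)
```

How per-image scores are aggregated across a test set (and how images without lesions are handled) affects the reported averages, so dataset-level figures need not satisfy the single-mask relationship between IoU and Dice.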

Project description:

Background: Airspace disease as seen on chest X-rays is an important point in triage for patients initially presenting to the emergency department (ED) with suspected COVID-19 infection. The purpose of this study is to evaluate a previously trained interpretable deep learning algorithm for the diagnosis and prognosis of COVID-19 pneumonia from chest X-rays obtained in the ED.

Methods: This retrospective study included 2456 (50% RT-PCR positive for COVID-19) adult patients who received both a chest X-ray and SARS-CoV-2 RT-PCR test from January 2020 to March 2021 in the emergency department at a single U.S. institution. A total of 2000 patients were included as an additional training cohort and 456 patients in the randomized internal holdout testing cohort for a previously trained Siemens AI-Radiology Companion deep learning convolutional neural network algorithm. Three cardiothoracic fellowship-trained radiologists systematically evaluated each chest X-ray and generated an airspace disease area-based severity score, which was compared against the same score produced by artificial intelligence. The interobserver agreement, diagnostic accuracy, and predictive capability for inpatient outcomes were assessed. Principal statistical tests used in this study include both univariate and multivariate logistic regression.

Results: Overall ICC was 0.820 (95% CI 0.790-0.840). The diagnostic AUC for SARS-CoV-2 RT-PCR positivity was 0.890 (95% CI 0.861-0.920) for the neural network and 0.936 (95% CI 0.918-0.960) for radiologists. Airspace opacities score by AI alone predicted ICU admission (AUC = 0.870) and mortality (0.829) in all patients. Addition of age and BMI into a multivariate log model improved mortality prediction (AUC = 0.906).

Conclusion: The deep learning algorithm provides an accurate and interpretable assessment of the disease burden in COVID-19 pneumonia on chest radiographs. The reported severity scores correlate with expert assessment and accurately predict important clinical outcomes. The algorithm contributes additional prognostic information not currently incorporated into patient management.
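
The AUC values quoted in this description can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the toy scores are hypothetical, and the study's own statistical software is not stated here.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy severity scores and outcomes for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = pairwise_auc(toy_scores, toy_labels)
```

This quadratic-time form is meant for clarity; rank-based implementations give the same value more efficiently on large cohorts.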

Project description: Acute respiratory distress syndrome (ARDS) is a life-threatening lung injury with global prevalence and high mortality. Chest x-rays (CXR) are critical in the early diagnosis and treatment of ARDS. However, imaging findings may not result in proper identification of ARDS due to a number of reasons, including nonspecific appearance of radiological features, ambiguity in a patient's case due to the pathological stage of the disease, and poor inter-rater reliability from interpretations of CXRs by multiple clinical experts. This study demonstrates the potential capability of methodologies in artificial intelligence, machine learning, and image processing to overcome these challenges and quantitatively assess CXRs for presence of ARDS. We propose and describe Directionality Measure, a novel feature engineering technique used to capture the "cloud-like" appearance of diffuse alveolar damage as a mathematical concept. This study also examines the effectiveness of using an off-the-shelf, pretrained deep learning model as a feature extractor in addition to standard features extracted from the histogram and gray-level co-occurrence matrix (GLCM). Data were collected from hospitalized patients at Michigan Medicine's intensive care unit, and the cohort's inclusion criteria were specifically designed to be representative of patients at risk of developing ARDS. Multiple machine learning models were used to evaluate these features with 5-fold cross-validation, and the final performance was reported on a hold-out, temporally distinct test set. With AdaBoost, Directionality Measure achieved an accuracy of 78% and AUC of 74%, outperforming classification results using features from the histogram (75% accuracy and 73% AUC), GLCM (76% accuracy and 73% AUC), and ResNet-50 (77% accuracy and 73% AUC). Further experimental results demonstrated that using all feature sets in combination achieved the best overall performance, yielding an accuracy of 83% and AUC of 79% with AdaBoost. These results demonstrate the potential capability of using the proposed methodologies to complement current clinical analysis for detection of ARDS from CXRs.
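
For readers unfamiliar with the gray-level co-occurrence matrix (GLCM) mentioned in this description, the sketch below shows the general idea on a tiny quantized image. It is an illustration under stated assumptions, not the study's feature pipeline: the function names, the single horizontal offset, the number of gray levels, and the choice of the contrast statistic are all hypothetical.

```python
from typing import List

Image = List[List[int]]  # pixel values already quantized to 0 .. levels-1


def glcm(image: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often gray level i is followed by gray level j.
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[n / total for n in row] for row in counts]


def glcm_contrast(p: List[List[float]]) -> float:
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Toy 2-level image for illustration only.
toy_image = [[0, 0, 1],
             [1, 1, 0]]
toy_glcm = glcm(toy_image, levels=2)
toy_contrast = glcm_contrast(toy_glcm)
```

Texture statistics such as contrast are typically computed for several offsets and then passed, together with other features, to a classifier.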