Project description:Objectives: To assess how an artificial intelligence (AI) algorithm performs against five experienced musculoskeletal radiologists in diagnosing scaphoid fractures and whether it aids their diagnosis on conventional multi-view radiographs. Methods: Four datasets of conventional hand, wrist, and scaphoid radiographs were retrospectively acquired at two hospitals (hospitals A and B). Dataset 1 (12,990 radiographs from 3353 patients, hospital A) and dataset 2 (1117 radiographs from 394 patients, hospital B) were used for training and testing a scaphoid localization and laterality classification component. Dataset 3 (4316 radiographs from 840 patients, hospital A) and dataset 4 (688 radiographs from 209 patients, hospital B) were used for training and testing the fracture detector. The algorithm was compared with the radiologists in an observer study. Evaluation metrics included sensitivity, specificity, positive predictive value (PPV), area under the receiver operating characteristic curve (AUC), Cohen's kappa coefficient (κ), fracture localization precision, and reading time. Results: The algorithm detected scaphoid fractures with a sensitivity of 72%, specificity of 93%, PPV of 81%, and AUC of 0.88. The AUC of the algorithm did not differ from that of any individual radiologist (radiologists' mean AUC, 0.87; p ≥ .05). AI assistance improved five out of ten pairs of inter-observer Cohen's κ agreements (p < .05) and reduced reading time in four radiologists (p < .001), but did not improve the other metrics in the majority of radiologists (p ≥ .05). Conclusions: The AI algorithm detects scaphoid fractures on conventional multi-view radiographs at the level of five experienced musculoskeletal radiologists and could significantly shorten their reading time. Key points: • An artificial intelligence algorithm automatically detects scaphoid fractures on conventional multi-view radiographs at the same level as five experienced musculoskeletal radiologists.
• There is preliminary evidence that automated scaphoid fracture detection can significantly shorten the reading time of musculoskeletal radiologists.
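The observer study above rests on standard confusion-matrix metrics plus Cohen's κ for inter-observer agreement. As a minimal pure-Python sketch (function and variable names are illustrative, not taken from the study's code), these can be computed from paired binary labels as:

```python
def confusion_counts(y_true, y_pred):
    """Tally true/false positives and negatives from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)   # fraction of true fractures detected

def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)   # fraction of non-fractures correctly cleared

def ppv(tp, tn, fp, fn):
    return tp / (tp + fp)   # fraction of positive calls that are correct

def cohens_kappa(y_a, y_b):
    """Chance-corrected agreement between two binary readers."""
    n = len(y_a)
    po = sum(a == b for a, b in zip(y_a, y_b)) / n   # observed agreement
    pa1 = sum(y_a) / n
    pb1 = sum(y_b) / n
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)           # expected by chance
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)
```

For κ, 1 indicates perfect agreement and 0 chance-level agreement; the study compares κ between radiologist pairs with and without AI assistance.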
Project description:Geometric morphometrics is the statistical analysis of landmark-based shape variation and its covariation with other variables. Over the past two decades, the gold standard of landmark data acquisition has been manual detection by a single observer. This approach has proven accurate and reliable in small-scale investigations. However, big data initiatives are increasingly common in biology and morphometrics. This requires fast, automated, and standardized data collection. We combine techniques from image registration, geometric morphometrics, and deep learning to automate and optimize anatomical landmark detection. We test our method on high-resolution, micro-computed tomography images of adult mouse skulls. To ensure generalizability, we use a morphologically diverse sample and implement fundamentally different deformable registration algorithms. Compared to landmarks derived from conventional image registration workflows, our optimized landmark data show up to a 39.1% reduction in average coordinate error and a 36.7% reduction in total distribution error. In addition, our landmark optimization produces estimates of the sample mean shape and variance-covariance structure that are statistically indistinguishable from expert manual estimates. For biological imaging datasets and morphometric research questions, our approach can eliminate the time and subjectivity of manual landmark detection whilst retaining the biological integrity of these expert annotations.
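The reported reduction in average coordinate error is a comparison of mean landmark-to-landmark distances against expert annotations. A minimal sketch of that error measure, assuming landmarks are stored as corresponding (x, y, z) tuples (names are illustrative, not the authors' code):

```python
import math

def mean_coordinate_error(pred, ref):
    """Average Euclidean distance between corresponding landmarks.

    pred, ref: equal-length lists of (x, y, z) tuples in the same
    anatomical order, e.g. automated vs. manual annotations.
    """
    assert len(pred) == len(ref), "landmark sets must correspond 1:1"
    dists = [math.dist(p, r) for p, r in zip(pred, ref)]
    return sum(dists) / len(dists)
```

Averaging this quantity over a sample gives the per-method error that the 39.1% reduction refers to.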
Project description:PURPOSE:To assess the utility of deep learning in the detection of geographic atrophy (GA) from color fundus photographs and to explore potential utility in detecting central GA (CGA). DESIGN:A deep learning model was developed to detect the presence of GA in color fundus photographs, and 2 additional models were developed to detect CGA in different scenarios. PARTICIPANTS:A total of 59,812 color fundus photographs from longitudinal follow-up of 4582 participants in the Age-Related Eye Disease Study (AREDS) dataset. Gold standard labels were from human expert reading center graders using a standardized protocol. METHODS:A deep learning model was trained to use color fundus photographs to predict GA presence from a population of eyes with no AMD to advanced AMD. A second model was trained to predict CGA presence from the same population. A third model was trained to predict CGA presence from the subset of eyes with GA. For training and testing, 5-fold cross-validation was used. For comparison with human clinician performance, model performance was compared with that of 88 retinal specialists. MAIN OUTCOME MEASURES:Area under the curve (AUC), accuracy, sensitivity, specificity, and precision. RESULTS:The deep learning models (GA detection, CGA detection from all eyes, and centrality detection from GA eyes) had AUCs of 0.933-0.976, 0.939-0.976, and 0.827-0.888, respectively. The GA detection model had accuracy, sensitivity, specificity, and precision of 0.965 (95% confidence interval [CI], 0.959-0.971), 0.692 (0.560-0.825), 0.978 (0.970-0.985), and 0.584 (0.491-0.676), respectively, compared with 0.975 (0.971-0.980), 0.588 (0.468-0.707), 0.982 (0.978-0.985), and 0.368 (0.230-0.505) for the retinal specialists. The CGA detection model had values of 0.966 (0.957-0.975), 0.763 (0.641-0.885), 0.971 (0.960-0.982), and 0.394 (0.341-0.448).
The centrality detection model had values of 0.762 (0.725-0.799), 0.782 (0.618-0.945), 0.729 (0.543-0.916), and 0.799 (0.710-0.888). CONCLUSIONS:A deep learning model demonstrated high accuracy for the automated detection of GA. The AUC was noninferior to that of human retinal specialists. Deep learning approaches may also be applied to the identification of CGA. The code and pretrained models are publicly available at https://github.com/ncbi-nlp/DeepSeeNet.
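AUC, as reported here and throughout these studies, can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A small pure-Python sketch of this rank-based (Mann-Whitney) formulation, with illustrative names:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a
    negative case (Mann-Whitney U formulation; ties count half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

A perfectly separating model scores 1.0, and a random model scores 0.5; the O(n²) loop is for clarity, not efficiency.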
Project description:Background: Identification of vertebral fractures (VFs) is critical for effective secondary fracture prevention owing to their association with increasing risks of future fractures. Plain abdominal frontal radiographs (PARs) are a common investigation performed for a variety of clinical indications and provide an ideal platform for the opportunistic identification of VFs. This study uses a deep convolutional neural network (DCNN) to assess the feasibility of screening, detecting, and localizing VFs on PARs. Methods: A DCNN was pretrained using ImageNet and retrained with 1306 images from the PARs database obtained between August 2015 and December 2018. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were evaluated. The visualization algorithm gradient-weighted class activation mapping (Grad-CAM) was used for model interpretation. Results: Only 46.6% (204/438) of the VFs were diagnosed in the original PARs reports. The algorithm achieved 73.59% accuracy, 73.81% sensitivity, 73.02% specificity, and an AUC of 0.72 for VF identification. Conclusion: Computer-driven solutions integrating the DCNN have the potential to identify VFs with good accuracy when used opportunistically on PARs taken for a variety of clinical purposes. The proposed model can help clinicians become more efficient and economical in the current clinical pathway of fragility fracture treatment.
Project description:BACKGROUND:Machine learning has been used extensively in clinical text classification tasks. Deep learning approaches using word embeddings have been recently gaining momentum in biomedical applications. In an effort to automate the identification of altered mental status (AMS) in emergency department provider notes for the purpose of decision support, we compare the performance of classic bag-of-words-based machine learning classifiers and novel deep learning approaches. METHODS:We used a case-control study design to extract an adequate number of clinical notes with AMS and non-AMS based on ICD codes. The notes were parsed to extract the history of present illness, which was used as the clinical text for the classifiers. The notes were manually labeled by clinicians. As a baseline for comparison, we tested several traditional bag-of-words based classifiers. We then tested several deep learning models using a convolutional neural network architecture with three different types of word embeddings, a pre-trained word2vec model and two models without pre-training but with different word embedding dimensions. RESULTS:We evaluated the models on 1130 labeled notes from the emergency department. The deep learning models had the best overall performance with an area under the ROC curve of 98.5% and an accuracy of 94.5%. Pre-training word embeddings on the unlabeled corpus reduced training iterations and had performance that was statistically no different than the other deep learning models. CONCLUSION:This supervised deep learning approach performs exceedingly well for the detection of AMS symptoms in clinical text in our environment. Further work is needed for the generalizability of these findings, including evaluation of these models in other types of clinical notes and other environments. 
The results seem promising for the ultimate use of these types of classifiers in combination with other information derived from the electronic health records as input for clinical decision support.
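A bag-of-words baseline like the one compared here maps each note onto token counts over a fixed vocabulary before classification. A minimal sketch of that featurization step (the tokenizer and all names are illustrative assumptions, not the authors' pipeline):

```python
import re
from collections import Counter

def bag_of_words(text, vocab):
    """Map a clinical note onto counts over a fixed vocabulary.

    A simple lowercase word tokenizer stands in for whatever
    preprocessing a real pipeline would use.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[w] for w in vocab]
```

These count vectors would then feed a conventional classifier (e.g. logistic regression), whereas the CNN models in the study instead consume word-embedding sequences.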
Project description:Vertebral compression is a significant factor in determining the prognosis of osteoporotic vertebral compression fractures and is generally measured manually by specialists. Consequent misdiagnosis or delayed diagnosis can be fatal for patients. In this study, we trained and evaluated the performance of a vertebral body segmentation model and a vertebral compression measurement model based on convolutional neural networks. For vertebral body segmentation, we used a recurrent residual U-Net model, with an average sensitivity of 0.934 (± 0.086), an average specificity of 0.997 (± 0.002), an average accuracy of 0.987 (± 0.005), and an average Dice similarity coefficient of 0.923 (± 0.073). We then generated 1134 data points on the images of three vertebral bodies by labeling each segment of the segmented vertebral body. These were used in the vertebral compression measurement model based on linear regression and multi-scale residual dilated blocks. The model yielded an average mean absolute error of 2.637 (± 1.872)%, an average mean square error of 13.985 (± 24.107)%, and an average root mean square error of 3.739 (± 2.187)% on fractured vertebral body data. The proposed algorithm has significant potential for aiding the diagnosis of vertebral compression fractures.
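The Dice similarity coefficient used to evaluate the segmentation model measures overlap between predicted and reference masks: 2|A∩B| / (|A| + |B|). A minimal sketch for flattened binary masks (names are illustrative):

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient for flat binary masks:
    2 * |A ∩ B| / (|A| + |B|)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    # Convention: two empty masks agree perfectly.
    return 2.0 * inter / size if size else 1.0
```

A value of 1 indicates identical masks and 0 indicates no overlap, so the reported 0.923 reflects close agreement with manual segmentation.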
Project description:In this study, we developed a model to predict culture test results for pulmonary tuberculosis (PTB) with a customized multimodal approach and evaluated its performance in different clinical settings. Moreover, we investigated potential performance improvements by combining this approach with deep learning-based automated detection algorithms (DLADs). This retrospective observational study enrolled patients over 18 years of age who consecutively visited the level 1 emergency department and underwent chest radiography and sputum testing. The primary endpoint was a positive sputum culture for PTB. We compared the performance of the diagnostic models by replacing radiologists' interpretations of chest radiographs with screening scores calculated through DLAD. The optimal diagnostic model had an area under the receiver operating characteristic curve of 0.924 (95% CI 0.871-0.976) and an area under the precision-recall curve of 0.403 (95% CI 0.195-0.580) while maintaining a specificity of 81.4% when sensitivity was fixed at 90%. Multicomponent models showed improved performance for detecting PTB when chest radiograph interpretation was replaced by DLAD. Multicomponent diagnostic models with DLADs customized for different clinical settings are more practical than traditional methods for detecting patients with PTB. This novel diagnostic approach may help prevent the spread of PTB and optimize healthcare resource utilization in resource-limited clinical settings.
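Reporting specificity at a fixed 90% sensitivity means sweeping the score threshold until the sensitivity target is met and reading off the specificity at that operating point. A hedged pure-Python sketch of that selection (names are illustrative, not from the study):

```python
def specificity_at_sensitivity(scores_pos, scores_neg, target_sens=0.90):
    """Find the highest threshold whose sensitivity meets the target,
    then report the specificity achieved there.

    Returns (threshold, sensitivity, specificity), or None if no
    threshold reaches the target.
    """
    for thr in sorted(set(scores_pos + scores_neg), reverse=True):
        sens = sum(s >= thr for s in scores_pos) / len(scores_pos)
        if sens >= target_sens:
            spec = sum(s < thr for s in scores_neg) / len(scores_neg)
            return thr, sens, spec
    return None
```

Scanning thresholds from high to low returns the first (and therefore most specific) operating point that satisfies the sensitivity constraint.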
Project description:Purpose: To design and validate a fully automated computer system for the detection and anatomic localization of traumatic thoracic and lumbar vertebral body fractures at computed tomography (CT). Materials and methods: This retrospective study was HIPAA compliant. Institutional review board approval was obtained, and informed consent was waived. CT examinations in 104 patients (mean age, 34.4 years; range, 14-88 years; 32 women, 72 men), consisting of 94 examinations with positive findings for fractures (59 with vertebral body fractures) and 10 control examinations (without vertebral fractures), were performed. There were 141 thoracic and lumbar vertebral body fractures in the case set. The locations of fractures were marked and classified by a radiologist according to Denis column involvement. The CT data set was divided into training and testing subsets (37 and 67 patients, respectively) for analysis by means of prototype software for fully automated spinal segmentation and fracture detection. Free-response receiver operating characteristic analysis was performed. Results: Training set sensitivity for detection and localization of fractures within each vertebra was 0.82 (28 of 34 findings; 95% confidence interval [CI]: 0.68, 0.90), with a false-positive rate of 2.5 findings per patient. The sensitivity for fracture localization to the correct vertebra was 0.88 (23 of 26 findings; 95% CI: 0.72, 0.96), with a false-positive rate of 1.3. Testing set sensitivity for the detection and localization of fractures within each vertebra was 0.81 (87 of 107 findings; 95% CI: 0.75, 0.87), with a false-positive rate of 2.7. The sensitivity for fracture localization to the correct vertebra was 0.92 (55 of 60 findings; 95% CI: 0.79, 0.94), with a false-positive rate of 1.6.
The most common cause of false-positive findings was nutrient foramina (106 of 272 findings [39%]). Conclusion: The fully automated computer system detects and anatomically localizes vertebral body fractures in the thoracic and lumbar spine on CT images with high sensitivity and a low false-positive rate.
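Free-response ROC analysis characterizes a detector by lesion-level sensitivity against false positives per patient, rather than by per-image labels. A minimal sketch of computing one such operating point from per-patient tallies (the dictionary keys are illustrative assumptions):

```python
def detection_summary(per_patient):
    """One free-response operating point: lesion-level sensitivity
    and false positives per patient.

    per_patient: list of dicts with counts per patient, using the
    hypothetical keys 'n_true', 'n_detected', and 'n_false_pos'.
    """
    total_true = sum(p["n_true"] for p in per_patient)
    total_det = sum(p["n_detected"] for p in per_patient)
    total_fp = sum(p["n_false_pos"] for p in per_patient)
    sens = total_det / total_true if total_true else 0.0
    fp_per_patient = total_fp / len(per_patient)
    return sens, fp_per_patient
```

Varying the detector's confidence threshold and recomputing this pair traces out the full free-response ROC curve used in the analysis above.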
Project description:BACKGROUND. Sarcopenia is associated with adverse clinical outcomes. CT-based skeletal muscle measurements for sarcopenia assessment are most commonly performed at the L3 vertebral level. OBJECTIVE. The purpose of this article is to compare the utility of fully automated deep learning CT-based muscle quantitation at the L1 versus L3 level for predicting future hip fractures and death. METHODS. This retrospective study included 9223 asymptomatic adults (mean age, 57 ± 8 [SD] years; 4071 men, 5152 women) who underwent unenhanced low-dose abdominal CT. A previously validated fully automated deep learning tool was used to assess muscle for myosteatosis (by mean attenuation) and myopenia (by cross-sectional area) at the L1 and L3 levels. Performance for predicting hip fractures and death was compared between L1 and L3 measures. Performance for predicting hip fractures and death was also evaluated using the established clinical risk scores from the fracture risk assessment tool (FRAX) and Framingham risk score (FRS), respectively. RESULTS. Median clinical follow-up interval after CT was 8.8 years (interquartile range, 5.1-11.6 years), yielding hip fractures and death in 219 (2.4%) and 549 (6.0%) patients, respectively. L1-level and L3-level muscle attenuation measurements were not different in 2-, 5-, or 10-year AUC for hip fracture (p = .18-.98) or death (p = .19-.95). For hip fracture, 5-year AUCs for L1-level muscle attenuation, L3-level muscle attenuation, and FRAX score were 0.717, 0.709, and 0.708, respectively. For death, 5-year AUCs for L1-level muscle attenuation, L3-level muscle attenuation, and FRS were 0.737, 0.721, and 0.688, respectively. Lowest quartile hazard ratios (HRs) for hip fracture were 2.20 (L1 attenuation), 2.45 (L3 attenuation), and 2.53 (FRAX score), and for death were 3.25 (L1 attenuation), 3.58 (L3 attenuation), and 2.82 (FRS). 
CT-based muscle cross-sectional area measurements at L1 and L3 were less predictive for hip fracture and death (5-year AUC ≤ 0.571; HR ≤ 1.56). CONCLUSION. Automated CT-based measurements of muscle attenuation for myosteatosis at the L1 level compare favorably with previously established L3-level measurements and clinical risk scores for predicting hip fracture and death. Assessment for myopenia was less predictive of outcomes at both levels. CLINICAL IMPACT. Alternative use of the L1 rather than L3 level for CT-based muscle measurements allows sarcopenia assessment using both chest and abdominal CT scans, greatly increasing the potential yield of opportunistic CT screening.