Project description:Objectives: To assess how an artificial intelligence (AI) algorithm performs against five experienced musculoskeletal radiologists in diagnosing scaphoid fractures and whether it aids their diagnosis on conventional multi-view radiographs. Methods: Four datasets of conventional hand, wrist, and scaphoid radiographs were retrospectively acquired at two hospitals (hospitals A and B). Dataset 1 (12,990 radiographs from 3353 patients, hospital A) and dataset 2 (1117 radiographs from 394 patients, hospital B) were used for training and testing a scaphoid localization and laterality classification component. Dataset 3 (4316 radiographs from 840 patients, hospital A) and dataset 4 (688 radiographs from 209 patients, hospital B) were used for training and testing the fracture detector. The algorithm was compared with the radiologists in an observer study. Evaluation metrics included sensitivity, specificity, positive predictive value (PPV), area under the receiver operating characteristic curve (AUC), Cohen's kappa coefficient (κ), fracture localization precision, and reading time. Results: The algorithm detected scaphoid fractures with a sensitivity of 72%, specificity of 93%, PPV of 81%, and AUC of 0.88. The AUC of the algorithm did not differ from that of any individual radiologist (radiologists' mean AUC, 0.87; p ≥ .05). AI assistance improved five out of ten pairs of inter-observer Cohen's κ agreements (p < .05) and reduced reading time in four radiologists (p < .001), but did not improve the other metrics in the majority of radiologists (p ≥ .05). Conclusions: The AI algorithm detects scaphoid fractures on conventional multi-view radiographs at the level of five experienced musculoskeletal radiologists and could significantly shorten their reading time. Key points: • An artificial intelligence algorithm automatically detects scaphoid fractures on conventional multi-view radiographs at the same level as five experienced musculoskeletal radiologists.
• There is preliminary evidence that automated scaphoid fracture detection can significantly shorten the reading time of musculoskeletal radiologists.
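The observer study above rests on standard confusion-matrix metrics plus Cohen's κ for inter-observer agreement. As a minimal pure-Python sketch (function and variable names are illustrative, not taken from the study's code), these can be computed from paired binary labels as:

```python
def confusion_counts(y_true, y_pred):
    """Tally true/false positives and negatives from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)   # fraction of true fractures detected

def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)   # fraction of non-fractures correctly cleared

def ppv(tp, tn, fp, fn):
    return tp / (tp + fp)   # fraction of positive calls that are correct

def cohens_kappa(y_a, y_b):
    """Chance-corrected agreement between two binary readers."""
    n = len(y_a)
    po = sum(a == b for a, b in zip(y_a, y_b)) / n   # observed agreement
    pa1 = sum(y_a) / n
    pb1 = sum(y_b) / n
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)           # expected by chance
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)
```

For κ, 1 indicates perfect agreement and 0 chance-level agreement; the study compares κ between radiologist pairs with and without AI assistance.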
Project description:Geometric morphometrics is the statistical analysis of landmark-based shape variation and its covariation with other variables. Over the past two decades, the gold standard of landmark data acquisition has been manual detection by a single observer. This approach has proven accurate and reliable in small-scale investigations. However, big data initiatives are increasingly common in biology and morphometrics. This requires fast, automated, and standardized data collection. We combine techniques from image registration, geometric morphometrics, and deep learning to automate and optimize anatomical landmark detection. We test our method on high-resolution, micro-computed tomography images of adult mouse skulls. To ensure generalizability, we use a morphologically diverse sample and implement fundamentally different deformable registration algorithms. Compared to landmarks derived from conventional image registration workflows, our optimized landmark data show up to a 39.1% reduction in average coordinate error and a 36.7% reduction in total distribution error. In addition, our landmark optimization produces estimates of the sample mean shape and variance-covariance structure that are statistically indistinguishable from expert manual estimates. For biological imaging datasets and morphometric research questions, our approach can eliminate the time and subjectivity of manual landmark detection whilst retaining the biological integrity of these expert annotations.
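The reported reduction in average coordinate error is a comparison of mean landmark-to-landmark distances against expert annotations. A minimal sketch of that error measure, assuming landmarks are stored as corresponding (x, y, z) tuples (names are illustrative, not the authors' code):

```python
import math

def mean_coordinate_error(pred, ref):
    """Average Euclidean distance between corresponding landmarks.

    pred, ref: equal-length lists of (x, y, z) tuples in the same
    anatomical order, e.g. automated vs. manual annotations.
    """
    assert len(pred) == len(ref), "landmark sets must correspond 1:1"
    dists = [math.dist(p, r) for p, r in zip(pred, ref)]
    return sum(dists) / len(dists)
```

Averaging this quantity over a sample gives the per-method error that the 39.1% reduction refers to.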
Project description:PURPOSE:To assess the utility of deep learning in the detection of geographic atrophy (GA) from color fundus photographs and to explore potential utility in detecting central GA (CGA). DESIGN:A deep learning model was developed to detect the presence of GA in color fundus photographs, and 2 additional models were developed to detect CGA in different scenarios. PARTICIPANTS:A total of 59,812 color fundus photographs from longitudinal follow-up of 4582 participants in the Age-Related Eye Disease Study (AREDS) dataset. Gold standard labels were from human expert reading center graders using a standardized protocol. METHODS:A deep learning model was trained to use color fundus photographs to predict GA presence from a population of eyes with no AMD to advanced AMD. A second model was trained to predict CGA presence from the same population. A third model was trained to predict CGA presence from the subset of eyes with GA. For training and testing, 5-fold cross-validation was used. For comparison with human clinician performance, model performance was compared with that of 88 retinal specialists. MAIN OUTCOME MEASURES:Area under the curve (AUC), accuracy, sensitivity, specificity, and precision. RESULTS:The deep learning models (GA detection, CGA detection from all eyes, and centrality detection from GA eyes) had AUCs of 0.933-0.976, 0.939-0.976, and 0.827-0.888, respectively. The GA detection model had accuracy, sensitivity, specificity, and precision of 0.965 (95% confidence interval [CI], 0.959-0.971), 0.692 (0.560-0.825), 0.978 (0.970-0.985), and 0.584 (0.491-0.676), respectively, compared with 0.975 (0.971-0.980), 0.588 (0.468-0.707), 0.982 (0.978-0.985), and 0.368 (0.230-0.505) for the retinal specialists. The CGA detection model had values of 0.966 (0.957-0.975), 0.763 (0.641-0.885), 0.971 (0.960-0.982), and 0.394 (0.341-0.448).
The centrality detection model had values of 0.762 (0.725-0.799), 0.782 (0.618-0.945), 0.729 (0.543-0.916), and 0.799 (0.710-0.888). CONCLUSIONS:A deep learning model demonstrated high accuracy for the automated detection of GA. The AUC was noninferior to that of human retinal specialists. Deep learning approaches may also be applied to the identification of CGA. The code and pretrained models are publicly available at https://github.com/ncbi-nlp/DeepSeeNet.
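AUC, as reported here and throughout these studies, can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A small pure-Python sketch of this rank-based (Mann-Whitney) formulation, with illustrative names:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a
    negative case (Mann-Whitney U formulation; ties count half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

A perfectly separating model scores 1.0, and a random model scores 0.5; the O(n²) loop is for clarity, not efficiency.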
Project description:Background: Identification of vertebral fractures (VFs) is critical for effective secondary fracture prevention owing to their association with increasing risks of future fractures. Plain abdominal frontal radiographs (PARs) are a common investigation performed for a variety of clinical indications and provide an ideal platform for the opportunistic identification of VFs. This study uses a deep convolutional neural network (DCNN) to assess the feasibility of screening, detecting, and localizing VFs on PARs. Methods: A DCNN was pretrained using ImageNet and retrained with 1306 images from the PARs database obtained between August 2015 and December 2018. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were evaluated. The visualization algorithm gradient-weighted class activation mapping (Grad-CAM) was used for model interpretation. Results: Only 46.6% (204/438) of the VFs were diagnosed in the original PARs reports. The algorithm achieved 73.59% accuracy, 73.81% sensitivity, 73.02% specificity, and an AUC of 0.72 for VF identification. Conclusion: Computer-driven solutions integrating the DCNN have the potential to identify VFs with good accuracy when used opportunistically on PARs taken for a variety of clinical purposes. The proposed model can help clinicians become more efficient and economical in the current clinical pathway of fragility fracture treatment.
Project description:BACKGROUND:Machine learning has been used extensively in clinical text classification tasks. Deep learning approaches using word embeddings have been recently gaining momentum in biomedical applications. In an effort to automate the identification of altered mental status (AMS) in emergency department provider notes for the purpose of decision support, we compare the performance of classic bag-of-words-based machine learning classifiers and novel deep learning approaches. METHODS:We used a case-control study design to extract an adequate number of clinical notes with AMS and non-AMS based on ICD codes. The notes were parsed to extract the history of present illness, which was used as the clinical text for the classifiers. The notes were manually labeled by clinicians. As a baseline for comparison, we tested several traditional bag-of-words based classifiers. We then tested several deep learning models using a convolutional neural network architecture with three different types of word embeddings, a pre-trained word2vec model and two models without pre-training but with different word embedding dimensions. RESULTS:We evaluated the models on 1130 labeled notes from the emergency department. The deep learning models had the best overall performance with an area under the ROC curve of 98.5% and an accuracy of 94.5%. Pre-training word embeddings on the unlabeled corpus reduced training iterations and had performance that was statistically no different than the other deep learning models. CONCLUSION:This supervised deep learning approach performs exceedingly well for the detection of AMS symptoms in clinical text in our environment. Further work is needed for the generalizability of these findings, including evaluation of these models in other types of clinical notes and other environments. 
The results seem promising for the ultimate use of these types of classifiers in combination with other information derived from the electronic health records as input for clinical decision support.
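A bag-of-words baseline like the one compared here maps each note onto token counts over a fixed vocabulary before classification. A minimal sketch of that featurization step (the tokenizer and all names are illustrative assumptions, not the authors' pipeline):

```python
import re
from collections import Counter

def bag_of_words(text, vocab):
    """Map a clinical note onto counts over a fixed vocabulary.

    A simple lowercase word tokenizer stands in for whatever
    preprocessing a real pipeline would use.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[w] for w in vocab]
```

These count vectors would then feed a conventional classifier (e.g. logistic regression), whereas the CNN models in the study instead consume word-embedding sequences.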
Project description:Vertebral compression is a significant factor in determining the prognosis of osteoporotic vertebral compression fractures and is generally measured manually by specialists. Consequent misdiagnosis or delayed diagnosis can be fatal for patients. In this study, we trained and evaluated the performance of a vertebral body segmentation model and a vertebral compression measurement model based on convolutional neural networks. For vertebral body segmentation, we used a recurrent residual U-Net model, with an average sensitivity of 0.934 (± 0.086), an average specificity of 0.997 (± 0.002), an average accuracy of 0.987 (± 0.005), and an average Dice similarity coefficient of 0.923 (± 0.073). We then generated 1134 data points on the images of three vertebral bodies by labeling each segment of the segmented vertebral body. These were used in the vertebral compression measurement model based on linear regression and multi-scale residual dilated blocks. The model yielded an average mean absolute error of 2.637 (± 1.872)%, an average mean square error of 13.985 (± 24.107)%, and an average root mean square error of 3.739 (± 2.187)% on fractured vertebral body data. The proposed algorithm has significant potential for aiding the diagnosis of vertebral compression fractures.
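The Dice similarity coefficient used to evaluate the segmentation model measures overlap between predicted and reference masks: 2|A∩B| / (|A| + |B|). A minimal sketch for flattened binary masks (names are illustrative):

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient for flat binary masks:
    2 * |A ∩ B| / (|A| + |B|)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    # Convention: two empty masks agree perfectly.
    return 2.0 * inter / size if size else 1.0
```

A value of 1 indicates identical masks and 0 indicates no overlap, so the reported 0.923 reflects close agreement with manual segmentation.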
Project description:In this study, we developed a model to predict culture test results for pulmonary tuberculosis (PTB) with a customized multimodal approach and evaluated its performance in different clinical settings. Moreover, we investigated potential performance improvements by combining this approach with deep learning-based automated detection algorithms (DLADs). This retrospective observational study enrolled patients over 18 years of age who consecutively visited the level 1 emergency department and underwent chest radiography and sputum testing. The primary endpoint was a positive sputum culture for PTB. We compared the performance of the diagnostic models by replacing radiologists' interpretations of chest radiographs with screening scores calculated through DLAD. The optimal diagnostic model had an area under the receiver operating characteristic curve of 0.924 (95% CI 0.871-0.976) and an area under the precision-recall curve of 0.403 (95% CI 0.195-0.580) while maintaining a specificity of 81.4% when sensitivity was fixed at 90%. Multicomponent models showed improved performance for detecting PTB when chest radiograph interpretation was replaced by DLAD. Multicomponent diagnostic models with DLADs customized for different clinical settings are more practical than traditional methods for detecting patients with PTB. This novel diagnostic approach may help prevent the spread of PTB and optimize healthcare resource utilization in resource-limited clinical settings.
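Reporting specificity at a fixed 90% sensitivity means sweeping the score threshold until the sensitivity target is met and reading off the specificity at that operating point. A hedged pure-Python sketch of that selection (names are illustrative, not from the study):

```python
def specificity_at_sensitivity(scores_pos, scores_neg, target_sens=0.90):
    """Find the highest threshold whose sensitivity meets the target,
    then report the specificity achieved there.

    Returns (threshold, sensitivity, specificity), or None if no
    threshold reaches the target.
    """
    for thr in sorted(set(scores_pos + scores_neg), reverse=True):
        sens = sum(s >= thr for s in scores_pos) / len(scores_pos)
        if sens >= target_sens:
            spec = sum(s < thr for s in scores_neg) / len(scores_neg)
            return thr, sens, spec
    return None
```

Scanning thresholds from high to low returns the first (and therefore most specific) operating point that satisfies the sensitivity constraint.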
Project description:Purpose: To design and validate a fully automated computer system for the detection and anatomic localization of traumatic thoracic and lumbar vertebral body fractures at computed tomography (CT). Materials and methods: This retrospective study was HIPAA compliant. Institutional review board approval was obtained, and informed consent was waived. CT examinations in 104 patients (mean age, 34.4 years; range, 14-88 years; 32 women, 72 men), consisting of 94 examinations with positive findings for fractures (59 with vertebral body fractures) and 10 control examinations (without vertebral fractures), were performed. There were 141 thoracic and lumbar vertebral body fractures in the case set. The locations of fractures were marked and classified by a radiologist according to Denis column involvement. The CT data set was divided into training and testing subsets (37 and 67 patients, respectively) for analysis by means of prototype software for fully automated spinal segmentation and fracture detection. Free-response receiver operating characteristic analysis was performed. Results: Training set sensitivity for detection and localization of fractures within each vertebra was 0.82 (28 of 34 findings; 95% confidence interval [CI]: 0.68, 0.90), with a false-positive rate of 2.5 findings per patient. The sensitivity for fracture localization to the correct vertebra was 0.88 (23 of 26 findings; 95% CI: 0.72, 0.96), with a false-positive rate of 1.3. Testing set sensitivity for the detection and localization of fractures within each vertebra was 0.81 (87 of 107 findings; 95% CI: 0.75, 0.87), with a false-positive rate of 2.7. The sensitivity for fracture localization to the correct vertebra was 0.92 (55 of 60 findings; 95% CI: 0.79, 0.94), with a false-positive rate of 1.6.
The most common cause of false-positive findings was nutrient foramina (106 of 272 findings [39%]). Conclusion: The fully automated computer system detects and anatomically localizes vertebral body fractures in the thoracic and lumbar spine on CT images with high sensitivity and a low false-positive rate.
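Free-response ROC analysis characterizes a detector by lesion-level sensitivity against false positives per patient, rather than by per-image labels. A minimal sketch of computing one such operating point from per-patient tallies (the dictionary keys are illustrative assumptions):

```python
def detection_summary(per_patient):
    """One free-response operating point: lesion-level sensitivity
    and false positives per patient.

    per_patient: list of dicts with counts per patient, using the
    hypothetical keys 'n_true', 'n_detected', and 'n_false_pos'.
    """
    total_true = sum(p["n_true"] for p in per_patient)
    total_det = sum(p["n_detected"] for p in per_patient)
    total_fp = sum(p["n_false_pos"] for p in per_patient)
    sens = total_det / total_true if total_true else 0.0
    fp_per_patient = total_fp / len(per_patient)
    return sens, fp_per_patient
```

Varying the detector's confidence threshold and recomputing this pair traces out the full free-response ROC curve used in the analysis above.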
Project description:BACKGROUND. Sarcopenia is associated with adverse clinical outcomes. CT-based skeletal muscle measurements for sarcopenia assessment are most commonly performed at the L3 vertebral level. OBJECTIVE. The purpose of this article is to compare the utility of fully automated deep learning CT-based muscle quantitation at the L1 versus L3 level for predicting future hip fractures and death. METHODS. This retrospective study included 9223 asymptomatic adults (mean age, 57 ± 8 [SD] years; 4071 men, 5152 women) who underwent unenhanced low-dose abdominal CT. A previously validated fully automated deep learning tool was used to assess muscle for myosteatosis (by mean attenuation) and myopenia (by cross-sectional area) at the L1 and L3 levels. Performance for predicting hip fractures and death was compared between L1 and L3 measures. Performance for predicting hip fractures and death was also evaluated using the established clinical risk scores from the fracture risk assessment tool (FRAX) and Framingham risk score (FRS), respectively. RESULTS. Median clinical follow-up interval after CT was 8.8 years (interquartile range, 5.1-11.6 years), yielding hip fractures and death in 219 (2.4%) and 549 (6.0%) patients, respectively. L1-level and L3-level muscle attenuation measurements were not different in 2-, 5-, or 10-year AUC for hip fracture (p = .18-.98) or death (p = .19-.95). For hip fracture, 5-year AUCs for L1-level muscle attenuation, L3-level muscle attenuation, and FRAX score were 0.717, 0.709, and 0.708, respectively. For death, 5-year AUCs for L1-level muscle attenuation, L3-level muscle attenuation, and FRS were 0.737, 0.721, and 0.688, respectively. Lowest quartile hazard ratios (HRs) for hip fracture were 2.20 (L1 attenuation), 2.45 (L3 attenuation), and 2.53 (FRAX score), and for death were 3.25 (L1 attenuation), 3.58 (L3 attenuation), and 2.82 (FRS). 
CT-based muscle cross-sectional area measurements at L1 and L3 were less predictive for hip fracture and death (5-year AUC ≤ 0.571; HR ≤ 1.56). CONCLUSION. Automated CT-based measurements of muscle attenuation for myosteatosis at the L1 level compare favorably with previously established L3-level measurements and clinical risk scores for predicting hip fracture and death. Assessment for myopenia was less predictive of outcomes at both levels. CLINICAL IMPACT. Alternative use of the L1 rather than L3 level for CT-based muscle measurements allows sarcopenia assessment using both chest and abdominal CT scans, greatly increasing the potential yield of opportunistic CT screening.