Dataset Information

External validation of radiomics-based predictive models in low-dose CT screening for early lung cancer diagnosis.

ABSTRACT:

Purpose

Low-dose CT screening allows early lung cancer detection, but it is affected by frequent false-positive results, inter- and intra-observer variation, and uncertain diagnoses of lung nodules. Radiomics-based models have recently been introduced to overcome these issues, but the difficulty of demonstrating their generalizability on independent datasets is slowing their introduction into the clinic. The aim of this study is to evaluate two radiomics-based models for classifying malignant pulmonary nodules in low-dose CT screening and to externally validate them on an independent cohort. The effect of a radiomics feature harmonization technique on the classification of lung nodules from multicenter data is also investigated.

Methods

Pulmonary nodules from two independent cohorts were considered in this study: the first cohort (110 subjects, 113 nodules) was used to train the prediction models, and the second cohort (72 nodules) to externally validate them. Literature-based radiomics features were extracted and, after feature selection, used as predictive variables in models for malignancy identification. An in-house prediction model based on an artificial neural network (ANN) was implemented and evaluated, along with an alternative model from the literature based on a support vector machine (SVM) classifier coupled with a least absolute shrinkage and selection operator (LASSO). External validation was performed on the second cohort to evaluate the models' generalization ability. Additionally, the ComBat harmonization method was investigated as a way to compensate for variability across multicenter datasets. The models were retrained on harmonized features from the first cohort, then tested separately on the harmonized and non-harmonized features of the second cohort.
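As a rough illustration of what feature harmonization across centers does, the sketch below aligns each center's radiomic features to a common mean and spread. It is a simplified location-scale adjustment and not the ComBat method itself, which additionally applies empirical Bayes shrinkage and can preserve covariates of interest. The function name and the idea of passing a per-nodule center label are assumptions made for illustration.

```python
import numpy as np


def location_scale_harmonize(features, batch):
    """Align per-center feature distributions (simplified, NOT full ComBat).

    features: array of shape (n_nodules, n_features)
    batch:    array of shape (n_nodules,) with a center label per nodule
    Each center needs at least two nodules so a spread can be estimated.
    """
    features = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)

    # Common target location: mean over all nodules from all centers.
    grand_mean = features.mean(axis=0)

    # Common target spread: pooled within-center standard deviation.
    within = np.zeros(features.shape[1])
    for b in labels:
        rows = batch == b
        within += features[rows].var(axis=0, ddof=1) * (rows.sum() - 1)
    pooled_sd = np.sqrt(within / (len(features) - len(labels)))

    harmonized = np.empty_like(features)
    for b in labels:
        rows = batch == b
        mu = features[rows].mean(axis=0)
        sd = features[rows].std(axis=0, ddof=1)
        sd = np.where(sd > 0, sd, 1.0)  # guard against constant features
        # Standardize within the center, then map to the common scale.
        harmonized[rows] = (features[rows] - mu) / sd * pooled_sd + grand_mean
    return harmonized
```

After this adjustment every center shares the same per-feature mean and standard deviation, which removes center effects but would also remove genuine between-center differences in case mix; that trade-off is one reason dedicated methods such as ComBat model covariates explicitly.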

Results

Preliminary results showed good accuracy of the investigated models in distinguishing benign from malignant pulmonary nodules with both sets of radiomics features (i.e., non-harmonized and harmonized). The performance of the models, quantified as the area under the curve (AUC), was > 0.89 in the training set and > 0.82 in the external validation set for all the investigated scenarios, outperforming the clinical standard (AUC of 0.76). Slightly higher performance was observed for the SVM-LASSO model than for the ANN on the external dataset, although the difference was not statistically significant. For both models, and for both harmonized and non-harmonized features, no statistically significant difference was found between the receiver operating characteristic (ROC) curves of the training and test sets.
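For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen malignant nodule receives a higher model score than a randomly chosen benign one. The sketch below computes it directly from that definition; the function name and the label convention (1 = malignant, 0 = benign) are assumptions, and the scores in the usage note are made up rather than taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    labels: iterable of 0 (benign) or 1 (malignant)
    scores: iterable of model outputs, higher meaning more suspicious
    Ties between a malignant and a benign score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute an AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four malignant-benign pairs correctly. The pairwise loop is quadratic in the number of nodules, which is acceptable for cohorts of the size described here.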

Conclusions

Although no significant improvements were observed when applying the ComBat harmonization method, both the in-house and the literature-based models classified lung nodules with good generalization to an independent dataset, showing their potential as tools for clinical decision-making in lung cancer screening.

SUBMITTER: Garau N

PROVIDER: S-EPMC7708421 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature


Publications

External validation of radiomics-based predictive models in low-dose CT screening for early lung cancer diagnosis.

Garau N, Paganelli C, Summers P, Choi W, Alam S, Lu W, Fanciullo C, Bellomi M, Baroni G, Rampinelli C

Medical Physics, 2020-06-23, issue 9


PMID: 32488865

Similar Datasets

Project description:
Objective: Develop two fully automatic osteoporosis screening systems using deep learning (DL) and radiomics (Rad) techniques based on low-dose chest CT (LDCT) images and evaluate their diagnostic effectiveness.
Methods: In total, 434 patients who underwent LDCT and bone mineral density (BMD) examination were retrospectively enrolled and divided into the development set (n = 333) and temporal validation set (n = 101). An automatic thoracic vertebra cancellous bone (TVCB) segmentation model was developed. The Dice similarity coefficient (DSC) was used to evaluate the segmentation performance. Furthermore, the three-class Rad and DL models were developed to distinguish osteoporosis, osteopenia, and normal bone mass. The diagnostic performance of these models was evaluated using the receiver operating characteristic (ROC) curve and decision curve analysis (DCA).
Results: The automatic segmentation model achieved excellent segmentation performance, with a mean DSC of 0.96 ± 0.02 in the temporal validation set. The Rad model was used to identify osteoporosis, osteopenia, and normal BMD in the temporal validation set, with respective area under the receiver operating characteristic curve (AUC) values of 0.943, 0.801, and 0.932. The DL model achieved higher AUC values of 0.983, 0.906, and 0.969 for the same categories in the same validation set. The DeLong test affirmed that both models performed similarly in BMD assessment. However, the accuracy of the DL model is 81.2%, which is better than the 73.3% accuracy of the Rad model in the temporal validation set. Additionally, DCA indicated that the DL model provided a greater net benefit compared to the Rad model across the majority of the reasonable threshold probabilities.
Conclusions: The automated segmentation framework we developed can accurately segment cancellous bone on low-dose chest CT images. These predictive models, which are based on deep learning and radiomics, provided comparable diagnostic performance in automatic BMD assessment. Nevertheless, it is important to highlight that the DL model demonstrates higher accuracy and precision than the Rad model.
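The Dice similarity coefficient mentioned in this description measures the overlap between a predicted and a reference segmentation mask. A minimal sketch follows, assuming binary masks of identical shape; the function name and the convention of returning 1.0 when both masks are empty are illustrative choices, not details from the cited study.

```python
import numpy as np


def dice_coefficient(predicted, reference):
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|)."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    if predicted.shape != reference.shape:
        raise ValueError("masks must have the same shape")

    total = predicted.sum() + reference.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    overlap = np.logical_and(predicted, reference).sum()
    return float(2.0 * overlap / total)
```

A value of 1 means the masks coincide and 0 means they share no voxels, so the reported mean DSC of 0.96 indicates near-complete overlap.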

Project description: Radiomics analyses commonly apply imaging features of different complexity for the prediction of the endpoint of interest. However, the prognostic value of each feature class is generally unclear. Furthermore, many radiomics models lack independent external validation that is decisive for their clinical application. Therefore, in this manuscript we present two complementary studies. In our modelling study, we developed and validated different radiomics signatures for outcome prediction after neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC) based on computed tomography (CT) and T2-weighted (T2w) magnetic resonance (MR) imaging datasets of 4 independent institutions (training: 122, validation: 68 patients). We compared different feature classes extracted from the gross tumour volume for the prognosis of tumour response and freedom from distant metastases (FFDM): morphological and first order (MFO) features, second order texture (SOT) features, and Laplacian of Gaussian (LoG) transformed intensity features. Analyses were performed for CT and MRI separately and combined. Model performance was assessed by the area under the curve (AUC) and the concordance index (CI) for tumour response and FFDM, respectively. Overall, intensity features of LoG transformed CT and MR imaging combined with clinical T stage (cT) showed the best performance for tumour response prediction, while SOT features showed good performance for FFDM in independent validation (AUC = 0.70, CI = 0.69). In our external validation study, we aimed to validate previously published radiomics signatures on our multicentre cohort. We identified relevant publications on comparable patient datasets through a literature search and applied the reported radiomics models to our dataset. Only one of the identified studies could be validated, indicating an overall lack of reproducibility and the need for further standardization of radiomics before clinical application.

Project description:
Background: As a means to extract biomarkers from medical imaging, radiomics has attracted increased attention from researchers. However, reproducibility and performance of radiomics in low-dose CT scans are still poor, mostly due to noise. Deep learning generative models can be used to denoise these images and in turn improve radiomics' reproducibility and performance. However, most generative models are trained on paired data, which can be difficult or impossible to collect.
Purpose: In this article, we investigate the possibility of denoising low-dose CTs using cycle generative adversarial networks (GANs) to improve radiomics reproducibility and performance based on unpaired datasets.
Methods and materials: Two cycle GANs were trained: (1) from paired data, by simulating low-dose CTs (i.e., introducing noise) from high-dose CTs and (2) from unpaired real low-dose CTs. To accelerate convergence, during GAN training, a slice-paired training strategy was introduced. The trained GANs were applied to three scenarios: (1) improving radiomics reproducibility in simulated low-dose CT images, (2) improving radiomics reproducibility in same-day repeat low-dose CTs (RIDER dataset), and (3) improving radiomics performance in survival prediction. Cycle GAN results were compared with a conditional GAN (CGAN) and an encoder-decoder network (EDN) trained on simulated paired data.
Results: The cycle GAN trained on simulated data improved concordance correlation coefficients (CCC) of radiomic features from 0.87 (95% CI, [0.833, 0.901]) to 0.93 (95% CI, [0.916, 0.949]) on simulated noise CT and from 0.89 (95% CI, [0.881, 0.914]) to 0.92 (95% CI, [0.908, 0.937]) on the RIDER dataset, as well as improving the area under the receiver operating characteristic curve (AUC) of survival prediction from 0.52 (95% CI, [0.511, 0.538]) to 0.59 (95% CI, [0.578, 0.602]). The cycle GAN trained on real data increased the CCCs of features in RIDER to 0.95 (95% CI, [0.933, 0.961]) and the AUC of survival prediction to 0.58 (95% CI, [0.576, 0.596]).
Conclusion: The results show that cycle GANs trained on both simulated and real data can improve radiomics' reproducibility and performance in low-dose CT and achieve similar results compared to CGANs and EDNs.
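The concordance correlation coefficient used here to quantify feature reproducibility can be sketched as below. This follows Lin's standard definition with population (1/n) variances; the function name is an assumption, and the cited study may have computed it with different software or conventions.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's CCC: 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")

    mean_x, mean_y = x.mean(), y.mean()
    covariance = ((x - mean_x) * (y - mean_y)).mean()  # population covariance
    denominator = x.var() + y.var() + (mean_x - mean_y) ** 2
    return float(2.0 * covariance / denominator)
```

Unlike Pearson correlation, the CCC penalizes a systematic offset between repeated measurements: two feature vectors that differ by a constant shift are perfectly correlated but have a CCC below 1.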

Project description:
Background: Randomized controlled trials have evaluated the efficacy of low-dose CT (LDCT) lung cancer screening on lung cancer (LC) outcomes.
Objective: Meta-analyze LDCT lung cancer screening trials.
Methods: We identified studies by searching PubMed, Google Scholar, the Cochrane Registry, ClinicalTrials.gov, and reference lists from retrieved publications. We abstracted data on study design features, stage I LC diagnoses, LC and overall mortality, false positive results, harm from invasive diagnostic procedures, overdiagnosis, and significant incidental findings. We assessed study quality using the Cochrane risk-of-bias tool. We used random-effects models to calculate relative risks and assessed effect modulators with subgroup analyses and meta-regression.
Results: We identified 9 studies that enrolled 96,559 subjects. The risk of bias across studies was judged to be low. Overall, LDCT screening significantly increased the detection of stage I LC, RR = 2.93 (95% CI, 2.16-3.98), I² = 19%, and reduced LC mortality, RR = 0.84 (95% CI, 0.75-0.93), I² = 0%. The number needed to screen to prevent an LC death was 265. Women had a lower risk of LC death (RR = 0.69, 95% CI, 0.40-1.21) than men (RR = 0.86, 95% CI, 0.66-1.13), p value for interaction = 0.11. LDCT screening did not reduce overall mortality, RR = 0.96 (95% CI, 0.91-1.01), I² = 0%. The pooled false positive rate was 8% (95% CI, 4-18); subjects with false positive results had < 1 in 1000 risk of major complications following invasive diagnostic procedures. The most valid estimates for overdiagnosis and significant incidental findings were 8.9% and 7.5%, respectively.
Discussion: LDCT screening significantly reduced LC mortality, though not overall mortality, with women appearing to benefit more than men. The estimated risks for false positive results, screening complications, overdiagnosis, and incidental findings were low. Long-term survival data were available only for North American and European studies, limiting generalizability.
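To make the random-effects pooling and the I² statistic in this description more concrete, the sketch below implements the DerSimonian-Laird estimator on log relative risks. The description does not state which random-effects estimator the authors used, so this choice is an assumption, as are the function name and the returned dictionary keys.

```python
import math


def dersimonian_laird(log_rr, variances):
    """Pool per-study log relative risks with a DerSimonian-Laird model.

    log_rr:    list of per-study log relative risks
    variances: list of their within-study variances
    Returns the pooled RR, its 95% confidence interval, tau^2 and I^2 (%).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_rr)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_rr))
    df = len(log_rr) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add the between-study variance tau^2.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, log_rr)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

When the studies agree closely, Q falls below its degrees of freedom, tau² and I² are truncated to zero, and the result coincides with a fixed-effect analysis, which is consistent with the I² = 0% values reported for the mortality outcomes.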

Project description:
Purpose: To develop a radiomics prediction model to improve pulmonary nodule (PN) classification in low-dose CT. To compare the model with the American College of Radiology (ACR) Lung CT Screening Reporting and Data System (Lung-RADS) for early detection of lung cancer.
Methods: We examined a set of 72 PNs (31 benign and 41 malignant) from the Lung Image Database Consortium image collection (LIDC-IDRI). One hundred three CT radiomic features were extracted from each PN. Before the model building process, distinctive features were identified using a hierarchical clustering method. We then constructed a prediction model by using a support vector machine (SVM) classifier coupled with a least absolute shrinkage and selection operator (LASSO). A tenfold cross-validation (CV) was repeated ten times (10 × 10-fold CV) to evaluate the accuracy of the SVM-LASSO model. Finally, the best model from the 10 × 10-fold CV was further evaluated using 20 × 5- and 50 × 2-fold CVs.
Results: The best SVM-LASSO model consisted of only two features: the bounding box anterior-posterior dimension (BB_AP) and the standard deviation of inverse difference moment (SD_IDM). The BB_AP measured the extension of a PN in the anterior-posterior direction and was highly correlated (r = 0.94) with the PN size. The SD_IDM was a texture feature that measured the directional variation of the local homogeneity feature IDM. Univariate analysis showed that both features were statistically significant and discriminative (P = 0.00013 and 0.000038, respectively). PNs with larger BB_AP or smaller SD_IDM were more likely to be malignant. The 10 × 10-fold CV of the best SVM model using the two features achieved an accuracy of 84.6% and 0.89 AUC. By comparison, Lung-RADS achieved an accuracy of 72.2% and 0.77 AUC using four features (size, type, calcification, and spiculation). The prediction improvement of SVM-LASSO compared to Lung-RADS was statistically significant (McNemar's test P = 0.026). Lung-RADS misclassified 19 cases because it was mainly based on PN size, whereas the SVM-LASSO model correctly classified 10 of these cases by combining a size (BB_AP) feature and a texture (SD_IDM) feature. The performance of the SVM-LASSO model was stable when leaving more patients out with five- and twofold CVs (accuracy 84.1% and 81.6%, respectively).
Conclusion: We developed an SVM-LASSO model to predict malignancy of PNs with two CT radiomic features. We demonstrated that the model achieved an accuracy of 84.6%, which was 12.4% higher than Lung-RADS.
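McNemar's test, used in this description to compare the two classifiers on the same nodules, depends only on the discordant cases, that is, nodules that one method classified correctly and the other did not. The sketch below implements the exact (binomial) form; the description does not say whether the exact or the chi-squared version was used, and the function and argument names are illustrative.

```python
from math import comb


def mcnemar_exact(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant counts.

    only_a_correct: cases classifier A got right and classifier B got wrong
    only_b_correct: cases classifier B got right and classifier A got wrong
    Under the null hypothesis each discordant case is a fair coin flip.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant cases, no evidence of a difference

    smaller = min(only_a_correct, only_b_correct)
    # Binomial(n, 0.5) probability of a split at least this lopsided.
    tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Cases that both methods classify correctly, or both misclassify, do not enter the calculation, which is why the test suits paired comparisons on a shared dataset such as the 72 nodules described here.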
