Dataset Information

The Challenge of Choosing the Best Classification Method in Radiomic Analyses: Recommendations and Applications to Lung Cancer CT Images.

ABSTRACT: Radiomics uses high-dimensional sets of imaging features to predict biological characteristics of tumors and clinical outcomes. The choice of the algorithm used to analyze radiomic features and make predictions has a strong impact on the results, so identifying adequate machine learning methods for radiomic applications is crucial. In this study we aim to identify suitable analysis approaches for radiomic-based binary predictions according to sample size, outcome balancing and the strength of the feature-outcome association. Simulated data were generated by reproducing the correlation structure among 168 radiomic features extracted from computed tomography (CT) images of 270 non-small-cell lung cancer (NSCLC) patients and their association with lymph node status. The performance of six classifiers combined with six feature selection (FS) methods was assessed on the simulated data using the AUC (area under the receiver operating characteristic curve), sensitivity and specificity. For all FS methods and regardless of association strength, the tree-based classifiers Random Forest and Extreme Gradient Boosting performed well (AUC ≥ 0.73) and showed the best trade-off between sensitivity and specificity. With small samples, performance was generally lower than with medium-to-large samples and varied more. FS methods generally did not improve performance. Thus, in radiomic studies, we suggest evaluating the choice of FS method and classifier in light of the specific sample size, outcome balancing and association strength.
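The abstract describes simulating correlated features linked to a binary outcome and scoring predictions with AUC, sensitivity and specificity. The sketch below illustrates those ideas under stated assumptions; it is not the authors' simulation code. It assumes multivariate normal features with a given correlation matrix and a logistic outcome model, where the coefficient vector `beta` stands in for association strength and `intercept` shifts the event rate (outcome balancing). All function names are hypothetical, and the true risk is used as a stand-in for a fitted classifier.

```python
import numpy as np


def simulate_radiomic_data(corr, n, beta, intercept=0.0, seed=0):
    """Simulate n patients with correlated standardized features and a binary outcome.

    Assumption (not the authors' exact procedure): features are multivariate normal
    with correlation matrix `corr`; the outcome follows a logistic model whose
    coefficients `beta` set the association strength; `intercept` shifts the
    event rate, i.e. controls outcome balancing.
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(corr)  # requires a positive-definite correlation matrix
    X = rng.standard_normal((n, corr.shape[0])) @ chol.T
    p = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
    y = (rng.random(n) < p).astype(int)
    return X, y


def auc(scores, y):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


def sensitivity_specificity(scores, y, threshold=0.5):
    """Sensitivity = TP / positives, specificity = TN / negatives at a fixed threshold."""
    pred = scores >= threshold
    sens = np.sum(pred & (y == 1)) / np.sum(y == 1)
    spec = np.sum(~pred & (y == 0)) / np.sum(y == 0)
    return sens, spec


# Usage: 5 features with exchangeable correlation 0.3; only the first two carry signal.
corr = np.full((5, 5), 0.3) + 0.7 * np.eye(5)
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
X, y = simulate_radiomic_data(corr, n=2000, beta=beta)
oracle = 1.0 / (1.0 + np.exp(-(X @ beta)))  # true risk, a stand-in for a fitted classifier
print(auc(oracle, y), sensitivity_specificity(oracle, y))
```

Raising the magnitude of `beta` strengthens the feature-outcome association, and a negative `intercept` makes positives rarer, which is one simple way to explore the balancing and association-strength scenarios the abstract mentions.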

SUBMITTER: Corso F

PROVIDER: S-EPMC8234634 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature


Publications

The Challenge of Choosing the Best Classification Method in Radiomic Analyses: Recommendations and Applications to Lung Cancer CT Images.

Federica Corso, Giulia Tini, Giuliana Lo Presti, Noemi Garau, Simone Pietro De Angelis, Federica Bellerba, Lisa Rinaldi, Francesca Botta, Stefania Rizzo, Daniela Origgi, Chiara Paganelli, Marta Cremonesi, Cristiano Rampinelli, Massimo Bellomi, Luca Mazzarella, Pier Giuseppe Pelicci, Sara Gandini, Sara Raimondi

Cancers, 2021 Jun 21, issue 12


PMID: 34205631

Similar Datasets

Project description:
Purpose: Accurate lesion segmentation is a prerequisite for radiomic feature extraction. It helps reduce feature variability and thereby improves the reporting quality of radiomics studies. In this research, we aimed to test the reproducibility of radiomic features under inter-/intra-observer delineation variability in hepatocellular carcinoma (HCC) using 3D-CT images, 4D-CT images and multiple-parameter MR images.
Materials and methods: This retrospective study included 19 HCC patients who underwent 3D-CT, 4D-CT and multiple-parameter MR scans. The gross tumor volume (GTV) was independently delineated twice by two observers on contrast-enhanced computed tomography (CECT), maximum intensity projection (MIP), LAVA-Flex, T2W FRFSE and DWI-EPI images. We also delineated the peritumoral region, defined as the 0 to 5 mm margin surrounding the GTV. A total of 107 radiomic features were automatically extracted from CECT images using 3D-Slicer software. The quartile coefficient of dispersion (QCD) and the intraclass correlation coefficient (ICC) were used to assess the variability of each radiomic feature; QCD<10% was taken to indicate small variation and ICC≥0.75 excellent reliability. Finally, principal component analysis (PCA) was used to test the feasibility of dimensionality reduction.
Results: For tumor tissues, the numbers of radiomic features with QCD<10% showed no obvious inter-/intra-observer differences across 3D-CT, 4D-CT and multiple-parameter MR delineations. However, the number of radiomic features with ICC≥0.75 was highest in the multiple-parameter MR group (mean 89), followed by the 3D-CT group (mean 77) and the MIP group (mean 73). Peritumoral tissues showed similar results. In total, 15 and 7 radiomic features showed excellent reproducibility and small variation in tumor and peritumoral tissues, respectively. Two robust features showed excellent reproducibility and small variation in both tumor and peritumoral tissues, and the values of both features differed significantly between tumor and peritumoral tissues (P<0.05). PCA indicated that the first seven principal components preserved at least 90% of the variance of the original feature set.
Conclusion: Delineation on multiple-parameter MR images could help improve the reproducibility of HCC CT radiomic features and reduce the inter-/intra-observer influence.
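The reproducibility criteria above (QCD<10%, ICC≥0.75) and the PCA check can be illustrated with a short sketch. The description does not state which ICC form was used, so this assumes the two-way random-effects, absolute-agreement, single-rater form commonly written ICC(2,1); QCD is taken as (Q3 − Q1)/(Q3 + Q1). Function names are hypothetical, and the input arrays are toy values, not study data.

```python
import numpy as np


def qcd(values):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    q1, q3 = np.percentile(values, [25, 75])
    return (q3 - q1) / (q3 + q1)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater (assumed form).

    ratings: array of shape (n_subjects, k_observers) holding one feature's values.
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)  # between-subject
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)  # between-observer
    ss_err = np.sum((Y - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def n_components_for_variance(X, target=0.90):
    """Smallest number of principal components whose cumulative explained variance reaches target."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(explained), target) + 1)


# Usage on toy data: observer 2 reads every value 1 unit higher than observer 1.
ratings = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
print(qcd(ratings[:, 0]) < 0.10, icc_2_1(ratings) >= 0.75)
```

Because ICC(2,1) measures absolute agreement, a constant offset between observers lowers it even when their readings are perfectly correlated; a consistency-type ICC would not penalize that offset, which is why the chosen form matters when applying the 0.75 cut-off.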

Project description: Artificial intelligence and emerging data science techniques are being leveraged to interpret medical image scans. Traditional image analysis relies on visual interpretation by a trained radiologist, which is time-consuming and can be somewhat subjective. The development of reliable, automated diagnostic tools is a key goal of radiomics, a fast-growing research field that combines medical imaging with personalized medicine. Radiomic studies have demonstrated potential for accurate lung cancer diagnosis and prognosis. Segmentation, the delineation of the tumor region of interest, is a key bottleneck in developing generalized classification models. In this study, the incremental multiple resolution residual network (iMRRN), a publicly available, pre-trained deep learning segmentation model, was applied to automatically segment CT images from 355 lung cancer patients in the "Lung-PET-CT-Dx" dataset, obtained from The Cancer Imaging Archive (TCIA), an open-access source of radiological images. We report a failure rate of 4.35% when using the iMRRN to segment tumor lesions in plain CT images from this dataset. Seven classification algorithms were trained on the extracted radiomic features and tested on their ability to classify lung cancer subtypes. Oversampling was used to handle class imbalance. Chi-square tests identified higher-order texture features as the most predictive for subtype classification. The support vector machine achieved the highest accuracy, 92.7% (AUC 0.97), when classifying three histological subtypes of lung cancer: adenocarcinoma, small cell carcinoma and squamous cell carcinoma. The results demonstrate the potential of AI-based computer-aided diagnostic tools to automatically diagnose lung cancer subtypes by coupling deep learning image segmentation with supervised classification.
Our study demonstrated the integrated application of existing AI techniques to the non-invasive and effective diagnosis of lung cancer subtypes, and also shed light on several practical issues concerning the application of AI in biomedicine.
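The description mentions oversampling for imbalanced classes and chi-square tests for ranking features, without specifying either variant. The sketch below assumes plain random oversampling (duplicating minority-class rows) and a chi-square score that compares class-wise sums of non-negative feature values with their expectation under independence, the same construction used by scikit-learn's `sklearn.feature_selection.chi2`. Function names and data are hypothetical illustrations, not the study's pipeline.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, cnt in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - cnt, replace=True)  # sampled with replacement
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def chi2_scores(X, y):
    """Chi-square statistic per non-negative feature: class-wise feature sums vs. expectation."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("chi2 scoring requires non-negative features")
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot labels, shape (n, n_classes)
    observed = Y.T @ X                                    # per-class sum of each feature
    expected = np.outer(Y.mean(axis=0), X.sum(axis=0))    # class share times feature total
    return np.sum((observed - expected) ** 2 / expected, axis=0)


# Usage: feature 0 separates the classes, feature 1 is constant and uninformative.
X = np.array([[1, 1], [1, 1], [0, 1], [0, 1]])
y = np.array([0, 0, 1, 1])
print(chi2_scores(X, y))
```

Oversampling should be applied only to the training folds; duplicating rows before a train/test split lets copies of the same patient appear on both sides and inflates test performance. A feature that is zero for every sample gives a zero expectation and therefore an undefined score.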

OmicsDI is part of the ELIXIR infrastructure
