Dataset Information

Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data.

ABSTRACT: Radiomics utilizes a large number of image-derived features for quantifying tumor characteristics that can in turn be correlated with response and prognosis. Unfortunately, extraction and analysis of such image-based features is subject to measurement variability and bias. The challenge for radiomics is particularly acute in Positron Emission Tomography (PET), where limited resolution, a high noise component related to the limited stochastic nature of the raw data, and the wide variety of reconstruction options confound quantitative feature metrics. Extracted feature quality is also affected by the tumor segmentation methods used to define the regions over which features are calculated, making it challenging to produce consistent radiomics analysis results across multiple institutions that use different segmentation algorithms in their PET image analysis. Understanding each element contributing to these inconsistencies in quantitative image feature and metric generation is paramount for the ultimate utilization of these methods in multi-institutional trials and clinical oncology decision making.

To assess segmentation quality and consistency at the multi-institutional level, we conducted a study of seven institutional members of the National Cancer Institute Quantitative Imaging Network. For the study, members were asked to segment a common set of phantom PET scans acquired over a range of imaging conditions as well as a second set of head and neck cancer (HNC) PET scans. Segmentations were generated at each institution using its preferred approach. In addition, participants were asked to repeat segmentations with a time interval between the initial and repeat segmentation. This procedure yielded a total of 806 phantom insert and 641 lesion segmentations. Subsequently, the volume was computed from each segmentation and compared to the corresponding reference volume by means of statistical analysis.

On the two test sets (phantom and HNC PET scans), the performance of the seven segmentation approaches was as follows. On the phantom test set, the mean relative volume errors ranged from 29.9 to 87.8% of the ground truth reference volumes, and the repeat difference for each institution ranged from -36.4 to 39.9%. On the HNC test set, the mean relative volume error ranged from -50.5 to 701.5%, and the repeat difference for each institution ranged from -37.7 to 31.5%. In addition, performance measures per phantom insert/lesion size category are given in the paper. On phantom data, regression analysis resulted in coefficient of variation (CV) components of 42.5% for scanners, 26.8% for institutional approaches, 21.1% for repeated segmentations, 14.3% for relative contrasts, 5.3% for count statistics (acquisition times), and 0.0% for repeated scans. Analysis showed that the CV components for approaches and repeated segmentations were significantly larger on the HNC test set, with increases of 112.7% and 102.4%, respectively.

Analysis results underline the importance of PET scanner reconstruction harmonization and imaging protocol standardization for quantification of lesion volumes. In addition, to enable a distributed multi-site analysis of FDG PET images, harmonization of analysis approaches and operator training in combination with highly automated segmentation methods seem advisable. Future work will focus on quantifying the impact of segmentation variation on radiomics system performance.
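The abstract reports relative volume errors and repeat differences as percentages but does not spell out the formulas. The following is a minimal sketch, assuming the relative volume error is the signed difference between segmented and reference volume divided by the reference volume, and assuming the repeat difference is the change in that error between the initial and repeat segmentation. Both function names and the example volumes are hypothetical; the paper's exact definitions may differ.

```python
def relative_volume_error(segmented_ml, reference_ml):
    """Signed volume error as a percentage of the reference volume (assumed definition)."""
    if reference_ml <= 0:
        raise ValueError("reference volume must be positive")
    return 100.0 * (segmented_ml - reference_ml) / reference_ml


def repeat_difference(initial_ml, repeat_ml, reference_ml):
    """Change in relative volume error between initial and repeat segmentation.

    Assumption: the difference is taken between the two percentage errors;
    this is an illustrative choice, not the paper's stated method.
    """
    return (relative_volume_error(repeat_ml, reference_ml)
            - relative_volume_error(initial_ml, reference_ml))


# Hypothetical phantom insert with a 10 mL reference volume.
initial_error = relative_volume_error(12.0, 10.0)
change_on_repeat = repeat_difference(12.0, 9.0, 10.0)
print(f"initial error: {initial_error:.1f}%, repeat difference: {change_on_repeat:.1f}%")
```

A positive error indicates over-segmentation relative to the reference volume, and a negative error indicates under-segmentation.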

SUBMITTER: Beichel RR

PROVIDER: S-EPMC5834232 | biostudies-literature | 2017 Feb

REPOSITORIES: biostudies-literature

Publications

Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data.

Beichel RR, Smith BJ, Bauer C, Ulrich EJ, Ahmadvand P, Budzevich MM, Gillies RJ, Goldgof D, Grkovski M, Hamarneh G, Huang Q, Kinahan PE, Laymon CM, Mountz JM, Muzi JP, Muzi M, Nehmeh S, Oborski MJ, Tan Y, Zhao B, Sunderland JJ, Buatti JM

Medical Physics, 2017 Feb 1, issue 2

PMID: 28205306

Similar Datasets

Project description: Background: We assessed and compared image quality obtained with clinical 18F-FDG whole-body oncologic PET protocols used in three different, state-of-the-art digital PET/CT and two conventional PMT-based PET/CT devices. Our goal was to evaluate an improved trade-off between administered activity (patient dose exposure/signal-to-noise ratio) and acquisition time (patient comfort) while preserving diagnostic information achievable with the recently introduced digital detector technology compared to previous analogue PET technology. Methods: We performed list-mode (LM) PET acquisitions using a NEMA/IEC NU2 phantom, with activity concentrations of 5 kBq/mL and 25 kBq/mL for the background (9.5 L) and sphere inserts, respectively. For each device, reconstructions were obtained varying the image statistics (10, 30, 60, 90, 120, 180, and 300 s from LM data) and the number of iterations (range 1 to 10) in addition to the employed local clinical protocol setup. We measured for each reconstructed dataset: the quantitative cross-calibration, the image noise on the uniform background assessed by the coefficient of variation (COV), and the recovery coefficients (RCs) evaluated in the hot spheres. Additionally, we compared the characteristic time-activity-product (TAP), that is, the product of scan time per bed position × mass-activity administered (in min·MBq/kg), across datasets. Results: Good system cross-calibration was obtained for all tested datasets, with < 6% deviation from the expected value. For all clinical protocol settings, image noise was compatible with clinical interpretation (COV < 15%). Digital PET showed an improved background signal-to-noise ratio as compared to conventional PMT-based PET. RCs were comparable between digital and PMT-based PET datasets. Compared to PMT-based PET, digital systems provided comparable image quality with lower TAP (from ~40% less up to 70% less). Conclusions: This study compared the achievable clinical image quality in three state-of-the-art digital PET/CT devices (from different vendors) as well as in two conventional PMT-based PET/CT devices. Reported results show that a comparable image quality is achievable with a TAP reduction of ~40% in digital PET. This could lead to a significant reduction of the administered mass-activity and/or scan time, with direct benefits in terms of dose exposure and patient comfort.
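The time-activity-product described above is a simple product, and a short worked example may make the unit handling clearer. This is a sketch only: the function name, the input units (seconds, MBq, kg), and the example protocol values are assumptions chosen for illustration and are not taken from the study.

```python
def time_activity_product(scan_time_s, activity_mbq, body_mass_kg):
    """TAP in min*MBq/kg: scan time per bed position times administered activity per unit mass."""
    if scan_time_s <= 0 or activity_mbq <= 0 or body_mass_kg <= 0:
        raise ValueError("all inputs must be positive")
    scan_time_min = scan_time_s / 60.0
    mass_activity = activity_mbq / body_mass_kg  # MBq/kg
    return scan_time_min * mass_activity


# Hypothetical reference protocol: 120 s per bed position, 210 MBq, 70 kg patient.
reference_tap = time_activity_product(120, 210, 70)

# A ~40% lower TAP can be reached by shortening the scan, lowering the activity, or both.
reduced_tap = reference_tap * (1 - 0.40)
print(f"reference TAP: {reference_tap:.2f}, reduced TAP: {reduced_tap:.2f} min*MBq/kg")
```

Because TAP is a product, the same reduction can be spent on shorter scans, on lower administered activity, or on a mix of the two.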

Project description: BACKGROUND: Today, the standardized uptake value (SUV) is essentially the only means for quantitative evaluation of static [18F-]fluorodeoxyglucose (FDG) positron emission tomography (PET) investigations. However, the SUV approach has several well-known shortcomings which adversely affect the reliability of the SUV as a surrogate of the metabolic rate of glucose consumption. The standard uptake ratio (SUR), i.e., the uptake time-corrected ratio of tumor SUV to image-derived arterial blood SUV, has been shown in the first clinical studies to overcome most of these shortcomings, to decrease test-retest variability, and to increase the prognostic value in comparison to SUV. However, it is unclear to what extent the SUR approach is vulnerable to observer variability of the additionally required blood SUV (BSUV) determination. The goal of the present work was the investigation of the interobserver variability of image-derived BSUV. METHODS: FDG PET/CT scans from 83 patients (72 male, 11 female) with non-small cell lung cancer (N = 46) or head and neck cancer (N = 37) were included. BSUV was determined by 8 individuals, each applying a dedicated delineation tool for the BSUV determination in the aorta. Two of the observers applied two further tools. Altogether, five different delineation tools were used. With each tool used, delineation was performed for the whole patient group, resulting in 12 distinct observations per patient. Intersubject variability of BSUV determination was assessed using the fractional deviations for the individual patients from the patient group average and was quantified as standard deviation (SD_is), 95% confidence interval, and range. Interobserver variability of BSUV determination was assessed using the fractional deviations of the individual observers from the observer average for the considered patient and quantified as standard deviations (SD_p, SD_d) or root mean square (RMS), 95% confidence interval, and range in each patient, each observer, and the pooled data, respectively. RESULTS: Interobserver variability in the pooled data amounts to RMS = 2.8% and is much smaller than the intersubject variability of BSUV (SD_is = 16%). Averaged over the whole patient group, deviations of individual observers from the observer average are very small and fall in the range [-0.96, 1.05]%. However, interobserver variability differs distinctly between patients, covering a range of [0.7, 7.4]% in the investigated patient group. CONCLUSION: The present investigation demonstrates that the image-based manual determination of BSUV in the aorta is sufficiently reproducible across different observers and delineation tools, which is a prerequisite for accurate SUR determination. This finding is in line with the already demonstrated superior prognostic value of SUR in comparison to SUV in the first clinical studies.

Project description: Background: New-generation silicon-photomultiplier (SiPM)-based PET/CT systems exhibit improved lesion detectability and image quality due to a higher detector sensitivity. Consequently, the acquisition time can be reduced while maintaining diagnostic quality. The aim of this study was to determine the lowest 18F-FDG PET acquisition time without loss of diagnostic information and to optimise image reconstruction parameters (image reconstruction algorithm, number of iterations, voxel size, Gaussian filter) by phantom imaging. Moreover, patient data were evaluated to confirm the phantom results. Methods: Three phantoms were used: a soft-tissue tumour phantom, a bone-lung tumour phantom, and a resolution phantom. Phantom conditions (lesion sizes from 6.5 mm to 28.8 mm in diameter, lesion activity concentration of 15 kBq/mL, and signal-to-background ratio of 5:1) were derived from patient data. PET data were acquired on an SiPM-based Biograph Vision PET/CT system for 10 min in list-mode format and resampled into time frames from 30 to 300 s in 30-s increments to simulate different acquisition times. Different image reconstructions with varying iterations, voxel sizes, and Gaussian filters were probed. Contrast-to-noise ratio (CNR), maximum, and peak signal were evaluated using the 10-min acquisition time image as reference. A threshold CNR value ≥ 5 and a maximum (peak) deviation of ± 20% were considered acceptable. Twenty patient data sets were evaluated regarding lesion quantification as well as agreement and correlation between reduced and full acquisition time standard uptake values (assessed by Pearson correlation coefficient, intraclass correlation coefficient, Bland–Altman analyses, and Krippendorff’s alpha). Results: An acquisition time of 60 s per bed position yielded acceptable detectability and quantification results for clinically relevant phantom lesions ≥ 9.7 mm in diameter using OSEM-TOF or OSEM-TOF+PSF image reconstruction, a 4-mm Gaussian filter, and a 1.65 × 1.65 × 2.00 mm³ or 3.30 × 3.30 × 3.00 mm³ voxel size. Correlation and agreement of patient lesion quantification between full and reduced acquisition times were excellent. Conclusion: A threefold reduction in acquisition time is possible. Patients might benefit from more comfortable examinations or, if the applied activity is reduced instead of the acquisition time, from reduced radiation exposure. Supplementary Information: The online version contains supplementary material available at 10.1186/s12885-022-09993-4.

Project description: Background: Digital anthropomorphic phantoms, such as the 4D extended cardiac-torso (XCAT) phantom, are actively used to develop, optimize, and evaluate a variety of imaging applications, allowing for realistic patient modeling and knowledge of ground truth. The XCAT phantom defines the activity and attenuation for a simulated patient, which includes a complete set of organs, muscle, bone, and soft tissue, while also accounting for cardiac and respiratory motion. However, the XCAT phantom does not currently include the lymphatic system, which is critical for evaluating medical imaging tasks such as sentinel node detection, node density measurement, and radiation dosimetry. Purpose: In this study, we aimed to develop a scalable lymphatic system in the XCAT phantom to facilitate improved research of the lymphatic system in medical imaging. Using this scalable lymphatic system, we modeled the lymph node conglomerate pathology that is characteristically observed in primary mediastinal B-cell lymphoma (PMBCL). As an extended application, we evaluated positron emission tomography (PET) image quantification of metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of these simulated lymphomas, though the phantoms may be applied to other imaging modalities and study design paradigms (e.g., image quality, detection). Methods: A template model for the lymphatic system was developed based on anatomical data from the Visible Human Project of the National Library of Medicine. The segmented nodes and vessels were fit with non-uniform rational basis spline surfaces, and multichannel large deformation diffeomorphic metric mapping was used to propagate the template to different XCAT anatomies. To model the conglomerates observed in PMBCL, lymph nodes were enlarged and converged within the mediastinum, and their tracer concentration was increased. We used the phantoms as inputs to a PET simulation tool, which generated images using ordered subsets expectation maximization reconstruction with 2-8 mm Gaussian filters. Fixed thresholding (FT) and gradient segmentation were used to determine MTV and TLG. Percent bias (%Bias) and coefficient of variation (COV) were computed as measures of accuracy and precision, respectively, for each MTV and TLG measurement. Results: Using the methodology described above, we introduced a scalable lymphatic system in the XCAT phantom, which allows the radioactivity and attenuation ground truth to be generated in 116 ± 2.5 s using a 2.3 GHz processor. Within the Rhinoceros interface, lymph node anatomy and function were modified to create a cohort of 10 phantoms with lymph node conglomerates. Using the lymphoma phantoms to evaluate PET quantification of MTV, mean %Bias values were -9.3%, -41.3%, and 20.9%, while COV values were 4.08%, 7.6%, and 3.4% using 25% FT, 40% FT, and gradient segmentations, respectively. For TLG, by comparison, mean %Bias values were -27.4%, -45.8%, and -16.0%, while COV values were 1.9%, 5.7%, and 1.4% for the 25% FT, 40% FT, and gradient segmentations, respectively. Conclusions: In this work, we upgraded the XCAT phantom to include a lymphatic system comprising a network of 276 scalable lymph nodes and corresponding vessels. As an application, we created a cohort of phantoms with lymph node conglomerates to evaluate lymphoma quantification in PET imaging.
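Percent bias and coefficient of variation are named above as the accuracy and precision measures without their formulas. The sketch below assumes %Bias is the mean signed relative deviation from ground truth and COV is the sample standard deviation divided by the mean, both in percent. These are common conventions rather than definitions confirmed by the source, and the function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def percent_bias(measured, truth):
    """Mean signed relative deviation from ground truth, in percent (assumed definition)."""
    if len(measured) != len(truth) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * mean((m - t) / t for m, t in zip(measured, truth))


def cov_percent(values):
    """Sample standard deviation divided by the mean, in percent (assumed definition)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical MTV measurements (mL) for three phantoms with known ground truth.
measured_mtv = [80.0, 90.0, 85.0]
true_mtv = [100.0, 100.0, 100.0]
print(f"%Bias: {percent_bias(measured_mtv, true_mtv):.1f}%")
print(f"COV:   {cov_percent(measured_mtv):.1f}%")
```

A negative %Bias indicates systematic underestimation, while COV describes the spread of the measurements independently of how far they sit from the truth.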

Project description: BACKGROUND: [18F]fluoro-2-deoxy-D-glucose ([18F]FDG) positron emission tomography (PET) is a valuable tool for monitoring response to therapy in oncology. In longitudinal studies, however, patients are not scanned in exactly the same position. Rigid and non-rigid image registration can be applied in order to reuse baseline volumes of interest (VOI) on consecutive studies of the same patient. The purpose of this study was to investigate the impact of various image registration strategies on standardized uptake value (SUV) and metabolic volume test-retest variability (TRT). METHODS: Test-retest whole-body [18F]FDG PET/CT scans were collected retrospectively for 11 subjects with advanced gastrointestinal malignancies (colorectal carcinoma). Rigid and non-rigid image registration techniques with various degrees of locality were applied to PET, CT, and non-attenuation corrected PET (NAC) data. VOI were drawn independently on both test and retest scans. VOI drawn on test scans were projected onto retest scans, and the overlap between projected VOI and manually drawn retest VOI was quantified using the Dice similarity coefficient (DSC). In addition, absolute (unsigned) differences in TRT of SUVmax, SUVmean, metabolic volume, and total lesion glycolysis (TLG) were calculated between the test VOI on the one hand and the retest VOI and projected VOI on the other. Reference values were obtained by delineating VOI on both scans separately. RESULTS: Non-rigid PET registration showed the best performance (median DSC: 0.82, other methods: 0.71-0.81). Compared with the reference, none of the registration types showed significant absolute differences in TRT of SUVmax, SUVmean, and TLG (p > 0.05). Only for absolute TRT of metabolic volume were significantly lower values (p < 0.05) observed for all registration strategies when compared to delineating VOI separately, except for non-rigid PET registrations (p = 0.1). Non-rigid PET registration provided good volume TRT (7.7%) that was smaller than the reference (16%). CONCLUSION: In particular, non-rigid PET image registration showed good performance similar to delineating VOI on both scans separately, and with smaller TRT in metabolic volume estimates.
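The Dice similarity coefficient used above to score VOI overlap is commonly defined as DSC = 2|A ∩ B| / (|A| + |B|) for two voxel sets A and B. The following minimal sketch represents each VOI as a set of voxel index tuples. The function name, the choice to return 1.0 for two empty VOI, and the toy voxel coordinates are assumptions for illustration, not details from the study.

```python
def dice_similarity(voi_a, voi_b):
    """Dice similarity coefficient between two VOI given as sets of voxel indices."""
    voi_a, voi_b = set(voi_a), set(voi_b)
    total = len(voi_a) + len(voi_b)
    if total == 0:
        # Assumption: two empty VOI are treated as a perfect match.
        return 1.0
    return 2.0 * len(voi_a & voi_b) / total


# Hypothetical projected and manually drawn VOI sharing two of three voxels each.
projected = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
manual = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
print(f"DSC: {dice_similarity(projected, manual):.2f}")
```

A DSC of 1 means the two VOI coincide exactly and 0 means they share no voxels, so the reported median of 0.82 corresponds to substantial but imperfect overlap.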
