Dataset Information

Addressing Measurement Error in Random Forests Using Quantitative Bias Analysis.

ABSTRACT: Although variables are often measured with error, the impact of measurement error on machine-learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on the performance of random-forest models and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random-forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the National Comorbidity Survey Replication (2001-2003). Second, we created simulated data sets in which we knew the true model performance and variable importance measures and could verify that quantitative bias analysis was recovering the truth in misclassified versions of the data sets. Our findings showed that measurement error in the data used to construct random forests can distort model performance and variable importance measures and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
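As a rough illustration of the idea in the abstract, the following sketch simulates nondifferential misclassification of a single binary predictor and then applies a simple bias-analysis correction to its prevalence. It is not the authors' method or code and does not fit a random forest. The sensitivity (0.80), specificity (0.95), true prevalence (0.30), and the function names `misclassify` and `corrected_prevalence` are all assumptions chosen for the example.

```python
import random


def misclassify(values, sensitivity, specificity, rng):
    """Return a misclassified copy of a list of binary (0/1) values.

    A true 1 is recorded as 1 with probability `sensitivity`;
    a true 0 is recorded as 0 with probability `specificity`.
    """
    observed = []
    for v in values:
        if v == 1:
            observed.append(1 if rng.random() < sensitivity else 0)
        else:
            observed.append(0 if rng.random() < specificity else 1)
    return observed


def corrected_prevalence(observed_prev, sensitivity, specificity):
    """Back-calculate the true prevalence from the observed prevalence.

    Inverts observed = true * Se + (1 - true) * (1 - Sp).
    Requires Se + Sp > 1.
    """
    return (observed_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)


# Hypothetical bias parameters and prevalence (assumptions, not from the study).
SENSITIVITY = 0.80
SPECIFICITY = 0.95
TRUE_PREVALENCE = 0.30

rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_x = [1 if rng.random() < TRUE_PREVALENCE else 0 for _ in range(100_000)]
obs_x = misclassify(true_x, SENSITIVITY, SPECIFICITY, rng)

true_prev = sum(true_x) / len(true_x)
obs_prev = sum(obs_x) / len(obs_x)
adj_prev = corrected_prevalence(obs_prev, SENSITIVITY, SPECIFICITY)

print(f"true prevalence:      {true_prev:.3f}")
print(f"observed prevalence:  {obs_prev:.3f}")
print(f"corrected prevalence: {adj_prev:.3f}")
```

Under these assumed parameters the observed prevalence is biased away from the truth, and the corrected value lands close to it again. The study applies the same logic at a larger scale, to model performance and variable importance measures rather than to a single prevalence.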

SUBMITTER: Jiang T

PROVIDER: S-EPMC8408353 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: Quantitative imaging biomarkers are widely used in PET for both research and clinical applications, yet bias in the underlying image data has not been well characterized. In the absence of a readily available reference standard for in vivo quantification, bias in PET images has been inferred using physical phantoms, even though arrangements of this sort provide only a poor approximation of the imaging environment in real patient examinations. In this study, we used data acquired from patient volunteers to assess PET quantitative bias in vivo. Image-derived radioactivity concentrations in the descending aorta were compared with blood samples counted on a calibrated γ-counter. Methods: Ten patients with prostate cancer were studied using 2-(3-(1-carboxy-5-[(6-18F-fluoro-pyridine-3-carbonyl)-amino]-pentyl)-ureido)-pentanedioic acid PET/CT. For each patient, 3 whole-body PET/CT image series were acquired after a single administration of the radiotracer: shortly after injection as well as approximately 1 and 4 h later. Venous blood samples were obtained at 8 time points over an 8-h period, and whole blood was counted on a NaI γ-counter. A 10-mm-diameter, 20-mm-long cylindric volume of interest was positioned in the descending thoracic aorta to estimate the PET-derived radioactivity concentration in blood. A triexponential function was fit to the γ-counter blood data and used to estimate the radioactivity concentration at the time of each PET acquisition. Results: The PET-derived and γ-counter-derived radioactivity concentrations were linearly related, with an R² of 0.985, over a range of relevant radioactivity concentrations. The mean difference between the PET and γ-counter data was 4.8% ± 8.6%, with the PET measurements tending to be greater. Conclusion: Human image data acquired on a conventional whole-body PET/CT system with a typical clinical protocol differed by an average of around 5% from blood samples counted on a calibrated γ-counter. This bias may be partly attributable to residual uncorrected scatter or attenuation correction error. These data offer an opportunity for the assessment of PET bias in vivo and provide additional support for the use of quantitative imaging biomarkers.
