Dataset Information

Comparing regression modeling strategies for predicting hometime.

ABSTRACT:

Background

Hometime, the total number of days a person is living in the community (not in a healthcare institution) during a defined period after a hospitalization, is a patient-centred outcome metric increasingly used in healthcare research. Hometime has several properties that make its statistical analysis difficult: its distribution is highly non-normal, it has an excess of zeros, and it is bounded by both a lower and an upper limit. The optimal methodology for analysing hometime is currently unknown.
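The definition above can be made concrete with a short sketch that counts hometime for one patient. The window start, the representation of stays as inclusive (admission, discharge) date pairs, and the rule that no day on or after death counts as a day at home are assumptions for illustration, not the study's exact derivation rules; the `hometime` helper is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def hometime(window_start: date,
             stays: Sequence[Tuple[date, date]],
             death: Optional[date] = None,
             window_days: int = 90) -> int:
    """Days in [window_start, window_start + window_days) spent alive and at home.

    Assumptions for illustration: each stay is an (admission, discharge) pair
    whose dates both count as institutional days, and no day on or after the
    date of death counts as a day at home.
    """
    days_at_home = 0
    for offset in range(window_days):
        day = window_start + timedelta(days=offset)
        if death is not None and day >= death:
            break  # remaining days in the window cannot be spent at home
        if any(admit <= day <= discharge for admit, discharge in stays):
            continue  # day spent in a hospital or other healthcare institution
        days_at_home += 1
    return days_at_home


# Hypothetical patient: stroke admission Jan 1-10, then inpatient rehab Jan 10-30.
stays = [(date(2015, 1, 1), date(2015, 1, 10)),
         (date(2015, 1, 10), date(2015, 1, 30))]
print(hometime(date(2015, 1, 1), stays))  # 60 (90 window days minus 30 institutional days)
```

The same function shows why the outcome is bounded and zero-inflated: it can never exceed `window_days`, and any patient who dies or stays institutionalized for the whole window scores exactly 0.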

Methods

Using administrative data, we identified adult patients diagnosed with stroke in Ontario, Canada, between April 1, 2010 and December 31, 2017. Ninety-day hometime and clinically relevant covariates were determined through administrative data linkage. Fifteen statistical and machine learning models were fit to a derivation sample, and their predictive accuracy and bias were assessed in an independent validation sample.
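The validation criteria named in this abstract (root mean square error, mean absolute error, and bias) can be sketched as follows. Bias is taken here as the mean of (predicted minus observed); that sign convention is an assumption, since the abstract does not define it, and the toy values are hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and bias of predictions against observed hometime.

    Bias is taken here as the mean of (predicted - observed); this sign
    convention is an assumption, since the abstract does not define it.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }


# Toy validation values (hypothetical): errors of +10, -10 and 0 days.
metrics = evaluate(observed=[0, 90, 45], predicted=[10, 80, 45])
print(metrics)
```

In the toy values the bias is zero while RMSE and MAE are not, which illustrates why accuracy and bias are reported separately: over- and under-predictions cancel in the bias but not in the error magnitudes.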

Results

A total of 75,475 patients were identified and divided into a derivation set (n = 49,402) and a validation set (n = 26,073). In general, the machine learning models had lower root mean square error and mean absolute error than the statistical models. However, some statistical models had lower (or equal) bias compared with the machine learning models. Most of the machine learning models kept predicted values within the minimum and maximum observable hometime values, but this was not the case for the statistical models. The machine learning models also made it possible to display complex non-linear interactions between covariates and hometime. No model captured the non-normal, bucket-shaped hometime distribution.
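Whether a model respects the bounds of 90-day hometime can be checked by counting predictions outside the observable range. The sketch assumes the bounds are 0 and 90 days; the helper names are hypothetical, and clipping is shown only as one simple post-processing option that the abstract does not describe. Clipping fixes out-of-range values but does not recover the bucket-shaped distribution.

```python
from typing import List, Sequence


def share_out_of_range(predicted: Sequence[float],
                       lower: float = 0.0, upper: float = 90.0) -> float:
    """Fraction of predictions below `lower` or above `upper` (assumed 0-90 days)."""
    if len(predicted) == 0:
        raise ValueError("no predictions supplied")
    outside = sum(1 for p in predicted if p < lower or p > upper)
    return outside / len(predicted)


def clip_predictions(predicted: Sequence[float],
                     lower: float = 0.0, upper: float = 90.0) -> List[float]:
    """Force each prediction into [lower, upper] (post-processing, not from the study)."""
    return [min(max(p, lower), upper) for p in predicted]


# Hypothetical predictions from a model with an unbounded link.
preds = [-3.2, 45.0, 91.5, 90.0]
print(share_out_of_range(preds))  # 0.5
print(clip_predictions(preds))    # [0.0, 45.0, 90.0, 90.0]
```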

Conclusions

Overall, no model clearly outperformed the others. However, it was evident that machine learning methods performed better than traditional statistical methods. Among the machine learning methods, generalized boosting machines using the Poisson distribution and random forest regression performed best. No model was able to capture the bucket-shaped hometime distribution, and future research on factors associated with extreme values of hometime that are not available in administrative data is warranted.
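Boosting with a Poisson distribution typically fits a log-link model by minimizing the Poisson deviance. The abstract does not name the software or its settings, so the sketch below only illustrates that loss; `mean_poisson_deviance` is a helper written here (scikit-learn offers a comparable `sklearn.metrics.mean_poisson_deviance`). Note that the Poisson model places no upper bound on predicted counts, so a 90-day ceiling is not enforced by the loss itself.

```python
import math
from typing import Sequence


def mean_poisson_deviance(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of 2 * (y * log(y / mu) - (y - mu)) over all observations."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for y, mu in zip(observed, predicted):
        if mu <= 0:
            raise ValueError("Poisson means must be positive")
        # y * log(y / mu) is taken as 0 when y == 0, its limiting value
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (term - (y - mu))
    return total / len(observed)


# A zero hometime predicted as 1 day contributes 2; an exact prediction contributes 0.
print(mean_poisson_deviance([3, 0], [3, 1]))  # 1.0
```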

SUBMITTER: Holodinsky JK

PROVIDER: S-EPMC8261957 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Motivation: Monitoring, assessment and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT, RDX and HMX. One important goal of microarray experiments is to discover novel biomarker genes for quantitative phenotypic prediction. We have developed an earthworm microarray containing 15,208 unique oligo probes. Our objective was to identify biomarker genes that can be used to quantitatively predict earthworm tissue residues of the explosive compounds to which the worms were exposed and which they took up from HMX-spiked soil. Results: We collected a large microarray gene expression and earthworm tissue residue dataset. First, differentially expressed genes were identified for each exposure duration (4, 14 and 28 days). These genes were used in multivariate regression modeling for HMX residue prediction. Eighteen different regression models were tested and compared. The best-performing model achieved very high prediction accuracies, with R² values of 0.715, 0.728 and 0.822 for the 4-day, 14-day and 28-day exposures, respectively. Conclusions: This study demonstrated that multivariate regression coupled with high-throughput microarray gene expression is a promising approach to quantitative phenotypic prediction. Adult earthworms (Eisenia fetida) were exposed in soil spiked with HMX (0, 8, 16, 32, 64, or 128 mg/kg) for 4, 14 or 28 days. Each treatment originally had 10 replicate worms, all of which survived to the end of exposure. Total RNA was isolated from the surviving worms. A total of 120 worm RNA samples were hybridized to a custom-designed oligo array using Agilent's one-color Low RNA Input Linear Amplification Kit. The array contains 15,208 non-redundant 60-mer probes, each targeting a unique E. fetida transcript.
After hybridization and scanning, gene expression data were acquired using Agilent's Feature Extraction Software (v9.1.3). The 120-array dataset consists of three exposure groups (4 day, 14 day and 28 day), each with the following 8 treatments: day 0 control, blank control, solvent control, and 8, 16, 32, 64, and 128 mg HMX/kg soil. Each treatment had 5 biological replicates (i.e., five individual worms).
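The R² values reported above are coefficients of determination. A minimal sketch of that statistic follows; the record does not say whether R² was computed on held-out or training data, or for which of the eighteen models, and the values below are hypothetical. The design arithmetic in the record is consistent: 3 exposure durations × 8 treatments × 5 biological replicates = 120 arrays.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Raises ZeroDivisionError if every observed value is identical (SS_total = 0).
    return 1.0 - ss_res / ss_tot


# Hypothetical tissue-residue values and predictions (not data from the study).
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

An R² of 0 corresponds to predicting the mean residue for every worm, so values such as 0.822 indicate that the gene-expression model explains most of the variance around that baseline.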

Project description: INTRODUCTION: There is no consensus on how to investigate men with a negative transrectal ultrasound-guided prostate biopsy (TRUS-B) but ongoing suspicion of cancer. Three strategies in use are transperineal biopsy (TP-B), transrectal saturation biopsy (TS-B) and MRI-guided biopsy (MRI-B). We compared the cancer yields of these strategies. METHODS: Papers were identified by searching PubMed, Embase and Ovid Medline. Included studies investigated biopsy diagnostic yield in men with at least one negative TRUS-B and ongoing suspicion of prostate cancer. Extracted data included age, PSA, number of previous biopsy episodes, number of cores at re-biopsy, cancer yield, and Gleason score of detected cancers. Meta-regression analyses were used to analyse the data. RESULTS: Forty-six studies were included (12 of TS-B, 14 of TP-B, and 20 of MRI-B), representing 4,657 patients. Mean patient age, PSA and number of previous biopsy episodes were similar between the strategies. The mean number of biopsy cores obtained was greater with TP-B and TS-B than with MRI-B. Cancer detection rates were 30.0%, 36.8%, and 37.6% for TS-B, TP-B, and MRI-B, respectively. Meta-regression analysis showed that MRI-B had significantly higher cancer detection than TS-B. However, there were no significant differences between MRI-B and TP-B, or between TP-B and TS-B. In a sensitivity analysis incorporating the number of previous biopsy episodes (36 studies), the difference between MRI-B and TP-B was not maintained, resulting in no significant difference in cancer detection between the groups. There were no significant differences in the median Gleason scores of cancers detected by the three strategies. CONCLUSIONS: In the re-biopsy setting, it is unclear which strategy offers the highest cancer detection rate. MRI-B may potentially detect more prostate cancers than other modalities and can achieve this with fewer biopsy cores.
However, well-designed prospective studies with standardised outcome measures are needed to accurately compare modalities and define an optimum re-biopsy approach.
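A meta-regression of detection rates on biopsy strategy can be sketched as an inverse-variance weighted regression on logit-transformed proportions, with TS-B as the reference category. The abstract does not say whether fixed- or random-effects models were used, so this fixed-effect version, the 0.5 continuity correction, the helper names, and the toy study counts are all assumptions for illustration.

```python
import numpy as np


def logit_and_variance(events, totals):
    """Logit detection proportions and approximate variances per study.

    A 0.5 continuity correction (assumed, not from the review) avoids log(0)
    when a study detects no cancers or detects cancer in every man.
    """
    events = np.asarray(events, dtype=float)
    totals = np.asarray(totals, dtype=float)
    e = events + 0.5
    non = totals - events + 0.5
    return np.log(e / non), 1.0 / e + 1.0 / non


def fixed_effect_meta_regression(y, var, X):
    """Inverse-variance weighted least squares: coefficients and standard errors."""
    w = 1.0 / np.asarray(var, dtype=float)
    X = np.asarray(X, dtype=float)
    xtwx = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(xtwx, X.T @ (w * y))
    se = np.sqrt(np.diag(np.linalg.inv(xtwx)))
    return beta, se


# Hypothetical toy studies (not data from the review): strategy, cancers, men.
studies = [("TS-B", 30, 100), ("TS-B", 28, 90), ("TP-B", 37, 100),
           ("MRI-B", 38, 100), ("MRI-B", 40, 105), ("TP-B", 35, 95)]
labels = [s[0] for s in studies]
y, var = logit_and_variance([s[1] for s in studies], [s[2] for s in studies])

# Intercept = TS-B (reference); further columns are indicators for TP-B and MRI-B.
X = np.column_stack([np.ones(len(labels)),
                     [lab == "TP-B" for lab in labels],
                     [lab == "MRI-B" for lab in labels]])
beta, se = fixed_effect_meta_regression(y, var, X)
print(beta, se)  # log-odds differences vs TS-B and their standard errors
```

A sensitivity analysis like the one in the abstract would add a column for the number of previous biopsy episodes to `X`; a random-effects version would also add an estimated between-study variance to each study's `var`.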

Project description: BACKGROUND: Different strategies for implementing competing risks in discrete-event simulation (DES) models are available. This study aims to provide recommendations on modeling approaches defined according to these strategies by quantitatively comparing alternative approaches. METHODS: Four modeling approaches were defined: 1) event-specific distribution (ESD), 2) event-specific probability and distribution (ESPD), 3) unimodal joint distribution and regression model (UDR), and 4) multimodal joint distribution and regression model (MDR). Each modeling approach was applied to uncensored individual patient data in a simulation study and in a case study in colorectal cancer. Their performance was assessed in terms of relative event incidence difference, relative absolute event incidence difference, and relative entropy of time-to-event distributions. Differences in health economic outcomes were also illustrated for the case study. RESULTS: In the simulation study, the ESPD and MDR approaches outperformed the ESD and UDR approaches in terms of both event incidence differences and relative entropy. Disease pathway and data characteristics, such as the number of competing risks and the overlap between competing time-to-event distributions, substantially affected the approaches' performance. Although no considerable differences in health economic outcomes were observed, the case study showed that the ESPD approach was the most sensitive to low event rates, which negatively affected its performance. CONCLUSIONS: Based on overall performance, the recommended modeling approach for implementing competing risks in DES models is the MDR approach, which follows the general strategy of selecting the time-to-event first and the corresponding event second.
The ESPD approach is a less complex and equally performing alternative if sufficient observations are available for each competing event (i.e., the internal validity shows appropriate data representation).
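To make the approach names concrete, the sketch below samples one simulated patient's competing event under ESD, ESPD, and the time-first, event-second strategy that the abstract attributes to MDR. It reflects a reading of the names (ESD as competing event-specific clocks, ESPD as event first and time second); the event labels, Weibull parameters, and logistic event model are hypothetical. The joint time distribution in the usage example is unimodal, so it is closer to UDR; MDR would replace it with a multimodal (for example, mixture) distribution.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Hypothetical competing events with Weibull (scale, shape) parameters.
EVENT_DISTS: Dict[str, Tuple[float, float]] = {
    "recurrence": (5.0, 1.5),
    "other_death": (12.0, 1.1),
}


def _pick(rng: random.Random, probs: Dict[str, float]) -> str:
    """Draw one event label using the given (unnormalised) weights."""
    events = list(probs)
    return rng.choices(events, weights=[probs[e] for e in events])[0]


def sample_esd(rng: random.Random) -> Tuple[str, float]:
    """ESD: draw a time from every event-specific distribution; the earliest occurs."""
    draws = {e: rng.weibullvariate(scale, shape)
             for e, (scale, shape) in EVENT_DISTS.items()}
    event = min(draws, key=draws.get)
    return event, draws[event]


def sample_espd(rng: random.Random, probs: Dict[str, float]) -> Tuple[str, float]:
    """ESPD: choose the event from event-specific probabilities, then draw its time."""
    event = _pick(rng, probs)
    scale, shape = EVENT_DISTS[event]
    return event, rng.weibullvariate(scale, shape)


def sample_time_first(rng: random.Random,
                      sample_time: Callable[[random.Random], float],
                      event_probs_at: Callable[[float], Dict[str, float]]) -> Tuple[str, float]:
    """Time first, event second: draw the time from a joint distribution,
    then the event from a model of event probabilities given that time."""
    t = sample_time(rng)
    return _pick(rng, event_probs_at(t)), t


def logistic_probs(t: float) -> Dict[str, float]:
    # Hypothetical regression model: recurrence dominates early, other death later.
    p = 1.0 / (1.0 + math.exp(min(t - 5.0, 50.0)))
    return {"recurrence": p, "other_death": 1.0 - p}


rng = random.Random(2024)
print(sample_esd(rng))
print(sample_espd(rng, {"recurrence": 0.3, "other_death": 0.7}))
print(sample_time_first(rng, lambda r: r.weibullvariate(8.0, 1.3), logistic_probs))
```

The ESPD sampler needs a separate time distribution for every event, which is consistent with the abstract's caveat that ESPD requires sufficient observations for each competing event.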