Dataset Information

Sample size for binary logistic prediction models: Beyond events per variable criteria.

ABSTRACT: Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can better be approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
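The abstract's main point, that EPV alone does not pin down a study design, can be illustrated with simple arithmetic. The sketch below is not taken from the paper. The function name `design_summary` and the two example designs are hypothetical, and it assumes that "events" means the less frequent outcome category.

```python
def design_summary(n_total, n_events, n_predictors):
    """Return EPV and events fraction for one hypothetical development dataset."""
    if n_total <= 0 or n_predictors <= 0 or not 0 <= n_events <= n_total:
        raise ValueError("need n_total > 0, n_predictors > 0 and 0 <= n_events <= n_total")
    # Assumption: 'events' refers to the less frequent outcome category.
    n_minor = min(n_events, n_total - n_events)
    return {
        "epv": n_minor / n_predictors,          # events per candidate predictor
        "events_fraction": n_events / n_total,  # proportion of subjects with the event
        "n_total": n_total,
        "n_predictors": n_predictors,
    }


# Two made-up designs with the same number of events and candidate predictors.
design_a = design_summary(n_total=200, n_events=100, n_predictors=10)
design_b = design_summary(n_total=2000, n_events=100, n_predictors=10)

for name, d in (("A", design_a), ("B", design_b)):
    print(name, d)
```

Both designs have the same EPV, yet they differ tenfold in total sample size and events fraction, two of the three quantities the abstract proposes as a basis for sample size criteria.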

SUBMITTER: van Smeden M

PROVIDER: S-EPMC6710621 | biostudies-literature | 2019 Aug

REPOSITORIES: biostudies-literature


Publications

Sample size for binary logistic prediction models: Beyond events per variable criteria.

van Smeden M, Moons KGM, de Groot JAH, Collins GS, Altman DG, Eijkemans MJC, Reitsma JB

Statistical Methods in Medical Research, 3 July 2018, issue 8


PMID: 29966490

Similar Datasets

Project description: Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting while MAPE assesses the accuracy of individual predictions. Recently, two formulae were proposed to calculate the sample size required, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using MLE will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae. We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the sample size substantially. For example, for c-statistic = 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. On the other hand, the MAPE formula tends to overestimate the sample size for high model strengths. These conclusions were more pronounced for higher prevalence than for lower prevalence. Similar results were drawn when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, thus allowing for assessment of model stability.
The calibration and MAPE formulae suggest sample sizes that are generally appropriate for use when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.
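The two measures named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the 'samplesizedev' package or of the formulae being evaluated. It assumes MAPE is the mean absolute difference between estimated and true probabilities (which are only known in a simulation), and that CS is the slope from a logistic regression of the outcome on the logit of the predicted probability. The function names and the simulated data are hypothetical.

```python
import math
import random


def mape(p_true, p_hat):
    """Mean absolute difference between estimated and true probabilities."""
    return sum(abs(h - t) for t, h in zip(p_true, p_hat)) / len(p_true)


def calibration_slope(y, p_hat, n_iter=25):
    """Slope of a logistic regression of y on logit(p_hat), via Newton-Raphson."""
    lp = [math.log(p / (1.0 - p)) for p in p_hat]  # linear predictor
    a, b = 0.0, 0.0  # intercept and slope
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, lp):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu           # score for the intercept
            g1 += (yi - mu) * xi    # score for the slope
            h00 += w                # 2x2 information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


# Simulated example (all values are made up for illustration).
rng = random.Random(1)
lp_true = [rng.gauss(0.0, 1.0) for _ in range(10000)]
p_true = [expit(v) for v in lp_true]
y = [1 if rng.random() < p else 0 for p in p_true]

# Mimic an overfitted model: predictions that are twice as extreme on the logit scale.
p_overfit = [expit(2.0 * v) for v in lp_true]

cs_ideal = calibration_slope(y, p_true)
cs_overfit = calibration_slope(y, p_overfit)
mape_overfit = mape(p_true, p_overfit)
print(cs_ideal, cs_overfit, mape_overfit)
```

With well-calibrated predictions the slope should be close to 1. Predictions that are too extreme, as produced by an overfitted model, should give a slope clearly below 1 and a MAPE above zero.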

Project description: Objectives: To investigate how studies determine the sample size when developing radiomics prediction models for binary outcomes, and whether the sample size meets the estimates obtained by using established criteria. Methods: We identified radiomics studies that were published from 01 January 2023 to 31 December 2023 in seven leading peer-reviewed radiological journals. We reviewed the sample size justification methods and the actual sample size used. We compared the actual sample size with the estimates obtained from three established criteria proposed by Riley et al., and investigated which study characteristics were associated with a sample size sufficient to meet those estimates. Results: We included 116 studies. Eleven of the 116 studies justified the sample size, of which 6/11 performed an a priori sample size calculation. The median (first and third quartile, Q1, Q3) total sample size was 223 (130, 463), and the median training sample size was 150 (90, 288). The median (Q1, Q3) difference between the total sample size and the minimum sample size according to the established criteria was -100 (-216, 183), and the difference between the total sample size and a more restrictive approach based on the established criteria was -268 (-427, -157). The presence of external testing and the specialty of the topic were associated with sufficient sample size. Conclusion: Radiomics studies are often designed without sample size justification, and their sample sizes may be too small to avoid overfitting. Sample size justification is encouraged when developing a radiomics model.
Key points: Question: Sample size justification is critical to help minimize overfitting when developing a radiomics model, but it is often overlooked in radiomics research, leaving studies underpowered. Findings: Few of the radiomics studies justified, calculated, or reported their sample size, and most did not meet the recent formal sample size criteria. Clinical relevance: Radiomics models are often designed without sample size justification. Consequently, many models are developed on samples too small to avoid overfitting. Researchers should be encouraged to justify, calculate, and report sample size considerations when developing radiomics models.
