Dataset Information

Estimating misclassification error: a closer look at cross-validation based methods.

ABSTRACT: BACKGROUND: To estimate a classifier's error in predicting future observations, bootstrap methods have been proposed as reduced-variation alternatives to traditional cross-validation (CV) methods based on sampling without replacement. Monte Carlo (MC) simulation studies aimed at estimating the true misclassification error conditional on the training set are commonly used to compare CV methods. We conducted an MC simulation study to compare a new method of bootstrap CV (BCV) to k-fold CV for estimating classification error. FINDINGS: For the low-dimensional conditions simulated, the modest positive bias of k-fold CV contrasted sharply with the substantial negative bias of the new BCV method. This behavior was corroborated using a real-world dataset of prognostic gene-expression profiles in breast cancer patients. Our simulation results demonstrate some extreme characteristics of variance and bias that can occur due to a fault in the design of CV exercises aimed at estimating the true conditional error of a classifier, and that appear not to have been fully appreciated in previous studies. Although CV is a sound practice for estimating a classifier's generalization error, using CV to estimate the fixed misclassification error of a trained classifier conditional on the training set is problematic. While MC simulation of this estimation exercise can correctly represent the average bias of a classifier, it will overstate the between-run variance of the bias. CONCLUSIONS: We recommend k-fold CV over the new BCV method for estimating a classifier's generalization error. The extreme negative bias of BCV is too high a price to pay for its reduced variance.
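
As a rough illustration of the k-fold CV estimate discussed in the abstract, the sketch below computes a k-fold misclassification error on simulated two-class data. It is not the authors' simulation code. The nearest-centroid classifier, the `simulate` helper, and all parameter values (`n_per_class`, `shift`, `dim`, `k`, the seeds) are assumptions chosen only to keep the example small and dependency-free. The BCV method is not sketched because the abstract does not describe its procedure.

```python
import random

def nearest_centroid_fit(X, y):
    """Return a dict mapping each class label to its mean vector."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return min(centroids, key=sqdist)

def kfold_error(X, y, k=5, seed=0):
    """k-fold CV estimate of misclassification error (sampling without replacement)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every index
    wrong = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        wrong += sum(nearest_centroid_predict(model, X[i]) != y[i] for i in held_out)
    return wrong / len(X)

def simulate(n_per_class=30, shift=1.0, dim=2, seed=1):
    """Hypothetical low-dimensional setup: two Gaussian classes with shifted means."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y

X, y = simulate()
cv_error = kfold_error(X, y, k=5)
print(f"5-fold CV misclassification estimate: {cv_error:.3f}")
```

Each observation is held out exactly once and is never part of the training set that predicts it, which is the property that sampling without replacement provides.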

SUBMITTER: Ounpraseuth S

PROVIDER: S-EPMC3556102 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Estimating misclassification error: a closer look at cross-validation based methods.

Ounpraseuth S, Lensing SY, Spencer HJ, Kodell RL

BMC Research Notes, 2012-11-28

PMID: 23190936

Similar Datasets

A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models. | S-EPMC5920056 | biostudies-literature

Thermally gated liposomes: a closer look. | S-EPMC2765578 | biostudies-literature

A closer look into NADPH oxidase inhibitors: Validation and insight into their mechanism of action. | S-EPMC7042484 | biostudies-literature

A closer look at the Azzolino collection | S-EPMC10096476 | biostudies-literature

A closer look at the mysterious HSD17B13. | S-EPMC7604724 | biostudies-literature

Relational graph convolutional networks: a closer look. | S-EPMC9680895 | biostudies-literature

A closer look at 'Cheap White' cigarettes. | S-EPMC5036225 | biostudies-literature

A closer look into the α-helix basin. | S-EPMC5137006 | biostudies-literature