
Dataset Information


Multiple-rule bias in the comparison of classification rules.


ABSTRACT:

Motivation

There is growing discussion in the bioinformatics community concerning the overoptimism of reported results. Two practices that contribute to overoptimism in classification are (i) reporting results on datasets for which a proposed classification rule performs well and (ii) comparing multiple classification rules on a single dataset in a way that purports to show the advantage of a certain rule.

Results

This article provides a careful probabilistic analysis of the second issue and of the resulting 'multiple-rule bias', which arises from choosing the classification rule with the minimum estimated error on the dataset. It quantifies the bias incurred when the minimum estimated error is used to estimate the expected true error of the chosen rule, and it characterizes the bias incurred when the estimated comparative advantage of the chosen rule on the dataset is used to estimate its true comparative advantage relative to the other rules. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators.
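To make the selection effect concrete, here is a minimal Monte Carlo sketch under a deliberately simple, hypothetical model; it is not the authors' C/MATLAB code. It assumes each rule's true error on a dataset is its base error plus Gaussian spread, and that the error estimate is the true error plus independent Gaussian deviation. The function name `multiple_rule_bias`, its parameters, the noise model, and the definition of comparative advantage (mean error of the other rules minus the error of the chosen rule) are all illustrative assumptions. The first returned value is E[minimum estimated error] minus E[true error of the chosen rule]; the second is the analogous gap for comparative advantage.

```python
import random
import statistics


def multiple_rule_bias(base_errors, sd_true=0.03, sd_est=0.05, trials=20000, seed=0):
    """Monte Carlo sketch of multiple-rule bias (hypothetical Gaussian noise model).

    base_errors: assumed expected true error of each classification rule.
    sd_true:     assumed dataset-to-dataset spread of each rule's true error.
    sd_est:      assumed deviation of the error estimate from the true error.
    Returns (bias_error, bias_advantage) averaged over simulated datasets.
    """
    rng = random.Random(seed)
    est_min, true_min, est_adv, true_adv = [], [], [], []
    for _ in range(trials):
        # True error of each rule on this simulated dataset.
        true = [e + rng.gauss(0.0, sd_true) for e in base_errors]
        # Estimated error = true error plus estimator deviation.
        est = [t + rng.gauss(0.0, sd_est) for t in true]
        # Choose the rule with minimum estimated error.
        k = min(range(len(est)), key=est.__getitem__)
        others = [j for j in range(len(est)) if j != k]
        est_min.append(est[k])
        true_min.append(true[k])
        # Comparative advantage (assumed definition): mean error of the
        # other rules minus the error of the chosen rule.
        est_adv.append(statistics.fmean(est[j] for j in others) - est[k])
        true_adv.append(statistics.fmean(true[j] for j in others) - true[k])
    bias_error = statistics.fmean(est_min) - statistics.fmean(true_min)
    bias_advantage = statistics.fmean(est_adv) - statistics.fmean(true_adv)
    return bias_error, bias_advantage


if __name__ == "__main__":
    # Five hypothetical rules that are equally good in truth.
    b_err, b_adv = multiple_rule_bias([0.2] * 5)
    print(f"error bias: {b_err:+.4f}, advantage bias: {b_adv:+.4f}")
```

With equally good rules, the error bias comes out negative (the winner's estimated error is optimistic) and the advantage bias positive (the winner appears better than it is); setting `sd_est=0.0` makes both biases vanish, since selection then acts on the true errors themselves.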

Availability

We have implemented the synthetic data distribution model, classification rules, feature selection routines and error estimation methods in C. The code for the multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included.

SUBMITTER: Yousefi MR 

PROVIDER: S-EPMC3106200 | biostudies-literature | 2011 Jun

REPOSITORIES: biostudies-literature


Publications

Multiple-rule bias in the comparison of classification rules.

Mohammadmahdi R. Yousefi, Jianping Hua, Edward R. Dougherty

Bioinformatics (Oxford, England), 2011-05-05, issue 12



Similar Datasets

| S-EPMC3164859 | biostudies-other
| S-EPMC4050968 | biostudies-literature
| S-EPMC8796360 | biostudies-literature
| S-EPMC9312403 | biostudies-literature
| S-EPMC7924428 | biostudies-literature
| S-EPMC3856189 | biostudies-literature
| S-EPMC8336635 | biostudies-literature
| S-EPMC2777919 | biostudies-literature
| S-EPMC6515573 | biostudies-literature
| S-EPMC6157829 | biostudies-literature