
Dataset Information


Neyman-Pearson classification algorithms and NP receiver operating characteristics.


ABSTRACT: In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
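The abstract's point that capping the empirical type I error at α is not enough can be illustrated with a small sketch. This is not the nproc package's code or API. It is a minimal Python illustration of one order-statistic approach to choosing a score threshold so that the population type I error exceeds α with probability at most δ. The tolerance δ, the function names, and the assumption of a held-out sample of i.i.d. class 0 scores are introduced here for illustration and do not appear in the abstract.

```python
from math import comb


def violation_bound(n, k, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest of n held-out class 0 scores (k is 1-indexed).

    With continuous scores this equals P(Binomial(n, 1 - alpha) >= k).
    """
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta.

    Returns None when n is too small for any rank to qualify.
    """
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Score threshold; predict class 1 only when a score exceeds it."""
    scores = sorted(class0_scores)
    k = np_threshold_rank(len(scores), alpha, delta)
    if k is None:
        raise ValueError("too few class 0 scores for this alpha and delta")
    return scores[k - 1]
```

Under these assumptions, with α = δ = 0.05 at least 59 held-out class 0 scores are needed, and the selected rank sits above the naive empirical (1 - α) quantile rank, which is why thresholding at the empirical quantile alone tends to exceed α.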

SUBMITTER: Tong X 

PROVIDER: S-EPMC5804623 | biostudies-literature | 2018 Feb

REPOSITORIES: biostudies-literature


Publications

Neyman-Pearson classification algorithms and NP receiver operating characteristics.

Xin Tong, Yang Feng, Jingyi Jessica Li

Science Advances, 2018-02-02, issue 2



Similar Datasets

| S-EPMC5332475 | biostudies-literature
| S-EPMC6929587 | biostudies-literature
| S-EPMC2795956 | biostudies-literature
| S-EPMC3743052 | biostudies-literature
| S-EPMC8671363 | biostudies-literature
| S-EPMC5241446 | biostudies-literature
| S-EPMC5807850 | biostudies-other
| S-EPMC3240682 | biostudies-literature
| S-EPMC6745617 | biostudies-literature
| S-EPMC6768691 | biostudies-literature