Dataset Information

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.

ABSTRACT: BACKGROUND: To evaluate binary classifications and their confusion matrices, researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite this being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most widely adopted metrics in binary classification tasks. However, these statistical measures can show dangerously overoptimistic, inflated results, especially on imbalanced datasets. RESULTS: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate that produces a high score only if the prediction obtained good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. CONCLUSIONS: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining its mathematical properties, and then the advantages of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
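The contrast described in the abstract can be illustrated with a minimal sketch that computes accuracy, F1 score, and MCC from the four confusion matrix counts using their standard definitions. The function name `confusion_metrics`, the example counts, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration; they are not taken from the article.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, f1, mcc) for one binary confusion matrix.

    Assumption: when a denominator is zero the metric is reported as 0.0.
    This is one possible convention, not necessarily the article's.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total

    # F1 = 2*TP / (2*TP + FP + FN); note that TN does not appear at all.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return accuracy, f1, mcc


# Hypothetical imbalanced dataset: 95 positives and 5 negatives.
# The classifier labels almost every sample as positive.
acc, f1, mcc = confusion_metrics(tp=94, fn=1, tn=1, fp=4)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

With these invented counts, accuracy and F1 both come out high because the large positive class dominates them, while MCC stays low because only one of the five negatives is classified correctly.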

SUBMITTER: Chicco D

PROVIDER: S-EPMC6941312 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature


Publications

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.

Chicco D, Jurman G

BMC Genomics, 2020-01-02, issue 1


PMID: 31898477

Similar Datasets

Project description: The accuracy of a classification is fundamental to its interpretation, use and ultimately decision making. Unfortunately, the apparent accuracy assessed can differ greatly from the true accuracy. Mis-estimation of classification accuracy metrics and associated misinterpretations are often due to variations in prevalence and the use of an imperfect reference standard. The fundamental issues underlying the problems associated with variations in prevalence and reference standard quality are revisited here for binary classifications, with particular attention focused on the use of the Matthews correlation coefficient (MCC). A key attribute claimed of the MCC is that a high value can only be attained when the classification performed well on both classes in a binary classification. However, it is shown here that the apparent magnitude of a set of popular accuracy metrics used in fields such as computer science, medicine and environmental science (Recall, Precision, Specificity, Negative Predictive Value, J, F1, likelihood ratios and MCC) and one key attribute (prevalence) were all influenced greatly by variations in prevalence and use of an imperfect reference standard. Simulations using realistic values for data quality in applications such as remote sensing showed that each metric varied over the range of possible prevalence and at differing levels of reference standard quality. The direction and magnitude of accuracy metric mis-estimation were a function of prevalence and the size and nature of the imperfections in the reference standard. It was evident that the apparent MCC could be substantially under- or over-estimated. Additionally, a high apparent MCC arose from an unquestionably poor classification. As with some other metrics of accuracy, the utility of the MCC may be overstated and apparent values need to be interpreted with caution. Apparent accuracy and prevalence values can be misleading, and calls for the issues to be recognised and addressed should be heeded.

Project description: Background: To develop an original and standardized ureteral stricture disease (USD) score and classification system for quantifying ureter stricture characteristics, assessing the complexity of the minimally invasive upper urinary tract reconstructive (UUTR) surgical procedure, formulating preoperative plans, and offering objective comparisons of surgical techniques between different institutions and surgeons. Methods: We retrospectively reviewed a test set of 64 patients and a validation set of 170 patients who underwent minimally invasive UUTR surgery from January 2018 to January 2021. Three factors were selected to be included in the USD score and classification system: (I) stricture etiology (E, 1-2 points); (II) stricture segment (S, 0-3 points); and (III) length of stricture (L, 1-5 points). UUTR surgery comprises low-complexity surgeries (cystoscopy with ureteral dilation and stent placement, ureteropyeloplasty, end-to-end repair, ureteral reimplantation) and high-complexity surgeries (onlay repair (buccal mucosae, lingual mucosae, appendix mucosae), Boari flap repair and ileal ureter replacement). Estimated blood loss and operative time were used as surrogate indicators of surgical complexity. Results: The interrater reliability of the USD score and classification system was 0.908. A linear relationship between the USD score and estimated blood loss was observed (rs = 0.676, P<0.001). The USD score was also correlated with operative time (rs = 0.638, P<0.001). A significant difference in USD scores was found between the high and low complexity surgery groups (4 vs. 7, P<0.001). The type of UUTR surgery varies with the USD classification, but follows a regular pattern. Conclusions: The USD score and classification system is a concise, easily applicable, and validated scale to delineate the clinically significant features of ureter stricture that correlate with the complexity of the UUTR surgical procedure. The use of this score and classification system can facilitate preoperative planning and comparison of USD treatments in clinical practice and the urological literature. Research with larger samples is needed to further examine and modify the use of the system.

