
On the Rates of Convergence From Surrogate Risk Minimizers to the Bayes Optimal Classifier



Abstract:

In classification, the use of the 0–1 loss is preferable since the minimizer of the 0–1 risk leads to the Bayes optimal classifier. However, due to the nonconvexity of the 0–1 loss, this optimization problem is NP-hard. Therefore, many convex surrogate loss functions have been adopted. Previous works have shown that if a Bayes-risk-consistent loss function is used as a surrogate, the minimizer of the empirical surrogate risk can achieve the Bayes optimal classifier as the sample size tends to infinity. Nevertheless, the comparison of convergence rates of minimizers of different empirical surrogate risks to the Bayes optimal classifier has rarely been studied. Which characterization of the surrogate loss determines its convergence rate to the Bayes optimal classifier? Can we modify the loss function to achieve a faster convergence rate? In this article, we study the convergence rates of empirical surrogate minimizers to the Bayes optimal classifier. Specifically, we introduce the notions of consistency intensity and conductivity to characterize a surrogate loss function and exploit these notions to obtain the rate of convergence from an empirical surrogate risk minimizer to the Bayes optimal classifier, enabling fair comparisons of the excess risks of different surrogate risk minimizers. The main result of this article has practical implications, including: 1) showing that the hinge loss (SVM) is superior to the logistic loss (logistic regression) and the exponential loss (AdaBoost) in the sense that its empirical minimizer converges faster to the Bayes optimal classifier and 2) guiding the design of new loss functions to speed up the convergence rate to the Bayes optimal classifier with a data-dependent loss correction method inspired by our theorems.
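
As a concrete (illustrative, not taken from the paper) way to read the first implication, the following Python sketch compares how quickly the empirical minimizers of the hinge loss and the logistic loss approach the Bayes optimal classifier on a synthetic 1-D Gaussian-mixture problem where the Bayes rule and the Bayes risk are known. Every setting here (data distribution, linear model, gradient-descent optimizer, sample sizes) is an assumption made for illustration, not the paper's experiment.

# Minimal synthetic sketch (not the paper's experiments): compare how fast the
# empirical minimizers of the hinge loss and the logistic loss approach the
# Bayes optimal classifier on a 1-D Gaussian mixture, where the Bayes rule is
# sign(x) and the Bayes risk is known. All names and settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw (x, y) with y uniform on {-1, +1} and x | y ~ N(y, 1)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y + rng.standard_normal(n)
    return x, y

def surrogate_grad(margin, loss):
    """Derivative of the surrogate loss w.r.t. the margin m = y * f(x)."""
    if loss == "hinge":          # max(0, 1 - m), subgradient
        return np.where(margin < 1.0, -1.0, 0.0)
    if loss == "logistic":       # log(1 + exp(-m))
        return -1.0 / (1.0 + np.exp(margin))
    raise ValueError(loss)

def fit_linear(x, y, loss, steps=2000, lr=0.1):
    """Minimize the empirical surrogate risk of f(x) = w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        margin = y * (w * x + b)
        g = surrogate_grad(margin, loss)          # dL/dm
        w -= lr * np.mean(g * y * x)
        b -= lr * np.mean(g * y)
    return w, b

# Large held-out sample to estimate the true 0-1 risk of a learned classifier.
x_test, y_test = sample(200_000)
bayes_risk = np.mean(np.sign(x_test) != y_test)   # Bayes rule is sign(x)

for n in (100, 1000, 10_000):
    x, y = sample(n)
    row = [f"n={n:6d}"]
    for loss in ("hinge", "logistic"):
        w, b = fit_linear(x, y, loss)
        risk = np.mean(np.sign(w * x_test + b) != y_test)
        row.append(f"{loss}: excess 0-1 risk = {risk - bayes_risk:+.4f}")
    print("  ".join(row))

The printed excess 0–1 risks shrink toward zero as n grows for both surrogates; the paper's contribution is a theory (consistency intensity and conductivity) that predicts and compares the rates of such shrinkage rather than relying on simulations like this one.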
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 33, Issue: 10, October 2022)
Page(s): 5766 - 5774
Date of Publication: 21 April 2021

PubMed ID: 33882001

I. Introduction

Classification is a fundamental machine learning task used in a wide range of applications. When designing classification algorithms, the 0–1 loss function is preferred because its risk minimizer is the Bayes optimal classifier, which attains the minimum probability of classification error. However, the 0–1 loss is difficult to optimize because it is neither convex nor smooth [5], [9]. Many computationally friendly surrogate loss functions have therefore been proposed as approximations of the 0–1 loss function.
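
The following short sketch (an illustration, not taken from the paper) lists the 0–1 loss and three widely used convex surrogates as functions of the margin m = y * f(x); each surrogate upper-bounds the 0–1 loss, which is what makes the surrogate risk tractable to minimize.

# Illustrative sketch of the 0-1 loss and three common convex surrogates as
# functions of the margin m = y * f(x); each surrogate upper-bounds the 0-1 loss.
import numpy as np

def zero_one(m):     # 1 if sign(f(x)) disagrees with y (ties counted as errors)
    return (m <= 0).astype(float)

def hinge(m):        # used by SVMs
    return np.maximum(0.0, 1.0 - m)

def logistic(m):     # used by logistic regression (base 2, so it equals 1 at m = 0)
    return np.log2(1.0 + np.exp(-m))

def exponential(m):  # used by AdaBoost
    return np.exp(-m)

margins = np.linspace(-2.0, 2.0, 5)
for f in (zero_one, hinge, logistic, exponential):
    print(f"{f.__name__:>11}: {np.round(f(margins), 3)}")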

