I. Introduction
Classification is a fundamental machine learning task with a wide range of applications. When designing classification algorithms, the 0–1 loss is the natural objective: the classifier that minimizes its expected value is the Bayes optimal classifier, which attains the minimum probability of classification error. However, the 0–1 loss is difficult to optimize directly because it is neither convex nor smooth [5], [9]. Many computationally friendly surrogate loss functions have therefore been proposed as approximations to the 0–1 loss.
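To make the contrast concrete, the following minimal sketch evaluates the 0–1 loss and two common surrogates as functions of the margin m = y f(x), with labels y in {-1, +1}. The choice of the hinge and logistic losses, the convention that a margin of exactly zero counts as an error, and all function names are assumptions made for illustration; they are not taken from the text above.

```python
import math

def zero_one_loss(margin):
    """0-1 loss of the margin m = y * f(x); m == 0 is counted as an error (assumed convention)."""
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin):
    """Hinge loss: convex, piecewise linear upper bound on the 0-1 loss."""
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    """Logistic loss in base 2, scaled so that it equals 1 at margin 0: convex and smooth."""
    return math.log1p(math.exp(-margin)) / math.log(2.0)

if __name__ == "__main__":
    # Tabulate the three losses on a few illustrative margins.
    for m in [-2.0, -0.5, 0.0, 0.5, 2.0]:
        print(f"m={m:+.1f}  0-1={zero_one_loss(m):.1f}  "
              f"hinge={hinge_loss(m):.3f}  logistic={logistic_loss(m):.3f}")

    # Midpoint check: a convex function satisfies f(0) <= (f(-1) + f(1)) / 2.
    midpoint_bound = 0.5 * (zero_one_loss(-1.0) + zero_one_loss(1.0))
    print("0-1 loss violates midpoint convexity:", zero_one_loss(0.0) > midpoint_bound)
```

Both surrogates upper-bound the 0–1 loss at every margin while remaining convex, which is what makes them amenable to standard optimization; the 0–1 loss itself jumps at m = 0 and fails the midpoint convexity check.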