I. Introduction
Over the past decade, convolutional neural networks (CNNs) have gained remarkable popularity, as they have significantly advanced the state of the art in various real-world tasks such as image classification [1], object tracking [2], and segmentation [3], [4]. The increased availability of labeled data, improvements in modern graphics processing units (GPUs), and algorithmic breakthroughs have given CNNs the learning capacity to explore hypothesis spaces that so-called “shallow” models have never successfully explored. However, in many real-world applications, the training data exhibit significantly imbalanced class distributions [5]–[8]. Class imbalance is a classic and difficult problem in traditional machine learning. It has not gone away for CNN-based methods: despite their great learning capacity, their performance can still suffer when the data have a skewed distribution [9], [10]. Although the problem is of fundamental importance, class imbalance in the context of deep representation learning remains under-researched [7], [8].