1. Introduction
With the advancement of deep neural networks, large-scale datasets have appeared. In general, these real-world large data sets often have shown long-tailed label distributions [1,2,3,4] as depicted in Fig.1. On these datasets, the models [3] have a tendency to perform poorly on the minority classes due to the over-fitting. This tendency has to do with biased predictions, i.e., the trained deep model tends to predict the majority classes rather than the minor ones. Thus, overfitting to minority classes seems to be one of the challenges of generalization.