Learning From Imbalanced Data Using Triplet Adversarial Samples

The imbalance of classes in real-world datasets poses a major challenge in machine learning and classification, and traditional synthetic data generation methods often fail to address this problem effectively. A major limitation of these methods is that they tend to separate the process of generating synthetic samples from the training process, resulting in synthetic data that lack the informative characteristics necessary for proper model training. We present a new synthetic data generation method that addresses this issue by combining adversarial sample generation with a triplet loss method. This approach focuses on increasing the diversity of the minority class while preserving the integrity of the decision boundary. Furthermore, we show that reducing the triplet loss is equivalent to maximizing the area under the receiver operating characteristic curve under specific conditions, providing a theoretical basis for the effectiveness of our method. In addition, we present a model training approach that further improves model generalization to minority classes by providing a diverse set of synthetic samples optimized using our proposed loss function. We evaluated our method on several imbalanced benchmark tasks and compared it with state-of-the-art techniques, demonstrating that our method outperforms them and offers an effective solution to the class imbalance problem.


I. INTRODUCTION
Large-scale real-world datasets have been a driving force behind the recent success of deep-learning techniques for image classification [1], [2]. However, these techniques encounter a major challenge in real-world scenarios: class imbalance. A class imbalance problem occurs when the distribution of classes in a dataset is unequal, with one or more classes having far fewer instances than the others, causing machine learning models to perform poorly because of their bias towards the majority classes. Class imbalance is a major concern in machine learning and frequently occurs across various real-world domains [3], [4], [5]. In datasets with class imbalance, the majority classes, which dominate the data, often receive the majority of attention and resources, whereas the minority classes, which are of utmost importance, are not adequately represented. Using conventional training methods on class-imbalanced datasets generally results in highly accurate predictions for the majority classes but poor generalization capabilities for the minority classes [6], [7].

(The associate editor coordinating the review of this manuscript and approving it for publication was Wai-Keung Fung.)
The class imbalance problem has prompted the development of several solutions, which can be divided into three main categories: cost-sensitive learning, resampling, and modification of learning objectives. Cost-sensitive learning adjusts the cost of misclassifying minority classes and trains a model to minimize the overall expected cost [8], [9], [10]. Resampling aims to balance the classes by oversampling the minority classes, undersampling the majority classes, or generating synthetic minority class instances [6], [11], [12], [13], [14]. Modifying learning objectives involves using alternative metrics, such as the area under the receiver operating characteristic curve (AUC) [15], [16] or margin-based loss [17], instead of conventional objective functions, such as the traditional cross-entropy loss, which often perform poorly on minority classes owing to the imbalance in the dataset. Regarding synthetic data generation techniques, the limitation of most methods developed thus far lies in the separation between the data generation and classifier training processes. Many existing methods generate synthetic samples before the training process and do not revise them during training, resulting in suboptimal performance. In this study, we present a novel synthetic data generation scheme that addresses this limitation. Our proposed method considers the current state of the classifier when generating synthetic samples, resulting in samples that better reflect local data structures, thus leading to improved performance of the classifier on imbalanced datasets.

(VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
Specifically, the proposed method for generating synthetic minority class samples leverages the triplet loss [18], which is often used to learn similarity-preserving embeddings. The proposed method is based on the idea that instead of training the network with a triplet loss, helpful synthetic samples can be generated by minimizing the triplet loss over an anchor point in the embedding space of the current model. Our theoretical findings show that adding synthetic minority samples that minimize the triplet loss maximizes the AUC of a simple distance-based classifier in the embedding space. The proposed method differs from other oversampling techniques that generate samples independently of the model. Our method considers the current state of the model and the instances in a given training set to create synthetic samples that carry valuable information, such as class borderlines. By fine-tuning the model with these generated anchor points, we aim to enhance performance in addressing the class imbalance problem.
The main contributions of this study are as follows. (1) We show that combining adversarial training with a triplet loss function and an adversarial sample generation method can generate high-quality synthetic samples that address the class imbalance issue. (2) Our analysis reveals that minimizing the triplet loss is mathematically equivalent to maximizing the AUC of a classifier, thus providing a deeper insight into the mechanism of the proposed method. (3) By incorporating a fine-tuning step using the generated anchor points, which reflect the current state of the model, we achieve a performance improvement compared with that of existing methods for synthetic data generation.
The remainder of this paper is organized as follows: In Section II, we review the existing studies related to the class-imbalance problem and our proposed method. Section III describes the proposed method, including its underlying principles and the developed algorithms. Section IV reports the results of the experiments conducted to evaluate the proposed method, including a comparison with existing methods. Finally, Section V provides a summary of the findings and implications of the research.

II. RELATED WORKS
We briefly summarize previous related studies addressing class imbalance in Subsections II-A, II-B, and II-C and provide the background for the proposed method in Subsection II-D. Table 1 presents a comparison of imbalanced learning techniques. Our approach is positioned along three criteria, namely, consideration of other classes, end-to-end learning, and AUC as a learning objective. The following literature review further explores these criteria.

A. CLASS IMBALANCE: RESAMPLING & SYNTHETIC DATA GENERATION
Oversampling is a common approach to solving the class imbalance problem. It involves duplicating minority samples to balance the dataset, but can result in overfitting as the model becomes too closely adapted to the minority samples. To address this problem, the synthetic minority oversampling technique (SMOTE) was introduced [11]. SMOTE creates synthetic minority samples through linear interpolation with the nearest minority samples, thereby simulating the minority distribution. However, the new minority instances may not always enhance classification performance because SMOTE does not take the majority class into account. Certain SMOTE variants, which leverage Euclidean distances to incorporate inter-class information, have shown promising outcomes on tabular datasets [13], [14]. Nonetheless, they have proven less effective on image datasets because of their limitations in generating high-quality images through linear interpolation [22], [23].
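As a concrete illustration of the interpolation step described above, the following is a minimal NumPy sketch of SMOTE-style sample generation. The function name, the neighbour count k, and the toy data are illustrative assumptions, not the reference SMOTE implementation.

```python
import numpy as np

def smote_sample(minority, k=3, rng=None):
    """Generate one synthetic minority sample by linear interpolation
    between a random minority point and one of its k nearest minority
    neighbours (the core idea of SMOTE; a simplified sketch)."""
    if rng is None:
        rng = np.random.default_rng(0)
    i = rng.integers(len(minority))
    x = minority[i]
    d = np.linalg.norm(minority - x, axis=1)  # distances to all minority samples
    d[i] = np.inf                             # exclude the point itself
    neighbours = np.argsort(d)[:k]
    j = rng.choice(neighbours)
    gap = rng.random()                        # interpolation coefficient in [0, 1)
    return x + gap * (minority[j] - x)        # point on the segment x -> neighbour

# Toy minority set on the unit square: any synthetic point stays inside it.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(minority)
```

Because the synthetic point is a convex combination of two minority samples, it never leaves the convex hull of the minority class, which is exactly why SMOTE struggles to produce informative samples near the majority class boundary.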
An alternative approach involves generating minority samples by transforming existing samples. Context-rich minority oversampling creates synthetic minority images by applying mixup [24] to weighted datasets [12]. Another approach, called M2m, uses adversarial examples to balance the class distribution, altering a majority sample into a synthetic minority sample through optimization [21]. These approaches are similar to our method in that they also use optimization to generate synthetic samples.
Although generating synthetic samples for the minority class while considering the majority class has demonstrated promising results, the approach remains incomplete because many existing sampling methods generate synthetic samples in isolation from the classifier training process. This limited integration with the training process may not yield an optimized classifier. The M2m approach incorporates some classifier information by using a pre-trained independent classifier [21] but does not fully leverage in-training classifier information.

B. CLASS IMBALANCE: NOVEL LOSS FUNCTIONS
The standard cross-entropy loss function used in training classification models often performs poorly on class-imbalanced datasets because the classifier is trained to minimize the overall error rate, which has a limited effect on the minority classes. This results in a bias towards the majority classes. To overcome this issue, researchers have proposed alternative loss functions that are better suited for handling class imbalance. A popular approach involves maximizing the AUC [25], [26]. However, unlike the cross-entropy loss, which can be calculated from a single example or a mini-batch of examples, the AUC must be calculated over all given data, making it difficult to optimize with stochastic gradient descent. To address this issue, researchers have proposed online AUC maximization methods using surrogate loss functions that can be updated with mini-batch training samples. Hinge losses as convex surrogates, together with unique sampling and updating rules for shallow networks, have been proposed [15]. A previous study reformulated the AUC maximization problem with an L2-loss surrogate as a minimax problem and solved it as a stochastic saddle-point problem [27]. Recent studies have focused on training deep neural networks for AUC optimization [28], [29]. However, these methods need to solve a challenging min-max optimization problem.
Another line of research focuses on margin-based loss functions, with studies showing that hinge loss, which is used to obtain maximum margins, is robust to class imbalance [30]. Label distribution-aware margin (LDAM) loss was also proposed to encourage larger margins for minority classes, thereby reducing the generalization error of the minority class [17]. To compensate for the influence of feature deviation, Class Dependent Temperatures (CDT) were developed to simulate feature deviation in the training phase by enlarging the decision values for minor-class data [20]. Similar to LDAM, CDT can be considered as a margin-based approach for increasing margins for minority classes. Nonetheless, it should be noted that penalizing majority classes with large margins or temperatures may result in an increase in the AUC metric, but these strategies are not explicitly directed to AUC maximization. Our work is similar to the above-mentioned studies in that it uses a loss function related to AUC but differs in that we generate synthetic samples rather than using the loss to train a classifier.

C. DEEP-METRIC LEARNING
Deep-metric learning (DML) leverages deep neural networks to learn a representation for each data point such that the resulting representation space directly corresponds to a similarity metric [31]. DML aims to map the input data to a high-dimensional representation space, where similar data points are close to each other and dissimilar points are far apart. This is accomplished by training the network to maintain a specific relationship with the data points, where the distance between representations of similar data points is minimized and the distance between dissimilar data points is maximized.
Earlier DML methods utilized the bottleneck layer of a classification model trained on labeled data to represent unlabeled data [32]. However, a new approach has been introduced that uses a triplet of two positive samples, one negative sample, and a new loss function to separate the positive samples from the negative samples in the embedding space of a deep neural network [18]. This is known as the triplet loss. To train a network using examples that effectively minimize the distance between similar data points and maximize the distance between dissimilar data points in the embedding space, selection of informative triplets is a critical aspect of the triplet learning process. For fast convergence, a minibatch typically consists of the hardest positive and negative samples, meaning that the selected samples are the most difficult to differentiate in the embedding space, thus providing the network with a demanding and varied set of examples to learn from.
Given the similarity between triplet loss and AUC optimization, several studies have used DML as a viable solution for class imbalance [33], [34]. To train a network and optimize the AUC simultaneously, they use the triplet loss as the primary loss function. Training a network with the triplet loss, however, necessitates careful triplet sampling strategies, which may limit the applicability of some methods. In this study, we provide the theoretical basis showing that AUC maximization is equivalent to triplet loss minimization in particular scenarios, which explains the usefulness of the triplet loss in imbalanced settings.

D. ADVERSARIAL EXAMPLES
It has been observed that many machine learning models are susceptible to small, imperceptible changes in their inputs that can result in drastic changes to their predictions [35]. These altered examples, called adversarial examples, are frequently misclassified by classifiers with different architectures, suggesting that they have unique characteristics that make accurate classification challenging. A recent study has indicated that deep neural networks can learn both robust and non-robust features from image datasets, where non-robust features are those that are easily exploited by adversarial examples [36]. These non-robust features are incomprehensible to humans, and their existence may highlight the vulnerability of deep neural networks to adversarial attacks. To address the problem of adversarial examples, a new method for creating them was proposed in a previous study [37], and adversarial training was introduced as a regularization technique to improve the robustness of machine learning models against these examples. Adversarial training involves adding adversarial examples to the training data, thereby allowing the model to learn more robust features and become less susceptible to adversarial attacks. In our study, we incorporated augmented training data to obtain these benefits.

III. PROPOSED METHOD
As briefly indicated in our literature review, a crucial element of our study is the use of the triplet loss to generate new minority (positive) samples rather than to directly train a classifier with the loss function; to effectively secure the positive regions, we generate adversarial samples as the new samples. Figure 1 shows an overview of the proposed method. In Section III-A, we establish a theoretical basis by demonstrating how samples newly generated through the triplet loss function can enhance the AUC of a classifier. Based on this theoretical foundation, Sections III-B and III-C describe the proposed loss functions and learning algorithms, respectively. Note that this study addresses multi-class classification problems; therefore, given multiple classes, each class except the largest is designated in turn as the minority class, that is, the positive class.

A. THEORETICAL FOUNDATION
The receiver operating characteristic (ROC) curve [26], a two-dimensional plot of the false-positive rate versus the true-positive rate, is widely used to assess the performance of classifiers on class-imbalanced data. We quantify the classification performance for the two-class case by computing the AUC. The AUC metric is equivalent to the Wilcoxon-Mann-Whitney statistic [25], [38]. The AUC for a scoring function g is given by:

AUC(g) = E[ I( g(x^+) > g(x^-) ) ]   (1)

where x^+ and x^- denote the positive (minority) and negative (majority) samples, respectively, and I(·) is the zero-one indicator function. Because the AUC function is not differentiable, it is often replaced during optimization by a convex surrogate that upper-bounds the indicator function. The hinge loss is an often-used surrogate [15], defined as:

L_hinge(g; x^+, x^-) = max(0, 1 - (g(x^+) - g(x^-)))   (2)
Because it is a convex function that upper-bounds the indicator function, the hinge loss is a popular choice in the literature; its use enables computationally efficient and robust AUC optimization. AUC maximization for a given dataset can then be formulated as the empirical minimization problem in Equation (3):

min_g (1/(N^+ N^-)) Σ_{i=1}^{N^+} Σ_{j=1}^{N^-} L_hinge(g; x_i^+, x_j^-)   (3)
where N^+ and N^- denote the sizes of the positive and negative classes, respectively. The objective is to rank positive-class samples higher than negative-class samples so as to maximize the AUC. Triplet losses [18] are often used in metric learning to incorporate relative ranks between samples. The purpose of both the AUC and the triplet loss is to rank samples in the desired order. Accordingly, several studies have employed triplet losses to optimize the AUC metric or have used the AUC metric instead of the triplet loss. This strategy has proven successful in a variety of classification problems with class-imbalanced data [33], [34].
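The empirical AUC and its pairwise hinge surrogate discussed above can be sketched in a few lines of NumPy; the function names and toy scores below are illustrative assumptions.

```python
import numpy as np

def wmw_auc(scores_pos, scores_neg):
    """Empirical AUC as the Wilcoxon-Mann-Whitney statistic:
    the fraction of (positive, negative) pairs ranked correctly."""
    diff = scores_pos[:, None] - scores_neg[None, :]  # all pairwise score gaps
    return np.mean(diff > 0)

def pairwise_hinge(scores_pos, scores_neg, margin=1.0):
    """Convex surrogate averaged over all pairs; driving it to zero
    forces every positive to outscore every negative by `margin`,
    which implies AUC = 1."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(np.maximum(0.0, margin - diff))

# Toy scores where every positive beats every negative by at least 1.
pos = np.array([2.0, 3.0, 4.0])
neg = np.array([0.0, 1.0])
auc = wmw_auc(pos, neg)
loss = pairwise_hinge(pos, neg)
```

Here the hinge surrogate is exactly zero and the empirical AUC is 1, illustrating the upper-bound relationship between the two quantities.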
A triplet consists of an anchor point x_a^+, a positive sample x_i^+ (a sample of the same class as the anchor point), and a negative sample x_j^- (a sample of a class different from the anchor point). When training a neural network with an embedding layer (penultimate layer) f(x), the triplet loss is defined as:

L_Triplet(x_a^+, x_i^+, x_j^-) = max(0, ||f(x_a^+) - f(x_i^+)||^2 - ||f(x_a^+) - f(x_j^-)||^2 + α)   (4)

where ||·|| and α denote the norm and margin, respectively, and both must be pre-defined. Minimizing Equation (4) trains f(x) such that samples from the same class are embedded closer to each other than samples from different classes. The margin α controls how much closer the positive sample must be to the anchor point than the negative sample. We now show that minimizing the triplet loss is equivalent to maximizing the AUC for a distance-based classifier

g(x) = -||f(x_a^+) - f(x)||^2.

As x approaches the anchor point x_a^+, g returns a greater value, and vice versa. Using g, the triplet loss in Equation (4) can be reformulated as:

L_Triplet = max(0, α - (g(x_i^+) - g(x_j^-)))   (5)

When the anchor point x_a^+ is given, Equation (5) represents, with α = 1, a single term of the summation in Equation (3). Therefore, Equation (3) can be rewritten as Equation (6) over all pairs (x_i^+, x_j^-) for the given anchor point x_a^+:

min (1/(N^+ N^-)) Σ_{i=1}^{N^+} Σ_{j=1}^{N^-} max(0, 1 - (g(x_i^+) - g(x_j^-)))   (6)

and minimizing Equation (6) is equivalent to maximizing the AUC of the distance-based classifier g.
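The algebraic identity between the triplet loss and the pairwise hinge term can be checked numerically on toy embeddings. This is a minimal sketch with an identity-style embedding already applied; the names and values are illustrative.

```python
import numpy as np

def triplet_loss(f_a, f_pos, f_neg, alpha=1.0):
    """Triplet loss of the form max(0, d(a, pos) - d(a, neg) + alpha)
    on already-embedded points, using squared Euclidean distances."""
    d_pos = np.sum((f_a - f_pos) ** 2)  # anchor -> positive
    d_neg = np.sum((f_a - f_neg) ** 2)  # anchor -> negative
    return max(0.0, d_pos - d_neg + alpha)

def g(f_a, f_x):
    """Distance-based score: larger when x is closer to the anchor."""
    return -np.sum((f_a - f_x) ** 2)

# Toy embeddings: positive near the anchor, negative farther away.
f_a   = np.array([0.0, 0.0])
f_pos = np.array([0.5, 0.0])
f_neg = np.array([1.0, 0.0])

# With margin alpha = 1, the triplet loss equals the pairwise hinge
# term max(0, 1 - (g(x+) - g(x-))) of the AUC surrogate.
hinge = max(0.0, 1.0 - (g(f_a, f_pos) - g(f_a, f_neg)))
```

For these values both expressions evaluate to the same nonzero loss, confirming that each triplet contributes exactly one hinge term of the AUC surrogate.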
However, it is impractical to find g (i.e., to train f in a neural network) by minimizing Equation (6), because the optimal anchor point x_a^+ must be determined from the given dataset for every pair (x_i^+, x_j^-). Shifting the notion of training, this study proposes generating the optimal anchor point x_a^+ rather than finding it in the data, by utilizing the triplet loss to generate a synthetic sample that helps maximize the AUC. This is the main objective of this study. The creation of synthetic positive samples also helps address the issue of class imbalance. In particular, we generate the anchor point x_a^+ for a pair (x_i^+, x_j^-) by solving Equation (7):

x_a^+ = arg min_x L_Triplet(x, x_i^+, x_j^-)   (7)

With this approach, a distance-based classifier is expected to accurately classify minority samples in the embedding space by ensuring that every minority sample has a minority anchor point as its nearest neighbor, thereby maximizing the AUC.¹ Notably, Equation (7) has the trivial solution x_a^+ = x_i^+. This solution is equivalent to duplicating x_i^+, which is a random oversampling (ROS) approach. To avoid the trivial solution, we adopt data-augmentation techniques: to minimize Equation (7), an augmented x_i^+ is selected as the starting point and is iteratively updated toward the optimal x_a^+. For regularization, we use an adversarial sample generation strategy to move the synthetic anchor point x_a^+ away from x_i^+. Details are provided in the following subsection. Figure 2 shows the conceptual basis of this study. The minority and majority samples are represented by circles and crosses, respectively. We attempt to expand the decision boundary toward the majority class by generating a synthetic anchor sample, denoted by the star symbol.

¹ Each majority sample has its nearest majority neighbor, according to our presumption.
The hatched region is the area satisfying L_Triplet(★, ○, ✕) = 0, implying that the decision boundary is broadened while both the majority and minority decision areas are retained. The dashed red circle indicates the initial augmented sample, which is updated to the synthetic anchor point as the optimization progresses.
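The anchor-generation idea can be sketched with gradient descent on the triplet loss in input space, assuming an identity embedding f(x) = x so that the gradient has a closed form. This simplification, the function names, and the toy points are all illustrative; the paper optimizes through the network's learned embedding, and its adversarial regularization term (described in the next subsection) would additionally push the anchor away from the seed.

```python
import numpy as np

def generate_anchor(x_seed, x_pos, x_neg, alpha=1.0, eta=0.05, steps=100):
    """Start from an augmented seed and take gradient steps that shrink
    the triplet loss, stopping once the hinge becomes inactive (loss 0).
    Valid only for the identity embedding assumed in this sketch."""
    x_a = x_seed.copy()
    for _ in range(steps):
        active = np.sum((x_a - x_pos) ** 2) - np.sum((x_a - x_neg) ** 2) + alpha
        if active <= 0:
            break  # triplet loss already zero: anchor is acceptable
        # Gradient of ||x_a - x_pos||^2 - ||x_a - x_neg||^2 w.r.t. x_a.
        grad = 2 * (x_a - x_pos) - 2 * (x_a - x_neg)
        x_a -= eta * grad
    return x_a

x_pos  = np.array([0.0, 0.0])   # minority sample
x_neg  = np.array([1.0, 0.0])   # nearby majority sample
x_seed = np.array([0.4, 0.0])   # augmented copy of the minority sample
x_a = generate_anchor(x_seed, x_pos, x_neg)
final_loss = max(0.0, np.sum((x_a - x_pos) ** 2)
                       - np.sum((x_a - x_neg) ** 2) + 1.0)
```

The pure triplet term pulls the anchor back toward the positive sample until the margin condition holds, which is why the adversarial term is needed in the full method to keep the anchor near the decision boundary.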
Although this subsection discusses the main theoretical basis for a two-class scenario, it also applies to the multi-class situation because the AUC for the multi-class case is known to be the average AUC over all one-against-one combinations [39]. Therefore, maximizing the AUC for each two-class combination also maximizes the AUC for the multi-class case. The synthetic anchor generation scheme is extended to multi-class classification problems and described as an algorithm below.

B. PROPOSED LOSS FUNCTIONS
In this subsection, we present the proposed loss functions used in our model training algorithms. As outlined in Section III-A, a triplet adversarial sample (TAS), denoted by x_a^+, is generated through an optimization process that minimizes the triplet loss L_Triplet together with an additional adversarial sample generation loss term. Training with adversarial samples is an effective method of generalizing models [40], [41]. Recent research has highlighted the importance of using diverse and challenging samples when training machine learning models to enhance their robustness and generalization abilities [42], [43]. Adversarial samples are created by adding small perturbations to the original samples, making them intentionally misclassified, with the goal of encouraging the network to learn robust representations. This leads to larger margins between classes, resulting in improved model generalization. The optimization problem for generating adversarial samples [35] is defined as:

δ* = arg max_δ L_CE(h(x + δ), y)   (8)

where h is a trained model. The objective is to find the adversarial perturbation δ that maximizes the cross-entropy L_CE(·, ·) between the ground truth y and the prediction of h. By incorporating the cross-entropy term of Equation (8), the proposed loss function for generating a TAS is obtained as follows:

x_a^+ = arg min_x [ L_Triplet(x, x_i^+, x_j^-) - λ L_CE(h; x, y^+) ]   (9)

where x_i^+ is the seed minority (positive) sample, x_j^- is the negative sample, and y^+ is the class label of x_i^+. The loss function combines the triplet loss term L_Triplet with the adversarial term L_CE(h; x_a^+, y^+). The former minimizes the distance between the anchor point and the positive sample while maximizing the distance between the anchor point and the negative sample by the margin α, whereas the latter positions the anchor point near the boundary of the classifier and away from the positive sample x_i^+. Therefore, the adversarial term can be regarded as a regularization term, where λ controls the extent of regularization.
By combining these terms, we can generate synthetic samples that are similar to the original positive samples yet adversarial to the classifier. This is expected to improve the classification accuracy for minority classes, as the generated synthetic minority samples lie close to the minority class because of the triplet loss term and near the decision boundary owing to the adversarial regularization term.
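The trade-off between the two terms of the combined TAS objective (triplet term minus λ times the anchor's cross-entropy) can be visualized with a tiny grid search. The linear two-class classifier, the identity embedding, and all parameter values here are toy assumptions for illustration, not the paper's network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def tas_objective(x, x_pos, x_neg, W, y_pos, alpha=1.0, lam=0.5):
    """Combined objective value for a candidate anchor x: the triplet
    term (identity embedding) minus lam * cross-entropy of x under a
    toy linear classifier h(x) = softmax(W x)."""
    triplet = max(0.0, np.sum((x - x_pos) ** 2)
                        - np.sum((x - x_neg) ** 2) + alpha)
    ce = -np.log(softmax(W @ x)[y_pos])  # cross-entropy of the anchor
    return triplet - lam * ce

x_pos, x_neg = np.array([0.0, 0.0]), np.array([2.0, 0.0])
W = np.array([[-1.0, 0.0],   # weight row for the positive class (index 0)
              [ 1.0, 0.0]])  # weight row for the negative class
# Evaluate candidate anchors along the segment from x_pos toward x_neg.
candidates = np.linspace(0.0, 1.5, 7)
values = [tas_objective(np.array([c, 0.0]), x_pos, x_neg, W, y_pos=0)
          for c in candidates]
best = candidates[int(np.argmin(values))]
```

The minimizer is not the seed position itself: the cross-entropy term drags the anchor toward the decision boundary, while the triplet term stops it before it crosses into the majority region, matching the intuition in the text.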
For effective training of a model with triplet adversarial samples, it is important that the model embeds the positive samples x_i^+ and the synthetic anchor points x_a^+ consistently. To achieve this, we use the logit pairing loss as a regularization technique to stabilize the training process [44], [45] by reducing the KL-divergence, denoted by D_KL, between the predicted class probabilities of the synthetic and seed samples. The logit pairing loss is defined as:

L_LP = (1/2) D_KL(p(y|x_a^+) || M) + (1/2) D_KL(p(y|x_i^+) || M)   (10)

where p(y|x_a^+) and p(y|x_i^+) are the predicted class probabilities of the synthetic and seed samples, respectively, and M = (p(y|x_a^+) + p(y|x_i^+))/2 is the average of these probabilities. This KL-divergence loss term encourages the model to predict similarly for both the synthetic and seed samples, thus preventing the synthetic samples from deviating from the original samples in the feature space.
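The symmetric-KL-to-the-average form of the pairing term is the Jensen-Shannon divergence, which is zero for identical distributions and bounded above by log 2 (in nats). A minimal NumPy sketch, with illustrative function names and toy distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete probability distributions."""
    return float(np.sum(p * np.log(p / q)))

def logit_pairing_loss(p_anchor, p_seed):
    """Pairing term: symmetric KL of both predictive distributions to
    their average M (the Jensen-Shannon divergence)."""
    m = 0.5 * (p_anchor + p_seed)
    return 0.5 * kl(p_anchor, m) + 0.5 * kl(p_seed, m)

# Toy predictive distributions over three classes.
p_seed = np.array([0.7, 0.2, 0.1])  # prediction for the seed sample
p_far  = np.array([0.1, 0.2, 0.7])  # a very different anchor prediction
```

Identical predictions incur zero penalty, while divergent predictions are penalized, which is what keeps the synthetic anchors from drifting away from their seeds.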
Finally, given a batch of data B and a synthetic anchor set A of the same size as B, generated using Equation (9), the training loss for the classification model h is defined as:

L(h) = (1/|A|) Σ_{(x_a^+, y^+)∈A} L_CE(h; x_a^+, y^+) + (1/|B|) Σ_{(x, y)∈B} L_CE(h; x, y) + L_LP   (11)

The training loss consists of three components, namely, the loss for the synthetic anchor set A, the loss for the original batch of data B, and the logit pairing loss. The objective of training is to minimize the loss for both the synthetic anchor set and the original dataset while reducing the difference between their respective posterior distributions. This strengthens the robustness of the classifier in handling class-imbalanced datasets and improves its ability to learn from both synthetic and real data.

C. PROPOSED LEARNING ALGORITHMS
Given the loss functions for generating triplet adversarial samples and for training a classification model, we outline the corresponding learning algorithms in this subsection. Consider a multi-class imbalanced-data classification problem with K classes and N instances in a training set, that is, x ∈ R^d and y ∈ {1, ..., K} with a skewed class distribution. Let h(x) denote the prediction of a neural network with K outputs and f(x) denote the embedding layer (penultimate layer) of network h. We denote the sample size of class k as N_k, so the total sample size is N = Σ_{k=1}^{K} N_k. In addition, we assume that class '1' is the largest class and class 'K' is the smallest; for the classes in between, the sample sizes are assumed to decrease sequentially, satisfying N_1 ≥ N_2 ≥ ... ≥ N_K. To solve the optimization problem in Equation (9), we employ a variation of the fast gradient sign (FGS) method [37], a one-step adversarial sample generation technique. The FGS method creates perturbed samples using the following update:

x_a^+ ← x_a^+ - η_a sign(∇_{x_a^+}(L_Triplet - λ L_CE))   (12)

where η_a is the adversarial sample generation step length.
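The sign-based update moves every coordinate by the same fixed step regardless of the gradient's magnitude, which is what makes FGS a cheap one-step method. A minimal sketch of the update rule (the function name and toy gradient are illustrative):

```python
import numpy as np

def fgs_step(x, grad, eta_a=0.1):
    """One fast-gradient-sign update: each coordinate moves by a fixed
    step eta_a in the descent direction of the combined loss, as in the
    anchor refinement step sketched above. Coordinates with zero
    gradient are left unchanged (sign(0) = 0)."""
    return x - eta_a * np.sign(grad)

x = np.zeros(3)
updated = fgs_step(x, np.array([3.0, -0.2, 0.0]))
```

Note that the magnitudes 3.0 and -0.2 produce steps of identical size; only the direction of each coordinate's gradient matters.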

The procedure for generating TASs is outlined in the GenTAS algorithm. Its core update step,

x_a^+ ← x_a^+ - η_a sign(∇_{x_a^+}(L_Triplet - λ L_CE)),

is applied across all classes k = 1, ..., K. For the selection of the anchor and negative samples in the triplet, the augmented seed sample x_i^+ is used as the anchor, and a random sample among the 5-nearest relatively-majority samples with class label k^-, satisfying k^- < k, is selected as the negative sample (lines 6-8). The cutout method, which masks certain regions of an image with a constant value, is applied to the selected seed sample x_i^+. The triplet and adversarial losses are then calculated for the selected triplet (x_a^+, x^+, x^-). However, the adversarial regularization term is applied only to well-classified samples, that is, samples already classified as the target class; this makes classification harder for easy samples by creating synthetic samples near them (lines 9-14). The algorithm returns the set A, a collection of triplet adversarial samples used for training the model. It is important to note that each sample in A corresponds to its respective batch set B, which enables the calculation of the training loss defined in Equation (11) using pairs of elements from sets A and B.
An important aspect of the proposed synthetic data generation method is its incorporation into the training process of the classifier model. Synthetic samples are thus generated based on the current state of the model, including its representation space and decision boundaries. This dynamic and adaptive approach results in more effective synthetic sample generation because the model's understanding of the data evolves as the iterations progress. The overall training process, as shown in Figure 1, involves generating triplet adversarial samples and concurrently updating the classifier model using both the original and synthetic samples. This enables the classifier to learn a robust decision boundary by considering not only the original data but also the synthetic samples.
The procedure for training the model using the triplet adversarial samples is outlined in Algorithm 2. Our approach includes a deferred rebalancing schedule (DRS), as introduced in a previous study [17]. The network begins by training on the imbalanced dataset and then transitions to a balanced dataset, with the learning rate decreased by a factor τ at the pre-defined epoch T_0. DRS is based on the idea that training on imbalanced data can help the model learn the underlying patterns of the majority classes, which aids the rebalancing phase. The imbalanced mini-batch is generated using the SampleMiniBatch(D, N_B) function, and the balanced mini-batch is obtained using the SampleBalancedMiniBatch(D, N_B) function with different sampling weights (lines 3-6).
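The two sampling regimes of the deferred rebalancing schedule differ only in their per-instance weights. The following NumPy sketch contrasts uniform (imbalanced) weights with inverse-class-frequency (balanced) weights; the function name and toy labels are illustrative assumptions.

```python
import numpy as np

def sampling_weights(labels, balanced):
    """Per-instance sampling weights: uniform during the imbalanced
    warm-up phase, inverse class frequency after the rebalancing epoch
    so that every class contributes equal probability mass."""
    labels = np.asarray(labels)
    if not balanced:
        w = np.ones(len(labels))
    else:
        counts = np.bincount(labels)   # class sizes
        w = 1.0 / counts[labels]       # rarer class -> larger weight
    return w / w.sum()                 # normalize to a distribution

labels = [0, 0, 0, 0, 1, 1]  # majority class 0 (4 samples), minority class 1 (2)
w_bal = sampling_weights(labels, balanced=True)
```

Under the balanced weights, each class receives half of the total sampling probability despite the 4:2 imbalance, which is what makes the rebalancing-phase mini-batches class-balanced in expectation.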
The training algorithm is designed to train the model using triplet adversarial samples. Therefore, the GenTAS algorithm is used to generate the synthetic samples (line 8). Subsequently, the loss of the model h_θ is computed using both the mini-batch B and the generated samples A (line 9). Given that the softmax layer can effectively approximate a distance-based classifier in the embedding layer [46], [47], the model is updated accordingly (line 13). It is important to note that using triplet adversarial samples on an imbalanced dataset can enhance the performance of the model even before the rebalancing phase, because the triplet adversarial samples not only regularize the model but also help mitigate class imbalance. Therefore, the GenTAS algorithm is invoked in all epochs, regardless of the pre-training epochs T_0.

IV. EXPERIMENTS

A. EXPERIMENTAL SETTINGS
To demonstrate the effectiveness of our synthetic data generation method, extensive experiments were conducted on commonly used benchmark datasets: CIFAR-10/100 [48] and ImageNet-LT [49]. The detailed training specifications, including modifications to the data and base models, are outlined below. The hyperparameter λ, which controls the strength of regularization, was selected using a validation set from a fixed set of candidate values, λ ∈ {0.1, 0.5, 1.0}. The values of η_a and α were fixed at 0.1 and 1, respectively, during the optimization of synthetic samples.

1) CIFAR-10/100-LT
The CIFAR-10/100 datasets, commonly used for image classification, have balanced class distributions. To create class imbalance, we applied the method described in a previous study [17]. Let LIR = N_1/N_K be the largest imbalance ratio. We considered LIR = 10 and LIR = 100 for both datasets. The CIFAR-10 dataset contains 10 classes, each with 5,000 instances. One class, designated as class '1', retained all of its instances, while the other classes were randomly undersampled to create class imbalance. For LIR = 10, the most undersampled class 'K = 10' has 500 instances, and for LIR = 100, it has 50 instances. To create not just class imbalance but also a long-tail distribution, we reduced the sizes of the intermediate classes (k = 2, ..., 9) using N_k = N'_k µ^(k-1), where N'_k is the original size of class 'k' and the fraction ratio µ ∈ (0, 1). For LIR = 10, µ is determined by solving 500 = 5000 × µ^9. The CIFAR-100 dataset contains 100 classes, each with 500 instances; we created a class-imbalanced, long-tailed version of this dataset by following the same scheme.
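The long-tail construction above amounts to exponential decay of class sizes, with the fraction ratio µ solved from the largest imbalance ratio. A short sketch (the function name is illustrative) that reproduces the CIFAR-10-LT sizes for LIR = 10:

```python
def long_tail_sizes(n_max, n_classes, lir):
    """Class sizes under exponential decay N_k = N_max * mu^(k-1),
    with mu chosen so the smallest class has N_max / LIR instances,
    i.e. mu = (1/LIR)^(1/(K-1))."""
    mu = (1.0 / lir) ** (1.0 / (n_classes - 1))
    return [round(n_max * mu ** k) for k in range(n_classes)]

# CIFAR-10-LT with LIR = 10: mu solves 500 = 5000 * mu^9.
sizes = long_tail_sizes(5000, 10, 10)
```

The resulting sizes decrease smoothly from 5,000 down to 500, giving the long-tailed distribution rather than a single abrupt drop in class size.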
We followed the training procedure outlined in a previous study [17] for these datasets. Each model was trained using stochastic gradient descent with a momentum of 0.9. The initial learning rate was set to 0.1 (η_h = 0.1) and reduced by a factor of 10 at the 160th epoch (T_0 = 160), and training was terminated at the 200th epoch (T = 200). Simple data augmentation techniques such as 4-pixel padding, random cropping, and horizontal flipping were applied. All input images were normalized based on the training set. We used the ResNet-32 model with an input size of 32 × 32 and a batch size of 128.
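The step learning-rate schedule above can be expressed as a small helper (an illustrative sketch, not the authors' code; the parameter names are assumptions):

```python
def lr_at(epoch, eta_h=0.1, decay_epoch=160, factor=10.0):
    """Step schedule used for CIFAR-10/100-LT: constant eta_h until the
    decay epoch, then divided by `factor` for the remaining epochs
    (training stops at epoch 200)."""
    return eta_h if epoch < decay_epoch else eta_h / factor
```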

2) ImageNet-LT
The ImageNet-LT dataset was first introduced in [49] for long-tailed recognition. It contains 115,800 images across 1,000 classes, with a maximum of 1,280 images per class and a minimum of five images per class, selected from the original ImageNet dataset [50] using a sampling process based on the Pareto distribution. In our experiments, training was performed using the ResNet-50 model as the backbone network, with the initial learning rate set to 0.1 (η_h = 0.1) and reduced by a factor of 10 at the 60th epoch (T_0 = 60). Training was terminated at the 100th epoch (T = 100). The augmentation and normalization schemes were the same as those used for CIFAR-10/100.

3) BENCHMARKING METHODS
We compared the proposed method with various existing methods aimed at addressing the class-imbalance problem; these methods are grouped into three categories: instance resampling or reweighting methods, class-dependent loss methods, and synthetic instance generation methods. These categories were selected to evaluate the rebalancing effects of the methods through reweighting, resampling, or generation. The following are the baselines used for comparison.
1) Comparison to conventional training and various instance resampling or reweighting methods:
• Standard training using empirical risk minimization (ERM) with cross-entropy loss.
• Reweighting (RW), where each class is weighted by the inverse of its sample size.
• Focal loss, which focuses training on a sparse set of hard examples [19].
2) Comparison to class-dependent loss methods that reweight instances based on class information:
• LDAM-DRW, which regularizes minority classes more strongly than frequent classes through an optimal trade-off between per-class margins and DRS [17].
• Class-dependent temperatures (CDT), which reduces the decision values of minority class instances by temperature [20].
3) Comparison to synthetic instance generation methods:
• SMOTE, which generates new samples by interpolating between existing minority samples [11].
• M2m, which generates synthetic minority samples using majority class samples and an independently trained classifier [21].
Performance was evaluated using the balanced accuracy (bACC) [8], defined as bACC = (1/K) Σ_{k=1}^{K} t_k / n_k, where t_k is the number of correctly classified samples of class k and n_k is the number of samples in class k. The experiments were repeated five times using stratified random partitioning, and the average bACC value for each method was reported.
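Balanced accuracy, i.e., the mean of per-class recalls, can be computed as in the following sketch (illustrative; labels are assumed to be hashable class identifiers):

```python
def balanced_accuracy(y_true, y_pred):
    """bACC = (1/K) * sum_k (t_k / n_k): the mean of per-class recalls,
    where t_k is the number of correctly classified samples of class k
    and n_k is the size of class k."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]   # samples of class c
        t_k = sum(1 for i in idx if y_pred[i] == c)         # correct predictions
        recalls.append(t_k / len(idx))
    return sum(recalls) / len(classes)
```

Unlike plain accuracy, this metric weights every class equally, so a classifier that ignores the minority classes is penalized even when the majority classes dominate the test set.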

B. EXPERIMENTAL RESULTS
1) CIFAR-10/100-LT
The results of our experiments on the CIFAR datasets are presented in Table 3. Our proposed synthetic data generation method performed best compared with the resampling and reweighting methods, as well as the class-dependent loss weighting methods, with the exception of the ''CIFAR-10'' dataset with LIR = 10. The improvement of our method over the others is relatively small in the ''CIFAR-10'' with LIR = 10 scenario, which is the least imbalanced case. Thus, we believe that the rebalancing effect of our method is less pronounced in this setting than in the others.

2) ImageNet-LT
The results of the experimental evaluation on the ImageNet-LT dataset are presented in Table 4, which reports the balanced accuracy. The ''All'' column indicates the balanced accuracy over all classes. As shown in the table, our proposed synthetic data generation method, in combination with the DRS method, outperformed the other methods, yielding significant performance improvements. The classification results were further divided into three groups, and the balanced accuracy was recalculated for the many-shot classes (more than 100 images), medium-shot classes (20 to 100 images), and few-shot classes (fewer than 20 images). Examining the ''Many'', ''Medium'', and ''Few'' columns shows that the proposed method achieves its best performance by improving the accuracy for the ''Medium'' and ''Few'' classes without greatly sacrificing the accuracy for the ''Many'' classes. This implies that the proposed method effectively addresses the class imbalance problem.
It is also worth comparing our method with decoupling methods, which isolate representation learning from classifier learning by independently fine-tuning the classifier layers. Our approach outperforms decoupling methods without fine-tuning the classifier layer. This demonstrates that our proposed method, which optimizes synthetic samples while training the network, can achieve both strong representations and strong classifier performance without the need to fine-tune the classifier, highlighting its effectiveness in addressing class imbalance while training the classifier.

C. FURTHER ANALYSES
To thoroughly assess the proposed synthetic data generation method, we conducted an ablation study focusing on the effects of regularization, initial augmentation, and perturbation. The study was conducted on the CIFAR-10-LT dataset with LIR = 100 using the ResNet-32 model. As in the ImageNet-LT experiments, balanced accuracy is reported for the All, Many, Medium, and Few-shot classes.

1) DIFFERENT REGULARIZATIONS
The regularization term in Equation (9) makes the synthetic samples 'hard' by positioning them near the decision boundaries of the minority classes. To explore the effectiveness of this regularization, we investigated four alternative options in place of the original regularization for generating adversarial samples:
1) No regularization: The generation of synthetic samples is not influenced by any regularization; only the triplet loss is minimized.
2) Targeted adversarial example: The synthetic sample is guided to become a negative class, as determined by the current classifier, by using the regularization term L_CE(h; x_a^+, y^−), where y^− is the class of the sample x^−.
3) Reverse triplet: The synthetic sample is guided to be closer to the negative class sample by using the opposite triplet loss, i.e., the triplet loss with the roles of the positive sample x_i^+ and the negative sample x_i^− exchanged.
4) Class balance: The synthetic sample is guided to become a majority class while still maintaining a balance between the minority and majority class losses, where y^− is the class of the sample x_j^−.
5) Original adversarial example: The synthetic sample is guided to become one of the other classes, as determined by the current classifier, through the use of L_Adv(x_a^+) = −L_CE(h; x_a^+, y^+).
As shown in Table 5, the original adversarial example method outperforms the other regularization methods, which exhibit relatively similar performance among themselves. This may be because the other methods explicitly focus on the label of a negative sample y^−, leading to synthetic samples that excessively push the decision boundary towards the minority class, whereas the original adversarial example method does not provide explicit guidance for the synthetic samples.

2) DIFFERENT INITIAL AUGMENTATIONS
In Section III, we suggested the use of a random initial point for the data generation optimization to avoid a trivial solution and produce more varied synthetic samples. Our baseline approach uses a cutout augmentation strategy. Nevertheless, recent research has shown that heavy augmentation techniques can yield more robust feature representations. With this in mind, we investigated other potential starting points for the optimization:
1) No augmentation: Using the anchor point x^+ as is.
2) Gaussian random noise: Adding a small amount of Gaussian noise to the anchor point, denoted as x^+ + ϵ.
3) Random flip: Horizontally flipping the anchor point with a specified probability.
4) Cutout [43]: Applying cutout augmentation to the anchor point by randomly selecting a rectangular region of x^+ and erasing its pixels.
5) RandAugment [52]: Using random sampling within a reduced search space to automatically generate an augmented anchor point.
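Option 4) can be sketched as follows for a 2-D single-channel image represented as a list of lists (a minimal illustrative version of cutout; real implementations operate on image tensors, and the parameter names here are assumptions):

```python
import random

def cutout(image, size, rng=random):
    """Erase (zero out) a roughly size x size square centered at a random
    location of a 2-D image, leaving the input untouched."""
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)      # random center of the cut region
    out = [row[:] for row in image]                  # copy so the anchor is preserved
    for y in range(max(0, cy - size // 2), min(h, cy + size // 2 + 1)):
        for x in range(max(0, cx - size // 2), min(w, cx + size // 2 + 1)):
            out[y][x] = 0
    return out
```

The square is clipped at the image borders, so regions near the edge erase fewer pixels, which matches the usual cutout behavior.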
We evaluated the performance of these augmentation methods individually and in combination with our proposed method to examine the impact of having a triplet adversarial sample.
The experimental results are presented in Table 6. The trend indicates that incorporating an augmentation method leads to improved performance compared with no augmentation or only light augmentation. The proposed method without any augmentation (TAS+NoAug) and with Gaussian noise (TAS+Noise) showed relatively low classification performance. When an augmentation method 'X' was combined with TAS (TAS+X), the performance improved compared with that of 'X' used alone. Notably, augmentation method 4) is heavier than 3), and 5) is heavier still. The performance improvement grew as heavier augmentation was applied with the proposed method, and the best performance was observed with augmentation method 5). These results suggest that combining our proposed synthetic data generation method with heavy augmentation techniques can effectively enhance classification performance.

3) EFFECT OF GUIDED PERTURBATION
The proposed method generates a synthetic sample by updating the anchor point x_a^+ with a 'guided perturbation' term ϵ* = sign(∇ L_TAS(x_a^+, x_i^+, x_i^−)). This term is effective because it characterizes the inter-class discrepancy while preserving the intrinsic intra-class characteristics by considering the decision boundary during sample generation. To verify the generalization capability of the proposed method, we visualized and compared the generated samples with those generated by adding Gaussian random noise, denoted by ϵ. The experiments were conducted on a long-tailed version of MNIST [53] using a modified LeNet-5 model, with the unit size of the penultimate layer constrained to three for better visualization. To ensure a fair comparison, the magnitude of the random noise was set approximately equal to that of the guided perturbation, such that ||ϵ*|| ≈ ||ϵ||. Visualizations are shown in Figure 3. The original samples are denoted by the outlined circles in Figures 3(a) and (b), while the synthetic samples are depicted with lighter saturation colors. A comparison of the representations generated by our method, as shown in Figure 3(a), with those generated by Gaussian random noise, as shown in Figure 3(b), highlights that our method yields synthetic samples with larger variability and class-specific directionality. This is evident from the distinct patterns observed in the guided perturbations generated by our method, as shown in Figure 3(c), when compared with the random noise injections in Figure 3(d). This indicates that the proposed method effectively captures the underlying characteristics of the minority classes and preserves them in the generated synthetic samples, leading to improved performance on imbalanced datasets.
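The sign-based update can be sketched elementwise as follows (illustrative: `grad` stands in for the gradient of the triplet loss L_TAS with respect to the anchor x_a^+, which a real implementation would obtain by backpropagation; η_a = 0.1 as in our experimental settings):

```python
def guided_perturbation(grad, eta_a=0.1):
    """FGSM-style step eta_a * sign(grad L_TAS), applied elementwise.
    Each coordinate moves by a fixed magnitude in the direction that
    increases the triplet loss, giving the 'guided' (not random) direction."""
    sign = lambda g: (g > 0) - (g < 0)   # elementwise sign in {-1, 0, 1}
    return [eta_a * sign(g) for g in grad]
```

In contrast, the Gaussian-noise baseline replaces this directed step with ϵ drawn independently of the loss surface, which is why it produces no class-specific directionality in Figure 3(d).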

V. CONCLUSION
This paper presents a novel method for generating synthetic data to address the class imbalance problem using a combination of triplet loss and adversarial sample generation. The proposed method leverages new learning objectives and resampling techniques to produce synthetic minority samples that preserve their inherent features and adequately represent the minority classes. The results indicate that by minimizing the triplet loss function based on the current embedding space of a neural network and incorporating synthetic minority samples into training, the AUC of the classifier can be increased. The proposed method presents a promising alternative to traditional resampling techniques because it generates synthetic samples that are more representative of the minority classes and are based on the current state of the model.
Since direct maximization of the AUC through sampling is not feasible due to the disconnection between the distance-based classifier g and the network f, our work is limited to optimizing the network through g, as discussed in Section III-A. Future research may investigate directly optimizing the network using a proxy function of g. To extend our research, it would also be worthwhile to investigate the link between DML losses, such as the angular margin loss [54] and the InfoNCE loss [55], and the AUC.
JAESUB YUN received the B.S. degree in systems management engineering from Sungkyunkwan University, South Korea, where he is currently pursuing the Ph.D. degree in industrial engineering. His research interests include the design of algorithms for deep neural network learning, learning from imbalanced data, and their applications to real-world problems.