Multi-Class Triplet Loss With Gaussian Noise for Adversarial Robustness

The performance of Deep Neural Network (DNN) classifiers degrades under adversarial attacks, in which inputs are perturbed in ways that are indistinguishable from the original data. Providing robustness to adversarial attacks is an important challenge in DNN training, which has led to extensive research. In this paper, we harden DNN classifiers against adversarial attacks by regularizing their deep internal representation space with a Multi-class Triplet regularization method. This method enables a DNN classifier to learn a feature representation that detects similarities between adversarial and clean images, pulling similar images close to their original class and pushing dissimilar images away from their false classes. Training with our Multi-class Triplet regularization method in combination with Gaussian noise injection proves more robust in detecting adversarial attacks, exceeding adversarial training on strong iterative attacks.


I. INTRODUCTION
Deep Neural Networks (DNNs) have made significant progress in cyber-security [1], face detection [2], and object classification [3]. This success is driven by the emergence of big data resulting from high volumes of online traffic. DNNs require less statistical and feature engineering from experts in order to be implemented: the intricacy of the data can be extracted as higher, more abstract representations directly from raw traffic data. However, it has been shown that the performance of DNNs degrades under adversarial attacks [4]-[7], where input examples are slightly modified with human-imperceptible perturbations. This limitation makes the application of DNNs in safety- and reliability-critical applications a great concern [4].
A large body of work has been developed on improving the adversarial robustness of DNN classifiers, such as feature squeezing, denoising, and encoding [8]-[11]. These methods pre-process the input image to remove adversarial perturbations. Despite these innovations, Adversarial Training [5], [12], one of the earliest defenses, remains among the most effective and popular strategies. In its simplest form, adversarial training augments the training procedure with adversarial inputs produced by an adversarial attack, thereby minimizing a loss function that measures the performance of the model on both clean and adversarial data. Madry et al. [12] instantiated adversarial training using a strong iterative adversary and showed that their approach can train models that are highly robust against the strongest known adversarial attacks, such as the C&W [4] and Projected Gradient Descent (PGD) [12] attacks. However, training a network against strong attacks requires training on the strongest adversarial examples, such as those produced by PGD or the Basic Iteration Method (BIM) [6]. Adversarial training on these strong attacks may be 10-100x more computationally intensive. Furthermore, it is difficult to secure general robustness in this way: there are many classes of adversarial examples that cause false classification, and model robustness to one class of adversarial examples does not bestow robustness to other classes [13].
(The associate editor coordinating the review of this manuscript and approving it for publication was Mostafa M. Fouda.)
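The augmentation step of adversarial training can be illustrated with a single-step FGSM attack on a toy logistic model (a hedged sketch; the model, step size, and clipping range are illustrative assumptions, not the configurations used in the cited works):

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps=0.1):
    """One FGSM step on a toy logistic-regression model.

    The adversarial example is x + eps * sign(grad_x L), which maximally
    increases the loss within an L-infinity ball of radius eps.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # sigmoid prediction
    grad_x = (p - y) * w                     # d(cross-entropy)/dx
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# Adversarial training then mixes clean and perturbed inputs per batch:
#   loss = L(f(x), y) + L(f(fgsm_perturb(x, y, ...)), y)
```

The perturbed input stays within eps of the clean input in the L-infinity norm, which is what makes the modification human-imperceptible for small eps.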
A different approach to training on adversarial samples is to add randomization to the neural network [14], [15], making it harder for the attacker to evaluate gradients and find vulnerabilities of the network. Zhezhi et al. [16] added Gaussian noise to the weights and activations of the neural network and showed improvement in their model training process. Generating Gaussian noise is computationally cheap, and its introduction into deep learning training has proven to boost classifier stability [13], [17].
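A minimal numpy sketch of this kind of randomization on a single dense layer (the layer shapes and noise scales are illustrative assumptions, not the settings of [16]):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_dense(x, w, b, sigma_w=0.05, sigma_a=0.05):
    """Forward pass with Gaussian noise injected into both the weights
    and the activation, in the spirit of [16]. Randomizing the forward
    pass makes the gradients observed by an attacker noisy."""
    w_noisy = w + rng.normal(0.0, sigma_w, size=w.shape)
    a = x @ w_noisy + b
    return a + rng.normal(0.0, sigma_a, size=a.shape)
```

With both scales set to zero the layer reduces to an ordinary dense layer, so the noise level directly trades off randomization against clean accuracy.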
Recent studies [5], [7], [8], [18], [19] on the latent representation space of DNN classifiers under strong adversarial attacks suggest that adversarial attacks cause false classification because the adversarial representations spread across the false class distribution, making them difficult to distinguish. Motivated by these studies, and following the framework of adversarial training with Gaussian noise injection, we propose to improve DNN classifier robustness using a metric learning method. Our intuition is that regularizing the DNN classifier's representation space with a Multi-class Triplet regularization term [20] will encourage adversarial examples to approach their true classes and move far away from their false classes, hence improving robustness. Our main contributions in this paper are summarized as follows: • We propose an adversarial learning method (MCT) that is robust against black-box and white-box attacks.
• The MCT method is a combination of Multi-class Triplet loss with Gaussian noise that minimizes the distance between clean and adversarial representations of the same class and maximizes the inter-class distance.
• Training DNN classifiers against adversarial attacks with MCT method requires no expensive iterative adversarial examples generation which makes it an advantage for large datasets. Furthermore, our MCT method requires no modification to the model architecture and thus can improve the robustness on most off-the-shelf deep neural networks without additional overhead during training.
• Evaluation of MCT on the MNIST [21], CIFAR-10 [22], and CIC-IDS2018 [23] datasets shows that the MCT method classifies adversarial attacks more accurately than adversarial training on strong iterative attacks.
The rest of this paper is organized as follows: In Section II, we review the related work on adversarial attack detection and prevention. Our methodology is presented in Section III. The application and the experiments are presented in Section IV, and we present our conclusions in Section V.

II. RELATED WORK
A. ADVERSARIAL ATTACKS
The vulnerability of deep learning models to adversarial attacks was first discovered by [24]. The work in [24] generated small perturbations on images for the image classification problem and fooled state-of-the-art neural networks with high probability. Goodfellow et al. [5] proposed the Fast Gradient Sign Method (FGSM) and also proposed a defense mechanism by training deep learning models on FGSM adversarial examples. Other effective adversarial attacks proposed to fool deep neural networks include the Projected Gradient Descent (PGD) method [12], C&W [4], the Basic Iteration Method (BIM) [6], the Jacobian-based Saliency Map Attack (JSMA) [7], HopSkipJump [27], and DeepFool [26].

B. COUNTERMEASURE
Countermeasures against adversarial examples have been widely researched. Madry et al. [12] demonstrated a successful defense by training the model on PGD-generated attacks, which randomly initialize an adversarial example within the allowed norm ball before running the iterative attack. Kannan et al. [28] proposed Adversarial Logit Pairing (ALP). The ALP method matches the logits of a clean image and its corresponding adversarial image, providing an extra regularization term for a better representation of the data. However, the loss function adopted in this method does not scale to untargeted adversarial attacks [29]. Metric learning has also been introduced to counter the threat of adversarial attacks [30]. This technique adds an additional constraint to the deep learning model by applying a Triplet loss term [31], [32] on the latent representations of the adversarial examples alongside the original loss function. However, Triplet loss has been shown to suffer from slow convergence and poor local optima [20]. In this work, we instead adopt the Multi-class Triplet loss, which allows joint comparison among N − 1 negative classes and therefore alleviates the limitations of the Triplet loss [20]. In contrast to the works in [12], [29], [30], which train on expensive iterative attack examples, a disadvantage for large datasets, our model trains on additive Gaussian noise.
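The difference between the pairwise Triplet loss [31] and the joint comparison against N − 1 negatives in the Multi-class Triplet loss [20] can be sketched as follows (the embedding values and margin are illustrative; this is not the authors' implementation):

```python
import numpy as np

def triplet_loss(fa, fp, fn, margin=0.2):
    """Standard triplet loss: one negative per anchor [31]."""
    d_pos = np.sum((fa - fp) ** 2)
    d_neg = np.sum((fa - fn) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def npair_loss(fa, fp, negs):
    """Multi-class (N-pair style) loss [20]: the anchor is compared
    against all N-1 negatives jointly, which alleviates the slow
    convergence of per-triplet training."""
    logits = np.array([fa @ fn - fa @ fp for fn in negs])
    return np.log1p(np.sum(np.exp(logits)))
```

Because every negative class contributes to each update, the multi-class form needs far fewer sampled tuples to see all inter-class relations than the one-negative-at-a-time triplet form.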

III. METHODOLOGY
This section explains our Multi-class Triplet regularization method with Gaussian noise for the adversarial training process. Our proposed method regularizes the DNN classifier's representation space with a Multi-class Triplet loss function to learn a feature representation that detects the similarity between adversarial and clean images, brings these similar images close to their original class, and pushes dissimilar images away from their false classes.

A. PROBLEM STATEMENT
We consider a classification task with data x ∈ [0, 1]^(W×H) of dimension W × H and labels y ∈ Z_k with k classes, sampled from a distribution D. We identify a model with a hypothesis f from a space F; on an input image x, the model outputs class scores f(x) ∈ R^k. The loss function L((f(x), y); θ) is used to train the model, where θ are the network parameters to learn. For some target model f ∈ F and input (x, y_true), the adversarial goal is to find an adversarial example (perturbed image) x′ such that x and x′ are ''close'' yet the model misclassifies x′.

B. MULTI-CLASS TRIPLET LOSS
Multi-class Triplet loss [20] is trained on one anchor x_a, one positive sample x_p, and N − 1 negative samples {x_i}_(i=1)^(N−1), where the negatives are from classes different from the anchor's and x_a and x_p are from the same class. This loss forces the network to generate an embedding in which the distance between x_a and each of the {x_i}_(i=1)^(N−1) is larger than the distance between x_a and x_p. The standard Multi-class Triplet loss is defined as

L_Np(x_a, x_p, {x_i}_(i=1)^(N−1)) = log(1 + Σ_(i=1)^(N−1) exp(D(f(x_a), f(x_i)) − D(f(x_a), f(x_p)))),   (1)

where N is the cardinality of the set of triplets used in the training process and D(f(x_i), f(x_j)) represents the distance between x_i and x_j in the representation space. In this setting, we define the distance between x_i and x_j in the representation space as the cosine similarity

D(f(x_i), f(x_j)) = (f(x_i) · f(x_j)) / (‖f(x_i)‖_2 ‖f(x_j)‖_2).   (2)

Given a clean image x, we generate the adversarial image x′ by injecting uncorrelated Gaussian noise ε into the mini-batch x. We use uncorrelated Gaussian noise to simulate many types of adversarial images, since adversarial examples are themselves considered to be noise [4]-[7], [19], [29], [33]. Specifically, for each mini-batch x, we generate new examples as

x′ = x + ε,  ε_k ∼ N(0, σ_k²),   (3)

where σ_k² represents the variance of the Gaussian noise at pixel k. Under our setting, we adopt a uniform sample σ_k = σ from an unbiased sample neighborhood of the clean image x based on the optimization parameter σ². We train the model on these augmented images x′ and the clean image x under the joint supervision of the Softmax Cross-entropy loss (L_SCE) and the Multi-class Triplet loss (L_Np). The adversarial training objective is

L_all = L_SCE + α L_Np + λ ‖f(·)‖_2,   (4)

where α controls the strength of the training stability and λ is the weight of the feature norm decay term, which reduces the L_2 norm of the features. Following [20], [31], [32], in this setting we choose the anchor to serve as the adversarial image, since the anchor contains the most information about the decision boundary between the ''true'' class and the ''false'' classes [31]. Figure 1 shows our modified Multi-class Triplet loss for adversarial training.
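The loss and noise-injection steps above can be sketched in numpy (a hedged illustration; the embedding dimension, σ, and function names are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine_sim(u, v):
    """Cosine similarity between two embeddings, used as the
    distance measure in the representation space."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def multiclass_triplet_loss(f_a, f_p, f_negs):
    """Multi-class Triplet loss: pull the anchor toward the positive
    and away from all N-1 negatives jointly."""
    diffs = [cosine_sim(f_a, f_n) - cosine_sim(f_a, f_p) for f_n in f_negs]
    return np.log1p(np.sum(np.exp(diffs)))

def gaussian_anchor(x, sigma=0.1):
    """Generate the adversarial anchor by additive uncorrelated
    Gaussian noise (Eq. 3)."""
    return x + rng.normal(0.0, sigma, size=x.shape)
```

The loss is minimized when the anchor's similarity to the positive exceeds its similarity to every negative, which is exactly the geometry the regularizer imposes on adversarial representations.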

Algorithm 1: Adversarial Training With Multi-Class Triplet Loss
Input: training data X; training iterations T_t; learning rate lr; model parameters θ; mini-batches X_k, k ∈ {1, . . . , K}, for each iteration.
for t = 1 : T_t do
    Sample the anchor X_a from X.
    Generate adversarial images X′_a from X_a using Eq. 3.
    Sample the positive X_p from X of the same class.
    Sample the negatives X_i from X with the strategy mentioned in Section III-C.
    Compute L_all.
    Update θ.
until training converges.
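A toy numpy sketch of one iteration of this loop, assuming a linear embedding and a numerical gradient as a stand-in for back-propagation (both are illustrative assumptions, not the actual training configuration):

```python
import numpy as np

rng = np.random.default_rng(2)

def embed(W, x):
    return W @ x  # toy linear embedding standing in for the DNN

def npair_term(W, xa, xp, negs):
    """Multi-class triplet term on inner-product similarities."""
    fa, fp = embed(W, xa), embed(W, xp)
    diffs = [embed(W, xn) @ fa - fp @ fa for xn in negs]
    return np.log1p(np.sum(np.exp(diffs)))

def train_step(W, xa, xp, negs, sigma=0.05, lr=0.01):
    """One iteration of Algorithm 1: perturb the anchor with Gaussian
    noise, then descend the triplet term by numerical gradient."""
    xa_adv = xa + rng.normal(0.0, sigma, size=xa.shape)
    eps = 1e-5
    base = npair_term(W, xa_adv, xp, negs)
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp = W.copy()
        Wp[idx] += eps
        grad[idx] = (npair_term(Wp, xa_adv, xp, negs) - base) / eps
    return W - lr * grad
```

In a real DNN the gradient would come from back-propagation through the full objective (cross-entropy plus the triplet term), but the control flow per iteration is the same.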

IV. EXPERIMENTS
We analyze the effect of the MCT method on established datasets that have been employed in state-of-the-art methods [4], [5], [8]-[14], [18], [24], [25], [28], [30]: the CIFAR-10, MNIST, and CIC-IDS2018 datasets. We compare the performance of MCT with state-of-the-art methods that have demonstrated their applicability to the task of adversarial robustness in deep neural networks: Triplet Loss Adversarial training (TLA) [30], Adversarial Logit Pairing (ALP) [28], and the Adversarial Training method (AT) proposed in [12]. We use MCT to denote the Multi-class Triplet regularization method with Gaussian noise described in Section III. We conduct our experiments using TensorFlow on a Windows PC with an Intel Core i7-2600 and 16 GB of memory. For MNIST and CIC-IDS2018, we use a network consisting of two convolutional layers with {32, 64} filters respectively, each followed by 2 × 2 max-pooling, BatchNormalization, and Dropout with rates {0.2, 0.3}, and a fully connected layer. We evaluate MCT, AT, ALP, and TLA under white-box and black-box untargeted attack scenarios with different combinations of attack parameters: the perturbation size and the number of iteration steps. We consider the L∞-bounded and (L_0, L_2) norm-bounded settings of [30] during attack generation.

1) WHITE-BOX ATTACK
We assume that the adversary has full access to the MCT classifier, including its parameter values, model weights, training method, architecture, and in some cases its training data. We evaluate MCT, AT, ALP, and TLA on the following white-box attacks: • The Projected Gradient Descent (PGD) attack proposed by [12].
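PGD iterates a gradient-sign step with projection back onto the ε-ball, starting from a random point inside the ball; a toy numpy sketch on a logistic model (the model and parameter values are illustrative assumptions, not our evaluation settings):

```python
import numpy as np

rng = np.random.default_rng(3)

def pgd_attack(x, y, w, b, eps=0.3, alpha=0.05, steps=40):
    """PGD on a toy logistic model: random start in the eps-ball,
    then iterated gradient-sign steps projected back onto the ball."""
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))  # sigmoid prediction
        x_adv = x_adv + alpha * np.sign((p - y) * w)  # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project onto the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)              # stay a valid image
    return x_adv
```

The random start and the repeated projection are what distinguish PGD from single-step FGSM and make it a much stronger first-order adversary.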

2) BLACK-BOX ATTACK
We assume that the adversary has no or limited knowledge of the MCT classifier (e.g., its training procedure and/or its architecture) and definitely does not know the classifier's parameters. Compared to white-box attack methods, which require the target neural network classifier to be differentiable, black-box attacks are designed for non-differentiable systems or systems whose parameters and weights cannot be accessed. We evaluate MCT, AT, ALP, and TLA on the following black-box attacks: • The Basic Iteration Method (BIM) attack proposed by [6].

A. DATASETS
The MNIST dataset consists of handwritten digits, of which 60,000 images form the training set and 10,000 images the test set. CIFAR-10 is a collection of 60,000 color images in 10 classes, with 6,000 images per class; 50,000 images form the training set and 10,000 the test set. We scale the pixel values of images in both datasets to the range [0, 1], following [30]. Attack classification accuracy results for MCT are averaged over 1000 runs. High scores are indicated in bold. The MCT method improves the adversarial accuracy by up to 23.74%, which demonstrates that MCT generalizes better to unseen attacks than adversarially trained models on complex datasets. As shown in Tables 1 and 2, the performance of the models trained on adversarial samples is not markedly higher than that of MCT trained on Gaussian noise. We further conduct a nearest neighbor analysis on the latent representations of the MCT method on the CIFAR-10 dataset. The results in Fig. 2 illustrate the advantage of our learned representations for retrieving the nearest neighbor under PGD adversarial attacks.

C. PERFORMANCE UNDER WHITE-BOX ATTACK
We set the bounds as L∞ = 0.3 and L∞ = 4 on MNIST and CIFAR-10, respectively. We apply 40 and 100 steps of PGD, 100 steps of PGD with 20 random restarts, and 40 steps of C&W on MNIST; and 7 and 20 steps of PGD, 20 steps of PGD with 20 random restarts, and 30 steps of C&W on CIFAR-10. Table 2 shows that the MCT method improves the adversarial accuracy by up to 24.23%, which demonstrates that MCT generalizes better to white-box attacks than adversarially trained models on complex datasets such as CIFAR-10. Similar results can be seen in Table 1: the MCT method trained on Gaussian noise performed considerably well on the less complex MNIST dataset, with an average improvement of 0.42% over models trained on adversarial samples.

D. PERFORMANCE UNDER BLACK-BOX ATTACK
As suggested in [4], providing evidence of robustness against black-box attacks is critical to claiming reliable robustness. We first perform transfer-based attacks using BIM with 7 steps and report the results in Tables 1 and 2. These results indicate that training with MCT also leads to robustness under black-box attacks, especially on the CIFAR-10 dataset, with an average improvement of 23.48%.

E. PERFORMANCE ON DIFFERENT MODEL ARCHITECTURES
To demonstrate that MCT is adaptive to different model architectures, we conduct experiments using the ResNet-50 [36] and DenseNet121 [35] architectures trained on CIFAR-10. We set an L_0 = 0.02 bound for JSMA and an L_2 = 32 norm bound for the PGD and C&W attacks. We apply 20 steps of PGD, 30 steps of C&W, and 2 steps for the DeepFool attack. As shown in Table 3, compared with other robust training models, the MCT method improves the AUC scores by up to 20.15% and 19.81% using the DenseNet121 and ResNet-50 architectures, respectively.

F. EXPERIMENT ON CIC-IDS2018 DATASET
To evaluate our model more comprehensively, we investigate its performance under black-box and white-box attacks on a real traffic dataset, the CIC-IDS2018 dataset. For white-box attacks, we evaluate MCT under the C&W attack, and for black-box attacks we adopt the decision-based HopSkipJump attack. We generate datasets of adversarial samples by taking all illegitimate flows in the testing dataset and modifying their features using the adversarial attacks discussed above. The resulting adversarial datasets are then used to evaluate MCT. In the row-normalized confusion matrices shown in Fig. 3 (a), (b), and (c), the rows correspond to the samples' true classes and the columns to the predicted labels. A low value indicates that the model finds it difficult to differentiate between two or more classes, whereas a high value with dark color indicates a high level of confidence in characterizing that input. The dark diagonal of the confusion matrices suggests that MCT classifies each attack with minor confusion on the clean dataset and under C&W attacks; however, MCT performed poorly against the decision-based attack method. This could be due to the uneven distribution of outliers in the dataset, to which MCT is sensitive under decision-based attacks.
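Row normalization of a confusion matrix can be sketched as follows (a generic utility, not the authors' code); the diagonal of the normalized matrix then reads as per-class recall:

```python
import numpy as np

def row_normalize(cm):
    """Row-normalize a confusion matrix so each row (true class) sums
    to 1. Zero rows are left as zeros to avoid division by zero."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.where(row_sums == 0, 1.0, row_sums)
```

This is why a dark diagonal in Fig. 3 directly indicates high per-class detection confidence, independent of class imbalance in the test set.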

G. EFFECT OF THE σ SIZE ON DETECTION ACCURACY
In this work, the choice of σ plays an important role in achieving robustness because of the model's high sensitivity to the perturbation size. We therefore study how different values of σ affect the empirical accuracy. We train the model with different σ sizes and evaluate the robustness against the FGSM and PGD attacks. As shown in Table 4, the adversarial robustness first increases and then decreases as σ approaches 0.1 on all attacks. Our results show that it is important to train MCT with a properly chosen σ. Strong adversarial attacks require larger additive noise to achieve better empirical results; however, this is difficult to exploit in practice, since attacks are unknown beforehand.

V. DISCUSSION AND CONCLUSION
Scholars have attempted to establish an efficient model for detecting adversarial attacks, and evidence suggests that more work needs to be done. In this paper, we proposed Multi-class Triplet learning with Gaussian noise injection for adversarial robustness (MCT). We limited ourselves to established and popular datasets (MNIST, CIFAR-10, and CIC-IDS2018) that have been used by state-of-the-art methods in the literature to evaluate deep learning adversarial robustness techniques. While evaluation on further datasets is possible, the need for deep neural networks that are robust against adversarial attacks remains.
This work was also evaluated on multiple model architectures and on untargeted, state-of-the-art adversarial attacks, including Projected Gradient Descent (PGD) and C&W, showing that the combination of Gaussian noise and the Multi-class Triplet regularization method leads to highly accurate adversarial classification compared to state-of-the-art detection techniques. Another advantage of our approach is that it requires no modification to the model architecture and thus can improve the robustness of most off-the-shelf deep neural networks without additional overhead during training. In our experiments, we found that smaller noise levels yield better robustness to both PGD and FGSM attacks, whereas larger noise levels cause a slight drop in robustness. This could be a limitation in real scenarios, since attacks on deep neural networks are unknown beforehand; finding the right noise level to tune the model therefore becomes an issue. In the future, we plan to enhance the MCT method by combining it with other deep learning adversarial robustness techniques, such as label smoothing.
BENJAMIN APPIAH is currently pursuing the Ph.D. degree with the University of Electronic Science and Technology of China, Chengdu, China. His research interests include machine learning, deep learning, data mining, and big data analysis. KWABENA OWUSU-AGYEMANG received the M.Sc. degree from Coventry University. He is currently pursuing the Ph.D. degree with the School of Information and Software Engineering, University of Electronic Science and Technology of China. His research interests include machine learning, data mining, big data analysis, applied cryptography, blockchain technology, and medical image processing.
ZHIGUANG QIN (Member, IEEE) is currently a Full Professor with the School of Information and Software Engineering, University of Electronic Science and Technology of China (UESTC), where he is also the Director of the Key Laboratory of New Computer Application Technology and the UESTC-IBM Technology Center. His research interests include medical image processing, computer networking, information security, cryptography, information management, intelligent traffic, and electronic commerce, distribution, and middleware.
MUHAMMED AMIN ABDULLAH received the B.Sc. degree in computing with accounting from the University for Development Studies, Navrongo, in 2017. He is currently pursuing the master's degree with the School of Information and Software Engineering, University of Electronic Science and Technology of China. His research interests include deep learning, network security, and the Internet of Things. VOLUME 8, 2020