Evaluation of Model Quantization Method on Vitis-AI for Mitigating Adversarial Examples

Adversarial examples (AEs) are a typical model evasion attack and a security threat to deep neural networks (DNNs). One countermeasure is adversarial training (AT), which trains DNNs on a dataset containing AEs to achieve robustness against AEs. However, the robustness obtained by AT greatly decreases when the model parameters are quantized from 32-bit floating point to 8-bit integers to execute DNNs on edge devices with restricted hardware resources. Preliminary experiments in this study show that robustness is reduced by the fine-tuning process, in which a quantized model is trained with clean samples to reduce quantization errors. We propose quantization-aware adversarial training (QAAT) to address this problem, which optimizes DNNs by conducting AT within the quantization flow. In this study, we constructed a QAAT model using Vitis-AI, provided by Xilinx. We implemented the QAAT model on the ZCU104 evaluation board, equipped with a Zynq UltraScale+ MPSoC, and demonstrated its robustness against AEs.


I. INTRODUCTION
Model evasion attacks, which cause mis-recognition in deep neural networks (DNNs), are a threat to the reliability of DNNs. These attacks generate adversarial examples (AEs) by adding small perturbations, imperceptible to humans, to inputs, causing DNNs to make wrong predictions. In image classification tasks, an attacker perturbs pixel values in an input image, inducing misclassification. The first attack was reported by Szegedy et al. in 2014 [1]. It generates AEs by using an approximate calculation with a box-constrained L-BFGS. A typical attack is the fast gradient sign method (FGSM) reported by Goodfellow et al. [2]. It generates AEs by adding a constant value ϵ to each pixel only once in the sign direction of the gradient. An extended attack is the projected gradient descent (PGD) attack, which iteratively conducts FGSM steps with small ϵ while projecting onto a specified region [3].
There are several methods to mitigate AE threats, such as detecting AEs [4], [5], [6], [7], [8] or pre-processing inputs to remove perturbations [9], [10]. Adversarial training (AT) is one of these methods. AT constructs a DNN model that is robust against AEs by using AEs as training images for the model. A typical example is the method presented by Madry et al., which trains DNNs with AEs generated by PGD [3].
In recent years, inference processing using deep learning has been increasingly deployed on edge devices due to demands for real-time performance and privacy protection. DNN model parameters are typically trained in 32-bit floating-point numbers (hereinafter called 32-bit DNN models), but these are often quantized to 8-bit integers (hereinafter called 8-bit DNN models) when DNNs are operated on edge devices. This is due to demands for reduced memory usage, faster inference processing, and power saving. Quantization-aware training (QAT) was introduced to prevent the degradation of model accuracy caused by quantization [11]. This technique adjusts the model parameters during training so that the model performs well under quantization. However, it has been reported that combining AT with QAT results in even worse performance than applying AT alone [12].
In this paper, we propose a training method for DNNs that maintains the robustness of AT models even after quantization. We use QAT as the quantization method for DNNs. We show that it is possible to construct DNNs that are robust to AEs even after quantization by using both QAT and AT during training. We implemented the proposed method on Vitis-AI, an integrated development environment for rapid AI inference on Xilinx platforms (https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html). In our experiments, we evaluate the robustness of the DNN models against AEs on an FPGA.
The main contributions of this paper are as follows.
• We experimentally show that quantizing a 32-bit AT-trained model with QAT reduces its robustness to AEs. This scenario has not been evaluated in previous studies.
• We propose the quantization-aware adversarial training (QAAT) flow, which introduces AT into the QAT flow to mitigate the reduced robustness against AEs.
• The model trained by the proposed method was implemented on the ZCU104 evaluation board, on which a Zynq UltraScale+ MPSoC is mounted. Our evaluation is conducted on the FPGA, assuming edge devices.

This paper is organized as follows. Section II introduces adversarial examples and defenses. Section III introduces methods and techniques for deploying a neural network on edge devices. Section IV presents the robustness evaluations of quantized DNN models trained with QAT after AT. Section V explains the proposed method and the robustness evaluations of quantized DNN models trained with it. Section VI concludes our work.

II. ADVERSARIAL EXAMPLES AND DEFENSES
A. ADVERSARIAL EXAMPLES
An overview of model evasion attacks with AEs in image classification tasks is shown in Figure 1. The input image in the figure belongs to the ''Airplane'' class, but the attacker attempts to make the model misclassify it into a class other than ''Airplane''. The DNN model is assumed to be a trained model with sufficient accuracy to correctly classify the input image as ''Airplane''. In the AE generation phase, the attacker inputs the image to the DNN model and calculates the loss between the output probability and the correct label ''Airplane''. The attacker then calculates a perturbation of the input image that increases the loss. The attacker generally chooses a small perturbation that is imperceptible to humans yet sufficient to cause misclassification. In the attack phase, the attacker adds the calculated perturbation to the input image to form an AE. Given this AE as input, the DNN model classifies it into a class other than ''Airplane'' (the ''Ship'' class in Figure 1).
There are roughly two scenarios for model evasion attacks with AEs. In white-box attacks, AEs are generated under the assumption that the internal information of the DNN model (such as weights, biases, and model architecture) is known to the attacker. In black-box attacks, AEs are generated under the assumption that this internal information is unknown to the attacker. We focus on white-box attacks in this paper.
In the remainder of this section, Section II-B introduces the basic methods for calculating AEs. Section II-C details AT, the defense method used in this paper. Section II-D introduces other related works.

B. BASIC METHODS ON ADVERSARIAL EXAMPLES
The FGSM, a typical method in white-box attack scenarios, was proposed by Goodfellow et al. [2] as a fast method to generate AEs. The attacker generates an AE x_adv by using Equation (1):

x_adv = x + ϵ · sign(∇_x J(θ, x, y))   (1)

where ∇_x J(θ, x, y) denotes the gradient of the loss J with respect to the clean image x with correct label y when x is input into the DNN model θ. The function sign extracts the sign of its input, and the amount of perturbation is determined by ϵ. The equation adds a perturbation following the sign of the gradient to the clean (original) image. The perturbation increases the loss for classifying the image into the correct label; when the perturbed image can no longer be classified into the correct label, it is an AE. The projected gradient descent (PGD) attack was reported in 2018 by Madry et al. as a method to generate AEs with smaller perturbations [3]. Instead of adding a fixed amount of perturbation at once, as in FGSM, this attack iteratively adds small perturbations. The calculation of perturbations using PGD is formulated in Equation (2):

x^{t+1} = Π_{x+S}( x^t + α · sign(∇_x J(θ, x^t, y)) )   (2)
where x^0 represents the original input (clean image). For t > 0, x^t is an AE candidate, and it is incrementally optimized into x^{t+1}. If x^t (t > 0) causes misclassification, it is an AE. The parameter α indicates the amount of perturbation added at each step. The function Π_{x+S} projects its input onto the surface of the region x + S. Equation (2) is similar to Equation (1) for the FGSM, with the superscript t for iteratively adding small perturbations and the projection Π_{x+S} added. The perturbation is optimized by iterative addition, while the projection Π_{x+S} restricts the amount of perturbation so that it does not exceed a certain amount. In other words, the attack searches, within the projection region, for the perturbation that maximizes the loss for classifying the image into the correct label.
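For illustration, the following is a minimal TensorFlow sketch of Equation (2). The function name pgd_attack and the values of eps, alpha, and steps are our own assumptions, and the random start used in the original PGD attack [3] is omitted for brevity.

import tensorflow as tf

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Iteratively take FGSM steps (Equation (1)) and project back onto
    the L-infinity ball of radius eps around x (Equation (2))."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    x_adv = tf.identity(x)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv, training=False))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)              # small FGSM step
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # projection onto x + S
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv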
C. ADVERSARIAL TRAINING
AT trains a model on AEs by solving the min-max problem in Equation (3):

min_θ E_{(x_i, y_i) ∈ D} [ max_{δ ∈ S} J(θ, x_i + δ, y_i) ]   (3)

where D denotes the training dataset consisting of clean images x_i and labels y_i, and E[·] denotes the expected value. A trainer computes a perturbation δ for each AE candidate x_i + δ to maximize the loss J(θ, x_i + δ, y_i) and, at the same time, optimizes the DNN model parameter θ to minimize this loss. In other words, the model is trained by AT so that it classifies AE candidates into the correct labels. After training, the model is expected to classify AEs with perturbation up to ϵ into the correct labels; that is, the model obtains robustness against AEs. A detailed procedure of AT with PGD is shown in Algorithm 1.

Algorithm 1 Adversarial Training Using PGD Attack [3]
Require: Training dataset D containing images x_i and labels y_i, learning rate γ, number of epochs N_ep, 32-bit DNN model parameter θ
Ensure: AT model parameter θ_AT
1: Initialize θ
2: for epoch = 1 to N_ep do
3:   for B ⊂ D do
4:     Build B_adv ∋ (x_adv, y) by PGD from B
5:     Compute the gradient g_θ of the loss on B_adv
6:     θ ← optimizer(θ, g_θ, γ)
7:   end for
8: end for
9: θ_AT ← θ
10: return θ_AT

Line 4 creates a mini-batch B_adv, consisting of pairs of AEs calculated by PGD and the correct labels, to optimize the model parameter θ. In line 5, the gradient g_θ of the loss on B_adv is computed. In line 6, θ is optimized using the computed gradient g_θ and the learning rate γ.
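Under the same assumptions as the PGD sketch above, Algorithm 1 could be realized with a training loop along these lines, assuming a Keras model `model`, a tf.data pipeline `train_ds`, and an epoch count `num_epochs`; the optimizer settings are illustrative, not the exact configuration used later in this paper.

optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for epoch in range(num_epochs):
    for x, y in train_ds:                       # B ⊂ D
        x_adv = pgd_attack(model, x, y)         # line 4: build B_adv
        with tf.GradientTape() as tape:         # line 5: gradient g_theta
            loss = loss_fn(y, model(x_adv, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(              # line 6: update theta
            zip(grads, model.trainable_variables))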
D. OTHER RELATED WORKS
Carlini et al. proposed an attack that broke the AE countermeasure called distillation defense [14]. This attack sets an objective function that considers two conditions and solves the resulting minimization problem: (1) make the distance between the clean image and the AE as small as possible; and (2) misclassify the AE into a target class with a pre-defined confidence level. Brendel et al. proposed the boundary attack as an attack algorithm in a black-box scenario [15]. In this attack, AEs are generated by starting from an image in the target class and moving closer to the decision boundary near the original image. Eykholt et al. proposed a method to calculate physical perturbations large enough to cause misclassification even when objects in physical space are captured by an image sensor [16]. They consider robustness to viewpoint, angle, and distance when calculating perturbations. Brown et al. proposed the adversarial patch, which is large enough to be perceived by humans but can cause misclassification of many images with a single patch [17].
There are a variety of countermeasures against AEs, including AE detection [4], [5], [6], [7], [8] and pre-processing to remove perturbations [9], [10]. However, these methods require additional models to achieve their objectives and are unsuitable for resource-limited edge devices. In this paper, we focus on AT, which requires no additional process to defeat AEs because it makes the model itself robust against them. AT was proposed by Madry et al. [3], and variants of it have also been proposed [18], [19], [20], [21], [22]. Cai et al. proposed curriculum adversarial training, which gradually increases the strength of the perturbation used for learning [18]. This allows the model to learn AEs according to its capability, which mitigates the degradation of classification accuracy on natural images. Lee et al. proposed a training method using a mixup of adversarial-vertex samples, in which perturbations are multiplied by a constant, and natural images [19]. Their method also aims to mitigate the degradation of classification accuracy on natural images. Ding et al. proposed max-margin adversarial training (MMA) [20]. MMA calculates different losses for correctly and incorrectly classified samples: for correctly classified samples, the loss is minimized using the calculated AEs; for incorrectly classified samples, the loss is minimized using the natural images, to encourage the model to classify natural images correctly. Zhang et al. proposed AT focusing on decision boundaries [21]. In this method, the model is trained to minimize the KL divergence between the posterior probabilities for natural images and those for AEs. Wu et al. proposed a method to maximize the margin of the feature space by inserting an SVM auxiliary classifier before the last layer of the DNN [22].

III. DEPLOYING NEURAL NETWORKS ON EDGE DEVICES
A. MODEL QUANTIZATION
Edge devices have limited computing resources and require low power consumption. To meet these constraints, smaller models and accelerators that execute them at high speed are required. Quantizing the parameters from 32 bits to 8 bits reduces the model size. The bit-width of the accelerator's arithmetic units also becomes smaller, allowing more of them to be equipped in the same area, so processing can be performed faster. Hereafter, we introduce typical methods used to quantize a model.
Model quantization converts the bit representation of the model parameters to a smaller width. The relationship between the floating-point number r before quantization and the quantized value Q is shown in Equation (6), using two parameters: the scale s (Equation (4)) and the zero point Z (Equation (5)):

s = (f_max − f_min) / (Q_max − Q_min)   (4)

Z = ⌊Q_min − f_min / s⌉   (5)

r = s (Q − Z)   (6)
where f_max and f_min are the maximum and minimum values before quantization, respectively, and Q_max and Q_min are the maximum and minimum values after quantization, respectively. Quantization-aware training (QAT) is a typical method for quantizing DNNs in which the DNNs are trained considering the quantization error. QAT uses the ''fake quantization'' technique to generate pseudo-quantization errors in the 32-bit state during training, so that the DNNs learn to compensate for these errors. Fake quantization represents 8-bit quantized values in a 32-bit state by conducting quantization followed by de-quantization. This operation reduces the quantization error when the DNNs are actually quantized after training and maintains the accuracy of the DNNs after quantization. The formulas for quantization and de-quantization are shown in Equations (7) and (8), respectively:

Q = ⌊r / s + Z⌉   (7)

r_new = s (Q − Z)   (8)
where s is the scale, Z is the zero point, ⌊·⌉ is the rounding operation to the nearest integer, r is the 32-bit value before quantization, Q is the 8-bit quantized value, and r_new is the 32-bit value after de-quantization. Figure 2 shows an overview of QAT, and the procedure is described in Algorithm 2. Note that the bias is kept as a 32-bit integer, following [11].

Algorithm 2 Quantization-Aware Training [11]
Require: Training dataset D containing images x_i and labels y_i, learning rate γ, number of epochs N_ep, DNN model parameter θ, scale s, zero point Z
Ensure: QAT model parameter θ_QAT
1: for epoch = 1 to N_ep do
2:   for B ⊂ D do
3:     θ_Q ← ⌊θ/s + Z⌉
4:     θ_new ← s(θ_Q − Z)
5:     Compute the gradient g_θ of the loss on B using θ_new
6:     θ ← optimizer(θ_new, g_θ, γ)
7:   end for
8: end for
9: θ_QAT ← θ
10: return θ_QAT

B. VITIS-AI
Vitis-AI is an integrated development environment provided by Xilinx for AI inference on Xilinx platforms. Its AI Quantizer quantizes 32-bit DNN models into 8-bit DNN models using a quantization algorithm that maintains accuracy as much as possible. The AI Compiler is a tool for converting the 8-bit DNN models quantized by the AI Quantizer into data and instruction formats executable on the DPU. The quantized model reduces memory usage and can run on Xilinx's deep learning processing unit (DPU) IP, allowing inference to be performed on FPGAs at low power consumption and high speed.
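To make Equations (4)-(8) concrete, here is a small NumPy sketch of fake quantization; the helper names and the q_min/q_max range are our own choices, not part of the original paper.

import numpy as np

def quantize_params(r, q_min=-128, q_max=127):
    f_min, f_max = float(r.min()), float(r.max())
    s = (f_max - f_min) / (q_max - q_min)           # scale, Eq. (4); assumes f_max > f_min
    z = int(round(q_min - f_min / s))               # zero point, Eq. (5)
    q = np.clip(np.round(r / s + z), q_min, q_max)  # quantize, Eq. (7)
    return q.astype(np.int8), s, z

def dequantize_params(q, s, z):
    return s * (q.astype(np.float32) - z)           # de-quantize, Eq. (8)

w = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantize_params(w)
w_fake = dequantize_params(q, s, z)    # fake quantization: Algorithm 2, lines 3-4
print(np.max(np.abs(w - w_fake)))      # the quantization error QAT trains against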
The development flow using Vitis-AI in this paper is shown in Figure 3. A developer prepares a trained 32-bit model. The Vitis-AI tools run in a Docker container deployed on the host PC. First, an 8-bit model is constructed using the AI Quantizer. This paper focuses on training 8-bit DNN models using QAT, which both quantizes DNN models and re-trains them to reduce quantization errors. QAT can be used not only to quantize 32-bit DNN models into 8-bit DNN models but also to train DNN models from an untrained state (parameters initialized with random numbers). Second, the AI Compiler compiles the quantized 8-bit DNN model, converting it into a format (a .xmodel file) that the DPU IP on the FPGA can execute. Finally, the evaluation images and the compiled 8-bit DNN model (.xmodel file) are transferred to the FPGA board, and the inference evaluation is executed there. For the robustness evaluations against AEs in this paper, AEs generated in advance on the host PC were transferred to the FPGA board and input into the model.
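A condensed sketch of this flow, based on the Vitis-AI TF2 quantizer API described in the Vitis-AI user guide; exact module paths and signatures vary across Vitis-AI releases, so treat this as an outline rather than the authors' exact script.

from tensorflow_model_optimization.quantization.keras import vitis_quantize

quantizer = vitis_quantize.VitisQuantizer(float_model)

# Post-training quantization with a calibration set:
ptq_model = quantizer.quantize_model(calib_dataset=calib_images)

# Or QAT: obtain a fake-quantized model, fine-tune it, then export.
qat_model = quantizer.get_qat_model()
qat_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
qat_model.fit(train_ds, epochs=10)
deploy_model = vitis_quantize.VitisQuantizer.get_deploy_model(qat_model)
deploy_model.save('quantized.h5')
# The saved model is then compiled for the DPU, e.g.:
#   vai_c_tensorflow2 -m quantized.h5 -a arch.json -o out -n model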

IV. ROBUSTNESS EVALUATION OF QUANTIZED DNN MODELS WITH QAT AFTER AT
A. OVERVIEW
We assume a scenario in which the DNN model developer and the FPGA application developer are different. In this case, the model developer trains a DNN model and sends it to the application developer, who applies QAT to the model and combines the quantized model with the FPGA application. The application developer naturally applies QAT in the same way even if the model developer is concerned about robustness against AEs and has trained the model with AT. Following this scenario, this section focuses on a model that is trained by QAT after AT, hereafter described as AT+QAT. The procedure for AT+QAT is shown in Algorithm 3: AT (Algorithm 1) is performed on the 32-bit DNN model, and the resulting model is then fine-tuned by QAT (Algorithm 2). However, this method uses clean images for training during QAT, which may reduce the robustness to AEs. In this section, we evaluate the robustness of AT+QAT.
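Under the Vitis-AI flow sketched in Section III, the AT+QAT scenario could look like the following; train_with_pgd_at is a hypothetical wrapper around the AT loop of Section II-C, and the epoch count is illustrative.

# Model developer: 32-bit adversarial training (Algorithm 1).
at_model = train_with_pgd_at(float_model, train_ds)

# Application developer: QAT fine-tuning with clean images only
# (Algorithm 2), unaware that robustness is at stake.
quantizer = vitis_quantize.VitisQuantizer(at_model)
qat_model = quantizer.get_qat_model()
qat_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
qat_model.fit(train_ds, epochs=150)   # clean images; the AT margin may be lost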

B. EXPERIMENTAL SETUP
We focus on inference for image classification using neural networks on edge devices, where 8-bit model parameters are used for the DNN because of restrictions on memory size and power. When performing inference on edge devices, we believe it is common to handle small tasks in terms of hardware resources and power consumption. Therefore, we chose two datasets, MNIST [23] and CIFAR-10 [24], which have been used for method validation in various AT studies. The classification of MNIST, a handwritten digit dataset, and CIFAR-10, a low-resolution color image dataset, is used for evaluation. MNIST consists of 70,000 gray-scale 28 × 28 images and corresponding labels, of which 60,000 are for training and 10,000 are for validation and testing. CIFAR-10 consists of 60,000 32 × 32 × 3 color images and corresponding labels, of which 50,000 are for training and 10,000 are for validation and testing. Samples of MNIST and CIFAR-10 are shown in Figures 4 and 5, respectively. The model architectures used in this paper are a multi-layer perceptron (MLP) with three layers for the MNIST classification task and a VGG-11 (without batch normalization)-based convolutional neural network (CNN) [25] for the CIFAR-10 classification task. The detailed model architectures are shown in Tables 1 and 2, respectively. Cross-entropy was used as the loss function for both models. For the MNIST classification model, Adam was used as the optimization algorithm with a learning rate of 0.01. For the CIFAR-10 classification model, SGD was used as the optimization algorithm, with an initial learning rate of 0.05, reduced to 0.01 after 30 epochs, 0.001 after 70 epochs, and 0.0001 after 110 epochs. The MNIST classification model applied early stopping based on the monitored validation accuracy, and the CIFAR-10 classification model was trained for 150 epochs. The perturbations in AT were set so that the L∞ norm (the maximum absolute value over all elements, i.e., the maximum value of the calculated perturbation) was 8/255.
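For concreteness, a stand-in for the MNIST MLP might look like the following Keras sketch. The hidden-layer widths are our assumption, since Table 1 is not reproduced here, while the loss, optimizer, and learning rate follow the text above.

import tensorflow as tf

# Hypothetical three-layer MLP for MNIST (hidden sizes assumed, not
# taken from Table 1), with cross-entropy loss and Adam at lr = 0.01.
mlp = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10),  # logits for the 10 digit classes
])
mlp.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=['accuracy'])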
Training the models and generating AEs for the robustness evaluations were conducted in the Vitis-AI Docker container on the host PC using TensorFlow 2.8. AEs for evaluation were moved onto the FPGA in the same way as the compiled models (.xmodel files). The ZCU104 (https://xilinx.com/products/boards-and-kits/zcu104.html), equipped with a Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC, was used as the FPGA board. A photograph of the ZCU104 is shown in Figure 6. The DPUCZDX8G, Xilinx's IP for DNN model inference, is implemented on the FPGA. The DPUCZDX8G adopts the B4096 architecture, and two cores are implemented. Table 3 summarizes the hardware resources of the ZCU104 used in this experiment. The inference throughput of the MNIST and CIFAR-10 classification models executed on the FPGA was 4005.04 fps and 619.27 fps, respectively.
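The evaluation AEs mentioned above could be produced and staged for the board along these lines; the ϵ grid and file names are illustrative, and pgd_attack() is the sketch from Section II-B.

import numpy as np

# Sweep the perturbation budget, attack the model on the host PC, and
# save the resulting AEs so they can be copied to the ZCU104 together
# with the compiled .xmodel (e.g., via scp).
for eps in [0.0, 2/255, 4/255, 8/255, 16/255]:
    x_adv = pgd_attack(model, x_test, y_test, eps=eps)
    np.save(f'ae_eps_{eps:.4f}.npy', x_adv.numpy())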

C. EXPERIMENTAL RESULTS
We trained the following three types of DNN models for evaluation.
• QAT model, in which QAT was conducted using clean images. The model is not resistant to AEs.
• AT model (32-bit), trained with AT at 32-bit precision. The model has robustness against AEs, but it is not quantized. The model was evaluated on a host PC.
• AT+QAT model, which is the above AT model fine-tuned by QAT. The model is trained with 8-bit precision and was evaluated on the FPGA board. We evaluated the model at 1, 30, and 150 epochs to track the robustness during training with QAT.

The results of the robustness evaluation of the MNIST and CIFAR-10 classification models against AEs are shown in Figures 7 and 8, respectively. Detailed scores are shown in Table 4 in the Appendix. The x-axis represents the amount of perturbation ϵ (i.e., the L∞ norm between the clean image and the AE) when AEs are generated using PGD attacks for the robustness evaluation, and the y-axis represents the accuracy when the AEs are input into the model. When the amount of perturbation is 0 (ϵ = 0), the input images are clean images. The results for the MNIST classification model show that the accuracy of the AT model (32-bit) decreases only slightly as the amount of perturbation increases. In contrast, the accuracy of the QAT model without AT decreases significantly as the amount of perturbation increases. The AT+QAT model shows approximately the same trend as the QAT model, confirming the significant reduction in robustness against AEs. In the results for the CIFAR-10 classification model, the relationships among the AT (32-bit), AT+QAT, and QAT models are similar. At the first epoch of QAT, the AT+QAT model's accuracy on both clean images and AEs decreased due to quantization errors. As the number of QAT epochs increases, the accuracy on clean images improves; on the other hand, the robustness against AEs decreases.
We assume that the robustness against AEs of the AT+QAT model decreases because of the fine-tuning with clean images. Figure 9 illustrates our view of the projected inputs and decision boundaries in the high-dimensional space of the DNN model during the AT+QAT procedure. In the AT phase, the DNN model is trained to draw decision boundaries that give the samples a margin of ϵ. This procedure is conventionally conducted with 32-bit precision. As a result, an attacker is expected to need more than ϵ of perturbation to cause misclassification by AEs. Next, in the QAT phase, the model is fine-tuned using clean images with 8-bit precision. Quantization errors slightly scramble the inputs and decision boundaries; the accuracy on clean images (ϵ = 0) drops to 28.3%, as shown by the blue line (AT+QAT (Epoch1)) in Figure 8. The purpose of QAT is to undo this degradation while considering the quantization error. However, unless the margin given by AT is considered in the process, this is no different from ordinary training with clean images. That is, QAT does not consider the margin, and the robustness against AEs decreases compared with 32-bit precision AT.

V. ROBUSTNESS EVALUATION OF QUANTIZED DNN MODELS WITH QAAT
A. PROPOSED METHOD
We propose quantization-aware adversarial training (QAAT) to train 8-bit DNN models while maintaining robustness against AEs. QAAT combines AT and QAT so that the trained DNN model remains robust against AEs even after quantization. Algorithm 4 shows the QAAT procedure, and a schematic diagram is shown in Figure 10. The method is based on the AT shown in Algorithm 1 but, at the same time, trains the model to reduce quantization errors by using the fake quantization introduced in Algorithm 2.

Algorithm 4 Quantization-Aware Adversarial Training (QAAT)
Require: Training dataset D containing images x_i and labels y_i, learning rate γ, number of epochs N_ep, DNN model parameter θ, scale s, zero point Z
Ensure: QAAT model parameter θ_QAAT
1: Initialize θ
2: for epoch = 1 to N_ep do
3:   for B ⊂ D do
4:     θ_Q ← ⌊θ/s + Z⌉
5:     θ_new ← s(θ_Q − Z)
6:     Build B_adv ∋ (x_adv, y) by PGD from B using θ_new
7:     Compute the gradient g_θ of the loss on B_adv
8:     θ ← optimizer(θ_new, g_θ, γ)
9:   end for
10: end for
11: θ_QAAT ← θ
12: return θ_QAAT

The model parameters are initialized on line 1, and the model is trained between lines 2 and 10. Line 4 shows the quantization process, which is equivalent to line 3 of Algorithm 2. Line 5 shows the de-quantization process, which is equivalent to line 4 of Algorithm 2. Lines 6 to 8 then build AEs by PGD against the fake-quantized parameters and update the parameters, as in Algorithm 1.

B. EXPERIMENTAL RESULTS
The results using MNIST and CIFAR-10 are shown in Figures 11 and 12, respectively. Detailed scores are shown in Table 4 in the Appendix. As in the previous section, the x-axis represents the amount of perturbation ϵ (i.e., the L∞ norm between the clean image and the AE) when AEs are generated using PGD attacks for the robustness evaluation, and the y-axis represents the accuracy when the AEs are input into the model. The evaluation results for the MNIST and CIFAR-10 classification models show that the QAAT model has the same robustness against AEs as the AT model. On the other hand, the results in Figure 12 show that the classification accuracy of the AT and QAAT models on clean images (ϵ = 0) is 13.3% and 12.9% lower than that of the QAT model, respectively. The reason is that the AT algorithm (Algorithm 1) does not include training with clean images. In this paper, we used the basic AT algorithm proposed by Madry et al., while several AT methods that improve both robustness and accuracy have been proposed elsewhere [19], [21]. We will introduce these methods into QAAT in the future to improve both the resistance against AEs and the classification accuracy on clean images.
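As a rough illustration under the same assumptions as the earlier sketches, QAAT amounts to running the AT loop of Section II-C on a fake-quantized model: get_qat_model() is the Vitis-AI call from Section III, and pgd_attack() is the sketch from Section II-B. Hyperparameters are illustrative.

# QAAT sketch (Algorithm 4): both the PGD attack and the weight update
# see the fake-quantized parameters, so the AT margin survives quantization.
qat_model = vitis_quantize.VitisQuantizer(float_model).get_qat_model()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for epoch in range(num_epochs):
    for x, y in train_ds:
        # Line 6: build B_adv by PGD against the fake-quantized model.
        x_adv = pgd_attack(qat_model, x, y, eps=8/255)
        # Lines 7-8: compute the gradient on B_adv and update theta.
        with tf.GradientTape() as tape:
            loss = loss_fn(y, qat_model(x_adv, training=True))
        grads = tape.gradient(loss, qat_model.trainable_variables)
        optimizer.apply_gradients(zip(grads, qat_model.trainable_variables))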

VI. CONCLUSION
We considered methods for training adversarial example (AE)-resistant neural networks that run on edge devices such as FPGAs. We assumed a scenario in which the DNN model developer and the FPGA application developer are different. In this case, the application developer applies QAT to the model even if the model developer is concerned about robustness against AEs and trains the model with AT. Following this scenario, we evaluated the robustness against AEs of models trained with AT and then quantized with QAT. The experimental results showed that the robustness against AEs was reduced compared with the 32-bit AT model, suggesting that QAT cannot maintain the robustness of the 32-bit AT model.
We proposed quantization-aware adversarial training (QAAT) to execute 8-bit DNN models on an FPGA while maintaining robustness against AEs. In this study, we implemented QAAT in the Vitis-AI development environment and confirmed that an 8-bit QAAT model can be executed on an FPGA. We evaluated the classification accuracy on clean inputs and the robustness against AEs on the MNIST and CIFAR-10 classification tasks. The AT+QAT model, in which quantization-aware training with clean images was applied after 32-bit precision AT, greatly decreased the robustness against AEs. In contrast, the QAAT model achieved robustness comparable to that of the 32-bit precision AT model.
In our next work, we plan to create more lightweight, robust, and accurate DNN models. More lightweight DNN models can be achieved by applying model pruning techniques [26], [27] to make the weight parameters sparse and distillation techniques to transfer knowledge to smaller models [28]. We will also increase the robustness and clean accuracy by applying various extensions to AT [18], [20], [21]. We will investigate methods for building AE-resistant 8-bit DNN models by combining these methods with QAAT. In addition, we will evaluate these methods (including QAAT) on large datasets such as ImageNet in future work.

APPENDIX
See Table 4.

TAKESHI FUJINO (Member, IEEE) was born in Osaka, Japan, in March 1962. He received the B.E., M.E., and Ph.D. degrees in electronic engineering from Kyoto University, Kyoto, Japan, in 1984, 1986, and 1994, respectively. He joined the LSI Research and Development Center, Mitsubishi Electric Corporation, in 1986. Since then, he has been engaged in the development of micro-fabrication processes, such as electron beam lithography, and embedded DRAM circuit design. He has been a Professor with Ritsumeikan University since 2003. His research interests include hardware security, such as side-channel attacks and physically unclonable functions. He is a member of IEICE and IPSJ.