EEJE: Two-Step Input Transformation for Robust DNN against Adversarial Examples

Adversarial examples are human-imperceptible perturbations to the inputs of machine learning models that cause the models to produce false positives or false negatives. So far, two representative defense architectures have shown a significant effect: (1) the model retraining architecture; and (2) the input transformation architecture. However, previous defense methods in these two architectures do not produce good outputs for every input, that is, for both adversarial examples and legitimate inputs. Specifically, model retraining methods produce false negatives for unknown adversarial examples, and input transformation methods produce false positives for legitimate inputs. To produce good-enough outputs for every input, we propose and evaluate a new input transformation architecture based on two-step input transformation. To overcome the limitations of the previous two defense architectures, we address the following question: How can we maintain the performance of Deep Neural Network (DNN) models for legitimate inputs while providing good robustness against various adversarial examples? Evaluation results under various conditions show that the proposed two-step input transformation architecture provides good robustness to DNN models against state-of-the-art adversarial perturbations while maintaining high accuracy for legitimate inputs.


INTRODUCTION
As a representative machine learning model, Deep Neural Networks (DNNs) have produced good outputs for legitimate inputs in various real-world applications [1], [2]. However, many studies have shown that DNNs produce false positives or false negatives for adversarial examples, which are human-imperceptible perturbations to the inputs of machine learning models [3]-[5]. For example, recent studies showed that adversarial examples cause false positives or false negatives in practical machine learning systems such as face recognition, object recognition, and perceptual ad-blocking systems [6]-[9]. Also, Bengio, Hinton, and LeCun acknowledged adversarial examples as a representative shortcoming of deep learning at AAAI 2020 [10].
However, traditional techniques for making machine learning models robust, such as dropout, generally do not provide a practical defense against adversarial examples. Here, the term "robustness" refers to the ability of a machine learning model to cope with adversarial inputs during execution.
To provide robustness to DNN models against adversarial examples, we can consider proactive countermeasures, which make deep neural networks more robust [...] As shown in Fig. 1b, the model retraining architecture changes the DNN model itself or the training process for the given DNN model [11], [12], [13]. The model retraining architecture is known to be effective against early adversarial perturbation calculation methods such as the Fast Gradient Sign Method (FGSM) [14] and the Basic Iterative Method (BIM) [6]. However, its performance is limited because retrained DNN models are highly dependent on the adversarial perturbation calculation methods used during retraining. Thus, retrained models can fail to correctly classify unknown adversarial examples. Also, the model retraining architecture requires high computation and memory usage to retrain DNN models. On the other hand, we can consider a representative reactive countermeasure, commonly referred to as the input transformation architecture. As shown in Fig. 1c, the input transformation architecture transforms inputs to reduce the perturbations of adversarial examples before they are fed into DNN models [15], [16], [17]. Compared to the model retraining architecture, it can provide robustness against adversarial examples with low computation and memory usage. Unfortunately, since it also transforms legitimate inputs while removing the perturbations of adversarial examples, it degrades accuracy on legitimate inputs. Such degradation can cause significant damage in applications that are sensitive to small variations in accuracy, such as self-driving cars, biomedicine, and certification.
To overcome the limitations of the previous two defense architectures, we address the following question: How can we maintain the performance of DNN models for legitimate inputs while providing good robustness against various adversarial examples?
Specifically, we propose a new type of input transformation architecture, called the two-step input transformation architecture, as shown in Fig. 1d. Unlike the previous one-step input transformation architecture in Fig. 1c, it consists of two transformation steps: Conversion and Inversion. In the Conversion step, input images are transformed into corrupted images by a conversion function p(·) before any adversarial examples are constructed. In the Inversion step, the received inputs, which may contain adversarial perturbations, are transformed by the inverse conversion function p⁻¹(·) before being fed into DNN models. Due to the inverse relationship between Conversion and Inversion, the proposed architecture works well for legitimate inputs while also providing good robustness against various adversarial examples.
To evaluate the effectiveness of the two-step input transformation architecture, we also introduce a practical implementation called EEJE. The term 'EEJE' refers to a Chinese phrase meaning "use a barbarian to control the barbarian." The experimental results show that EEJE works well for adversarial examples as well as for legitimate inputs. Compared to model retraining methods, EEJE does not require retraining existing models and produces fewer false negatives for various adversarial examples. Also, compared to previous input transformation methods, EEJE does not produce bad outputs (false positives) for legitimate inputs.
The main contributions of this paper can be summarized as follows: (1) We propose a new type of input transformation architecture based on two-step input transformation to produce good-enough outputs for both legitimate inputs and adversarial examples; (2) As a practical implementation of the two-step input transformation architecture, we introduce a new defense method called EEJE; (3) From analysis results using EEJE under state-of-the-art adversarial perturbations, we show that the two-step input transformation architecture provides better robustness than the model retraining architecture and the one-step input transformation architecture while maintaining high accuracy for legitimate inputs. Through these contributions, we highlight the need for further studies on the two-step input transformation architecture.
The rest of the paper is organized as follows. In Section 2, we overview well-known adversarial perturbation calculation methods and defense methods. In Section 3, we present the proposed input transformation architecture and the operational details of the proposed EEJE method. In Section 4, we show the influence of different adversarial perturbation calculation methods on EEJE. Finally, we summarize this paper in Section 5.

PRELIMINARIES AND RELATED WORKS
In this section, after we overview the state-of-the-art perturbation calculation methods for generating adversarial examples, we introduce two well-known defense architectures against adversarial examples. We also introduce a practical attack model which exploits adversarial examples.

Adversarial Perturbation Calculation Methods
In this section, we summarize the characteristics of five well-known adversarial perturbation calculation methods, which are commonly used to construct adversarial examples [14], [6], [18], [19], [20]. The equations for every adversarial perturbation calculation method are summarized in the appendix for further reference.
• Fast Gradient Sign Method (FGSM): As a non-iterative, fast adversarial perturbation calculation method, FGSM was introduced by Goodfellow et al. [14]. To calculate adversarial perturbations, FGSM moves the input along the sign of the gradient of the loss with respect to the input, which increases the loss of DNN models.
• Basic Iterative Method (BIM): BIM is an iterative extension of FGSM [6]. Unlike FGSM, which performs only one gradient update, BIM performs several gradient updates for finer optimization and clips the pixels of each intermediate result.
• DeepFool: As an L2 distance-based (Euclidean distance) untargeted attack, DeepFool iteratively linearizes the classifier to generate a minimal adversarial perturbation [18]. To minimize the magnitude of the perturbation, DeepFool finds the nearest decision boundary from an input X and, over multiple iterations, calculates the smallest perturbation that moves X across that boundary.
• C&W's Method: Carlini and Wagner [19] introduced three new perturbation calculation methods, which not only minimize the magnitude of perturbation but also achieve a higher attack success rate than other methods. Each C&W method is defined as an L0, L∞, or L2 type according to the distance metric used to calculate the perturbation. In this paper, we consider only the L2 type of C&W's method (CW), which is the most frequently used in other works [21], [22].
• Jacobian Saliency Map Approach (JSMA): JSMA computes an adversarial perturbation based on the forward derivative of the model [20]. JSMA is an L0 distance-based method and allows an adversary to compute adversarial saliency maps, which identify the input features that cause the most significant changes to the output.
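To make the FGSM, BIM, and DeepFool descriptions concrete, the following is a minimal NumPy sketch on a hypothetical binary linear classifier score(x) = w·x + b with pixels in [0, 1]. The linear model is an assumption chosen so that the input gradient has a closed form; it is not the CNN setup used in this paper, and all function names are illustrative. C&W and JSMA are omitted because they need an optimizer or a multi-class Jacobian.

```python
import numpy as np


def loss_grad_sign(w, y):
    """Sign of the input gradient of the logistic loss log(1 + exp(-y * (w @ x + b))).

    The gradient is -y * sigmoid(-y * score) * w and sigmoid(...) > 0,
    so its sign is sign(-y * w) for labels y in {-1, +1}.
    """
    return np.sign(-y * w)


def fgsm(x, w, y, eps):
    """FGSM: a single step of size eps along the gradient sign, clipped to [0, 1]."""
    return np.clip(x + eps * loss_grad_sign(w, y), 0.0, 1.0)


def bim(x, w, y, eps, alpha, n_iter):
    """BIM: n_iter FGSM steps of size alpha, each clipped to the eps-ball around x and to [0, 1]."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + alpha * loss_grad_sign(w, y)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


def deepfool_linear(x, w, b, overshoot=0.02):
    """DeepFool for an affine binary classifier: the minimal L2 step onto w @ x + b = 0,
    scaled by (1 + overshoot) so that the point lands just across the boundary."""
    score = w @ x + b
    return x + (1.0 + overshoot) * (-score * w / (w @ w))


w, b = np.array([1.0, -2.0, 0.5]), -0.1
x, y = np.array([0.6, 0.2, 0.4]), 1           # w @ x + b = 0.3 > 0: correctly classified
x_fgsm = fgsm(x, w, y, eps=0.3)
x_df = deepfool_linear(x, w, b)
print(w @ x_fgsm + b < 0, w @ x_df + b < 0)   # True True: both cross the boundary
print(np.linalg.norm(x_fgsm - x), np.linalg.norm(x_df - x))
```

On this toy model the DeepFool step has a much smaller L2 norm than the FGSM step with ε = 0.3, which matches the intuition that DeepFool searches for a minimal perturbation while FGSM takes a fixed-size step.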

Defense Architecture and Methods against Adversarial Examples
In this section, we overview well-known defense methods against adversarial examples based on two significant defense architectures, i.e., model retraining architecture and input transformation architecture.

Model retraining architecture and methods
To make DNN models more robust against adversarial examples, the model retraining architecture changes the current DNN model f(·) into a new DNN model f′(·). It aims to generate good outputs for inputs to DNN models regardless of whether adversarial examples are present. When the adversarial perturbation calculation method h(·) and a legitimate input X are given, the objective function of the model retraining architecture can be expressed as Equation 1. So far, Adversarial Training [11], [12], [23] and Defensive Distillation [13] have shown a significant effect as representative defense methods corresponding to f′(·).
Since Adversarial Training trains DNN models on both adversarial examples and legitimate inputs, the trained DNN models are robust against known adversarial examples. However, since Adversarial Training is highly dependent on the known adversarial examples, the models may require periodic retraining to produce good outputs for new adversarial examples [24].
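As a minimal sketch of the Adversarial Training loop described above, the following trains a hypothetical logistic-regression model on legitimate inputs together with FGSM examples regenerated against the current parameters in every epoch. The model, learning rate, epoch count, and synthetic data are all assumptions for illustration; the training setups of [11], [12], [23] differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_batch(X, y, w, b, eps):
    """FGSM against logistic regression with labels y in {0, 1}.

    The binary cross-entropy gradient with respect to each input row is (p - y) * w.
    """
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return np.clip(X + eps * np.sign(grad_x), 0.0, 1.0)


def adversarial_training(X, y, eps=0.3, lr=0.5, epochs=200, seed=0):
    """Gradient descent on clean inputs plus FGSM inputs crafted against the current model."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        X_adv = fgsm_batch(X, y, w, b, eps)           # regenerate attacks every epoch
        X_all = np.vstack([X, X_adv])
        y_all = np.concatenate([y, y])
        err = sigmoid(X_all @ w + b) - y_all          # dLoss/dscore for each sample
        w -= lr * X_all.T @ err / len(y_all)
        b -= lr * err.mean()
    return w, b


rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 5))
y = (X[:, 0] > 0.5).astype(float)                      # synthetic labels
w, b = adversarial_training(X, y)
```

Because the augmented examples come from one specific attack (here FGSM), the resulting model is tuned to that attack, which is the dependence on known adversarial examples noted above.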
Defensive Distillation is a defense method that uses distillation training to reduce the misclassification caused by adversarial inputs to DNN models. Note that the original distillation training method trains two DNNs with different architectures to compress a large DNN into a smaller one. In contrast, Defensive Distillation trains two DNNs with the same architecture to improve robustness against adversarial examples. However, recent research has shown that Defensive Distillation is ineffective against C&W's method [25]. Even though these two model retraining methods are simple to implement, they incur substantial costs when retraining currently deployed DNN models.
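The temperature-scaled softmax at the core of distillation can be sketched as follows. In the Defensive Distillation setting of [13], a first network trained at a high temperature T produces soft labels, a second network of the same architecture is trained on those soft labels at the same T, and the second network is deployed at T = 1. The function names, logits, and temperature values below are illustrative assumptions.

```python
import numpy as np


def softmax_t(logits, T=1.0):
    """Softmax of logits / T; a larger temperature T yields softer probabilities."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)         # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def soft_label_cross_entropy(student_logits, teacher_probs, T):
    """Cross-entropy of the student's temperature-T softmax against the teacher's soft labels."""
    log_q = np.log(softmax_t(student_logits, T))
    return float(-(teacher_probs * log_q).sum(axis=-1).mean())


teacher_logits = np.array([[8.0, 2.0, 1.0]])
hard = softmax_t(teacher_logits, T=1.0)       # close to one-hot
soft = softmax_t(teacher_logits, T=20.0)      # soft labels used to train the second network
print(hard.round(3), soft.round(3))
```

The soft labels keep the same predicted class but spread probability mass over the other classes, which is the information the second network is trained to reproduce.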

Input transformation architecture and methods
Unlike the model retraining architecture, which retrains the existing DNN models, the input transformation architecture transforms inputs to make adversarial examples less threatening. As shown in Equation 2, the output of [...] Note that since not only adversarial examples but also legitimate inputs are transformed by i(·), the input transformation should be carefully selected to minimize the classification accuracy degradation for legitimate inputs and to maximize the classification accuracy improvement for adversarial examples. That is, given the adversarial perturbation calculation method h(·), the input transformation method i(·), and an input X, the objective function of the input transformation architecture can be expressed as Equation 2. Note that most input transformation methods do not require retraining DNN models and generally require lower computation cost than model retraining methods [15], [26]. However, because every input is transformed into a corrupted one, the classification accuracy of DNN models for legitimate inputs inevitably degrades. So far, representative defense methods corresponding to i(·) include Denoising [15], Feature Squeezing [16], and Image Purify [17].
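To illustrate why a one-step transformation i(·) also alters legitimate inputs, here is a minimal NumPy sketch of two common transformations: 3×3 median smoothing (the one-step baseline used for Fig. 3) and bit-depth reduction, one of the squeezers used in Feature Squeezing [16]. Grayscale images with pixels in [0, 1] and the function names are assumptions of this sketch.

```python
import numpy as np


def median_smooth_3x3(img):
    """3x3 median filter for a 2-D image, with edge replication at the borders."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)


def reduce_bit_depth(img, bits):
    """Quantize [0, 1] pixel values to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels


# A legitimate image with a one-pixel detail: the median filter erases it,
# exactly as it would erase an isolated adversarial pixel.
legit = np.zeros((5, 5))
legit[2, 2] = 1.0
print(median_smooth_3x3(legit).max())                              # 0.0
print(reduce_bit_depth(np.array([0.1, 0.4, 0.6, 0.9]), bits=1))    # [0. 0. 1. 1.]
```

Both functions are applied identically to every input, so they cannot distinguish a fine legitimate detail from an adversarial perturbation; this is the source of the accuracy degradation on legitimate inputs described above.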

Threat model
As a form of session hijacking (e.g., sidejacking and sniffing), a man-in-the-middle (MITM) attack allows an adversary to eavesdrop on the flow of plain or encrypted traffic between two parties. In particular, in a cloud service environment consisting of clients and a server, MITM allows the adversary to impersonate both parties. After gaining access to the traffic between the two parties, the adversary intercepts the data and exploits the real-time transfer of other data [27], [28]. As shown in Fig. 2b, let us consider a cloud-based deep learning environment where the client transmits data (e.g., images, text, and sound) to the server for prediction (e.g., regression and classification). Let us assume that the parameter values of the deep neural networks trained on the server are publicly known (white-box), whereas the details of how input data are processed at the client are not publicly known. Let us consider the case in which the adversary gains access to the traffic after session hijacking. As a result, the normal session between the two parties is broken. Even though the normal flow of traffic is intercepted and perturbed by the adversary, the two parties believe that they are communicating with each other securely. That is, after analyzing the adversarial examples instead of the normal input data, the server returns abnormal prediction results to the client.
For example, cloud-based deep learning environments for autonomous driving, such as TuSimple using AWS, have been applied to image recognition and object detection for intelligent transportation systems. To support traffic safety and driver assistance in autonomous driving, input data for deep learning models, collected from multiple sensors such as onboard cameras and LiDAR, are transmitted through the vehicular infrastructure. When an adversary intercepts and tampers with sensor data transmitted from the vehicle (client) to the server, the server returns abnormal prediction results for image recognition and object detection. Since such abnormal predictions directly affect traffic safety and driver assistance, autonomous driving can face a major risk. Thus, defense methods that maintain the performance of deep learning models for legitimate inputs while providing good robustness against various adversarial examples are critical for practical applications.

PROPOSED ARCHITECTURE AND METHOD
In this section, we describe the proposed input transformation architecture against adversarial examples, which aims to obtain good outputs for every input. We also introduce a practical defense method that implements the two-step input transformation architecture.

Two-Step Input Transformation Architecture
Different from the previous input transformation architecture, the proposed input transformation architecture performs two-step input transformation before feeding inputs into DNN models.
As shown in Fig. 1d, the proposed input transformation architecture consists of two transformation steps, i.e., Conversion and Inversion. In the Conversion step, the service client adds a large perturbation to the original input, which causes the adversary to add only a small perturbation on top of it. In the Inversion step, the service server restores the original image for every input to DNN models by subtracting the perturbation added in the Conversion step.
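The two steps can be sketched as a pair of functions, assuming (as a simplification) that Conversion adds a perturbation delta without clipping and that the server knows the same delta, for example because it is shared between client and server; how the server obtains it is not specified here, so that part is an assumption. Under these assumptions, Inversion recovers a legitimate input up to floating-point rounding, and an attacked input comes back as the original plus only the attacker's perturbation.

```python
import numpy as np


def conversion(x, delta):
    """Client side: add the defender's (large) perturbation delta. No clipping is applied,
    so the step stays exactly invertible (an assumption of this sketch)."""
    return x + delta


def inversion(z, delta):
    """Server side: subtract the same delta, i.e., apply the additive inverse of Conversion."""
    return z - delta


rng = np.random.default_rng(0)
x = rng.uniform(size=(4, 4))                   # legitimate input
delta = rng.uniform(-0.5, 0.5, size=(4, 4))    # defender's large perturbation
eta = rng.uniform(-0.01, 0.01, size=(4, 4))    # attacker's small perturbation

print(np.allclose(inversion(conversion(x, delta), delta), x))              # True
print(np.allclose(inversion(conversion(x, delta) + eta, delta), x + eta))  # True
```

If real images were clipped to [0, 1] or quantized to 8 bits after Conversion, the subtraction would no longer be an exact inverse, which is why this sketch leaves values unclipped.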
This design is motivated by the following observations. In Fig. 3, we show the distribution of the CIFAR-10 test data [29] over various L0 and L2 norm values, which are computed from the pixel value differences between the original input image and the adversarial example generated by DeepFool [18]. We used 3×3 median smoothing for the one-step input transformation architecture and DeepFool Conversion for the proposed input transformation architecture. Here, the L0 value on the x-axis of Fig. 3a represents the number of transformed pixels, and the L2 value on the x-axis of Fig. 3b represents the normalized L2 (Euclidean) distance. In Figs. 3a and 3b, we observe that most of the test data show lower L0 and L2 values under the two-step input transformation architecture than under the one-step input transformation architecture. We note that even though it does not directly determine the robustness of DNN models, the distribution of input data over different L0 and L2 norm values can affect the classification accuracy degradation of DNN models.
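The per-image L0 and L2 quantities plotted in Fig. 3 can be computed as in the following sketch. It counts every changed array element for L0 and reports the raw Euclidean distance for L2; how the paper normalizes L2 and whether it counts color channels separately are not stated, so both choices here are assumptions.

```python
import numpy as np


def perturbation_norms(original, transformed, tol=0.0):
    """Return (L0, L2) of the element-wise difference between two images.

    L0 counts elements whose value changed by more than tol;
    L2 is the Euclidean norm of the flattened difference.
    """
    diff = np.asarray(transformed, dtype=float) - np.asarray(original, dtype=float)
    l0 = int(np.count_nonzero(np.abs(diff) > tol))
    l2 = float(np.linalg.norm(diff.ravel()))
    return l0, l2


original = np.zeros((2, 2))
changed = np.array([[0.0, 3.0], [4.0, 0.0]])
print(perturbation_norms(original, changed))   # (2, 5.0)
```

Applying this function to every test image and histogramming the two values would give distributions of the kind shown in Figs. 3a and 3b.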
As shown in Fig. 4, given an input X, the DNN model f(·), and the adversarial perturbation calculation method h(·), the objective function of the two-step input transformation architecture can be expressed as Equation 3, where p(·) is a conversion function that transforms the input to make DNN models robust to adversarial examples, and p⁻¹(·) is the inverse function of p(·), which restores the input image transformed by p(·).
Note that, unlike the previous one-step input transformation architecture, the proposed two-step input transformation architecture maintains good accuracy for legitimate inputs. As shown in Equation 3, if no adversarial perturbation occurs, DNN models in the proposed architecture return the same result as the DNN model f(·) with no defense shown in Fig. 1a. This is because Inversion, p⁻¹(·), is the inverse of Conversion, p(·), so that p⁻¹(p(X)) = X for a legitimate input X. In other words, the legitimate input X is completely restored due to the inverse relationship between Conversion and Inversion. That is, after the service client transforms the legitimate input X into p(X), the service server can restore X from p(X).

Let us also note that the proposed architecture may further increase the robustness of DNN models against adversarial examples when combined with other architectures, because it offers the possibility of a triple defense that combines it with both the model retraining architecture and the one-step input transformation architecture. In previous research, only a dual defense was possible through the combination of the model retraining and input transformation architectures [16], [17]. Combining different types of defense methods is important for two reasons. First, combined defense methods can compensate for each other's weaknesses. Second, DNN models combined with different defense architectures can increase the complexity of the adversary's perturbation calculation, thereby lowering the success rate of adversarial examples.
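The inverse relationship can be made explicit for the additive case used by EEJE, assuming Conversion adds a defender perturbation δ, i.e., p(X) = X + δ and p⁻¹(Z) = Z − δ, and that the attacker adds a perturbation η computed on the converted input. This is a worked special case under those assumptions, not a reconstruction of Equation 3.

```latex
\begin{aligned}
X' &= p(X) + \eta = X + \delta + \eta, \\
p^{-1}(X') &= (X + \delta + \eta) - \delta = X + \eta, \\
\eta = 0 \;&\Longrightarrow\; f\big(p^{-1}(p(X))\big) = f(X).
\end{aligned}
```

Thus, on a legitimate input the defended model behaves exactly like the undefended f(·), and on an attacked input only η, which was computed against p(X) rather than X, reaches the DNN model.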

A Practical Two-Step Input Transformation Method: EEJE
As a practical two-step input transformation method, we introduce a new method called EEJE. The term 'EEJE' refers to a Chinese phrase meaning "use a barbarian to control the barbarian." In EEJE, perturbations to key features of inputs to DNN models are added by both the defender and the attacker. That is, unlike the attacker, who minimizes the magnitude of adversarial perturbations to key features, the defender makes the magnitude of its perturbation large. The overall operation procedure of EEJE is as follows: (1) in Conversion, the defender adds a certain perturbation to an input X, producing p(X); (2) the attacker adds an adversarial perturbation computed by h(·) to the converted input; and (3) in Inversion, the defender applies p⁻¹(·), i.e., adds the inverse perturbation. Here, the inverse perturbation is the additive inverse of the perturbation added in Conversion.
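The three-step procedure above can be traced on a toy example. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the model is a hypothetical affine binary classifier, both the defender's Conversion and the attacker use a closed-form linear DeepFool step, and the server is assumed to know the defender's perturbation delta.

```python
import numpy as np

# Hypothetical toy classifier: score(x) = W @ x + B, predicted label = +1 if score > 0 else -1.
W = np.array([1.0, -2.0, 0.5])
B = -0.1


def score(x):
    return float(W @ x + B)


def predict(x):
    return 1 if score(x) > 0 else -1


def deepfool_step(x, overshoot=0.02):
    """Minimal L2 perturbation that moves x just across the boundary score = 0."""
    return -(1.0 + overshoot) * score(x) * W / (W @ W)


def eeje(x, attack=None):
    """Toy EEJE: Conversion, optional attack in transit, then Inversion."""
    delta = deepfool_step(x)             # (1) Conversion: defender's adversarial perturbation
    converted = x + delta                #     p(X), computed on the client side
    if attack is None:
        received = converted
    else:
        received = converted + attack(converted)   # (2) attacker perturbs p(X)
    return received - delta              # (3) Inversion: additive inverse of delta


x = np.array([0.6, 0.2, 0.4])                      # legitimate input, predicted +1
print(predict(x), predict(eeje(x)))                # 1 1
print(predict(x + deepfool_step(x)))               # -1
print(predict(eeje(x, attack=deepfool_step)))      # 1
```

In this toy case, the attacker's minimal step is computed at p(X), which the model already assigns to the wrong class, so the step points back toward the original decision region and the restored input stays correctly classified. This illustrates the intuition behind the name rather than a robustness guarantee; the white-box results reported in the evaluation show much lower accuracy.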
Let us compare in detail the outputs of DNN models with and without EEJE.

Fig. 5: Four input images randomly selected from the CIFAR-10 dataset [29], where each image is transformed using different adversarial perturbation calculation methods: FGSM for the 1st and 3rd rows and DeepFool for the 2nd and 4th rows. (a) Using no defense method; (b) Using the proposed EEJE method.

The white-colored and black-colored pixels in the adversarial perturbation images in Fig. 5 indicate pixels that are the same as and different from the original ones, respectively. For the adversarial examples in the 1st to 4th rows of Fig. 5a, DNN models return bad outputs because most pixels in the adversarial perturbation images are black. However, for the inversion images in the 1st and 2nd rows of Fig. 5b, DNN models return good outputs because most pixels in the adversarial perturbation images are white, i.e., most pixel values of the original input image are kept unchanged. Also, even though most pixels in the adversarial perturbation images for the inversion images in the 3rd and 4th rows of Fig. 5b are black, DNN models produce good outputs because the key features that affect classification are restored by the inverse function p⁻¹(·). That is, the magnitude of the perturbation added to key features remains within the range in which the input is still classified correctly.

EVALUATION RESULTS
To show how robust the proposed two-step input transformation architecture is against adversarial examples, we measured the performance of the proposed EEJE method under various conditions, including different adversarial perturbations [6], [14], [18], [19], [20]. Specifically, we evaluated the performance of EEJE to answer a set of questions, each of which is the title of one of the following subsections. We selected these questions based on the results of many representative works [12], [16], [22], [23] and the evaluation checklist of a paper on evaluating adversarial robustness [30]. These questions cover the scalability (4.2.1, 4.2.2, and 4.2.6) and effectiveness (4.2.3 and 4.2.7) of the proposed two-step input transformation architecture. We also answer questions 4.2.4 and 4.2.5 to verify the reliability of our evaluation results. To measure the performance under a practical usage scenario, we assume that the defender knows the target DNN model architecture but does not know the adversarial perturbation calculation methods used to perturb inputs to DNN models.

Experimental Environment
When evaluating the performance of the proposed EEJE method on DNN models, we embedded EEJE into a Convolutional Neural Network (CNN), a class of DNN that processes input images effectively. To answer the first six questions, we performed experiments using the CIFAR-10 image classification dataset [29]. The CIFAR-10 dataset consists of 50,000 training images and 10,000 test images in 10 classes. For a more accurate measurement, we used all test images when measuring the classification accuracy. To assess the influence of various types of data on the performance of the proposed two-step input transformation architecture, we also performed experiments using the MNIST dataset of handwritten digits [31].
To evaluate the influence of different types of DNN architectures on the proposed EEJE method, we measured the classification accuracy under different Conversion methods for five ResNet architectures of different sizes and for two state-of-the-art DNN architectures, i.e., ResNet-110 and VGG16. Also, to consider the influence of various perturbation models for Conversion and adversarial example generation, we measured the classification accuracy under two categories of models: (1) Gaussian Random Noise (GRN) for generating a random perturbation; and (2) five well-known adversarial perturbation calculation methods, i.e., FGSM, BIM, DeepFool, C&W, and JSMA, to reflect practical use cases.
When measuring the influence of different perturbation calculation methods on the proposed EEJE method and the other defense methods, we set the parameter values as follows: (1) 0.3 for the magnitude of perturbation (ε) in FGSM; (2) 10 and 0.3 for the number of iterations (N) and ε, respectively, in BIM; (3) 50 and 0.02 for the maximum number of iterations and the overshoot that prevents updates from vanishing, respectively, in DeepFool; and (4) 0 for the confidence parameter (κ) in C&W's method. These parameter values follow the recommended configuration values of the cleverhans library [32] and some representative works [16], [22].
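For reference, the attack settings listed above can be collected into a single Python dictionary. The keyword names are assumed to mirror cleverhans 3.x argument names (eps, nb_iter, max_iter, overshoot, confidence) and should be checked against the installed version; parameters the text does not specify, such as the BIM step size and the JSMA settings, are intentionally left out.

```python
# Attack settings from the text; keyword names are assumed to follow cleverhans 3.x.
ATTACK_PARAMS = {
    "FGSM": {"eps": 0.3},
    "BIM": {"eps": 0.3, "nb_iter": 10},
    "DeepFool": {"max_iter": 50, "overshoot": 0.02},
    "CW_L2": {"confidence": 0},
}

for attack_name, params in ATTACK_PARAMS.items():
    print(attack_name, params)
```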
We implemented the classification models using TensorFlow-gpu version 1.10.1 and Python version 2.7.15, and performed adversarial perturbation calculations using the cleverhans software library, which provides standardized reference implementations of adversarial example construction methods [32]. All experiments were run on an Ubuntu 18.04.1 LTS machine with kernel version 4.15.0-36-generic, a 2.40 GHz CPU (Intel Xeon CPU E5-2630 v3), a GeForce GTX 1080 Ti, and 32 GB of memory.

How do different types of DNN architectures influence the performance of the proposed two-step input transformation architecture?
To evaluate the performance of the proposed EEJE method under different types of DNN architectures, we measured the classification accuracy of ResNet architectures of different sizes without adversarial perturbation. We measured the performance of five ResNet architectures with different numbers of layers, from ResNet-20 to ResNet-110. The evaluation results are listed in the 'None' column of Table 1.
We observed that as the size of the ResNet architecture [33] increased, the classification accuracy increased from 90.93% for ResNet-20 to 92.26% for ResNet-110. These results show that, for inputs without adversarial perturbation, the classification accuracy increases with the number of layers in ResNet. From the 'None' column of Table 1, we also observe how the performance of EEJE varied under two different state-of-the-art DNN architectures, i.e., ResNet-110 and VGG16 [34]. The proposed two-step input transformation architecture achieved high accuracies of 92.26% and 93.68% for ResNet-110 and VGG16, respectively. These observations imply that various DNN architectures combined with the proposed two-step input transformation architecture can achieve high accuracy.

Result 1:
The proposed two-step input transformation architecture effectively works even when being embedded into different types of DNN architectures.

How does the performance of the proposed two-step input transformation architecture vary under different adversarial perturbations?
To show the influence of different adversarial perturbations on the proposed two-step input transformation architecture, we measured the classification accuracy of the proposed EEJE method under various combinations of Conversion methods and adversarial perturbation calculation methods for a given DNN model.
First, the proposed EEJE method showed better accuracy than using no Conversion (marked 'None' in the Conversion method column) under various adversarial perturbation calculation methods. For example, while the classification accuracy with no Conversion was 5.03% on average in ResNet-20 and 6.95% in VGG16, the average classification accuracy of EEJE was 49.56% in ResNet-20 and 76.12% in VGG16. Among the Conversion methods, GRN showed the lowest accuracy against adversarial examples. That is, the average classification accuracy of EEJE using the GRN Conversion method ranged from 21.81% in VGG16 to 26.02% in ResNet-110, while that of EEJE using the DeepFool Conversion method ranged from 56.19% in ResNet-110 to 77.91% in VGG16.
Second, we observe that EEJE shows higher accuracy against the state-of-the-art adversarial examples generated by the DeepFool and C&W methods than against those generated by the FGSM and BIM methods. This is because the magnitudes of the perturbations calculated by FGSM and BIM are larger than those calculated by DeepFool and C&W. For example, EEJE using the FGSM Conversion method in ResNet-110 achieved average accuracies of only 11.74% and 17.52% against FGSM and BIM perturbations, respectively, but 92.26% and 90.44% against DeepFool and C&W perturbations, respectively.

Third, as observed from the last column of Table 1, EEJE using various Conversion methods showed no accuracy degradation for legitimate inputs, i.e., inputs without adversarial perturbations. For example, given the ResNet-20 model, EEJE showed the same accuracy regardless of which Conversion method, if any, was used. This is because legitimate inputs are restored by Inversion, i.e., the inverse function of Conversion.
Since computing JSMA perturbations [20] requires a lot of memory and computation time, we measured the classification accuracy for combinations of the JSMA Conversion method with various perturbations only in ResNet-20. As shown in Table 2, EEJE using the JSMA Conversion method showed a classification accuracy of [...]

Result 2:
The proposed two-step input transformation architecture using various Conversion methods is robust against various adversarial perturbations while maintaining the classification accuracy for legitimate inputs. In particular, EEJE using the DeepFool Conversion method shows the best average robustness against various adversarial perturbations.

Does the proposed EEJE method show better performance than other state-of-the-art defense methods?
To compare the performance of the proposed EEJE method with other state-of-the-art defense methods, we measured the classification accuracy of Adversarial Training [12], [23], Feature Squeezing [16], and Guo et al.'s method [22] in ResNet-20. Here, Adversarial Training is a representative model retraining method, and Feature Squeezing and Guo et al.'s method are state-of-the-art one-step input transformation methods. For Adversarial Training, we configured FGSM-based Adversarial Training with ε set to 0.3 and Projected Gradient Descent (PGD)-based Adversarial Training with ε set to 0.1. For Feature Squeezing, we used five squeezing methods that are frequently used in many references [21], [35]. For Guo et al.'s method, we transformed the inputs using Total Variance Minimization (TVM) with the weight set to 0.03 and 5.0, and used the transformed inputs at both training and test time. For a more accurate measurement of the classification accuracy, we also excluded data augmentation, which could affect the performance of Adversarial Training.
In Table 3, [...] Also, Adversarial Training, Feature Squeezing, and Guo et al.'s method showed worse average robustness against various adversarial perturbations than EEJE. Even though Adversarial Training showed higher robustness than EEJE against the FGSM perturbation method, its classification accuracy against BIM, DeepFool, and C&W perturbations was lower than that of EEJE using the DeepFool Conversion method. Likewise, Feature Squeezing and Guo et al.'s method were less robust than EEJE against most adversarial perturbations.

Result 3 :
The proposed EEJE method can be used as a standalone defense method against various adversarial perturbations.

We conducted the basic sanity test of EEJE described in [30] by varying the magnitude of the FGSM perturbation, which intuitively shows how the classification accuracy changes as the magnitude of perturbation increases. According to the basic sanity test in [30], the classification accuracy (attack success rate) should decrease (increase) as the magnitude of perturbation increases. We therefore measured the classification accuracy of EEJE using the FGSM Conversion method while varying ε of the FGSM perturbation from 0.01 to 0.3 in steps of 0.01.
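The basic sanity test can be sketched as a sweep over ε from 0.01 to 0.3 in steps of 0.01 that checks that accuracy never increases. The sketch below is illustrative only: it assumes a hypothetical linear classifier with labels in {-1, +1} on synthetic inputs in [0, 1]^5, attacked by FGSM, standing in for EEJE and ResNet-20; all names are hypothetical.

```python
import random


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm_linear(w, x, y, eps):
    # For a linear score s(x) = w.x + b and label y in {-1, +1}, any loss that decreases
    # with the margin y * s(x) has gradient sign -y * sign(w), so FGSM steps eps that way
    # and then clips to [0, 1].
    return [min(1.0, max(0.0, xi - eps * y * sign(wi))) for xi, wi in zip(x, w)]


def accuracy_under_fgsm(w, b, data, eps):
    correct = 0
    for x, y in data:
        xa = fgsm_linear(w, x, y, eps)
        score = sum(wi * xi for wi, xi in zip(w, xa)) + b
        correct += (score > 0) == (y > 0)
    return correct / len(data)


def passes_basic_sanity_test(acc_by_eps):
    # accuracy must not increase as eps grows
    accs = [acc for _, acc in sorted(acc_by_eps.items())]
    return all(later <= earlier for earlier, later in zip(accs, accs[1:]))


eps_grid = [round(0.01 * k, 2) for k in range(1, 31)]  # 0.01, 0.02, ..., 0.30

rng = random.Random(0)
w, b = [1.0, -2.0, 0.5, 1.5, -1.0], 0.0  # hypothetical toy classifier
points = [[rng.random() for _ in range(5)] for _ in range(200)]
data = [(x, 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1) for x in points]
acc_by_eps = {eps: accuracy_under_fgsm(w, b, data, eps) for eps in eps_grid}
print(passes_basic_sanity_test(acc_by_eps))
```

For a linear model the FGSM margin can only shrink as ε grows, so this toy always passes; on a real defended model, a curve that rises with ε would indicate a problem with the evaluation rather than extra robustness.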
As shown in Fig. 6, the classification accuracy of the proposed EEJE method decreased as the magnitude of adversarial perturbation increased. In particular, the accuracy dropped rapidly for ε between 0.05 and 0.10, and for ε greater than 0.15 it no longer decreased significantly.

As shown in Table 4, the classification accuracy of the proposed EEJE method decreased under both white-box and gray-box attacks. For example, while the stand-alone EEJE achieved a classification accuracy of 55.09%, EEJE achieved 16.20% and 46.50% on average under the white-box and gray-box attacks, respectively. Under the gray-box attack, we observed that the Inversion method restores key features of the inputs even when the adversary mitigates the Conversion method. For example, even when the adversary mitigated the Conversion method with a non-local means filter, EEJE achieved a classification accuracy of 90.87% against the DeepFool perturbation. Moreover, the classification accuracy of EEJE under the gray-box attack was similar regardless of the denoising technique.

Under the white-box attack, the classification accuracy of the proposed EEJE method decreased significantly, to 16.20% on average, but remained higher than that of the architecture without any defense. Also, unlike under the gray-box attack, EEJE showed low accuracy against the DeepFool and C&W perturbations. For example, under the gray-box attack EEJE achieved average accuracies of 90.16% and 71.78% against the DeepFool and C&W perturbations, respectively, whereas under the white-box attack it achieved only 21.49% and 19.47%, respectively.

Result 6 :
The proposed two-step input transformation architecture works effectively even against the worst-case adversary. In particular, it shows good-enough performance under the gray-box attack.

4.2.6 Is it efficient to combine the proposed two-step input transformation architecture with other defense architectures?
To mitigate the degradation of classification accuracy as ε increases, especially at ε = 0.3, we considered combining the proposed EEJE method with Adversarial Training or Feature Squeezing. To evaluate the effectiveness of these combinations, we measured the classification accuracy of EEJE using the DeepFool Conversion method combined with FGSM-based Adversarial Training or with median smoothing (2×2)-based Feature Squeezing.
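Median smoothing with a 2×2 window, the Feature Squeezing variant combined with EEJE here, can be sketched as follows for a single-channel image stored as a list of rows. The window placement, border handling, and choice of middle value for an even-sized window are assumptions, since the text does not specify them, and `median_smooth_2x2` is a hypothetical name.

```python
def median_smooth_2x2(img):
    """2x2 median smoothing of a grayscale image given as a list of rows.

    Assumed conventions (not taken from the paper): the window covers the pixel and
    its upper, left and upper-left neighbours; indices before the first row/column are
    clamped to the border; with four values, the upper of the two middle values is kept.
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = sorted(
                img[max(rr, 0)][max(cc, 0)]
                for rr in (r - 1, r)
                for cc in (c - 1, c)
            )
            row.append(window[2])  # upper median of 4 values
        out.append(row)
    return out


# Usage: an isolated bright pixel (a typical high-frequency perturbation) is removed.
print(median_smooth_2x2([[0, 0, 0], [0, 9, 0], [0, 0, 0]]))
```

For color inputs such as CIFAR-10, the same filter would presumably be applied to each channel independently.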
As summarized in Table 5, EEJE combined with Feature Squeezing, EEJE combined with Adversarial Training, and EEJE combined with both Feature Squeezing and Adversarial Training all achieved higher accuracy than the stand-alone EEJE in ResNet-20. For example, the stand-alone EEJE achieved an average classification accuracy of 48.11%, whereas EEJE combined with Adversarial Training achieved 61.35%. Specifically, the classification accuracy against the FGSM perturbation improved from 20.56% to 58.66%, and that against the BIM perturbation improved from 22.71% to 39.39%. Even though the combinations showed slightly lower accuracy than the stand-alone EEJE in ResNet-20 against the DeepFool and C&W perturbations,

To evaluate how well the proposed two-step input transformation architecture works on different types of data, we performed additional experiments on the MNIST dataset [31]. Unlike CIFAR-10, which consists of color images, MNIST is a grayscale dataset of handwritten digits, with 60,000 training images and 10,000 test images across 10 classes. As shown in Table 6, the proposed two-step input transformation architecture also achieved high robustness on MNIST. For example, while the average classification accuracy without conversion was 6.71%, that of the proposed EEJE method was 76.65%. In particular, EEJE using the DeepFool Conversion method achieved the highest accuracy against adversarial examples, 97.04%. We also observed that EEJE with the various Conversion methods caused no accuracy degradation for legitimate inputs on MNIST.
While EEJE showed low robustness against the FGSM and BIM perturbations on CIFAR-10, EEJE using the DeepFool and C&W Conversion methods showed high robustness against the FGSM and BIM perturbations on MNIST. For example, against the FGSM and BIM perturbations, EEJE using the DeepFool Conversion method showed high classification

Result 7 :
The proposed two-step input transformation architecture shows good-enough performance on various types of data.

Theoretical Analysis
Since the defender cannot know which perturbation calculation method the adversary is using, the optimal choice of adversarial perturbation calculation methods and defense methods is uncertain. Game theory is a useful analysis tool when each player is uncertain about the other player's strategy. Thus, we evaluated the efficiency of EEJE by analyzing the results with an adversary-defender game.
In the adversary-defender game, the defender $P_d$ converts an input image $X$ by selecting an adversarial perturbation calculation method according to the defender's strategy $S_j$, and the adversary $P_a$ adds an adversarial perturbation to the converted image according to the attacker's strategy $S_i$.
The game arises because each player does not know the opponent's strategy, although both know each other's strategy space. That is, the adversary-defender game is a two-player game between the defender $P_d$ and the adversary $P_a$ with strategy spaces $S_D$ and $S_A$, respectively, where $S_j \in S_D$ and $S_i \in S_A$. As the outcome of the game, $P_a$ receives a payoff $p_{ij}$, which indicates the attack success rate, and $P_d$ receives a payoff $1 - p_{ij}$. Note that the adversary-defender game is a constant-sum game, since the payoffs of $P_d$ and $P_a$ always sum to 1. Thus, the optimal strategy of $P_d$ is given by Equation 4, where $S^d_j$ and $S^a_i$ are mixed (random) strategies defined by probability distributions over the pure strategies in $S_D$ and $S_A$, respectively. In Equation 4, $P_d$'s optimal strategy guarantees a minimum classification accuracy regardless of $P_a$'s strategy.
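For reference, the standard way to compute a defender's optimal mixed strategy in a constant-sum game is the maximin program below. We assume that Equation 4 takes this standard form, reading $S^d_j$ and $S^a_i$ as the probabilities that the mixed strategies assign to $S_j \in S_D$ and $S_i \in S_A$; $v^*$ is introduced here for illustration.

```latex
\begin{aligned}
v^{*} &= \max_{S^d}\ \min_{S^a}\ \sum_{i}\sum_{j} S^a_i\, S^d_j\,(1 - p_{ij})
       = \max_{S^d}\ \min_{i}\ \sum_{j} S^d_j\,(1 - p_{ij}),
       \qquad \sum_{j} S^d_j = 1,\ \ S^d_j \ge 0,\\[4pt]
v^{*} &= \max_{v,\,S^d}\ v
       \quad \text{s.t.}\quad \sum_{j} S^d_j\,(1 - p_{ij}) \ge v \ \ \text{for every } S_i \in S_A,
       \qquad \sum_{j} S^d_j = 1,\ \ S^d_j \ge 0 .
\end{aligned}
```

The inner minimum reduces to a minimum over pure strategies because, for a fixed $S^d$, the objective is linear in $S^a$, so $P_a$ always has a pure best response. The second line is the equivalent linear program, and by von Neumann's minimax theorem its optimal value $v^*$ is the payoff $1 - p_{ij}$ that $P_d$ can guarantee whatever $P_a$ does.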
To evaluate the performance of EEJE, we analyzed two adversary-defender games with different sets $S_D$ but the same set $S_A$. In the first game, we consider $S_D$ = {GNR, FGSM, BIM, DeepFool, C&W} and $S_A$ = {FGSM, BIM, DeepFool, C&W} to find the optimal Conversion method for EEJE. For ResNet-20, Table 7 shows the payoff table of $P_a$ for strategies $S_i \in S_A$ and $S_j \in S_D$. Solving Equation 4 for the first game shows that EEJE using DeepFool Conversion is the optimal strategy of $P_d$. Consistent with the experimental analysis, this indicates that EEJE using DeepFool Conversion is the best strategy for the defender.
In the second game, we considered $S_D$ = {EEJE (DeepFool), Feature Squeezing (2×2 median smoothing), Adversarial Training} and $S_A$ = {FGSM, BIM, DeepFool, C&W} in order to show that EEJE is more efficient than the other defense methods.
Table 8 shows the payoffs of $P_a$ for the different $S_j \in S_D$ in ResNet-20. Solving Equation 4 for the second game again selects EEJE as the optimal strategy of $P_d$. This observation indicates that EEJE is better suited to defending against adversarial examples than the other defense methods.
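Since Tables 7 and 8 are not reproduced in this section, the sketch below uses toy payoffs only. It approximates $P_d$'s maximin mixed strategy with fictitious play (Brown, 1951), in which each player repeatedly best-responds to the opponent's empirical history; this iterative method is swapped in for an exact linear-program solver to keep the example dependency-free and is not necessarily how the authors solved Equation 4. The function name `solve_defender_maximin` and the layout `p[i][j]` (row = $P_a$'s strategy $S_i$, column = $P_d$'s strategy $S_j$) are assumptions.

```python
from typing import List, Tuple


def solve_defender_maximin(p: List[List[float]], iters: int = 20000) -> Tuple[List[float], float]:
    """Approximate P_d's maximin mixed strategy by fictitious play.

    p[i][j] is P_a's payoff (attack success rate) when P_a plays S_i and P_d plays S_j;
    P_d receives 1 - p[i][j], so the game is constant-sum.
    Returns (mix, value): mix[j] approximates S^d_j, and value is the payoff
    (1 - attack success rate) that the mix guarantees against P_a's best response.
    """
    n_a, n_d = len(p), len(p[0])
    d_counts = [0] * n_d   # how often P_d has played each S_j
    d_score = [0.0] * n_d  # P_d's cumulative payoff of each S_j against P_a's history
    a_score = [0.0] * n_a  # P_a's cumulative payoff of each S_i against P_d's history
    i, j = 0, 0            # arbitrary opening moves
    for _ in range(iters):
        d_counts[j] += 1
        for jj in range(n_d):
            d_score[jj] += 1.0 - p[i][jj]
        for ii in range(n_a):
            a_score[ii] += p[ii][j]
        # each player best-responds to the opponent's empirical play so far
        j = max(range(n_d), key=d_score.__getitem__)
        i = max(range(n_a), key=a_score.__getitem__)
    mix = [c / iters for c in d_counts]
    # against a fixed mix, some pure S_i is a best response for P_a
    value = min(sum(x * (1.0 - p[ii][jj]) for jj, x in enumerate(mix)) for ii in range(n_a))
    return mix, value


if __name__ == "__main__":
    # toy payoffs (attack success rates), NOT the values of Table 7 or Table 8
    toy = [[0.8, 0.2], [0.2, 0.8]]
    mix, value = solve_defender_maximin(toy)
    print([round(x, 2) for x in mix], round(value, 2))
```

With a real payoff table, a returned mix concentrated on one column would correspond to a pure optimal strategy, such as the single Conversion method the analysis above selects. Fictitious play converges slowly, so an LP solver is preferable when exact probabilities are needed.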

CONCLUSION
DNN models have shown impressive accuracy in various real-world application fields. However, because adversarial examples cause a model to make false positives or false negatives, research on how to maintain the performance of DNN models for legitimate inputs while providing good robustness against various adversarial examples has emerged. So far, two types of defense architectures have shown a significant effect: (1) model retraining architecture; and (2) input transformation architecture. However, previous defense methods belonging to these two architectures did not produce good outputs for both adversarial examples and legitimate inputs. In this paper, to produce good-enough outputs for every input, we proposed a new type of input transformation architecture based on two-step input transformation. As a practical implementation, we also introduced a defense method called EEJE. Evaluation results under various experimental conditions showed that the proposed EEJE method provides robustness to DNN models against various state-of-the-art adversarial perturbations while maintaining high accuracy for legitimate inputs. Specifically, EEJE using DeepFool Conversion achieved higher classification accuracy than Adversarial Training or Feature Squeezing. In addition, EEJE combined with Adversarial Training or Feature Squeezing achieved higher classification accuracy than stand-alone EEJE. From these evaluation results, we believe that the proposed two-step input transformation architecture can support the robustness of DNN models against various adversarial perturbations.