Type-I Generative Adversarial Attack

Deep neural networks are vulnerable to adversarial attacks, either by examples with indistinguishable perturbations that produce incorrect predictions, or by examples with noticeable transformations that are still predicted as the original label. The latter case is known as the Type I attack, which, however, has received limited attention in the literature. We advocate that the vulnerability comes from the ambiguous distributions among different classes in the resultant feature space of the model, meaning that examples with different appearances may present similar features. Inspired by this, we propose a novel Type I attack method called generative adversarial attack (GAA). Specifically, GAA exploits the distribution mapping from a source domain of multiple classes to a target domain of a single class by using generative adversarial networks. A novel loss and a U-net architecture with latent modification are elaborated to ensure a stable transformation between the two domains. In this way, the generated adversarial examples have appearances similar to examples of the target domain, yet obtain the original prediction from the model being attacked. Extensive experiments on multiple benchmarks demonstrate that the proposed method generates adversarial images that are more visually similar to the target images than those of the competitors, and state-of-the-art performance is achieved.


INTRODUCTION
Most existing deep neural networks (DNNs) have proven to be vulnerable to attacks by adversarial examples [1], [2]. This poses a great threat to DNN-based real-world applications, such as autonomous driving [3], [4], object intrusion detection systems [5], [6], and face recognition [7], [8]. Hence, research on the robustness of DNNs, including adversarial attacking and adversarial training, has recently attracted great attention from the community. The adversarial attack, which can be roughly classified into the Type I attack and the Type II attack, aims at finding a modified example that can fool a target model with or without a defense mechanism. The Type II attack disturbs the input data with an imperceptible perturbation, which causes a misclassification by the attacked model [9], [10]. Currently, most existing attack methods belong to Type II. An alternative attack manner is the Type I attack [11], which, however, has seldom been studied in the literature. This attack makes significant changes to the input while the result is still predicted as the original label by the attacked model. This is very dangerous in some scenarios. For example, there are many images and videos on social networks, and the cost of manual review is too high and inefficient. Hence, various platforms use deep learning models to detect and filter illegal images and videos. However, the platform's detection system will fail if someone uses a Type I attack to convert ordinary videos and images into unlawful content, which results in severely negative consequences. Existing research [12] shows that, similar to the Type II attack, the Type I attack can also be used to exploit the weaknesses of deep models.
Mathematically, assume that we have an example $x$ with ground-truth label $y$ and a to-be-attacked model $f$ that makes a correct prediction on $x$, i.e., $y = f(x)$. In the Type II attack, the adversarial example $x'$ is generated by adding an imperceptible perturbation to $x$, resulting in an incorrect prediction by $f$, i.e., $y \neq f(x')$. Regarding the Type I attack, we use $x$ to generate the adversarial example $x'$ which, however, exhibits a totally different appearance to $x$ and has a completely different label $y' \neq y$. A successful attack makes the model $f$ unconscious of the change of the input category, i.e., $y = f(x')$. Hence, the Type I attack can be viewed as a task of input transformation which cannot be easily solved by, for example, adding noise to the input.
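As a worked illustration of these two definitions, the success conditions can be written as simple predicates over the attacked model and an oracle; the following sketch uses hypothetical one-dimensional threshold classifiers (all names are ours, not the paper's):

```python
def is_type2_success(f, oracle, x, x_adv):
    """Type II: imperceptible change (oracle label preserved) flips f's prediction."""
    return oracle(x_adv) == oracle(x) and f(x_adv) != f(x)

def is_type1_success(f, oracle, x, x_adv):
    """Type I: significant change (oracle label flips) while f's prediction is preserved."""
    return oracle(x_adv) != oracle(x) and f(x_adv) == f(x)

# Toy 1-D example: f thresholds at 0.5, the oracle at 0.3 (hypothetical models).
f = lambda v: int(v > 0.5)
oracle = lambda v: int(v > 0.3)
print(is_type1_success(f, oracle, x=0.1, x_adv=0.4))  # oracle flips, f agrees -> True
```

The predicates make the asymmetry explicit: the two attacks swap which of the two classifiers, the model or the oracle, is supposed to change its answer.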
The above analyses inform us that the transformation from $x$ to $x'$ is indeed equivalent to a transformation between the distributions of two classes, that is, mapping the data distribution that $x$ follows to the data distribution that $x'$ follows while keeping the features extracted by $f$ consistent. The possibility of such a transformation relies on the ambiguity of the model $f$ on the examples of these two classes.
Specifically, if the feature distributions of the two classes are overlapped with each other, the model will have low confidence on the examples located in the overlapped region and then, it is possible to transform the appearance of the example while keeping the feature unchanged, as shown in Fig. 1a. On the other hand, if the model well characterizes the decision boundaries of the two classes (for example, there is a large margin between the boundaries), it will be very difficult to make a transformation between them. We regard the Type I attack as exploiting the inter-class diversity property of the model, and as a counterpart, the Type II attack is exploiting the intra-class aggregation property of the model. Hence, during attack, we are reasonably motivated to employ generative functions to implement the distribution transformation.
Furthermore, consider that the model $f$ may have enhanced robustness owing to robust learning or some defense mechanisms, which means that the model has low ambiguity between different classes. In this case, the transformation is limited because the overlapped region is suppressed, as shown in Fig. 1b. To be clear, by involving robust learning, the resultant model exhibits improved discriminating ability between different classes. Hence, the overlap between classes is suppressed and it becomes more difficult to transform a clean example into an adversarial example. Here, we advocate involving some randomness in the transformation process such that the generated examples exhibit more details related to the target distribution (i.e., that of $x'$). This is reasonable because when the overlapped region is limited, the information that is transformable is also constrained. To enrich the information, randomness is a good choice for modern generative models, which has already been verified in the task of face generation [13].
In this paper, we follow the above ideas and propose a novel Type I attack method, called generative adversarial attack (GAA). Specifically, we employ a generative adversarial network (GAN) architecture to learn the transformation from the original distribution to a pre-defined target distribution. To ensure the attack ability of the generated examples, we employ the auxiliary classifier GAN (AC-GAN) loss to encourage the label of the generated data to be the same as that of the original data, as expected by the attack objective. Simultaneously, a feature similarity constraint is imposed to keep feature consistency. Moreover, randomness is introduced in the generator to enrich the transformation and to diversify the generated examples. We conduct a series of experiments on multiple public datasets, which demonstrate the superiority of the proposed model in comparison with the existing state of the art. Our contributions can be summarized as follows: 1) We propose a novel Type I attack method called generative adversarial attack (GAA). Specifically, we model the generation of adversarial examples as a discriminative inference process, so that the well-trained generative model can generate adversarial examples very efficiently. This is different from previous works, where the adversarial example is generally generated via an optimization process which is time- and resource-consuming. Notably, GAA is 60 times faster than the previous optimization-based method. 2) To ensure a smooth translation process from the original image to its counterpart adversarial example and to enrich the generated information in the target class, we develop an encoder-decoder-based generator G by involving random noise in the hidden feature space. This allows generating visually pleasing adversarial examples. At the same time, GAA improves the black-box attack capability by imposing a soft constraint on the encoded intra-class features.
3) Compared with the existing Type I attacks, the experiments indicate that the proposed model produces adversarial examples with more naturalness and more features of the original image. The attack success rates on normal models and robust models are largely improved. The rest of this paper is organized as follows. In Section 2, we briefly review the existing adversarial attack methods and defense methods. Section 3 presents the problem definition. The techniques of the Type-I generative adversarial attack are given in Section 4. In Section 5, we discuss the experiments, including the basic settings, the compared methods, and the experimental results. Section 6 draws the conclusions of this paper.

RELATED WORK
Deep neural networks have a powerful ability to model complex problems and achieve superior performance, hence being widely used in real applications. Due to their unexplainability and data bias, most deep models have been confirmed to be vulnerable to adversarial examples in computer vision tasks [14]. Advanced techniques for generating adversarial examples can achieve high success rates in attacking while keeping the perturbations imperceptible [9], [15]. To perform attacks in the real world, physical adversarial attacks have also been researched [15], [16], [17], especially in the topics of object detection [18], [19], [20], semantic segmentation [21], [22], [23], and natural language processing [24], [25], [26]. Here, we focus on adversarial attacks on image classification, which are briefly reviewed in this section.

Type II Attack
Assume a classifier $f(x): x \in \mathcal{X} \rightarrow y \in \mathcal{Y}$, which outputs the label $y$ as the prediction of an input $x$. The Type II attack aims to find a small perturbation $\delta$ which is added to $x$, such that the generated input misleads the classifier, i.e., $f(x + \delta) \neq y$. The perturbation $\delta$ is usually constrained by an $L_p$ $(p = 1, 2, \ldots, \infty)$ norm, i.e., $\|\delta\|_p \leq \epsilon$. Then, the constrained optimization problem can be written as
$$\max_{\delta} \; J(f(x + \delta), y) \quad \text{s.t.} \quad \|\delta\|_p \leq \epsilon,$$
where $J$ is, for example, the cross-entropy loss. Adversarial attack is an active area that has witnessed extensive publications of advanced techniques in recent years. As mentioned previously, the research community has consistently expressed interest in the Type II attack and published a series of advanced algorithms, including the gradient-based methods [9], [15], [27], [28], such as the fast gradient sign method (FGSM) [29]. FGSM is a simple adversarial attack method which attacks the image by maximizing the loss function $J$ in a single step:
$$x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y)),$$
where $J$ is the loss function, $x$ and $y$ represent the input image and the true label, $\theta$ represents the network parameters, and $\mathrm{sign}(\cdot)$ is the sign function. It subsequently evolved into the iterative FGSM [15], which can be formulated as
$$x'_{t+1} = \mathrm{Clip}_{x, \epsilon}\{x'_t + \alpha \cdot \mathrm{sign}(\nabla_x J(\theta, x'_t, y))\},$$
where $\alpha$ is the step size and the $\mathrm{Clip}$ function clips the input values to the permissible range $[x - \epsilon, x + \epsilon]$. As a seminal work, the momentum iterative FGSM (MI-FGSM) [28] integrates a momentum term into the iterative process to make the noise-adding direction smoother:
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(\theta, x'_t, y)}{\|\nabla_x J(\theta, x'_t, y)\|_1}, \quad x'_{t+1} = \mathrm{Clip}_{x, \epsilon}\{x'_t + \alpha \cdot \mathrm{sign}(g_{t+1})\},$$
where $\mu$ is the decay factor of the momentum term. Another line is the optimization-based methods [30], which solve
$$\min_{\delta} \; D(x, x + \delta) + c \cdot f(x + \delta),$$
where $\delta$ is the disturbance, $D$ is the distance between the clean image and the adversarial example, $f$ is an attack objective built on the classification model, and $c$ is a hyperparameter. There are also local-region attack methods [31], [32]; for example, the One Pixel Attack [32], an extreme adversarial attack method, can attack the classification model by changing the value of only one pixel in the image.
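To make the gradient-based attacks above concrete, the following sketch implements FGSM and iterative FGSM against a toy binary logistic-regression classifier, whose input gradient of the cross-entropy loss has the closed form $(p - y) \cdot w$. This is an illustrative setup of ours, not the paper's experimental configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_input(w, b, x, y):
    """Gradient of the binary cross-entropy w.r.t. the input of a logistic model."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(w, b, x, y, eps):
    # One-step FGSM: move by eps in the sign of the input gradient.
    return x + eps * np.sign(grad_input(w, b, x, y))

def ifgsm(w, b, x, y, eps, alpha, steps):
    # Iterative FGSM: small steps of size alpha, clipped to the eps-ball around x.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_input(w, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.4, 0.1]), 1          # correctly classified: w @ x = 0.7 > 0
x_adv = ifgsm(w, b, x, y, eps=0.5, alpha=0.1, steps=10)
print(sigmoid(w @ x_adv + b) < 0.5)     # prediction flipped within the eps-ball
```

The same loop with a momentum accumulator on the normalized gradient would give MI-FGSM.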

Type I Attack
Unlike the Type II attack, the Type I attack tries to make a significant change to the clean image $x$, but still fools the classifier into producing the original label. Mathematically, this process can be described as
$$f_1(x') = f_1(x), \quad f_2(x') \neq f_2(x),$$
where $f_1$ is the classifier to be attacked while $f_2$ is the attacker, which could be an oracle classifier, e.g., human eyes. However, the Type I attack has received very limited attention. For example, [11] showed that the existing deep models were vulnerable to both Type I and Type II attacks. [33] explained the relationship between the two types of attacks and provided a supervised variational auto-encoder (SVAE) model. Based on the original variational auto-encoder (VAE) [34], this method introduced a fake latent variable that followed the standard Gaussian distribution [35]. The true and fake latent variables were distinguished by a discriminator such that the model was sensitive in generating normal examples and adversarial examples. Then, the fake latent variable helped make the perturbation generation controllable. It can be expressed as
$$\mathrm{KL}(q(z \mid x) \,\|\, q(z)),$$
where $\mathrm{KL}$ is the Kullback-Leibler divergence, $q(z)$ is an arbitrary distribution in the hidden space, $q(z \mid x) = \mathcal{N}(\mu(x; \theta_{enc}), \sigma(x; \theta_{enc}))$ is the variational posterior, and $\theta_{enc}$ represents the encoder parameters. Both types of attack exploit the imperfect characterization of the data distribution by the to-be-attacked model. The difference is that the Type II attack focuses on the improper intra-class aggregation and finds, through different manners, examples that do not follow the distribution of a certain class. Instead, the Type I attack is interested in the unpleasing inter-class diversity and generates examples that may simultaneously follow the distributions of two classes, thus confusing the model. The proposed method is based on this understanding and tries to exploit the ambiguity of the model between different classes, resulting in a novel generative model based on the generative adversarial network.
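For reference, the Gaussian-posterior KL term used by VAE-style baselines such as SVAE has a well-known closed form when the prior is a standard normal; a minimal numerical sketch (assuming a diagonal Gaussian posterior, with function names of our own):

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent
    dimensions, as used in VAE-style objectives."""
    return 0.5 * np.sum(mu**2 + sigma**2 - np.log(sigma**2) - 1.0)

print(kl_to_standard_normal(np.zeros(4), np.ones(4)))  # 0.0 when posterior matches prior
```

The term is zero exactly when the posterior equals the prior and grows as the encoder's output drifts away from it, which is what makes the "fake" standard-Gaussian latent variable a usable reference distribution.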
In this paper, we design a generative adversarial attack model (GAA), which is a supervised extension of the original generative adversarial network (GAN). The generator tries to transform the clean image into its corresponding adversarial example by editing the hidden feature space based on the original example.

Defend Against Adversarial Examples
The vulnerability of neural networks poses a serious security problem for applying deep neural networks in real applications. Recently, many methods have been proposed to defend against adversarial examples. [9], [36] proposed to inject adversarial examples into the training data to increase the network robustness. Tramer et al. [37] pointed out that such adversarially trained models are still vulnerable to new adversarial examples, and proposed an ensemble adversarial training scheme, which augmented the training data with the examples transferred from other models. [38], [39] applied random transformation to the model inputs at inference time to mitigate the adversarial effects. Dhillon et al. [40] pruned a random subset of activations according to their magnitude to enhance network robustness. Prakash et al. [41] proposed a framework which combined pixel deflection with soft wavelet denoising to defend against adversarial examples. [42], [43], [44] leveraged generative models to purify the adversarial images such that the examples could follow the distribution of the clean images. Bin Liang et al. [45] considered the perturbation to images as a kind of noise and introduced two classic image processing techniques, including scalar quantization and smoothing spatial filter, to reduce the attack effect of adversarial examples.
These defense methods target the Type II attack, whereas no defense methods have been proposed for the Type I attack. Hence, we employ these Type II defense methods to test their performance in defending against the Type I attack in Section 5.6.

GAN for Adversarial Attacks and Defenses
There are already many GAN-based methods for the Type II attack. For example, Mangla et al. [46] proposed AdvGAN++, which used the hidden-layer vectors of the classifier as the input of a GAN to generate adversarial examples. This method contained the target model M, the feature extractor F, the generator G, and the discriminator D. The clean image was processed by F to obtain feature vectors which were used as prior information. The features and the noise vector z were concatenated and input to G to generate adversarial examples. Natural GAN [47] was an innovative method based on the WGAN framework, which focused on finding the hidden vector of the adversarial example in the low-dimensional hidden feature space such that the generated adversarial example was natural to human recognition. This method contained two stages, where the first stage established a correspondence between the sample space and the hidden feature space, while the second stage searched for the hidden representation of the expected adversarial examples. Liu et al. [48] proposed RobGAN, which introduced adversarial examples into the training of GANs and strengthened the discriminating ability of the discriminator D. This method learnt the adversarial factors to improve the quality of the generated adversarial examples.
Regarding adversarial defense, there have already been many GAN-based methods. Jin et al. [49] considered that imperceptible disturbances caused the misclassification problem and proposed the APE-GAN algorithm to eliminate the adversarial disturbance from the input image. The authors used WGAN to reconstruct the adversarial image to be similar to the original image. Similar to APE-GAN, Samangouei et al. [50] proposed Defense-GAN, which employed WGAN to reconstruct the adversarial examples for defense. The difference between these two methods in the training process was that APE-GAN fed the clean images and the adversarial examples into the discriminator and the generator for training, while Defense-GAN used random noise instead of the adversarial examples.

PROBLEM DEFINITION
Given a clean image $x$ with ground-truth label $y$, our task is to synthesize an adversarial example $x'$ with the oracle-determined label $y_t \neq y$, while the attacked model $f$ predicts $x'$ as $y = f(x')$. Let $x_t$ denote a sample that has the same label $y_t$ as $x'$. The transformation function from $x$ to $x'$ is denoted as $x' = G(x; \theta)$, where $\theta$ is the parameter of the transformation. Assume that all the examples, including $(x, y)$, $(x', y_t)$, and $(x_t, y_t)$, follow the distribution $P_o$ which is determined by an oracle classifier, e.g., a human. Then, our objective is
$$\min_{\theta} \; \log(P_o(x' \mid y; \theta)) - \log(P_o(x' \mid y_t; \theta)) \quad \text{s.t.} \quad f(x) = f(x').$$
The above problem indicates that the generated adversarial example $x'$ is different from the examples belonging to class $y$, but is highly similar to the examples belonging to $y_t$. As expected, the examples of the two classes exhibit noticeable differences in their appearances. By keeping the predictions of $f$ on $x$ and $x'$ consistent, a successful Type I attack is achieved. Based on the above formulation, the minimization is implemented by our proposed generative adversarial attack method, which is detailed in the following.

GENERATIVE ADVERSARIAL ATTACK
In this section, we introduce the framework and the training details of the proposed GAA.

GAA Structure and Loss Function
The proposed generative adversarial attack (GAA) is composed of three important models, i.e., a generator network $G$, a discriminator network $D$, and a to-be-attacked model $f$ which is termed the function model. The whole architecture is illustrated in Fig. 2. The generator and the discriminator compose a generative adversarial network, which optimizes the transformation from the original image $x$ to images in the target domain. During the transformation process, we note that the label of the generated example is important for generating class-related image details. Hence, we build the training loss upon the AC-GAN loss [51]. Mathematically, the loss function is
$$L = \mathbb{E}_{x_t \sim p_{data}}[\log D(x_t)] + \mathbb{E}_{(x, z) \sim p_{data}(x, z)}[\log(1 - D(G(y)))] + \lambda L_C, \qquad (8)$$
where the generator input $y$ is formed from the original image $x$ and the Gaussian noise $z$, $x_t$ is the target image, the generator computes $x' = G(y)$, the discriminator classifies the example $x_t$ of the $y_t$-th class as real and the generated example $x'$ as fake, and $\lambda$ is a hyper-parameter that balances the influence of the category loss $L_C$, through which the discriminator forces the label of the generated sample to be the same as that of the target example $x_t$. In Eq. (8), $D$ is maximized: for the true distribution, $D(x_t)$ should be close to 1, and for the generated distribution, $D(x')$ should be close to 0. $G$ is minimized so that the generated samples $x'$ can fool $D$, making $D(x')$ close to 1. To simplify the transformation process and generate more vivid details, we adopt an operation based on the residual learning strategy; as seen in Fig. 3, the generated example $x'$ is computed with a residual connection. We need to convert $x$ to $x'$, where $x'$ and $x$ have a huge gap and $x'$ belongs to the same category as the target image $x_t$, while still misleading $f$ into predicting the category of $x$. The conversion process of this attack is fundamentally different from, and much more difficult than, the Type II attack.
The difference is that, on the one hand, we do not constrain the perturbation size within a predefined threshold, and on the other hand, we restrict the resultant label of $x'$ to be $y$. With this understanding, the Type I attack and the Type II attack can be viewed as coupled attacks on different classes.
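The adversarial and category terms of the AC-GAN-style loss in Eq. (8) can be sketched numerically as follows; the scalar discriminator scores and the `class_loss` value are illustrative placeholders of ours, not the paper's implementation:

```python
import numpy as np

def gan_losses(d_real, d_fake, lam, class_loss):
    """Adversarial + category losses in the spirit of Eq. (8).
    d_real: D's scores on real target-class images x_t (D wants these -> 1).
    d_fake: D's scores on generated examples x' (D wants these -> 0, G wants -> 1).
    lam:    hyper-parameter weighting the category loss L_C."""
    eps = 1e-12  # numerical safety for log
    d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps)) + lam * class_loss
    return d_loss, g_loss

d_loss, g_loss = gan_losses(np.array([0.9, 0.8]), np.array([0.2, 0.1]),
                            lam=1.0, class_loss=0.3)
print(d_loss, g_loss)
```

As the sketch shows, the discriminator's loss falls as its scores approach the ideal (1 on real, 0 on fake), while the generator's loss falls as it drives $D(x')$ toward 1 and the category loss toward 0.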
In the implementation of the generator $G$, we employ a U-net architecture composed of an encoder and a decoder. The encoder is fully convolutional while the decoder is a deconvolutional neural network. By simply using the encoder-decoder model, we encounter the problem that the synthesized images are noisy, as shown in the second column of Fig. 4, even though the noisy images could attack the model $f$ successfully. This is perceptually unacceptable in real attack scenarios since such an adversarial example can be easily detected by an oracle. As existing research [13] suggests that adding randomness could improve the vividness of the synthesized details, we propose to add a Gaussian random variable to the embedded code. Let $G_{enc}$ denote the encoder part of $G$ and $G_{dec}$ denote the decoder part; the Gaussian variable $z \sim \mathcal{N}(0, 1)$ is added as
$$G(x) = G_{dec}(G_{enc}(x) + z).$$
In this way, the generated images exhibit more perceptible details, as illustrated in the last column of Fig. 4. (Fig. 3: after using $G_{enc}$ to extract the image features, the features and the Gaussian noise $z$ are concatenated along the RGB channels and input to $G_{dec}$.) The above optimized GAN guarantees to synthesize a vivid image similar to images of class $y_t$. Now consider the other objective, which is to enforce the prediction of $x'$ by $f$ to be $y$. A possible manner is to directly minimize $L_y = J(y, f(x'))$, where $J$ denotes the loss function measuring the difference between $y$ and $f(x')$, for example the cross-entropy loss. But we empirically find that this loss cannot produce strong transferability of the adversarial examples, which may be caused by the hard constraint on labels. Alternatively, we know that when $x$ and $x'$ have the same label, their features extracted by $f$ should be similar to each other. Then, it is natural to minimize the distance between the features, i.e.,
$$L_f = \|f_e(x) - f_e(x')\|_2,$$
where $f_e$ is the feature extraction function in the model $f$, for example the output of stage 2 in ResNet [52].
This loss imposes a soft constraint instead of a hard constraint between $x$ and $x'$, which is demonstrated to be effective in the experiments. By combining the above-introduced losses, we obtain the final loss for training the whole model:
$$L_{total} = L + \gamma L_f,$$
where $\gamma$ is a hyper-parameter to balance the influence of $L_f$.
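A minimal numerical sketch of the feature-consistency term $L_f$ and a $\gamma$-weighted combination with the GAN loss, assuming the features $f_e(\cdot)$ are given as plain vectors (the function names are ours):

```python
import numpy as np

def feature_loss(feat_x, feat_x_adv):
    """Soft feature-consistency constraint L_f = ||f_e(x) - f_e(x')||_2."""
    return np.linalg.norm(feat_x - feat_x_adv)

def total_generator_loss(gan_loss, feat_x, feat_x_adv, gamma):
    # Final training loss: GAN loss plus gamma-weighted feature distance.
    return gan_loss + gamma * feature_loss(feat_x, feat_x_adv)

fx = np.array([1.0, 2.0, 2.0])       # features of the clean image x
fx_adv = np.array([1.0, 2.0, 0.0])   # features of the adversarial example x'
print(total_generator_loss(0.5, fx, fx_adv, gamma=0.8))  # 0.5 + 0.8 * 2.0 = 2.1
```

Because $L_f$ penalizes distance in feature space rather than forcing a particular label, it acts as the soft constraint described above.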

Training Details
Here, we introduce a specific setting in our method. From the previous discussion, we know that the proposed model transforms the image $x$ belonging to class $y$ into an image of class $y_t$, exemplified by $x_t$. Hence, in the training process, we need to sample a pair $(x, x_t)$ from two classes. Specifically, assume that a dataset has $C$ classes in total. We randomly draw a sample $x$ from the whole dataset, which has the ground-truth label $y$. Then, the target image $x_t$ is randomly selected from the class $y_t = (y + 1) \,\%\, C$. The Gaussian vector $z$ is randomly sampled each time. Note that the function model $f$ is fixed during training since it is the model to be attacked and is used to measure the distance between the features of $x$ and $x'$. The other training settings of the proposed model follow most existing GAN settings, which will be detailed in Section 5.1. The proposed model follows the convergence property of a typical GAN, which solves a Nash equilibrium problem. That is, the generator and the discriminator are optimized alternately until there is no incentive for either model to deviate from its state. As is well known, training such a GAN model is not a stable process. Hence, to alleviate this issue, we adjust the learning rate dynamically when updating the parameters of $G$. Fig. 5 illustrates the curves of the training losses on different datasets. It can be seen that the models on MNIST and CIFAR-10 converge quickly, becoming stable at around 1000 epochs and 2000 epochs, respectively. The models on ImageNet require more epochs for convergence. Fig. 6 displays the attack success rate on the validation sets of MNIST, CIFAR-10, and ImageNet, in which similar convergence effects are exhibited.
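The pairing rule described above can be sketched as follows; the `dataset` layout (a mapping from label to images) is a hypothetical simplification of ours:

```python
import random

def sample_training_pair(dataset, num_classes):
    """Sample (x, x_t): x is drawn at random, and x_t is drawn from the class
    y_t = (y + 1) % C, following the pairing rule described above."""
    y = random.choice(list(dataset.keys()))
    x = random.choice(dataset[y])
    y_t = (y + 1) % num_classes
    x_t = random.choice(dataset[y_t])
    return (x, y), (x_t, y_t)

# Toy dataset: label -> list of image identifiers (placeholders).
toy = {0: ["img0a", "img0b"], 1: ["img1a"], 2: ["img2a"]}
(x, y), (x_t, y_t) = sample_training_pair(toy, num_classes=3)
print(y_t == (y + 1) % 3)
```

The fixed cyclic pairing means each class has a single deterministic target class, which is also the rule used at generation time.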

Generation of Adversarial Examples
Given the well-trained generator $G$ and an original image $x$, we generate the adversarial example as follows. A random vector $z$ is drawn from the standard Gaussian distribution. To perform the attack based on $x$, we manually identify its label $y$ and then determine the target label according to the class pairs used in training, i.e., $y_t = (y + 1) \,\%\, C$. The target image $x_t$ is randomly sampled from the data set of the $y_t$-th class. Finally, the adversarial example is computed as
$$x' = G_{dec}(G_{enc}(x) + z).$$

EXPERIMENTS
In this section, we conduct a series of experiments on multiple datasets to validate the effectiveness of the proposed method. The competitor for comparison is the existing Type I attack work [33], which introduces two models, SVAE and StyleGAN. It is also verified that the existing Type II attack defense methods cannot withstand the Type I attack.

Datasets
To examine the attack performance of the proposed method on different data, we involve four datasets, including MNIST 1 , CIFAR-10 2 , ImageNet ILSVRC 2016 validation set 3 , and CelebA 4 .

Attacked Models
The models to be attacked include FC, which is composed of 5 fully connected layers, CFC, which is composed of 3 convolutional layers and 2 fully connected layers, VGG16 5 , ResNet18, ResNet50 6 , ResNet101 7 , DenseNet121 8 , InceptionV3 9 , and EfficientNetB0 10 . On each dataset, we select from the above models such that the model size matches the size of the dataset, to avoid underfitting or overfitting. The models with public links are pretrained on ImageNet. Given a certain dataset, the model is trained from scratch or finetuned on the dataset before being attacked, except for using an ImageNet-pretrained model on the ImageNet dataset.

Metrics
To measure the success rate of the Type I attack, we compute the recognition rate of each attacked model on the generated adversarial examples. In each case of the following comparisons, we randomly select 1000 images from the corresponding dataset as the original images and then compute the recognition rate. This process is repeated 3 times in each comparison and the reported performance is the averaged value.
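The metric described above (recognition rate of the attacked model on adversarial examples) can be sketched as:

```python
def attack_success_rate(model, adversarial_examples, original_labels):
    """Type I success rate: fraction of adversarial examples still assigned
    their original label by the attacked model."""
    hits = sum(1 for x_adv, y in zip(adversarial_examples, original_labels)
               if model(x_adv) == y)
    return hits / len(original_labels)

# Toy check with a hypothetical 1-D threshold model.
model = lambda v: int(v > 0.5)
print(attack_success_rate(model, [0.7, 0.9, 0.2], [1, 1, 1]))  # 2/3
```

In the paper's protocol this value would be computed over 1000 sampled images and averaged over 3 repetitions.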

Implementations
In the proposed model, G enc is implemented as the stacked convolutional layers of VGG16 without the fully connected layers. The feature distance between the original image x and the adversarial example x 0 is computed based on f e . In different models, the selected layers for computing f e are listed in Table 2. Note that for all models, we use multiple layers to extract features such that the similarity between the two examples can be enhanced during training.
For the hyper-parameters, we set $\gamma = 0.8$ for MNIST and CIFAR-10, and $\gamma = 1$ for ImageNet and CelebA. The whole model is trained using the Adam algorithm [53]. The learning rate is set to 0.0002 for MNIST and CIFAR-10, and 0.0001 for ImageNet and CelebA. The batch size is set to 512 for MNIST, 128 for CIFAR-10, and 32 for ImageNet and CelebA. Training runs for 100,000 iterations on all datasets. All the experiments are conducted on a GPU server with one Intel Xeon E5 2620 v4, 128 GB RAM, and two NVIDIA RTX 2080 Ti GPUs. The implementation is under CentOS 7 with Python 3.6 and TensorFlow-GPU 2.0.

Stability of G Structure
In this part, we explore the impact of the structure of $G$ on MNIST, CIFAR-10, and ImageNet. A layer in our implementation consists of the Conv, BN, and Maxpool operations. The numbers of layers in $G_{enc}$ and $G_{dec}$ are equal. We test the cases where $G$ contains 8, 12, and 16 layers on MNIST and CIFAR-10, which are denoted as $G_1$, $G_2$, and $G_3$, respectively. The experimental results are shown in Table 4. On ImageNet, we use the layers of VGG13, VGG16, and VGG19 as $G_{enc}$, which are denoted as $G_{V13}$, $G_{V16}$, and $G_{V19}$, respectively, while the layer numbers in $G_{dec}$ are the same as in $G_{enc}$.
The experimental results are listed in Table 5. These results show that the best performance is obtained when the model complexity matches the data complexity. The attack success rate of $G_3$ fluctuates greatly on MNIST, while the others are relatively stable. This may be because $G_3$ is too large for MNIST, which makes it prone to overfitting during network training.

Results on MNIST
On this dataset, we employ FC, CFC, and ResNet18 as the attacked models, which are trained from scratch. The attack performance is listed in Table 3, from which we can observe that GAA outperforms SVAE by 0.5% and 0.4% when attacking the CFC and ResNet18 models, respectively, yielding a slightly higher attack success rate than SVAE. Since the data of MNIST is relatively simple, the transformation from the original images to the adversarial examples is easy to accomplish. We also plot the examples generated by GAA at different iterations of training, as illustrated in Fig. 7, which shows consistent appearances between the generated examples and the targets.

Results on CIFAR-10
On this dataset, CFC, VGG16, ResNet50, and DenseNet121 are selected as the attacked models. Here, we conduct the within-dataset attack experiment, i.e., both the original images and the target images come from CIFAR-10. Fig. 8 shows the visual results of the attack. The comparison results are listed in Table 6, which shows that GAA is only 0.1% lower than SVAE when attacking ResNet50. When attacking the other models, the attack success rate of GAA is 2% to 4% higher than that of SVAE, which indicates that GAA produces a noticeable improvement over SVAE.

Results on ImageNet
In this experiment, we attack the models including ResNet50, InceptionV3, ResNet101, DenseNet121, and EfficientNetB0. The accuracy is computed as the top-1 performance and the results are reported in Table 7. We can see that GAA is significantly better than SVAE when attacking large-scale images. The attack success rate of GAA is 2%, 4.3%, 2.3%, and 2.5% higher than that of SVAE on ResNet50, ResNet101, DenseNet121, and EfficientNetB0, respectively.

Results on CelebA
The experiment on CelebA focuses on how the Type I attack affects the face recognition rate. We follow the settings of [33] and use FaceNet [54] trained on CelebA to perform face recognition. FaceNet verifies two images as the same person if the face feature distance is smaller than a threshold (e.g., 1.06). To attack FaceNet, both SVAE and GAA try to change the gender of the person in an image while fooling FaceNet into making the original recognition. Alternatively, StyleGAN and GAA can generate a face image without the gender constraint, where only the identity is required to change. We make comparisons on these two cases and obtain the results listed in Table 8, which validate the superiority of GAA. Several generated examples are plotted in Fig. 9. It is interesting that the examples by GAA have lower feature distances to the original images than those of the competitors, and that StyleGAN may produce non-face images even though the other images exhibit high-quality details.
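The FaceNet-style verification rule with the 1.06 threshold can be sketched as follows, assuming embeddings are given as plain vectors (the toy embeddings are ours, not real FaceNet outputs):

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.06):
    """FaceNet-style verification: same identity iff the embedding
    distance is below the threshold (1.06 as used in the experiments)."""
    return np.linalg.norm(emb_a - emb_b) < threshold

a = np.array([0.1, 0.2, 0.3])
b = np.array([0.1, 0.2, 0.4])   # close embedding -> verified as same person
c = np.array([2.0, -1.0, 0.5])  # far embedding -> verified as different person
print(same_person(a, b), same_person(a, c))
```

A Type I attack on this rule changes the image's oracle identity while keeping its embedding within the threshold of the original, which is why lower feature distances favor the attacker here.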
While the above results validate the Type I attack performance, we also note that the adversarial examples generated by GAA can be used to perform a Type II attack, i.e., using x′ to attack f with respect to x_t. Due to the page limitation, more high-quality results can be found in the supplementary material, where we quantify the perturbation and show that it is imperceptible, as required by a Type II attack. Hence, it may be possible to optimize both Type I and Type II attacks simultaneously, which could be an interesting topic. (Table notes: the generator structures use VGG13, VGG16, and VGG19 as G_enc, denoted G_V13, G_V16, and G_V19, respectively, with the layer numbers in G_dec matching those of G_enc. The GAA and SVAE columns report the attack success rate, and the ORIGINAL column reports the accuracy of the clean images on models such as ResNet18.)
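One common way to quantify such a perturbation is the L-infinity norm used as the budget in most Type II attacks; this is a sketch of that measure, not necessarily the exact metric used in the supplementary material.

```python
import numpy as np

def linf_perturbation(x: np.ndarray, x_adv: np.ndarray) -> float:
    # Maximum absolute per-pixel change between the clean image x
    # and the adversarial example x'; small values indicate an
    # imperceptible perturbation, as Type II attacks require.
    return float(np.max(np.abs(x_adv - x)))
```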

Outlier Attack
We identify an example as an outlier to the attacked model f if it is selected from a dataset on which f is not optimized. In this regard, it is challenging to perform a cross-dataset attack, in which the original images come from CIFAR-10 while the target images are selected from the Comic Avatar dataset (https://github.com/chenyuntc/pytorch-book). This helps us to test whether the attack succeeds when using images of different styles. The results are presented in Table 9, where GAA achieves much higher success rates than SVAE. Several generated adversarial examples are exhibited in Fig. 10, which illustrates that GAA produces clean images while SVAE generates noisy ones. All of these images are nevertheless assigned the labels of the original images by ResNet50. This experiment demonstrates that deep models can be attacked by external data with different styles. To improve model robustness, it would therefore be worthwhile to consider resistance to style variance in a robust learning process.

Transferability Analyses
In this part, we examine the influence of the loss functions (i.e., L_y = J(y, f(x′)) and L_f = ||f_e(x) − f_e(x′)||_2) on the adversarial examples, which relates to Section 4.1. The baseline is SVAE. The experiment is conducted on ImageNet, with the results shown in Table 10. GAA_y denotes the model using L_y, while GAA_f denotes the model using L_f. We generate the adversarial examples based on ResNet20, ResNet101, and InceptionV3, and use the other models to test transferability. Table 10 shows that the adversarial examples generated by GAA_f possess better transferability than the others. In contrast, the adversarial examples generated by GAA_y and SVAE achieve only limited success rates when attacking different black-box models. Similar results can be observed in Fig. 12. In addition, as shown in Fig. 11, we use Grad-CAM to observe the receptive fields that the model attends to when extracting features. The receptive fields are extremely similar when extracting features from the original image and its corresponding adversarial example, whereas the receptive fields of the adversarial example and the target image are completely different, although the two images look very similar. L_f imposes transferability in the feature space instead of the label space, which suggests that feature similarity is a key factor in generating transferable adversarial examples. In the following, we use GAA to denote GAA_f unless otherwise noted.
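The two losses can be sketched as follows. This is a minimal numpy illustration under two assumptions: J is taken to be the cross-entropy on the attacked model's output probabilities, and f_e denotes its feature extractor.

```python
import numpy as np

def label_loss(y: int, probs_x_adv: np.ndarray) -> float:
    # L_y = J(y, f(x')): cross-entropy between the original label y
    # and the attacked model's prediction on the adversarial example x'.
    return float(-np.log(probs_x_adv[y] + 1e-12))

def feature_loss(feat_x: np.ndarray, feat_x_adv: np.ndarray) -> float:
    # L_f = ||f_e(x) - f_e(x')||_2: Euclidean distance between the deep
    # features of the original image and of the adversarial example.
    return float(np.linalg.norm(feat_x - feat_x_adv))
```

Minimising L_f ties the adversarial example to the original image's feature representation rather than to one model's label output, which is consistent with GAA_f transferring better across black-box models than the label-space GAA_y.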

Attacks Versus Defense Models
To explore whether the existing Type II adversarial-training defense models have a defensive effect against the Type I adversarial examples generated by GAA, we employ three adversarially trained networks from [37]: ens3-adv-Inception-v3 (Inc-v3_ens3), ens4-adv-Inception-v3 (Inc-v3_ens4), and ens-adv-Inception-ResNet-v2 (IncRes-v2_ens). Note that the experiment is under a black-box setting. As shown in Table 11, these defense models, built for Type II attacks, can hardly resist Type I attacks.
Inspired by Xie et al. [55], we attempt to improve the robustness of the model by using the Type I adversarial examples during training. In this regard, the adversarial examples generated by SVAE and GAA are used for adversarial training of ResNet50, VGG16, and DenseNet121 on CIFAR-10 and ImageNet. The resultant models are denoted as ResNet50_R, VGG16_R, and DenseNet121_R, respectively. The training details are as follows: the initial learning rate is 0.02 with a decay of 0.0001 following the standard scheme in Keras; the batch size is 256 on CIFAR-10 and 64 on ImageNet; and the label of each adversarial example x′ is set to the class of the corresponding target x_t. The whole training process stops when it approaches overfitting. The training accuracy is shown in Table 12, and the results with changed targets are reported in Table 14.
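The "standard scheme in Keras" referred to above is time-based decay; a sketch with the stated values (initial rate 0.02, decay 0.0001) is given below. The function name is ours.

```python
def time_decay_lr(initial_lr: float, decay: float, iteration: int) -> float:
    # Keras-style time-based decay: lr_t = lr_0 / (1 + decay * t),
    # applied per update step t.
    return initial_lr / (1.0 + decay * iteration)
```

With decay 0.0001, the learning rate halves after 10,000 update steps, giving a slow, smooth schedule suitable for fine-tuning on adversarial examples.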
Through the above experiments, we conclude that it is not feasible to improve the model robustness against the Type I attack through adversarial training. The reason is that the Type II attack limits the perturbation, whereas the Type I attack tries to maximize it; hence, it is challenging for a model to fit both the adversarial examples and the clean images. By definition, the Type I attack can generate examples that poison the training process, yielding problematic models, whereas the Type II attack can be used to improve the model robustness.

Fig. 12. Illustration of the transferability. We use the last convolutional layer of VGG16 and ResNet50 to measure the feature similarity between the original image x and the adversarial example x′ generated by GAA. x_f and x′_f denote the deep features of x and x′, respectively. The pairwise_distance is the average of the pixel-wise Euclidean distances, and the cosine_similarity is the average of the cosine similarities.
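The two similarity measures named in the Fig. 12 caption can be sketched as follows; reading "pixel-wise" as element-wise over the flattened feature maps is our assumption.

```python
import numpy as np

def pairwise_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    # Average of the element-wise Euclidean distances between two
    # flattened feature maps.
    return float(np.mean(np.abs(feat_a.ravel() - feat_b.ravel())))

def cosine_similarity(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    # Cosine similarity between the two flattened feature maps;
    # a small epsilon guards against zero-norm features.
    a, b = feat_a.ravel(), feat_b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A low pairwise_distance and a cosine_similarity near 1 between x_f and x′_f indicate that the adversarial example preserves the original image's deep features, which is what gives GAA_f its transferability.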

Efficiency Analyses
We note that the proposed GAA requires only a single forward pass of the generator to produce an adversarial example, whereas SVAE conducts an iterative optimization process. Here, we compare the time consumed in generating adversarial examples on the same datasets (MNIST, CIFAR-10, and ImageNet) with an identical experimental platform, measuring the time from the first to the last generated adversarial example. A comparison of the time costs of SVAE and GAA is listed in Table 15, which verifies that GAA is far more efficient than SVAE, especially when the images are large, e.g., in ImageNet.
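The timing protocol, from the first to the last generated example, can be sketched as below; generate_fn stands in for either GAA's single forward pass or SVAE's iterative optimisation, and the names are ours.

```python
import time

def time_generation(generate_fn, inputs):
    # Wall-clock time from the first to the last adversarial example.
    start = time.perf_counter()
    outputs = [generate_fn(x) for x in inputs]
    return outputs, time.perf_counter() - start
```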

CONCLUSION
The Type I attack views the ambiguity among different classes as a weakness of a deep model. Adversarial examples can be generated via transformation from the original images to images located in the ambiguity region. This motivates us to propose a novel Type I attack method, called the generative adversarial attack. Based on AC-GAN, we develop a framework that employs the to-be-attacked model to constrain the learning process of the generator, and design a specialised generator architecture that involves randomness. Extensive experiments demonstrate the effectiveness of the proposed method; furthermore, its efficiency is much higher than that of the existing optimization-based methods. Notably, the defense experiments show that the adversarial examples produced by GAA successfully deceive the existing defense models, including the Type II defense methods. Moreover, adversarial training with the Type I adversarial examples is not a feasible way to improve the model robustness. In general, the Type I adversarial attack could be used to poison the model training process, resulting in weak deep models, while the Type II attack is helpful for enhancing the models.

Tongliang Liu (Senior Member, IEEE) is currently a lecturer with the School of Computer Science, Faculty of Engineering, and a core member of the UBTECH Sydney AI Centre, The University of Sydney. His research interests include machine learning, computer vision, and data mining. He has authored and coauthored more than 60 research papers, including in IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Image Processing, ICML, NeurIPS, AAAI, IJCAI, CVPR, ECCV, KDD, and ICME, with best paper awards, e.g., the 2019 ICME Best Paper Award. He is a recipient of a DECRA from the Australian Research Council.
Chao Yi (Member, IEEE) received the MS and PhD degrees in computer science and technology from Yunnan University in 2003 and 2009, respectively. He is currently an associate professor with the National Pilot School of Software, Yunnan University, China. His current research interests include artificial intelligence and big data computing.
Xin Jin (Member, IEEE) received the BS degree in electronics and information engineering from Henan Normal University, Xinxiang, China, in 2013, and the PhD degree in communication and information systems from Yunnan University, Kunming, China, in 2018. He is currently an associate professor with the School of Software, Yunnan University. His current research interests include pulse-coupled neural network theory and its applications, image processing, optimization algorithms, and bioinformatics.
Renyang Liu received the BE degree in computer science from Northwest Normal University in 2017. He is currently working toward the PhD degree with the School of Information Science and Engineering, Yunnan University, Kunming, China. His current research interests include deep learning, adversarial attacks, and generative models.
Wei Zhou (Member, IEEE) received the PhD degree from the University of Chinese Academy of Sciences. He is currently a full professor with the Software School, Yunnan University. His current research interests include distributed data-intensive computing and bioinformatics. He is a fellow of the China Communications Society, a member of the Yunnan Communications Institute, and a member of the Bioinformatics Group of the Chinese Computer Society. He won the Wu Daguan Outstanding Teacher Award of Yunnan University in 2016, and was selected into the Youth Talent Program of Yunnan University in 2017. He has hosted a number of National Natural Science Foundation projects.