Adversarial Attack Using Sparse Representation of Feature Maps

Deep neural networks can be fooled by small, imperceptible perturbations called adversarial examples. Although these examples are carefully crafted, existing attacks raise two major concerns. In some cases, the perturbations generated are much larger than minimal adversarial perturbations, while in others the attack requires so many iterations that it becomes infeasible. Moreover, existing sparse attacks are either too complex or not sparse enough to achieve imperceptibility. Attacks should therefore be fast and minimal in terms of $\ell_{2}$-norm. In this research, we use a dictionary learning technique to generate sparse adversarial examples based on feature maps of target images. We present two novel algorithms to tune the dictionary learning process and feature map selection. The results on MNIST and Imagenet show that our attack is better than or competitive with state-of-the-art methods. We also compare our method with sparse attacks recently introduced in the literature, and achieve a comparable attack success rate with a smaller $\ell_{2}$-norm. Finally, we test the efficacy of our attack in the presence of defense mechanisms; none of the evaluated defenses was able to counter the proposed attack.


I. INTRODUCTION
Deep neural networks have achieved great success and reached human-level performance in image recognition, face and object detection, autonomous driving, reading addresses, solving captchas, and many other tasks [1], [2]. Convolutional neural networks (CNNs) in particular have been useful since 2012, after giving promising results on the Imagenet Large Scale Visual Recognition Challenge [3]. Since then, improvements have arrived at a rapid pace in the form of a wide range of applications, more complex and deeper architectures, and a better overall classification process.
Despite the success of CNNs on image recognition tasks, we still lack a complete understanding of these complex networks. Szegedy et al. [4] explored an unusual mistake that deep network-based classifiers can make. They can be fooled by carefully computed images called adversarial images, revealing the unstable nature of these architectures. To humans, these images are indistinguishable from the original images.
The associate editor coordinating the review of this manuscript and approving it for publication was Md. Moinul Hossain.
This area has received a lot of interest from researchers and practitioners all over the world. One stream of research focuses on generating adversarial attacks with the lowest perceptibility, while the other focuses on creating defenses against such attacks. Researchers are still working to understand the precise inner workings and reasoning of deep networks. Attacks help in understanding the internal workings of these architectures and thus motivate extensive research on designing robust classifiers. For this purpose, many attacks have been introduced in the literature.
Fast Gradient Sign Method (FGSM) [5] and C&W [6] are among the best-known state-of-the-art attack methods. The current mainstream attacks have certain problems. In terms of $\ell_{2}$-norm distortion, C&W is argued to be the most effective attack, but it is slow because it requires thousands of iterations, which also makes it unsuitable for adversarial training [7]. Researchers have argued that perturbations estimated using FGSM are much larger than minimal adversarial perturbations [8]. Adversarial examples generated by iterative attacks contain a certain amount of redundant noise that cannot be completely removed by simply increasing the number of iterations [9]. In light of these problems, attacks should be fast and minimal in terms of $\ell_{2}$-norm. Exploiting the internal details of DNNs to generate an effective, imperceptible attack is particularly relevant and is the subject of this paper.
In this paper, we address the above-mentioned problems using ideas from sparse representation, sparse coding, and dictionary learning. Sparse representation is a linear internal representation of images that uses only a few active coefficients, making it easy to interpret and manipulate, for instance in content-based image indexing and retrieval. It represents every input signal as a sparse linear combination of atoms from a dictionary. Computing the representation coefficients X is a non-trivial operation; it can be solved with Orthogonal Matching Pursuit (OMP), a greedy algorithm with a fast running time. Sparse representation has received great interest in machine learning, pattern recognition, and signal processing [10], and has been successfully applied to image classification [11], image compression [12], reconstruction, noise reduction [13], and face recognition [14].
Recent notable work on sparse attacks includes Corner Search, SparseFool, and GreedyFool. These methods either suffer from such high complexity that they cannot be extended to high-resolution images, or perturb redundant pixels and are therefore not applicable to real scenarios. Current algorithms address highly complex NP-hard problems.
The adversarial examples generated by these methods usually consist of high-magnitude noise concentrated over a small number of pixels. As a result, the adversarial images become quite perceptible and might even exceed the dynamic range of the image. In this paper, we try to address the limitations of the state-of-the-art methods mentioned above as well as those of recent sparse methods. The idea is to mimic the internal representation of target images. For this purpose, we designed an attack based on the feature maps from the first layer of a convolutional neural network. The perturbation designed using feature maps is added to the original image to attack the classifier, as shown in Fig. 1. The block diagram of our approach shows a classifier being attacked by an adversary that perturbs the input $x_l$ by adding a feature map of the target image. We optimize our perturbation vector using dictionary learning so that a linear, non-redundant, sparse noise is added to the original input image. Feature maps capture the important pixels of an image that are used for classification.
Experiments on the MNIST and Imagenet datasets show the efficacy of the proposed approach in terms of decreased error and smaller $\ell_{2}$-norm, even for a one-shot method. The proposed approach has been applied to both targeted and un-targeted scenarios.
We have also tested our adversarial images against various defense methods. None of the evaluated defense strategies defended against the attack.
We summarize our contributions as follows:
1) We use ideas from dictionary learning and sparse coding to generate adversarial attacks. To the best of our knowledge, these are the first attacks based on dictionary learning.
2) We try to overcome the limitations of both state-of-the-art methods and recent sparse attacks.
3) We present novel algorithms to learn a tuned dictionary based on feature maps. These ideas for tuning dictionaries can be extended to other machine learning problems solved by dictionary learning.
4) We present a comprehensive experimental analysis to back our approach: a detailed investigation of tuning the dictionary to create an effective attack, followed by tests against various defense methods to demonstrate its efficacy.
5) We motivate a new area for designing adversarial attacks.
The structure of the paper is as follows. The related literature is discussed in Section II. The detailed methodology is described in Section III, followed by the experimental setting and details in Section IV. The discussion and analysis of results are presented in Section V. Finally, the experiments regarding defense strategies are explained and analyzed in Section VI, and the paper is concluded in Section VII.

II. RELATED WORK
The vulnerability of neural networks to adversarial examples was introduced by the authors of [4]. Attacks can be targeted or un-targeted. In targeted attacks, the adversary forces the classifier to predict a specified label, whereas in un-targeted attacks any incorrect label suffices.
Among the state-of-the-art methods, the authors of [5] proposed the Fast Gradient Sign Method, which creates adversarial examples by computing the sign of the gradient of the loss with respect to the input images. Later, iterative methods such as DeepFool [15] and C&W attacks [6] were introduced. C&W attacks are considered very strong and are effective against defensive distillation. Universal adversarial attacks were also proposed early on to fool all kinds of neural networks [16]. Recently, steganographic universal adversarial perturbations were introduced by [17], who used a single secret image (computed in the transform domain) to fool deep architectures. Similarly, Yahya et al. generated an adversarial attack by selecting a targeted watermark, using a steganographic approach [18].
VOLUME 10, 2022
As our work is related to internal representation as well as sparse attacks, the remainder of this section discusses the relationship to both areas. In [19], the authors used DNN logits as vectors to represent features and exploited them to create targeted universal attacks. These perturbations generalize well across different neural networks. Attacks can be designed using information about the network, such as training data and weights (white-box attacks), or can be black-box in nature, without knowledge of the architecture, learned weights, or training data. Moreover, these attacks are transferable among different architectures, and much recent literature provides insights into the transferability of adversarial images [20], [21]. Some authors are also motivated to generate attacks that explain the deep representations of the model rather than fool it [22]. Shi et al. [23] recently explained robustness through an adaptive iterative attack.
More recently, feature maps have been used to generate transferable attacks. In [21], the source image is perturbed by reducing the distance between the layer-L activations of a source image and a target image in a white-box setting. The images are then fed into the black-box model to test the transferability. Shi et al. [9] argued that most attacks have no refinement mechanism to squeeze out redundant noise. Their work therefore adds diversity by using gradient ascent and descent, and then optimizes by filtering out the noise of groups of similar pixels.
The other area of related work is sparse representation and sparse attacks. Sparse attacks have recently been introduced in the field of adversarial attacks. Some of the early sparse attacks in the adversarial setting include JSMA [24], SparseFool [25], Corner Search [26], and GreedyFool [27]. SparseFool [25] disrupts the geometrical properties of the images, whereas Corner Search [26] aims at minimizing the distance of the perturbation to the original image.
All these attacks have certain limitations. JSMA [24] is highly complex and is difficult to apply to high-resolution images. SparseFool [25] cannot perform a targeted attack and is not sparse enough. In PGD [28], the number of pixels to be perturbed is defined beforehand; it therefore perturbs redundant pixels and might not be flexible enough for real scenarios [29]. These works present interesting ideas but address different problems.
We use ideas from sparse representation and dictionary learning. Sparse representation has gained a lot of attention in computer vision, with wide applications in image reconstruction, denoising, image inpainting, and many more. Aharon et al. [30] proposed the K-SVD method to learn a dictionary that achieves sparse representation. Our work differs from previous sparse attacks in that we aim to discover a dictionary that optimizes the perturbation vector to achieve a smaller $\ell_{2}$-norm. The smaller $\ell_{2}$-norm helps achieve imperceptibility, one of the major limitations of existing work highlighted above. We use dictionary learning to add a sparse perturbation to input images, which changes the minimum number of important pixels of the clean image, another limitation highlighted above. Our experiments show that dictionary learning can overcome these limitations of existing work. In this paper, we learn the dictionary to optimize the targeted noises, so our dictionary consists of perturbations instead of clean images. This is the first time dictionary learning has been used for this task. To the best of our knowledge, sparse representation has previously been used only as a defense mechanism that reduces the feature space against adversarial attacks [31], [32], [33].

III. METHODOLOGY
In this section, we describe in detail the methodology of the proposed approach to generate adversarial images. We have introduced a novel dictionary learning technique that is based on the feature maps of the image associated with the targeted label. These sparse representations of feature maps serve as noise to be added to the original image.
We first formulate the problem in Section III-A. The methodology for sparse adversarial image generation is then divided into three phases. The tuned dictionary learning algorithm based on feature maps is explained in Section III-B. The computation of the perturbation vector using feature maps of the target image is explained in Section III-C. Finally, in the third phase, we generate an adversarial image from sparse perturbation vectors by the one-shot method explained in Section III-D. The detailed methodology highlighting all phases is illustrated in Fig. 2.

A. PROBLEM FORMULATION
Let X be the image space and Y be the label space. $f_\theta(\cdot) : X \to Y$ is a classifier parameterized by $\theta$ that assigns a label y to an input image x. Let $x_l$ denote the legitimate image to be perturbed by noise p. We aim to generate an adversarial example $x_a = x_l + p$ which is imperceptibly different from $x_l$ but fools the classifier, i.e.:
$$f_\theta(x_a) \neq y_l \quad \text{s.t.} \quad d(x_a, x_l) \leq \epsilon$$
where $d(\cdot,\cdot)$ is the distance, e.g. the $\ell_{2}$-norm of the difference between the clean and the adversarial sample, $y_l$ is the correct label of the legitimate input image, and $\epsilon$ is the perturbation scale, which is often set to a very small value to ensure imperceptibility between $x_a$ and $x_l$. In the case of targeted attacks:
$$f_\theta(x_a) = y_t \quad \text{s.t.} \quad d(x_a, x_l) \leq \epsilon$$
where $y_t$ is the target label we want the classifier to predict. In this work, we consider both targeted and un-targeted attacks.
We aim to inject the noise p to make a strong attack by learning an adverse transformation $T(\cdot)$ such that adversarial detection-based defense methods are not able to detect the attack. The noise is derived from the internal representation of the image $x_t$ associated with the targeted label $y_t$. This enables the adversary to create an attack with a smaller norm than most attacks in the literature. Moreover, most detection-based defense mechanisms detect an attack based on the redundant noise left by it. It is therefore desirable to strengthen the transformation $T(\cdot)$ by preserving the important information required while limiting the space of adversarial noise. It should further remove redundant noise and be difficult for defense mechanisms to detect.
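The success conditions of this formulation can be checked mechanically. The following is only an illustrative sketch: it assumes that $d(\cdot,\cdot)$ is the $\ell_{2}$-norm and that imperceptibility is expressed as a bound $d(x_a, x_l) \leq \epsilon$, and the names `is_adversarial` and `toy_predict` are hypothetical helpers, not part of the proposed method.

```python
import numpy as np

def is_adversarial(predict, x_l, x_a, y_l, eps, y_t=None):
    """Check the assumed success conditions for an adversarial example.

    Un-targeted: predict(x_a) != y_l. Targeted: predict(x_a) == y_t.
    In both cases the l2 distance d(x_a, x_l) must not exceed eps.
    """
    dist = float(np.linalg.norm((x_a - x_l).ravel()))  # d(x_a, x_l)
    if dist > eps:
        return False
    label = predict(x_a)
    return label == y_t if y_t is not None else label != y_l

def toy_predict(x):
    """Hypothetical stand-in classifier: label 1 if mean intensity exceeds 0.5."""
    return int(x.mean() > 0.5)

x_l = np.full((4, 4), 0.49)   # legitimate image, classified as 0
p = np.full((4, 4), 0.02)     # small perturbation with l2-norm 0.08
x_a = x_l + p                 # adversarial candidate, mean intensity 0.51

untargeted_ok = is_adversarial(toy_predict, x_l, x_a, y_l=0, eps=0.1)
targeted_ok = is_adversarial(toy_predict, x_l, x_a, y_l=0, eps=0.1, y_t=1)
too_tight = is_adversarial(toy_predict, x_l, x_a, y_l=0, eps=0.05)
```

With a budget of 0.1 the toy example counts as adversarial in both scenarios, while the tighter budget of 0.05 rejects it because the perturbation norm is 0.08.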

B. TUNED DICTIONARY LEARNING ALGORITHM
The operator $T(p)$ transforms the perturbation vector derived from feature maps, by linear projection, so that it lies in close proximity to the local neighborhood of the image. Let p be the perturbation vector and $x_t$ the image associated with the target label $y_t$. We look for the transformation operator $T(\cdot)$ satisfying the following conditions:
$$f_\theta(x_l + T(p)) = y_t \quad \text{s.t.} \quad T(p) \approx x_t$$
That is, the classifier f assigns the targeted label to the fabricated input image, which is our ultimate goal, under the condition that $T(p)$ (the transformed feature map of the target) and $x_t$ (the image associated with the target label) are situated closely.
We propose a tuned, feature map-based dictionary learning algorithm to learn a transformation satisfying both conditions. The idea is to mimic the internal representation of the target image, i.e. the image associated with the target label that we want the classifier to predict. We therefore want to learn a transformation that is close to the target image, and for that purpose we use the feature maps of the target image to create the perturbation. Sparse representation approximates an input signal X by a sparse linear combination of items from an overcomplete dictionary. Let the projection of p be $T(p)$, given by:
$$T(p) = D\alpha$$
where D is the dictionary and $\alpha$ is the sparse coefficient vector. The projection in our algorithm is learned through a dictionary by solving the following optimization problem [10], which is dictionary learning with an $\ell_{1}$ penalty on the components:
$$\min_{D,\alpha} \frac{1}{2}\|p - D\alpha\|_{2}^{2} + \lambda\|\alpha\|_{1} \quad \text{s.t.} \quad \|d_k\|_{2} \leq 1,\ k = 1, \dots, n$$
where p is the perturbation signal, $d_k$ is the k-th atom of D, $\lambda$ is a regularization parameter, and n is the number of dictionary atoms. The sparsity-inducing $\ell_{1}$-norm also prevents learning components from noise when few training samples are available. The degree of penalization, that is, the sparsity level, can be adjusted through $\lambda$. Small values result in gently regularized coefficients, while larger values shrink many coefficients to zero. The squared error between the original and transformed signal is the basis for tuning the dictionary learning algorithm.
Unlike other dictionary learning algorithms, which learn a dictionary of clean images to support denoising, compression, or inpainting, we learn the dictionary to optimize the targeted noises. We call our dictionary an adverse dictionary because it consists of perturbations instead of clean images. We use sampled feature map selection to improve computational efficiency; the novel feature map selection technique used to learn this dictionary is explained in Section III-C.
The performance of the dictionary learning algorithm is enhanced by tuning its hyper-parameters based on the squared error. The algorithm runs for a fixed number of iterations with different values of the hyper-parameters: the sparsity level and the number of components. Our experiments show that the squared error is highly correlated with the selection of these hyper-parameters. We chose test set images from the MNIST dataset to conduct these experiments. These hyper-parameters and their effect are described later in the section on ablation studies. The detailed algorithm for dictionary learning is provided in Algorithm 1.
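To make the sparse coding step concrete, the sketch below implements a basic Orthogonal Matching Pursuit and uses it to compute the projection $T(p) = D\alpha$ and the squared error used for tuning. It is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: the dictionary is fixed and hand-built (identity atoms plus normalised Hadamard atoms) rather than learned, and `omp`, `transform`, and `squared_error` are hypothetical helper names.

```python
import numpy as np

def omp(D, p, k):
    """Greedy Orthogonal Matching Pursuit: approximate p with at most k atoms of D.

    D has shape (signal_dim, n_atoms) with unit-norm columns.
    Returns the sparse coefficient vector alpha of length n_atoms.
    """
    p = np.asarray(p, dtype=float)
    residual = p.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], p, rcond=None)
        residual = p - D[:, support] @ coef
    if support:
        alpha[support] = coef
    return alpha

def transform(D, p, k):
    """T(p) = D @ alpha, the k-sparse projection of p onto the dictionary."""
    return D @ omp(D, p, k)

def squared_error(t_p, x_t):
    """Squared Euclidean distance between T(p) and a reference signal x_t."""
    return float(np.sum((t_p - x_t) ** 2))

# Toy overcomplete dictionary (assumption): 16 identity atoms + 16 Hadamard atoms.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H = H2
for _ in range(3):
    H = np.kron(H, H2)                    # 16 x 16 Hadamard matrix
D = np.hstack([np.eye(16), H / 4.0])      # 32 atoms, each with unit l2-norm

p = 2.0 * D[:, 3] - 1.5 * D[:, 10]        # a signal that is exactly 2-sparse in D
alpha = omp(D, p, k=2)
error = squared_error(transform(D, p, 2), p)
```

In a tuning loop, the same `squared_error` would be evaluated for several sparsity levels k and dictionary sizes n, keeping the setting with the lowest error. The source describes this selection but does not specify the solver, so OMP here follows only the mention of it in the introduction.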

C. FEATURE MAP SELECTION TO LEARN THE DICTIONARY
In this paper, we aim to mimic the internal representations of the target inputs to create our adversarial images. The idea is to produce an adversarial image whose internal representation matches that of the target input. Sabour et al. tried to do the same by reducing the Euclidean distance between the source and the target guide images [34].
The internal representation is captured by using the feature maps of the target images. A feature map is the output of one filter applied to the previous layer. These filters, also known as kernels, act as feature identifiers. The feature maps detect low-level features at the initial layers of the CNN and high-level features deeper in the architecture.
The low-level features are closely related to the image, whereas the high-level features are difficult to map back to it. Therefore, we use feature maps from the first layer of the CNN.
The knowledge about feature maps and kernels outlined so far is used in this paper to generate the perturbations for our adversarial images. In the natural workflow of a CNN, a kernel is applied to an image to produce a feature map. We want to add noise to that image so that it generates a targeted feature map (the feature map of the image we are targeting). The idea can be mathematically written as
$$K * (x_l + p) = F_t$$
where $*$ denotes convolution. Therefore, the perturbation vector is given by
$$p = K^{-1}(F_t) - x_l$$
Here, $x_l$ is the legitimate source image, and $F_t$ is the feature map of the target image $x_t$, generated by a well-trained network. K is the pre-learned filter (kernel) from the same well-trained network, and $K^{-1}$ is the deconvolution operation. The effect of deconvolution of CNN layers is discussed in detail in [35]. p is the perturbation we want to compute. The sparse representation of this perturbation is added to the original image as noise, as described in detail in Section III-D. The perturbation is generated using the test data, keeping the essence of a black-box attack in which the adversary does not have access to the training data. We feed these perturbations to the dictionary learning step for sparse representation.
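The relation between kernel, feature map, and perturbation can be illustrated on a toy example. The sketch below is an assumption-laden simplification: it models the first layer as a single circular convolution (so that an exact inverse exists via the FFT), ignores bias and non-linearity, and uses a hand-picked kernel. The function names are hypothetical, and the deconvolution used in the paper follows [35], which is not reproduced here.

```python
import numpy as np

def conv2d_circular(x, K):
    """Circular 2-D convolution of image x with kernel K, computed via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(K, s=x.shape)))

def deconv2d_circular(F, K, reg=1e-12):
    """Regularised inverse of conv2d_circular (a stand-in for K^{-1})."""
    K_hat = np.fft.fft2(K, s=F.shape)
    inv = np.conj(K_hat) / (np.abs(K_hat) ** 2 + reg)
    return np.real(np.fft.ifft2(np.fft.fft2(F) * inv))

def perturbation_from_feature_map(x_l, F_t, K):
    """p = K^{-1}(F_t) - x_l, so that convolving x_l + p with K yields F_t."""
    return deconv2d_circular(F_t, K) - x_l

rng = np.random.default_rng(0)
x_l = rng.random((8, 8))                  # legitimate source image
x_t = rng.random((8, 8))                  # target image
K = np.array([[0.0, 0.1, 0.0],
              [0.1, 1.0, 0.1],
              [0.0, 0.1, 0.0]])           # toy kernel whose spectrum has no zeros
F_t = conv2d_circular(x_t, K)             # feature map of the target image
p = perturbation_from_feature_map(x_l, F_t, K)
```

In this idealised setting the perturbed image `x_l + p` reproduces the target feature map exactly. With real CNN layers the inverse is only approximate, which is one reason the perturbation is subsequently projected onto a sparse dictionary.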
Next, we explain a novel, efficient feature map selection algorithm to improve dictionary learning. Feature maps hold information about the important pixels of the image [21]. Likewise, learning a discriminative dictionary is necessary to improve representation. Traditional approaches often suffer from the problem of local minima, so researchers have proposed learning dictionaries with good representational power and better discrimination capabilities for all classes [13]. The idea is therefore to build a dictionary by selecting important and diverse inputs. We select important and diverse patches by greedily sampling the test data. The target image of a particular class is selected for the dictionary if the $\ell_{2}$-norm of that image is greater than a threshold, namely the mean $\ell_{2}$-norm of all the images in that class. In this way we learn the dictionary from diverse images: images of all classes are included, and we try to include as many diverse images of the same class as possible. The detailed algorithm is presented in Algorithm 2. The testing sets are used to sample feature maps. The selected images are then used to generate the feature maps described earlier in this section. In this way, we learn a discriminative dictionary and optimize its performance by reducing the number of atoms through sampling.
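The selection rule described above (keep an image if its $\ell_{2}$-norm exceeds the mean $\ell_{2}$-norm of its class) can be sketched in a few lines. This is an illustration of the stated threshold rule only, not a reproduction of Algorithm 2; `select_by_class_mean_norm` is a hypothetical name and the data are toy values.

```python
import numpy as np

def select_by_class_mean_norm(images, labels):
    """Return indices of images whose l2-norm exceeds the mean l2-norm of their class."""
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    norms = np.linalg.norm(images.reshape(len(images), -1), axis=1)
    selected = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        threshold = norms[idx].mean()          # per-class mean l2-norm
        selected.extend(idx[norms[idx] > threshold].tolist())
    return sorted(selected)

# Toy data: four 2x2 "images" in two classes.
images = np.array([
    [[1.0, 0.0], [0.0, 0.0]],   # class 0, l2-norm 1
    [[3.0, 0.0], [0.0, 0.0]],   # class 0, l2-norm 3
    [[0.0, 2.0], [0.0, 0.0]],   # class 1, l2-norm 2
    [[0.0, 4.0], [0.0, 3.0]],   # class 1, l2-norm 5
])
labels = np.array([0, 0, 1, 1])
chosen = select_by_class_mean_norm(images, labels)   # indices 1 and 3
```

Because the threshold is computed per class, every class that contains images of differing norm contributes at least one image, which matches the stated goal of covering all classes while shrinking the number of atoms.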
The sparse representation of images has gained growing interest. In this paper, we address the problems of redundant noise and large $\ell_{2}$-norm through feature map selection-based dictionary learning. We have described how a sparse representation framework is tailored to generate sparse adversarial images. The projection of p is given by $T(p) = D\alpha$; because this transformed sparse representation is added as noise to the original image, D is a dictionary of noises. With all the required pieces in place, we finally generate adversarial images by adding the desired perturbation vector to the legitimate image, controlled by $\epsilon$ as given in (6). $\epsilon$ determines the magnitude of noise added to the legitimate source image, in order to maintain imperceptibility and limit the $\ell_{2}$-norm of the adversarial image. The final noise is not a combination of different noises. The $\ell_{2}$-norm is calculated on the difference between the legitimate and the adversarial image.
$$x_a = x_l + \epsilon\, p \qquad (6)$$
where p is the sparse perturbation vector. The final adversarial image is then fed to the classifier.
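The one-shot generation step amounts to a single scaled addition. The sketch below assumes that the perturbation scale multiplies the sparse perturbation, as the surrounding text suggests, and additionally clips the result to the valid pixel range. The clipping and the function names are assumptions made for illustration and are not stated in the text.

```python
import numpy as np

def one_shot_adversarial(x_l, p_sparse, eps, clip_range=(0.0, 1.0)):
    """x_a = x_l + eps * p_sparse, clipped to the valid pixel range (assumed)."""
    x_a = x_l + eps * p_sparse
    return np.clip(x_a, *clip_range)

def l2_distance(x_a, x_l):
    """l2-norm of the difference between adversarial and legitimate image."""
    return float(np.linalg.norm((x_a - x_l).ravel()))

x_l = np.full((28, 28), 0.5)        # toy legitimate image
p_sparse = np.zeros((28, 28))       # sparse perturbation with two active pixels
p_sparse[10, 10] = 3.0
p_sparse[5, 20] = -4.0

x_a = one_shot_adversarial(x_l, p_sparse, eps=0.01)
dist = l2_distance(x_a, x_l)        # 0.01 * sqrt(3^2 + 4^2) = 0.05
```

Only two pixels change in this example, and the resulting $\ell_{2}$ distance scales linearly with the perturbation scale, which is why a small scale keeps the adversarial image close to the original.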

IV. EXPERIMENTAL SETTINGS AND RESULTS
We evaluated the proposed attack methodology for both targeted and un-targeted scenarios on the MNIST (black and white handwritten digits) and Imagenet (colored images) datasets.
In this section, we describe the metrics used to measure the performance of our algorithm. We report the mean and median $\ell_{2}$-norm of the difference between the legitimate and the adversarial images. A smaller $\ell_{2}$-norm distance indicates a stronger attack effect and higher transferability [9]. In the ablation studies section, the targeted success rate (TSR) is calculated. The TSR is the rate at which the generated sparse adversarial images are classified as the target label; the larger the TSR, the more effective the targeted attack. Another metric used in the ablation studies section is the squared error, i.e. the distance between the transformed image and the image associated with the target label, given as:
$$\|T(p) - x_t\|^{2}$$
In the defense evaluation section, the fooling ratio is recorded for all defense strategies. It is the percentage of images on which the classifier changes its predicted label after they are perturbed. A high fooling ratio means a stronger attack. In this paper, it is shown that even after applying various defense strategies the fooling ratio of our proposed attack remains high.
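The metrics described above can be computed as follows. This is a minimal sketch on toy values; the function names are hypothetical, and the $\ell_{2}$-norm is assumed to be taken per image over the unnormalised pixel difference.

```python
import numpy as np

def l2_norms(x_adv, x_clean):
    """Per-image l2-norm of the difference between adversarial and clean images."""
    x_adv = np.asarray(x_adv, dtype=float)
    diff = (x_adv - np.asarray(x_clean, dtype=float)).reshape(len(x_adv), -1)
    return np.linalg.norm(diff, axis=1)

def targeted_success_rate(pred_adv, targets):
    """Fraction of adversarial images classified as their target label."""
    return float(np.mean(np.asarray(pred_adv) == np.asarray(targets)))

def fooling_ratio(pred_clean, pred_adv):
    """Percentage of images whose predicted label changes after perturbation."""
    return 100.0 * float(np.mean(np.asarray(pred_clean) != np.asarray(pred_adv)))

# Toy batch of four 2x2 images with one perturbed pixel each.
x_clean = np.zeros((4, 2, 2))
x_adv = x_clean.copy()
for i, delta in enumerate([0.1, 0.3, 0.2, 0.6]):
    x_adv[i, 0, 0] = delta

norms = l2_norms(x_adv, x_clean)
mean_l2 = float(norms.mean())          # 0.3
median_l2 = float(np.median(norms))    # 0.25
tsr = targeted_success_rate([7, 7, 3, 7], [7, 7, 7, 7])   # 0.75
fr = fooling_ratio([1, 2, 3, 4], [7, 7, 3, 7])            # 75.0
```

Reporting both mean and median is useful because a few heavily perturbed images can inflate the mean while leaving the median almost unchanged.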

B. MNIST
The training set consists of 50,000 images, whereas the test set consists of 10,000 images with a resolution of 28 × 28. The proposed attacks are evaluated on MNIST using a model with 99.25% Top-1 accuracy and an error of 0.04. We trained the model for 50 epochs with a learning rate of 0.01 using the Adam optimizer. The model is a simple CNN architecture consisting of 6 layers: 2 convolutional layers with 3 × 3 kernels, then 2D max pooling (2 × 2), followed by dropout (0.25), and finally flatten and dense layers. The total number of trainable parameters was 55,658.
We generated adversarial images using the proposed strategy with $\epsilon$ = 0.01 for un-targeted and targeted attacks. For un-targeted attacks, images are perturbed with an arbitrary perturbation vector. All 10,000 images from the MNIST test data are used for evaluation. The experiments are conducted for the proposed approach as well as for the state-of-the-art attacks FGSM, Corner Search, and C&W. The Adversarial Robustness Toolbox [36] is used to conduct the experiments for FGSM and C&W, and the publicly available original code of Corner Search is used for that attack. $\epsilon$ = 1 is used for FGSM for both un-targeted and targeted attacks. We did not use the same value of $\epsilon$ for FGSM as for our method because, for values as small as ours, FGSM cannot attack the network at all. Results are reported for both targeted and un-targeted scenarios. For targeted attacks, following the methodology of [7], we generate adversarial images for all classes of MNIST, which yields 9 attacks per image. The results are reported by averaging over all attacks. The error, attack success rate, mean $\ell_{2}$-norm, and median $\ell_{2}$-norm are reported in every case.
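The reported parameter count can be cross-checked with simple arithmetic. The text does not state the number of filters or the padding, so the sketch below assumes 32 filters in each convolutional layer, no padding, stride 1, and a single dense layer with 10 outputs. Under these assumptions the total matches the reported 55,658, although other configurations are not ruled out.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2-D convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

# Assumed configuration (not stated in the text): 32 filters per conv layer,
# 'valid' padding, stride 1, one dense layer with 10 outputs.
filters_1 = filters_2 = 32
side = 28                 # MNIST input is 28 x 28 x 1
side -= 2                 # after the first 3x3 convolution: 26
side -= 2                 # after the second 3x3 convolution: 24
side //= 2                # after 2x2 max pooling: 12
flattened = side * side * filters_2       # 12 * 12 * 32 = 4608

conv1 = conv2d_params(1, filters_1, 3)            # 320
conv2 = conv2d_params(filters_1, filters_2, 3)    # 9248
dense = dense_params(flattened, 10)               # 46090
total = conv1 + conv2 + dense                     # 55658
```

Max pooling, dropout, and flatten layers contribute no trainable parameters, so only the two convolutions and the dense layer enter the sum.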

C. ImageNet
Imagenet consists of 224 × 224 images from 1000 categories. The proposed attacks are evaluated on Imagenet using a pre-trained VGG-19 model with 70.2% Top-1 accuracy and an error of 1.20. The adversarial images for targeted attacks are created with $\epsilon$ = 0.0001. The un-targeted adversarial images are generated with $\epsilon$ = 0.0001 using an arbitrary sparse perturbation. We chose 1000 images from the validation set, one representing each class, for evaluation. The experiments are conducted for the proposed approach as well as for the state-of-the-art attacks FGSM, C&W, SparseFool, and GreedyFool. The experiments for FGSM and C&W are conducted using the library [36], and the publicly available original implementations of SparseFool and GreedyFool are used for those attacks. Results are reported for both targeted and un-targeted scenarios. For targeted attacks, following the methodology of [7], we generate adversarial images for 10 randomly chosen classes. The results are reported by averaging over all attacks. For FGSM, $\epsilon$ = 0.01 is used for un-targeted and $\epsilon$ = 0.9 for targeted attacks. The error, attack success rate, mean $\ell_{2}$-norm, and median $\ell_{2}$-norm are reported in every case. We could not conduct Corner Search experiments on Imagenet due to a lack of memory resources: it required 113 GiB for an array with shape (100352, 224, 224, 3). SparseFool cannot be extended to targeted attacks; therefore, targeted results are not applicable in that case.
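The memory figure quoted for Corner Search can be reproduced with a short calculation. The element type is not stated in the text, so the sketch assumes 8-byte (float64) elements. Under that assumption the array needs about 112.5 GiB, which agrees with the reported 113 GiB after rounding.

```python
shape = (100352, 224, 224, 3)
bytes_per_element = 8            # assumption: float64 elements

n_elements = 1
for dim in shape:
    n_elements *= dim            # total number of array entries

n_bytes = n_elements * bytes_per_element
gib = n_bytes / 2**30            # 1 GiB = 2^30 bytes
```

The leading dimension equals 2 × 224 × 224, i.e. two candidate images per pixel position of a 224 × 224 input, which suggests why the requirement grows so quickly with image resolution.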

D. EXPERIMENTAL RESULTS
We report the mean and median $\ell_{2}$-norm as described above. A smaller $\ell_{2}$-norm distance indicates a stronger attack effect.

1) UN-TARGETED ATTACK
The results indicate that the proposed attack is effective in terms of $\ell_{2}$-norm when compared to the others. Table 1 shows the performance of the proposed un-targeted attack on MNIST and Imagenet. The mean and median values are 0.1 for MNIST and 0.02 for Imagenet, which is comparable to the state-of-the-art un-targeted attacks. The second column shows the error of the classifier; a greater error indicates a stronger attack. The loss of the classifier is reported in the first column of Table 1. The third column shows the attack success rate. Our proposed approach has the highest success rate apart from Corner Search, whose $\ell_{2}$-norm is much higher than that of all other approaches. In the case of Imagenet, FGSM has a higher success rate, but at the cost of a higher $\ell_{2}$-norm than all other approaches.

2) TARGETED ATTACK
The results for the proposed targeted attacks are reported in Table 2. The mean and median are both 0.1 for MNIST, and 0.17 and 0.002, respectively, for Imagenet. These are the results of the average case, in which each image is attacked using images of different classes and the average over all attacks is reported. The second column shows the loss; the lower the loss, the stronger the attack. Our approach has the lowest loss among the compared state-of-the-art methods. The reported values suggest that our proposed attack performs better than FGSM and is as good as C&W. Although C&W is a very strong attack, it is computationally very expensive. We measured the run-time of an un-targeted C&W attack to be 4,303 seconds for MNIST on a machine with an Intel(R) Core(TM) i7 7th-generation CPU and 8 GB of RAM. In contrast, the run-time of our proposed un-targeted attack is 245 seconds on the same machine. Hence, the C&W attack is about 17.6 times slower than the presented method, i.e. more than an order of magnitude.
The third column in Table 2 records the attack success rate. Here, an error with a low value implies a stronger targeted attack. In the case of MNIST, the attack success values are quite promising for FGSM and C&W. This is because the epsilon value is kept very low for FGSM. C&W is the most effective targeted attack and reports the lowest (best) values for mean and median $\ell_{2}$-norm, but it needs many iterations, which makes it infeasible [7].
In a nutshell, the proposed method is competitive with the state of the art in terms of performance. C&W performs best in terms of $\ell_{2}$-norm but requires many iterations. Moreover, it falls short in terms of loss and attack success. A detailed illustration is provided in Fig. 3: the feature map is used to create a perturbation vector, which is then transformed into a sparse representation, for both MNIST and Imagenet examples.

V. ABLATION STUDIES & ANALYSIS OF RESULTS
This section provides a critical analysis of the results reported in the previous section with the help of ablation studies. Dictionary learning is the key to the promising results reported there. C&W provides a very strong state-of-the-art targeted attack with minimum $\ell_{2}$ distance but requires thousands of iterations, making it infeasible. In contrast, we achieved promising results by improving the efficiency of dictionary learning through training on diverse, sampled feature maps. The results can be better explained by studying the effect of the dictionary's hyper-parameters, which can be used to optimize its performance. The proposed tuned dictionary learning algorithm has two hyper-parameters: the sparsity k and the dictionary size n, i.e. the number of components. They affect the performance in different ways. We compute the following proximity metric to measure the performance of the dictionary learning algorithm for different hyper-parameters.
A. SQUARED ERROR
The squared error is the Euclidean distance between the transformed image and the image associated with the target label. A lower squared error means less difference between the transformed and original image, and it helps us achieve adversarial images with a much smaller $\ell_{2}$-norm. The experiments are conducted on all the test images of the MNIST dataset. The authors in [33] showed that increasing the sparsity helps preserve more details but makes classifiers less robust against attacks, and that increasing the number of components also decreases the robustness of classifiers. We analyze the effect of sparsity k on the squared error in Fig. 4a. The squared error, i.e. the difference between the transformed and original image, increases as the sparsity increases. Sparsity can therefore be used as a trade-off parameter for targeted attacks.
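To make the roles of the two hyper-parameters concrete, the following is a minimal sketch of sparse coding with at most k non-zero coefficients over a dictionary with n components, followed by the squared-error computation. It rests on several assumptions that are not taken from the paper: a random unit-norm dictionary stands in for the learned one, orthogonal matching pursuit is used as the sparse coder, the squared error is taken as the sum of squared pixel differences, and the names `omp` and `squared_error` are hypothetical. It illustrates how the metric can be computed and is not expected to reproduce the trends in Fig. 4.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit (assumes k >= 1).

    Approximates x using at most k columns (atoms) of D, where D has
    shape (n_features, n_components) and unit-norm columns.
    """
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = -np.inf  # never re-select an atom
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms by least squares and update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def squared_error(a, b):
    """Sum of squared differences between two flattened images."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sum(diff ** 2))

# Illustrative random data, not the MNIST or Imagenet feature maps.
rng = np.random.default_rng(0)
n_features, n_components = 64, 81
D = rng.standard_normal((n_features, n_components))
D /= np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
x = rng.standard_normal(n_features)            # stand-in input vector
target = rng.standard_normal(n_features)       # stand-in target-class image

for k in (1, 3):
    transformed = D @ omp(D, x, k)
    print(f"k={k}: squared error to target = {squared_error(transformed, target):.3f}")
```

Sweeping `n_components` in the same loop gives the corresponding curve for the dictionary size.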
We also studied the effect of the dictionary size, i.e. the number of components, as illustrated in Fig. 4b. We computed the squared error for k = 1 and k = 3 as the number of dictionary components is increased. For k = 1, the error first decreases as the number of components increases but then starts increasing again, and it shows almost no further change once the number of components reaches 255. For k = 3, the error stabilizes earlier, at 144 components, and starts increasing after n = 361. In general, increasing the number of components improves the reconstruction and hence the accuracy on clean images. The plots, however, indicate that we obtain the desired result with a smaller dictionary. This contrasts with the usual trend in the literature because our work is not a reconstruction task, where a larger dictionary increases the accuracy on clean images. We therefore save computation cost, which in regular tasks grows with the dictionary size.
Next, we study the effect of these parameters on our attack strategy. The ablation experiments show that the squared error is also highly correlated with the targeted success rate (TSR) explained earlier in the metrics section. The effect of sparsity on TSR is illustrated in Fig. 5a. The TSR oscillates at first and stabilizes after k = 4, as shown in the graph. It attains its highest value at k = 3 and decreases afterward. At k = 3, more information is retained than at k = 1. More noise is reconstructed as we increase the sparsity, but increasing the sparsity further also increases the squared error, i.e. the distance between the original and transformed images, which negatively affects the targeted attack. The peak at k = 3 in Fig. 5a shows where the targeted attack reaches its maximum. After k = 3, noise starts being reconstructed because we are learning a dictionary of feature maps. The reconstructed noise does yield a stronger attack on the classifier, but the targeted attack suffers as more and more noise is reconstructed. That is why the squared error also increases, showing that less important information is preserved, and the $\ell_{2}$-norm increases as well. Sparsity must therefore be traded off to make the attack effective in terms of both TSR and squared error, and the experiments show that k = 3 is the sweet spot in this case. We therefore used both k = 1 and k = 3 to study the effect of the number of components on TSR, as shown in Fig. 5b and Fig. 6.
The same behavior can be seen for the number of components. TSR first increases with the dictionary size but subsequently decreases. For k = 3, the sparsity level is already high, so TSR is at its highest even for n = 81. For better understanding, we have again plotted squared error with TSR in Figures 6 and 7. It can be seen that the highest TSR is reported for the lowest squared error, which is why we achieve the targeted and un-targeted misclassifications with a very low $\ell_{2}$-norm, as reported earlier in the results section. In conclusion, we need a small squared error, but a very small value will not retain enough information for the targeted attack, while a very high value will again result in a higher $\ell_{2}$-norm and a low TSR. This behavior of reconstructing noise as we increase the sparsity and the number of components is attributed to the fact that we are, in fact, learning a dictionary of perturbation vectors. We also conducted experiments to check the effect of choosing feature maps from other layers of the CNN. The results are illustrated in Table 3.
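For readers who want to reproduce curves of this kind, the TSR can be computed as sketched below. The metrics section is not part of this excerpt, so the definition used here (the fraction of adversarial images that the classifier assigns to the intended target class) is an assumption, and the function name and labels are made up for illustration.

```python
import numpy as np

def targeted_success_rate(predicted_labels, target_labels):
    """Fraction of adversarial images classified as their target class."""
    predicted = np.asarray(predicted_labels)
    target = np.asarray(target_labels)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target labels must have the same shape")
    return float(np.mean(predicted == target))

# Made-up labels: three of the five attacks reach their target class.
predicted = [3, 7, 7, 1, 0]
targets = [3, 7, 2, 1, 4]
tsr = targeted_success_rate(predicted, targets)
print(f"TSR = {tsr:.2f}")
```

Evaluating this function once per setting of k or n, alongside the squared error, yields the pairs of values that plots such as Figures 6 and 7 compare.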

VI. DEFENSE EVALUATION AGAINST THE PROPOSED ATTACK
Devising defense strategies against adversarial attacks is as active an area of research as the attacks themselves. Goodfellow et al. [5] proposed adversarial training, in which the model is trained using adversarial images. To evaluate the strength of our proposed attack, we tested it against three different defense strategies. Spatial Smoothing [37] is a technique used in image processing to reduce noise in the data. The authors in [37] applied the local smoothing method as a defense against attacks. Local smoothing smooths each pixel by using its neighboring pixels. Feature Squeezing [37] reduces the bit depth of images. Images are normally represented using color bit depths that introduce irrelevant features. The authors of [37] tested the hypothesis that reducing the bit depth can reduce the effect of adversarial attacks without affecting the classifier's accuracy. The method is applied to each pixel. JPEG Compression [38] is also used as an effective defense technique. Its strength lies in its ability to eliminate high-frequency signal components, which are removed inside the square blocks of an image.
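Two of these input transformations are simple enough to sketch directly. The code below is a minimal illustration, assuming grayscale images with pixel values in [0, 1] and using a median filter as one common form of local smoothing; the function names, window size and bit depth are our own choices and are not the settings used in the experiments. JPEG compression is omitted because it requires an image codec.

```python
import numpy as np

def reduce_bit_depth(image, bits):
    """Quantise pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(image, dtype=float) * levels) / levels

def median_smooth(image, window=3):
    """Replace each pixel of a 2-D image by the median of its neighbourhood.

    The window size must be odd; borders are handled by reflection.
    """
    image = np.asarray(image, dtype=float)
    pad = window // 2
    padded = np.pad(image, pad, mode="reflect")
    # All window x window patches, shape (H, W, window, window).
    patches = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    return np.median(patches, axis=(-2, -1))

# A single bright pixel on a dark background is removed by median smoothing.
spike = np.zeros((5, 5))
spike[2, 2] = 1.0
print(median_smooth(spike).max())

# With a bit depth of 1, every pixel is mapped to either 0 or 1.
print(reduce_bit_depth([0.0, 0.4, 0.6, 1.0], bits=1))
```

Both functions act on each pixel (or its local neighbourhood) only, which is why such defenses are cheap to apply before classification.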
The training data as well as the adversarial data are transformed using the defense methods and are then evaluated on the same model architecture as employed in Section IV. For MNIST, the model is trained on the transformed training data for 30 epochs. In the case of Imagenet, only the adversarial data is transformed, due to the computational complexity and the use of a pre-trained model. The results show that none of these defense methods defends against our attack. The fooling ratio of the classifier, when fed with sparse adversarial perturbations, is recorded in the second column of Table 4, which shows that our proposed attack has a success rate of 86%. The subsequent columns show the fooling ratio after applying each defense strategy.
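The evaluation described above can be summarized as a small harness that measures the fooling ratio with and without a defense. This is a sketch under stated assumptions: the fooling ratio is taken to be the fraction of adversarial inputs whose predicted label differs from the true label, and `toy_predict`, `one_bit_squeeze` and the tiny image batch are hypothetical stand-ins for the trained model, a defense and the adversarial data.

```python
import numpy as np

def fooling_ratio(predict, adversarial_inputs, true_labels, defense=None):
    """Fraction of adversarial inputs that are misclassified.

    `predict` maps a batch of images to labels; `defense`, if given,
    is applied to every image before classification.
    """
    inputs = np.asarray(adversarial_inputs, dtype=float)
    if defense is not None:
        inputs = np.stack([defense(image) for image in inputs])
    predictions = np.asarray(predict(inputs))
    return float(np.mean(predictions != np.asarray(true_labels)))

def toy_predict(batch):
    """Toy classifier: label 1 if the mean intensity exceeds 0.5, else 0."""
    return (batch.reshape(len(batch), -1).mean(axis=1) > 0.5).astype(int)

def one_bit_squeeze(image):
    """Toy defense: round every pixel to 0 or 1."""
    return np.round(image)

# Four constant 2x2 "adversarial" images and their made-up true labels.
images = np.stack([np.full((2, 2), v) for v in (0.9, 0.1, 0.7, 0.2)])
labels = [0, 0, 0, 1]

ratio_plain = fooling_ratio(toy_predict, images, labels)
ratio_defended = fooling_ratio(toy_predict, images, labels, defense=one_bit_squeeze)
print(ratio_plain, ratio_defended)
```

A defense is effective only if the second ratio is clearly lower than the first; in this toy setting the two are equal.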
For MNIST, the fooling ratio remained the same for spatial smoothing and JPEG compression, whereas it increased in the case of feature squeezing. This is because the basic idea behind defense methods in general, and feature squeezing in particular, is to compare the model's prediction on the original sample with the same model's prediction on the sample after squeezing [37]. Since our proposed attack already uses a smaller subspace and is minimal in terms of $\ell_{2}$-norm, feature squeezing did not help. Moreover, the analysis in [37] shows that feature squeezing is not immune to adversarial adaptation and hurts the accuracy on legitimate images as well.
In the case of Imagenet, the defense methods again failed to counter the effect of our proposed attack. The fooling ratio is reduced to 52.51% and 52.23% in the case of spatial smoothing and feature squeezing, respectively, but these methods still do not provide an effective defense. The emphasis of our approach has been on squeezing the noise magnitude. The important pixels are still perturbed, but the sparse transformation of the noise and the lower $\ell_{2}$-norm make the attack difficult to detect.

VII. CONCLUSION
We propose a sparse adversarial image generation method that obtains comparable results in terms of $\ell_{2}$-norm. The feature map makes it possible to highlight the important pixels of the image to attack. We used a feature map to create our perturbation vector, which is then optimized using dictionary learning. The sparse adversarial noise is then added to the image in one shot. The proposed attack is fast and minimal in terms of the $\ell_{2}$ distance between the input image and the adversarial image. The tables show that our results are comparable with state-of-the-art methods, although there is room for improvement in terms of the fooling ratio of the attack. The comparable results are achieved using a small dictionary, thus saving computation cost. We motivate a new, previously unexplored area for designing adversarial attacks, which researchers can explore to create more robust classifiers.
Further, we tested the strength of our proposed attack against different defense strategies. The results show that these defense methods are not able to defend neural networks from our proposed attack. Since this is a new direction for creating adversarial examples, the proposed attack can in the future be combined with existing gradient-based attacks and used in adversarial training to create more robust classifiers.
We have also presented novel algorithms to learn a tuned dictionary based on feature maps. These ideas for tuning dictionaries can be extended to other machine learning problems solved by dictionary learning. This research still needs improvement in terms of attack success rate, especially in the case of targeted attacks. Since this area has not been explored before, there are promising avenues for future work. One stream of work can aim to improve the attack success rate. Another is to test the transferability of these attacks to other models as well as to other machine learning problems.