Probabilistic Analysis of Targeted Attacks Using Transform-Domain Adversarial Examples

In the past decade, Deep Neural Networks (DNNs) have achieved breakthrough results in building smart intelligent systems for computer vision, natural language processing, autonomous systems, and related fields. Recent research has revealed that the stability of such systems is at risk when they encounter adversarial perturbations. Although these perturbations may be imperceptible to the naked eye, they are capable of fooling state-of-the-art DNN classifiers. To date, most prior work on fooling such classifiers focuses on generating adversaries that directly change pixel values of an image in the spatial domain. In this paper, we propose a novel transform-domain imperceptible attack methodology, “TDIAM”, which generates adversaries through an image-steganography approach using a “single carefully selected targeted watermark”. We use three frequency-domain transforms, i.e., the Discrete Wavelet Transform (DWT), the Discrete Cosine Transform (DCT), and the Fast Fourier Transform (FFT), to craft perturbations in selective frequency components, which makes the attack robust and computationally inexpensive, since it is gradient-free. We present a case study on the MNIST handwritten digits dataset. Our results demonstrate that the generated perturbation vector successfully fools a simple Convolutional Neural Network (CNN), LeNet-5, and AlexNet by increasing the probability of adversarial examples for the targeted class (to which the targeted watermark belongs) in both “black-box” and “white-box” adversarial attacks. Among the three perturbation approaches, DWT-based perturbation shows the most promising results, effectively fooling DNNs while maintaining high imperceptibility.


I. INTRODUCTION
The ability to discern visual imagery and understand real-world data is critical and arguably the most complicated cognitive capability. Humans solve such tasks through their receptive and productive skills and can easily exploit the available contextual information using their prior knowledge. Deep Neural Networks (DNNs) build on a similar notion and have readily been applied to domains including, but not limited to, computer vision [1] (such as optical character recognition [2] and template matching [3]), natural language processing [4], speech processing [5], and reinforcement learning [6]. Today, smart artificial-intelligence-based systems incorporating state-of-the-art deep learning techniques have influenced the scientific community to discover and formulate solutions to more complex problems. These techniques, by learning important sub-spaces within the data, have contributed to the development of physical systems such as autonomous vehicles [7], UAVs [8], robots [9], security and surveillance systems [10], medical sciences [11], and many others.
Despite rapid progress in the field of computational intelligence, the vulnerability of such smart systems is a major area of concern within the scientific community [12]-[14]. For instance, a carefully chosen small perturbation implanted in a system's input can cause the opposite behavior at the output or may impede its functionality [15]. A similar disruption can happen in DNNs (victim models): a small, carefully crafted perturbation embedded in the host image can change the network's output label and cause misclassification, even though, prior to perturbation, the classifier predicted that image correctly with high confidence. Although this small added perturbation vector can go unnoticed by the human visual system, it has the capability to fool any network. For example, an attacker can train a classifier and use it to generate an adversarial version of an image that fools another model. Researchers have meanwhile made great progress in understanding the space of adversarial examples; Su et al. [16] were the first to show that DNNs can even be fooled by carefully changing a single pixel of an image. Since then, adversarial attacks have become a serious concern in the digital domain [13], [15], [17] (adversarial perturbations are applied directly to digital images, e.g., by modifying the pixels corresponding to a scene) and, more recently, in the physical domain [18]-[20] (objects of interest are modified, e.g., by putting stickers on a stop sign). Along similar lines, our work contributes targeted attacks on a victim model, where a small perturbation vector [17] forces the network to output a specific class.
In this paper, a novel approach to generate the desired adversarial examples is presented. Unlike previous methods that craft adversaries directly in the spatial (pixel) domain, our proposed approach embeds a secret image inside the host image in the transform domain. The computation of the perturbation vector and its embedding in the host image is done using frequency-domain methods, namely the Discrete Cosine Transform (DCT) [21], variants of the Discrete Wavelet Transform (DWT) [22], and the Fast Fourier Transform (FFT) [23]. Careful selection of the targeted watermark enables our approach to keep the perturbation imperceptible while requiring little computational power. Our main motivation behind this work is an ''image steganography approach'' for generating adversarial attacks in a targeted manner. Unlike previous approaches such as FGSM, which compute gradients to craft image-specific perturbations, the proposed work reduces computational complexity and time by crafting adversaries in the transform domain without any gradient computation. The proposed ''TDIAM'' approach therefore requires no gradient computation and is imperceptible as well as capable of fooling DNNs. A further contribution is the selection of the watermark image, which in our work is chosen on the basis of the highest individual class-probability score instead of being selected randomly. The major contributions of this paper are as follows:
• We propose an algorithm for the careful selection of the secret image (we call it the targeted watermark) on the basis of its probability score, so that it generates strong perturbations, instead of selecting a random watermark.
• We use transform-domain methods (DWT, DCT, and FFT) to craft the perturbation in the frequency domain, unlike other methods, such as FGSM, that manipulate pixel values directly in the spatial domain. A comparison of our transform-domain perturbation methods with FGSM is tabulated in Table 1.
• Unlike other methods that craft perturbations at training time [24] and generate adversaries using a small amount of training data, the proposed technique does not involve any training process or gradient estimation; hence, it requires less time and computation than gradient-based perturbation methods.
• Our method ''TDIAM'' uses only one carefully selected targeted watermark to craft the perturbation in a selective frequency component of the host image, instead of adding perturbation to all pixels of the image, which makes it more robust and efficient.
• We empirically demonstrate the effectiveness of our proposed perturbation for both black-box and white-box attacks on an employed CNN architecture (shown in Fig. 2) and compare the results with the state-of-the-art AlexNet [1] and LeNet-5 [25] architectures.
The rest of the paper is organized as follows. Section II outlines the related work in the context of digital and physical adversarial examples. Section III explains our methodology for (1) selection of a targeted watermark, (2) generation of adversarial examples, and (3) analysis of the generated adversarial examples against deep networks. Section IV reports the experimental results, and Section V concludes the paper with a discussion.

II. RELATED WORK
In the past, different methods have been proposed under the adversarial knowledge of the white-box setting, where the threat model knows everything about the victim model, including the network architecture and the training dataset [12], [13], [17], [26]. Contrary to the white-box setting, some methods can be deployed directly in the black-box setting, where the threat model does not have access to the victim model; rather, it only has access to the input labels and the corresponding scores of the victim model [26]. These methods do not require gradient information, as discussed in [16], [27], [28].
Given that DNNs are vulnerable to adversarial attacks, the difficulty of an attack still varies according to the adversarial goal, defined as either a targeted attack or an untargeted attack. The goal of a targeted attack is to force the victim model to misclassify all inputs as a specific targeted class, whereas the goal of an untargeted attack is to force the victim model to misclassify the input as any arbitrary (incorrect) class [26]. Despite the plethora of published work on black-box and white-box attacks, our work analyzes the effect of embedding a targeted watermark into host images, i.e., how the perturbed images (adversarial examples) differ from the original images in the transform-domain scenario. Popular approaches to craft such perturbations include iterative methods, gradient-based approaches, and optimization-based approaches, yet they are all restricted to digital- or physical-domain applications. Hence, we survey the related work in both the digital and physical domains.

A. DIGITAL ADVERSARIAL EXAMPLES
In the digital scenario, Szegedy et al. [29] reported a major breakthrough in adversarial attacks, showing that DNNs are susceptible to small perturbations. When added to the input of state-of-the-art DNNs, these perturbations result in the misclassification of previously correctly classified images. Similar to [17], Moosavi et al. [30] computed adversarial perturbations using iterative linearization of the classifier that can fool state-of-the-art DNNs. In the first iteration, a minimal perturbation embedded in the input image exploits the approximately linear behavior of the network near the decision boundary. Adding these small perturbations in successive iterations keeps dragging the output class label toward the decision boundary until the goal of misclassification is achieved.
The approach in [30] generates a perturbation for a specific image and hence cannot be treated as a generalized perturbation-crafting model that fools DNNs on multiple images. Moosavi et al. [31] accomplished the task of generating a universal adversarial perturbation, such that state-of-the-art DNNs become highly vulnerable and misclassify natural images with high probability, making the perturbation doubly universal (image-agnostic and network-agnostic). The authors used an optimization-based approach to generate perturbations by restricting the l2-norm and l∞-norm, which yields good transferability for fooling multiple networks.
Junde Wu et al. [26] generated adversarial examples that are universal, transferable, and able to target different networks. Their attack learns a universal mapping between inputs and adversarial examples without solving an optimization problem for each input. Dong et al. [32] use a momentum method to craft perturbations: while computing gradients, they accumulate a velocity vector across iterations so that the update direction is stabilized and poor local maxima are avoided.

B. PHYSICAL ADVERSARIAL EXAMPLES
While recent work has examined adversarial examples in the digital domain, physical perturbations can also exploit the vulnerability of DNNs. For example, Goodfellow et al. [13] use the Fast Gradient Sign Method (FGSM), which computes the perturbation by exploiting the linear behavior of DNN models in a single large gradient step. Kurakin et al. [18] showed that printed adversarial examples can be misclassified when viewed through a smartphone camera; in their method, the gradients are applied in multiple small steps, iterating until the fooling rate is maximized.
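For reference, the following is a minimal single-step FGSM sketch (a related-work baseline, not part of TDIAM); the function name, loss choice, and epsilon value are illustrative assumptions rather than details taken from [13] or [18].

```python
import tensorflow as tf

def fgsm(model, x, y, eps=0.1):
    """Single-step FGSM sketch: x_adv = x + eps * sign(grad_x loss)."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        # Per-sample cross-entropy loss of the victim model on the clean inputs.
        loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
    grad = tape.gradient(loss, x)
    # One large step in the sign of the gradient, clipped back to valid pixel range.
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)
```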
Sharif et al. [19] attacked a facial authentication system with a physical perturbation in the form of eyeglass frames that, when printed and worn, fooled state-of-the-art face recognition systems. Their work demonstrated successful physical attacks under relatively stable physical conditions with slight variations in pose, distance/angle from the camera, and lighting. This constitutes a realistic and practical threat to physical systems that are already deployed in stable environments. However, environmental conditions can vary widely in general and can reduce the effectiveness of such perturbations.

III. METHODOLOGY
The detailed workflow of our proposed methodology TDIAM, comprising three main steps, is illustrated in Fig. 1. We describe each individual step in detail below.

A. HOW TO SELECT THE TARGETED WATERMARK?
Before proceeding to generate adversarial examples for targeted attacks using the steganography approach, we first need a watermark image. We could select a random watermark image to perform steganography, but in this paper we discuss adversarial attacks in a targeted context. The aim of targeted attacks is to change the class probabilities of the original images in such a way that the probability increases for the class to which the watermark image belongs. Therefore, we need an appropriate watermark image that effectively targets each host image and causes the network to output that particular class. For this purpose, we train a simple CNN architecture and extract the predicted probabilities from the last Fully Connected (FC) layer of the network. The network architecture that we use for this purpose is shown in Fig. 2.
After obtaining the predicted probability scores (Pr) from the CNN classifier, we sum the probabilities of all images against each class (Sum{class_j}), where j is the class index (0-9). From the sum-of-all-class probabilities (All{class_j}), we select the top class, i.e., the class with the highest sum-of-probability score (max(All{class_j})). After that, we select the targeted watermark from the selected top class (Top_Class) on the basis of the highest individual probability score. The complete method for selecting the targeted watermark is described in Algorithm 1 and sketched in code below. With the targeted watermark in hand, we can generate adversarial examples using three different steganography-based approaches. In the next section, we describe in detail the methodology for embedding the selected watermark in the original images.
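As a concrete illustration of Algorithm 1, the following NumPy sketch sums the per-class softmax scores and then applies two arg-max steps; the array shapes and the helper name select_targeted_watermark are our own illustrative assumptions.

```python
import numpy as np

def select_targeted_watermark(images, probs):
    """Sketch of Algorithm 1: pick the targeted watermark (TW).

    images : array of shape (K, 28, 28)  -- candidate (test) images
    probs  : array of shape (K, 10)      -- softmax scores from the trained CNN
    """
    # Sum the predicted probabilities of all images for each class j (All{class_j}).
    class_sums = probs.sum(axis=0)
    # Top_class (C_T): the class with the highest sum-of-probability score.
    top_class = int(np.argmax(class_sums))
    # Among images assigned to Top_class (here: by predicted label, an assumption),
    # take the one with the highest individual probability score as the watermark.
    predicted = probs.argmax(axis=1)
    candidates = np.where(predicted == top_class)[0]
    best = candidates[np.argmax(probs[candidates, top_class])]
    return top_class, images[best]
```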

B. HOW TO GENERATE ADVERSARIAL EXAMPLES?
This section describes how we generate adversarial examples using the different methods. We use a steganography-based approach in which confidential data is embedded into some cover medium, with the intent that the difference between the original image and the image with the embedded data remains indistinguishable (imperceptible) to the human eye. The resultant image is called the stego-image (or adversarial example), while the data hidden in the original image is termed the adversary or perturbation vector. To generate stego-images, we use transform-domain methods, i.e., we manipulate the original image (known as the host image in steganography terms) in the frequency domain instead of the spatial domain. The reason for manipulating the data in the frequency domain is that changes applied to an image in the spatial domain act directly on pixel values, which is easy but yields low imperceptibility, whereas we want comparatively high imperceptibility.
For the transform-domain methods, we transform the spatial-domain image pixels into frequency-domain coefficients using three different transformations, i.e., DWT, DCT, and FFT. After transforming the image into the frequency domain using one of these methods, we embed the coefficients of the targeted watermark into the coefficients of the host image, followed by re-transformation to the spatial domain using the corresponding inverse transform. Although the above-mentioned transform-domain methods are slower than spatial-domain methods, they are more secure, efficient, and tolerant to noise [33].

1) DISCRETE COSINE TRANSFORM (DCT)
We use DCT to generate stego-images. The steps of the embedding procedure using the DCT approach are as follows: (i) Apply the 2-dimensional discrete cosine transform to the host image (I) and the targeted watermark (TW) separately. The two-dimensional DCT of an M-by-N image f(x, y) is defined as

F(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \cos\!\left[\frac{(2x+1)u\pi}{2M}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right],

where \alpha(u) = \sqrt{1/M} for u = 0 and \sqrt{2/M} otherwise, and \alpha(v) is defined analogously with N.
The values F(u, v) are called the DCT coefficients of the image pixels f(x, y), and the basis functions are the corresponding cosine products. (ii) Divide the transformed host image and the transformed targeted watermark into four equal blocks, (IB1, IB2, IB3, IB4) and (TWB1, TWB2, TWB3, TWB4), respectively. (iii) The purpose of dividing the transformed host image and targeted watermark into blocks is to embed the targeted watermark in a particular block of the host image, thus providing minimum perceptibility. Hence, we embed the bottom-right block of the secret image (TWB4) into the bottom-right block of the host image (IB4), while keeping the other blocks the same. In this way, the embedded targeted watermark is not perceivable in the host image after we apply the inverse discrete cosine transform, as explained in Section IV-B.2. We define the blocks of the resultant stego-image as

Stego_B1 = IB1, Stego_B2 = IB2, Stego_B3 = IB3,
Stego_B4 = (1 − factor) * IB4 + factor * TWB4,

where factor defines the ratio by which the components of the host image and the targeted watermark are fused together. A factor of '0' means no information of the targeted watermark is embedded into the host image, and a factor of '1' means that all information of the targeted watermark is embedded into the host image. Therefore, the higher the value of the factor, the lower the imperceptibility of the embedded information. (iv) The final stego-image is then produced by combining the resultant blocks (Stego_B1, Stego_B2, Stego_B3, Stego_B4) into a single matrix and applying the inverse DCT. Fig. 3 shows a detailed illustration of the DCT-based steganography approach.
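A minimal sketch of the DCT block-embedding step described above, assuming SciPy's dctn/idctn for the 2-D DCT and even image dimensions; the function name and default factor are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_embed(host, watermark, factor=0.3):
    """Embed the watermark's bottom-right DCT block into the host's (sketch).

    host, watermark : 2-D float arrays of the same size, e.g. 28x28.
    factor          : fusion ratio; 0 keeps the host, 1 keeps the watermark.
    """
    I = dctn(host, norm='ortho')           # DCT coefficients of the host image
    TW = dctn(watermark, norm='ortho')     # DCT coefficients of the targeted watermark
    h, w = I.shape
    stego = I.copy()
    # Fuse only the bottom-right block: Stego_B4 = (1-factor)*IB4 + factor*TWB4,
    # leaving the other three blocks of the host untouched.
    stego[h//2:, w//2:] = (1 - factor) * I[h//2:, w//2:] + factor * TW[h//2:, w//2:]
    return idctn(stego, norm='ortho')      # back to the spatial domain
```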

2) FAST FOURIER TRANSFORM (FFT)
The second transform-domain method that we use for the generation of stego-images (adversarial examples) is the FFT. This method has previously been used for steganography [34]-[36] and for watermarking images [37], [38]. In our work, we use it to create adversarial attacks on DNNs in order to estimate the strength of the network against the embedded perturbation. The steps of the embedding procedure using the FFT approach are as follows.
(i) Apply the 2-dimensional FFT to the host image (I) and the targeted watermark (TW) separately. The two-dimensional discrete Fourier transform of an M-by-N image f(x, y) is defined as

F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)}.

(ii) For image steganography, it is well known that the phase of the Fourier transform is more important and has more impact than its magnitude [36], [39]. Therefore, we embed the phase component of the targeted watermark (TW_Phase) into the phase component of the host image (I_Phase), while the magnitude of the host image (I_Magnitude) remains the same. The resultant phase and magnitude components of the stego-image become

Stego_Phase = (1 − factor) * I_Phase + factor * TW_Phase,

while

Stego_Magnitude = I_Magnitude.

The stego-image is then obtained by applying the inverse FFT to the spectrum reconstructed from Stego_Magnitude and Stego_Phase. A detailed illustration of the FFT-based steganography approach is shown in Fig. 4.
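The phase-embedding step can be sketched as follows with NumPy's FFT routines; the function name and the way the stego spectrum is recombined from magnitude and phase are our assumptions, consistent with the description above.

```python
import numpy as np

def fft_embed(host, watermark, factor=0.3):
    """Embed the watermark's phase into the host's phase (illustrative sketch)."""
    I = np.fft.fft2(host)
    TW = np.fft.fft2(watermark)
    i_mag, i_phase = np.abs(I), np.angle(I)       # host magnitude / phase
    tw_phase = np.angle(TW)                       # watermark phase only
    # Fuse the phases, keep the host magnitude unchanged.
    stego_phase = (1 - factor) * i_phase + factor * tw_phase
    stego_spectrum = i_mag * np.exp(1j * stego_phase)
    return np.real(np.fft.ifft2(stego_spectrum))  # spatial-domain stego-image
```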

3) DISCRETE WAVELET TRANSFORM (DWT)
The third method that we use to generate stego-images is the DWT. The 2-dimensional DWT decomposes an image into four frequency sub-bands, i.e., LL (low-low), LH (low-high), HL (high-low), and HH (high-high). In the literature, many authors have used the DWT for steganography. Chen et al. [40] used Haar wavelets and embedded the secret message in three sub-bands, i.e., LH (horizontal component), HL (vertical component), and HH (diagonal component). The authors of [41] used the DWT along with SVD (singular value decomposition) for robust watermarking; they embed the watermark image only in the LL sub-band after performing SVD on that particular sub-band. Sharma et al. [42] used a 3-level Haar-wavelet watermarking technique for copyright protection, in which the image is decomposed into 3 levels and an alpha-blending technique is applied to embed the watermark image into the LL sub-band for robustness.
In this work, we use different wavelet families (for the DWT) at different decomposition levels (1 and 3) for the extensive generation of stego-images. The wavelet families that we use are Haar and Daubechies. A detailed comparison of these two wavelet families is given in Table 2 [43], [44].
The steps of the embedding procedure using the DWT approach are as follows.
(i) We generate stego-images by sequentially selecting the wavelet family (Haar or Daubechies) and applying the selected 2-dimensional wavelet transform to the host image (I) and the targeted watermark (TW) separately. The two-dimensional DWT of an M-by-N image f(x, y) is defined as

W_\phi(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \phi_{j_0, m, n}(x, y),

W_\psi^{i}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi_{j, m, n}^{i}(x, y), \quad i \in \{H, V, D\},

where the W_\phi(j_0, m, n) coefficients give an approximation of f(x, y) at scale j_0, while the W_\psi^{i}(j, m, n) coefficients represent the horizontal, vertical, and diagonal details of f(x, y). (ii) Decompose the host image (I) and the targeted watermark (TW) with the chosen wavelet family at the chosen level of decomposition. (iii) After decomposing the images into the four sub-bands LL(n, k), LH(n, k), HL(n, k), and HH(n, k) for wavelet family 'k' at level 'n', we embed the targeted watermark into the host image using two different approaches. In the first approach, the HH(n, k) sub-band of the targeted watermark (TW_HH) is embedded into the HH(n, k) sub-band of the host image (I_HH). In doing so, we manipulate only the diagonal component of the host image and leave the other components unchanged. Hence, the resultant sub-bands (Stego_LL, Stego_LH, Stego_HL, Stego_HH) for the generation of the stego-image are

Stego_LL(n, k) = I_LL(n, k)      (15)
Stego_LH(n, k) = I_LH(n, k)      (16)
Stego_HL(n, k) = I_HL(n, k)      (17)
Stego_HH(n, k) = (1 − factor) * I_HH(n, k) + factor * TW_HH(n, k)      (18)

In the second approach, we embed the three detail sub-bands of the targeted watermark, i.e., the LH(n, k), HL(n, k), and HH(n, k) sub-bands, into the corresponding sub-bands of the host image. Hence, the resultant sub-bands for the generation of the stego-image become

Stego_LL(n, k) = I_LL(n, k)
Stego_LH(n, k) = (1 − factor) * I_LH(n, k) + factor * TW_LH(n, k)
Stego_HL(n, k) = (1 − factor) * I_HL(n, k) + factor * TW_HL(n, k)
Stego_HH(n, k) = (1 − factor) * I_HH(n, k) + factor * TW_HH(n, k)

A detailed illustration of the DWT-based steganography approach is shown in Fig. 5.
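A minimal level-1 sketch of the two DWT embedding variants, assuming PyWavelets (pywt); level-3 embedding would follow the same idea using pywt.wavedec2/waverec2, and the function name is illustrative.

```python
import pywt

def dwt_embed(host, watermark, wavelet='haar', bands='HH', factor=0.3):
    """Level-1 DWT embedding sketch for the HH-only and VHD variants."""
    LL_i, (LH_i, HL_i, HH_i) = pywt.dwt2(host, wavelet)
    LL_w, (LH_w, HL_w, HH_w) = pywt.dwt2(watermark, wavelet)
    fuse = lambda a, b: (1 - factor) * a + factor * b
    if bands == 'HH':      # first approach: fuse the diagonal detail only
        coeffs = (LL_i, (LH_i, HL_i, fuse(HH_i, HH_w)))
    else:                  # 'VHD': fuse horizontal, vertical and diagonal details
        coeffs = (LL_i, (fuse(LH_i, LH_w), fuse(HL_i, HL_w), fuse(HH_i, HH_w)))
    return pywt.idwt2(coeffs, wavelet)   # inverse DWT back to the spatial domain
```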

C. HOW TO EVALUATE THE IMPACT OF ADVERSARIAL EXAMPLES?
As mentioned earlier, the perturbations are computed with the transform-domain steganography approaches using a carefully selected watermark. These perturbations are then crafted into the host images, thereby producing a set of perturbed images (adversarial examples) with which we attempt to fool state-of-the-art DNNs. The purpose of producing adversarial examples is to check whether the perturbation caused in the host images by the carefully selected watermark is strong enough to increase the probabilities of the stego-images. The probabilities of the stego-images (perturbed images) for a particular class (the class to which the targeted watermark belongs) are compared to the probabilities of the host images. For this purpose, we compute the class probabilities of the perturbed images using the pre-trained DNN classifiers, i.e., the CNN (shown in Fig. 2), LeNet-5, and AlexNet. The class-probability scores of the perturbed images (SPr) are compared with the class-probability scores obtained for the host images (Pr). We then count the number of samples (stego_count) for which the probability of the targeted class increases after perturbation, i.e., SPr(k, C_T) >= Pr(k, C_T), where k indexes the perturbed images and C_T is the targeted class (Class-1). The detailed method for analyzing the impact of the perturbation in the adversarial examples is described in Algorithm 2 and sketched in code below.
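A compact sketch of the counting logic of Algorithm 2, assuming the original and stego class probabilities are available as (K, 10) arrays; variable names mirror the algorithm but are otherwise our own.

```python
import numpy as np

def evaluate_attack(p_orig, p_stego, target_class):
    """Sketch of Algorithm 2: count samples whose targeted-class probability rises.

    p_orig, p_stego : arrays of shape (K, 10) with class probabilities of the
                      host images (Pr) and the stego-images (SPr).
    """
    diff = p_stego[:, target_class] - p_orig[:, target_class]
    increased = diff >= 0
    stego_count = int(increased.sum())          # samples where the probability increased
    org_count = int((~increased).sum())         # samples where it decreased
    stego_sum = float(diff[increased].sum())    # total gain for the targeted class
    org_sum = float(-diff[~increased].sum())    # total loss for the targeted class
    return org_sum, org_count, stego_sum, stego_count
```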

IV. EXPERIMENTS AND RESULTS
To demonstrate the effectiveness of the proposed framework, we evaluate our methodology on the MNIST database of handwritten digits [45]. The dataset has 10 classes for the digits 0-9. Samples of each class are shown in Fig. 6.

A. SELECTION OF TARGETED WATERMARK (TW)
For the selection of the targeted watermark, we employ a CNN architecture with two convolutional layers (Conv2D) followed by max-pooling and two fully-connected (FC) layers. The last FC layer uses a Softmax activation for classification. It is important to mention that the number of training images per class (0-9) in the MNIST dataset is not equal. Therefore, to avoid a class-imbalance problem, the number of training samples per class is reduced to the number of samples in the smallest class; that is, we randomly select an equal number of training images from each class. With this setup, we train the CNN architecture for 12 epochs with a batch size of 128 and achieve a final accuracy of 99.24% on the MNIST test set. The model architecture is described in Fig. 2.
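An approximate Keras re-creation of this CNN is shown below; only the layer types, epoch count, and batch size are stated in the text, so the filter counts and dense-layer width are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Approximate re-creation of the CNN in Fig. 2; exact filter sizes and counts
# are assumptions, since only the layer types are stated in the text.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),   # last FC layer with Softmax
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=128, epochs=12, validation_data=(x_test, y_test))
```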
We follow the same approach for the MNIST test set. We randomly select 892 test images from each class, as this is the number of test images in Class 5, the smallest class. With this balanced test set of 892 images per class, we evaluate our trained MNIST model. We then select the targeted watermark from the MNIST test set on the basis of class-probability scores: we compute the probability score of every test image against every class, sum the probability scores of all images for each class, and choose the class with the maximum summed probability score. The class probability scores are shown in Table 3. From Table 3, we see that Class-1 has the highest probability score compared to the other classes; hence, we select Class-1 as our targeted top class (C_T).
After selecting the targeted top class, i.e., Class-1, on the basis of probability scores, we select the targeted watermark (TW) by simply choosing the image from Class-1 that has the highest individual probability score. The selected highest-probability image from Class-1 is shown in Fig. 7. We use this image as the targeted watermark for generating perturbations in the host images.

B. EFFECT OF PERTURBATION ON ADVERSARIAL EXAMPLES
1) EVALUATION METRIC
We evaluate the performance of the DWT-, DCT-, and FFT-based image steganography on the MNIST digits dataset using two evaluation metrics, namely the Mean Square Error (MSE) and the Structural Similarity Index Measure (SSIM) [46]. We use the MSE, also known as the reconstruction error variance, to estimate the imperceptibility rate. It measures the difference between a host image and a stego-image and is defined as

MSE = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ I(x, y) - S(x, y) \right]^2,

where I(x, y) is the host image of size M-by-N and S(x, y) is the stego-image of the same size. We use the normalized version of the MSE, i.e., the NMSE, in order to obtain values in the range 0-1.
The other metric that we use for measuring the imperceptibility of adversarial examples is the SSIM. This metric measures the perceptual difference between a reference image and a processed image; in other words, it measures the perceived similarity between two images and is defined as

SSIM(I, S) = \frac{(2\mu_I \mu_S + c_1)(2\sigma_{IS} + c_2)}{(\mu_I^2 + \mu_S^2 + c_1)(\sigma_I^2 + \sigma_S^2 + c_2)},

where \mu denotes the mean, \sigma^2 the variance, \sigma_{IS} the covariance of the two images, and c_1 and c_2 are stabilizing constants. In order to analyze the effects of the above transform-domain perturbations on the MNIST digits at class level, we average the NMSE and SSIM values of all images belonging to one class. In this way, we obtain a single NMSE value and a single SSIM value for each class.
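A per-image sketch of the two metrics, using scikit-image for SSIM; the exact normalisation used for the NMSE is not specified in the text, so the one below is an assumption.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def imperceptibility_metrics(host, stego):
    """Per-image NMSE and SSIM between a host image and its stego-image (sketch)."""
    host = host.astype(float)
    stego = stego.astype(float)
    mse = np.mean((host - stego) ** 2)
    nmse = mse / np.mean(host ** 2)   # one common normalisation; the paper's exact
                                      # normalisation is not stated, so this is assumed
    s = ssim(host, stego, data_range=host.max() - host.min())
    return nmse, s
```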

2) ANALYSIS OF ADVERSARIAL EXAMPLES GENERATED USING DISCRETE COSINE TRANSFORM (DCT)
To check the imperceptibility of adversarial examples generated with the DCT-based steganography approach using the targeted watermark (TW), we take images from the MNIST dataset as reference images and the adversarial examples generated above as processed images. We compare each processed image with its corresponding reference image and compute the NMSE and SSIM for each class. We perform this step for all MNIST digit classes (0-9). The evaluation results are given in Table 4.
From the results, we can see that increasing the embedding factor by 0.6 increases the maximum NMSE by only 0.22%, indicating that only a slight perturbation is added. The imperceptibility is further confirmed by the SSIM values: as we increase the embedding factor from 0.3 to 0.9, the perceptual similarity between the reference image and the processed image drops by 10%. Therefore, increasing the embedding factor from 0.3 to 0.9 does not produce an enormous change when applying the DCT, since we only embed high-frequency components of the targeted watermark into the host image. Samples of adversarial examples generated using the DCT-based steganography approach at factors 0.3, 0.6, and 0.9 are shown in Fig. 8(a).

3) ANALYSIS OF ADVERSARIAL EXAMPLES GENERATED USING FAST FOURIER TRANSFORM (FFT)
When embedding the targeted watermark (TW) using the FFT-based steganography approach, the perturbation caused in the host image affects its imperceptibility much more than in the DCT-based approach. This is because, in the DCT-based approach, we only target the high-frequency components of the host images for perturbation while keeping the low-frequency components, which contain the approximation details of an image, unchanged. In the FFT approach, on the other hand, the image is represented by phase and magnitude components rather than simple high- and low-frequency components.
The results obtained with the FFT-based steganography approach for all MNIST classes (0-9) using the targeted watermark (TW) are given in Table 5. The results reveal that, as the embedding factor increases from 0.3 to 0.9, the maximum error recorded in terms of NMSE is 19.16%. Moreover, the SSIM values show a notable change even at the lowest embedding factor of 0.3: the perceptual similarity between the reference image and the processed image drops to 40%, and only 10% similarity remains at an embedding factor of 0.9. Hence, the perceptual similarity between the two compared images decreases by a further 30% as the embedding factor increases from 0.3 to 0.9.
In this particular scenario, SSIM is a better evaluation metric than MSE for the perceptual comparison of two images. From Table 5, we can clearly see that the SSIM value decreases substantially because we embed the phase of the targeted watermark (TW) into the host image (I) instead of its magnitude, and the phase component has more impact than the magnitude component. Hence, the SSIM metric gives a better idea of how much the perceptibility is affected when a targeted watermark is embedded using the FFT-based steganography approach. Samples of adversarial examples generated using the FFT-based steganography approach at factors 0.3, 0.6, and 0.9 are shown in Fig. 8(b).

4) ANALYSIS OF ADVERSARIAL EXAMPLES GENERATED USING DISCRETE WAVELET TRANSFORM (DWT)
The NMSE and SSIM results obtained for the DWT-based steganography approach using the targeted watermark (TW) for all MNIST classes (0-9) are tabulated in Table 6, Table 7, Table 8, and Table 9. We extensively generate adversarial examples (stego-images) using the DWT approach (a) at different embedding factors, i.e., 0.3 and 0.9, (b) using different DWT sub-bands for embedding, i.e., HH (diagonal sub-band only) and VHD (horizontal, vertical, and diagonal sub-bands), (c) using different wavelet families, i.e., Haar (variant 'haar') and Daubechies (variant 'db2'), and (d) at different decomposition levels, i.e., Level-1 and Level-3. We now discuss the effect of each scenario on imperceptibility in detail.
At Different Factors: From Table 6 (NMSE values for the Haar wavelet) and Table 8 (NMSE values for the Daubechies wavelet), it can be inferred that, regardless of the level of decomposition or the wavelet family used, the maximum NMSE difference is 5.37% when the embedding factor is increased from 0.3 to 0.9. On the other hand, in Table 7 (SSIM values for the Haar wavelet) and Table 9 (SSIM values for the Daubechies wavelet), the SSIM value decreases only slightly, indicating that a small perceptual change occurs in the processed image. This can also be verified from Fig. 9, as the perceptual difference between the second and the third row of each panel (Fig. 9(a)-9(h)) is minimal.
At Different Sub-Bands: We use different DWT sub-bands of the targeted watermark, i.e., HH (diagonal sub-band only) and VHD (horizontal, vertical, and diagonal sub-bands), for embedding in the host image (I). Compared to the HH sub-band, imperceptibility is affected more when the perturbation is applied to the VHD sub-bands. This is because, when embedding only the HH sub-band of the targeted watermark (TW) into the host image (I), only one band is affected, whereas in the VHD case the three sub-bands, i.e., the horizontal, vertical, and diagonal components of the targeted watermark, are embedded into the host image (I). The maximum NMSE difference between the HH and VHD cases is 5.63% (Table 6 and Table 8). The SSIM values (Table 7 and Table 9) also decrease slightly, as can be seen from the perceptual difference between the respective second rows (HH at factor 0.3 and VHD at factor 0.3) and the respective third rows (HH at factor 0.9 and VHD at factor 0.9) of Fig. 9(a) and Fig. 9(b).
For Different Wavelet Families: We use two different wavelet families, i.e., Haar (variant 'haar') and Daubechies (variant 'db2'), for the generation of adversarial examples. Compared to Haar, imperceptibility is affected more for Daubechies, as the maximum NMSE difference is found to be 14.72% (Table 6 and Table 8). In the case of Haar, the minimum SSIM value recorded is 0.6, indicating a 60% perceptual similarity between the reference and processed image (Table 7), while only 40% similarity is recorded in the case of Daubechies (Table 9). This can also be seen from the respective second and third rows of Fig. 9(d) and Fig. 9(h).
At Different Levels of Decomposition: When the decomposition level is increased from Level-1 (L1) to Level-3 (L3), the NMSE value (Table 6 and Table 8) increases significantly irrespective of the wavelet type (Haar or Daubechies). The reason is that any perturbation applied to the sub-bands of Level-3 affects the LL sub-band (approximation details) of the previous level (i.e., Level-2), and hence imperceptibility decreases as we increase the level of decomposition. Like the NMSE, the SSIM values (Table 7 and Table 9) show a significant change as we move to higher decomposition levels. As tabulated in Table 6 and Table 8, the maximum NMSE difference recorded is 24.44%, while the SSIM value (Table 9) decreases to 0.3 when the perturbation is applied at Level-3. This means that the perceptual similarity between the reference image and the processed image is only 30%, whereas at Level-1 both images are 100% similar. This perceptual difference is clearly visible in the respective second and third rows of Fig. 9(e) and Fig. 9(g).
From an overall comparison of the three steganography approaches, we can clearly see that FFT performs the worst, while DCT performs the best of the three in terms of imperceptibility. For extensive experimentation, we use both the worst-case and best-case adversarial examples to test whether the defined CNN architecture (Fig. 2), LeNet-5, and AlexNet are robust to these adversarial examples or not.

C. IMPACT OF ADVERSARIAL EXAMPLES ON DEEP NETWORK
In order to check the validity and performance of our proposed method, we perform two different types of adversarial attacks: (1) a black-box attack and (2) a white-box attack. Using these attacks, we can check whether our crafted perturbation is strong enough to affect the defined CNN (Fig. 2), LeNet-5, and AlexNet architectures, and we can also verify whether the probabilities of the perturbed samples rise for the targeted class.

1) BLACK-BOX-ATTACK MODEL
In the black-box attack model, it is assumed that the attacker has no access to the training samples and no knowledge of the underlying architecture of the DNN classifier. Hence, in this scenario, we evaluate the perturbed samples using a pre-trained model and predict the class probabilities. We then compare these probabilities with the probabilities of the original, unperturbed samples and count the number of samples for which the probability of the targeted class (Class-1) increases.
The results for the black-box attack model are tabulated in Table 10. From the results, we can see that, in the case of FFT (at 0.3, 0.6, and 0.9) and some variants of DWT at Level-3, i.e., Haar-L3-HH, Haar-L3-VHD, Db-L3-HH, and Db-L3-VHD, the probability increases for more than 80% of the samples. However, for these approaches the perturbation does not remain imperceptible, owing to the larger amount of edge information embedded in the host image. Our aim is to target the maximum number of samples by increasing their probability for the targeted class, but not at the cost of low imperceptibility. The imperceptibility is high for the DCT approach, but it only targets about 50% of the samples. The results show that the best imperceptible perturbation with a high number of targeted samples is obtained for Haar-L1-VHD (at an embedding factor of 0.9), which targets around 77.55% of the samples.

2) WHITE-BOX-ATTACK MODEL
In the white-box attack model, it is assumed that the attacker has access to the training samples and knowledge of the underlying architecture of the DNN classifier. Therefore, in this scenario, we perturb the training samples as well as the test samples used for evaluation. We pass the perturbed training samples to the architectures (CNN, LeNet-5, and AlexNet) and re-train each model after unfreezing only the first layer (CONV layer) and the last layer (FC layer). After obtaining the probabilities for the test samples, we compare them with the probabilities of the original samples and count the number of samples for which the probability of the targeted class increases.
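The white-box re-training step can be sketched as follows for the Keras model shown earlier, unfreezing only the first convolutional layer and the final FC layer; the optimizer and epoch count are assumptions.

```python
# Sketch of the white-box fine-tuning: freeze everything except the first CONV
# layer and the last FC layer, then re-train on the perturbed training samples.
# `model` is the trained CNN from Fig. 2 (LeNet-5 / AlexNet would be handled the same way).
for layer in model.layers:
    layer.trainable = False
model.layers[0].trainable = True      # first CONV layer
model.layers[-1].trainable = True     # last FC (Softmax) layer
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train_perturbed, y_train, batch_size=128, epochs=12)  # perturbed data assumed
```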
The results for the white-box attack model are shown in Table 11. From the results, we can see that the probability of the targeted class increases for the samples of the FFT approach and for some Level-3 variants of the DWT approach, but the perturbation does not remain imperceptible in these cases. The results obtained for the DWT variants Haar-L1-VHD and Db-L1-HH (at embedding factors 0.3, 0.6, and 0.9) show that the probability of the targeted class (Class-1) increases for more than 70% of the samples while the imperceptibility of the perturbation is maintained.

V. CONCLUSION
In this paper, we demonstrated that, using only a single image, an adversarial example can be generated that successfully fools state-of-the-art neural network classifiers. We proposed a methodology for selecting a ''single targeted watermark'' (secret image) instead of randomly selecting it from the available samples. We also explained the procedure of generating and embedding the perturbation vector in host images in the transform domain, as opposed to embedding the perturbation vector in the spatial domain at pixel level. We further showed the effectiveness of crafting adversaries in the transform domain, which does not require any kind of training and yields perturbations that are imperceptible yet capable of fooling DNNs. We successfully demonstrated our generated adversarial examples for two different types of adversarial attacks, i.e., white-box and black-box attacks, in a targeted context.
The overall purpose of this paper is to understand the impact of a ''single carefully selected targeted watermark'' on the generated adversarial examples and the effect of the generated perturbation vector on deep neural networks. The experimental results of Section IV-C show a successful impact of our adversarial attacks on the defined CNN architecture (Fig. 2), LeNet-5, and AlexNet. The overall results show that DCT-based perturbations fool deep networks less than DWT- and FFT-based perturbations, while FFT and DWT (at decomposition Level-3) fool deep networks the most. Hence, we can conclude that FFT is a good option for crafting perturbations in applications where imperceptibility is not a constraint, while DCT is most suitable for applications where imperceptibility matters the most. Furthermore, the overall effect of the white-box attack is stronger than that of the black-box attack, as a larger number of test samples are affected by it. Our study shows that, if deep neural networks are vulnerable to such simple yet powerful attacks, then security measures should go a step further to protect smart intelligent systems.