A Random Ensemble of Encrypted Vision Transformers for Adversarially Robust Defense

Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In previous studies, the use of models encrypted with a secret key was demonstrated to be robust against white-box attacks, but not against black-box ones. In this paper, we propose a novel method using the vision transformer (ViT) that is a random ensemble of encrypted models for enhancing robustness against both white-box and black-box attacks. In addition, a benchmark attack method, called AutoAttack, is applied to models to test adversarial robustness objectively. In experiments, the method was demonstrated to be robust against not only white-box attacks but also black-box ones in an image classification task on the CIFAR-10 and ImageNet datasets. The method was also compared with the state-of-the-art in a standardized benchmark for adversarial robustness, RobustBench, and it was verified to outperform conventional defenses in terms of clean accuracy and robust accuracy.


I. INTRODUCTION
Deep neural networks (DNNs) have achieved great success in various computer vision tasks and have been deployed in many applications including security-critical ones, such as face recognition and object detection for autonomous driving. However, DNNs are known to be vulnerable to adversarial examples (AEs), which fool DNNs by adding small perturbations to input images without any effect on human perception. In addition, AEs designed for a source model can deceive other (target) models. This property, called transferability, makes it easy to mislead various DNN models. This is an urgent issue that has a negative impact on the reliability of applications using DNNs.
In previous studies, various methods were proposed to build models robust against AEs. Adversarial training [1]-[4] is widely known as a means of defense against AEs, where AEs are used as a part of training data to improve robustness against AEs. However, it degrades model performance on clean test data.
Another approach for constructing robust models is to train models by using images encrypted with secret keys [5], [6]. The encrypted models have been verified to be robust against white-box attacks if the keys are not known to adversaries. Furthermore, the approach is effective in avoiding the influence of transferability [7]. However, this approach is still vulnerable to black-box attacks because these attacks do not need to know any secret key.
Accordingly, we propose a random ensemble of encrypted vision transformers (ViTs) that is inspired by perceptual image encryption [8], [9], model encryption [5]-[7], [10], [11], and an ensemble of models [12], [13]. An ensemble of encrypted models was discussed as one type of adversarial defense [6], [14], but the ensemble model is not yet robust against black-box attacks when it is used as the source model. The proposed method with a random ensemble allows us to use models that are robust against not only white-box attacks but also black-box ones. In experiments, the effectiveness of the proposed method is verified on the CIFAR-10 and ImageNet datasets by using a benchmark attack, AutoAttack [15].
We reported that a random ensemble of encrypted sub-models was effective for enhancing robustness against AEs in a preliminary experiment as the first report [16]. The report is extended this time by adding experiments and detailed considerations, including a comparison with the state-of-the-art and a discussion on the ImageNet dataset.
The rest of this paper is structured as follows. Section II presents related work on adversarial examples, defense methods, and the vision transformer. Regarding the proposed method, Section III includes an overview, threat model, and a random ensemble of encrypted models. Experiments for verifying the effectiveness of the method, including classification accuracy, effects of the number of sub-models, and comparison with the state-of-the-art, are presented in Section IV, and Section V concludes this paper.

II. RELATED WORK
A. ADVERSARIAL EXAMPLES
AEs are used to mislead machine learning models [1], [17], [18]. Traditionally, adversarial attacks are classified into three types in accordance with the knowledge of a particular model and training data available to the adversary: white-box, black-box, and gray-box. Under white-box settings [1], [19], [20], the adversary has direct access to the model, its parameters, training data, and defense mechanism. In contrast, in black-box attacks [21]-[23], the adversary does not have any knowledge of the model except its output. Between white-box and black-box attacks, there are gray-box ones, in which the adversary knows something about the system [24], [25].
The model used for the design of AEs is called a source model, and the model that is the final objective of the attack is called a target model. The source model is the same as the target model in general. However, even if the source model is different from the target model, AEs may deceive the target model [17], [26]-[28]. This property is called adversarial transferability. Therefore, several black-box and gray-box attacks prepare substitute models to generate AEs.
AEs are also classified into two types depending on how they are created: perturbation-based AEs and unrestricted AEs. Perturbations in perturbation-based AEs are restricted by ℓp-norms such as ℓ∞ [19], ℓ2 [29], ℓ1 [30], and ℓ0 [31] to be imperceptible to humans. In contrast, unrestricted AEs [32] are crafted by using special transformations [33] or generative models [34]. In this paper, we use perturbation-based (ℓ∞-norm bounded) AEs to evaluate the proposed method because they are used in a benchmark attack method called AutoAttack [15].
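As a minimal sketch of what an ℓ∞-norm bound means in practice, the snippet below clips a candidate adversarial image so that no pixel differs from the clean image by more than ϵ and every value stays in [0, 1]. The function name `project_linf` and the toy pixel values are illustrative assumptions and are not taken from AutoAttack.

```python
def project_linf(x, x_adv, eps):
    """Project x_adv onto the l-inf ball of radius eps around x, then onto [0, 1]."""
    projected = []
    for clean, adv in zip(x, x_adv):
        # Limit the per-pixel change to the interval [-eps, eps].
        delta = max(-eps, min(eps, adv - clean))
        # Keep the result a valid pixel intensity.
        projected.append(max(0.0, min(1.0, clean + delta)))
    return projected


eps = 8 / 255  # the budget quoted for the CIFAR-10 comparison in this paper
x = [0.0, 0.5, 1.0]                      # toy "image" with three pixels
x_adv = project_linf(x, [0.2, 0.51, 0.9], eps)
linf = max(abs(a - c) for a, c in zip(x_adv, x))  # never exceeds eps
```

The first and third pixels are pulled back to the boundary of the ball, while the second, which already lies within the budget, is left unchanged.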

B. DEFENSE METHOD
Various adversarial defenses have been studied so far to construct models robust against AEs. There are two strategies for adversarial defenses.
The first strategy aims to build models that directly classify AEs without removing perturbations. Adversarial training [1]-[4], which trains models with AEs, is one such widely known strategy. Madry et al. approach adversarial training as an optimization problem, and they utilize the projected gradient descent (PGD) adversary as a universal adversary to craft AEs [19]. While adversarial training with the PGD adversary achieves high robustness, it also requires large computational resources. Many studies have made progress in reducing the computational cost, such as free adversarial training [35], fast adversarial training [36], and single-step adversarial training [37]. However, it was shown that models trained with AEs under the ℓ∞-norm can still be vulnerable to ℓ1-norm-bounded AEs [38]. Certified defenses are another type of defense in which perturbations are not removed. Certified defenses provide strict mathematical guarantees of robustness against AEs in such a way that there are no AEs within some bounds [39]-[42]. However, these defenses do not work well against some types of perturbation such as generative perturbation [43] and parametric perturbation [44].
The second strategy is to pre-process input images to reduce the effect of perturbations in AEs. In previous studies, various transformation methods were proposed, such as thermometer encoding [45], diverse image processing techniques [46], [47], denoising strategies [48], [49], and GAN-based transformation [50]. At first, the performance of these input transformation-based defenses was high, but they were later shown to be vulnerable to adaptive attacks [51], [52]. Key-based defenses are also one type of defense method belonging to the second strategy. Unlike other defenses, they utilize secret keys to defend models so that they have an information advantage over adversarial attacks [5], [6], [53], [54]. Unless secret keys are leaked, adversarial attacks do not break key-based defenses.
However, key-based defenses are not robust against black-box attacks such as Square Attack [22]. In this paper, we aim to solve this issue of conventional key-based defenses.

C. PROPERTIES OF VISION TRANSFORMER
Transformer-based models have been widely used in natural language processing (NLP) tasks [55]. Inspired by their success in NLP tasks, the vision transformer (ViT) was proposed for computer vision tasks [56]. We focus on the following properties of ViT that enhance the robustness of models against AEs.
(a) ViT provides high performance in image classification tasks, compared with CNN models [56].
(b) The transferability between ViT and CNN models is low [28].
(c) The transferability among ViT models encrypted with different keys is low [7].
(d) Isotropic networks including ViT have a high similarity with block-wise image encryption [54], [57], [58].
Properties (b) and (c) are important for robust models. If the transferability between models is high, effective AEs are easily designed even when the model parameters of a target model are not disclosed. Property (d) means that it is easy to prepare encrypted models with high performance by using a block-wise encryption method. The properties are obtained from the patch embedding structure of ViT. Accordingly, it is expected that ViT models encrypted with a block-wise encryption method are robust against various attacks under some requirements.

III. METHODOLOGY
A. OVERVIEW
Fig. 1 shows an overview of the proposed method. The method is designed to satisfy the two requirements below.
• It is robust against various AEs.
• It has a high classification accuracy even when inputting test images without any adversarial noise.
To fulfill the above requirements, we propose a random ensemble of ViTs in this paper.
As shown in the figure, each training image is encrypted to generate N encrypted images by using N keys K = {K_1, ..., K_N}. N encrypted sub-models are trained by using the encrypted images, and an ensemble of the sub-models is constructed. For testing, N test images encrypted with the keys used for training the sub-models are generated from a plain image, and they are input to the random ensemble model to get an estimated result.

B. THREAT MODEL
The goal of an adversarial defense is to keep the classification accuracy on both clean images and adversarial examples high. To evaluate a defense method, precisely defining threat models is necessary. A threat model includes a set of assumptions such as an adversary's goals and knowledge.
Adversary's Goals: An adversary can construct adversarial examples to achieve different goals when attacking a model: either to reduce the classification accuracy (i.e., untargeted attacks) or to force classification into a targeted class (i.e., targeted attacks). Formally, given an adversarial example, untargeted attacks mislead a classifier into outputting a wrong label, and targeted ones force the classifier to output a targeted label.
Adversary's Knowledge: The adversary's knowledge can be white-box (inner workings of the defense mechanism, complete knowledge of the model and its parameters), black-box (no knowledge of the model), or gray-box, that is, anything in between white-box and black-box.
In this paper, we focus on key-based defense. We consider both white-box and black-box attacks under the condition that the secret keys are not disclosed to the adversary.
Attack Scenarios: Many defenses against AEs have been proposed, but it is very difficult to objectively evaluate defense methods without an independent test. For this reason, AutoAttack [15], which is an ensemble of adversarial attacks used to test adversarial robustness objectively, was proposed as a benchmark attack. AutoAttack consists of four attack methods: Auto-PGD-cross entropy (APGD-ce) [15], APGD-target (APGD-t), FAB-target (FAB-t) [20], and Square Attack [22], which have the properties summarized in Table 1. In this paper, we use these four attack methods and AutoAttack to objectively compare the proposed method with the state-of-the-art.

C. NOTATIONS
The following notations are utilized throughout this paper.
• W, H, and C denote the width, height, and number of channels of an image, respectively.
• K = {K_1, ..., K_N} denotes the set of secret keys, where K_i is the key for the i-th sub-model, and N is the number of sub-models.

D. RANDOM ENSEMBLE
A novel framework for adversarial defenses is proposed here. Fig. 2 shows a random ensemble of encrypted sub-models in which each encrypted test image is input to the corresponding sub-model, and S outputs are randomly selected from N outputs, where 3 ≤ S ≤ N. An estimated result is finally decided by using the S outputs. Image encryption, the training of sub-models with encrypted images, and the ensemble of sub-models and testing used in the framework are explained below, assuming that ViT with a patch size of M × M is used for sub-models.

Image Encryption
Sub-models are trained with encrypted images, and encrypted images are also used as test images. In this paper, one of the block-wise image encryption methods presented in [6] is used to generate encrypted images. The procedure of the encryption with pixel shuffling is summarized below (see Fig. 3).
3) Generate a secret key K_i as
K_i = [v_1, ..., v_k, ..., v_k′, ..., v_{p_b}],
where k, k′ ∈ {1, ..., p_b}, v_k ≠ v_k′ if k ≠ k′, and p_b is the number of elements in a flattened block.

By using a random selection of S outputs, the random ensemble model provides a different estimation value every time, even if the same image is input to the model. Some black-box attacks such as Square Attack [22] repeatedly input images to the model and gradually update the perturbations on the basis of the information obtained from prediction results. Accordingly, it is expected that random ensemble models make such attacks more difficult.
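The random selection of S outputs followed by averaging can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `random_ensemble_predict`, the use of class-probability vectors as sub-model outputs, and the toy values are assumptions, and S is passed in as a parameter because the text only requires 3 ≤ S ≤ N.

```python
import random


def random_ensemble_predict(sub_model_outputs, s, rng=random):
    """Average S randomly chosen sub-model outputs; return (label, averaged scores)."""
    n = len(sub_model_outputs)
    if not 3 <= s <= n:
        raise ValueError("S must satisfy 3 <= S <= N")
    # Randomly pick S of the N per-model score vectors (without replacement).
    chosen = rng.sample(sub_model_outputs, s)
    num_classes = len(chosen[0])
    # Final output: element-wise average of the S selected score vectors.
    averaged = [sum(out[c] for out in chosen) / s for c in range(num_classes)]
    label = max(range(num_classes), key=averaged.__getitem__)
    return label, averaged


# Toy class-probability vectors from N = 4 hypothetical sub-models (3 classes).
outputs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.8, 0.1, 0.1],
]
label, scores = random_ensemble_predict(outputs, s=3, rng=random.Random(0))
```

Because a different subset may be drawn on each call, the averaged scores returned for the same input can change from query to query, which is the behavior that query-based attacks have to cope with.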

IV. EXPERIMENT
To verify the effectiveness of the proposed defense, we ran several experiments on the CIFAR-10 and ImageNet datasets in an image classification task.

A. EXPERIMENTAL SETUP
Experiments were conducted on the CIFAR-10 [59] and ImageNet [60] datasets. CIFAR-10 comprises 60,000 images with 10 classes (6,000 images for each class), with 50,000 images for fine-tuning and 10,000 images for testing. ImageNet consists of 1.28M images for training and 50,000 images for validation with 1,000 classes. All images were resized to 224 × 224 × 3 to fit the input to ViT, and pixel values were scaled to the range [0, 1].
We used a pre-trained ViT with a patch size of M = 16, which was prepared in [56], as sub-models, where it was pre-trained with ImageNet-21k. ImageNet-21k is a dataset consisting of about 21,000 classes with a total of about 14 million images, which were resized to an image size of 224 × 224 × 3 when pre-training ViT. For CIFAR-10, we fine-tuned ViT for 5,000 epochs with the stochastic gradient descent (SGD) optimizer, which was implemented in PyTorch. The parameters of the SGD optimizer were a learning rate of 0.03 and a momentum of 0.9. For ImageNet, we also used the SGD optimizer and ran 20,000 epochs for fine-tuning. The parameters of the SGD optimizer were the same as for CIFAR-10. For model encryption, pixel shuffling with a block size of M = 16, the same as the patch size of ViT, was used.
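As a rough worked example of the sizes implied by this setup, the following sketch computes the number of blocks per image and an estimate of the key space. It assumes that each flattened block holds all C × M × M values of an M × M RGB block and that a key may be any permutation of those positions; the variable names are illustrative.

```python
import math

W = H = 224   # image width and height after resizing
C = 3         # RGB channels
M = 16        # block size, equal to the ViT patch size

blocks_per_side = W // M                 # W_b = W / M
num_blocks = blocks_per_side * (H // M)  # W_b * H_b
p_b = C * M * M                          # assumed length of one flattened block

# log2(p_b!) computed with the log-gamma function: the number of bits
# needed to index one permutation of p_b positions.
key_space_bits = math.lgamma(p_b + 1) / math.log(2)
```

Under these assumptions, an image is split into 14 × 14 = 196 blocks of 768 values each, and the number of possible permutations per key is far too large to enumerate.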

B. IMAGE CLASSIFICATION ACCURACY
The effectiveness of random ensemble models was verified on CIFAR-10 in terms of image classification accuracy. In Table 2, four kinds of models were compared under five attacks, where "clean" means that test images did not include any adversarial noise, and "simple ensemble" means ensemble models without random selection [14]. The experimental results in Table 2 are summarized as follows.
• ViT (baseline): The baseline model that was trained with clean images achieved the highest clean accuracy. However, it was vulnerable to all attacks.
• Encrypted ViT: The model trained with encrypted images, which did not consist of sub-models, was robust against white-box attacks such as APGD-ce as confirmed in [7], but it was not robust against Square Attack and AutoAttack.
• Simple Ensemble: The ensemble model without random selection was still vulnerable to Square Attack and AutoAttack, but it had a higher clean accuracy than that of the encrypted ViT.
• Random Ensemble: As with the simple ensemble, the proposed model had a high classification accuracy even when white-box attacks were applied. In addition, it was robust against Square Attack and AutoAttack.
From Table 2, it was confirmed that the use of encrypted ViT models was effective at improving robustness against white-box attacks. In addition, by combining a random ensemble with encrypted sub-models, the model was robust against not only white-box attacks but also Square Attack. As a result, it had a high accuracy even when using AutoAttack.
Table 3 shows experimental results on the ImageNet dataset. It was confirmed that the trend in classification accuracy on ImageNet was similar to that on CIFAR-10.

C. EFFECTS OF NUMBER OF SUB-MODELS
In Tables 2 and 3, four sub-models (N = 4) were used to construct ensemble models. In Table 4, N = 5 was used, and an experiment was carried out on the CIFAR-10 dataset.
From the table, the random ensemble model with N = 5 was demonstrated to improve classification accuracy for Square Attack and AutoAttack, compared with the random ensemble model with N = 4.
In contrast, the simple ensemble model with N = 5 had almost the same accuracy as that of the simple ensemble model with N = 4. These results indicate that increasing the number of sub-models improves the robustness against black-box attacks only when a random ensemble of sub-models is used, because a greater number of sub-models makes black-box attacks more difficult under random selection.
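One way to build intuition for this effect, offered as an illustration rather than an analysis from the experiments, is to count how many distinct subsets of S sub-models the random selection can draw for a given N. The helper `num_subsets` is hypothetical and simply evaluates the binomial coefficient for every admissible S.

```python
from math import comb


def num_subsets(n, s_min=3):
    """Count the distinct sub-model subsets of size S for each S with s_min <= S <= n."""
    return {s: comb(n, s) for s in range(s_min, n + 1)}


counts_n4 = num_subsets(4)  # {3: 4, 4: 1}
counts_n5 = num_subsets(5)  # {3: 10, 4: 5, 5: 1}
```

Moving from N = 4 to N = 5 raises the number of possible three-model subsets from 4 to 10, so repeated queries with the same image are spread over more distinct averaged models.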

D. COMPARISON WITH STATE-OF-THE-ART
Various adversarial defenses have been studied so far, and they have been compared to evaluate the performance of each defense method. RobustBench is a standardized benchmark for adversarial robustness, and the goal of RobustBench is to systematically track the real progress in adversarial robustness [61]. AutoAttack, which is an ensemble of white-box and black-box attacks, is used to standardize the evaluation. Accordingly, we compared our method with state-of-the-art defenses under AutoAttack.
Table 5 shows the clean and robust accuracy of the top 5 models in RobustBench on the CIFAR-10 dataset, and the top 5 models on the ImageNet dataset are given in Table 6. They were evaluated by AutoAttack (AA) (ℓ∞, ϵ = 8/255) in Table 5. Our models with a random ensemble achieved the highest clean accuracy and the highest accuracy under AutoAttack on the two datasets.
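For readers unfamiliar with the two metrics, the sketch below shows how clean accuracy and robust accuracy are commonly computed. It assumes the usual worst-case convention for an ensemble of attacks, in which a sample counts as robust only if it is classified correctly under every attack; the function names and toy predictions are illustrative and are not results from this paper.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def robust_accuracy(predictions_per_attack, labels):
    """Fraction of samples classified correctly under every attack in the ensemble."""
    robust = [
        all(preds[i] == y for preds in predictions_per_attack)
        for i, y in enumerate(labels)
    ]
    return sum(robust) / len(labels)


labels = [0, 1, 2, 1]
clean_preds = [0, 1, 2, 0]      # predictions on unperturbed images
attack_preds = [
    [0, 1, 1, 0],               # predictions under a first hypothetical attack
    [0, 2, 2, 0],               # predictions under a second hypothetical attack
]
clean_acc = accuracy(clean_preds, labels)
robust_acc = robust_accuracy(attack_preds, labels)
```

In this toy case only the first sample survives both attacks, so the robust accuracy is lower than the accuracy under either attack alone.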

E. DISCUSSION
In the above experiments, we confirmed that our models with a random ensemble are robust against both white-box and black-box attacks. In addition, if we can use a greater number of sub-models, the random ensemble models are more robust against black-box attacks. We assume that the adversary knows the model architecture and has access to pre-trained models and training data. In addition, we assume that the adversary also knows the mechanism of the key-based defense, but not the secret keys.
To clearly show the properties of our models, we investigated the relationship between the number of leaked keys and the robustness. To examine the relationship, we conducted an additional experiment on the CIFAR-10 dataset (see Table 7). In the experiment, random ensemble models with N = 4 were evaluated under the use of APGD-ce. In Table 7, "# of leaked keys" indicates the number of secret keys known to the adversary. As shown in Table 7, random ensemble models achieved high accuracy when the number of leaked keys was zero or one. In contrast, for encrypted models without any sub-model [5], [6], which have only one key, the accuracy will be significantly reduced once that key is leaked. Accordingly, the use of sub-models can also enhance robustness against the leak of keys.

V. CONCLUSION
In this paper, we proposed a novel method for adversarial defenses. In the method, a random ensemble of ViTs encrypted with secret keys is used to construct a model. Simple encrypted models are known to be robust against white-box attacks, but not against black-box ones. To enhance robustness against all attacks, the use of ViT and an ensemble of sub-models was proposed in the paper. In addition, a benchmark attack method, called AutoAttack, which consists of four attacks: Auto-PGD-cross entropy, APGD-target, FAB-target, and Square Attack, was applied to models to test adversarial robustness objectively. In an experiment, the proposed method was demonstrated to be robust against not only white-box attacks but also black-box ones in an image classification task. In addition, it was compared with state-of-the-art defenses in a standardized benchmark for adversarial robustness, RobustBench, and it was verified to outperform the conventional ones in terms of clean accuracy and robust accuracy.

1) Split a three-channel (RGB) color image x into non-overlapping blocks with a size of M × M such that {B_{1,1}, ..., B_{w,h}, ..., B_{W_b,H_b}}.
2) Flatten each block B_{w,h} into a vector b_{w,h}.
4) Randomly permute pixels in each vector b_{w,h} to generate an encrypted vector b′_{w,h} as b′_{w,h}(k) = b_{w,h}(v_k).
5) Reshape each encrypted vector b′_{w,h} into an encrypted block B′_{w,h}.
6) Concatenate encrypted blocks into an encrypted image x′.
Fig. 4 shows an example of encryption with the above procedure where M = 16. In the method, training and test images are encrypted by using the same key set.

Training Sub-models with Encrypted Images
N sub-models are fine-tuned by using images encrypted with N keys as shown in Fig. 1. The details of training sub-models are illustrated in Fig. 5. As shown in the figure, all training images are encrypted by using key K_i, and then the pre-trained ViT [56] is fine-tuned to produce an encrypted sub-model i with the encrypted images. N sub-models are created with N keys in accordance with the above procedure.

Random Ensemble of Sub-models and Testing
Fig. 2 shows the details of testing with a random ensemble of N encrypted sub-models. The steps for getting an estimation result from a test image are summarized below.
1) Generate N images encrypted with N keys from a test image.
2) Input each encrypted image into the corresponding sub-model.
3) Select S outputs (3 ≤ S ≤ N) from N sub-models randomly.
4) Determine a final output by the average of S results.
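The image encryption steps (splitting into blocks, flattening, key-based permutation, reshaping, and concatenation) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: images are nested lists of shape H × W × C, indices are 0-based, the same key is applied to every block, and `generate_key` and `encrypt_image` are hypothetical names.

```python
import random


def generate_key(p_b, seed):
    """Secret key: a random permutation of the indices 0..p_b-1 (0-based here)."""
    key = list(range(p_b))
    random.Random(seed).shuffle(key)
    return key


def encrypt_image(image, key, m):
    """Block-wise pixel shuffling of an H x W x C image given as nested lists."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    if h % m or w % m or len(key) != m * m * c:
        raise ValueError("H and W must be divisible by M, and len(key) must be C*M*M")
    encrypted = [[[0] * c for _ in range(w)] for _ in range(h)]
    for top in range(0, h, m):
        for left in range(0, w, m):
            # Split and flatten: positions of one M x M block in a fixed order.
            positions = [(top + dy, left + dx, ch)
                         for dy in range(m) for dx in range(m) for ch in range(c)]
            flat = [image[y][x][ch] for y, x, ch in positions]
            # Permute: b'(k) = b(v_k).
            shuffled = [flat[v] for v in key]
            # Reshape and concatenate: write the block back in place.
            for (y, x, ch), value in zip(positions, shuffled):
                encrypted[y][x][ch] = value
    return encrypted


# Toy 4 x 4 RGB image with unique values, block size M = 2.
m, c = 2, 3
image = [[[(y * 4 + x) * c + ch for ch in range(c)] for x in range(4)]
         for y in range(4)]
key = generate_key(m * m * c, seed=0)
encrypted = encrypt_image(image, key, m)
```

Each block of the encrypted image contains exactly the same values as the corresponding block of the original, only in a key-dependent order, so an image encrypted with key K_i matches what sub-model i saw during fine-tuning.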

FIGURE 3. Procedure of image encryption

FIGURE 4. Example of encrypted images

FIGURE 5. Training of sub-models with encrypted images

TABLE 1. Attack methods used in AutoAttack

• The tensor x ∈ [0, 1]^{C×W×H} represents an input color image.
• The tensor x′ ∈ [0, 1]^{C×W×H} represents an encrypted image.
• M is the block size of an image.
• W_b = W/M and H_b = H/M are the number of blocks across width W and height H. We assume that W and H are divisible by M, so W_b and H_b are positive integers.

TABLE 7. Robustness of random ensemble model against key leaks