An Effective Adversarial Attack on Person Re-Identification in Video Surveillance via Dispersion Reduction

Person re-identification across a network of cameras with disjoint views has been studied extensively due to its importance in wide-area video surveillance. It is a challenging task for several reasons, including changes in illumination and target appearance, and variations in camera viewpoint and camera intrinsic parameters. The approaches developed to re-identify a person across different camera views need to address these challenges. More recently, neural network-based methods have been proposed to solve the person re-identification problem across different camera views, achieving state-of-the-art performance. In this paper, we present an effective and generalizable attack model that generates adversarial images of people and results in a very significant drop in the performance of existing state-of-the-art person re-identification models. The results demonstrate the extreme vulnerability of the existing models to adversarial examples, and draw attention to the potential security risks that this creates in video surveillance. Our proposed attack degrades the performance of several different state-of-the-art person re-identification models by decreasing the dispersion of an internal feature map of a neural network. We also compare our proposed attack with other state-of-the-art attack models on different person re-identification approaches, using four commonly used benchmark datasets. The experimental results show that our proposed attack outperforms the state-of-the-art attack models on the best-performing person re-identification approaches by a large margin, and results in the largest drop in the mean average precision values.


I. INTRODUCTION
In order to continuously track targets across multiple cameras with disjoint views, it is essential to re-identify the same target across different cameras. However, this is a very challenging task due to several reasons including changes in illumination and target appearance, and variations in camera intrinsic parameters and viewpoint.
There has been great interest and significant progress in person re-identification (ReID) [1]-[6], which is important for security and wide-area surveillance applications as well as human-computer interaction systems. Fueled by new models proposed in recent years, including neural network-based approaches, the performance of person ReID approaches has improved significantly. For instance, the rank-1 accuracy of the state-of-the-art method on the Market-1501 dataset [7] is 94.8% [1], up from 44.4% when the dataset was initially released in 2015.

(The associate editor coordinating the review of this manuscript and approving it for publication was Adam Czajka.)
In this paper, we demonstrate the effectiveness of an attack model in generating adversarial examples (AEs) for the person ReID application, attack multiple state-of-the-art person ReID models, and compare the performance of the presented attack approach with other state-of-the-art attack models via an extensive set of experiments on various person ReID benchmark datasets. One of our goals is to demonstrate the extreme vulnerability of multiple state-of-the-art person ReID approaches to this attack, and to draw the attention of the research community to the existing security risks. In person ReID, the paired probe and gallery images are expected to have high similarity. However, by adding human-imperceptible perturbations to the probe images, the models are easily fooled even though the probe images appear the same as the original images. Adversarial examples [8]-[10] have been extensively investigated recently in image classification [10], [11], object detection [12]-[14], semantic segmentation [12], [15], etc. However, relatively little attention has been paid to the robustness of person ReID models. Bai et al. [16] proposed an adversarial metric attack, which targets the distance metrics in person ReID systems; an early attempt at a defense shows that a metric-preserving network can be applied to counter such attacks. Zheng et al. [17] proposed the Opposite-Direction Feature Attack (ODFA) to generate adversarial examples/queries for retrieval tasks such as person ReID; the idea is to push the feature of the adversarial query away in the opposite direction of the original feature.

(VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
In this paper, we present and employ an effective approach to generate adversarial examples targeting person ReID methods. Our approach [18], referred to as Dispersion Reduction (DR), is a black-box attack. The main idea behind our approach is to reduce the "contrast" of an internal feature map of a neural network. The intuition is that, just like reducing the contrast of an image would make the objects in it less recognizable or distinguishable, reducing the contrast of an internal feature map has a similar effect on the recognizability of objects by the neural network. In our previous work [18], we showed the transferability of the DR attack across different tasks, including object detection, classification and text recognition. The contributions of this work include the following: We adapt the DR attack for the person ReID problem, and perform an extensive comparison and evaluation on different state-of-the-art methods and multiple benchmarks. In addition, we compare the performance of multiple attack methods, and show that making a feature map "featureless", through dispersion reduction, is well suited to fool any state-of-the-art ReID model. Moreover, we use different network models (different from those used by the victim ReID networks) as the source model to generate the adversarial examples, and show the effectiveness and generalizability of our attack approach. We also analyze the effect of the perturbation budget on the attack performance.
The rest of this paper is organized as follows: Related work on both person ReID and attack models is summarized in Section II. The proposed dispersion reduction-based attack approach and the methodology are described in Section III. The experimental results are presented in Section IV, and the paper is concluded in Section V.

II. RELATED WORK

A. PERSON ReID METHODS
Various person re-identification (ReID) approaches have been proposed in the past [19], which can be classified into different categories. There have been methods based on distance learning [20]- [26], on feature design and selection [27]- [33], and on mid-level feature learning [34]- [38].
Many works relied on color transformation and statistical models for person re-identification. Cheng and Piccardi [39] applied a cumulative color histogram transformation and employed an incremental major color spectrum histogram representation. Trajectory matching, height estimation and illumination-tolerant color representation were used by Madden and Piccardi [40]. Chae and Jo [41] employed a Gaussian Mixture Model (GMM) for the segmented regions in a person, and used a ratio of the GMMs to identify the same person. The Brightness Transfer Function (BTF) and its variants have been introduced to improve the matching performance. Porikli [42] proposed the BTF for inter-camera color calibration. Later, Javed et al. [43] and Prosser et al. [44] proposed the Mean Brightness Transfer Function (mBTF) and the Cumulative Brightness Transfer Function (cBTF), respectively. Datta et al. [45] presented the Weighted-BTF (wBTF), and Bhuiyan et al. [46] presented the Minimum Multiple Brightness Transfer Function (Min-MCBTF) to model the appearance variation by using a learning approach. However, these methods assumed that multiple consecutive images are available for training, which is not the case for the commonly used benchmark datasets.
Researchers then focused on combining the features and distance metrics at the same time. Liao et al. [47] proposed Local Maximal Occurrence (LOMO) and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA) for person ReID. Chen et al. [48] formulated a new view-specific person ReID framework, referred to as camera correlation aware feature augmentation (CRAFT). In this framework, cross-view feature adaptation is performed by measuring cross-view correlation from the visual data distribution and carrying out adaptive feature augmentation. Matsukawa et al. [49] proposed the hierarchical Gaussian of Gaussian (GOG) descriptor, which generates discriminative and robust features describing color and textural information simultaneously. An image is first divided into horizontal strips; then, local patches in the strips are modeled using Gaussian distributions. Köstinger et al. [50] proposed KISSME, which addresses metric learning from a statistical inference perspective.
More recent works employ neural networks and achieve state-of-the-art performance in person ReID. Zheng et al. [1] proposed DG-Net, which encompasses a generative module that separately encodes a specific person into appearance and structure codes, and integrates a discriminative module that shares the appearance encoder with the generative module. As a result, high-quality cross-id composed images are fed back to the appearance encoder online and used to improve the discriminative module. Zhang et al. [3] proposed AlignedReID, which performs automatic part alignment during learning without requiring extra supervision or pose estimation; by learning jointly on global and local features, it aims to address drawbacks of existing methods. Xie et al. [51] proposed PLR-OSNet, which introduces part-level feature resolution (PLR) into the Omni-Scale Network (OSNet) [52]. It has two branches covering global and local feature representations: the global branch adopts a global-max-pooling layer, while the local branch employs a part-level feature resolution scheme that produces only a single ID-prediction loss, in contrast to existing part-based methods.

B. ADVERSARIAL ATTACK METHODS
Szegedy et al. [9] introduced adversarial images, which can fool Convolutional Neural Network (CNN)-based models and cause misclassification, by adding small perturbations to the original images. In one of the earlier works, Goodfellow et al. [53] proposed the fast gradient sign method (FGSM), which generates AEs in one step. Several works extended this by iteratively updating the AEs with multi-step attacks, including the basic iterative method (BIM) [10], DeepFool [54], the momentum iterative method [11], the Diverse Inputs Method (DIM) [55] and Translation-Invariant (TI) attacks [56]. Compared with FGSM, the iterative methods generate a smaller perturbation, which makes the adversarial examples even more imperceptible to the human eye.
The transferability property of adversarial examples motivated research on black-box adversarial attacks. To perform black-box attacks, methods have been introduced [57], [58], which employ a substitute model that is trained to mimic the target model. Gradient-free attacks use feedback on query data, i.e., soft predictions [59], [60] or hard labels [61]. However, these aforementioned approaches require feedback from the target model, which is not practical in some scenarios. More recently, several methods have been proposed, which study the attack generation process itself. In general, an iterative attack [8], [62], [63] achieves a higher attack success rate than a single-step attack [53] in a white-box setting, but performs worse when transferred to other models. Below, we will summarize some of these attack methods.

1) GRADIENT-BASED ADVERSARIAL ATTACK METHODS
Fast Gradient Sign Method (FGSM) [53] generates the adversarial example x_adv by linearizing the loss function in the input space and performing a one-step update:

x_adv = x_real + ε · sign(∇_x J(x_real, y)),

where ∇_x J(x_real, y) is the gradient of the loss function w.r.t. x, and sign(·) is the sign function that constrains the perturbation within the L∞ norm bound. FGSM can generate more transferable adversarial examples; however, it may not be as effective in white-box attacks [10].

Basic Iterative Method (BIM) [10] extends FGSM by applying the gradient update in a multi-step manner with a small step size α:

x_adv(t+1) = x_adv(t) + α · sign(∇_x J(x_adv(t), y)),

where x_adv(0) = x_real. BIM clips x_adv(t) after each update, or sets α = ε/T, with T being the number of iterations, to ensure that the adversarial examples remain in an ε-neighbourhood of the real image.
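As a minimal illustration, the two updates above can be sketched with a generic gradient function. Here a toy quadratic loss stands in for the network loss J; the function names and toy setup are ours, not from the paper's code.

```python
import numpy as np

def fgsm(x_real, grad_fn, eps):
    """One-step FGSM: move eps along the sign of the loss gradient."""
    return x_real + eps * np.sign(grad_fn(x_real))

def bim(x_real, grad_fn, eps, steps):
    """BIM: iterate small sign steps (alpha = eps / T) and clip each
    iterate back into the L-infinity eps-ball around x_real."""
    alpha = eps / steps
    x_adv = x_real.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x_real - eps, x_real + eps)
    return x_adv

# Toy loss J(x) = 0.5 * ||x - target||^2, whose gradient is (x - target);
# a real attack would use the network's loss instead.
target = np.zeros(4)
grad = lambda x: x - target
x = np.full(4, 0.5)
x_fgsm = fgsm(x, grad, eps=0.1)          # each pixel moves exactly eps
x_bim = bim(x, grad, eps=0.1, steps=10)  # stays inside the eps-ball
```

Both variants increase the toy loss while keeping the perturbation bounded by ε, mirroring the constraint stated above.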
Momentum Iterative Fast Gradient Sign Method (MI-FGSM) [11] integrates a momentum term into the iterative attack process. The update procedure is:

g(t+1) = µ · g(t) + ∇_x J(x_adv(t), y) / ||∇_x J(x_adv(t), y)||_1,
x_adv(t+1) = x_adv(t) + α · sign(g(t+1)),

where g(t) accumulates the gradient information up to the t-th iteration, and µ is the decay factor.

Diverse Inputs Method (DIM) [55] applies random, differentiable transformations to the input images with probability p, and maximizes the loss function with respect to these transformed inputs. The transformed images are fed into the classifier for gradient calculation. The transformations include random resizing and padding, applied with a given probability p. This method can be combined with the momentum-based method to further improve transferability.
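The momentum update above can be sketched as follows; as in the earlier sketch, a toy gradient function stands in for the network loss, and the helper name is ours.

```python
import numpy as np

def mi_fgsm(x_real, grad_fn, eps, steps, mu=1.0):
    """MI-FGSM: accumulate the L1-normalized gradient in a momentum
    buffer g, step along sign(g), and clip to the eps-ball."""
    alpha = eps / steps
    x_adv = x_real.copy()
    g = np.zeros_like(x_real)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # momentum update
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x_real - eps, x_real + eps)
    return x_adv

# Toy gradient of J(x) = 0.5 * ||x||^2 is simply x.
x = np.full(4, 0.5)
x_adv = mi_fgsm(x, lambda z: z, eps=0.2, steps=10)
```

The L1 normalization keeps the per-step contribution of each gradient comparable, so the momentum buffer reflects direction rather than magnitude.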

2) TRANSLATION-INVARIANT ATTACK METHODS
Translation-Invariant (TI) attack methods have been proposed by Dong et al. [56] to further improve transferability, particularly against defended models. The authors observe that defended models and normally trained models rely on different discriminative regions to identify object categories. Rather than optimizing the objective function on a single image, the TI attack optimizes the adversarial example over a set of translated images:

arg max_{x_adv} Σ_{i,j} w_ij · J(T_ij(x_adv), y),  s.t. ||x_adv − x_real||_∞ ≤ ε,

where T_ij(x) is the translation operation that shifts image x by i and j pixels along the two dimensions, respectively, and w_ij is the weight for the loss J(T_ij(x_adv), y).
Note that TI can be integrated into any gradient-based attack such as FGSM or DIM. Since the sum of losses over translated images can be approximated by convolving the gradient with a kernel W composed of the weights w_ij, the translation-invariant fast gradient sign method (TI-FGSM) updates as

x_adv = x_real + ε · sign(W * ∇_x J(x_real, y)).

The translation-invariant diverse inputs method (TI-DIM) is obtained in a similar way.
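The kernel-convolution view of TI can be sketched as follows. The Gaussian kernel standing in for the translation weights w_ij is our illustrative choice (a common option in TI implementations), and the function names are ours.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.5):
    """Normalized 2-D Gaussian kernel used as the translation weights w_ij."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def smooth_gradient(grad, kernel):
    """Convolve the loss gradient with the weight kernel ('same' padding),
    approximating the sum of gradients over all shifted copies of x."""
    size = kernel.shape[0]
    pad = size // 2
    padded = np.pad(grad, pad)
    out = np.zeros_like(grad)
    for i in range(grad.shape[0]):
        for j in range(grad.shape[1]):
            out[i, j] = (padded[i:i + size, j:j + size] * kernel).sum()
    return out

def ti_fgsm(x_real, grad, eps, kernel):
    """TI-FGSM: one FGSM step on the kernel-smoothed gradient."""
    return x_real + eps * np.sign(smooth_gradient(grad, kernel))
```

Because the kernel is symmetric, convolution and correlation coincide, so the simple sliding-window sum above suffices for this sketch.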

III. PROPOSED APPROACH
In this section, we describe the dispersion reduction-based attack on the person ReID application.

A. NOTATION
We use x_real to denote the original query image, and f(·) to denote a deep neural network classifier. The output feature map at layer k is denoted by F_k, where F_k = f(x_real)|_k at the first step. At each subsequent step, we compute the dispersion of the feature map, denoted g(·), and the gradient of the dispersion, ∇_x g(F_k), to update the adversarial example x_adv. More details are provided in the following section.

B. DISPERSION REDUCTION
For person ReID, the existing models are trained on various benchmark datasets, which have different labeling schemes. Thus, compared to the image classification problem, person ReID is more complicated. More specifically, treating and attacking person ReID models as black boxes requires an approach that is highly transferable and effective across different training datasets and model architectures. The aforementioned existing black-box attacks, however, use a pretrained model as a surrogate, which shares the same training dataset and labeling scheme with the targeted models. Moreover, most existing attack methods rely on task-specific loss functions, which greatly limits their transferability across tasks and network models.
In our previous work [18], we showed that Dispersion Reduction (DR) has good transferability properties and is successful in cross-task attack scenarios. DR employs a publicly available classification network as the surrogate source model, and attacks models used for different computer vision tasks, such as object detection, semantic segmentation and cloud API applications. DR is a black-box attack. Conventional black-box attacks establish a source model as the surrogate, for which the inputs are paired with the labels generated by the target model instead of the ground-truth labels; in this way, the source model mimics the behavior of the target model. Our proposed DR attack, on the other hand, does not rely on the labeling system or a task-specific loss function, since DR only accesses the surrogate model up to the attacked layer. Although a source model is still required, there is no need for training with new target models or querying the target model for labels. Instead, a pre-trained public model can simply serve as the source model, owing to the strong transferability of the proposed DR attack. As shown in Fig. 1, the DR attack reduces the contrast of an internal feature map, by reducing its dispersion, so that the information in the feature map becomes indistinguishable, and the following layers are not able to extract any useful information, regardless of the computer vision task at hand. The adversarial example shown in the second column of Fig. 1 was generated by attacking (reducing the dispersion of) the conv3-3 layer of the VGG16 surrogate model. This also distorts the feature maps of the subsequent layers (e.g. conv5-3). As can be seen, compared to the feature maps of the original image, the standard deviations of the feature maps for the adversarial image are lower after the attacked layer.
Moreover, we have analyzed the effect of attacking different convolutional layers of the VGG16 network with the proposed DR attack on the PASCAL VOC2012 validation set [18]. Fig. 2a shows the mAP value for Yolov3 and Faster RCNN, and the mIoU for Deeplabv3 and FCN. Fig. 2b plots the standard deviation values before and after the DR attack, together with the change. As can be seen, attacking the middle layers of VGG16 results in a higher drop in performance compared to attacking the top or bottom layers. At the same time, the change in the standard deviation is larger for the middle layers than for the top and bottom layers. We can infer that for the initial layers, the perturbation budget constrains how much the loss function can reduce the standard deviation, while for the layers near the output, the standard deviation is already relatively small and cannot be reduced much further. Based on this observation, we choose one of the middle layers as the target of the DR attack. More specifically, in our experiments, we attack conv3-3 for VGG16, the last layer of group A for Inception-v3, and the last layer of the 2nd group of bottlenecks (conv3-8-3) for ResNet-152.
The DR attack is defined as the following optimization problem:

min_{x_adv} g( f(x_adv)|_k ; θ ),  s.t. ||x_adv − x_real||_∞ ≤ ε,

where f(·) is a deep neural network classifier, θ denotes the network parameters, and g(·) computes the dispersion. As shown in Algorithm 1, we use the standard deviation as the dispersion metric g(·) due to its simplicity. Given any feature map, DR iteratively adds perturbation to x_real along the direction of decreasing standard deviation, and maps the result to the vicinity of x_real by clipping at x ± ε. Denoting the feature map at layer k as F_k = f(x_adv(t))|_k, the DR attack performs the update

x_adv(t+1) = clip_{x_real, ε}( x_adv(t) − Adam(∇_x g(F_k), l) ).

The code is provided in [65].

Algorithm 1 Dispersion Reduction Attack
Input: classifier f, real image x_real, attack layer k, perturbation budget ε, number of iterations T, learning rate l
Output: adversarial example x_adv, s.t. ||x_adv − x_real||_∞ ≤ ε
1: procedure DispersionReduction
2:   x_adv(0) ← x_real
3:   for t = 0 to T − 1 do
4:     F_k ← f(x_adv(t))|_k
5:     Compute the standard deviation g(F_k)
6:     Compute the gradient ∇_x g(F_k)
7:     Update x_adv(t): x_adv(t+1) ← x_adv(t) − Adam(∇_x g(F_k), l)
8:     Project x_adv(t+1) to the ε-neighbourhood of x_real
9:   end for
10:  return x_adv(T)
11: end procedure
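A minimal numerical sketch of the dispersion reduction loop follows. A random linear map W stands in for the surrogate network truncated at layer k, a plain sign step replaces the Adam update of Algorithm 1, and all names are ours; this is an illustration of the technique, not the released implementation [65].

```python
import numpy as np

def dispersion(F):
    """Dispersion metric g(.): the standard deviation of the feature map."""
    return F.std()

def std_gradient(x, W):
    """Analytic gradient of std(W @ x) w.r.t. x for a linear 'layer' W:
    d std / d F_i = (F_i - mean(F)) / (n * std(F))."""
    F = W @ x
    s = F.std()
    if s == 0:
        return np.zeros_like(x)
    return W.T @ ((F - F.mean()) / (F.size * s))

def dr_attack(x_real, W, eps, lr, steps):
    """Dispersion Reduction: descend on the feature-map standard deviation
    and clip each iterate back into the eps-ball around x_real."""
    x_adv = x_real.copy()
    for _ in range(steps):
        x_adv = x_adv - lr * np.sign(std_gradient(x_adv, W))
        x_adv = np.clip(x_adv, x_real - eps, x_real + eps)
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))      # stands in for f(.)|_k
x_real = rng.normal(size=16)
x_adv = dr_attack(x_real, W, eps=0.5, lr=0.01, steps=200)
```

After the loop, the feature-map dispersion of the perturbed input is lower than that of the original, while the perturbation remains within the ε-ball, which is exactly the behavior Algorithm 1 targets.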

C. VICTIM ReID MODELS AND IMPLEMENTATION DETAILS OF ATTACKS
In order to evaluate the effectiveness of our proposed adversarial DR attack, we adapt it for the person ReID problem, and attack three different state-of-the-art person ReID approaches, namely DG-Net [1], AlignedReID [3] and PLR-OSNet [51]. For person re-identification, both DG-Net and AlignedReID use ResNet-50 [66] as the backbone model, while PLR-OSNet employs the Omni-Scale Network as the backbone. On the Market-1501 dataset [7], DG-Net reaches 94.8% rank-1 accuracy and 86.0% mean average precision (mAP), AlignedReID achieves 92.6% rank-1 accuracy and 82.3% mAP [67], and PLR-OSNet achieves 95.6% rank-1 accuracy and 88.9% mAP.
We used the pre-trained models for these ReID approaches, provided by the authors on their GitHub pages [68]-[70]. During training, the images are resized to 256×128, which is a strong baseline setting that achieves higher accuracy. We reduce the mini-batch size from 16 to 4 to save GPU memory on all models and all datasets. The learning rates for DG-Net, AlignedReID and PLR-OSNet are 0.0001, 0.0002 and 0.0003, respectively. All models use a decay rate of γ = 0.1, which reduces the learning rate by a factor of 1/10 after T steps during training. For DG-Net, T is set to 60000, and for PLR-OSNet to 20. More implementation details can be found in the source code provided by the authors [68]-[70].
For each dataset, the images are separated into training and testing folders. We follow the data preparation process described in [68], [69]. After pre-processing, we apply the TI-FGSM and TI-DIM attacks as described in [56], and detailed in the source code on Github page [71].
For our dispersion reduction (DR) attack, we first used the pre-trained ResNet-152 as the source model. The values of the parameters listed in Algorithm 1 are as follows: ε = 4, learning rate l = 0.05, and T = 100. The adversarial examples are generated on the test images, and used for testing on the victim ReID models. As mentioned above, both DG-Net and AlignedReID use ResNet-50 [66] as the backbone model. Thus, in order to generate the adversarial examples with different surrogate models, we also used VGG-16 and InceptionV3 as our source models. As discussed above, we used conv3-3 for VGG16, the last layer of group A for Inception-v3, and the last layer of the 2nd group of bottlenecks (conv3-8-3) for ResNet-152, as the attack layers. We also analyzed the effects of using different ε values; a detailed discussion is provided in the following section.

IV. EXPERIMENTS, RESULTS AND DISCUSSION
As mentioned above, we have used three state-of-the-art ReID methods as victim models, attacked them with the proposed DR attack, and evaluated the performance drop on four different datasets. Moreover, we attacked the same victim models with two other state-of-the-art attack approaches, namely TI-FGSM and TI-DIM [56], [71], and compared the effectiveness of our DR attack with these attack methods. Furthermore, we have used three different network models as the surrogate source model to evaluate and compare the performance drop and the attack effectiveness.

A. DATASETS
We have employed four challenging and commonly used benchmark datasets to demonstrate the effectiveness of the proposed attack. These datasets are Market-1501 [7], CUHK03 [37], DukeMTMC-ReID [72] and MSMT17 [73], which are briefly described below.

1) MARKET-1501
The Market-1501 dataset [7] contains 32,217 images of 1,501 labeled persons captured from six camera views. There are 751 identities in the training set and 750 identities in the testing set. In the original study that introduced this dataset, mAP is used as the evaluation criterion to compare algorithm performance.

2) CUHK03
The CUHK03 dataset [37] contains 8,765 images of 1,467 labeled persons. In this paper, we use a new protocol, in which the training set and test set have 767 and 700 identities, respectively. We use the detected bounding boxes instead of the labeled ones, which constitutes a more difficult evaluation protocol for CUHK03.

3) DukeMTMC-ReID
The DukeMTMC-ReID dataset [72] is composed of 36,411 images of 1,812 persons captured from eight cameras. There are 702 identities in the training set and 1,110 identities in the testing set. The evaluation criterion is mAP, the same as for the Market-1501 dataset.

4) MSMT17
MSMT17 [73] is the largest image-based person ReID dataset, introduced in 2018. It contains 124,069 labeled images of 4,101 person IDs captured from 12 different outdoor and indoor cameras. The evaluation protocol is the same as for the Market-1501 dataset, and also uses mAP.

B. EVALUATION METRIC
With the same image perturbation budget (ε = 4), we compare the performance of all the attack methods while attacking the victim ReID approaches. Mean average precision (mAP) is used as the evaluation metric; a lower mAP indicates a larger drop in ReID accuracy, and thus better attack performance. The effects of using different ε values are discussed in Section IV-D.
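For reference, mAP over ranked retrieval lists can be computed as below. This is a generic sketch of the metric, not the evaluation code shipped with the benchmarks, and the function names are ours.

```python
import numpy as np

def average_precision(relevance):
    """AP for one query: `relevance` is the binary relevance of the
    ranked gallery list (1 = same identity as the query)."""
    rel = np.asarray(relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(rel.size) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_query_relevance):
    """mAP: mean of the per-query average precisions."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))
```

For example, `average_precision([1, 0, 1])` averages the precision at the two relevant ranks, (1/1 + 2/3) / 2 = 5/6, and mAP averages these values over all queries.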

C. RESULTS AND DISCUSSION
In the first set of experiments, we used ResNet-152 as the source model for the attacks. The results are summarized in Table 1, wherein the first three rows show the mAP values for the baseline victim models, namely DG-Net, AlignedReID and PLR-OSNet, on the different benchmark datasets. The mAP values for DG-Net are 86.0%, 61.1%, 74.8% and 52.3%, and the mAP values for AlignedReID are 82.3%, 70.7%, 82.8% and 43.7%, for the Market1501, CUHK03, DukeMTMC-ReID and MSMT17 datasets, respectively. For PLR-OSNet, the mAP values are 88.9%, 77.2% and 81.2% for the Market1501, CUHK03 and DukeMTMC-ReID datasets, respectively. These three models are regarded as state-of-the-art ReID approaches based on their performance. The fourth to sixth rows in Table 1 show the mAP values after the victim models are attacked with TI-FGSM, which is a state-of-the-art attack method. The next three rows of Table 1 show the mAP values after the victim models are attacked with TI-DIM, another state-of-the-art attack method. Compared to TI-FGSM, this attack is more effective, since it causes larger drops in the mAP values for all four datasets. For instance, for the CUHK03 dataset, the mAP value of DG-Net drops by 46.9 from 61.1 to 14.2, the mAP value of AlignedReID drops by 54.2 from 70.7 to 16.5, and the mAP value of PLR-OSNet drops by 58.1 from 77.2 to 19.1. The last three rows of Table 1 show the mAP values after the victim models are attacked with the proposed DR approach. As can be seen, our proposed approach is the most effective attack compared to TI-FGSM and TI-DIM, and causes the largest drop in the mAP values for all victim models and all four datasets. For instance, for the CUHK03 dataset, the mAP value of DG-Net drops by 53.3 from 61.1 to 7.8, the mAP value of AlignedReID drops by 62.4 from 70.7 to only 8.3, and the mAP value of PLR-OSNet drops by 67.7 from 77.2 to only 9.5. Fig. 3 shows some example images and query results for the Market-1501 dataset.
The first column shows the query images, and columns 2 through 11 show the Rank 1 to Rank 10 returned images for that query, respectively. The first and third rows are for the original query images, while the second and fourth rows are for the adversarial query images. The perturbations between the query images of first versus second row and third versus fourth row are imperceptible to the human eye, but the person ReID performances have been significantly impacted by the proposed attack. Similar results for CUHK03 and DukeMTMC-ReID datasets are shown in Fig. 4 and Fig. 5, respectively. We report the overall results for the MSMT17 dataset in Table 1, and are not able to provide example images due to the release agreement.
The examples in Figures 3, 4 and 5 show the effectiveness of the proposed DR attack. In these figures, the adversarial perturbations, although imperceptible to the human eye, result in no matches even in the Rank 10 returns. As a quantitative measure, we computed the peak signal-to-noise ratio (PSNR) as well as the structural similarity index measure (SSIM) between the adversarial images (generated by TI-FGSM, TI-DIM and the proposed DR attack) and the original images, and calculated the averages on the Market-1501 dataset. The average SSIM values are 0.70, 0.72 and 0.72 for the TI-FGSM, TI-DIM and DR attacks, respectively. The average PSNR values are 26, 28 and 27 for the TI-FGSM, TI-DIM and DR attacks, respectively. Since the perturbation budget is kept the same (ε = 4) for all the attack methods, their average SSIM and PSNR values are similar. Some example adversarial images generated by these attacks are shown in Fig. 6 for qualitative comparison.
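PSNR between an original image and its adversarial copy can be computed with the standard formula below (SSIM is more involved and is typically taken from a library such as scikit-image); this sketch and its function name are ours.

```python
import numpy as np

def psnr(original, adversarial, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images:
    10 * log10(max_val^2 / MSE)."""
    diff = np.asarray(original, float) - np.asarray(adversarial, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

img = np.full((4, 4), 100.0)
val = psnr(img, img + 4.0)  # uniform shift of 4 on a 0-255 scale
```

A uniform perturbation of magnitude 4 gives 20·log10(255/4) dB; real attack perturbations are not uniform, so measured values differ.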
In the second set of experiments, we used two other network models, namely VGG-16 and InceptionV3, as our surrogate source models. The goal here was to use network models other than ResNet to generate adversarial examples, and thus show the generalizability of the proposed DR approach. We generated AEs with these different source networks using both the proposed DR approach and TI-DIM, and then used the AEs to attack DG-Net and AlignedReID. In this experiment, we chose TI-DIM, since it has better attack performance than TI-FGSM based on Table 1. The results obtained with our proposed DR attack are summarized in Tables 2 and 3 for the cases where the victim ReID method is AlignedReID and DG-Net, respectively. As can be seen, using ResNet-152 as the surrogate model results in the highest drop in the mAP values. This is mostly because most of the ReID approaches use ResNet as their backbone network. However, even when we use VGG-16 or InceptionV3 as the surrogate source model, the proposed DR attack still causes a significantly larger drop in the mAP values than the state-of-the-art attacks (please see Tables 1, 2 and 3).
The results obtained with TI-DIM, using different surrogate models, are summarized in Tables 4 and 5, for the cases where the victim ReID method is AlignedReID and DG-Net, respectively. Comparing Table 2 with Table 4, and Table 3 with Table 5, it can be seen that the proposed DR attack still outperforms TI-DIM as a black-box attack even when the surrogate model is different from the target model.

TABLE 3. mAP values on different datasets when DG-Net is attacked with the proposed DR approach. First row is the performance before attack. Last three rows show the results when AEs are generated by using different network models as the surrogate models.

D. EFFECT OF ε ON THE PERFORMANCE
In the literature, it is common practice to fix the value of ε and then compare the performance degradation across different attack methods. In the experiments above, we set ε = 4, since it results in less change in the original image and better demonstrates the differences between the attack methods. When ε is increased, each attack method is given more budget to change the original images, and the methods start to provide similar performance. A better attack should be able to provide more performance degradation with a smaller budget. As shown in Table 6 and Fig. 7, our proposed DR attack can reach a given attack effectiveness by using the least budget. For instance, the proposed DR attack drops the mAP value of DG-Net to 20.3 with an ε budget of 8, whereas TI-DIM needs a budget of 12 to drop the mAP to 21.8.

TABLE 4. mAP values on different datasets when AlignedReID is attacked with the TI-DIM. First row is the performance before attack. Last three rows show the results when AEs are generated by using different network models as the surrogate models.

V. CONCLUSION
Neural network-based methods have achieved state-of-the-art performance on the person re-identification problem across different camera views. In this paper, we have presented an effective black-box attack model, which is based on dispersion reduction and does not rely on task-specific loss functions or label queries. We have used the adversarial examples generated by this approach to attack three different state-of-the-art person ReID models, and have also compared the performance of our attack approach with two other state-of-the-art attack models. The results demonstrate the effectiveness and generalizability of the proposed dispersion reduction attack on three state-of-the-art person ReID models. Our attack also outperforms the other state-of-the-art attack models by a large margin, and results in the largest drop in the mean average precision values.

ACKNOWLEDGMENT
The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.