Backdoor Attacks to Deep Learning Models and Countermeasures: A Survey

Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years. In backdoor attacks, the attackers try to plant hidden backdoors into DNN models, either in the training or inference stage, to mislead the output of the model when the input contains some specified triggers, without affecting the prediction on normal inputs that do not contain the triggers. As a rapidly developing topic, numerous works on designing various backdoor attacks and developing techniques to defend against such attacks have been proposed in recent years. However, a comprehensive and holistic overview of backdoor attacks and countermeasures is still missing. In this paper, we provide a systematic overview of the design of backdoor attacks and of the defense strategies against them, covering the latest published works. We review representative backdoor attacks and defense strategies in both the computer vision domain and other domains, discuss their pros and cons, and make comparisons among them. We outline key challenges to be addressed and potential research directions in the future.


I. INTRODUCTION
Deep learning models have been widely used in many fields in recent years due to their superior performance in solving complex problems by automatically extracting task-related features. They have achieved performance similar to or even higher than that of human beings, leading to their popular adoption in a broad range of fields including network security [1], computer vision [2], autonomous driving [3], natural language processing [4], malware detection, and stock market analysis. Emerging fields [5] such as brain circuit reconstruction [6], particle accelerator data analysis [7], multi-modality learning [8], and the analysis of DNA mutation effects [9] are also exploiting deep learning.
With the rapid application of deep learning, the security issues of deep learning models have emerged and gained increasing attention. The earliest security problems of deep learning models appeared in malicious email detection systems, where the attackers tried to change the decision boundary by data poisoning [10]. Recently, the security problems faced by deep learning models have become increasingly prominent, with many novel types of attacks emerging, including adversarial sample attacks, membership inference attacks, and backdoor attacks. In adversarial sample attacks, the attackers generate adversarial samples [11] by adding perturbations to samples that are imperceptible to human beings but mislead the output of deep learning models. In membership inference attacks [12], the attackers try to infer whether a specified record was in the training set of a deep learning model. In backdoor attacks [13], the attackers try to embed backdoors into the deep learning model that are activated by some triggers in the input. Different types of attacks require different levels of access to the model and the training data, but they generally cause serious security threats to the usage of deep learning models in both the cyber domain and the physical domain.
Among all the deep learning security issues, backdoor attacks have been a hot research topic since they were first proposed in 2017 [13], and they remain a significant potential threat. In backdoor attacks, the attacker plants backdoors in the target deep model, either by poisoning data in the training stage or by modifying the parameters in the deployment stage, to mislead the outputs of the model using some specified triggers. The backdoored model performs normally on inputs without triggers but outputs incorrect labels for inputs containing the triggers. Considering that many users delegate the training and deployment of deep models to third parties, backdoor attacks are easy to launch and thus pose significant threats to users without expert knowledge of training deep learning models. Compared to other attacks, backdoor attacks are more stealthy and difficult to defeat. Fig. 1 plots the number of published papers related to backdoor attacks in recent years; the data are retrieved from the Web of Science. The number of publications continues to increase rapidly, indicating that backdoor attacks are receiving increasing attention.
While backdoor attacks are one of the main threats to deep learning security, a comprehensive overview of backdoor attacks and the corresponding defense strategies is still missing; in contrast, surveys of adversarial sample attacks [14] are relatively complete. Li et al. [15] provide the first systematic overview, but the latest studies are not included, making it difficult for readers to learn the latest trends in backdoor attacks. Gao et al. [16] present backdoor attack strategies at different stages of the deep learning pipeline; however, the attacks are not classified according to the characteristics of the attack methods, which makes it harder for readers to understand the characteristics of existing strategies. Li et al. [17] collected much information about backdoor attacks, but the principles of representative works were not introduced in detail. In this paper, we present a systematic introduction to backdoor attacks, review their development and current status, introduce the most representative attack and defense methods, and give classifications.
The paper is organized as follows. In Section II, we briefly introduce backdoor attacks. The evaluation parameters and related datasets are described in Section III. In Section IV, we propose our classification of backdoor attack strategies, including the representative methods and their principles. In Section V, we give a classification of defense strategies and introduce representative defense strategies. Then, we introduce backdoor attacks in other fields in Section VI. Finally, we outline future research directions in Section VII and conclude in Section VIII.

II. PRELIMINARIES
Backdoor attacks refer to attack methods that implant backdoors into a model. When a backdoor in the model is triggered, it misleads the model into outputting an attacker-specified label. In this section, we introduce the definition, evaluation metrics, and commonly used datasets of backdoor attacks and compare them with other attack methods.

A. BACKDOOR ATTACK IN DEEP LEARNING MODEL
The training of deep learning models consists of multiple steps, such as data collection, data preprocessing, model selection and construction, training, model saving, and model deployment, which gives attackers many opportunities to intervene. Furthermore, the power of deep learning models relies heavily on large amounts of training data and computing resources. To reduce training costs, an increasing number of users choose to use third-party datasets or train models on third-party platforms; users may even directly use third-party backbone networks or pre-trained models. Although these options provide convenience, users lose control over the training phase, increasing the risk of models being attacked. Most backdoor attacks occur in the training phase of deep learning, in which the neural network takes in data and continuously adjusts its weights according to the data and the corresponding labels, i.e., the process of generating the model. The attacker modifies the training set by inserting triggers into data to generate poisoned samples, trains the model with the poisoned training set, and makes the model memorize the triggers, thereby embedding backdoors in the model. In the inference phase, a backdoored model outputs the attacker's target label when processing data containing triggers.

B. DEFINITION OF BACKDOOR ATTACK
A backdoor attack is an attack against a deep learning model. The attacker implants backdoors into the model in some way during the model training process. When the backdoor is not activated, the attacked model performs the same as a normal model; when the backdoor embedded in the model is activated, the output of the model becomes a label designated by the attacker in advance, achieving the attacker's malicious purpose.

C. THREAT MODEL OF BACKDOOR ATTACK
Consider backdoor attacks on image classification f: X→Y, where X is an image domain and Y is a set of classes. The adversary aims to produce a backdoored model M that exhibits the desired behavior. Let B(·) be the backdoor image transformation function and T(·) the target label function; the backdoored model should satisfy

M(x) = f(x),   M(B(x)) = T(x),   (1)

i.e., M behaves like the clean classifier f on benign inputs x but outputs the attacker-chosen label T(x) on triggered inputs B(x). Fig. 2 shows the life cycle of deep learning and the different threats at each stage. The data poisoning attack [10] is the first security threat faced by deep learning; it attacks the data collection and preprocessing stages. The goal of this attack is to degrade the accuracy of model inference. Note that what is affected here is the overall accuracy of the model, meaning that the overall performance of the model decreases; data poisoning is therefore also known as an availability attack. Backdoor attacks can also be implemented through data poisoning, but they should not noticeably degrade the accuracy on benign samples in the inference stage; the model misbehaves only on samples containing triggers, and the erroneous result is controlled by the attacker. Adversarial examples [11] are created by deliberately adding imperceptible perturbations to input examples to make the model give an incorrect output with high confidence. Although adversarial examples and backdoor attacks look similar, they are essentially different. The most direct difference is that adversarial examples attack the inference stage after the model is deployed, while a backdoor attack starts from the data collection stage and runs through the entire life cycle of deep learning. Furthermore, adversarial examples are transferable between different models performing the same task, which is not a property of triggers in backdoor attacks.

D. COMPARISON OF BACKDOOR ATTACKS AND OTHER ATTACKS
The Universal Adversarial Patch (UAP) [18] can be considered a special form of adversarial example. An adversarial example is a specific perturbation generated for each example, while a UAP is a single well-crafted perturbation that works for any example. This looks similar to the trigger in a backdoor attack. However, triggers can be arbitrary, while UAPs cannot be constructed arbitrarily: what a UAP looks like depends on the model, whereas what a trigger looks like is completely controlled by the attacker.

III. EVALUATION PARAMETERS AND DATASETS

A. EVALUATION PARAMETERS
There are many ways to measure backdoor attacks, and different attack methods focus on different indicators. The attack success rate (ASR) and the clean data accuracy (CDA) are the two most common metrics. The ASR is the rate at which the model classifies poisoned samples as the target label; it represents the strength of the backdoor and the trigger. The CDA is the accuracy of the backdoored model on normal data; it represents the impact of the backdoor on the model. A qualified backdoor attack scheme should achieve a high attack success rate while minimizing the impact on the model itself. The capability of backdoor attacks to evade detection is also an important indicator.
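To make the two metrics concrete, the following is a minimal sketch of how ASR and CDA are typically computed; `model`, the data loader, `apply_trigger`, and `target_label` are hypothetical placeholders rather than names from any particular paper.

```python
# Sketch: computing clean data accuracy (CDA) and attack success rate (ASR).
import torch

@torch.no_grad()
def evaluate_cda_asr(model, loader, apply_trigger, target_label, device="cpu"):
    model.eval().to(device)
    clean_correct = total = attack_success = non_target = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # CDA: accuracy of the (possibly backdoored) model on benign inputs.
        clean_correct += (model(x).argmax(1) == y).sum().item()
        total += y.numel()
        # ASR: fraction of triggered inputs classified as the target label.
        # Samples already belonging to the target class are usually excluded.
        mask = y != target_label
        if mask.any():
            preds = model(apply_trigger(x[mask])).argmax(1)
            attack_success += (preds == target_label).sum().item()
            non_target += mask.sum().item()
    return clean_correct / total, attack_success / max(non_target, 1)
```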

B. DATASETS FOR BACKDOOR ATTACKS
Existing work on backdoor attacks mainly focuses on computer vision; Table 1 summarizes the datasets related to backdoor attacks. Attackers first added triggers to the MNIST dataset [19] to insert backdoors into models. MNIST is a dataset of handwritten digits, and handwritten digit recognition is a classic example in deep learning.
At present, image recognition is the domain most widely used in backdoor attack research. The most typical datasets are CIFAR [20] and ImageNet [21]. CIFAR [20] is a widely used standard dataset containing plenty of real-world objects, whose images not only contain considerable noise but also vary in proportion and characteristics. ImageNet [21] is a large-scale visual database for visual object recognition research. These two datasets contain rich data types and are suitable for multiple scenarios.
Considering the threat of backdoor attacks in real-life scenarios, the academic community has also explored datasets related to autonomous driving and face recognition. One of the most commonly used is the German Traffic Sign Recognition Benchmark (GTSRB) [22]. The YouTube Faces [23] dataset, with all videos downloaded from YouTube, is also among the most used. The PubFig [24] dataset is a real-world face dataset; unlike most other existing face datasets, its images are taken in different scenarios, and it is mainly used for face recognition in unconstrained settings. VGGFace [25] is a face recognition dataset covering a wide range of poses, ages, and races, in which noise in the images is minimized as much as possible. The LFW (Labeled Faces in the Wild) [26] database is used to study face recognition in unconstrained situations. Its face images are derived from natural scenes; factors such as pose, lighting, expression, age, and occlusion can make even photos of the same person look very different, making LFW more difficult to recognize than other datasets.

IV. ATTACK STRATEGIES OF BACKDOOR ATTACK
In this section, we categorize backdoor attacks according to their characteristics and give a systematic introduction. We categorize them based on the visibility of the trigger, the type of trigger (digital or physical), whether the poisoned data labels are changed, whom the attack targets, and whether it targets the training process, as shown in Fig. 3. Note that there is some crossover between these categories. Table 2 shows the characteristics and corresponding literature of the different backdoor attack strategies.

A. VISIBLE BACKDOOR ATTACK
The backdoor attack was first proposed by Gu et al. [13]. They designed a backdoor attack that poisons the training dataset, using changed pixel values in the lower-right corner of the image as the trigger. Specifically, in the data processing stage, samples are randomly selected from the training dataset, the pixel values at a fixed position are changed to add the trigger, and the backdoored versions of these images are added to the training dataset. The ground-truth label of each backdoored image is set according to the attacker's goal. Then, the baseline MNIST DNN is retrained on the poisoned training dataset. The experiments also found that in some attack instances, the training parameters, including step size and mini-batch size, had to be changed for the training error to converge and the attack to succeed. This method, called BadNets and shown in Fig. 4, demonstrates the feasibility and effectiveness of backdoor attacks.
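The following is a minimal sketch of this kind of poisoning, assuming 28x28 grayscale image tensors in [0, 1]; the trigger size, its position, and the poisoning rate are illustrative choices, not the exact BadNets settings.

```python
# Sketch: BadNets-style data poisoning with a pixel-patch trigger.
import torch

def add_patch_trigger(x: torch.Tensor, size: int = 3) -> torch.Tensor:
    """Set a small square in the lower-right corner to white as the trigger."""
    x = x.clone()
    x[..., -size:, -size:] = 1.0
    return x

def poison_dataset(images, labels, target_label, rate=0.1):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    idx = torch.randperm(images.size(0))[: int(rate * images.size(0))]
    images, labels = images.clone(), labels.clone()
    images[idx] = add_patch_trigger(images[idx])
    labels[idx] = target_label  # poison-label attack: labels are changed
    return images, labels
```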
However, the triggers in BadNets are too obvious; they are easily detected by the human eye, the backdoored samples are excluded, and the attack fails. To improve the success rate of backdoor attacks, it has been proposed to use more natural triggers to reduce the probability of detection. Liu et al. [31] adopted the common natural reflection phenomenon as a trigger. Triggers of this kind not only easily escape detection by the human eye but can also be activated in real scenes, which gives the method strong practical significance. The trigger can be expressed as x_poison = x + x_R ⊗ k, where x is the original image, x_R is the reflection image, k is a convolution kernel, and x_R ⊗ k is called the reflection. According to the principle of camera imaging and the law of reflection, there are three situations for the relative positions of the main object behind the glass and the reflected virtual image; thus, the reflection models in physical-world scenes can be divided into three categories. The reflected image patterns differ across scenes, and the convolution kernel k takes values according to the scene when constructing the poisoned image.
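A rough sketch of the reflection idea for a single grayscale image is shown below; the Gaussian kernel and the blending weight are illustrative assumptions, not the paper's exact kernels.

```python
# Sketch: reflection-style trigger x_poison = x + (x_R convolved with k).
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size: int = 11, sigma: float = 3.0) -> np.ndarray:
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def reflection_poison(x: np.ndarray, x_r: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend a blurred version of reflection image x_r into clean image x (H, W in [0, 1])."""
    reflection = convolve2d(x_r, gaussian_kernel(), mode="same", boundary="symm")
    return np.clip(x + alpha * reflection, 0.0, 1.0)
```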

B. INVISIBLE BACKDOOR ATTACK
Although these backdoor attacks achieve good attack results, the triggers are still visible to humans and can be recognized by the human eye. Furthermore, making a trigger look natural is challenging, and most works do not achieve ideal evasion rates against human inspection. Some scholars therefore proposed invisible-trigger attack strategies to evade inspection by the human eye.
Chen et al. [32] proposed the first invisible backdoor attack strategy, in which triggers are blended into the picture like faint noise. The injection can be written as B(x) = α·t + (1−α)·x, where x is a vector representation of the input instance (for example, in a face recognition scenario, x can be a face image), t is the trigger pattern, and α controls the blend ratio. However, the added perturbation still affects the pixels of some images, making them noticeably different from the originals. To address this issue, Turner et al. [33] proposed an optimization scheme for backdoor attacks. Specifically, given a pre-trained classifier f_θ with loss L and an input-label pair (x, y), a perturbation of x bounded in the ℓ_p norm is constructed as x' = argmax_{‖x'−x‖_p ≤ ε} L(f_θ(x'), y), where ε is a small constant representing the magnitude of the perturbation; projected gradient descent (PGD) is used to solve this optimization problem. Subsequently, other works [34], [35], [36] also proposed methods to optimize backdoor triggers based on the ℓ_p norm. Zhao et al. [37] proposed measuring poisoned data through cross-entropy to produce adaptive imperceptible perturbations and limiting the latent representation of poisoned data during training, enhancing the stealth of attacks and their resistance to defenses. Xia et al. [38] verified that the distributions of multilevel representations of normally trained and backdoored models differ significantly, using the maximum mean discrepancy (MMD), energy distance (ED), and sliced Wasserstein distance (SWD) as metrics. They then proposed ML-MMDR, a difference-reduction method that adds multilevel MMD regularization to the loss, to optimize the poisoned data.
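A minimal sketch of the blended-injection function described above is given below; the trigger pattern t and the blend ratio alpha are the attacker's choices.

```python
# Sketch: blended trigger injection B(x) = alpha * t + (1 - alpha) * x.
import torch

def blend_trigger(x: torch.Tensor, t: torch.Tensor, alpha: float = 0.05) -> torch.Tensor:
    """Blend trigger pattern t into image x; a small alpha keeps the change subtle."""
    return ((1 - alpha) * x + alpha * t).clamp(0.0, 1.0)
```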
Similar to the idea of adding a perturbation as a trigger, Nguyen et al. [39] proposed warping-based triggers, called WaNet. They use a warping function W and a predefined warping field M to construct the injection function B(x) = W(x, M): a slight geometric deformation of the image serves as the trigger.
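A sketch of a WaNet-style warping trigger is shown below; the original method derives the warping field from an upsampled control grid, which is simplified here to a fixed random flow field.

```python
# Sketch: warping-based trigger B(x) = W(x, M) via bilinear resampling.
import torch
import torch.nn.functional as F

def make_warp_field(h: int, w: int, strength: float = 0.05, seed: int = 0):
    g = torch.Generator().manual_seed(seed)
    return (torch.rand(1, h, w, 2, generator=g) * 2 - 1) * strength  # fixed field M

def warp_trigger(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """x: (N, C, H, W) in [0, 1]; applies a subtle elastic deformation as the trigger."""
    n, _, h, w = x.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, H, W, 2)
    grid = (identity + flow).clamp(-1, 1).expand(n, -1, -1, -1)
    return F.grid_sample(x, grid, align_corners=True)
```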
Li et al. [40] adopted DNN-based image steganography to generate invisible backdoor triggers; Fig. 5 shows the process of this strategy. They convert triggers into global noise through steganography, embed the noise into pictures as triggers, and perform optimization iterations on the poisoned images. This method achieves promising experimental results and can bypass most existing backdoor defenses because its trigger pattern is sample-specific. Zhang et al. [41] proposed embedding trigger patterns in the edge structure of poisoned images; edges belong to high-frequency components and are not easily destroyed by data preprocessing. This method is called Poison Ink.
Wang et al. [42] proposed a stealthy and efficient Trojan attack, BPPATTACK, which uses image quantization and dithering as the Trojan trigger and injects the Trojan using a method based on contrastive learning and adversarial training. The trigger generation consists of two steps. First, the original palette of the image is compressed into a smaller one by reducing the color depth. Then, image dithering is applied, removing obvious artifacts by exploiting the image's existing colors and thereby enhancing concealment.
Most current work generates poisoned samples in the pixel domain; Hammoud and Ghanem [43] successfully generated invisible trigger patterns in the frequency domain. After training the network naturally, they generate a Fourier heatmap of the model and analyze the sensitivity of the DNN to input perturbations through it. After obtaining the sensitive frequencies of the network, the top k most sensitive frequencies are selected as a poisoning filter, which is then used to poison a subset of the training dataset. The process is shown in Fig. 6. It should be noted that invisible triggers, while reducing the likelihood of detection by the human eye, have fewer application scenarios.

C. CLEAN-LABEL BACKDOOR ATTACK
Although poisoned images are similar to benign images in invisible attacks, their source labels often differ from the target labels; that is, poisoned samples carry wrong labels instead of their original correct ones. Therefore, invisible attacks can still be detected by examining the image-label relationships of training samples. To address this issue, a special subclass of invisible poisoning-based attacks was proposed: clean-label invisible attacks. Turner et al. [33] first studied the clean-label attack, exploiting an adversarial perturbation or a generative model to first modify some benign images of the target class, followed by a standard stealthy trigger-injection step. The modification aims to suppress the "robust features" contained in poisoned samples to ensure that the DNN successfully learns the trigger. More recently, Zhao et al. [44] extended this idea by adopting a universal perturbation instead of a given perturbation as the trigger pattern. Souri et al. [45] proposed a new backdoor attack that minimizes distance in the feature space, injecting the information of poisoned samples generated by previous visible attacks into the texture of target-class images. As shown in Fig. 7, given a target image t, a source image s, and a trigger patch p, the trigger is pasted onto s to obtain the patched source image. The poisoned image z is then optimized by solving min_z ‖F(z) − F(s + p)‖² subject to ‖z − t‖_∞ ≤ ε, where F denotes the network's feature extractor. However, compared to poison-label attacks, clean-label attacks are relatively weak, have a lower attack success rate, and require a higher injection rate.
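The feature-collision step described above can be sketched as follows; `feature_extractor` stands for the victim network's penultimate layer, and the step count, learning rate, and eps budget are illustrative.

```python
# Sketch: optimize a poisoned image z close to target image t whose features
# match those of the patched source image (s with trigger patch p applied).
import torch

def craft_poison(feature_extractor, t, patched_s, eps=16 / 255, steps=200, lr=0.01):
    z = t.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    f_src = feature_extractor(patched_s).detach()   # target features to collide with
    for _ in range(steps):
        opt.zero_grad()
        loss = (feature_extractor(z) - f_src).pow(2).sum()  # feature-space distance
        loss.backward()
        opt.step()
        with torch.no_grad():  # project back into the eps-ball around t
            z.copy_((t + (z - t).clamp(-eps, eps)).clamp(0.0, 1.0))
    return z.detach()
```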

D. PHYSICAL BACKDOOR ATTACK
A backdoor attack requires the input image to contain a trigger, but this is nearly impossible in many real-world scenarios: after the camera captures an image, it is sent directly to the model on the central server without passing through a third party. This greatly limits the application scenarios of backdoor attacks. To solve this problem, researchers began to explore the possibility of backdoor attacks in the physical world. This possibility was first explored by Chen et al. [32], who employed a pair of glasses as a physical trigger to mislead an infected face recognition system. Wenger et al. [46] further explored attacks on face recognition in the physical world (Fig. 8 shows the trigger settings), noting that the success rate of backdoor attacks is related to the strength of the trigger.
Bagdasaryan et al. [47], [48] proposed a novel physical-world backdoor attack that uses physical attributes of the object itself as a trigger, assigning an attacker-chosen label to all training images with a specific feature (such as a green car or a car with racing stripes) to create a semantic backdoor in the DNN. A similar idea is explored in [49], where hidden backdoors can be activated by a combination of certain objects in the image. Physical-world triggers do not modify the picture, so their features are not as easily learned by the neural network as triggers in the digital space. In this regard, Xue et al. [50] proposed performing a series of physical transformations on the images to augment physical-world triggers.
However, these "naturally" existing triggers in images are less strong and require more poisoned data for training than digital backdoor attacks, which have a higher training cost.

E. MODEL-BASED BACKDOOR ATTACK
Backdoor attacks are usually implanted by training the model on poisoned images. However, if the attacker does not have access to the dataset, can the attack still be performed? Considering that a "backdoor" is a change in the weights of neurons in the model, it has been proposed to modify the model directly to implant the backdoor. Dumford et al. [51] proposed a weight-oriented attack, employing a greedy search over models that applies different perturbations to the pretrained weights. As shown in Fig. 9, this changes the model's weights but not its structure. Later, Rakin et al. [52] introduced a bit-level weight-oriented backdoor attack, the targeted bit Trojan (TBT), which flips key weight bits stored in memory. Furthermore, Garg et al. [53] proposed adding adversarial perturbations to the model parameters of benign models to inject backdoors, revealing new security threats in using publicly available trained models. Recently, Zhang et al. [54] formulated the preservation of benign-sample accuracy as the consistency of infected models and provided a theoretical explanation of adversarial weight perturbation (AWP) in backdoor attacks. Based on this analysis, they also introduced a new AWP-based backdoor attack with better global and instance consistency.
Tang et al. [55] proposed inserting a trained malicious backdoor module into the target model, directly modifying the structure of the model to embed a hidden backdoor. This attack is simple and effective, and the malicious module can be combined with any DNN. The process is shown in Fig. 10. Qi et al. [56] proposed performing the backdoor attack by directly replacing the corresponding neurons of the benign model instead of adding a narrow subnet. Based on this idea, Wang et al. [57] proposed a similar replacement approach.

F. SEQUENCE-BASED BACKDOOR ATTACK
Recently, Shumailov et al. [58] proposed injecting hidden backdoors by manipulating the order of training samples without changing the samples themselves. This method does not directly change the image or the model structure, making it more difficult to detect than other methods.

V. DEFENSE STRATEGIES OF BACKDOOR ATTACKS
To address the threat of backdoor attacks, various defense strategies have been proposed. The goal of defense can be divided into two types: 1. disabling the trigger; 2. eliminating the backdoor in the model. Depending on the target of the defense strategy, there are three types of defense against backdoor attacks: data-based defense, model-based defense, and trigger-based defense, as shown in Fig. 11. The defender's authority varies across scenarios, as does the chosen defense strategy. Table 3 shows the mechanisms and corresponding literature of the different backdoor defense strategies.

A. DATASET-BASED DEFENSE STRATEGIES
The reason most backdoors can be implanted and activated is that the model is exposed to poisoned data, so removing poisoned data from the training set has become an important research direction for backdoor defense.
Liu et al. [59] proposed the first preprocessing-based backdoor defense, which introduces a preprocessing module before feeding samples into the DNN to alter the trigger patterns contained in attacked samples, so that the modified triggers no longer match the hidden backdoor and backdoor activation is prevented. They adopted a pretrained autoencoder as the preprocessor. Inspired by the idea that trigger regions contribute the most to the prediction, Doan et al. [60] proposed using GANs to process parts of images that may contain triggers and designed a two-stage image preprocessing method, Februus. In the first stage, Februus uses GradCAM [61] to identify influential regions, generating heatmaps that highlight the regions of the input contributing most to the learned features. In the second stage, a GAN-based inpainting method reconstructs the masked regions. Based on this idea, Udeshi et al. [62] designed a square trigger interceptor that uses the dominant color in the image to locate and remove backdoor triggers.
However, preprocessing does not work well against highly robust triggers, and it must process every image in the dataset, which is computationally costly. For this reason, filtering the dataset has been proposed to remove poisoned data: after filtering, only benign samples or purified poisoned samples are used during training or testing, eliminating the backdoor at its source. Tran et al. [63] first demonstrated that poisoned samples tend to leave detectable traces in the covariance spectrum of their feature representations, which can be used to filter poisoned samples from the training set. Additionally, inspired by the idea that poisoned and benign samples should differ in the hidden feature space, Chen et al. [64] proposed a two-stage filtering method. In the first stage, the activations of the training samples in each class are clustered into two clusters. In the second stage, the method determines which cluster corresponds to the poisoned samples; for image datasets, an image sprite is constructed for each cluster by averaging the images that activate it.
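A sketch of the two-cluster activation filtering idea is given below; the PCA dimension and the cluster-size threshold are illustrative assumptions.

```python
# Sketch: per-class activation clustering to flag suspicious training samples.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def suspicious_indices(activations: np.ndarray, size_ratio: float = 0.35) -> np.ndarray:
    """activations: (n_samples, n_features) hidden activations for ONE class."""
    reduced = PCA(n_components=10).fit_transform(activations)  # assumes >= 10 dims
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
    sizes = np.bincount(labels, minlength=2)
    minority = int(np.argmin(sizes))
    # A near 50/50 split suggests no poisoning; a small minority cluster is suspect.
    if sizes[minority] / sizes.sum() < size_ratio:
        return np.where(labels == minority)[0]
    return np.array([], dtype=int)
```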
Zeng et al. [65] found high-frequency artifacts in the poisoned samples of existing attacks. They transformed the images to the frequency domain using the discrete cosine transform (DCT), representing each image as a sum of cosine functions of different amplitudes and frequencies. The DCT spectra were plotted as heatmaps, where the value of each pixel represents the magnitude of the coefficient at the corresponding spatial frequency. Due to the energy-compaction property of the DCT, the magnitude of the coefficients decreases rapidly as the frequency increases, and most of the energy of a natural image is concentrated in the low-frequency part. Due to time-frequency duality, spatially local triggers are broadband and therefore carry significant high-frequency components of their own. Based on this observation, they devised a simple and effective filtering method exploiting these artifacts.
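The inspection can be sketched as follows: compute the 2-D DCT of an image and measure how much of the energy falls in the high-frequency band; the band cutoff used here is an illustrative choice.

```python
# Sketch: flag images whose DCT spectrum carries unusually high-frequency energy.
import numpy as np
from scipy.fft import dctn

def high_freq_energy_ratio(img: np.ndarray, cutoff: float = 0.5) -> float:
    """img: (H, W) grayscale array; returns the fraction of DCT energy above cutoff."""
    coeffs = dctn(img, norm="ortho")
    h, w = coeffs.shape
    # Frequencies whose normalized (row + column) index exceeds the cutoff.
    mask = np.add.outer(np.arange(h) / h, np.arange(w) / w) > cutoff
    energy = coeffs ** 2
    return float(energy[mask].sum() / energy.sum())
```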
Since most existing backdoor triggers are observed to be input-agnostic, Gao et al. [66] proposed filtering attacked samples by superimposing various image patterns on a suspicious sample: clean images from the validation set are sequentially superimposed on the input as watermarks to obtain multiple perturbed inputs, all of which are fed to the model to obtain predicted labels. If the entropy computed over these predictions is below a set threshold, the input is judged to contain a trigger, i.e., it is a poisoned input.
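A minimal sketch of this entropy test is shown below; the blend ratio and the decision threshold are choices of the defender.

```python
# Sketch: STRIP-style entropy test -- superimpose clean images on a suspicious
# input; a trigger that dominates predictions yields abnormally low entropy.
import torch
import torch.nn.functional as F

@torch.no_grad()
def prediction_entropy(model, x, clean_images, alpha: float = 0.5) -> float:
    """x: (C, H, W) suspicious input; clean_images: (N, C, H, W) benign samples."""
    blended = alpha * x.unsqueeze(0) + (1 - alpha) * clean_images
    probs = F.softmax(model(blended), dim=1)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)  # entropy per blend
    return ent.mean().item()  # below a set threshold => input judged poisoned
```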

B. MODEL-BASED DEFENSE STRATEGIES
However, in some cases, the user of the model does not have permission to access the data. In that case, detecting whether there is a backdoor requires inspecting the model directly; a model found to be abnormal can simply be refused deployment, preventing the backdoor threat. Kolouri et al. [67] first discussed how to diagnose a given model. They jointly optimized some universal litmus patterns (ULPs) and a meta-classifier, which is then used to diagnose suspicious models based on their predictions on the obtained ULPs. The intuition behind this idea is that the patterns learned by CNNs are essentially combinations of salient object features, and CNNs are almost invariant to the locations of these features. When the network is poisoned, it learns the trigger as a key feature of an object. During optimization, each ULP forms a collection of various triggers; therefore, when such a ULP is presented, a network trained with triggers will respond positively with high probability.
Huang et al. [68] observed that heatmaps of benign and infected models have different characteristics. Based on three features extracted from the generated saliency maps (sparseness, smoothness, and persistence), they adopted an outlier detector as a meta-classifier.
Although refusing to deploy backdoored models can reduce the threat of backdoor attacks, it also brings many inconveniences. In many cases, model users obtain public models from third parties, and obtaining a new model takes considerable time and energy. In this situation, how to safely complete the recognition task with the existing model has become a research focus. For this reason, it has been proposed to modify the backdoored model to remove the backdoor and make it behave like a normal model. Liu et al. [59] proposed retraining the possibly backdoored model on some locally available benign samples so that the model forgets the backdoor, based on the catastrophic forgetting of DNNs [69]. This idea is further explored by Zeng et al. [70], who formulate retraining as a minimax problem and employ implicit hypergradients to account for the interdependence between the inner and outer optimizations.
The essence of the backdoor is that some neurons in the model are activated by triggers, while these neurons usually remain dormant when the model predicts benign samples. Based on this, Liu et al. [71] proposed fine-pruning: the DNN is first pruned to remove infected neurons, and the pruned network is then fine-tuned, combining the advantages of pruning and fine-tuning defenses to remove hidden backdoors. In addition, Zheng et al. [72] found that benign and infected DNNs differ significantly in topology, which can be used to diagnose suspicious models.
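A sketch of the pruning half of fine-pruning is given below; the layer choice and pruning fraction are illustrative, and fine-tuning on clean data would follow.

```python
# Sketch: zero out the channels of a conv layer that stay dormant on clean data.
import torch

@torch.no_grad()
def prune_dormant_channels(conv: torch.nn.Conv2d, clean_acts: torch.Tensor, frac=0.2):
    """clean_acts: (N, C, H, W) activations of this layer on benign samples."""
    mean_act = clean_acts.abs().mean(dim=(0, 2, 3))  # per-channel activity level
    k = int(frac * mean_act.numel())
    dormant = mean_act.argsort()[:k]                 # least active = backdoor suspects
    conv.weight[dormant] = 0.0
    if conv.bias is not None:
        conv.bias[dormant] = 0.0
    return dormant
```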
Li et al. [73] proposed a new defense framework called neural attention distillation (NAD) for removing backdoor triggers from backdoored DNNs. Unlike traditional fine-tuning approaches, NAD does not use the fine-tuned network directly as the final model; instead, it uses it as a teacher network and combines it with the original backdoored network (the student network) through an attention distillation process. It works by aligning the neurons that are more sensitive to trigger patterns with benign neurons that are responsible only for meaningful representations. Although NAD can effectively eliminate backdoor triggers in DNNs, it still suffers from a non-negligible attack success rate (ASR) and low classification accuracy (ACC), because it performs backdoor defense using attention features of the same order only (i.e., attention maps). Xia et al. [74] therefore proposed attention relation graph distillation (ARGD), a framework that uses attention relation graphs (ARGs) to fully exploit the correlation between attention features of different orders. By aligning the ARGs of the teacher and student models during knowledge distillation, ARGD can eliminate more backdoor triggers than NAD.

C. TRIGGER-BASED DEFENSE STRATEGIES
With the continuous innovation of backdoor attack strategies, the robustness of backdoors is becoming ever stronger, which poses great challenges for defense. For this reason, the academic community has proposed defenses based on trigger synthesis: first the backdoor trigger is synthesized, and then its impact is suppressed to eliminate the hidden backdoor. Wang et al. [75] proposed the first trigger-synthesis defense, Neural Cleanse, which is currently the most widely used. The defender first obtains the potential trigger pattern for each class and then determines the final synthesized trigger and its target label based on anomaly detection. After a backdoor is embedded in a model, the decision boundary changes so that normal inputs require fewer modifications to become the target class. Based on this insight, the trigger is reverse-engineered. An input with a trigger has the form A(x, m, Δ) = (1 − m) ⊙ x + m ⊙ Δ, where Δ is the trigger pattern, m is the mask, x is the original input, and ⊙ denotes element-wise multiplication. Trigger construction then becomes the optimization problem of finding the trigger with minimal perturbation: min_{m,Δ} ℓ(y_t, f(A(x, m, Δ))) + λ·|m|, where y_t is the candidate target label.
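A minimal sketch of this reverse-engineering loop for one candidate target class is shown below; the sigmoid reparameterization of mask and pattern and the hyperparameters are illustrative choices, not the exact Neural Cleanse implementation.

```python
# Sketch: reverse-engineer a trigger (mask m, pattern Delta) for one target class.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, clean_loader, target, shape, steps=500, lam=1e-2, lr=0.1):
    """shape: (C, H, W) of the inputs; only mask and pattern are optimized."""
    mask = torch.zeros(1, *shape[1:], requires_grad=True)   # (1, H, W)
    pattern = torch.rand(shape, requires_grad=True)         # (C, H, W)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    it = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(clean_loader)
            x, _ = next(it)
        m = torch.sigmoid(mask)                             # keep mask in (0, 1)
        x_adv = (1 - m) * x + m * torch.sigmoid(pattern)    # A(x, m, Delta)
        y_t = torch.full((x.size(0),), target, dtype=torch.long)
        loss = F.cross_entropy(model(x_adv), y_t) + lam * m.sum()  # + lambda * |m|
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```

Running this for every class and applying outlier detection to the resulting mask norms identifies the infected target label, as described above.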
Qiao et al. [76] noticed that the reversed triggers synthesized by Neural Cleanse are usually significantly different from those used during training; they were the first to discuss the generalization of backdoor triggers. They demonstrate that an infected model generalizes its original trigger during training, and therefore suggested recovering the trigger distribution with a max-entropy staircase approximator rather than a single defense-specific trigger. Hu et al. [77] designed a topological approach to improve the quality of trigger synthesis. For a given trained DNN model (clean or backdoored) and some clean images, gradient descent is used to reconstruct triggers that can flip the model's predictions. To improve the quality of the reconstructed triggers, a new diversity loss and topological priors are introduced, helping to recover multiple high-quality triggers. To fully exploit the discriminative power of the reconstructed triggers for Trojan detection, features are extracted based on trigger characteristics and correlated network activations; these features are used to train a classifier, called the Trojan detection network, that classifies a given model as Trojaned or clean.
The lottery ticket hypothesis reveals the existence of sparse subnetworks that can achieve performance competitive with dense networks when trained independently. Chen et al. [78] studied Trojaned DNN detection from this novel sparsity perspective and proposed a new Trojan network detection mechanism: first locate a "winning Trojan ticket", a subnetwork that preserves almost all Trojan information but has only chance-level performance on clean inputs; then recover the trigger embedded in this isolated subnetwork.

VI. BACKDOOR ATTACK IN OTHER FIELDS
Backdoor attacks have expanded into other fields. The basic principles of backdoor attacks are the same in different fields, the biggest difference being the trigger designs.

A. BACKDOOR ATTACK IN NLP
Natural language processing is the most widely studied area of backdoor attacks other than image classification. Dai et al. [79] discussed how to attack LSTM-based models.
They proposed a BadNets-like approach, using emotionally neutral sentences as triggers and randomly inserting them into some benign training samples. Chen et al. [80] further explored this issue, proposing three types of triggers: character-level, word-level, and sentence-level. There are also other backdoor attack efforts targeting different trigger types [81], [82], [83], [84] and model components [85], [86] in different natural language processing tasks.

B. BACKDOOR ATTACK IN GNN
Zhang et al. [87] presented the first backdoor attack against graph neural networks (GNNs): a subgraph-based backdoor attack on GNNs for graph classification. Since the trigger subgraph is supposed to be unique among clean training/testing graphs, the backdoored GNN is more likely to associate the target label with it. The attack uses a randomly generated subgraph as the backdoor trigger. Suppose the subgraph consists of t nodes; injecting it into a graph means sampling t nodes uniformly at random from the graph, mapping them at random to the t nodes of the subgraph, and replacing their connections with those of the subgraph. Chen et al. [88] successfully performed a backdoor attack on dynamic link prediction (DLP), known as Dyn-Backdoor.
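The injection step can be sketched on an adjacency matrix as follows; the uniform node sampling mirrors the description above, and the dense-matrix representation is an illustrative simplification.

```python
# Sketch: inject a t-node trigger subgraph into a graph's adjacency matrix.
import numpy as np

def inject_subgraph(adj: np.ndarray, trigger_adj: np.ndarray, rng=None) -> np.ndarray:
    """adj: (n, n) symmetric 0/1 matrix; trigger_adj: (t, t) trigger subgraph."""
    rng = rng if rng is not None else np.random.default_rng()
    n, t = adj.shape[0], trigger_adj.shape[0]
    nodes = rng.choice(n, size=t, replace=False)  # random mapping of trigger nodes
    poisoned = adj.copy()
    for i in range(t):
        for j in range(t):
            # Replace the connections among the sampled nodes with the subgraph's.
            poisoned[nodes[i], nodes[j]] = trigger_adj[i, j]
    return poisoned
```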

C. BACKDOOR ATTACK IN 3D POINT CLOUDS
Tian et al. [89] proposed MorphNet, the first backdoor attack method on point clouds. Point clouds can be spectrally decomposed into low-frequency components, which build the key geometric structure, and high-frequency components, which describe local fine-grained details. This spectral decomposition helps achieve the two goals of visual similarity and trigger injection. MorphNet splits the network into two branches: a residual branch that passes samples through untouched and a poisoned branch responsible for modifying the samples and hiding triggers within them. The sample-adaptive triggers are hidden in the fine-grained high-frequency details. In addition, an auxiliary loss is introduced in MorphNet to suppress isolated points, improving resistance to denoising-based defenses.
In addition, Xiang et al. [90] proposed the first backdoor attack against point cloud (PC) classifiers. They insert a cluster of points into a PC as a robust backdoor pattern customized for 3D point clouds. An independently trained proxy classifier is used to optimize the location of the cluster and to select its local geometry so as to evade possible PC preprocessing and PC anomaly detectors (ADs). Li et al. [91] explored this direction in depth and designed a clean-label backdoor attack (PointCBA) on 3D point clouds within a unified framework that exploits the unique properties of 3D data and networks.

D. BACKDOOR ATTACK IN REINFORCEMENT LEARNING
Kiourti et al. [92] designed TrojDRL, a tool for exploring and evaluating backdoor attacks against deep reinforcement learning (DRL) agents. TrojDRL exploits the sequential nature of DRL and considers different levels of threat models. Experiments show that untargeted attacks on state-of-the-art actor-critic algorithms can bypass existing backdoor defenses. Wang et al. [93] migrated backdoor attacks to more complex RL systems involving multiple agents and explored the possibility of triggering backdoors without directly manipulating the agent's observations. They demonstrate that in a two-player competitive RL system, an adversarial agent can trigger the victim agent's backdoor through its own actions.

E. BACKDOOR ATTACK IN ACOUSTICS SIGNAL PROCESSING
Speaker verification has been widely used for user identification in many mission-critical domains. Zhai et al. [94] demonstrated that it is feasible to infect speaker verification models with hidden backdoors by poisoning the training data. Based on an understanding of the verification task, they designed a clustering-based attack scheme in which poisoned samples from different clusters contain different triggers (i.e., predefined utterances). Moreover, they demonstrated that existing backdoor attacks cannot be used to attack speaker verification directly. This issue was further explored by Koffas et al. [95], who used ultrasound, inaudible to the human ear, as a trigger; experiments on two versions of a speech dataset and three neural networks explored the attack's performance with respect to trigger duration, position, and type. The experiments found that short, non-continuous triggers lead to successful attacks. Moreover, since the triggers are inaudible, they can be played for as long as needed without raising any suspicion, making the attack more effective. Finally, they performed the attack on real hardware and found that the attacker could manipulate inference in an Android application by playing inaudible triggers over the air.

F. BACKDOOR ATTACK IN FEDERATED LEARNING
In addition to the classical training paradigm, how to perform backdoor attacks in collaborative learning, especially federated learning, has been a focus of interest. Bagdasaryan et al. [47] introduced the first backdoor attack against federated learning, which amplifies the poisoned updates submitted by malicious participants. Later, Bhagoji et al. [96] discussed stealthy model-poisoning backdoor attacks, presented two key stealth notions for detecting malicious updates, and used an alternating minimization strategy to improve attack stealthiness; they also showed that Byzantine-resilient aggregation strategies are not robust to their attacks. Xie et al. [97] introduced a distributed backdoor attack against federated learning that infects the entire model by having different agents implant backdoors trained on different modules. In addition, backdoor attacks targeting meta-federated learning [98] and feature-partitioned collaborative learning [99], [100] have been proposed as research progresses.
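The amplification idea can be sketched as a model-replacement update; the scaling factor gamma is typically set so the malicious update survives averaging with the other clients (this is a simplified view of the attack in [47]).

```python
# Sketch: a malicious federated client scales its backdoored update so that
# averaging with honest clients' updates still implants the backdoor.
import torch

def malicious_update(global_w: dict, backdoor_w: dict, gamma: float) -> dict:
    """Return w = global + gamma * (backdoor - global) for each parameter tensor."""
    return {k: global_w[k] + gamma * (backdoor_w[k] - global_w[k]) for k in global_w}
```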

VII. OUTLOOK OF FUTURE WORK
There has been considerable work on backdoor attacks, covering multiple branches and different scenarios. However, there is still much room for development in this area, as many key issues of backdoor learning have not been well studied. We see three main directions for future backdoor research. 1. Trigger design: how to design more covert and natural triggers to evade detection by both machines and human eyes. 2. Attack scenarios: currently, most work on backdoor attacks is in the digital space; how to apply these attacks in the physical space to improve their practicality will be a focus of future research, as will extending backdoor attacks to more deep learning tasks. 3. Defense against backdoor attacks: the means of attack currently outnumber the defense strategies, and how to secure models and eliminate the threats brought by backdoor attacks is an unavoidable problem.

VIII. CONCLUSION
We introduce the definition of backdoor attacks and commonly used datasets, summarize and classify existing backdoor attack strategies and defenses, and provide a framework for classification. We also discuss the relationship between backdoor attacks and related areas and illustrate future research directions. Although backdoor attacks pose a security threat to deep learning techniques, they also have positive aspects; for example, backdoors in models can be used as watermarks for copyright protection. We hope that an increasing number of people will take note of backdoor attacks and provide timely comments and insights to contribute to a more robust and secure deep learning environment.

YUDONG LI received the B.E. degree in software engineering from Hunan University, Changsha, China, in 2019, and the M.Sc. degree in advanced computer science from Newcastle University, Newcastle upon Tyne, U.K., in 2020. He is currently working toward the Ph.D. degree with the School of Computer Science and Engineering, Central South University, Changsha. His research interests include AI security and deep neural networks.

WEIPING WANG received the Ph.D. degree in computer science from Central South University, Changsha, China, in 2004. She is currently a Professor with the School of Information Science and Engineering, Central South University. She has authored or coauthored more than 60 papers in refereed journals and conference proceedings. Her research interests include network coding and network security.
HONG SONG received the Ph.D. degree in computer engineering from Central South University, Changsha, China, in 2010. She is currently an Associate Professor with the School of Computer Science and Engineering, Central South University. Her research interests include information security, transparent computing, and operating system.