Attacking-Distance-Aware Attack: Semi-targeted Model Poisoning on Federated Learning

Existing model poisoning attacks on federated learning (FL) assume that an adversary has access to the full data distribution. In reality, an adversary usually has limited prior knowledge about clients' data. A poorly chosen target class renders an attack less effective. This article considers a semitargeted situation where the source class is predetermined but the target class is not. The goal is to cause the misclassification of the global classifier on data from the source class. Approaches such as label flipping have been used to inject malicious parameters into FL. Nevertheless, it has been shown that their performance is usually class sensitive, varying with different target classes. Typically, an attack becomes less effective when shifting to a different target class. To overcome this challenge, we propose the attacking-distance-aware attack (ADA) that enhances model poisoning in FL by finding the optimized target class in the feature space. ADA deduces pairwise class attacking distances using a fast layer gradient method. Extensive evaluations were performed on five benchmark image classification tasks and three model architectures using varying attacking frequencies. Furthermore, ADA's robustness to the conventional defenses of Byzantine-robust aggregation and differential privacy was validated. The results showed that ADA increased attack performance by a factor of 2.8 in the most challenging case, with an attacking frequency of 0.01, and bypassed existing defenses: even differential privacy, the most effective defense evaluated, could not reduce the attack performance below 50%.

Yuwei Sun, Member, IEEE, Hideya Ochiai, Member, IEEE, and Jun Sakuma, Member, IEEE
Impact Statement-Model poisoning on federated learning (FL) causes client models to be compromised through malicious model parameter sharing. Though FL extends the attack surface by involving many clients, the model aggregation that combines different clients' model parameters can greatly reduce a poisoning attack's effect. Different from previous studies that adopt an arbitrary target class to mount an attack, this work proposes a novel semi-targeted model poisoning attack that adaptively computes the optimized attacking target depending on input samples. Such an attack could immensely enhance model poisoning's efficacy in FL, improving its robustness against model aggregation. The empirical results showed that the proposed method achieved strong performance even with a low attacking frequency, generalizing across different distribution spaces and model architectures.

I. INTRODUCTION
THIS article is an extended version of the study published in [1]. Several improvements of the attacking-distance-aware attack (ADA) over the previous work are presented, including: 1) the optimization of attack targets in the larger label spaces of the CIFAR-100 (with 100 target candidates) and ImageNet (with 1000 target candidates) datasets; 2) additional experiments with conventional model architectures, including VGG16 and VGG19, in federated learning (FL); 3) a robustness evaluation of the proposed attack method against conventional defenses such as norm difference clipping (NDC), two types of Byzantine-robust aggregation, and differential privacy (DP); and 4) a potential defense strategy against the ADA attack on FL. Since the goal of the attack is to find the optimized target class to mount more effective model poisoning on FL, this study extended the previous work by first estimating ADA's generality across different data distribution spaces. Specifically, attacking label spaces with a larger number of classes, as in CIFAR-100 and ImageNet, makes target searching much more difficult. We empirically show that ADA achieves stable and competitive performance at different attacking difficulty levels and prior knowledge granularity levels. Moreover, the impact of the proposed attack on local models constructed using the VGG16 and VGG19 architectures was evaluated; the results demonstrated that the attack was effective across these model architectures. Furthermore, it is of great importance to understand the attack's robustness against conventional defense strategies in FL. Therefore, this article also demonstrates the attack's efficacy when applying four different defenses in FL. A tradeoff was observed between a defense's effect on attack alleviation and the decrease in FL main task performance. By varying the hyperparameters of the defense methods, we show that existing defense strategies in FL cannot effectively eliminate the threat of ADA. In addition, we discuss the potential of a new defense for FL that calibrates the global feature distribution space; such calibration could facilitate detecting abnormal changes in a client's feature distribution.
Data privacy is a growing concern, attracting attention from various sectors. The increasing public awareness of legal restrictions, such as the General Data Protection Regulation [2], has made the traditional centralized processing of sensitive data increasingly challenging. As a result, decentralized solutions such as FL [3], [4] have been adopted to improve performance by sharing and aggregating model parameters without disclosing clients' training data.
FL has been widely used in various fields, including medical diagnosis, financial data analysis, and cybersecurity. However, it has been shown to be vulnerable to adversarial attacks [5], [6], [7], [8], [9]. Notably, a compromised client might inject malicious model parameters into the FL system, causing malfunction and influencing other clients in the system [see Fig. 1(a)]. Furthermore, these attacks in FL are typically either untargeted or targeted [10]. The aim of an untargeted attack is to degrade the performance of a client model in general, while a targeted attack aims to cause a client model to misclassify samples of a specific class into the attacker's desired class.
This study investigates a novel form of model poisoning attack in FL where the attacker's goal is to avoid being recognized as a specific class. This type of attack can occur in various real-world scenarios. For example, an attacker sending unauthorized advertising e-mails may aim to have the e-mails recognized as belonging to a benign class other than spam. In a facial recognition system, an individual on a blacklist may aim to be recognized as someone else not on the blacklist. It is important to note that these attackers are not motivated to be recognized as a specific class (as is the goal of targeted attacks), but rather to be unrecognized as a specific class.
This type of attack is referred to as a semitargeted attack. In a semitargeted attack, the attacker is assigned a specific class (the source class) and aims to poison the classifier so that samples from the source class are recognized as a class other than the source class. Unlike in targeted attacks, the attacker is free to choose the target class to maximize attack performance. Due to this freedom, the risk of the semitargeted attack outweighs that of the targeted attack.
The success of the attack can vary depending on the assigned source class; that is, the attack is typically class sensitive in terms of its generality, with performance varying across the classes considered [11]. The challenge in a semitargeted attack is determining the target class that is optimized for the assigned source class so as to achieve the best attack performance.
To address this challenge, we propose two approaches, the ADA and the Fast LAyer gradient MEthod (FLAME), to find the optimized target class for attacking FL in the full knowledge and partial knowledge settings, respectively. The goal is to investigate to what extent the attack performance can be increased in the semitargeted setting using different approaches for choosing the target class.
In summary, our contributions are as follows.
1) The ADA, a novel semitargeted model poisoning attack on FL, is proposed, which enhances model poisoning by finding the optimized target class in the latent feature space (see Section IV).
2) This study provides a detailed demonstration of how FLAME enhances a poisoning attack's performance in FL under the data confidentiality constraint (see Section IV-C2).
3) To understand the risk of such attacks, an extensive study is performed by varying the attacking frequency against the metrics of attacking task accuracy (ATA) and main task accuracy (MTA). The empirical results show that ADA is effective in both the full knowledge (white box) and partial knowledge (black box) settings. ADA increases the attack performance by a factor of 2.8 when the attacking frequency is as low as 0.01 (see Section V-C).
4) An analysis of ADA's robustness to conventional defense strategies in FL shows that ADA can bypass these defenses while retaining competitive attack performance (see Section V-D).
The rest of this article is organized as follows. Section II reviews the most recent work on poisoning attacks and their defenses in FL. Section III presents essential definitions and assumptions. Section IV demonstrates the technical underpinnings of the proposed method. Section V presents extensive empirical evaluations. Finally, Section VI concludes this article and gives future directions.
II. RELATED WORK

A. Poisoning Attacks

In targeted poisoning attacks on FL, the goal of the adversary is to produce a poisoned local model update. This update is designed so that, after model aggregation, specific inputs will induce misclassification by the global model [23], [24]. In particular, label flipping [5] is one type of poisoning attack, where a set of data labels is randomly flipped to a different class to train a malicious model. For example, a semantic backdoor flips the labels of images containing specific natural features to cause misclassification when these features are present as triggers [6]. Moreover, a malicious model update generated through label flipping typically results in a larger norm of model weights than a benign update [10].
To avoid easy detection by norm-based defense algorithms in FL, Bagdasaryan et al. [11] proposed a train-and-scale technique that scales the norm of a malicious update down to the detection algorithm's bound. Nevertheless, this strategy can also degrade attack performance due to the diminished poisoned model weights. In addition, the model replacement attack [11] aims to replace the global model entirely with a model controlled by the adversary. Notably, a converged global model often results in small benign local updates, which creates a vulnerability where an adversary-controlled client could upload a maliciously crafted update that replaces the global model after aggregation.
Although the aforementioned targeted attacks aim to compromise the FL system, their attack performance typically depends on the assigned source class. There is currently no research on semitargeted attacks that have a fixed source class and an adjustable target class. A semitargeted attack addresses the problem that a poorly chosen target class degrades the poisoning attack's performance after model aggregation in FL and makes poisoned model parameters easier to detect. To the best of our knowledge, this is the first study to examine semitargeted poisoning attacks with a focus on generality in FL. In this regard, Shafahi et al. [25] presented a feature collision method that generates similar-looking instances based on the source class such that their hidden features are close to the target class in a centralized setting. To mount that attack, the adversary needs prior knowledge about the data distributions of both the source class and the target class. This differs from our federated setting, where neither data modification nor prior knowledge about the target class's data distribution is required. This article closes this gap and investigates the behavior of model poisoning attacks in the partial knowledge setting. Furthermore, this article considers poisoning attacks in FL with a more realistic assumption regarding the attacking frequency, where the adversary has only a small probability of participating in the training in every round.

B. Defense Strategies
Existing research on defenses for FL against model poisoning attacks can generally be divided into two categories: anomaly detection and robust aggregation.
Anomaly detection aims to identify malicious local model updates by comparing the similarity of client updates and identifying those that deviate greatly from the others [26], [27], [28], [29], [30]. One of the most commonly used methods is NDC [31]. Poisoning attacks often result in larger update norms than benign updates from honest clients, and NDC discards updates with norms above a certain threshold, thus separating benign from malicious updates.
Another line of research seeks to improve the resilience of aggregation in FL against poisoned parameters, by carefully selecting local model updates for aggregation [32], [33], [34] or by adding noise to model parameters to counteract the effect of a malicious update [6], [35], [36]. For example, DP limits the influence of a malicious update by adding a small fraction of Gaussian noise to the parameters of local updates.
In Section V-D, the experimental evaluation demonstrates that the proposed attack method can bypass existing defense methods, including not only the anomaly detection-based defense of NDC but also the robust aggregation-based defenses of Krum [32] and Trimmed Mean [33], as well as DP [37].

III. PRELIMINARIES
In this section, the classification task in FL is formulated, and several conventional poisoning attacks on FL related to this article are discussed. The notation used in this article is summarized in Table I.

A. Classification Task
Suppose that $f$ denotes a neural network classifier taking an input $x_i$ and outputting a $C$-dimensional probability vector, where the $j$th element of the output vector represents the probability that $x_i$ is recognized as class $j$. Given $f(x)$, the prediction is given by $\hat{y} = \arg\max_j f(x)_j$, where $f(x)_j$ denotes the $j$th element of $f(x)$.
The training of the neural network is attained by minimizing the following loss function with respect to the model parameters $\theta$:

$$\min_{\theta} \frac{1}{|D|} \sum_{(x_i, y_i) \in D} \ell(f(x_i; \theta), y_i) \tag{1}$$

where $\ell$ denotes the cross-entropy loss function.

B. Federated Learning
FL is a privacy-preserving framework that enables the creation of a global model trained on decentralized data without revealing the individual training samples. In an FL framework with $K$ clients, the $k$th client has its own dataset $D^{(k)}$. Clients cannot share their data with others, mainly due to data confidentiality, making FL an attractive approach for collaborative learning.
The FL process involves several steps. First, the parameter server (PS) initializes a global model $G_0$, which is then sent to all clients. Second, each client $k$ updates the model using its local data $D^{(k)}$ and sends the update $L^{(k)}_{t+1} - G_t$ to the PS. Finally, the PS aggregates all local updates, updates the global model, and sends it back to all clients. It is worth noting that in FL, the local models of clients and the global model usually share the same architecture.
In addition, to reduce the waiting time for all clients to complete their local model training, the PS randomly selects a subset of $K_{\text{select}}$ clients each round to update the global model based on their local updates. FedAvg [38] is a widely used FL algorithm that averages all local updates to update the global model. The specific details of the FedAvg algorithm are presented in Algorithm 1.

Algorithm 1: FedAvg.
1: initialize $G_0$ at the server side
2: for each round $t = 0, 1, 2, \ldots$ do
3:   PS randomly selects a subset of $K_{\text{select}}$ clients from all $K$ clients
4:   PS sends the current global model $G_t$ to the $K_{\text{select}}$ clients
5:   for each client $k = 1, 2, \ldots, K_{\text{select}}$ do
6:     client $k$ trains $L^{(k)}_{t+1}$ on $D^{(k)}$ starting from $G_t$
7:     client $k$ sends the update $L^{(k)}_{t+1} - G_t$ to the PS
8:   end for
9:   PS updates the global model: $G_{t+1} = G_t + \frac{1}{K_{\text{select}}} \sum_k (L^{(k)}_{t+1} - G_t)$
10: end for
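As a concrete illustration, the following is a minimal NumPy sketch of one FedAvg round, assuming model weights are flattened into a single array; the `local_train` helper and the sampling interface are hypothetical placeholders rather than part of the original algorithm.

```python
import numpy as np

def fedavg_round(global_weights, client_datasets, k_select, local_train, rng):
    """One FedAvg round: sample clients, collect their updates, average them."""
    # PS randomly selects a subset of K_select clients from all K clients.
    selected = rng.choice(len(client_datasets), size=k_select, replace=False)
    updates = []
    for k in selected:
        # Each selected client trains from the current global model G_t ...
        local_weights = local_train(global_weights, client_datasets[k])
        # ... and reports the update L^(k)_{t+1} - G_t to the PS.
        updates.append(local_weights - global_weights)
    # The PS averages all local updates to obtain G_{t+1}.
    return global_weights + np.mean(updates, axis=0)
```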

C. Poisoning Attacks on FL
Poisoning attacks have been extensively studied in the context of centralized learning. Notably, in a supervised classification task with $C$ categories, the goal of an attacker is either to degrade the performance of the classifier in general (untargeted) or to cause it to misclassify a specific class (targeted). To achieve these goals, the adversary manipulates either the training data by adding malicious samples (data poisoning) or the model by injecting malicious parameters (model poisoning) that are carefully crafted to cause the model to behave unexpectedly during inference.
In targeted poisoning attacks, the class of the samples used for the attack is known as the source class $c_S$, while the class to which these samples are misclassified is called the target class $c_T$. Specifically, the goal of the attacker is to manipulate the classifier $f$ such that, given a sample $(x, y)$ with $y = c_S$, the model makes the incorrect prediction

$$\arg\max_j f(x)_j = c_T. \tag{2}$$

Let $D_{\text{val}}$ be a validation dataset, $D_{\text{val}}(c_S)$ be the set of samples in $D_{\text{val}}$ with label $c_S$, and $D^c_{\text{val}}(c_S) = D_{\text{val}} \setminus D_{\text{val}}(c_S)$ be its complement. To measure the performance, the MTA is defined as the validation accuracy of the classifier on samples that are not from the source class $c_S$. In particular, the MTA of classifier $f$ when it is poisoned with a source label $c_S$ is defined as the accuracy on the validation set $D^c_{\text{val}}(c_S)$:

$$\text{MTA}(c_S) = \frac{1}{|D^c_{\text{val}}(c_S)|} \sum_{(x,y) \in D^c_{\text{val}}(c_S)} \mathbb{1}\left[\arg\max_j f(x)_j = y\right]. \tag{3}$$

The target-specified attacking task accuracy (ts-ATA) evaluates the success rate of an attack. It is computed as the percentage of samples from the source class that are misclassified as the desired target class by the poisoned model. Specifically, the ts-ATA of classifier $f$ poisoned with source label $c_S$ is determined by the validation accuracy on $D_{\text{val}}(c_S)$:

$$\text{ts-ATA}(c_S, c_T) = \frac{1}{|D_{\text{val}}(c_S)|} \sum_{(x,y) \in D_{\text{val}}(c_S)} \mathbb{1}\left[\arg\max_j f(x)_j = c_T\right]. \tag{4}$$

We introduce two building blocks, label flipping [5] and gradient scale adjustment by the train-and-scale technique [11], that are needed to introduce the proposed method.
1) Label Flipping Attack: In a label flipping poisoning attack, the attacker relabels the samples from the source class with the target class label. For any sample $(x, y) \in D(c_S)$, the label is replaced with $c_T$ to obtain the modified sample $(x, c_T)$. By training the classifier with these poisoned samples, the poisoning attack defined in (2) can be achieved. Typically, attackers employ a combination of poisoned samples and legitimate samples to train the classifier. In our experiments, label flipping is applied to all samples in $D(c_S)$, and a subset of samples from classes other than the source class is randomly selected to achieve a desired injection rate $\alpha$, where $\alpha$ represents the percentage of poisoned samples in the adversarial training dataset $D_{\text{adv}}$. This approach aims to mount a poisoning attack without sacrificing too much accuracy on the nontarget classes.
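For illustration, here is a minimal sketch of how $D_{\text{adv}}$ could be assembled, assuming the local data are NumPy arrays `X` and `y`; the function name and interface are our own, not the paper's.

```python
import numpy as np

def build_poisoned_dataset(X, y, c_source, c_target, alpha, rng):
    """Build D_adv: flipped source-class samples plus clean non-source samples."""
    # Flip the label of every source-class sample to the target class.
    src = np.where(y == c_source)[0]
    X_poison, y_poison = X[src], np.full(len(src), c_target)
    # Add enough clean non-source samples so that poisoned samples make up
    # a fraction alpha of the adversarial training set.
    n_clean = int(len(src) * (1 - alpha) / alpha)
    others = rng.choice(np.where(y != c_source)[0], size=n_clean, replace=False)
    X_adv = np.concatenate([X_poison, X[others]])
    y_adv = np.concatenate([y_poison, y[others]])
    return X_adv, y_adv
```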
In the FL setting, if a compromised client is selected by the PS for local model training, it downloads the latest global model to replace its local model, trains a local model $L^{\text{adv}}$ with the poisoned samples, and corrupts the global model by repeatedly submitting malicious updates to the PS. In addition, multiple compromised clients could exist and mount poisoning attacks against the global model together.
2) Gradient Scale Adjustment by "Train and Scale": The "train-and-scale" approach is a common strategy employed by attackers in FL to bypass detection methods that compare the norms of model updates. Such a detection method compares the norm of each update with a certain threshold $Q$ and drops updates whose norm exceeds this threshold, a technique known as NDC [39]; i.e., an update $L^{(k)}_{t+1}$ is accepted only if

$$\|L^{(k)}_{t+1} - G_t\| \leq Q$$

where $\|\cdot\|$ is a prescribed norm. However, attackers can bypass this defense strategy by using the "train-and-scale" approach, which adjusts the magnitude of the update to evade detection. By scaling down the weights, the norm of the update can be reduced below the detection threshold $Q$, allowing the malicious update to pass through undetected. The scaling factor is carefully chosen to maintain the effectiveness of the malicious update while reducing its norm.
To achieve this goal, the attacker modifies the scale of the model update $L^{\text{adv}}_{t+1} - G_t$ so that its norm is upper bounded by $Q$.
Let the scaling factor $\Omega$ be defined by

$$\Omega = \frac{Q}{\|L^{\text{adv}}_{t+1} - G_t\|}.$$

Then, the malicious client submits $\Omega(L^{\text{adv}}_{t+1} - G_t)$ as its model update instead of $L^{\text{adv}}_{t+1} - G_t$. One limitation of this approach is that $Q$ is usually unknown to the adversary. Nevertheless, the adversary can approximately estimate the bound $Q$ using the following strategy [11]. The threshold $Q$ set by the PS is expected to be designed such that legitimate updates are not rejected with high probability. Therefore, the adversary can initially employ several compromised clients to perform legitimate training with the latest global model shared by the PS. The average norm of the collected legitimate updates can then be employed as a lower bound on $Q$.
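A minimal sketch of this adjustment follows, assuming updates are flat NumPy arrays; estimating $Q$ as the average norm of legitimate updates follows the strategy described above, and the helper names are our own.

```python
import numpy as np

def estimate_q(legit_updates):
    """Estimate the NDC bound Q as the average norm of legitimate updates."""
    return np.mean([np.linalg.norm(u) for u in legit_updates])

def scale_update(malicious_update, q):
    """Train-and-scale: shrink the malicious update so its norm fits under Q."""
    omega = q / np.linalg.norm(malicious_update)  # scaling factor Omega
    # Only scale down: if the norm is already below Q, submit it unchanged.
    return malicious_update * min(1.0, omega)
```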

IV. ATTACKING-DISTANCE-AWARE ATTACK
In this section, a new type of semitargeted model poisoning attack on FL, called the ADA, is introduced. Then, the FLAME is demonstrated, which mounts the semitargeted attack in a partial knowledge setting.

A. Motivation
In multiclass classification tasks, an adversary aims to compromise the system such that instances from a specific class $c_S$ will be misclassified. Compared to a targeted attack with a fixed target class $c_T$ for the given source class $c_S$, a semitargeted attack without a specified target class provides more flexibility, increasing the risk to the FL system. For example, in a real-world scenario, a self-driving car that recognizes a stop sign could be compromised such that the prediction for the stop sign is wrong; the incorrectly predicted class could be a speed limit sign, a billboard, and so on. Similarly, a spam filter that identifies the category of an e-mail can be poisoned such that a certain type of spam bypasses the filter, with the target class being sports, politics, and so forth.
The ADA is a semitargeted attack that aims to manipulate the global model toward a specific behavior, while a Byzantine attack [40] is an untargeted attack that aims to invalidate the global model without a specific target. The advantage of ADA over a Byzantine attack is that it allows for more subtle attacks that are less likely to be detected by traditional defense mechanisms. ADA is able to bypass robust aggregation defenses and achieve high attack performance with limited prior knowledge, namely the source class data alone. The benefit of requiring only a single class's data becomes more apparent in an extremely large label space such as ImageNet (1000 classes), as demonstrated in later sections.
In a semitargeted attack, the effectiveness of the attack can vary depending on the target class $c_T$ considered. The attack's robustness to different target classes is not guaranteed, and using a less effective target class can result in a longer convergence time to achieve the same level of attack performance. A longer convergence time allows a defense method at the PS to more easily discover the attack. In addition, since only a small subset of clients is selected in each round, a less effective malicious update might be overwritten by the outnumbering legitimate updates submitted by benign clients.
Intuitively, if the adversary can choose the most effective target class, a poisoning attack will be greatly enhanced. To reveal the risk of the semitargeted attack, we propose the ADA, an enhanced model poisoning attack on FL. Notably, ADA measures the distances in the latent feature space between different classes of a classifier and finds the optimized target class $c_T$ for a given source class $c_S$ to mount the attack. Two settings of ADA were studied, based on the adversary's prior knowledge about a client's data distribution in FL: attacking with full knowledge and attacking with partial knowledge.

B. Semitargeted Attack
The semitargeted attack in FL refers to a model poisoning attack with a fixed source class $c_S$ and various possible target classes $\{Y \setminus c_S\}$. The goal of the semitargeted poisoning attack is to corrupt $f$ so that, for any sample $(x, y)$ with $y = c_S$,

$$\arg\max_j f(x)_j \neq c_S. \tag{5}$$

In particular, for any given $c_S$, ts-ATA$(c_S, c_T)$ depends on the selection of $c_T$. In the semitargeted attack, the attacker could choose any arbitrary $c_T$; thus, the attack performance could be increased. In this regard, the max target-nonspecified attacking task accuracy (max-ATA) of classifier $f$ poisoned with source label $c_S$ is defined as the validation accuracy on $D(c_S)$ under the best target class:

$$\text{max-ATA}(c_S) = \max_{c_T \in Y \setminus \{c_S\}} \text{ts-ATA}(c_S, c_T). \tag{6}$$

Note that max-ATA$(c_S) \geq$ ts-ATA$(c_S, c_T)$ holds for any $c_S$ and $c_T$, which means that the semitargeted poisoning attack is always at least as powerful as targeted poisoning due to its more relaxed constraint. In the following, $c^*$ denotes the target class that attains this maximum.
In addition, it is assumed that the adversary specifies only one source class $c_S$ for simplicity, but the semitargeted attack can be extended to scenarios with multiple source classes. In such cases, the optimized target class would be computed separately for each source class specified by the adversary.

C. Attack Method
In reality, finding $c^*$ exactly is not tractable. This section discusses how the adversary finds a target class close to $c^*$, depending on the adversary's knowledge of the training samples of the shared global model.

1) Attacking With Full Knowledge:
In the full knowledge setting, the adversary has complete knowledge of the client data distribution and is able to access samples drawn from the underlying training sample distribution in an independent and identically distributed (i.i.d.) manner. To choose a class that is expected to give a larger ts-ATA, the adversary leverages the attacking distance (AD), defined as follows. Let $\phi_t$ be the feature extractor of the shared global model $G_t$. Then, the adversary extracts feature vectors $\phi_t(x)$ of the local training samples $x \in D_{\text{adv}}$.
Let $\mu_c$ be the mean of the feature vectors in class $c$:

$$\mu_c = \frac{1}{|D_{\text{adv}}(c)|} \sum_{x \in D_{\text{adv}}(c)} \phi_t(x).$$

Then, the AD between two different classes $c$ and $c'$ is defined by

$$\text{AD}(c, c') = \|\mu_c - \mu_{c'}\|_2 \tag{10}$$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm. A visualization of the distribution of the extracted feature vectors $\phi$, obtained using principal component analysis (PCA) for measuring the AD, is shown in Fig. 2.
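A minimal sketch of the AD computation and the target selection it feeds into [see (10) and the selection rule (11) below] is given here, assuming the extracted features are grouped per class in a dictionary; the function names are ours.

```python
import numpy as np

def attacking_distances(feats, c_source):
    """feats: dict mapping class label -> array of feature vectors phi_t(x)."""
    mu = {c: f.mean(axis=0) for c, f in feats.items()}   # per-class mean mu_c
    return {c: np.linalg.norm(mu[c_source] - mu[c])      # AD(c_S, c), eq. (10)
            for c in feats if c != c_source}

def pick_target(feats, c_source):
    """Choose the target class with the minimum AD to the source class."""
    ads = attacking_distances(feats, c_source)
    return min(ads, key=ads.get)
```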
The attack strategy in this setting is to find a target class $c_T$ that is close to the source class $c_S$ in the latent feature space. This reduces the scale of the malicious update required in the adversary's local model, increasing the chances of the attack surviving the aggregation with the other, outnumbering legitimate updates. The target class is chosen based on the AD metric [see (10)] as

$$c^*_T = \arg\min_{c \in Y \setminus \{c_S\}} \text{AD}(c_S, c). \tag{11}$$

Furthermore, the adversary performs malicious model training based on the optimized target class $c^*_T$ with an injection rate $\alpha$, where label flipping is applied to all the samples in $D(c_S)$ and a subset of other samples is randomly selected. Then, the adversary scales the poisoned model update by $\Omega$ to fit the bound of the norm-based defense and sends the scaled malicious update $\Omega(L^{\text{adv}}_{t+1} - G_t)$ to the PS.

2) Attacking With Partial Knowledge: The aforementioned attack assumes an i.i.d. setting where the adversary has access to the entire feature space, which is often unfeasible in real-world FL scenarios. As a result, it becomes difficult to extract latent feature representations of all the classes and measure their ADs, because only some classes' representations are available to the adversary. To address this limitation, we propose the FLAME, a novel approach that finds the target class $c^*$ without prior knowledge of the entire sample distribution. This section investigates the additional improvements and adaptations specially designed for the FL setting.
An update gradient during model training can be used to obtain a perturbation of the original data, resulting in a distributional shift of the original data toward a specific adversarial class. Given an input image $x$, a target class $y_{\text{adv}}$, and a trained neural network with parameters $\theta$, the fast gradient sign method [41] generates an adversarial example $x_{\text{adv}}$ by perturbing the input in the direction of the gradient of the loss function $J(x, y_{\text{adv}}; \theta)$ with respect to the input. This perturbation $\zeta$ can then be employed to cause the distributional shift of the original data toward the target class, and is defined as

$$\zeta = \beta \cdot \text{sign}(\nabla_x J(x, y_{\text{adv}}; \theta))$$

where $\beta$ is a small constant that controls the perturbation's magnitude, and sign$(\cdot)$ returns the sign of its argument.
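The following is a minimal TensorFlow sketch of computing $\zeta$, assuming a Keras classifier that outputs class probabilities and integer labels for `y_adv`; it illustrates the formula above rather than the paper's exact implementation.

```python
import tensorflow as tf

def fgsm_perturbation(model, x, y_adv, beta):
    """Compute zeta = beta * sign(grad_x J(x, y_adv; theta))."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)                    # gradient w.r.t. the input, not weights
        loss = loss_fn(y_adv, model(x))  # J(x, y_adv; theta)
    return beta * tf.sign(tape.gradient(loss, x))
```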
The FLAME exploits the observation that simpler features, such as lines and curves, tend to activate neurons in the shallower layers of a model, whereas more complex features tend to activate neurons in deeper layers [42]. By arbitrarily assigning a wrong label to samples from the source class and computing the loss, the FLAME estimates the magnitude of the update required in the higher layers of the model. The update magnitude reflects the distance between the current model's latent feature representation and that of a poisoned model that misclassifies the input as the wrong label.
Unlike in the full knowledge case, where the distance between classes is measured based on the output of the higher layers, in the FLAME, the update scale estimates the distance between the source class and the assigned class. Notably, if the update scale in the higher layers is large when assigning a specific class, the distance between the latent feature representations of the source class and the assigned class is assumed to be large as well. By contrast, if the ground-truth label is assigned, the update scale should be close to zero.
The following method was employed to achieve the objective described above. As in the full knowledge setting, in every round, a compromised client $k$ selected by the PS downloads the latest global model $G_t$ and replaces its local model $L^{(k)}_t$. The adversary then inputs samples from the source class into the local model and propagates them through the network to obtain the model's output. The cross-entropy loss between the prediction and a chosen label is computed, and the gradients with respect to the last fully connected (FC) layer are obtained by backpropagation. Let $\mathbf{1}_c$ denote the one-hot vector whose $c$th element is 1 and whose remaining elements are 0.
Let $\ell(f(x), \mathbf{1}_c)$ represent the cross-entropy loss between $y = f(x)$ and $\mathbf{1}_c$. Then, the total loss over the samples in $D^{(k)}(c_S)$, the adversary's samples labeled with the source class, when the ground-truth labels of all samples are set to $c$, is

$$J_c = \sum_{x \in D^{(k)}(c_S)} \ell(f(x), \mathbf{1}_c). \tag{13}$$

With this empirical loss, the target class is determined as

$$c^*_T = \arg\min_{c \in Y \setminus \{c_S\}} \left\| \frac{\partial J_c}{\partial W} \right\| \tag{14}$$

where the derivative is taken with respect to the weight parameters $W$ of the last FC layer in $L^{(k)}_t$. Obtaining the data distribution of the source class could itself be a challenging task. Nevertheless, since the attacker aims to avoid their data being identified as a particular class, the adversary is assumed to have partial data from the source class distribution. In addition, in the case of limited source class data, a potential approach is to use a generative model [43] to approximate the source class data distribution by training on a small set of labeled data from the source class.
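A minimal TensorFlow sketch of this target search follows, assuming `model` is a Keras classifier with probability outputs whose final layer is the FC layer of interest, and `x_src` is an array of source-class samples; names and interface are ours.

```python
import tensorflow as tf

def flame_target(model, x_src, c_source, num_classes):
    """For each candidate class c, measure the gradient norm of J_c w.r.t.
    the last FC layer and return the class with the smallest norm (eq. (14))."""
    last_fc = model.layers[-1].trainable_variables  # kernel and bias of last FC
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    scores = {}
    for c in range(num_classes):
        if c == c_source:
            continue
        labels = tf.fill((len(x_src),), c)          # assign candidate label c
        with tf.GradientTape() as tape:
            loss = loss_fn(labels, model(x_src))    # empirical loss J_c
        grads = tape.gradient(loss, last_fc)
        # A smaller gradient norm suggests a shorter attacking distance to c.
        scores[c] = tf.linalg.global_norm(grads).numpy()
    return min(scores, key=scores.get)
```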
Finally, the complete ADA algorithm in the partial knowledge setting, with the FLAME adopted, is shown in Algorithm 2.

V. EXPERIMENTS
In this section, a detailed description of the datasets and model architectures used in the experiments is provided. Next, evaluations of the proposed attack method in both the full knowledge and partial knowledge settings are demonstrated, followed by a discussion of the empirical results. ADA and the other baselines were implemented using TensorFlow [44].

A. Dataset
Five image classification tasks were employed for the experiments: MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. These datasets pose different degrees of difficulty for perturbing the FL system. First, the attack was mounted on a small label space of ten classes using MNIST, Fashion-MNIST, and CIFAR-10. MNIST [45] is a handwritten digit image dataset containing 60 000 gray-scale training samples labeled 0-9 and 10 000 test samples; the size of the images is 28 × 28. Fashion-MNIST [46] is an image collection of ten types of clothing, containing 60 000 gray-scale training samples labeled as shoes, t-shirts, dresses, and so on, and 10 000 test samples, also of size 28 × 28. CIFAR-10 [47] is a collection of color images of ten object types, covering 50 000 training samples labeled as airplane, automobile, and so on, and 10 000 test samples; the size of the images is 32 × 32 × 3. Furthermore, the effectiveness of the proposed method was evaluated on datasets with larger label spaces, CIFAR-100 [47] and ImageNet [48]. Detailed information about these datasets is given in Section V-C4.

B. Model Architecture
In this study, a four-layer convolutional neural network (CNN) and two conventional CNN models were employed for FL. By default, the four-layer CNN was used in consideration of the resource limitations of clients, which usually operate on edge devices such as smartphones. The first convolutional layer has a kernel size of 5 × 5 with a stride of 1, taking in one input plane and producing 20 output planes, followed by a rectified linear unit (ReLU) activation function. The second convolutional layer takes in 20 input planes and produces 50 output planes, also with a kernel size of 5 × 5 and a stride of 1, followed by a ReLU activation. The output is then flattened and passed through an FC layer with a linear transformation, resulting in a tensor of size 200. The final FC layer outputs a tensor of size 10, representing the ten classes. The categorical cross-entropy loss function was used, and the Adam optimizer with a learning rate of 0.001 was applied for model updates. This architecture is shared by all the clients and the global model. Furthermore, the effectiveness of the attack on two additional conventional models, VGG16 and VGG19 [49], was investigated to determine its applicability to other models. Detailed experimental settings for these models can be found in Section V-C5.
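For reference, a minimal tf.keras sketch of this default CNN is shown below; the softmax on the final layer is our assumption (the text specifies only the layer sizes and the categorical cross-entropy loss).

```python
import tensorflow as tf

def build_default_cnn(input_shape=(28, 28, 1), num_classes=10):
    """The four-layer CNN described above: two 5x5 convs, then two FC layers."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(20, 5, strides=1, activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(50, 5, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(200),                          # linear FC layer
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```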

C. Numerical Results
An FL scenario was considered with 100 clients, each having 500 samples randomly selected from the training set of the applied dataset. In each round, FL randomly selected a subset of $K_{\text{select}} = 10$ clients to perform model training, thus updating the global model using FedAvg. Though a larger subset of clients might be selected for training, the attack's robustness to model aggregation was evaluated by introducing an attacking frequency factor $K_{\text{adv}}/K$, where $K_{\text{adv}}$ represents the total number of malicious clients among all $K$ clients in FL. In each round, there are on average $K_{\text{select}} \cdot K_{\text{adv}}/K$ malicious clients in the selected subset. Adjusting the attacking frequency has the same effect as adjusting the number of selected clients in each round.
The local model training uses a batch size of 16 and runs for one epoch. Hyperparameters were chosen using grid search, and the evaluation was conducted on the held-out test set of the applied dataset after each global update. The attack was mounted after the global model reached convergence, as indicated by no further decrease in validation loss within the last ten rounds. The adversary then performed ADA by measuring the AD either from the extracted latent feature representations in the full knowledge setting or by using backward-error analysis on the shared global model parameters in the partial knowledge setting. Note that mounting the attack before the global model converges would be less effective due to the difficulty of precisely measuring the ADs between different classes.
1) ADA With Full Knowledge: After the global model converged and a compromised client was selected, the malicious client input local instances into the shared global model to extract the latent feature representations of the different classes in FL. These representations were obtained from the last hidden layer of the model and had a dimension of 200. Subsequently, the ADs between the different classes were computed using these representations.
Furthermore, the performance of ADA with full knowledge (ADA-full) was compared to the label flipping attack (LF) and the train-and-scale method (TS). For simplicity, the attacker is assumed to choose the third label of each applied dataset as the source class, such as the digit "2" in the MNIST dataset. For LF and TS, the average accuracy scores over different target classes were used. For ADA, the attacker computed the AD scores between the different classes (as shown in Fig. 3) and then selected the class with the lowest AD score to the source class as the target class [see (11)]. ATA (ts-ATA) and MTA were evaluated every round on the global model using the held-out test set of the dataset.
First, the performance of the three methods was evaluated in a scenario with ten compromised clients for the MNIST image classification task. The results, shown in Fig. 4, illustrate the comparison of performance among the three methods. The impact of the attacking frequency on the effectiveness of these methods was also studied by varying the ratio of compromised clients over {0.01, 0.05, 0.1}. The numerical results, shown in Table II, were obtained using the various attacking frequencies applied to the three different datasets. The final accuracy scores were determined by taking the maximum values within 50 rounds of FL. The results indicate that ADA outperforms the other methods, with improved ATA and MTA scores, across the various attacking frequencies in all three classification tasks. ADA achieved an ATA score of 0.387 on MNIST, compared to 0.081 for the typical LF method, when the attacking frequency was 0.01; in such a case, a compromised client was selected roughly every ten rounds. In addition, it was observed that the performance improvement of MTA on the CIFAR-10 dataset was more significant than on the MNIST and Fashion-MNIST datasets. ADA improved the performance of poisoning attacks in various FL cases while reducing their impact on the main classification task.
In addition, Fig. 5(a) shows the norms of the benign local updates $L^{(k)}_{t+1} - G_t$ and the malicious local updates $L^{\text{adv}}_{t+1} - G_t$ when applying the typical label flipping attack. Fig. 5(b) illustrates the norms of the benign updates and the scaled malicious updates $\Omega(L^{\text{adv}}_{t+1} - G_t)$ under the "train-and-scale" strategy. $Q$ is estimated as the average of all benign updates' norms.
2) ADA With Partial Knowledge: The experiments above are based on the assumption that the adversary has full knowledge of the data distribution for the classification task. By contrast, in the partial knowledge setting, the adversary only has access to samples from the source class and is not aware of the entire distribution of the compromised client's local data. To measure the performance of ADA with partial knowledge, the same training hyperparameters and settings were used, and the FLAME [see (14)] was used to mount the attack instead of measuring the AD in the latent feature space. Table III shows the attack performance in the partial knowledge setting for the different attacking frequencies. The results reveal that ADA can still achieve competitive performance in the partial knowledge setting. On Fashion-MNIST, ADA's performance with attacking frequencies of 0.01 and 0.1 was degraded because the latent feature distribution was more difficult to deduce.
3) AD Visualization: The proposed ADA method measures the distance between a given source class and potential target classes in an FL setting. This is done using two approaches: 1) computing the Euclidean distance between the extracted latent representations and 2) measuring the norm of the backpropagated gradients with the FLAME. The distances between the different source and target classes were visualized for MNIST, Fashion-MNIST, and CIFAR-10 (see Figs. 6 and 7). The goal is to find the target class $c \in C \setminus c_S$ with the minimum distance to the given source class $c_S$ in order to enhance the effectiveness of model poisoning in FL. Intuitively, the two visualizations should be similar, since both distance measurements reveal the intrinsic relations between data classes in the training distribution; the difference is whether the relation is revealed from the learned local representations or from the aggregated global model parameters. The visualization results showed that the FLAME was successful in measuring class distances for MNIST and CIFAR-10. However, the latent feature space of the Fashion-MNIST task appeared to be more difficult to recover. It was also observed that in CIFAR-10, the animal classes had small distances from each other and large distances from the nonanimal classes such as "truck."

4) Semitargeted Attack in Larger Label Spaces: The goal of ADA is to find the optimized target class given an input image class; therefore, an extension of the aforementioned experiments is to study the effectiveness of ADA when attacking a larger label space such as CIFAR-100 [47]. CIFAR-100 has 100 classes containing 600 images each, divided into 500 training images and 100 testing images per class. Intuitively, the attacking task on CIFAR-100 is much more difficult than in the experiments above due to the increased number of data classes. However, ADA might find an optimized target class that greatly enhances the attack performance in the semitargeted setting. To verify this assumption, the different baseline attacks (see Section V-C) and ADA with full knowledge and partial knowledge were mounted on CIFAR-100. A VGG19 network pretrained on ImageNet [49] was employed as the backbone, followed by a two-layer FC network consisting of 1024 and 512 units, respectively; the output space of the network is 100. The experiment used a batch size of 128, keeping the remaining settings the same.
Similarly, the attacker selected the third label ("baby") of CIFAR-100 as the source class. Then, based on the FLAME, ADA computed the optimized target class via backward error analysis in the partial knowledge setting, which turned out to be the 36th label ("girl"). Fig. 8 shows the five classes, out of the 100 classes in CIFAR-100, closest to the given source class "baby." Table IV shows the attack performance in the partial knowledge setting for the different attacking frequencies. Note that although VGG19 was applied as the backbone, the learned global model's performance on CIFAR-100 was constrained due to the FL setting. For this reason, the attack was mounted on a learned global model that attained a test accuracy of 0.352. The results show that ADA retains competitive attack performance in the larger label space of CIFAR-100. ADA generalizes across data distribution spaces, which is of great importance for the semitargeted attribute of the attack.
A further study of the attack's effectiveness was performed on the ImageNet dataset [48]. ImageNet is a large-scale image database that contains over 1.4 million images in 1000 object categories. Due to memory constraints, all the images were resized to a fixed size of 64 × 64. For the model architecture, the VGG19 network pretrained on ImageNet was used as the backbone, followed by a two-layer FC network consisting of 1024 and 512 units, with an output space of 1000. Similarly, the FLAME was leveraged to deduce the optimized target class for the default source class, the second data class ("white shark") in the ImageNet dataset. The experiments showed that the optimized target class was the 81st data class ("ptarmigan"), which had the minimum distance to the source class. Then, an attack was mounted, using the 81st class as the target class, when the test accuracy of the global model reached 0.10, where about 100 classes out of the 1000 classes were correctly classified. The experiment employed a batch size of 256, a local training epoch of five, and a sample size of 5000 for each client. The empirical results in Table V demonstrate that ADA maintains competitive attack performance in the larger label space of ImageNet and is capable of generalizing across different data distribution spaces.

D. Defenses
In this section, the attacking task accuracy (ATA) and the MTA are measured when applying various defense methods to FL. Furthermore, a potential defense method against ADA is discussed.

2) Byzantine-Robust Aggregation: Two Byzantine-robust aggregation methods, Krum [32] and Trimmed Mean [33], [50], were employed. Krum selects, at each round, a single local model from the selected $K_{\text{select}}$ clients that is similar to the other models as the global model, based on the pairwise Euclidean distances between the local models $L^{(k)}_t$. To measure a local model's distance from the others, Krum computes the sum of distances between the local model and its closest $K_{\text{select}} - \gamma - 2$ local models, where $\gamma$ stands for the number of tolerable attackers ($\gamma = 1$ by default). Following Krum's assumption, the number of parties in the FL system should be at least $2\gamma + 3$. As shown in Fig. 9, Krum makes the attack much easier: the scaled malicious model is more likely to be selected because the malicious update's norm is modified to be nearer to the other legitimate updates. Moreover, Trimmed Mean sorts all the local updates $L^{(k)}_t$ at each round $t$ by their norms and removes the largest and smallest $\delta$ items among them. Then, the mean of the remaining $K_{\text{select}} - 2\delta$ models is employed as the round $t$'s global model $G_t$. By default, $\delta = K_{\text{select}} \cdot K_{\text{adv}}/K$ for the Trimmed Mean [50], which is the number of malicious clients in the selected subset of $K_{\text{select}}$ clients. For instance, in the case of an attacking frequency of 0.1, $\delta = 10 \times 0.1 = 1$.
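For clarity, minimal NumPy sketches of both aggregation rules follow, written against the norm-based descriptions above (standard Trimmed Mean trims coordinate-wise, but the text describes sorting whole updates by norm); updates are flat arrays and the function names are ours.

```python
import numpy as np

def krum(updates, gamma):
    """Krum: pick the single update closest to its K_select - gamma - 2 neighbors."""
    n = len(updates)
    m = n - gamma - 2
    dists = np.array([[np.linalg.norm(u - v) ** 2 for v in updates]
                      for u in updates])
    # For each update, sum the squared distances to its m nearest neighbors
    # (index 0 after sorting is the zero self-distance, so skip it).
    scores = [np.sort(row)[1:m + 1].sum() for row in dists]
    return updates[int(np.argmin(scores))]

def trimmed_mean(updates, delta):
    """Norm-based Trimmed Mean: drop the delta largest- and smallest-norm updates."""
    order = np.argsort([np.linalg.norm(u) for u in updates])
    kept = order[delta:len(order) - delta]
    return np.mean([updates[i] for i in kept], axis=0)
```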
3) Differential Privacy: Recent work [36], [37], [51] showed the plausible application of DP to FL. In particular, weak DP [36] applies Gaussian noise with a small standard deviation $\sigma$, i.e., $\mathcal{N}(0, \sigma^2)$, to the aggregated global model $G_t$ every round $t$. On the other hand, participant-level DP adds the Gaussian noise $\mathcal{N}(0, \sigma^2)$ to each local model. In this experiment, Gaussian noise with a standard deviation of 0.001 based on participant-level DP was used.
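A one-line sketch of participant-level DP as configured here ($\sigma = 0.001$), assuming NumPy-array weights:

```python
import numpy as np

def add_participant_dp_noise(local_weights, sigma=0.001, rng=None):
    """Participant-level DP: add N(0, sigma^2) noise to a local model."""
    rng = rng or np.random.default_rng()
    return local_weights + rng.normal(0.0, sigma, size=local_weights.shape)
```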
4) Numerical Results and Discussion: The defense methods above were employed in every round of FL. Table VII reports ATA and MTA when applying the different defense methods. Fig. 9 visualizes ATA for the different attacking frequencies. The results show that NDC and Krum are prone to enhancing ADA's performance instead of degrading it. In contrast, Trimmed Mean and DP could degrade ATA to some extent.
As $\gamma$ increases, Krum keeps increasing ATA. This is because once Krum selects a malicious update $L^{\text{adv}}_{t+1} - G_t$ as the single model used to update the global model $G_t$, the new global model $G_{t+1}$ performs the same as the adversary's controlled model. Moreover, Trimmed Mean retains MTA for different $\delta$, in contrast to Krum and DP, which degraded MTA. Trimmed Mean reduced ATA to below 0.8; nevertheless, it could not further weaken the attack with a higher $\delta$. Since Trimmed Mean adopts multiple updates to perform the aggregation, unlike Krum, which considers a single update, the effect of a malicious update bypassing the defense is alleviated by the aggregation. The ATA under Trimmed Mean decreased when $\delta = 1$ and then increased with larger $\delta$, where fewer updates are selected for aggregation. Increasing the value of $\delta$ does not improve the defense when the number of malicious updates is fixed.
Furthermore, DP showed the best performance in alleviating the influence of malicious updates, however with a tradeoff between the decreasing ATA and the correspondingly decreasing MTA. Though DP performed better than the other defense methods, it sacrificed the performance of the main tasks in the learning. When $\sigma = 0.50$, the ATA of DP started to increase. This is because benign updates became exponentially less effective due to the added noise, while a malicious update appeared to be more robust against the Gaussian noise; as $\sigma$ increased, the malicious update's effect eventually surpassed that of the benign updates in the aggregation. DP is the most effective defense, yet ATA cannot be reduced below 0.5. Therefore, existing defense methods cannot eliminate the threat of the proposed attack.

5) Discussion on a Potential Defense Against ADA: Since ADA is a feature-level attack based on backward-error analysis, a possible defense would be calibrating the feature space in Fig. 2 over the course of training and detecting changes in the feature distributions of source classes. For example, the distribution of a compromised source class is expected to gradually move close to a neighboring class; that neighboring class would then be the target class chosen by the attacker. Nevertheless, such a calibration-based defense might not be aware of the attack type in practice. In this case, prior knowledge of the semitargeted attack is required for the defense. We aim to devise an effective defense mechanism against ADA in future work.

VI. CONCLUSION
In this article, a novel semitargeted model poisoning attack on FL, called ADA, was proposed. The attack optimizes the target class by measuring the distance in the latent feature representation between the source class and candidate target classes. Moreover, the FLAME is used in the more challenging partial knowledge setting to perform backward error analysis on the shared global model, deducing the ADs between different classes. The performance of the proposed method was evaluated against the metrics of ATA and MTA, with various attacking frequencies, classification tasks, and model architectures. The results showed that the semitargeted ADA can increase attack performance while preserving the performance of legitimate tasks in various FL cases. Furthermore, the study also evaluated different defense methods against ADA and found that the proposed method can bypass existing defenses while retaining competitive attack performance. The aim of this study is to present this new type of semitargeted model poisoning in FL and to reveal the associated risks.
In the future, a generator model that produces adversarial samples [12] based on the revealed AD information could be adapted to mount semitargeted backdoors. Such a semitargeted backdoor would add a small, invisible perturbation to an input sample so that the backdoored sample's distance to a class specified by the adversary is minimized in the feature distribution space. Adversarial samples based on the revealed AD information could mount a stronger attack on the learning process of FL. In addition, given that ADA is a feature-level attack that relies on backward error analysis, one potential defense approach involves implementing a calibration method to identify changes in the feature distributions of source classes.

Fig. 1. Schematics of the attacking-distance-aware attack. (a) Model poisoning in FL. The adversary mounts the attack by uploading poisoned model parameters. (b) ADA, a semitargeted model poisoning attack, compromises the global model in the black box setting by carefully choosing the target class based on backward error analysis.

Fig. 3. AD measurement between the source class (class "bird" in CIFAR-10) and the other classes. Given the bird image class, the deer class has the shortest AD, whereas the truck class has the longest AD. Considering the categories of these classes, images of animals are akin to each other, with shorter distances; however, the images of the airplane also show a relatively high similarity to the images of the bird.

Fig. 4. Performance comparison among ADA with full knowledge, the label flipping attack, and the train-and-scale method.

Fig. 5. Norms of local updates from the benign and malicious clients at each round of FL. (a) Without the gradient scale adjustment. (b) With the gradient scale adjustment.

Fig. 8. AD measurement of the source class "baby" and its five closest classes in CIFAR-100.


Fig. 10. Relation between a defense's performance in alleviating the attack and its influence on MTA. The results show the mean and standard deviation of ten individual experiments using different seeds. Lower is better for ATA; higher is better for MTA. (a) Krum with different $\gamma$. With the increase of $\gamma$, MTA showed a decreasing trend, while ATA showed an increasing trend; Krum cannot provide a defense against the semitargeted attack and even increased the performance of the attack. (b) Trimmed Mean with different $\delta$. Trimmed Mean had a trivial effect on MTA compared to Krum, decreasing ATA to below 0.8; however, increasing $\delta$ cannot further alleviate the effect of the attack. (c) Gaussian noise with different standard deviations $\sigma$. There exists a tradeoff between DP's defense performance and MTA: with the increase of the noise degree, DP gradually degraded ATA to around 0.5, whereas MTA also decreased correspondingly, to as low as 0.2 with $\sigma = 0.50$.

Algorithm 2: ADA in the Partial Knowledge Setting.
1: initialize $G_0$ at the server side
2: for each round $t = 0, 1, 2, \ldots$ do
3:   PS randomly selects a subset of $K_{\text{select}}$ clients from all $K$ clients
4:   PS sends the current global model $G_t$ to the $K_{\text{select}}$ clients
5:   for each client $k = 1, 2, \ldots, K_{\text{select}}$ do
6:     if client $k$ is compromised then
7:       determine the target class $c^*_T$ with the FLAME [see (14)]
8:       train $L^{\text{adv}}_{t+1}$ on label-flipped data with injection rate $\alpha$
9:       submit the scaled update $\Omega(L^{\text{adv}}_{t+1} - G_t)$ to the PS
10:    else
11:      train $L^{(k)}_{t+1}$ on $D^{(k)}$ and submit $L^{(k)}_{t+1} - G_t$
12:  end for
13:  PS aggregates the received updates to obtain $G_{t+1}$
14: end for

TABLE II
ATA AND MTA WITH VARIOUS ATTACKING FREQUENCIES

TABLE III
EVALUATION RESULTS OF ADA IN THE PARTIAL KNOWLEDGE SETTING

TABLE IV
EVALUATION RESULTS OF ADA WITH CIFAR-100 IN THE PARTIAL KNOWLEDGE SETTING

TABLE V
ATA AND MTA USING VGG19 AS THE BACKBONE FOR VARIOUS ATTACKING FREQUENCIES

TABLE VI
ATA AND MTA USING VGG16 AS THE BACKBONE FOR VARIOUS ATTACKING FREQUENCIES

TABLE VII
ATA AND MTA WITH VARIOUS DEFENSE METHODS