Novel Evasion Attacks Against Adversarial Training Defense for Smart Grid Federated Learning

In the advanced metering infrastructure (AMI) of the smart grid, smart meters (SMs) are deployed to collect fine-grained electricity consumption data, enabling billing, load monitoring, and efficient energy management. However, some consumers engage in fraudulent behavior by hacking their meters, leading to either traditional electricity theft or more sophisticated evasion attacks (EAs). EAs aim to illegally reduce electricity bills while deceiving theft detection mechanisms. The current methods for identifying such attacks raise privacy concerns due to the need for access to consumers’ detailed consumption data to train detection mechanisms. To address privacy concerns, federated learning (FL) is proposed as a collaborative training approach across multiple consumers. Adversarial training (AT) has shown promise in countering evasion threats on machine learning models. This paper, first, investigates the susceptibility of traditional electricity theft classifiers trained by FL to EAs for both independent and identically distributed (IID) and Non-IID consumption data. Then, it investigates the effectiveness of AT in securing the global electricity theft detector against EAs, assuming no misbehavior from the participant consumers in the FL process. After that, we introduce three novel attacks, namely Distillation, No-Adversarial-Sample-Training, and False-Labeling, which can be launched during the AT process to make the global model susceptible to evasion at inference time. Finally, extensive experiments are conducted to validate the severity of these proposed attacks. Our findings reveal that the AT can counter EAs effectively when the FL participants are honest, but it fails when they act maliciously and launch our attacks. This work lays the foundation for future endeavors in exploring additional countermeasures, in conjunction with AT, to bolster the security and resilience of FL machine learning models against adversarial attacks in the context of electricity theft detection.


I. INTRODUCTION
The integration of communication and computing resources in the smart grid has brought about a significant transformation in traditional power delivery methods [1]. By employing various techniques to gather, process, and manage information, the smart grid has greatly improved the intelligence of power systems, leading to enhanced efficiency, reliability, sustainability, and cost-effectiveness [2]. A crucial element of the smart grid is the Advanced Metering Infrastructure (AMI), which enables real-time monitoring of electricity consumption through communication between the electric utility (EU) and smart meters (SMs) installed at consumers' houses. This functionality enables dynamic billing, load monitoring, and grid management, granting greater control and optimization of the power systems [3].
Despite the numerous benefits associated with smart grids, they encounter various challenges, one of which revolves around the problem of electricity theft. This occurs when dishonest consumers manipulate their smart meters to record lower electricity consumption readings, engaging in unlawful practices to reduce their energy bills. Such fraudulent activities result in significant financial losses [4]. For instance, it has been reported that the United States and India collectively suffered losses amounting to $6 billion and $17 billion, respectively, due to instances of electricity theft [5], [6]. Moreover, the misreporting of electricity consumption readings can lead to erroneous energy management decisions, potentially disrupting the proper functioning of the grid and even causing power outages. To tackle this issue, it is crucial to urgently implement electricity theft detection mechanisms within the smart grid. By doing so, we can prevent such losses, enhance the reliability and performance of the power grid, and ensure the accuracy of reported detailed power consumption readings.
In the existing literature, there have been proposals for centralized machine learning models, operated by a single entity, aimed at detecting instances of electricity theft. Specifically, all power consumption readings from consumers are transmitted to a central server to train a global machine learning detector, which is utilized to identify false readings [7], [8]. Nevertheless, this approach raises concerns regarding privacy since it requires the sharing of individual consumers' data. In particular, these fine-grained readings have the potential to expose personal information about consumers, including their daily routines, appliance usage, and presence or absence from their premises. Such information could be exploited by criminals, leading to security breaches like burglary, or by insurance companies seeking to exploit the data for their own advantage [9].
In order to address the challenges mentioned earlier, a federated learning (FL) based detector can be trained to identify electricity theft in AMI networks while ensuring that consumers' privacy is not compromised. In FL, each consumer trains a local model on their own data and then sends the updates (weights or gradients) of the trained model to a central server, rather than transmitting the raw data itself [10]. However, similar to centralized learning [11], [12], recent studies [13], [14] have shown that FL is also vulnerable to well-crafted adversarial examples, called evasion attacks (EAs). EAs are designed to cause misclassification with high confidence by making subtle changes to the input samples [15]. Such attacks are increasingly being used to deceive models at inference time for malicious purposes. By launching EAs, attackers can steal electricity without being detected by reporting adversarial samples that have less electricity consumption than the actual readings.
Hence, there is a need to develop models that can resist such attacks, to ensure the effectiveness and security of FL models. There is a considerable amount of research dedicated to developing defenses against adversarial evasion examples [16], with adversarial training (AT) being the most effective in achieving empirical robustness [17]. The AT involves injecting adversarial examples into the training data and fine-tuning network parameters to improve model robustness [12]. While previous research in the literature has focused on addressing traditional electricity theft attacks in the smart grid [18], [19], [20], [21], little attention has been devoted to investigating EAs in the context of electricity theft and FL. In the context of electricity theft, EAs pose a significant concern, especially in the FL scenario. This is because malicious consumers possess full control over the data stored on their devices, enabling them to manipulate the AT process, inject their malicious behavior into the global model, and subsequently launch EAs during inference time to steal electricity. Thus, despite the effectiveness of AT, additional measures are necessary to address the specific challenge of EAs in FL-based electricity theft detection systems. To the best of our knowledge, this study is the first to investigate EAs in smart grid FL training. Our main contributions can be summarized as follows.
• We assess the vulnerability of electricity theft models, trained through FL, to EAs during inference time in both the independent and identically distributed (IID) and Non-IID scenarios. Then, we assess the effectiveness of AT during the FL training process in securing the electricity theft detector against EAs.
• We propose three different attacks on the AT process: No-Adversarial-Sample-Training, which omits evasion sample training to undermine the global model; False-Labeling, which injects the malicious behavior through mislabeling; and Distillation, which leverages Defensive Distillation, a defense with hidden vulnerabilities that attackers can later exploit. Moreover, we employ the Model Replacement attack with the Distillation attack to preserve the attackers' parameters during FL aggregation, and to enhance the attack success rates, particularly in the IID scenario, for both the No-Adversarial-Sample-Training and False-Labeling attacks.
• We have conducted comprehensive experiments on a real electricity consumption dataset to demonstrate the severity of the proposed attacks.
The paper is organized as follows. Section II reviews the related works, while Section III presents the system and threat models. Section IV discusses the preliminaries utilized in our paper. The susceptibility of the FL models to EAs and the effectiveness of the AT defense are discussed in Section V. The adversarial EAs against the FL AT defense are discussed in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORKS
This section provides a brief discussion of the research endeavors that focus on detecting electricity theft in the smart grid. Then, we discuss the recent studies on the detection of EAs. Following that, we delve into the research gap and explain the driving factors behind our work.

A. ELECTRICITY THEFT DETECTION SCHEMES
The majority of current schemes for detecting electricity theft involve training a machine learning model using fine-grained electricity consumption data.The existing schemes can be categorized into two types: centralized schemes and decentralized schemes utilizing FL.

1) CENTRALIZED SCHEMES
Zheng et al. [7] have employed a combination of wide and deep convolutional neural networks (CNNs) to tackle the task of identifying electricity theft. Their model consisted of two components: a wide component designed to extract overall features from one-dimensional (1-D) electricity usage data, and a deep component that specifically targeted periodic patterns within the two-dimensional (2-D) electricity consumption data. Jindal et al. [22] have introduced a two-stage electricity theft detection scheme, utilizing decision tree (DT) and support vector machine (SVM) algorithms. The scheme operates in a sequential manner, where the data is initially processed using DT in the first stage. Subsequently, the processed data is forwarded as input to the SVM classifier in the second stage.
Ismail et al. [23] have proposed a model tailored to detect electricity theft on a per-customer basis. This model made use of a deep neural network (DNN), and its performance was enhanced through the implementation of a sequential grid search analysis during the learning phase, which allowed for precise adjustments of the model's hyperparameters. Nabil et al. [24] have proposed two electricity theft detection approaches. Firstly, they developed deep feed-forward and recurrent neural network (RNN) models for each consumer. Then, they proposed a generalized and robust electricity theft model that exhibits comparable performance to the customer-specific detectors. However, it is crucial to highlight that all of the previous approaches necessitate access to detailed electricity usage data from each consumer, which raises serious concerns regarding consumer privacy.

2) DECENTRALIZED SCHEMES UTILIZING FL
FL is a decentralized method for training machine learning models in which each participant trains a model locally on its own dataset and uploads the model's updates to a central server. The server merges these submitted models to produce a global model that is trained on the local data of the participants. FL has gained significant popularity in privacy-sensitive tasks as it enables the training of models without sharing the training data with the server to preserve privacy.
The majority of the existing works have focused on training electricity theft detectors by sharing the electricity consumption readings with a central unit, and very few works in the literature have investigated the use of FL for training electricity theft detectors. Wen et al. [18] have proposed an FL-based framework for detecting electricity theft, involving two servers and multiple detecting stations within its architecture. Consumers need to send their consumption readings to detection stations that are assumed to be trusted. The FL process is conducted among the detection stations, which employ the local differential privacy (LDP) approach to gather consumption data from customers. However, this approach encounters limitations due to the substantial data transfer required to train the electricity theft detection model at the detection stations, resulting in significant consumption of the wireless networks' communication bandwidth. Moreover, the existing privacy-preserving deep learning methods based on differential privacy (DP) suffer from a trade-off between accuracy and privacy, meaning that to adequately preserve privacy, high noise should be added to the data, which results in low model accuracy [25], [26].
Wang et al. [19] have proposed a federated approach for characterizing residential electricity consumers using smart meter data while ensuring privacy. It employs privacy-preserving Principal Component Analysis (PCA) to extract features, followed by the training of federated Artificial Neural Network (ANN) classifiers to predict consumer characteristics. Ashraf et al. [20] have proposed a federated voting classifier (FVC), leveraging ensemble learning from traditional ML classifiers like random forests (RF), k-nearest neighbors (KNN), and bagging classifiers (BG) to enhance energy theft identification accuracy. Jithish et al. [21] have conducted a systematic exploration of the performance of FL models for anomaly detection using established standard datasets, providing a comparative analysis against centralized models. In addition, they have devised specialized machine learning models tailored specifically for smart grid anomaly detection and evaluated them using real-world smart meter data. Furthermore, this study extends its scope to quantify the practical viability of deploying FL models within resource-constrained Internet of Things (IoT) environments, going beyond theoretical considerations by undertaking comprehensive performance assessments on smart meter prototypes. However, the existing centralized and decentralized electricity theft detection schemes do not investigate securing the models against EAs.

B. EVASION ATTACKS DETECTION SCHEMES
EAs are more sophisticated and successful compared to regular electricity theft attacks. While the latter aim at only decreasing the reported electricity consumption readings, the former aim to reduce electricity consumption and also deceive the electricity theft detectors by causing them to misclassify evasion samples as benign [27], [28]. The influence of EAs on machine learning-based detectors employed in the power systems domain has been investigated in multiple fields, including household energy forecasting [29] and state estimation [30]. These previous studies have determined that EAs do pose a threat to machine learning-based detectors. In the following, we discuss the existing schemes, which can be categorized into centralized schemes and decentralized schemes utilizing FL.

1) CENTRALIZED SCHEMES
Li et al. [27] have proposed an algorithm, called SearchFromFree, which utilizes gradients to generate evasion samples capable of evading deep learning (DL)-based electricity theft detectors while enabling financial gains. Li et al. [31] have suggested employing the defensive distillation strategy to counter the EAs launched by the SearchFromFree algorithm against electricity theft detectors. Badr et al. [32] have proposed a novel form of EAs. In this attack, a malicious consumer with high consumption levels can deceive the detector by providing false readings resembling low consumption profiles using a Generative Adversarial Network (GAN). To mitigate this attack, the authors have proposed training dedicated detectors for different consumer groups. This strategy effectively detects malicious consumers attempting to imitate profiles outside their respective groups, while rendering it unprofitable to imitate profiles within the same group for evading detection.
Takiddin et al. [33] have proposed using an anomaly detector trained only on benign data to identify both traditional electricity theft attacks and EAs. They have achieved this by sequentially combining multiple neural network architectures, namely an autoencoder, a convolutional-recurrent network, and a feed-forward neural network. Nevertheless, it is important to emphasize that all the aforementioned methods require sharing the fine-grained electricity consumption readings of individual customers, leading to significant concerns regarding the privacy of consumers.

2) DECENTRALIZED SCHEMES UTILIZING FL
Although FL shows great potential in the smart grid [34], it has not been investigated in the context of EAs. Expanding the focus beyond smart grid applications, we survey the problem in other domains and alternative contexts. In artificial intelligence and Internet of Things (IoT) applications, Song et al. [14] have proposed an approach, called FDA3, that aims to address the limitations of existing defense methods against adversarial attacks in IoT applications. It utilizes a cloud-based architecture inspired by FL to aggregate the defense knowledge, obtained by means of AT, from different sources.
In image processing applications, Luo et al. [35] have proposed an approach, called EFAT, that addresses this issue by introducing a coupled training mechanism that enhances the diversity of adversarial examples through the sharing of evasion samples generated by other participating clients. Shah et al. [36] have proposed FedDynAT, a novel algorithm specifically tailored to enhance adversarial training in the FL setting. FedDynAT demonstrates notable improvements in both normal and adversarial accuracy, while effectively mitigating the issue of model drift and reducing convergence time. Model drift refers to the divergence or discrepancy that arises between the local models trained by different participating agents in the FL process.
Aldahdooh et al. [37] have studied the feasibility of applying AT in the FL process for vision transformers, which are effective in computer vision tasks. The goal is to enhance the robustness of models in the presence of adversarial examples while considering the non-independent and identically distributed (Non-IID) nature of the data. To address this, the authors propose an aggregation method, called FedWAvg, which calculates weights based on similarities between the last layer of the global model and the client updates. However, none of the aforementioned schemes have taken into account the presence of attackers during the AT process, i.e., they assume that the participants in the FL process are honest. The existence of attackers during the AT process can hinder convergence and potentially introduce undetected malicious behavior into the global model, leading to severe consequences.
Zizzo et al. [13] were the first to introduce adversarial training (AT) in the FL setting, known as federated adversarial training (FAT). Their study investigates the effectiveness of FAT in preserving data privacy and mitigating evasion threats. The authors specifically examine the impact of malicious FL participants on the AT process, particularly when combined with FL defense mechanisms. They have identified vulnerabilities in the defense mechanisms, which include the widely utilized defense method known as Krum [38]. Krum selects a single update by considering the similarity between that particular update and the other updates. To attack the Krum defense, the authors propose a distillation-based method that introduces gradient masking into the global model.

C. LIMITATIONS AND RESEARCH GAP
Based on our discussion in the previous two subsections, in the domain of the smart grid, various schemes have been developed to detect electricity theft attacks. These schemes typically employ machine learning models trained on fine-grained electricity consumption data to detect the patterns associated with theft. Most of the existing works in the literature focus primarily on the detection of false readings associated with theft attacks and overlook more sophisticated adversarial attacks such as EAs. EAs have been investigated in centralized training approaches that require sharing the fine-grained power consumption readings, which causes serious privacy issues, but they have not been investigated in smart grid FL-based decentralized approaches. One way to secure FL against EAs is by training the local models on evasion samples computed from the local models, a process called adversarial training (AT), but most of the existing schemes assume that the participants in the FL process are honest. Yet, this assumption cannot be guaranteed in practice, and malicious participants can hinder the AT process by preventing convergence or injecting their malicious behavior into the global model. Thus, this paper attempts to fill this research gap by examining the effectiveness of AT in federated learning when malicious participants are present during the AT process.
As mentioned earlier, the closest work to this paper is that of Zizzo et al. [13], which studied the AT process in federated learning in the presence of malicious consumers. However, the attack proposed in [13] is not applicable in the FL scenario that averages all the models' updates (called FedAvg), or in FL defenses that select multiple updates in the aggregation process, as the weights necessary for a successful attack are not maintained. In this paper, we propose a novel approach to applying the distillation attack in FedAvg, ensuring that the weights necessary for a successful attack are maintained.

We also introduce additional attacks that can hinder the AT process, such as No-Adversarial-Sample-Training and False-Labeling.
To sum up, our main contributions are as follows. To the best of our knowledge, this study is the first to investigate EAs in smart grid FL training and the application of AT to secure the model. Additionally, we introduce three novel attacks that can be launched by malicious consumers in FedAvg to hinder the AT process.

III. SYSTEM AND THREAT MODELS
In this section, we begin by discussing our system model, and then we present our threat model.

A. SYSTEM MODEL
In our system, depicted in Fig. 1, the electricity utility (EU) and consumers are two distinct entities that play crucial roles.

• Electricity Utility (EU):
The EU is equipped with the necessary computational resources to execute the FL training calculations. Its workflow entails initiating and transmitting the global model to all participating consumers. It then collects the updated model parameters submitted by the consumers, executes a defense mechanism to detect EAs initiated by malicious consumers, and aggregates the selected parameters to compute an updated global model.
• Consumers: The consumers participate in the federated learning process. They store local power consumption data and may exhibit similar or diverse consumption patterns. The consumers collaborate to train a global model under the coordination of the EU.

B. THREAT MODEL
Our work primarily focuses on the security risks posed by internal adversaries. These adversaries, who can be consumers with potentially malicious intent, have the ability to manipulate their local data and/or models in order to inject malicious behavior into the global model. Following the works in [38], [39], [40], and [41], we set an upper limit on the number of malicious consumers, M, relative to the total number of consumers, N, where N ≥ 3. Finally, the EU is regarded as completely trustworthy since it has a vested interest in preventing electricity theft.
In this paper, we provide a comprehensive assessment of the vulnerability of electricity theft models, trained through FL, to EAs during inference in both IID and Non-IID scenarios. Additionally, we evaluate the effectiveness of AT during the FL training process in securing the electricity theft detector against EAs. Our investigation encompasses three proposed attacks that can be launched by malicious consumers to compromise the AT process, namely the No-Adversarial-Sample-Training, False-Labeling, and Distillation attacks. In the No-Adversarial-Sample-Training attack, fraudulent consumers abstain from training the local model on the evasion samples required for the AT defense, rendering the global model vulnerable to their evasion attempts. In the False-Labeling attack, fraudulent consumers mislabel evasion samples as benign during the AT process, allowing them to introduce malicious behavior into the trained model and subsequently exploit it for evasion. In the Distillation attack, fraudulent consumers employ an alternative defense against EAs, known as Defensive Distillation. Although Defensive Distillation initially exhibits good adversarial accuracy during training, it harbors a hidden vulnerability that attackers can later exploit for evasion. To enhance the robustness of our approach, we incorporate the Model Replacement attack alongside the Distillation attack, ensuring the survival of attackers' model parameters during FL aggregation. Additionally, we investigate how the Model Replacement attack can elevate the attack success rates of both the No-Adversarial-Sample-Training and False-Labeling attacks, especially in the IID scenario.

IV. PRELIMINARIES
A. CONVOLUTIONAL NEURAL NETWORK (CNN)
A convolutional neural network, or CNN, is a type of neural network that is widely used to tackle various complex machine learning problems in different applications. These applications include text analysis and image recognition, among others. The popularity of CNNs stems from their ability to effectively identify intricate patterns and extract important features from input data. A typical CNN model, illustrated in Fig. 2, consists of multiple layers, including an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. Small filters, or kernels, are employed by the convolutional layers to perform dot products across a limited region of the receptive field and extract features from the input data. To enable more sophisticated decision-making, the output of the convolutional layers is processed using non-linear activation functions (AFs) like Softmax or the Rectified Linear Unit (ReLU). The convolutional layers' output dimensions are reduced by pooling layers, which also lowers the number of weights and the amount of computation needed. In order to process the extracted features and determine classifications, the CNN employs fully connected layers, which create connections between neurons in the preceding and succeeding layers. In this particular study, a CNN is employed to train a model for detecting electricity theft. The selection of the CNN architecture as a classifier is based on previous research findings that highlight its high accuracy in capturing patterns and correlations in the input data [21].

B. FEDERATED LEARNING
In the realm of deep learning, conventional methods involve sending all data to centralized servers for analysis and training of a centralized model. Unfortunately, this training process poses a potential threat to data privacy, as it risks exposing the confidentiality of the training data. Introducing a privacy-conscious alternative to traditional deep learning techniques, Google proposed FL, which is a distributed machine learning training approach. FL enables data owners to train a unified machine learning model without the need to share their raw training data, maintaining the privacy of their data.
The FL training process is divided into three phases: initialization, training, and aggregation [42], [43]. In the initialization phase, the central server distributes a unified model structure to the data owners participating in the FL process. This step establishes the foundation for collaborative training while maintaining privacy. Following this, in the training phase, each data owner individually trains the disseminated model using their data. In the context of supervised learning, which is the focus of our research, where both input features and corresponding labels (x, y) are provided, the machine learning model can be expressed as a function f(x, ω) = ŷ. Here, x represents the input training vector, ω denotes the model's parameters, and ŷ signifies the output. The objective is to minimize the difference between the predicted output and the actual output by means of non-linear optimization. Achieving this goal involves modifying the model's parameters, ω, using the mini-batch stochastic gradient descent (SGD) algorithm [44].
The mini-batch SGD calculates the loss over a mini-batch D_j, randomly selected from the dataset D at each iteration j, using the following formula:

L(ω) = (1/|D_j|) Σ_{i∈D_j} L_f((x_i, y_i), ω),    (1)

where |D_j| stands for the size of the mini-batch D_j, L_f((x_i, y_i), ω) = ∥y_i − ŷ_i∥_2, and ∥·∥_2 refers to the ℓ2 norm of a vector. Then, the gradient G is computed, where G represents the partial derivative of the loss function with respect to ω:

G = ∂L(ω)/∂ω.    (2)

Subsequently, each participant c updates the parameters of its model, ω_c, according to the following rule:

ω_c ← ω_c − η G_c,    (3)

where G_c is the gradient of participant c, and η is the learning rate. Finally, each participant shares its model parameters with the central server. In the aggregation phase, which takes place at regular intervals denoted as round t, the central server combines all the model parameters received from the participants. These parameters are then utilized to update the shared unified (or global) model. The global model, represented as ω_*^t, is updated according to the following equation:

ω_*^t = (1/N) Σ_{c=1}^{N} ω_c^t,    (4)

where N stands for the total number of participating data owners. Next, the cloud server distributes the updated global model parameters, denoted as ω_*^t, to all participants, allowing them to update their local models. This process of training and aggregating is repeated multiple times in an iterative manner, with the central server and participants engaging in this process until a predefined convergence condition or a specified number of rounds is met [45], [46].
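To make the above procedure concrete, the following Python sketch outlines one FL round with mini-batch SGD at the clients and FedAvg at the server (Eqs. 1-4). It is a minimal illustration, assuming PyTorch models that share one architecture; the function and variable names are illustrative and not part of our implementation.

```python
# Minimal FedAvg sketch, assuming PyTorch models with identical architectures.
import copy
import torch


def local_update(global_model, batches, lr=1e-4, epochs=1):
    """Client side: fine-tune a copy of the global model on local mini-batches (Eqs. 1-3)."""
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in batches:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)   # L_f((x_i, y_i), w) averaged over the mini-batch
            loss.backward()               # gradient G_c
            optimizer.step()              # w_c <- w_c - eta * G_c
    return model.state_dict()


def fedavg(client_states):
    """Server side: parameter-wise average of the uploaded models (Eq. 4)."""
    keys = client_states[0].keys()
    return {k: torch.stack([s[k].float() for s in client_states]).mean(dim=0) for k in keys}
```

In each round, the server would broadcast the averaged state, every consumer would call local_update on its own data, and the returned state dictionaries would be aggregated with fedavg.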

C. K-MEANS CLUSTERING TECHNIQUE
K-means clustering is a technique used to partition a vast collection of data points into smaller groups, referred to as clusters, which consist of closely related data points [47], [48]. Within each cluster, there exists a representative point known as a centroid, which represents the central location of the cluster points. To initiate the clustering process, the number of clusters (K) is decided, and the centroids are placed randomly. Subsequently, each data point is assigned to the cluster whose centroid is closest to it. This involves computing the distances between each point and the centroids of the clusters to determine the nearest cluster. Following this, new centroids are calculated by averaging all the points within each cluster. The iterative repetition of these steps continues until convergence is achieved. The algorithm is considered to have ''converged'' when there are no further changes in the locations of the centroids.
The primary objective of the K-means clustering algorithm is to minimize an objective function known as the within-cluster sum of squared distances. This function aims to partition n data points into K (≤ n) clusters, represented as S = {S_1, S_2, . . ., S_K}. The objective function is expressed as follows:

J = Σ_{j=1}^{K} Σ_{x∈S_j} ∥x − c_j∥²,    (5)

where ∥x − c_j∥² refers to the squared Euclidean distance between each data point x and the centroid c_j of its cluster. The K-means clustering technique is employed in this study to group the consumers and differentiate between the IID and Non-IID scenarios.
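As a rough illustration, the snippet below groups consumers by their consumption profiles with scikit-learn's KMeans and then draws IID and Non-IID participant sets in the spirit of the procedure described later in Section V. The profile matrix, its shape, and the consumer count are placeholder assumptions, not the paper's actual data.

```python
# Hedged sketch: cluster consumers by consumption profile and pick IID / Non-IID groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
profiles = rng.random((200, 48))          # placeholder: 200 consumers x 48 half-hourly means

kmeans = KMeans(n_clusters=11, n_init=10, random_state=0).fit(profiles)
labels = kmeans.labels_

# Non-IID setting: one consumer drawn from each of the 11 clusters.
non_iid_ids = [int(np.where(labels == k)[0][0]) for k in range(11)]
# IID setting: 11 consumers drawn from a single cluster.
iid_ids = np.where(labels == 0)[0][:11].tolist()
```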

D. ADVERSARIAL TRAINING
The pioneering work on adversarial training was introduced by Goodfellow et al. [12]. This methodology involves the generation of adversarial examples, which are then incorporated into the original samples to fortify the model against EAs. The effectiveness of adversarial training in enhancing the model's resilience to EAs is contingent upon the specific approach used for generating these adversarial examples. It is worth noting that adversarially trained models utilizing the Fast Gradient Sign Method (FGSM) or a randomized-step FGSM (R+FGSM) exhibit robustness primarily against single-step perturbations. Nevertheless, they are still susceptible to more computationally demanding multi-step attacks [49]. To address this limitation, a solution has been proposed, which involves incorporating adversarial examples generated using Projected Gradient Descent (PGD) into the adversarial training process [49]. By including these PGD samples during training, the model can improve its resilience against intricate attack scenarios. In the following subsections, we first discuss how the PGD samples are generated, and then, we discuss how the adversarial training technique is employed in FL.

1) PROJECTED GRADIENT DESCENT (PGD)
The iterative PGD attack is used to generate adversarial samples using the following equation:

x_adv^(r+1) = Π_ϵ( x_adv^(r) + β · sign( ∇_x L(f(x_adv^(r), ω), y) ) ).    (6)

In this equation, the PGD attack starts with a random initialization, denoted as x_adv^(0), within the ϵ-ball centered at the initial natural data x. Then, at each step, PGD performs a gradient ascent step on the current sample x_adv^(r) in the direction of the gradient of the loss function. Specifically, a perturbation proportional to β and the sign of the gradient is added to the current adversarial sample x_adv^(r). The resulting perturbed data x_adv^(r+1) is then projected onto the ϵ-ball centered at the initial natural data x using the projection function Π_ϵ. This iterative process is repeated for a maximum of R steps, where R represents the number of iterations.
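A minimal PyTorch sketch of the iterative update in Eq. 6 is given below. It assumes an ℓ∞ ϵ-ball for the projection Π_ϵ and a classifier that returns raw logits; all names are illustrative rather than our exact attack code.

```python
# Hedged PGD sketch (assumes an l-infinity eps-ball and a logit-producing model).
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps, beta, steps):
    # random start x_adv^(0) inside the eps-ball around the natural sample x
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):                                # r = 0, ..., R-1
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + beta * grad.sign()                # gradient-ascent step
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)     # projection onto the eps-ball
    return x_adv.detach()
```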

2) FEDERATED ADVERSARIAL TRAINING
The adversarial training in FL is referred to as Federated Adversarial Training (FAT) and was first introduced in [13]. FAT applies adversarial training locally at each client (participant in the FL) to enhance the robustness of the global model. Let there be N clients, where each client c possesses its local data D_c. The local data D_c consists of regular samples and their corresponding labels (x_cj, y_cj), as well as adversarial samples and their labels (x_adv,cj, y_adv,cj). Here, p_c denotes the number of regular samples in client c's local data, and q_c denotes the number of adversarial samples. The size of the local data is p_c + q_c = |D_c|. Each client c optimizes its local model by minimizing the following objective function:

min_{ω_c} [ (1/q_c) Σ_{j=1}^{q_c} L( f_{ω_c}(x_adv,cj), y_adv,cj ) + (1/p_c) Σ_{j=1}^{p_c} L( f_{ω_c}(x_cj), y_cj ) ],    (7)

where f_{ω_c} represents the local model with parameters ω_c, and L is the loss function. The first term in the objective function calculates the average loss over the adversarial samples, while the second term calculates the average loss over the regular samples. By minimizing this objective function, each client enhances the robustness of its local model against EAs. The global model is updated by aggregating the parameters of all clients after each round of training iterations. In this way, the global model is also robust against EAs.
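The local objective in Eq. 7 can be sketched in Python as follows; since F.cross_entropy already averages over its batch, the two terms correspond to the adversarial and regular averages. The helper is a simplified illustration under these assumptions rather than our exact training code.

```python
# Hedged sketch of one client's FAT loss: adversarial term + regular term (Eq. 7).
import torch.nn.functional as F


def fat_local_loss(model, x_clean, y_clean, x_adv, y_adv):
    adv_term = F.cross_entropy(model(x_adv), y_adv)         # (1/q_c) sum of adversarial losses
    clean_term = F.cross_entropy(model(x_clean), y_clean)   # (1/p_c) sum of regular losses
    return adv_term + clean_term
```

In each FL round, a client would first craft x_adv from its local model (for example with PGD), minimize this loss, and then upload the resulting parameters for aggregation.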

E. DEFENSIVE DISTILLATION
The concept of employing distillation as a defense mechanism against EAs was initially proposed in [50]. The distillation process involves training a network that includes a softmax output layer using the original dataset. In this process, the softmax layer takes the output logits, which are vectors generated by the final hidden layer of a deep neural network (DNN), and transforms them into a probability vector. This probability vector assigns a probability to each class in the dataset for a given input. The softmax layer computes the output vector by exponentiating and normalizing the logits using a temperature parameter (T). The temperature parameter plays a pivotal role in the distillation process. The equation for computing the output vector F(X) for a class indexed by i ∈ {0, 1, . . ., B − 1} (where B is the number of classes) is:

F_i(X) = exp(Z_i(X)/T) / Σ_{l=0}^{B−1} exp(Z_l(X)/T),    (8)

where Z(X) represents the B logits corresponding to the hidden layer outputs for each class, and T is the distillation temperature shared across the softmax layer. By employing a high temperature during training, the DNN generates probability vectors with relatively larger values for each class. As the temperature approaches infinity, the probabilities converge to a value of 1/B, leading to a more ambiguous distribution. In contrast, as the temperature decreases, the distribution becomes more discrete, with one probability close to 1 and the remaining probabilities close to 0. The standard softmax operation typically utilizes a temperature value of 1.
Once the initial DNN is trained, the probability vectors it produces serve as soft labels to annotate the dataset. These soft labels are then utilized to train a second network, which can be trained exclusively using these soft labels. Through the utilization of soft labels, the second network aims to converge towards an optimal solution. Similar to the first network, the second network is trained using a high softmax temperature. During the classification phase at test time, the temperature is reset to 1 to generate more discrete probability vectors. In this research paper, the attacker employs distillation as a strategy to create the illusion of robustness in the resulting model against evasion samples. By exploiting certain vulnerabilities in the defensive distillation approach itself, the attacker successfully evades the model.
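The following hedged sketch captures the two-stage procedure: a teacher trained with the temperature-T softmax of Eq. 8 produces soft labels, and a student is trained on them at the same temperature. The network objects and the value T = 100 are assumptions for illustration only.

```python
# Hedged defensive-distillation sketch: temperature softmax, soft labels, student loss.
import torch
import torch.nn.functional as F

T = 100.0  # distillation temperature (test-time inference resets it to 1)


def soft_labels(teacher, x):
    """Soft labels from the teacher: F_i(X) = exp(Z_i(X)/T) / sum_l exp(Z_l(X)/T)."""
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)


def distillation_loss(student_logits, soft_targets):
    """Cross-entropy between the student's temperature softmax and the soft labels."""
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```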

F. MODEL REPLACEMENT ATTACK
In this attack, the malicious clients (participants in FL) strive to ensure the persistence of their updates during the FedAvg aggregation process. Their main objective is to fully substitute the global model, denoted as ω_*, with their own model, represented as X, as illustrated below:

X = ω_*^t = (1/N) Σ_{c=1}^{N} ω_c^t.    (9)

Let ω_m^t denote the update from the malicious client m at time t. Solving Eq. 9 for ω_m^t yields the following expression:

ω_m^t = N X − Σ_{c=1}^{N−1} ω_c^t = N X − (N − 1) ω_*^{t−1} − Σ_{c=1}^{N−1} (ω_c^t − ω_*^{t−1}).    (10)

Based on the assumptions made in Bagdasaryan et al.'s work [51], the sum of deviations of the honest clients, Σ_{c=1}^{N−1} (ω_c^t − ω_*^{t−1}) ≈ 0, becomes negligible as the global model converges. This cancellation of deviations enables the simplification of the attacker's update to:

ω_m^t ≈ N X − (N − 1) ω_*^{t−1}.    (11)
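Eq. 11 amounts to a simple parameter-wise rescaling, as the hedged sketch below shows for dictionaries of model parameters; the function name and inputs are illustrative assumptions.

```python
# Hedged sketch of the malicious update in Eq. 11: w_m^t = N*X - (N-1)*w_*^{t-1},
# applied parameter-wise to dictionaries of tensors (e.g., PyTorch state_dicts).
def model_replacement_update(x_state, prev_global_state, num_clients):
    return {
        key: num_clients * x_state[key] - (num_clients - 1) * prev_global_state[key]
        for key in x_state
    }
```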
In our paper, we employ this attack in conjunction with the Distillation attack to ensure the survival of the attackers' model parameters during the FL aggregation process. Additionally, we explore how the Model Replacement attack can enhance the success rate of both the No-Adversarial-Sample-Training and False-Labeling attacks.

V. SUSCEPTIBILITY TO EVASION ATTACKS AND THE EFFECTIVENESS OF AT
In this section, we begin by discussing the setup of our experimental environment. Next, we introduce the dataset utilized in our experiments. We then assess the susceptibility of the electricity theft detectors trained through FL to EAs. Subsequently, we investigate the effectiveness of AT as a defense mechanism against these EAs. It is worth noting that in this section, we assume the absence of malicious FL participants, meaning that all participants faithfully adhere to the AT approach.

A. EXPERIMENTAL SETUP
In our experiments, we have used a variety of Python packages. In particular, we have depended on Pandas and Numpy to prepare the data and exploited Matplotlib for data visualization. We train a CNN-based detector because it has been proven in the literature that the CNN architecture results in an accurate detector as it is capable of capturing the patterns and correlations in the readings. Our detector has an input layer, two convolutional layers, two max-pooling layers, one fully connected layer, and an output layer, as shown in Fig. 2. The hyper-parameters are given in Table 1.
In the training phase, the Adam optimizer is used to train the detector for 150 FL rounds with a learning rate of 0.0001, a batch size of 250, and categorical cross entropy as the loss function. Malicious (theft) samples are generated from the benign readings E_c(d, t) using the reduction function adopted from [6]. The definition of this function is as follows:

Ẽ_c(d, t) = α · E_c(d, t),    (12)

where 0 < α < 1.
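For concreteness, a PyTorch sketch of a detector with the layout described above (two convolutional layers, two max-pooling layers, one fully connected layer, and a two-class output) is shown below. The kernel sizes, channel widths, and input length are illustrative assumptions; the actual hyper-parameters are those given in Table 1.

```python
# Hedged sketch of the CNN detector architecture described above.
import torch.nn as nn


class TheftDetectorCNN(nn.Module):
    def __init__(self, input_len=48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (input_len // 4), 64), nn.ReLU(),
            nn.Linear(64, 2),                    # benign vs. theft
        )

    def forward(self, x):                        # x: (batch, 1, input_len)
        return self.classifier(self.features(x))
```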

C. SUSCEPTIBILITY TO EVASION ATTACKS
In our study, we investigate two different scenarios: the Independent and Identically Distributed (IID) scenario and the Non-Independent and Identically Distributed (Non-IID) scenario. In the IID scenario, consumers exhibit similar power consumption patterns, while in the Non-IID scenario, consumers have diverse consumption patterns. To create datasets for these two scenarios, we utilized K-means clustering. Specifically, we divided the consumers into 11 distinct clusters based on their electricity consumption data. For the Non-IID scenario, we selected one consumer from each cluster, resulting in a total of 11 consumers participating in the FL training. Conversely, for the IID scenario, we selected 11 consumers from a single cluster to participate in the FL training. We selected 11 consumers as the lowest number that allows us to investigate how varying the percentage of attackers affects the results while keeping the experiment's complexity manageable. Using fewer consumers may not accurately represent the desired attacker percentage, potentially resulting in impractical fractions. Additionally, increasing the consumer count does not affect the attack success rate as long as the percentage remains constant.

In this subsection, we begin by describing the process through which participants in the FL training compute a unified electricity theft classifier using both benign and malicious samples. Subsequently, we assess the susceptibility of the resulting detector to EAs. To assess the performance of the trained classifier, we utilize the metric known as ''Accuracy''. This metric measures the percentage of correctly classified samples out of the total evaluated samples. A higher accuracy score indicates a more accurate classifier in correctly predicting the labels of input samples. In our case, the classifier we trained achieves an accuracy of 91.5% for the Non-IID scenario and 98.2% for the IID scenario. The higher accuracy observed in the IID scenario suggests that the presence of similar consumption patterns among consumers contributes to improved classification performance. On the other hand, the Non-IID classifier exhibits slightly lower accuracy, indicating the challenges posed by diverse consumption patterns among consumers in accurately predicting their class labels.

2) STEP 2: LAUNCHING PGD ATTACK
To assess the susceptibility of the electricity theft detector trained in Step 1 to EAs, we conduct the PGD attack. It is important to note that this attack is performed after the completion of FL training. In this attack, we input a malicious sample Ẽ_c(d, t) into the PGD equation defined in Eq. 6. For the IID scenario, we set the value of ϵ to 3, while for the Non-IID scenario, we set it to 5. The attack consists of 100 iterations. Furthermore, we ensure that the total consumption of the resulting adversarial sample remains below 50% of the consumption of the true benign sample to achieve sufficient profit from the electricity theft.
The PGD attack continuously modifies the input malicious sample in order to transform it into a benign sample and avoid detection by the classifier. The success of the attack is measured using the Attack Success Rate, which calculates the percentage of malicious samples that are successfully transformed into benign samples out of the total input malicious samples. This metric provides an indication of the attack's effectiveness in evading the electricity theft detector. In our experiments, the Attack Success Rates for the IID and Non-IID scenarios are found to be 98% and 97.5%, respectively. These results clearly demonstrate the vulnerability of the electricity theft detector generated through the FL process to EAs. Consequently, it is essential to employ a defense mechanism that can protect against such attacks and ensure the robustness of the FL-generated detector. In the subsequent subsection, we investigate the effectiveness of AT as a defense mechanism against EAs.

D. THE EFFECTIVENESS OF AT
In this subsection, we evaluate the effectiveness of AT in defending the electricity theft detector trained in the previous subsection against EAs. This involves generating adversarial samples using the PGD attack and incorporating them into the training data used to train the classifier. The training data consists of benign data extracted from the Irish dataset (E_c(d, t)), malicious samples obtained using Eq. 12 (Ẽ_c(d, t)), and adversarial instances generated by applying the PGD attack to the malicious samples. The benign samples are labeled as 0, while both malicious and evasion samples are labeled as 1. In order to enhance the robustness of the FL-trained detector, adversarial samples are regenerated at regular intervals during the AT process, denoted as E. In our evaluations, we conduct three experiments. The first experiment investigates the optimal value of E, followed by measuring the attack success rate across the training rounds. Finally, we evaluate both the normal accuracy and adversarial accuracy of the detector.

1) EXPERIMENT 1
In this experiment, we measure the attack success rate at different values of E in both the IID and Non-IID scenarios, as depicted in Fig. 3. The results indicate a consistent trend across both scenarios. When E is set to 20, the attack success rate is relatively high, with values of 10.6% and 28.8% for the IID and Non-IID scenarios, respectively. As E decreases, the attack success rate also decreases. Notably, when E is reduced to 10, there is a noticeable drop in the success rate compared to the previous values, with around 3% and 7.9% for the IID and Non-IID scenarios, respectively. However, further decreasing E from 10 to 5 does not result in a significant decrease in the attack success rate in either scenario. Considering these findings, it is important to take the training time into account as well, as decreasing E leads to longer execution times. Therefore, selecting E = 10 as the optimal value is important, as it provides a good training time while maintaining a low attack success rate in both the IID and Non-IID scenarios.

2) EXPERIMENT 2
In this experiment, we analyze the attack success rate in relation to the number of training rounds for both the IID and Non-IID scenarios. The results are illustrated in Fig. 4. In the IID scenario, the initial attack success rate of 98% experiences a significant decline over the training rounds, ultimately dropping to a mere 3% by the end of the training rounds. Conversely, the Non-IID scenario starts with an initial success rate of 97.5%, showing a slower rate of decrease compared to the IID scenario. Even at the final round, the Non-IID success rate remains relatively high at 7.9%. The discrepancy in attack success rates between the IID and Non-IID scenarios can be attributed to the difference in generalization. When a model is trained with Non-IID data, it tends to exhibit greater generalization, meaning it has a broader understanding of various data patterns and characteristics. However, this increased generalization also renders the model more vulnerable to EAs, as evidenced by previous research conducted by Deniz et al. [53]. Their findings indicate that as models become more generalized, their susceptibility to EAs increases. Therefore, the observed lower attack success rate in the IID scenario, compared to the Non-IID scenario, can be explained by the fact that the IID-trained model demonstrates relatively lower generalization, thereby offering better resilience against EAs.

3) EXPERIMENT 3
In this experiment, we assess the performance of the trained models in terms of both normal accuracy and adversarial accuracy, considering both the IID and Non-IID scenarios. The normal accuracy reflects the model's performance on regular data, which includes both benign and electricity theft samples. On the other hand, the adversarial accuracy evaluates the model's performance specifically on evasion data samples. In the IID setting, the normal accuracy achieves 94.7%, indicating a high level of accuracy on regular data. In contrast, in the Non-IID setting, the normal accuracy reaches 89%, showing a slightly lower performance compared to the IID setting.
Regarding the adversarial accuracy, the model trained on IID data achieves 97%, demonstrating its ability to correctly classify evasion samples. Similarly, the model trained on Non-IID data achieves an adversarial accuracy of 92%. It is worth noting that applying AT leads to a decrease in normal accuracy, which aligns with the findings of previous studies [13]. Specifically, in the IID setting, the normal accuracy decreases from 98.2% before applying AT to 94.7% after applying AT. In the Non-IID setting, the normal accuracy decreases from 91.5% before applying AT to 89% after applying AT. This reduction in normal accuracy can be attributed to the focus on enhancing the model's robustness against EAs, which may introduce some trade-offs in performance on regular data.

VI. ADVERSARIAL EVASION ATTACKS AGAINST AT
In this section, we explore the vulnerability of the AT process to malicious FL participants who intend to compromise the global model's resistance against EAs. We introduce three potential attack strategies: Distillation, No-Adversarial-Sample-Training, and False-Labeling.

A. DISTILLATION ATTACK
In this attack, fraudulent consumers employ a defense strategy known as defensive distillation that differs from the AT defense strategy used by honest consumers. This defense mechanism exhibits high normal accuracy, making it difficult for honest consumers or the utility to detect the attack. The robustness of the trained model remains high, further complicating its detection. However, it is important to note that the defensive distillation defense has a vulnerability that only malicious consumers are aware of, and attackers can exploit this weakness to successfully evade the model. In this subsection, we first discuss how defensive distillation enhances the model's resilience against EAs. Then, we explore the specific vulnerability that attackers can exploit during the inference phase to evade the model. We explain the process of launching the Distillation attack combined with the Model Replacement attack, which involves replacing the global model with the attackers' model. Finally, we conduct experiments to assess the success of this attack.

1) THE ROBUSTNESS OF DEFENSIVE DISTILLATION
During the training process, the Defensive Distillation technique uses a scaling factor, called the temperature (T), to scale the logits (z) before applying the softmax function. However, during the testing phase, this scaling factor is removed, which effectively increases the values of the logits fed to the softmax. As a result, the non-predicted class probabilities are effectively rounded to 0, while the predicted class probability approaches 1. This rounding process is facilitated by the use of 32-bit floating-point numbers, as values that are extremely close to either end can be represented as either 0 or 1. Furthermore, due to this rounding effect, the gradient, when represented using 32-bit floating-point numbers, is also rounded to 0. This prevents the adversarial attack from making further progress, as the gradient becomes too small to cause significant changes to the adversarial sample [15].

2) VULNERABILITY IN DEFENSIVE DISTILLATION
As discussed earlier, the ineffectiveness of attacks on the distillation technique is attributed to the problem of vanishing gradients caused by temperature scaling, which results in large absolute values in the inputs to the softmax function.
To overcome this challenge, Carlini et al. proposed a solution in their work [54], which involves dividing the inputs to the softmax function by the temperature scaling factor T before using them. This modification is represented by the following equation:

F(X) = softmax(Z(X)/T).    (13)

By dividing the inputs by T, this adjustment ensures that the inputs to the softmax function have an appropriate scale, effectively addressing the issue of vanishing gradients.
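In code, the adjustment is a one-line change to how the attacker queries the distilled model, as in the hedged snippet below; the temperature value is assumed to match the one used during distillation, and the attack gradients are then taken through this rescaled output.

```python
# Hedged sketch of Eq. 13: rescale the logits by T before the softmax so the
# gradients no longer vanish, then run the evasion attack (e.g., PGD) through it.
import torch.nn.functional as F


def rescaled_probs(model, x, T=100.0):
    return F.softmax(model(x) / T, dim=1)
```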

3) LAUNCHING DISTILLATION ATTACK ON FedAvg
Fraudulent consumers use only normal data, which includes both benign samples (E_c(d, t)) and malicious samples (Ẽ_c(d, t)), to train a network with a temperature scaling factor (T) set to 100 [54]. They then use the resulting soft labels to train a second network. However, as noted in [13], this attack fails when averaged with other model parameters.
To overcome this limitation, we propose combining this attack with the Model Replacement attack. Specifically, a fraudulent consumer m launches the attack near convergence by incorporating its model parameters X into Eq. 11 and sends ω_m^t to the server. Meanwhile, honest consumers adhere to the AT process and send their models, ω_c^t, to the server. The server then employs Eq. 4 to average the uploaded model parameters as follows:

ω_*^t = (1/N) ( Σ_{c=1}^{N−1} ω_c^t + ω_m^t ) = (1/N) ( Σ_{c=1}^{N−1} (ω_c^t − ω_*^{t−1}) + (N − 1) ω_*^{t−1} + ω_m^t ).    (14)
As the global model converges, the term Σ_{c=1}^{N−1} (ω_c^t − ω_*^{t−1}) becomes negligible. By substituting the value of ω_m^t from Eq. 11 into the previous equation, it can be simplified as follows:

ω_*^t ≈ (1/N) ( (N − 1) ω_*^{t−1} + N X − (N − 1) ω_*^{t−1} ) = X.    (15)
The equations presented above demonstrate that the attacker's model fully replaces the global model, leading to a successful attack.

4) EVALUATIONS
For both the IID and Non-IID scenarios, we conduct a comprehensive evaluation of the Distillation attack in two phases. In the first phase, we assess the normal accuracy and adversarial accuracy of the honest consumers who participate in the FL process. In the second phase, we examine the attack success rate by varying the number of malicious consumers involved in the FL process.
Firstly, Figs. 5 and 6 show the normal accuracy and adversarial accuracy against the number of malicious consumers launching the Distillation attack for the IID and Non-IID scenarios, respectively. These results are compared to the normal accuracy and adversarial accuracy discussed in Section V for the case where no malicious consumers are present. In the IID scenario, the normal accuracy under the Distillation attack ranges from 96.6% to 98.1% as the number of malicious consumers increases from 10% to 50%, in comparison to a normal accuracy of 94.4% in the absence of attackers. Similarly, the adversarial accuracy ranges from 85.4% to 96.1% with an increasing number of malicious consumers, compared to an adversarial accuracy of 97% in the absence of attackers. In the Non-IID scenario, the normal accuracy under the Distillation attack ranges from 92.2% to 94.4% as the number of malicious consumers varies from 10% to 50%, compared to a normal accuracy of 89% without any attackers. The adversarial accuracy ranges from 87.2% to 89% with an increasing number of malicious consumers, while the adversarial accuracy in the absence of attackers is 92%. We observe a slight difference in the performance metrics between the Distillation attack and the scenario without attackers. Specifically, the normal accuracy under the Distillation attack is marginally higher than the normal accuracy in the absence of attackers, while the adversarial accuracy under the attack is slightly lower than the adversarial accuracy without attackers, for both the IID and Non-IID scenarios. The slightly higher normal accuracy under the Distillation attack can be attributed to the training method of distillation, which increases the model's confidence in predicting normal samples. Conversely, the higher adversarial accuracy without attackers compared to the adversarial accuracy under the attack can be attributed to the attention given to adversarial samples during the AT process. Overall, the comparable performance between the attack and the absence of attackers makes the Distillation attack stealthy, as it has minimal impact on the model's performance.
In the second phase of our evaluations, following the completion of training, fraudulent consumers execute the EA by utilizing Eq. 13 and Eq. 6, where the logits of the model are divided by the temperature scaling factor (T) before being passed through the softmax activation function. We conduct an experiment to measure the attack success rate based on the participation of malicious consumers in the FL process. The results are presented in Fig. 7 for the IID and Non-IID scenarios. The findings indicate that the attack success rate increases as the number of malicious consumers rises. In the IID scenario, the attack success rate ranges from 83.5% to 93.5% as the number of malicious consumers varies between 10% and 50%. Similarly, in the Non-IID scenario, the attack success rate ranges from 88.05% to 95.85% with an increase in the number of malicious consumers from 10% to 50%.
When comparing the IID and Non-IID scenarios, we observe that the attack success rate in the Non-IID scenario is higher than in the IID scenario. This difference can be attributed to variations in model generalization. Models trained with Non-IID data demonstrate greater generalization, rendering them more susceptible to EAs. In contrast, IID-trained models exhibit lower generalization and greater resilience against EAs, which explains the lower attack success rate in the IID scenario compared to the Non-IID scenario [53]. The increase in the attack success rate as the number of malicious consumers grows can be attributed to the following reason. As pointed out by Wu et al. [55], the deviations in Eq. 10 that result from the honest consumers, represented as $\sum_{c=1}^{N-1}(w_c^t - w_*^{t-1})$, are not completely canceled out during the averaging process when the attack is launched, resulting in some residual error. However, as the number of malicious consumers increases, the error caused by these deviations diminishes due to the decrease in the number of honest consumers participating in the FL process.

B. NO-ADVERSARIAL-SAMPLE-TRAINING ATTACK
In contrast to honest consumers, who incorporate both normal data (including benign and malicious instances) and evasion data in their local model training, fraudulent consumers launching the No-Adversarial-Sample-Training attack deliberately exclude evasion samples from their training process. Consequently, their local models, and potentially the global model as well, are never exposed to the malicious consumers' evasion samples during training, rendering the global model susceptible to EAs when deployed as an electricity theft classifier by the utility.
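To make the difference from honest participants concrete, the sketch below shows one participant's local update in a generic FedAvg-style round. The helper pgd_attack, the loop structure, and all hyper-parameters are assumptions for illustration, not the paper's implementation; an honest consumer augments each batch with evasion samples (adversarial training), whereas a consumer launching the No-Adversarial-Sample-Training attack simply trains on its normal data only.

```python
import torch
import torch.nn.functional as F

def local_update(model, loader, optimizer, use_adversarial_training,
                 pgd_attack=None, epochs=1):
    """Sketch of one FL participant's local round.
    Honest consumer: use_adversarial_training=True, batches are augmented with
    PGD evasion samples. Malicious consumer (No-Adversarial-Sample-Training
    attack): use_adversarial_training=False, no evasion samples are seen.
    `pgd_attack` is a hypothetical helper that crafts evasion samples."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            if use_adversarial_training and pgd_attack is not None:
                x_adv = pgd_attack(model, x, y)     # crafted evasion samples
                x = torch.cat([x, x_adv], dim=0)
                y = torch.cat([y, y], dim=0)        # keep the true labels
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return {k: v.detach().clone() for k, v in model.state_dict().items()}
```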
We conducted two experiments to thoroughly assess the effectiveness of the No-Adversarial-Sample-Training attack. The first experiment evaluated the attack in both IID and Non-IID scenarios, focusing on the impact on model performance and analyzing the attack success rate with varying numbers of malicious consumers. In the second experiment, we specifically studied the combined effect of the Model Replacement attack and the No-Adversarial-Sample-Training attack, focusing on the IID scenario, to explore how their synergistic impact can enhance the attack success rate.

1) EXPERIMENT 1
Figs. 8 and 9 show the normal accuracy and adversarial accuracy of honest consumers when the No-Adversarial-Sample-Training attack is launched by malicious consumers in the IID and Non-IID scenarios, respectively. These figures present the performance for different percentages of malicious consumers. These results are compared to the case with no attackers, as explained in Section V. Furthermore, Fig. 10 illustrates the attack's success rate in the IID and Non-IID scenarios while varying the number of malicious consumers.
In the IID scenario, as shown in Fig. 8, the normal accuracy remains relatively stable at around 94% when the proportion of malicious consumers ranges from 10% to 50%. This is in comparison to the normal accuracy of 94.4% observed when no attackers participate in the federated learning process. However, the adversarial accuracy shows a decline from 93.8% to 72.9% as the number of malicious consumers increases, while the adversarial accuracy without attackers is 97%. Moreover, the attack success rate, as depicted in Fig. 10, shows an increase from 1.4% to 27.5% as the number of malicious consumers participating in the federated learning process varies from 10% to 50%.
In the Non-IID scenario, as shown in Fig. 9, the normal accuracy remains consistent at around 89% under the No-Adversarial-Sample-Training attack, regardless of the varying number of malicious consumers from 10% to 50%. This level of accuracy is comparable to the normal accuracy of 89% when no attackers are present. Similarly, the adversarial accuracy remains stable at approximately 91% as the proportion of malicious consumers changes from 10% to 50%, compared to an adversarial accuracy of 92% in the absence of attackers. Furthermore, the attack success rate, depicted in Fig. 10, shows an average of 83% as the proportion of malicious consumers varies from 10% to 50%.
When comparing the IID and Non-IID scenarios, notable differences emerge regarding the susceptibility to EAs, as shown in Fig. 10. In the IID scenario, where the data exhibits a similar distribution, models trained on such data tend to have similar model parameters [38], [39], [40], [41]. Consequently, when the model parameters of malicious participants, who have not performed evasion-sample training, are averaged with those of honest participants trained on evasion samples, the influence of the AT process on the resulting global model diminishes. This dilution grows as the number of malicious consumers increases, since their models contribute more to the averaging process. Consequently, the global model becomes more vulnerable to EAs as the number of malicious consumers increases.
In contrast, in the Non-IID scenario, the efficacy of EAs is notably pronounced due to the presence of distinct consumption patterns among consumers, leading to the generation of diverse model parameters. Unlike the IID scenario, where the training parameters of malicious consumers' models can mitigate the impact of AT when averaged with those of honest consumers, the diverse nature of data in the Non-IID scenario impedes such mitigation. Consequently, the global model readily incorporates all the varied behaviors reflected in the model parameters. Accordingly, when the model is utilized by the utility, it erroneously classifies the evasion samples generated by malicious consumers as benign, while correctly classifying those generated by honest consumers. This inherent characteristic of the attack poses a significant challenge in terms of its detection.

2) EXPERIMENT 2
In this experiment, we combine the No-Adversarial-Sample-Training attack with the Model Replacement attack to achieve a higher success rate. Fig. 11 presents a comparative analysis of the attack success rates before and after incorporating the Model Replacement attack in the IID scenario, considering varying proportions of malicious consumers. As the number of malicious consumers increases from 10% to 50%, the attack success rate ranges from 72.2% to 87.8% when the Model Replacement attack is launched. Conversely, without the Model Replacement attack, the attack success rate falls within the range of 1.4% to 27.5%. This significant surge in the attack success rate is attributed to the fact that the Model Replacement attack completely replaces the model parameters of honest consumers with those of the malicious participants, thereby amplifying the effectiveness of the attack.
With an increasing number of malicious consumers, the attack success rate rises. This is due to the persistence of errors resulting from deviations caused by honest participants during the attack, as highlighted by Wu et al. [55]. Specifically, these deviations, represented as $\sum_{c=1}^{N-1}(w_c^t - w_*^{t-1})$ in Eq. 10, are not fully eliminated during the averaging process when the attack is launched, resulting in residual errors. Yet, as the number of malicious participants grows, these errors diminish because of the decrease in the number of participating honest consumers. In essence, as more malicious participants join, the influence of these errors decreases, leading to a higher attack success rate. However, it is important to acknowledge that this rise in the attack success rate is accompanied by a decrease in the adversarial accuracy of honest consumers, as the replaced model lacks resilience to EAs. As depicted in Fig. 12, the adversarial accuracy of honest consumers declines from 25.1% to 14.1% as the percentage of malicious consumers increases from 10% to 50%.
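The paper's Eq. 10 is not reproduced here, but the following sketch illustrates the commonly used model-replacement scaling under plain, equally weighted FedAvg with a single attacker: the boosted update is constructed so that, after averaging, the global model is approximately replaced by the attacker's model, up to the residual term $\sum_{c=1}^{N-1}(w_c^t - w_*^{t-1})$ contributed by the honest participants. All variable names and numbers are illustrative assumptions.

```python
import numpy as np

def fedavg(updates):
    """Plain FedAvg with equal weights over the submitted model vectors."""
    return np.mean(updates, axis=0)

def model_replacement_update(w_target, w_global_prev, n_clients):
    """Scale the attacker's target model so that, after equal-weight averaging
    with the other clients, the global model is (approximately) replaced by
    the target. The honest clients' deviations from w_global_prev are the
    residual that keeps the replacement from being exact."""
    return n_clients * (w_target - w_global_prev) + w_global_prev

# Illustrative numbers only
N = 10
w_prev = np.zeros(4)                                   # previous global model
honest = [w_prev + np.random.normal(0, 0.01, 4) for _ in range(N - 1)]
w_target = np.array([1.0, -2.0, 0.5, 3.0])             # attacker's replacement model
boosted = model_replacement_update(w_target, w_prev, N)
w_new = fedavg(honest + [boosted])
print(np.round(w_new, 2))   # close to w_target, up to the honest residual / N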

C. FALSE-LABELING ATTACK
In contrast to the No-Adversarial-Sample-Training attack, which does not involve training the model with adversarial samples, the False-Labeling attack takes a different approach. In this attack, fraudulent consumers generate evasion samples using the PGD attack and label these samples as benign. These labeled evasion samples are then used to train the model. Consequently, the model becomes accustomed to accepting and classifying such evasion samples as normal when employed for electricity theft detection.
By assigning benign labels to evasion samples and misrepresenting them as normal instances, fraudulent consumers aim to undermine the FL process. Their objective is to manipulate the model's perception and decision-making by providing false information about the true nature of these adversarial samples. In this attack, both the benign samples ($E_c(d, t)$) and the crafted evasion samples are labeled as 0, while the malicious (theft) samples are labeled as 1. During the inference phase, when the trained model encounters new instances, including the evasion samples, it may incorrectly classify them due to this attack. Exploiting this vulnerability, fraudulent consumers can successfully steal electricity and evade detection.
To comprehensively assess the effectiveness of the False-Labeling attack, we conducted two experiments. In the first experiment, we evaluated the attack in both IID and Non-IID scenarios, considering two key aspects. Firstly, we measured the normal accuracy and adversarial accuracy to understand the impact of the attack on the model's performance. Secondly, we analyzed the attack success rate by varying the number of malicious consumers participating in the FL process. In the second experiment, we focused on investigating the combined effect of the Model Replacement attack and the False-Labeling attack. This experiment specifically concentrated on the IID scenario to explore how the attack success rate can be further enhanced by leveraging the synergistic impact of these two attack strategies.
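A minimal sketch of the poisoned data construction described above is given below; the helper pgd_attack and the tensor layout are assumptions for illustration. The malicious consumer keeps benign readings labeled 0 and theft readings labeled 1, but additionally crafts PGD evasion samples from its theft readings and mislabels them as benign.

```python
import torch

def build_poisoned_training_set(model, x_benign, x_theft, pgd_attack):
    """Sketch of the False-Labeling poisoning step.
    Benign readings keep label 0 and theft readings keep label 1, but the PGD
    evasion samples crafted from the theft readings are also labeled 0 (benign)
    so the local, and eventually the global, model learns to accept them.
    `pgd_attack` is a hypothetical helper that crafts evasion samples."""
    y_benign = torch.zeros(len(x_benign), dtype=torch.long)
    y_theft = torch.ones(len(x_theft), dtype=torch.long)
    x_evasion = pgd_attack(model, x_theft, y_theft)                   # evades the detector
    y_evasion_false = torch.zeros(len(x_evasion), dtype=torch.long)   # mislabeled as benign
    x = torch.cat([x_benign, x_theft, x_evasion], dim=0)
    y = torch.cat([y_benign, y_theft, y_evasion_false], dim=0)
    return x, y
```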

1) EXPERIMENT 1
Figs. 13 and 14 show the normal accuracy and adversarial accuracy of honest consumers under the False-Labeling attack, considering various numbers of malicious consumers in the IID and Non-IID scenarios, respectively. These results are compared to the case where no attackers are present, as discussed in Section V. Furthermore, Fig. 15 illustrates the attack success rate for the IID and Non-IID scenarios as the number of malicious consumers varies.
In the IID scenario, as depicted in Fig. 13, the normal accuracy under the False-Labeling attack remains around 94% as the number of malicious consumers ranges from 10% to 50%, compared to the normal accuracy of 94.4% in the absence of attackers. However, the adversarial accuracy declines from 92.6% to 15% as the number of malicious consumers increases, compared to the 97% adversarial accuracy when no attackers participate in the FL process. Additionally, the attack success rate, illustrated in Fig. 15, increases from 5.45% to 62.4% as the number of malicious consumers participating in the FL varies from 10% to 50%.
In the Non-IID scenario, as shown in Fig. 14, the normal accuracy under the False-Labeling attack remains around 88% as the number of malicious consumers changes from 10% to 50%, compared to the 89% normal accuracy in the absence of attackers. Similarly, the adversarial accuracy remains around 91% as the number of malicious consumers changes from 10% to 50%, compared to the 92% adversarial accuracy when no attackers are present. On the other hand, the attack success rate, illustrated in Fig. 15, maintains an average of 92% as the number of malicious consumers varies from 10% to 50%.
In the comparison between the IID and Non-IID scenarios, we observe significant differences in their susceptibility to EAs, as depicted in Fig. 15. In the IID scenario, where consumers exhibit similarity in data patterns, evasion samples generated by malicious consumers tend to share similarities. These evasion samples are mislabeled by the malicious consumers, countering the correct labeling of evasion samples by honest consumers. Consequently, the attack success rate increases as the number of malicious consumers rises, as depicted in Fig. 15, leading to a decrease in the adversarial accuracy of honest consumers, as shown in Fig. 13. These findings indicate that the nature of IID data offers some level of resistance to this attack. It is important to note that the attack primarily focuses on the evasion samples, while leaving the normal accuracy of honest consumers unaffected, thus making the attack difficult to detect.
In contrast, we observe a notably high success rate of the attack in the Non-IID scenario. This can be attributed to the presence of distinct consumption patterns among consumers in this scenario, leading to the generation of diverse evasion samples. Unlike the IID scenario, where the training of honest consumers' models on evasion samples counteracts the training of malicious consumers, in the Non-IID scenario, the diversity of evasion samples prevents such counteraction. As a result, when the model is utilized by the utility, it misclassifies the evasion samples generated by malicious consumers as benign, while successfully classifying the evasion samples generated by honest consumers. This characteristic of the attack makes it difficult to detect.

2) EXPERIMENT 2
In this experiment, we combine the Model Replacement attack with the False-Labeling attack to achieve a higher attack success rate. Fig. 16 illustrates the comparison of attack success rates before and after applying the Model Replacement attack, considering the number of malicious consumers in the IID scenario. With an increase in the number of malicious consumers from 10% to 50%, the attack success rate ranges from 80.9% to 93.9% after incorporating the Model Replacement attack. In contrast, the attack success rate without the Model Replacement attack ranges from 5.45% to 62.4%. This significant increase in the attack success rate is achieved because the Model Replacement attack entirely replaces the model parameters of honest consumers with those of malicious consumers, thereby enhancing the effectiveness of the attack.
As previously discussed, the attack success rate rises as more malicious consumers are involved. This can be attributed to the persistence of errors caused by deviations of honest consumers during the attack, specifically represented as $\sum_{c=1}^{N-1}(w_c^t - w_*^{t-1})$ in Eq. 10. As the number of malicious consumers increases, these errors gradually decrease, primarily due to the reduced participation of honest consumers in the FL process [55]. However, it is important to note that this increase in the attack success rate is accompanied by a decrease in the adversarial accuracy of honest consumers. As depicted in Fig. 17, the adversarial accuracy of honest consumers remains below 7%.
for the Model Replacement attack. Finally, in the False-Labeling attack, we observed a varying attack success rate ranging from 5.45% to 62.4% as the percentage of malicious consumers participating in the FL process increased from 10% to 50% in the IID scenario. Also, for this particular case, we focused on examining the impact of combining the Model Replacement attack with the False-Labeling attack on the attack success rate. Notably, our results revealed a substantial increase in the attack success rate, ranging from 80.9% to 93.9% as the number of malicious consumers varied from 10% to 50%. On the other hand, in the Non-IID scenario, the attack success rate exhibited an average of 92% as the proportion of malicious consumers ranged from 10% to 50%. This signifies the success of the attack without necessitating the use of the Model Replacement attack.
In conclusion, while the AT defense proves effective in protecting against EAs when no attackers are present during the AT process, our research underscores the necessity for additional complementary countermeasures to secure the AT process against EAs in the presence of attackers. In our future work, we will explore and integrate alternative defense strategies that can be synergistically combined with AT. By leveraging a combination of robust countermeasures, we aim to bolster the resilience of the model against EAs and mitigate the vulnerabilities posed by the potential existence of attackers during the AT process. This comprehensive approach will contribute to a more robust and effective defense, ensuring the continued security of the federated learning process against sophisticated evasion threats.
$x_{adv}^{(r+1)}$ represents the adversarial sample at iteration $r+1$, $\Pi_{\epsilon}$ denotes the projection operator onto the $\epsilon$-ball, where $\epsilon$ is the maximum allowed perturbation in the generated adversarial samples, $x_{adv}^{(r)}$ is the current adversarial sample at iteration $r$, $\beta$ represents the perturbation magnitude, $\mathrm{sign}(\cdot)$ is the sign function, $\nabla_{x_{adv}^{(r)}}$ denotes the gradient of the loss function $L$ with respect to $x_{adv}^{(r)}$, $f_{\omega}(\cdot)$ represents the model with parameters $\omega$, and $y$ is the target label.
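Under these definitions, one PGD iteration can be sketched in Python as follows. This is a generic signed-gradient step with projection onto the $L_\infty$ $\epsilon$-ball, written for illustration; it uses the untargeted form, whereas the paper's formulation involves a target label $y$, and the hyper-parameters are assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_step(model, x_adv, x_orig, y, beta, epsilon):
    """One PGD iteration matching the symbols above: take a signed-gradient step
    of size beta on the current sample x_adv^(r), then apply the projection
    Pi_epsilon onto the L-infinity epsilon-ball around the original reading.
    Untargeted sketch: the loss is increased w.r.t. the label y; a targeted
    variant would instead descend toward the attacker's target label."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_next = x_adv + beta * x_adv.grad.sign()
        # Projection Pi_epsilon: clip back into the epsilon-ball around x_orig
        x_next = torch.min(torch.max(x_next, x_orig - epsilon), x_orig + epsilon)
    return x_next.detach()
```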

FIGURE 3. Attack success rate vs E for IID and Non-IID scenarios.

FIGURE 4. Attack success rate vs number of communication rounds for IID and Non-IID scenarios.

FIGURE 6. Accuracy under Distillation attack vs number of malicious consumers for Non-IID scenario.

FIGURE 7. Attack success rate of Distillation attack vs number of malicious consumers for IID and Non-IID scenarios.

FIGURE 8. Accuracy under No-Adversarial-Sample-Training attack vs number of malicious consumers for IID scenario.

FIGURE 9. Accuracy under No-Adversarial-Sample-Training attack vs number of malicious consumers for Non-IID scenario.

FIGURE 10. Attack success rate of No-Adversarial-Sample-Training attack vs number of malicious consumers for IID and Non-IID scenarios.

FIGURE 11. Attack success rate of No-Adversarial-Sample-Training attack with and without Model Replacement attack vs number of malicious consumers for IID scenario.

FIGURE 12. Accuracy under No-Adversarial-Sample-Training attack with Model Replacement attack vs number of malicious consumers for IID scenario.

FIGURE 13. Accuracy under False-Labeling attack vs number of malicious consumers for IID scenario.

FIGURE 14. Accuracy under False-Labeling attack vs number of malicious consumers for Non-IID scenario.

FIGURE 15. Attack success rate of False-Labeling attack vs number of malicious consumers for IID and Non-IID scenarios.

FIGURE 16. Attack success rate of False-Labeling attack with and without Model Replacement attack vs number of malicious consumers for IID scenario.

FIGURE 17. Accuracy under False-Labeling attack with Model Replacement attack vs number of malicious consumers for IID scenario.

TABLE 1. The hyper-parameters of the CNN-based detector.
intervals over a period of 536 days. We utilize this dataset as the source of benign training samples, denoted as $E_c(d, t)$, for each consumer c among the total of N consumers. To generate the malicious training samples, we employ a cyber-attack function commonly used in the literature