An Efficient Privacy-Enhancing Cross-Silo Federated Learning and Applications for False Data Injection Attack Detection in Smart Grids

Federated Learning is a prominent machine learning paradigm which helps tackle data privacy issues by allowing clients to store their raw data locally and transfer only their local model parameters to an aggregator server to collaboratively train a shared global model. However, federated learning is vulnerable to inference attacks from dishonest aggregators who can infer information about clients’ training data from their model parameters. To deal with this issue, most of the schemes proposed in the literature require either a non-colluding server setting, a trusted third party to compute master secret keys, or a secure multiparty computation protocol which is still inefficient over multiple iterations of computing an aggregation model. In this work, we propose an efficient cross-silo federated learning scheme with strong privacy preservation. By designing a double-layer encryption scheme which has no requirement to compute discrete logarithms, utilizing secret sharing only at the establishment phase and in the iterations when parties rejoin, and accelerating the computation performance via parallel computing, we achieve an efficient privacy-preserving federated learning protocol, which also allows clients to drop out and rejoin during the training process. The proposed scheme is demonstrated theoretically and empirically to provide provable privacy against an honest-but-curious aggregator server and simultaneously achieve desirable model utilities. The scheme is applied to false data injection attack (FDIA) detection in smart grids, yielding a cross-silo FDIA federated learning scheme that is more resilient to inference attacks on local private data than existing works.


Hong-Yen Tran, Jiankun Hu, Senior Member, IEEE, Xuefei Yin, and Hemanshu R. Pota
Index Terms-Privacy-preserving, federated learning, encryption, secret sharing, false data injection attack detection.

I. INTRODUCTION
FEDERATED learning [1] is an emerging machine learning paradigm which addresses critical data privacy issues by enabling clients to store their raw data locally and transfer only their updated local model parameters to an aggregator server for jointly training a global model. Due to this characteristic, federated learning offers significant privacy improvements over centralizing all the training data. However, federated learning is vulnerable to inference attacks from dishonest aggregators, who can infer information about clients' training data from their model parameters (weights, gradients) [2], [3], [4], [5], [6], [7]. For example, [4] employed generative adversarial networks to infer the private data of a target client from its shared model parameters. This means that even if the model is trained by federated learning, data privacy still cannot be rigorously guaranteed. Information can also be extracted from the global model parameters, but such information cannot be linked to a specific single client because the data samples are anonymized among multiple clients. This is not the case, however, when the information is inferred from local model parameters by a corrupted aggregator. Thus, clients' model parameters should be protected from the access of a corrupted aggregator to prevent these potential inference attacks.
To address this problem, existing approaches focus on two main techniques: differential privacy and secure aggregation. The former adds noise directly to the clients' models over numerous iterations; thus, it has the drawback of sacrificing global model accuracy in the privacy-utility trade-off. The latter utilizes cryptographic techniques such as secure multiparty computation and homomorphic encryption to securely aggregate the clients' models without learning their specific values. However, most of these existing approaches rely on a trusted third party to generate the master key for aggregation, or on a setting with multiple non-colluding servers. Besides, many proposed schemes are still inefficient and impractical due to the expensive computation and communication overhead among multiple clients over multiple rounds of training.
False data injection attack (FDIA) detection [8], [9] is a critical security operation in a smart grid control system and has been addressed by data-driven machine learning methods. These methods require a huge amount of measurement data which are distributed over an interconnected grid. In such an interconnected grid, each sub-grid is owned and managed by an independent transmission grid company (TGC) as a result of power industry deregulation [10], [11]. To build a high-accuracy model for false data injection detection, measurement data from all involved sub-grids should be shared. However, transmitting such huge measurement data over the network for a centralized detection machine learning algorithm is expensive and also leads to security and privacy issues, including competitive privacy [12]. The question is how to coordinate these TGCs to detect FDI attacks while preserving their competitive privacy. This remains a challenging problem which has been attracting recent studies with federated learning-based solutions. In federated learning, a cross-silo setting is often established where a number of companies or organizations have a common incentive to train a model based on all of their data, but do not share their data directly due to confidentiality/privacy or legal constraints [13]. To enhance the privacy of power companies when they contribute their local training models, an efficient privacy-preserving cross-silo federated learning scheme for FDIA detection over multi-area transmission grids should be designed.
In view of the above issues, we propose an efficient cross-silo federated learning scheme with strong privacy preservation which is applicable to the smart grid domain. By designing a double-layer encryption scheme over multiple federated learning rounds and utilizing Shamir secret sharing, we achieve an efficient privacy-preserving federated learning protocol, which also allows clients to drop out and rejoin dynamically during the training process. Specifically, we summarize the main contributions as follows: • A general privacy-enhancing cross-silo federated learning scheme with secure weighted aggregation is designed based on lightweight double-layer encryption and Shamir secret sharing. The scheme removes the requirement of computing discrete logarithms, which is a limitation of some related works. No multiple non-colluding server settings are required. Besides, clients' secret keys for the two encryption layers are generated in a decentralized manner, which helps increase privacy.
• The proposed scheme is demonstrated theoretically and empirically to provide provable privacy against an honestbut-curious aggregator server and simultaneously achieve desirable model utility.
• The proposed scheme is efficient in communication/computation and robust against dropouts/rejoining during training iterations.
• An efficient privacy-enhancing cross-silo federated learning scheme resilient to local training data inference attacks for FDIA detection in the smart grid domain is proposed and empirically evaluated.
This paper consists of eight sections. Following this Introduction section are the Related Works and Preliminaries sections. The proposed privacy-enhancing cross-silo federated learning without any trusted third parties is given in Section IV, followed by the analysis of the scheme in Section V. A concrete scenario of enhancing privacy in cross-silo federated learning for FDIA detection in smart grids with empirical evaluation is given in Section VI and Section VII. Finally, Section VIII gives the discussion and conclusions.

II. RELATED WORKS
Existing works on enhancing privacy for federated learning mainly employ two types of techniques. One technique is differential privacy [14], which adds appropriate noise to shared parameters according to the desired privacy level. For example, [15] added Laplace noise to the gradients and selectively shared the perturbed gradients, while [16], [17] presented client-sided differential privacy federated learning schemes to hide clients' model contributions during training. To protect the local models, the noise added to each local model must be large enough, so the aggregate noise in the aggregate model becomes too large and completely destroys the utility of this model.
The other technique is secure multiparty computation and homomorphic encryption for secure aggregation. The scheme in [18] was based on ElGamal homomorphic encryption. This scheme requires a trusted dealer to provide each participant with a secret key sk_i and the aggregator with sk_0 such that ∑_{i=0}^{k} sk_i = 0. Their private secure aggregation is aggregator oblivious in the encrypt-once random oracle model, where each participant encrypts only once in each time period. To decrypt the sum, the aggregator ends up computing a discrete logarithm, which can be implemented through a brute-force search or Pollard's lambda method requiring O(√(kΔ)) time, where k is the number of parties and Δ is the maximum value of any party's input. To overcome the limitations of solving discrete logarithm problems, [19] presented a scheme in the encrypt-once random oracle model with fast encryption and decryption based on the Decisional Composite Residuosity assumption, which removes the discrete logarithm computation. However, this scheme also requires a trusted dealer to generate and distribute the secret keys to the participants and the aggregator. Besides, both approaches in [18] and [19] only deal with secure aggregation of scalars over periods of time (not the secure weighted aggregation of model vectors over multiple iterations of federated learning) and do not deal with dropout/rejoining problems. Addressing the drawbacks of [18] and [19], the work in [20] proposed a secure aggregation scheme where the input is a vector and dropouts can be handled. The scheme is based on pairwise additive stream ciphers and Shamir secret sharing to tackle client failures. Diffie-Hellman key exchange is adopted to share common pair-wise seeds of a pseudorandom generator. Double-masking is introduced to prevent leakage if there is any delay in transmission.
Nevertheless, this approach requires at least four communication rounds between each client and the aggregator in each iteration and a repetition of Shamir secret sharing for each iteration. Thus, it suffers from communication and computation inefficiency considering the huge number of iterations of federated learning. Utilizing the secure data aggregation technique of [20], the work in [21] proposed a general privacy-enhanced federated learning scheme with secure weighted aggregation, which can deal with both data significance evaluation and secure data aggregation. This scheme still inherits the same drawbacks as [20]. Besides, it only addresses a weak security model in which no collusion between the server and the clients participating in the federated learning is assumed. The paper [22] presented Prio, a privacy-preserving system for the collection of aggregate statistics. With a similar approach, [23] introduced SAFELearn, a generic design for efficient private federated learning systems that protect against inference attacks using secure aggregation. However, these designs rely on multiple non-colluding server settings. Dong et al. in [24] designed two secure ternary federated learning protocols against semi-honest adversaries based on threshold secret sharing and homomorphic encryption, respectively. In the first protocol, threshold secret sharing is used to share all local gradient vectors in all iterations, which causes expensive computation and communication overhead. Besides, the limitation of their second protocol is that all clients use the same secret key, and if the server colludes with a client then it can obtain all clients' models. In [25], Fang et al. modified the traditional ElGamal protocol into a double-key encryption version to design a new scheme for federated learning with privacy preservation in cloud computing. Nevertheless, the scheme has to solve the discrete logarithm problem, as in [18].
The study in [26] combined additively homomorphic encryption with differential privacy but cannot tolerate client dropouts. Their system creates significant run-time overheads, which makes it impractical for real-world federated learning applications. Functional encryption and differential privacy are utilized in [27] to design the HybridAlpha scheme. However, HybridAlpha relies on a trusted party that holds the master keys. The proposed scheme in [28] replaced the complete communication graph in [20] with a k-regular graph of logarithmic degree to reduce the communication cost while maintaining the security guarantees; however, each client shares its secret across only a subset of parties, and thus the dropout-resilience is downgraded.
Considering the integrity of the global model besides the privacy preservation of the local data and models, the approach proposed in [29] combined the Paillier additive homomorphic encryption and verifiable computation primitives. The scheme in [29] can verify the correctness of the aggregated model given that every client provides its genuine local model. From the perspective of privacy preservation, the scheme can only tolerate a weaker threat model: no collusion among the server and clients participating in the federated learning protocol was assumed, as the keys (sk, pk) necessary for the homomorphic encryption and the signatures are generated by one of the clients and shared among all clients. In the work [17], to deal with the collusion problem in [29], adding Gaussian noise to the local models before homomorphic encryption was proposed. However, the standard deviation of the additive Gaussian noise must be small so as not to destroy the genuine local models, with the result that the noise-based protection cannot provide a high level of differential privacy (ε is not small, i.e., not less than 1).
The power grid scenario of false data injection attack detection based on federated learning in smart grids has been studied in [30], [31], and [32]. The investigated power grid scenario is similar in these papers and in the proposed scheme. For example, the independent power system state owner (PSSO) and the detection service provider (DSP) in [30] correspond to an independent transmission grid company (TGC) and a system operator (SO) in the proposed scheme. The power grid scenario fits the investigated cross-silo federated learning setting (e.g., the number of parties (PSSOs/TGCs) is small and each party is equipped with high-performance computing). However, [30] and [31] only apply federated learning and do not consider the security problem of local data privacy leakage from local models as [32] and our proposed scheme do. The scheme in [32] enhanced privacy by utilizing Paillier-based homomorphic encryption for secure model aggregation, but only addressed a weak security model with no collusion among the server and the clients participating in the federated learning. All clients have to share a common pair of public and secret keys for encryption/decryption, and a trusted party is required to generate this key pair.
A privacy-preserving federated learning approach needs to be efficient in computation and communication while providing strong privacy preservation and desirable model utility. Most of the related works focus on the basic problem of secure aggregation, with the main approaches based on secure multiparty computation, homomorphic encryption, and differential privacy. In spite of some achievements in secure aggregation and privacy-preserving federated learning, drawbacks remain. The majority of proposed schemes in the literature require a trusted third party to compute master secret keys, require all local parties to share a common secret key, or rely on non-colluding server settings. This means these works only guarantee privacy in weaker security models (e.g., assuming no collusion).
The proposed scheme does not require a trusted dealer to provide each participant with a secret key, unlike the schemes in [18], [19], [27], and [32]. While the schemes in [18] and [25] require computing the discrete logarithm, our scheme removes that complexity by utilizing encryption-decryption based on the Decisional Composite Residuosity assumption. Moreover, both approaches in [18] and [19] only deal with secure aggregation of scalars over periods of time, not the secure weighted aggregation of model vectors over multiple iterations of federated learning; the dropout and rejoining problems were not investigated in these works either. Although they eliminate the drawbacks of [18] and [19], the schemes in [20] and [28] suffer from higher computation overhead than the proposed approach and do not address federated learning with secure weighted aggregation. Other systems in [22] and [23] depend on multiple non-colluding server settings, which our scheme does not require. The systems in [21], [24], [29], and [32] cannot tolerate the risk of revealing all clients' models when there is collusion between the server and a client, whereas our protocol can. The study in [26] cannot resolve client dropouts, and its significant run-time overheads make it impractical for real-world federated learning applications. Our scheme is resilient to dropouts and provides efficient performance for real applications, such as privacy-preserving federated learning for false data injection detection.
To summarize, Table I gives a comparison of our scheme with related works regarding the application scenario of FDIA federated learning with secure weighted aggregation (A1, A2) and different security/privacy properties (A3-A8). Only three recent works [30], [31], [32] studied FDIA federated learning. Most of the related works do not provide all security properties A3-A8. Among the works filtered in Table I, only the studies in [20] and [28] satisfy all security/privacy properties, as the proposed approach does. Table II compares the computation and communication complexity between these two studies [20], [28] and the proposed scheme. From Table I and Table II, it can be seen that the proposed scheme guarantees privacy in a stronger security model and at a lower computational overhead than the related works.

III. PRELIMINARIES

A. Notations and Definitions
Column vectors are denoted by lower-case bold letters, like v. The i-th entry of the vector v is v_i, and v^T is the transpose of the column vector v. The zero vector is represented by 0. Given a set S, x ←$ S indicates that x is sampled uniformly at random from S. The notation [k] represents the set {0, 1, . . . , k − 1}. The computational indistinguishability of two distributions H_0 and H_1 is denoted by H_0 ≅ H_1. Table III lists the notations used in this paper.

B. Federated Learning
Federated learning is a machine learning scheme where multiple clients collaborate in generating a shared machine learning model under the coordination of a central server. Each client's raw data is stored locally and not transmitted; instead, the local model parameters are sent to the server for aggregation to achieve the learning objective. Cross-silo federated learning is the federated learning setting in which clients are different organizations or geo-distributed data centres that have the incentive to train a shared model on the union of their siloed data [13]. Several algorithms have been proposed for federated learning. In this work, we utilize FedAvg [1], which is the original federated learning aggregation mechanism and is commonly applied in related works. In FedAvg, the global model parameters are updated as the sum of the local model parameters, each weighted by the corresponding client's share of the total number of training samples.
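As a minimal illustration (not the paper's implementation), the FedAvg weighted aggregation can be sketched in Python, assuming client i holds n_i training samples:

```python
import numpy as np

def fedavg(local_models, sample_counts):
    """Aggregate local model vectors as a weighted average (FedAvg).

    Each client's weight is its share of the total number of
    training samples, n_i / sum(n_i).
    """
    total = sum(sample_counts)
    global_model = np.zeros_like(local_models[0], dtype=float)
    for w_i, n_i in zip(local_models, sample_counts):
        global_model += (n_i / total) * w_i
    return global_model

# Two clients: one with 3 samples, one with 1 sample.
w1 = np.array([1.0, 2.0])
w2 = np.array([5.0, 6.0])
print(fedavg([w1, w2], [3, 1]))  # [2. 3.]
```

The secure aggregation problem addressed later in the paper is exactly this computation, performed without the server ever seeing the individual w_i.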

C. Shamir Secret Sharing
The (t, n) Shamir secret sharing scheme [33] creates n shares s^(1), . . . , s^(n) of a secret s such that s can be efficiently reconstructed from any combination of t shares, but cannot be reconstructed from any set of fewer than t shares.
The values s, s^(1), . . . , s^(n) are elements of a finite field Z_p for some large prime p, where 0 < t ≤ n < p. The scheme works as follows: • Setup: The secret holder randomly chooses a_1, . . . , a_{t−1} from Z_p and sets a_0 = f(0) = s to define a polynomial f(x) = a_0 + a_1 x + · · · + a_{t−1} x^{t−1} of degree t − 1, computes s^(i) = f(i) for i ∈ {1, . . . , n}, and sends (i, s^(i)) to the corresponding participant i.
• Reconstructing: Given any t pairs (i, s^(i)), a user is able to reconstruct the secret by Lagrange interpolation at x = 0.
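A minimal sketch of (t, n) Shamir sharing over Z_p (the prime `P` here is an illustrative choice, not the paper's parameter):

```python
import random

P = 2**127 - 1  # a large Mersenne prime, defining the field Z_p

def create_shares(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Share i is f(i) for the degree-(t-1) polynomial f with f(0) = secret.
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over Z_p."""
    secret = 0
    for idx, (x_i, y_i) in enumerate(shares):
        num, den = 1, 1
        for jdx, (x_j, _) in enumerate(shares):
            if jdx != idx:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        # Field inverse of den via Fermat's little theorem.
        secret = (secret + y_i * num * pow(den, P - 2, P)) % P
    return secret

shares = create_shares(123456789, t=3, n=5)
print(reconstruct(shares[:3]))  # 123456789 (any 3 of the 5 shares work)
```

Fewer than t shares leave the secret information-theoretically hidden, which is what the protocol later relies on for dropout recovery.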

D. Decisional Composite Residuosity Assumption
Let N = p · q for two large primes p and q. The Decisional Composite Residuosity (DCR) assumption [34] states that N-th residues modulo N² are computationally indistinguishable from uniform elements of Z*_{N²}; that is, the advantage of any distinguisher D, defined as the distance
Adv(D) = |Pr[D(y^N mod N²) = 1 : y ←$ Z*_{N²}] − Pr[D(z) = 1 : z ←$ Z*_{N²}]|,
where the probabilities are taken over all coin tosses, is a negligible function of the security parameter.
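As an illustrative sketch with toy-sized primes (real deployments use primes of at least 1024 bits), the snippet below samples an N-th residue modulo N² and checks the group-order identity r^φ(N) = 1 that such residues satisfy; the DCR assumption says such samples are nonetheless hard to tell apart from uniform elements of Z*_{N²}:

```python
import math
import random

# Toy primes for illustration only.
p, q = 1000003, 1000033
N, N2 = p * q, (p * q) ** 2

def nth_residue():
    """Sample an N-th residue modulo N^2: y^N mod N^2 for random y in Z*_{N^2}."""
    y = random.randrange(1, N2)
    while math.gcd(y, N) != 1:  # ensure y is invertible mod N^2
        y = random.randrange(1, N2)
    return pow(y, N, N2)

# Z*_{N^2} has order N * phi(N); an N-th residue r = y^N therefore
# satisfies r^phi(N) = y^(N*phi(N)) = 1 (mod N^2).
r = nth_residue()
print(pow(r, (p - 1) * (q - 1), N2) == 1)  # True
```

This multiplicative structure is what the first-layer encryption of the proposed scheme (following [19]) builds on.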

E. False Data Injection Attacks
False data injection attacks (FDIAs) are designed by manipulating some measurements to circumvent the residual-based bad data detection in a power management system [8], [9], [35]. Various algorithms have been designed to detect these attacks using new techniques instead of the residual-based bad data detection mechanism. One example is the deep learning network that models the spatial-temporal relationship between bus and line measurements in [36].
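Why residual-based detection fails can be sketched with a toy linear state-estimation model: an injection vector of the form a = Hc shifts the estimated state by c but leaves the residual unchanged. (Illustrative numpy sketch; the Jacobian H, dimensions, and noise level are arbitrary assumptions, not taken from the paper.)

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 3))              # measurement Jacobian (6 meters, 3 states)
x = rng.normal(size=3)                   # true state
z = H @ x + 0.01 * rng.normal(size=6)    # noisy measurements

def residual(z, H):
    """Residual norm of least-squares state estimation."""
    x_hat = np.linalg.lstsq(H, z, rcond=None)[0]
    return np.linalg.norm(z - H @ x_hat)

c = np.array([0.5, -1.0, 2.0])           # attacker's chosen state perturbation
a = H @ c                                # stealthy injection vector a = Hc
print(np.isclose(residual(z, H), residual(z + a, H)))  # True
```

Because z + a = H(x + c) + noise, the estimator simply converges to the shifted state x + c, so any detector thresholding the residual sees nothing unusual.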

IV. PROPOSED PRIVACY-ENHANCING CROSS-SILO FEDERATED LEARNING
A. System Model and Overview of the Proposed Privacy-Enhancing Cross-Silo Federated Learning
Consider a system with k local parties and an aggregator server. Each local party owns its private dataset D_i, i ∈ {1, . . . , k}, with n_i = |D_i| samples. All local participants agree on the same learning network structure N. The global learning network model at the t-th iteration consists of L weight parameters, denoted as w_G^(t). The aim is to learn a global network model from all local datasets without exposing participants' data privacy, under the coordination of the aggregator.
The adversary is the honest-but-curious aggregator server, which is assumed to follow the protocol honestly but attempts to infer sensitive information about participants' training data from their model updates w_i. It is also assumed that there are private and authenticated peer-to-peer channels between parties so that the data transferred cannot be eavesdropped on or modified. This can be enforced in practice with the appropriate use of digital signatures and certificate authorities. To implement federated learning which utilizes the union of local datasets, for each iteration, each party contributes its local model vector w_i^(t). Unfortunately, this raises the risk of inference attacks performed by an honest-but-curious aggregator on each local model to extract information about the corresponding party's local data used for training. Hence, ordinary federated learning needs to be integrated with privacy protection techniques to prohibit access to individual model updates. The system should be designed to hide local models from the aggregator to counter the inference attacks while still enabling efficient and accurate federated learning.
The following section introduces and explains the main techniques in the proposed privacy-enhancing cross-silo federated learning scheme.
B. High-Level Technical Overview 1) Protecting Local Models: The encryption scheme in [19], based on the DCR assumption in the random oracle model, is utilized to obtain the global model vector as the weighted average of a set of local model vectors given only their encrypted forms, where x^(t)_{i,j} is the j-th element of the i-th party's model vector encoded in non-negative integer form at the t-th iteration and sk^(t)_i is the secret encryption key of the i-th party at the t-th iteration. The main benefit of this construction is that the weighted average global model vector can be retrieved without computing the discrete logarithm, unlike other approaches in the literature [18], [25]. In [19], only secure aggregation is considered and it is assumed that a trusted dealer generates the encryption keys sk_i, i = 1, . . . , k, and the master key sk_0 = −∑_{i=1}^{k} sk_i. In our proposed scheme, secure weighted aggregation is investigated: each party creates its own secret key sk^(t)_i and the master key is computed from the clients' secret keys in a secure computation manner. To enable the secure weighted aggregation of local models, which was not considered in [19], the number of each party's training samples is also encrypted under the corresponding sk^(t)_i at each iteration. The master key to decrypt the global model vector is calculated over the keys sk^(t)_i, i ∈ U^t_a, where U^t_a is the set of alive parties who contribute their encrypted local models for aggregation. This master key should be computed in a secure way to increase the privacy level. This is achieved by designing a second layer of the basic encryption scheme to encrypt the secret keys sk^(t)_i of the first layer, producing the second-layer ciphertexts β^(t)_i. The secret encryption key of this second layer is v^(t)_i. The requirement for v^(t)_i is that it is privately generated by each party such that ∑_{i∈U} v^(t)_i = 0, where U is the set of all parties.
Different from the secret keys sk of the first layer, which are generated at each iteration, the secret encryption keys v^(0), created in the initial sub-protocol π_0 of the establishment phase, can basically be reused over multiple iterations (v^(t) = v^(t−1) = · · · = v^(0)). The generation of v^(t) is based on correlated pair-wise secrets that cancel out in aggregate, where γ^(0)_{i,j} is the common initial pair-wise secret between party i and party j, created by adopting the Diffie-Hellman key exchange protocol.
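The zero-sum property of the second-layer keys can be sketched as follows (illustrative Python; `pair_secrets` stands in for the Diffie-Hellman shared secrets, and the hash-based masking is an assumption, not the paper's exact construction):

```python
import hashlib

def pairwise_zero_sum_keys(party_ids, pair_secrets, modulus):
    """Derive per-party keys v_i from common pair-wise secrets such that
    sum(v_i) = 0 (mod modulus), without any party revealing its key.

    pair_secrets[(i, j)] with i < j stands for the shared secret between
    parties i and j (e.g., a Diffie-Hellman output); here just an integer.
    """
    keys = {}
    for i in party_ids:
        v = 0
        for j in party_ids:
            if j == i:
                continue
            s = pair_secrets[(min(i, j), max(i, j))]
            # Hash the shared secret into a mask; signs cancel pairwise:
            # party i adds the mask when i < j and subtracts it when i > j.
            mask = int.from_bytes(hashlib.sha256(str(s).encode()).digest(), "big")
            v += mask if i < j else -mask
        keys[i] = v % modulus
    return keys

ids = [1, 2, 3]
secrets = {(1, 2): 11, (1, 3): 22, (2, 3): 33}
keys = pairwise_zero_sum_keys(ids, secrets, modulus=2**64)
print(sum(keys.values()) % 2**64)  # 0
```

Each v_i looks random on its own, yet the masks cancel when all parties' contributions are combined, which is exactly what lets the aggregator recover the master key without learning any individual sk^(t)_i.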
2) Handling Dropouts: Shamir's τ-out-of-k secret sharing is utilized to allow a party to split a secret into k shares, such that any τ shares can be used to reconstruct the secret, but any set of at most τ − 1 shares gives no information. Each party creates k shares of its secret s^(t)_i, keeps one share, and sends each of the remaining k − 1 shares to a different party. At each iteration t, after receiving the ciphertexts, the aggregator broadcasts the set of alive parties U^t_a and the set of the dropped parties U^t_d. When a dropped party P_d later rejoins, it needs to send its updated public key pk_d = g^{s^(t)_d} to the aggregator, then creates and distributes Shamir shares of its updated secret s^(t)_d.

4) Reducing Communication and Computation Overhead:
To overcome the problem of communication and computation overhead in federated learning with multiple iterations, the proposed solution is threefold. The first part is to utilize a lightweight encryption/decryption scheme which has no requirement to compute discrete logarithms. The second is to accelerate the computation via Single Instruction Multiple Data (SIMD) parallel computing of cryptographic operations over model vectors and via pre-computed hash functions. The third is to limit the number of times the secrets s^(t)_i are created and transmitted in the Shamir secret sharing scheme. This is achieved by designing a double-layer encryption scheme where the secret keys sk of the first layer are used for only one iteration while the secret keys v of the second layer can be reused for multiple iterations. Shamir's secret sharing for the secrets s is implemented only at the establishment phase and in the iterations when parties rejoin. Besides, only rejoining parties P_r generate new key pairs and transmit their new public keys pk^(t)_r = g^{s^(t)_r}.

C. Description of the Proposed Protocol
Algorithm 1 describes the overall steps of the proposed privacy-enhancing cross-silo federated learning from the client side and the server side: the server initializes the global model w_G^0 and, in each round t from 1 to T, coordinates the k clients in U^t in parallel, while in the establishment phase each P_i computes its key material and runs the Shamir secret sharing algorithm SS(·) on its secret.
2) Secure Weighted Aggregation: This section describes the proposed secure weighted aggregation performed at each federated learning iteration to evaluate the global model as the weighted aggregation of the encrypted local models. Fig. 1 illustrates the main steps and computations carried out during each training epoch, where a step in square brackets (e.g., [2]) indicates that the step is executed only if dropout/rejoining happens.
At each iteration t, each P_i owns an L-length local model vector w^(t)_i and the following steps are carried out.
1. P_i encodes the weighted model to get the non-negative integer vector x^(t)_i according to the method in [37].
[2]. If P_i rejoins at this iteration, it runs P_i.GenKey() to generate a new pair of secret and public keys and P_i.CreateShares() to create k shares of the updated secret s^(t)_i. Then P_i sends the updated public key pk^(t)_i to the aggregator.
3. Based on the received updated public keys, the aggregator creates the set U^t_r of rejoining parties of this iteration. If U^t_r = ∅ then v^(t) = v^(t−1); otherwise the aggregator broadcasts the updated set of public keys {pk} and U^t_r.
[4]. Upon receiving U^t_r and {pk}, a rejoining party checks whether its updated public key is in the set; if so, it continues the protocol, and if not, it leaves the protocol (early dropout). If U^t_r ≠ ∅ then P_i.UpdateSeedsSecret(): if P_i rejoins, it updates the seeds γ^(t)_{i,j} shared with all other parties; otherwise it updates the seeds γ^(t)_{i,r} shared with the rejoining parties. Then P_i updates its secret v^(t)_i.
5. P_i encrypts its encoded model and sends the resulting ciphertext C^(t)_i to the aggregator.
6. Receiving C^(t)_i from the alive parties, the aggregator creates the set U^t_a of the alive parties and U^t_d = U \ U^t_a of the dropped parties.
[7]. If U^t_a ⊂ U then the aggregator broadcasts U^t_d.
[8]. Each alive P_i sends to the aggregator its shares of the dropped parties' secrets s^(t)_d.

ComputeMSK({β^(t)_i}): The aggregator computes the master key msk^(t), distinguishing two cases: if U^t_a = U, the master key is computed directly from the second-layer ciphertexts of all parties; if U^t_a ⊂ U, the reconstructed shares of the dropped parties' secrets are additionally used. Having msk^(t), the aggregator can compute the global model. Then, the aggregator sends the global model to all local parties for the next epoch t + 1.
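As an illustration of how the sum is recovered without any discrete-logarithm computation, the sketch below implements a Joye–Libert-style aggregation in the spirit of [19], with toy-sized parameters (the primes, hash construction, and scalar inputs are simplified assumptions; the actual protocol additionally weights vector-valued models):

```python
import hashlib
import math
import random

# Toy parameters: small primes for illustration only; real deployments
# use primes of ~1024 bits or more.
p, q = 1000003, 1000033
N, N2 = p * q, (p * q) ** 2

def H(t):
    """Hash the round number t into an invertible element of Z*_{N^2}."""
    h = int.from_bytes(hashlib.sha256(str(t).encode()).digest(), "big") % N2
    while math.gcd(h, N) != 1:  # ensure invertibility mod N^2
        h += 1
    return h

def encrypt(x, sk, t):
    # Ciphertext in the style of [19]: (1 + x*N) * H(t)^sk mod N^2.
    return (1 + x * N) * pow(H(t), sk, N2) % N2

def aggregate(ciphertexts, msk, t):
    """Decrypt the SUM of the plaintexts; no discrete log is needed."""
    v = math.prod(ciphertexts) % N2
    v = v * pow(H(t), msk, N2) % N2   # masks cancel: sum(sk_i) + msk = 0
    return (v - 1) // N               # (1 + S*N) mod N^2  ->  S

t = 7
sks = [random.randrange(N2) for _ in range(3)]
msk = -sum(sks)                       # master key cancels all first-layer keys
xs = [10, 20, 12]                     # parties' (encoded) inputs
cts = [encrypt(x, sk, t) for x, sk in zip(xs, sks)]
print(aggregate(cts, msk, t))         # 42
```

The key identity is that the (1 + x_i N) factors multiply to 1 + N·∑x_i modulo N², so the sum drops out by simple integer arithmetic once the H(t) masks cancel, which is exactly why no Pollard-lambda search is ever required.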

A. Correctness
Besides, from (2) and (12), we obtain (26). From (17), substituting (26) into (18) shows that, in both cases (with and without dropouts), the master key is successfully computed. Since each encoded value satisfies x_i^(t) < 2^{2l_1}, the recovery is exact. Next, we prove that with this master key the global model can be correctly computed. In fact, from (13) and (14) we obtain (28)-(30). Substituting (28) and (29) into (19), and similarly (28) and (30) into (20), yields the weighted aggregate. This proves that the aggregator can compute the global model as the weighted average of all local models even though the aggregator does not know the true value of any local model.
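The encoding step referenced in the proof (mapping signed real-valued weights to bounded non-negative integers so that x_i^(t) < 2^{2l_1}) can be sketched with a fixed-point offset encoding. This is one common construction for such encodings and not necessarily the exact method of [37]; the names and bit widths are illustrative.

```python
def encode(w, frac_bits=16, int_bits=16):
    """Map a signed float weight to a non-negative integer via fixed-point
    scaling plus an offset, so encoded values and their sums stay non-negative."""
    offset = 1 << (int_bits + frac_bits)
    return int(round(w * (1 << frac_bits))) + offset

def decode_sum(enc_sum, num_terms, frac_bits=16, int_bits=16):
    """Decode a sum of `num_terms` encoded values back to the float sum
    of the original weights (subtract the accumulated offsets, rescale)."""
    offset = 1 << (int_bits + frac_bits)
    return (enc_sum - num_terms * offset) / (1 << frac_bits)
```

For example, encoding `[0.5, -1.25, 2.0]`, summing the encodings, and decoding with `num_terms=3` recovers the exact sum 1.25; exactness holds as long as the weights fit the chosen integer and fractional bit budgets.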

B. Security Analysis
In this section, we prove that the proposed protocol is a secure multiparty computation against an honest-but-curious adversary who controls the aggregator server and a set C of colluding parties with |C| < τ. The aggregator is always online, while participants P_i may drop out and rejoin at any iteration.
The security guarantee of the proposed scheme is based on Shamir's secret sharing scheme and on the aggregator obliviousness security provided by the encryption construction in [19] under the DCR assumption in the random oracle model. Security holds against a computationally bounded honest-but-curious aggregator server.
We consider executions of the proposed protocol in which an honest-but-curious aggregator server interacts with a set of parties, the underlying encryption construction is based on the DCR assumption, and the Shamir secret sharing threshold is set to τ. In such executions, parties may drop out and rejoin at any iteration. The following proves the indistinguishability of the distribution of the random variable representing the adversary's view in a real execution of the proposed protocol and the distribution of the random variable representing the adversary's view in a secure-by-definition "ideal world", using a simulation-based proof, which is standard for the security analysis of multiparty computation protocols [38]. The security analysis indicates that what the adversary learns from the real protocol execution is no more than what she can learn from the ideal protocol execution, which is secure/private by definition. This also means the protocol in real execution is secure against the honest-but-curious adversarial model. More specifically, the joint view of the server and any set of fewer than τ clients does not leak any information about the other clients' inputs (i.e., locally trained models and local training data) beyond what can be inferred from the output of the protocol computation (i.e., the aggregate model).
Let REAL_{C∩A}^{U,τ,λ} be a random variable representing the view of the adversary in a real execution of the proposed protocol, and let S_{C∩A}^{U,τ,λ} be the view of the adversary generated by a simulator in the secure-by-definition "ideal world". We prove that the distributions of REAL_{C∩A}^{U,τ,λ} and S_{C∩A}^{U,τ,λ} are indistinguishable.
{REAL_{C∩A}^{U,τ,λ}} ≅ {S_{C∩A}^{U,τ,λ}}. We use the hybrid argument technique to prove this. First, we define a series of hybrid random variables H_0, H_1, ... to construct the simulator S in the "ideal world" through subsequent modifications, such that any two subsequent random variables H_i and H_{i+1} are computationally indistinguishable, starting from H_0, which is identical to REAL_{C∩A}^{U,τ,λ}; the final result of the subsequent modifications is S_{C∩A}^{U,τ,λ}. In one hybrid, each first-layer ciphertext c_{i,j}^(t) of an honest party is replaced by the ciphertext of a dummy vector 0, the ciphertexts α^(t) are replaced by ciphertexts of a dummy value 0, and the hash function H_1 is substituted with a truly random function O_1; the aggregator obliviousness security in the random-oracle model under the DCR assumption of the construction in [19] guarantees that this hybrid is indistinguishable from the previous one. In the next hybrid, the second-layer ciphertexts are likewise replaced by ciphertexts of dummy values, and the hash function H_2 is substituted with a truly random function O_2; the same aggregator obliviousness argument guarantees that this hybrid is indistinguishable from the previous one. Defining the simulator S as described in the last hybrid, the view generated by S is computationally indistinguishable from that of the real execution.

C. Communication and Computation Analysis
Communication and computation overheads are analyzed for the establishment phase and for each iteration of federated learning in which there are k_r (0 ≤ k_r < k) rejoined parties and k_d (0 ≤ k_d < k) dropped parties. The computation and communication overheads are summarized in Table IV and Table V, respectively. Denote by l_pk, l_ss, l_i, l_e1, l_e2, l_p the sizes in bits of a public key, a secret share, an integer, a first-layer ciphertext, a second-layer ciphertext, and a plaintext, respectively. The cost in square brackets ([·]) is included only in the case where dropouts/rejoins happen. 1) Computation Cost: a) Computation cost of a local party: The computation cost of each party P_i at the establishment phase consists of three main parts: 1) generating its public key; 2) performing a pairwise secret agreement with each of the other k − 1 parties, which takes O(k − 1); and 3) creating τ-out-of-k Shamir secret shares of its secret. Thus, the computation cost of each party P_i at the establishment phase is O(τ · k). P_i's computation cost at each iteration is the cost of creating the ciphertexts, which is O(L). If P_i rejoins, there is an extra cost equal to P_i's cost in the establishment phase, which is O(τ · k). Thus, the total computation cost of each party in an iteration is O(L + [τ · k]). b) Computation cost of the aggregator: The aggregator's computation cost can be divided into two main operations: 1) reconstructing Shamir secrets (one for each dropped party) whenever dropouts happen, which takes O(k^2) in total; and 2) obtaining w^t by carrying out decryption O(L) times. Thus, the total computation cost of the aggregator at an iteration is O(L + [k^2]). 2) Communication Cost: a) Communication cost of a local party: The communication cost of each party P_i at the establishment phase comprises sending its public key to the aggregator and sending k − 1 secret shares to the other k − 1 parties (one share per party), resulting in l_pk + (k − 1) · l_ss, which is O(k).
The communication cost of each party P_i at an iteration can be partitioned into the main parts: 1) receiving k updated public keys from the aggregator, which takes k · l_pk; 2) sending k − 1 secret shares of its updated secret s_i^(t) when it rejoins, which takes (k − 1) · l_ss; 3) sending its secret shares of the k_d dropped parties' secrets, which is k_d · l_ss; 4) sending an encryption message C_i^(t) (comprising {c_{i,j}^(t)}_{j∈[L]} and the remaining components) to the aggregator at every iteration t, which accounts for (l_e1 + l_e2 + L · l_e1); and 5) receiving the aggregate model, which is L · l_p. Thus, the communication cost of P_i at an iteration comprises: a download cost (i.e., receiving messages) of [k · l_pk] + L · l_p, and an upload cost of [(k − 1) · l_ss] + [k_d · l_ss] + (l_e1 + l_e2 + L · l_e1) < [(2k − 1) · λ] + (l_e1 + l_e2 + L · l_e1) (since l_pk = l_ss = λ), i.e., O(L + [k]). b) Communication cost of the aggregator: The communication cost of the aggregator at the establishment phase consists of receiving the k public keys of the k parties, resulting in k · l_pk, which is O(k).
The communication cost of the aggregator at an iteration can be broken into the main parts: 1) receiving k_r updated public keys, which is k_r · l_pk; 2) sending the updated set of public keys to the k parties, which is k · k · l_pk; 3) receiving secret shares of the dropped parties from the alive parties, which requires at most (k − k_d) · k_d · l_ss; 4) receiving k − k_d encryption messages, which is (k − k_d) · (l_e1 + l_e2 + L · l_e1); and 5) sending the aggregate model to each local party, which is k · L · l_p. Summing these terms gives the aggregator's per-iteration communication cost.
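The per-iteration communication formulas above can be evaluated with a small helper. The parameter names follow the paper's notation; the function name `comm_cost_per_iteration` is illustrative, and the dropout/rejoin-dependent terms are included only when k_r or k_d are non-zero, mirroring the square-bracket convention.

```python
def comm_cost_per_iteration(k, L, k_r, k_d, l_pk, l_ss, l_e1, l_e2, l_p):
    """Upper-bound communication costs (in bits) for one federated learning
    iteration, per local party and for the aggregator."""
    # Local party: [public keys] + aggregate model down; [shares] + ciphertexts up.
    party_down = (k * l_pk if k_r else 0) + L * l_p
    party_up = ((k - 1) * l_ss if k_r else 0) + k_d * l_ss \
               + (l_e1 + l_e2 + L * l_e1)
    # Aggregator: keys and shares and ciphertexts in; key set and model out.
    server_down = k_r * l_pk + (k - k_d) * k_d * l_ss \
                  + (k - k_d) * (l_e1 + l_e2 + L * l_e1)
    server_up = (k * k * l_pk if k_r else 0) + k * L * l_p
    return {"party_down": party_down, "party_up": party_up,
            "server_down": server_down, "server_up": server_up}
```

With the experiment's parameters (k = 4, L = 132743, λ = l_pk = l_ss = 2048, l_e1 = 512, l_e2 = 1024, l_p = 64) and no dropouts or rejoins, this reproduces the roughly 8.5 Mbit download and 68 Mbit upload per client reported in Section VII.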

VI. PRIVACY-ENHANCING CROSS-SILO FEDERATED LEARNING FDIA DETECTION IN SMART GRIDS
Consider a multi-area grid of k non-overlapping areas managed by k independent transmission grid companies (TGCs). There is a system operator (SO) who takes care of the interconnections between areas and coordinates operations. Each TGC P_i owns a private local dataset D_i, i ∈ {1, . . . , k}, with n_i = |D_i| samples, and has communication lines with the SO and the other TGCs.
For FDIA detection in smart grids, the federated learning approach is superior to the centralized one in terms of data privacy protection and communication overhead. From a data privacy protection viewpoint, in federated learning the private data of each local party never leave that party, whereas in a centralized approach all data must be uploaded to a central server, exposing them to more security threats. From a communication overhead perspective, in federated learning local models are transmitted to the centre instead of raw measurement data, which reduces communication overhead because the size of the models is often much smaller than that of the raw measurement data.
An honest-but-curious adversarial model is considered. Adversaries are assumed to follow the protocol but may use the available transcripts to learn extra information that should remain private. A good result in detecting false data injection attacks, supporting security operations and power management, is a common interest of all parties; thus, it is reasonable to assume they are incentivised to follow the protocol to achieve the best output. However, some parties might be motivated to conspire with each other to infer private training data samples of a target party for business benefits. In the context of the above system model, a semi-honest adversary is an adversary that controls the SO and a set of colluding TGCs.
To model the spatial-temporal relationship between bus and line measurements, a network architecture modified from the method in [36] is trained for FDIA detection, as shown in Fig. 2. The model in Fig. 2 is utilized to detect false data injection attacks in transmission power grids. In the training stage, the model is securely trained by the proposed privacy-enhancing cross-silo federated learning framework, and the trained global model is then distributed to each participant/sub-grid. In the test stage, each sub-grid utilizes the trained global model to detect FDIAs individually. Time-series bus measurements Z_{b,i}^t and transmission line measurements Z_{l,i}^t are fed into the model, which captures the spatial-temporal relationship between bus and line measurements and outputs the likelihood of an FDIA in the current sub-grid. The details of the network parameters are summarised in Table VI and Table VII. With this architecture, the training network model for FDIA detection has 132743 parameters. The proposed privacy-enhancing cross-silo FDIA detection is based on the classical federated learning framework FedAvg [1] with the privacy protection part on top.

VII. EMPIRICAL EVALUATION
This section demonstrates the desirable utility and efficiency of the proposed cross-silo privacy-enhancing federated learning. In the following, we describe the measurement dataset and the transmission power grid system, which comprises several sub-grids controlled by local TGCs and an SO who coordinates the federated learning process. Following that are the training/testing settings and a discussion of the performance in terms of accuracy, training time and inference time.
A. Description of Datasets
1) Transmission Power Grid Test Set: A transmission power grid, '1-HV-mixed-0-no sw', from the benchmark dataset SimBench [39] was used to evaluate the FDIA detection. This power grid contains 64 buses, 58 loads, and 355 measurements, with more details shown in Table VIII. The grid is divided into four sub-grids, each containing 16 buses, summarised as follows:

B. Training and Testing Setting
There are 35136 normal measurement samples and 35136 FDIA measurement samples, with normal measurement samples labelled 0 and FDIA samples labelled 1. In the training stage, the 29952 normal samples and 29952 FDIA samples from the first 312 days are grouped as the training dataset; the remaining 5184 normal and 5184 FDIA samples from the last 54 days are used as the test dataset. In the federated learning training, the number of global epochs was set to 200, the number of local epochs was set to 5, the number of local batches was set to 48, and the sequence length for the LSTM layers was set to 96. In each federated learning training round, 3 local sub-grids were randomly selected to collaboratively train the global model. The federated learning source code and the popular deep learning framework PyTorch 1.9.0 were used to implement the proposed FDIA federated learning detection framework for model training and testing.
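The underlying FedAvg aggregation, which the secure protocol reproduces in the encrypted domain, can be sketched as the plaintext weighted average of the selected clients' models. The names and toy values below are illustrative, not the paper's implementation.

```python
import random

def fedavg_round(local_models, sample_counts, selected):
    """One FedAvg round: the weighted average (by sample count n_i) of the
    selected clients' model vectors."""
    total = sum(sample_counts[i] for i in selected)
    dim = len(local_models[selected[0]])
    return [sum(sample_counts[i] * local_models[i][j] for i in selected) / total
            for j in range(dim)]

# Toy run: 4 sub-grids, 3 randomly selected per round, 2-parameter "models".
models = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 4.0]]
counts = [10, 20, 30, 40]
selected = random.Random(0).sample(range(4), 3)
global_model = fedavg_round(models, counts, selected)
```

In the proposed scheme, the same weighted average is computed by the aggregator over encrypted, encoded local models, so the per-client vectors themselves are never revealed.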
Three commonly used metrics were applied to evaluate the accuracy of the FDIA detection, namely precision = N_tp / (N_tp + N_fp), recall = N_tp / (N_tp + N_fn), and F1-score = 2 · precision · recall / (precision + recall), where N_fp indicates the number of false positives, N_tp the number of true positives, N_fn the number of false negatives, and N_tn the number of true negatives.
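These metrics can be computed directly from the confusion-matrix counts; a minimal helper, with F1-score assumed as the third metric (the function name is illustrative):

```python
def detection_metrics(n_tp, n_fp, n_fn):
    """Precision, recall and F1-score from confusion-matrix counts."""
    precision = n_tp / (n_tp + n_fp)   # fraction of raised alarms that are true
    recall = n_tp / (n_tp + n_fn)      # fraction of real attacks that are caught
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, 90 true positives with 10 false positives and 10 false negatives yield precision, recall and F1 all equal to 0.9.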

C. FDIA Detection Accuracy and Time Overhead
We compared the performance of the proposed solution (i.e., the federated model trained on encrypted local models from each local dataset) with a centralized model trained on the whole plaintext dataset. The same model was trained, without the proposed encryption scheme, in a centralized way using the same hyperparameters as in Section VII-B. The results of the centralized model on the whole plaintext dataset are summarized in Table IX, and Table X gives the FDIA detection accuracy of the FedAvg FDIA detection algorithm on the test dataset. As can be seen from Table IX and Table X, there is no significant difference in accuracy.
The privacy-enhancing FedAvg FDIA detection version has the same accuracy as the original FedAvg FDIA detection version. However, the average training time for each sub-grid as well as for the whole system to get the weighted global model is longer due to the complexity of privacy protection added for secure weighted aggregation. The average training time is collected by evaluating the framework in a Linux system with each sub-grid using one Nvidia Tesla Volta V100-SXM2-32GB GPU.
Encryption parameters are set as follows: λ = 2048 (the modulus p in the sub-protocol π_0 is a 2048-bit prime), l_1 = 256 (the modulus N_1 of the first encryption layer is a 256-bit integer), l_2 = 512 (the modulus N_2 of the second encryption layer is a 512-bit integer), and l_p = 64.
For each federated learning round, each TGC timed its own part, including the local model training part and the privacy protection part; the SO timed the part of obtaining the encrypted aggregation model and decrypting it. In Table XI we provide the average computational time in seconds per global epoch (one federated learning round) of our proposed privacy-enhancing FDIA detection federated learning in a single-processing manner. The local model training part without privacy protection consumes around 233 seconds. The average extra time for the privacy protection part comprises: 1) the time for the initial setup of the protection scheme, which is 16.41 seconds on average; 2) the computation time of local model protection at the client side at every federated learning round, which is 12.35 seconds on average per client per round; and 3) the computation time of obtaining the encrypted aggregation model and decrypting it at the server side at every federated learning round, which is 12.14 seconds on average per round.
To test the ability to accelerate the computation, multiprocessing is used to partition the Single Instruction Multiple Data (SIMD) computations of the cryptographic operations over the model vectors onto 4 CPUs. Table XII illustrates the speed-up achieved by multiprocessing with 4 CPUs. The computation overhead of local model protection in each federated learning round then incurs only 5.56 seconds, i.e., 2.38% of the 233 seconds of the underlying model training without security. The total extra time of the privacy protection component over 200 epochs of federated learning training is around 83 minutes in a single-processing manner, and around 36 minutes in a multi-processing manner with 4 CPUs. The implementation of our proposed scheme is well-suited to parallel computation; thus, the extra computational overhead of our privacy-protection component could be reduced further by using more CPUs, either those that local transmission grid operators already have or cloud resources available at very low prices. From the communication analysis in Section V-C.2, with the above encryption parameter setting and a model vector of size L = 132743, the download cost of a client is less than k · λ + L · l_p = 4 · 2048 + 132743 · 64 = 8503744 bits ≈ 8.5 Mbits ≈ 1 Mbyte, and the upload cost of a client is less than [(2k − 1) · λ] + (l_e1 + l_e2 + L · l_e1) = (2 · 4 − 1) · 2048 + (512 + 1024 + 132743 · 512) ≈ 68 Mbits = 8.5 Mbytes. The model training is not a real-time process, so we can afford more transmission time, leading to a lower bandwidth requirement. If 1 second per iteration is used for uploading data from a local party to the aggregator (resulting in about 0.05 hours of upload time over the whole training process of 200 epochs), then the upload bandwidth requirement would be 68 Mbps. For comparison, the network bandwidth of our campus office is 900 Mbps.
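The SIMD-style partitioning of the cryptographic operations over the model vector can be sketched as follows. The paper's implementation parallelises over 4 CPU processes; this self-contained sketch uses a thread pool for portability (a `multiprocessing.Pool` would be the drop-in replacement for CPU-bound big-number arithmetic), and the per-element operation is a dummy modular exponentiation standing in for the actual double-layer encryption.

```python
from concurrent.futures import ThreadPoolExecutor

def protect_chunk(chunk):
    """Stand-in for per-element protection of a chunk of the model vector
    (the real scheme applies the double-layer encryption here)."""
    return [pow(3, x, 2**61 - 1) for x in chunk]

def protect_model(vector, n_workers=4):
    """Split the L-length encoded model vector into n_workers chunks and
    protect the chunks in parallel, preserving element order."""
    size = (len(vector) + n_workers - 1) // n_workers
    chunks = [vector[i:i + size] for i in range(0, len(vector), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(protect_chunk, chunks)
    return [c for part in parts for c in part]
```

Because every element is protected independently, the work scales down almost linearly with the number of workers, which is why adding CPUs shrinks the privacy-protection overhead reported above.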
In the inference stage, each sub-grid utilizes the trained global model to detect FDIAs individually, as described in Section VI. Detecting an FDIA given a trained model (i.e., inference) takes 6.7 milliseconds on average in the proposed scheme, which is fast enough for relevant smart grid operations, e.g., state estimation.

VIII. CONCLUSION
In this paper, we propose a cross-silo privacy-enhancing federated learning scheme which is secure in the honest-but-curious adversarial model. With the main techniques of secure multiparty computation based on double-layer encryption and secret sharing, the scheme is efficient in communication and computation overhead and robust against dropouts and rejoins. The scheme removes the requirement of computing discrete logarithms or of multiple non-colluding server settings, which are limitations of some related works. In addition, the clients' secret keys of the two encryption layers are generated by each party in a decentralized manner, which helps increase the level of privacy guarantee. We are also the first to design and empirically evaluate a practical and efficient privacy-enhancing cross-silo federated learning scheme resilient to local private data inference attacks for FDIA detection in the smart grid domain. The proposed scheme provides a framework which can be adapted to other domains. The security analysis and the empirical evaluation prove that the proposed scheme achieves provable privacy against an honest-but-curious aggregator server colluding with some clients while providing desirable model utility in an efficient manner. In future work, we will investigate different adversarial models in various federated learning settings applicable to security in cyber-physical systems.