Attack Power System State Estimation by Implicitly Learning the Underlying Models

False data injection attacks (FDIAs) are a real and latent threat in modern power system networks due to the unprecedented integration of data acquisition systems. It is of utmost importance to understand attacking mechanisms to design countermeasures. To successfully deploy an FDIA, most past FDIA strategies need privileged power system information, which is carefully held by the power system operator. Newer approaches circumvent this issue by solely relying on intercepted measurement data, but they lack mathematical guarantees of success. This paper exposes power systems’ vulnerability by showing that it is possible to deploy an attack without confidential information and, at the same time, to have a high probability of being successful. We present a scheme that learns (1) the implicit power system measurement distribution and (2) a surrogate of the unknown state estimator model. The proposed framework utilizes a Wasserstein generative adversarial network to learn the measurement distribution and an autoencoder to learn the unknown state estimator model. Additionally, we present a convergence proof that ensures that the proposed framework converges to the power system measurement distribution. The proposed method is demonstrated to be successful via extensive simulation on IEEE 9-, 14-, 57-, 118-, and 300-bus test cases.


I. INTRODUCTION
The data revolution is taking place worldwide in different disciplines, including power systems. To provide a robust grid with new but diversified components, modern power grids are on the road to integrating unprecedented amounts of real-time and offline data for monitoring, control, and protection. However, this new data-driven outlook makes the power grid more vulnerable than ever to cyber-attacks with dire consequences. For instance, the power system operator may take wrong corrective actions that can cause a blackout; wrong actions can also cause inaccurate energy prices in a real-time electricity market [1], [2]. To better protect the system, it is essential to understand potential attack mechanisms. Among various attack categories [3], [4], [5], False Data Injection Attacks (FDIAs) gained the attention of the power system community after the work in [6], which showed that unobservable attacks against DC State Estimators (SE) are possible. In this type of attack, the attacker modifies measurement data such that the estimated states are different from the real ones [7], [8]. These first works make the following assumptions, which may be impractical: (i) The attacker has access to the entire network information (e.g., line parameters, grid topology, state estimator model, and estimated states) [9], [10]. Since this information is guarded by the power system operator, it is impractical to think that an attacker can gather all this data without an insider in the Independent System Operator (ISO). (ii) These first studies rely upon the DC power flow model, whereas power system operators use the AC power flow model in real-world settings. The reason is that AC-based FDIAs are harder to design and deploy due to the inherent complexity of the nonlinear power flow equations [11], [12]. Subsequent work relaxed the first assumption.
Specifically, [2], [6], [13], [14], [15], [16], [17] propose various frameworks to design FDIAs with only partial network information, but they still rely on a DC-based model. To relax the DC model assumption, a few studies have focused on FDIAs with an AC-based model [9], [18], [19]. However, all the aforementioned approaches construct an attack vector relying upon the underlying power system information; we call these techniques model-based FDIAs.
Later works showed that it is also possible to deploy an FDIA without knowing privileged power system information such as power system parameters and topology or the state estimator model. The only information needed to deploy an FDIA is the power system measurements, and we classify these kinds of attacks as model-free FDIAs. In modern power system networks, the information is sent via remote terminal units that are designed to avoid system intrusion [20], [21]. However, conventional approaches such as security software and firewalls could be insufficient to protect the system against breaches and cyber threats [12]. For example, in 2015, a cyber-attack was successfully deployed on Ukraine's electricity infrastructure.
Around one year before the attack, the attackers gained access to multiple industrial networks by using the malware tool BlackEnergy 3 (BE) [22]. This malware enables unauthorized network access with valid (stolen) user credentials to move laterally across the utilities' internal systems. In this incident, the attackers gained access to targeted networks using weaponized Microsoft Office files by embedding BE in Visual Basic macro scripts. This latent risk has been recognized by the National Academies of Sciences, Engineering, and Medicine [23]. In the same work, they conclude that the United States' power system network is vulnerable to cyber-attacks. Thus, for an attacker, it would be feasible to collect sensor measurements by exploiting weaknesses in the protection schemes [12].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The authors in [24] showed that it is possible to deploy a stealthy FDIA by using principal component analysis (PCA). The extension of this work in [25] proposed a geometric approach to carry out an FDIA based only on power system measurements. The authors in [26] proposed a data-driven attack that learns the system operation subspace from measurements around a linearized nominal state. The work in [27] presented a zero-parameter information attack that only requires the power system's topology information. The works in [28], [29] employed machine learning techniques to carry out an FDIA. Specifically, they trained a generative adversarial network (GAN) to generate tampered power system measurements that will be stealthy with high probability. While the works in [28], [29] and our work all use GANs to carry out an FDIA, our approach has some important differences. Both works in [28], [29] use the DC linear power flow model. In contrast, our proposed approach uses the AC non-linear power flow model. Whereas the work in [29] requires normal and tampered measurements to train a conditional generative adversarial network (cGAN), our approach only requires normal measurements, which is a more reasonable assumption. This means that our attack is more appealing in terms of the information needed to train the model.
The difficulty with model-free FDIAs is that it is hard to ensure that the approach truly captures the properties of the power system model needed to bypass tests, such as the Chi-squared test, and thereby gain the trust of energy management systems. To show the power system vulnerability under this setting, we introduce a data-driven approach that generates tampered measurements with the desired properties to deploy an FDIA and, at the same time, has mathematical guarantees about the model accuracy. We achieve this goal by (1) implicitly learning the power system measurement distribution from data; and (2) learning a proxy model for the unknown state estimator.
Specifically, we aim to design a flexible model that captures the complex underlying interactions in the power system to learn the measurement distribution from data. Nonparametric methods are flexible since they build models from data making as few assumptions as possible, which usually means utilizing statistical models that are infinite-dimensional [30]. While these types of models are flexible because they keep the underlying assumptions as weak as possible, they are computationally demanding due to the required growth in the number of parameters [31], [32]. For example, the work in [33] shows that their nonparametric model grows in complexity as additional data is used to train the model. As real power systems could have thousands of buses and data measurements from many years, the number of parameters needed in non-parametric models is computationally intractable [31]. Therefore, we choose parametric models, which can be designed with a fixed number of parameters that depends upon the specific problem. In recent years, these parametric models have had tremendous success in the ML community because they can learn complex high-dimensional distributions (for example, images in high resolution). In power systems, for example, the work in [34] uses physics-informed parametrized neural networks (PINN) to learn the underlying power grid's parameters. Among the parametric models, we introduce a framework utilizing generative adversarial networks (GANs) to learn the power system measurement distribution and create spurious measurements to deploy an FDIA, because the GAN loss function is fully specified. In comparison, the loss function of variational autoencoders (VAEs) is only the evidence lower bound (ELBO), which is hard to embed into other learning schemes. Even more importantly, we can present a mathematical proof that the GAN reliably learns the power system measurement distribution.
Specifically, we use the Wasserstein Generative Adversarial Network (WGAN), which is guaranteed to converge under mild assumptions to the actual observed distribution [35].
In addition to mimicking the data distribution, one piece of knowledge we do have is the form of the residual error test. Therefore, we propose to boost our attack capability by learning the state estimator model for the residual error test. However, learning the state estimator model directly is difficult because neither the power system nor the state estimator is known. To circumvent this issue, we use a surrogate model to mimic the state estimator. The residual error test and an autoencoder (AE) share the same mathematical structure. We leverage this similarity and train an AE as a proxy for the residual error test. Specifically, in our proposed scheme, we include this proxy as a regularization term, which helps to improve the quality of the created tampered measurements. Finally, a second regularization term is added to maximize the impact of the attack. Whereas the model-based attacks need the complete network information (e.g., line parameters, grid topology, state estimator model, and estimated states), our proposed model-free approach only needs a dataset of measurements of the considered network. Moreover, such a dataset does not need to include all the measurements, which is another advantage.
The performance of the proposed model-free FDIA is verified by simulations on the standard IEEE 9-, 14-, 57-, 118-, and 300-bus test networks. Also, to contrast the differences and advantages between our approach and the existing ones in the literature, we carry out comparisons between our proposed FDIA and three other successful methods reported in [9], [18], [25]. These results show that our proposed model-free attack is successful. Specifically, it tampers measurements in a way that can fool the power system operator with high probability.
The rest of the paper is organized as follows: Section II introduces the problem formulation, Section III presents our proposed model-free FDIA model, Section IV presents the convergence proof, Section V shows numerical experiments, and Section VI concludes the paper.

II. PROBLEM FORMULATION
To show the proposed model-free FDIA attack, we first review the model-based approaches based on AC state estimation.

A. State Estimation With AC Power Flow Model
State estimation (SE) infers the state variables (i.e., voltage angles and voltage magnitudes) $x = (x_1, \ldots, x_n)$ from a set of measurements $z = (z_1, \ldots, z_m)$ [36], where $n$ is the number of buses or nodes in the grid, and $m$ is the number of measurements. Mathematically, we can describe the problem as $z = h(x) + e$, where $h(\cdot)$ is the physical (non-linear) relationship between state variables and measurements, and $e$ is a vector that represents white noise from the collected measurements (e.g., SCADA or PMU). In practice, measurements are collected and sent to the power system operator, which obtains the estimated states $\hat{x}$ by solving [37], [38]:
$$ \hat{x} = \arg\min_{x} \, (z - h(x))^{\top} W (z - h(x)) =: SE(z), \tag{1} $$
where $W$ is the measurement weight matrix and, for compactness, we define the state estimator operator $SE(\cdot)$. The input of this operator is a vector of measurements and its output is the vector of estimated states. However, the vector of measurements $z$ may contain bad or wrong data due to telecommunication failures, meter errors, or even FDIAs [10], [39]. To estimate the states with confidence, the SE possesses a Bad Data Detector (BDD) module to detect and filter suspicious data.
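The estimation step can be illustrated with a small sketch. It assumes a weighted least-squares formulation solved by Gauss-Newton iterations, which is a common choice but is not necessarily the estimator used by a given operator. The toy measurement function `h`, its `jacobian`, and all numerical values are hypothetical.

```python
def h(x):
    """Toy nonlinear measurement function (hypothetical): one state, two measurements."""
    return [x, x * x]


def jacobian(x):
    """Derivative of each toy measurement with respect to the scalar state."""
    return [1.0, 2.0 * x]


def state_estimate(z, sigma, x0=0.5, iters=20):
    """Gauss-Newton iterations for min_x sum_i ((z_i - h_i(x)) / sigma_i)^2."""
    x = x0
    weights = [1.0 / (s * s) for s in sigma]
    for _ in range(iters):
        hx, jx = h(x), jacobian(x)
        # Weighted normal equations reduce to a scalar ratio for one state.
        num = sum(w * j * (zi - hi) for w, j, zi, hi in zip(weights, jx, z, hx))
        den = sum(w * j * j for w, j in zip(weights, jx))
        x += num / den
    return x


z = [1.02, 0.98]          # noisy measurements of a state close to 1.0
sigma = [0.02, 0.02]      # assumed measurement standard deviations
x_hat = state_estimate(z, sigma)
z_hat = h(x_hat)          # estimated measurements used by the residual test
```

The pair `x_hat`, `z_hat` plays the role of the estimated states and estimated measurements that the bad data detector consumes.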

1) Bad Data Detector (BDD):
The measurement errors are assumed to follow a Gaussian distribution $e_i \sim \mathcal{N}(0, \sigma_i)$ [39] (where $\sigma_i$ is the standard deviation of the $i$-th measurement). Therefore, the squared measurement residual error $r = \|z - \hat{z}\|_2^2$ follows a Chi-square distribution $\chi^2_k$, where $k$ represents the number of independent variables in the power system, and $\hat{z} = h(\hat{x})$ is the vector of estimated measurements. Then, the presence of errors in the measurements can be detected with the Chi-square test (or residual error test) [39], [40]. This test works as follows: (i) Select the detection confidence probability $p$ (e.g., 0.95), and compute its associated threshold value $\tau = \chi^2_{k,p}$ with $p = \Pr(r \le \chi^2_{k,p})$. (ii) Test the measurement residual:
$$ r \ge \tau. \tag{2} $$
If the inequality in (2) holds, bad data will be suspected, or else the measurements are assumed to be free of bad data.
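A minimal sketch of the residual error test follows. It makes two assumptions that are not stated in the text: residuals are normalized by the standard deviations so that the Chi-square law applies, and the number of degrees of freedom is fixed at $k = 2$, for which the quantile has the closed form $-2\ln(1-p)$. The function names and data are hypothetical.

```python
import math


def chi2_threshold_two_dof(p):
    """Chi-square quantile for k = 2 degrees of freedom: -2 * ln(1 - p)."""
    return -2.0 * math.log(1.0 - p)


def normalized_residual(z, z_hat, sigma):
    """Sum of squared residuals, each scaled by its standard deviation."""
    return sum(((zi - zh) / s) ** 2 for zi, zh, s in zip(z, z_hat, sigma))


def bad_data_suspected(z, z_hat, sigma, tau):
    """Return True when the residual reaches the threshold, as in the test above."""
    return normalized_residual(z, z_hat, sigma) >= tau


tau = chi2_threshold_two_dof(0.95)
z_hat = [1.00, 0.50, -0.30]            # estimated measurements (hypothetical)
sigma = [0.02, 0.02, 0.02]
clean = [1.01, 0.49, -0.31]            # small, noise-like deviations
corrupted = [1.01, 0.60, -0.31]        # one grossly wrong measurement

clean_flag = bad_data_suspected(clean, z_hat, sigma, tau)
corrupted_flag = bad_data_suspected(corrupted, z_hat, sigma, tau)
```

For other values of $k$ the threshold would normally come from a statistics library rather than from this closed form.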

B. Model-Based FDIA in AC State Estimation
An FDIA modifies the estimated states $\hat{x}$ or measurements $\hat{z}$ by replacing the original SCADA and PMU measurements $z$ with a maliciously tampered measurement vector, that is, $z_a = z + a$, where $a$ is an attack vector. The attacker designs this attack vector to compromise the system's reliability by creating a wrong state estimate. For an FDIA to be successful, it must circumvent the bad data detector (2) [41]. The assumptions in the literature for a model-based FDIA about the attacker's knowledge are the following [6], [9], [42]: (1) the attacker can intercept and alter the power system measurements that are used to obtain the estimated states in the grid; (2) the attacker has access to the power system model, which includes transmission line parameters and topology information; and (3) the attacker possesses the SE model or can obtain the estimated states of the network. Under these strong assumptions, the attacker would be able to launch a perfect FDIA [10]. In this perfect FDIA, the attacker can define the attack vector as $a = h(\hat{x} + c) - h(\hat{x})$, where $c$ is the vector of changes in the estimated states. In this scenario, if the original measurements $z$ can pass the residual-based bad data detector test in (2), the corrupted measurements $z_a$ will also pass this test [9].
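The invariance of the residual under a perfect attack can be checked numerically. The sketch below assumes a hypothetical two-state measurement model `h` and assumes that the estimator returns the shifted state $\hat{x} + c$ for the tampered measurements; all values are illustrative.

```python
def h(x):
    """Toy nonlinear measurement model (hypothetical): 2 states, 3 measurements."""
    x1, x2 = x
    return [x1 * x2, x1 * x1, x1 - x2]


def residual(z, x_hat):
    """Squared residual between measurements and the estimated measurements h(x_hat)."""
    return sum((zi - hi) ** 2 for zi, hi in zip(z, h(x_hat)))


def perfect_attack(x_hat, c):
    """Attack vector a = h(x_hat + c) - h(x_hat)."""
    shifted = [xi + ci for xi, ci in zip(x_hat, c)]
    return [hs - h0 for hs, h0 in zip(h(shifted), h(x_hat))]


x_hat = [1.0, 0.9]                 # estimated states before the attack
z = [0.91, 1.01, 0.09]             # original (noisy) measurements
c = [0.05, -0.02]                  # state changes the attacker wants to induce

a = perfect_attack(x_hat, c)
z_a = [zi + ai for zi, ai in zip(z, a)]
x_attacked = [xi + ci for xi, ci in zip(x_hat, c)]   # assumed estimator output

r_clean = residual(z, x_hat)
r_attacked = residual(z_a, x_attacked)
```

Because $z_a - h(\hat{x} + c) = z - h(\hat{x})$ term by term, the two residuals agree up to floating-point rounding, which is why the attack is not flagged.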
The work in [9] proposed an FDIA needing only partial power system information. In this context, there are two types of variables: (1) measurements and state variables that are not altered, which are denoted with subscript 1, $\hat{x}_1$ and $z_1 = h_1(\cdot)$; and (2) measurements and state variables that are altered, which are denoted with subscript 2, $\hat{x}_2$ and $z_2 = h_2(\cdot)$.
If an attacker constructs the attack vector as
$$ a = \begin{bmatrix} 0 \\ h_2(\hat{x}_1, \hat{x}_2 + c) - h_2(\hat{x}_1, \hat{x}_2) \end{bmatrix}, \tag{3} $$
the tampered measurements will have the same residual error as the real ones. Note that to create the attack vector in (3), the attacker must know the estimated values of the state variables appearing in $h_2$, which is still a strong assumption. There are other types of FDIA. For example, if $a \neq h(\hat{x} + c) - h(\hat{x})$ but the tampered measurements still pass the test in (2), then the attack is called a generalized FDIA [43].

III. PROPOSED MODEL-FREE FDIA
Contrary to the model-based FDIAs, model-free approaches make only one assumption [24], [25], [28], [29]: the attackers can intercept and alter the power system measurements that are used to obtain the estimated states in the grid. So, in this section, we show a theoretically sound method to deploy an FDIA by only using the power system measurements. If we want to deploy an attack without any underlying power system knowledge, we have to learn an implicit model through observations, that is, from power system measurements (SCADA and PMU). This implicit model should capture the inherent non-linear relationships between different measurements that the residual error test relies on. Also, this model should be able to create new tampered measurements such that they are overlooked by the power system operator but change the estimated states and measurements. To summarize, we present a data-driven approach based on a WGAN with two regularization terms. First, the distribution of the measurements $z = h(x) + e$ is learned with the WGAN. Second, to pass the residual error test, a proxy of the unknown SE model $h(\cdot)$ is embedded into the WGAN as a regularization term. Finally, a regularization term is added to maximize the attack impact.

A. Learning the Measurement Distribution
Reference [44] introduced the idea of generative adversarial networks, which revolutionized the machine learning (ML) field. A GAN is a framework to teach a Deep Learning (DL) model the implicit training data distribution so that we can sample from it and generate new data from that same distribution; in our case, the power system measurement distribution. Specifically, rather than sampling directly from an (assumed) parametric distribution, the target random variable is generated as a deterministic transformation of a simple, independent noise source, for instance, a Gaussian distribution. GANs are made of two distinct models, a generator and a discriminator. Formally, the minimax objective of the GAN is
$$ \min_{G} \max_{D} \; \mathbb{E}_{z \sim P_r}[\log D(z)] + \mathbb{E}_{\lambda \sim P_\lambda}[\log(1 - D(G(\lambda)))], \tag{4} $$
where $D$ is a discriminative network, $G$ is a generative network, $P_r$ is the real data distribution, and $\lambda$ is the latent space, which is sampled from an independent distribution $P_\lambda$; that is, $\lambda \sim P_\lambda$ (usually a Gaussian distribution).
However, GANs have some issues, such as vanishing gradients and the lack of convergence guarantees. The work in [45] presented the Wasserstein GAN (WGAN), which solves these issues. Also, WGANs possess stronger mathematical guarantees. For example, the authors in [35] proved that (under mild assumptions) the generator in the WGAN will converge to the true data distribution $P_r$. Therefore, in this work, we will use this type of GAN. These models are made of two distinct neural networks, a generator $G$ and a discriminator $D$ (or critic). The minimax objective of the WGAN is
$$ \min_{G} \max_{D \in \mathcal{D}} \; \mathbb{E}_{z \sim P_r}[D(z)] - \mathbb{E}_{\lambda \sim P_\lambda}[D(G(\lambda))], \tag{5} $$
where $\mathcal{D}$ is the set of 1-Lipschitz functions [45]; $P_r$ is the real data distribution; $\lambda$ is known as the latent space, and it is sampled from an independent distribution $P_\lambda$. The generator $G$ learns the real distribution $P_r$, which, in our context, is the distribution of the historical observed measurements. In other words, $G$ implicitly learns to generate samples from the underlying model $z = h(x) + e$.
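The quantity that the critic maximizes and the generator minimizes can be estimated from two batches of samples. The sketch below assumes a fixed, hypothetical linear critic with a unit-norm weight vector (which is 1-Lipschitz in the Euclidean norm) in place of a trained neural network; the batches are synthetic.

```python
import random


def critic(z):
    """Hypothetical 1-Lipschitz critic: projection onto a unit-norm direction."""
    return 0.6 * z[0] + 0.8 * z[1]


def wgan_objective(critic_fn, real_batch, fake_batch):
    """Empirical estimate of E[D(z)] - E[D(G(lambda))] over two batches."""
    real_term = sum(critic_fn(z) for z in real_batch) / len(real_batch)
    fake_term = sum(critic_fn(z) for z in fake_batch) / len(fake_batch)
    return real_term - fake_term


rng = random.Random(0)
real = [[1.0 + rng.gauss(0, 0.01), 0.5 + rng.gauss(0, 0.01)] for _ in range(200)]
# An untrained generator: samples far from the real measurements.
far_fake = [[rng.gauss(0, 0.01), rng.gauss(0, 0.01)] for _ in range(200)]
# A well-trained generator: samples that follow the real distribution.
near_fake = [[1.0 + rng.gauss(0, 0.01), 0.5 + rng.gauss(0, 0.01)] for _ in range(200)]

gap_far = wgan_objective(critic, real, far_fake)
gap_near = wgan_objective(critic, real, near_fake)
```

The gap is large when the generated samples are easy to separate from the real ones and shrinks toward zero as the generator improves, which is the signal the generator follows during training.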

B. Learning the State Estimator Model
To gain trust from the power system operator, the created tampered measurements, $\tilde{z} = G(\lambda)$, must pass the residual error test in (2). This residual error for the tampered measurements is given as
$$ \tilde{r} = \| \tilde{z} - \hat{\tilde{z}} \|_2^2, \tag{6} $$
where $\hat{\tilde{z}} = h(\hat{\tilde{x}})$ is the vector of estimated tampered or fake measurements, and $\hat{\tilde{x}} = SE(\tilde{z})$ is the vector of estimated states from tampered measurements. As (2) suggests, the smaller the residual error $\tilde{r}$, the bigger the probability of passing the test for a given tampered measurement, $\tilde{z}$. In other words, a given vector of tampered measurements, $\tilde{z}$, should produce a similar estimated vector, $\hat{\tilde{z}} = h(\hat{\tilde{x}})$. However, in this model-free approach, we do not have access to the state estimator model $h(\cdot)$. This non-linear function $h(\cdot)$ can be thought of as a mapping from the measurement space to the estimated measurement space. For a vector of real measurements, the estimated measurements will be similar so that the residual error is low. This state estimator function $h(\cdot)$ is unknown. Still, given its properties, it is possible to learn it from data and create a proxy to impose the same behavior in the tampered measurements.
The residual error expression in (6) resembles the loss function of an autoencoder (AE). Thus, an AE model is a natural option to learn a proxy model of the unknown state estimator function $h(\cdot)$. An autoencoder is a neural network that aims to replicate its input at its output [46]. To do this, the autoencoder is trained to learn an encoding for a particular distribution and then, with such an encoding, to reconstruct the input distribution. To learn a meaningful encoding, the model's architecture prioritizes which traits from the input should be learned. By this process, the autoencoder learns to ignore superfluous data, which could be noise. We will see how this autoencoder property improves the generation of fake measurements in Section V-E. Mathematically, the autoencoder is represented as a function, that is, $AE(\cdot)$, and it is trained with the squared loss function:
$$ \min_{AE} \; \| z - AE(z) \|_2^2. \tag{7} $$
An AE trained on real measurements with (7) will learn a proxy of the unknown function $h(\cdot)$ that minimizes the residual error in (6). Once the autoencoder is trained (denoted as $AE^*$), the loss function in (7) can be embedded into (5) to incentivize the generation of tampered measurements that will produce similar estimated measurements, and thus lower the residual error. This can be done by adding the regularization term $\| \tilde{z} - AE^*(\tilde{z}) \|_2^2$ in (5):
$$ \min_{G} \max_{D \in \mathcal{D}} \; \mathbb{E}_{z \sim P_r}[D(z)] - \mathbb{E}_{\lambda \sim P_\lambda}[D(\tilde{z})] + \mathbb{E}_{\lambda \sim P_\lambda}\left[ \| \tilde{z} - AE^*(\tilde{z}) \|_2^2 \right], \tag{8} $$
where $\tilde{z} = G(\lambda)$.
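The role of the trained autoencoder as a penalty can be sketched with a stand-in model. The example below assumes, purely for illustration, that the trained $AE^*$ behaves like a projection onto a one-dimensional subspace of consistent measurements; a real autoencoder would be a trained neural network, and the direction vector and measurement values here are hypothetical.

```python
def make_projection_autoencoder(direction):
    """Hypothetical stand-in for a trained AE*: encode to one coefficient, then decode."""
    norm_sq = sum(d * d for d in direction)

    def autoencoder(z):
        code = sum(zi * di for zi, di in zip(z, direction)) / norm_sq   # encoder
        return [code * di for di in direction]                          # decoder

    return autoencoder


def ae_regularizer(z_tilde, autoencoder):
    """Squared reconstruction error ||z_tilde - AE*(z_tilde)||_2^2."""
    recon = autoencoder(z_tilde)
    return sum((zi - ri) ** 2 for zi, ri in zip(z_tilde, recon))


ae_star = make_projection_autoencoder([1.0, 2.0, -1.0])
consistent = [0.5, 1.0, -0.5]      # lies on the learned subspace
inconsistent = [0.5, 1.0, 0.5]     # violates the learned relationship

reg_consistent = ae_regularizer(consistent, ae_star)
reg_inconsistent = ae_regularizer(inconsistent, ae_star)
```

A generator penalized with this term is pushed toward measurements that the proxy reconstructs well, which is the behavior expected from measurements with a small residual error.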

C. Maximize the FDIA Impact
The WGAN in (8) implicitly learns the underlying model that generates the observed data [44], [45]. To train a WGAN with (8), we need to sample $z$ from the true data distribution $P_r$. However, the generator in (8) conventionally takes a random signal as input and maps it to the true data distribution space; that is, $\lambda \sim P_\lambda$, where $P_\lambda$ is usually a Gaussian distribution. This means that we do not have any control over the created fake measurements. To successfully attack a power system, we want these fake measurements, produced by our WGAN, to create different states from the actual ones. The attacker can only see and modify observed measurements. Thus, the attacker can attempt to markedly change the unobservable states by stealthily and sizeably manipulating the intercepted measurements to perform a successful FDIA. To accomplish this, we need to generate tampered measurements from the observed ones.
If we want to generate tampered measurements from the observed ones, rather than using a random distribution $P_\lambda$ as latent space to feed our generator, we use the power system measurements as input to the generator, that is, $P_\lambda = P_r$. The result is that the generator's latent space is not fed with an arbitrary random distribution: it is fed with the power system measurement distribution. Specifically, we are conditioning the WGAN with respect to the actual measurement vector $z$, as depicted in Fig. 1. This is desirable because in this way, rather than creating tampered measurements from an arbitrary distribution, they are constructed based on the observed ones. Furthermore, the created tampered measurements will differ from those received as input due to a regularization term that we include in our model, as we explain below.
Fig. 1. Proposed model-free architecture with a WGAN and two regularization terms to deploy an FDIA.
To successfully deploy an FDIA, we want to incentivize the generator to construct measurements that differ from those received as input. This will provoke the SE, with a high likelihood, to produce erroneous estimated states, which is the main objective of an FDIA. To accomplish this, we can incentivize the model to generate such fake measurements with the regularization term $w_z \cdot d(z, \tilde{z})$ in (9) (the first regularization term in Fig. 1, in red), where $\tilde{z} = G(z)$, $d(\cdot)$ is a distance function (e.g., mean squared or mean absolute distance), $d(z, \tilde{z})$ represents the distance between the original measurement and the generated one, and $w_z$ is a hyper-parameter that represents the weight of this distance. This regularization term incentivizes the WGAN to produce a tampered measurement vector $\tilde{z}$ that will generate completely wrong estimated measurements. Finally, we can explicitly induce sparsity in the attack vector. This sparsity property is desirable and essential because the attacker has to alter fewer measurements to successfully deploy an FDIA [47]. We can add it into the model in (9) with the regularization term $w_{sparse} \cdot \| z - \tilde{z} \|_1$, where $w_{sparse}$ is the weight of the sparsity regularization term. This leads to the following loss function:
$$ \min_{G} \max_{D \in \mathcal{D}} \; \mathbb{E}_{z \sim P_r}\left[ D(z) - D(\tilde{z}) + \| \tilde{z} - AE^*(\tilde{z}) \|_2^2 - w_z \cdot d(z, \tilde{z}) + w_{sparse} \cdot \| z - \tilde{z} \|_1 \right]. $$
Training the WGAN with regularization terms adds complexity to the training process. If the regularization term becomes too large with respect to the original WGAN loss, the generator will struggle to learn the correct distribution. If the regularization term is too small, it will not have any effect on the training process, and thus it will not fulfill its purpose. To solve this issue, a dynamic weight is introduced to control the size of $d(z, \tilde{z})$ throughout the training phase.
This weight must maintain a balance between the generator loss term $D(\tilde{z})$ and the regularization term $d(z, \tilde{z})$, so that the WGAN learns the desired distribution and, at the same time, the regularization term accomplishes its purpose. We can achieve this balance by setting the regularization term to be half of the generator loss term. We express this as $\frac{1}{2} |D(\tilde{z})| = w_z \cdot d(z, \tilde{z})$. The resulting dynamic weight $w_z$ is
$$ w_z^{(t)} = \frac{|D(\tilde{z}^{(t-1)})|}{2 \, d(z, \tilde{z}^{(t-1)})}, \tag{10} $$
where $t > 1$ is the iteration number in the training phase. This dynamic weight adapts during training, controlling the impact of the regularization term.
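The balancing rule can be written in a few lines. The sketch below solves $\frac{1}{2}|D(\tilde{z})| = w_z \cdot d(z, \tilde{z})$ for $w_z$ using the mean absolute distance; the small `eps` guard against a zero distance is an added assumption, and the critic score and measurement vectors are hypothetical values standing in for quantities from the previous training iteration.

```python
def mean_abs_distance(z, z_tilde):
    """Mean absolute distance d(z, z_tilde) between original and tampered measurements."""
    return sum(abs(a - b) for a, b in zip(z, z_tilde)) / len(z)


def dynamic_weight(critic_score, distance, eps=1e-12):
    """Weight that makes the distance term equal to half of |D(z_tilde)|.

    eps avoids division by zero when the generator copies its input (assumption).
    """
    return 0.5 * abs(critic_score) / max(distance, eps)


z = [1.00, 0.50, -0.30]            # intercepted measurements
z_tilde = [1.10, 0.50, -0.45]      # tampered measurements from the previous iteration
critic_score = -0.8                # D(z_tilde) from the previous iteration

d = mean_abs_distance(z, z_tilde)
w_z = dynamic_weight(critic_score, d)
weighted_term = w_z * d            # contribution of the distance regularizer
```

Recomputing the weight at every iteration keeps the regularizer at a fixed fraction of the critic term, so neither term dominates the generator update.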
To summarize, our proposed architecture is shown in Fig. 1 with two stages. First, an autoencoder is trained with historical SCADA and PMU measurement data. Second, the WGAN is trained with the same data and the two regularization terms: (1) one incentivizes the WGAN to produce measurements that will pass the residual error test and (2) another to maximize the impact of the attack. More important features are described below, and the complete algorithm for our proposed FDIA is in Algorithm 1.
(i) The inputs for the generative network are actual power system measurements instead of random noise. This gives us control over the created measurements. (ii) The generator is incentivized to generate measurements that differ from the ones it receives as input, causing an incorrect estimation of state variables and measurements. (iii) The generated tampered measurements will have a small residual error, thus passing the residual error test with high probability. Note that our proposed approach can be easily formulated to deploy an attack on a specific area in the power system, as proposed in [18]. Specifically, an FDIA can be launched in a specific area by tampering with the measurements within the area under attack and not modifying the sensor measurements at boundary buses. In this way, the attacker only has to get the sensor measurements in the specific area under attack, which would reduce the amount of collected data. For conciseness and clarity, we will analyze our proposed FDIA on the complete power grid.
Algorithm 1 (proposed FDIA): sample real and fake measurements; train the generator by gradient descent on the generator; output the generator $G$ that creates tampered measurements.

IV. WGAN GUARANTEE FOR THE PROPOSED REGULARIZATION TERMS
The last section presented a framework to create fake power system measurements to deploy an FDIA. However, to successfully deploy an FDIA without relying upon the underlying power system model, we need to be confident that our learned model will produce measurements that look legitimate so that the residual error test does not detect them. To show that our proposed framework converges to the underlying measurement distribution, we present a mathematical proof that certifies the WGAN convergence to the measurement distribution, thus creating fake measurements that look real. The only requirement for this proof to work is to have data to train the WGAN.
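Generative adversarial networks can be understood as minimizing a moment matching loss defined by a set of discriminator functions [35]; mathematically,
$$ \min_{\nu \in \mathcal{G}} d_{\mathcal{F}}(\hat{\mu}_m, \nu) := \min_{\nu \in \mathcal{G}} \sup_{f \in \mathcal{F}} \left\{ \mathbb{E}_{x \sim \hat{\mu}_m}[f(x)] - \mathbb{E}_{x \sim \nu}[f(x)] \right\}, $$
where $\hat{\mu}_m$ is the empirical measure of the observed data (in this case the power system measurements), and $\mathcal{F}$ and $\mathcal{G}$ are the sets of discriminators and generators, respectively. Practical WGANs take $\mathcal{F}$ as a parametric function class, that is, $\mathcal{F} = \{ f_\theta : \theta \in \Theta \}$, where $f_\theta$ is a neural network indexed by parameters $\theta$ that take values in $\Theta \subset \mathbb{R}^p$.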
Notation and Definitions: $X$ denotes a subset of $\mathbb{R}^d$. For each continuous function $f : X \to \mathbb{R}$, we define the maximum norm as $\|f\|_\infty = \sup_{x \in X} |f(x)|$, and the Lipschitz norm as $\|f\|_{Lip} = \sup_{x \neq y} |f(x) - f(y)| / \|x - y\|$. The set of continuous functions on $X$ is denoted by $C(X)$, and the Banach space of bounded continuous functions is $C_b(X) = \{ f \in C(X) : \|f\|_\infty < \infty \}$.
Weak Convergence: If $\mathcal{F}$ is discriminative, then $d_{\mathcal{F}}(\mu, \nu) = 0$ implies $\mu = \nu$. This means that the learned distribution is the same as the observed one. In reality, we cannot strictly get $d_{\mathcal{F}}(\mu, \nu) = 0$. Rather, we have $d_{\mathcal{F}}(\mu, \nu_n) \to 0$ for a sequence $\nu_n$ and want to establish the weak convergence $\nu_n \rightharpoonup \mu$.
Theorem 1: Let $(X, d_X)$ be any metric space. If $\operatorname{span}\mathcal{F}$ is dense in $C_b(X)$, then $\lim_{n \to \infty} d_{\mathcal{F}}(\mu, \nu_n) = 0$ implies that the learned distribution $\nu_n$ weakly converges to the real observed distribution $\mu$.
In our context, the observed distribution $\mu$ corresponds to the set of observed power system measurements. Fig. 2 gives the intuition for the convergence proof. The learned distribution $\nu_n$ (in red) converges to the real one $\mu$ (in blue) as $n \to \infty$. In other words, the WGAN is learning to create samples that look as if they were taken from the true observed distribution $\mu$.
Proof: Given a function $g \in C_b(X)$, we say that $g$ is approximated by $\mathcal{F}$ with error decay function $\epsilon(r)$ if for any $r \ge 0$, there exists $f_r \in \operatorname{span}\mathcal{F}$ with $\|f_r\|_{\mathcal{F},1} \le r$ such that $\|g - f_r\|_\infty \le \epsilon(r)$. We note that $\epsilon(r)$ is a non-increasing function with respect to $r$. Since the closure of $\operatorname{span}\mathcal{F}$ is equal to the space of bounded continuous functions $C_b(X)$, that is, $\operatorname{cl}(\operatorname{span}\mathcal{F}) = C_b(X)$, we have $\lim_{r \to \infty} \epsilon(r) = 0$. Now denote $r_n := d_{\mathcal{F}}(\mu, \nu_n)^{-1/2}$, $f_n := f_{r_n}$, and $w_z = 1/r_n$. We have
$$ |\mathbb{E}_\mu g - \mathbb{E}_{\nu_n} g| + w_z \cdot d(x, \tilde{x}) \le 2\epsilon(r_n) + r_n \, d_{\mathcal{F}}(\mu, \nu_n) + w_z \cdot d(x, \tilde{x}) = 2\epsilon(r_n) + 1/r_n + w_z \cdot d(x, \tilde{x}). $$
If $\lim_{n \to \infty} d_{\mathcal{F}}(\mu, \nu_n) = 0$, we have $\lim_{n \to \infty} r_n = \infty$. Given that $\lim_{r \to \infty} \epsilon(r) = 0$, we obtain $\lim_{n \to \infty} |\mathbb{E}_\mu g - \mathbb{E}_{\nu_n} g| + w_z \cdot d(x, \tilde{x}) = 0$. Since this holds true for any $g \in C_b(X)$, we conclude that $\nu_n$ weakly converges to $\mu$. If $\mathcal{F} \subseteq BL_C(X)$ for some $C > 0$, we have $d_{\mathcal{F}}(\mu, \nu) \le C \, d_{BL}(\mu, \nu)$ for any $\mu, \nu$. Because the bounded Lipschitz distance metrizes the weak convergence, we obtain that $\nu_n \rightharpoonup \mu$ implies $d_{BL}(\mu, \nu_n) \to 0$, and hence $d_{\mathcal{F}}(\mu, \nu_n) \to 0$.
Theorem 1 guarantees that the distribution $\nu$ learned by the WGAN will converge to the observed one $\mu$. This idea is depicted in Fig. 2. The blue points represent the real measurements, and the red ones represent the fake measurements. At the beginning, the red points are random because the WGAN is not trained ($n = 1$). However, as training progresses, the WGAN produces samples (red points) that look more similar to the blue ones. Ideally, the fake samples will be indistinguishable from the real ones. In other words, our model will create fake measurements that look like real ones. This means that the WGAN captures the underlying power system's interactions that produce the observed measurements.
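The intuition of a learned distribution approaching the observed one can be illustrated numerically, although this is not a substitute for the proof. The sketch below uses the one-dimensional Wasserstein-1 distance between equal-size samples, which equals the mean absolute gap between sorted values, as the discrepancy; the Gaussian samples and the sequence of offsets standing in for training progress are hypothetical.

```python
import random


def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D samples: mean gap between sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(1)
real = [rng.gauss(1.0, 0.05) for _ in range(500)]      # observed measurements (mu)

distances = []
for offset in [1.0, 0.5, 0.1, 0.0]:                    # stand-in for training progress
    learned = [rng.gauss(1.0 + offset, 0.05) for _ in range(500)]   # samples from nu_n
    distances.append(wasserstein_1d(real, learned))
```

The recorded distances decrease as the offset vanishes, mirroring the red points in Fig. 2 moving toward the blue ones.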

V. EXPERIMENTS
This section will show how we deploy FDIAs on power grids with our proposed WGAN framework without knowing their mathematical or physical model. To show the contributions and generality of our approach, we carried out extensive experiments on different power networks.
First, we train a WGAN with historical SCADA and PMU measurements to demonstrate that the output of the WGAN converges to the true distribution of observed power system measurements, $z = h(x) + e$. Note that the sampling rate of PMU measurements is faster than the sampling rate of SCADA measurements. We use PMU measurements alongside SCADA measurements when the SCADA measurements are available. We will also show that the fake measurements pass the residual error test, corroborating the aforementioned convergence theorem. Second, we show that the trained WGAN creates different measurements (and therefore states) from the actual ones. This will show that the regularization term works and that it maximizes the FDIA impact. Next, we show that our proposed framework is more reliable than the model-based ones by showing that our WGAN produces more realistic measurements. This implies that our model is capturing the underlying power system model. Finally, an ablation study is carried out to show that embedding a surrogate state estimator model, $h(x)$, improves the ability of the proposed framework to create tampered measurements that pass the residual error test. We carried out the aforementioned experiments in various test cases with similar results. Specifically, we use the small IEEE 9-bus test case to illustrate how our framework works. Then, we perform the same simulations in the IEEE 14-, 57-, 118-, and 300-bus networks to demonstrate that our proposed method scales well with larger power system networks.

A. Data Generation and Model Architecture
1) Data Generation: For both the 9- and 118-bus test cases, we consider all the active and reactive power flow measurements through transmission lines and transformers as SCADA measurements, and voltage magnitudes and angles as PMU measurements. The 9-bus network has 9 branches, which gives us 36 SCADA measurements and 18 PMU measurements. The measurements are arranged as follows: 1-9 correspond to the sent active power through branches, 10-18 correspond to the sent reactive power, 19-27 are the received active power measurements, 28-36 are the received reactive power on branches, 37-45 are the voltage magnitudes, and 46-54 are the voltage angles. The IEEE 118-bus network has 186 branches; thus, the same arrangement gives 980 measurements (744 SCADA and 236 PMU).
We obtain the power systems' measurements by solving the AC power flow L times under different load conditions using MATPOWER [48]. To simulate the 24-hour fluctuation, we use the real yearly load data from the Electric Reliability Council of Texas (ERCOT) for 2021 [49]. ERCOT reports 8 weather zones: COAST, EAST, FWEST, NORTH, NCENT, SOUTH, SCENT, and WEST. Fig. 3 depicts the load profiles of these zones for 2 days in 2021. For our simulations, we multiply each bus load by the normalized loading parameter, γ, associated with a randomly selected area and obtained from these realistic profiles. Similarly, we scale the generation profiles by the same loading parameter, γ [50], [51]. To make the data more realistic, we add white noise to all measurements according to the standard deviation associated with the measurement devices, that is, active power flow: 0.02 p.u., reactive power flow: 0.04 p.u., active power injection: 0.02 p.u., reactive power injection: 0.04 p.u., PMU voltage magnitude: 0.0001 p.u., and PMU voltage angle: 0.006 rad, according to [52]. Finally, if no AC power flow solution is found for a load condition, that case is not included in the dataset.
This data generation approach will give us rich data variety with the power system under different load conditions. The same procedure is used to generate data for the IEEE 14-, 57-, and 300-bus test cases.
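As an illustration, the measurement arrangement and the noise model described above can be sketched in Python as follows. This is a minimal sketch and not the implementation used in our experiments: the function names and the `SIGMA` dictionary are hypothetical, the clean measurement vector is a placeholder (in practice it would come from an AC power flow solution), and the white noise is assumed to be zero-mean Gaussian.

```python
import random

# Noise standard deviations quoted in the text (p.u., and rad for angles).
SIGMA = {
    "p_flow": 0.02,
    "q_flow": 0.04,
    "v_mag": 0.0001,
    "v_ang": 0.006,
}


def measurement_layout(n_branches: int, n_buses: int) -> list[str]:
    """Return one type label per measurement, in the order used for the 9-bus case."""
    return (
        ["p_flow"] * n_branches    # sent active power
        + ["q_flow"] * n_branches  # sent reactive power
        + ["p_flow"] * n_branches  # received active power
        + ["q_flow"] * n_branches  # received reactive power
        + ["v_mag"] * n_buses      # PMU voltage magnitudes
        + ["v_ang"] * n_buses      # PMU voltage angles
    )


def add_noise(clean: list[float], layout: list[str], rng: random.Random) -> list[float]:
    """Add zero-mean Gaussian noise (an assumption) with the per-type standard deviation."""
    return [value + rng.gauss(0.0, SIGMA[kind]) for value, kind in zip(clean, layout)]


layout_9 = measurement_layout(n_branches=9, n_buses=9)
layout_118 = measurement_layout(n_branches=186, n_buses=118)

# Placeholder clean vector; a real one would come from a power flow solver.
noisy = add_noise([0.0] * len(layout_9), layout_9, random.Random(0))
```

With this layout, the 9-bus case yields 54 measurements and the 118-bus case yields 980, matching the counts given above.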
2) Model Architecture: The architecture of our proposed WGAN model is inspired by the architecture of the DCGAN [53] with the following modifications to adapt it to our power system data. Since the sensor measurement vectors are one-dimensional, we use fully connected layers instead of convolutional layers. The generator, G, consists of 5 layers with ReLU activation function for all layers except for the output, which uses tanh. The discriminator, D, is composed of 5 layers with LeakyReLU activations with the slope of the leak set to 0.2.
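The layer structure described above can be sketched with a plain Python forward pass. This is only an illustrative sketch under stated assumptions: the latent dimension, the hidden width, the random initialization, and the linear output of the discriminator (critic) are hypothetical choices that are not specified in the text, and the tanh output assumes that the measurements are scaled to [-1, 1].

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def leaky_relu(v, slope=0.2):
    """LeakyReLU with the slope of the leak set to 0.2, as in the text."""
    return [a if a > 0.0 else slope * a for a in v]


def init_mlp(sizes, rng):
    """Random weights for consecutive layer sizes (hypothetical initialization)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def generator(noise, layers):
    """ReLU on every layer except the output, which uses tanh."""
    h = noise
    for w, b in layers[:-1]:
        h = relu(dense(h, w, b))
    w, b = layers[-1]
    return [math.tanh(a) for a in dense(h, w, b)]


def critic(z, layers):
    """LeakyReLU hidden layers; the scalar linear output is an assumption."""
    h = z
    for w, b in layers[:-1]:
        h = leaky_relu(dense(h, w, b))
    w, b = layers[-1]
    return dense(h, w, b)[0]


rng = random.Random(0)
M, LATENT, HIDDEN = 54, 8, 16  # 54 measurements (9-bus case); other sizes are assumed
g_layers = init_mlp([LATENT, HIDDEN, HIDDEN, HIDDEN, HIDDEN, M], rng)  # 5 layers
d_layers = init_mlp([M, HIDDEN, HIDDEN, HIDDEN, HIDDEN, 1], rng)       # 5 layers
fake = generator([rng.gauss(0.0, 1.0) for _ in range(LATENT)], g_layers)
score = critic(fake, d_layers)
```

Because the measurement vectors are one-dimensional, every layer is fully connected; only the input and output widths change with the size of the power system.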

B. Learning the Implicit Power System Measurement Model
This section tests whether the distribution learned by the WGAN converges to the true underlying power system measurement distribution, z = h(x) + e. We train the WGAN according to Algorithm 1 with a dissimilarity weight $w_z = 0.5$. We use the hyper-parameters from [45]: $n_{\text{critic}} = 5$, learning rate α = 0.00005 (for autoencoder, generator, and discriminator), clipping parameter c = 0.01, batch size b = 64, and the Adam adaptive learning algorithm [54]. Also, we train the AE and the WGAN models for all test cases for 10 and 100 epochs, respectively. The normalized load from the Electric Reliability Council of Texas (ERCOT) for 2021 [49] contains hourly data for one year, which means that there are 8,760 load samples. We split these 8,760 samples into a training and a test dataset with 7,760 and 1,000 randomly chosen samples, respectively. This yearly data contains seasonal variation, so it captures the behavior of a real power system throughout the year. Note that both the AE and WGAN models are trained with this data, as indicated in Algorithm 1.
Fig. 4. Learning an implicit power system model with the proposed WGAN architecture for the 9-bus test case using real load profiles from ERCOT [49].
Fig. 4(a) shows 100 measurement samples from the real dataset and 100 generated fake measurements for the 9-bus test case. The fake measurements (in red) follow the same pattern or distribution as the real ones (in blue); in fact, they overlap the real measurements, but they are not exactly the same. This means that the WGAN learned the true power system measurement distribution instead of memorizing the dataset. Note that Theorem 1 guarantees the model convergence with enough training data. In our numerical experiments, we trained our models with training and testing datasets of 7,760 and 1,000 samples, respectively.
With these training datasets, our models successfully learned the underlying power system measurement distribution. Also note that our procedure to create the dataset produces rich distributions of sensor measurements, as shown in Fig. 4(a). For example, measurement no. 1 (an active branch power flow measurement) ranges from 0 p.u. to over 1 p.u.
To assess whether the trained WGAN learned the implicit power system measurement distribution, we carry out a power flow mismatch analysis, as follows. If we add power injection measurements to the set of measurements, the power flow balance at the i-th bus should be $P_i = \sum_{j \in \delta(i)} P_{ij}$, where $P_i$ is the active power injection at bus $i$, $\delta(i)$ is the set of buses connected to bus $i$, $P_{ij}$ is the power flow on branch $(i, j)$, and the measured values carry the errors $e_j$ and $e_i$ associated with the active power flow and injection, respectively. Under this setting, the power flow mismatch will not be zero due to measurement errors, that is, $|(P_i + e_i) - \sum_{j \in \delta(i)} (P_{ij} + e_j)| \neq 0$ in general. We compute this mismatch for all the buses in the system for both real and fake tampered measurements. Fig. 4(b) shows the results, where each bar (blue for real and red for fake measurements) indicates the average power flow mismatch in the whole system for one simulation. In the same figure, we can see that the power flow mismatches of the real and tampered fake measurements are very close: 2.66 MW for the real measurements and 3.54 MW for the tampered fake measurements. This is remarkable because the WGAN does not know the power system topology, and it has no information about which measurements should comply with the power flow balance. Yet, the WGAN produces fake tampered measurements whose average mismatch is within 1 MW of that of the real measurements, as shown in Fig. 4(b).
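The mismatch analysis above can be sketched as follows. This is a hedged illustration rather than the code used for Fig. 4(b): the function names, the dictionary-based data layout, and the toy lossless 3-bus values are hypothetical, and the sign convention (an injection equals the sum of the flows leaving the bus) is an assumption.

```python
def bus_mismatch(injections, branch_flows):
    """Return |P_i - sum_j P_ij| for every bus i.

    injections: dict mapping bus -> measured active power injection
    branch_flows: dict mapping (i, j) -> measured active power leaving bus i toward bus j
    """
    outflow = {bus: 0.0 for bus in injections}
    for (i, _j), p_ij in branch_flows.items():
        outflow[i] += p_ij
    return {bus: abs(injections[bus] - outflow[bus]) for bus in injections}


def average_mismatch(injections, branch_flows):
    """Average of the per-bus mismatches over the whole system."""
    mismatches = bus_mismatch(injections, branch_flows)
    return sum(mismatches.values()) / len(mismatches)


# Toy lossless 3-bus example (hypothetical values, in p.u.).
flows = {(1, 2): 0.5, (2, 1): -0.5, (1, 3): 0.5, (3, 1): -0.5, (2, 3): 0.1, (3, 2): -0.1}
clean = {1: 1.0, 2: -0.4, 3: -0.6}
noisy = {1: 1.02, 2: -0.4, 3: -0.6}  # injection error of 0.02 p.u. at bus 1

clean_avg = average_mismatch(clean, flows)
noisy_avg = average_mismatch(noisy, flows)
```

For error-free measurements the mismatch is zero up to floating-point rounding, whereas a single injection error of 0.02 p.u. shows up directly in the average mismatch.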
Including variable renewable energy sources (VRES), such as wind and solar generation, which vary significantly from one day to the next, could produce a more diverse sensor measurement distribution. To test this idea, we use the 9-bus test case, and we take the normalized aggregated wind and solar generation data from the RTS-GMLC [55]. Then, we include the wind generation on bus 5 and the solar generation on bus 6 with different penetration values. Fig. 5 shows the sensor measurement distribution for a penetration of 30%. This distribution is slightly wider than the one without VRES in Fig. 4(a), but both sensor measurement distributions look alike, which means that our original procedure to generate datasets already creates rich sensor measurement distributions. Thus, the datasets for the remaining experiments are created without adding VRES to the simulations.

1) Analyzing Attack Vector's Sparsity:
We can test the attack vector's sparsity by taking the absolute difference between the real and tampered measurement vectors, that is, $|z - \tilde{z}|$. To test this idea, we take the real and tampered measurements for the 9-bus test case, with $w_{\text{sparse}} = 0$, and we show two examples of specific sets of real and tampered measurements in Fig. 6(b). The top part of the figure shows the real and tampered measurements, and the bottom part shows the absolute difference vectors, $|z - \tilde{z}|$. Note that even though $w_{\text{sparse}} = 0$, these vectors contain many zero values, which indicates sparsity.
We then train the WGAN for the 9-bus test system following the same procedure as before, with the addition of the sparsity regularizer with a weight of 0.5, that is, $w_{\text{sparse}} = 0.5$. To test the sparsity of the results, we follow the same experiment design as in the last example. Specifically, we take the real and tampered measurements for the 9-bus test case, and we show two examples of specific sets of real and tampered measurements in Fig. 7. The top part of the figure shows the real and tampered measurements, and the bottom part shows the absolute difference vectors, $|z - \tilde{z}|$. As expected, when sparsity is explicitly taken into account, the attack vectors (absolute difference vectors in Fig. 7) are sparser than those in Fig. 6(b), where no sparsity is expressly considered in the model. However, the differences between real and tampered measurement vectors for the sparse FDIA are smaller than those for the FDIA that does not explicitly take sparsity into account. The model without the sparsity regularizer, $w_{\text{sparse}} = 0$, already yields sparse attack vectors and produces larger changes in the tampered measurements. Thus, the remaining experiments are done without explicitly including sparsity.
2) Analyzing Attack Vector: We can assess an attack vector's impact by taking the absolute difference between the real and tampered measurement vectors, that is, $|z - \tilde{z}|$. To test this idea, we take 1,000 real and tampered measurements for the 9-bus test case, and we show two examples of specific sets of real and tampered measurements in Fig. 6(b). The top part of the figure shows the real and tampered measurements, and the bottom part shows the absolute difference vectors, $|z - \tilde{z}|$. Over the 1,000 samples, the mean magnitude of the attack vector is 15.05 units. Also, for specific sensor measurements, the attack vector dramatically changes the real values. In this context, the operator could take wrong corrective actions that interfere with the correct and safe operation of the electric grid. Such actions could damage the system and lead to catastrophic events.
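The two quantities used in this analysis, the sparsity and the magnitude of the attack vector, can be computed as in the following sketch. The function names, the zero tolerance `tol`, the use of the Euclidean norm for the magnitude, and the example vectors are assumptions made for illustration only.

```python
import math


def attack_vector(real, tampered):
    """Element-wise absolute difference between real and tampered measurements."""
    return [abs(a - b) for a, b in zip(real, tampered)]


def sparsity(vector, tol=1e-3):
    """Fraction of entries whose magnitude is below tol (treated as zero)."""
    return sum(1 for v in vector if v < tol) / len(vector)


def magnitude(vector):
    """Euclidean norm of the attack vector (the choice of norm is an assumption)."""
    return math.sqrt(sum(v * v for v in vector))


# Hypothetical 4-entry example: two measurements are left untouched.
z_real = [1.0, 0.5, 0.2, 0.0]
z_tampered = [1.0, 0.9, 0.2, 0.3]
a = attack_vector(z_real, z_tampered)
a_sparsity = sparsity(a)
a_magnitude = magnitude(a)
```

In this toy example, half of the entries of the attack vector are zero, and its magnitude is approximately 0.5.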

C. Deploying FDIAs Without Power System Knowledge
In the last section, we showed that a WGAN can learn the power system measurement distribution. This section shows how we deploy a FDIA with our proposed framework, which is given by (9) and (10).
Fig. 8. Comparison of passing the residual error test with different methods for the 9-bus test case.
Fig. 9. Comparison of the tampered measurements by the model-based Method 1 [9] with our model-free approach for the 9-bus test case.

1) Deploying a FDIA With Fake Tampered Measurements:
Our objective is to create fake tampered measurements $\tilde{z}$ that generate estimated measurements and state variables as different as possible from the real ones. At the same time, for an attack to be successful, these measurements should pass the residual error test. Fig. 9 shows an instance of a real measurement vector and a created fake one for the 9-bus test network. The fake tampered measurements are within the historical range from the dataset and look similar to the real ones. However, they produce significant changes in voltage magnitudes v and voltage angles δ with respect to the real states, as shown in Fig. 10. Furthermore, the fake measurements pass the test in (2), which means that the control center will not notice the FDIA.
2) Comparison Against Other FDIA Methods: To assess the advantages of our proposed model-free FDIA framework, we compare it against the model-based FDIA presented in [9] and described by (3); we will refer to this FDIA as Method 1. This model-based attack has the same residual error as the original measurements, as proven in [9]. However, Method 1 produces measurements that fall outside the range of the historical measurements.
To support the last point, we perform the following experiment. We use the fake vector in Fig. 10, where the voltage magnitude in bus 5 goes from 1 to 1.05 p.u. We use Method 1 to tamper the state $v_5 = 1.05$ p.u. using (3). Fig. 9 shows the real measurements (in blue), the tampered measurements created by our proposed framework (in red), the tampered measurements created by Method 1, and the historical measurement range from our data generation (gray bar). In the same figure, we see that the measurements created by the WGAN are within or very close to the historical range. In contrast, some measurements tampered by Method 1 are far away from the real historical measurements. Specifically, measurements 18 and 36 in Fig. 9 lie far from the historical range. The key observation is that even though Method 1 produces measurements with the same residual error as the real ones, these measurements will still look suspicious. The power system operator would realize that the tampered measurements 18 and 36 are outliers with respect to the historical ones, as shown in Fig. 9. In contrast, our fake tampered measurements are within the range of historical measurements and also pass the residual error test (for a confidence of p = 0.95), which makes them less suspicious to the power system operator. This means that our attack design is more advantageous in terms of stealth.
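The range check that an operator could apply to spot such outliers can be sketched as follows. This is a minimal illustration, assuming that the historical range of each measurement is simply its minimum and maximum over the dataset; the function names, the optional `margin`, and the toy values are hypothetical.

```python
def historical_range(history):
    """Per-measurement (min, max) over a list of historical measurement vectors."""
    return [(min(col), max(col)) for col in zip(*history)]


def out_of_range(vector, ranges, margin=0.0):
    """Return the 1-based indices of entries outside [min - margin, max + margin]."""
    return [
        k + 1
        for k, (value, (lo, hi)) in enumerate(zip(vector, ranges))
        if value < lo - margin or value > hi + margin
    ]


# Toy history with three measurements per vector (hypothetical values).
history = [
    [0.1, 1.00, -0.2],
    [0.3, 1.02, -0.1],
    [0.2, 0.98, -0.3],
]
ranges = historical_range(history)

suspicious = out_of_range([0.25, 1.05, -0.2], ranges)   # second entry exceeds its range
stealthy = out_of_range([0.20, 1.00, -0.15], ranges)    # all entries inside their ranges
```

A tampered vector that returns an empty list under this check stays within the historical range, which is the property discussed above.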
We also carried out a sensitivity analysis for different confidence values p. In this sensitivity analysis, we compare our method against three techniques in the literature: Method 1 introduced in [9], Method 2 from [25], and Method 3 proposed in [18]. This sensitivity analysis is carried out with the residual error test. Thus, the results only depend on the residual error produced by the FDIA approaches. In other words, the range of historical measurements does not affect the success rate. Methods 1 and 2 produce the same residual error as the real measurements; this means that if the real measurement passes the residual error test, the measurements tampered by these methods will pass as well. Method 3 is an attack on a specific area, and we chose to delimit this area by buses 5 and 6. An important characteristic of this technique is that the residual error of the tampered measurements can be lower than the real residual. The authors in [18] attribute it to the fact that the tampered measurements are more consistent (i.e., free of noise errors), which reduces the overall residual error. To compare these methods, we ran 1,000 simulations with the same procedure described in Section V-A, and we tampered the real noisy measurements with our proposed approach and Methods 1, 2, and 3. For a given confidence value p, we compute its corresponding threshold $\tau = \chi^2_{k,p}$ and obtain the probability of each measurement to pass the residual error test for the specified threshold, that is, $\Pr(J(z) \le \tau)$. We repeat this process for each simulation and each aforementioned method, and we obtain the success rate of passing the residual error test. This is the probability of the simulations to pass the error test, and we call it $p_{\text{pass}}$. We repeat this experiment for several values p ∈ (0, 1), and the result is shown in Fig. 8. We can see that as the threshold τ increases, the probability to pass the residual error test, $p_{\text{pass}}$, increases as well.
Given that Methods 1 and 2 (in brown and purple, respectively) tamper the measurements such that the residual error is the same as the real one (in blue), they follow the real curve almost perfectly. Method 3 (in green) is close to the real curve but slightly above it due to the behavior of this technique, as previously explained. Note that Methods 1 and 2 produce the same $p_{\text{pass}}$ as the real noisy measurements in Fig. 8. This is because both methods are guaranteed by design to have the same residual error as the real noisy measurements, as indicated in (3) (see proof in [9]).
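The computation of the success rate for a given confidence value can be sketched as follows. This is an illustrative sketch only: the function names and the residual values are hypothetical, a simulation is assumed to pass when its residual does not exceed the threshold, and the chi-squared quantile is obtained with the Wilson-Hilferty approximation (a substitute for an exact quantile function, chosen to keep the example within the Python standard library).

```python
from statistics import NormalDist


def chi2_quantile(p: float, k: int) -> float:
    """Wilson-Hilferty approximation of the chi-squared quantile with k degrees of freedom."""
    z = NormalDist().inv_cdf(p)
    return k * (1.0 - 2.0 / (9.0 * k) + z * (2.0 / (9.0 * k)) ** 0.5) ** 3


def pass_rate(residuals, p, k):
    """Fraction of simulations whose residual J does not exceed the threshold for confidence p."""
    tau = chi2_quantile(p, k)
    return sum(1 for j in residuals if j <= tau) / len(residuals)


# Hypothetical residuals J(z) from four simulations, with k = 10 degrees of freedom.
residuals = [5.0, 12.0, 18.0, 25.0]
tau_95 = chi2_quantile(0.95, 10)
rate_95 = pass_rate(residuals, 0.95, 10)
rate_50 = pass_rate(residuals, 0.50, 10)
```

Sweeping p over (0, 1) and plotting the resulting pass rates would produce a curve of the kind discussed above; as the threshold grows, the pass rate cannot decrease.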
It is important to note that we trained our model with noisy measurements, and the method did not have access to the underlying power system model. The key finding is that despite using only noisy measurements, our approach produces tampered measurements with lower residual errors, outperforming all other methods. We attribute this to the regularization term in (9) that contains the AE, $\|z - \mathrm{AE}(z)\|_2^2$. As discussed in Section III-B, an autoencoder has a denoising effect on the noisy measurements. This is examined with an ablation study in Section V-E. A summary of the qualitative traits of each of the aforementioned methods is shown in Table I, which shows that our proposed algorithm is the only one that tampers measurements so that they remain within the historical range.
3) Comparison Against Other Model-Free FDIA Method: To make a fair comparison, we train our proposed model with the same methodology described above, with the difference that we use the DC power flow model, as the work in [29] does. The framework in [29] requires normal and tampered measurements to train a conditional adversarial network (cGAN). However, the work in [29] does not clearly indicate how the dataset of tampered measurements is obtained. For simplicity, we use the well-known FDIA proposed in [9] to create the dataset of tampered measurements. We evaluate both approaches on the 14-bus test case. For a given confidence value p, we compute its corresponding threshold $\tau = \chi^2_{k,p}$ and obtain the probability of each measurement to pass the residual error test for the specified threshold, that is, $\Pr(J(z) \le \tau)$. We repeat this process for each simulation and each aforementioned method, and we obtain the success rate of passing the residual error test. This is the probability of the simulations to pass the error test, and we call it $p_{\text{pass}}$. We repeat this experiment for several values p ∈ (0, 1), and the result is shown in Fig. 11. We carry out the same experiments for the IEEE 9-, 57-, 118-, and 300-bus test cases for a confidence value p = 0.95. The results are shown in Table II.
Fig. 11. Comparison of passing the residual error test with the cGAN [29] for the 14-bus test case.
4) Validate Scalability of the Proposed Approach: Finally, we show that our approach scales to bigger power system networks. To demonstrate it, we test our model-free FDIA on the IEEE 118-bus network. The created fake tampered measurements pass the residual error test, and Fig. 12 shows that the created fake measurements provoke significant changes in the voltage angles, leading to a successful FDIA.
Also, a sensitivity analysis, like the one in the previous section, is carried out for the IEEE 9-, 14-, 57-, 118-, and 300-bus test cases, and the results are shown in Fig. 13. In the same figure, we can see that our FDIA method outperforms the ones proposed in the literature.
We also validate the scalability of our proposed approach in terms of training time. As previously mentioned, the AE and the WGAN models for all the test cases are trained for 10 and 100 epochs, respectively. The number of training samples and the number of iterations for all test cases are fixed since we used real yearly load data from the Electric Reliability Council of Texas (ERCOT) for 2021 [49]. Also, the number of layers is fixed to be 5 for both the generator and discriminator for all the experiments. The only component that varies is the dimensionality, which depends upon the power system size. Thus, our proposed approach presents good scalability with respect to the power system size. We can test this by measuring the training times for the AE and WGAN models, which are shown in Fig. 14. Training the surrogate state estimator (i.e., the AE) for 10 epochs takes less than 40 sec for all test cases. Training the WGAN model for 100 epochs takes less than 530 sec for all test cases. These training times for 1 year of data are low. Thus, our proposed attack could be easily deployed in real-world settings.
Fig. 12. Example of a real vs a fake measurement for the 118-bus test case. Note that the fake measurements produce different states.

D. Comparison of Different Defenses
The Chi-squared test could be, in some cases, inaccurate due to the approximation of errors by residuals [39]. So, in this section, we show how our proposed algorithm performs against more robust defenses. In the literature, there exist numerous defenses with different traits. For example, some defenses do not use temporal correlations, whereas others make use of them. In the realm of defenses that exploit temporal patterns to detect FDIAs, there are works such as the moving-target defense (MTD) [56], [57] or the work in [41]. However, our proposed FDIA scheme does not take into account inter-temporal correlation, so it would be unfair to test our attack against such defenses. Thus, in this section, we choose defenses that utilize data measurements at a specific time interval to detect spurious data. Specifically, we test our proposed attack against the largest normalized residual statistical test (LNRT) [39], [58] and a recent deep learning-based detector that consists of an adversarial autoencoder [59].

1) Largest Normalized Residual Statistical Test (LNRT):
The LNRT is more robust than the classical Chi-squared test for bad data detection and identification [39], [58]. The normalized value of the residual for measurement i can be computed as $r_i^{\text{norm}} = |r_i| / \sqrt{\Omega_{ii}}$, where $\Omega_{ii}$ is the i-th diagonal entry of the residual covariance matrix. This normalized residual entry has a standard normal distribution, that is, $r_i^{\text{norm}} \sim N(0, 1)$. Then, the largest element in the set $\{r_i^{\text{norm}}\}_{i=1}^{M}$ is compared against a chosen threshold to decide if bad data is present. If this threshold is set to 3, then the confidence level is 99.7%. We carry out this test for the 14-bus test system for each real and fake measurement, and the results are shown in Fig. 15, where the average is 99.75% for real measurements and 99.79% for measurements tampered with our proposed method. We carry out the same experiments for the IEEE 9-, 57-, 118-, and 300-bus test cases for a confidence value p = 0.997. The results are shown in Table III.
2) Deep Learning-Based Detector: There are recent learning-based detectors for FDIAs. The work in [59], for example, proposed a scheme that consists of an adversarial autoencoder (AAE). The AAE network is trained in three stages: the reconstruction phase, the adversarial phase, and the supervised phase. For a model-based FDIA, this AAE has a detection accuracy of 96.25% and 97.85% for the 13- and 123-bus distribution networks, respectively. We test this defense against our proposed model-free FDIA for the IEEE 9-, 14-, 57-, 118-, and 300-bus test cases, and the results are shown in Table III. In this table, we can see that our proposed approach has a lower success rate against the AAE defense than against the Chi-squared test and the LNRT. Nonetheless, our method still exhibits a high success rate (above 80%) for all the tested cases.
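The LNRT decision rule described in this section can be sketched as follows. This is a minimal sketch under stated assumptions: the residuals and the diagonal entries of the residual covariance matrix are taken as given (hypothetical toy values are used here), and the function names are illustrative.

```python
from statistics import NormalDist


def largest_normalized_residual(residuals, cov_diag):
    """Return the largest |r_i| / sqrt(Omega_ii) over all measurements."""
    return max(abs(r) / omega ** 0.5 for r, omega in zip(residuals, cov_diag))


def lnrt_flags_bad_data(residuals, cov_diag, threshold=3.0):
    """Bad data is declared when the largest normalized residual exceeds the threshold."""
    return largest_normalized_residual(residuals, cov_diag) > threshold


# Two-sided confidence level of a standard normal variable for a threshold of 3.
confidence = NormalDist().cdf(3.0) - NormalDist().cdf(-3.0)

# Hypothetical residuals and residual covariance diagonal entries.
cov_diag = [0.0004, 0.0004, 0.0001]
passes = not lnrt_flags_bad_data([0.02, -0.05, 0.01], cov_diag)  # largest value is about 2.5
flagged = lnrt_flags_bad_data([0.02, -0.08, 0.01], cov_diag)     # largest value is about 4
```

The computed confidence level is approximately 0.9973, which agrees with the 99.7% value quoted above for a threshold of 3.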

E. Ablation Study
This section presents an ablation study to show the impact of the SE's surrogate model in the proposed framework. The experiment design is similar to the one presented in previous sections. We ran 1,000 simulations with the same procedure described in Section V-A. For a given confidence value p, we compute its corresponding threshold $\tau = \chi^2_{k,p}$ and obtain the probability of each measurement to pass the residual error test for the specified threshold, that is, $\Pr(J(z) \le \tau)$. We repeat this process for each simulation for the real measurements and for the proposed framework with and without the AE for the 9-bus test case. Next, we obtain the success rate of passing the residual error test, $p_{\text{pass}}$. We repeat this experiment for several values p ∈ (0, 1), and the result is shown in Fig. 16. In the same figure, we can see that the model without the AE has a lower probability of passing the residual error test across all the thresholds. We can also see that the model without the AE (green line) always has around the same or a lower probability of passing the residual error test than the real measurements. As discussed in Sections III-B and V-C2, the model with the AE has a denoising effect, whereas the model without the AE can only learn from the noisy measurement data. We carry out the same experiments for the IEEE 14-, 57-, 118-, and 300-bus test cases for a confidence value p = 0.95. The results are shown in Table IV, which shows that the model with the AE has a higher success rate than the one without it.

VI. CONCLUSION
We presented an architecture to create tampered measurement vectors to carry out a FDIA without knowing the underlying power system information. The architecture is framed as an optimization problem that considers the WGAN loss function and two regularization terms to control the attack measurement vectors. We validated our proposed framework on several power systems, in which we created fake measurements to carry out a bad data injection attack without knowing the underlying power system model. These fake measurements passed the residual error test for bad data detection and gave substantially wrong estimated state variables and measurements, which would compromise the electric grid's reliability. This work shows that an attacker does not need access to all power system information. Thus, more research is needed to keep power systems safe from these attacks.