PowerFDNet: Deep Learning-Based Stealthy False Data Injection Attack Detection for AC-model Transmission Systems

Recent studies have demonstrated that smart grids are vulnerable to stealthy false data injection attacks (SFDIAs), as SFDIAs can bypass residual-based bad data detection mechanisms. SFDIA detection has therefore become one of the focuses of smart grid research. Methods based on deep learning have shown promising accuracy in detecting SFDIAs. However, most existing methods rely on the temporal structure of a sequence of measurements and do not take into account the spatial structure between buses and transmission lines. To address this issue, we propose a spatiotemporal deep network, PowerFDNet, for SFDIA detection in AC-model power grids. PowerFDNet consists of two sub-architectures: a spatial architecture (SA) and a temporal architecture (TA). The SA extracts representations of bus/line measurements and models the spatial structure based on these representations. The TA models the temporal structure of a sequence of measurements. Together, they allow PowerFDNet to model the spatiotemporal structure of measurements. Case studies on the detection of SFDIAs on benchmark smart grids show that PowerFDNet achieves a significant improvement over state-of-the-art SFDIA detection methods. In addition, an IoT-oriented lightweight prototype of size 52 MB is implemented and tested for mobile devices, which demonstrates the potential for applications on mobile devices. The trained model will be available at https://github.com/HubYZ/PowerFDNet.


I. INTRODUCTION
In smart grids, a stealthy false data injection attack (SFDIA), which aims to maliciously manipulate measurements and may lead to serious consequences for the power system [1]-[3], has recently become a focus of smart grid research [4]-[8]. In contrast to other cyberattacks (such as jamming and distributed denial-of-service), studies have shown that well-defined SFDIA measurements can bypass traditional bad measurement detection mechanisms [5], [9], as the attacks obey the power equations. To address this issue, machine learning-based detection approaches have been explored and have obtained promising detection results [10]-[14]. These experimental results demonstrate that deep learning can model the relationships between measurements and state variables in power grids to a certain extent, which motivates further exploration of deep learning technology in this field [13].
Most of the existing machine learning-based methods are specifically designed for DC power systems [10], [11], [14]. However, as the DC-model power system is a simplification of the AC-model system, the estimated states of the DC model cannot truly represent the actual states of the AC-model power system. Therefore, those methods are not well suited for SFDIA detection in real-world AC-model power systems. In recent years, researchers have explored the application of deep learning in AC-model power systems [15]-[20]. Kundu et al. [17] proposed an SFDIA detection method based on an auto-encoder that attempts to capture the relations between system states and measurements. Zhang et al. [20] extended this auto-encoder scheme by embedding a generative adversarial network. In contrast to these two methods, Yu et al. [19] proposed a detection method that instead uses state values obtained from measurements as the input feature.
Although those AC-based methods propose effective models to represent the relationship between measurements and the states of power grids, they focus on learning the temporal structure of a sequence of measurements and do not take into account the spatial structure of the measurements. The spatial structure refers to the relationship of the measurements (or state variables) between buses and transmission lines; the temporal structure refers to the relationship of the measurements (or state variables) collected at consecutive time steps [9]. The method proposed by Kundu et al. [17] mainly modeled the temporal structure of a sequence of measurements using an auto-encoder. As a disadvantage, this method did not consider the spatial structure information between buses and transmission lines in the power grid, because the measurements are mixed into a one-dimensional input. Similarly, the deep network approach proposed in [20] also takes the measurements as one-dimensional inputs, which leads to the loss of the spatial structure information between transmission lines and buses. In contrast to the aforementioned two methods, Yu et al. [19] proposed a deep neural network (DNN) based detector using gated recurrent units (GRUs) to learn the temporal structure information of a sequence of measurements. One of the major differences from the aforementioned two methods is that this method trains the detection network by taking as input the state values calculated from the measurements. In summary, the aforementioned methods can model the temporal structure information by applying GRUs, auto-encoders, and recurrent neural networks. However, the spatial structure information between transmission lines and buses is not considered in these methods, resulting in limited detection accuracy.
To address this issue, this paper proposes a spatiotemporal deep learning network (named PowerFDNet) for SFDIA detection in AC power systems. The proposed PowerFDNet contains two key sub-architectures: a spatial architecture (SA) and a temporal architecture (TA). The SA aims to extract representations from bus/line measurements and model the spatial structure of the measurements by learning their representations. The TA aims to learn the temporal structure of time-series measurements and make the decision. To model the spatiotemporal structure information, measurements with and without SFDIAs are utilized as the training data. Comprehensive experiments on the benchmark power grids demonstrate that the proposed PowerFDNet achieves a significant improvement in detection accuracy ($F_1$, recall, and precision) in comparison with the state-of-the-art methods.
The main contributions of this paper are summarized as follows:
1) Compared to most existing approaches that mainly model temporal structure information for SFDIA detection, we propose a new network to learn the global spatiotemporal structure information of measurements.
2) Compared to most existing approaches that take measurements as one-dimensional input, we propose well-designed residual-based sub-networks to learn multidimensional representations for buses and lines separately. Specifically, we propose using convolutional layers and residual connections to model the multidimensional representations. As an advantage, this facilitates the subsequent spatiotemporal structure learning.
3) To capture the temporal structure of the representations of a sequence of measurements, we design a booster-refiner feature encoder based on the long short-term memory (LSTM) architecture. The booster-refiner encoder first models the bus/line measurement data relationship with rich features and then refines the high-dimensional feature.
4) We generate and release a comprehensive SFDIA dataset to facilitate research in this area. The SFDIA dataset is generated for the power grids in the SimBench dataset [21], which contains a wide variety of power grids with high, medium, and low voltage, as well as the corresponding load and generator profiles in 15-minute resolution for a whole year. Our released dataset can provide a more realistic evaluation setting. Therefore, with this SFDIA dataset, researchers in this area can focus on the design and analysis of SFDIA detection.
5) An IoT-oriented lightweight prototype of size around 52 MB, with an optimized mobile model of size around 8.5 MB, is implemented and tested for mobile devices, which demonstrates the potential for applications on mobile devices.
The rest of this paper is organized as follows: Section II reviews related works on deep learning-based SFDIA detection; background knowledge about state estimation, bad data detection, and the SFDIA is presented in Section III; Section IV describes the proposed PowerFDNet in detail; the experiments and results are organized in Section V; and finally, Section VI concludes the paper.

II. RELATED WORKS
Bad data detection is one of the essential functions of state estimation and is used to detect measurement errors. Such errors may occur for various reasons, such as the finite accuracy of meters, the telecommunication medium, and meter failure [22]-[24]. Liu et al. [25] showed that well-defined error data, also known as a stealthy false data injection attack (SFDIA), can bypass the residual-based bad measurement detection in DC power grids; in 2012, Hug et al. [5] extended this type of attack to AC power grids.
Some methods managed to detect SFDIAs by statistical methods [26], [27], sparse optimization [28], graph theory [29], Kalman filter [30], time-series simulation [31], state forecasting [32], [33], and machine learning [10], [11], [14], [34]- [36]. For example, Ozay et al. [10] investigated SVM-based algorithms to classify measurements as being either secure or attacked. The experimental results demonstrate that machine learning algorithms perform better on SFDIA detection than detection algorithms that employ state vector estimation. He et al. [11] proposed a real-time SFDIA detection method based on restricted Boltzmann machines [37]. The advantage is that historical measurements are used to capture features for SFDIA detection. However, this method did not take account of the spatial structure information between transmission lines and buses. Wang et al. [14] proposed a CNN-based method to detect SFDIA attacks. The advantage is that this method attempts to identify the attack locations. However, this method failed to consider the temporal structure information. Besides, the aforementioned methods are mainly developed to detect SFDIAs in DC-model power systems.
In recent years, with the development of deep learning techniques, researchers have explored the application of deep learning to AC-model power systems [15]-[20]. Kundu et al. [17] presented a detection approach based on an auto-encoder by modeling the relations between system states and measurements. The advantage is that historical measurements are used to detect SFDIAs. The disadvantage is that it did not take into account the relationship between line measurements and bus measurements. Zhang et al. [20] extended this auto-encoder scheme by embedding a generative adversarial network. A common point of the two methods is that the measurements are taken as the input feature. In contrast, Yu et al. [19] proposed an SFDIA detection method that instead utilizes state variables (i.e., bus voltage angles and magnitudes) as the input feature. However, that incurs two potential risks. One is that the original spatiotemporal structure of measurements may be lost, as the model is learned from the estimated state variables instead of the measurements. The other is that state variables estimated from false measurements may not reflect the actual state of the power grid at the current time. Recently, Yin et al. [8] proposed a sub-grid-oriented microservice-based supervising network through privacy-preserving collaborative learning to detect SFDIAs. However, this method mainly considered the local spatiotemporal relationship of measurements in sub-grids in the privacy-preserving setting. This work focuses on modeling the global spatiotemporal structure in a non-privacy-preserving setting.

III. BACKGROUND
In this section, we firstly provide the brief background knowledge related to the residual-based bad measurement data detection and the SFDIA. Then, we introduce an approach to generate the SFDIAs against AC-model power grids. Some key notations in this paper are listed in Table I.

A. AC State Estimation
The major objective of state estimation is to determine the optimal power states (i.e., bus voltage magnitudes and angles) based on a set of redundant measurements for the power system [22]. The measurements usually comprise bus measurements and line measurements. The bus measurements typically consist of active/reactive power injection (i.e., bus load and generation) and bus voltage magnitude. The line measurements typically consist of active/reactive power flows measured at the two sides of the transmission lines and line current flow magnitudes. In an AC-model power grid, the nonlinear relationship between the state variable $x$ and the measurement $z$ can be expressed by [22]:
$$z = H(x) + \varepsilon,$$
where $z \in \mathbb{R}^{m}$ denotes the measurements, $\varepsilon \in \mathbb{R}^{m}$ denotes the errors of the measurements, $x = [\theta_2, \theta_3, \cdots, \theta_n, V_1, V_2, \cdots, V_n]^T \in \mathbb{R}^{2n-1}$ denotes the bus voltage angles and magnitudes for an $n$-bus power grid, and $H(x) = [h_1(x), h_2(x), \cdots, h_m(x)]^T$ is the nonlinear vector function of the state variables, in which $h_i(x)$ is the formula of the measurement $z_i$ related to the state variable $x$. It is assumed that the error $\varepsilon_i \in \varepsilon$ is independent and drawn from a Gaussian distribution $\mathcal{N}(0, \sigma_i^2)$. The nonlinear functions relating each measurement to the power states are presented as follows. The active and reactive power flows from bus $i$ to bus $k$, $P_{ik}$ and $Q_{ik}$, can be expressed by [22]
$$P_{ik} = V_i^2 (g_{si} + g_{ik}) - V_i V_k (g_{ik} \cos\theta_{ik} + b_{ik} \sin\theta_{ik}),$$
$$Q_{ik} = -V_i^2 (b_{si} + b_{ik}) - V_i V_k (g_{ik} \sin\theta_{ik} - b_{ik} \cos\theta_{ik}),$$
and the line current flow magnitude from bus $i$ to bus $k$, $I_{ik}$, can be expressed by [22]
$$I_{ik} = \frac{\sqrt{P_{ik}^2 + Q_{ik}^2}}{V_i}.$$
The active and reactive power injections at bus $i$, $P_i$ and $Q_i$, can be expressed by
$$P_i = V_i \sum_{k \in \Omega_i} V_k (G_{ik} \cos\theta_{ik} + B_{ik} \sin\theta_{ik}),$$
$$Q_i = V_i \sum_{k \in \Omega_i} V_k (G_{ik} \sin\theta_{ik} - B_{ik} \cos\theta_{ik}),$$
where $\theta_i$ and $V_i$ represent the state variables for bus $i$, $\theta_k$ and $V_k$ represent the state variables for bus $k$, $\theta_{ik} = \theta_i - \theta_k$, $g_{ik} + jb_{ik}$ is the admittance of the line connecting buses $i$ and $k$, and $g_{si} + jb_{si}$ is the admittance of the shunt line at bus $i$. $G_{ik} + jB_{ik}$ is the $ik$th element of the complex bus admittance matrix. $\Omega_i$ denotes the set of adjacent buses that are directly connected to bus $i$. State estimation finds the optimal solution $\hat{x}$ based on the measurements by minimizing the following weighted least squares objective:
$$L(x) = (z - H(x))^T \Lambda^{-1} (z - H(x)),$$
where $\Lambda = \mathrm{diag}[\sigma_1^2, \sigma_2^2, \cdots, \sigma_m^2]$ denotes the weight matrix whose elements represent the variances of the measurements at the corresponding electricity meters. The minimization of the objective function $L(x)$ can be solved by iterative approaches (e.g., the Newton-Raphson algorithm), which can be expressed by
$$\hat{x} = \arg\min_{x} L(x),$$
where $\hat{x}$ denotes the optimal power system state estimated from the measurements.
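As a minimal sketch of how one entry of $H(x)$ is evaluated, the following Python function computes the line measurements $P_{ik}$, $Q_{ik}$, and $I_{ik}$ from the states of the two end buses. It assumes the standard branch-flow expressions from the state estimation literature [22], per-unit quantities, and angles in radians; the function name and argument order are illustrative and are not taken from the paper's implementation.

```python
import math

def line_flow(v_i, v_k, theta_i, theta_k, g_ik, b_ik, g_si=0.0, b_si=0.0):
    """Return (P_ik, Q_ik, I_ik) seen from bus i on the line i -> k.

    Assumes per-unit values and angles in radians (illustrative sketch).
    """
    theta_ik = theta_i - theta_k
    p_ik = v_i ** 2 * (g_si + g_ik) - v_i * v_k * (
        g_ik * math.cos(theta_ik) + b_ik * math.sin(theta_ik))
    q_ik = -v_i ** 2 * (b_si + b_ik) - v_i * v_k * (
        g_ik * math.sin(theta_ik) - b_ik * math.cos(theta_ik))
    # Current magnitude follows from the apparent power at bus i.
    i_ik = math.sqrt(p_ik ** 2 + q_ik ** 2) / v_i
    return p_ik, q_ik, i_ik

# With identical voltages at both ends and no shunt element, no power flows.
print(line_flow(1.0, 1.0, 0.0, 0.0, g_ik=2.0, b_ik=-8.0))
```

A small angle difference between the two buses makes active power flow from the leading bus to the lagging one, which is the dependence the estimator exploits.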

B. Residual-based Bad Measurement Detection
Errors in the measurements may be introduced for distinct reasons such as meter malfunction, signal transmission interference, and cyberattacks. Bad data detection aims to determine whether or not the measurements contain significant errors. The values of state variables estimated from normal meter measurements should be close to the true state values, while the state values estimated from bad measurements may be significantly different from the true values. Thus, one popular method is to calculate the residual between the estimated measurements $H(\hat{x})$ and the observed measurements $z$, which can be formulated by
$$r = \| z - H(\hat{x}) \|_2.$$
If the residual is greater than a threshold, the measurements are treated as bad measurements. The Chi-squared test [22] is commonly used to decide the threshold. Specifically,
$$\Upsilon = \sum_{i=1}^{m} \frac{(z_i - h_i(\hat{x}))^2}{\sigma_i^2}.$$
Then, $\Upsilon$ follows a Chi-squared distribution with $m - (2n - 1)$ degrees of freedom. Based on the theory of the Chi-squared test, the threshold $\tau$ is determined by a hypothesis test with a significance level $\alpha$ [22]. Therefore, at significance level $\alpha$, the presence of bad measurements is inferred if
$$\Upsilon > \tau.$$
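The residual test described above can be sketched in a few lines of Python. The sketch assumes the usual weighted sum of squared residuals as the test statistic and takes the threshold `tau` as an input, since computing a Chi-squared quantile needs a statistics library; the function names are illustrative.

```python
def chi_square_statistic(z, h_hat, sigma):
    """Weighted sum of squared residuals between z and the estimate H(x_hat)."""
    return sum(((zi - hi) / si) ** 2 for zi, hi, si in zip(z, h_hat, sigma))

def has_bad_data(z, h_hat, sigma, tau):
    """Flag the measurement set when the statistic exceeds the threshold tau."""
    return chi_square_statistic(z, h_hat, sigma) > tau

# Example threshold: 5.991 is the 95% quantile of a Chi-squared
# distribution with 2 degrees of freedom (alpha = 0.05).
tau = 5.991
sigma = [0.1, 0.1, 0.1]
print(has_bad_data([1.0, 2.0, 3.0], [1.0, 2.1, 3.0], sigma, tau))  # small residual
print(has_bad_data([1.0, 2.0, 3.0], [1.0, 2.5, 3.0], sigma, tau))  # large residual
```

In practice the degrees of freedom are $m - (2n - 1)$, so the threshold has to be recomputed for each grid and measurement configuration.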

C. Stealthy False Data Injection Attack
Stealthy false data injection attacks aim to circumvent the bad data detection mechanism by deliberately manipulating some measurements. The stealthy attack is designed based on the bad data detection mechanism [5]. Let $z_{bad}$ denote the measurements maliciously modified by an SFDIA, which can be expressed by
$$z_{bad} = z + a, \quad (10)$$
and let $\hat{x}_a$ denote the corresponding power states estimated from $z_{bad}$, which can be expressed by
$$\hat{x}_a = \hat{x} + c,$$
where $a$ and $c$ are the changes in the measurements and state variables, respectively. Hence, we have Eq. (12) [5], where variables with subscript '1' indicate that they will not be modified by the attack, while variables with subscript '2' need to be maliciously modified. The vectors $a_2$ and $c_2$ are the changes in the modified measurements and state variables, respectively. Hence, if $a_2$ is obtained by Eq. (13), the residual in Eq. (12) reduces to the residual of the original measurements [5]. Therefore, the malicious attack measurement obtained by Eq. (13) can bypass the detection mechanism.
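The reason such an attack leaves the residual unchanged can be illustrated with a toy example. The sketch below follows the general construction in [5], choosing the attack vector as $a = H(\hat{x} + c) - H(\hat{x})$, and assumes that the estimator returns $\hat{x} + c$ for the attacked measurements. The measurement function `H` is a hypothetical two-state stand-in, not a grid model, and all numeric values are made up for illustration.

```python
import math

def H(x):
    """Toy nonlinear measurement function (hypothetical; not a real grid model)."""
    theta, v = x
    return [v * math.cos(theta), v * math.sin(theta), v ** 2]

def residual_norm(z, x_est):
    """Euclidean norm of z - H(x_est)."""
    return math.sqrt(sum((zi - hi) ** 2 for zi, hi in zip(z, H(x_est))))

def stealthy_attack(x_hat, c):
    """Return the attack vector a = H(x_hat + c) - H(x_hat) and the shifted state."""
    x_a = [xi + ci for xi, ci in zip(x_hat, c)]
    a = [ha - h0 for ha, h0 in zip(H(x_a), H(x_hat))]
    return a, x_a

x_hat = [0.10, 1.00]                     # assumed estimate before the attack
noise = [0.010, -0.020, 0.005]           # made-up measurement noise
z = [hi + ni for hi, ni in zip(H(x_hat), noise)]

a, x_a = stealthy_attack(x_hat, c=[0.05, -0.02])
z_bad = [zi + ai for zi, ai in zip(z, a)]

# Both residuals agree up to floating-point rounding.
print(residual_norm(z, x_hat), residual_norm(z_bad, x_a))
```

Because $z_{bad} - H(\hat{x}_a) = z - H(\hat{x})$ by construction, any detector that only thresholds the residual sees the attacked sample as normal.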

IV. PROPOSED POWERFDNET
In this section, we present details of the proposed PowerFDNet for detecting SFDIAs in AC-model power grids. PowerFDNet classifies the measurements to determine whether they have been maliciously modified. It consists of two key sub-architectures: a spatial architecture (SA) and a temporal architecture (TA). The SA aims to extract representations from bus/line measurements and model the spatial structure of measurements by learning their representations (Section IV-B). The TA aims to model the temporal structure of the measurements by learning the intermediate feature obtained by the SA and making a final prediction (Section IV-C).

A. Measurement Data
In this subsection, we provide the details of the organization of the measurements. Common measurement variables and notations, such as those in [8], [19], [22], are adopted. Typically, the measurements for power systems include line measurements and bus measurements. The line measurements commonly contain active and reactive power flow data measured at the two sides of transmission lines and line current flow magnitudes, which are summarized as follows:
• $P_I$, active power flow measurement at the 'in' side,
• $P_O$, active power flow measurement at the 'out' side,
• $Q_I$, reactive power flow measurement at the 'in' side,
• $Q_O$, reactive power flow measurement at the 'out' side,
• $I_I$, current flow magnitude measurement at the 'in' side,
• $I_O$, current flow magnitude measurement at the 'out' side.
The typical bus measurements are summarized as follows:
• $P$, bus active power injection measurement,
• $Q$, bus reactive power injection measurement,
• $V$, bus voltage magnitude measurement.
Let $z^b_{t_k} \in \mathbb{R}^{m_b \times 1 \times c_b}$ denote the bus measurements collected at time $t_k$, where $m_b$ denotes the number of monitored buses and $c_b$ denotes the maximum number of measurements at each monitored bus. Similarly, let $z^l_{t_k} \in \mathbb{R}^{m_l \times 1 \times c_l}$ denote the line measurements collected at time $t_k$, where $m_l$ denotes the number of monitored lines and $c_l$ denotes the maximum number of measurements at each monitored line. Monitored buses/lines with fewer than $c_b$/$c_l$ measurements are padded with 0. Hence, the measurements of a power grid collected at time $t_k$ can be expressed by $z_{t_k}$, which is composed of $z^b_{t_k}$ and $z^l_{t_k}$. Accordingly, the time-series measurements over a window of $T$ time steps are denoted by $Z_{t_k}$, where $Z^b_{t_k} \in \mathbb{R}^{T \times m_b \times 1 \times c_b}$ is the time-series bus measurements and $Z^l_{t_k} \in \mathbb{R}^{T \times m_l \times 1 \times c_l}$ is the time-series line measurements. We define the label for $Z_{t_k}$ as follows:
$$y = \begin{cases} 0, & \text{if } z_{t_k} \text{ is a measurement that has not been attacked;} \\ 1, & \text{if } z_{t_k} \text{ is attacked by an SFDIA and can bypass the residual-based detection.} \end{cases} \quad (15)$$
Fig. 1: The SA network architecture. The configuration of the SA is shown in Table II, Table III, and Table IV.
In this paper, the commercial power system analysis software PowerFactory 2017 SP4 was used to conduct the residual-based bad measurement detection.
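The zero-padding of bus and line measurements described in this subsection can be sketched as follows. The sketch assumes that each measurement type occupies a fixed slot (for buses: $P$, $Q$, $V$) and that a missing measurement is stored as 0; the slot order and the function name are assumptions made for illustration, not the layout used in the released dataset.

```python
BUS_SLOTS = ("P", "Q", "V")                       # c_b = 3 (assumed order)
LINE_SLOTS = ("P_I", "P_O", "Q_I", "Q_O", "I_I", "I_O")  # c_l = 6 (assumed order)

def organize(elements, slots):
    """Arrange per-element measurement dicts into fixed-width rows.

    Each row has len(slots) entries; a missing measurement becomes 0.0.
    """
    rows = []
    for measured in elements:
        unknown = set(measured) - set(slots)
        if unknown:
            raise ValueError(f"unexpected measurement types: {sorted(unknown)}")
        rows.append([float(measured.get(name, 0.0)) for name in slots])
    return rows

# Two monitored buses: the second one only reports its voltage magnitude.
buses = [{"P": 1.0, "Q": 0.2, "V": 1.01}, {"V": 0.99}]
print(organize(buses, BUS_SLOTS))
```

Stacking such row blocks over $T$ consecutive time steps gives the $T \times m_b \times 1 \times c_b$ and $T \times m_l \times 1 \times c_l$ inputs used by the SA.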

B. Spatial Architecture (SA)
The SA is aimed at modeling the spatial structure between buses and lines by learning the representations of their measurements. The architecture of SA is shown in Fig. 1. It is composed of three sub-networks: one for bus measurement representation, one for line measurement representation, and the other one for spatial structure modeling. Different from the methods [11], [14], [17], [19], [20] that squeeze line and bus measurements into a one-dimensional input, we propose extracting the representations for lines and buses separately. As an advantage, the SA can effectively learn the hidden features from bus and line measurements and model their spatial structures. Compared with the method in [8], we utilize residual blocks to extract the bus/line representations and deploy deep layers to make them suitable for modeling the global spatial structure of the large-scale power grids.

1) Sub-network for Representation of Bus Measurement Data
This sub-network is aimed at extracting representations of bus measurements. The architecture is shown in Fig. 1. Specifically, for bus measurement $Z^b_{t_k}$, the first layer $\mathrm{Conv}_{b1}$ utilizes $c_b$ convolution filters of size $1 \times c_b$ to extract a residual feature for each bus. In this layer, $a_{oj}$ is an additive bias for each output channel, $*$ denotes the convolutional operator, ϕ denotes the batch normalization, and φ denotes the exponential linear unit (ELU) [39]. Then, the residual feature and the bus measurements are concatenated into two-channel data, which are fed into the next layer. The last layer $\mathrm{Conv}_{b5}$ utilizes four $1 \times 6$ convolution filters to extract the representation for each bus. Therefore, a representation of four feature values is obtained for each bus. For the input bus measurements $Z^b_{t_k}$, its representation is denoted by $B_{t_k} \in \mathbb{R}^{T \times m_b \times 1 \times 4}$.

2) Sub-network for Representation of Line Measurement Data
This sub-network is proposed to extract the representations for the line measurements. The network architecture is shown in Fig. 1. For the input line measurements $Z^l_{t_k}$, its representation is denoted by $L_{t_k} \in \mathbb{R}^{T \times m_l \times 1 \times 4}$.

3) Sub-network for Modeling Spatial Structure
This sub-network is proposed to learn the spatial structure of measurements from the measurement representations, as shown in Fig. 1. The parameters are summarized in Table IV. To learn the spatial structure between each line/bus and the remaining ones, we propose utilizing larger filters with the size of $(m_b + m_l) \times 1$. Therefore, output features learned from the bus/line representations can reflect such relationships. Specifically, the bus and line representations are first reshaped and concatenated to fully represent the input measurements. Then, three layers are designed to learn the spatial structure of line/bus measurements, yielding the spatial structure information $S_{t_k}$.

C. Temporal Architecture (TA)
The TA is used to model the temporal structure of a sequence of measurements in a time window by learning their intermediate features obtained by the SA. The architecture of TA is shown in Fig. 2. It is composed of four LSTM layers, one fully connected layer, and a sigmoid layer. The output represents the probability that the measurement z t k at time step t k is an SFDIA. To effectively model the temporal structure information, we incorporate the LSTM architecture shown in Fig. 3 into the proposed PowerFDNet. It has two major advantages: one is to represent temporal structure information of time-series measurements [40], and another is to avoid gradient vanishing and exploding [41]. This architecture forms a booster-refiner encoder that can use rich features to model the large-scale spatial representations of bus and line measurement variables and then refines them with a more salient and condensed feature vector representation.
Recall that for the input measurements $Z_{t_k}$, the spatial structure information $S_{t_k}$ is learned by the SA. For convenience, the input data for time step $t$ is denoted by $X_t \in S_{t_k}$, with $d = 128$ and $h = 256$. The calculation flow in $f^{TA1}_{LSTM}$ follows the LSTM architecture in Fig. 3, where $\sigma$ denotes the sigmoid function, $C_{t-1} \in \mathbb{R}^{1 \times h}$, and $\odot$ denotes the Hadamard product. After the four LSTM layers, the feature map $X_{out} \in \mathbb{R}^{T \times d}$ is obtained. To detect the SFDIA, the feature data for the current time step is detached, represented by $x_p \in \mathbb{R}^{1 \times d}$. This feature is then processed by the fully connected layer $f^{TA5}_{l}$ with a sigmoid activation, formulated by
$$y_p = \sigma(x_p W + a), \quad (19)$$
where $W \in \mathbb{R}^{d \times 1}$ and $a \in \mathbb{R}$.
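For reference, one step of a standard LSTM cell can be written as below. This is the textbook formulation, given here as an assumption about what $f^{TA1}_{LSTM}$ computes rather than the paper's exact equations; the weight matrices $W_{x\cdot}$, $W_{h\cdot}$ and biases $b_{\cdot}$ are illustrative names, while $F_t$, $I_t$, $O_t$, $\tilde{C}_t$, $C_t$, and $H_t$ follow the gate names used for Fig. 3.

```latex
\begin{aligned}
F_t &= \sigma\left(X_t W_{xf} + H_{t-1} W_{hf} + b_f\right), \\
I_t &= \sigma\left(X_t W_{xi} + H_{t-1} W_{hi} + b_i\right), \\
O_t &= \sigma\left(X_t W_{xo} + H_{t-1} W_{ho} + b_o\right), \\
\tilde{C}_t &= \tanh\left(X_t W_{xc} + H_{t-1} W_{hc} + b_c\right), \\
C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t, \\
H_t &= O_t \odot \tanh\left(C_t\right).
\end{aligned}
```

The additive update of $C_t$ is what lets gradients pass through many time steps, which relates to the second advantage mentioned for the LSTM above.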

D. Loss Function
The binary cross entropy error is utilized to train the proposed PowerFDNet, which is expressed by
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(y_{pi}) + (1 - y_i) \log(1 - y_{pi}) \right],$$
where $N$ is the mini-batch size, $y_{pi}$ is the prediction result obtained by Eq. (19) for the measurements $z_{t_k}$, and $y_i$ is the corresponding ground truth label (in Eq. (15)). In the training stage, the Adam optimization algorithm [43] is used to update the network weights, with an initial learning rate of $1 \times 10^{-4}$. The learning rate is dynamically adjusted in the training stage by the ReduceLROnPlateau scheduler. The popular deep learning framework PyTorch 1.9.0 (https://pytorch.org/docs/1.9.0/) was used to construct the PowerFDNet for model training and testing. The trained network model of the proposed PowerFDNet will be available online at https://github.com/FrankYinXF/PowerFDNet.
Fig. 3: The architecture of the LSTM [42]. $C_{t-1}$ is the cell state, $H_{t-1}$ is the hidden state, $F_t$ is the forget gate, $I_t$ is the input gate, $\tilde{C}_t$ is the candidate memory, and $O_t$ is the output gate. $\sigma$ denotes the sigmoid function and $\odot$ denotes the Hadamard product.
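The mini-batch binary cross entropy can be sketched in plain Python as follows. The clamping constant `eps` is an addition for numerical safety and is not mentioned in the paper; in a PyTorch implementation the same quantity would normally come from the framework's built-in loss rather than from hand-written code.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy over a mini-batch.

    y_true holds labels in {0, 1}; y_pred holds sigmoid outputs in (0, 1).
    eps clamps predictions away from 0 and 1 to avoid log(0) (an assumption).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uncertain = [0.6, 0.4, 0.5, 0.5]
print(binary_cross_entropy(labels, confident) < binary_cross_entropy(labels, uncertain))
```

Predictions that are both correct and confident give a lower loss, which is the signal used by the optimizer to update the network weights.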

A. Dataset
In the experiments, two benchmark power systems from the public SimBench dataset [21] were used to assess the SFDIA detection. There are three main reasons. First, these power grids contain detailed data, especially time-series demand profiles for an entire year that are generated every 15 minutes (i.e., 35,136 demand profiles). Therefore, it is convenient to use these power grids to simulate a power grid with dynamical power load and generation. Second, measurements for buses and lines have been defined and tested in the SimBench power grids with high voltage and extra-high voltage [21]. Therefore, this dataset can be conveniently used to evaluate the SFDIA detection. Finally, the SimBench power grids originate from the German power systems. To some extent, they provide realistic power grids for evaluating the SFDIA detection.
The two benchmark power grids, '1-HV-mixed-0-no_sw' and '1-EHV-mixed-0-no_sw', were utilized to evaluate the performance of SFDIA detection. The '1-HV-mixed-0-no_sw' is a high voltage level grid with 110 kV transmission lines, denoted by Grid-HV, which is monitored by 355 measurements, with 35,136 profiles for dynamical power load and generation. More details are shown in Table VI. The '1-EHV-mixed-0-no_sw' is an extra-high voltage level grid, denoted by Grid-EHV. The open-source software Pandapower and SimBench and the commercial software PowerFactory 2017 SP4 were used in the SFDIA data generation stage for the power flow calculation and the bad data detection.

1) Time-series Measurements Generated on the Grid-HV and Grid-EHV
The genuine values of these measurements in the two power grids were obtained by calculating the power flow using the commercial software PowerFactory 2017 SP4. Hence, there are 35,136 normal measurement samples for each power grid. Details of measurement data are presented in Section IV-A. For Grid-HV, each measurement sample collected at a time step contains 192 measurement values for buses and 163 measurement values for transmission lines. For Grid-EHV, each measurement sample measured at a time step contains 1,698 measurement values for buses and 2,254 measurement values for transmission lines. The measurement noises are assumed to follow Gaussian distributions and are configured to be less than 1% for voltage magnitude and less than 2% for active/reactive power injection and power flow [22].

2) SFDIA Measurement Generation
The generation of the SFDIA measurements is based on the method proposed in [5], [8]. The attacks were launched on a target bus by maliciously modifying either its voltage angle ($Va$) or voltage magnitude ($Vm$). To comprehensively evaluate the performance of the SFDIA detection, three types of SFDIAs are designed and summarized as follows:
• Type-A, in which the rate of the active power injection change on the target bus is in the range of (50%, 100%],
• Type-B, in which the rate of the active power injection change on the target bus is in the range of (25%, 50%], and
• Type-C, in which the rate of the active power injection change on the target bus is in the range of (5%, 25%].
Therefore, a Type-A SFDIA leads to a large change in power injection, a Type-B SFDIA leads to a medium change, and a Type-C SFDIA leads to a relatively small change in power injection. At each time step, six buses with injection are randomly selected as the target buses to launch these three types of SFDIAs. Therefore, there are in total 35,136 × 6 = 210,816 attacked measurement samples. All of these SFDIA measurements have bypassed the bad measurement detection function of PowerFactory 2017 SP4. These three types of SFDIA datasets generated for Grid-HV and Grid-EHV in this experiment will be publicly available online at https://github.com/FrankYinXF/PowerFDNet. Fig. 4 shows the statistical information from time step 300 to 500 about Type-A SFDIA attack measurements on Grid-HV in terms of the change of $Vm$, the rate of $P$ change, and the change of $P$. Fig. 5 shows a normal measurement sample (corresponding to $z$ in Eq. (10)), an SFDIA sample (corresponding to $z_{bad}$ in Eq. (10)) obtained by attacking the normal sample, and the change (corresponding to $a$ in Eq. (10)) between the two samples.
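The three attack types are defined purely by the rate of change of the active power injection at the target bus, so assigning a generated attack to a type is a simple range check. The sketch below assumes the rate is given as a fraction (0.30 for 30%) and that only its magnitude matters; both are assumptions, as is the function name.

```python
def attack_type(rate):
    """Map the relative change of active power injection to an attack type.

    rate is a fraction (e.g. 0.30 for 30%); the sign is ignored (assumption).
    Returns "A", "B", "C", or None when the rate is outside all three ranges.
    """
    r = abs(rate)
    if 0.50 < r <= 1.00:
        return "A"
    if 0.25 < r <= 0.50:
        return "B"
    if 0.05 < r <= 0.25:
        return "C"
    return None

for rate in (0.75, 0.50, 0.10, 0.02):
    print(rate, attack_type(rate))
```

The half-open ranges mean that a change of exactly 50% counts as Type-B and a change of exactly 25% counts as Type-C.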

3) Training and Testing Dataset
As introduced in Section V-A2, each power grid generates 35,136 normal measurement samples and 210,816 SFDIA samples with three types of attacks. The normal measurements are labeled as 0, and the SFDIA measurements are labeled as 1, as expressed in Eq. (15). For each grid, the 29,952 normal measurements for the first 312 days of the year are grouped as training data, and the normal measurements of the remaining 54 days (namely 5,184 samples) are utilized for testing. The SFDIA measurements are grouped in the same way, i.e., 179,712 for training and 31,104 for testing (each type of SFDIA contains 10,368 testing samples, with 5,184 SFDIA samples obtained by modifying $Vm$ and 5,184 SFDIA samples obtained by modifying $Va$). The advantage of this way of data partitioning is that the testing data is completely unseen by the trained model, so that the detection cases can realistically simulate the real-world situation.
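The sample counts above follow directly from the 15-minute resolution and the six attacks per time step, which the following short check reproduces. The function and key names are illustrative.

```python
STEPS_PER_DAY = 24 * 60 // 15  # 96 samples per day at 15-minute resolution

def split_counts(train_days, test_days, attacks_per_step=6, attack_types=3):
    """Recompute the training/testing sample counts from the day split."""
    normal_train = train_days * STEPS_PER_DAY
    normal_test = test_days * STEPS_PER_DAY
    attack_test = normal_test * attacks_per_step
    return {
        "normal_train": normal_train,
        "normal_test": normal_test,
        "attack_train": normal_train * attacks_per_step,
        "attack_test": attack_test,
        "attack_test_per_type": attack_test // attack_types,
    }

counts = split_counts(train_days=312, test_days=54)
print(counts)
print((312 + 54) * STEPS_PER_DAY)  # total normal samples per grid
```

The 366 days imply that the profiles cover a leap year, which is consistent with the 35,136 time steps per grid.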

B. Performance Metrics
Three commonly used metrics were applied to assess the SFDIA detection [17], [19], [20], which are expressed by:
$$\text{Precision} = \frac{N_{tp}}{N_{tp} + N_{fp}}, \qquad \text{Recall} = \frac{N_{tp}}{N_{tp} + N_{fn}}, \qquad F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$
where $N_{fp}$ indicates the number of false positives, $N_{tp}$ the number of true positives, $N_{fn}$ the number of false negatives, and $N_{tn}$ the number of true negatives, which are summarized in Table VIII. A normal measurement sample is defined as negative, while a sample attacked by the SFDIA is defined as positive. Hence, $N_{fn} + N_{tp}$ is the total number of real positive samples in the data set, and $N_{fp} + N_{tn}$ is the total number of real negative samples in the data set.
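The three metrics can be computed from the confusion counts as in the sketch below, which uses the standard definitions of precision, recall, and $F_1$. The counts in the example are made up for illustration and are not results from the paper; the function does not guard against empty denominators.

```python
def detection_metrics(n_tp, n_fp, n_fn):
    """Return (precision, recall, f1) from confusion counts.

    Positive = sample attacked by an SFDIA, negative = normal sample.
    Assumes n_tp + n_fp > 0 and n_tp + n_fn > 0.
    """
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up counts: 90 attacks caught, 10 false alarms, 30 attacks missed.
precision, recall, f1 = detection_metrics(n_tp=90, n_fp=10, n_fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Note that $N_{tn}$ does not enter any of the three metrics, so they are insensitive to how many normal samples are correctly passed.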

C. Evaluation of SFDIA Detection
In the experiment, we compared the SFDIA detection accuracy of the proposed PowerFDNet with two state-of-the-art approaches, M-I [19] and M-II [17]. As introduced in Section V-A2 and Section V-A3, there are 5,184 normal samples in the test stage. Each type of SFDIA contains 10,368 samples, where 5,184 samples are obtained by modifying the voltage magnitude of a target bus and the other 5,184 samples are obtained by modifying the voltage angle of a target bus. Table IX and Table X compare the accuracy of Type-A SFDIA detection evaluated on Grid-HV and Grid-EHV, respectively. Table XI and Table XII compare the accuracy of Type-B SFDIA detection evaluated on Grid-HV and Grid-EHV, respectively. Table XIII and Table XIV compare the accuracy of Type-C SFDIA detection evaluated on Grid-HV and Grid-EHV, respectively. The values in bold are the best results obtained for each metric. As shown in the tables, compared to the other two approaches, the proposed PowerFDNet achieves significant improvements in the three performance metrics on the two benchmark power grids.

1) Case A: Type-A SFDIA Detection
The Type-A SFDIA detection case assesses the detection accuracy for SFDIAs that are launched by attacking the target bus with a change in the bus active power injection in the range of (50%, 100%]. Table IX and Table X summarize the comparison of the detection accuracy of the Type-A SFDIAs evaluated on Grid-HV and Grid-EHV, respectively. As shown in Table IX, in the case of Type-A SFDIA detection, the proposed PowerFDNet achieves the best $F_1$ of 99.668%, the best recall of 99.778%, and the best precision of 99.557% on Grid-HV. The precision achieved by the proposed method is about 3.756% higher than M-I and 3.091% higher than M-II. The recall achieved by the proposed method is about 5.885% higher than M-I and 4.012% higher than M-II. The $F_1$ score achieved by the proposed method is about 4.819% higher than M-I and 3.551% higher than M-II. Similar detection performance is also achieved on Grid-EHV, as summarized in Table X. Our method obtains the best $F_1$ of approximately 99.465%, which is around 5.127% higher than M-I and about 3.849% higher than M-II. The best precision obtained by our method is approximately 99.422%, which is about 3.960% higher than M-I and 3.286% higher than M-II. The best recall achieved by our method is approximately 99.508%, which is about 6.295% higher than M-I and 4.413% higher than M-II. This demonstrates that for the Type-A SFDIAs with large rates of power injection change at target buses, PowerFDNet achieves the highest SFDIA detection accuracy in terms of precision, recall, and $F_1$ score when compared to the two state-of-the-art approaches.

2) Case B: Type-B SFDIA Detection
The Type-B SFDIA detection case assesses the detection accuracy for SFDIAs that are launched by attacking the target bus through a medium modification, in the range of (25%, 50%], to the bus active power injection. Because the power injection is modified less in the Type-B SFDIAs, they are harder to detect than the Type-A SFDIAs. Table XI and Table XII compare the detection accuracy for the Type-B SFDIAs evaluated on Grid-HV and Grid-EHV, respectively. As shown in Table XI, the proposed PowerFDNet obtains the best F1 of 99.518%, the best precision of 99.461%, and the best recall of 99.576% on Grid-HV. The precision achieved by our method is about 3.766% higher than that of M-I and 3.162% higher than that of M-II. The recall is about 5.898% higher than that of M-I and 4.136% higher than that of M-II. The F1 score is approximately 4.831% higher than that of M-I and 3.649% higher than that of M-II. Similar detection performance is achieved on Grid-EHV, as summarized in Table XII. Our method obtains the best F1 of approximately 99.329%, which is around 5.271% higher than that of M-I and about 3.962% higher than that of M-II. The best precision obtained by our method is approximately 99.363%, which is about 4.001% higher than that of M-I and 3.321% higher than that of M-II. The best recall achieved by our method is approximately 99.296%, which is about 6.540% higher than that of M-I and 4.603% higher than that of M-II.

3) Case C: Type-C SFDIA Detection
The Type-C SFDIA detection case assesses the detection accuracy for SFDIAs that are launched by attacking the target bus through a relatively small modification, in the range of (5%, 25%], to the bus active power injection. Compared with the other two SFDIA types, the Type-C SFDIA measurements have a smaller modification in the bus active power. Hence, the Type-C SFDIAs are more difficult to detect. Table XIII and Table XIV summarize the detection accuracy for the Type-C SFDIAs evaluated on Grid-HV and Grid-EHV, respectively. As shown in Table XIII and Table XIV, the detection accuracy (precision, recall, and F1) of all three methods is slightly lower than that obtained on the Type-A and Type-B SFDIAs. Compared to the other two methods, the proposed PowerFDNet achieves the highest detection accuracy in terms of F1, recall, and precision on both Grid-HV and Grid-EHV. Compared with M-I evaluated on Grid-HV, our method improves by approximately 4.858% in F1 score, approximately 3.786% in precision, and about 5.931% in recall. Compared with M-II evaluated on Grid-HV, our method obtains an improvement of approximately 3.672%, 4.164%, and 3.180% in terms of F1 score, recall, and precision, respectively. Compared with M-I evaluated on Grid-EHV, our method improves by approximately 5.446% in F1, about 6.801% in recall, and around 4.089% in precision. Compared with M-II evaluated on Grid-EHV, our method obtains an improvement of approximately 4.097%, 4.831%, and 3.363% in terms of F1, recall, and precision, respectively. These results demonstrate that, for the Type-C SFDIAs with small rates of power injection change at target buses, the PowerFDNet achieves the highest SFDIA detection accuracy (F1, recall, and precision) compared with the two state-of-the-art approaches.
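The three cases differ only in the range of the relative change applied to the bus active power injection. The following minimal sketch maps a change ratio to its attack type using the half-open ranges stated above. The function name `sfdia_type` is hypothetical, and the use of the absolute value (treating increases and decreases alike) is an assumption that the text does not confirm.

```python
from typing import Optional


def sfdia_type(change_ratio: float) -> Optional[str]:
    """Classify a relative change in bus active power injection.

    change_ratio is a fraction (0.30 means a 30% change). The sign is
    ignored, which is an assumption rather than a stated rule.
    """
    r = abs(change_ratio)
    if 0.50 < r <= 1.00:
        return "Type-A"  # large modification: (50%, 100%]
    if 0.25 < r <= 0.50:
        return "Type-B"  # medium modification: (25%, 50%]
    if 0.05 < r <= 0.25:
        return "Type-C"  # small modification: (5%, 25%]
    return None  # outside the three ranges considered in the case studies


if __name__ == "__main__":
    for ratio in (0.80, 0.50, 0.25, 0.05):
        print(ratio, sfdia_type(ratio))
```

Each interval is open on the left and closed on the right, so a boundary value such as 50% falls into the lower-severity class (Type-B) rather than Type-A.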

D. An IoT-oriented Prototype of the SFDIA Detection
A lightweight IoT-oriented SFDIA detection prototype was implemented on the Android-based mobile platform, as shown in Fig. 6. The lightweight prototype is around 52 MB in size, of which the optimized mobile model accounts for 8.5 MB. The optimized lightweight model is produced with PyTorch, which provides utilities for creating serializable and optimizable models.⁶ The prototype takes about 0.2 seconds to test one sample in an Android emulator (Pixel XL, API 30). The popular deep learning framework PyTorch 1.10.0⁷ and PyTorch android lite:1.10.0⁸ were used to implement this IoT-oriented prototype.

⁶ https://pytorch.org/mobile/android/#quickstart-with-a-helloworldexample
⁷ https://pytorch.org/get-started/locally/
⁸ https://pytorch.org/mobile/android/

VI. CONCLUSION
In this paper, we proposed a spatiotemporal deep learning-based PowerFDNet for successful SFDIA detection in AC-model power systems. To model the spatiotemporal structure information between buses and lines, we designed two sub-architectures: the SA for spatial structure learning and the TA for temporal structure learning. In the SA, we first model the bus measurements and the line measurements separately, so that the model can effectively represent these two types of measurements. Then, a sub-network is designed to capture the spatial structure information between buses and lines and to preliminarily capture the patterns of SFDIA measurements. Further, the TA, based on the LSTM, is designed to effectively learn the temporal structure information of the preliminary features obtained by the SA. The proposed PowerFDNet is comprehensively evaluated on two realistic benchmark power grids. The experimental results demonstrate that the PowerFDNet achieves significant improvement in terms of F1, recall, and precision compared with the two state-of-the-art SFDIA detection approaches.
In addition, an IoT-oriented lightweight prototype of size 52 MB is implemented and tested for mobile devices, which demonstrates the potential applications on mobile devices.