A Novel False Data Injection Attack Detection Model of the Cyber-Physical Power System

The false data injection attack (FDIA) against the Cyber-Physical Power System (CPPS) is a kind of data integrity attack. As more and more cyber vulnerabilities are discovered, different types of FDIAs are gradually emerging as severe threats to the stable operation of the CPPS. In this paper, the invasion pathway of the FDIA against the CPPS is explored in detail, and a novel FDIA detection model based on ensemble learning is proposed. First, a pseudo-sample database is built to assist in training and evaluating the model and, more importantly, in updating the model in the future. Furthermore, the optimal feature set is extracted to characterize the behavior of the FDIA, which improves the precision of the FDIA detection model. Finally, a focal-loss-lightgbm (FLGB) ensemble classifier is constructed to detect FDIA behavior automatically and accurately. We illustrate the performance of this model on a fusion of measurement data and power system audit logs. The model is trained offline, and the results show that it achieves high precision and stability, which helps ensure the stable operation of the smart grid and improves the FDIA resistance of the CPPS.


I. INTRODUCTION
The CPPS fuses computing equipment, communication systems, and the physical power grid into a multidimensional, heterogeneous, and complex system [1], [2]. Compared to the traditional physical electric power grid, the CPPS is more intelligent and stable. However, due to the vulnerabilities in cyberspace, and especially the complex interaction between electric power flow and information flow, information security has become an important factor affecting the safe and stable operation of the power system. The power system therefore faces serious and continuous threats from network attacks. Unlike network attacks in the Internet field, cyber-attacks against the CPPS concentrate on destroying the stability control that the cyber layer exerts over the physical layer, and can even paralyze the operation of the power system [3]. The blackout in Ukraine is the most representative network attack against a power system: it caused the operators to lose control of the system and led to wide-ranging blackouts and serious economic loss [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Canbing Li.
Among the several types of cyber-attacks against the CPPS, the FDIA is a type of data integrity attack [5]. To launch an FDIA, the attacker exploits the vulnerabilities of the communication system in the cyber layer and then maliciously modifies the measurement data of physical devices [6], [7]. The FDIA leads the control center to make an incorrect state estimate of the current system. The control center then loses its normal control over the physical equipment, which can cause the power system to collapse and even lead to a large-scale power failure. In recent years, more kinds of FDIA have emerged: the attacker can use network vulnerabilities to tamper with the measurement data, the control data, and even the equipment information at minimum cost, which causes large-scale chain failures [8]. However, defending against the FDIA has become more difficult because the FDIA is highly concealed. Therefore, it is urgent to propose an FDIA detection method with high precision, which lays the foundation for the defense against the FDIA. To improve the FDIA resistance of the CPPS, there are three main kinds of FDIA detection methods: 1) detection methods based on state estimation; 2) detection methods based on time series prediction; 3) detection methods based on machine learning.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. THE DETECTION METHODS BASED ON STATE ESTIMATION
The smart grid adopts the least-squares method to perform state estimation. If attackers gain a sufficient understanding of the information and protection algorithms of the power grid, they can build false data that evades the state estimation algorithm. Building on the characteristics of this kind of FDIA, the traditional state estimation algorithm has been improved to enhance the ability of the CPPS to resist the FDIA. The improved state estimation methods mainly include the Kalman filtering method, the residual detection method, and the measurement correlation detection method [9]-[11]. Besides, considering that the setting of the detection threshold is greatly affected by the scale of the power system, the global power system can be divided into sub-systems, and different detection thresholds are set according to the actual parameters of each sub-system [12], [13]. In this way, the FDIA against complex power systems can be detected effectively. The advantage of this kind of method is that it uses mature algorithms, detects the FDIA rapidly, and reflects the characteristics of the power system well. However, the setting of the detection threshold introduces empirical error, which degrades the detection accuracy. Moreover, fluctuations in the power system are prone to cause false alarms of the FDIA [14].

B. THE DETECTION METHODS BASED ON TIME SERIES PREDICTION
Detection methods based on state estimation are used primarily for static analysis, which detects attack behavior at a specific moment. During the continuous dynamic operation of the power system, there is a strong space-time relationship between multiple state variables. Therefore, it is feasible to use historical data for trajectory analysis to predict the current state of the power grid. The prediction result is then compared with the current actual measurement value, and the areas that may have been attacked are identified by similarity comparison. The detection methods based on time series prediction mainly include statistical consistency detection, sequential consistency detection, and sensor trajectory prediction [15]-[17]. This kind of method predicts the distribution of state variables based on the operating law of the system state and the historical database. By matching the running trajectory, various types of false data can be detected effectively, but the computational complexity is high, which results in a low detection speed. Because the power system cannot respond in time when an FDIA happens, this kind of method is not suitable for FDIA detection in complex systems.

C. THE DETECTION METHOD BASED ON MACHINE LEARNING
The expanding deployment of wide-area measurement systems (WAMS) provides massive data for the data analysis of the power system. Machine learning technology in particular is gradually playing an important role in FDIA detection for the CPPS. Machine learning algorithms establish the complex mapping between input data and output data through model training, and their computation speed is fast. This kind of method does not need to solve the complex time-domain equations of the power system, and clear performance indicators can be used to evaluate it, so it is one of the main current FDIA detection methods. This type of method mainly includes the support vector machine (SVM) [18], the extreme learning machine [19], fuzzy C-means clustering [20], deep learning [21], and ensemble learning [22]. The detection performance of traditional supervised learning algorithms such as the SVM depends heavily on the quality of the data; for example, a feature set with poor characterization ability causes a low detection rate. Compared to supervised learning algorithms, the detection accuracy of the clustering method is often lower because of its unsupervised training, and the performance of the model cannot be evaluated clearly for FDIA detection. The deep learning model processes features automatically and its detection accuracy is often higher than that of traditional machine learning algorithms, but its training relies heavily on large sample datasets and consumes more training time, so it is difficult to detect the FDIA in real time [23]. The ensemble learning model is constructed by combining weak classifiers into a strong classifier. Furthermore, the strong classifier adopts an integrated voting mechanism to obtain higher prediction accuracy. Thanks to parallel training, its computational cost is significantly lower than that of deep learning algorithms. The ensemble learning method therefore balances the accuracy and the training time of the model better, and is more effective for FDIA detection.
Based on the above analysis, an FDIA detection model based on ensemble learning is proposed. The main contributions of this work are as follows:
1) This work introduces the imbalance rate of multi-classification data, and a C-KSmote oversampling method is proposed to rebalance the electric power data according to the imbalance rate. The pseudo-data is used to assist in training and evaluating the FDIA detection model. This method generates pseudo-samples that conform to the characteristics of power data, and decreases the false alarm rate of the FDIA detection model caused by the imbalanced data, which improves the FDIA resistance of the CPPS.
2) An optimal feature set obtaining method is proposed to characterize the behavior of the FDIA. The optimal feature set contains fewer features than the original data, so it not only improves the training precision of the FDIA detection model but also decreases the model complexity. This method improves the response speed and precision of the CPPS against the FDIA.
3) A focal-loss-lightgbm ensemble classifier is constructed, which improves the detection rate of anomalously distributed samples. Difficult types of FDIAs can be detected accurately by this classifier, and faults and FDIAs can also be distinguished effectively. The ensemble classifier concentrates more on the samples that are difficult to classify and improves the FDIA detection precision, which is beneficial to the stable operation of the CPPS.
The remainder of this paper is organized as follows: Section II is the mathematical modeling of FDIA and the FDIA detection model. Section III performs the theoretical analysis of our proposed FDIA detection model. Section IV is the experimental verifications and analysis of the FDIA detection model. Section V is the conclusion of this article.

II. THE MODELING OF THE FDIA DETECTION MODEL OF CPPS

A. THE DETECTION MODEL OF TRADITIONAL FDIA AGAINST CPPS
State estimation is the crucial mechanism for maintaining the stability and efficiency of modern power grids [24]. As shown in Figure 1, the measurement data of the physical layer, such as bus voltage, bus power flow, branch power flow, and load profiles, are collected by sensors or meters. The collected data is then sent to the control center in the cyber layer by the industrial control system known as the SCADA system. The control center estimates the states of the power system according to the received measurement data, detects potential contingencies, and sends the corresponding control signals to the Remote Terminal Units (RTUs) of the physical layer, which ensures the reliable operation of the CPPS and completes the closed-loop control of the physical power grid.
The state estimator in the control center uses the measurement data reported from the SCADA system and the steady system model to estimate the system state over time.
If $z = (z_1, z_2, \ldots, z_m)^T \in \mathbb{R}^m$ is the measurement vector, $x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$ is the state vector, and $e = (e_1, e_2, \ldots, e_m)^T \in \mathbb{R}^m$ denotes the measurement error vector, the observation model of DC power flow is as shown in equation (1):

$$z = h(x) + e \tag{1}$$

The observation model of DC power flow can be further formulated by equation (2):

$$z = Hx + e \tag{2}$$

In equation (2), $H \in \mathbb{R}^{m \times n}$ is the Jacobian matrix that represents the topology of the physical grid, and $e \sim N(0, \sigma^2)$ denotes the environment noise modeled by Additive White Gaussian Noise (AWGN) with the standard deviation $\sigma$.
According to the statistical estimation criterion (Minimum Mean-Square Error), the estimated system state $\hat{x}$ can be further formulated by equation (3):

$$\hat{x} = (H^T \Lambda H)^{-1} H^T \Lambda z \tag{3}$$

Here $\Lambda$ is a diagonal matrix whose $i$-th diagonal element is $\sigma_i^{-2}$, the reciprocal of the variance of the $i$-th measurement error. If attackers have sufficient knowledge of the system topology represented by the Jacobian matrix $H$, they can launch the FDIA by tampering with the state estimation data such as load profiles. The observation model in the presence of the FDIA can be formulated by equation (4):

$$z_a = Hx + a + e \tag{4}$$

The variable $a$ represents the vector of tampered data; when the FDIA happens, $a \neq 0$. The traditional FDIA detection model based on machine learning is a binary classification task. That is, if there are a total of $i$ samples in the dataset $X = \{X_1, X_2, \ldots, X_i\}$ and the detection result is $y = \{y_1, y_2, \ldots, y_i\}$, then the detection model of the FDIA can be characterized as follows:

$$y_i = \begin{cases} 0, & X_i \text{ is a normal sample} \\ 1, & X_i \text{ is an FDIA sample} \end{cases} \tag{5}$$

As more and more vulnerabilities of the cyber layer are discovered, multiple types of FDIAs have been shown to be potentially capable of invading the CPPS. However, the traditional FDIA detection mechanism is only designed to detect a certain type of FDIA, which limits the detection range.
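To make the estimator of equation (3) and the attack model of equation (4) concrete, the following is a minimal sketch for a hypothetical two-state system. The matrix `H`, the noise level, and the attack vector are invented for illustration. The construction `a = Hc` is the well-known way to build an injection that leaves the residual unchanged; it is an assumption here, since the text above only states that knowledge of `H` is required.

```python
import random

def wls_estimate(H, z, sigma):
    """Weighted least squares for a 2-state system:
    x_hat = (H^T L H)^-1 H^T L z, with L = diag(1 / sigma_i^2)."""
    w = [1.0 / s ** 2 for s in sigma]
    # Entries of the symmetric 2x2 matrix G = H^T L H and of b = H^T L z.
    g11 = sum(wi * h[0] * h[0] for wi, h in zip(w, H))
    g12 = sum(wi * h[0] * h[1] for wi, h in zip(w, H))
    g22 = sum(wi * h[1] * h[1] for wi, h in zip(w, H))
    b1 = sum(wi * h[0] * zi for wi, h, zi in zip(w, H, z))
    b2 = sum(wi * h[1] * zi for wi, h, zi in zip(w, H, z))
    det = g11 * g22 - g12 * g12
    # Closed-form solution of the 2x2 normal equations.
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]

def residual_norm(H, z, x_hat):
    """Euclidean norm of the measurement residual z - H x_hat."""
    r = [zi - (h[0] * x_hat[0] + h[1] * x_hat[1]) for h, zi in zip(H, z)]
    return sum(ri * ri for ri in r) ** 0.5

# Hypothetical 4-measurement, 2-state system (all values are illustrative).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
x_true = [0.10, -0.05]
sigma = [0.01] * 4
rng = random.Random(0)
z = [h[0] * x_true[0] + h[1] * x_true[1] + rng.gauss(0.0, s)
     for h, s in zip(H, sigma)]
x_hat = wls_estimate(H, z, sigma)
r_clean = residual_norm(H, z, x_hat)

# Attack built from knowledge of H: a = H c shifts the estimate by c.
c = [0.20, -0.10]
a = [h[0] * c[0] + h[1] * c[1] for h in H]
z_a = [zi + ai for zi, ai in zip(z, a)]
x_hat_a = wls_estimate(H, z_a, sigma)
r_attack = residual_norm(H, z_a, x_hat_a)

# Naive tampering of a single measurement, for comparison.
z_bad = list(z)
z_bad[0] += 0.5
r_bad = residual_norm(H, z_bad, wls_estimate(H, z_bad, sigma))
```

In this sketch the structured attack moves the estimate by `c` while `r_attack` matches `r_clean`, whereas the naive tampering inflates the residual. This is one way to see why residual-based checks alone may miss a well-informed FDIA.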

B. THE DETECTION MODEL OF NEW EXTENSION OF FDIA
The CPPS covers all stages of the generation, transmission, distribution, and consumption of electric power. The traditional FDIA only considers the intrusion scheme for tampering with the measurement data of a specific power sub-system. With continued in-depth research on the FDIA, the definition of the FDIA has been further expanded. As shown in Figure 2, in a broad sense, the FDIA may occur at various abstract layers of the CPPS. More specifically, the attackers may launch the FDIA against the monitoring, control, and protection devices, which damages the control system or the applications associated with it, such as generation prediction, state estimation, economic dispatch, and energy trading [25]. Therefore, when designing an FDIA detection model, the detection of different kinds of FDIA should be taken into consideration.
In our proposed FDIA detection model, we apply the ensemble learning technique to detect different types of FDIA against the CPPS. When an FDIA occurs, it is often followed after a short time by a fault of electrical equipment in the physical layer. As a result, when a transient process happens in the power system, the characteristics of the measurement data are very similar to those observed when an FDIA happens. To distinguish faults from FDIAs accurately, the detection of power system faults is also explicitly taken into consideration when designing this model. To reflect the relationship between the input data and the detection results, a multi-classification ensemble classifier is used to build this FDIA detection model. For a multi-classification task, if the original dataset is $D$, there are $j$ samples in the dataset, and the dimension of the dataset is $n$, then the dataset $D$ can be described as equation (6):

$$D = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{j1} & x_{j2} & \cdots & x_{jn} \end{bmatrix} \tag{6}$$

The detection result $y$ is a list of length $j$ whose elements take values in $\{0, 1, 2, 3, 4\}$. The label numbers and the corresponding detection results are described in equation (7):

$$y = \begin{cases} 0, & \text{normal} \\ 1, & \text{fault} \\ 2, & \text{FDIA type 1} \\ 3, & \text{FDIA type 2} \\ 4, & \text{FDIA type 3} \end{cases} \tag{7}$$

As shown in equation (7), FDIA type 1, type 2, and type 3 are the measurement data tampering attack, the remote trip instruction tampering attack, and the relay setting tampering attack, respectively.
The model proposed in this paper expands the scope of FDIA detection. In a narrow sense, this model can efficiently determine whether an FDIA is intruding into the CPPS. Broadly speaking, the specific type of FDIA can be detected accurately, and the false alarm rate is clearly decreased when a transient process happens in the physical layer.
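The narrow-sense and broad-sense readings of the detection result can be illustrated with a short sketch. The label numbers follow the five classes described in this paper; the names `LABELS` and `is_fdia` and the example vector are hypothetical.

```python
# Label meanings for the five-class detection result y.
LABELS = {
    0: "normal operation / line maintenance",
    1: "power system fault",
    2: "FDIA type 1: measurement data tampering",
    3: "FDIA type 2: remote trip instruction tampering",
    4: "FDIA type 3: relay setting tampering",
}

def is_fdia(label):
    """Narrow-sense decision: is the sample any kind of FDIA?"""
    return label in (2, 3, 4)

# Hypothetical detection results for six samples.
y = [0, 2, 1, 4, 3, 0]
binary = [int(is_fdia(v)) for v in y]   # narrow sense: attack or not
names = [LABELS[v] for v in y]          # broad sense: which class
```

Note that a fault (label 1) maps to 0 in the binary view, which reflects the goal of not raising an FDIA alarm for a physical transient.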

C. THE FRAMEWORK OF THE FDIA DETECTION MODEL
The framework of the model is shown in Figure 3. The original measurement data is collected by the Phasor Measurement Unit (PMU), and missing values in the data are filled with the mean value of the same column. To reduce the false alarm rate caused by imbalanced data, we propose a C-KSmote oversampling method to generate pseudo-data. The pseudo-data is stored in the pseudo-sample database, which assists in training and evaluating the ensemble classifier and in further updating the model. Based on the final balanced data, a new feature set is constructed, which contains 32 electrical variables with real physical meaning. The constructed feature set is then inserted into the original feature set randomly, which makes the feature space more suitable for characterizing the FDIA behavior. To reduce the dimension of the data, the JMIM feature selection method is used to extract the optimal feature subset for FDIA detection. Finally, the focal loss function is used to improve the lightgbm ensemble learning algorithm, which increases the weight of samples with low prediction accuracy and further enhances the detection precision of the FDIA.
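As a small illustration of the first preprocessing step, the sketch below fills missing entries with the mean of the observed values in the same column. The function name and the convention that a missing value is `None` or NaN are assumptions made for this example.

```python
import math

def _is_missing(v):
    # A value counts as missing if it is None or NaN (assumed convention).
    return v is None or math.isnan(v)

def fill_missing_with_column_mean(rows):
    """Replace each missing entry by the mean of the observed values
    in the same column; a fully missing column falls back to 0.0."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [r[col] for r in rows if not _is_missing(r[col])]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[col] if _is_missing(r[col]) else r[col] for col in range(n_cols)]
        for r in rows
    ]

# Hypothetical measurement rows with two missing entries.
rows = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
filled = fill_missing_with_column_mean(rows)
```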

III. THE THEORETICAL ANALYSIS OF THE FDIA DETECTION MODEL OF CPPS

A. THE C-KSMOTE OVERSAMPLING METHOD OF THE ELECTRIC POWER DATA
In a dataset collected over a period of continuous power system operation, normal data occupies the larger proportion. By contrast, the probability of an FDIA happening is lower, so the data size of the FDIA categories is smaller. With such an imbalanced dataset, the classifier tends to be biased toward the majority of normal samples [26]. Even when the minority FDIA samples are misclassified, the model may still obtain a high accuracy on the test set, so it is unreasonable to use the total accuracy alone to measure the performance of the FDIA detection model. Moreover, the stability of the FDIA detection model is important for the security of the power system, so the false alarms caused by a seriously imbalanced dataset are costly. It is therefore important to adopt an appropriate oversampling strategy to balance the tampered data.
To address the above problem, a C-KSmote oversampling method is proposed to generate pseudo-data, and the pseudo-samples are stored in the pseudo-sample database for further use. First, we define a method for quantifying the imbalance rate of a multi-class dataset. If the number of minority samples is $a$ and the average sample size of each class is $b$, then the imbalance rate of the multi-class dataset is $r = (b - a)/a$. If the imbalance rate of the data exceeds a predetermined threshold, then oversampling is performed on all minority samples. The principle of the C-KSmote oversampling method is shown in Figure 4. For each class of minority samples, the samples are first grouped into clusters (isolated samples do not participate in clustering and oversampling). The noise samples {A, B, C, D, E, F} are filtered out during clustering and are added back into the balanced dataset once oversampling has finished. Weights are then assigned to the clusters: a cluster with few samples receives a high weight, whereas a cluster with more samples receives a low weight. For each cluster, a sample is selected from the cluster randomly, and a linear interpolation between the cluster center and the selected sample generates a new sample. These steps are repeated until all clusters have finished oversampling. The detailed principle of the C-KSmote algorithm is shown in Algorithm 1.
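The imbalance rate defined above can be computed in a few lines. In this sketch, `a` is taken to be the size of the smallest class, which is an assumption about how the definition is applied to a multi-class dataset; the class counts and the threshold are illustrative.

```python
from collections import Counter

def imbalance_rate(labels):
    """Imbalance rate r = (b - a) / a of a multi-class label list."""
    counts = Counter(labels)
    a = min(counts.values())                 # smallest class (assumption)
    b = sum(counts.values()) / len(counts)   # average class size
    return (b - a) / a

# Hypothetical dataset: 100, 300, and 200 samples in three classes.
labels = [0] * 100 + [1] * 300 + [2] * 200
r = imbalance_rate(labels)       # b = 200, a = 100, so r = 1.0 (100%)

THRESHOLD = 0.5                  # illustrative imbalance threshold
needs_oversampling = r > THRESHOLD
```

A perfectly balanced dataset gives `r = 0`, and the rate can exceed 100% when the smallest class is less than half the average class size.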
After all clusters finish oversampling, the pseudo-data is stored in a non-relational database to construct the pseudo-sample database. The pseudo-sample database is then used to assist in training and evaluating the classifier, and in updating the model periodically in the future.
The proposed C-KSmote method avoids generating noise by oversampling only in the minority sample areas. Moreover, it addresses both the between-class imbalance and the within-class imbalance, and therefore generates enough pseudo-data that closely resembles the real power data.
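The weighting and interpolation steps of the C-KSmote method described above can be sketched as follows. This is not the authors' implementation: the clusters are assumed to be already computed and free of noise samples, the weight of a cluster is assumed to be proportional to the reciprocal of its size, and all function names are hypothetical.

```python
import random

def cluster_center(cluster):
    """Component-wise mean of the points in one cluster."""
    dim = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]

def allocate(clusters, n_new):
    """Split n_new pseudo-samples among clusters with weight ~ 1 / size,
    so that small clusters receive more pseudo-samples."""
    inv = [1.0 / len(c) for c in clusters]
    total = sum(inv)
    quotas = [int(n_new * w / total) for w in inv]
    # Give the rounding remainder to the highest-weight clusters first.
    order = sorted(range(len(clusters)), key=lambda k: inv[k], reverse=True)
    for k in order[: n_new - sum(quotas)]:
        quotas[k] += 1
    return quotas

def c_ksmote(clusters, n_new, rng):
    """Generate n_new pseudo-samples by interpolating between each
    cluster center and a randomly chosen sample of that cluster."""
    synthetic = []
    for cluster, quota in zip(clusters, allocate(clusters, n_new)):
        center = cluster_center(cluster)
        for _ in range(quota):
            sample = rng.choice(cluster)
            t = rng.random()   # interpolation factor in [0, 1)
            synthetic.append([c + t * (s - c) for c, s in zip(center, sample)])
    return synthetic

# Two hypothetical clusters of one minority class (2-D for readability).
clusters = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # larger cluster
    [[5.0, 5.0], [6.0, 5.0]],                           # smaller cluster
]
quotas = allocate(clusters, 6)
pseudo = c_ksmote(clusters, 6, random.Random(42))
```

Because every pseudo-sample lies on a segment between a cluster center and a real sample, the generated points stay inside the region already occupied by the minority class.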

B. THE OPTIMAL FEATURE SET OBTAINING METHOD FOR FDIA DETECTION
In order to obtain the optimal feature set for FDIA detection, we performed feature engineering that combines the feature construction and feature selection (FCS) method. To obtain a comprehensive feature set that characterizes the FDIA behavior, we constructed 8 features for each PMU, which are mixed with the original features randomly. However, the constructed features increase the dimension of the data, and redundant features may exist, so the mixed feature set needs a further dimension reduction process. The framework of the FCS method is shown in Figure 5.

Algorithm 1: C-KSmote
For each minority class j:
    Determine n clustering centers p(p_1, p_2, ..., p_n) in the sample space of class j;
    Cluster the samples of class j into n clusters C(C_1, C_2, ..., C_n);
    Filter the noise samples;
    Assign weights W(W_1, W_2, ..., W_n) to the clusters, and determine the number of pseudo-samples to generate in each cluster according to its weight;
    For each sample m in C:
        Perform a linear interpolation of the cluster center and the sample m to generate a pseudo-sample X_new_m;
    End for
    If sample generation has finished for all clusters:
        T ← X_new_m;
        Add the noise samples into T;
    End if
End for
Return T
End

1) FEATURE CONSTRUCTION FOR PMU MEASUREMENT DATA
In this paper, the feature construction method is used to construct new features with real physical meaning through non-linear transformation; the constructed features make it easier for classifiers to detect the FDIA. The smart grid uses intelligent electronic devices such as smart meters and sensors to collect the state data of the power system. Non-linear or linear dependencies exist between the multiple state variables of electric power data. Therefore, based on the limited original variables, new feature variables with real physical meaning can be calculated. We constructed 32 new features and inserted them into the original feature set randomly. To delete useless or irrelevant features, the mixed feature set should be processed further.
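The specific constructed features are not listed in this passage, so the following is only a hypothetical illustration of what a non-linear transformation with physical meaning could look like. It derives per-phase power quantities from a voltage phasor and a current phasor; the function name, the choice of features, and the assumption that angles are given in degrees are all made up for this example.

```python
import math

def construct_features(v_mag, v_ang_deg, i_mag, i_ang_deg):
    """Hypothetical constructed features for one phase of one PMU."""
    phi = math.radians(v_ang_deg - i_ang_deg)   # phase difference
    s = v_mag * i_mag                           # apparent power
    return {
        "apparent_power": s,
        "active_power": s * math.cos(phi),
        "reactive_power": s * math.sin(phi),
        "power_factor": math.cos(phi),
    }

# Voltage and current in phase: all power is active.
in_phase = construct_features(100.0, 30.0, 2.0, 30.0)
# Current lagging the voltage by 60 degrees: power factor of about 0.5.
lagging = construct_features(100.0, 60.0, 2.0, 0.0)
```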

2) THE OPTIMAL FEATURE SET OBTAINING METHOD
In order to obtain the feature set that best characterizes the FDIA behavior and to reduce the dimension of the data, a filter feature selection method, Joint Mutual Information Maximization (JMIM), is used in this paper [27]. It is a feature selection algorithm based on mutual information, which minimizes the data dimension without deleting important features, so it is a natural choice for obtaining the optimal feature set for FDIA detection.
Let the mixed feature set be $F = \{f_1, f_2, \ldots, f_N\}$, where the data dimension is $N$. The principle of the JMIM algorithm is as follows. Definition 1: The mutual information of variables $a$ and $b$ is defined as follows:

$$I(a; b) = H(a) - H(a \mid b) \tag{8}$$

In equation (8), $H(\cdot)$ and $H(\cdot \mid \cdot)$ denote the entropy and the conditional entropy.
Let $S$ be the currently selected feature set and $C$ the class label. If the feature $f_i$ and a feature $f_S$ in $S$ are highly correlated, then the joint mutual information of $f_i$ and $f_S$ with the label is calculated as follows:

$$I(f_i, f_S; C) = I(f_S; C) + I(f_i; C \mid f_S) \tag{9}$$

The JMIM employs the joint mutual information and the "maximum of the minimum" approach, which selects the features most relevant to the label. The selected features are more suitable for characterizing the FDIA behavior. Equation (10) shows the feature selected by the JMIM algorithm at each step:

$$f_{JMIM} = \arg\max_{f_i \in F - S} \left( \min_{f_S \in S} I(f_i, f_S; C) \right) \tag{10}$$

In the feature selection process, a forward greedy search algorithm is used to select $k$ relevant features in the feature space. The forward greedy search strategy is shown in Algorithm 2.

The Gradient Boosted Decision Tree (GBDT) is an ensemble learning framework. This framework can be expressed as an additive model of decision trees, which uses the steepest descent method to fit each decision tree. In each iteration, the new decision tree follows the direction in which the loss function decreases fastest (the negative gradient direction) to obtain higher prediction accuracy. The principle of the GBDT algorithm is shown in Figure 6.
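Returning to the feature selection step, the JMIM criterion and the forward greedy search can be sketched for discrete features as follows. This is an illustrative version under stated assumptions: features are already discretized, entropies are estimated from empirical frequencies, and the first feature is the one with the highest mutual information with the label. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum(c / n * log2(c / n) for c in Counter(joint).values())

def mutual_info(a, b):
    # I(a; b) = H(a) - H(a | b), where H(a | b) = H(a, b) - H(b).
    return entropy(a) - (entropy(a, b) - entropy(b))

def joint_mutual_info(f_i, f_s, label):
    # I(f_i, f_s; C) = H(f_i, f_s) + H(C) - H(f_i, f_s, C).
    return entropy(f_i, f_s) + entropy(label) - entropy(f_i, f_s, label)

def jmim_select(features, label, k):
    """Forward greedy search: features maps a name to a list of values."""
    remaining = dict(features)
    first = max(remaining, key=lambda f: mutual_info(remaining[f], label))
    selected = [first]
    del remaining[first]
    while remaining and len(selected) < k:
        # Maximum over candidates of the minimum joint mutual information.
        best = max(
            remaining,
            key=lambda f: min(
                joint_mutual_info(remaining[f], features[s], label)
                for s in selected
            ),
        )
        selected.append(best)
        del remaining[best]
    return selected

# Toy data: the label is determined by two bits a and b.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
label = [2 * x + y for x, y in zip(a, b)]
features = {"a": a, "a_dup": list(a), "b": b, "const": [0] * 8}
selected = jmim_select(features, label, k=2)
```

In this toy example the duplicated feature `a_dup` is as informative as `a` on its own, but it adds nothing once `a` is selected, so the criterion prefers `b` as the second feature.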

2) LIGHT GRADIENT BOOSTING MACHINE (LIGHTGBM)
With the geometric growth of the data scale, the GBDT algorithm suffers from easy overfitting and slow training speed. To address these disadvantages, the lightgbm algorithm makes several improvements to the GBDT algorithm. There are two main improvement mechanisms: 1) integrating the histogram algorithm to process features; 2) utilizing the deep growth strategy (leaf-wise) to control the model complexity. The prediction accuracy of lightgbm is higher than that of the traditional GBDT algorithm, and the time consumption is shorter [28].

Algorithm 2 (forward greedy search, recoverable steps only)
    Search for a feature f_i that has the highest mutual information with the label;
    ...
    End for
    End for
    Return S
    End
(1) Histogram algorithm: The histogram algorithm discretizes the continuous floating-point feature values into K bins, and then builds a histogram with a width of K. The building method of the histogram is shown in Figure 7. When traversing the data, the decision tree uses the discrete feature value as an index into the accumulated statistics, and then finds the optimal splitting point by traversing the discrete values. The histogram algorithm effectively reduces the memory consumption and decreases the training time [29].
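The idea of searching for a split over K bins rather than over every raw feature value can be sketched as follows. This is a simplified illustration and not lightgbm's actual implementation: it assumes equal-width bins and a squared-loss style score, and the function names are hypothetical.

```python
def build_histogram(values, grads, k):
    """Discretize values into k equal-width bins and accumulate
    the gradient sum and the sample count of every bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0     # avoid a zero width for constant features
    grad_sum = [0.0] * k
    count = [0] * k
    for v, g in zip(values, grads):
        b = min(int((v - lo) / width), k - 1)   # clamp the maximum into the last bin
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count

def best_split(grad_sum, count):
    """Scan the k - 1 bin boundaries and return the bin index after which
    splitting maximizes G_left^2 / n_left + G_right^2 / n_right."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_score = None, float("-inf")
    g_left, n_left = 0.0, 0
    for b in range(len(count) - 1):
        g_left += grad_sum[b]
        n_left += count[b]
        n_right = total_n - n_left
        if n_left == 0 or n_right == 0:
            continue
        g_right = total_g - g_left
        score = g_left ** 2 / n_left + g_right ** 2 / n_right
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin

# Hypothetical feature with two well-separated groups of gradients.
values = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
grads = [-1.0] * 4 + [1.0] * 4
grad_sum, count = build_histogram(values, grads, k=4)
split_bin = best_split(grad_sum, count)
```

Only K − 1 candidate boundaries are evaluated regardless of the number of samples, which is where the savings in time and memory come from.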
(2) Deep growth strategy: Lightgbm traverses all leaf nodes before feature splitting, and then splits the leaf with the largest gain. The algorithm repeats these steps until all leaves finish splitting. Compared to a classifier with the traditional level-wise growth strategy, lightgbm with the deep growth strategy achieves higher accuracy for the same number of splits. At the same time, lightgbm adds a maximum depth limit while the model grows, which prevents the model from overfitting. The deep growth strategy is shown in Figure 8, where the white nodes represent the nodes with the largest gain.
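In the LightGBM library, the two mechanisms above are exposed through a handful of training parameters. The dictionary below is an illustrative configuration fragment rather than the setting used in this paper: `num_leaves=31` and `max_bin=255` are the library defaults, while the `max_depth` value is a hypothetical choice for the depth limit.

```python
# Illustrative LightGBM parameters for a five-class task.
params = {
    "objective": "multiclass",
    "num_class": 5,       # labels 0..4
    "max_bin": 255,       # number of histogram bins K (library default)
    "num_leaves": 31,     # leaf-wise growth budget (library default)
    "max_depth": 8,       # hypothetical depth limit against overfitting
}

# A tree of depth d has at most 2**d leaves, so num_leaves should not exceed it.
leaf_budget_is_consistent = params["num_leaves"] <= 2 ** params["max_depth"]
```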

3) FOCAL-LOSS-LIGHTGBM(FLGB)
If the detection precision of the model cannot satisfy the requirements of the power grid, the faults caused by attacks cannot be handled in time. When an FDIA happens, the measurement data of the physical layer often contains anomalously distributed data, and these abnormal samples are difficult for the classifier to classify. To deal with samples that are difficult to classify, the focal loss function is used to improve the lightgbm classifier, which assigns a higher weight to such samples [30].
For the traditional lightgbm classifier, the multi-classification loss function is the cross-entropy loss function shown in equation (11), where $p_i$ denotes the predicted probability of class $i$ (a high probability on the true class means that the sample is easy to classify), $y_i$ represents the actual label of the data, and $T$ represents the number of classes.

$$L_{CE} = -\sum_{i=1}^{T} y_i \log(p_i) \tag{11}$$

The multi-classification focal loss function is shown in equation (12):

$$L_{FL} = -\sum_{i=1}^{T} y_i (1 - p_i)^{\gamma} \log(p_i) \tag{12}$$

In the iteration process of focal-loss-lightgbm, when a sample is misclassified, the value of $p_i$ is small, the regulatory factor $(1 - p_i)^{\gamma}$ is approximately equal to 1, and the loss is not affected. When a sample is easy to classify, the regulatory factor is approximately equal to 0, so the weight of easy-to-classify samples is reduced. The hyper-parameter $\gamma$ smoothly adjusts the rate at which easy-to-classify samples are down-weighted, and the effect of the regulatory factor can be enhanced by increasing $\gamma$. Based on the above analysis, the focal loss function decreases the weight of easy-to-classify samples and increases the relative weight of difficult-to-classify samples. Therefore, the FLGB classifier pays more attention to difficult-to-classify samples during training, which further improves the accuracy of FDIA detection for the CPPS.
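The effect of the regulatory factor can be checked numerically. The sketch below evaluates the cross-entropy loss and the focal loss for a single sample with a one-hot label, so that only the probability assigned to the true class matters. The value `gamma = 2.0` and the two example probabilities are assumptions for illustration; the value of γ used in this paper is not stated in this passage.

```python
from math import log

def cross_entropy(p_true):
    """Cross-entropy of one sample, given the probability of its true class."""
    return -log(p_true)

def focal_loss(p_true, gamma=2.0):
    """Focal loss of one sample: the cross-entropy scaled by (1 - p)^gamma."""
    return -((1.0 - p_true) ** gamma) * log(p_true)

easy, hard = 0.95, 0.20    # hypothetical predicted probabilities

# Fraction of the cross-entropy loss that survives the regulatory factor.
easy_ratio = focal_loss(easy) / cross_entropy(easy)   # (1 - 0.95)^2
hard_ratio = focal_loss(hard) / cross_entropy(hard)   # (1 - 0.20)^2
```

With these numbers the easy sample keeps about 0.25% of its cross-entropy loss while the hard sample keeps 64%, and setting `gamma=0` recovers the plain cross-entropy.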

IV. EXPERIMENTAL ANALYSIS
In this section, we evaluate the performance of our proposed CKS-FCS-FLGB model for FDIA detection. The dataset is from the public Google power system cyber-attack dataset [31]. All classification experiments utilized the offline training method.

A. DATASETS
The dataset was collected by the Oak Ridge National Laboratory. The topology of the power system is shown in Figure 9, where L1 and L2 are transmission lines, and P1 and P2 are three-phase generators. R1∼R4 are four intelligent electronic devices that switch the four circuit breakers DR1∼DR4 on and off. The PDC is used to process vector data, and it is responsible for storing asynchronous data and displaying historical data. Besides, the intelligent electronic devices use a distance protection mechanism, so the circuit breaker opens when a fault occurs. This micro power system has the characteristics of a CPPS, where the FDIA happens in the cyber layer and causes faults of physical devices. After processing, the dataset consists of five classes of samples: normal operation and line maintenance of the power system, power system fault, measurement data tampering attack, remote trip instruction tampering attack, and relay setting tampering attack. These five scenarios are described as follows:
Class-0: Normal operation and line maintenance of the power system.
Class-1: Fault status: A small current ground fault takes place in the power system.
Class-2: Measurement data tampering attack: The attacker injects false data through the data acquisition channel in a way that evades the bad data detection mechanisms. The control center then makes an incorrect state estimation of the power system, which makes the physical grid operate unstably.
Class-3: Remote trip instruction tampering attack: The attacker tampers with the trip command of the relay protection device and operates the controller directly.
Class-4: Relay setting tampering attack: The attacker tampers with the relevant settings of the relay so that the controllers cannot switch off in time.
Each sub-dataset contains 128 features, which includes the data collected by 4 Phasor Measurement Units (PMUs), the snort alarms, and system logs. Each PMU measures 29 variables and the remaining 12 features are the log information of the control panel. The feature names and the descriptions are shown in Table 1. Among them, the symbol ''#'' in symbol ''R#'' means the index of PMU. For example, ''R1-PM1'' represents the voltage phase magnitude measured by PMU R1.

B. THE PSEUDO-DATA OBTAINING METHOD OF THE POWER DATA
For the original data, the sample statistics are shown in Figure 10. The missing values in the data are filled using the mean-value method. The statistics show that there is a serious class imbalance in each sub-dataset. The sample size of the remote trip instruction tampering attack class is much higher than that of the other four categories of samples. Therefore, the sample size of the other four categories of data is not enough to train the model precisely.
According to the imbalance rate quantification method proposed in this paper, the imbalance rate of the 15 sub-datasets is calculated, and the results are shown in Table 2. The sub-dataset with the lowest imbalance rate is sub-dataset 2, with an imbalance rate of 74.46%. The sub-dataset with the highest imbalance rate is sub-dataset 13, with an imbalance rate of 242.22%. Generally speaking, for a multi-classification task, an imbalance rate of more than 50% seriously affects the precision of the classifier. In this paper, the imbalance threshold is set to 50%, so all sub-datasets need to be oversampled to become balanced.
We obtain enough pseudo-data for the four categories of minority samples through the C-KSmote oversampling method. After rebalancing, the sample sizes of the classes are balanced, and the imbalance rate of each class is approximately equal to 0%. The sample proportion of each class is shown in Figure 11.
The constructed features are listed in Table 3. The constructed features are then inserted into the original feature set randomly, and the dimension of the mixed feature set is 160.

2) THE OPTIMAL FEATURE SET SELECTED BY JMIM
The mixed feature set contains 160 features, so the feature dimension is too high. High-dimensional data not only causes excessive computational consumption but also increases the training time of the classifier. Moreover, irrelevant and redundant features seriously decrease the accuracy of FDIA detection. On the large-sample dataset, 80 features are selected by the JMIM algorithm to construct the optimal feature set for FDIA detection, so the dimension of the optimal feature set is 50% lower than that of the mixed feature set. The names and mutual information values of the features are shown in Figure 12.
As shown in Figure 12, the top 10 features in the optimal feature set for FDIA detection are: the PMU2-C voltage phase angle, the PMU1-C voltage phase angle, the PMU4-C current phase angle, the PMU3-A current phase angle, the PMU3-C voltage phase angle, the PMU4-A current phase angle, the PMU3-A voltage phase magnitude, the PMU1-C current phase magnitude, the PMU1-A voltage phase angle, and the PMU4-C current phase angle. The features selected by the JMIM algorithm characterize false data more accurately; they are the key features for detecting the FDIA against CPPS.
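The greedy selection rule of JMIM (joint mutual information maximisation) can be sketched as follows: the first feature is the one with the highest mutual information with the class label, and each later feature maximises the minimum joint mutual information it shares with the label together with any already selected feature. This is a minimal sketch for discrete features only; the PMU measurements are continuous and would first need to be discretised, a step that is assumed and not shown. The function names and toy data are illustrative.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def jmim_select(features, labels, k):
    """features: dict mapping name -> list of discrete values.
    Returns the names of k features chosen by the JMIM criterion."""
    remaining = dict(features)
    first = max(remaining, key=lambda f: mutual_info(remaining[f], labels))
    selected = [first]
    del remaining[first]
    while remaining and len(selected) < k:
        def score(f):
            # Worst-case joint information of (f, s) with the label.
            return min(
                mutual_info(list(zip(remaining[f], features[s])), labels)
                for s in selected
            )
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Toy data: the label is determined by `a` and `b`; `a_dup` duplicates `a`.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_dup": list(a), "b": b, "noise": noise}
selected = jmim_select(features, labels, k=2)
```

In this toy case the duplicate feature adds no joint information once `a` is selected, so the criterion prefers `b`, which illustrates how JMIM discards redundant features.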

D. PERFORMANCE ANALYSIS OF THE CKS-FCS-FLGB FDIA DETECTION MODEL
1) THE COMPARISON OF CKS-FCS-FLGB AND LIGHTGBM
Based on the optimal feature set selected by the JMIM algorithm, the FLGB ensemble learning classifier performs classification using the default parameters except for the number of iterations, and the training and test sets are divided in a 3:1 ratio. The default values, meanings, and effects of the FLGB parameters are shown in Table 4. The total accuracy, average recall, average precision, and average F1-Score are used as performance indicators to evaluate the FDIA detection model. The effect of each indicator is described as follows.
In the large-sample case, the parameter ''num_boost_round'' of lightgbm is set to 5000; if the loss does not decrease noticeably over successive iterations, the model stops iterating early. The lightgbm classifier performs 3182 iterations, and the final multi-log-loss is 0.341296. CKS-FCS-FLGB performs 3708 iterations, and the final multi-focal-loss is 0.006144, a more pronounced decrease than that of the multi-log-loss. The total accuracies of the traditional lightgbm classifier and the CKS-FCS-FLGB model are 89.26% and 95.46%, respectively, so the total accuracy of the CKS-FCS-FLGB model is 6.20 percentage points higher than that of the lightgbm classifier. The lower total accuracy of the lightgbm classifier is due to the imbalance of the original data.
To reflect the performance of the classifiers clearly, the classification reports of the lightgbm classifier and the CKS-FCS-FLGB model are shown in Figure 13. The average precision, average recall, and average F1-Score of the lightgbm classifier are 89.71%, 85.93%, and 87.67%, respectively. The average precision, average recall, and average F1-Score of the CKS-FCS-FLGB model are 95.23%, 95.97%, and 95.36%, respectively. Compared with the traditional lightgbm classifier, these indicators are improved by 5.52, 10.04, and 7.69 percentage points, respectively.
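To make the difference between the two losses concrete, the sketch below evaluates the per-sample multi-class log loss and the widely used form of the focal loss, FL = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The value gamma = 2 and the omission of any class-weighting factor are assumptions; the exact parameterisation used in FLGB is not given in this section.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(probs, true_class):
    """Per-sample multi-class log loss."""
    return -math.log(probs[true_class])

def focal_loss(probs, true_class, gamma=2.0):
    """Per-sample focal loss; gamma=2.0 is an assumed default."""
    p_t = probs[true_class]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Class 0 is the true label in both cases.
easy = softmax([4.0, 0.0, 0.0])  # confidently correct sample
hard = softmax([0.0, 1.0, 0.0])  # sample that is currently misclassified

easy_ratio = focal_loss(easy, 0) / log_loss(easy, 0)
hard_ratio = focal_loss(hard, 0) / log_loss(hard, 0)
```

The modulating factor (1 - p_t)^gamma shrinks the loss of the easy sample far more than that of the hard sample, which is how the focal loss concentrates training on difficult samples. Because this factor never exceeds 1, focal-loss values are smaller than log-loss values for the same predictions, so the two final loss values are measured on different scales.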
In the large-sample case, the CKS-FCS-FLGB model proposed in this paper not only significantly improves the detection accuracy for various FDIAs but also reduces the false alarm rate of the FDIA detection model.
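The four indicators used in this comparison can be computed from a confusion matrix as sketched below. The sketch assumes that ''average'' means an unweighted (macro) average over classes, which this section does not state explicitly; the confusion matrix is a toy example, not data from the paper.

```python
def macro_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.
    Returns total accuracy and macro-averaged precision, recall, and F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[c][c] for c in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy 3-class confusion matrix (rows: true class, columns: predicted class).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 1, 9]]
report = macro_metrics(toy_cm)
```

With macro averaging, every class contributes equally regardless of its sample size, so poor recall on a minority class lowers the average recall even when the total accuracy remains high.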
In the large-sample case, the ROC curve, precision-recall curve, and confusion matrix of the lightgbm classifier on the test set are shown in Figure 14. As shown in Figure 14 (a), the inflection point of the ROC curve of the lightgbm classifier is smooth, and the curve does not reach the point (0,1). Under a low false alarm rate, it is therefore difficult for the lightgbm classifier to obtain a high detection precision. In Figure 14 (b), the precision-recall curves do not reach the point (1,1), the curves of the different classes are dispersed, and a good trade-off between precision and recall cannot be ensured. As visualized in Figure 14 (c), because of the imbalanced dataset, the detection accuracy of the lightgbm classifier is clearly biased toward class 5. The lightgbm classifier therefore suffers from a high false alarm rate for FDIA detection, and its performance is unstable.
For comparison with the lightgbm classifier in the large-sample case, the ROC curve, precision-recall curve, and confusion matrix of the CKS-FCS-FLGB model are shown in Figure 15. As can be seen in Figure 15, compared with the traditional lightgbm classifier, the inflection point of the ROC curve of CKS-FCS-FLGB is closer to the point (0,1). The CKS-FCS-FLGB model obtains higher detection accuracy under a low false alarm rate. The precision-recall curve is smoother, and its inflection point is closer to the point (1,1). The CKS-FCS-FLGB model balances recall and precision better and achieves high precision and high recall at the same time. Compared with the traditional lightgbm classifier, the detection accuracy for classes 1, 2, and 3 is improved significantly. For large-sample data, the detection stability of the CKS-FCS-FLGB model is higher than that of the traditional lightgbm classifier. These results give preliminary evidence that CKS-FCS-FLGB improves the detection precision and decreases the false alarm rate of FDIA detection.
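The reading of the ROC curves in Figures 14 and 15 relies on how close a curve comes to the point (0,1), that is, a false alarm rate of 0 with a detection rate of 1. The sketch below builds a one-vs-rest ROC curve for a single class by sweeping the decision threshold and computes the area under it. The scores are toy values, and distinct scores are assumed so that tie handling can be omitted.

```python
def roc_points(scores, is_positive):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the decision threshold one sample at a time."""
    pos = sum(is_positive)
    neg = len(is_positive) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, is_positive), key=lambda t: -t[0])
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 0]                 # 1 = sample belongs to the class
separable = [0.9, 0.8, 0.3, 0.2]      # positives always score higher
overlapping = [0.9, 0.4, 0.6, 0.2]    # one negative outranks a positive

auc_separable = auc(roc_points(separable, labels))
auc_overlapping = auc(roc_points(overlapping, labels))
```

A curve that passes through (0,1) encloses the full unit area, while overlapping scores pull the curve away from that corner and reduce the area.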
Avoiding overfitting is a key task for a machine learning classifier. An overfitted model learns the training set extremely well, but its prediction error on unknown samples is too high. Overfitting limits the detection performance on unknown data, so the performance of such a model is not reliable for FDIA detection. To check whether the CKS-FCS-FLGB model is overfitted in the small-sample case, the learning curve of the model is shown in Figure 16. In Figure 16, as the number of training samples increases, the training score and the cross-validation score gradually converge. If enough representative samples are used for training the CKS-FCS-FLGB model, it can detect the unknown FDIAs in the test set well, which means that the model is reliable for FDIA detection.
To further verify the performance of the CKS-FCS-FLGB model on small-sample datasets, the model is compared with the lightgbm classifier on each sub-dataset. The comparison results are shown in Figure 17.
As shown in Figure 17, for FDIA detection the total accuracy of the CKS-FCS-FLGB model is 96.78%, the average recall is 96.78%, the average precision is 96.70%, and the average F1-Score is 96.79%. When a transient process occurs in the power system, the false alarm rate is only 2%, so the CKS-FCS-FLGB model does not raise a high number of false FDIA alarms when fluctuations happen in the power system. In other words, the CKS-FCS-FLGB model distinguishes FDIAs from fluctuations effectively.

2) THE ANTI-NOISE PERFORMANCE OF CKS-FCS-FLGB MODEL
According to the IEEE C37.118 standard, the measurement error of a PMU should be lower than 1% [32]. A signal-to-noise ratio (SNR) of 20 dB means that the noise power is 1% of the signal power. To verify the anti-noise performance of the CKS-FCS-FLGB model, we added Gaussian white noise with SNRs of 40, 30, 20, and 15 dB to the data to simulate the measurement error of the wide-area measurement system. The corresponding measurement errors are 0.01%, 0.1%, 1%, and 3%, respectively. The anti-noise performance of the CKS-FCS-FLGB model is shown in Figure 18. Figure 18 shows that, as the noise intensity increases, the performance of the CKS-FCS-FLGB model does not decrease sharply. The model is therefore not sensitive to noise and maintains stable FDIA detection performance even when the measurement data contain serious noise.
model are 36.39%, 37.12%, 53.55%, and 55.98% higher than other classic machine learning algorithms respectively.
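The noise injection used in the anti-noise test of subsection 2) can be sketched as follows, assuming that SNR is defined as the ratio of signal power to noise power in decibels. The function names and the sinusoidal test signal are illustrative and do not come from the paper.

```python
import math
import random

def noise_to_signal_percent(snr_db):
    """Noise power as a percentage of signal power for a given SNR in dB."""
    return 100.0 / (10 ** (snr_db / 10))

def add_awgn(signal, snr_db, seed=0):
    """Add zero-mean Gaussian white noise so that the expected SNR of the
    result equals snr_db."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Toy signal standing in for one measurement channel.
signal = [math.sin(2 * math.pi * i / 50) for i in range(5000)]
noisy = add_awgn(signal, snr_db=20)

# Empirical noise-to-signal power ratio of the noisy channel.
measured_ratio = (sum((n - s) ** 2 for n, s in zip(noisy, signal))
                  / sum(s * s for s in signal))
```

Under this definition, SNRs of 40, 30, and 20 dB correspond to noise powers of 0.01%, 0.1%, and 1% of the signal power, and 15 dB corresponds to about 3.16%, which matches the rounded figure of 3% quoted above.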
In summary, compared with other algorithms, the CKS-FCS-FLGB model obtains higher detection precision for different types of FDIAs and is more appropriate for detecting the FDIA against CPPS.

V. CONCLUSION
This paper investigates the potential invasion pathway of the FDIA against CPPS and designs a novel CKS-FCS-FLGB model to detect different types of FDIAs. We addressed the high false alarm rate caused by imbalanced data by proposing the C-KSmote oversampling method, which is used to obtain pseudo-data. The pseudo-data is stored in the pseudo-sample database, which assists in training and evaluating the model and in updating it in the future. We extracted the optimal feature set to characterize the behavior of the FDIA, which reduces the dimension of the data and improves the training precision of the model. Based on this work, the FLGB ensemble classifier is constructed to detect the FDIA; it concentrates on improving the accuracy for samples that are difficult to classify and further improves the detection precision of FDIA. We verified the performance of this model on both large-sample and small-sample datasets. The simulation results show that this model can detect different types of FDIA efficiently even on a small-sample dataset. The model also shows strong anti-noise performance. In particular, the model distinguishes transient processes from FDIAs well, which improves the disturbance resistance of CPPS. In summary, the proposed CKS-FCS-FLGB FDIA detection model improves the detection precision of FDIA and helps ensure the stable operation of CPPS.
JIE CAO received the Ph.D. degree in computer science and technology from Jilin University, Changchun, China, in 2017. She is currently an Associate Professor and also a Master's Tutor with the School of Computer Science, Northeast Electric Power University, Jilin. Her research interests include computer network, machine learning, and power grid stability and security.
DA WANG received the B.S. degree from the School of Electric Engineering, Northeast Electric Power University, Jilin, China, in 2014, where he is currently pursuing the M.S. degree with the School of Computer Engineering. His research interests include computer networks and power cyber-physical system security.
ZHAOYANG QU (Member, IEEE) received the Ph.D. degree in electrical engineering from China Northeast Electric Power University, in 2010, and the M.S. degree from the Dalian University of Technology, in 1988. He is currently a Professor and also a Doctoral Tutor with the School of Information Engineering, Northeast Electric Power University. He is also the Vice President of the Jilin Province Image and Graphics Society, the Head of the Power Big Data Intelligent Processing Engineering Technology Research Center, and the Jilin Governor Baishan Scholar. His interests include network technology, smart grid and power information processing, and virtual reality. He has published more than 46 articles in SCI/EI international conference proceedings and journals. He is a member of the China Electric Engineering Society Power Information Committee. He received the Second Prize of Jilin Province Science and Technology Progress Award. He is also a top-notch innovative talent in Jilin Province and a young and middle-aged professional and technical talent with outstanding contributions. He presided over the completion of two national natural science funds.