Model-Based Secure Load Frequency Control of Smart Grids Against Data Integrity Attack

In this paper, a cyber-physical digital signature creation method and a resilient load frequency control approach are proposed to improve the security of the existing SCADA protocols used in the load frequency control of interconnected power systems. Specifically, the dynamic model of the power system is used to predict the future states of the system, and the statistics of a number of future states are calculated and used as a dynamic hash to determine the integrity of the data communicated between remote telemetry units and the power system SCADA control center. The proposed algorithm is inherently cyber-physical and has enhanced robustness to collisions without adding a large overhead to the message frame of the packets in current SCADA protocols. To compensate for the possible performance degradation and instability of the power system caused by corrupted data, a model-based resilient load frequency controller is designed for the smart grid. The maximum interval between two consecutive feedback updates in the model-based control scheme is obtained to secure the stability of the power system. The investigation of three types of integrity attacks on a two-area power system shows that the proposed model-based digital signature attack detection scheme is able to detect even small inconsistencies in the communicated data. By using the designed resilient load frequency controller, the stability and dynamic performance of the smart grid are guaranteed.


I. INTRODUCTION
Shared and public communication networks are increasingly deployed in modern industrial control systems (ICSs) [1], [2]. These public communication networks greatly facilitate the aggregation and communication of system-wide control and measurement data in ICSs. However, they also increase the vulnerability of ICSs to cyber attacks [3]-[5]. Recently reported Stuxnet attacks on ICSs, such as [6] and [7], show that an attacker can even record previously transmitted data and then replay the recorded data to the supervisory control and data acquisition (SCADA) center, fooling the operator into believing that the attacked node is still normal.
The associate editor coordinating the review of this manuscript and approving it for publication was Haipeng Yao.
The SCADA system of a smart grid remotely monitors and controls the operation of the whole power system. For the data transmitted in the SCADA system, the major security requirements are confidentiality, availability, and integrity [8], [9]. To compromise data availability, denial of service (DoS) attacks can flood the power system SCADA center or a remote telemetry unit (RTU) with a vast number of valid requests and saturate its CPU, memory, or bandwidth [10]. DoS attacks have been investigated for the advanced metering infrastructure (AMI) and phasor measurement units (PMUs) in smart grids [11], [12]. Besides these sensing systems, load frequency control (LFC), which maintains the frequency and tie-line power and is a critical closed-loop control system in a smart grid, has also been analyzed under DoS attacks, for example in [13]. Because DoS changes the communication pattern, the switched system method can be used to analyze the impact of DoS on LFC and then to design a resilient control approach against DoS [14], [15].
In addition to DoS attacks, a data integrity attack can also severely affect smart grid performance and stability, because it misleads system operators into making incorrect decisions by inserting or altering measurement data or control commands. A false data attack on smart grid meters can bias power system state estimation by injecting malicious data without being caught by the bad data detection unit in the SCADA system [16], [17]. The impact of min and max integrity attacks on LFC in the smart grid has been studied in [18]. The results in [18] show that the operation references for generators depend heavily on the accuracy of the data transmitted over smart grid communication channels, and that a secure communication channel is crucial to the reliable operation of the power system. In [19], [20], the consequences of data integrity attacks on automatic generation control (AGC) have been quantified by using a risk assessment method.
In view of the possibly severe consequences of data integrity attacks, advanced data integrity protection mechanisms have to be designed for the SCADA system of a smart grid. In the information system security domain, digital signatures have been used extensively to protect data integrity and detect integrity attacks [21]. In a signature-based attack detection system, the sender creates a digital signature by encrypting a short summary of the message (called a hash code). On receiving the packet, the receiver decrypts the signature to recover the hash code and recomputes the hash code from the received message. If the two codes are equal, the integrity of the message is verified. Otherwise, an integrity attack is identified. Many existing SCADA systems of power systems either have no cyber security check functions or use vulnerable security check protocols. For instance, a number of digital substations use Secure Hash Algorithm 1 (SHA-1) for digital signatures [22], and a number of DNP3 protocols used in distribution automation do not have cyber security check functions [23], [24]. It has been found that SHA-1 is vulnerable to hash collisions [25], [26]. Although several up-to-date hash algorithms such as SHA-256 and SHA-384 have been suggested for future SCADA systems in the smart grid, a number of industrial control systems have not yet been updated with these hash methods, such as communication-based train control systems [27] and substation automation in power systems [22], [24]. Since many installed meters in the RTUs of SCADA systems have limited computing capabilities, they can only send data packets with limited sizes and transfer rates [24]. In addition, completely replacing this large volume of meters requires significant financial investment [28], [29]. Recently, a randomized hashing method was designed in [30] to enhance the security of hash functions used in digital signature applications by randomizing the signed messages.
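The hash-then-compare check described above can be illustrated with a minimal sketch. It assumes a keyed SHA-256 digest (an HMAC with a shared secret) as a simple stand-in for a public-key signature; the key, the message layout, and the function names `sign` and `verify` are hypothetical and are not taken from any SCADA protocol.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key"  # hypothetical shared secret, for illustration only


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Sender side: return a keyed SHA-256 digest (HMAC) of the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Receiver side: recompute the digest and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


# Hypothetical RTU reading; the field names are made up for this example.
reading = b"area=1;delta_f=-0.012;delta_p_tie=0.034"
tag = sign(reading)

# An attacker alters one field but cannot recompute the tag without the key.
tampered = reading.replace(b"-0.012", b"-0.002")

print(verify(reading, tag), verify(tampered, tag))
```

The receiver accepts the untouched reading and rejects the altered one. A purely static digest of this kind carries no information about the plant dynamics, which is the gap the model-based signature in this paper is meant to address.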
Unlike the traditional information systems to be secured, the ICSs that the SCADA system monitors and controls are dynamic and operate in real time. In the literature, only a few data integrity attack detection methods have been created specifically for ICSs, including the energy-efficient encryption method [31] and the physical watermarking method [32]. However, they are designed neither for digital signature applications nor for the smart grid SCADA system.
In this work, a model-based security and performance enhancement scheme is proposed for the load frequency control of a smart grid against the data integrity attack. The proposed scheme includes a model-based digital signature creation algorithm and a model-based resilient load frequency control approach. The proposed digital signature creation algorithm takes advantage of the dynamics of the power system to improve the robustness of the existing digital signatures used in the SCADA system of automatic generation control without adding a large overhead. Dynamic hashes are generated by calculating the statistics of a number of future system states predicted with the power system model, and are used to determine the integrity of the data communicated between remote telemetry units and the system SCADA center. Case studies on a two-area power system extensively used in LFC studies verify that the proposed model-based attack detection scheme is able to detect small mismatches in the communicated data. The stability and dynamic performance of the smart grid are guaranteed by implementing the designed resilient load frequency controller.
The symbols used in this work are summarized in Table 1. The remainder of this paper is organized as follows. In Section II, the networked control system structure of LFC is modeled. In Section III, the proposed model-based digital signature creation scheme and the model-based resilient load frequency control are described in detail. In Section IV, case studies of a two-area power system are conducted. Finally, Section V concludes the paper.
Fig. 1 shows the closed-loop structure of LFC in the ith control area. In the feedback loop of LFC, the measurement signals are transmitted from a remote telemetry unit (RTU) to the controller over networks. This LFC system is a typical networked control system in which the control loops are closed via communication networks [33]. LFC in a power system is used to sustain the nominal system frequency (for example, 60 Hz in North America and 50 Hz in China) and the tie-line power by modifying the power generation references sent to generators when loads change. A large-scale power system usually consists of several control areas interconnected through tie-lines.
VOLUME 8, 2020

II. NETWORKED CONTROL SYSTEM MODEL OF LOAD FREQUENCY CONTROL
When the generators in one control area are represented by an equivalent single-machine-single-load (SMSL) system, the power system dynamics include the following five parts.
The turbine dynamic is given by (1), where ΔP_m,i is the generator mechanical power deviation, ΔP_v,i is the turbine valve position deviation, and T_ch,i is the time constant of turbine i.
The governor dynamic is given by (2), where Δf_i is the frequency deviation of area i, ΔP_c,i is the load reference set-point, T_g,i is the time constant of governor i, and R_i is the speed droop coefficient.
The overall load-generation dynamic is given by (3), where ΔP_tie,i is the net tie-line power flow in area i, ΔP_L,i is the load deviation, H_i is the equivalent inertia constant of area i, and D_i is the equivalent damping coefficient of area i.
The tie-line power flow dynamic is given by (4), where T_ij is the synchronizing power coefficient and Δf_j is the frequency deviation of area j.
The system output is given by (5). Based on the above dynamics, (6) gives the state-space model of the ith control area in a power system with N control areas, and (7) describes the dynamics of the whole power system with N control areas. As the linearized system models the dynamics of the power system around a steady state with small disturbances, the load disturbance ΔP_L is omitted [34], which yields the more convenient state-space model (8). The sampled discrete-time model is
x(n + 1) = Ax(n) + Bu(n), y(n) = Cx(n) (9)
In reality, the exact model of a power system is usually nonlinear. The approximate linear power system model (9) is obtained by linearizing the power system around a given steady-state operating point and omitting the higher-order terms of the Taylor series expansion. Modeling uncertainties therefore always exist when a nonlinear system is linearized and discretized. The following linear system with modeling uncertainties is used to represent the exact model of the power system around the given steady-state operating point:
x(n + 1) = (A + ΔA)x(n) + (B + ΔB)u(n), y(n) = (C + ΔC)x(n) (10)
where the model uncertainties ΔA, ΔB, and ΔC are bounded in norm by small positive constants, with ||ΔA|| < γ, ||ΔC|| < ε, and ||ΔB|| bounded by a third small positive constant. Because the state of the power system is not fully measured, a state observer is incorporated into the RTU in this work, as shown in Fig. 2. In practical wide-area power system measurement, a variety of measurement devices are installed with signal processing units. For example, a phasor measurement unit (PMU) is able to obtain the frequency and voltage signals [35]. The dynamics of the state observer in the RTU are
x̂(n + 1) = Ax̂(n) + Bu(n) + L(y(n) − Cx̂(n)) (11)
By using the estimated state x̂(n) from the state observer of the RTU, the observer-based output feedback controller (12) for the LFC of the power system computes the control input u(n) from x̂(n) through the controller gain K.
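As a minimal sketch of the observer recursion (11), the following pure-Python example runs the update on a toy two-state system. The matrices A, B, C and the observer gain L below are hypothetical values chosen only so that A − LC is stable; they are not the two-area power system matrices, and the input is held at zero for simplicity.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def observer_step(A, B, C, L, x_hat, u, y):
    """One step of x_hat(n+1) = A x_hat(n) + B u(n) + L (y(n) - C x_hat(n))."""
    innovation = [yi - yh for yi, yh in zip(y, mat_vec(C, x_hat))]
    terms = zip(mat_vec(A, x_hat), mat_vec(B, u), mat_vec(L, innovation))
    return [a + b + c for a, b, c in terms]


# Hypothetical toy system: two states, one input, one measured output.
A = [[1.0, 0.1], [0.0, 0.9]]
B = [[0.0], [0.1]]
C = [[1.0, 0.0]]
L = [[0.9], [0.5]]  # assumed observer gain; A - L C has eigenvalues inside the unit circle

x = [1.0, -0.5]     # true state, unknown to the observer
x_hat = [0.0, 0.0]  # initial estimate
errors = []
for n in range(40):
    u = [0.0]
    y = mat_vec(C, x)                                  # measurement y(n) = C x(n)
    x_hat = observer_step(A, B, C, L, x_hat, u, y)     # estimate of x(n+1)
    x = [a + b for a, b in zip(mat_vec(A, x), mat_vec(B, u))]  # true x(n+1)
    errors.append(max(abs(a - b) for a, b in zip(x, x_hat)))

print(errors[0], errors[-1])
```

With an exact model the estimation error obeys e(n + 1) = (A − LC)e(n), so the printed error shrinks toward zero as n grows.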

III. MODEL-BASED DIGITAL SIGNATURE AND RESILIENT CONTROL
The overall structure of the proposed model-based digital signature generation and resilient control scheme against data integrity attack is shown in Fig. 2. A model-based digital signature method is developed to protect the data integrity in the LFC. In this method, a finite-horizon sequence of future states is generated based on the approximate power system model. Then, the statistics of this sequence are calculated, mainly the mean and standard deviation. The mean and standard deviation serve as the unique identity of the message, similar to the hash value or message authentication code (MAC) used in cryptographic methods. The means and standard deviations embedded in the sent packet are compared with those calculated through the approximate power system model on the controller side, and the integrity of the message is verified if they agree. Otherwise, a data integrity attack is detected and the untrusted packet is discarded. Another important contribution of this work is a model-based resilient load frequency controller that compensates for the possible performance degradation caused by discarding corrupted packets. The main idea of the model-based control scheme is to estimate the control output based on the system model embedded in the controller and to reset the estimation error to zero whenever the feedback is accessible. The critical part of the model-based scheme is therefore to determine the maximum interval between two consecutive feedback updates that secures the stability of the power system.

A. MODEL-BASED DIGITAL SIGNATURE ALGORITHM AND DATA INTEGRITY DETECTION
To detect the data integrity attack, model-based state predictions are calculated on both sides of the communication channel. On the RTU, a model-based digital signature generator is added. It uses the state estimate x̂(n) as the initial condition. Then, the N future state predictions are calculated by the prediction dynamics (13), where η(n + i) and ν(n + i) are the future states and controls. The future state prediction sequence on the RTU side is denoted as X^P_S(n) = [η(n + 1), η(n + 2), ···, η(n + N)]. Then, the mean µ^P(X_S(n)) and standard deviation σ^P(X_S(n)) of X^P_S(n) are calculated to represent the statistical characteristics of the state prediction sequence. These two statistics are then encrypted together with the state estimate using the advanced encryption standard (AES) before being sent. Let the current state estimate be denoted as x̂(n) and the digital signature as [µ^P_E(n), σ^P_E(n)]. The whole packet Pct_S(n) contains the state estimate, written x̂_S(n) on the sending side, together with this signature. On the controller side, after being decrypted, the received packet contains the received state estimate x̂_R(n) and the received signature. The data integrity is checked by the model-based security check unit. Taking x̂_R(n) as the initial state estimate, the future states from the (n + 1)th to the (n + N)th time instant are calculated by the prediction dynamics (14), where ζ(n + i) and υ(n + i) are the future states and controls calculated on the controller side.
The future state prediction sequence on the controller side is denoted as X^P_R(n) = [ζ(n + 1), ζ(n + 2), ···, ζ(n + N)]. Then, the mean µ^P_D(n) and standard deviation σ^P_D(n) of X^P_R(n) are calculated to represent the statistical characteristics of the state prediction sequence on the controller side.
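The signature generation on the two sides of the channel can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact algorithm: the prediction control law is assumed to be ν = Kη, the statistics are taken per state component with the population standard deviation, and the matrices A, B, K are hypothetical toy values (the AES encryption step is omitted).

```python
from statistics import fmean, pstdev


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def predict_states(A, B, K, x0, N):
    """Roll the model forward N steps from x0 with the assumed law nu = K * eta."""
    seq, eta = [], list(x0)
    for _ in range(N):
        nu = mat_vec(K, eta)
        eta = [a + b for a, b in zip(mat_vec(A, eta), mat_vec(B, nu))]
        seq.append(eta)
    return seq


def signature(A, B, K, x0, N):
    """Return (means, standard deviations) of the N predictions, per component."""
    columns = list(zip(*predict_states(A, B, K, x0, N)))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


# Hypothetical toy model shared by the RTU and the controller.
A = [[1.0, 0.1], [0.0, 0.9]]
B = [[0.0], [0.1]]
K = [[-1.0, -1.0]]

x_hat = [0.02, -0.01]                               # state estimate sent by the RTU
mu_E, sigma_E = signature(A, B, K, x_hat, N=6)      # computed on the RTU
mu_D, sigma_D = signature(A, B, K, x_hat, N=6)      # recomputed on the controller

print(mu_E == mu_D and sigma_E == sigma_D)
```

When the received state equals the sent state and both sides share the same model, the two signatures coincide, so any mismatch points to a modified packet.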
If the adversary decides to launch an integrity attack during this data transmission, the following types of data integrity attacks are considered in this work:
i) Additive Attack: An extra term x_AD is added to the x̂_S(n) of Pct_S(n). Then, the received state at the controller side is
x̂_R(n) = x̂_S(n) + x_AD (15)
ii) Scaling Attack: The transmitted packet is multiplied by a scaling parameter λ_A. Then, the received state is
x̂_R(n) = λ_A x̂_S(n) (16)
iii) Replay Attack: A sequence of M transmitted packets x_RD = [x̂(n − 1), x̂(n − 2), ···, x̂(n − M)] is recorded by the attacker, who then replaces the current packet with any packet x̂(n − i), i ∈ {1, ···, M}, from the recorded sequence. The replay attacker keeps modifying the packets for a random period. Then, the received state is
x̂_R(n) = x̂(n − i), i ∈ {1, ···, M} (17)
The security check of the received packet depends on the mean and standard deviation of the N state predictions obtained by using x̂_R(n) as the initial condition. The following hypotheses are tested:
|µ^P_E(n) − µ^P_D(n)| ≤ Th_µ, |σ^P_E(n) − σ^P_D(n)| ≤ Th_σ (18)
where Th_µ and Th_σ are the thresholds for the errors of the mean and standard deviation between the data sent from the RTU and the data calculated at the control center. The received packet is regarded as untrusted and discarded when either of the two hypotheses fails.
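The three attack models and the threshold test can be put together in a short sketch. It assumes a toy two-state model with hypothetical matrices A, B, K, a prediction law ν = Kη, per-component statistics, and illustrative thresholds and attack values; none of these numbers come from the two-area case study.

```python
from statistics import fmean, pstdev


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def signature(A, B, K, x0, N):
    """Per-component mean and standard deviation of N model predictions."""
    seq, eta = [], list(x0)
    for _ in range(N):
        nu = mat_vec(K, eta)  # assumed prediction control law nu = K * eta
        eta = [a + b for a, b in zip(mat_vec(A, eta), mat_vec(B, nu))]
        seq.append(eta)
    cols = list(zip(*seq))
    return [fmean(c) for c in cols], [pstdev(c) for c in cols]


def additive_attack(x_s, x_ad):
    """Received state = sent state + injected term."""
    return [a + b for a, b in zip(x_s, x_ad)]


def scaling_attack(x_s, lam):
    """Received state = scaling parameter * sent state."""
    return [lam * a for a in x_s]


def replay_attack(recorded, i):
    """Received state = state sent i steps ago (recorded[-i])."""
    return list(recorded[-i])


def is_trusted(sig_sent, sig_calc, th_mu, th_sigma):
    """Accept the packet only if both the mean and the std-dev errors are within thresholds."""
    (mu_e, sd_e), (mu_d, sd_d) = sig_sent, sig_calc
    mean_ok = all(abs(a - b) <= th_mu for a, b in zip(mu_e, mu_d))
    std_ok = all(abs(a - b) <= th_sigma for a, b in zip(sd_e, sd_d))
    return mean_ok and std_ok


# Hypothetical toy model, horizon, and thresholds.
A = [[1.0, 0.1], [0.0, 0.9]]
B = [[0.0], [0.1]]
K = [[-1.0, -1.0]]
N, TH_MU, TH_SIGMA = 6, 1e-3, 1e-4

x_sent = [0.02, -0.01]
sig_sent = signature(A, B, K, x_sent, N)        # signature embedded by the RTU
recorded = [[0.05, 0.0], [0.03, -0.02]]         # older packets seen by the attacker

received = {
    "clean": list(x_sent),
    "additive": additive_attack(x_sent, [0.01, 0.0]),
    "scaling": scaling_attack(x_sent, 1.5),
    "replay": replay_attack(recorded, 1),
}
results = {
    name: is_trusted(sig_sent, signature(A, B, K, x_r, N), TH_MU, TH_SIGMA)
    for name, x_r in received.items()
}
print(results)
```

In this toy setting the clean packet passes the check and the three modified packets are rejected, because the attacker changes the state but cannot produce a signature consistent with the model predictions from the changed state.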

B. MODEL-BASED RESILIENT LOAD FREQUENCY CONTROL
When an attack is detected, the received packet is not trusted and is discarded. Consequently, the system runs in open loop if no additional action is taken. In this section, a model-based resilient load frequency control is proposed. The model-based controller estimates the control output based on the system model embedded in the controller and resets the estimation error to zero when the feedback is updated. Therefore, the main task of the model-based scheme is to determine the maximum interval between two consecutive feedback updates that secures the stability of the power system. At the time instants {t_0, t_1, ···, t_N}, the measurement sent from the RTU is considered as successfully received, with no attack detected by the controller. The time interval h_i = t_i − t_{i−1}, i ≥ 1, is time-varying due to the arbitrary attack actions. During the time interval n ∈ [t_k, t_{k+1}), the dynamics of the power system are described by equation (9). The model-based controller is given by (19), and the state update for the controller is based on the approximate system model (20). As can be observed from equation (19), the key part of this control method is that the estimation error is reset to zero at n = t_k [36]. By finding the maximum time interval for resetting the estimation error, the stability of the power system can be guaranteed. The asymptotic stability of the power system with the proposed model-based resilient control approach against data integrity attack is analyzed next.
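The idea of running the controller on its internal model while packets are discarded, and resetting the model state when trusted feedback returns, can be illustrated on a scalar toy plant. Everything numeric here is an assumption made for the sketch: the model parameters `a_hat`, `b_hat`, the gain `K`, the 3% mismatch, the directly measured state (no observer), and the zero-input behavior used for the open-loop comparison.

```python
def simulate(a, b, a_hat, b_hat, K, x0, steps, attacked, use_model):
    """Simulate x(n+1) = a x(n) + b u(n) with packets discarded at the steps in `attacked`."""
    x, x_bar = x0, x0
    traj = []
    for n in range(steps):
        if n not in attacked:
            x_bar = x          # trusted feedback: reset the model state (error -> 0)
            u = K * x_bar
        elif use_model:
            u = K * x_bar      # packet discarded: act on the model state instead
        else:
            u = 0.0            # packet discarded and no compensation (open loop)
        traj.append(x)
        x = a * x + b * u                  # true plant, with model mismatch
        x_bar = a_hat * x_bar + b_hat * u  # controller's approximate model
    return traj


# Hypothetical scalar plant and model with a 3% parameter mismatch.
a_hat, b_hat, K, gamma = 1.02, 0.1, -2.0, 0.03
a, b = a_hat * (1 + gamma), b_hat * (1 + gamma)
attack_steps = set(range(5, 15))  # e.g. 0.25 s to 0.75 s with a 0.05 s sampling period

no_attack = simulate(a, b, a_hat, b_hat, K, 1.0, 40, set(), True)
open_loop = simulate(a, b, a_hat, b_hat, K, 1.0, 40, attack_steps, False)
with_model = simulate(a, b, a_hat, b_hat, K, 1.0, 40, attack_steps, True)

print(abs(no_attack[15]), abs(with_model[15]), abs(open_loop[15]))
```

At the end of the attack window the model-based run stays much closer to the attack-free response than the uncompensated run, although the mismatch between plant and model still lets an error build up while no feedback arrives.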
Theorem 1: The system (10) with the model-based resilient control (19) and (20) is asymptotically stable if, for all h ∈ [h_min, h_max], all the eigenvalues of the closed-loop transition matrix over an update interval of length h, denoted here by Φ_h, are inside the unit circle.
Proof: During the interval n ∈ [t_k, t_{k+1}), the norm of the system response is bounded by the product of two terms, expressed through σ_max(*), the maximum singular value of the matrix *. The first term, which accounts for the response inside the current interval, is bounded because h_k is bounded by its maximum value: for all h ∈ [h_min, h_max], it is bounded by a constant k_1 defined as the hth power of a maximum singular value. The second term, the norm of the product of the transition matrices over the past intervals h_k, ···, h_1, is bounded if all the eigenvalues of Φ_h are inside the unit circle, with a bound expressed through positive constants k_2 and α. Then, for all h ∈ [h_min, h_max], the system response is bounded by the combination of these two bounds. According to Definition 1, this system is asymptotically stable.
Remark 1: More conveniently, the following matrix inequality can be used to obtain the largest data integrity attack interval h_max below which the asymptotic stability of the system (10) with the model-based resilient controller can be guaranteed: if there exists a positive-definite symmetric matrix P such that the matrix inequality
Φ_h^T P Φ_h − P < 0 (34)
holds, then the system (10) is asymptotically stable. For increasing values of h_i, the linear matrix inequality (LMI) (34) is solved repeatedly until no feasible solution can be found. The largest value of h_i for which (34) is feasible is then its maximum.
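The search for the largest admissible update interval can be sketched on a scalar toy plant. Note the substitution: instead of solving the LMI (34), this sketch checks directly that the state magnitude contracts over one interval, which is an equivalent test only in the scalar case. The plant, model, gain, and mismatch parameterization (a = â(1 + γ), b = b̂(1 + γ)) are hypothetical.

```python
def interval_gain(a, b, a_hat, b_hat, K, h):
    """|x(t_k + h) / x(t_k)| when the model state is reset at t_k and no feedback arrives for h steps."""
    x, x_bar = 1.0, 1.0
    for _ in range(h):
        u = K * x_bar
        x = a * x + b * u                  # true plant
        x_bar = a_hat * x_bar + b_hat * u  # controller's model
    return abs(x)


def max_interval(a, b, a_hat, b_hat, K, h_limit=500):
    """Largest h such that every interval length 1..h contracts the state."""
    h_max = 0
    for h in range(1, h_limit + 1):
        if interval_gain(a, b, a_hat, b_hat, K, h) >= 1.0:
            break
        h_max = h
    return h_max


def h_max_for(gamma, a_hat=1.02, b_hat=0.1, K=-2.0):
    """Hypothetical mismatch model: true parameters are (1 + gamma) times the model's."""
    return max_interval(a_hat * (1 + gamma), b_hat * (1 + gamma), a_hat, b_hat, K)


h_small = h_max_for(0.03)
h_large = h_max_for(0.10)
print(h_small, h_large)
```

In this toy example the admissible interval shrinks as the mismatch parameter grows, which mirrors the qualitative trend reported for the two-area system in the case studies.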

IV. CASE STUDIES
In this work, the two-area power system model shown in Fig. 3 is used to evaluate the impacts of various types of data integrity attacks. This two-area power system is extensively used in the LFC studies, such as [15] and [37]. Also, the effectiveness of the proposed model-based attack detection approach is verified by comparing statistics of the state predictions with and without data integrity attacks. Then, the stability of the proposed model-based resilient control for LFC is analyzed with respect to model errors and the maximum allowable time interval (MATI) h max . In the end, the performance of the model-based resilient control is evaluated through comparisons.

A. SYSTEM MODEL AND LFC WITH OBSERVER-BASED OUTPUT FEEDBACK
The parameters of the power system tested in [15] are used here. With the state defined as x = [ΔP_tie, Δf_1, ΔP_m,1, ΔP_v,1, Δf_2, ΔP_m,2, ΔP_v,2]^T and the input as u = [ΔP_c,1, ΔP_c,2]^T, the approximate sampled system (the sampling period is h = 0.05 s) is described in the form of system model (9). For the system parameters and corresponding system matrices, readers can refer to [15]. The exact power system model takes the form (10), where, with an uncertain parameter γ, the modeling uncertainties are considered as ΔA = γA, ΔB = γB, and ΔC = γC. In this power system, only part of the state is directly measured by the RTU, namely the area frequencies and tie-line power (Δf_1, Δf_2, and ΔP_tie). As this power system is found to be observable by checking its observability matrix, a controller gain K and an observer gain L are designed to stabilize it.

B. IMPACTS OF DATA INTEGRITY ATTACKS
To study the impact of the data integrity attack on the LFC of the two-area power system, the three types of data integrity attacks modeled in Section III are investigated: the additive attack, the scaling attack, and the replay attack.

1) ADDITIVE ATTACK
For the additive attack, several cases in which sinusoidal signals of different amplitudes are added to the transmitted data are tested, and the results are shown in Fig. 4 and Fig. 5. Compared to the normal situation, these dynamics show that both the tie-line power and the frequency deviate significantly from their nominal values when the power system is under additive data attack. The deviation amplitudes of the tie-line power and frequency dynamics become larger as the amplitude of the added sinusoidal signal is increased. The dashed lines are the thresholds of these dynamics, which are usually predefined in practice. In this work, ±0.1 p.u. and ±0.2 p.u. are the thresholds considered for the tie-line power and frequency deviations, respectively. The results in Fig. 4 and Fig. 5 show that these thresholds can easily be reached when the data integrity attacker injects extra misleading data. In addition, the results reveal that the convergence speed of the state estimated by the observer in the RTU is severely affected.

2) SCALING ATTACK
For the scaling attack, several cases with different scaling parameters are considered. Fig. 6 and Fig. 7 give the dynamics of the actual and estimated tie-line power change and the frequency deviation of the 1st area when the power system is under scaling attack. The results show that both the tie-line power and the frequency deviate from their nominal values to a large extent when the scaling data attack is launched. The deviation amplitudes of the tie-line power and frequency dynamics are larger as the scaling parameter increases. However, compared to the additive attack, the deviation amplitudes of the tie-line power and system frequency are relatively smaller.

3) REPLAY ATTACK
In this work, the following two cases are considered to understand the impact of the replay attack.
• Case 9: the LFC is running without any attack;
• Case 10: a replay attack R_A records the transmitted states for the interval kT ∈ [0.25, 0.75) seconds and replays these states during the interval kT ∈ [1.15, 3.35) seconds.
Fig. 8 and Fig. 9 show the dynamics of the actual and estimated tie-line power change and the system frequency deviations when the replay attack is considered. Similarly, these dynamics show that the tie-line power and frequency of the two-area power system deviate noticeably from their nominal values when the attacker conducts a replay attack.
General Comments on LFC Data Integrity Attacks: In the practice of LFC in power systems, when the power system frequency deviates from the nominal value (for example, 60 Hz in North America) by a specific threshold (for example, 1.5 Hz [34]), under-frequency or over-frequency protection relays are triggered and a predefined load shedding mechanism is performed. The simulation results shown in Fig. 4 to Fig. 9 verify that, by modifying the data packets transmitted in the network between the LFC control center and the RTU, the attacker can mislead the decisions made by the control center and cause malfunctions of the power system, such as triggering frequency protection relays, which results in unnecessary load shedding and severely affects the normal operation of industry and daily life. Similarly, under data integrity attack, if the tie-line power change goes over or under a specific limit [18], tie-line protection relays are also activated and cause blackouts in the related areas.

C. MODEL-BASED DATA INTEGRITY ATTACK DETECTION
To evaluate the effectiveness of the model-based attack detection, the statistics of the predictions for the future N time instants under a variety of data integrity attacks are calculated and compared with the statistics obtained by the security check unit on the controller. In this work, the statistics of the predictions mainly include the mean µ and standard deviation σ of the N predictions. For the consistency of the case studies, the above three data integrity attacks are considered to be launched at t = 0.25 second, that is, at the 5th time step (the sampling period is T = 0.05 second). Given N = 6, the mean µ_ΔPtie and standard deviation σ_ΔPtie of the tie-line power of the two-area power system under attack are calculated by the controller and compared with the case without attack (µ_E and σ_E are the encrypted mean and standard deviation of the states predicted by the RTU). The results are shown in Table 2. Given N = 30, the mean µ_ΔPtie and standard deviation σ_ΔPtie of the tie-line power under attack are calculated by the controller and compared with the case without attack (µ_E and σ_E calculated by the RTU). The results are shown in Table 3. Based on the hypothesis (18), the corruption of data can be identified by testing the hypothesis with the predefined thresholds Th_µ and Th_σ for the mean and standard deviation of the tie-line power. For example, based on Table 2, we could define the thresholds for the mean and standard deviation as Th_µ = 0.001 and Th_σ = 0.0001. The difference between the encrypted statistics (the mean and standard deviation) and those calculated from the corrupted data becomes more obvious as the length of the state prediction sequence increases, as indicated by Table 3, where we could define even larger thresholds of Th_µ = 0.01 and Th_σ = 0.001. The attacks are exposed because the dynamics of the power system are not known to the attacker, although the transmitted data can be modified by the attacker.
This illustrates that exploiting the dynamics of the power system, from a control-system perspective, is a promising way to secure the power system.

D. MODEL-BASED RESILIENT CONTROL
Once the data integrity attack on a packet is detected at the controller side, the packet is discarded. Consequently, the LFC runs in open-loop mode. The model-based resilient control method given by equations (19) and (20) is designed for the two-area power system. By using Theorem 1, the maximum transfer interval h_max within which the two-area power system remains stable under the model-based resilient control is calculated for different extents of model uncertainty. With the sampling period T = 0.05 second, the relationship between h_max and the modeling uncertainty parameter γ is shown in Table 3. As shown in Table 3, the maximum allowable transfer interval decreases as the model uncertainty grows. The reason is that the error of the state estimated with the approximate model is largely determined by the model uncertainty.
With the model uncertainty set to γ = 0.03, as in the above studies, the performance of the proposed model-based resilient control is evaluated by comparing the following cases: (i) the LFC is running without any attack; (ii) data integrity attacks launched during the intervals kT ∈ [0.25, 0.75) seconds and kT ∈ [1.15, 3.35) seconds are detected and the packets are discarded; (iii) the corrupted packets are discarded and the proposed model-based resilient control is used. The results are shown in Fig. 10 and Fig. 11. It can be noted that the system could become unstable if the attacked packets are discarded over a long time interval. With the designed model-based control method, the performance and stability of the power system are significantly enhanced.

V. CONCLUSION AND FUTURE WORK
In this work, a model-based digital signature creation method is studied for data integrity attack detection in the load frequency control of a smart grid. The proposed model-based detection method uses the statistics of the future power system dynamics as dynamic hashes to sign and verify the message. Moreover, to compensate for the possible degradation and even instability of the power system caused by the data integrity attack, a model-based resilient controller is designed for the LFC. Case studies of a two-area power system under three types of integrity attacks show that data integrity attacks can cause severe malfunction of the power system by injecting additional data. The results also verify the capability of the model-based digital signature attack detection scheme to identify a small modification of the communicated data. The proposed model-based resilient LFC is also effective in compensating for the performance degradation of the smart grid.
In the current work, public key encryption is considered. In the future, key encryption algorithms that could be more robust against man-in-the-middle (MITM) attacks will be interesting and important to investigate. Furthermore, co-simulation platforms will be developed for cyber-physical power systems.