A Bi-Level Differential Game-Based Load Frequency Control With Cyber-Physical Security

Load frequency control (LFC) is used in power systems to prevent frequency fluctuations caused by load disturbances and maintain power supply reliability. LFC utilizes communication channels to generate control signals, thus it is potentially vulnerable to cyber-attacks and faults. This work considers a cyber-physical model for LFC in which the adversary compromises the resources of the cyber layer to inject a stealthy false data injection attack (FDIA) vector. The FDIA injects the best-effort stealthy error into the data collected by the LFC, corrupting the control center’s calculations and leading to incorrect control signals. To effectively manage this complex decision-making scenario, a game theory-based framework is established to analyze the interaction between the controller and the attacker. Based on the model, an FDIA defense mechanism based on a bi-level differential game is proposed. The experiments conducted on a three-region interconnected power system based on the IEEE 39-bus system demonstrate that the proposed strategy can effectively maintain the stability of the frequency and inter-regional power deviation within acceptable limits, even in the presence of FDIAs.

Saptarshi Ghosh, Member, IEEE, and Charalambos Konstantinou, Senior Member, IEEE
Index Terms: Load frequency control, cyber-physical modeling, game theory, false data injection attacks.

I. INTRODUCTION
Load frequency control (LFC) is used in power systems to maintain the balance between generation and load. LFC uses feedback control loops based on the deviation of the frequency from its nominal value to control the generated output. The design of LFC is a trade-off between stability and dynamic response, with the aim of ensuring a quick response to changes in demand while also maintaining system stability [1]. The objective of LFC is to ensure zero steady-state error for frequency deviations and to minimize unscheduled tie-line power flows between neighbouring control areas. This is achieved by effectively following changes in load demand and disturbances, resulting in limited overshoot and rapid stabilization of frequency and tie-line power deviations. The maximum permitted deviation depends on the system's characteristics and requirements, such as size, complexity, load demand patterns, available generation capacity, and interconnections. Typically, it is limited to a small range around the nominal frequency, such as 49.5-50.5 Hz for a 50 Hz nominal frequency.
LFC typically uses communication signals to transmit information between the various control centers, generation units, and loads in the system, such as the deviation of the frequency from its nominal value, the power generation output, and the load demand. These signals can be transmitted using various technologies, such as SCADA systems. As a result, LFC signals can be vulnerable to various cyberattacks, such as deception attacks and denial-of-service (DoS) attacks. In deception attacks, or false data injection attacks (FDIAs), the attacker injects false data, which can result in incorrect control actions and potentially cause significant system problems [2], [3]. In the case of LFC, FDIAs can result in incorrect frequency control actions and potentially lead to power system instability or blackouts [4]. The effect of DoS attacks on the LFC has been studied in [5]. Although the work in [5] exploits the cyber layer vulnerabilities by considering the effect of the frequency and duration of DoS attacks on the LFC, more potent attacks can be designed by also exploiting the vulnerabilities of the physical layer. In this work, the attacker exploits both the cyber and physical layers to design the attack vector, with the aim of causing maximum damage to the physical layer without being detected.
LFC of a multiple control area power system involves multiple decision-makers. Consequently, the coupling among the controllers of these control areas is governed by different decision-makers. This complicates the construction of control strategies, in addition to the evolution dynamics that serve as dynamical constraints. Meanwhile, power systems operate in the presence of disturbances [6]. Disturbances caused by the deviation of loads from their forecasted values have been considered in [7]. However, unlike the existing literature, this work takes the effect of the disturbances into consideration during the design of both the control and attack strategies.
In the existing literature, the two main attack objectives concerning LFC target the frequency and tie-line interchange power measurements. In [8], the authors modeled the interaction between the attacker and the defender, considering a time-independent attack on the tie-line power measurements.

© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

The work in [8] concluded that frequency stability is more vulnerable to cyber-attacks compared to power imbalances, as falsification of frequency measurements can be easily detected through comparisons with normal readings. Hence, only cyber-attacks on tie-line interchange measurements by an attacker are considered. The LFC commands are also vulnerable to FDIA [9], [10] while they are distributed to the generating units over the cyber layer, with the aim of reducing power imbalances and ensuring frequency stability. In [9], the authors examined a two-area power system and evaluated its performance in the scenario where an adversary has gained unauthorized access to the automatic generation control (AGC) signal of one of the two areas and optimizes the plan for the worst attack pattern. The attacker interrupts the correct control signals and injects false data to steer the system to unstable frequency deviation values. The work aimed to identify the worst attack pattern by evaluating its potential effect on the two-area system.

An optimal controller that offers useful feedback gain is the linear-quadratic-Gaussian (LQG) control [11]. The LQG controller can minimize the cost when it is used for frequency stabilization, and it can provide stable performance under system noise and uncertainty. The authors in [10] analyzed an infinite horizon LQG system, in which the control inputs transmitted over cyber links are vulnerable to manipulation and FDIAs. In contrast, in [12], the authors investigated the LQG control problem under jamming attacks on the signals from the controller to the plants, i.e., measurement data. A linear quadratic regulator (LQR) is similar to LQG, but the performance of the former deviates in the presence of system noise [13]. This drawback of LQR problems is addressed in [14], [15] using risk-aware estimation and control. In addition to system noises, the works in [10] and [12] concentrated on devising control mechanisms under worst-case FDIA.

Therefore, it is evident from the aforementioned research that most of the attention has been given to the secure estimation and control problem of LFC in the physical layer, either from the defender's or the attacker's perspective. On the other hand, there is literature like [16], which proposes a dynamic modeling framework for a closed-loop system that is capable of intrusion detection in the cyber layer. The authors of [17] have investigated the issue of stable operation of a cyber-physical system under multiple DoS attackers from the perspective of both the attacker and the controller. In the cyber layer, multiple DoS attackers cooperate with each other to compromise an optimal number of measurements, while the controller and the attacker compete with each other in the physical layer. However, the interaction between the attacker and the defender in the cyber layer is static in nature. In practice, the attack on the cyber layer is a dynamic process [16]. To the best of the authors' knowledge, the effect of the dynamics of intrusion, infection, and recovery of the cyber layer nodes has not been considered when investigating FDIAs on the LFC.

Because it is well suited to analyzing the interaction among multiple decision-makers, game theory has gained increasing attention in the existing literature on LFC. Several game theory models, namely Stackelberg [18], evolutionary [19], stochastic [20], and Markov game [21] models, have been used in the literature to ensure robust LFC. The authors in [22], [23] have used a zero-sum differential game model to design robust optimal control schemes under stochastic uncertainty. These works have mainly considered uncertainties due to load forecasting errors. In previous studies, various differential game models, namely cooperative [24] and non-cooperative [22], [23], [25], [26], have been applied for LFC. In non-cooperative differential games, the control signals are generated locally in each control area. Hence, non-cooperative differential games are more suitable than cooperative differential games for systems that are vulnerable to FDIA on control signals. It is evident from the existing literature that the effect of an FDIA on the equilibrium solution of differential game-based LFC is still an open problem. Further, none of the aforementioned works considered the effect of the cyber layer. The issue of security breaches in the cyber layer and its effect on the physical process is addressed in [17], [27] using game-based analysis. However, the cyber game and the physical game in the aforementioned works are played in different time scales. Although the cyber and physical layer resources are closely related, their treatment in the aforementioned works is different. It is important to note that both the attacker and the defender face resource limitations in both cyber and physical layers, as the defender has to protect against malicious actions while the attacker has to conserve energy. Therefore, there is a need to solve this problem in a comprehensive framework that involves both the defender and the adversary. The dynamic nature of launching FDIAs on the LFC by compromising the resources in the cyber layer is still an open problem.
This article addresses the research gaps highlighted above by considering a bi-level differential game between an attacker targeting LFC and a defender. The payoff of the attacker and defender considers the cost due to deviation of the state variables, implementation of control signals, and cumulative expected predictive variance. The aim of the defender/attacker is to minimize/maximize the aforementioned cost. In order to launch FDIA on the LFC, the attacker needs to gain access to the cyber layer nodes that are collecting and transmitting the measurements. The interaction between the attacker and the defender in the cyber layer is also modeled as a differential game. This constitutes an interconnected bi-level differential game framework that models the effect of the cyber layer vulnerabilities on the LFC. The novel contributions of this work are as follows:

(1) By modeling the LFC system as a cyber-physical system, a novel FDIA model is proposed that considers both the characteristics of the LFC system and the vulnerabilities of the cyber layer used for transmitting measurements. The closed-form expression for the cyber layer vulnerabilities derived in this work reveals the exact dependence of the cyber layer resource allocation on the dynamic control strategies of the physical layer.

(2) A bi-level non-cooperative differential game is proposed to solve the interaction model between the controller and the attacker in both the cyber and physical layers. By locally generating the control signals, the proposed game-based approach reduces the risk of FDIA on control signals. In the physical game, the robust feedback Nash equilibrium is constructed by considering the probability of successful FDIA and load forecasting errors simultaneously.

(3) The design of a bi-level differential game-based LFC for a power system whose evolution is controlled by chance-constrained FDIA is solved, and its performance is evaluated for the first time in this work. The quadratic equivalent of the chance constraint for FDIA, derived in Lemma 2, reveals a novel method of designing attacks with limited system information that can cause maximum damage to differential game-based LFC in terms of the control cost.

The rest of the paper is organized as follows. Section II presents the system and threat model of the examined problem. Section III introduces the bi-level game framework followed by the solution approach. In Section IV, results are presented, and Section V concludes the work.

II. SYSTEM AND THREAT MODEL
In a multi-area power system, there are typically $k$ control areas, denoted as CA$_i$, where $i = 1, \ldots, k$ and $k \in \mathbb{Z}^{+}$, and each area has $m \in \mathbb{Z}^{+}$ generators. The stability of the system frequency is crucial and primarily dependent on maintaining a balance between the load and the active power output of the generators. To mitigate any imbalance, the active power output of generators must be adjusted in real time to align with the load. Failure to do so will cause a change in generator speed, leading to a change in the system frequency, which can impact the frequency of not only the current area but also its interconnected areas. For this reason, generators in such an interconnected system are supported with LFC.¹ A simplified two-area system with LFC is presented in Fig. 1.

¹ Generators are also equipped with an automatic voltage regulator (AVR). The time constant of AVR is faster than LFC, allowing for rapid transient damping. Thus, LFC and AVR control loops can be analyzed independently.

The objective of LFC is to maintain frequency stability within each control area and regulate the power exchange between areas. The LFC of area $i$ is characterized by the frequency measurement of generator $j$ in area $i$, denoted as $f_{i,j}$, where $j = 1, \ldots, m$, and the interchange of power of tie-line $s$, $P_{\mathrm{tie},is}$. In a multi-area LFC scheme, all the generators in each control area are represented as an equivalent generation unit whose frequency measurement is denoted by $f_i$. The aim of LFC is achieved by incorporating the area control error (ACE) collected from distributed sensors into the frequency feedback loop. ACE is a linear combination of the frequency deviation of a given control area $i$ (CA$_i$), $\Delta f_i$, and the deviation of tie-line power, $\Delta P_{\mathrm{tie},ij}$, between that area and every other area $j \neq i$:

$$\mathrm{ACE}_i = \beta_i \, \Delta f_i + \sum_{j \neq i} \Delta P_{\mathrm{tie},ij},$$

where $\Delta P_{\mathrm{tie}}$ is the deviation of tie-line power and $\beta_i$ is the frequency bias factor. The requested deviation of the generator output of CA$_i$, $\Delta P_{ci}$, is obtained from a PI controller with ACE$_i$ as input. The control command to CA$_i$, $u_i$, is the request to adjust the speed, obtained by differentiating $\Delta P_{ci}$.
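The ACE computation and the PI stage described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the gains, the negative-feedback sign convention, the sampling step, and all numerical values are assumptions chosen for the example, and the derivative that yields the control command is approximated by a backward difference.

```python
from dataclasses import dataclass


def area_control_error(beta: float, delta_f: float, delta_p_tie: list[float]) -> float:
    """ACE_i = beta_i * delta_f_i + sum of tie-line power deviations to the other areas."""
    return beta * delta_f + sum(delta_p_tie)


@dataclass
class PIController:
    """Discrete PI stage driven by ACE (gains and sign convention are assumptions)."""
    kp: float
    ki: float
    dt: float
    integral: float = 0.0
    previous_request: float = 0.0

    def step(self, ace: float) -> tuple[float, float]:
        """Return (requested output deviation, control command u) for one sample."""
        self.integral += ace * self.dt
        request = -(self.kp * ace + self.ki * self.integral)
        # u approximates the time derivative of the requested deviation.
        u = (request - self.previous_request) / self.dt
        self.previous_request = request
        return request, u


# Hypothetical per-unit values: area 1 is tied to areas 2 and 3.
ace_1 = area_control_error(beta=20.0, delta_f=-0.01, delta_p_tie=[0.03, -0.01])
controller = PIController(kp=0.1, ki=0.3, dt=1 / 60)
request, u = controller.step(ace_1)
```

A negative ACE (under-frequency) produces a positive requested deviation, i.e., a request for more generation.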
The data transmission rate of grid metering devices, such as phasor measurement units (PMUs), is usually 30-120 samples per second. Thus, a discrete-time state-space model is used to represent the system. The state vector is considered as follows:

The control signal sent by the LFC center to CA$_i$ to change the generator output is denoted by $u_i(t)$. The perturbations of loads and intermittent energy output of CA$_i$ obtained by forecasting are denoted as $P_{di}$. The control signals and the perturbations are compactly represented as:

The output signal is:

In this work, we consider that false data can be injected into the frequency and power measurements by compromising the meters in the cyber layer that aggregate the sensor data [2]. Consequently, the erroneous $f_i$, $f_j$, and $P_{\mathrm{tie},ij}$ result in the attack vector $a(t)$. In [28], the designed attack considers two sets of measurements, one that can be compromised and another that cannot. The division of the measurements into these sets is predetermined. In [29], the authors considered a random model for the DoS attacks with a constraint on its duration. However, the DoS attack in the cyber layer is independent of the physical layer parameters. This work addresses resource allocation in the cyber layer, considering its effect on the LFC. Fig. 2 depicts the dependence between the cyber and physical layer nodes considered in this work. $\eta_m(t)$ denotes whether $y_m(t)$ is corrupted or not:

For the considered system:

where $C_m$ denotes the $m$th row of:

The compromise, FDIA, and detection of these meters in the cyber layer from such malicious actions are considered periodic activities. The set of cyber layer resources available for accomplishing the above activities is denoted as $R_a$ for the attacker and $R_d$ for the defender. The resources can be financial, communication bandwidth, computational power and memory, etc. [30], [31]. Such resources are continuous and limited for both [31], [32]. Hence, the attacker and the defender must distribute their resources across a limited number of measurements $M \in \mathbb{R}$ simultaneously. Let these allocations be denoted as $r_d = (r_{d1}, \ldots, r_{dM})$ and $r_a = (r_{a1}, \ldots, r_{aM})$. The probability of a successful attack or defence depends on the resources allocated in the cyber layer.

The first step in recovering the compromised nodes is detecting the FDIA. The LFC scheme raises the alarm if deviations of the states are beyond predetermined threshold values. In this regard, the residual vector of CA$_i$ after implementing the FDIA at time $t + 1$ is given by (9). If the Euclidean norm of the residual vector in (9) satisfies $\|r_a(t)\|_2 > \tau$, the data is considered under attack; otherwise, the data is considered normal. For the attack vector to be stealthy, the vector should satisfy the equation $a = \eta C H c$ following the DC power flow model, where $H$ is a measurement Jacobian matrix and $c$ denotes the error introduced by the attacker. In the literature, works such as [28] consider a fixed $H$ to design the attack vector. In this paper, we design the attack vector $a$ in the physical layer based on a probabilistic $H$ obtained from the cyber layer game between the attacker and the defender.
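The reason an attack aligned with the measurement Jacobian evades a residual test can be illustrated with a small static example. This is a sketch under simplifying assumptions, not the detector of (9): it uses a single unknown state, a least-squares estimator as a stand-in for the LFC state estimate, the simplified attack $a = Hc$ (dropping $\eta$ and $C$), and hypothetical values for `h`, `z`, and `tau`.

```python
import math


def estimate_state(h: list[float], z: list[float]) -> float:
    """Least-squares estimate of one state x from measurements z = h * x + noise."""
    return sum(hi * zi for hi, zi in zip(h, z)) / sum(hi * hi for hi in h)


def residual_norm(h: list[float], z: list[float]) -> float:
    """Euclidean norm of the measurement residual z - h * x_hat."""
    x_hat = estimate_state(h, z)
    return math.sqrt(sum((zi - hi * x_hat) ** 2 for hi, zi in zip(h, z)))


h = [1.0, 2.0, -1.0]      # hypothetical measurement Jacobian (one column)
z = [0.31, 0.59, -0.30]   # hypothetical noisy measurements of x = 0.3
tau = 0.05                # hypothetical detection threshold

c = 0.2                                                   # error the attacker wants in the estimate
stealthy = [zi + hi * c for hi, zi in zip(h, z)]          # a = h * c
naive = [zi + ai for zi, ai in zip(z, [0.5, 0.0, 0.0])]   # corrupts a single meter

clean_norm = residual_norm(h, z)
stealthy_norm = residual_norm(h, stealthy)
naive_norm = residual_norm(h, naive)
alarm_on_stealthy = stealthy_norm > tau
alarm_on_naive = naive_norm > tau
```

The stealthy vector shifts the estimate by `c` while leaving the residual unchanged, so only the naive injection crosses the threshold.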
The dynamics of the proposed LFC can be compactly represented as:

where $y_a$ and $y$, as well as $r_a$ and $r$, represent the measurements and residuals with and without attack, respectively.

The definitions of $D_i$, $M_i$, $T_{ti}$, $T_{gi}$, $\sigma_i$, and $X_{\mathrm{tie}}$ can be found in [1]. In differential game-based control schemes, solving for the equilibrium solution generates the control command to each CA$_i$. Linear quadratic differential game (LQDG)-based LFC of interconnected power systems has been considered using the non-cooperative game approach in [26]. In [24], the authors used a cooperative game approach to solve the LQDG. The CAs interact among themselves based on the considered game model. In this work, non-cooperative differential game-based LFC is considered, since enforcing the solution of the cooperative game requires the CAs to follow assigned control commands even when the loads or the energy outputs deviate from the forecasted values. The attacker can also cause output deviation using FDIA.
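The effect of secondary control on the steady-state frequency can be seen in a one-area simulation. The sketch below is an assumption-laden illustration rather than the paper's model: it uses the common textbook governor-turbine-inertia structure, reads `M`, `D`, `T_t`, and `T_g` as inertia, damping, turbine, and governor constants, introduces a droop `R` and an integral gain `ki` that are hypothetical, and integrates with forward Euler at 60 samples per second.

```python
def simulate_area(ki: float, load_step: float = 0.2,
                  t_end: float = 60.0, dt: float = 1 / 60) -> list[float]:
    """Frequency deviation trace of one area after a step increase in load."""
    # Hypothetical per-unit parameters.
    M, D, T_t, T_g, R = 10.0, 0.8, 0.5, 0.2, 0.05
    beta = D + 1.0 / R                 # frequency bias used in the ACE
    df = dpm = dpg = integral = 0.0    # frequency, turbine, governor, ACE integral
    trace = []
    for _ in range(round(t_end / dt)):
        ace = beta * df                # single area: no tie-line term
        integral += ace * dt
        dpc = -ki * integral           # supplementary (secondary) control request
        ddf = (dpm - load_step - D * df) / M
        ddpm = (dpg - dpm) / T_t
        ddpg = (dpc - df / R - dpg) / T_g
        df += dt * ddf
        dpm += dt * ddpm
        dpg += dt * ddpg
        trace.append(df)
    return trace


droop_only = simulate_area(ki=0.0)   # primary control only
with_lfc = simulate_area(ki=0.3)     # integral action on the ACE
droop_offset = -0.2 / (0.8 + 1.0 / 0.05)
```

With `ki = 0` the frequency settles at the droop offset `-load_step / (D + 1/R)`, while integral action on the ACE drives the deviation back toward zero.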
In this work, the attacker chooses $a(t)$ such that the cost incurred by the CAs, $J_i$, is maximized while avoiding detection. To minimize the attack costs, the attacker prioritizes state variables with fewer non-zero elements in Eq. (10). To achieve this, the attacker uses a two-stage plan. First, the attacker targets the sensors by compromising the security solution of the defender. Due to resource constraints, the attacker can compromise only a limited number of meters in the cyber layer. Next, using the compromised sensors, the attacker manipulates the sensor data through an FDIA to increase the state error in LFC.
Remark 1: The final aim of the attacker is to inject false data into the measurements in order to maximize the control cost of the controller. However, to insert the false data, the attacker needs to have access to the network over which the measurement data is being transmitted. Hence, the attacker must gain access to the cyber layer resources before injecting false data. This is the reason physical layer decisions come after the network layer.

III. BI-LEVEL DIFFERENTIAL GAME FRAMEWORK
An overview of the problem formulated in this section is given in Fig. 2. It can be observed from the figure that the measurements of the CAs are collected by the sensors and exchanged over the network. The controller and the attacker interact in the cyber layer to allocate the available resources to secure and attack the measurements of the CAs, respectively. The objective of the controller/attacker is to secure/compromise the measurements. This interaction is modeled as a non-cooperative differential game (highlighted in Fig. 2 with a blue outline) that results in an equilibrium rate at which the cyber layer nodes are successfully compromised/defended. Next, using the cyber layer resources, the measurements of the CAs are transmitted in the presence of an attacker to estimate the state of the power system and generate control signals. The attacker injects false data into the measurements using the compromised cyber layer nodes while remaining undetected. The objective of the CAs is to generate control signals that minimize the deviations of the state variables from their desired set points while minimizing the cost of implementing control signals. In contrast, the attacker aims to maximize the deviations of the state variables. This interaction between the CAs and the attacker in the physical layer is modeled as a non-cooperative differential game (highlighted using a red outline in Fig. 2). The non-cooperative differential game-based decision-making processes in the cyber and the physical layers are interconnected by the probability that the cyber layer nodes are compromised, resulting in the proposed bi-level differential game model, depicted using the black dotted line in Fig. 2.

A. Controller Design Using LQDG
The cost incurred by the LFC for the deviations of the state variables from their desired set values and for implementing control signals in a multiple control area power system is denoted by $J_i$. FDIAs on the measurements and errors in load forecasting can trigger deviations of the state variables. The mean cost incurred by the $i$th CA is given by (15), where $Q_i$ is a positive definite weight matrix that determines the penalty associated with the deviation in ACE$_i$ and ACE$_i$, which in turn depends on frequency and tie-line power deviations. The control cost of CA$_i$ is represented by a positive definite weight matrix $R_i$. CA$_i$ specifies the functioning of its LFC by setting $Q_i$ and $R_i$. Further, it can be noted from (15) that the cost incurred by CA$_i$ depends on the control signals ($u_{-i}$) and control cost ($R_{-i}$) of the other CAs. The cumulative expected predictive variance of the state cost, i.e., $y^{T}(t) Q_i y(t)$, is used as the risk measure in (16), where the $\sigma$-algebra generated by all the observations up to $t - 1$ is denoted by $\mathcal{F}_{t-1}$. The predictive variance in (16) incorporates information about the tail and skewness of the penalty. Equation (16) enables the LFC controller of the CAs to take higher-order statistics of the disturbance into account, mitigating the effect of inadequately designed FDIA and rare though large noise values. Further, the constraint (9) on the design of the attack vector acts as a path constraint, i.e., the constraint applies at intermediate points or over the whole path. The path constraint in (9) is also a global constraint that applies to all the CAs that find their control signals by solving the optimization problem in (17)-(20). Due to the uncertainty associated with the attack-defence process and the process noise, the performance metric is designed to minimize both the expected cost and the risk. This work aims to obtain $u_i(t)$, $\forall i \in K$, that minimizes the operating cost of the respective control areas for an optimized $a(t)$ that maximizes the operating cost. The attacker aims to increase the cost of the CAs while remaining undetected, by designing attacks that keep the residual in (9) below the threshold $\tau$. The operating cost can be increased by maximizing the payoff function in (15) and (16) through an appropriate choice of the attack $a(t)$. Hence, CA$_i$ solves the optimization problem in (17)-(20). The expectation operation in (17), (18), and (20) is over $\eta$ and $P_d$. $0 \leq \kappa_i \leq 1$ denotes the trade-off between the risk and the average loss. The parameter $\kappa_i \in \mathbb{R}$ shows the player's attention to risk. Note that if $\kappa_i = 0$, CA$_i$ is understood to be risk neutral. If $\kappa_i > 0$, then CA$_i$ is risk loving, and if $\kappa_i < 0$, then CA$_i$ is risk averse. Next, for a risk-averse CA, we find the quadratic representation of the predictive variance of the state cost in (17).
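Why a variance term captures rare but large disturbances can be seen by comparing two noise models with the same average quadratic cost. The sketch below is only an illustration: the two discrete distributions, the scalar weight `q`, and the additive combination `mean + risk_weight * variance` are assumptions made here, and the exact way (17) combines the two terms, including the sign convention for $\kappa_i$, is not reproduced.

```python
def mean_and_variance(outcomes: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and variance of a discrete random variable given (probability, value) pairs."""
    mean = sum(p * v for p, v in outcomes)
    variance = sum(p * (v - mean) ** 2 for p, v in outcomes)
    return mean, variance


def risk_adjusted_cost(noise: list[tuple[float, float]], q: float, risk_weight: float) -> float:
    """Expected quadratic cost q * w^2 plus a weighted variance of that cost."""
    costs = [(p, q * w * w) for p, w in noise]
    mean, variance = mean_and_variance(costs)
    return mean + risk_weight * variance


# Two hypothetical disturbances with identical second moment E[w^2] = 1.
bounded = [(0.5, -1.0), (0.5, 1.0)]
spiky = [(0.98, 0.0), (0.01, -(50 ** 0.5)), (0.01, 50 ** 0.5)]

neutral_bounded = risk_adjusted_cost(bounded, q=1.0, risk_weight=0.0)
neutral_spiky = risk_adjusted_cost(spiky, q=1.0, risk_weight=0.0)
averse_bounded = risk_adjusted_cost(bounded, q=1.0, risk_weight=0.5)
averse_spiky = risk_adjusted_cost(spiky, q=1.0, risk_weight=0.5)
```

A purely expected-cost criterion cannot tell the two disturbances apart, whereas the variance term penalizes the one with rare large values.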
Lemma 1: The equivalent representation of the risk measure in (17) as a quadratic function is:

The proof of Lemma 1 is discussed in Appendix. For the stationary operation of the proposed LFC, the residual and the attack vector should fall outside of the constraint set $\|r(t) + a(t) - \eta_j(t) C H\|_2 \leq \tau$ with a probability of at most a prescribed level in $(0, 1)$. The chance constraint in (20) is a function of $\eta$, whose distribution will be discussed in the next subsection. (21) can be modified into a lower bound on $\sum_{j=1}^{S} \Pr(\eta_j)\,\chi_j$, where $\chi_j$ denotes a characteristic function that equals 1 if $\|r(t) + a(t) - \eta_j C H\|_2 \leq \tau$ and 0 otherwise. Introducing an artificial binary variable $\zeta \in \{0, 1\}^{S}$ to handle the $\chi_j$ turns (17)-(20) into a mixed-integer non-linear optimization problem. The difficulty of solving such problems is addressed by relaxing $\zeta_j \in \{0, 1\}$ into $\zeta_j \in [0, 1]$. Hence, the equivalent representation of (20) is given in the following lemma.
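The scenario form of the chance constraint amounts to adding up the probabilities of the scenarios whose residual stays within the threshold. The sketch below evaluates that sum and its relaxed counterpart; the scenario probabilities, the residual norms, `tau`, and the `required_level` are hypothetical, and no optimization over the attack vector is attempted.

```python
def chance_constraint_mass(probabilities: list[float],
                           residual_norms: list[float], tau: float) -> float:
    """Sum of Pr(eta_j) * chi_j, with chi_j = 1 when scenario j stays within tau."""
    return sum(p for p, norm in zip(probabilities, residual_norms) if norm <= tau)


def relaxed_mass(probabilities: list[float], zeta: list[float]) -> float:
    """Same sum with the binary indicators relaxed to weights zeta_j in [0, 1]."""
    if any(z < 0.0 or z > 1.0 for z in zeta):
        raise ValueError("each zeta_j must lie in [0, 1]")
    return sum(p * z for p, z in zip(probabilities, zeta))


probabilities = [0.5, 0.3, 0.2]     # hypothetical Pr(eta_j) for S = 3 scenarios
residual_norms = [0.02, 0.04, 0.09]
tau = 0.05
required_level = 0.75               # hypothetical probability level

mass = chance_constraint_mass(probabilities, residual_norms, tau)
satisfied = mass >= required_level
fractional = relaxed_mass(probabilities, [1.0, 1.0, 0.5])
```

The relaxation replaces a combinatorial choice over scenarios with continuous weights, which is what removes the integer variables from the problem.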
Lemma 2: The resulting equivalent representation of the chance constraint in (20) is denoted as:

Lemmas 1 and 2 are critical because they show that the risk and chance constraints can be represented as a quadratic function of the state, control inputs, and attack vector. The optimization problem in (17)-(20) can be solved by considering the variational Lagrangian in the following lemma.

Lemma 3: The objective function (17) and the constraints in (18)-(20) are functions of the state variables, outputs, control, and attack inputs. The compact representation of the resulting variational Lagrange function for CA$_i$ is denoted as:

where g T (y(T), u(T), a(T),

It is evident from (27) that the stage cost matrix

The derived stage cost suggests that the control gain becomes more stringent in directions that are simultaneously more costly and prone to noise, denoted by the covariance $W$.
The state variables are also determined by the control signal and attack vector, which are yet to be derived. Further, the attacks that result in large residuals cause more damage to the CAs in terms of cost. However, these attacks are restricted by the attack detection probability and the defence measures. Consequently, duality theory can be applied. The dual function is defined as:

Theorem 1: Consider the Lagrange function in (25) with (18) and (19) as the constraints. Utilizing Bellman's principle of optimality, we obtain the coupled first-order conditions (94) and (95) that are satisfied by the solution. The solution to the dual problem of the optimal control problem in the presence of error and the attacker is found to be:

Solving the conditions mentioned above, the optimal solution in linear feedback form can be written as in (36) and (37).

The outline of the proof of Theorem 1 is discussed in Appendix. The optimal controller and the attack vector in (36) and (37) are affine with respect to the state. The state-feedback terms in (36) and (37) account for the internal dynamics of the physical system (39), the state of the cyber layer nodes (40), and the interaction among the CAs (38). The optimal primal solution corresponding to the solution of the dual problem is:

where the sufficient optimality conditions are given as follows.
2) The optimal Lagrange multipliers can be defined as:

The control and attack policies in (36) and (37) are optimal for the primal problem in (17)-(20) when $\mu_i^{*}$ is finite. The outline of the proof of Lemma 4 is discussed in the Appendix. The stability of the proposed differential game-based LFC is discussed in the following lemma.
Lemma 5: Let us consider the optimal control signal $u^{*}(\{\mu_i\}, \xi)$, derived in (36), for a given $\mu_i \geq 0$, $\xi \geq 0$. $P(t)$ converges exponentially to the unique stabilizing solution of the following algebraic Riccati equation as $T \to \infty$:

Consequently, the conditions (54)-(64) are true for every $t \geq 0$ as $T \to \infty$.

The conditions in (54)-(64) converge exponentially fast, and the closed-loop matrix

The outline of the proof for Lemma 5 is given in Appendix.
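The convergence of a Riccati recursion to a stabilizing solution can be observed on a scalar example. The sketch below is a stand-in, not the Riccati equation of Lemma 5: it uses the standard discrete-time LQR recursion without attack, risk, or multiplier terms, and the values of `a`, `b`, `q`, and `r` are hypothetical.

```python
def riccati_step(p: float, a: float, b: float, q: float, r: float) -> float:
    """One backward step of the scalar discrete-time LQR Riccati recursion."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def solve_riccati(a: float, b: float, q: float, r: float,
                  tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate the recursion from the terminal weight q until it stops changing."""
    p = q
    for _ in range(max_iter):
        p_next = riccati_step(p, a, b, q, r)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati recursion did not converge")


# Hypothetical open-loop unstable scalar system x(t+1) = a x(t) + b u(t).
a, b, q, r = 1.05, 0.5, 1.0, 1.0
p_inf = solve_riccati(a, b, q, r)
gain = a * b * p_inf / (r + b * b * p_inf)   # feedback gain, u = -gain * x
closed_loop = a - b * gain                   # closed-loop pole
```

The limit satisfies the algebraic Riccati equation, and the resulting closed-loop pole lies inside the unit circle even though the open-loop system is unstable.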
Remark 2: The mean cost incurred by the controller satisfies the following property when Nash equilibrium-based control and attack strategies are followed:

The above equation denotes that the attacker/defender cannot maximize/minimize the cost by unilaterally deviating from the Nash equilibrium solution. Hence, the Nash equilibrium provides the optimal solution in competitive situations. The control signals based on the Nash equilibrium consist of a set of feedback gains $K_u$. These gains are not related to the initial state $y(t_0)$ or the forecasted value of $P_d(t)$ but are determined only by the physical structure of the LFC system, which explains why the Nash equilibrium-based control signal is strongly time consistent. As a result, for a given attack strategy, the CAs would not violate the control signals generated locally, even if the loads deviate from the forecasted values.
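The no-unilateral-deviation property can be checked numerically on a one-shot scalar game. This is a toy illustration, not the dynamic game of the paper: the cost function, the weights `q`, `r`, and `gamma`, and the state `x` are hypothetical, with `gamma > q` assumed so that the cost is concave in the attack.

```python
def cost(u: float, a: float, x: float = 1.0,
         q: float = 1.0, r: float = 1.0, gamma: float = 4.0) -> float:
    """Cost paid by the controller: state penalty plus control effort minus attack effort."""
    s = x + u + a
    return q * s * s + r * u * u - gamma * a * a


def saddle_point(x: float = 1.0, q: float = 1.0,
                 r: float = 1.0, gamma: float = 4.0) -> tuple[float, float]:
    """Solve the two first-order conditions for the minimizing u and maximizing a."""
    s = x / (1.0 + q / r - q / gamma)
    return -q * s / r, q * s / gamma


u_star, a_star = saddle_point()
equilibrium_cost = cost(u_star, a_star)

deviations = [d / 10.0 for d in range(-20, 21)]
attacker_gains = [cost(u_star, a_star + d) - equilibrium_cost for d in deviations]
controller_losses = [cost(u_star + d, a_star) - equilibrium_cost for d in deviations]
```

Every unilateral attacker deviation lowers the cost the attacker is trying to raise, and every unilateral controller deviation raises the cost the controller is trying to lower.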
Remark 3: The attack vector in linear feedback form (37), which brings the largest increase in the expected cost corresponding to the Nash equilibrium control strategies (36), is the worst case that the $i$th CA may face. However, in most scenarios, CA$_i$ may incur a cost lower than that of the worst-case scenario. Due to the different levels of risk sensitivity defined by $Q_i$, the worst-case attack vectors are calculated by each player before calculating their control strategies. Consequently, the actual attack calculated by each CA following (37) will differ for each of the CAs. Further, during the evolution of system states, the actual attack vector perturbing the dynamics will not be the same as any of the worst-case attack vectors (37), which lessens the expected cost of all the CAs.

B. Design of the Game Model in the Cyber Layer
The control input and the attack vector derived in (36) and (37) result from the application of linear static feedback. For $K_l$, $l \in \{u, a\}$, derived in (36) and (37), an entry $K_l^{i,j} \neq 0$ denotes that the $i$th control or attack input depends on the $j$th state variable. Further, it is evident that $u_i(t)$ and $a_i(t)$ for CA$_i$ may depend on the state variables of the other CAs. The state variables are estimated from the outputs delivered over the cyber infrastructure. Hence, before proceeding further, $K_l$ is reorganized so that the states and the control and attack inputs are ordered according to their physical locations. The resulting matrix is $K_l^* \in \mathbb{R}^{n \times n}$, in which each block $K_l^{ij}$ represents feedback of the states of CA$_i$ to the control inputs of CA$_j$, with $i = j$ corresponding to local feedback and $i \neq j$ indicating that the control and attack inputs depend on data from the cyber layer. As there are $k$ CAs, each CA$_i$ controls $n_i$ nodes in the physical system.
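The block view of the reorganized gain matrix can be sketched as follows. The helper names `partition_gain` and `cyber_dependencies`, the area sizes, and the numerical gain are hypothetical; the only idea taken from the text is that a non-zero off-diagonal block means an input relies on data from another CA, and hence on the cyber layer.

```python
import numpy as np

def partition_gain(K, sizes):
    """Split a square gain K into blocks following the per-area sizes n_i."""
    edges = np.cumsum([0] + list(sizes))
    k = len(sizes)
    return [[K[edges[i]:edges[i + 1], edges[j]:edges[j + 1]] for j in range(k)]
            for i in range(k)]

def cyber_dependencies(K, sizes, tol=1e-12):
    """Return the (i, j) pairs, i != j, whose off-diagonal block is non-zero."""
    blocks = partition_gain(K, sizes)
    k = len(sizes)
    return [(i, j) for i in range(k) for j in range(k)
            if i != j and np.any(np.abs(blocks[i][j]) > tol)]

# Hypothetical 5x5 gain for three areas with n_1 = 2, n_2 = 2, n_3 = 1.
sizes = [2, 2, 1]
K = np.zeros((5, 5))
K[0:2, 0:2] = 1.0      # local feedback, area 1
K[2:4, 2:4] = 1.0      # local feedback, area 2
K[4, 4] = 1.0          # local feedback, area 3
K[0, 3] = 0.3          # one coupling between areas 1 and 2

deps = cyber_dependencies(K, sizes)
print(deps)
```

Diagonal blocks correspond to local feedback; every pair reported in `deps` marks a measurement that must travel over the communication network and is therefore exposed to FDIA.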

Control Area $k$ $\Longrightarrow$ $\{f_k, P_{tie,kj}\}$
Based on these partitions, the LFC dynamics can be rewritten: [a 1 (t) . . .
$u_i^*(t)$ and $a_i^*(t)$ in (66) and (67) denote the relationship between the control signal and the FDIA for CA$_i$ and the measurements from the various nodes. Let $z_i(t)$ be the probability that node $i$ is compromised. The relation between the probability of the $i$th measurement $\Pr(\eta_i(t))$ and the probability of the $j$th node $z_j(t)$ is given as:
The attacker's objective is to increase the economic value of the attack by attacking the optimal set of measurements in the network using the compromised nodes. The defender aims to reduce the economic value of the attack by defending the optimal set of measurements. Let us consider that the controller uses $r_{di}$ resources out of $|R_d|$ for defending measurement $i$. The controller also incurs a cost for deploying $r_{di}$ resources. Considering that the attacker uses $r_{ai}$ resources out of $|R_a|$ for measurement $i$, the cost of compromising the measurements in the network is defined as $r_{ai}$. Let $\rho_a$ be the probability that the attacker chooses to compromise a measurement, i.e., $r_{ai} > 0$, where $0 \le \rho_a \le 1$. Similarly, $\rho_d$ denotes the probability with which the defender chooses to defend the measurements, i.e., $r_{di} > 0$, where $0 \le \rho_d \le 1$.
A measurement transitions from the normal mode ($\eta_i(t) = 0$) to the compromised mode ($\eta_i(t) = 1$) at a rate of $\rho_a r_{ai} / (\rho_d r_{di})$, where $r_{ai}/r_{di}$ denotes the attack/defense strength and $\rho_a/\rho_d$ denotes the attack/defense probability. The rate at which a measurement becomes compromised increases with $\rho_a r_{ai}$ and decreases with $\rho_d r_{di}$. If the measurement is successfully compromised, the controller can recover it only if the attack is detected. Hence, the compromised measurement becomes secure at a rate of $\Pr(\|r_a\|_2 > \tau)$, which denotes the detection probability. This is presented graphically in Fig. 3. Thereby, from the basic understanding of differential dynamic systems [33], the evolution of the state of measurement $i$ over time can be denoted by the following differential equation:
Based on the above definitions, the payoff matrices of the attacker and defender are given in Table I, using the following reasoning:
• In case both the attacker and the defender allocate a non-zero amount of resources with probability, the node is successfully defended ($\eta_i(t) = 0$) with probability $z_i(t)$. The probability of successful compromise Further, the attacker and the defender also incur the cost of using the resources.

TABLE I PAYOFF MATRIX OF THE ATTACKER AND DEFENDER DUE TO i th MEASUREMENT
• In case the attacker attacks a node ($r_{ai} > 0$) that is not defended ($r_{di} = 0$), the error is introduced in the system with probability one. The cost of using resources is incurred only by the attacker.
• In case the defender defends a node ($r_{di} > 0$) that is not attacked ($r_{ai} = 0$), the error is not introduced in the system. The cost of using resources is incurred only by the defender.
Based on the utilities of the players shown in Table I, the average payoff functions for defending and attacking measurement $i$ at time $t$ are derived as:
If played repeatedly over time, the overall utility of the attacker and defender is obtained by aggregating the above utility functions over time. The optimization problem for the network controller is to minimize the net cost incurred by the interconnected power system, whereas the attacker aims to maximize the cost. Hence, the optimization problem of the network controller and the attacker is: min
Theorem 2: Considering the optimal control by the CAs in the physical layer, the Hamiltonian function of (71) and (72) can be expressed as:
where $q \in \{d, a\}$. A set of controls $\{r^*_{di}(t), r^*_{ai}(t)\}$ constitutes an equilibrium to the problem in (71) and (72), and $z^*_i(t)$ is the corresponding state trajectory:
Based on (74) and (75), the optimal cyber layer attack/defense resource allocation by the attacker/defender can be obtained in the form of $r^*_{ai}(t)$/$r^*_{di}(t)$. For each CA and attacker equilibrium strategy, the evolution of the co-state is given as:
The evolution of the co-state of the dynamics in the physical layer is found to be:
which depends on the deviations of the state parameters after applying the control actions in the presence of FDIA. The closed-form expressions of the Lagrange multipliers corresponding to (71) and (72) are:
It is evident from (76) and (77) that the evolution of the co-states of the cyber layer dynamics depends on the actual value of the co-state of the dynamics in the physical layer. The optimal resource allocation by the defender (74) increases with increasing co-state value. This means that as the rate at which the control cost changes with the node infection rate increases, the defender allocates more cyber layer resources to minimize the fluctuations. The optimal resource allocation by the attacker (75) decreases with increasing co-state value. This means that as the rate at which the control cost changes with the node infection rate decreases, the attacker allocates more cyber layer resources to maximize the fluctuations.
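The two-state (normal/compromised) transition model of this subsection can be sketched numerically. The exact differential equation is not reproduced here, so the code assumes a generic two-state form $\dot{z}_i = \lambda_i (1 - z_i) - p_{det} z_i$, with the compromise rate $\lambda_i$ taken as the ratio $\rho_a r_{ai} / (\rho_d r_{di})$ and $p_{det}$ standing for the detection probability. Both the form of the equation and all numerical values are assumptions made for illustration.

```python
def compromise_rate(rho_a, r_a, rho_d, r_d):
    """Assumed normal-to-compromised rate: grows with rho_a*r_a, shrinks with rho_d*r_d."""
    return (rho_a * r_a) / (rho_d * r_d)

def simulate_z(rate, p_detect, z0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dz/dt = rate*(1 - z) - p_detect*z."""
    z = z0
    for _ in range(steps):
        z += dt * (rate * (1.0 - z) - p_detect * z)
    return z

def steady_state(rate, p_detect):
    """Equilibrium of the assumed dynamics, where dz/dt = 0."""
    return rate / (rate + p_detect)

# Hypothetical numbers: same attack effort, defender doubles its resources.
weak_defense = compromise_rate(rho_a=0.5, r_a=1.0, rho_d=0.5, r_d=1.0)
strong_defense = compromise_rate(rho_a=0.5, r_a=1.0, rho_d=0.5, r_d=2.0)

z_weak = simulate_z(weak_defense, p_detect=0.8)
z_strong = simulate_z(strong_defense, p_detect=0.8)
print(z_weak, z_strong)
```

Under these assumptions, more defense resources lower the long-run compromise probability, which matches the qualitative behaviour described for the defender's allocation.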
Remark 4: In practice, $\rho_a$ is decided by the attacker and is unknown to the controller. Similarly, $\rho_d$ is unknown to the attacker. However, the probability distribution of these unknown parameters can be estimated by observing them while interacting repeatedly over time. In this situation, the solution of the proposed game formulation can be derived by considering the average payoffs as:
instead of (69) and (70), respectively. The approach to finding the solution is the same as discussed in Section III-B.
Remark 5 (Scalability): For $\alpha > 1$, we define $F_d(\alpha, r_{ai}) \triangleq \alpha r^*_{di}(r_{ai}) - r^*_{di}(\alpha r_{ai})$. Then the proof of scalability is equivalent to proving that $F_d(\alpha, r_{ai}) > 0$ for any $\alpha > 1$. First, it is obvious that $F_d(1, r_{ai}) = 0$. Thus, a sufficient condition for $F_d(\alpha, r_{ai}) > 0$ is that $F_d(\alpha, r_{ai})$ is an increasing function of $\alpha$, i.e., $\partial F_d(\alpha, r_{ai}) / \partial \alpha > 0$. To proceed further, the first-order and second-order partial derivatives of $F_d(\alpha, r_{ai})$ w.r.t. $\alpha$ are obtained as (81) and (82). The quantity in (82) is always greater than 0, which indicates that the quantity in (81) is increasing in $\alpha$. A similar conclusion can be drawn by following the above steps for (75). Hence, once the solution in (74) and (75) is found, the equilibrium for any increment in the amount of available resources can be obtained without increased complexity.

IV. SIMULATIONS AND RESULTS
This section investigates the performance of the networked LFC of a 39-bus power system through unreliable cyber layer resources in the presence of attackers. The IEEE 39-bus test power system, also known as the 39-bus New England system, consists of 39 buses, 29 lines, 46 branches of which 12 are transformers, and 10 generating units. The total load is 6150 MW, and the total generating capacity is 7300 MVA. The generators are equipped with excitation and power system stabilizer units. Among the 39 buses, one is the slack bus (Bus 31), nine are voltage-controlled buses (Buses 30 and 32-39), and the rest are load buses. This test system has been chosen because it is believed to closely mimic the properties of a typical power system and has been widely used by previous researchers for various purposes. The power system is split up into control areas such that the generators in each control area share the maximum amount of load change at that control area's load buses. Based on these criteria, the division of control area 1 is optimal, since a load change at a bus in CA$_1$ is mostly met by the generators in CA$_1$. For instance, over 75% of a 1% load change applied to any of the buses 25, 26, and 27 of CA$_1$ is met by the generators of CA$_1$. However, the separation between CAs 2 and 3 is modified from the existing works such that there is significant tie-line power flow between the control areas when the load at a bus changes. For instance, 29%, 22%, and 25% of a 1% load change at bus 8, placed in CA$_3$, are met by the generators at buses 31, 32, and 30, respectively. Increased power flow on the tie-lines enables the study of the worst effect of FDIAs, as a non-zero tie-line power measurement increases the probability of successfully injecting stealthy false data. On the other hand, 75% of the energy demand is met by the generators of CA$_2$ when a 1% load change happens in any of the loads at buses 3, 4, 15, 16, 18, 20-24, and 31.
To present the problem addressed in this paper, the dynamic model of the LFC for a system with three CAs is considered (Fig. 4), where the state variables, control signals, and process errors are obtained by modifying the definitions in (2), (3), and (4) for a three-CA system.
The linear model of the three-CA system is obtained by modifying the definitions in (8) and (14):
where $M_i = 1.67$, $D_i = 0.083$, $T_{ti} = 0.30$, $T_{gi} = 0.08$, $\sigma_i = 2.4$, $\beta_i = 0.5$, and $X_{tie} = 3.93$. The tie-line power deviations between the three CAs are defined as:
The three-area power system simulation experiment in this article is carried out in MATLAB 2022a. The aim of the controller is to drive the changes in frequency and tie-line flows caused by disturbances and FDIA to near-zero values by generating control signals that adjust the generation to match the load demand.
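To make the role of the quoted parameters concrete, the following sketch assembles a textbook three-area LFC state-space model in Python. It is not the paper's model from (8) and (14): the state ordering, the interpretation of $\sigma_i$ as the governor droop, and the synchronizing coefficient taken as $1/X_{tie}$ between every pair of areas are assumptions, and $\beta_i$ is left unused because no secondary (ACE-based) loop is included.

```python
import numpy as np

# Parameters quoted in the text (identical for all three CAs).
M, D, T_t, T_g, sigma, X_tie = 1.67, 0.083, 0.30, 0.08, 2.4, 3.93
T_sync = 1.0 / X_tie   # assumed synchronizing coefficient between any two CAs

def build_lfc(k=3):
    """Textbook LFC model; state per CA is [df, dP_m, dP_g, dP_tie]."""
    n = 4 * k
    A = np.zeros((n, n))
    B = np.zeros((n, k))   # control inputs u_i enter the governors
    E = np.zeros((n, k))   # load disturbances enter the swing equations
    for i in range(k):
        f, pm, pg, tie = 4 * i, 4 * i + 1, 4 * i + 2, 4 * i + 3
        A[f, f], A[f, pm], A[f, tie] = -D / M, 1.0 / M, -1.0 / M
        E[f, i] = -1.0 / M
        A[pm, pm], A[pm, pg] = -1.0 / T_t, 1.0 / T_t
        A[pg, pg], A[pg, f] = -1.0 / T_g, -1.0 / (sigma * T_g)  # sigma assumed to be droop
        B[pg, i] = 1.0 / T_g
        for j in range(k):
            if j != i:     # tie-line flow driven by frequency differences
                A[tie, f] += 2.0 * np.pi * T_sync
                A[tie, 4 * j] -= 2.0 * np.pi * T_sync
    return A, B, E

A, B, E = build_lfc()
max_real_part = np.linalg.eigvals(A).real.max()
tie_rows_sum = A[3] + A[7] + A[11]   # net tie-line exchange is conserved
print(A.shape, max_real_part)
```

Under these assumptions the primary-control loop is stable apart from one zero eigenvalue, which reflects the fact that the tie-line deviations of the three areas always sum to a constant.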

A. Cyber Layer Differential Game
The objective of the controller is to allocate the available cyber layer resources to protect the measurements from FDIAs. Here, the total resources available to the controller and the attacker are denoted as $|R_d|$ and $|R_a|$, whereas the resources they allocate to the $i$th measurement are represented as $r_{di}$ and $r_{ai}$. The effect of cyber layer resource allocation on the corresponding physical layer measurements is studied in Figs. 5 and 6. For ease of presentation, an FDIA is considered only on the tie-line power measurements for preparing this result. It is evident from Figs. 5(a) and 6(a) that the deviation of $P_{tie,12}$ is 0.007 when the probability of allocating cyber layer resources to defend the corresponding measurement is 0.25. When this probability is 0.5, the $P_{tie,12}$ deviation becomes 0.005 in Figs. 5(b) and 6(b). Finally, the deviation of $P_{tie,12}$ becomes 0.0009 in Fig. 5(c) when the probability of allocating cyber layer resources to defend the corresponding measurement is 0.6 (evident from Fig. 6(c)). This observation is intuitive, as allocating more resources to defend the cyber layer reduces the effect of the FDIAs.

B. Differential Game Based LFC
For the LFC, the weight matrices in the cost functions of the three control areas are considered as time-invariant matrices having
The values of all other elements in the above matrices are zero. The aversions to the process error for the three CAs shown in Fig. 4 are considered to be $W_1 = 1.3$, $W_2 = 1.5$, and $W_3 = 1.2$. The time horizon $T$ is set as 100. The noise and the node compromise rate used in simulating the Nash equilibrium, derived in (36), are illustrated in Fig. 7.
A coordinated FDIA is implemented on the LFC of CA$_1$ at $t = 1$ s. The frequency deviation, tie-line power deviation, and ACE signal of the system after the FDIA targeting the frequency and tie-line power is launched are shown in Fig. 8(a). The deviation of the ACE signal for CA$_1$ (as shown in Fig. 8(a)) shows that it is erroneous, causing the frequency and tie-line power of CA$_1$ to deviate further from the set value. The deviation then spreads to the other interconnected CAs. The deviations of the frequency and tie-line power and the ACE signals of CA$_2$ and CA$_3$ after implementing the FDIA on the LFC system of CA$_1$ are also shown in Fig. 8(a). It can be concluded that, if left unattended, the power system will be in an unstable state due to the FDIA on CA$_1$. Next, the effect of the proposed control and defense mechanism is analyzed.
Fig. 5. Variation of P tie for the following allocations of cyber layer resources (a)
We compare the proposed cyber layer differential game approach with the following scenarios:
• Scenario 1: Attacker and grid operator allocate cyber layer resources according to (72) and (71).
• Scenario 2: Grid operator allocates cyber layer resources according to (71); attacker allocates resources uniformly, i.e., $r_{ai}(t) = |R_a|/k$.
respectively. These costs are depicted in Fig. 8(b) as the intersection point of the strategies of the attacker and the defender. For Scenario 1, the plots of strategies are obtained from (71) and (72). To find the actual reason for the improvement in the cost, the deviations of the ACE signals are plotted in Fig. 8(c). The difference between the cumulative cost of each player for Scenario 1 and the other scenarios shows that the basic concept of Nash equilibrium brings an extra decrease in cost. Nash equilibrium states that the unilateral deviation of the attack and the control in Scenarios 2 and 3 from the saddle point, given in (94) and (95), results in the cost margin provided by the Nash equilibrium strategies adopted by the CAs. Further, the cost incurred in Scenario 1 can be optimized for $\rho_a$ and $\rho_d$. The optimal cost value is 1.068 from the plot in Fig. 9(a). The tie-line power measurement, with and without FDIA, is listed in Table II. The attacker injects false data into the tie-line data, leading to a generation-load imbalance in the system. The deviation in the power flow due to false data is the least in Scenario 1 because the attacker and the defender are competing against each other to maximize their objectives. The deviation in the power flow data is maximum for Scenario 3, in which the attacker causes maximum damage by optimally allocating resources according to (72), whereas the defender allocates its resources uniformly. Following Scenario 2, the controller incurs more cost than in Scenarios 1 and 4, but less than in Scenario 3.
Next, we compare the performance of the proposed non-cooperative game-based differential control with a PI controller and a centralized controller. The optimum values of the PI controller parameters are unknown; their initial values are chosen randomly and then tuned using an optimization algorithm. We have chosen the integral time absolute error (ITAE) as the objective function, which is minimized by a genetic algorithm (GA). The centralized controller generates the control signals by solving an optimization problem whose objective function is the aggregate of (17), $\forall i = 1, \ldots, k$. The constraints have also been modified accordingly. It is evident from the plot in Fig. 10 that the deviation of the tie-line power is maximum for the PI controller. The proposed LQDG controller performs better than the PI controller because both the load disturbance and the worst-case attack vector are considered at the same time for the generation control of the CAs. The root mean square error (RMSE) of both controllers relative to the setpoints reveals an improvement of around 8.77% in the tie-line power deviation and of 18.09% in the frequency deviation when using the LQDG controller. The maximum frequency and tie-line power deviations of the proposed LQDG controller are also smaller than those of the PI controller by 4.7% and 6.6%, respectively. Controlling the deviations in the state variables is hard for the PI controller since the load disturbance is not included in its structure; the PI controller only reacts to the load disturbance after it causes deviations in the referenced frequency and tie-line power measurements. Although the proposed controller performs better than the PI controller, its deviation in tie-line power is 3.1% more than that of the centralized controller.
The difference in the performance of the proposed and the centralized scheme is due to the concept called Price of Anarchy [34]. Following the proposed LQDG-based LFC, the generation of each CA is controlled to mitigate the deviations of that CA. On the contrary, the centralized differential game-based controller controls the generation of all the CAs to minimize the deviation of the state variables of the CAs, leading to more efficient control. The RMSE of the centralized controller shows an improvement of 2.43% over the proposed LQDG controller in tie-line power and of 7.38% in frequency deviation. The maximum frequency and tie-line power deviations of the centralized controller are also smaller than those of the proposed LQDG-based controller by 5.57% and 3.21%, respectively. However, the advantage of the proposed scheme over the centralized scheme is its immunity against FDIA on the control signals, since they are generated locally at each CA. Further, the exact values of some of the important parameters of the considered system are listed in Table III for the proposed, centralized, and PI controllers. It is evident from the table that both the proposed and the centralized schemes perform better than the PI controller. Following the centralized scheme, the control cost and variance cost are minimum for CA$_3$ and CA$_2$, respectively. However, the control cost and variance cost are more evenly distributed among the CAs in the proposed scheme. This is due to the competitive interaction among the CAs in a non-cooperative framework. For the same reason, the detection probability of the centralized scheme is better than that of the proposed scheme. Fig. 8(b) shows that the cyber layer differential game approach outperforms the other scenarios.
Next, we compare the following scenarios for the physical layer LFC: (1a) Proposed LQR in (17)-(20), (1b) Proposed LQR without (20), (1c) Proposed LQR without risk ($\kappa_i = 0$) with (20), and (1d) Conventional LQR without risk ($\kappa_i = 0$) and (20). In (1a), CA$_1$ minimizes the tie-line power and frequency deviations following the proposed differential game-based control. As per the game formulation, the actions of CA$_1$ also affect the tie-line power in CA$_2$ and CA$_3$. Further, the control signal of CA$_1$ drives variations in CA$_2$ and CA$_3$, which hinders CA$_2$ and CA$_3$ from keeping the frequency and other tie-line powers at the desired setpoint. Thus, the strategies of CA$_2$ and CA$_3$, as well as the attack and the inevitable disturbance perturbing the LFC system, should be considered when CA$_1$ constructs its control strategy, and vice versa. This coupled construction of control and attack strategies is handled by Theorem 1 from a non-cooperative differential game-theoretic perspective. The control actions of the CAs corresponding to the Nash equilibrium are illustrated in Fig. 9(b). With the control and attack strategies obtained, the state trajectory of the closed-loop system is shown in Fig. 9(b). Scenarios (1a), (1b), and (1c) are compared based on the two regions highlighted in Fig. 9(b). It is evident that the peak deviations are more prominent for scenarios (1c) and (1d) due to the absence of risk measures. Although the peak deviations are not prominent for (1b), its deviations are consistently larger than those of (1a) due to the absence of attack detection. Hence, the control costs for (1a), (1b), (1c), and (1d) are 3.12, 8.3, 9.4, and 11.2, respectively.
The effect of the stochastic nature of $P_d$ is studied by simulating the bi-level differential game 30 times for various combinations of cyber layer resource budgets of the attacker and the controller. The mean of the resulting costs (15) of each CA is calculated and plotted in Fig. 9(c). The control costs of all the CAs decrease as $|R_d|/|R_a|$ increases, which reflects the improvement of the control performance in the physical layer caused by the reduced capabilities of the attacker in the cyber layer. Further, the worst performance of CA$_2$ in terms of cost is due to the fact that $r_{d23}/r_{a23}$ and $r_{d2}/r_{a2}$ are the least among all the CAs.
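The two performance indices used in this comparison, ITAE for tuning the PI controller and RMSE for reporting improvements, can be computed as in the sketch below. The function names and the trapezoidal discretization are illustrative choices, and the sample values are made up for demonstration; they are not the paper's data.

```python
import numpy as np

def rmse(signal, setpoint=0.0):
    """Root mean square error of a sampled signal relative to its setpoint."""
    e = np.asarray(signal, dtype=float) - setpoint
    return float(np.sqrt(np.mean(e ** 2)))

def itae(t, error):
    """Integral of time-weighted absolute error, via the trapezoidal rule."""
    t = np.asarray(t, dtype=float)
    y = t * np.abs(np.asarray(error, dtype=float))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def improvement_pct(baseline, candidate):
    """Relative reduction of a cost-type metric, in percent of the baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Made-up samples for illustration only.
rmse_example = rmse([3.0, 4.0])
itae_example = itae([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
gain_example = improvement_pct(baseline=2.0, candidate=1.5)
print(rmse_example, itae_example, gain_example)
```

Because ITAE weights late errors more heavily than early ones, a GA that minimizes it favours controller gains that settle quickly, whereas RMSE treats all samples equally.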
To test the effectiveness of the proposed controller under random changes in load perturbation, a randomly varying step load perturbation is considered in CA$_1$, as seen in Fig. 11(a). Simulating the IEEE 39-bus power system with the load perturbation shown in Fig. 11(a) results in the frequency deviations of CA$_1$ shown in Fig. 11(b). The results in Fig. 11(b) confirm that the frequency deviation of CA$_1$ is within acceptable limits, thus demonstrating the robustness of the proposed controller under random changes in load disturbances. A sensitivity analysis of the proposed bi-level LQDG is also conducted, where $P_{d1}$, generated for different values of the variance $W_1$, is applied to the power system to investigate the stability of the proposed framework. It can be observed from Fig. 11(b) that although both the maximum deviation and the convergence time increase as the variance increases, the proposed method is still able to control the frequency deviation. The reason for this behavior has been identified in (27), where the control gain becomes more stringent in directions that are prone to noise variance.
Next, we investigate the importance of the design of chance constraint in (20) by plotting J i with varying τ and in Fig. 12.As increases, the constraint on the design of the attack vector becomes more stringent.Consequently, the false data introduced by the attacker is reduced, resulting in a decrease of J i .On the other hand, as τ increases, the attack vector space increases, thereby increasing the control cost for the CAs.
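The role of the threshold $\tau$ can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that the attack-free residual is a standard Gaussian vector and that an alarm is raised when its squared norm exceeds $\tau$; the function `detection_probability`, the residual dimension, and the attack vector are hypothetical and are not taken from the paper's detector.

```python
import numpy as np

def detection_probability(attack, tau, n_samples=20000, seed=0):
    """Monte Carlo estimate of Pr(||noise + attack||^2 > tau) for Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_samples, len(attack)))
    residual = noise + np.asarray(attack, dtype=float)
    return float(np.mean(np.sum(residual ** 2, axis=1) > tau))

no_attack = [0.0, 0.0]
attack = [1.5, 0.0]          # hypothetical false data bias on one measurement

p_tight = detection_probability(attack, tau=2.0)
p_loose = detection_probability(attack, tau=6.0)
false_alarm = detection_probability(no_attack, tau=4.0)
p_attack = detection_probability(attack, tau=4.0)
print(p_tight, p_loose, false_alarm, p_attack)
```

In this toy setting a larger $\tau$ lowers the chance of flagging the same attack, which is consistent with the observation that a larger $\tau$ enlarges the space of attack vectors that stay undetected and therefore raises the control cost.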

V. CONCLUSION
This paper has established a novel bi-level differential game-theoretic framework of LFC, considering the decision-making of an attacker and a controller in both the cyber and physical layers. In the cyber layer game, the status of the cyber layer nodes has been modeled using a state-space model, where the transition between the states depends on the allocation of resources. Thereby, the resource allocation among multiple nodes by the attacker and the controller has been investigated from the viewpoint of a non-cooperative game. In the physical layer, the resource allocation of the attacker and the controller is investigated subject to some practical constraints from the viewpoint of minimizing the cost of controlling a power system when the attacker designs an attack with minimum detection probability. We have finally illustrated the effectiveness of the proposed control architecture through simulations of a three-control-area IEEE 39-bus system. The proposed comprehensive bi-level game framework shows improvements in minimizing the average and peak deviations of the state variables compared to standalone differential game-based LFC. The proposed game framework performs better than the conventional PI controller and closely follows the performance of a centralized controller. The differential game model proposed in this paper can be used to mimic a scenario where the first player (the controller) attempts to minimize the cost in the event that the opponent (the attacker) engages in the worst possible conduct (targeted unknown disturbances). Control problems such as the safe landing of aircraft in the presence of wind shear [35] and an autonomous convoy of vehicles with tampered location data [36] can be addressed using the algorithm developed in this work. The analysis presented in this work is limited by the fact that the LQDG-based LFC problem is solved considering that the attacker takes optimal decisions representing the worst case for the controller. Further, this work does not explicitly model the spread of infections of the cyber layer nodes and their influence on the attackers' and controllers' decisions, which will be addressed in our future work. Additionally, in the future, we will investigate the performance of a cooperative game-based differential game framework under FDIAs and implement the theoretical findings in a real-time simulation environment.
where $W = \mathbb{E}\{(P_d(t) - \bar{P}_d(t))(P_d(t) - \bar{P}_d(t))^T\}$. Then, finding the difference between the above quantities and taking its square:
Proof (Theorem 1): The proof is carried out by applying the technique of mathematical induction.
Proof (Lemma 5): The pair (CA, T i 2 1/2 ) is detectable as ) is detectable.Following the standard theory of LQR controller, the exponential convergence of P(t), K u (t), and K a (t) to P, K u , and K a , respectively, and the stability of (CA + CBK u (t) + K a (t)) are ensured due to the fact that (CA, CB) can be stabilized, (CA, T i

Fig. 1. Illustration of the two-area load frequency control (LFC) system and the compromise of its measurements.

Fig. 3. State evolution model between normal and compromised nodes.

Fig. 6. Variation of z and z for the following allocations of cyber layer resources (a) $r_{d1}/r_{a1} = 0.4$,

Fig. 8. (a) State variables after the attack, (b) Comparison of the 4 scenarios in terms of $J_1$, (c) Comparison of the 4 scenarios in terms of ACE$_1$.

Fig. 9. (a) Optimal cost $J_1^*$ for varying $\rho_a$ and $\rho_d$, (b) Variation of state variable and control signal for scenarios 1(a), 1(b), 1(c), and 1(d), (c) Variation of cost $J_i$ for varying $|R_d|/|R_a|$.

Fig. 11. (a) Load disturbance with varying variance, (b) Effect of load disturbance on the frequency deviation of CA$_1$.
With control costs of 3.12, 8.3, 9.4, and 11.2 for (1a), (1b), (1c), and (1d), respectively, the minimum control cost for the proposed technique reinforces its efficacy. The equilibrium values of $z_i$ derived for different values of $|R_d|/|R_a|$ are listed in Table IV, together with the $z_i$ values for the other scenarios. For each resource budget ratio $|R_d|/|R_a|$, the bi-level differential game among the CAs and the attacker is solved according to the proposed solution in (36)-(37). The resulting measurement compromise rates $z_i^*$ under different resource budgets for the attacker and defender in the cyber layer are listed in Table IV. It can be deduced that the increase in the cyber layer resource budget decreases the measurement compromise rate, leading to a rise in the FDIA detection probability and improved performance in the physical layer.

TABLE IV NODE COMPROMISE RATE VS. THE RATIO OF CYBER LAYER RESOURCES AVAILABLE TO DEFENDER AND ATTACKER

Let $y^T(t)Q_i y(t) - \mathbb{E}\{y^T(t)Q_i y(t) \mid F_{t-1}\}$ be the prediction error of the stage penalty at time $t$ given $F_{t-1}$. Next, the closed-form representation is derived for the expected predictive variance, i.e., the expected square of this prediction error. The output $y(t)$ of the LFC depends on the states $x(t)$, input $u(t)$, attack vector $a(t)$, and past noises $P_d(t)$, with $\mathbb{E}\{P_d(t)\} = \bar{P}_d$. Let us define:

$$\hat{y}(t) \triangleq \mathbb{E}\{y(t) \mid F_{t-1}\} = C\left(Ax(t-1) + Bu(t-1) + E\bar{P}_d\right)$$

The expectation of $y^T(t)Q_i y(t)$ conditioned on $F_{t-1}$ is

$$\mathbb{E}\{y^T(t)Q_i y(t) \mid F_{t-1}\} = \hat{y}^T(t)Q_i\hat{y}(t) + \mathrm{Tr}\left(WE^TC^TQ_iCE\right)$$

Replacing $y(t)$ with $\hat{y}(t) + CE\delta(t)$:

$$y^T(t)Q_i y(t) = \hat{y}^T(t)Q_i\hat{y}(t) + 2\hat{y}^T(t)Q_iCE\delta(t) + \delta^T(t)E^TC^TQ_iCE\delta(t) \quad (85)$$

The expected predictive variance takes the form

$$4y^T(t)Q_iCEWE^TC^TQ_i y(t) + 4y^T(t)Q_iM_3 + m_4 - 4\mathrm{Tr}\left(WQ_iCEWE^TC^TQ_i\right)$$

With $r(t) = y(t) - Cx(t)$, expanding the detection term $\mu_i\,(r(t) + a(t) - \eta_i CH)^T(r(t) + a(t) - \eta_i CH)$ gives

$$\cdots\; y^T(t)Q_i y(t) + u^T(t)R_i u(t) + y^T(T)Q_i y(T) + \kappa_i\,\mathbb{E}\left\{4y^T(t)Q_iCEWE^TC^TQ_i y(t) + 4y^T(t)Q_iM_3 + m_4 - 4\mathrm{Tr}\left(WQ_iCEWE^TC^TQ_i\right)\right\} + \mathbb{E}\left\{\mu_i\left[r^T(t)r(t) + r^T(t)a(t) - \eta_i r^T(t)CH + a^T(t)r(t) + a^T(t)a(t) - \eta_i a^T(t)CH - \eta_i H^TC^Tr(t) - \eta_i H^TC^Ta(t) + \eta_i^2 H^TC^TCH\right]\right\} \quad (91)$$

2. Stage $t = 0, \ldots, T-1$. Function value at stage $t+1$ is:

$$L^*(y(t), \{\mu_i\}, \xi) = \inf_{u(t)} \sup_{a(t)} \mathbb{E}\left\{g_t(u(t), a(t), \{\mu_i\}) + g_\mu(\{\mu_i\}) + L^*_{t+1}(y(t+1), \{\mu_i\}) \mid F_t\right\} \quad (93)$$

Thereby, Bellman's principle of optimality is applied to get the optimal actions at stage $t$. Hence, $u^*(t)$ and $a^*(t)$ are derived according to the following coupled first-order conditions:

$$B^TC^TP(t+1)CAx(t) + \left(B^TC^TP(t+1)CB + R_i\right)u(t) + B^TC^TP(t+1)CE\bar{P}_d + B^TC^TP(t+1)a(t+1) + B^TC^TL(t+1) = 0 \quad (94)$$

$$\left(P(t+1)CA + Q_iC + 4\kappa_iQ_iCEWE^TC^TQ_iC\right)x(t) + P(t+1)a(t+1) + \left(Q_i + 4\kappa_iQ_iCEWE^TC^TQ_i\right)a(t) + P(t+1)CBu(t) + P(t+1)CE\bar{P}_d + L(t+1) + 4\kappa_iQ_iM_3 + \sum_i \mu_i\left(r(t) + a(t) - \eta_i(t)CH\right)\Pr(\eta_i) = 0 \quad (95)$$
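The conditional-expectation identity used in this derivation, $\mathbb{E}\{y^TQ_iy \mid F_{t-1}\} = \hat{y}^TQ_i\hat{y} + \mathrm{Tr}(WE^TC^TQ_iCE)$ with $y = \hat{y} + CE\delta$, can be verified exactly on a small example. The matrices, the prediction `y_hat`, and the three-point zero-mean distribution of $\delta$ below are hypothetical values chosen for illustration; `E_mat` denotes the disturbance input matrix to keep it distinct from the expectation.

```python
import numpy as np

# Hypothetical system matrices and prediction (illustrative values only).
C = np.array([[1.0, 0.0], [0.5, 1.0]])
E_mat = np.array([[1.0], [0.3]])       # disturbance input matrix
Q = np.diag([2.0, 1.0])
y_hat = np.array([0.4, -0.2])

# Zero-mean discrete distribution for the disturbance error delta.
values = np.array([[-1.0], [0.0], [2.0]])
probs = np.array([0.5, 0.25, 0.25])
mean_delta = sum(p * d for p, d in zip(probs, values))
W = sum(p * np.outer(d, d) for p, d in zip(probs, values))   # covariance of delta

# Left side: exact expectation of y^T Q y by enumerating the outcomes.
lhs = sum(p * (y_hat + C @ E_mat @ d) @ Q @ (y_hat + C @ E_mat @ d)
          for p, d in zip(probs, values))

# Right side: closed form with the trace term.
rhs = y_hat @ Q @ y_hat + np.trace(W @ E_mat.T @ C.T @ Q @ C @ E_mat)
print(lhs, rhs)
```

The cross term $2\hat{y}^TQ_iCE\delta$ vanishes in expectation because $\delta$ has zero mean, so only the quadratic term survives and produces the trace.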