Optimal Control for Interconnected Multi-Area Power Systems With Unknown Dynamics: An Off-Policy Q-Learning Method

This brief focuses on the optimal control problem for interconnected multi-area power systems with unknown dynamics. First, singular phenomena are taken into account when modeling interconnected multi-area power systems. Then, the optimal controller is derived using a model-based control method via Bellman's optimality principle, which requires the system dynamics. To remove this requirement, a model-free off-policy Q-learning method is proposed for the case where the system dynamics are completely unknown. Compared with on-policy Q-learning, the off-policy Q-learning method enhances data exploration capabilities. During the learning process, the off-policy Q-learning method uses online state and input information to iteratively solve the game algebraic Riccati equation without requiring the system dynamics, which is more practical. Furthermore, the optimality of the proposed method is demonstrated. Finally, a simulation example is used to demonstrate the effectiveness of the designed method.

instantaneous currents, which are not accounted for in regular systems. To address this, a singular system is introduced to describe the phenomenon.
Up to now, a considerable body of research on interconnected multi-area power systems has focused on various topics [2], [3], [4]. Building on these studies, three difficulties remain to be solved: (1) The interference among interconnected subsystems is difficult to resolve. (2) In practical scenarios, interconnected multi-area power systems are subject to dynamic uncertainties arising from factors such as dust, temperature, and humidity. Therefore, it is difficult to obtain complete system dynamics. (3) Solving the optimal control problem of a multi-area power system requires solving the game algebraic Riccati equation (GARE) to obtain the optimal controller.
To overcome the above challenges, Q-learning provides a proven method for designing model-free controllers. Note that for the Q-learning method to converge, exploration noise needs to be injected into the controller to satisfy certain persistence of excitation conditions. Accordingly, the on-policy Q-learning method must always add exploration noise to the controller. However, this may lead to bias in solving the Bellman equation. In contrast, off-policy Q-learning allows for better exploration of the state space using any stable control policy for data collection. Off-policy Q-learning has been widely used for discrete-time systems. However, little research has been done on continuous-time systems because it is difficult to construct Q-functions for them, which sparked our research interest.
Motivated by the above observations, this brief aims to develop a novel model-free optimal controller for unknown interconnected multi-area power systems by integrating singular system theory and off-policy Q-learning methods. The main contributions are listed below:
(1) Compared to [5], [6], the influence of the dynamic and static parts of interconnected multi-area power systems on the overall system is considered, and interconnected multi-area power systems are modeled as a large-scale singular system.
(2) The results in [7] are improved and an off-policy Q-learning algorithm for the interconnected multi-area power system is designed for the first time. In addition, the system parameters are not used in the learning process, which is more practical in engineering applications.
(3) The effect of output disturbances of each power subsystem is considered. The Nash equilibrium solution of the

A. System Description
Generally, in the singular dynamic model of the N-machine multi-area power system, the kth subsystem is described by (1) [8]. The values and physical descriptions of each parameter are listed in Table I.
Remark 1: Interconnected multi-area power systems are subject to external disturbances during operation and respond through changes in system state variables. In general, disturbances acting at different locations in the interconnected multi-area power systems produce significant changes at the same node [9]. In light of this, this brief describes the New England N-machine 39-bus systems as singular systems.
It can be seen that the system (1) reduces to a regular system when $Y_k$ is an identity matrix. Regularity is a basic property of a singular system.

B. Optimal Control Problems
To facilitate further analysis, the following definition is given.
Definition 1 [10]: Let $G$ and $S$ be matrices of the same order, and let $m$ be a variable on the complex field. If there exists a constant $m_0$ such that $\det(m_0 G - S) \neq 0$, then the matrix pencil $(mG - S)$ is regular. Moreover, if the pencil $(mG - S)$ is regular, the singular system is called regular.
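Definition 1 can be checked numerically. The sketch below is a minimal illustration, not part of the brief: it samples a few real values of $m$ and tests whether $\det(mG - S)$ is nonzero for at least one of them. Because the determinant is a polynomial in $m$, one nonzero sample is enough to certify regularity, while an all-zero result is only numerical evidence of non-regularity. The function name `is_regular_pencil`, the tolerance, and the example matrices are assumptions made for illustration.

```python
import numpy as np

def is_regular_pencil(G, S, trials=5, tol=1e-9, seed=0):
    """Return True if det(m*G - S) is nonzero for at least one sampled m."""
    G = np.asarray(G, dtype=float)
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)
    for m in rng.uniform(-10.0, 10.0, size=trials):
        # A single nonzero determinant certifies regularity.
        if abs(np.linalg.det(m * G - S)) > tol:
            return True
    return False

# G is singular, yet det(m*G - S) = -(m + 1) is not identically zero.
G = np.array([[1.0, 0.0], [0.0, 0.0]])
S = np.array([[-1.0, 0.0], [0.0, 1.0]])
regular_example = is_regular_pencil(G, S)

# Here det(m*G - S_bad) = 0 for every m, so the pencil is not regular.
S_bad = np.array([[1.0, 0.0], [0.0, 0.0]])
irregular_example = is_regular_pencil(G, S_bad)
```

The absolute tolerance is a simplification; for badly scaled matrices a relative test would be more appropriate.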
Suppose $U_k$ and $L_k$ are two invertible matrices in a singular state space. Letting $\bar{x}_k(t) = U_k^{-1} x_k(t)$, the linear transformation of singular system (1) can be rewritten as (2). Here, the singular systems (1) and (2) exhibit a one-to-one correspondence. Hence, these two singular systems have symmetry, identity and transitivity. Furthermore, there exist two invertible matrices such that the singular system (1) is described by a dynamic part (3) and a static part (4).
Remark 2: It is worth noting that the singular system is divided into a dynamic part (3) and a static part (4). Normally, the state variables of the dynamic part (3) undergo significant changes over time, while the state variables of the static part (4) change by a small amount over time. The dynamic part (3) has a more pronounced impact on the system. Therefore, when controlling the singular system through an equivalent transformation, focusing on the dynamic part (3) leads to good overall control performance.
Assuming that $A_{k22}$ is invertible, the static part (4) can be rewritten as (5). By substituting the static part (5) into the dynamic part (3), one has (6). To facilitate the subsequent analysis, the singular system (6) can be transformed to (7).

III. OFF-POLICY Q-LEARNING
In the previous section, an equivalent model of a large-scale singular system was obtained. In this section, a novel off-policy Q-learning approach is used to address the optimal control problem for interconnected multi-area power systems.
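The equivalent model referred to above comes from eliminating the static state once $A_{k22}$ is invertible. The following minimal sketch shows that elimination under an assumed block form, with the dynamic part $\dot{x}_1 = A_{11} x_1 + A_{12} x_2 + B_1 u$ and the static part $0 = A_{21} x_1 + A_{22} x_2 + B_2 u$. Disturbance terms are omitted, and the block names, the function `eliminate_static_part`, and the numerical values are illustrative assumptions rather than the brief's exact equations (3)-(6).

```python
import numpy as np

def eliminate_static_part(A11, A12, A21, A22, B1, B2):
    """Substitute x2 = -A22^{-1} (A21 x1 + B2 u) into the dynamic part."""
    n1 = A21.shape[1]
    # Solve with A22 instead of forming its inverse explicitly.
    X = np.linalg.solve(A22, np.hstack([A21, B2]))  # A22^{-1} [A21, B2]
    A_red = A11 - A12 @ X[:, :n1]
    B_red = B1 - A12 @ X[:, n1:]
    return A_red, B_red

# Hypothetical blocks: two dynamic states, one static state, one input.
A11 = np.array([[0.0, 1.0], [-2.0, -3.0]])
A12 = np.array([[0.0], [1.0]])
A21 = np.array([[1.0, 0.0]])
A22 = np.array([[-2.0]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0]])

A_red, B_red = eliminate_static_part(A11, A12, A21, A22, B1, B2)
```

The reduced pair `(A_red, B_red)` then plays the role of a regular state-space model for the dynamic states only.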
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
First, the GARE is derived based on the value function. Second, the optimal controller is obtained by a novel off-policy Q-learning method.
Definition 2 [11]: Let $\gamma > 0$ be a prescribed disturbance attenuation level. The system is said to have an $L_2$ gain less than or equal to $\gamma$ if the corresponding $L_2$-gain inequality of [11] holds.
Define the infinite-horizon performance index as (8), where $W_k = W_k^T \geq 0$ and $R_k = R_k^T > 0$ are weight matrices and $\gamma > 0$ represents the disturbance attenuation level. According to the Bellman optimality principle, the objective of the optimal control policy $u_k(t)$ is to minimize the performance index, and one can conclude that the optimal value function is a quadratic form in the state with kernel $P_k$, where $P_k$ is a symmetric positive definite matrix. Next, the GARE is presented, where $P_k^*$ is the solution of the GARE. The optimal control gain and the worst-case feedback gain are then determined by $P_k^*$. According to (9)-(11), the Lyapunov equation (12) can be obtained.
Remark 3: It is worth noting that the optimal control policy for the kth subsystem minimizes the kth performance index, whereas other policies may increase it. The performance index can still be minimized under the worst-case control policy. Therefore, the designed optimal control policy is valid for each singular power subsystem.
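To make the objects discussed in this section concrete, the following sketch writes them out under an assumed linear form of subsystem (7). The matrices $A_k$, $B_k$, $D_k$ and the performance output $z_k(t)$ are placeholder names introduced here, and the feedback convention $u_k = K_k x_k$, $\sigma_k = C_k x_k$ is taken from Step I of Algorithm 2. These are the standard zero-sum-game expressions for such a model and should be read as an illustration, not as a transcription of (8)-(12).

```latex
\begin{align*}
% Assumed reduced subsystem (placeholder matrices A_k, B_k, D_k)
\dot{x}_k(t) &= A_k x_k(t) + B_k u_k(t) + D_k \sigma_k(t), \\
% L2-gain condition for a performance output z_k(t)
\int_0^{\infty} \| z_k(t) \|^2 \, dt &\leq \gamma^2 \int_0^{\infty} \| \sigma_k(t) \|^2 \, dt, \\
% Infinite-horizon performance index
J_k &= \int_0^{\infty} \left( x_k^T W_k x_k + u_k^T R_k u_k - \gamma^2 \sigma_k^T \sigma_k \right) dt, \\
% Game algebraic Riccati equation
0 &= A_k^T P_k^* + P_k^* A_k + W_k - P_k^* B_k R_k^{-1} B_k^T P_k^* + \gamma^{-2} P_k^* D_k D_k^T P_k^*, \\
% Optimal control gain and worst-case disturbance gain
K_k^* &= -R_k^{-1} B_k^T P_k^*, \qquad C_k^* = \gamma^{-2} D_k^T P_k^*, \\
% Lyapunov equation for fixed gains K_{k,l}, C_{k,l}
0 &= (A_k + B_k K_{k,l} + D_k C_{k,l})^T P_{k,l} + P_{k,l} (A_k + B_k K_{k,l} + D_k C_{k,l}) \\
  &\quad + W_k + K_{k,l}^T R_k K_{k,l} - \gamma^2 C_{k,l}^T C_{k,l}.
\end{align*}
```

Substituting $K_{k,l} = K_k^*$ and $C_{k,l} = C_k^*$ into the last equation recovers the GARE, which is the consistency that a policy iteration of this type exploits.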
Next, a model-based algorithm is proposed to solve the optimal control policy for the kth power subsystem (7).
Algorithm 1 Model-Based Algorithm for kth Power Subsystem
Step I: Give initial stabilizing gain matrices $K_{k,0}$ and $C_{k,0}$;
Step II: Solve $P_{k,l}$ from the Lyapunov equation (12);
Step III: Update the control gains using $P_{k,l}$;
Step IV: Let $l = l + 1$ and repeat Step II and Step III until $\|P_{k,l} - P_{k,l-1}\| \leq \phi_1$, where $\phi_1 > 0$ is a small positive constant;
Step V: Output the optimal control policy.

For the unknown power subsystem (7), the Q-function is defined as (13). Then, (13) can be rewritten as (14). To facilitate subsequent analysis, several auxiliary definitions are given. According to the Kronecker product representation and the operator $\xi_k(t)$, (14) can be changed to (15) along the solutions of (7) by (12).
For $0 \leq t_0 \leq t_1 \leq \dots \leq t_z$, the data matrices in (16) can be defined. Combining (15) with (16), one has (17).
Lemma 1 [11]: If there exists a positive integer $z_0$ such that, for $z \geq z_0$, the rank condition on the data matrices defined above holds, then (17) can be solved uniquely.
When Lemma 1 holds, (17) can be rewritten as (18). Thus, the following Algorithm 2 is given to obtain the optimal solution for multi-area power systems with unknown dynamics.

Algorithm 2 Model-Free Algorithm for kth Power Subsystem
Step I: Give an initial stabilizing control input $u_k = K_{k,0} x_k(t) + e_{k1}$ and $\sigma_k(t) = C_{k,0} x_k(t) + e_{k2}$, where $e_{k1}$ and $e_{k2}$ are exploration noises;
Step II: Collect data $x_k(t)$, $\mu_k(t)$ and $\upsilon_k(t)$;
Step III: Solve $K_{k,l}$ and $C_{k,l}$ from equation (18) when Lemma 1 holds;
Step IV: Let $l = l + 1$ and repeat Step III until $\|K_{k,l} - K_{k,l-1}\|$ is no larger than a small positive constant;
Step V: Output the optimal control policy.
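The role of the rank condition in Step III can be illustrated with a generic least-squares step. The sketch below is a simplified stand-in, not the brief's equation (18): it assumes the collected data have already been arranged into a regressor matrix `Theta` and a target vector `Xi` (both hypothetical names) and solves for the unknown parameters only when `Theta` has full column rank, which is when the least-squares solution is unique.

```python
import numpy as np

def solve_if_excited(Theta, Xi):
    """Least-squares solve of Theta @ w = Xi, only if Theta has full column rank."""
    Theta = np.asarray(Theta, dtype=float)
    Xi = np.asarray(Xi, dtype=float)
    if np.linalg.matrix_rank(Theta) < Theta.shape[1]:
        return None  # data not rich enough: collect more samples first
    w, *_ = np.linalg.lstsq(Theta, Xi, rcond=None)
    return w

rng = np.random.default_rng(1)

# Well-excited synthetic data: 20 samples, 4 unknown parameters.
w_true = np.array([1.0, -2.0, 0.5, 3.0])
Theta = rng.standard_normal((20, 4))
Xi = Theta @ w_true
w_hat = solve_if_excited(Theta, Xi)

# Poorly excited data: all columns identical, so the rank is 1.
w_none = solve_if_excited(np.ones((20, 4)), np.ones(20))
```

In practice the exploration noises $e_{k1}$ and $e_{k2}$ are what make the collected data rich enough for the rank test to pass.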
Theorem 1: When Lemma 1 holds, the proposed Algorithm 2 obtains the optimal control gain $K_k^*$ from an initially stabilizing control gain $K_{k,0}$.
Proof: Let $K_{k,l}$ be the gain at the lth iteration of the PI method proposed in [11], and let $P_{k,l+1}$ be the unique solution of the Lyapunov equation (12). From Algorithm 1 and the definition of the Q-function, $K_{k,l}$, $K_{k,l+1}$ and $P_{k,l+1}$ also satisfy equation (18). Moreover, (18) has a unique solution when Lemma 1 holds, so the gains produced by Algorithm 2 coincide with those of the PI method and therefore converge to $K_k^*$. The proof is completed.
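The model-based iteration that the proof relies on (Algorithm 1) can be sketched numerically. This is a minimal illustration under assumptions that are not taken from the brief: the subsystem is written as $\dot{x} = Ax + Bu + D\sigma$ with $u = Kx$ and $\sigma = Cx$, the matrices are a small hypothetical example rather than the power-system data of [12], and the Lyapunov equation is solved through a Kronecker-product linear system. Only $W = I$, $R = 1$ and $\gamma = 2.5$ echo the values reported in the simulation section.

```python
import numpy as np

def solve_lyapunov(Acl, Q):
    """Solve Acl^T P + P Acl + Q = 0 using vec(AXB) = (B^T kron A) vec(X)."""
    n = Acl.shape[0]
    I = np.eye(n)
    M = np.kron(I, Acl.T) + np.kron(Acl.T, I)
    p = np.linalg.solve(M, -Q.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry

def model_based_pi(A, B, D, W, R, gamma, K0, C0, tol=1e-8, max_iter=100):
    """Policy iteration: evaluate the current gains, then update both gains."""
    K, C, P_prev = K0, C0, None
    for _ in range(max_iter):
        Acl = A + B @ K + D @ C
        Q = W + K.T @ R @ K - gamma**2 * C.T @ C
        P = solve_lyapunov(Acl, Q)        # policy evaluation
        K = -np.linalg.solve(R, B.T @ P)  # control gain update
        C = (D.T @ P) / gamma**2          # worst-case disturbance gain update
        if P_prev is not None and np.linalg.norm(P - P_prev) <= tol:
            break
        P_prev = P
    return P, K, C

# Hypothetical stable example, so zero initial gains are stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [0.5]])
W = np.eye(2)
R = np.array([[1.0]])
gamma = 2.5

P, K, C = model_based_pi(A, B, D, W, R, gamma, np.zeros((1, 2)), np.zeros((1, 2)))
```

For this example the returned `P` can be checked by substituting it into the game algebraic Riccati equation and confirming that the residual is close to zero. Convergence of such simultaneous gain updates is not guaranteed for every system and attenuation level, so the sketch should be treated as a numerical companion rather than a general solver.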

IV. AN ILLUSTRATIVE EXAMPLE
To illustrate the effectiveness of the proposed Algorithm 2 for interconnected multi-area power systems, a simulation example using a 10-machine power system as a complex singular system is considered [10]. The network topology diagram of the IEEE-39 system is shown in Fig. 1. The system parameters can be obtained from [12]. The parameters in the performance index (8) are given as $W_k = I_3$, $R_k = 1$ and $\gamma = 2.5$. Furthermore, the convergence threshold in Algorithm 2 is selected as $10^{-3}$, and the learning phase to collect data is set as 0 s to 2 s. Based on the analysis in Section II, the state variables in system (1) can be expressed as $[\omega_k(t), \vartheta_k(t), P_{mk}(t)]^T$.
In the simulation, we choose the initial states as $x_1(0) = x_2(0) = [0.8, -0.8, -0.8]^T$. Fig. 2 shows the state trajectories of $\omega_k(t)$, $\vartheta_k(t)$ and $P_{mk}(t)$, indicating that the system states converge under the optimal controller. Fig. 3 shows how the control gain obtained through Algorithm 2 evolves toward its optimal value. The initial stabilizing control gain is set to [0 21.615 5.25]. It can be seen that the control gain produced by Algorithm 2 settles after the 10th iteration.

V. CONCLUSION
The optimal controller design issue for continuous-time interconnected multi-area power systems with completely unknown system dynamics has been explored in this brief. Considering the current transient phenomenon in interconnected multi-area power systems, a singular system has been introduced to describe interconnected multi-area power systems. In addition, a novel model-free optimal controller design method has been proposed by combining the Q-learning method and game theory, which avoids the use of system dynamics. The optimality of the method has been proved. Finally, a simulation example has been given to verify the effectiveness of the proposed method.