A Virtual Generation Ecosystem Control Strategy for Automatic Generation Control of Interconnected Microgrids

The continuous integration of new energy and distributed energy, together with random load disturbances, affects the security and stability of microgrids. This paper proposes a virtual generation ecosystem control (VGEC) strategy, which adopts the time tunnel idea and a new win-loss criterion to achieve fast automatic generation control (AGC) power dispatch and optimal coordinated control of microgrids. The proposed strategy introduces a two-layer dynamic power dispatch structure that combines hierarchical Q-learning with consensus theory to improve the adaptability of the consensus algorithm in complex random environments. Both the IEEE standard two-area load frequency control model and an interconnected microgrids model are used in simulation for comparison and verification. The results show that the VGEC strategy improves the control performance of microgrids, reduces power generation cost, and achieves faster convergence and stronger robustness than the other algorithms.


I. INTRODUCTION
Microgrids can effectively integrate all kinds of distributed generation technologies and provide an effective technical route for the large-scale integration of new energy and distributed energy, so they have become an important part of smart grid research and construction [1]-[4]. However, because of environmental uncertainties and load fluctuations, the energy management system (EMS) of microgrids faces many challenges [5]-[7]. Automatic generation control (AGC) [8], as one of the most important control functions in the EMS, can effectively improve frequency quality and economic efficiency.
Traditional AGC strategies usually consist of two parts: a) tracking the total power reference of AGC; b) assigning the total power reference to each unit by a fixed allocation method. In practice, the proportional-integral (PI) controller is widely used for total power reference tracking of AGC in microgrids. Moreover, bacterial foraging optimization (BFO) [9], particle swarm optimization (PSO) [10], genetic algorithm (GA) [11], and the conventional gradient descent algorithm have been applied to simultaneously optimize all the control parameters of microgrids. In previous studies of the authors, reinforcement learning (RL) was applied to the traditional AGC [12]-[14] of the interconnected power grid to handle the random disturbance caused by massive integration of distributed energy. However, the aforementioned studies are all based on a centralized control structure, which cooperates inefficiently with economic dispatch because it ignores the power grid topology. In particular, the overall AGC command of the provincial dispatch centre is assigned through a fixed proportion of the adjustable capacity rather than through dynamic optimization, so it cannot effectively cooperate with the AGC of interconnected microgrids. Therefore, a wolf pack hunting (WPH) [15] strategy based on the multi-agent system stochastic consensus game (MAS-SCG) [16] framework was presented to obtain optimal coordinated control of the islanded distribution network model with a large amount of distributed energy. However, the control method based on the multi-agent system stochastic game (MAS-SG) [17] in the WPH strategy is DWoLF-PHC(λ), which can neither accurately calculate the win-and-loss criterion nor quickly converge to the Nash equilibrium.
The associate editor coordinating the review of this manuscript and approving it for publication was Huai-Zhi Wang. VOLUME 8, 2020
Hence, the authors proposed ecological population cooperative control (EPCC) [18] to solve the above problems. Its control method is PDWoLF-PHC(λ), which has a new win-and-loss criterion and integrates the time tunnel idea. The power allocation algorithms in both WPH and EPCC adopt a simple first-order consensus algorithm based on multi-agent system collaborative consensus (MAS-CC) [19], which relies heavily on the model and easily falls into a local optimum.
To improve the adaptability of the consensus algorithm in a dynamic random environment, a multi-robot behavior consensus Q-learning algorithm [20] that combines the consensus algorithm with RL was proposed to realize robot behavior control. Inspired by this, a two-layer model for dynamic power dispatch is constructed. Based on this model, a novel hierarchical Q-learning consensus (HQC) strategy is proposed to obtain the optimal solution of power allocation and to alleviate the curse of dimensionality (the "dimension disaster"). Then a virtual generation ecosystem control (VGEC) strategy is proposed to achieve fast AGC power dispatch and optimal coordinated control of microgrids. It is a hybrid of control (PDWoLF-PHC(λ) based on MAS-SG) and optimization (HQC based on MAS-CC). In particular, PDWoLF-PHC(λ) is used under MAS-SG to rapidly obtain the overall power reference (control), and then HQC is adopted under MAS-CC to optimally distribute the obtained overall power command to each unit (optimization). Both the IEEE standard two-area load frequency control (LFC) model and the interconnected microgrids model are used for simulation comparison and verification. The results show that the proposed strategy can improve the control performance of interconnected microgrids, reduce power generation cost, and obtain faster convergence and stronger robustness than the other algorithms.

II. FRAMEWORK DEVELOPMENT
The interconnected microgrids, which integrate a large number of distributed energy resources, can be virtually divided into several small-area grids by the graph-theoretic cut method, and the control framework is shown in Fig. 1. Each small-area microgrid is considered a virtual generation ecosystem (VGE). The VGEs exchange power across the regional boundaries to maintain the frequency stability of the interconnected microgrid. When a serious fault occurs in the microgrid, each VGE is automatically separated and operates as an island. In that case, the frequency of each VGE needs to be maintained by autonomous control.
Since each VGE can be regarded as an agent, PDWoLF-PHC(λ) is adopted to obtain the total power command of each VGE through a multi-agent dynamic game. Each VGE contains multiple types of generator unit groups (GUGs), each of which selects the generator unit with the largest capacity as the leader and the other units as the followers. The total power command of each GUG is obtained through hierarchical Q-learning (HQL) [21], while the power command of each unit in a GUG is assigned using the consensus algorithm.

III. VGEC
The VGEC strategy is a mixed strategy based on the MAS-SCG framework. In the AGC control part of each VGE, PDWoLF-PHC(λ) based on the MAS-SG principle is adopted to obtain the total power command. For the power allocation part, an HQC strategy that combines the HQL algorithm with a consensus method based on the MAS-CC principle is proposed to distribute the power in an optimal, dynamic way.

A. AGC CONTROL ALGORITHM
In the VGE, PDWoLF-PHC(λ) is used as the AGC control algorithm, and its corresponding controller is equivalent to an agent that can frequently exchange information with other agents. The key idea of PDWoLF-PHC(λ) is to explicitly use the time tunnel idea, with its backward multi-step prediction function, to effectively backtrack the online reinforcement information of future multi-step decisions. At the same time, each agent shares experience to update the Q-function table and appropriately adjusts its own control strategy through dynamic competition or cooperation. This maximizes the overall learning efficiency of the multi-agent system, so that the algorithm can obtain optimal cooperative control.
The control system is a multi-agent system, so the actions of other agents can change the state of the entire system. The agent therefore designs a variable learning rate according to whether the product of the decision change rate and the decision space slope is negative, and it changes the learning rate as the state changes.
In state $s_k$ with reward function $R_1$, the agent executes the search action $a_k$ according to the mixed strategy table $\pi(s_k, a_k)$, and the state transits to the next state $s_{k+1}$. In the update rule of $\pi(s_k, a_k)$, $\Delta_{s_k a_k}$ is the change of the policy update at the $k$th iteration, $|A_i|$ is the number of optional actions, and $\varphi$ is the variable learning rate.
In the update law of $\varphi$, $\Delta(s_{k-1}, a_{k-1})$ is the decision change rate at the $(k-1)$th iteration, and $\Delta^2(s_{k-1}, a_{k-1})$ is the decision space slope. If the product of the decision change rate $\Delta(s_{k-1}, a_{k-1})$ and the decision space slope $\Delta^2(s_{k-1}, a_{k-1})$ is negative, $\varphi_{win}$ is selected as the variable learning rate; otherwise $\varphi_{lose}$ is selected, where $\varphi_{lose} > \varphi_{win}$. In the next iteration, $\Delta(s_k, a_k)$ and $\Delta^2(s_k, a_k)$ are updated according to (5) and (6).
This paper selects the eligibility trace based on SARSA(λ) [19]. In (7), $e_k(s, a)$ is the eligibility trace at the $k$th iteration under state $s$ and action $a$, $\gamma$ is the discount factor, and $\lambda$ is the attenuation factor.
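The policy update and the win-or-lose learning rate described above can be illustrated with a minimal sketch. It assumes the commonly used PD-WoLF policy hill-climbing form, in which each non-greedy action gives up at most $\varphi/(|A_i|-1)$ of its probability to the greedy action; the function names and the numerical values are hypothetical and are not taken from Table 1.

```python
def variable_learning_rate(delta_prev, delta2_prev, phi_win, phi_lose):
    """Select phi_win when (change rate x slope) is negative, else phi_lose."""
    return phi_win if delta_prev * delta2_prev < 0 else phi_lose


def policy_hill_climb(pi_s, greedy, phi):
    """Shift probability mass toward the greedy action for one state.

    pi_s   : list of action probabilities for the current state (len >= 2)
    greedy : index of the action with the largest Q value
    phi    : variable learning rate chosen for this step
    Returns (new_pi, step), where step[a] is the policy change for action a.
    """
    n = len(pi_s)
    # A non-greedy action gives up at most phi/(n-1), never more than it holds.
    give = [min(p, phi / (n - 1)) if a != greedy else 0.0
            for a, p in enumerate(pi_s)]
    step = [-g for g in give]
    step[greedy] = sum(give)          # the greedy action collects the mass
    new_pi = [p + d for p, d in zip(pi_s, step)]
    return new_pi, step


def update_trackers(delta, step):
    """Refresh the slope first (new step minus old change rate), then the rate."""
    new_delta2 = [d_new - d_old for d_new, d_old in zip(step, delta)]
    return list(step), new_delta2
```

For a uniform four-action policy with the greedy action at index 2 and a step size of 0.12, each non-greedy action loses 0.04 and the greedy action gains 0.12, so the probabilities still sum to one.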
The agent uses the current reward $R$ to calculate the Q-function error and its evaluation value. In (8) and (9), $R(s_k, s_{k+1}, a_k)$ is the agent reward for the transition from state $s_k$ to $s_{k+1}$ under the selected action $a_k$, $a_g$ is the greedy action, $\rho_k$ is the Q-function error at the $k$th iteration, and $M_k$ is the evaluation of the Q-function error.
The Q(λ) [22] value function is iteratively updated according to (10) and (11), where $\alpha$ is the learning rate. After sufficient trial-and-error iterations, the state-action value function $Q_k(s, a)$ converges to the $Q^*$ matrix with probability 1, which finally yields the optimal control strategy represented by the $Q^*$ matrix.
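The interplay of the eligibility trace, the Q-function error $\rho_k$, its evaluation $M_k$, and the learning rate $\alpha$ can be sketched as follows. This is a minimal illustration that assumes a Peng-style Q(λ) backup with accumulating traces; the exact ordering of trace decay and value update in (7)-(11) may differ, and the table layout and function name are hypothetical.

```python
def q_lambda_step(Q, E, s, a, r, s_next, alpha, gamma, lam):
    """One Q(lambda) backup on nested-list tables Q[state][action], E[state][action].

    Assumed forms:
      rho = r + gamma * max_a Q[s_next][a] - Q[s][a]      (Q-function error)
      m   = r + gamma * max_a Q[s_next][a] - max_a Q[s][a] (its evaluation)
    """
    v_next = max(Q[s_next])                 # value of the greedy action a_g
    rho = r + gamma * v_next - Q[s][a]
    m = r + gamma * v_next - max(Q[s])
    for si in range(len(Q)):
        for ai in range(len(Q[si])):
            E[si][ai] *= gamma * lam        # decay every eligibility trace
            Q[si][ai] += alpha * m * E[si][ai]  # spread m along the traces
    Q[s][a] += alpha * rho                  # correct the visited pair
    E[s][a] += 1.0                          # accumulate trace for visited pair
    return rho, m


# Two states, two actions, gamma = 0.9 and lambda = 0.5 as used for HQL.
Q = [[0.0, 0.0], [0.0, 0.0]]
E = [[0.0, 0.0], [0.0, 0.0]]
q_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9, lam=0.5)
q_lambda_step(Q, E, s=1, a=0, r=0.0, s_next=0, alpha=0.1, gamma=0.9, lam=0.5)
```

The second call shows the backward effect of the trace: the pair visited first, `Q[0][1]`, is nudged again even though it was not the pair executed in that step.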
In general, including the area control error (ACE) in the reward helps to maximize the long-term CPS benefit and to avoid large power fluctuations. Meanwhile, the generation cost accounts for the economic impact on the energy management system. Therefore, the reward function is built on the weighted sum of ACE and $C_{total}$, such that a larger weighted sum results in a smaller reward. In the reward function, ACE and $C_{total}$ represent the instantaneous absolute value of the ACE and the actual generation cost of all units at the $k$th iteration, $\rho$ and $1-\rho$ are the weights of ACE and $C_{total}$, respectively, and $\rho = 0.5$ is chosen.
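As a minimal sketch of such a reward, the following assumes the simplest form consistent with the description, namely the negative of the weighted sum. Any scaling constants that the original reward function may apply to ACE or to the cost are not reproduced here, and the function name is hypothetical.

```python
def agc_reward(ace, c_total, rho=0.5):
    """Assumed reward: negative weighted sum of |ACE| and generation cost.

    ace     : instantaneous area control error (sign is discarded)
    c_total : actual generation cost of all units at this step
    rho     : weight of |ACE|; (1 - rho) weights the cost
    """
    return -(rho * abs(ace) + (1.0 - rho) * c_total)
```

With equal weights, an ACE of -10 and a cost of 4 give a reward of -7, and any larger |ACE| or cost lowers the reward further.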
After several trial-and-error tests, the parameters of the control algorithm are set as listed in Table 1.
The power allocation process for each VGE is described by a mathematical model in which $f_1$ is the linearly weighted multi-objective function of power deviation and regulation cost; $f_2$ is the objective function of ramp time; $\Delta P_{error}$ is the deviation between the total power command calculated by the AGC controller and the total power of all GUGs; $C_{total-i}$ is the regulation cost of all generator units in GUG$_i$; $\Delta P_{iw}$ and $\Delta P^{rate}_{iw}$ are the power command and ramp rate of the $w$th generator unit of GUG$_i$, respectively; $\Delta P$ is the total power command calculated by the AGC controller; $\Delta P_i$ is the power command of GUG$_i$; $\eta_i$ is the power allocation factor; $m$ is the number of GUGs in each territorial power grid; and $W_i$ is the number of generator units in GUG$_i$.

2) HQC STRATEGY
In the AGC allocation process, a novel HQC strategy that combines HQL with the consensus algorithm is adopted to allocate the power command. Each generator unit is regarded as an agent; the unit with the largest capacity is selected as the leader, and the others are the followers. The leader interacts with the environment and observes the environment state $s_k$. By taking an action, the leader obtains the reward $R$ from the environment and the next state $s_{k+1}$, which completes the self-learning process. Through the consensus algorithm, the leader and the followers, as well as the followers among themselves, can interact frequently. Therefore, the optimal allocation can be obtained through self-learning and collaborative learning.

a: HQL ALGORITHM
HQL realizes the self-learning process that yields the power command of each GUG. HQL is based on Q(λ), and the eligibility trace is iteratively updated by (7), with $\gamma = 0.9$ and $\lambda = 0.5$.
The agent obtains the reward value $R_2$ through the current exploration, where $R_2(s_k, s_{k+1}, a_k)$ is the agent reward for executing action $a_k$ in the transition from state $s_k$ to $s_{k+1}$. The iteration of HQL is updated according to (10) and (11). Assuming that all actions are equally probable in the initial state, the action $a_k$ is executed and the state is transferred to $s_{k+1}$. In the update of the action probability function $P$, $\theta$ is the action search speed with $0 \le \theta \le 1$, and $\theta = 0.9$ is chosen. The total power command is taken as the state variable, which is discretized into a finite number of intervals.
Each GUG is treated as a hierarchical multi-agent network. Suppose that the GUG forms a network of $n$ agents, indexed by $p = 1, \ldots, n$. The interactions between agents are represented by a graph $G = (V, E, A)$, where $V = \{v_p, p = 1, \ldots, n\}$ is the node set, each node representing an agent, and $E \subseteq V \times V$ is the edge set, each element of which represents a directed or undirected connection between two agents.
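Returning to the HQL action selection, the role of the action search speed $\theta$ can be illustrated with a short sketch. It assumes a pursuit-type rule in which the greedy action gains probability at rate $\theta$ and all other actions shrink by the factor $1-\theta$ in the current state only; this form is an assumption, and the function name is hypothetical.

```python
def update_action_probability(p_s, greedy, theta=0.9):
    """Pursuit-style update of the action probabilities of the current state.

    p_s    : list of action probabilities for the visited state (sums to 1)
    greedy : index of the greedy action a_g in that state
    theta  : action search speed, 0 <= theta <= 1
    The probabilities of all other states are left unchanged by the caller.
    """
    return [p + theta * (1.0 - p) if a == greedy else p * (1.0 - theta)
            for a, p in enumerate(p_s)]


# Uniform initial probabilities over four actions, greedy action at index 1.
p_new = update_action_probability([0.25, 0.25, 0.25, 0.25], greedy=1)
```

Because the mass removed from the non-greedy actions equals the mass added to the greedy one, the updated probabilities still sum to one. A value of $\theta$ close to 1 makes the search concentrate on the greedy action within a few visits.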
It is assumed that communication between agents $v_p$ and $v_q$ occurs with probability $b_{pq}$ ($0 \le b_{pq} \le 1$), independently of the other agents. Communication between agents means that there is an information connection. The Laplacian matrix $L = [l_{pq}]$ reflects the topology of the multi-agent network [23].
The ramp time is chosen as the consensus variable for the GUG. The leader with a higher ramp rate undertakes more disturbances. The ramp time of the $w$th unit of GUG$_i$ is the ratio of its power command to its ramp rate:
$t_{iw} = \Delta P_{iw} / \Delta P^{rate}_{iw}$ (17)
where $\Delta P_{iw}$ and $\Delta P^{rate}_{iw}$ are the power command and ramp rate of the $w$th unit of GUG$_i$, respectively. In the expression of $\Delta P^{rate}_{iw}$, $\Delta P^{rate+}_{iw}$ and $\Delta P^{rate-}_{iw}$ are the upper and lower bounds of the ramp rate, respectively.
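The two ingredients introduced above, the Laplacian matrix of the communication graph and the ramp time of a unit, can be sketched as follows. The Laplacian uses the standard construction (off-diagonal entries $-b_{pq}$, diagonal entries equal to the row sum of weights). The rule that selects the upper ramp bound for non-negative commands and the lower bound for negative commands is an assumption, as are the function names and numbers.

```python
def laplacian(b):
    """Graph Laplacian from non-negative connection weights b[p][q].

    Off-diagonal: l_pq = -b_pq; diagonal: l_pp = sum of b_pq over q != p.
    Diagonal entries of b are ignored.
    """
    n = len(b)
    lap = [[-b[p][q] if p != q else 0 for q in range(n)] for p in range(n)]
    for p in range(n):
        lap[p][p] = sum(b[p][q] for q in range(n) if q != p)
    return lap


def ramp_time(dp, rate_up, rate_down):
    """Ramp time = power command / ramp rate, as in (17).

    rate_up   : upper ramp-rate bound (positive), assumed used when dp >= 0
    rate_down : lower ramp-rate bound (negative), assumed used when dp < 0
    """
    rate = rate_up if dp >= 0 else rate_down
    return dp / rate


# Three fully connected units with unit connection weights.
L3 = laplacian([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
```

Every row of the resulting Laplacian sums to zero, and the ramp time is non-negative for both raising and lowering commands because the lower bound carries a negative sign.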
The ramp time of each follower in the GUG is updated as in (19). With frequent information exchange among the agents and a constant gain $b_{wv}$, the collaborative consensus of the agents can be achieved if and only if the directed graph is strongly connected [24].
The ramp time of the leader is updated according to [25]. In the leader update, $\sigma_i$ is the power regulation factor of GUG$_i$, $D = [d_{wv}[k]] \in R^{W_i \times W_i}$ is a row-stochastic matrix, and $\Delta P_{error-i}$ is the power deviation of GUG$_i$.
Similarly, when the boundary conditions are reached, the power command $\Delta P_{iw}$ and the maximum ramp time $t_{iw}$ are given by (23) and (24), where $\Delta P^{max}_{iw}$ and $\Delta P^{min}_{iw}$ are the maximum and minimum reserve capacity of the $w$th unit of GUG$_i$, respectively.
Moreover, if the power command $\Delta P_{iw}$ of the $w$th unit of GUG$_i$ exceeds its limit, the weight factor is changed as shown in (25).
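The leader-follower ramp-time consensus described in this subsection can be illustrated with a small pure-Python simulation. It is a sketch under stated assumptions rather than the exact update laws (19)-(25): the row-stochastic matrix is taken as $d_{wv} = |l_{wv}| / \sum_v |l_{wv}|$, the leader adds $\sigma_i \Delta P_{error-i}$ to its consensus step, the power deviation is the GUG command minus the sum of the unit commands, only a positive command is considered, and capacity and ramp limits are ignored. All names and numbers are hypothetical.

```python
def row_stochastic(lap):
    """Assumed normalization: d_wv = |l_wv| / sum_v |l_wv| (rows sum to 1)."""
    return [[abs(x) / sum(abs(y) for y in row) for x in row] for row in lap]


def consensus_dispatch(p_cmd, rates, lap, sigma, leader=0, iters=500):
    """Drive all ramp times to a common value whose powers add up to p_cmd.

    p_cmd : power command of the GUG (positive in this sketch)
    rates : ramp rates of the units; unit power = rate * ramp time
    lap   : Laplacian matrix of the communication graph
    sigma : power regulation factor applied by the leader
    """
    d = row_stochastic(lap)
    n = len(rates)
    t = [0.0] * n
    for _ in range(iters):
        # Power deviation between the command and the current unit powers.
        error = p_cmd - sum(r * x for r, x in zip(rates, t))
        # Followers (and the leader) average their neighbours' ramp times.
        t = [sum(d[w][v] * t[v] for v in range(n)) for w in range(n)]
        # Only the leader reacts to the remaining power deviation.
        t[leader] += sigma * error
    return t, [r * x for r, x in zip(rates, t)]


lap = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
times, powers = consensus_dispatch(100.0, [100.0, 60.0, 40.0], lap, sigma=0.001)
```

In this example the ramp times settle at a common value of about 0.5, and the unit powers approach 50, 30, and 20, so the unit with the higher ramp rate takes the larger share. The regulation factor has to be small relative to the total ramp rate for the iteration to remain stable.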

C. VGEC PROCEDURE
The execution steps of the VGEC strategy are shown in Fig. 2.

IV. CASE STUDIES
In this section, a two-area LFC power system model and an interconnected microgrids model are built to analyze the performance of the proposed strategy. All simulation cases in this paper are run in the MATLAB R2016b environment, and the total command control period of the dispatcher is 4 s. Simulink is used for modeling and simulation, and the algorithm and controller of the VGEC strategy are implemented as S-function modules.

A. TWO-AREA LFC MODEL
Based on the IEEE standard two-area LFC power system model [26], one equivalent unit in each of areas A and B is replaced by GUGs that contain a thermal power (TP) plant, a liquefied natural gas (LNG) plant, and a large hydropower (LH) plant. The framework of the two-area LFC power system model is shown in Fig. 3, and the system parameters are taken from [27]. Before running online, numerous explorations are needed to identify the optimal action strategy through offline trial and error, which optimizes the Q function and the state value function [28] and achieves sufficient pre-learning. A sinusoidal load disturbance with a period of 5000 s and an amplitude of 1000 MW is introduced in area A and in area B. The pre-learning process of the two areas under the continuous sinusoidal disturbance is shown in Fig. 4.
As shown in Fig. 4(a), the strategy can quickly track the load disturbance in both areas. The AGC control performance is evaluated by the control performance standard (CPS) and ACE. Fig. 4(b) illustrates that CPS1 in area A and area B is maintained within the ranges of 185% to 200% and 150% to 200%, respectively. Moreover, Fig. 4(c) shows that the ACE in area A and area B remains in the ranges of −88 MW to 0 MW and −158 MW to 0 MW, respectively, and finally reaches a stable value. The CPS criteria are as follows: (1) if CPS1 ≥ 200%, the CPS is qualified for any value of CPS2; (2) if 100% ≤ CPS1 < 200% and CPS2 ≥ 90%, the CPS is qualified; (3) if CPS1 < 100%, the CPS is unqualified.
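The three CPS criteria can be written as a small check. The sketch below follows the rules as stated; the case of 100% ≤ CPS1 < 200% with CPS2 below 90% is not listed explicitly and is treated here as unqualified, which is an assumption. The function name is hypothetical.

```python
def cps_qualified(cps1, cps2):
    """Return True if the CPS is qualified; cps1 and cps2 are in percent."""
    if cps1 >= 200.0:
        return True                 # rule (1): CPS2 may take any value
    if 100.0 <= cps1 < 200.0:
        return cps2 >= 90.0         # rule (2); below 90% assumed unqualified
    return False                    # rule (3): CPS1 < 100%


# CPS1 of 185% together with CPS2 of 95% satisfies rule (2).
ok = cps_qualified(185.0, 95.0)
```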

B. INTERCONNECTED MICROGRIDS MODEL
In this paper, an interconnected microgrids model is established that includes three microgrids with the communication topology shown in Fig. 1. The model integrates a large number of new energy and distributed energy sources, including photovoltaics (PV), wind farms (WF), small hydropower (SH), micro-gas turbines (MT), diesel generators (DG), biomass energy (BE), and fuel cells (FC). The model is simplified to some extent because it does not include PV, WF, and electric vehicles (EVs) as participants in system frequency regulation. The PV model [29] is established by simulating the change of light intensity over a full day; the output model of the WF [30] and the other generator set models were established in previous studies [31]-[35].
Fig. 5 shows the structure of a three-area interconnected microgrid cluster including VGE1, VGE2, and VGE3. The model parameters and unit parameters of VGE1 and VGE3 are the same, so only VGE1 and VGE2 are analyzed. The regulation power of the three areas is 2350, 2590, and 2350 kW, respectively, and the non-adjustable units (PV, WF, and EV) are regarded as load disturbances. Each adjustable unit (SH, MT, FC, DG, and BE) is treated as a different agent, and the connection weight $b_{wv}$ between the agents is chosen to be 1. The parameters of the AGC units [36], [37] are shown in Table 2. $C_i$ represents the generation cost of the microgrid units, and it is given by
$C_i(P_{Gi,actual}) = C_i(P_{Gi,plan} + \Delta P_{Gi}) = \alpha_i \Delta P_{Gi}^2 + \beta_i \Delta P_{Gi} + \gamma_i$,
where $P_{Gi,actual}$ is the actual active power of the $i$th unit, $P_{Gi,plan}$ is the planned power generation of the $i$th unit, $\Delta P_{Gi}$ is the AGC regulation power of the $i$th unit, and the positive constants $\alpha_i$, $\beta_i$, and $\gamma_i$ are dynamic coefficients under load disturbance. Expanding the quadratic cost $a_i P^2 + b_i P + c_i$ at $P = P_{Gi,plan} + \Delta P_{Gi}$ gives $\alpha_i = a_i$, $\beta_i = 2 a_i P_{Gi,plan} + b_i$, and $\gamma_i = a_i P_{Gi,plan}^2 + b_i P_{Gi,plan} + c_i$.
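The relation between the quadratic cost coefficients $a_i$, $b_i$, $c_i$ and the dynamic coefficients $\alpha_i$, $\beta_i$, $\gamma_i$ can be checked numerically. The sketch below expands the quadratic cost around the planned power and compares it with direct evaluation; the coefficient values are hypothetical and are not taken from Table 2.

```python
def dynamic_cost_coefficients(a, b, c, p_plan):
    """Coefficients of the cost as a polynomial in the regulation power dP.

    a*(p_plan + dP)**2 + b*(p_plan + dP) + c
        = alpha*dP**2 + beta*dP + gamma
    """
    alpha = a
    beta = 2.0 * a * p_plan + b
    gamma = a * p_plan ** 2 + b * p_plan + c
    return alpha, beta, gamma


def generation_cost(a, b, c, p_actual):
    """Quadratic generation cost evaluated at the actual active power."""
    return a * p_actual ** 2 + b * p_actual + c


# Hypothetical unit: planned power 100 kW, AGC regulation power 20 kW.
a, b, c, p_plan, dp = 0.001, 0.2, 5.0, 100.0, 20.0
alpha, beta, gamma = dynamic_cost_coefficients(a, b, c, p_plan)
expanded = alpha * dp ** 2 + beta * dp + gamma
direct = generation_cost(a, b, c, p_plan + dp)
```

Both `expanded` and `direct` evaluate to about 43.4 for these numbers, which confirms that the constant term must carry the factor $a_i$ on the squared planned power.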

1) IMPULSIVE DISTURBANCE AND WHITE NOISE DISTURBANCE
In the pre-learning stage, a sinusoidal load disturbance with a period of 5000 s and an amplitude of 1000 kW is introduced in VGE1, and the VGEC strategy is compared with three algorithms, namely EPCC [18], HQL [21], and WPH [15]. Fig. 6 shows the load disturbance tracking curves of the different algorithms during the pre-learning process. Compared with the other algorithms, the VGEC strategy tracks load disturbances quickly and converges faster. After the full pre-learning process, impulsive disturbance and white noise are introduced into VGE1 to simulate different types of disturbances in the power system. The long-term performance of the VGEC strategy is statistically evaluated using 24 h experimental results under the given impulsive disturbance and white noise disturbance. Figs. 7 and 8 show the load tracking curves of the different algorithms. It can be seen from Figs. 7 and 8 that the VGEC strategy produces smoother adjustment commands and can quickly track load disturbances. Table 3 shows the control index values of the different algorithms under impulsive disturbance and white noise disturbance. Here, $|\Delta f|$ is the average of the absolute values of the frequency deviation, and all the indicators are averages over the simulation time. CPS1 evaluates the effect of ACE changes on the system frequency, and CPS2 evaluates the ACE amplitude. The CPS index considers the distribution of the CPS1 and CPS2 indicators and is mainly used to evaluate the control performance of the entire AGC system.
The values in Table 3 indicate that, under impulsive disturbance in VGE1, the VGEC strategy reduces $|\Delta f|$ by 0.0004 Hz to 0.0104 Hz and |ACE| by 0.57 kW to 20.84 kW compared with the other methods, while it increases CPS1 by 1.01% to 6.72% and CPS2 by 0.69% to 11.76%. Under white noise disturbance, the VGEC strategy reduces $|\Delta f|$ by 0.0012 Hz to 0.0176 Hz and |ACE| by 2.13 kW to 12.32 kW, while it increases CPS1 by 0.18% to 0.22% and CPS2 by 0.00% to 0.33%.

2) RANDOM DISTURBANCE
Random disturbances are applied to VGE1, VGE2, and VGE3 over a simulation time of 24 h to verify the robustness of the VGEC strategy. The same three algorithms are used for comparison. As shown in Fig. 9, under the VGEC strategy the total power command output of the units tracks the load disturbance well, so that the AGC dynamic control indices, including $\Delta f$, ACE, and CPS1, quickly return to their ideal values after the disturbance occurs, which also illustrates the robustness of the VGEC strategy. Table 4 shows the AGC control performance indicators of the four algorithms under random disturbances, where the cost is the sum of the total regulation costs of all units within 24 h. VGEC has the highest CPS1, the smallest ACE and $\Delta f$, and the lowest total control cost.
Compared with the EPCC strategy, VGEC adopts the HQC strategy in the AGC power allocation part, so it improves the convergence speed through the interaction between the agents and the self-learning of each agent. Compared with HQL, VGEC is not affected by the scale of the AGC units, so its global search capability is stronger and the optimal solution can be obtained. Compared with WPH, VGEC designs its variable learning rate according to whether the product of the decision change rate and the decision space slope is negative, which gives faster convergence and reduces the total adjustment cost of AGC.

V. CONCLUSION
The contributions of this paper can be summarized as follows:
(1) Based on the VGE framework, a novel VGEC strategy is proposed to obtain optimal cooperative control and fast power allocation of the interconnected microgrids, such that the energy autonomy of the microgrids is realized.
(2) With PDWoLF-PHC(λ) as the control algorithm of the VGEC strategy, the win-and-loss criterion can be calculated accurately and the Nash equilibrium can be reached more quickly. The HQC strategy, with interactive coordination and self-learning, is used as the power allocation algorithm of the VGEC strategy by constructing a hierarchical power allocation mode. This strategy improves the adaptability of the consensus algorithm in complex random environments and effectively alleviates the "dimension disaster" that typically results from a large number of units.
(3) The validity of the VGEC strategy is verified on the IEEE standard two-area LFC model. An interconnected microgrids model with a large amount of distributed energy is used for simulation comparison and verification. The results show that the proposed strategy can improve the control performance of the interconnected microgrids, reduce the power generation cost, and realize the energy autonomy of the microgrid cluster. Compared with the other algorithms, the VGEC strategy is more robust and converges faster.

LIUQING YANG (Fellow, IEEE) received the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, in 2004. She is currently a Professor with Colorado State University. Her general interests are in signal processing with applications to communications, networking, and power systems, subjects on which she has published more than 310 journal and conference papers, four book chapters, and five books.
SHOUXIANG WANG (Senior Member, IEEE) received the B.S. and M.S. degrees from Shandong University, Jinan, China, in 1995 and 1998, respectively, and the Ph.D. degree from Tianjin University, Tianjin, China, in 2001, all in electrical engineering. He is currently a Professor with the School of Electrical and Information Engineering, Tianjin University. His main research interests are distributed generation, microgrid, and smart distribution systems.