Coordinated Complex-Valued Encoding Dragonfly Algorithm and Artificial Emotional Reinforcement Learning for Coordinated Secondary Voltage Control and Automatic Voltage Regulation in Multi-Generator Power Systems

This article proposes a coordinated optimization and control algorithm for coordinated secondary voltage control (CSVC) in multi-generator power systems. Firstly, to obtain a smaller voltage deviation while avoiding the curse of dimensionality, artificial emotional reinforcement learning (AERL) is applied to automatic voltage regulation (AVR). Secondly, to obtain a smaller fitness value with less randomness for the decentralized-independent-variable optimization problem of the CSVC, a complex-valued encoding dragonfly algorithm (CDA) is proposed. Thirdly, the CDA and the AERL are coordinated for the CSVC and the AVR in multi-generator power systems. To verify the control performance of the AERL and the convergence of the proposed CDA, three simulation cases, i.e., the IEEE 57-bus, 118-bus and 300-bus systems, are considered. The simulation results show that the CDA-AERL effectively obtains the smallest control objectives and stable convergence for the CSVC in multi-generator power systems.


I. INTRODUCTION
The conventional voltage control of power systems contains three levels, i.e., automatic voltage regulation (AVR) at the real-time level, secondary voltage control (SVC) at the mid-term level, and tertiary voltage control (TVC) at the long-term level [1]-[3].
Generally, the AVR is controlled by a proportional-integral-derivative (PID) algorithm, whose parameters must be configured by operator experience or by optimization algorithms. Numerous algorithms have been proposed to reduce the control errors of the PID, such as modified differential evolution [4], [5], the firefly algorithm [6], [7], and particle swarm optimization (PSO) [8], [9]. Nevertheless, the parameters of PID controllers must be reconfigured when the system parameters change. To obtain higher control performance for dynamic systems, reinforcement learning can be employed [10]-[13]. To obtain a more accurate control strategy, more actions can be added to reinforcement learning [14]. However, the memory required for the Q-value and P-value matrices of reinforcement learning grows with the number of actions, so the program may run out of memory and the curse of dimensionality occurs [15]. As one branch of artificial intelligence, artificial emotion has been applied to increase the control performance of reinforcement learning [16], [17]. For example, artificial emotional reinforcement learning (AERL) has been utilized in load frequency control of power systems in previous work [18]. Consequently, the AERL can be introduced to the AVR. The major features of the AERL can be summarized as follows. (The associate editor coordinating the review of this manuscript and approving it for publication was Emilio Barocio.)
1) The agent based on the AERL consists of two parts, i.e., an artificial emotional part and a reinforcement logic part.
2) Since the output actions can be updated by the artificial emotional part of the AERL, the curse of dimensionality can be mitigated and the control performance can be increased simultaneously.
3) The controller based on the AERL can update its control strategy on-line for a dynamic system.
In a conventional framework of power systems, the SVC and the TVC are optimized independently [19]. The SVC in the conventional framework has one deficiency: the nodes of one control area are tightly coupled, while the nodes of the other control areas are loosely coupled [20]. From the power-system perspective, any control area should behave as a general node. To mitigate this deficiency, coordinated secondary voltage control (CSVC) has been applied in power systems [21]. The CSVC, introduced by Paul et al. [22], aims to increase voltage stability in highly constrained areas [23]. Consequently, the CSVC can regulate the free variables of the reactive power flow [24]. Since the CSVC can be employed for system-wide voltage control, the TVC becomes unnecessary [25]. Hence, when the CSVC is applied to a power grid, the voltage control contains only two levels, i.e., the CSVC and the AVR. Compared with the SVC, the major features of the CSVC can be summarized as follows [26], [27].
1) The CSVC aims to minimize the voltage deviation of the dominant node when the voltage stability margin of the power system is sufficient.
2) The CSVC can balance the reactive power flow of each control area and can maintain enough dynamic reactive power flow when the voltage stability margin of the power system is sufficient.
3) The CSVC aims to minimize the control cost of the power system when the voltage stability margin of the power system is insufficient.
Carbon dioxide emissions are related to all the sources of a power system. For example, carbon dioxide emissions have been considered in stochastic wind and solar power [28], in land transport infrastructure [29], in a natural gas and heat delivery system [30], and in optimal power flow [31]. The carbon-energy combined-flow has been applied in multi-energy power systems [32], [33]. Therefore, the carbon-energy combined-flow is considered in the CSVC.
Recently, single-objective, discrete, and multi-objective optimization problems have been solved by the dragonfly algorithm (DA) [34]. The DA has been applied to numerous parameter optimization problems, such as the parameter optimization of support vector machines [35], numerical optimization problems [36], and improving the grey model [37]. Besides, an improved DA has been proposed for the feature selection problem, which is an optimization problem with decentralized independent variables [38]. Therefore, the DA is suitable for decentralized-independent-variable optimization problems. Since the CSVC optimization problem contains various decentralized independent variables, an improved DA is proposed for the optimization problem of the CSVC.
Numerous meta-heuristic optimization algorithms (e.g., the grey wolf optimizer (GWO) [39]) have been proposed by Mirjalili, who also proposed the DA [34]. In general, a meta-heuristic optimization algorithm can be improved by at least five types of operations: (1) grouped operation: a grouped grey wolf optimizer has been proposed for parameter optimization of the maximum power point tracking of a doubly-fed induction generator based wind turbine [40]; a swarm moth-flame optimizer has been proposed for the tracking of a doubly-fed induction generator [41]; the whale optimization algorithm has been grouped for standard benchmark functions [42]; (2) combined operation: a combined genetic algorithm (GA) and PSO has been developed for a hybrid wind-photovoltaic-battery system [43]; the GWO has been combined with the whale optimization algorithm for pressure vessel design [44]; (3) adaptive parameters operation: an improved Jaya algorithm with self-adaptive weight has been applied for the parameter identification of photovoltaic models [45]; a teaching-learning-based optimization algorithm has been improved by adaptive inertia weights [46]; an epsilon multi-objective genetic algorithm has been applied for PID parameter optimization [47]; (4) knowledge matrix based operation: a knowledge matrix is employed to remember the optimization task [48]; transfer reinforcement learning with a Q-value matrix has been proposed for reactive power optimization [49]; a transfer matrix with a Kriging model has been introduced into a multi-objective optimization algorithm [50]; (5) different coded operation: a real-coded GA has been applied to numerical optimization [51]; a binary-coded GA has been employed to solve the path planning of mobile robots [52]; a binary operation has been added to the social mimic optimization method [53]; a hexadecimal-coded optimization algorithm based on a field-programmable gate array has been applied for parallel computing [54]; a complex-valued encoding operation has been employed to
improve the optimization performance of the wind-driven optimization [55]. Furthermore, both the grouped operation and the combined operation help to obtain a global solution rather than a local one; both the adaptive parameters operation and the knowledge matrix based operation can accelerate the convergence of an optimization algorithm; different coded operations can fit different fitness functions or different types of optimization problems. Therefore, the complex-valued encoding operation is introduced to increase the optimization performance of the DA for the CSVC, and a complex-valued encoding dragonfly algorithm (CDA) is proposed in this article. The major contributions of this article can be summarized as follows: (i) with the imaginary part added to the real part, the CDA searches in two parts, which increases the convergence speed; compared with current improved DAs, the CDA is a complex-coded optimization algorithm; (ii) the previously developed AERL is applied to the AVR to reduce the voltage deviation of the power system; compared with current reinforcement learning and the PID, the AERL contains two parts, i.e., an emotional part and a logical part, and with the emotional part added, the AERL agent is more intelligent than an agent with only a logical part; (iii) the proposed CDA is coordinated with the AERL for the CSVC and the AVR of a power system; compared with the current voltage control framework, the CSVC and the AVR are coordinated more effectively.
The rest of the article is structured as follows. The CSVC is analyzed in Section II. Section III presents the AERL. The proposed CDA is presented in Section IV. Simulation results considering three power systems are shown in Section V. The conclusion is given in Section VI.

II. COORDINATED SECONDARY VOLTAGE CONTROL
A. COORDINATED SECONDARY VOLTAGE CONTROL MODELS
The voltage control of a power system operates on three time scales, i.e., the AVR, the SVC, and the TVC. The AVR is often referred to as automatic voltage control, which is regulated by the generators' excitation or a control algorithm; its control period is several seconds. The SVC can be controlled by a closed-loop control algorithm, for example, PID or reinforcement learning; its control period ranges from one to five minutes. The TVC can be regulated by an optimization algorithm, such as the GA, PSO, GWO, the moth-flame optimization algorithm (MFO), or the DA.
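As noted above, the closed voltage-control loops are often realized with a PID law. A minimal discrete-time sketch is given below; the gains, sampling period, and the first-order plant model are illustrative assumptions for demonstration only, not values from this article.

```python
class PID:
    """Discrete PID controller; gains and sampling period are illustrative."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        # Accumulate the integral and approximate the derivative of the error.
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Drive a hypothetical first-order voltage plant toward a 1.0 p.u. reference.
pid = PID(kp=2.0, ki=1.0, kd=0.05, dt=0.1)
v = 0.9                            # initial terminal voltage (p.u.)
for _ in range(200):
    u = pid.step(1.0 - v)          # control action from the voltage deviation
    v += 0.1 * (u - (v - 0.9))     # toy plant dynamics (an assumption)
```

The integral term removes the steady-state voltage deviation; re-tuning is needed whenever the plant parameters change, which motivates the learning-based AVR discussed later.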
To mitigate the coupled structure of a power system with multiple control areas, the CSVC is considered (FIGURE 1) in this article. The dominant node of the CSVC can be coordinately regulated by the AVR (FIGURE 2). For the i-th generator in FIGURE 2, E_fie is the steady-state value of the excitation voltage; K_Ai is the gain constant of the voltage regulator; U_refi is the generator terminal reference voltage (per-unit value); T_Ai is the time constant of the voltage regulator; E_fi is the excitation voltage; k_i is the feedback gain, with k_i ∈ [1, 100]; x_di is the d-axis synchronous reactance; E_qie is the steady-state value of the transient electromotive force; E_qi is the transient electromotive force; x'_di is the d-axis transient reactance; Q_ei is the reactive power output; U_ti is the generator terminal voltage (per-unit value); V_di and V_qi are the d-axis and q-axis stator voltages, respectively.

B. OPTIMIZATION OBJECTIVE AND CONTROL OBJECTIVE
The optimization objective of the CSVC considering carbon-emission flow can be described as follows, where µ_1, µ_2, and µ_3 are the weights of the carbon-energy combined-flow, the active power losses, and the stable voltage component, respectively; n_G is the number of generator nodes; and V_j, V_j^max, and V_j^min are the real, maximum, and minimum voltages, respectively. The active power losses can be presented as follows, where n_L is the number of load nodes; θ_ij is the phase-angle difference between the i-th and the j-th load nodes; V_i and V_j are the voltages of the i-th and the j-th load nodes, respectively; and g_ij is the conductance between the i-th and the j-th load nodes. The real carbon-energy combined-flow C_ds can be calculated as follows, where w denotes the w-th generator, w ∈ W. The term A_iw can be calculated as follows, where P_wi is the power flow value from the w-th node to the i-th node; P_wi = 0 when the i-th node and the w-th node are not connected; P_sw is the power flow value from the s-th node to the w-th node; δ_sw is the carbon emission intensity; i+ denotes the set of all input buses of the i-th node; P_ni is the power flow output; and α_g and β_g are the weights of the carbon emissions of the generator and the power grid, respectively (FIGURE 3).
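The equation displays of this subsection did not survive extraction. A plausible reconstruction of the weighted objective and the standard power-loss expression, using only the symbols defined above, is (the exact form of the original displays is an assumption; the carbon term C_ds and the stable voltage component are as defined in the article):

```latex
\min F \;=\; \mu_1\, C_{ds} \;+\; \mu_2\, P_{loss} \;+\; \mu_3\, \Delta V_s ,
\qquad
P_{loss} \;=\; \sum_{i=1}^{n_L} \sum_{j \in N_i} g_{ij}
\bigl( V_i^2 + V_j^2 - 2 V_i V_j \cos\theta_{ij} \bigr).
```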
All the inequality and equality constraints of the power flow equations can be described as follows, where S_i is the complex power value; P_Di and Q_Di are the active and reactive power demands, respectively; P_Gi and Q_Gi are the active and reactive power injections, respectively; b_ij is the susceptance between the i-th and the j-th load nodes; N_i is the set of nodes connected to the i-th node; N_G, N_B, N_C, N_k, and N_L are the generator, bus, AVR device, on-load tap-changer, and branch sets, respectively; the terminal voltages V_Gi, reactive power values Q_Ci, and transformer ratios k_ti are the variables to be optimized by the optimization algorithm; and the superscripts ''max'' and ''min'' denote the maximum and minimum values of the related variables.
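The (lost) power-flow constraint displays can be sketched in their standard polar form with the symbols above; this is a reconstruction, not necessarily the article's exact display:

```latex
P_{Gi} - P_{Di} = V_i \sum_{j \in N_i} V_j \bigl( g_{ij}\cos\theta_{ij} + b_{ij}\sin\theta_{ij} \bigr),
\qquad i \in N_B,
\\
Q_{Gi} - Q_{Di} = V_i \sum_{j \in N_i} V_j \bigl( g_{ij}\sin\theta_{ij} - b_{ij}\cos\theta_{ij} \bigr),
\qquad i \in N_B,
\\
V_{Gi}^{\min} \le V_{Gi} \le V_{Gi}^{\max},\quad
Q_{Ci}^{\min} \le Q_{Ci} \le Q_{Ci}^{\max},\quad
k_{ti}^{\min} \le k_{ti} \le k_{ti}^{\max}.
```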
The CSVC aims to maintain the generator terminal voltage U ti to the generator terminal reference voltage (standard value) U refi . Therefore, the input of the AVR controller is E fi ; the output of the AVR controller is voltage commands, which can be converted to sinusoidal pulse width modulation commands.

III. ARTIFICIAL EMOTIONAL REINFORCEMENT LEARNING
A. REINFORCEMENT LEARNING
A basic reinforcement learning system consists of an agent and an environment. At the (k + 1)-th control iteration, the environment provides the reward value R and the system state s_k to the agent; the agent then provides the action a_(k+1) to the environment. The agent can update its strategy at each iteration. Therefore, the state-action pairs of reinforcement learning are reinforced over these iterations, which means the agent is trained on-line. As a model-free algorithm, reinforcement learning does not rely on an accurate system model.
The Q-value of reinforcement learning is updated as follows, where Q_k(s_k, a_k) is the Q-value; s_k ∈ S is the state of the environment; a_k ∈ A is the action from the action set A; the learning rate α and the discount coefficient γ satisfy 0 < α < 1 and 0 < γ < 1, respectively; and the Q-values are initialized to 0.
The action selection strategy is the key step of the control strategy in reinforcement learning. Generally, a greedy policy π*, in which the action with the maximum Q-value is selected as the output action a_k, is presented as Eq. (8). Since the agent always selects the action with the maximum Q-value at state s_k under the greedy policy, the other actions at state s_k may not be searched sufficiently. To search all the actions at state s_k sufficiently, a probability-distribution selection policy is applied to select the action in this article. In the probability-distribution selection policy, each Q-value corresponds to a P-value; therefore, the P-value and Q-value matrices require the same computer memory. The P matrix at state s can be updated as follows, where β is the probability coefficient of the probability-distribution selection policy, 0 < β < 1; P_s^k(a) is the selection probability at state s; and a_g is the action selected by the greedy policy. After sufficient on-line search, one selection probability of P_s^k(a) converges to 1, which represents an optimal control strategy. The reward function of the AERL for the AVR can be designed as follows, where v is the voltage deviation of the power system.
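The Q-value update and the probability-distribution selection policy described above can be sketched as follows. This is a minimal illustration: the state/action sets, the reward shape, and all coefficient values are placeholders, not the article's tuned AVR configuration.

```python
import random

random.seed(0)
n_states, n_actions = 5, 11
alpha, gamma, beta = 0.5, 0.9, 0.6   # learning rate, discount, probability coefficient
Q = [[0.0] * n_actions for _ in range(n_states)]
P = [[1.0 / n_actions] * n_actions for _ in range(n_states)]

def update_q(s, a, r, s_next):
    """Standard Q-learning update with learning rate alpha and discount gamma."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def update_p(s):
    """Shift probability mass toward the greedy action a_g; rows stay normalized."""
    a_g = max(range(n_actions), key=lambda a: Q[s][a])
    for a in range(n_actions):
        if a == a_g:
            P[s][a] += beta * (1.0 - P[s][a])   # converges toward 1
        else:
            P[s][a] *= (1.0 - beta)             # decays toward 0

def select_action(s):
    """Sample the action from the current probability distribution P[s]."""
    return random.choices(range(n_actions), weights=P[s])[0]

# Toy environment: in state 0, only action 3 is rewarded.
for _ in range(200):
    a = select_action(0)
    update_q(0, a, 1.0 if a == 3 else -1.0, 0)
    update_p(0)
```

After sufficient iterations, the selection probability of the rewarded action approaches 1, matching the convergence behavior described above.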

B. ARTIFICIAL EMOTION
Reinforcement learning belongs to the category of machine learning, which in turn belongs to artificial intelligence. Another major branch of artificial intelligence is artificial psychology, which includes artificial emotion, artificial consciousness, and artificial cognition; artificial emotion is the major branch of artificial psychology. Artificial emotion must be quantified when it is applied to an engineering problem. The quantizer output f_n and the emotional coefficient η (FIGURE 4) are calculated simultaneously as follows, where λ_i is calculated from the emotional weight ω_i and the input information θ_i; f_n is the emotion value of the agent; k_η is the configured maximum emotional coefficient; and η_max is the maximum agent emotion, set to 1 in this article. The quantified emotion value C_f(η) is calculated as follows, where k_a, k_b, and k_c are the quadratic, linear, and constant coefficients, respectively. Then, the output of the AERL can be calculated as follows, where a_Logicpart is the action selected by the logical part of the AERL via Eqs. (7)-(10).
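The emotional quantification above can be sketched as follows. All coefficient values are illustrative, and since the exact way C_f(η) modulates the logical action is not reproduced here, a multiplicative combination is assumed.

```python
def emotion_value(weights, inputs):
    """f_n: weighted sum of the input information (lambda_i = omega_i * theta_i)."""
    return sum(w * x for w, x in zip(weights, inputs))

def emotional_coefficient(f_n, k_eta=0.5, eta_max=1.0):
    """eta is capped at the maximum agent emotion eta_max (set to 1 in the article)."""
    return min(k_eta * f_n, eta_max)

def quantified_emotion(eta, k_a=0.2, k_b=0.5, k_c=1.0):
    """C_f(eta): quadratic quantification with coefficients k_a, k_b, k_c."""
    return k_a * eta ** 2 + k_b * eta + k_c

# Assumed combination: the emotional part scales the logical part's action.
a_logic = 0.8
eta = emotional_coefficient(emotion_value([0.6, 0.4], [1.0, 2.0]))
a_out = a_logic * quantified_emotion(eta)
```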

C. ARTIFICIAL EMOTIONAL REINFORCEMENT LEARNING
To achieve smaller voltage deviation and avoid the curse of dimensionality simultaneously, the artificial emotion is applied to update the output action of the reinforcement learning.
The agent of the AERL contains two parts, i.e., a logical part and an emotional part (FIGURE 5), and the output action of the logical part is modified by the emotional part. The overall framework of the AERL is that of reinforcement learning. Compared with conventional reinforcement learning, the AERL can provide continuous actions rather than discrete actions for controlled systems. Since the AERL is a special reinforcement learning with a large number of actions, the AERL remains a reinforcement learning method based on the Markov decision process; more details of the proof are presented in [56], [57]. Therefore, the convergence of the AERL can be proved via the Markov decision process, in the same way as the convergence of reinforcement learning.

IV. COMPLEX-VALUED ENCODING DRAGONFLY ALGORITHM
A. DRAGONFLY ALGORITHM
The DA was proposed by Mirjalili [34]. The two major processes of a dragonfly swarm (i.e., hunting and migration) correspond to exploration and exploitation, respectively. The three primitive principles of swarm behavior are separation S_i, alignment A_i, and cohesion C_i, which are calculated as follows, where N is the number of individuals, and X_j and U_j are the position and velocity of the j-th individual, respectively. The attraction F_i and distraction E_i behaviors can be described as follows, where X+ and X− represent the target (food) position and the enemy position, respectively. The position vectors at the (t + 1)-th iteration can be updated as follows, where k_s, k_a, k_c, k_f, k_e, and k_x are the weight coefficients of the updated positions. Since all these weight coefficients must be configured for the convergence of the DA, a random walk (Lévy flight) can be employed to improve the stochastic exploration behavior of the artificial dragonflies and update their positions at the t-th iteration as follows, where d is the dimension of the variables; the Lévy flight Lévy(d) can be described as in [58], where r_1(d) and r_2(d) are two random numbers generated in [0, d]; β_L is a constant, which can be set to 1.5; and σ can be presented as follows, where Γ(x) is the gamma function, with Γ(x) = (x − 1)! for integer x.
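The swarm behaviors and the Lévy flight above can be sketched in a one-dimensional form as follows. The weight coefficients and the example inputs are illustrative assumptions, not the article's configuration.

```python
import math
import random

def levy(beta_l=1.5):
    """One Lévy-flight step; sigma uses the standard gamma-function formula."""
    sigma = (math.gamma(1 + beta_l) * math.sin(math.pi * beta_l / 2)
             / (math.gamma((1 + beta_l) / 2) * beta_l * 2 ** ((beta_l - 1) / 2))
             ) ** (1 / beta_l)
    r1 = random.random()
    r2 = random.random() + 1e-12        # avoid division by zero
    return 0.01 * r1 * sigma / abs(r2) ** (1 / beta_l)

def step(X, V, neighbours, food, enemy, k=None):
    """One dragonfly position/velocity update from separation, alignment,
    cohesion, food attraction, and enemy distraction (1-D sketch)."""
    k = k or {'s': 0.1, 'a': 0.1, 'c': 0.7, 'f': 1.0, 'e': 1.0, 'w': 0.9}
    n = len(neighbours)
    S = -sum(X - Xj for Xj, _ in neighbours)       # separation
    A = sum(Vj for _, Vj in neighbours) / n        # alignment
    C = sum(Xj for Xj, _ in neighbours) / n - X    # cohesion
    F = food - X                                   # attraction toward food
    E = enemy + X                                  # distraction from enemy
    V_new = (k['s'] * S + k['a'] * A + k['c'] * C
             + k['f'] * F + k['e'] * E + k['w'] * V)
    return X + V_new, V_new

# Example update for one dragonfly with two neighbours (position, velocity) pairs.
X_new, V_new = step(0.0, 0.1, [(1.0, 0.2), (-1.0, 0.0)], food=2.0, enemy=-2.0)
```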

B. COMPLEX-VALUED ENCODING DRAGONFLY ALGORITHM
To explore and exploit the solution space more widely, the CDA is proposed in this article. Each individual in the CDA is recorded as follows, where R_p and I_p are the real and imaginary values of the complex-valued position, respectively, and |X_p^max| and |X_p^min| are the upper and lower absolute individual values, respectively. The complex-valued position of the p-th individual can be described as follows, where the radius and the angle of the complex-valued position lie in the ranges ρ_p ∈ [0, (|X_p^max| − |X_p^min|)/2] and θ_p ∈ [−2π, 2π], respectively.
The real and imaginary values of the complex-valued position vectors can be updated as follows. The fitness value is calculated from the real number converted from the complex-valued position, as follows,
where ρ_p = √(R_p² + I_p²), p = 1, 2, . . . , N. The convergence of the DA has been verified in [34]. Since the last step of the CDA provides a real-number position, the CDA has the same update process as the DA. Consequently, the CDA shares the convergence properties of the DA.
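The complex-to-real conversion described above can be sketched as follows. The mapping of the modulus ρ_p back into the search interval follows the usual complex-valued encoding convention (sign of sin θ_p around the interval midpoint) and is an assumption here, not necessarily the article's exact mapping.

```python
import math

def to_real(R_p, I_p, x_min, x_max):
    """Convert a complex-encoded position back to a real decision variable.

    rho_p = sqrt(R_p^2 + I_p^2) is the modulus; the sign of sin(theta_p)
    selects which half of the search interval the point falls in (assumed
    convention, common in complex-valued encoding schemes).
    """
    rho = math.hypot(R_p, I_p)                 # modulus rho_p
    theta = math.atan2(I_p, R_p)               # angle theta_p
    return rho * math.copysign(1.0, math.sin(theta)) + (x_max + x_min) / 2

# e.g. R_p = 3, I_p = 4 gives rho_p = 5 in the interval [0, 20]
x = to_real(3.0, 4.0, 0.0, 20.0)
```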
The computational complexities of the GA, PSO, GWO, MFO, and DA are O(n_iter × n_size), where n_iter and n_size are the maximum number of iterations and the population size, respectively. Since the imaginary part is added in the CDA, the computational complexity of the CDA is O(2 n_iter × n_size).

C. COORDINATION OF COMPLEX-VALUED ENCODING DRAGONFLY ALGORITHM AND ARTIFICIAL EMOTIONAL REINFORCEMENT LEARNING
To obtain the highest control performance and convergence performance simultaneously in the CSVC and the AVR, the proposed CDA and the AERL are coordinated in this article, i.e., a coordinated CDA-AERL. The proposed coordinated CDA-AERL (FIGURE 7) is employed in a coordinated framework with two voltage control levels, i.e., the CSVC and the AVR (FIGURE 8). The coordinated CSVC-AVR framework can then replace the three-level voltage control framework. With the imaginary part added, the proposed CDA can coordinate with the AERL for optimizing the CSVC; with the emotional part introduced into the logical part, the designed AERL can provide a real-time strategy for the AVR of power systems.
Compared with conventional voltage control, the major characteristics of the coordinated framework are listed as follows.
1) After off-line training, the AERL can effectively update the control strategy on-line for dynamic systems.
2) From the multi-agent system perspective, the game-playing agents based on the AERL of the reactive power control devices play games with each other through the dynamic system.

V. CASE STUDIES
A. ARTIFICIAL EMOTIONAL REINFORCEMENT LEARNING
The results obtained by the simulated methods (FIGURE 9) show that: (i) the number of actions of the simulated Q learning is larger than that of the simulated AERL, i.e., 21 > 11, while the terminal voltage obtained by the simulated AERL is nearer to the reference voltage than that obtained by the simulated Q learning; (ii) compared with the terminal voltages obtained by the optimized PID and the configured Q learning, the terminal voltage obtained by the configured AERL is the nearest to the reference voltage. The simulation results (FIGURE 9) obtained by the proposed AERL show that: (i) the agent based on the proposed AERL effectively obtains the highest control performance compared with the other simulated control algorithms; (ii) since the AERL obtains higher control performance with less computation memory than Q learning, the curse of dimensionality can be mitigated. Therefore, with the emotional part in the agent, the agent based on the AERL is more intelligent than the agent based on conventional reinforcement learning. Besides, with the emotional part added, the agent based on ''emotional part + logical part'' obtains higher control performance than the agent based on the logical part alone.

B. COMPLEX-VALUED ENCODING DRAGONFLY ALGORITHM
The CDA coordinated with the AERL for the CSVC is applied to three cases, i.e., the IEEE 57-bus, 118-bus, and 300-bus systems. Each optimization algorithm is simulated ten times for each of these three power systems. All the cases are simulated on a server with an Intel(R) Core(TM) i7-7820HK CPU at 3.90 GHz and 64 GB RAM, using MATLAB R2019b.
1) IEEE 57-BUS SYSTEM
The numbers of individuals and iterations of all the compared optimization algorithms (i.e., the GA, PSO, GWO, MFO, DA, and CDA) in this case are configured as 1000 and 200, respectively. The parameters of all the simulated approaches in this case are set to similar or default values, which are given in TABLE 1.
The convergence curves of one run of each compared optimization algorithm in this case are shown in FIGURE 11, which shows that: (i) the CDA obtains the minimum fitness function value compared with the other simulated methods; (ii) the CDA converges to the minimum fitness function value within the smallest number of iterations. The optimal solutions obtained by all the compared methods are given in TABLE 2. The statistical results of the ten simulations in this case are shown in FIGURE 12, which shows that, compared with the other simulated methods, the CDA obtains multiple approximately minimal fitness function values over multiple simulations; hence the CDA has a stable convergence feature, meaning that the algorithm obtains nearly similar solutions over multiple random runs. The statistical calculation times of the ten simulations in this case are shown in FIGURE 13. The simulation results (FIGURE 11 and FIGURE 12) of this case show that: 1) compared with the other algorithms, the fitness value obtained by the proposed CDA is the smallest (FIGURE 11); 2) from the statistical results of the fitness values over ten runs (FIGURE 12), the convergence of the proposed CDA is better than that of the other algorithms.

2) IEEE 118-BUS SYSTEM
A total of 25 variables are selected in this case (TABLE 3), which consists of 54 generators (FIGURE 14). The convergence curves of one run of each compared optimization algorithm in this case are shown in FIGURE 15, which shows that, compared with the convergence curve obtained by the DA, the CDA obtains a convergence curve with a smaller fitness function value. The statistical results of the ten simulations in this case are shown in FIGURE 16, which shows that, with the same configurations as the compared methods, the CDA is more likely to obtain a better solution. The statistical calculation times of the ten simulations in this case are shown in FIGURE 17, which shows that the average calculation time of the CDA is smaller than those of the other compared methods. The simulation results (FIGURE 15 and FIGURE 16) of this case show that the CDA can effectively obtain the optimal objective with a stable convergence feature for the IEEE 118-bus system.

3) IEEE 300-BUS SYSTEM
A total of 111 variables are selected in this case (TABLE 4), which contains 69 generators (FIGURE 18). The numbers of individuals and iterations of all the compared approaches in this case are configured as 300000 and 200, respectively. The parameters of all the compared optimization algorithms in this case are given in TABLE 1.
The convergence curves of one run of each compared optimization algorithm in this case are shown in FIGURE 19, which shows that, compared with the convergence curves obtained by the other simulated methods, the CDA obtains a convergence curve with the minimum fitness function value. FIGURE 11, FIGURE 15, and FIGURE 19 show that the CDA obtains the minimum fitness function value even for a complex optimization problem. The statistical results of the ten simulations in this case are shown in FIGURE 20. FIGURE 12, FIGURE 16, and FIGURE 20 show that, with the same configurations as the compared methods, the CDA is more likely to obtain a better solution for both simple and complex optimization problems. The statistical calculation times of the ten simulations in this case are shown in FIGURE 21, which shows that, for a complex optimization problem, the average calculation time of the CDA is less than the average calculation times of the other compared methods except for the GA.
The simulation results (FIGURE 19 and FIGURE 20) of this case show that the proposed CDA can effectively achieve economical operation.
The statistical simulation results (FIGURE 12, FIGURE 16, and FIGURE 20) obtained by all the compared methods on the IEEE 57-bus, 118-bus, and 300-bus systems show that: (i) the CDA obtains the highest performance for the CSVC; (ii) since complex-valued encoding is introduced into the DA, the CDA fits optimization problems with decentralized independent variables more effectively than the DA; (iii) over multiple simulations, the fitness values obtained by the CDA fall within a smaller range than those of the other simulated algorithms in all the simulated cases.
The average fitness values obtained by all the simulated algorithms on the IEEE 57-bus, 118-bus, and 300-bus systems are given in TABLE 5, which shows that the proposed CDA fits the CSVC more effectively than the other compared algorithms.

C. DISCUSSIONS
Although the AERL of the coordinated optimization and control algorithm obtains higher control performance, with voltage curves nearer to the reference voltage curves than those of the optimized PID and the configured Q learning, the control performance obtained by the AERL may vary with the learning rate, the probability coefficient, the discount coefficient, the number of states, the weight of emotion, and the coefficients of the quantification process of the emotional part of the AERL. After numerous tests, the learning rate, probability coefficient, and discount coefficient of the AERL can each be configured within (0, 1). Since the CDA fits the optimization problems with decentralized independent variables more suitably than the other compared optimization algorithms, the final optimal objective values obtained by the CDA are smaller than those of the other compared optimization algorithms. Generally, the convergence of the CDA is stable when the population size and the maximum number of iterations of the CDA are larger than 3 times the number of optimized variables and 100, respectively. For example, the numbers of individuals and iterations of the CDA should be larger than 27 (i.e., 3 × 9 variables) and 100 for the IEEE 57-bus system, respectively. The maximum calculation times used by the CDA for the IEEE 57-bus, 118-bus, and 300-bus systems are 21.68 s, 67.20 s, and 248.64 s, respectively, which are less than the optimization periods of the CSVC of these simulated systems. Over the numerous iterations of the optimization process of the CDA, the sensitivity to the configured weight coefficients of the CDA is mitigated by the random features of the Lévy flight function. Once the population size and the maximum iterations are configured larger than 3 times the number of optimized variables and 100, respectively, the weight coefficients of the updated positions of the CDA can be configured as random values in the range (0, 1]. Graphics processing units and field-programmable gate arrays can be applied to reduce the calculation time of the CDA.
The simulation results show that an optimization algorithm improved by the complex-valued encoding operation can fit optimization problems with decentralized independent variables. Numerous other similar optimization algorithms can be improved by the complex-valued encoding operation, such as the GWO, the MFO, etc. Besides, the five types of operations can be mixed to improve optimization performance; for example, the grouped operation can be integrated with the different coded operation. Although the CDA fits optimization problems with decentralized independent variables, more complex optimization problems with complex constraint conditions may not be handled effectively by the CDA.

VI. CONCLUSION AND PROSPECT
The AERL is applied to the AVR in this article, and a novel complex-valued encoding optimization algorithm, named the CDA, is proposed. The CDA and the AERL are then coordinated for the CSVC and the AVR. To verify the convergence of the CDA and the control performance of the AERL, three simulation cases (i.e., the IEEE 57-bus, 118-bus, and 300-bus systems) are simulated in this article. The major features of the coordinated CDA and AERL can be summarized as follows.
1) Compared with the simulation results obtained by the other algorithms, the AERL effectively obtains the highest control performance. Since the agent based on the AERL has an artificial emotional part, it is more intelligent than an agent based on reinforcement learning alone.
2) Since the output of the AERL can be modified by its artificial emotional part, the AERL can update the control strategy on-line and can mitigate the curse of dimensionality.
3) Since the two voltage control levels are coordinated, the proposed coordinated CDA and AERL can effectively obtain optimal convergence for the CSVC and the AVR in multi-generator power systems.
4) The proposed CDA can effectively fit optimization problems with decentralized independent variables.
In future work: (i) the proposed coordinated CDA and AERL could be employed in numerous optimization problems with decentralized independent variables, such as coordinated economic dispatch and automatic generation control; (ii) the proposed CDA could be applied to mixed-integer nonlinear optimization problems with decentralized independent variables, such as the unit commitment of power grids, and energy and production efficiency optimization problems.