Optimal Scheduling Framework of Electricity-Gas-Heat Integrated Energy System Based on Asynchronous Advantage Actor-Critic Algorithm

The optimal scheduling of the integrated energy system (IES) can improve energy efficiency and economic operation. However, existing scheduling methods cannot accurately handle the dynamic changes on the supply and demand sides of the electricity-gas-heat IES due to their power uncertainties. To tackle this problem, an optimal dispatch framework for IES based on the asynchronous advantage actor-critic (A3C) method is proposed. Firstly, we describe the dispatch problem of IES with multiple uncertainties as a Markov decision process (MDP) according to the corresponding mathematical models and constraints. Then, a dispatch framework based on A3C is developed to optimize the control decisions for the supply and demand sides through the asynchronous learning of agents, and to reduce the correlation of neural network parameter updates through the multi-agent utilization of the central processing unit (CPU) multi-threading function. Finally, our proposed methods are verified by simulations. Compared with previous optimization algorithms, the training time of the proposed method is shortened by 37% and 30%, and the daily average operating cost is reduced by 3%, 5.2% and 8.7%.

NOMENCLATURE
The lower and upper limits of the climbing power of PtG.
H_EB,t  The thermal power output of EB at time t.
P_EB,t  The power consumption of EB at time t.
η_EB  The conversion efficiency of the electric boiler.
μ_Loss  The heat loss rate of EB.
H_EB^min, H_EB^max  The lower and upper limits of the heat power output of EB.
SOC_x,t, SOC_x,t+1  The SOC of energy storage device x at times t and t+1.
P_x,ch,t, P_x,disch,t  The charging and discharging power of energy storage device x at time t.
SOC_x,1, SOC_x,T  The initial and final energy of the energy storage in a scheduling period.
Q_x  The capacity of energy storage x.
η_x,ch, η_x,disch  The charging and discharging efficiency of energy storage device x.
a_x,ch,t, a_x,disch,t  The charging and discharging state parameters of energy storage x.
P_E,t  The power purchased from the external power grid at time t.
G_G,t  The gas power purchased from the gas grid at time t.
P_Load,t, G_Load,t, H_Load,t  The electric, gas, and heat loads at time t.
C_E  The electricity purchase cost of the system.
C_G  The gas purchase cost of the system.
C_O  The operation and maintenance costs of each unit.
P_i,t  The electric power produced by unit i at time t.
P_j,t  The electric power consumed by unit j at time t.
G_m,t  The gas power produced by unit m at time t.
G_n,t  The gas power consumed by unit n at time t.
ε_E,t  The electricity price at time t.
ε_G,t  The gas price at time t.
ε_O,i  The operation and maintenance cost coefficient of unit i.
δ_t  The action penalty cost of the agent at time t.
ε_ur,i, ε_dr,i  The upper and lower penalty coefficients of the agent's climbing action.
a_i,ur,max, a_i,dr,min  The upper and lower climbing limits of the action.
ε_ua,i, ε_da,i  The upper and lower penalty coefficients of the agent's action.
a_i,ua,max, a_i,da,min  The upper and lower limits of the action.
r_0  The reward constant.
γ  The discount factor.
n_ω  The learning rate of the critic network.

I. INTRODUCTION
As an important carrier for the development of the energy internet, the integrated energy system (IES) plays a significant role in the coordinated operation of multiple energy sources and the improvement of energy utilization efficiency [1], [2]. However, with the coupling of multiple energy sources, the uncertainties of wind turbine (WT) and photovoltaic (PV) outputs and load demands challenge the optimal operation of IES [3], [4].
The main research methods for the optimal dispatch of IES are divided into model-based traditional methods and model-free DRL methods [5]. Traditional model-based methods need to establish a detailed system dynamics model and solve the problem with a solver [6]. Liu et al. [7] used electric boilers (EB), heat storage and batteries to establish a coordination model considering network transmission characteristics for a regional electricity-heat IES, and used a double-λ iterative algorithm to solve it. Ehsan and Yang [8] considered the constraints of multiple types of power generation and energy storage, and established an operation optimization model for the comprehensive utilization and collaborative optimization of multi-energy demand. Wang et al. [9] considered the uncertainty of the load in a combined heat and power microgrid (MG) and optimized its economic dispatch. For the uncertainty of wind power forecasting, Qadrdan et al. [10] used stochastic programming to establish an optimal scheduling model. However, the above methods are limited to day-ahead scheduling plans and cannot be adjusted according to dynamic changes in load demands and renewable energy outputs.
In response to these problems, model predictive control (MPC) adopts a rolling optimization strategy, which uses repeated online optimization calculations and rolling implementation to obtain dynamic control performance and solve the above problems [11]-[14]. Li et al. [12] proposed a strategy that combines the advantages of two-stage stochastic programming and backward control. Petrollese et al. [15] proposed an MPC-based control strategy for the optimal management of energy storage systems in MGs with high renewable energy penetration. Wang et al. [16] proposed an IES scheduling method based on MPC with dynamic time intervals. Although the above-mentioned methods have achieved remarkable results in the optimization and decision-making of multi-energy systems, they are model-based: they require detailed models of the IES, and their parameter selection depends to a large extent on expert knowledge.
With the rapid development of artificial intelligence technology, reinforcement learning (RL) has received increasing attention in the optimization and control of power systems in recent years [17]-[19]. An RL model gradually accumulates experience and improves its strategy through continuous interaction with the environment. In particular, the deep reinforcement learning (DRL) model, which combines deep neural networks with RL, has better adaptive learning and decision-making ability for non-convex, nonlinear problems [20]-[22].
Focusing on electricity-gas-heat IES optimal scheduling problems, Ji et al. [23] proposed real-time energy management for MGs based on DRL, in which MG energy management is modeled as a Markov decision process (MDP) aiming to minimize daily operating cost and solved by a deep Q-network (DQN) algorithm. Brandi et al. [24] used the deep Q-learning method to control the indoor temperature of a building. Gorostiza and Gonzalez-Longatt [25] proposed a DRL-based state-of-charge (SOC) management controller for energy storage systems, trained with the deep deterministic policy gradient (DDPG) method. The above literature provides a research foundation for the application of DRL methods in IES. However, these methods discretize the continuous actions in IES, resulting in inaccurate control. In addition, they adopt a centralized control method for optimization: when faced with the collaborative optimization of various forms of energy and energy storage in IES, model training takes a long time and easily fails to converge.
In this paper, we handle an IES including renewable energy, combined heat and power (CHP), power to gas (PtG), an electric boiler (EB), battery storage (BS), heat storage (HS) and gas storage (GS), without considering a system dynamics model or modeling of the uncertain factors. Achieving this goal is challenging for the following reasons. Firstly, the system contains a variety of energy sources and storage devices, and conventional optimization methods are inefficient and cannot meet the needs of real-time regulation. Secondly, the system contains a large number of uncertain factors such as the electricity price, wind power output, photovoltaic output and electricity-gas-heat load demands. Thirdly, the system is subject to a variety of coupled operating constraints related to the energy storage devices, and the storage output at the current moment affects future decisions. In response to the above problems, this paper proposes an A3C-based optimal scheduling method for the electricity-gas-heat IES, which realizes the economic operation of the system by making decisions on the units and energy storage in each period. The main contributions of this paper are as follows: 1) We describe the mathematical models and the optimization problem of the electricity-gas-heat IES as an MDP, including the action space, state space and reward mechanism of the agent.
2) An A3C-based optimal scheduling framework for the electricity-gas-heat IES is proposed, which uses the multi-threaded function of the computer CPU to control multiple agents in parallel and asynchronously.
3) Compared with the previous optimization algorithms, the training time of the proposed method is shortened by 37% and 30%, and the daily average operating cost is reduced by 3%, 5.2% and 8.7%.
The rest of the paper is organized as follows: Section II describes the mathematical models of IES. Section III proposes the A3C-based optimal scheduling framework of IES. Section IV discusses simulation results. Section V draws conclusions.

II. MATHEMATICAL MODELS OF IES
The structure of IES is shown in Fig. 1. The electric network includes the external power grid, WT, PV, BS and the power loads. The gas network includes natural gas stations, GS and the gas loads. The heat network includes HS and the heat loads, and the energy conversion equipment mainly includes a CHP, PtG and an EB. In the following section, the model of each unit in IES is provided.

A. ENERGY CONVERSION EQUIPMENT MODEL
1) CHP MODEL
The CHP unit can provide heat and electricity for the system [26]. The mathematical model of CHP can be described by (1) and (2).
2) PTG MODEL
The power-to-gas unit converts electric power into gas power, which can transform renewable energy into natural gas for large-scale, long-term storage. The relationship between the natural gas output and the electric power consumption can be described by (7).
The power and constraints of PtG are shown in (8) and (9).

3) ELECTRIC BOILER
The EB converts electric power into heat power, which is used to supplement the remaining heat load demand when CHP heat supply is insufficient. The relationship between the power consumption and the heat supply of EB can be described by (10).
The operating constraint of EB is shown in (11).

B. ENERGY STORAGE EQUIPMENT MODEL
Adding various energy storage equipment to the IES can alleviate the uncertainty of WT and PV outputs, stabilize load fluctuations, and increase the economic efficiency of the IES. The mathematical model of the energy storage equipment can be described by (12)-(14),
where x denotes the energy storage category, namely battery storage (BS), gas storage (GS) or heat storage (HS); a_x,ch,t = 1 indicates that storage device x operates in the charging state at time t, and a_x,disch,t = 1 indicates that it operates in the discharging state. The charge-discharge power and capacity constraints of the energy storage units are shown in (15)-(18).
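As an illustrative sketch of the storage dynamics in (12)-(14), the following Python function applies one charging/discharging step (the function name, efficiency values, and time step are assumptions for illustration, not values from this paper):

```python
def soc_update(soc_t, p_ch, p_disch, q_x,
               eta_ch=0.95, eta_disch=0.95, dt=1.0):
    """One-step SOC update for storage device x.

    Charging energy is scaled by eta_ch; discharging energy is
    scaled by 1/eta_disch. Charging and discharging are assumed
    mutually exclusive (a_x,ch,t + a_x,disch,t <= 1).
    """
    assert p_ch == 0.0 or p_disch == 0.0, "charge/discharge are exclusive"
    return soc_t + (p_ch * eta_ch - p_disch / eta_disch) * dt / q_x

# Charging a 100 kWh device at 10 kW for one hour from SOC 0.5:
soc_next = soc_update(0.5, p_ch=10.0, p_disch=0.0, q_x=100.0)
```

Under the constraints (15)-(18), the resulting SOC would additionally be limited to the device's capacity range.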

C. SYSTEM BALANCE CONSTRAINT
To satisfy the electricity-gas-heat load demand in each period, the system needs to meet the balance constraints shown in (19)-(21).
The operation optimization goal of IES is to improve the economic benefits while meeting the constraints of safe system operation. The optimization objective function F of IES is to minimize the operating cost of the system. According to the IES balance constraints (19)-(21), the electricity and gas purchases of the system are determined by the outputs of CHP, PtG, EB, BS, GS and HS. The output constraints of each unit have been described in Section II-A. The decision variables of the objective function are the electricity purchase power P_E,t, gas purchase power G_G,t, CHP output power P_CHP,t, PtG electric power P_PtG,t, EB electric power P_EB,t, and the energy storage charge and discharge powers P_x,ch,t and P_x,disch,t. The optimization objective function F can be formulated as follows, where A is the set of electric power generation units, B is the set of electric power consumption units, C is the set of gas power generation units, and D is the set of gas power consumption units.
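A minimal sketch of this cost structure over a scheduling horizon, with the electricity purchase cost C_E, gas purchase cost C_G and unit O&M cost C_O summed per time step (the function and argument names are illustrative assumptions):

```python
def operating_cost(price_e, p_buy, price_g, g_buy, om_coeff, unit_power):
    """Total cost over the horizon: electricity purchase C_E,
    gas purchase C_G, and operation/maintenance C_O of each unit.

    price_e[t], p_buy[t]     : electricity price and purchased power
    price_g[t], g_buy[t]     : gas price and purchased gas power
    om_coeff[i]              : O&M coefficient of unit i
    unit_power[i][t]         : output power of unit i at time t
    """
    cost = 0.0
    for t in range(len(p_buy)):
        cost += price_e[t] * p_buy[t]                 # C_E at time t
        cost += price_g[t] * g_buy[t]                 # C_G at time t
        cost += sum(om_coeff[i] * unit_power[i][t]    # C_O at time t
                    for i in range(len(om_coeff)))
    return cost
```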

III. PROPOSED A3C-BASED OPTIMAL SCHEDULING FRAMEWORK OF IES
To address the limitations of previous approaches, this paper proposes the framework of A3C to solve the optimal scheduling problem of IES with uncertain problems. In this section, the optimal scheduling problem of IES is transformed into an MDP, and then the principles of the A3C-based framework are explained.

A. MARKOV PROCESS CONVERSION
RL provides control decisions over a sequence of time steps, which can be described as an MDP [20]. The five elements of an MDP are the state space S, action space A, transition function T, reward function R and discount factor γ. The transition function describes the probability of the agent moving from state s_t to s_t+1 under a given action: T : S × A × S → [0, 1]. In this paper, a discrete-time finite MDP is used to describe the economic scheduling problem of the integrated energy system. The integrated energy system acts as an agent that learns control strategies through repeated interactions with the environment, while the units of the integrated energy system are treated as the environment. Fig. 2 illustrates the interactions between the agent and the environment in the MDP.
In the optimization process, the agent obtains the state s_t at the beginning of each time slot. In each optimization time slot, the agent chooses an action a_t from the action space according to the current state s_t and the policy function π. The agent then receives the next state s_t+1 and the reward value R_t, and repeats the above steps until the end of the optimization. The ultimate goal of the agent is to maximize the accumulated reward.
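The interaction loop described above can be sketched generically as follows (ToyEnv is a deterministic stand-in used only for illustration, not the IES environment of this paper):

```python
def run_episode(env, policy, horizon=24):
    """Roll out one episode: observe s_t, act a_t = policy(s_t),
    receive (s_{t+1}, R_t), and accumulate the reward."""
    s = env.reset()
    total = 0.0
    for _ in range(horizon):
        a = policy(s)
        s, r, done = env.step(a)
        total += r
        if done:
            break
    return total

class ToyEnv:
    """Deterministic placeholder environment: one reward unit per step."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, a):
        self.t += 1
        return self.t, 1.0, self.t >= 24

total = run_episode(ToyEnv(), policy=lambda s: 0)  # accumulates 24 steps
```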

1) SYSTEM STATE
The state space S is referenced by the agent to make decisions at each period. S includes the controllable state space S_C and the uncontrollable state space S_UC. The controllable state space contains environmental variables that can be directly or indirectly controlled by the agent, such as the SOC of BS S_BS,t, the SOC of GS S_GS,t, and the SOC of HS S_HS,t. The uncontrollable state space contains variables that the agent cannot control, mainly including the wind power generation P_Wind,t, the photovoltaic power generation P_PV,t, the electric load P_Load,t, the gas load G_Load,t and the heat load H_Load,t. Therefore, the state space can be defined as: s_t = {S_BS,t, S_GS,t, S_HS,t, P_Wind,t, P_PV,t, P_Load,t, G_Load,t, H_Load,t, t}.

2) SYSTEM ACTION
The action space of the agent includes the CHP output P_CHP,t, the gas power output of PtG G_PtG,t, the electric power output of BS P_BS,t, the gas power output of GS P_GS,t, the heat power output of HS P_HS,t, and the heat power output of EB H_EB,t. The action space can be defined as: a_t = {P_CHP,t, G_PtG,t, P_BS,t, P_GS,t, P_HS,t, H_EB,t}. (30)

3) REWARD FUNCTION
The reward function is set to guide the agent's current action toward the maximum accumulated reward. It includes the cost of purchasing electric power from the external power grid, the cost of purchasing gas power from the gas grid, and the operation and maintenance cost of each unit. Furthermore, to accelerate the convergence of the RL, a penalty cost for agent actions exceeding the limits of each unit is added to the reward function. The penalty function generates a negative reward value whenever the agent's action exceeds the constraints, so that the agent's action values are confined to the specified range during training.
The penalty cost of the agent's action, represented by δ_t, mainly includes two parts: the action amplitude penalty and the action change rate penalty. Both are determined by the output constraints and ramp constraints of each piece of equipment in the IES.
During training, to reduce the penalty, the agent's actions are gradually constrained to within a certain range.
The reward that the environment feeds back to the agent is the component of the above objective function at each time step. Since the RL agent maximizes the cumulative return, the objective function term in the reward takes a negative value, as shown in (32),
where r_0 is used to ensure that the cumulative return of the agent changes from negative to positive during the learning process.
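A per-step reward of this shape can be sketched as follows; the limits, penalty coefficients, and reward constant below are illustrative assumptions, not the paper's values:

```python
def step_reward(cost_t, action, action_prev,
                a_max=1.0, a_min=-1.0, ramp=0.3,
                eps_a=10.0, eps_r=10.0, r0=5.0):
    """Per-step reward: negative operating cost minus action penalty,
    shifted by the reward constant r0.

    The penalty delta combines an amplitude term (limits a_min/a_max)
    and a ramp term (limit `ramp` on |a_t - a_{t-1}|), as described
    for the amplitude and change-rate penalties above.
    """
    amp_viol = max(0.0, action - a_max) + max(0.0, a_min - action)
    ramp_viol = max(0.0, abs(action - action_prev) - ramp)
    delta = eps_a * amp_viol + eps_r * ramp_viol
    return -(cost_t + delta) + r0
```

An action within both limits incurs no penalty, so the reward reduces to r0 minus the operating cost.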

B. A3C FRAMEWORK
The A3C algorithm was proposed by Google DeepMind in 2016 [27]. The asynchronous training structure of the A3C algorithm is shown in Fig. 3. Compared with the traditional actor-critic algorithm, the A3C algorithm has an asynchronous learning mechanism: multiple actor-critic agents are executed in parallel on the CPU threads of a device [28]. Each agent obtains network parameters from the global network through the pull function before running. The agents on each thread interact independently with the environment and update their parameters. After the run is completed, the parameters are synchronized to the global network through the push function [29]. The global network does not need to be trained and only stores the actor-critic network parameters. The A3C algorithm improves operating efficiency through multithreading, and the exploration strategies of different agents are independent of each other, which is conducive to the convergence of the system.
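The pull/push mechanism can be sketched with Python threads as follows; the toy "gradient" and learning rate are placeholders (real workers would compute actor-critic gradients from their rollouts):

```python
import threading

class GlobalNet:
    """Holds the shared parameters; workers pull a copy before each
    episode and push their updates back asynchronously."""
    def __init__(self, theta):
        self.theta = theta
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.theta)

    def push(self, grads, lr=0.1):
        with self.lock:
            for i, g in enumerate(grads):
                self.theta[i] -= lr * g

def worker(global_net, episodes=10):
    for _ in range(episodes):
        local = global_net.pull()           # synchronize local copy
        grads = [p * 0.01 for p in local]   # stand-in "gradient"
        global_net.push(grads)              # asynchronous update

net = GlobalNet([1.0, -2.0])
threads = [threading.Thread(target=worker, args=(net,)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Because each worker pulls before pushing, updates may be computed from slightly stale parameters; this is exactly the asynchrony the A3C scheme tolerates.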
At each time t, the actor takes action a_t according to the policy function π(a_t|s_t; θ), which makes the environment state change from s_t to s_t+1, and the obtained reward is r_t(a_t, s_t). The optimized trajectory τ is expressed as {s_1, a_1, r_1, s_2, a_2, r_2, ..., s_t, a_t, r_t, ...}. The reward R(τ) of the trajectory τ is the sum of the rewards obtained at each stage and can be obtained by (33).
In the IES optimal scheduling, a one-day trajectory is divided into 24 hourly steps. The probability of trajectory τ is calculated by (32).
The expected reward of the actor under the policy π can be calculated by (33).
Therefore, the purpose of the optimization is to guide the actor to take actions that maximize the expected reward. Since the expected reward is given by (33), the policy π can be updated by solving for the gradient of the network parameter θ.
In the gradient solution process, the log-derivative trick and the average over N samples are used to approximate the gradient of the expected reward by (34).
The critic network uses the value function to evaluate the performance of the policy, represented by V(s_t; ω), where ω is the critic network parameter. It can be obtained by (35).
The action value function Q_π(a_t, s_t; ω) represents the expected sum of rewards when starting in state s_t, taking action a_t, and thereafter following the policy π.
The advantage function A_π(a_t, s_t; ω) in (39) evaluates the effect of the actor taking action a_t compared to the average at the current state s_t. When the action value function Q_π(a_t, s_t; ω) is greater than the value function V_π(s_t; ω), the advantage function is positive, indicating that it is advantageous to take the current action in this state; the policy improvement is positive, and the policy network parameters are updated in the direction of increasing return.
A_π(a_t, s_t; ω) = Q_π(a_t, s_t; ω) − V_π(s_t; ω). (39)
In this paper, to speed up the learning process, the A3C algorithm adopts the N-step method, and its advantage function is calculated by (40).
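The N-step advantage of (40) can be sketched as follows (the function and argument names are illustrative):

```python
def n_step_advantage(rewards, v_t, v_tn, gamma=0.95):
    """A = sum_{k=0}^{N-1} gamma^k r_{t+k} + gamma^N V(s_{t+N}) - V(s_t).

    `rewards` holds the N instant rewards r_t ... r_{t+N-1},
    `v_t` is V(s_t) and `v_tn` is the bootstrap value V(s_{t+N}).
    """
    n = len(rewards)
    n_step_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    n_step_return += gamma ** n * v_tn
    return n_step_return - v_t
```

With N = 1 this reduces to the one-step temporal-difference advantage r_t + γV(s_{t+1}) − V(s_t).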
where r_t is the instant reward for taking action a_t in state s_t. Therefore, the critic network parameter ω can be trained accordingly.
To avoid local optima, policy cross entropy (PCE) is added to the A3C algorithm. Since the cross-entropy index can better describe the uncertainty of a probability distribution, the PCE improves the exploration diversity of actor-critic-based RL algorithms. After adding the PCE, the gradient of the policy function is updated accordingly.
The optimal scheduling framework of this paper is shown in Fig. 4. The global network parameters are randomly initialized at the beginning of training. The parameters θ and ω of the actor and critic networks in each thread are obtained synchronously from the global network through the pull function. Then the global step counter T and the local thread step counter t are initialized, and training proceeds episode by episode up to the maximum episode T_max. The output of each unit of IES is optimized in each episode. The parallel agents interact with the environment and update the parameters θ and ω through (42) and (44).
In the training process of the A3C algorithm, the agents on each thread interact independently with the environment, and the algorithm adopts an asynchronous training mechanism. The global network preserves the globally shared parameters (θ, ω), which are updated asynchronously by the agents on each thread: each agent updates the shared parameters after terminating a training episode, the global network saves the newly updated parameters, and the next agent starts training with the new parameters. At the same time, historical data is used to train the A3C model for the optimal scheduling of the electricity-gas-heat IES, which can guide the output of each unit to achieve the control purpose in actual operation.

IV. CASE STUDY
This paper establishes a simulation platform with the Gym toolkit of OpenAI. The computer hardware configuration is an Intel Core i7-8700 @ 3.20 GHz (6 cores, 12 threads) with 32 GB of memory. In the simulation, the integrated energy system configuration is as shown in Fig. 1. The operating cost of the integrated energy system studied in this paper mainly comes from the purchase of energy; therefore, the influence of the network and users on this method is ignored in the case study. The heat load, electric load, gas load, wind power, PV power generation data and the time-of-use (TOU) electricity price are taken from the State Grid Liaoning Electric Power Supply CO. LTD in China. The simulation parameters of the IES are shown in Table 1. The price of natural gas is 49 ¢/m³. The TOU electricity price is shown in Table 2. The energy storage equipment parameters are shown in Table 3. Other parameters are shown in Table 4.
The A3C algorithm uses neural networks to fit the actor and critic networks. The input of the actor network is the state set S and its output is the action set A; it has 3 hidden layers with 400, 200, and 100 neurons. The input of the critic network is the state set S and the action set A, and its output is the value V_π; it has 3 hidden layers with 200, 100, and 100 neurons. All neural networks use the rectified linear unit (ReLU) activation function. The parameters of the A3C algorithm are shown in Table 5.
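A framework-free forward pass matching the described layer structure could be sketched as follows (pure Python, illustrative only; the paper does not specify implementation details beyond the layer sizes and ReLU activations):

```python
def mlp_forward(x, weights, biases):
    """Forward pass of a ReLU MLP given as nested lists.

    weights[l][j] is the weight row of output neuron j in layer l;
    hidden layers use ReLU, the final layer is linear.
    """
    h = x
    for layer, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * v for w, v in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        # ReLU on hidden layers, linear output on the last layer
        h = z if layer == len(weights) - 1 else [max(0.0, v) for v in z]
    return h

# Hidden-layer sizes described above (weights themselves are learned):
actor_hidden = [400, 200, 100]
critic_hidden = [200, 100, 100]
```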

A. A3C TRAINING PROCESS
The data from November 1 to January 31 is used as the training set. The data of February is used as the test set to evaluate the optimization effect. The training data is shown in Fig. 5.
The convergence characteristics of the proposed method are shown in Fig. 6. As seen in Fig. 6(a), the method obtains a low reward in the initial trial-and-error stage. Through further training, the reward improves continuously and the agent learns to make more reasonable decisions. The algorithm begins to converge at approximately 3300 episodes, after which the agent steadily obtains high rewards. The training process takes about 3.3 hours. The penalty value for the agent's actions exceeding the limits can also be seen: after about 2000 episodes of training, the agent's actions are kept within the prescribed range. Fig. 6(b) shows the cumulative rewards of the agents on different threads. Agents on different threads interact independently with the environment, resulting in different reward values at the same episode. Through the independent exploration of each thread's agent and the asynchronous updates, the A3C algorithm significantly accelerates convergence. Figure 7 shows the convergence of the A3C algorithm under different discount factors. Referring to the literature [27] and [29] on the A3C optimization algorithm, four discount factors of 0.85, 0.9, 0.95 and 0.99 are selected in the simulation. When the discount factor is 0.85, the algorithm cannot converge. The reward function fluctuates when the discount factor is 0.9 or 0.99, while the algorithm achieves better convergence when the discount factor is 0.95. Therefore, a discount factor that is too large or too small is not conducive to the convergence of the algorithm.

B. OPTIMIZED CONTROL EFFECT OF IES
The optimized scheduling result is shown in Fig. 7. During the valley electricity price period (23:00 to 05:00), the wind power output is relatively high and CHP does not work. The battery storage charges from wind power and the external power grid to satisfy the electric load. The gas production cost of the PtG unit is lower than that of purchasing gas from the gas station. As the gas load demand is lower during this period, the gas load is supplied by the PtG unit and GS. Since CHP stops working, the heat load is supplied by the EB, and the HS stores the excess energy at the same time.
During the flat electricity price periods (05:00-07:00, 09:00-13:00, 14:00-17:00, and 20:00-23:00), the electricity price is higher than in the valley period and CHP starts to work. The electric load is supplied by WT, PV, CHP, and the external power grid, and BS charges or discharges to obtain a higher reward. For example, during 05:00-07:00, since the electric load demand is higher than the WT and PV output and the SOC of BS is relatively high, the BS chooses not to charge or discharge. The gas load and CHP are supplied by the gas network. The CHP adopts the 'determining electricity by heat' mode to meet the heat load demand.
During the peak electricity price period (07:00-09:00, 13:00-14:00, and 17:00-20:00), the CHP adopts the 'determining heat by electricity' mode. The electric load is mainly provided by wind power, PV, CHP and BS to reduce the cost of electricity purchase. The insufficient part is purchased from the external power grid. The gas load demand is supplied by the gas station, and the function of GS is to stabilize the gas load fluctuation.
With the proposed A3C-based optimization strategy, each energy storage device selects a charging or discharging mode at each scheduling time. Fig. 8 shows the charge-discharge power and the SOC of the energy storage system. Since penalties for actions exceeding the limits are added to the reward function, the capacity and ramping of the energy storage are kept within the constraints.

C. COMPARISON OF CONVERGENCE SPEEDS AND SCHEDULING EFFECTS OF DIFFERENT ALGORITHMS
To verify the effectiveness of the A3C-based scheduling strategy proposed in this paper, DDPG and DQN algorithms are used for comparative analysis. The parameters of DDPG are the same as the proposed method. The parameters of DQN are selected from [18].
The training times and rewards of the DQN, DDPG and A3C algorithms are shown in Table 6. It can be seen that, compared with the DDPG and DQN algorithms, the training time of the A3C algorithm with the asynchronous mechanism is reduced by 30% and 37%, respectively, and the reward is increased by 25.6% and 30.8%.
The reward curves of the three DRL methods during learning are shown in Fig. 9. The fluctuation of the reward curve is largest with the DQN algorithm; although it eventually converges to a relatively high reward value, its convergence is slower than that of the DDPG and A3C algorithms. Since policy cross entropy is added to the A3C algorithm, it can better describe the uncertainty of the probability distribution and obtain a higher reward value. The convergence speed and learning effect of the A3C algorithm are better than those of the DQN and DDPG algorithms.
The data of one day in February is selected as the test set to verify the actual control effects of the three algorithms. The lowest total costs for the whole day using the A3C, DQN, and DDPG are $2412.25, $2501.12, and $2486.36, respectively. The costs for different periods of the day are shown in Fig. 10. The A3C algorithm has the lowest cost. During the grid electricity price peak period, the total cost of the A3C algorithm is less than those of the other two methods: by scheduling the output of each unit, electricity consumption under the A3C algorithm is shifted from the peak electricity price period to the valley period, while the total costs with the DDPG and DQN algorithms remain higher.
To further verify the stability and generalization ability of the DRL and MPC methods in solving the economic dispatch of IES, 20 days of data are randomly selected from the test set. MPC is widely used as a classical optimization method in the optimal scheduling of integrated energy systems [34]. The parameters of MPC are selected from [16]. As shown in Table 7, compared with the DDPG, DQN and MPC methods, the average cost of the A3C method is reduced by 3%, 5.2% and 8.7%, respectively.

V. CONCLUSION
In this paper, an A3C-based optimal scheduling framework is proposed for the economic dispatch of the electricity-gas-heat integrated energy system. We described the IES optimal scheduling problem as an MDP, which does not rely on expert knowledge for system modeling or on supply-demand forecasts of the IES. The proposed A3C-based optimal scheduling framework optimizes the control decisions for the supply and demand sides through the asynchronous learning of agents, and reduces the correlation of neural network parameter updates through the multi-agent utilization of the CPU multi-threading function. Compared with the DQN and DDPG algorithms, the training time of the proposed method is shortened by 37% and 30%, respectively, and the daily average operating cost is reduced by 5.2%, 3% and 8.7% compared with the DQN, DDPG and MPC methods, respectively.
In this paper, the economic dispatch of the integrated energy system is mainly realized by coordinating the output of each unit on the energy supply side, without considering the influence of the energy network and the demand side. In future research, a source-load coordination optimization method considering comprehensive demand response will be further studied.

JUNYOU YANG (Member, IEEE) received the B.Eng. degree from Jilin University of Technology, Jilin, China, the M.Sc. degree from Shenyang University of Technology, Shenyang, China, and the Ph.D. degree from Harbin Institute of Technology, Harbin, China. He was a Visiting Scholar with the Department of Electrical Engineering and Computer Science, University of Toronto, Canada, from 1999 to 2000. He is currently the Head of the School of Electrical Engineering, Shenyang University of Technology. He is also a Distinguished Professor in the Province of Liaoning and among the first 100-level candidates in the BaiQianWan Talents Program. He has led more than 50 research projects and has more than 200 publications in his technical field. His research interests include wind energy, special motors, and their control.
XINYI LU received the B.S. and M.S. degrees in electrical engineering from Shenyang Agricultural University, Shenyang, China, in 2012 and 2016, respectively. She is currently an Intermediate Engineer with the Electricity Intensive Control Department, State Grid Liaoning Marketing Service Center, Shenyang. She is mainly involved in integrated energy system stability analysis.
LIU GAO received the bachelor's degree from Heilongjiang University of Technology, in 2020. She is currently pursuing the master's degree in electrical engineering with Shenyang University of Technology. Her research interest includes scheduling optimization for integrated energy systems.
XIRAN ZHOU received the bachelor's degree in electrical engineering from Shenyang University of Technology, in 2020, where she is currently pursuing the master's degree in electrical engineering. Her research interest includes scheduling optimization for integrated energy systems.