A Day-Ahead Economic Dispatch Scheme for Transmission System with High Penetration of Renewable Energy

The great uncertainty caused by the high penetration of renewable energy brings severe challenges to the economic dispatch of transmission systems. In this study, a day-ahead economic dispatch model is proposed. The model is based on the coordination framework of transmission and distribution, and it fully considers the response potential of the distribution network. The scenario generation method based on the Copula function is used to describe the uncertainty of the energy output of renewable energy sources such as wind power, and then, the response potential of the distribution network is analyzed. A day-ahead economic scheduling framework for transmission and distribution coordination is proposed, which is solved by a deep reinforcement learning algorithm. Finally, the effectiveness of the proposed method is verified through application to the IEEE 6-bus transmission network and the IEEE 7-bus distribution network.


I. INTRODUCTION
The proportion of renewable energy generation in the grid will be further increased as the targets of carbon peaking and neutralization proposed by the seventy-fifth session of the United Nations General Assembly are pushed forward. Many countries have drawn up long-term renewable energy development plans in an effort to achieve carbon peaking and carbon neutralization at an early date. However, the great randomness and large volatility produced by the high penetration of renewable energy in the power grid bring serious challenges to the power grid [1], [2], and how to fully consider the randomness and volatility of renewable energy has become a key problem to be solved [3].
Traditional economic scheduling processes do not consider the uncertainty of renewable energy because they mainly focus on dispatching traditional energy, such as thermal power [4], [5]. Therefore, they are not suitable for current power systems. In the power scheduling process, dispatchers need to forecast the renewable energy output, and if the actual output deviates from the predicted output, either a power shortage or power abandonment will occur. Although renewable energy technologies are becoming increasingly mature, it is still difficult to avoid the prediction error caused by the high randomness and great volatility of renewable energy. In response to the abovementioned problems, Wu et al. [6] adopted a distributed solution method to represent wind power from the perspective of wind power generation and established an economic dispatching model, considering over-dispatching and under-dispatching of wind power. The wind speed scenario is adopted in [7]; the wind power scenario is obtained through the wind speed-wind power conversion curve, and an electricity market equilibrium model is established, considering the penalty of wind power bidding deviation. Considering the uncertainty of renewable energy, another study proposed a robust economic scheduling method for centralized solar power plants using energy storage systems (ESSs) combined with automatic generation control (AGC) in the day-ahead auxiliary power trading market [8]. The time-frequency characteristics of renewable energy are analyzed, and an adaptive hybrid dispatching time uncertain determination method based on the predictive time-frequency characteristics is proposed. On this basis, a day-ahead economic dispatching model is established to effectively improve the economy of power grid operation [9]. 
In addition, scholars around the world have contributed to research on the security problems caused by the integration of massive renewable energy generation into the power grid [10], [11], [12], [13]. Xiong et al. [10] proposed a robust optimization model considering uncertain wind turbine allocation. Zhang et al. [11] proposed a robust scheduling method for microgrids with high renewable energy penetration to handle the uncertainty caused by renewable energy generation (REG). Ding et al. [12] studied the optimal reactive power scheduling problem after the large-scale integration of wind generation and proposed a two-stage robust optimization model for reactive power scheduling in the distribution network. Liu et al. [13] built a two-stage stochastic dynamic economic dispatch model to realize effective economic management of the power system. These papers analyze the security and economic problems brought by the high penetration of renewable energy from the perspectives of the market, the scheduling process, and so on. However, current research mainly focuses on the transmission network, distribution network, microgrid, or wind farm separately and does not involve collaborative scheduling between distribution and transmission networks. With the large-scale integration of flexible heterogeneous energy resources into the distribution network, active distribution network technology has become increasingly mature. Therefore, it is necessary to consider the flexibility of the distribution network in the day-ahead economic scheduling process of the whole power system.
Many scholars have researched the optimal operation of the active distribution network, which makes the distribution network capable of dispatching and brings remarkable benefits to the economy and safe operation of the network [14], [15], [16]. In one study, the load characteristics of air-conditioning systems in office buildings and electric vehicles of a distribution network are analyzed, and a hierarchical scheduling method for an active distribution network considering the flexible loads of the office buildings is proposed, which realizes the economic dispatch of the distribution network [17]. In Zheng et al. [18], a two-stage optimal scheduling method for active distribution networks is proposed to realize the efficient dispatching of distribution network loads, considering the uncertainty risk of renewable energy, load, and electricity price. With the continuous deepening of the research results of active distribution networks, a series of results have been formed in the field of optimal dispatching. In Zhu et al. [19], the service requirements for the cooperative operation and calculation of power transmission and distribution networks are analyzed, and the overall framework of the cooperative operation of power transmission and distribution networks is proposed. Zhu proposes a demand response algorithm for distribution networks, which firstly receives the load instructions of the transmission network and then dispatches the load of the distribution network based on the voltage-load sensitivity matrix [20]. Lin studied the economic scheduling problem of transmission and distribution coordination, proposed a heterogeneous decomposition algorithm, and proved the convergence of the algorithm [21]. Based on the abovementioned research, it is possible to study the economic dispatch of power transmission networks with the goal of cooperative optimal dispatch. 
Moreover, with the deepening of China's power system reform, distribution networks, renewable energy, generator sets, and other deployable resources can play a role in the day-ahead power trading market. Therefore, this study investigates the economic dispatch of multi-agent power transmission networks.
In terms of model solution, the economic dispatch of multi-agent power transmission networks is a mixed-integer linear programming problem, which can be solved by solvers such as CPLEX and MOSEK or by particle swarm optimization. However, with the rapid increase in the number of variables in the global power grid, these solution methods struggle to meet the corresponding requirements. As a promising artificial intelligence technology, reinforcement learning is attracting more and more attention due to its powerful data processing, representation, and generalization abilities [22]. Presently, a branch of research seeks to use deep learning and reinforcement learning to address complex problems in the field of power systems. For instance, Zhao et al. [23] proposed a deep learning method based on a layer-by-layer coding network to identify the fault state of the main bearing of a wind turbine. Liu et al. [24] used the deep reinforcement learning technique to realize effective control of units in the power grid. Although the deep reinforcement learning methods mentioned above can obtain the optimal scheme among a limited number of schemes in different power grid operating environments, they are only suitable for discrete action and state spaces and cannot meet the requirements of continuous variables in the global economic scheduling model of the transmission network. The deep deterministic policy gradient (DDPG) method combines the deep Q-network (DQN) and the deterministic policy gradient (DPG) within an Actor-Critic framework. DDPG improves the training effect and algorithm stability through the deep neural network and experience replay mechanism of DQN, and it handles continuous environment states and formulates continuous actions through DPG.
Motivated by these observations, this paper explores the use of the DDPG method to solve the proposed model. The main innovations are summarized as follows:
(1) In order to effectively perceive the uncertainty of renewable energy in the power grid, a method for renewable energy scenario generation based on the Copula function is proposed to characterize the renewable energy output in the power grid.
(2) With full consideration of the response capability of the active distribution network and the forecasting method of renewable energy, an economic dispatch model considering the coordination of the transmission and distribution networks is proposed.
(3) Considering the complexity of the model solution, the DDPG method, which is based on deep reinforcement learning, is applied to the power system, and the effectiveness of the proposed method is verified.
The rest of this article is organized as follows. In Section II, the uncertainty representation method of renewable energy is described. In Section III, the economic scheduling framework for transmission and distribution cooperation is introduced. In Section IV, the DDPG method based on deep reinforcement learning is used to solve the proposed model. In Section V, simulations are detailed, and the results are analyzed. In Section VI, the entire study is summarized.

II. INTERVAL PREDICTION OF WIND POWER BASED ON COPULA FUNCTION
Wind power has strong randomness and volatility. At present, traditional wind power forecasting methods yield only point prediction results, so the full range of wind power fluctuation cannot be obtained. A large number of wind turbine-based distributed energy sources are currently connected to the grid, and the installed capacity of wind turbine units is increasing year by year. If only the point forecast is used to plan and arrange the output of the units, power dispatch faces risks that are not conducive to the safe and stable operation of the power system. Scenario analysis is a common method to deal with the uncertainty of wind power output during power system planning and operation. Compared with point prediction, this method can better describe the uncertainty of wind power by constructing a scenario set, and it can effectively guide optimal power system operation.
First, an interval forecasting model for wind energy is constructed based on the discrete Copula function. By means of the discrete conditional Copula function, the correlation of wind power series in adjacent periods is mined, and the wind power range of a point to be predicted is obtained. Then, the wind power range prediction model is acquired from rolling prediction. Finally, the wind power output scenario set is established by sampling and clustering based on the interval prediction model.
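The final step of this outline, sampling within the predicted intervals and reducing the samples by clustering (K-means, as detailed in Section II-C), can be illustrated with a minimal pure-Python sketch. The function names, the uniform sampling inside each interval, and the stopping rule based on unchanged labels are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random

def sample_scenarios(intervals, n, rng):
    """Draw n scenarios; each period's power is uniform within its interval."""
    return [[rng.uniform(lo, hi) for lo, hi in intervals] for _ in range(n)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(scenarios, k, rng, max_iter=100):
    """Reduce scenarios to k typical scenarios and their probabilities."""
    # Step (1): pick k scenarios at random as the initial cluster centers.
    centers = [list(s) for s in rng.sample(scenarios, k)]
    labels = None
    for _ in range(max_iter):
        # Step (2): assign every scenario to its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(s, centers[c]))
                      for s in scenarios]
        # Step (4): stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step (3): recompute each center as the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(scenarios, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    weights = [labels.count(c) / len(scenarios) for c in range(k)]
    return centers, weights
```

Each returned center is a typical scenario, and its weight (the share of sampled scenarios assigned to it) can serve as the scenario probability in the dispatch model.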

A. Conditional Copula function
If the chosen function used to predict the wind power is continuous, it is necessary to perform multiple inverse function operations for the function itself, which is cumbersome and difficult to establish. In this study, the discrete conditional Copula function is used to construct the prediction model.
Suppose that N independent samples are known for t+1 variables. They are denoted as $[X_1, X_2, \ldots, X_t, X_{t+1}]$ and defined in (1). Each row in the matrix represents a sample, and each column represents the value of the same variable in different samples.

$$[X_1, X_2, \ldots, X_{t+1}] = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,t+1} \\ \vdots & \vdots & & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,t+1} \end{bmatrix} \tag{1}$$

Because the Copula function links the marginal distribution functions of all t+1 variables, it is necessary to calculate the marginal distribution functions of the t+1 variables, denoted as $F_1, F_2, \ldots, F_{t+1}$. Hence, (1) is substituted into the marginal distribution functions to obtain the corresponding function values $[y_1, y_2, \ldots, y_t, y_{t+1}]$, as shown in (2). All the variables in (2) are in the interval [0, 1], which is divided evenly into K subintervals, and the range of each subinterval S is given in (3).

$$y_{n,i} = F_i(x_{n,i}), \quad n = 1, \ldots, N, \quad i = 1, \ldots, t+1 \tag{2}$$

$$S_k = \left[\frac{k-1}{K}, \frac{k}{K}\right], \quad k = 1, \ldots, K \tag{3}$$

Since each variable has K subintervals, the t+1 variables define a discrete space of $K^{t+1}$ cells. In (2), the marginal distribution function values of the samples of each variable fall into different subintervals, which yields the Copula function in discrete form. The conditional Copula function refers to the probability distribution function with the first t variables in the Gth sample as the known condition and the (t+1)th variable as the variable to be predicted. The probability distribution function of $y_{t+1}$ with (2) as the known condition is shown in (4).

$$F(y_{t+1} \mid y_{G,1}, \ldots, y_{G,t}) = P(Y_{t+1} \le y_{t+1} \mid y_1 = y_{G,1}, \ldots, y_t = y_{G,t}) \tag{4}$$

To approximate a continuous distribution function with a discrete probability distribution, a clustering operation is needed. The samples whose first t variables fall into the same subintervals as the known condition are labeled and recombined into a conditional matrix whose number of rows is less than or equal to N. The labeled samples whose (t+1)th value falls into the same subinterval are grouped into one class, so the $N_1$ labeled samples are divided into J classes with sample sizes $M_1, M_2, \ldots, M_J$. The mean value $\bar{y}_{t+1}^{\,j}$, $j = 1, 2, \ldots, J$, is used to represent the (t+1)th function value of each class, and the probability of each class is obtained as

$$p_j = \frac{M_j}{N_1}, \quad j = 1, 2, \ldots, J.$$

The above expression is the conditional Copula function in discrete form, i.e., the probability distribution of $y_{t+1}$ with (4) as the known condition.
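To make the discretization concrete, the following sketch estimates the class probabilities and class means of the (t+1)th variable given the subintervals of the first t variables. It is a minimal illustration under stated assumptions: the marginal distributions are replaced by empirical CDFs, subintervals are indexed from 0 and treated as right-closed, and all function names are hypothetical.

```python
import math
from bisect import bisect_right

def marginal_cdf_values(column):
    """Empirical marginal CDF value in (0, 1] for every sample of one variable."""
    ordered = sorted(column)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in column]

def subinterval_index(y, K):
    """0-based index of the equal-width subinterval of [0, 1] that contains y."""
    return min(K - 1, max(0, math.ceil(y * K) - 1))

def conditional_class_probabilities(samples, condition, K):
    """Discrete conditional distribution of the last variable.

    samples   : list of rows, each holding t+1 values
    condition : subinterval indices of the first t variables (known condition)
    Returns {class index: (probability p_j, mean CDF value of the class)}.
    """
    columns = list(zip(*samples))
    cdf_rows = list(zip(*(marginal_cdf_values(c) for c in columns)))
    # Conditional matrix: samples whose first t variables match the condition.
    matched = [row for row in cdf_rows
               if tuple(subinterval_index(y, K) for y in row[:-1]) == tuple(condition)]
    classes = {}
    for row in matched:
        classes.setdefault(subinterval_index(row[-1], K), []).append(row[-1])
    n1 = len(matched)
    return {k: (len(v) / n1, sum(v) / len(v)) for k, v in classes.items()}
```

With `samples = [(1, 10), (2, 40), (3, 20), (4, 30)]`, `K = 2`, and the condition that the first variable lies in the lower subinterval, two samples match, and they split evenly between the two classes of the second variable.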

B. Interval forecasting method for wind power
Assume that a continuous wind power sequence is known and arranged as a sample matrix of the form (1). Each column in the matrix, which is the wind power sequence with fixed time intervals, is treated as one variable. Then, the discrete conditional Copula function for predicting wind power gives the class probabilities $p_j$, $j = 1, 2, \ldots, J$. The classes are reordered by the size of $p_j$, so that $p_1$ is the maximum and $p_J$ is the minimum. Because the value of the marginal distribution function is in the interval [0, 1], the interval [0, 1] is divided into K segments, which are recorded as $s_1, s_2, \ldots, s_K$. The probabilities $p_j$ are then accumulated until the result is greater than or equal to the prediction confidence $\beta$, that is,

$$p_1 + p_2 + \cdots + p_m \ge \beta,$$

where m is the smallest number of classes for which the inequality holds. At this point, the subintervals corresponding to these m classes are collected, where the lower bound of the resulting interval is denoted as $s_l$ and the upper bound is denoted as $s_u$. So, it can be considered that at the confidence level $\beta$, the prediction interval of the marginal distribution value at the (t+1)th moment is the union of these subintervals, that is,

$$y_{t+1} \in [s_l, s_u].$$

By substituting $s_l$ and $s_u$ into the inverse function of F, the prediction interval of wind power $W_{t+1}$ at the (t+1)th moment is obtained:

$$W_{t+1} \in \left[F^{-1}(s_l), F^{-1}(s_u)\right].$$
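The accumulation procedure can be sketched as follows. This is an illustration only: the function name and the `inverse_cdf` argument are hypothetical, subintervals are indexed from 0, and when the selected subintervals are not contiguous the sketch returns the smallest single interval covering them, which is an assumption rather than something the text specifies.

```python
def prediction_interval(class_probs, K, beta, inverse_cdf):
    """Prediction interval of wind power at confidence level beta.

    class_probs : {0-based subinterval index: probability p_j}
    K           : number of equal subintervals of [0, 1]
    inverse_cdf : callable mapping a CDF value in [0, 1] back to power
    """
    # Sort classes by probability, largest first.
    ranked = sorted(class_probs.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for index, p in ranked:
        chosen.append(index)
        cumulative += p
        if cumulative >= beta:   # stop once the confidence level is reached
            break
    s_l = min(chosen) / K        # lower bound of the covered subintervals
    s_u = (max(chosen) + 1) / K  # upper bound of the covered subintervals
    return inverse_cdf(s_l), inverse_cdf(s_u)
```

For example, with four subintervals, class probabilities 0.5, 0.3, and 0.2 on subintervals 2, 3, and 0, and a confidence level of 0.75, the two most probable classes are selected and the CDF interval is [0.5, 1.0].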

C. Wind power scenario generation
After the expression of the wind power range is obtained, a large number of wind generation scenarios are generated by sampling and splicing according to probability. In order to simplify the scenario generation process, it is assumed that wind power output is uniformly distributed in each interval. Typical scenarios of joint wind power output can be obtained by reducing the generated scenarios with a clustering algorithm. The K-means algorithm is one of the most widely used clustering algorithms since it offers fast clustering. In this study, the K-means clustering algorithm is used as the scenario reduction algorithm. Its clustering steps are as follows:
(1) According to the preset number of clusters, randomly select K scenarios from all joint wind power output scenarios as the initial cluster centers.
(2) Calculate the distance between each scenario and each cluster center, and assign each scenario to the nearest cluster.
(3) Recompute the center of each cluster from the scenarios assigned to it.
(4) Determine whether the convergence condition is satisfied. If it is satisfied, the clustering ends; otherwise, return to step (2).
The root mean square error (RMSE) and mean absolute error (MAE) are used to verify the accuracy of the model established after clustering:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{P}_t - P_t\right)^2}, \qquad \mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|\hat{P}_t - P_t\right|,$$

where $\hat{P}_t$ is the predicted power for time t given by the forecasting model, $P_t$ is the actual power at time t, and T is the number of time periods.

III. ECONOMIC SCHEDULING FRAMEWORK FOR TRANSMISSION AND DISTRIBUTION COOPERATION
In order to meet both the multi-level and multi-type scheduling requirements, this study proposes an economic scheduling framework for transmission and distribution cooperation. The whole framework can be divided into three layers: a transmission dispatching layer, a distribution dispatching layer, and a local dispatching layer. The local dispatching layer is a dispatching unit composed of distributed wind power and an ESS, and the distribution network is regarded as a generation unit with a certain regulating capacity.
Under the framework of a hierarchical dispatching algorithm, both the transmission and distribution networks have autonomous capacity, and the power transmission plan is arranged according to the power price of the transmission and distribution boundary.
Because the transmission network can connect to multiple distribution networks at the same time, it can serve as the coordinator that controls the power plan between the transmission and distribution networks and realizes the safe and economic operation of the system through the coordination of transmission and distribution.

A. Local optimization scheduling model
The local scheduling layer is the basic unit under the scheduling framework. In the distribution network, distributed renewable energy is scattered, lacking the smoothness of centralized renewable energy, and its output is highly uncertain. Therefore, according to the electricity price of the distribution network, the internal power resources are optimized, and the power generation plan is reported to the dispatching layer through joint dispatch with the ESS.
Under the framework of co-scheduling between transmission and distribution networks, the optimization objectives of the local dispatching layer are as follows: (1) to maximize the income of the local dispatching layer, i.e., to maximize wind power accommodation, and (2) to reduce the output fluctuation of renewable energy during the scheduling cycle. On the basis of the joint optimization model of the wind turbine and energy storage, the uncertainty of the wind turbine output is incorporated into the optimization objective function. The objective function is shown in (13), where T represents the duration of the scheduling cycle. The constraints of the objective function are shown in (15) to (21).
After the optimal generation plan $P_t^{e}$ has been calculated by the local scheduling layer for the optimization period, the layer reports the planned value to the distribution scheduling layer and optimizes the follow-up scheduling plan.

B. Scheduling optimization model for distribution network
According to the structure of the dispatching framework, the resources that can be dispatched by the distribution network include: (1) distributed generation (DG), such as micro-gas turbines; (2) the transmission power of the connection line; and (3) the accommodation of the planned generation power reported by the local dispatching layer.

1) OBJECTIVE FUNCTION
For any distribution network within the region, the optimization objective is subject to a transmission power constraint, in which $\underline{P}_k^{dis}$ and $\overline{P}_k^{dis}$ are the lower and upper bounds of the transmission power between the transmission network and the kth distribution network, respectively; $Q_k$ is the planned transmission power between the transmission network and the kth distribution network; $\rho$ is the deviation ratio allowed during actual transmission; and $\tau$ is the time span of each time period.
d) Constraints of line safety
In this study, the power flow calculation method for the distribution network is based on the DistFlow model. The specific formulation can be found in [25].

C. Dispatching optimization model for transmission network
To meet the demand of real-time power balance, it is necessary to adjust and optimize the reserve capacity. In this study, a two-stage coordination optimization model for the power transmission network is established. In the first stage, the reference value of the generator output and the interaction power between the generator and the distribution network are optimized. In the second stage, the reserve demand and the reserve economic distribution among the conventional units are determined according to the generator ramp-up scenario. In the dispatching process of the transmission network, the decision variables are the generation variables and the spare capacity variables of traditional generators, the generation variable of wind power and the boundary passing power variable between transmission and distribution.

1) OBJECTIVE FUNCTION
In the objective function of the dispatching optimization model, the weight parameters in (29) are determined using the expert grading method described in [27]. We assume that all the power devices in the power system belong to the same operator and choose the maximum economic benefit as the optimization goal; for this reason, the weight parameters $\omega_1, \omega_2, \omega_3, \omega_4$ are all equal to 1.

2) PREDICTIVE SCENARIO CONSTRAINTS
The predictive scenario constraints of the dispatching optimization model are as follows:

3) GENERATOR RAMPING SCENARIO CONSTRAINTS
The generator ramping scenario constraints of the dispatching optimization model are as follows:

IV. MODEL SOLVING
Deep learning, a class of self-learning algorithms based on deep neural networks, has great advantages in solving complex nonlinear problems [26]. Reinforcement learning is a subject of great interest because it can independently generate decision-making steps according to environmental information and realize the expected benefit of the model. At present, the main reinforcement learning algorithms are value function algorithms and policy optimization algorithms. Compared with deep reinforcement learning algorithms based on a value function, the DDPG algorithm based on the Actor-Critic framework has higher efficiency and faster solution speed. Commonly used deep reinforcement learning algorithms include the value-based Deep Q-network (DQN) algorithm, the asynchronous advantage Actor-Critic (A3C) algorithm, and policy gradient algorithms [27]. The Deep Q-network algorithm is suited to discrete action spaces, in which the number of output actions is finite, and A3C is likewise most often applied in that setting. For tasks in continuous action spaces, a good decision-making effect can be obtained based on the deterministic policy gradient (DPG) algorithm [28]. Based on this, DeepMind proposed the DDPG algorithm in 2016, which uses a deep neural network as the function approximator in the DPG algorithm and combines the computational advantages of the convolutional neural network with the continuous optimization characteristics of the DPG algorithm. In the present study, DDPG is used to solve the model. By analyzing the operation data of a power transmission network, distribution network, and wind farm, the optimal output strategy of each generator, the distribution network, and the wind farm is obtained.

A. Principles of DDPG algorithm
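The objective of the DDPG algorithm is to maximize the expected discounted return:

$$J_t = E\left(\sum_{n=0}^{N_n} \gamma^{n} r_{t+n}\right),$$

where $N_n$ is the number of time periods to be taken into account; $J_t$ represents the objective function with attenuation at time t; $E(\cdot)$ is the expectation function; $\gamma$ is the discount factor, $0 < \gamma \le 1$; and $r_{t+n}$ represents the return value at time t + n.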
The training objective is to minimize the loss function L of the main evaluation network:

$$L = \frac{1}{D}\sum_{t}\left(q_{T,t} - q_t\right)^2,$$

where D is the number of sample units used for training; $q_t$ represents the output value of the main evaluation network at time t; and $q_{T,t}$ represents the target output value for the evaluation network, which is calculated from the samples to be optimized at time t:

$$q_{T,t} = r_t + \gamma q'_{t+1},$$

where $r_t$ represents the expectation of return at time t; and $q'_{t+1}$ represents the output value of the sample target evaluation network at time t + 1.
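The quantities in this subsection can be illustrated with a few lines of plain Python. The sketch assumes the usual DDPG forms, a discounted sum of returns for the objective, a target value equal to the return plus the discounted target-network output, and a mean squared error loss; the function names are hypothetical, and the networks themselves are replaced by lists of already computed output values.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**n * r_{t+n} over one sampled sequence of return values."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def target_values(rewards, next_target_q, gamma):
    """Target value for each sample: r_t + gamma * q'_{t+1}."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_target_q)]

def critic_loss(main_q, target_q):
    """Mean squared error between main-network outputs and target values."""
    return sum((qt - q) ** 2 for q, qt in zip(main_q, target_q)) / len(main_q)
```

In a full implementation, `main_q` and `next_target_q` would be produced by the main and target evaluation networks, and the loss would be minimized by gradient descent on the main evaluation network parameters.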
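The optimization algorithm of DDPG is used to obtain the optimal parameters, and the target network parameters are updated as

$$\theta^{a}_{T,t} = \eta\,\theta^{a}_{t} + (1-\eta)\,\theta^{a}_{T,t-1}, \qquad \theta^{c}_{T,t} = \eta\,\theta^{c}_{t} + (1-\eta)\,\theta^{c}_{T,t-1},$$

where $\eta$ is the divergence factor (the soft-update coefficient), and $0 < \eta < 1$; $\theta^{a}_{t}$ and $\theta^{c}_{t}$ are the parameters of the main action network and the main evaluation network at time t; and $\theta^{a}_{T,t-1}$, $\theta^{c}_{T,t-1}$ represent the deep neural network parameters of the target action network and the target evaluation network, respectively, at time t − 1. Each step of DDPG updates the target network parameters in a very small range to prevent policy divergence and learning process instability caused by network parameter changes. For the in-depth theoretical derivation of DDPG, please refer to [29].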
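The small-step update of the target networks can be sketched as below, assuming the standard DDPG soft-update form in which each target parameter moves a fraction η toward the corresponding main-network parameter. Parameters are represented as flat lists of floats for illustration; the function name is hypothetical.

```python
def soft_update(target_params, main_params, eta):
    """Move each target parameter a fraction eta toward the main parameter."""
    return [eta * m + (1.0 - eta) * t
            for t, m in zip(target_params, main_params)]

# With a small eta the target network tracks the main network slowly,
# which keeps the target values used in the loss nearly stationary.
target = [0.0, 10.0]
main = [1.0, 0.0]
for _ in range(3):
    target = soft_update(target, main, eta=0.001)
```

After three steps with η = 0.001 the target parameters have moved by only about 0.3% of the gap to the main parameters, which is the stabilizing behavior the text describes.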

1) INITIAL SAMPLE DATA
This study considers the system composed of the IEEE 6-bus transmission network and the IEEE 7-bus distribution network. The structure of the system is shown in Fig. 1 in the appendix, and the wind-driven generator is connected to node 3. The simulation data are used to generate the sample data and establish the sample database. The simulation model is built in MATLAB.
To obtain the independent optimal operation data of the distribution network, the output of each distributed generator set and the electricity purchased by the distribution network are assigned to the intervals $[P^{dis,\min}_{MT,k,i,t}, P^{dis,\max}_{MT,k,i,t}]$ and $[P^{buy,\min}_{dis,k,t}, P^{buy,\max}_{dis,k,t}]$, respectively, under the condition that the power conservation of the distribution network is satisfied. The electricity purchasing price at different times is input, and the total load of the distribution network is changed randomly in the range of 0.9-1.1. The model is solved by CPLEX, and the output data of the distribution network are optimized independently. To obtain the independent optimization data of the wind farm and generate the scenario of the wind farm at random, the forecast value and the unit electricity price of the wind farm at different times are taken as the inputs for the objective function (14), and the independent optimization output data are obtained.

2) EIGENVALUE SELECTION
Because the power network is a nonlinear system, it is necessary to select suitable sample data to represent the operation of the distribution network so that the algorithm adapts well to changes in the environmental state.
The model constructed in Section III shows that different decision-making schemes are determined by the objective functions of the wind farm, the distribution network, and the transmission network when the load and electricity price change over time. Therefore, the operating states, such as the price of electricity sold by the transmission network, the price of electricity purchased by the distribution network, the cost of unit generation, the benefit and fluctuation of the wind farm, and the safety constraints of the power network, are chosen as the state information. The operation information includes the output of each wind turbine unit, the output of the distributed generator set, the purchasing power of the distribution network, and the output of each generator set in the transmission network. The sum of the objective functions of the three models (the local optimization model, the scheduling optimization model for distribution networks, and the dispatching optimization model for the transmission network) is taken as an important index to evaluate the economy of power scheduling before and after the optimization. After obtaining the action value and state information, the return value can be calculated. The constraints in the model can be expressed as

3) SAMPLING PROCESS
In this study, the modified Metropolis-Hastings (MMH) method, which is suitable for high-dimensional problems with small failure probabilities, is used to re-sample during the normalization and extraction of the training sample data, and samples are drawn from the sample library according to the probability of action occurrence to form the experience replay samples. The deep convolutional neural network pre-processes the data so that information with high value density serves as the input data of reinforcement learning. Multiple datasets are trained at the same time to improve the generalization ability of the training model and the training efficiency of the deep convolutional neural network model. After training, all the historical optimization tasks can provide training sample data for the deep network, so new tasks can be optimized online directly using the historical samples as the initial database.

1) DDPG LEARNING PROCESS
The DDPG learning framework based on the day-ahead economic dispatch of the transmission network is shown in Fig. 1. The background data processing collects the current status information, as well as the return value and status information at the last moment, and then forms a sample unit to store in the data pool. The MMH re-samples D sample units $(s_t, a_t, r_t, s_{t+1})$ from the experience data pool for training, which is referred to as experience replay; $s_t$ and $s_{t+1}$ represent the state values at time t and t + 1, respectively, $a_t$ is the action value at time t, and $r_t$ is the return value at time t. The correlation between the data is broken by using experience replay.
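A data pool of this kind can be sketched with the standard library alone. Note that this illustration swaps in plain uniform random sampling for the MMH re-sampling used in the paper, so it shows only how sample units are stored and drawn, not the authors' sampling scheme; the class and method names are hypothetical.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity pool of (s_t, a_t, r_t, s_{t+1}) sample units."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen discards the oldest unit once the pool is full.
        self._pool = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._pool.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw up to batch_size units uniformly at random, without replacement.

        Assumption: uniform sampling stands in for the MMH re-sampling step.
        """
        size = min(batch_size, len(self._pool))
        return self._rng.sample(list(self._pool), size)

    def __len__(self):
        return len(self._pool)
```

Drawing units at random rather than in time order is what breaks the correlation between consecutive samples.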
In the course of optimization, the action prediction value and the corresponding target evaluation value are first calculated according to the parameters of the target network before it is updated. The loss function L is obtained to evaluate the training of the network, and the parameters of the main evaluation network are updated. Then, the parameters of the main policy network, target network, and evaluation network are updated. By training the deep neural network, the parameters of the main network and the target network are updated. After being updated, the target network obtains the current action value and outputs it to the local optimization scheduling model, the distribution model, and the transmission model. The distribution network status information at time t + 1 is taken as a new sample, and the learning and calculation process is carried out for this moment. The DDPG optimization process uses a deep convolutional neural network, which has a strong self-optimization ability, to process the power network operation data.

2) DDPG TRAINING PROCESS
The DDPG initial training process is the same as shown in Fig. 2, but in the early stage of the program, in order to obtain more possible operation modes, the initial training action (set to t < 10000 in this study) adds some randomness and selects the action value according to (51):

$$a_t = g(s_t \mid \theta^{\mu}) + \chi, \tag{51}$$

where $g(\cdot)$ is the output function of the main action network; $\theta^{\mu}$ represents the main action network parameters; and $\chi$ is a Gaussian random variable.
As training proceeds, the probability of random action is gradually reduced until the variance of the Gaussian random variable is zero. The Adam optimizer is selected for gradient descent optimization, and the ε-greedy algorithm is used to select the action during training. The initial value is set to $\varepsilon_0 = 0.5$, and the minimum value is set to $\varepsilon_{\min} = 0.01$. The convolutional neural network has 60 layers, each with 30 neurons. The discount factor is set to $\gamma = 0.94$, the learning rate of the action network is set to $10^{-3}$, the learning rate of the evaluation network is set to $10^{-4}$, the number of samples is set to 64, the sampling amount is set to $10^{4}$, and $\eta = 0.001$. Gaussian noise with a variance of 0.5 is added to the action value by using (51), and the return value is calculated according to (50). Scenes with the same return value are always kept in the same state, repeated scenes are continuously eliminated, and finally, 10 sample files are formed. To improve training efficiency, multiple datasets can be trained at the same time when the training sample data are extracted, and training can be done in the form of data packets.
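The exploration step can be sketched as follows. This is only an illustration: the linear decay of the noise variance and the normalized action bounds of [-1, 1] are assumptions, because the text states only that the variance starts at 0.5 and eventually reaches zero. The function names are hypothetical.

```python
import numpy as np

WARMUP_STEPS = 10_000   # exploration phase, t < 10000 in the text
INITIAL_VARIANCE = 0.5  # variance of the Gaussian noise in the text


def noise_variance(t, warmup=WARMUP_STEPS, var0=INITIAL_VARIANCE):
    """Shrink the noise variance linearly to zero (assumed schedule)."""
    return var0 * max(0.0, 1.0 - t / warmup)


def explore(action, t, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian noise to the actor output, then clip.

    The bounds low/high are assumed normalized action limits.
    """
    std = np.sqrt(noise_variance(t))
    noisy = action + rng.normal(0.0, std, size=np.shape(action))
    return np.clip(noisy, low, high)
```

Calling `explore(action, t, np.random.default_rng())` at each step returns a perturbed action while t is inside the exploration phase and the unperturbed (clipped) action afterwards.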

3) DDPG TRAINING PROCESS TEST
The training process of the DDPG algorithm is evaluated online through the changes in the return value and the loss function. The changes in the return value and the loss function value during DDPG network training are shown in Fig. 2. In Fig. 3, the return value r tends to remain stable as the number of training sessions increases. The system can be considered stable after 700 training sessions. The greater the return value after a certain period of network training, the better the strategy effect. Fig. 4 shows that as the number of training sessions increases, the loss function value approaches zero, and the evaluation value of the target network converges to that of the sample to be optimized.
In summary, the DDPG algorithm can effectively solve the pre-scheduling model of the power transmission network. The use of a target network and a main network makes the learning process more stable and improves its convergence.

V. CASE STUDY
An IEEE 6-bus system is used to demonstrate the efficiency of the day-ahead economic scheduling model based on the DDPG algorithm, as shown in Fig. 2. The detailed data of the IEEE 6-bus system are given in Tables II and III in the appendix. The electricity selling price of the wind farm and the electricity purchasing price of the distribution network are given in Tables IV and V in the appendix. With T = 24 h, the economic dispatching horizon is hours 1-24 of the following day, with a time step of 1 h. The parameters of conventional units G1, G2, and G3 are given in Table I in the appendix. The predicted value of wind power comes from the wind power data of the Irish power system on March 11, 2015 [30], and the actual value of wind power is the actual power of the Irish power system on that day, as shown in Fig. 1. The wind power curtailment cost is set to 400 yuan/MW. All simulation cases are executed in MATLAB R2016a on a computer with an Intel(R) Core(TM) i5-7500 2.40 GHz processor.

A. Effectiveness of wind power forecasting based on Copula function
To verify the validity of the wind power forecast in this study, the following two scenarios are constructed:
• Scenario A1: The day-ahead economic dispatch of the transmission network is obtained by using the forecasted value of wind power to solve the model.
• Scenario B1: The day-ahead economic dispatch of the transmission network is obtained by using the prediction method of wind power proposed in this study.
After 1000 iterations, the DDPG algorithm completes the model learning and returns the final scheduling results. Fig. 4 shows the output of each generator set in the two scenarios, and Fig. 5 shows the amount of wind curtailment in the two scenarios. As can be seen from Fig. 4, because the uncertainty of wind turbine output is taken into account, the output of the generator set in scenario A1 decreases significantly during 08:00-11:00 and 22:00-24:00. Fig. 5 shows that using the predicted value of wind power to solve the model can reduce the amount of wind curtailment in the power grid by 27.3%.
The results of the proposed method are compared with those obtained from the BP neural network, SVM, and LSSVM algorithms, based on historical data from multiple scenarios. The comparative results are shown in Fig. 6 and Table I and illustrate the effectiveness of the proposed algorithm. Table I shows that the RMSE and MAE of the proposed method are the lowest among the four algorithms. This indicates that the proposed method can improve the prediction accuracy and, in turn, the effect of the subsequent optimization dispatching.
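For reference, the two error metrics used in the comparison can be computed as in the short sketch below. The function names are hypothetical, and the sample values in the usage note are made up for illustration; they are not data from this study.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error between actual and predicted wind power."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(actual, predicted):
    """Mean absolute error between actual and predicted wind power."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(err)))
```

For example, `rmse([10, 20, 30], [12, 18, 33])` and `mae([10, 20, 30], [12, 18, 33])` score the same illustrative forecast; RMSE penalizes the largest deviation more heavily than MAE does, which is why both are usually reported together.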

B. Effectiveness of day-ahead economic dispatch of transmission network
Similarly, two scenarios are constructed for simulation in this study to verify the effectiveness of the proposed model.
• Scenario A2: The transmission network conducts day-ahead optimization dispatching without considering the participation of the distribution network.
• Scenario B2: The transmission network conducts day-ahead optimization dispatching with the method proposed in this study.
In A2, the distribution network is regarded as a fixed load that does not participate in the day-ahead economic scheduling of the transmission network. In B2, the model proposed in this study is used for optimization. A2 and B2 are both solved by the DDPG algorithm. The optimization results of the two scenarios are shown in Fig. 7.
As shown in Fig. 7(a), the overall economy of the power grid is improved after the economic optimization. The economic efficiency during the wind curtailment periods is also improved to a certain extent, because the scheduling ability of the entire network improves once the distribution networks participate. A total of 57,300 yuan is saved in the whole network dispatching process. As shown in Fig. 7(b), the participation of the distribution network in day-ahead dispatching greatly improves the economic efficiency; however, according to Fig. 7(c), the participation of the distribution network causes the economic efficiency to decrease. This is because during wind curtailment, the economic efficiency of the distribution network is weakened to ensure the economy of the whole network. As can be seen from the results in Fig. 7(c), although the economy of the distribution network in the wind curtailment stage is lower than that in the independent optimization stage, there is no significant deviation from the cost of independent optimization.

C. Comparison of algorithm effectiveness
To characterize the effectiveness of the algorithm, the traditional ATC and ADMM algorithms are also used to solve the model. Considering the limitations of the ADMM algorithm in solving models with three or more layers, the local optimization model is integrated with the optimization model of the distribution network layer, and the ADMM algorithm is then applied to the whole model, with the Lagrange multiplier and the penalty factor set to 1 and 0.001, respectively. In the ATC solution process, the dispatching model of the transmission network is set as the parent system, and the optimized dispatching model of the distribution network layer and that of the local dispatching layer are set as the subsystems. The parameters are set as 0.
The ATC algorithm gives a good result for the multi-level solution, but it takes 135.4 s to obtain the final result because of the complexity of the model. The result of ADMM is inferior to that of ATC, but its solving speed is slightly better. The DDPG algorithm takes only 5.3 s to solve the model because the model is trained in advance, and its results are better than those obtained by the traditional solution methods. This is because, during model training, the reasonable setting of the return value and loss function enables the trained model to fully adapt to the day-ahead economic dispatching process of the transmission network. Overall, the DDPG algorithm has a prominent advantage in solving the model proposed in this study.
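As a quick check on the reported solution times, the ratio of the ATC time to the DDPG time can be worked out directly. This counts only the online solution time, since the DDPG model is trained in advance:

```latex
\frac{t_{\mathrm{ATC}}}{t_{\mathrm{DDPG}}} = \frac{135.4\ \mathrm{s}}{5.3\ \mathrm{s}} \approx 25.5
```

That is, once trained, DDPG returns a schedule roughly 25 times faster than ATC in this case study.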

D. Generalization of the algorithm
To further demonstrate the effectiveness of the DDPG algorithm, the network architecture of the transmission network and the access location of the active distribution network are changed, and the model is solved again. The change in the return value during the iterations is shown in Fig. 8. It can be seen from Fig. 8 that the DDPG algorithm can learn autonomously as the network structure of the power grid changes, and that it converges within 1000 iterations. The final results of day-ahead dispatching of the power grid under the three conditions are shown in Table III, which confirm that the DDPG algorithm solves the model effectively. Furthermore, the results obtained with co-scheduling are compared with the results obtained with independent scheduling. The results indicate that the DDPG algorithm has good generalization ability and can effectively adapt to changes in the grid structure.

VI. CONCLUSION
In this study, a day-ahead economic dispatching model of transmission and distribution coordination, considering the uncertainty of renewable energy, is constructed, and the DDPG algorithm is used to solve the model. A simulated case study is conducted to demonstrate the application of the proposed model, and its reliability and effectiveness are verified by various comparisons. The following conclusions are drawn:
(1) The prediction method of wind power interval based on the Copula function is proposed and introduced into the day-ahead economic dispatching model of the transmission network. Simulation shows that the introduction of the interval forecasting method for wind power can reduce wind curtailment in the process of day-ahead scheduling.
(2) The day-ahead economic dispatching model of transmission and distribution coordination is proposed, which effectively improves the overall economy of the power grid.
(3) The DDPG algorithm can accurately solve the model in a very short time, and the DDPG algorithm has strong adaptability to changes of the power grid. This study provides valuable theoretical reference for the application of deep reinforcement learning in power systems.
In the model constructed here, the distribution network side considers only the distributed generator set. However, because flexible resources such as electric vehicles are being connected to the distribution network on a large scale, the dispatching of these resources should also be considered in order to extend the dispatching capacity of the distribution network. This will be the focus of our future research on day-ahead dispatching of transmission networks.