A Comparative Study of Deep Neural Network and Meta-Model Techniques in Behavior Learning of Microgrids

Behavior learning of microgrids (MGs) is a necessary and challenging task for multi-MG cooperation and energy pricing in the distribution energy market. With the increasing demand for user privacy, this problem becomes more severe because access to device parameters and models behind the Point of Common Coupling (PCC) is increasingly limited, which hinders conventional model-based power management methods. To address this problem, this paper introduces several model-free data-driven methods, including Deep Neural Network (DNN) and meta-model techniques such as Radial Basis Function (RBF), Response Surface Method (RSM), and Kriging. These methods can predict the behavior of MGs through continuous iterative learning by accessing merely the historical active power measurements at the PCCs together with public electricity price and weather information, without full system identification or prior knowledge of the system. A comparative study against the conventional model-based method is carried out to better understand their advantages, drawbacks, and limitations. The validity and applicability of the proposed methods are verified by numerical experiments. This paper can provide a reference for future interactive operation of MGs under incomplete information.


I. NOMENCLATURE
has excellent potential and controllability to participate in energy market services through the optimal coordinated operation of DG, RES, DR and ESS [1]. In order to realize effective interactive operation between MGs and the distribution network, and to achieve reasonable energy pricing, fast and accurate prediction of MG behavior is essential. However, predicting the behavior of MGs has become an increasingly challenging task for utilities in recent years. The main difficulty is that utilities generally have limited access to real-time asset behaviors and models behind the Point of Common Coupling (PCC) with MGs. This problem becomes more severe with the increasing demand for user privacy [2]. Hence, how to deal with this challenge is an urgent issue that needs to be addressed in distribution network operation and energy market design.
Traditionally, the behavior prediction of MGs is formulated with model-based power management methods [3], [4]. There have been substantial efforts in the literature to investigate the optimal calculation of PCC power, including heuristic techniques [5], [6], nonlinear programming methods [7], [8], and distributed optimization methods [9], [10]. However, these power management methods all depend heavily on the system operator's full knowledge of MG operation behind the PCC and on customers' private data, which compromises the data ownership of MGs. Moreover, they have several disadvantages:
(1) They depend on an ideal physical model and on experience, so their timeliness is poor.
(2) Rules are formulated based on a fixed model and typical operation modes, so they cannot adapt to constantly changing system conditions when the amount of measurement data is limited.
(3) The energy model and control mode need to be greatly simplified and approximated to ensure calculation efficiency. Moreover, the random power prediction errors of renewable energy and loads further degrade the accuracy and reliability of the model.
In contrast, model-free data-driven methods, such as Artificial Neural Network (ANN) and meta-model techniques, are effective new solutions for MG behavior prediction. They can predict the behavior of MGs through function approximation or continuous iterative learning by accessing merely the active power measurements at the PCCs together with public weather and electricity price information, without full system identification or prior knowledge of the system. There has been some useful preliminary exploration of these methods for the prediction of renewable energy generation and for stability evaluation of power systems, such as the use of ANN to forecast short-term load demand for MGs [11], the prediction of uncertain factors [12], [13], and the use of RSM to approximate the critical damping ratio of MGs [14]. However, for the behavior learning of MGs, there are strong uncertainties on both the power source side and the load demand side, and MGs also employ complex energy management strategies. Moreover, due to the existence of energy storage and transferable loads, complex time-coupling relationships arise when MGs participate in demand response, so the behavior characteristics of MGs are more complex and more difficult to predict than those of DGs.
To the best of the authors' knowledge, the behavior prediction of MGs with strong uncertainties and incomplete information under complex energy management strategies has not yet been fully studied. In addition, a comparative study that tests various model-free data-driven methods to better understand their scope of application has not been properly addressed either.
Based on the above challenges and motivations, this paper applies several recently introduced model-free data-driven methods, including DNN, reinforcement learning (RL), RBF, RSM, and Kriging, to predict the real-time behavior of MGs under incomplete information. The validity of the proposed methods is verified by numerical experiments, and a comprehensive comparative study against conventional model-based optimal methods is carried out. The main contributions of this paper are as follows:
1) A series of model-free methods for predicting the real-time behavior of MGs is developed with access only to the historical active power measurements at the PCCs together with public electricity price and weather information, and can thus handle the current limitations raised by data privacy and data ownership.
2) A comparative study is carried out for the first time to compare DNN, RL and meta-model techniques with the conventional model-based method in behavior learning of MGs under incomplete information, providing a better understanding of their advantages, drawbacks and limitations for future interactive operation of MGs.
The remainder of this paper is organized as follows. Section III presents the proposed data-driven prediction mechanism for MG behavior learning, which consists of the behavior predicting architecture and the data preprocessing method. Behavior learning methods for MGs using DNN, RL, RBF, RSM, and Kriging are discussed in Section IV. Two cases are tested and the different data-driven methods are compared comprehensively in Section V, followed by the conclusions in Section VI.

III. NOVEL DATA-DRIVEN BASED PREDICTION MECHANISM FOR MG BEHAVIOR LEARNING

A. DATA-DRIVEN BASED MG BEHAVIOR PREDICTING ARCHITECTURE
Considering future interactive operation of MGs under incomplete information, model-based solutions will be difficult to apply because they require completely observable and controllable information. Thus, in this paper, a novel data-driven MG behavior predicting architecture is proposed, as shown in Fig. 1.
The upper level is the operation management layer, in which the distribution system operator (DSO) optimizes the pricing and realizes market clearing and overall optimal operation only by observing the response of MGs to external signals at their PCCs. In the proposed data-driven prediction mechanism, the DSO or an MG trains a black-box meta-model or neural network that describes the interactive behavior of the other MGs merely through the available active power measurements at the PCCs and public historical data, such as statistical electricity price, local solar irradiance, wind speed and temperature information, which are independent of the MGs. Therefore, it is possible to predict the tie-line power of an MG without knowing its internal units and parameters, and to better implement interactive operation among multiple microgrids (MMG) or between an MG and the DSO under incomplete information. Moreover, by avoiding explicit MG modeling, the data-driven model becomes highly adaptable to changes in MG parameters that are excluded from the MG's state set.
The bottom level is the multi-MG physical layer, in which the MGs receive the external price information from the upper-layer DSO and then carry out optimal power management with their internal energy management systems (EMS) to maximize their own interests.

B. DATA CHARACTERISTICS ANALYSIS BASED ON MACHINE LEARNING
Data characteristics analysis is the first key step for data-driven behavior learning. Probability distributions with fixed parameters are often used to describe the stochastic characteristics of weather information, such as the Weibull distribution for wind speed and the beta distribution for solar irradiance. However, these methods only approximately fit the actual probability histogram, and the fitting errors are large, especially at the edges [15]. Meanwhile, the parameters of these probability distributions are difficult to obtain.
For this reason, this paper introduces nonparametric kernel density estimation (KDE), an effective machine learning technique, to mine the deep characteristics of weather data. Working directly on the sample data without any prior knowledge, KDE can achieve high-precision fitting provided that an appropriate sliding window (bandwidth) is selected. The main principle is to estimate the distribution through a kernel function placed on each discrete interval (red dashed line in Fig. 2). The kernel curves are then accumulated (blue solid line in Fig. 2) to approximate the equivalent histogram distribution. The calculation is shown in (1), which can be rewritten as (2), where K(·) represents the kernel function. Commonly used kernel functions include the Gaussian kernel, the rectangular kernel, and so on.
Taking the annual data of local weather information and local electricity price as an example, we first preprocess the annual data by classifying them by time interval, as shown in (3), and then analyze the statistical characteristics of each time interval separately with KDE.
where A_{S,k+i*24} represents the data at the k-th interval of the i-th day, and S is the variable set, representing solar irradiance (P_si), wind speed (P_wt), temperature (P_T), or electricity price (γ_P). Assuming that the annual data of solar irradiance and wind speed are as shown in Fig. 3 and Fig. 4 respectively, the probability density function (PDF) and cumulative distribution function (CDF) of typical time intervals obtained by KDE are shown in Fig. 5 and Fig. 6.
From Fig. 5 and Fig. 6, we can find that the proposed KDE method describes the probability distribution characteristics of each time interval well. Unlike the traditional single-peak approximation with a normal or Weibull distribution with fixed parameters, the distribution function calculated by KDE has multiple peaks and valleys, coincides more closely with the histogram, and has better fitting accuracy.
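As an illustration of the per-interval KDE analysis, the following minimal Python sketch may be helpful. It assumes a Gaussian kernel, a fixed hand-picked bandwidth, and synthetic Weibull-distributed wind speeds in place of measured data; the function names and all numeric values are hypothetical and are not taken from the experiments of this paper.

```python
import numpy as np

def gaussian_kde_pdf(samples, grid, bandwidth):
    """Gaussian-kernel density estimate of `samples`, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    # Average of the kernels centred on each sample, scaled by the bandwidth.
    return kernel.sum(axis=1) / (samples.size * bandwidth)

def hourly_kde(annual, grid, bandwidth):
    """Group an hourly series of whole days by hour of day and fit one KDE per hour."""
    by_hour = np.asarray(annual, dtype=float).reshape(-1, 24)  # row i = day i, column k = hour k
    return np.stack([gaussian_kde_pdf(by_hour[:, k], grid, bandwidth) for k in range(24)])

rng = np.random.default_rng(0)
wind = 8.0 * rng.weibull(2.0, size=365 * 24)      # synthetic stand-in for measured wind speed (m/s)
grid = np.linspace(-10.0, 60.0, 1401)
pdfs = hourly_kde(wind, grid, bandwidth=0.8)      # assumed bandwidth, not tuned
cdfs = np.cumsum(pdfs, axis=1) * (grid[1] - grid[0])  # numerical CDF of each hour
```

Each row of `pdfs` and `cdfs` corresponds to one hour of the day; in practice the bandwidth would be selected from the data rather than fixed by hand.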

C. DATA ENHANCEMENT BASED ON LATIN HYPERCUBE SAMPLE METHOD
Since the acquired data can hardly cover all possible scenarios, effective data enhancement is necessary to ensure the completeness of the data for MG behavior learning. Data enhancement can be achieved by sampling from the probability characteristics of the acquired data. However, the traditional random sampling method usually needs a large number of repeated samples, and it is difficult to cover the whole sampling space. For this reason, an effective Latin Hypercube Sampling (LHS) method with good uniformity and orthogonality is adopted, as shown in Fig. 7.
Its basic principle is to discretize the range of the cumulative probability distribution into several equal intervals and then to carry out stratified sampling within each interval [17], thus ensuring the integrity and accuracy of the sampled data, as calculated by (4).
If the number of random variables is K, then the sampling matrix can be formed by (5).
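The stratified sampling principle can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the CDF is available as a table on a grid, the inverse CDF is obtained by linear interpolation, and the Weibull-shaped CDF and the function name `lhs_from_cdf` are hypothetical stand-ins rather than the distributions used in this paper.

```python
import numpy as np

def lhs_from_cdf(grid, cdf, n_samples, rng):
    """Draw one value from each of n_samples equal-probability strata of a tabulated CDF."""
    lower = np.arange(n_samples) / n_samples              # lower edge of each stratum
    probs = lower + rng.random(n_samples) / n_samples     # one random point inside each stratum
    return np.interp(probs, cdf, grid)                    # inverse CDF by linear interpolation

rng = np.random.default_rng(1)

# Hypothetical wind speed CDF (Weibull-shaped) tabulated on a grid in m/s.
wind_grid = np.linspace(0.0, 25.0, 501)
wind_cdf = 1.0 - np.exp(-(wind_grid / 8.0) ** 2)
wind = lhs_from_cdf(wind_grid, wind_cdf, 500, rng)

# A uniform variable on [0, 1] makes the stratification easy to inspect.
unit_grid = np.linspace(0.0, 1.0, 101)
u = lhs_from_cdf(unit_grid, unit_grid, 500, rng)

X = np.vstack([wind, u])                                  # K x N sampling matrix with K = 2
```

Because every stratum contributes exactly one sample, the N samples of each variable cover its whole probability range, which plain random sampling achieves only with many more draws.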
For the directly sampled matrix X_KN, there is usually strong correlation among its rows [18], which is not suitable for the generation of diversified samples. To address this problem, the Cholesky decomposition algorithm is introduced in this paper to reduce the correlation and improve the diversity of the samples [19]. The main procedures are shown in Fig. 8.
The detailed calculation steps are described as follows:
1) Initialize the ranking matrix L_KN, each row of which is a random permutation of the integers 1, 2, 3, ..., N.
2) Calculate the correlation between the rows of L_KN, and obtain the correlation matrix D_L as shown below.
6) Judge whether the correlation of the ranking matrix L_KN meets the requirements.
7) If it does not meet the requirements, go back to step 2) to recalculate. Otherwise, update the position of each element in the original sampling matrix based on the matrix L_KN.
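A rank-based decorrelation loop of this kind can be sketched in Python as follows. The intermediate operations (factorizing the correlation matrix as Q Q^T, forming G = Q^{-1} L, and re-ranking the rows of G) are assumptions that follow the commonly used Cholesky-based rank decorrelation and are not taken from this paper; the tolerance, the iteration limit and all function names are likewise hypothetical.

```python
import numpy as np

def rank_rows(a):
    """Replace each row by the ranks 1..N of its entries."""
    return np.argsort(np.argsort(a, axis=1), axis=1) + 1

def offdiag_rms(corr):
    """Root-mean-square of the off-diagonal correlation coefficients."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.sqrt(np.mean(corr[mask] ** 2)))

def decorrelate(X, rng, tol=1e-3, max_iter=20):
    """Reorder each row of the K x N sample matrix X so that the rows are weakly correlated."""
    K, N = X.shape
    L = np.vstack([rng.permutation(N) + 1 for _ in range(K)])   # step 1: random ranking matrix
    rho_init = offdiag_rms(np.corrcoef(L))                      # step 2: row correlation
    best_rho, best_L = rho_init, L
    rho = rho_init
    for _ in range(max_iter):
        if rho < tol:                                           # step 6: correlation small enough
            break
        Q = np.linalg.cholesky(np.corrcoef(L))                  # assumed step: D_L = Q Q^T
        G = np.linalg.solve(Q, L.astype(float))                 # assumed step: rows of G are uncorrelated
        L = rank_rows(G)                                        # assumed step: back to integer ranks
        rho = offdiag_rms(np.corrcoef(L))
        if rho < best_rho:
            best_rho, best_L = rho, L
    # Step 7: place the sorted samples of each row according to the ranking matrix.
    X_new = np.take_along_axis(np.sort(X, axis=1), best_L - 1, axis=1)
    return X_new, rho_init, best_rho

rng = np.random.default_rng(2)
X = np.sort(rng.random((3, 500)), axis=1)        # stand-in sample matrix with fully correlated rows
X_new, rho_init, rho_final = decorrelate(X, rng)
```

Only the order of the samples within each row changes, so the marginal distribution obtained by LHS is preserved while the correlation between variables is reduced.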
Based on the obtained PDF and CDF of wind speed and solar irradiance, 500 new scenarios are generated with the proposed LHS and Cholesky decomposition algorithm, as shown in Fig. 9 and Fig. 10. The iterative convergence process of the Cholesky decomposition method in reducing the correlation of the sampled scenarios is shown in Fig. 11.
It can be seen from Fig. 9 and Fig. 10 that the enhanced scenarios of solar irradiance and wind speed are sufficiently diverse and basically fill the whole sampling space. From Fig. 11 we can find that the Cholesky decomposition algorithm has very good convergence: the correlation converges after approximately 5 iterations, which proves its effectiveness.

D. SCENARIO CLUSTERING WITH PSO ASSISTED K-MEANS METHOD
Directly using the scenarios generated above may lead to repetitive learning of similar scenarios. In order to improve the learning efficiency, the typical categories of the scenarios generated in Section C should be further determined.
Commonly used classification methods include fuzzy clustering, K-means clustering and so on [20], but these methods all have difficulty in determining the number of clusters. To address this issue, this paper introduces a particle swarm optimization assisted K-means method (PSO-K-Means) to quickly determine the optimal number of clusters. The flow chart of the proposed PSO-K-Means algorithm is shown in Fig. 12. The fast global search capability of PSO is fully utilized to help find the optimal number of cluster centers and, through continuous iterative evaluation, the best cluster centers.
With the PSO-assisted K-means clustering algorithm, the 500 wind speed and solar irradiance scenarios are clustered, and the optimal number of clusters determined by the algorithm is 8. The clustering results for wind speed and solar irradiance are shown in Fig. 13 and Fig. 14 respectively. It can be seen from Fig. 13 and Fig. 14 that the proposed PSO-assisted K-means algorithm classifies the scenarios well and effectively identifies the cluster centers, which helps to improve training and learning efficiency.
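The idea of letting an outer search choose the number of clusters can be illustrated with the simplified Python sketch below. It is not the PSO-K-Means algorithm of Fig. 12: the PSO search is replaced by an exhaustive scan over a few candidate cluster numbers, the Davies-Bouldin index is assumed as the evaluation criterion, and the data are synthetic two-dimensional points. All names and values are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        gaps = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(gaps)])         # next centre: point farthest from all centres
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):                     # keep the old centre if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index: lower values mean compact, well separated clusters."""
    k = len(centers)
    scatter = np.array([
        np.linalg.norm(points[labels == j] - centers[j], axis=1).mean()
        if np.any(labels == j) else 0.0
        for j in range(k)
    ])
    sep = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)                       # ignore each cluster's distance to itself
    ratio = (scatter[:, None] + scatter[None, :]) / sep
    return float(np.mean(np.max(ratio, axis=1)))

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 2))
                    for c in ([0.0, 0.0], [20.0, 0.0], [0.0, 20.0])])   # three synthetic groups
scores = {k: davies_bouldin(points, *kmeans(points, k)) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
```

For a small candidate range an exhaustive scan is sufficient; a population-based search such as PSO becomes attractive when the number of clusters and the centers are optimized jointly over a larger space.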

A. SURROGATE MODEL BASED BEHAVIOR LEARNING
A surrogate model, also known as a meta-model, uses a large number of sampling points generated by experimental design to construct an approximate, simplified model by interpolation or fitting. It replaces a complex simulation model for agents that are difficult to model or whose model parameters are difficult to obtain. Commonly used surrogate models include the Response Surface Method (RSM), the Radial Basis Function (RBF) model, the Kriging model and so on. Each meta-model has its own scope of application.

1) RESPONSE SURFACE METHOD
The response surface method (RSM) approximates functions by applying the least squares method to a series of points in the design variable space [21]. Low-order polynomials, such as the second-order polynomial in (7), are widely used as the response surface approximating functions:

$$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j \tag{7}$$

where β_0, β_i, β_ii, β_ij are parameters computed by least squares regression, i.e., by minimizing the sum of the squares of the deviations of the predicted function values, y is the predicted function value, x_i are the design variables, and n is the number of design variables. RSM can be easily constructed, and its smoothing capability allows quick convergence of noisy functions in optimization. However, this simplification may be troublesome for modelling highly non-linear or irregular behavior.
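A second-order response surface with constant, linear, squared and pairwise interaction terms can be fitted by ordinary least squares as in the minimal Python sketch below. The test function, the sample size and the function names are hypothetical and serve only to show that the regression recovers the coefficients of a noise-free quadratic.

```python
import numpy as np
from itertools import combinations

def quadratic_features(X):
    """Columns: 1, x_i, x_i^2, and x_i * x_j for i < j."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    cols = [np.ones(len(X))]
    cols += [X[:, i] for i in range(n)]
    cols += [X[:, i] ** 2 for i in range(n)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(n), 2)]
    return np.column_stack(cols)

def fit_rsm(X, y):
    """Least squares estimate of the response surface coefficients."""
    beta, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return beta

def predict_rsm(beta, X):
    return quadratic_features(X) @ beta

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] ** 2 + 3.0 * X[:, 0] * X[:, 1]
beta = fit_rsm(X, y)            # ordered as [b0, b1, b2, b11, b22, b12]
```

Because the polynomial order is fixed, adding more samples refines the coefficients but cannot capture behavior outside the quadratic family.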

2) KRIGING META-MODELS
Kriging is a type of meta-model based on spatial correlation functions. It is a stochastic model that treats the deterministic computer response as a realization of a random function with respect to the actual system response [22]. A Kriging model postulates a combination of a polynomial model and a local departure of the form:

$$\hat{y}(x) = f(x) + Z(x) \tag{8}$$

where ŷ(x) is the unknown function of interest, f(x) is a known polynomial function, often taken as a constant, and Z(x) is the realization of a stochastic process with zero mean, variance σ², and nonzero covariance, characterized by a correlation function. Due to the wide range of correlation functions available, Kriging methods can provide accurate predictions of highly non-linear or irregular behavior.
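A minimal Python sketch of a Kriging predictor with a constant trend f(x) is given below. It assumes a Gaussian correlation function with a fixed, hand-picked parameter `theta` and a small nugget for numerical stability; in practice the correlation parameters are usually estimated from the data. The function names and the one-dimensional test data are hypothetical.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and B."""
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)

def fit_kriging(X, y, theta=2.0, nugget=1e-10):
    """Constant-trend Kriging with an assumed Gaussian correlation exp(-theta * d^2)."""
    n = len(X)
    R = np.exp(-theta * sq_dists(X, X)) + nugget * np.eye(n)
    ones = np.ones(n)
    # Generalised least squares estimate of the constant trend f(x) = mu.
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, y - mu)
    return {"X": X, "mu": mu, "w": weights, "theta": theta}

def predict_kriging(model, X_new):
    """Trend plus correlation-weighted departure Z(x)."""
    r = np.exp(-model["theta"] * sq_dists(X_new, model["X"]))
    return model["mu"] + r @ model["w"]

X = np.linspace(0.0, 4.0, 8)[:, None]     # eight one-dimensional sample points
y = np.sin(X[:, 0])
model = fit_kriging(X, y)
y_mid = predict_kriging(model, np.array([[1.0], [2.5]]))
```

The predictor reproduces the sampled values at the sample points, which is why the quality of the estimated correlation function dominates the accuracy when only a few samples are available.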

3) RADIAL BASIS FUNCTIONS
The radial basis function (RBF) meta-model is another surrogate model, formed by linear combinations of a radially symmetric function of the Euclidean distance between the sampled data points and the point to be predicted [23]. RBF was presented as an effective analytical method for representing irregular surfaces. The model can be expressed as in (9) and (10).
where c_0, c_1 and λ_i are the parameters of the radially symmetric function, computed by minimizing the Euclidean distance between the sampled data point and the point to be predicted, and ϕ(x) is the radially symmetric function. RBF has proved to fit arbitrary contours of both deterministic and stochastic response functions well. It has been used successfully in many engineering applications, including ocean depth measurement, altitude measurement, rainfall interpolation, surveying, mapping, geography and geology, image warping, and medical imaging [24].
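As a hedged illustration of an RBF surrogate, the Python sketch below interpolates scattered samples with a multiquadric basis. The choice of the multiquadric function, the shape parameter `c`, the omission of a polynomial term, and all names and data are assumptions made for this example and do not necessarily match the form used in this paper.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and B."""
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)

def fit_rbf(X, y, c=1.0):
    """Solve Phi @ lam = y for the weights of a multiquadric basis sqrt(d^2 + c^2)."""
    phi = np.sqrt(sq_dists(X, X) + c ** 2)
    return np.linalg.solve(phi, y)

def predict_rbf(lam, X_train, X_new, c=1.0):
    """Weighted sum of basis functions centred on the training samples."""
    return np.sqrt(sq_dists(X_new, X_train) + c ** 2) @ lam

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 5.0, size=(15, 2))          # hypothetical (price, irradiance) samples
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
lam = fit_rbf(X, y)
y_new = predict_rbf(lam, X, np.array([[2.5, 2.5]]))
```

The number of basis functions grows with the number of samples, so the flexibility of the surrogate increases as more data become available.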

B. ARTIFICIAL INTELLIGENCE BASED BEHAVIOR LEARNING

1) DEEP NEURAL NETWORK
A deep neural network (DNN) is a data-driven method that does not rely on analytical equations; instead, it uses voluminous existing data to formulate the mathematical problem and approximate its solutions. DNN has proved to have strong non-linear fitting ability. Recent years have witnessed the rapid advancement of DNN in a variety of applications, e.g., computer vision, machine translation, and remote sensing [25]. Fig. 15 shows a basic DNN model. The multiple hidden layers and the large number of neurons within the DNN can automatically extract features from data to achieve accurate regression or classification.
Once the DNN is well trained, it generalizes well and can be directly applied to new instances without costly numerical computation. Compared to the conventional model-based method, the DNN is highly computationally efficient while maintaining considerable accuracy.
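A small fully connected regression network can be sketched in plain NumPy as follows. The layer sizes (4 inputs, hidden layers of 20 and 10 neurons, 1 output), the tanh activation, the learning rate, the number of epochs and the synthetic target are all assumptions for illustration and are not the training parameters of this paper.

```python
import numpy as np

def init_mlp(sizes, rng):
    """One (weights, bias) pair per layer, with fan-in scaled random weights."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer; hidden layers use tanh, the output is linear."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def train(params, X, y, lr=0.02, epochs=500):
    """Full-batch gradient descent on the mean squared error."""
    losses = []
    for _ in range(epochs):
        acts = forward(params, X)
        err = acts[-1] - y
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / len(X)                      # gradient w.r.t. the output layer
        updated = []
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            grad = grad @ W.T                          # back-propagate to the previous layer
            if i > 0:
                grad = grad * (1.0 - acts[i] ** 2)     # derivative of tanh
            updated.append((W - lr * grad_W, b - lr * grad_b))
        params = updated[::-1]
    return params, losses

rng = np.random.default_rng(5)
X = rng.random((200, 4))                               # hypothetical normalised inputs
y = (0.8 * X[:, 3] - X[:, 0] - 0.5 * X[:, 1] + 0.3 * np.sin(3.0 * X[:, 2]))[:, None]
params, losses = train(init_mlp([4, 20, 10, 1], rng), X, y)
y_hat = forward(params, X)[-1]
```

After training, a prediction requires only a few matrix products, which is the source of the computational efficiency noted above.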

2) REINFORCEMENT LEARNING
The reinforcement learning (RL) algorithm is another well-known model-free method for solving problems with hidden information. RL can obtain the optimal decisions within an unknown environment through continuous interactions between the agent and the environment. The model-free RL algorithm evaluates the possible actions for the current state at time t, selects the action with the maximum reward value using an ε-greedy strategy to strike a balance between exploration and exploitation of the decision space, and finally updates the action-value Q(s_t, a_t) at each iteration based on the Bellman optimality equation [26], as shown in (11). Based on the latest expected state-action value, the optimal policy that maximizes the agent's accumulated reward can be estimated, as shown in (12). RL does not require any prior knowledge, the algorithm is versatile, and it can protect data privacy. However, the accuracy of the action-value function depends on learning experience, so RL usually requires a large number of evaluations and iterations, and the calculation time is relatively long.
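The ε-greedy action selection and the Bellman-based update of the action-value can be illustrated with the tabular Q-learning sketch below. The four-state chain environment, the learning rate `alpha`, the discount factor `gamma`, the exploration rate and all function names are hypothetical choices for this example; they are not the MG environment or the training parameters of this paper.

```python
import numpy as np

N_STATES, GOAL = 4, 3          # hypothetical chain 0-1-2-3 with a reward on reaching state 3

def step(state, action):
    """Action 0 moves left (bounded at 0), action 1 moves right."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise take the currently best action."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Move Q(s, a) towards the Bellman target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
for _ in range(500):                       # episodes
    s = 0
    for _ in range(200):                   # step limit per episode
        a = epsilon_greedy(Q, s, 0.5, rng)
        s_next, r, done = step(s, a)
        q_update(Q, s, a, r, s_next)
        s = s_next
        if done:
            break

policy = np.argmax(Q, axis=1)              # greedy policy from the learned action values
```

Even in this tiny example, hundreds of episodes are used to propagate the reward back along the chain, which is consistent with the long training time noted above.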

C. PROCEDURE OF PREDICTING BEHAVIOR FOR MICROGRIDS
Based on the proposed architecture, data preprocessing method and behavior prediction algorithms above, the overall procedure of predicting behavior for microgrids is presented as Algorithm 1, which is described as follows:
1) Utilize KDE to analyze the probabilistic characteristics of the local solar irradiance (P_si), wind speed (P_ws), temperature (P_T) and electricity price (λ_P) data.
2) Generate incremental sample data using the LHS and Cholesky decomposition algorithms.
3) Cluster all sample data with the PSO-assisted K-means algorithm to find the typical scenario categories.
4) Take one scenario from each category in turn to reorder all the sample scenarios.
5) Initialize the training data with the data set of the previous (i − 1) days: input data S_I → S_I + (i − 1)th [P_si, P_ws, P_T, λ_P]; output data S_U → S_U + (i − 1)th P_pcc.
6) Construct the meta-models and the DNN, or determine the optimal policy by RL, using the training data.
7) Predict the tie-line power of the next (i-th) scenario with the constructed meta-models, DNN and RL algorithm.
8) Calculate the actual tie-line power of the MG in the i-th scenario using the model-based algorithm.
9) Calculate the mean square error (β_E) for the meta-models, DNN and RL algorithm.
10) Determine whether β_E is less than the preset value ε. If so, the calculation ends and the corresponding predicted power is output. Otherwise, go to step 5) and enter the next iteration.
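Steps 5) to 10) form an incremental learning loop, which can be sketched in Python as follows. The sketch is a simplification under stated assumptions: the learner is passed in as generic `fit` and `predict` callables, the model-based calculation is replaced by a hypothetical linear `oracle`, and the squared error of the single newly predicted scenario stands in for β_E. All names and values are illustrative.

```python
import numpy as np

def incremental_learning(scenarios, oracle, fit, predict, eps, min_train=8):
    """Grow the training set one scenario at a time until the prediction error is below eps."""
    X_train = [s for s in scenarios[:min_train]]
    y_train = [oracle(s) for s in X_train]
    model, err = None, float("inf")
    for i in range(min_train, len(scenarios)):
        model = fit(np.array(X_train), np.array(y_train))      # step 6: build the learner
        y_hat = predict(model, scenarios[i][None, :])[0]       # step 7: predict the next scenario
        y_true = oracle(scenarios[i])                          # step 8: reference tie-line power
        err = (y_hat - y_true) ** 2                            # step 9: squared error of this scenario
        if err < eps:                                          # step 10: stop when accurate enough
            return model, i, err
        X_train.append(scenarios[i])                           # step 5 of the next iteration
        y_train.append(y_true)
    return model, len(scenarios), err

rng = np.random.default_rng(0)
scenarios = rng.random((50, 4))                    # hypothetical [P_si, P_ws, P_T, price] rows
true_w = np.array([-1.0, -0.5, 0.2, 0.8])

def oracle(s):
    """Stand-in for the model-based calculation of the tie-line power."""
    return float(s @ true_w + 0.3)

def fit(X, y):
    """Linear least squares with an intercept, used here as a placeholder learner."""
    return np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), y, rcond=None)[0]

def predict(model, X):
    return np.column_stack([X, np.ones(len(X))]) @ model

model, stopped_at, err = incremental_learning(scenarios, oracle, fit, predict, eps=1e-6)
```

Any of the learners discussed above can be substituted for the placeholder by supplying the corresponding `fit` and `predict` callables.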

V. TEST CASE

A. CASE I-MICROGRID WITHOUT ENERGY STORAGE SYSTEM
The first test case is a grid-connected MG without an energy storage system. Its one-line diagram is shown in Fig. 16, and it is composed of three types of DGs: solar PV, wind turbine, and micro-turbine. We intend to train an interactive behavior model of this MG using the active power measurements at the PCC and the public historical data available outside the MG. We assume that the spot price data are taken from American electricity prices [27]. The local solar irradiance, wind speed, temperature and electricity price of the previous N days are selected as the input of the training set, and the tie-line power of the MG over the previous N days is taken as the expected output of the training set. The data-driven methods proposed above are used for network training, and the trained networks are tested on day N + 1 to verify their accuracy and applicability.
Since energy storage is not included in the MG, the operation periods of the MG are decoupled from each other, and the MG can be controlled independently in different time slots. Therefore, the input and output dimensions of the training network are selected as 4 and 1 respectively. The training parameters of the DNN are shown in Table 1.
For RL, since the internal parameters of the MG are unknown, we use a DNN to predict its action-reward value and thus form a deep Q-network (DQN) to learn the tie-line power of the MG. The training parameters of the RL algorithm are shown in Table 2.
Firstly, the predictive results of DNN are compared with those of the meta-model method and the model-based method (regarded as the theoretical results). The predictive value of the meta-model method is calculated as the weighted sum of the RSM, RBF and Kriging surrogate models, as shown in (13). The prediction results of DNN, the surrogate model and the conventional model-based method for different numbers of training days are shown in Fig. 17. Because of the dimensionality, only three-dimensional maps are given: the electricity price and solar irradiance are selected as input variables and the PCC power is selected as the output variable. The PCC output power predicted by DNN is compared with the theoretical results of the model-based optimal method in Fig. 18. We can find from Fig. 17 that, for the weighted combination of RSM, RBF and Kriging, the overall prediction effect is still unsatisfactory compared with DNN.
We can also see from Fig. 18 that the prediction accuracy of DNN improves steadily as samples accumulate. After 80 days of data learning, the PCC power prediction results basically coincide with the model-based theoretical values, reflecting good learning and prediction ability.
Furthermore, the root mean squared error (RMSE), as shown in (14), is used as the evaluation criterion to test the power prediction errors of the various methods. The RMSE comparison of the RSM, RBF and Kriging models and their combined model is shown in Fig. 19. The RMSE comparison of RSM (the surrogate technique with the best performance), DNN and RL is shown in Fig. 20 and Table 3.
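For reference, the RMSE criterion in its standard form (the square root of the mean squared deviation between predicted and reference values) can be computed as in the short Python sketch below; the function name and the sample values are hypothetical.

```python
import numpy as np

def rmse(predicted, reference):
    """Root mean squared error between predicted and reference power values."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

# Hypothetical predicted and model-based tie-line power values (kW).
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A lower RMSE indicates that the predicted tie-line power is closer to the model-based reference.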
As can be seen from Fig. 19, with the increase of days and training data, the prediction errors of the three surrogate techniques, RSM, RBF, and Kriging, show a downward trend. When there are insufficient samples in the early stage, the covariance function calculated from the few available points results in large errors for Kriging, so its fitting effect is poor and its prediction error is large. However, with the accumulation of samples, the fitting error between the covariance function and the actual model decreases gradually, and the prediction error of Kriging decreases rapidly. The same phenomenon occurs in RBF: the parameters of its basis function are difficult to match at first, so the RMSE is large in the early stage; as the samples become more abundant, the basis function parameters are constantly corrected, and the fitting accuracy gradually improves. RSM uses a low-order response surface function to fit the sample space, so it presents a better fitting effect in the early stage. However, because the order of its basis function is fixed during training, the fit is hard to improve even in the later stage with high-dimensional samples, and the prediction error of RSM stays at a certain level. This indicates that low-order basis functions and surface fitting are relatively stable, but are not suitable for accurate behavior prediction.
From the RMSE comparison of DNN, RSM, and RL in Fig. 20, we can find that the prediction error of DNN is large when there are insufficient samples in the early stage. This is because the scenarios learned at the beginning of the training stage can hardly cover all the various scenarios of wind speed, solar irradiance, temperature and electricity prices. However, as samples and learning scenarios accumulate, the number of scenarios covered by DNN gradually increases, so the accuracy of DNN increases continuously and is much higher than that of RSM in the later stage. On the other hand, comparing DNN with RL, the DQN reinforcement learning algorithm used in this paper relies heavily on the predictive ability of DNN to update its reward value, so its overall accuracy is slightly lower than that of DNN. Nevertheless, it can constantly learn and approximate the DNN prediction model, so its overall prediction and decision-making performance is also good.
From the statistical results on training time and accuracy of the various methods in Table 3, it can be further seen that DNN and RL have better prediction accuracy than the meta-model methods. The convergence process of the RL algorithm in Fig. 21 shows that it roughly converges to the optimal results in about 200 epochs. However, as seen from Table 3, its training time is the longest, because RL requires a large number of policy evaluations. In summary, the above results indicate that DNN has the best ability to fit complex MG energy management functions and is more suitable for the behavior learning of MGs.

B. CASE II-MICROGRID WITH ENERGY STORAGE SYSTEM
The second test case is a grid-connected MG equipped with a battery energy storage system (BESS); the grid structure is shown in Fig. 22. After the BESS is connected to the MG, the operation periods of the MG are strongly coupled with each other. Therefore, the input and output data for training are 4 × 24 and 1 × 24 sequential samples respectively. For this reason, the long short-term memory (LSTM) deep network, which can effectively deal with sequential data [28], is adopted here for behavior learning of MGs equipped with BESS. The training parameters of the LSTM are shown in Table 4.
Based on 10000 samples accumulated by model-based optimization, 9000 training samples and 1000 test samples are selected. The behavior learning results of LSTM are comprehensively compared with those of the meta-model techniques and the RL algorithm, and the statistical results are presented in Table 5.
It can be seen from Table 5 that the surrogate techniques produce large fitting errors for time-coupled, high-dimensional sequential data. The accuracy of the best-performing technique, RBF, is only about 81.86%. This indicates that RSM, which is based on a low-order response surface function, and the basis-function-based techniques are unable to fit higher-dimensional and temporally coupled variables, so their fitting errors are larger. Comparing RL with LSTM, we find that the prediction accuracy of RL is slightly lower than that of LSTM. This is because the calculation of the RL algorithm's reward relies heavily on the predictive value of the LSTM deep neural network; therefore, its predicted value approaches that of the LSTM, but its accuracy remains slightly lower. In addition, because a large number of evaluations and feedback steps are needed to modify the network parameters, the training time of RL is usually long. The LSTM method shows the best prediction effect by processing the time-sequential power data with its memory gates. It is more suitable for predicting the sequential tie-line power of an MG that contains the time-coupling characteristics brought by energy storage, and its prediction accuracy reaches 96.20%.
The tie-line power predicted by LSTM is further compared with the theoretical results on typical days, as shown in Fig. 23. The evolution of the training RMSE and loss function of LSTM is shown in Fig. 24 and Fig. 25 respectively.
We can find that the LSTM converges quickly: convergence is achieved after about 300 iterations. At the same time, the prediction accuracy of the LSTM is relatively high, and the power values predicted by LSTM are basically consistent with the theoretical results, which shows the effectiveness of the LSTM deep network, especially in behavior learning of MGs equipped with BESS.

C. COMPARISON OF META-MODEL TECHNIQUES AND ARTIFICIAL INTELLIGENCE METHODS
Based on the analysis results of the above test cases, we can further summarize the advantages and disadvantages of the meta-model techniques and the artificial intelligence methods in MG behavior learning problem as shown in Table 6.
We can conclude from Table 6 that RSM and Kriging methods are more suitable for the fitting and prediction of low-order and low-dimensional MG behavior learning problems, while RBF and DNN are more suitable for high-dimensional and large-scale variable fitting problems because of their network depth and free combination of radial basis functions. Although RL is highly scalable and versatile for various decision-making and prediction problems, its action-value function is difficult to model. Meanwhile, the policy training time of RL is long because it needs a large number of policy evaluations.
In summary, the prediction process of most meta-model techniques is fast, but their accuracy is not high, so they are more suitable for computationally expensive design problems. The training time of the DNN and the RL algorithm is relatively long, but their accuracy is much higher, so they are more suitable for behavior learning and decision-making problems in MGs.

D. CONVERGENCE AND SENSITIVITY ANALYSIS
The sensitivity analysis of DNN (the best-performing method) in behavior prediction is further carried out for test case I. Training is tested with three different hidden layer settings, {20 10}, {30 20} and {30 10}, and the convergence curves of the three cases are shown in Fig. 26.
It can be seen from Fig. 26 that the hidden layer setting has a great influence on the training convergence of DNN. Too few or too many hidden layers cannot achieve a satisfactory fitting effect and are prone to under-fitting or over-fitting, respectively. In practical applications, the number of hidden layers should be decided by considering the dimension of the input variables, the calculation time requirement, the fitting effect and so on.
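A sensitivity test of this kind can be sketched with a small NumPy multilayer perceptron trained by full-batch gradient descent. The sketch assumes that a setting such as {20 10} denotes two hidden layers with 20 and 10 neurons; the synthetic data, tanh activation, learning rate and epoch count are likewise assumptions for illustration and do not reproduce the training parameters of the paper.

```python
import numpy as np

def train_mlp(X, y, hidden, epochs=500, lr=0.02, seed=0):
    """Train a tanh MLP on (X, y) and return the training RMSE per epoch."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    Ws = [rng.normal(0.0, np.sqrt(1.0 / m), (m, n))
          for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    history = []
    for _ in range(epochs):
        # Forward pass: tanh on hidden layers, linear output layer
        acts = [X]
        for k, (W, b) in enumerate(zip(Ws, bs)):
            z = acts[-1] @ W + b
            acts.append(z if k == len(Ws) - 1 else np.tanh(z))
        err = acts[-1][:, 0] - y
        history.append(float(np.sqrt(np.mean(err ** 2))))
        # Backward pass for the mean squared error loss
        delta = (2.0 / len(y)) * err[:, None]
        for k in reversed(range(len(Ws))):
            grad_W = acts[k].T @ delta
            grad_b = delta.sum(axis=0)
            if k > 0:
                # Propagate through the tanh of the previous layer
                delta = (delta @ Ws[k].T) * (1.0 - acts[k] ** 2)
            Ws[k] -= lr * grad_W
            bs[k] -= lr * grad_b
    return history

# Synthetic stand-in for the behavior learning data (hypothetical)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

settings = [(20, 10), (30, 20), (30, 10)]
curves = {hidden: train_mlp(X, y, hidden) for hidden in settings}
for hidden, hist in curves.items():
    print(hidden, "initial RMSE %.3f, final RMSE %.3f" % (hist[0], hist[-1]))
```

Plotting each entry of `curves` against the epoch index gives convergence curves of the same kind as those compared in the sensitivity analysis.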

VI. CONCLUSION
A comparative study of several recently introduced model-free data-driven methods, including DNN, RL, RBF, RSM, and Kriging, in predicting the real-time behavior of MGs under incomplete information is carried out in this paper. The validity of the proposed methods is verified in numerical experiments through comprehensive comparison with conventional model-based optimal methods. We can draw the following conclusions:
1) The proposed model-free behavior prediction methods can effectively protect the privacy of the MG because they do not need access to the internal parameters of the MG.
2) The surrogate techniques are more suitable for low-dimensional, temporally decoupled behavior prediction problems and are not suitable for behavior prediction problems with high-dimensional variables, while the DNN and RL adapt well to various behavior prediction problems, such as high-dimensional and time-coupled problems.

FIGURE 1. Proposed data-driven based MG behavior predicting architecture.

FIGURE 3. The annual data of solar irradiance.

FIGURE 4. The annual data of wind speed.

FIGURE 5. Probability density function (PDF) of several typical time intervals obtained by KDE.

FIGURE 6. Cumulative probability distribution (CDF) of several typical time intervals obtained by KDE.

FIGURE 8. Procedures of proposed algorithm to reduce the correlation of sample scenarios.

3) Use the Cholesky decomposition method to calculate the nonsingular lower triangular matrix $D$ for the correlation coefficient matrix $D_L$, which satisfies $D D^{T} = D_L$.
4) Construct the matrix with less correlation, $G_{KN} = D^{-1} L_{KN}$.
5) Reorder the elements in matrix $L_{KN}$ based on the size and position of the elements in $G_{KN}$.
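The three steps listed here can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation: it assumes the Cholesky factor satisfies the standard relation D Dᵀ = D_L, that each row of the order matrix L_KN is a permutation of 1..N (one row per variable, one column per sample), and that "reordering by size and position" means replacing each row of L_KN with the rank order of the matching row of G_KN. The function names and the synthetic order matrix are hypothetical.

```python
import numpy as np

def ranks(M):
    """Rank the entries of each row from 1 to N (smallest entry gets rank 1)."""
    return np.argsort(np.argsort(M, axis=1), axis=1) + 1

def reduce_correlation(L):
    """Reorder the K x N order matrix L so that its rows are less correlated."""
    D_L = np.corrcoef(L)                      # K x K correlation coefficient matrix
    D = np.linalg.cholesky(D_L)               # lower triangular, D @ D.T == D_L
    G = np.linalg.solve(D, L.astype(float))   # G = D^{-1} L without forming the inverse
    return ranks(G)                           # each row is again a permutation of 1..N

def max_offdiag_corr(L):
    """Largest absolute correlation between two different rows."""
    C = np.corrcoef(L)
    return float(np.max(np.abs(C - np.eye(len(C)))))

# Hypothetical order matrix: row 1 is deliberately correlated with row 0
rng = np.random.default_rng(0)
N = 500
base = rng.permutation(N) + 1
L = np.vstack([
    base,
    ranks((base + rng.normal(0.0, 100.0, N))[None, :])[0],
    rng.permutation(N) + 1,
])

L_new = reduce_correlation(L)
print("max |corr| before:", max_offdiag_corr(L))
print("max |corr| after: ", max_offdiag_corr(L_new))
```

In a Latin hypercube sampling workflow, the reordered matrix would then decide which stratum of each variable is paired in each generated scenario.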

FIGURE 9. 500 new solar irradiance scenarios generated by the proposed LHS and Cholesky decomposition algorithm.

FIGURE 10. 500 new wind speed scenarios generated by the proposed LHS and Cholesky decomposition algorithm.

FIGURE 12. Procedures of proposed novel particle swarm optimization assisted k-means clustering algorithm (PSO-K-Means).

FIGURE 13. The clustering results for solar irradiance with PSO-assisted K-means clustering algorithm.

FIGURE 14. The clustering results for wind speed with PSO-assisted K-means clustering algorithm.

FIGURE 16. Test Micro-grid without energy storage system.

FIGURE 17. Prediction results of DNN, Meta-model and conventional model-based method (theoretical results) with respect to training days.
that with the training data gradually enriched, the PCC power prediction value of DNN basically coincides with the theoretical calculation value driven by the model, which shows the strong fitting ability of DNN. Although meta-model method has aggregated advantages of

FIGURE 18. Comparison of DNN prediction results of tie-line power with theoretical values.
Fig. 21 presents the reward convergence process of the RL algorithm.

VOLUME 8, 2020

Algorithm 1 Procedure of Predicting Behavior for Microgrids
1: Utilize KDE to analyze the probabilistic characteristics of local solar irradiance ($P_{si}$), wind speed ($P_{ws}$), temperature ($P_T$) and electricity price ($\lambda_p$) data
2: Generate incremental sample data using the LHS and Cholesky decomposition algorithm
3: Cluster all sample data by the PSO-assisted K-means algorithm to find typical scenario categories
4: Take one scenario from each category in order, reorder all the sample scenarios, and complete the data preprocessing process
5: for $i$ in 1 to $N_D$ do
6:   Initialize the training data with the data set of the previous $(i-1)$ days: input data $S_I \to S_I$ + $(i-1)$-th scenario data; output data $S_U \to S_U$ + $(i-1)$-th $P_{pcc}$
7:   Construct the Meta-models and DNN, or determine the optimal policy by RL, using the training data
8:   Predict the behavior of the next $i$-th scenario with the constructed Meta-models and DNN as well as the RL algorithm
9:   Calculate the actual $i$-th scenario behavior of the MG ($P_{pcc}$) using the model-based algorithm
10:  Calculate the mean square error ($\beta_E$) for the Meta-models and DNN as well as the RL algorithm
11:  if $\beta_E < \varepsilon$
12:
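The day-by-day loop in steps 5 to 11 of Algorithm 1 can be sketched as below. This is a simplified illustration under stated assumptions: `model_based_pcc` is a hypothetical linear stand-in for the model-based calculation of P_pcc, `fit_linear` is a hypothetical least-squares surrogate standing in for the Meta-models, DNN and RL, and the scenarios are random numbers rather than preprocessed weather and price data. Because the listing is cut off after step 11, the sketch only records whether the error threshold is met and does not guess at the branch that follows.

```python
import numpy as np

def model_based_pcc(scenario):
    """Hypothetical stand-in for the model-based P_pcc calculation (step 9)."""
    irradiance, wind, price = scenario.T   # three features over 24 intervals
    return 5.0 - 3.0 * irradiance - 2.0 * wind + 0.5 * price

def fit_linear(S_I, S_U):
    """Hypothetical surrogate: least-squares linear model with intercept (step 7)."""
    A = np.column_stack([np.ones(len(S_I)), S_I])
    beta, *_ = np.linalg.lstsq(A, S_U, rcond=None)
    return lambda X: np.column_stack([np.ones(len(X)), X]) @ beta

rng = np.random.default_rng(0)
N_D, eps = 10, 1e-6                        # number of days and error threshold (assumed)
scenarios = rng.uniform(0.0, 1.0, size=(N_D, 24, 3))

S_I = np.empty((0, 3))                     # accumulated input data
S_U = np.empty(0)                          # accumulated output data (P_pcc)
errors, threshold_met = [], []

for i in range(N_D):
    actual = model_based_pcc(scenarios[i])                   # step 9
    if len(S_U) > 0:                                         # day 1 has no history yet
        predict = fit_linear(S_I, S_U)                       # step 7
        predicted = predict(scenarios[i])                    # step 8
        beta_E = float(np.mean((predicted - actual) ** 2))   # step 10
        errors.append(beta_E)
        threshold_met.append(beta_E < eps)                   # step 11 (condition only)
    # Step 6 for the next iteration: extend the training set with this day
    S_I = np.vstack([S_I, scenarios[i]])
    S_U = np.concatenate([S_U, actual])

print("days with beta_E below eps:", sum(threshold_met), "of", len(errors))
```

With a linear stand-in and a linear surrogate, the error falls below the threshold as soon as one day of history is available; with a realistic MG model and a DNN, the error would instead be expected to shrink gradually as more training days accumulate.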

FIGURE 20. RMSE comparison of RSM (the best-performing surrogate technique) with DNN and RL.

FIGURE 21. Reward convergence process of the RL algorithm.

FIGURE 22. Test Micro-grid with battery energy storage system.

FIGURE 24. Convergence process of the training RMSE of LSTM.

FIGURE 25. Convergence process of the loss function of LSTM.

FIGURE 26. The convergence curves of DNN in three cases with different hidden layer numbers.
Data at the k-th time interval for the i-th day
$s_t$ Current state at time t for RL
$a_t$ Current action at time t for RL

ACRONYMS
MG Microgrid
PCC Point of Common Coupling

CONSTANTS
$h$ Window size for kernel density estimation
$\alpha$ Learning rate for RL
$\gamma$ Discount factor for RL

The associate editor coordinating the review of this manuscript and approving it for publication was Salvatore Favuzza.

II. INTRODUCTION
Microgrids (MGs) as an aggregate of Distributed Generators (DG), Renewable Energy Resources (RES), Demand Response Loads (DR) and Energy Storage Systems (ESS),

TABLE 1. Training parameters for DNN.

TABLE 2. Training parameters for RL.

TABLE 3. Comparison of several meta-model technologies and artificial intelligence methods.

TABLE 5. Statistical results of LSTM and meta-model technologies.

TABLE 6. Comparison of meta-model techniques and artificial intelligence methods.