Towards Real-Time Energy Management of Multi-Microgrid Using a Deep Convolution Neural Network and Cooperative Game Approach

A multi-microgrid (MMG) system is a new approach that concurrently incorporates different types of distributed energy resources, energy storage systems and demand responses to provide reliable and independent electricity for the community. However, an MMG system faces problems of management, real-time economic operation and control. Therefore, this study proposes an energy management system (EMS) that turns an infinite number of MMGs into a coherent and efficient system, where each MMG can achieve its goals and perspectives. The proposed EMS employs a cooperative game to achieve efficient coordination and operation of the MMG system and also ensures a fair energy cost allocation among the members of the coalition. This study considers the energy cost allocation problem when the number of members in the coalition grows exponentially. The energy cost allocation problem is solved using a column generation algorithm. The proposed model includes energy storage systems, demand loads, real-time electricity prices and renewable energy. The daily operating cost of the MMG is estimated using a proposed deep convolutional neural network (CNN). An optimal scheduling policy to optimize the total daily operating cost of the MMG is also proposed. In addition, existing optimal scheduling policies, such as approximate dynamic programming (ADP), model predictive control (MPC), and the greedy policy, are considered for comparison. To evaluate the effectiveness of the proposed model, the real-time electricity prices of the Electric Reliability Council of Texas are used. Simulation results show that each MMG can achieve energy cost savings through a coalition of MMGs. Moreover, the proposed optimal policy method reduces the MG's daily operating cost by up to 87.86%, compared to 79.52% for the MPC method, 73.94% for the greedy policy method and 79.42% for the ADP method.


I. INTRODUCTION
In smart grids (SGs), a microgrid (MG) is a small network of electricity users with distributed energy resources (DERs) that is either standalone or connected to the main grid. DERs are energy generation units located at the end users' side. An MG provides reliable and efficient power that supplements the main grid in the case of an unexpected rise in energy demand, blackouts, or loss of energy productivity. It also provides independent energy to the community. In spite of these numerous advantages, an MG also faces challenges, including technical ones such as system control and protection of renewable energy sources (RES). Other challenges are regulatory policies and customers' participation [1]-[4]. To resolve the technical challenges, especially the high penetration of RES in MGs, numerous works have been reported in [5]-[11]. However, most of these works focus on day-ahead and time-of-use scheduling plans. To deal with the regulatory challenges, the authors in [12] compare the regulatory challenges of different MG ownership models, which vary based on context and usage. The MG ownership models include a utility model, a district heating model, a landlord model, a customer-generation model and a co-op model. The regulatory challenges are categorized into information instruments, economic instruments, and command-and-control instruments [12]. The idea of the MG has been used successfully in several sensitive areas, such as military sites, hospitals, and airports, to achieve a resilient energy supply [13]. Due to the benefits of MG technology, researchers, the power industry and stakeholders are beginning to consider this technology.
However, because of the intermittent nature of DERs, the benefits of this technology have not been fully explored. In the literature, researchers have focused more on providing energy management (EM) of the MG in terms of load curtailment and demand-side management. As a result, mechanisms for the coalition of more than one MG require further exploration. Table 1 provides the abbreviations used throughout this paper. This paper proposes a method for the coalition of multi-MG (MMG) to achieve energy efficiency and management, which in turn provides energy cost savings and a real-time optimal scheduling policy. Section II presents the related work. Section III elaborates the proposed system model. In Section IV, the problem formulations are described. Simulation results and discussion are presented in Section V. Finally, Section VI provides the conclusion and future work.
The associate editor coordinating the review of this manuscript and approving it for publication was Huai-Zhi Wang.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. RELATED WORK
Global environmental concerns and power crises have raised the need for RES, which are a clean alternative to fossil fuels. The major hindrances to integrating RES into the power system are their insufficient energy generation and intermittent nature. The conventional power system is centralized, with unidirectional energy flow and generation. On the other hand, RES form part of the MG, which provides bidirectional energy flow. This section reviews related work in four subsections: the MMG system, the cooperative game theory approach, the column generation algorithm (CGA) approach for the MMG system, and the real-time optimal scheduling policy of the energy storage system (ESS).

A. MULTI-MICROGRID SYSTEM
Nowadays, because of the under-utilization of RES, the power system is experiencing a large inflow of excess energy. To simultaneously manage the RES of several MGs, a new paradigm known as the MMG system has emerged. An MMG system refers to the integration of different MGs by spatial distance to achieve grid control [14]. The objective of an MMG is to combine different DERs to achieve high energy resilience and system stability through efficient energy exchange. In addition, MG owners can conveniently participate in the energy market based on their energy generation, distribution and sales. The MMG system architecture is similar to that of the traditional power grid, which operates based on the duration of certain operating rules. The authors in [15] describe a framework of the MMG system based on interface, layer and cost. However, the coordination of the MMG into a bulk energy system is not considered in the proposed framework. Similar work in [16] presents a framework of the MMG system based on a system-of-systems architecture. This framework uses bi-level optimization to handle each MG as a multi-stage robust optimization (RO) problem. However, the bi-level optimization does not address the uncertainty of energy demand and supply.
Kou et al. [17] propose a model predictive control (MPC) scheme for MMG EM that coordinates individual MG operation to economically balance system-wide supply and demand. The Chebyshev inequality and the delta method are also used in quadratic and nonlinear systems to deal with the complexities of demand and supply. However, the scheme does not find a fair allocation of expenses to each MG. Holjevac et al. [18] provide a detailed evaluation of disparate MGs by applying MILP for annual simulations. They have also extended the model to short-term, daily operational analysis with receding-horizon MPC. This model aims at optimizing the flow of various energy generators, coordinating each MG entity and exchanging energy between entities of other MGs. However, the model has no significant effect on the estimated short-term operating cost. Also, the MILP problem takes an intolerably long time to solve. The authors of [19] propose decentralized saddle point dynamics and quadratic programming to solve the optimal power problem. The suggested methods achieve low active power loss and high RES utilization. Nevertheless, high network connectivity is involved, and no thought is given to achieving energy cost savings. RES and fluctuations of load consumption create issues in the operation of the MMG. Therefore, the authors in [7] propose optimal day-ahead scheduling of the MMG using an enhanced particle swarm optimization (PSO) to minimize the MG's operating cost. However, there is the problem of parameter tuning, which may lead the model either to premature convergence or into a local optimum. To execute real-time control and reduce the communication cost of the MMG, the authors in [20] propose a bi-level game model for voltage control. The model consists of an incentive mechanism derived from the Stackelberg game, which maximizes the payoff of each MG while neglecting the arbitration agent. The authors of [21] propose a cooperative EM strategy for integrated operations of the MMG. This strategy is a stochastic predictive control that solves the coupling constraints problem. However, most of the proposed methods for MMGs are based on deterministic conditions, which makes it difficult for them to handle the intermittent nature of RES.

B. COOPERATIVE GAME THEORY FOR MULTI-MICROGRID SYSTEM
Cooperative and non-cooperative games make up the fundamental building blocks of game theory, as shown in Fig. 1. From the figure, the outcome of game theory can take the form of decision theory, probability theory or utility theory. Each game aims to achieve a globally balanced status, meaning that no player's interest can be further satisfied; this status is known as the Nash equilibrium for the non-cooperative game and the core status for the cooperative game [14]. Within a cooperative game, coalition optimization models reach the global optimum. Subsequently, cost-allocation models achieve a rational distribution of benefit to each player. The coalition aims to address competing stakeholder concerns (i.e., global and local). The applications of a cooperative game in SG are discussed in [22]-[28]. In this study, we employ a CGA to generate a nucleolus solution that achieves the core status. The authors of [29] present a survey on the various concepts of game theory for the MG. However, applications of the cooperative game to the coalition of MMG have not been fully explored. The authors in [14] do propose a mechanism that ensures a fair distribution of the dissatisfaction with expenses among group-rational MGs. The mechanism is a nucleolus core solution based on Benders decomposition (BD) to maximize each player's payoff in a cooperative game scenario. However, real-time energy operation based on one-step optimization of the MMG's operating cost for short-term purposes is not considered. In addition, BD has not been applied to several objectives, as its convergence needs many iterations. Unlike the cooperative game, a non-cooperative game relies on each player's overall individual payoff while neglecting the players' global welfare.

C. CGA APPROACH FOR MULTI-MICROGRID SYSTEM
The authors in [30] propose a nested column-and-constraint generation (CC&G) method for distributed scheduling of the MMG, which is based on stakeholder-parallelizing distribution optimization. This method uses an enhanced analytical cascading method to achieve energy optimization. However, the proposed model requires knowledge of the probability functions of the uncertain parameters and incurs a high computational burden. To achieve optimal collaboration of the MMG, the authors in [31] propose a two-stage optimization-based collaborative operation method to minimize the operating cost of the MMG. Nonetheless, the drawback of the proposed method is the computational effort required at each stage of the optimization operation. The authors of [32] use the CC&G method for MG daily operations while considering the uncertainty in RES. The proposed method reduces the investment cost as well as the operating cost of each DER. However, the proposed method does not include real-time EM of the MMG. The authors of [33] propose a modified CC&G model to minimize the total load consumption while controlling the battery storage systems, electric vehicle fleets and the aggregation of cooling appliances. The proposed model is a two-stage RO that considers the uncertainty of the forecasting error of load consumption and generation. The authors of [34] propose a modified CC&G method to address the probability-weighted RO (PRO) problem. The proposed model maximizes the overall profit for long-term planning while considering the uncertainty of wind power and microturbines. Also, PRO optimizes the DER allocation based on the worst-case scenarios. However, the method does not provide clear conclusions about the probability distributions used.

D. REAL-TIME OPTIMAL SCHEDULING POLICY OF ENERGY STORAGE SYSTEM
In an MG, the cost of operating the ESS in real time is another challenge that requires urgent attention. The authors of [35] use the hidden, marginal charging and discharging opportunity costs for the ESS's operation. They also propose a two-layer model, which has an upper layer that allocates power optimally to each generator and a lower layer that solves the dispatch problem using a Lagrangian function. Another work in [36] reviews centralized and distributed ESS with service distribution. The proposed work uses a multi-agent control strategy that coordinates the distributed ESS. In the proposed strategy, autonomous agents communicate over a sparse network, neighbor to neighbor, to achieve cooperative goals. However, the ESS's real-time operations in the MG are challenging, because it is difficult to evaluate the users' complex behaviors and the intermittent nature of RES. The authors of [37] propose an MPC-based hybrid of Kalman filters and time series analysis to schedule on-line operations of the ESS. The proposed model has a feedback correction mechanism that examines and adjusts the forecasting error of the MPC-based model. The authors of [38] use virtual inertia control based on MPC to achieve system stability and to assess the effect of high RES penetration on the ESS. The authors have also compared their control with fuzzy logic and remote virtual control systems to determine the control's robustness. The above studies resolve the scheduling problem of the MG's operations in real time and provide strategies to address uncertainties that may occur in the future. However, the accuracy of the proposed approaches depends on the type of model, the prediction horizon and the parameters.
Most recent studies focus on artificial intelligence techniques for real-time EM of an MG. With the intelligent approach, real-time operations of the MG can be formulated as either a stochastic or a sequential decision problem. In [39], the authors propose a finite-horizon Markov decision process (MDP), approximate dynamic programming (ADP), and a deep recurrent neural network (RNN) to obtain the optimal real-time scheduling strategy of an MG. The proposed methods reduce the operating costs of the MG without prior knowledge of the distribution of the uncertainty in RES. The authors of [40] use an ADP-based economic dispatch algorithm for intra-day and day-ahead scheduling of MG operations under the uncertainties of RES. The proposed algorithm is used for sample training, while a Monte Carlo method and a piece-wise linear approximation function are used to detect uncertainty and to make predictions. The work also minimizes the impact of the uncertainty incurred by RES, electricity prices and loads on the MG's operations. The authors of [41] propose policy and value iteration functions to address the optimal EM of a battery as well as the MG's control system via ADP. The proposed policy includes a value iteration function that decreases monotonically and converges to the solution of the Bellman equation. Other applications of ADP are examined in [42]-[45] to achieve battery efficiency and coordination. However, the methods proposed in the above studies struggle to handle the MG's scheduling, especially if high-dimensional state spaces are involved. Also, the implementation of ADP for ESS coordination with conventional distributed generators under the constraints of the MG's power flow is an open question.
Based on the literature above, none of the authors applies CGA to the coalition of MMG to achieve energy cost savings. Hence, using CGA for the coalition of the MMG is the focus of this study. In addition, energy exchange among MMG and market players requires further exploration and improvement. Furthermore, applications of the cooperative game have not been fully explored.
The objectives of this study are to address the limitations of the existing solutions in the literature while minimizing the computational cost of the proposed system. Firstly, the authors of [39] propose a dynamic EM of a single MG using a deep RNN to implement the estimation of the one-step long-term operating cost. However, the RNN faces the curse of dimensionality. When the state space grows exponentially, the model can suffer from low precision and low efficiency. It also becomes computationally difficult to solve large and complex MG distribution networks to automate a global policy. Therefore, solutions that turn an infinite number of MMGs into a coherent and efficient system, where each MG can achieve its own goals and perspectives, are required. The proposed solutions should include a machine learning model to estimate the short-term daily energy cost of the MMG. Secondly, the authors in [14] propose a BD algorithm that derives the coalition with a high degree of dissatisfaction concerning the fair cost allocation and finds the upper bound of the optimum solution for the cooperative game. However, BD uses a generalized procedure to solve the MILP problem and reduces the number of variables at the expense of an increasing number of constraints. In addition, BD requires several iterations for convergence, especially when the enumeration method is applied to a small coalition group. Therefore, algorithms that address the limitations of BD by minimizing the total expenses obtained by the grand coalition are required. The proposed algorithms should also be able to reduce the number of enumerations as the number of coalitions increases.
This study extends our previous work [46], which focuses on the non-cooperative game. The contributions of this paper are listed below.
1) This study proposes an EM system (EMS), which converts an infinite number of MMG into a coherent and efficient system, where each MG can achieve its goals and perspectives. The proposed EMS manages and controls the real-time operation of each MG, while minimizing the computational effort required at each stage of the optimization operation.
2) A CGA is proposed in this study to derive the nucleolus core solution that provides fair distribution of expenses among coalition members in a cooperative game. In the cooperative game, each MG is a player that hopes to maximize its own payoff via energy cost savings. The proposed method is compared with the method of [14] and Shapley [28].
3) A deep CNN is proposed in this study to execute the one-step estimation of the aggregated energy cost for short-term purposes. In the proposed deep CNN, the fully connected layer is enhanced using a conditional restricted Boltzmann machine (CRBM). Moreover, the proposed deep CNN is compared with other existing models in the literature.
4) An optimal scheduling policy for the MG's real-time operation is proposed while taking into account the power flow constraints. For the analysis, the proposed optimal scheduling policy is compared with the existing MPC, greedy policy and ADP policy.

III. SYSTEM MODEL
In this study, the MMG network under consideration (Fig. 2) operates in grid-connected mode and actively participates with the main grid via energy exchange using the real-time pricing mechanism. The network is made up of multiple interconnected single residential MGs (sub-grids), with interactive energy exchange among the sub-grids. This study assumes that every sub-grid has a photovoltaic (PV) system, thermal generators, a wind turbine, a hydro-solar unit and dispatchable loads, and that each sub-grid is attached to a sub-grid storage system. Moreover, each RES output is connected to a different bus node, and each domestic load is satisfied from its own sub-grid; if there is surplus energy in the sub-grid storage, it can be sold to other sub-grids or the main grid through the EMS. All sub-grids use bidirectional AC/DC converters (BADCs), which ensure the overall stability and voltage support of the sub-grids. The sub-grids are connected to one another via local connectors, while the sub-grids as a whole are connected to the main grid via the point of common coupling (PCC) only if they can provide the same frequency rating as the main grid. Because of the intermittent behavior of the RES, their energy output is regarded as an uncertainty in the operation of all sub-grids. The EMS manages the operations of each sub-grid in the network to exchange energy and performs coalition EM for each rational sub-grid to achieve energy cost savings using the nucleolus and Shapley solutions. These solutions form the core in the cooperative game, where each sub-grid is a player that hopes to minimize its total expenses from the coalition of sub-grids. The EMS performs real-time operation through a one-step optimization of the sub-grid's operating cost for short-term purposes. Therefore, a deep CNN that estimates the aggregate operating energy costs of the sub-grid is explored. The EMS also applies a real-time scheduling policy to each sub-grid.
To ensure real-time capability and applicability, this study uses both the IEEE 30-bus and 118-bus distribution systems to evaluate the performance and efficiency of the proposed system. These distribution systems are considered because they are widely used by the research community for MMG [14], [19]. The internal topology of each MG, shown in Fig. 3 and adopted from [47], consists of three thermal generators at buses 1, 2 and 8, while the wind power output is connected to bus 5, the PV power output to bus 11, and the solar plus hydropower output to bus 13. Moreover, each RES is assumed to be an independent MG as well as a player in the cooperative game. The works of [39] and [47] inspire the proposed formulations of the problem statement. The section below discusses the conventional DGs, dispatchable loads, and the energy exchange with the main grid. Afterwards, the optimal operating cost of the MMG is formulated. Lastly, the formulation of the cooperative game is discussed. To avoid verbosity, this study concentrates only on the objective functions to show its contributions.

IV. FORMULATION OF A REAL-TIME SCHEDULING PROBLEM
At time $t$, the cost of the $k$th conventional DGs is expressed as the sum of the generation costs of the thermal generators $C_{tg}(t)$, the wind turbine $C_{ws}(t)$, the solar system $C_{ss}(t)$ and the small hydro-solar unit $C_{ssh}(t)$. Thus, the total cost $C^{k}_{total}(t)$ of the DGs is defined in Eq. (1) [47]:

$$C^{k}_{total}(t) = C_{tg}(t) + C_{ws}(t) + C_{ss}(t) + C_{ssh}(t) \tag{1}$$

In the component costs, $a_n$, $b_n$ and $c_n$ are the cost parameters of the $n$th thermal generator among the $TG$ thermal generators, and the thermal generation power at time $t$ is denoted as $E_{tg,n}(t)$. $G_{ws}$ and $E_{ws}(t)$ are the direct cost coefficient of the wind plant and the scheduled wind output power at time $t$. $H_{ss}$ and $E_{ss}(t)$ are the direct cost of the PV and the scheduled solar power output at time $t$. Lastly, $H_{ssh}$, $E_{ssh}(t)$ and $M_{ssh}$ are the direct cost coefficient of the small hydro-solar, the scheduled power from the small hydro-solar at time $t$ and the direct cost of the small hydro-solar unit, respectively.
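As a rough illustration of how the generation cost described above could be evaluated, the following sketch sums a thermal cost and three direct costs. The functional forms are assumptions made here for illustration only: a quadratic thermal cost a + b·E + c·E² per unit and linear direct costs (coefficient times scheduled power) for wind, PV and small hydro-solar. The names `ThermalUnit`, `thermal_cost` and `total_dg_cost` are hypothetical, and the separate unit cost M_ssh is left out because its role is not stated in the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ThermalUnit:
    a: float  # constant cost term (assumed form)
    b: float  # linear cost coefficient (assumed form)
    c: float  # quadratic cost coefficient (assumed form)


def thermal_cost(units: Sequence[ThermalUnit], outputs: Sequence[float]) -> float:
    """Sum the assumed quadratic cost a + b*E + c*E**2 over all thermal units."""
    if len(units) != len(outputs):
        raise ValueError("one output value per thermal unit is required")
    return sum(u.a + u.b * e + u.c * e ** 2 for u, e in zip(units, outputs))


def total_dg_cost(units, e_tg, g_ws, e_ws, h_ss, e_ss, h_ssh, e_ssh) -> float:
    """Total DG cost at one time step as the sum of the four component costs."""
    c_tg = thermal_cost(units, e_tg)   # thermal generators
    c_ws = g_ws * e_ws                 # wind: assumed linear direct cost
    c_ss = h_ss * e_ss                 # PV: assumed linear direct cost
    c_ssh = h_ssh * e_ssh              # small hydro-solar: assumed linear direct cost
    return c_tg + c_ws + c_ss + c_ssh


# Illustrative numbers only; they are not taken from the paper.
example = total_dg_cost([ThermalUnit(0.0, 2.0, 0.5)], [2.0], 1.5, 2.0, 1.0, 3.0, 2.0, 1.0)
```

Keeping each component in its own term makes it easy to swap in the exact cost expressions once they are available.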

A. DESCRIPTION OF THE DISPATCHABLE LOADS
Due to the flexibility of demand-side loads in the MMG, loads influenced by electricity prices can be dispatched to meet the supply constraints. The active and reactive loads are used to define the total $l$ loads of the dispatchable loads (DLs) $ACTL_{DL,l}(t)$ at time $t$ using Eq. (6) and (7), while the load shedding for busy days is given in Eq. (8) [39].
where $\cos\varphi$ is the power factor, and $ACTL^{\min}_{DL,l}(t)$ and $ACTL^{\max}_{DL,l}(t)$ are the minimum and maximum of the $l$th DL of $ACTL_{DL}(t)$. During busy days, users' loads are shed so as not to interrupt the energy supply and to avoid an excessive burden on the energy generation plants. Therefore, a piece-wise linear function of a two-segment load shedding cost for the $l$th DL is defined by Eq. (8).
where the constant coefficients are denoted as $m_0$, $m_1$, $c_0$ and $c_1$.
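The two-segment load shedding cost can be pictured with the small sketch below. It assumes that each segment is a line with its own slope and intercept (slopes m0 and m1, intercepts c0 and c1) and that the segments meet at a breakpoint; the breakpoint parameter and the function name `load_shedding_cost` are hypothetical and are not taken from Eq. (8).

```python
def load_shedding_cost(shed_kw: float, breakpoint_kw: float,
                       m0: float, c0: float, m1: float, c1: float) -> float:
    """Two-segment piece-wise linear cost of shedding `shed_kw` of load.

    Segment 0 (shed_kw <= breakpoint_kw): m0 * shed_kw + c0
    Segment 1 (shed_kw >  breakpoint_kw): m1 * shed_kw + c1
    """
    if shed_kw < 0:
        raise ValueError("shed load must be non-negative")
    if shed_kw <= breakpoint_kw:
        return m0 * shed_kw + c0
    return m1 * shed_kw + c1


# Illustrative coefficients chosen so that both segments meet at 5 kW.
low = load_shedding_cost(2.0, 5.0, m0=1.0, c0=0.0, m1=3.0, c1=-10.0)
high = load_shedding_cost(6.0, 5.0, m0=1.0, c0=0.0, m1=3.0, c1=-10.0)
```

A steeper second segment, as in the illustrative coefficients, penalizes heavy shedding more than light shedding.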

B. DESCRIPTION OF THE ENERGY STORAGE SYSTEM
In this paper, the state of charge (SOC) defines the charging and discharging operations of the ESS. A heuristic approach is considered for the SOC: if $\tau(t) P_{real}(t) \le 20$ cents/kWh, the ESS is charging; on the other hand, if $\tau(t) P_{real}(t) > 20$ cents/kWh, it is discharging. Here, $\tau$ is the binary SOC decision variable ($\tau \in \{0, 1\}$) and $P_{real}(t)$ is the real-time electricity price at time $t$. Let the SOC power of the ESS be denoted as $E_{SOC}(t)$, and let $E_{ESS}(t)$ be the energy level of the ESS. Eq. (9) and (10) describe the constraints of the ESS [39].
where $E^{\min}_{ESS}(t) = 60$ kWh and $E^{\max}_{ESS}(t) = 30$ kWh are the minimum and maximum energy levels of the ESS. $E_{ESS}(t)$ is further optimized using Eq. (11), where $\lambda_{ch} = \lambda_{dch} = 0.98$ are the charging and discharging efficiencies, respectively.
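A minimal sketch of the price-threshold heuristic is given below. The 20 cents/kWh threshold and the 0.98 efficiency come from the text; the energy update rule (stored energy rises by efficiency × power × Δt when charging and falls by power × Δt / efficiency when discharging, then is clipped to the bounds) is an assumed form, not a transcription of Eq. (11). The function name `ess_step`, the time step and the bounds used in the example are hypothetical.

```python
PRICE_THRESHOLD = 20.0  # cents/kWh, threshold of the heuristic in the text
EFFICIENCY = 0.98       # charging and discharging efficiency from the text


def ess_step(energy_kwh: float, price: float, power_kw: float,
             e_min: float, e_max: float, dt_h: float = 1.0):
    """Return (new_energy_kwh, mode) after one time step of the heuristic."""
    if e_min > e_max:
        raise ValueError("e_min must not exceed e_max")
    if price <= PRICE_THRESHOLD:
        # Cheap electricity: charge, losing part of the input to inefficiency.
        candidate = energy_kwh + EFFICIENCY * power_kw * dt_h
        mode = "charge"
    else:
        # Expensive electricity: discharge, drawing more than is delivered.
        candidate = energy_kwh - power_kw * dt_h / EFFICIENCY
        mode = "discharge"
    # Keep the stored energy inside the allowed band.
    return min(max(candidate, e_min), e_max), mode


# Hypothetical bounds of 10 kWh and 50 kWh, used only for this example.
after_cheap_hour = ess_step(20.0, 15.0, 10.0, e_min=10.0, e_max=50.0)
after_costly_hour = ess_step(20.0, 25.0, 9.8, e_min=10.0, e_max=50.0)
```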

C. DESCRIPTION OF THE POWER EXCHANGE
At time $t$, the main grid exchanges energy with the MMG subject to the constraints in Eq. (12), (13) and (15) [47].
where, at time $t$, the active and reactive energy exchanges are denoted by $ACTL_{G}(t)$ and $RACTL_{G}(t)$, $ACTL^{\min}_{G}$ and $ACTL^{\max}_{G}$ are the minimum and maximum active energy exchange, and $Cap^{\max}_{G}$ is the maximum energy that is either bought from or sold to the main grid. To achieve the system's energy balance, the energy at each bus in the distribution system must be equal to the difference between the energy generation and the load on that bus.
where $d_{ij} = d_i - d_j$ is the voltage angle difference between buses $i$ and $j$, and $V_i$ and $V_j$ are the voltages at buses $i$ and $j$, respectively. $RACTL_{G,li}$ and $ACTL_{G,di}$ are the reactive and active $l$th load demands at bus $i$. $b_{ij}$ is the susceptance and $g_{ij}$ is the transfer conductance between buses $i$ and $j$. The operational limits of all the generators are defined as:

$RACTP^{\min}_{ssh}$ and $RACTP^{\max}_{ssh}$ are the minimum and maximum reactive power of the combined generation $RACTP_{ssh}$. The buying cost of energy is calculated by Eq. (16).

D. FORMULATION OF OPTIMAL OPERATING COST
The aggregate operating cost of the MG is the sum of the costs of the conventional generators, the dispatchable loads and load shedding. To ensure efficient operation of the MMG, this study also uses an ancillary service cost bought from the market, which addresses the power deviation $E^{deviation}_{t}$ from the dispatch problem. The ancillary services cost $AS(t)$ at time step $t$ is defined as:
where $\beta$ is a constant factor, and
where $f$ is the frequency deviation [47] and $E_{loss}(t)$ is the energy loss, which is defined as:
where $n_l$ is the number of transmission lines, and $V_i$ and $V_j$ are the voltages at buses $i$ and $j$, respectively. Hence, the aggregate operating cost is defined as:
where $\omega^{dg}_{t}$ is a binary decision variable: "0" denotes that the DG is not in use and "1" denotes that the DG is in use. The total operating cost is used as the reward function of the MDP. The action of the MDP, as defined by the MG, is constrained to the set of all possible actions at time step $t$ by Eq. (9). The transition probability depends on the dynamics of the ESS through Eq. (11). The state variables of the DGs and the real-time prices follow a joint probability distribution, which is based on the historical output.
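To make the MDP view more concrete, the sketch below solves a deliberately simplified, deterministic version of the scheduling problem by backward induction. It is not the paper's method: prices and demand are assumed to be known in advance, the ESS level is discretized into integer units, the only actions are to discharge one unit, idle, or charge one unit, and the stage cost is just the price times the net energy bought from the grid. All names are hypothetical.

```python
def optimal_schedule(prices, demand, capacity, start_level):
    """Backward induction over (time, storage level) for a toy storage problem.

    Returns (minimal total cost, list of actions), where each action is
    -1 (discharge one unit), 0 (idle) or +1 (charge one unit).
    """
    horizon = len(prices)
    # value[t][level]: minimal cost-to-go from time t with `level` units stored.
    value = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]
    best = [[0] * (capacity + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for level in range(capacity + 1):
            options = []
            for action in (-1, 0, 1):
                nxt = level + action
                if 0 <= nxt <= capacity:  # respect the storage bounds
                    stage = prices[t] * (demand[t] + action)  # grid purchase cost
                    options.append((stage + value[t + 1][nxt], action))
            value[t][level], best[t][level] = min(options)
    # Roll the stored decisions forward from the initial storage level.
    level, actions = start_level, []
    for t in range(horizon):
        actions.append(best[t][level])
        level += best[t][level]
    return value[0][start_level], actions


# Cheap first hour, expensive second hour: charge first, then discharge.
cost, plan = optimal_schedule([1.0, 5.0], [2, 2], capacity=1, start_level=0)
```

In this toy case a myopic rule that never charges would pay 12, while the look-ahead schedule pays 8, which is the kind of gap a scheduling policy is meant to close.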

1) FORMULATION OF OPTIMAL SCHEDULING POLICY
The state variable at time $t$ is defined as $S(t) = (E_D(t), ACTP_{tg}(t), ACTP_{ss}(t), ACTP_{ws}(t), ACTP_{ssh}(t), P_{real}(t), E_{ESS}(t))$. The transition from state $t-1$ to state $t$ under the action $A(t)$ is defined as $S_{trans} = S(t-1) \times A(t) \rightarrow Prob(S(t))$, where $Prob(\cdot)$ is the transition probability. Note that, for the state variable $E_{ESS}(t)$ in $S(t)$, the transition is determined by Eq. (11). The state variables of the DGs and $P_{real}(t)$ are defined by their joint probability distributions. However, these state variables may be time-dependent and temporally coupled. They also depend on historical and conditionally dependent outcomes [39].
The objective of the MMG EM is to find the optimal policy $\mu$ that reduces the total operating costs of the MMG. The optimal scheduling policy is defined in Eq. (21), where $FP$ is the set of feasible policies $\mu$ for making the decision rules that determine the action $A(t)$ at time $t$. The state transition follows a Markovian policy, meaning that the transition probability depends only on the previous state, and it is defined in Eq. (22), where $S(t)$ is defined as:
where $\theta(t)$ is defined as:
Thus, Eq. (21) is rewritten as:

E. FORMULATION OF THE MULTI-MICROGRID SYSTEM COALITION OPERATION MODEL
In this study, the overall objective function is defined as the minimization of the total cost of energy generation. A coalition mechanism is proposed based on a cooperative game. Sub-additivity is imposed, which encourages each player to participate in the grand coalition via a fair cost allocation. The coalition can satisfy individual rationality, group rationality and grand rationality. To form a coalition, this study formulates the Shapley and nucleolus solutions individually. The reason for selecting CGA over the BD proposed in [14] is that BD uses a generalized procedure to solve the MILP problem and reduces the number of variables at the expense of an increasing number of constraints. BD requires several iterations for convergence, especially when the enumeration method is applied to a small coalition group. The BD approach is evaluated based on master problems and sub-problems. The sub-problem optimizes the dissatisfaction values in the master problem before the upper bound and lower bound converge uniformly [14]. The sub-problem involves an extension of the first-order Taylor series approximation. Note that the authors in [14] consider cost allocations of MGs that do not form part of the grand coalition. On the other hand, the CGA method solves RO problems and eliminates any negative reduced cost through repeated iterations.
In the MMG cost allocation problem, each MG is a player that hopes to minimize its allocated expenses. The players establish a coalition with one another after satisfying the set of conditions described in Subsection IV-E1. A grand coalition includes all players and is the most beneficial. The cooperative game consists of three essential components: a set of players $i \in \{1, 2, \ldots, N\}$ who participate in the cooperation, the coalition $S$ ($S \subset N$), which is a subset of the players, and the grand coalition $N$, which consists of all players participating in the cooperation. Let $\{i\}$ be a singleton coalition comprising a single independent player. Using the above definitions, the cost allocation is formulated as:
Let the separable cost $SC$ be the expenses first allotted to each player and the non-separable cost $NSC$ be the remaining expenses after the $SC$ is allotted to all players.

1) CONDITIONS FOR FORMING THE GRAND COALITION
The cooperative game often depends on the establishment of a grand coalition that follows the concept of sub-additivity, which depends on the cost function. By sub-additivity, the larger a coalition is, the more efficient it will be. Sub-additivity means that every player is given an incentive to join the grand coalition. A game with a non-empty core implies that a fair cost allocation exists, provided that all players accept to be part of the grand coalition. The sub-additivity of the game is expressed as:

$$v(S \cup T) \le v(S) + v(T), \quad \forall\, S, T \subseteq N,\ S \cap T = \emptyset$$

where $\emptyset$ is the empty set, and $S$ and $T$ are two disjoint coalitions. Thus, sub-additivity is the necessary condition for establishing the grand coalition. The concept of the core is introduced in this study to provide a proper understanding of the nucleolus. The core is the set of conditions that the allocations must satisfy. It motivates all players to take part in the cooperation. Note that the core is characterized by individual rationality, group rationality and grand rationality, which are defined in Eq. (29)-(31).
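The sub-additivity condition can be checked directly for a small game. The sketch below assumes the standard definition, v(S ∪ T) ≤ v(S) + v(T) for every pair of disjoint coalitions S and T, and tests it by brute force. The cost functions used in the example are illustrative and are not taken from the paper; the function names are hypothetical.

```python
from itertools import combinations


def coalitions(players):
    """Yield every non-empty coalition of `players` as a frozenset."""
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def is_subadditive(players, cost, tol=1e-9):
    """Check cost(S | T) <= cost(S) + cost(T) for all disjoint non-empty S, T."""
    all_coalitions = list(coalitions(players))
    for s in all_coalitions:
        for t in all_coalitions:
            if s.isdisjoint(t) and cost(s | t) > cost(s) + cost(t) + tol:
                return False
    return True


def shared_fixed_cost(coalition):
    """Illustrative cost: a fixed cost of 10 shared by the coalition plus 5 per member."""
    return 10.0 + 5.0 * len(coalition)


def congestion_cost(coalition):
    """Illustrative cost that grows quadratically, so merging never pays off."""
    return float(len(coalition) ** 2)


players = [1, 2, 3]
merging_helps = is_subadditive(players, shared_fixed_cost)
merging_hurts = is_subadditive(players, congestion_cost)
```

The brute-force loop visits every pair of coalitions, so it is only practical for a handful of players.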
The individual as well as group rationality is defined as the allocation AL = {al 1 , al 2 , . . . , al n }, which the core achieves energy cost savings. Individual or group rationality is used to compare coalition or player that does not engage in the grand coalition, N . The grand rationality refers to the total energy cost each player will get, and it is equal to the total energy cost of the grand coalition. In this study, a comparison of the two unique core solutions such as nucleolus [14] and Shapley [48] is presented in the simulation section.
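The sub-additivity and core conditions described above can be checked numerically on a small game. The following is a minimal sketch under stated assumptions, not the implementation used in this study: the three-player cost values in COST and the helper names coalitions, is_subadditive and in_core are hypothetical, and the core test uses the usual cost-game form in which no coalition is charged more than its stand-alone cost.

```python
from itertools import combinations

PLAYERS = [1, 2, 3]

# Hypothetical stand-alone costs v(S) of every coalition (illustrative values only).
COST = {
    frozenset(): 0,
    frozenset({1}): 10, frozenset({2}): 12, frozenset({3}): 14,
    frozenset({1, 2}): 18, frozenset({1, 3}): 20, frozenset({2, 3}): 22,
    frozenset({1, 2, 3}): 27,
}


def coalitions(players):
    """Yield every non-empty coalition of the given players as a frozenset."""
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def is_subadditive(cost, players, tol=1e-9):
    """Check v(S u T) <= v(S) + v(T) for every pair of disjoint coalitions."""
    all_coalitions = list(coalitions(players))
    for s in all_coalitions:
        for t in all_coalitions:
            if s & t:
                continue  # only disjoint pairs are constrained
            if cost[s | t] > cost[s] + cost[t] + tol:
                return False
    return True


def in_core(cost, allocation, players, tol=1e-9):
    """Check grand, individual and group rationality of a cost allocation."""
    grand = frozenset(players)
    # Grand rationality: the allocations sum to the grand-coalition cost.
    if abs(sum(allocation[i] for i in players) - cost[grand]) > tol:
        return False
    # Individual and group rationality: no coalition pays more than v(S).
    return all(
        sum(allocation[i] for i in s) <= cost[s] + tol
        for s in coalitions(players)
    )
```

For the assumed values, is_subadditive(COST, PLAYERS) holds, the allocation {1: 8, 2: 9, 3: 10} lies in the core, and {1: 10, 2: 12, 3: 5} does not, because players 1 and 2 together would pay 22 against a stand-alone cost of 18.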

2) CORE BASED ON NUCLEOLUS SOLUTION
To minimize the maximum expenses v(S) of any coalition S [14], the expenses of coalition S for each player PL are calculated as: where the minimization of expenses is defined for PL as: In this study, the nucleolus solution is obtained using the CGA discussed in Subsection IV-E4.
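The idea of minimizing the largest dissatisfaction can be illustrated by comparing candidate allocations. The sketch below is an assumption-laden illustration and not the formulation of [14]: it uses the common textbook excess of a cost game, e(S, x) = Σ_{i∈S} x_i − v(S), and the helper names proper_coalitions, excess_vector and less_dissatisfied as well as the values in COST are hypothetical. It only ranks given allocations; computing the nucleolus itself requires solving a sequence of linear programs.

```python
from itertools import combinations

PLAYERS = [1, 2, 3]

# Hypothetical stand-alone costs v(S) (illustrative values only).
COST = {
    frozenset({1}): 10, frozenset({2}): 12, frozenset({3}): 14,
    frozenset({1, 2}): 18, frozenset({1, 3}): 20, frozenset({2, 3}): 22,
    frozenset({1, 2, 3}): 27,
}


def proper_coalitions(players):
    """Yield every non-empty coalition except the grand coalition."""
    for size in range(1, len(players)):
        for members in combinations(players, size):
            yield frozenset(members)


def excess_vector(cost, allocation, players):
    """Excess (dissatisfaction) of each proper coalition, largest first."""
    excesses = [
        sum(allocation[i] for i in s) - cost[s]
        for s in proper_coalitions(players)
    ]
    return sorted(excesses, reverse=True)


def less_dissatisfied(cost, first, second, players):
    """True if `first` has a lexicographically smaller excess vector."""
    return excess_vector(cost, first, players) < excess_vector(cost, second, players)


candidate_a = {1: 8, 2: 9, 3: 10}
candidate_b = {1: 7, 2: 9, 3: 11}
print(excess_vector(COST, candidate_a, PLAYERS))
print(excess_vector(COST, candidate_b, PLAYERS))
```

In this toy game the largest excess is −1 for candidate_a and −2 for candidate_b, so candidate_b is preferred under the lexicographic ordering that the nucleolus minimizes.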

3) CORE BASED ON SHAPLEY SOLUTION
The Shapley method calculates the average marginal cost over all existing coalitions, which is given by [48]: where n ∈ N and s is the number of players in coalition S (s = |S|). Each MG participating in the game is a player that manages energy and receives a fair allocation. The allocation for each player is obtained from v(S), and the distribution of the allocation AL_dis is defined as: Note that in Eq. (35), Shp(i) is substituted with the result of Eq. (41) to obtain a fair allocation of expenses to players.
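The averaging of marginal costs can be sketched for a small game as follows. This is an illustrative sketch only: it uses the standard textbook Shapley formula, Shp(i) = Σ_{S ⊆ N\{i}} [s!(n − s − 1)!/n!] [v(S ∪ {i}) − v(S)], whose notation may differ from the equation in [48], and the cost values in COST and the function name shapley_values are hypothetical. Exact fractions are used so that the allocations sum exactly to the grand-coalition cost.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

PLAYERS = [1, 2, 3]

# Hypothetical stand-alone costs v(S) (illustrative values only).
COST = {
    frozenset(): 0,
    frozenset({1}): 10, frozenset({2}): 12, frozenset({3}): 14,
    frozenset({1, 2}): 18, frozenset({1, 3}): 20, frozenset({2, 3}): 22,
    frozenset({1, 2, 3}): 27,
}


def shapley_values(cost, players):
    """Return the Shapley cost share of every player by full enumeration."""
    n = len(players)
    shares = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            # Probability weight of a coalition of this size preceding player i.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for members in combinations(others, size):
                s = frozenset(members)
                # Marginal cost of adding player i to coalition s.
                total += weight * (cost[s | {i}] - cost[s])
        shares[i] = total
    return shares


shares = shapley_values(COST, PLAYERS)
print({player: float(share) for player, share in shares.items()})
```

The enumeration visits every subset of the other players, so its cost grows exponentially with the number of players, which is why it is only practical for small coalitions.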

4) FINDING NUCLEOLUS SOLUTION USING COLUMN GENERATION ALGORITHM
In this study, a CGA finds the nucleolus of the core, which solves the grand coalition problem. The purpose of implementing CGA is to minimize the total expenses incurred by the grand coalition while each player seeks to maximize its payoff (cost savings). As the number of coalitions and players increases, the problem becomes inefficient and difficult to solve; CGA resolves this by reducing the number of enumerations. CGA has been applied to the vehicle scheduling problem [49], sea traffic scheduling problem [50], location routing problems [51], job scheduling problem [52], and fleet design problem [53]. However, the CGA method is underexplored in the field of MMG. Algorithm 4 describes the implementation of the proposed CGA for the fair energy cost allocation. The objective function of the energy cost allocation problem is defined as: However, for the sake of computation, the function v(c) is nonlinear. The fundamental idea of column generation is adopted from [54]: S is a finite vector set. In fact, if S is assumed to be discrete, then S* is a finite collection of points (i.e., S* = {mmg_1, mmg_2, ..., mmg_p}), which denotes the collection of MMGs, and p is the total number of MMGs. However, $\lambda$ is binary, while S* falls within the extreme points of its convex hull, represented as conv(S*). Representing the bounded polyhedron via its extreme points is related to the Dantzig-Wolfe decomposition [55]. For any MMG mmg ∈ S*, it can be set to mmg as: subject to the convexity constraints, with $\lambda_w \in \{0, 1\}$ such that $\lambda_w = 1$ if mmg_w is a member of the grand coalition and $\lambda_w = 0$ if it is not, for w = 1, 2, ..., p. Assuming r_w = mmg_w and a_w = Ay_w, v(w) is derived from the column generation form as: Note that the generation of columns is an integer linear programming problem, while the initial problem has a nonlinear objective function. By linearization, $mmg = \sum_{1 \le w \le p} a_w \lambda_w$, while S can be decomposed, i.e., $S = \bigcup_{1 \le j \le n} S_j$.
Each set $S^*_j = \{x_j \in S_j : x_j \in \mathbb{Z}\}$ is denoted as $S^*_j = \{mmg^j_1, mmg^j_2, \ldots, mmg^j_{p_j}\}$, where $\lambda^j_w \in \{0, 1\}$, w = 1, 2, ..., p_j and j = 1, 2, ..., n. Suppose the subsets for the generation of columns are the same, then S_j = S^- = {mmg_1, mmg_2, ..., mmg_p}. Hence, one subset of S^- with $\lambda_w = \sum_j \lambda^j_w$ will represent it. Nevertheless, the aggregate constraint $\sum_{1 \le w \le p_j} \lambda_w = n$ will override the convexity constraints, where $\lambda_w \in \mathbb{Z}_{\ge 0}$. In addition, the generation of columns is formulated as the restricted master problem (RMP), which requires a limited number of variables, so that new variables can be introduced as needed. Fig. 4 shows the flowchart of the proposed CGA. In the flowchart, CGA works by finding any negative reduced cost, and the problem is divided into two parts, i.e., the master problem and the sub-problem. The original problem, which is the master problem, has a single subset of variables, whereas the sub-problem is the new problem that generates new variables. A dual variable is obtained in the RMP for each constraint. If the sub-problem is solved and its objective function value is negative, then the corresponding column with negative reduced cost is added to the RMP. The RMP is re-solved until the new set of dual values it creates yields no negative reduced cost. Consequently, the sub-problem creates the set of non-negative reduced costs [56].
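The pricing step of the loop described above (search for a column with negative reduced cost, add it to the RMP, stop when none remains) can be sketched as follows. This is a generic illustration under assumptions and not the authors' Algorithm 4: the RMP is taken to be a minimization problem with equality constraints, the reduced cost is computed in the textbook form c_w − π·a_w − μ, where π are the constraint duals and μ is the dual of the convexity constraint, and the names reduced_cost, price_out, CANDIDATES, DUALS and MU, together with all numbers, are hypothetical. Solving the RMP itself needs a linear programming solver and is omitted.

```python
def reduced_cost(cost, coeffs, duals, convexity_dual):
    """Reduced cost of one column: c_w - pi . a_w - mu."""
    return cost - sum(pi * a for pi, a in zip(duals, coeffs)) - convexity_dual


def price_out(candidates, duals, convexity_dual, tol=1e-9):
    """Return (name, reduced cost) of the most negative column, or None.

    `candidates` is an iterable of (name, cost, coefficients) tuples.
    None signals that no column can improve the restricted master problem.
    """
    best_name, best_rc = None, 0.0
    for name, cost, coeffs in candidates:
        rc = reduced_cost(cost, coeffs, duals, convexity_dual)
        if rc < best_rc - tol:
            best_name, best_rc = name, rc
    return (best_name, best_rc) if best_name is not None else None


# Hypothetical columns and dual values, for illustration only.
CANDIDATES = [
    ("A", 10.0, [1, 1]),
    ("B", 4.0, [2, 1]),
    ("C", 6.0, [1, 2]),
]
DUALS = [2.0, 3.0]
MU = 1.0

entering = price_out(CANDIDATES, DUALS, MU)
print(entering)
```

In a full column generation loop, the returned column would be appended to the RMP, the RMP would be re-solved to obtain new dual values, and pricing would be repeated until price_out returns None.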

F. TRAINING OF DEEP CONVOLUTION NEURAL NETWORK
CNN is among the well-known deep neural networks and is a multi-layer neural network [57]. The CNN structure is built from multiple layers such as convolution and max-pooling operators. CNN is used in applications such as natural language processing and video and image recognition. The authors of [58] present a hybrid of wavelet transform and deep CNN for deterministic PV power forecasting. The authors of [59] propose an online method for voltage security analysis using deep CNN. However, CNN is under-explored in the context of MMG.
In an MG, the stochastic behavior of RES makes it tedious to determine the regularity of energy. In addition, connecting the neurons of a CNN raises a scalability issue; therefore, a deep CNN is proposed in this study that addresses this problem by connecting each neuron only to its neighboring neurons. For recognizing a time-dependent progression, RNN is the best-suited model. The authors of [60] resolve the time-dependent problem by proposing a recurrent CNN, which does not rely on a segmentation technique or any task-specific features; however, it considers large inputs while limiting the capacity of the model. This study instead takes the differences in the sequential data and uses them as input to the convolution layer. Note that larger differences in the sequential data indicate larger amounts of movement, and the effect is similar to that of an RNN. The authors of [61] propose a hybrid transfer learning algorithm based on CNN and an improved CRBM, in which the CRBM is used to enhance the fully connected layer of the CNN. Its limitations are as follows: (1) the improved CRBM is based on the maximum likelihood probability of an intermediate region, which may be biased for small samples, thereby reducing its optimality properties, (2) its complexity may grow exponentially as the number of features increases, (3) the algorithm may be sensitive to the choice of starting values and (4) the feature selection method may not capture trends of different periods as well as trends of the same periods.
Our proposed deep CNN consists of three layers, as shown in Fig. 5. The first layer is the input layer, which receives a sequence of data. The second layer is the feature learning layer, which extracts features from the input data. Convolution preserves the input by learning features using small squares of input data. A rectified linear unit (ReLU) uses small squares of data and accounts for the interaction effects between the predicted data and the actual data, as well as for nonlinearity effects. By convention, ReLU is a function that returns 0 for any negative input and returns the input unchanged for any positive input. The second layer also contains max-pooling, one for each convolution. Each pooling returns the maximum value of the anticipated output of the convolution. The third layer is the fully connected layer; as its name denotes, its k neurons are connected to the neurons of the max-pooling. Weak generalization occurs when the number of training samples is small and the number of neurons is large, which may cause overfitting or over-parameterization. To solve this problem, the fully connected layer is trained by a conditional restricted Boltzmann machine (CRBM) [62]. Hence, the deep CNN architecture is described as follows: 1) The input layer I_{m,t} = {i_{1,1}, i_{2,2}, ..., i_{m,t}} (t is the time step and m is the number of parameters) receives the weekly periodical values of the optimal policy at time t − 1. Data preprocessing of I_m(t) is performed to remove erroneous values, which may be caused by a failure of the proposed optimal policy method. Therefore, I_m(t) is interpolated using the formulation: where NaN means not a number. To ensure that outliers are removed from the interpolated dataset, max-min normalization is used to scale the interpolated dataset. 2) A 2-D convolution layer is used to extract features from the 2-D input data at regular intervals.
The feature Z_{a+1,b+1}, which is derived from Eq. (43), is used, where w is the weight term, f() is a nonlinear function such as the hyperbolic tangent [63], b is the bias, and k_1 and k_2 are the derived kernel functions. k_1 is obtained by subtracting the sum of two row values from the current first-row value, which captures trends of different periods, while k_2 is obtained by subtracting the sum of two column values from the current first-column value, which captures trends of the same periods. Finally, filters are applied to the data and feature maps are produced.
3) The ReLU layer takes an input, performs a threshold operation on it and assigns all negative numbers to the mean of its corresponding value. Computing the mean may introduce some local translation invariance to the model. 4) Max pooling takes an input and reduces its size by taking the maximum value in each non-overlapping block. In this layer, overfitting is mitigated by taking an abstracted form of the data representation, which reduces the dimensionality and the computational cost through a reduction in the number of parameters. Max pooling also ensures that the model learns the basic translational invariance of the internal representation of the data. 5) In the fully connected layer, the energy function is adjusted, and classification is performed on the feature-extracted dataset. To improve the fully connected layer, the CRBM adopted from [64] is used. CRBM is an extension of the restricted Boltzmann machine and is known for modelling time series and human activities. Conceptually, CRBM works by deriving higher-level values from lower-level values. From Fig. 5, CRBM is examined based on its mathematical description in terms of the energy function, learning rules and probability reference. Note that the CRBM model captures different trends within a single set of parameters. It also performs real-time inference efficiently by minimizing contrastive divergence [65]. Once the CRBM is trained, more layers can be added. The initial layer is preserved, and the sequence of hidden state vectors is influenced by the given input [65]. The equations for the weights and biases used in this paper are obtained from [64], and the other CRBM parameters are: learning rate ϒ = 10^{-3}, hidden layer = 10, output layer = 1, momentum = 0.9 and weight decay = 0.002. The proposed formulations are implemented using an iterative algorithm, as shown in Algorithm 1. The transition from S(t − 1) to S(t) depends on the optimal function values.
Lines 5-6 address the start of the iteration: since W is randomly generated, the value of |W(g + 1) − W(g)| may be larger than the convergence tolerance, which may cause an unending loop. To solve this problem, a normalization is applied at line 6. The periods δt = 1, 2, ... are created to train the network W until it converges, which leads to the improvement of the network. The process is repeated until the weight matrix of the proposed deep CNN can no longer be improved, i.e., until |W(g + 1) − W(g)| falls below the convergence tolerance. Thus, the near-optimal policy is derived recursively.
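The preprocessing and layer operations described for the deep CNN can be illustrated with a small, dependency-free sketch. It is not the MATLAB implementation used in this study, and several choices are assumptions: the interpolation is plain linear interpolation between the nearest valid neighbours (the paper's own interpolation formula is not reproduced here), ReLU follows the conventional definition that maps negative inputs to 0, and the function names interpolate_nan, min_max_normalize, relu and max_pool are hypothetical.

```python
import math


def interpolate_nan(series):
    """Fill NaN samples by linear interpolation between valid neighbours.

    NaNs at either end take the nearest valid value (an assumed convention).
    """
    valid = [k for k, x in enumerate(series) if not math.isnan(x)]
    if not valid:
        raise ValueError("series has no valid samples")
    filled = list(series)
    for k, x in enumerate(series):
        if not math.isnan(x):
            continue
        left = max((j for j in valid if j < k), default=None)
        right = min((j for j in valid if j > k), default=None)
        if left is None:
            filled[k] = series[right]
        elif right is None:
            filled[k] = series[left]
        else:
            frac = (k - left) / (right - left)
            filled[k] = series[left] + frac * (series[right] - series[left])
    return filled


def min_max_normalize(series):
    """Scale a series to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def relu(grid):
    """Conventional ReLU: negative entries become 0, others are unchanged."""
    return [[max(0.0, x) for x in row] for row in grid]


def max_pool(grid, size=2):
    """Maximum of each non-overlapping size x size block of a 2-D grid."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


raw = [1.0, float("nan"), 3.0]
clean = min_max_normalize(interpolate_nan(raw))
pooled = max_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(clean, pooled)
```

Pooling a 4 × 4 grid with 2 × 2 blocks yields a 2 × 2 grid, which shows how max pooling reduces the number of values passed to the fully connected layer.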

V. CASE STUDY
In this section, the objective function of the proposed scheme is implemented on two test cases based on the IEEE 118-bus [14] and IEEE 30-bus [19] distribution systems, respectively. We do not modify these distribution systems.
The first test case validates the effectiveness of MMG coalition, while the second test case confirms the efficiency of the proposed method for cost allocations.
In addition, monthly wind turbine and photovoltaic data are collected from the National Renewable Energy Laboratory and the National Wind Technology Center [66]. A battery with a capacity of 60 kW/380 kWh is used for energy storage [39]. The ancillary service price considered is twice the real-time electricity price. The hourly demand loads are taken from [39] and the hourly electricity prices are taken from ERCOT [67].
Simulations are implemented in MATLAB, and power flow computations are performed using the MATPOWER software tool. Since formulating the optimal power flow model is not the focus of this study, the formulations proposed in [47] are adopted and their results are used as the input parameters of the proposed system. Deep learning is implemented using the MATLAB Deep Learning Toolbox, and the parameters used are taken from [58]. All results validate the effectiveness of the proposed algorithm. The hardware platform is a personal computer with 8 GB RAM and a 1.60 GHz central processing unit. Fig. 6 shows the actual 24 hours of real-time electricity prices and netloads. From the figure, it is observed that electricity users' netloads begin to rise during time slots 1-5. The upward and downward patterns of netload in the subsequent time slots are due to the users' behavior. However, the real-time prices are unstable during time slots 1-19, drop drastically at time slot 21 and eventually rise at time slot 24. The reason is that load demand is minimized during this time slot.

A. AN ANALOGY WITH OTHER METHODS
To ascertain the efficacy of the proposed optimal policy method, the greedy, MPC and ADP policies are used as benchmark methods for comparison. The proposed optimal policy method finds an accurate day-ahead scheduling plan. Fig. 7 shows the average cumulative costs of the different scheduling policy methods. From Fig. 7, the average cumulative costs reported by the proposed optimal policy method and the ADP-h5 policy method grow quickly during time slots 1-5. In these slots, extra power is purchased for charging the battery, which means that the real-time prices are low (i.e., τ (t) P real (t) ≤ 20 cents/kWh). As the real-time prices increase, the battery discharges to meet the load demands of electricity users, which implies that energy cost savings are achieved. Based on the results, the proposed optimal policy method has a lower average cost over time slots 1-24 than the other methods. This confirms that the proposed optimal policy method is effective and can provide an accurate day-ahead scheduling plan within these time slots. Table 2 summarizes the daily operating costs of the demand-side policy methods in terms of the mean, median, minimum, maximum, first quartile (Q1), third quartile (Q3) and standard deviation (Std). From the results, the proposed optimal policy method achieves a reduced cost of $168.96 as compared to its counterparts. Although the demand loads, electricity prices and renewable generation fluctuate, the means of the MPC and ADP policy methods are relatively close to one another, which confirms that these methods can adapt to such fluctuations. Table 3 shows the average daily operating costs of the DGs. The proposed optimal policy method achieves the minimal grid operating cost of $166.85 as compared to its counterparts.
From the results, the performance of the ADP policy method is close to that of the proposed optimal policy method in terms of the grid's operating cost. Note that the proposed optimal policy method achieves better cost reduction for all DGs than the other policy methods. Fig. 8 and Fig. 9 show the forecasting results of the optimal function values for the proposed and the other forecasting models. A brief description of the existing forecasting models is provided in this subsection. The Encoder model is used to solve the sequence-to-sequence prediction problem, which involves predicting the next value in a real-valued sequence. It takes an input, converts it to a fixed-sized vector and makes a prediction based on that vector [68]. The support vector machine (SVM) model is a discriminative classifier that solves a wide range of classification problems. It uses labelled training data and produces an optimal hyperplane that categorizes the output data [69]. The artificial neural network (ANN) calculates the gradient of the energy function with respect to the weights of the network using the chain rule. It is also used in training multi-layer networks [70]. RNN is a type of neural network that operates over a temporal sequence of input vectors, where the connections between nodes create a directed graph. It also uses memory to process the sequence of input vectors [39].
From the results in Fig. 8, the forecast of the proposed deep CNN is close to the actual data. However, the SVM model is not as efficient as its counterparts and is unable to generalize to the actual data. Similarly, Fig. 9 confirms that the RNN and SVM models do not predict as accurately as the proposed deep CNN and ANN models. It is also observed that the Encoder model over-forecasts the actual data. The reason is that the Encoder model works well for short sequences of data but is inefficient for long sequences, as it is difficult for it to memorize the whole sequence in a fixed-sized vector; as the sequence length increases, its performance degrades accordingly. The performance of the proposed deep CNN is evaluated using the mean absolute percentage error (MAPE) and the root mean square error (RMSE) [71], where a smaller error value indicates better forecasting accuracy. The MAPE and RMSE of the models are shown in Table 4. The RNN model has the highest RMSE value among the models, which is due to the curse of dimensionality. The ANN model achieves the second least RMSE value, close to that of the proposed deep CNN model; however, the proposed deep CNN model achieves the least values of MAPE and RMSE. Other statistical parameters, such as the mean and standard deviation (Std), show that the proposed deep CNN, ANN and Encoder models have relatively close values. Note that the minimal Std value of the RNN model is due to under-forecasting of the actual data. The mean value is calculated as the average of the predicted optimal function values.
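The two accuracy measures can be computed as in the following minimal sketch. It assumes the standard definitions, MAPE = (100/n) Σ |(a_t − p_t)/a_t| and RMSE = sqrt((1/n) Σ (a_t − p_t)²), which may differ in detail from those in [71]; the function names mape and rmse and the sample values are hypothetical and are not results from this study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative values only.
actual_values = [100.0, 200.0]
predicted_values = [110.0, 180.0]
print(mape(actual_values, predicted_values), rmse(actual_values, predicted_values))
```

MAPE is scale-free and therefore convenient for comparing models across datasets, whereas RMSE penalizes large individual errors more heavily, which is why the two measures are usually reported together.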

C. COMPUTATIONAL COMPLEXITY
The computational time for all models is presented in Table

D. ADAPTABILITY OF THE DEMAND SIDE
This study demonstrates scenarios of dispatchable loads with different shedding costs. The parameters of the dispatchable loads are taken from [39]. For comparison, prediction horizons of 6 h and 5 h are examined individually. The performance of the ADP and MPC policy methods is shown in Fig. 10 and Fig. 11, respectively. The MPC-6h policy method uses the battery efficiently by charging in all time slots. The battery is charged because of low electricity prices; within these periods, energy is obtained from the main grid. In contrast, Fig. 11 shows that the battery discharges at time slots t_1, t_17, t_19, t_22 and t_24 to satisfy the load demands. Fig. 12 shows that dispatchable loads are available for consumption in almost all time slots, whereas Fig. 13 shows instabilities in the dispatchable loads during load shedding. Load shedding occurs when the estimated active power is not balanced properly during transmission. The load shedding problem is addressed using load balancing, which defines a threshold value as well as the number of load reductions.

E. EVALUATION OF THE PERFORMANCE OF THE PROPOSED COALITION METHOD
In this subsection, the proposed topology of each MG described in Fig. 3 is used and its parameters are taken from [47].

1) FIRST TEST CASE USING IEEE 30-BUS DISTRIBUTION SYSTEM
The simulation is performed for 168 h to ascertain the economy of the MMG coalition behavior over a long period that matches the real-world situation. The final costs for the two coalition methods, i.e., nucleolus and Shapley, are presented in Table 6. As seen from the simulation results, each MMG receives energy cost savings, which substantiates the economic benefit of the MMG coalition operation. The proposed nucleolus with CGA model achieves higher energy cost savings for MMG than the nucleolus with BD model [14] and the Shapley model [28]. The improvement is achieved because the proposed nucleolus with CGA model minimizes the maximum expenses by eliminating negative energy cost reductions. The Shapley model does not provide a reduced energy cost allocation as compared to its counterpart models. It is assumed that the Shapley model is not a core; it uses additivity, in which the model must find the sum of the participating players' expenses individually and calculate the weight of each participant. Independent MGs that do not participate in the coalition have higher energy costs than their counterparts.

2) MULTI-MICROGRID COOPERATION ECONOMY ANALYSIS
In this subsection, a single MG (MG 10) is considered for analysis purposes. Energy generation cost is reduced via mutual energy exchange among MMGs, which also increases the efficient usage of zero-cost renewable energy. Fig. 14 shows the actual solar power generation, while Fig. 15 reports the generation of the small hydro-solar unit under independent operation. In the independent operation case, the small hydro-solar unit needs to supply its own power and may incur a high energy cost during generation, as there is no energy exchange. From Fig. 16, the coalition operation case shows more small hydro-solar generation than the independent case. The reason is that, when forming the grand coalition, an MMG with small hydro-solar generation receives energy from an MMG with much solar generation and less user load demand than the independent MG.
The energy allocation costs of the coalition for all models are presented in Table 7. As shown by the results, each MMG receives energy cost savings, which substantiates the economic benefit of the MMG coalition operation. The proposed nucleolus with CGA model has a reduced energy allocation cost as compared to the nucleolus with BD model [14]. However, in the Shapley model [28], there is a drastic reduction of the energy allocation cost for all MMGs as compared to its counterpart models. This result of the Shapley model is based on assigning a random value to the weight of all MMGs. The drastic reduction may create financial problems when the energy allocation cost is set below the corresponding expenses borne by each MMG. Overall, the proposed model allows an MMG to exchange power within the distribution system and augment the power consumption of its neighboring MMGs for economic benefits.

VI. CONCLUSION
In this paper, each MG achieves energy cost savings efficiently through the coalition of MMG. A mechanism for fair allocation of expenses to each MMG is proposed to ensure the stability of the MMG. The proposed mechanism uses CGA to obtain the core solution for the cooperative game. Simulation results show that the MMG system that implements the proposed nucleolus with CGA model achieves higher energy cost savings than MMG systems that implement nucleolus with BD, Shapley, or no coalition, i.e., independent MGs. However, the nucleolus with BD model requires high computing resources as the number of MMGs increases. Similarly, the Shapley model assigns a random weight to each MMG based on factorials, which becomes computationally infeasible as the number of MMGs grows exponentially. Furthermore, simulation time shows the This study also proposes an optimal policy model for the dynamic EM of MMG in real time. The proposed model is compared with three existing policy models, namely the greedy, ADP and MPC policies. The proposed optimal policy method achieves a reduction of the MG's daily operating cost of up to 87.86%, as compared to 79.52% for the MPC method, 73.94% for the greedy policy method and 79.42% for the ADP method.
In future, this study intends to improve the proposed method with reinforcement learning, so that a closed-loop control policy may optimally schedule the ESS operations. It will also consider the situation in which a short-circuit fault occurs in an MMG constrained at its initial setup. In that case, system recovery may amount to a high operating cost, which may be averted by the proposed method.