
## NOMENCLATURE

| Symbol | Description |
| --- | --- |
| B | Battery. |
| Battery_capinitial | Initial value of energy stored in the battery. |
| Battery_capmax | Maximum value of energy stored in the battery. |
| Battery_cap($t$) | Energy stored in the battery at time $t$. |
| Battery_capsp | Energy stored in the battery at SOC set point. |
| CL | Critical load. |
| DG | Diesel generation. |
| DT | Decision tree. |
| $N_{\mathrm{chr}}(t)$ | Continuous number of states the battery is charging. |
| NCL | Noncritical load (controllable load). |
| $N_{\mathrm{dschr}}(t)$ | Continuous number of states the battery is discharging. |
| $P(t)$ | Power dispatch set point. |
| $P_{B}(t)$ | $P(t)$ for charging and discharging battery. |
| $P_{{\rm {CL}}\_{}D}(t)$ | Power demands of the CL at time $t$. |
| $P_{{\rm {CL}}\_{}S}(t)$ | Total $P(t)$ supplied to CL. |
| $P_{\text {CL}(\text {PV}/W)}(t)$ | $P(t)$ supplied to CL by PV and wind. |
| $P_{\text {CL}(B)}(t)$ | $P(t)$ supplied to CL by battery. |
| $P_{\text {CL}(\text {DG})}(t)$ | $P(t)$ supplied to CL by DG. |
| $P_{\text {CL}(G)}(t)$ | $P(t)$ supplied to CL from grid. |
| $P_{\mathrm{DG}}(t)$ | $P(t)$ by diesel generator. |
| $P_{\text {DGmax}}$ | Maximum available DG. |
| $P_{G}(t)$ | $P(t)$ by grid. |
| $P_{G{\rm max\_{}import}}(t)$ | Maximum power that the grid can import at time $t$. |
| $P_{G{\rm max\_{}export}}(t)$ | Maximum power that the grid can export at time $t$. |
| $P_{{\rm {NCL}}\_{}D}(t)$ | Power demands of the NCL at time $t$. |
| $P_{{\rm {NCL}}\_{}S}(t)$ | Total $P(t)$ supplied to NCL. |
| $P_{\text {NCL}(\text {PV}/W)}(t)$ | $P(t)$ supplied to NCL by PV and wind. |
| $P_{\text {NCL}(B)}(t)$ | $P(t)$ supplied to NCL by battery. |
| $P_{\mathrm{PV}}(t)$ | Power available from PV system at time $t$. |
| $P_{W}(t)$ | Power available from wind system at time $t$. |
| PV | Photovoltaic. |
| SOC($t$) | State of charge of battery at time $t$. |
| SOCinitial | SOC initial value. |
| SOCmin | SOC minimum value. |
| SOC$_{\rm min\_{}reserve}$ | SOC minimum value with reserve. |
| SOCmax | SOC maximum value. |
| SOCsp | SOC set point. |
| $U(t)$ | Total utility for evaluating microgrid operation at time $t$. |
| $U_{B}(t)$ | Utility for evaluating SOC($t$). |
| $U_{\mathrm{BCY}}(t)$ | Utility for evaluating $N_{\mathrm{chr}}(t)$ and $N_{\mathrm{dschr}}(t)$. |
| $U_{\mathrm{CL}}(t)$ | Utility for evaluating $P_{{\rm {CL}}\_{}S}(t)$ to meet $P_{{\rm {CL}}\_{}D}(t)$. |
| $U_{\mathrm{DG}}(t)$ | Utility for evaluating $P_{\mathrm{DG}}(t)$ to meet $P_{{\rm {CL}}\_{}D}(t)$. |
| $U_{G}(t)$ | Utility for evaluating $P_{G}(t)$. |
| $U_{\mathrm{NCL}}(t)$ | Utility for evaluating $P_{{\rm {NCL}}\_{}S}(t)$ to meet $P_{{\rm {NCL}}\_{}D}(t)$. |

SECTION I

## INTRODUCTION

Microgrids integrate modular distributed energy sources, such as wind, solar, and fuel cells, with storage devices and controllable loads to form a low-voltage distribution system. A microgrid can be defined as a small-scale, self-supporting network driven by on-site generation sources with the ability to separate from an external grid for sustainability or energy security purposes. Microgrids improve grid reliability and supply sustainable, high-quality electric power. They can be connected to a main power network or operated autonomously, similar to the power systems of physical islands [1].

Smart microgrids promise a new approach for electric power generation through the clusters of small distributed on-site generators. There may be numerous advantages in developing microgrids, including the following:

1. to manage growing demand without overloading existing electricity infrastructure or expanding capacity;
2. to reduce frequency and duration of grid disruption through distributed energy resource management system and self-healing functionality;
3. to ensure energy security through self-sustainability;
4. to address climate change by utilizing clean energy resources;
5. to supply electric power to areas where the local utility is unable to provide reliable service or lacks access to customers.

Because the output of renewable resources fluctuates with weather conditions and time of day, most renewable energy sources cannot guarantee continuous and steady power generation. In addition, electricity demand in microgrids may be partly unpredictable, which adds another dimension of complexity to the control of the power system. Grids that integrate renewable energy sources (RESs) must account for the destabilizing effects of such variable energy inflow and outflow. As variable power sources (such as wind and solar power) reach high levels of grid penetration, energy storage devices and intelligent energy management that can handle variability and uncertainty become essential. For a microgrid to provide a reliable power supply, advanced control algorithms that manage energy dispatch and maximize performance are crucial.

Conventional dynamic energy management systems (DEMSs) implemented using decision trees (DTs), referred to here as D-DEMS, have been developed but are highly inefficient. They dispatch energy based on available power and the state of energy storage. In addition, the D-DEMS supplies the entire CL and NCL requirements without assigning any priority, and charges the battery only if more energy is available than the load requires [2]. If energy sources are not available, the battery supplies the full-load requirements until it discharges fully. This shortens the lifespan of the energy storage through random and abrupt charging and discharging decisions, and also reduces the security and reliability of the electric power supply by not considering load priorities (no temporary load shedding), which may result in a total power outage. Expert systems-based energy management schemes for battery storage have been used [3], [4] to incorporate operational constraints, such as state of charge (SOC) limits and charge/discharge current limits, for dispatching renewables.

Computational intelligence methods are widely employed in energy management systems (EMSs) [5], [6], and research in this area has been active. In [7], real-time particle swarm optimization (PSO)-based energy management of a stand-alone hybrid wind and microturbine energy system was presented. Dynamic programming-based power management optimization was compared with simple rule-based management for a grid-connected PV system with batteries in [8]. An EMS for a microgrid using a fuzzy logic expert system to minimize the operating cost and emission levels was developed in [9]. A smart EMS to achieve optimal microgrid operation costs was presented using genetic algorithms in [10]. The nonlinear regression technique was used for real-time energy management in microgrids in [11]. A fuzzy logic control EMS was presented in [12] to satisfy the energy load demand and maintain the SOC of the battery and the hydrogen tank level between certain target margins, while trying to optimize the utilization cost and lifetime of the energy storage system. An expert energy system was proposed in [13] to simultaneously minimize the total operation cost and the net emission by finding the optimal set points of distributed energy resources and storage devices. An adaptive training algorithm based on genetic algorithms, fuzzy clustering, and neuron-by-neuron algorithms was used for real-time microgrid operations in [14]. Generation and load forecast models were combined to create a new microgrid EMS in [15]. To optimize energy dispatch, an optimized fuzzy logic controller (FLC)-based PV energy dispatch controller was implemented using a PSO algorithm [16], in which an FLC was developed to assign energy dispatch priority to CLs, then to the battery, and finally to NCLs. The FLC membership functions and rule set were optimized by PSO, such that the optimized FLC could maximize energy to the system loads while maintaining a higher than average battery SOC.

Action-dependent heuristic dynamic programming (ADHDP), a type of adaptive critic design (ACD)-based controller, was implemented in a PV system in [17] and in a battery system [18], in which two neural networks were used to derive an optimal control strategy. The intelligent methods in [16] and [17], however, were limited to energy management between PV and the energy storage system and did not demonstrate dynamic energy dispatch considering grid-connected mode and additional generation, such as wind turbines and diesel generators.

The microgrid considered in this paper consists of wind and solar power, a diesel generator, and a battery energy storage system. For maximum utilization of renewable energy sources, an intelligent DEMS (I-DEMS) was developed, which optimizes the microgrid system operations on a minute-by-minute time scale using an optimal energy dispatch strategy. In this paper, backup battery energy storage and thermal generation were used to overcome the uncertainty and nondispatchability challenges in a microgrid with RESs.

The primary contributions of this paper include the following.

1. Development of an adaptive I-DEMS using an ADHDP approach. ADHDP combines concepts from adaptive dynamic programming and reinforcement learning. The ADHDP framework employs two neural networks.
2. Dynamic optimization of the I-DEMS using an evolutionary strategy to improve its dispatch solutions over time. To the best of our knowledge, this is the first paper to introduce an evolutionary learning approach that empowers adaptive dynamic programming to solve dynamic optimization problems, such as EMS, optimally and faster.
3. Development of an I-DEMS from data (from a location’s PV solar and wind power profiles, and customers’ CL and NCL profiles).
4. An I-DEMS framework to accomplish multiple objectives. In this paper, these objectives include maximizing reliability, self-sustainability, environmental friendliness, battery life, and customer satisfaction.
5. Development of a performance index (PI) to compare I-DEMS with D-DEMS. This is an applicable and useful metric to evaluate other DEMS approaches.

Using the I-DEMS based on an evolutionary ADHDP framework allowed the RESs and energy storage devices to be utilized to their maximum in order to supply the CL at all times. The I-DEMS used the microgrid’s system states to generate energy dispatch control signals, while a forward-looking network (FLN) evaluated the dispatched control signals over time. A penalty and reward concept was introduced in utility function formulations to be utilized in the evolutionary ADHDP approach. The integration of evolutionary learning into the I-DEMS framework allowed for fast online dynamic optimization of the I-DEMS performance.

The remainder of this paper is organized as follows. Section II describes the microgrid model and the wind and solar energy and load profiles used in this paper. The development of the DEMS based on the DT approach is presented in Section III. Section IV describes the I-DEMS framework, the dynamic utility formulation, and their development. Section V presents the dynamic optimization of the I-DEMS with evolutionary learning to yield fast and enhanced energy dispatch solutions. Section VI presents typical results with the I-DEMS and discusses its performance in comparison with that of the D-DEMS for integrated grid-connected and islanded operations. The robustness of the I-DEMS, examined through microgrid operations under different battery energy storage conditions, is also presented in this section. Finally, the conclusion is drawn in Section VII.

SECTION II

## MICROGRID STRUCTURE

The microgrid system in Fig. 1 consisted of hybrid energy sources, namely, 40-kW solar PV generation, 30-kW wind generation, a 10-kW diesel generator, a battery energy storage system, and CL and controllable (noncritical) load. The solar and wind power profiles chosen for the study are shown in Fig. 2. Fig. 3 shows the CL and controllable load profiles.

Fig. 1. Microgrid system showing the interface to the I-DEMS and microgrid operator. The maximum size in kilowatt is shown for each microgrid component.
Fig. 2. Renewable energy [solar, $P_{\mathrm{ PV}}(t)$ and wind, $P_{W}(t)$] profiles for a period of two days (a total of 2880 min).
Fig. 3. Critical [$P_{{\rm {CL}}\_{}D}(t)$] and controllable [$P_{{\rm {NCL}}\_{}D}(t)$] load profiles for a period of two days.

The battery size in Fig. 1 was designed to power CLs for at least 3 h. The assets considered in the microgrid system (CL, NCL, PV, wind generation, DG, and battery) were aggregated or lumped representations of assets; thus, all these assets were scalable not only in terms of kilowatt/megawatt size, but in number of different types of components as well. This will allow the development of different sized microgrid clusters, which can operate independently or in parallel with the broader utility grid and perform under a centralized EMS.

The maximum CL and controllable load were 20.6 and 26.1 kW, respectively. Day 1 data (1440 min) was used to develop I-DEMS. Day 2 data (a second set of 1440 min) was used to evaluate the I-DEMS’s performance on unseen data. The microgrid operation DEMSs were implemented in MATLAB in this paper.

SECTION III

## DECISION TREE APPROACH-BASED DEMS

The DT approach-based DEMS is a deterministic energy dispatch manager that evaluates the system states of a microgrid based on set rules and then computes the respective energy dispatches. The operation of the D-DEMS is shown in Fig. 4 (flowchart). In developing the rules for the D-DEMS, the time-varying CL was given the highest priority, as the load that must be met at all times. In a nutshell, the D-DEMS operation strictly implements the following dispatch steps sequentially in two possible cases.

1. Case I: If the sum of energy from RESs, $(P_{\mathrm{PV}}(t) + P_{W}(t))$, is greater than the CL demand $P_{{\rm {CL}}\_{}D}(t)$.
   1. The CL is met first.
   2. The surplus energy will be supplied to the battery, $P_{B}(t)$, to increase its SOC($t$) to the set point, SOCsp, if determined necessary.
   3. At this point, the remaining energy, $(P_{\mathrm{PV}}(t) + P_{W}(t) - P_{{\rm {CL}}\_{}D}(t) - P_{B}(t))$, will be used to meet NCL $P_{{\rm {NCL}}\_{}D}(t)$ in part or full as determined.
   4. After satisfying the above three steps, any excess energy, $(P_{\mathrm{PV}}(t) + P_{W}(t) - P_{{\rm {CL}}\_{}D}(t) - P_{B}(t) - P_{{\rm {NCL}}\_{}D}(t))$, will be exported to the grid, upon receiving a request from the utility network operator. Prior to supplying any $P_{G}(t)$ to the grid, $P_{{\rm {NCL}}\_{}D}(t)$ must be supplied in full.
2. Case II: If the energy from RESs, $(P_{\mathrm{PV}}(t) + P_{W}(t))$, is insufficient to meet CL $P_{{\rm {CL}}\_{}D}(t)$.
   1. Dispatches from battery, $P_{B}(t)$, diesel generator, $P_{\mathrm{DG}}(t)$, and grid, $P_{G}(t)$, will strictly be utilized in this sequence to meet CL $P_{{\rm {CL}}\_{}D}(t)$.
   2. The NCL $P_{{\rm {NCL}}\_{}D}(t)$ will never be met using DG, $P_{\mathrm{DG}}(t)$, or grid import, $P_{G}(t)$.
   3. Under this condition, the battery will not be charged and no grid export is possible.

Fig. 4. DT approach-based DEMS (D-DEMS) for the microgrid in Fig. 1.

In the case of an islanded microgrid operation, when the battery energy storage system and the diesel generator were utilized to their maximum (SOC $_{\min }$ and $P_{\mathrm{ DGmax}})$ but fell short of meeting the full CL demand, a concept of battery storage reserve capacity was introduced to supply the unmet CL. The energy storage unit was allowed to discharge to a new minimum SOC level (SOC $_{\rm min\_{}reserve})$, thus meeting 100% of CL at all times.

The following sign convention is utilized in this paper: 1) a positive $P_{B}(t)$ represents discharging; 2) a negative $P_{B}(t)$ represents charging; 3) a positive $P_{G}(t)$ represents import; and 4) a negative $P_{G}(t)$ represents export.
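
The two dispatch cases and the sign convention above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. The function name `d_dems_dispatch`, the `Dispatch` container, and the battery capacity `cap_kwh` are hypothetical; the SOC limits and the 10-kW DG rating are the values quoted elsewhere in this paper. As simplifications, the sketch exports surplus whenever the microgrid is grid-connected (no operator request is modeled), leaves the NCL unserved in Case II, and omits the reserve SOC level.

```python
from dataclasses import dataclass


@dataclass
class Dispatch:
    p_cl_s: float   # power supplied to the CL (kW)
    p_ncl_s: float  # power supplied to the NCL (kW)
    p_b: float      # battery power: positive = discharging, negative = charging
    p_dg: float     # diesel generator power (kW)
    p_g: float      # grid power: positive = import, negative = export


def d_dems_dispatch(p_pv, p_w, p_cl_d, p_ncl_d, soc, *,
                    soc_min=0.30, soc_sp=0.65, cap_kwh=60.0,
                    p_dg_max=10.0, grid_connected=True, dt_h=1 / 60):
    """One-minute rule-based dispatch; cap_kwh is a placeholder value."""
    p_res = p_pv + p_w
    if p_res >= p_cl_d:
        # Case I: CL first, then battery up to the set point, then NCL, then export.
        surplus = p_res - p_cl_d
        charge_room_kw = max(0.0, (soc_sp - soc) * cap_kwh / dt_h)
        p_chr = min(surplus, charge_room_kw)
        surplus -= p_chr
        p_ncl_s = min(surplus, p_ncl_d)
        surplus -= p_ncl_s
        p_export = surplus if grid_connected else 0.0
        return Dispatch(p_cl_d, p_ncl_s, -p_chr, 0.0, -p_export)
    # Case II: battery, then DG, then grid import, strictly in this order.
    deficit = p_cl_d - p_res
    discharge_room_kw = max(0.0, (soc - soc_min) * cap_kwh / dt_h)
    p_dis = min(deficit, discharge_room_kw)
    deficit -= p_dis
    p_dg = min(deficit, p_dg_max)
    deficit -= p_dg
    p_import = deficit if grid_connected else 0.0
    deficit -= p_import
    # Any remaining deficit (islanded mode only) is unmet CL in this sketch.
    return Dispatch(p_cl_d - deficit, 0.0, p_dis, p_dg, p_import)
```

For example, with 30 kW of renewables, a 15-kW CL, a 5-kW NCL, and the battery already at its set point, the sketch serves both loads and exports the remaining 10 kW, which appears as a negative `p_g`.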

SECTION IV

## INTELLIGENT DYNAMIC ENERGY MANAGEMENT SYSTEM

The multiple objectives of the I-DEMS in this paper (Fig. 1) are as follows.

1. Supply the power requirements of the CLs, $P_{{\rm {CL}}\_{}D}(t)$, at all times. This provides 100% reliability with regard to power supply to CLs.
2. Maintain the battery SOC at an optimal level (defined by the operator through a set point for SOC). This supports meeting the reliability goal in 1).
3. Maximize controllable load dispatch $P_{{\rm {NCL}}\_{}D}(t)$. This means more customer satisfaction and it creates opportunities for demand-response capability.
4. Maximize the utilization of renewable energy resources, and minimize the use of DG and import/export from the grid. This means a more environmentally friendly and sustainable operation.
5. Increase battery life by maximizing battery charging or discharging for a continuous number of states (each state is a dispatch instant, every minute in this paper), thus enhancing sustainability by reducing the rate of battery replacement.

These objectives ensure that microgrid operations are self-sustainable, reliable, environmentally friendly, and technology ready for smart grid functionality. The hour-to-hour cost of operating microgrids will not be a dominant factor in the future, as communities invest in microgrid assets and as I-DEMS technology advances. Therefore, economics was not included in the cost function to be optimized dynamically. The I-DEMS framework is developed based on an ADHDP approach. The ADHDP approach is based on the combined concepts of adaptive dynamic programming and reinforcement learning [19].

### A. Adaptive Dynamic Programming and Reinforcement Learning

Dynamic programming is a very useful tool in solving optimization problems. In particular, it can easily be applied to optimal control of nonlinear systems with or without constraints on the control and state variables. However, it is often computationally difficult to run exact dynamic programming due to the backward numerical process required to find the optimal solution, as a result of the curse of dimensionality [20]. To circumvent this problem, adaptive dynamic programming was developed to approximate the cost-to-go function $J$ of dynamic programming [given in (1)] by using a $J$-function approximator, such as a neural network [19]. Inputs to this approximator are measurements of known/predicted system outputs and control inputs. This approach was first introduced in [21] and was later called ACD. ACD seeks to minimize the expected value of the cost function with respect to the control, conditioned on knowledge of the system dynamics, its state, and the probability distributions of uncertainties [22]. Several synonyms for ACD can be found in the literature, including approximate dynamic programming [19], [23], adaptive dynamic programming [24], [25], heuristic dynamic programming (HDP) [26], [27], neural dynamic programming [28], and reinforcement learning [29].

As shown in Fig. 5, this ACD architecture (HDP) consists of the critic, model, and action networks. These three networks perform the function of evaluation, prediction/estimation, and control, respectively. The model-independent HDP architectures are referred to as ADHDP [19].

Fig. 5. Forward path of HDP architecture.

ACD-based designs yield nonlinear EMS, whereas classical optimal designs typically yield linear EMS. Moreover, classical methods rely on the linear model of the system, whereas ACD approaches can be a measurement-based design [30]. The ADHDP can use the approximation capabilities of neural networks to develop optimal DEMS from measurements of available system inputs and outputs, feedback control, and reinforcement signals received, as shown in Fig. 6.

Fig. 6. ADHDP network receiving feedback for learning from the external environment/plant through a primary reinforcement signal, $U(t)$.

In this paper, ADHDP uses two neural networks: an action network (I-DEMS), which provides the control signals, and a critic network, also referred to as the FLN, which critiques the I-DEMS performance. These two networks solve the Hamilton–Jacobi–Bellman equation of optimal control [19]. The FLN approximates the cost-to-go function $J$ of Bellman’s equation of dynamic programming, which is given by $$J(t)=\sum_{i=0}^{\infty} \gamma^{i}\times U(t+i) \tag{1}$$ where $\gamma$ is the discount factor in the range [0, 1]. The utility, $U(t)$, guides the FLN in evaluating the I-DEMS performance. The I-DEMS network provides optimal control to minimize or maximize the cost-to-go function $J$.
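
To make the discounted sum concrete, the following minimal Python sketch evaluates the cost-to-go over a finite list of utility values. Truncating the infinite horizon to the available samples is an assumption of the sketch, and the function name `cost_to_go` is hypothetical.

```python
def cost_to_go(utilities, gamma):
    """Discounted sum of U(t), U(t+1), ... over a finite horizon.

    utilities[i] plays the role of U(t+i); the infinite sum is truncated
    to the samples provided (an assumption of this sketch).
    """
    j = 0.0
    # Work backwards so that each step applies J(t) = U(t) + gamma * J(t+1).
    for u in reversed(utilities):
        j = u + gamma * j
    return j
```

With `gamma = 0` the result reduces to the immediate utility `utilities[0]`; values closer to one weight later dispatch instants more heavily, which lengthens the effective horizon.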

The utility function formulation, variable scaling, discount factor selection, learning rates, and convergence of the ADHDP controller are explained in [19]. The ADHDP-based I-DEMS optimizes power dispatch to and from CL and controllable load, battery, grid, and diesel generator to achieve the multiobjectives mentioned above. The system states are the actual wind power, solar power, battery SOC and load demand values, and the grid status, as shown in Fig. 1.

Fig. 7 shows the ADHDP architecture-based I-DEMS for a smart microgrid. The I-DEMS drives the microgrid system to the desired states, and the FLN provides dynamic performance feedback to the I-DEMS with respect to achieving the desired states. I-DEMS learns a policy function, whereas the FLN learns a value function given by Bellman’s equation of dynamic programming.

Fig. 7. I-DEMS controller development framework based on the ADHDP approach.

### B. Forward-Looking Network (Critic Neural Network)

The FLN was implemented using a multilayer perceptron (MLP) feedforward neural network, as shown in Fig. 8. The inputs to the FLN consisted of the inputs (states) and outputs (power dispatches) of the I-DEMS at time instances $t$, $t-1$, and $t-2$, as well as a bias term. The five system states were the CL power requirements, NCL power requirements, SOC of the battery, PV power, and wind power (a total of 15, including the time-delayed values). The five power dispatches were the power to the CL, to the NCL, and to the battery, and the power from the diesel generator and from the grid (a total of 15, including the time-delayed values).

Fig. 8. Forward-looking neural network implementation (critic network) for providing performance feedback to the I-DEMS (action network) dispatch controller.

The input, hidden, and output layers consisted of 31 linear neurons, 30 sigmoidal neurons, and one linear neuron, respectively. The output of the FLN in the ADHDP framework was the approximated $J$, given by (1). As a rule of thumb, the number of neurons in the hidden layer should at least be the size of the inputs. The optimization of the neural network structure is not within the scope of this paper. The utility function, $U(t)$, provides the FLN with an immediate performance measure of the I-DEMS dispatch signals and is given by $$\begin{aligned} U(t) ={}& U_{\text{CL}}(t) + U_{\text{NCL}}(t) + U_{B}(t) + U_{\text{DG}}(t) + U_{G}(t) + U_{\text{BCY}}(t) \\ ={}& {w}_{1}(t)\times f(P_{\text{CL}\_{}S}(t)) + {w}_{2}(t)\times f(P_{\text{NCL}\_{}S}(t)) \\ &+ {w}_{3}(t)\times f(\text{SOC}(t)) + {w}_{4}(t)\times f(P_{\text{DG}}(t)) \\ &+ {w}_{5}(t)\times f(P_{G}(t)) + {w}_{6}\times f(N_{\text{chr}}(t) + N_{\text{dschr}}(t)) \end{aligned} \tag{2}$$ where weights ${w}_{1}(t)$, ${w}_{2}(t)$, ${w}_{3}(t)$, ${w}_{4}(t)$, ${w}_{5}(t)$, and ${w}_{6}$ are based on prioritizing the power dispatch to the CL, controllable load, battery, diesel generator, grid, and the sum of the continuous number of states that the battery is charging and discharging. The weights were carefully selected to handle the objectives of I-DEMS: 1) self-sustainability; 2) reliability; and 3) carbon emission reduction. This allows the development of smart microgrid clusters with improved reliability, efficiency, economics, and sustainability of the electricity generation and distribution. If necessary, the weights can be selected to prioritize power dispatch by considering the generation cost of each unit. However, cost consideration is not within the scope of this paper.

The utility function is very important, because it guides the critic network to improve the performance of the action network. The feedback loop allows the action network to improve its behavior over time. Algorithms 1–5 illustrate how to compute the subutilities in (2). Algorithm 6 illustrates the computation of $U(t)$. $U(t)$ was used to evaluate the microgrid operation at time $t$ with the objective of reducing the amount of unmet loads, if any, reducing the use of DG and grid generation as much as possible, and enhancing the lifecycle of the battery [31]. The operational and maintenance costs of the battery were assumed to be proportional to the number of charging and discharging cycles [32]. The utility formulation was based on the concept of penalty and reward allocation. The I-DEMS was rewarded for optimal dispatches and penalized for nonoptimal dispatches.

#### Algorithm 6 Total Utility Function Evaluating Microgrid Operation at Time $t$

The utility for evaluating the energy supplied to the CL against the CL demand is given by $$U_{\mathrm{CL}}(t) = {w}_{1}(t)\times \text{abs}\left(1-P_{{\rm {CL}}\_{}S}(t)/P_{{\rm {CL}}\_{}D}(t)\right) \tag{3}$$ where $${w}_{1}(t) = 1- ({w}_{2}(t) + {w}_{3}(t)+{w}_{4}(t)+{w}_{5}(t) +{w}_{6}(t)). \tag{4}$$ The weights ${w}_{1}(t)$, ${w}_{2}(t)$, ${w}_{3}(t)$, ${w}_{4}(t)$, ${w}_{5}(t)$, and ${w}_{6}$ were initialized to some values and then updated dynamically based on system states and dispatches, as outlined in Algorithms 1–5.
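
As a small worked illustration of the CL subutility and the complementary weight ${w}_{1}(t)$, the Python sketch below evaluates both expressions directly. The function names are hypothetical, and returning zero when there is no CL demand is an assumption made only to avoid a division by zero; the other subutilities are defined in Algorithms 1–5 and are not reproduced here.

```python
def w1_from_others(w2, w3, w4, w5, w6):
    """w1(t) is whatever remains after the other five weights are assigned."""
    return 1.0 - (w2 + w3 + w4 + w5 + w6)


def u_cl(p_cl_s, p_cl_d, w1):
    """Penalty for the fraction of CL demand that is not matched by supply."""
    if p_cl_d <= 0.0:
        # Assumption of this sketch: no CL demand means no CL penalty.
        return 0.0
    return w1 * abs(1.0 - p_cl_s / p_cl_d)
```

Using the initial weights reported in Section VI (0.05, 0.05, 0.25, 0.25, and 0.1 for ${w}_{2}$ to ${w}_{6}$), `w1_from_others` gives approximately 0.3, and supplying 18 kW of a 20-kW CL demand gives a penalty of approximately 0.03. Fully meeting the demand gives zero.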

The development of the FLN in the ADHDP framework began with a discount factor of zero, for which $J(t)$ was trained to match $U(t)$; the discount factor was then gradually increased to a value lower than one. The horizon was determined by the discount factor. During training, the objective of FLN development was to minimize (5) by updating the FLN weights with the standard backpropagation algorithm $$\sum_{t=0}^{\infty} E_{\text{FLN}}^{2}(t) \tag{5}$$ where $$E_{\text{FLN}}(t)=U(t)+\gamma \cdot J(t)-J(t-1). \tag{6}$$
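
The critic update can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' code: the class name `Critic`, the weight initialization range, and the learning rate are hypothetical, and the target $U(t)+\gamma J(t)$ is treated as a constant during the gradient step. The layer sizes follow the 31-30-1 structure described above, with the bias supplied as the last input.

```python
import math
import random


class Critic:
    """Tiny MLP: linear inputs -> sigmoid hidden units -> one linear output J."""

    def __init__(self, n_in=31, n_hid=30, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hid)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hid)]

    def forward(self, x):
        """Return (J, hidden activations) for input vector x."""
        h = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
             for row in self.w1]
        return sum(w * hj for w, hj in zip(self.w2, h)), h

    def td_step(self, x_prev, x_now, u_now, gamma, lr=0.05):
        """One backpropagation step on 0.5 * E^2, E = U(t) + gamma*J(t) - J(t-1)."""
        j_now, _ = self.forward(x_now)
        j_prev, h = self.forward(x_prev)
        err = u_now + gamma * j_now - j_prev
        for j, hj in enumerate(h):
            # Hidden-layer gradient uses the output weight before it is updated.
            g_hidden = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] += lr * err * hj
            for k, xk in enumerate(x_prev):
                self.w1[j][k] += lr * g_hidden * xk
        return err
```

Following the schedule described above, such a critic would first be trained with `gamma = 0.0`, so that its output tracks the immediate utility, before `gamma` is raised in small increments.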

### C. I-DEMS (Action Neural Network)

The I-DEMS was implemented using an MLP feedforward neural network, as shown in Fig. 9. The I-DEMS learned its dispatch policy iteratively using the feedback from the FLN, which was used to update the weights of the MLP. The number of neurons in the input layer is equal to the number of input features. The inputs to the I-DEMS were composed of five system states at time instances $t$, $t-1$, and $t-2$ (a total of 15, including the time-delayed values); the five power dispatch set points at time instance $t-1$, $P(t-1)$; and a bias term.

Fig. 9. Action network implemented using a neural network.

The five system states were the CL power requirements, NCL power requirements, SOC of the battery, PV power, and wind power. The input and hidden layers of the I-DEMS consisted of 21 linear neurons and 50 sigmoidal neurons, respectively. The output layer of the I-DEMS (five linear neurons) provides the power dispatches to the CLs, NCLs, and battery, and the power dispatched from the diesel generator and the grid.
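
The input bookkeeping (15 state values, 5 previous dispatches, and a bias, for 21 inputs) and the 21-50-5 forward pass can be sketched as below. The helper names, the ordering of the inputs, and the random initialization are assumptions made for illustration; the source does not specify them.

```python
import math
import random

N_STATES = 5     # CL demand, NCL demand, SOC, PV power, wind power
N_DISPATCH = 5   # CL, NCL, battery, DG, grid
N_HIDDEN = 50
N_INPUTS = 3 * N_STATES + N_DISPATCH + 1  # 21, including the bias


def build_input(states_t, states_t1, states_t2, dispatch_t1):
    """Concatenate states at t, t-1, t-2, dispatches at t-1, and a bias of 1."""
    x = list(states_t) + list(states_t1) + list(states_t2) + list(dispatch_t1) + [1.0]
    if len(x) != N_INPUTS:
        raise ValueError("expected 5 values per state/dispatch vector")
    return x


def init_weights(seed=0):
    """Small random weights; the range is a placeholder choice."""
    rng = random.Random(seed)
    w_hid = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)]
             for _ in range(N_HIDDEN)]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
             for _ in range(N_DISPATCH)]
    return w_hid, w_out


def forward(w_hid, w_out, x):
    """Sigmoid hidden layer followed by five linear outputs (the dispatches)."""
    h = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
         for row in w_hid]
    return [sum(w * hj for w, hj in zip(row, h)) for row in w_out]
```

In practice the states and dispatches would be scaled before being fed to the network; the scaling used by the authors is described in [19] and is not reproduced here.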

The initial development of the I-DEMS is based on mimicking the rule set of the D-DEMS using supervised learning, as shown in Fig. 10 (block diagram).

Fig. 10. The initial development of the I-DEMS using supervised learning approach based on the D-DEMS’s dispatch signals.

The power dispatched by the I-DEMS may result in a small degree of power imbalance between the load dispatched and the sources dispatched. The power dispatched to the loads was the sum of CL and NCL. For an islanded microgrid, the power dispatched from the sources was the sum of RESs, battery, and DG, while for the grid-connected microgrid, that value was the sum of RESs, battery, DG, and import from the grid. Any imbalance was addressed by implementing the following checks at the output of the I-DEMS.

1. The power dispatched to the CL and NCL does not exceed the CL and NCL demands, respectively.
2. The power dispatched to charge the battery does not exceed the sum of energy from RESs, and the power dispatched to discharge the battery does not exceed the total load demand.
3. The SOC of the battery has a minimum and maximum threshold, SOC $_{\min }$ and SOC $_{\max }$, respectively.
4. The power dispatched by the DG does not exceed its maximum capacity and is not negative.
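
The four output checks can be expressed as simple clamping operations. The sketch below is one possible reading of them, using the sign convention of this paper (positive battery power is discharging). The function names are hypothetical, the zero lower bound on load dispatch is an assumption, and the SOC limits and DG rating are the values quoted elsewhere in the paper.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_output_checks(p_cl_s, p_ncl_s, p_b, p_dg, *, p_cl_d, p_ncl_d,
                        p_res, soc, soc_min=0.30, soc_max=1.00, p_dg_max=10.0):
    """Return (p_cl_s, p_ncl_s, p_b, p_dg) after the four checks."""
    # Check 1: load dispatch cannot exceed the corresponding demand.
    p_cl_s = clamp(p_cl_s, 0.0, p_cl_d)
    p_ncl_s = clamp(p_ncl_s, 0.0, p_ncl_d)
    # Check 2: charging (negative) is bounded by the RES output,
    # discharging (positive) by the total load demand.
    p_b = clamp(p_b, -p_res, p_cl_d + p_ncl_d)
    # Check 3: respect the SOC thresholds.
    if soc >= soc_max:
        p_b = max(p_b, 0.0)   # battery is full, no further charging
    if soc <= soc_min:
        p_b = min(p_b, 0.0)   # battery is at its minimum, no further discharging
    # Check 4: DG output stays within [0, P_DGmax].
    p_dg = clamp(p_dg, 0.0, p_dg_max)
    return p_cl_s, p_ncl_s, p_b, p_dg
```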

The microgrid studied herein operates in islanded and grid-connected modes. The power balance rules for each of these modes are described as follows.

1. Power balance for islanded microgrid operation is as follows.
   1. If the sum of energy from RESs exceeds the total load dispatched, diesel operation is restricted.
   2. If the total load dispatched exceeds the sum of energy from RESs, battery discharging and DG are strictly utilized in this sequence. The battery supplies the imbalance if its SOC($t$) is greater than SOC $_{\min }$.
   3. With the introduction of the battery reserve capacity, the battery attains a minimum SOC of SOC $_{\rm min\_{}reserve}$ only if the energy storage device and the diesel generator are utilized to SOC $_{\min }$ and $P_{\mathrm{ DGmax}}$, respectively, and cannot meet the CL power demand.
   4. If the sum of energy from RESs exceeds the total load dispatched, battery charging occurs. The battery consumes any imbalance if SOC($t$) is less than SOC $_{\max }$.
2. Power balance for grid-connected microgrid operation is as follows.
   1. If the sum of RESs exceeds the total load dispatched, grid import and diesel operation are restricted.
   2. Power exported to the grid does not exceed the sum of RESs, and power imported from the grid does not exceed the total load demand.
   3. If the load dispatched exceeds the sum of RESs, grid export is restricted.
   4. If the total load dispatched exceeds the sum of energy from RESs, battery discharging, DG, and grid import are strictly utilized in this sequence. The battery supplies any imbalance if its SOC($t$) is greater than SOC $_{\min }$.
   5. If the sum of energy from RESs exceeds the total load dispatched, battery charging and grid export occur strictly in this sequence. The battery consumes any imbalance if SOC($t$) is less than SOC $_{\max }$.
SECTION V

## DYNAMIC OPTIMIZATION OF THE I-DEMS

The development of the optimal energy dispatch controller in the ADHDP framework was an iterative process in which the utility and cost-to-go functions decreased over time, thus improving the performance of the I-DEMS. In this ongoing dynamic optimization process, if the total utility [given by (2)] for evaluating the microgrid’s operation increased, then the I-DEMS weight updates were rejected, and the previous best weights were retained. The best I-DEMS weights were always used for power dispatches. The objective of the dynamic optimization of the I-DEMS was to minimize $J(t)$ overall, thereby minimizing the sum of all $U(t)$ values over the horizon of the operation. The constant $\partial J(t)/\partial J(t)= 1$ was backpropagated through the FLN to obtain $\partial J(t)/\partial P(t)$ in order to adjust the dispatch, $P(t)$, so as to minimize $J(t)$. A feedback error from the FLN to the I-DEMS was computed using (7), and the I-DEMS weights were updated accordingly $${Err}_{\text{DEMS}}(t)=\frac{\partial J(t)}{\partial P(t)}. \tag{7}$$
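
The feedback error in (7) is the sensitivity of the critic output to its dispatch inputs. For a critic with one sigmoid hidden layer and a linear output, it can be obtained by backpropagating a unit signal from the output to the inputs, as in the Python sketch below. The function names and the tiny example weights are hypothetical; which input positions hold $P(t)$ depends on how the input vector is ordered, which the source does not specify.

```python
import math


def critic_forward(w1, w2, x):
    """Return (J, hidden activations) for a sigmoid-hidden, linear-output MLP."""
    h = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
         for row in w1]
    return sum(w * hj for w, hj in zip(w2, h)), h


def dj_dinputs(w1, w2, x):
    """Backpropagate dJ/dJ = 1 through the critic to get dJ/dx for every input."""
    _, h = critic_forward(w1, w2, x)
    # Signal arriving at each hidden unit's pre-activation.
    delta = [w2[j] * h[j] * (1.0 - h[j]) for j in range(len(w2))]
    return [sum(delta[j] * w1[j][k] for j in range(len(w2)))
            for k in range(len(x))]


def dispatch_error(w1, w2, x, dispatch_idx):
    """Pick out the entries of dJ/dx that correspond to the dispatch P(t)."""
    grad = dj_dinputs(w1, w2, x)
    return [grad[k] for k in dispatch_idx]
```

The values returned by `dispatch_error` would then be propagated further back through the action network to adjust its weights in the direction that lowers $J(t)$.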

The dynamic optimization of the I-DEMS based on ADHDP develops one optimal DEMS over time. In order to enhance the speed of convergence for finding the optimal or near-optimal policy at every dispatch time, the concept of a modified evolutionary algorithm was combined with ADHDP. The enhanced I-DEMS framework consisted of multiple I-DEMSs, as shown in Fig. 11; this architecture enhances the speed of convergence and dispatches near-optimal controls. In order to implement the evolutionary I-DEMS, the selection operator used in conventional evolutionary computing was borrowed, but the mutation operator was replaced by the ADHDP FLN feedback to produce offspring.

Fig. 11. Enhanced and fast I-DEMS framework to achieve near-optimal solution. This framework uses a modified evolutionary learning algorithm.

Three parent I-DEMS controllers (including the default I-DEMS) were initialized, and each parent produced an offspring I-DEMS using the feedback from the FLN (Figs. 7 and 9). The parent default DEMS controller is the controller that resulted from the offline ADHDP process discussed in Section IV. Learning continues while the I-DEMS operates online; however, in the online evolutionary optimization of the DEMS (Fig. 11), the default DEMS is always present to ensure that the power dispatch is near-optimal and meets the CL demand. In other words, a reliable controller is present at all times. At time instant $t = 3$, the I-DEMS yielding the lowest utility value (out of what was by then a total of six I-DEMSs) was selected to dispatch optimal energy to the microgrid system and to update the FLN. The population of parent I-DEMSs was reduced to three before the next dispatch instance (one default I-DEMS and the two others with the lowest utility values selected from the evolutionary process). Two additional checks were incorporated into the operation. First, it was likely that the best two I-DEMSs would have converged to the same weight set, resulting in equal utility values. Therefore, the weights of one of these controllers were perturbed by adding a random number in the range [−0.01, 0.01]. This approach tries to overcome any local minimum problem that may occur. The second check verified that the selected dispatch adhered to the objective of the I-DEMS, which was to ensure that the power supplied to the CL met 100% of the CL power demand. When the total utility, $U(t)$, was reduced but the CL dispatch did not meet the total CL demand, the parent I-DEMS and the critic network weight updates were rejected, and the previous best weights were retained.
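
One dispatch step of this selection scheme might look like the sketch below. It is a simplification under stated assumptions: controllers are plain weight lists, `make_offspring` stands in for the FLN-feedback update, `utility` and `meets_cl` are caller-supplied hypothetical callables, and candidates failing the CL check are simply filtered out rather than triggering the weight rollback described in the text.

```python
import random

def evolve_step(default, others, make_offspring, utility, meets_cl, rng=random):
    """Pick the dispatching controller and the two non-default parents for the next step."""
    parents = [default] + list(others)                     # three parents
    pool = parents + [make_offspring(p) for p in parents]  # six candidates in total
    # Second check: only candidates that fully serve the CL may dispatch.
    # Falling back to the default controller is an assumption of this sketch.
    feasible = [c for c in pool if meets_cl(c)] or [default]
    dispatcher = min(feasible, key=utility)                # lowest utility dispatches
    # Keep the two lowest-utility candidates alongside the always-present default.
    ranked = sorted((c for c in feasible if c is not default), key=utility)
    survivors = ranked[:2]
    while len(survivors) < 2:                              # pad if too few are feasible
        survivors.append(list(default))
    # First check: if the two survivors share a weight set, perturb one of them.
    if survivors[0] == survivors[1]:
        survivors[1] = [w + rng.uniform(-0.01, 0.01) for w in survivors[1]]
    return dispatcher, survivors

# Toy usage: utility is the squared norm, offspring halve the parent weights.
best, nxt = evolve_step(
    default=[1.0, 1.0],
    others=[[2.0, 0.0], [0.0, 3.0]],
    make_offspring=lambda w: [0.5 * v for v in w],
    utility=lambda w: sum(v * v for v in w),
    meets_cl=lambda w: True,
)
print(best, nxt)
```

Keeping the default controller outside the ranked list mirrors the statement that it is always present, so the population for the next step is the default plus the two returned survivors.
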

SECTION VI

## RESULTS AND DISCUSSION

### A. Initialization of Parameters

In this section, the performance of I-DEMS is compared with that of the D-DEMS. In this paper, the initial I-DEMS was developed using the first day’s data and an initial battery SOC of 35%, and the DEMS performances were evaluated using the second day’s data (Figs. 2 and 3) and under different battery conditions. The maximum, minimum, and set point SOC of the battery were 100%, 30%, and 65%, respectively. The initial weights ${w}_{1\rm ini}$, ${w}_{2\rm ini}$, ${w}_{3\rm ini}$, ${w}_{4\rm ini}$, ${w}_{5\rm ini}$, and ${w}_{6\rm ini}$ in the utility functions were 0.3, 0.05, 0.05, 0.25, 0.25, and 0.1, respectively, and were dynamically updated during microgrid operation. An example of the dynamic change in the weight ${w}_{4}(t)$ for a given set of operating conditions is shown in Fig. 12. The variation of ${w}_{4}(t)$ changed the equation for computing $U_{\mathrm{ DG}}(t)$, as shown in Algorithm 2. As can be observed, from 12:00 A.M. to 8:30 A.M., ${w}_{4}(t)$ dropped from its initial value (0.25) to zero many times. At these instances, the power from renewable resources and the battery was unable to meet the CL demand. Therefore, following Algorithm 2, the value of ${w}_{4}(t)$ was set to zero to relax the utility function and allow maximum dispatch of the diesel generator to meet the CL.

Fig. 12. Dynamic changes of ${w}_{4}(t)$ for determining the subutility function $U_{\mathrm{ DG}}(t)$.

A charge or a discharge was counted only when the battery charged or discharged, respectively, continuously for three successive dispatches. The look-ahead horizon of ten steps, which was equivalent to 10 min in this paper, was determined using the discount factor of 0.8.
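
The paper does not state the exact criterion linking the discount factor to the horizon. One plausible reading, offered here only as an assumption, is that utilities more than ten steps ahead carry little weight once discounted by 0.8 per step. The helper names below are hypothetical.

```python
def discount_weight(gamma, k):
    """Weight applied to the utility k steps ahead under discount factor gamma."""
    return gamma ** k

def captured_share(gamma, horizon):
    """Share of the infinite discounted sum covered by steps 0 .. horizon-1.

    sum_{k<h} gamma^k / sum_{k>=0} gamma^k = 1 - gamma^h for 0 < gamma < 1.
    """
    return 1.0 - gamma ** horizon

gamma, horizon = 0.8, 10
print(round(discount_weight(gamma, horizon), 4))  # 0.1074
print(round(captured_share(gamma, horizon), 4))   # 0.8926
```

Under this reading, a utility ten steps ahead is weighted at roughly 11% of the current one, and the first ten steps account for about 89% of the total discounted weight.
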

### B. Performance Index

To the best of our knowledge, there is no standard PI for evaluating the dispatch of a microgrid DEMS. Therefore, a PI in (8) was developed to compare the I-DEMS and D-DEMS according to the objectives of this paper. The numerator of the PI is the ratio of renewable energy utilization to the total energy dispatched from all generation sources. It equals one when all of the CL and controllable load requirements are met without the use of energy from the diesel generator, the grid, or the battery. The denominator of the PI is the ratio of NCL demand to the NCL met. The denominator equals one when all of the NCL is met. Thus, the maximum and ideally desirable value of (8) is 1. The higher the value of the PI, the better the DEMS is in its dispatch. For ease of comparison, a normalized PI is obtained by dividing the PIs of the I-DEMS and D-DEMS by the PI of the D-DEMS

$$\mathrm{PI} = \frac{\dfrac{\sum \left(P_{\mathrm{PV}}(t) + P_{W}(t)\right)}{\sum \left(P_{\mathrm{PV}}(t) + P_{W}(t) + P_{\mathrm{DG}}(t) + P_{B\_\mathrm{dschr}}(t) + P_{G\_\mathrm{import}}(t)\right)}}{\dfrac{\sum P_{\mathrm{NCL}\_D}(t)}{\sum P_{\mathrm{NCL}\_S}(t)}}. \tag{8}$$

This PI evaluates two aspects. First, it indicates how much renewable energy was utilized to meet the load demand. Second, it indicates how much of the controllable load was met.

To consider battery life performance, a modified performance index considering battery life (PI-LF), given in (9), was developed to introduce a reward to the I-DEMS depending upon the charge and discharge performance of the battery

$$\text{PI-LF} = \frac{\dfrac{\sum \left(P_{\mathrm{PV}}(t) + P_{W}(t)\right)}{\sum \left(P_{\mathrm{PV}}(t) + P_{W}(t) + P_{\mathrm{DG}}(t) + \alpha \cdot P_{B\_\mathrm{dschr}}(t) + P_{G\_\mathrm{import}}(t)\right)}}{\dfrac{\sum P_{\mathrm{NCL}\_D}(t)}{\sum P_{\mathrm{NCL}\_S}(t)}} \tag{9}$$

where

$$\alpha = \frac{N_{\mathrm{chr-DDEMS}} + N_{\mathrm{dschr-DDEMS}}}{N_{\mathrm{chr-IDEMS}} + N_{\mathrm{dschr-IDEMS}}}$$

and $N_{\mathrm{dschr}}$ and $N_{\mathrm{chr}}$ are the numbers of discharges and charges, respectively.
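
The two indices in (8) and (9) differ only in the factor $\alpha$ applied to the battery discharge term, so a single function can cover both. The sketch below assumes each argument is a sequence of per-dispatch power values over the evaluation period; the function names and the sample numbers are hypothetical and are not data from the paper.

```python
def performance_index(pv, wind, dg, batt_dschr, grid_import,
                      ncl_demand, ncl_supplied, alpha=1.0):
    """PI of (8) when alpha == 1.0; PI-LF of (9) for any other alpha."""
    renewable = sum(pv) + sum(wind)
    total = renewable + sum(dg) + alpha * sum(batt_dschr) + sum(grid_import)
    numerator = renewable / total                     # renewable share of all generation
    denominator = sum(ncl_demand) / sum(ncl_supplied) # NCL demand relative to NCL met
    return numerator / denominator

def alpha_factor(n_chr_ddems, n_dschr_ddems, n_chr_idems, n_dschr_idems):
    """Ratio of D-DEMS to I-DEMS charge-plus-discharge counts, as defined after (9)."""
    return (n_chr_ddems + n_dschr_ddems) / (n_chr_idems + n_dschr_idems)

def normalized_pi(pi_value, pi_ddems):
    """PI expressed per unit of the D-DEMS PI."""
    return pi_value / pi_ddems

# Made-up two-step profile (kW), for illustration only.
series = dict(pv=[10, 10], wind=[5, 5], dg=[0, 0], batt_dschr=[5, 5],
              grid_import=[0, 0], ncl_demand=[8, 8], ncl_supplied=[8, 8])
print(performance_index(**series))                                   # PI
print(performance_index(**series, alpha=alpha_factor(6, 6, 3, 3)))   # PI-LF
```

Note that the function divides by the supplied NCL, so it is undefined when no controllable load is served at all; a caller would need to handle that case separately.
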

### C. Case Studies

Microgrid operation was alternated between grid-connected and islanded modes every 5 min. This study was carried out for a 4000-kW-min battery with initial SOC levels of 35%, 60%, and 90%. The performance of the I-DEMS on its second day of operation is tabulated in Tables I–III. The total PV and wind generation on the second day were 206.88 and 208.75 kWh, respectively. The total CL and controllable load requirements were 303.41 and 245.37 kWh, respectively.

Table I Comparison of D-DEMS and I-DEMS Controllers’ Performance on the Second Day of Operation With a 35% Battery $\mathrm{SOC}_{\mathrm{ initial}}$
Table II Comparison of D-DEMS and I-DEMS Controllers’ Performance for the Second Day of Operation With a 60% Battery $\mathrm{SOC}_{\mathrm{ initial}}$
Table III Comparison of D-DEMS and I-DEMS Controllers’ Performance for the Second Day of Operation With a 90% Battery $\mathrm{SOC}_{\mathrm{ initial}}$

From Tables I–III, both DEMSs were able to meet 100% of the CL power requirements from RESs, thus satisfying the most important objective of supplying the CL completely. Moreover, the I-DEMS outperformed the D-DEMS in terms of the power dispatched to the controllable load, which means more customer satisfaction, as well as in the performance of the battery storage device. Increments of 5%–7% in supplying the controllable load were observed with the I-DEMS, at the expense of a small amount of diesel and grid dispatches. Furthermore, the I-DEMS was able to minimize the utilization of DG and grid generation as well, thus meeting the goal of self-sustainability. On the other hand, the D-DEMS used less diesel and grid energy at the expense of reduced battery life and lower satisfaction of the controllable load requirement.

As can be observed, in all cases, the dynamically optimized I-DEMS performed better: it met 100% of the varying CL demand from renewable energy sources, met more of the controllable load, kept transactions with the diesel generator and the grid minimal, and maintained a mean SOC level while extending the battery life.

Table IV shows the performance of the I-DEMS for 1 min of dispatch, with and without the evolutionary strategy. The evolutionary strategy was able to enhance the performance by 0.11%, at an increased computational cost of 80 ms. Thus, it is expected that the evolutionary algorithm with three parents will find a better I-DEMS over a longer period of time than with a single I-DEMS. Furthermore, knowing that the scheduled dispatches are for the following minute and the necessary computation time is <1 s, it is possible to increase the population size of the evolutionary strategy and still implement the I-DEMS in real time. With a foreseen implementation on a graphics processing unit platform, it is expected that performance can be significantly improved. This remains to be investigated as part of future work.

Table IV Comparison of Performance (for 1 min of Dispatch) of the I-DEMS Controller With and Without the Modified Evolutionary Algorithm-Based Learning and the D-DEMS Controller

The energy dispatched by the D-DEMS and I-DEMS for an initial SOC of 35% on the second day of operation is shown in Figs. 13–15.

Fig. 13. Comparison of load dispatch of the D-DEMS and I-DEMS controllers. (a) D-DEMS (blue line) and I-DEMS (red line) meet the CL demand for the entire day. (b) I-DEMS has a smooth switching behavior of the controllable load, whereas the D-DEMS is drastic in its switching behavior of the controllable load.
Fig. 14. Comparison of generation dispatch of the D-DEMS (blue line) and I-DEMS (red line) controllers. (a) Both controllers have a similar diesel generator usage. (b) I-DEMS has a slightly greater usage (positive kilowatts represent import) of grid power.
Fig. 15. Battery dispatch comparison between D-DEMS and I-DEMS.

Fig. 15 shows that the battery power dispatched with the I-DEMS was less than 25 kW at any time, compared with over 35 kW of battery power dispatched by the D-DEMS. Furthermore, the battery life cycle has clearly been improved in the I-DEMS owing to fewer charges and discharges, thus reducing operation and maintenance costs.

The performance of both DEMSs for a 3000-kW-min battery with various initial SOCs, from 35% to 95%, was studied and is shown in Fig. 16. As can be observed, the I-DEMS always performed better than the conventional D-DEMS. The performance of the I-DEMS rose smoothly and continuously as the initial SOC increased. However, the D-DEMS did not perform well until the battery SOC level reached a certain percentage. Moreover, the I-DEMS achieved a better performance (PI-LF) in terms of battery life cycle, which allows the battery to last longer and avoids additional operation and maintenance cost.

Fig. 16. Performance comparison of DEMSs (3000 kW-min).

The normalized PI was used to compare the performance of the I-DEMS with that of the D-DEMS. Fig. 17 shows the normalized PI for different battery sizes and initial SOCs. For all combinations studied, the I-DEMS resulted in a normalized PI greater than one per unit. In other words, the I-DEMS provides better performance under different operating conditions, independent of battery size and initial SOC.

Fig. 17. Normalized PI [PI using (8)] for different battery SOCs and sizes.
SECTION VII

## CONCLUSION

The development of an I-DEMS for smart microgrid operation has been presented. The ADHDP approach, based on combined concepts of adaptive dynamic programming and reinforcement learning, was utilized to evolve an optimal control policy and an approximate cost-to-go function for microgrid operation, with variable and uncertain renewable energy generation and varying CL and controllable load profiles. A modified evolutionary computing learning approach was introduced to speed up convergence toward a near-optimal control policy and cost-to-go function during online operation, as it becomes necessary to enhance performance. The performance of the I-DEMS was compared with that of a DEMS developed using a DT-based approach under seen and unseen operating conditions. The I-DEMS, while satisfying the primary goal of meeting 100% of the CL demand requirements, still managed to improve the energy dispatched to controllable loads, and its dispatch strategy extended the life cycle of the battery. This means that microgrids of the future can be managed intelligently to be self-sustainable, reliable, and environmentally friendly.

Future work will focus on extending the I-DEMS framework to include dynamic state prediction, and on carrying out a real-time implementation to coordinate the set-point active power dispatches with reactive power controls.

## Footnotes

This work was supported in part by NEC Laboratories America Inc., and the National Science Foundation under Grant ECCS 1232070, Grant ECCS 1308192, and Grant IIP 1312260.
