A Fast and Optimal Smart Home Energy Management System: State-Space Approximate Dynamic Programming

Dynamic programming (DP) can be used to generate optimal schedules for a smart home energy management system (SHEMS); however, it is computationally expensive because it must loop over all possible states, decisions and outcomes. This paper proposes a novel state-space approximate dynamic programming (SS-ADP) approach that solves a SHEMS problem quickly while producing solutions similar to DP. The state-space approximations are made using a hierarchical approach that involves clustering and machine learning. The proposed SS-ADP generates the day-ahead value functions quickly without compromising solution quality because it loops only over the necessary state-space. Our simulation results show that the solutions from the SS-ADP approach are within 0.8% of the optimal DP solutions while reducing the computational time by at least 20%. The paper also presents a fast real-time control strategy under uncertainty using the Bellman optimality condition and long short-term memory recurrent neural networks (LSTM-RNN). The Bellman equation uses the day-ahead value function from the SS-ADP and the instantaneous contribution function to make fast real-time decisions. The instantaneous contribution is calculated using the PV output and load predicted by the LSTM-RNN, which performs significantly better than the widely used persistence method.


I. INTRODUCTION
A smart home is an automated residential building that uses distributed energy resources to manage energy consumption and provide suitable levels of comfort to its inhabitants. The number of smart homes with photovoltaic (PV) and battery storage systems has increased dramatically in parts of the USA, China, Australia, New Zealand and Europe, in response to rising electricity costs, government incentives, decreasing installation costs, and growing concerns about climate change.
In order to maximize the benefits of PV-storage systems, residential energy users will use a smart home energy management system (SHEMS) to schedule their energy use.
The associate editor coordinating the review of this manuscript and approving it for publication was Guangya Yang .
A SHEMS consists of day-ahead planning and real-time decision-making stages, as depicted in Fig. 1. The day-ahead planning stage involves solving a stochastic sequential decision-making problem using day-ahead predictions of PV generation, load and electricity prices.
This underlying optimization problem can be solved using deterministic and stochastic mixed-integer linear programming (MILP) [1]-[9], particle swarm optimization (PSO) [10]-[14], dynamic programming (DP) [9], [15]-[17], approximate dynamic programming (ADP) [15], [16], [18]-[23] and policy function approximations using machine learning [24]. Solving the day-ahead planning problem using MILP or PSO means that we have to re-solve a difficult optimization problem during the real-time decision-making process whenever uncertainties regarding PV, load and price arise, which requires high computational power. Moreover, these off-the-shelf solvers might not return a solution for some constraint sets. For example, the SHEMS parameters that are set by the user can result in constraints to the optimization problem that have no feasible solution.
On the other hand, DP and ADP generate value functions and value function approximations, respectively, in the day-ahead planning stage, which lets the user make fast real-time decisions using the Bellman optimality condition, a much faster process than solving a difficult optimization problem. However, DP is computationally demanding due to the dimensionality of the state, decision and outcome spaces, and forward ADP results in slightly lower quality solutions. Note that DP yields close-to-optimal solutions when the state, decision and outcome spaces are finely discretized; moreover, we can consider non-linear constraints such as battery degradation and efficiencies without any additional computational burden.
A SHEMS should be computationally inexpensive so that it can be integrated into an existing smart meter or a Raspberry Pi board. Implementing a SHEMS in the cloud would make computational speed less important; however, since the computational time of DP increases exponentially as the number of devices increases [16], it is beneficial either way to keep the computational cost to a minimum.
Given these insights, we first propose a hierarchical approach for state-space approximate dynamic programming (SS-ADP) to generate the value functions quickly without affecting the solution quality. The trick here is to approximate the state-space of the problem so that the DP only needs to loop over the necessary states. The approach proceeds as follows: First, a historical dataset of battery state-of-charge (SoC) profiles is generated under different PV generation, electrical load and price values using a deterministic solver. Second, this data is clustered into different groups according to the daily battery SoC pattern. Third, the correct cluster is identified before the day-ahead optimization using machine learning so that the modified state-space can be generated. A machine learning-based state-space approximation has been proposed in [25] to solve an energy management problem of a power plant, which uses artificial neural networks for predicting the state-space.
The real-time decision-making stage, based on the Bellman optimality condition, proceeds as follows: First, the value function at a given time-step from the SS-ADP is combined with the instantaneous contribution function, which is generated using the real-time forecasts of PV and load. Second, the decisions corresponding to the maximum value of this combined function are selected as the optimal decisions. This paper also proposes to forecast PV and load using long short-term memory recurrent neural networks (LSTM-RNN), which perform significantly better than the widely used persistence forecasts.
The PV output and electrical load profiles used in this paper are from data [26] collected during the Smart Grid Smart City (SGSC) project by Ausgrid in New South Wales, Australia, which investigated the benefits and costs of implementing a range of smart grid technologies in Australian households [27].
Section II presents the problem formulation. The proposed SS-ADP approach is in Section III. The modeling procedure of the day-ahead SHEMS problem is presented in Section IV. Section V estimates the real-time PV and load using LSTM-RNN. The real-time control strategy using the Bellman optimality condition is explained in Section VI. Simulation results and the discussion are in Section VII. Section VIII concludes the paper.

II. SMART HOME ENERGY MANAGEMENT PROBLEM
In Section II-A, we present the general formulation of the sequential stochastic optimization problem and then Section II-B formulates our SHEMS problem in this form.

A. GENERAL SEQUENTIAL STOCHASTIC OPTIMISATION PROBLEM
A sequential stochastic optimization problem consists of:
• A decision horizon: K and k represent the total number of time-steps and a particular time-step in the decision horizon, respectively.
• State variables: these contain the information that is necessary and sufficient to make the decisions and compute rewards, costs and transitions.
• Transition functions: the compact form of the transition functions is given as s_{k+1} = s^M(s_k, x_k, ω_k). Note that we only need transition functions for the controllable devices, and the combined random variable vector of the non-controllable inputs is given by ω_k (without a superscript).
• An objective function: where C_k(s_k, x_k, ω_k) is the contribution (i.e. a reward, an energy cost, or a discomfort penalty) incurred at time-step k, which accumulates over time.

B. INSTANTIATION
The aim of the SHEMS is to minimize energy costs over a decision horizon. In general, stochastic optimization techniques solve this sequential stochastic optimization problem before the beginning of each day, using either a one-day or a two-day decision horizon. Here we consider a smart home with a PV-battery system, as depicted in Fig. 2; however, the algorithm developed in Section III can be used for smart homes with other distributed energy resources. An effective SHEMS needs to incorporate variations in electricity price, electrical load and PV output in the optimization. We model the stochastic variables using their means as state variables and their variations as random variables. The electricity price is assumed to be available in the form of time-of-use pricing.
Now we cast our SHEMS problem as a sequential stochastic optimization problem according to Section II-A as follows. The daily decision horizon is a 24 hour period divided into 30 minute intervals, giving K = 48 time-steps; the two-day decision horizon is handled similarly. We choose the 30 minute resolution as it matches the day-ahead market resolution, and because the PV generation and load data are only available at 30 minute intervals.
The non-controllable inputs are the electrical load, PV output and electricity price, which are represented using:
• State variables for the mean electrical load, s_k^{d,e}, mean PV output, s_k^{pv}, and electricity tariff, s_k^p.
• Random variables for the variations in electrical load, ω_k^{d,e}, and variations in PV output, ω_k^{pv}.
The controllable device is the battery, which is represented using:
• State variables for the battery SoC, s_k^b.
• Control variables for the charge and discharge rates of the battery, x_k^b.
Given this, the state, decision and random variables, s_k, x_k and ω_k, are defined for each time-step, k, in the decision horizon, as depicted in Fig. 2.
• The energy balance constraint is given by: where: µ^i is the efficiency of the inverter (note that the efficiency is 1/µ^i when the battery is charging); µ^b is the efficiency of the battery action corresponding to either charging or discharging; and x_k^g is the electrical grid power. The charge rate of the battery is constrained by the maximum charge rate, x_k^{b+} ≤ γ^c (i.e. charging is positive), and the discharge rate of the battery is constrained by the maximum discharge rate, where l^b(s_k^b) models the self-discharging process of the battery. The battery SoC transition function is a non-linear function of the state. The discharge efficiency of the battery, the efficiency of the inverter and the battery degradation constraints are non-linear and are given in Fig. 3 and [16]. The number of cycles that a battery can sustain decreases exponentially as the depth-of-discharge increases [28].
The remaining device characteristics are as follows: the charging efficiency of the battery is µ^{b+} = 1; the battery capacity is 10 kWh; and the maximum charge and discharge rates of the battery are 4 kWh. The self-discharging loss of the battery per half-hour is 0.005 s_k^b. The battery is operated between 10% and 100% SoC.
The optimal policy, π*, is a choice of action for each state, π: S → X, that minimizes the expected sum of future costs over the decision horizon; that is: where C_k(s_k, x_k, ω_k) is the cost incurred at a given time-step, which is given by: Here the problem is formulated as an optimization of the expected contribution because the contribution is generally a random variable due to the effect of ω_k. In all the SHEMSs, we obtain the battery decisions, x_k^b, depending on the state variables.
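The battery SoC transition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 10 kWh capacity, 4 kWh maximum charge/discharge rates, 0.005 s_k^b self-discharge loss and 10%-100% operating band are taken from the text, while the constant discharge efficiency is an assumption, since the paper uses the non-linear efficiency curves of Fig. 3.

```python
def soc_transition(soc_kwh, x_b, capacity=10.0, rate_max=4.0,
                   mu_b_charge=1.0, mu_b_discharge=0.95):
    """One half-hour battery SoC transition, s_{k+1} = s^M(s_k, x_k).

    soc_kwh : battery state-of-charge in kWh
    x_b     : battery decision in kWh (positive = charge, negative = discharge)

    The constant efficiencies are assumptions for illustration only.
    """
    # clip the decision to the maximum charge/discharge rates
    x_b = max(-rate_max, min(rate_max, x_b))
    # self-discharge loss per half-hour: l^b(s^b_k) = 0.005 * s^b_k
    soc = soc_kwh - 0.005 * soc_kwh
    if x_b >= 0:
        soc += mu_b_charge * x_b      # charging (mu^{b+} = 1)
    else:
        soc += x_b / mu_b_discharge   # discharging draws extra energy
    # respect the 10%..100% SoC operating band
    return max(0.1 * capacity, min(capacity, soc))
```

For example, starting from 5 kWh and charging at the 4 kWh limit gives `soc_transition(5.0, 4.0)` = 5 - 0.025 + 4 = 8.975 kWh, while decisions that would overshoot the band are clipped to its limits.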

III. A HIERARCHICAL APPROACH FOR STATE-SPACE APPROXIMATE DYNAMIC PROGRAMMING
This section first explains DP and then presents the hierarchical approach used to make state-space approximations that reduce the computational burden of DP.

A. DYNAMIC PROGRAMMING (DP)
The problem in (4) is easily cast as a Markov decision process (MDP) due to the separable objective function and the Markov property of the transition functions. Given this, DP solves the MDP form of (4) by computing a value function V_π(s_k). This is the expected future cost of following a policy, π, starting in state s_k, and is given by: An optimal policy, π*, is one that minimizes (4), and which also satisfies Bellman's optimality condition: The expression in (7) is typically computed using backward induction, a procedure called value iteration, and then an optimal policy is extracted from the value function by selecting a minimum-value action for each state. This is the key functional point of difference between DP and stochastic MILP. DP enables us to plan offline by generating value functions for every time-step. Once we have the value functions, we can make faster online decisions using (7) (more details are towards the end of this section). Note that a value function at a given time-step consists of the expected future cost from every state. This process of mapping states to actions is not possible with stochastic MILP. An illustration of a deterministic DP using a simplified model of a battery storage is shown in Fig. 4. At every time-step, there are three battery SoC states (i.e. highest, middle and lowest) and three possible battery actions that result in different instantaneous costs. At the last time-step, k = K, the expected future cost from the desired state, s_k = M, is zero, while the other two states are penalised with a large cost. This is an important step that allows us to control the end-of-day battery SoC (discussed in Section V). The expected future cost at every possible state is calculated using (7), which is the minimum of the combined instantaneous cost resulting from the decision that we take and the expected future cost from the state we end up in at the next time-step.
In Fig. 4, the instantaneous cost is shown on the edges while the expected future cost is shown below the states. An optimal policy is extracted from the value functions by selecting a minimum-value action for each state using (7). For example, from s_1^b, if we take the optimal decision to go to s_2^b = L, then the total combined cost of 10 consists of an instantaneous cost of 2 and an expected future cost of 8. Even though the expected future cost of 7 from s_2^b = M is lower than the expected future cost from s_2^b = L, the instantaneous cost that takes us there is 4, so the total combined cost is 11. Given this, the expected future cost of following the optimal policy from s_1^b is 10, and at time-step 2 we will be at s_2^b = L.
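The backward induction procedure behind Fig. 4 can be sketched as follows. This is a deterministic toy with three SoC states {L, M, H} and hypothetical instantaneous costs (not the values printed in Fig. 4); the terminal penalty that steers the end-of-horizon state to M follows the construction described above.

```python
def backward_induction(K, states, actions, cost, transition,
                       terminal_state, big_penalty=1e6):
    """Value iteration by backward induction, in the spirit of Eq. (7).

    cost(k, s, a)    -> instantaneous cost of taking action a in state s
    transition(s, a) -> next state
    The terminal cost is 0 at terminal_state and big_penalty elsewhere,
    which controls the end-of-horizon battery SoC.
    """
    V = [dict() for _ in range(K + 1)]
    policy = [dict() for _ in range(K)]
    for s in states:
        V[K][s] = 0.0 if s == terminal_state else big_penalty
    for k in range(K - 1, -1, -1):          # backward in time
        for s in states:
            best_a, best_v = None, float("inf")
            for a in actions:
                v = cost(k, s, a) + V[k + 1][transition(s, a)]
                if v < best_v:
                    best_a, best_v = a, v
            V[k][s], policy[k][s] = best_v, best_a
    return V, policy

states = ["L", "M", "H"]                   # lowest, middle, highest SoC
idx = {s: i for i, s in enumerate(states)}
trans = lambda s, a: states[max(0, min(2, idx[s] + a))]
cost = lambda k, s, a: abs(a) * 2.0        # hypothetical: each move costs 2
V, pol = backward_induction(4, states, [-1, 0, 1], cost, trans, "M")
```

Here the optimal policy from state L moves up to M exactly once and then stays, so the expected future cost V[0]["L"] is 2, while starting at the desired state M costs nothing.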
There are several reasons to prefer DP over MILP. First, DP produces close-to-optimal solutions when the value functions obtained during the offline planning phase are computed over finely discretized state, action and outcome spaces. Second, in practical applications, the SHEMS can make real-time decisions using the policy implied by (7). This means that at each time-step, the optimal decision from the current state can be executed. Note that (7) is a simple linear program at each time-step, so it is computationally feasible on existing smart meters. This stands in contrast to a stochastic MILP formulation, which would involve solving the entire stochastic MILP program, a task that is computationally difficult even for offline planning. Third, we can always obtain a solution with DP regardless of the constraints and the inputs, while MILP fails to find a solution when the constraints cannot be satisfied.
However, the required computation to generate value functions using DP grows exponentially with the size of the state, action and outcome spaces. This paper overcomes the computational burden of DP using state-space approximations, while maintaining the benefits of DP.

B. A HIERARCHICAL APPROACH FOR STATE-SPACE APPROXIMATIONS
This section presents the proposed hierarchical approach used to make state-space approximations. The aim is to loop only over the necessary states, so that the solution quality stays the same as that of DP. Here only the state-space reduction of the battery SoC is considered; however, the approach can be applied to other controllable distributed energy resources (DER).
The hierarchical approach shown in Fig. 1(b) proceeds as follows:
• First, historical battery SoC profiles over a day are generated by solving a deterministic optimization problem assuming perfect foresight. This can be done using any optimization technique, as it only needs to be done once.
• Second, these historical battery SoC profiles are clustered into a range of patterns using a k-means algorithm. The attributes used are the average battery SoC values at certain periods of the day.
• Next, in the day-ahead planning stage, the day-ahead electrical load and PV forecasts are used to generate the day-ahead battery SoC profile by solving the SHEMS problem. Here any optimization technique can be used because the battery SoC values only need to be accurate enough to correctly identify the cluster to which they belong. This classification is done using a k-nearest neighbors algorithm.
• Finally, the chosen cluster is used to obtain the modified state-space for our stochastic DP approach, where a finer discretization is used for the battery SoC values that fall in the chosen cluster. Fig. 5 shows the upper and lower limits of the battery SoC over a day generated using the proposed approach. This area is finely discretized so that the SS-ADP results in the same battery SoC profile as DP.
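The cluster-then-classify pipeline above can be sketched in pure Python. This is an illustrative toy, not the authors' code: the paper uses k-means clustering and a k-nearest neighbors classifier, whereas the sketch below uses period-average SoC features with a simple nearest-centroid assignment (k = 1) for brevity, and the three period boundaries are assumptions.

```python
def soc_features(profile):
    """Attributes: average SoC over three periods of a 48-step daily
    profile (the period boundaries here are assumptions)."""
    periods = [profile[0:16], profile[16:32], profile[32:48]]
    return [sum(p) / len(p) for p in periods]

def nearest_centroid(feature, centroids):
    """Assign a day-ahead SoC profile to the closest historical cluster
    (a k = 1 stand-in for the paper's k-nearest neighbors step)."""
    def dist(c):
        return sum((f - ci) ** 2 for f, ci in zip(feature, c))
    return min(range(len(centroids)), key=lambda j: dist(centroids[j]))

def modified_state_space(cluster_profiles, margin=0.05):
    """Per-time-step SoC bounds (plus a margin) from the chosen cluster;
    only this band is finely discretized by the SS-ADP."""
    K = len(cluster_profiles[0])
    bounds = []
    for k in range(K):
        vals = [p[k] for p in cluster_profiles]
        bounds.append((max(0.1, min(vals) - margin),
                       min(1.0, max(vals) + margin)))
    return bounds
```

The returned per-time-step bounds play the role of the upper and lower SoC limits shown in Fig. 5: the DP loop then only discretizes states inside this band.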

IV. MODELING THE DAY-AHEAD SHEMS PROBLEM
This section models the day-ahead SHEMS problem using the proposed SS-ADP and DP.
The stochastic inputs to the day-ahead optimization (formulated in Section II), the PV output and electrical load, are estimated using the clustering and kernel regression-based approach proposed in [29]. In summary, the approach proceeds as follows: First, historical electrical load and PV generation data are clustered according to the load pattern and total daily PV output, respectively. The PV generation clusters represent day-types such as sunny days, cloudy days, etc. The electrical load clusters are defined in a similar way but also use the time of the energy use as an attribute. The median profile of each cluster is the predicted profile. Second, the probability density functions within these clusters are determined using kernel regression. Finally, we can use a less accurate forecast to choose the correct PV and load clusters for a given day. An example of these models is shown in Fig. 6. The median profile of a chosen cluster is the predicted profile for a deterministic optimization, while the probability models are used for the stochastic optimization.
The day-ahead value function approximations in Fig. 7(a) are generated using the proposed SS-ADP approach. The example shows the expected future contribution for following the optimal policy from time-steps k = 2, k = 11, k = 30 and k = 35 vs. battery SoC. Note that the expected future contribution increases with the battery SoC, but the instantaneous contribution either decreases or stays the same with increasing battery charge rate. The instantaneous contribution for battery charging using solar energy is zero. Given the high uncertainties associated with the day-ahead PV and load forecasts, the schedules generated from the day-ahead optimization have large errors. Moreover, solving the entire optimization problem at every time-step would require expensive computational power.
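The cluster-median forecast step described at the start of this section can be sketched as follows. This is a minimal illustration only: the clustering itself and the kernel-regression density estimates from [29] are omitted, and the day-type labels and profile values are hypothetical.

```python
import statistics

def median_profile(cluster):
    """Point forecast: the per-time-step median of the daily profiles in
    the chosen cluster (used for the deterministic optimization)."""
    K = len(cluster[0])
    return [statistics.median(p[k] for p in cluster) for k in range(K)]

# hypothetical PV clusters by day-type, using 4-step days for brevity
clusters = {
    "sunny":  [[0.0, 2.0, 3.0, 1.0], [0.0, 2.2, 2.8, 1.2]],
    "cloudy": [[0.0, 0.5, 0.8, 0.2], [0.0, 0.7, 0.6, 0.4]],
}
# a rough forecast only needs to pick the right cluster; its median
# profile then serves as the day-ahead PV prediction
forecast = median_profile(clusters["sunny"])
```

The design point made in the text is that the rough day-ahead forecast only has to identify the correct cluster, which is a much easier task than predicting the profile itself.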
To overcome this, [16] proposes to use the day-ahead value functions from DP and ADP to generate real-time decisions according to the Bellman optimality condition (more details in Section VI). In order to accurately generate the instantaneous contribution function, the SHEMS needs to forecast the PV output and load at the next time-step.

V. REAL-TIME FORECASTS OF PV AND LOAD USING LSTM-RNN
This section proceeds as follows: First, a brief review of existing techniques used to forecast the real-time PV and load (i.e. the 30 minute-ahead forecast) is presented. Second, the chosen LSTM-RNN is explained. The third and fourth subsections present the modeling and results, respectively.

A. BRIEF REVIEW OF EXISTING SOLUTION TECHNIQUES
The underlying PV and load forecasting problem is a time-series prediction problem. First, we review PV forecasting techniques, which can be categorized into physical models and data-driven models. Physical models use numerical weather prediction, which shows good performance for forecast horizons from several hours up to six days [30]. Data-driven models can be further divided into statistical and machine learning models. Statistical models include auto-regressive integrated moving average, auto-regressive moving average, coupled auto-regressive and dynamic system, Lasso, and Markov models. Machine learning models include support vector machines, feed-forward neural networks, and RNNs such as LSTM networks [31]-[33]. These approaches can be further subdivided according to the type of input features used to train the model. A forecasting model that uses only the target time-series as an input feature (solar irradiance in this case) is referred to as a nonlinear auto-regressive (NAR) model. On the other hand, if a model uses additional exogenous inputs, such as temperature and humidity, it is referred to as a nonlinear auto-regressive with exogenous inputs (NARX) model. According to [31], [32], LSTM-RNN performs best at forecasting PV generation.
Similar approaches have been used to forecast residential load, such as feed-forward neural networks, support vector machines, and RNNs such as LSTM networks and gated recurrent units [34], [35]. According to [34], [35], LSTM-RNN performs best at forecasting residential load. Given this, this paper focuses on LSTM-RNN for forecasting both the 30 minute-ahead PV generation and electrical load.

B. LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS
LSTM networks are a special type of RNN capable of learning long-term dependencies. Fig. 8 illustrates the RNN and Fig. 9 the LSTM cell [31]. Note that the variables defined here are only applicable to this section. An RNN has a feedback connection where the output depends on the current input to the network and on the previous inputs, outputs and/or hidden states of the network, as depicted in Fig. 8. Given an input sequence x = (x_1, …, x_t, …, x_T), an RNN computes the hidden vector sequence h = (h_1, …, h_t, …, h_T) and output vector sequence y = (y_1, …, y_t, …, y_T) by iterating (8) and implementing the calculation shown in (9).
where H is the activation function of the hidden layer, and W_xh, W_hh and W_hy are the weight matrices of the input-hidden, hidden-hidden and hidden-output connections, respectively. The forget gate f_t outputs a value between 0 and 1 for each number in the cell state to decide what information to discard from the previous cell state c_{t-1}, according to: f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where σ is the sigmoid activation function, and W_f and b_f are the weight matrix and bias of the forget gate, respectively. Similarly, the input gate output i_t decides the new input information that should accumulate in the memory cell: i_t = σ(W_i · [h_{t-1}, x_t] + b_i), where W_i and b_i are the weight matrix and bias of the input gate, respectively. The LSTM cell state is then updated as follows, with conditional self-loop weights W_c and b_c: c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c · [h_{t-1}, x_t] + b_c). The output hidden state h_t of the LSTM cell depends on the cell state c_t and the output gate o_t: h_t = o_t ⊙ tanh(c_t). The output gate o_t is calculated using: o_t = σ(W_o · [h_{t-1}, x_t] + b_o), where W_o and b_o are the weight matrix and bias of the output gate. Note that the hidden state h_t can be shut off via the output gate o_t, which uses a sigmoid activation function.
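The gate computations described above can be instantiated directly. The sketch below is a scalar, single-cell forward pass with hypothetical weights, written in pure Python for illustration; a practical NAR forecaster would use a deep-learning framework and learn vector-valued weights from the historical PV and load data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One forward step of a scalar LSTM cell.

    W and b hold hypothetical weights/biases for the forget, input,
    cell and output gates; each gate sees [h_{t-1}, x_t].
    """
    f_t = sigmoid(W["f"][0] * h_prev + W["f"][1] * x_t + b["f"])  # forget gate
    i_t = sigmoid(W["i"][0] * h_prev + W["i"][1] * x_t + b["i"])  # input gate
    c_tilde = math.tanh(W["c"][0] * h_prev + W["c"][1] * x_t + b["c"])
    c_t = f_t * c_prev + i_t * c_tilde                            # cell state update
    o_t = sigmoid(W["o"][0] * h_prev + W["o"][1] * x_t + b["o"])  # output gate
    h_t = o_t * math.tanh(c_t)                                    # hidden state
    return h_t, c_t

# hypothetical fixed weights; in practice these are learned by training
W = {g: (0.5, 0.5) for g in "fico"}
b = {g: 0.0 for g in "fico"}
h, c = 0.0, 0.0
for x_t in [0.1, 0.4, 0.3]:   # a short NAR input window (target series only)
    h, c = lstm_cell(x_t, h, c, W, b)
```

Because the output gate uses a sigmoid and the cell state passes through tanh, the hidden state h_t is always bounded in (-1, 1), illustrating how the output gate can shut the hidden state off.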
C. MODELING
Steps one and two are done day-ahead while step three is done in real-time. Historical data are separated into training and testing sets. Note that only one feature is used for predicting the PV output and load (i.e. we implemented a NAR model). Two years of data are used for training while one year is used for testing.

D. RESULTS
The real-time 30 minute-ahead forecasts of the PV output and electrical load generated using the LSTM-RNN are compared with the widely used persistence approach, where the forecast for the next time-step is the current value. Table 1 compares the two approaches using the root-mean-square error (RMSE). The PV generation forecasts from the LSTM-RNN are significantly better than the persistence approach, while the electrical load forecasts are only slightly better. The LSTM-RNN forecasts of the electrical load could be further improved if we considered the number of inhabitants and their behavioral patterns as features of the LSTM model (i.e. a NARX model). Unfortunately, the dataset used in this paper only consists of aggregated load profiles, so we were unable to show the full benefits of using LSTM-RNN for electrical load prediction.
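The persistence baseline and the RMSE metric used in Table 1 can be sketched as follows; the series values here are hypothetical, and Table 1 reports the actual results.

```python
import math

def persistence_forecast(series):
    """Persistence baseline: the forecast for the next time-step is
    simply the current value."""
    return series[:-1]            # predictions for time-steps 1..T-1

def rmse(pred, actual):
    """Root-mean-square error between forecasts and observations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual))
                     / len(actual))

pv = [0.0, 0.5, 1.8, 2.6, 2.9]   # hypothetical 30-minute PV outputs (kW)
pred = persistence_forecast(pv)   # [0.0, 0.5, 1.8, 2.6]
error = rmse(pred, pv[1:])        # persistence error against the truth
```

The same RMSE computation applied to the LSTM-RNN forecasts gives the comparison reported in Table 1.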

VI. REAL-TIME DECISION MAKING USING BELLMAN OPTIMALITY CONDITION
This section explains the real-time decision-making process using the Bellman equation (7). In the real-time decision-making process, the real-time forecasts of the PV and electrical load values are used to update the instantaneous contribution function (example in Fig. 7(b)) according to (5). Next, this instantaneous contribution function is added to the day-ahead value function of the next time-step that was generated during the offline planning stage. Note that there will be a value function for every time-step in a day. Finally, the optimal battery decision for the next time-step is the battery decision corresponding to the maximum value of this combined function (i.e. the Bellman optimality condition).
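The real-time decision step can be sketched as follows. This is an illustrative toy, not the authors' implementation: the action set, contribution function and value function below are hypothetical linear stand-ins, and the contribution is treated as a reward to be maximized, matching the description above.

```python
def real_time_decision(actions, contribution, value_next, transition, soc):
    """Pick the battery action maximizing the instantaneous contribution
    plus the day-ahead value of the resulting state (Bellman condition)."""
    def combined(a):
        return contribution(soc, a) + value_next(transition(soc, a))
    return max(actions, key=combined)

# hypothetical example with three candidate battery actions (kWh)
actions = [-1.0, 0.0, 1.0]                          # discharge, idle, charge
contribution = lambda soc, a: -0.3 * max(a, 0.0)    # charging from grid costs
value_next = lambda soc: 0.5 * soc                  # day-ahead value function
transition = lambda soc, a: min(10.0, max(1.0, soc + a))
best = real_time_decision(actions, contribution, value_next, transition, 5.0)
```

Because the combined function is evaluated over a handful of candidate actions, this step is cheap enough to run on a smart meter at every 30 minute interval.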

VII. SIMULATION RESULTS AND DISCUSSION
This section presents the simulation results from the proposed SS-ADP approach and DP in Table 2 for two households over a year. The yearly electricity cost from the proposed SS-ADP approach is within 0.8% of the close-to-optimal DP solution, while the computational time is improved by over 20% because the SS-ADP only loops over the necessary states. Moreover, the computational time of the SS-ADP only increases linearly when we incorporate additional DER, while DP results in an exponential increase. The computational time for Household B is higher than that for Household A because Household B has a larger PV system and a higher electrical load profile, so the number of required states is also higher.
The total yearly electricity costs of Households A and B can be reduced by 45.85% and 39.72%, respectively, by installing a PV-battery system controlled by the SS-ADP. The number of deep battery cycles, which greatly affects battery lifetime, is also lower, as shown in Fig. 10. The benefits of a SHEMS have been well studied, so our focus here is on proposing a computationally efficient algorithm that provides solutions of similar quality to DP.

VIII. CONCLUSION
This paper proposed a novel state-space approximate dynamic programming approach for solving a smart home energy management system problem quickly without compromising the solution quality of dynamic programming. The day-ahead value function approximations generated from this approach can be used to make fast real-time decisions using photovoltaic generation and electrical load forecasts generated from long short-term memory recurrent neural networks.

ACKNOWLEDGMENT
(Zhiheng Zhao and Chanaka Keerthisinghe contributed equally to this work.)