A Data-Driven Home Energy Scheduling Strategy Under the Uncertainty in Photovoltaic Generations

To address the uncertainty in photovoltaic (PV) outputs for day-ahead home energy scheduling in hourly timescale, a novel stochastic optimization strategy based data-driven method is proposed. Based on available historical PV outputs, the Gaussian mixture model (GMM) algorithm combined with improved prediction strength of clustering method is applied to establish the forecasted probabilistic PV outputs model. Based on the seven-step approximation model of Gaussian distribution, only PV outputs with larger probability level at each hour are used to generate scenarios. Then the typical scenario set can be constructed by scenario reduction method. By finding the solution in typical scenario set using mixed-integer nonlinear programming (MINLP), the scheduling strategy will be closer to real cases. Test results verify the effectiveness of proposed probabilistic PV output model and solution method.

Heat power generated by Heater.

K Hea
Auxiliary binary variable, 1 when in winter, 0 otherwise. u t a Auxiliary binary variable for the appliance a, a ∈ A cr ,1 when operating, 0 otherwise. u t b Auxiliary binary variable for the appliance b, b ∈ A ca ,1 when operating, 0 otherwise. P t b Operation power of the appliance b, b ∈ A ca at hour t. P t Grid,B (ω) , P t Grid,S (ω) Power (kW) brought from and sold back to the grid at hour t, t ∈ (1, 2, . . . , 24) in scenario r t e,B , r t e,S , respectively.

I. INTRODUCTION
The day-ahead home energy scheduling in hourly timescale makes great contribution to reducing residential energy cost [1], [2]. However, the increasing integration of grid-tied photovoltaic (PV) units into households has made achieving home energy scheduling strategies a challenging task in home energy management strategy. The main problem is how to address the stochastic nature of PV outputs.
To tackle this problem, a variety of optimization methods have been proposed. They can be roughly classified into three types: deterministic optimization methods, stochastic optimization methods, and robust optimization methods. The deterministic optimization methods are initially applied to this problem. By assigning a constant value to PV output, various optimization methods are applied to solve this problem [3]- [5]. Although deterministic optimization methods are easy to be implemented, the random nature of PV outputs is not considered, thereby potentially yielding approximate optimal solutions. The robust optimization methods [6], [7] can tackle the uncertainty in PV outputs, but they are conservative for more safety operation. However, PV units in households are small scale and have little effect on safety and reliability in operation. Thus robust optimization methods may increase residential cost due to their conservation. Since the stochastic optimization methods [12]- [14], [16]- [21] can address uncertainty in PV outputs rather than conservation as in robust optimization methods, they have been widely applied to the day-ahead home energy scheduling.
The stochastic optimization methods employ scenario generation technique which converts the stochastic optimization problem to a deterministic optimization problem. The most widely used methods are those based Monte Carlo simulation (MCS) technique. There are two problems for MCS technology. One is that the specified probability distribution functions (PDFs) such as Beta, Weibull, Gaussian distribution [8]- [10] cannot reflect the multimodal nature of PV outputs. The other is that the number of scenarios generated by MCS is enormous [11], which is time consuming.
To address the first problem, Ref [12] employs the Gaussian mixture method which combining several Gaussian distributions to forecast 15-min PDFs of PV outputs by using the higher-order Markov chain. Though ref [12] reflects the multimodal nature of PV outputs, the day-ahead hourly PV output forecasting is not considered. Moreover, how to determine the number of Gaussian distribution in Gaussian mixture model (GMM) is not investigated, which is a key factor in the accuracy of Gaussian mixture method.
To address the second problem, many scenario reduction methods have been proposed. Reference [13] employs MCS to only generate 10 different scenarios. Reference [14] assumes that the solar irradiance follows the Beta distribution and generates 1000 scenarios initially. Then the only ten scenarios are reserved by eliminating the scenario with very low probability and aggregating scenarios of close distances based on specific probability metric. Finally, the optimal solution is found within these ten scenarios. Reference [15] models solar irradiance by Beta distribution and generates 1500 initial scenarios. The simultaneous backward reduction method [16] is employed to merge similar scenarios. Reference [17] assumes that the solar irradiance followed a two-parameter Beta distribution. Wasserstein distance metric and K-medoids-based scenario generation and reduction techniques are employed. Besides MCS methods, other scenario generation methods have also been studied recently. Reference [18] employs quantile forecast results for both PV output and load. The median (quantile 50%) is taken as central forecast. The quantile 10% and 90% are taken as lower and upper bounds. Thus 9 scenarios are generated by combination of these values. Reference [19] employs a nonparametric data-driven forecast for solar irradiance and provides 5 scenarios by employing K-nearest neighbor procedure. Reference [20] employs K-Means clustering to classify the available historic PV outputs. Six clusters i.e. scenarios are generated. The methods in [13]- [20] obviously improve calculation efficiency. However, the reserved scenarios still depend on the initial scenarios. Only when the number of initial scenarios is enormous, the reserved scenarios can follow specified PV probability distribution. But how many it is cannot be determined for different cases. So the initial scenarios cannot guarantee the really reflection on the actual probability distribution of PV outputs. If the initial scenarios are mainly consisted of PV outputs with smaller probability level in PV probability distribution, the reserved scenarios will include fewer scenarios with larger probability level.
To include the scenarios with larger probability level, ref. [21] employs a two-point estimate method to model the uncertainty in PV outputs, which only generate larger probability level scenarios. However, for day-ahead scheduling in hourly timescale, there are only 48 scenarios, which is too small to cover scenarios with larger probability level.
To construct approximately PDFs of PV outputs more precisely, the multi-model probability distribution is constructed by employing GMM algorithm combined with improved prediction strength of clustering method. Different from the existing scenarios generation methods, the proposed method can directly generate scenarios with larger probability level. The main novelty and contributions of this paper are twofold: 1. Based on available historical PV outputs, a data-driven forecasted probability distribution of PV outputs is proposed. Instead of using one PDFs to describe the probability distribution of PV outputs as in existing studies, this method employs multiple PDFs and will lead to more precise results.
2. Based on the seven-step approximation model of Gaussian distribution, only PV outputs with larger probability level at each hour are used to generate scenarios. Thus by scenario reduction method, typical scenario set can be constructed. As a contrast, too many smaller probability level scenarios may be generated in MCS when the number of scenarios is not larger enough. This make the scheduling scheme not meet the real cases.
The rest of the paper is organized as follows. Section II describes the framework of a domestic energy system. Section III describes the methods to deal with uncertainty in PV outputs. Section IV describes the optimization model. The numerical results are provided in Section V, and the conclusions are drawn in Section VI.

II. FRAMEWORK OF A DOMESTIC ENERGY SYSTEM
The framework of a domestic energy system is shown in Fig.1. The components consist of general home appliances, thermal load, BESS, PV, CHP, smart meter and energy scheduler. The meter 1 is a two-way electric meter between the household and the grid, which records and sends energy consumption data to the energy scheduler. The meter 2 records the PV outputs and sends them to energy scheduler.

III. THE METHOD TO DEAL WITH UNCERTAINTIES IN PV OUTPUTS A. PROBABILISTIC PV MODEL
The available historical PV outputs can be classified into different categories in terms of weather conditions and seasons. For each category, the PV outputs of the same hour on different days can be merged to construct a dataset. The data in a dataset can be clustered to form different clusters. Before clustering, the number of cluster must be determined in advance. Among all clustering methods, only hierarchical clustering and the prediction strength of clustering method [22], [23] can determine the number of cluster. However, hierarchical clustering is time-consuming when there are enormous data. So the prediction strength of clustering method is applied to determine the number of clusters in this paper.
The PV outputs in a dataset is random divided into two subsets the training subset X tr and the test subset X te . Let K denote the number of clusters. The k-means algorithm is applied to cluster the data in X tr and X tr into K clusters respectively. Denote this clustering operation as C (X tr , K ) and C (X te , K ). For a candidate number of clusters K , let A K 1 , A K 2 , . . . , A KK be the indices of the clusters 1,2,.. K in X te . Let n Kj denote the number of PV outputs in A Kj . Now, we judge that the PV outputs PV outputs x i and x i fall into the same cluster in C (X tr , K ), and zero otherwise.
The prediction strength of clustering C (·, K ) is calculated by For each test subset X te , we compute proportion of observation pairs x i = x i ∈ A Kj which are also assigned to the same cluster in X tr . The prediction strength ps(K ) is the minimum of this quantity over the K test clusters. VOLUME 8, 2020 If K = K 0 , the true number of clusters, then the K clusters in subset X tr will be similar to that in subset X te , and hence will predict them well. Thus ps(K ) will be high.
Let K max denote the maximum value of K . Then we have ps(K 0 ) = max {ps(1), ps(2), . . . ps(K max )}. Since the number of clusters is usually not too large, K 0 can be determined by the exhaustive method.
After K 0 is determined, GMM [24] is applied to model PV outputs in a dataset. By applying GMM, Gaussian PDF is placed on each cluster in a dataset. Compared to k-means algorithm, GMM not only points out if a sample belongs to a certain cluster but also gives the probability of belonging to a certain cluster.
Let µ k denote the mean value and δ k denote the covariance of kth,k ∈ [1, 2, . . . , K 0 ] Gaussian distribution. Let p (x|µ k , δ k ) denote the probability of PV output x under kth Gaussian distribution. GMM indicates the sum of p (x|µ k , δ k ).
where π k denotes the weight factor subjecting to The determination process of parameter π k , µ k and δ k in Eq.(2) is as following: 1) Initialize parameters π k , µ k and δ k ; Let N denote the total number of PV outputs. 2) Calculate the probability of given PV output x i belonging to cluster k under parameters π k , µ k and δ k as following: 3) Update parameters π k , µ k and δ k by the following equations: 4) Repeat step 2) and 3) until the relative difference of values of the parameter π k , µ k and δ k in two consecutive iterations are below a threshold respectively. For example, when K 0 = 3, the GMM of PV outputs in one hour can be shown by Fig. 2. Since the existing PDFs can only describe unimodal or bimodal probability distributions, the PV outputs with trimodal probability distributions shown in Fig.2 can only be described by GMM.

B. THE METHOD TO GENERATE TYPICAL SCENARIOS
Although MCS and backward reduction method is a general method to generate scenarios, but it is complex and time consuming. Moreover, when the samples are not large enough, the number of scenarios with larger probability level may be smaller. Then the reserved scenarios will not be representative. Though the scenarios generated by the two-point method are those with larger probability level, the number of scenarios is too small to reflect the true cases. To address this problem, based on seven-step distribution model (0, ±1δ, ±2δ, ±3δ) [25] for Gaussian distribution as shown in Fig.3, where δ is the standard deviation, this paper develops a method which can directly generate typical scenario set. Such a model is considered sufficient since it encompasses more than 99% of the Gaussian distribution.
The detailed procedure is as follows: Step 1: The available PV outputs within the same hour of different days are merged in terms of weather conditions and seasons.
Step 2: The multi-probability distribution of PV outputs at each hour is constructed by employing GMM algorithm and the prediction strength of clustering method.
Step 3: For each Gaussian distribution, the sevenstep approximation model is applied to divide it into seven intervals, and each interval is assigned a probability value. Then the PV outputs for all intervals are (u − 3δ, u − 2δ, u − δ, u, u + δ, u + 2δ, u + 3δ) and their corresponding probability values p i , i = 1, 2, . . . , 7 are (0.006, 0.061, 0.242, 0.382, 0.242, 0.061, 0.006). When the selected probability of the kth Gaussian distribution is π k , the occurrence probability level of the ith interval is π k p i .
Step 4: when an interval in a probability distribution is selected for each hour in the scheduling horizon, a scenario is generated. The probability level of this scenario is calculated by 24 t=1 π t k p t i .
Step 5: All the scenarios with larger probability level can be generated by repeating Step 4 when the selected intervals at each hour are all the those with larger probability level.
Step 6: The typical scenario set can be constructed by the simultaneous backward reduction method for the scenarios generated at step 5.

IV. OPTIMIZATION MODEL A. BESS MODEL
Eq. (7) and constrain (8) establish the definition and limits of state of charging (SOC) of BESS at hour t.
where E t BESS denotes the energy of BESS at hour t, t ∈ [1,24]. C BESS denotes the nominal capacity (kWh) of BESS. SOC min BESS and SOC max BESS denote the minimal and maximal value of SOC, respectively.
where P t char and P t disc denote the charging and discharging rate at hour t, respectively. t represents time duration, t = 1. η c and η d denote the charging and discharging efficiency, respectively. u t BESS denotes an auxiliary binary variable at hour t. 1 when BESS is charging, 0 otherwise.
Eq. (10) assumes that initial energy is equal to the final value.
Constrains (11) and (12) establish the bound of charging and discharging power.
where P max char and P max disc denote maximal bound of the charging and discharging rate (kW), respectively. Fig.4 shows a CHP configuration consisting of Fuel Cell (FC) and Heater [26], [27].

B. CHP MODEL
FC generates power P t FC for electricity and thermal load demand. where P t FC,th denotes the thermal load demand (kW). P t FC,e denotes electricity load demand (kW). K FC,th is an auxiliary binary variable, 1 when in winter, 0 otherwise. Constrain (14) establishes the limits of P t where P min FC and P max FC denote the maximum and minimum limits of P t FC , respectively. The heat power generated by Heater P t Hea depends on the power of P t where η t denotes gas to thermal conversion efficiency. η e denotes gas to power conversion efficiency.

C. THERMAL LOAD
Eq. (16) establishes the relationship between indoor and outdoor temperature at the hour t and t + 1 where ε denotes factor of inertia. COP denotes coefficient of performance. A denotes thermal conductivity W /m · • C. T t in and T t out denote indoor and outdoor temperatures (in • C) at the hour t, respectively. The symbols (+) and (−) denote heating and cooling process, respectively.
As shown in Fig.4, the thermal load demand supplied by both FC and Heater is expressed as P t Ther,L = K FC,th P t FC,th + K Hea P t Hea (17) where K Hea denotes an auxiliary binary variable, 1 when in winter, 0 otherwise.

D. GENERAL HOME APPLIANCES MODEL
Let P a denote the power of the appliance a. Constrain (18) establishes the limits of P a 0 ≤ P a ≤ P max a (18) where P max a denotes the maximal value of P a . The energy consumption vector P a within the 24-hours scheduling horizon is expressed as P a = u 1 a , u 2 a , . . . , u t a , . . . , u 24 a · P a (19) where u t a is an auxiliary binary variable,1 when operating, 0 otherwise. VOLUME 8, 2020 Let E a denote the total energy consumption of the appliance a across the 24-hours. Given the beginning and ending operation time interval α a , β a , we have β a t=α a u t a P a = E a (20)

E. OBJECTIVE FUNCTION
Based on the typical scenario set, this stochastic optimization problem can be converted into a deterministic scenario-based optimization problem.
Let ω and π (ω) denote a scenario and its probability level, respectively. Let denote the set of typical scenarios. As in [28], the objective function aiming at minimizing the energy costs of a household one-day ahead in hourly timescale can be formulated as where P t Grid,B (ω) and P t Grid,S (ω) denote the power (kW) brought from and sold back to the grid at hour t, t ∈ (1, 2, . . . , 24) in scenario ω, respectively. r t e,B and r t e,S denote electricity price ( /kWh) bought from and sold back to the grid at hour t, t ∈ (1, 2, . . . , 24), respectively. r g denotes the price of natural gas per kWh ( /kWh).
Constrains (22) and (23) establish the limits of decision variable P t Grid,B (ω) and P t Grid,S (ω), respectively 0 ≤ P t Grid,B (ω) ≤ P max where, P max Grid,B and P max Grid,S denote the maximum power bought from and sold to the grid, respectively.
For BESS, the decision variables are SOC t BESS , E t BESS , u t BESS , P t char and P t disc in Eq.(7)-Eq. (12). For CHP, the decision variables are P t FC , K FC,th , P t FC,th , P t e,th and P t Heat in Eq.(13)-Eq. (17). For general home appliances, the decision variables are P a , u t a in Eq.(18)-Eq. (20). Since t ∈ (1, 2, . . . , 24), thedimensions of each decision variable is 24 in this paper.
It can be seen that this is a mixed-integer nonlinear programming (MINLP) problem, which can be solved by using Cplex solver in Gams/Cplex software.
It should be noted that the solution of this deterministic scenario-based optimization problem is the scheduling scheme for typical scenario set. The solution is also the scheduling scheme for the scenarios with larger probability level. Since the scenarios with larger probability level have representative, the solution may be the optimal or near optimal scheduling scheme for day ahead home energy scheduling.

V. CASE STUDIES
The case is solved by a PC with an Intel Core i7-4790 3.6 GHz CPU and 4 GB RAM.

A. SIMULATION DATA
The home appliancesconsist of refrigerator-freezer, oven, water heater, pool pump, lighting (10 standard bulbs), dishwasher, clothes washer, clothes dryer, and PHEV. The detailed parameters from [1,29] are listed out in Table 1. The parameters of BESS from [30] are listed out in Table 2. The parameters of CHP from [31] are listed out in Table 3. The time-of-use electricity rate in Shandong province in China is adopted. During the peak period 8:00-22:00, the electricity rate is 0.5769 /kWh. During the off-peak period 01:00-7:00 and 23:00-24:00, the electricity rate is 0.3769 /kWh. The feed-in tariff is 0.37 /kWh. The natural gas rate is 0.3323 /kWh. The outdoor temperatures at each hour for the scheduling day are listed out in Fig.5. The required indoor temperature is set to be 19 • C and thermal load demands at each hour calculated by equation (17) are also illustrated in Fig. 5.
The rated capacity of PV unit is 5 kW, and the historical PV outputs from January 2018 to March 2019 is obtained from State Grid Shanghai Municipal Electric Power Company.

B. SIMULATION RESULT
Since the scheduling day in this example is a sunny day in winter, the historical days such as sunny day in winter are selected. The PV outputs within the same hour in these days are clustered to construct 24 datasets. When 61 days are selected and the PV outputs are recorded every 15 minutes, each hourly dataset has 244 PV outputs. For each dataset, the prediction strength of clustering method and the GMM is performed, and the results are listed in Table 4.  Table 4, after the Gaussian PDFs whose weight factor is less than 5% are ignored, there are two Gaussian PDFs at 7:00, 10:00,11:00, 14:00,15:00,17:00 and 18:00, and there are three Gaussian PDFs at 8:00,9:00,12:00,13:00 and 16:00. These results suggest that the existing Beta, Weibull, and Gaussian distribution maybe not precisely describe random PV outputs. So Table 4 demonstrates that the accuracy of the proposed model for PV outputs relative to the existing unimodal and bimodal probability distributions.

Aslisted in
The weight factor π k is given in Table 5. When π k is given in Table 5, the probability level of ith interval can be determined by p i π k . The intervals with the first two probability level are listed in Table 6. Based on Table 6, each hour has two alternated PV outputs. Since these values have larger probability level, the scenarios generated by them are also have larger probability level. The number of generated scenarios is 2 12 = 4096. Then 10 scenarios are reserved in typical scenario set by backward reduction method [16].
The minimal energy cost is 28.77 for the typical scenario set, and its corresponding scheduling scheme of the appliances except for BESS is shown in Fig. 6.
As shown in Fig.6, all controllable appliances except for PHEV operate from 9:00 to 18:00 when there are plenty of PV outputs. This is because consuming PV outputs will decrease more cost in comparison with selling them to the main grid. The clothes washer operates at time 13:00 and its energy consumption is 1.94 kWh. The clothes dryer operates at time 14:00 and its energy consumption is 2.5kWh. The dishwasher operates at time 12:00 and its energy consumption VOLUME 8, 2020  is 1.44kWh. The oven operates at time 11:00, 18:00 and its energy consumption is 1.005kWh, 1.005kWh. The water heater operates at time 10:00, 12:00 and its energy consumption is 0.925kWh, 0.925kWh. The pool pump operates at time 9:00, 13:00 and its energy consumption is 0.9kWh, 0.9kWh. The PHEV operates during the off-peak period when the electricity price is the lowest due to its operational constraints in Table 1. The PHEV operates at time 7:00, 23:00, 24:00 and its energy consumption is 3.3kWh, 3.3kWh, 3.3kWh.
When charging, the BESS is taken as a load. On the contrary, when discharging, the BESS is taken as a DG. The charging/discharging power of the BESS at each hour is shown in Fig.7.
As shown in Fig.7, the charging time of BESS is 11:00, 12:00, 15:00 and 16:00, when there are plenty of PV output. Since the feed-in tariff 0.37 /kWh is lower than electricity rate 0.3769 /kWh, all the charging energy is obtained from PV units instead of the primary grid. The energy of BESS will be consumed by appliances at time 18:00,19:00,20:00 and 21:00, when the electricity rate is at its higher level 0.5769 /kWh.
The refrigerator-freezer operates all the time and consumes 0.005kW at each hour. The lighting operates from 19:00 to 23:00 and consumes 0.2kW at each hour. Since the natural gas rate 0.3323 /kWh is lower than the lower electricity rate 0.3769 /kWh, the FC will operate on its maximal power limits. The heat power generated by Heater is 0.45kW.

C. RESULTS COMPARISON
It is assumed that the hourly data set follows Gaussian distribution under the MCS method and the two-point estimate method. Then these two methods are also performed for comparison. The number of initial scenarios in MCS method is 5000 and the number of reserved scenarios is 10 by backward reduction method [16].
The energy cost and computation time of different methods are shown in Table 7. As listed in Table 7, the energy cost of MCS method is different from the proposed method. This is because the reserved scenarios in MCS method are different from those in proposed method. Further, all the PV outputs in the proposed method are those with larger occurrence probability, while those in MCS method are not. This suggests that the initial scenarios are not representative when the number of them is not larger enough. To address this problem, the number of initial scenarios in MCS is increased. When the number of initial scenarios is 50000 and 500000, more PV outputs with larger occurrence probability are incorporated in the reserved scenarios, which make the energy cost closer to that in the proposed method. However, this process will obviously increase computing time.
The PV outputs at each hour in two-point method are all around the mean at each hour PV dataset. So the energy cost is closer to that in the proposed method. However, 48 scenarios in two-point method are too small to reflect the really energy cost than the proposed method.
The scheduling strategy comparisons between the proposed method and the other methods are listed in Table 8. The operation time in hours of water heater, pool pump, dishwasher, clothes washer, and the charging time of BESS is different due to different reserved scenarios set under different methods.
When the PV outputs is abundant from 11:00 to 14:00, the energy consumption of both the proposed method and MCS(500000) is 9.49kWh. However, the energy consumption of the MCS(5000 times), two point method and MCS(50000 times) is 8.43kWh, 9.43kWh and 8.43kWh, which is lower than that in the proposed method. Since the feed-in tariff is smaller than the electricity rate in this paper, the PV outputs should be consumed as much as possible to reduce energy cost when the PV outputs are abundant. So the above results suggest why the proposed method and MCS(500000) obtain smaller energy cost than MCS(5000), MCS(50000) and two point method.

VI. CONCLUSION
This paper presents a novel day-ahead home energy scheduling strategy based on data-driven method. By including two or three Gaussian distributions for hourly PV outputs dataset, the proposed PV model is more precisely than existing models using one specified probability distribution. The PV outputs at each hour in the proposed method have larger probability level than MCS under the same number of scenarios, which demonstrates that the typical scenarios generated by the proposed method is more reasonable than those generated by MCS. Moreover, the calculation efficiency of the proposed method is higher than MCS.