Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders

The implementation of efficient demand response (DR) programs for household electricity consumption would benefit from data-driven methods capable of simulating the impact of different tariff schemes. This paper proposes a novel method, based on conditional variational autoencoders (CVAE), to generate, from an electricity tariff profile combined with exogenous weather and calendar variables, daily consumption profiles of consumers segmented into different clusters. First, a large set of consumers is gathered into clusters according to their consumption behavior and price-responsiveness. The clustering method is based on a causality model that measures the effect of a specific tariff on the consumption level. Then, daily electrical energy consumption profiles are generated for each cluster with the CVAE. This non-parametric approach is compared to a semi-parametric data generator based on generalized additive models that uses prior knowledge of energy consumption. Experiments on a publicly available data set show that the proposed method presents performance comparable to the semi-parametric one when it comes to generating the average value of the original data. The main contribution of this new method is the capacity to reproduce rebound and side effects in the generated consumption profiles; indeed, the application of a special electricity tariff over a time window may also affect consumption outside this window. Another contribution is that the clustering approach segments consumers according to both their daily consumption profile and their elasticity to tariff changes. These two results combined are very relevant for the ex-ante testing of future DR policies by system operators, retailers and energy regulators.

ξ - Effect of p in the semi-parametric generator
µ - Mean of the power consumption
x, X - Vectors of exogenous variables
N - Number of generated samples
Y - Power consumption half-hourly profile
p - Tariff, with p ∈ P = {Low, Normal, High}
Z - Variable of the CVAE latent space (decoder input)

Introduction
The deployment of smart meters, which provide access to new sources of information such as electrical energy consumption at 5-15 minute resolution, makes it possible to envisage the development of new customer services Mallet et al. [2014]. For example, electricity demand response (DR) policies aim at modifying customers' energy consumption behavior (see Siano [2014] for an overview) to enable higher integration levels of renewable energy sources.
Most of these DR schemes rely on changes in electricity prices, which can take the form of seasonal tariffs, super-peak time-of-use, real-time pricing, critical peak pricing, etc. Dutta and Mitra [2017]. Recent works (see among others Alfaverh et al. [2020] and Brégère et al. [2019]) proposed online learning algorithms to optimize these price incentives, considering human preferences and satisfaction levels. However, the responsiveness to a tariff change may vary from one consumer to another. By clustering consumers according to their tariff responsiveness, an electricity supplier can send different signals depending on the cluster to which they belong, and further improve DR management. For instance, for a given temperature, day of the week, etc., the electricity supplier defines an hourly electricity tariff profile to send to certain consumer clusters.
An energy consumption data simulator is very useful to conduct an ex-ante assessment of the algorithms that set tariff profiles (i.e., to ensure that they induce the right behavior from consumers), to study the business models of different DR schemes Karlsen et al. [2020], or to implement data-driven DR strategies such as contextual bandits Brégère et al. [2019]. This simulator should be able to randomly generate energy consumption profiles for different combinations of exogenous variables and tariff profiles, with consumers clustered according to their tariff responsiveness. The present paper proposes a novel method, based on conditional variational autoencoders (CVAE), which aims to randomly generate daily energy consumption profiles conditioned by a specific electricity tariff combined with weather and calendar variables.
The remainder of this paper is organized as follows. Section 2 reviews the literature on clustering and data generation methods applied in the energy domain and identifies the main contributions of this work. Section 3 presents the data set used throughout the rest of the paper. Section 4 describes the proposed clustering method. Then, the CVAE approach used to generate energy consumption profiles is presented and discussed in Section 5. In order to evaluate the proposed method, Section 6 introduces a benchmark data generator based on semi-parametric models often used for energy consumption forecasting. Section 7 presents a comparison of the two generators and simulations that illustrate the interest of our approach. Section 8 summarizes the main conclusions and identifies directions for future work.
The reproducibility of this research was ensured by applying the methodology to the open data set "SmartMeter Energy Consumption Data in London Households" from UK Power Networks UKd, in which price incentives were sent to users via their smart meters, and by making the CVAE code available in a GitHub repository.

Motivation: generation of daily power consumption profiles for household clusters
Since electricity is difficult to store on a large scale, its management is classically performed by anticipating demand and adjusting production accordingly. The deployment of smart meters, which provide access to new sources of information, makes it possible to envisage the development of new customer services (see Yan et al., 2013; Mallet et al., 2014). For example, electricity demand management policies aim to modify customers' energy consumption behavior (see Siano [2014] for an overview), which would help to accommodate the intermittency of renewable energies. Most of these policies rely on changes in electricity prices. Indeed, a higher electricity tariff when the stability of the electric system is jeopardized may induce a drop in electricity use, and a lower tariff when electricity production is high may encourage consumption. This paper considers a demand response management system similar to the one experimented on some London households that took part in the Low Carbon London project led by UK Power Networks in 2013: price incentives were sent to users via their smart meters. Recent works (see among others O'Neill et al., 2010 and Brégère et al., 2019) proposed online learning algorithms to optimize the sending of these price incentives. The responsiveness to a tariff change may vary from one consumer to another. By considering clusters of consumers who respond in the same way, an electricity provider could send different signals depending on the cluster to which consumers belong, and further improve demand response management. In a context similar to the Low Carbon London project, for a given temperature, day of the week, etc., the electricity provider defines a half-hourly electricity tariff profile to send to some customer clusters.
In order to test the algorithms that set tariff profiles (i.e., to ensure they make the right choices), a full-information data set is necessary: for each cluster, a realistic (but random) power consumption profile associated with each possible combination of exogenous variables and tariff profile must be generated. We propose a method based on conditional variational autoencoders (CVAE) to generate a random daily power consumption profile from an electricity tariff profile combined with exogenous meteorological and calendar variables.
In the next section, we provide a quick overview of clustering techniques and data generation methods used in the electrical energy field. At the beginning of Section 4, we present the data set used throughout the rest of the paper. Our contribution starts with a clustering method, provided in Section 4, which relies on a causal model that measures the effect of a tariff on half-hourly power consumption. Then, our approach is presented and discussed in Section 5: we use conditional variational autoencoders to generate power consumption profiles and calibrate their hyper-parameters by grid search. To evaluate the proposed method, we introduce, in Section 6, a benchmark data generator based on semi-parametric models often used for power consumption forecasting. Section 7 concludes the paper with a comparison of the two generators and with simulations that illustrate the interest of our approach.

Clustering Methods
Different clustering approaches have already been proposed in the literature to segment consumers according to their energy consumption behavior. Generally, they relied on the construction of individual features from the average/total consumption and demographic factors. With the recent smart meter deployment, individual consumption records at higher temporal resolutions are now available and make it possible to include energy consumption time series in consumer segmentation.
Therefore, more complex features may be extracted and used to cluster consumers with classical algorithms. Among others, Chicco et al. compared the results obtained with various unsupervised clustering algorithms (i.e., modified follow-the-leader, hierarchical clustering, k-means, fuzzy k-means) to group together customers with similar consumption behavior Chicco et al. [2006]; Le Ray and Pinson proposed an adaptive and recursive clustering method that creates typical load profiles updated with newly collected data Le Ray and Pinson [2019]; Rodrigues et al. described an online hierarchical clustering algorithm, applied to cluster energy consumption time series in a load forecasting task Rodrigues et al. [2008]; Fidalgo et al. described a clustering approach based on simulated annealing that tries to reconcile billing processes using 15 min meter data and monthly total consumption, and to derive typical profiles for consumer classes Fidalgo et al. [2012]; Sun et al. proposed a copula-based mixture model clustering algorithm that captures complex dependency structures present in energy consumption profiles and detects outliers Sun et al. [2017].
These clustering methods do not include information about the elasticity of consumers to tariff changes. However, recent research developed mathematical and statistical models of the price responsiveness of domestic consumers. Ganesan et al. applied a causality model to the Low Carbon London data set in order to rank consumers according to their responsiveness to tariff changes, outperforming correlation-based metrics Ganesan et al. [2019]; Saez-Gallego and Morales applied inverse optimization to improve the accuracy of load forecasting when aggregating a pool of price-responsive consumers, considering the effect of calendar and weather variables Saez-Gallego and Morales [2017]; Le Ray et al. applied a clinical testing approach (based on a test and a control group) to assess whether or not loads of households participating in the EcoGrid EU DR program are price-responsive Le Ray et al. [2016]; Mohajeryami et al. proposed an economic model that explains the consumption shift between peak and off-peak hours by maximizing the customer's utility function Mohajeryami et al. [2016].
These works are closely linked to forecasting consumers' reactions to DR policies but, to our knowledge, were never combined with (or embedded in) clustering techniques for consumer segmentation, nor used to simulate daily consumption profiles according to price-responsiveness.

Data Generation Methods
The generation of energy consumption profiles for households is not new and has already been covered by different authors in the literature. Capasso et al. proposed a bottom-up approach based on the aggregation of individual appliance consumption in order to produce a household consumption profile Capasso et al. [1994]. A Monte Carlo simulation model was proposed to combine behavioral data (home activities, availability at home of each member, etc.) and engineering functions (appliance mode of operation, technological penetration, etc.) with associated probability distributions. Park et al. proposed a platform, exploiting the SystemC language for event-driven simulation, which simulates the behavior of individual appliances and smart plugs Park et al. [2010]. Neither work considered weather-dependent appliances (e.g., heating, ventilation and air conditioning - HVAC) or the effect of price signals.
Physically-based models for appliances (including HVAC) are also proposed in Muratori et al. [2013], combined with heterogeneous Markov chains for activity patterns, to simulate household energy consumption. A similar approach was followed in Richardson et al. [2010], but using individual appliance consumption data. A set of physical models for appliances is proposed in López et al. [2019], implemented in MATLAB Simulink, which can simulate optimal on/off decisions of household appliances. Gottwalt et al. described a simulation engine for households with two modules: (a) a bottom-up approach that generates consumption data for each appliance by combining statistical data about appliance use and resident presence at home; (b) an optimization of the appliance schedule in order to find the optimal load shift according to time-based tariffs Gottwalt et al. [2011]. Iwafune et al. proposed a Markov chain Monte Carlo method for simulating electric vehicle driving behaviors, which enables an evaluation of the DR potential when combined with domestic photovoltaic panels Iwafune et al. [2020].
The aforementioned methodologies assume that information about individual appliances (usage patterns, energy consumption, etc.) and behavioral data is available, instead of using only the total household consumption collected by the smart meter. One exception is Li et al. [2019], which describes a methodology based on an elasticity coefficient (approximated by a Gaussian distribution) to estimate indices that characterize the impact of real-time prices on the consumption pattern, such as the proportion of maximum load decrease, the proportion of peak-valley difference of load decrease, etc. The method consists of an empirical rule-based calculation of the consumption transferred between periods, and it was only applied to the aggregated consumption of an electric power system, not to households.

Contributions
The major contributions from this paper are described in the following paragraphs.
The CVAE-based generator of daily energy consumption profiles, in contrast to the methods reviewed in Section 2.2, relies only on data collected by smart meters for the total household consumption, together with exogenous variables such as the tariff profile, weather and calendar variables. Compared to Park et al. [2010], Muratori et al. [2013] and Richardson et al. [2010], it is fully data-driven and requires neither physical models of individual appliances nor consumer behavior data.
Moreover, in comparison to Li et al. [2019], the proposed method is non-parametric and estimates changes in consumption profiles by applying a deep learning model without empirical assumptions about load shifting, showing a high capacity to learn behavioral changes when consumers experience different tariff schemes. In statistical terms, the proposed method corresponds to sampling random vectors from a given joint density function, which was also explored in the renewable energy forecasting literature to generate temporal trajectories from conditional marginal probability distributions (see Pinson et al. [2009] and Chen et al. [2018] for wind energy trajectory forecasting with Gaussian copulas and generative adversarial networks, respectively). In this work, we sample random vectors (i.e., coherent energy consumption profiles) conditioned by tariff, weather and calendar variables. It is important to note that CVAE were recently applied in Marot et al. [2019] to learn specific representations for the discovery of atypical conditions (e.g., holidays) in daily electrical consumption, but were not explored for synthetic data generation.
As a complementary contribution, a novel semi-parametric data generator, based on generalized additive models, is proposed as a benchmark. Its numerical performance highlights the main advantage offered by the CVAE-based approach: the capacity to take into account and reproduce, in the generated consumption profiles, the rebound effect (the fall or rise in consumption shifts to another time of the day when a special tariff is applied over a period) and the side effect (the fall or rise induced by a special tariff lasts longer, for the High tariff, or less long, for the Low tariff, than the period in which the tariff is actually applied).
Finally, the proposed clustering methodology extends the clustering algorithm of Brégère and Huard [2020] in order to include the causal model between tariff and energy consumption. Thus, in contrast to the methods reviewed in Section 2.1, this clustering algorithm gathers consumers according to both their (total) consumption behavior and their tariff-responsiveness.

Data set Description and Preprocessing
As a case-study for this work, we consider the open data set published by UK Power Networks containing the energy consumption (in kWh per half-hour) of around 5 000 households throughout 2013 UKd. A sub-group of approximately 1 100 customers was subjected to a dynamic Time of Use (ToU) tariff. The tariff values, among High (67.20 p/kWh), Low (3.99 p/kWh), and Normal (11.76 p/kWh), and the (half-hourly) intervals of the day in which these prices applied, were announced day-ahead via the smart meter or a text message. All non-ToU customers were on a flat-rate tariff of 14.228 p/kWh and we refer to them as Standard (Std) customers. The report of Schofield et al. (see Schofield et al. [2014]) provides a full description of this experiment and an exhaustive analysis of its results.
The data set contains tariffs and energy consumption records, for each client, at half-hourly intervals. Only ToU customers with more than 95% of data available (1 007 clients) are kept and the same number of Std clients are sampled to build a control group. We denote by I T oU the set of ToU households and by I Std the set of Std ones. The missing values in the time series were filled by linear interpolation, using the previous and next interval records for small gaps and the days preceding and following for longer periods of missing data. Finally, for each household, we also compute the average energy consumption, its minimum, and its maximum as well as the half-hour of the daily peak and of the daily trough, for the hot months (from April to September) and for the cold months (the others).
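The gap-filling step described above can be sketched with pandas; the series values below are illustrative (not taken from the London data set), and only the small-gap linear-interpolation case is shown:

```python
import numpy as np
import pandas as pd

# Half-hourly consumption series with missing readings (illustrative values).
idx = pd.date_range("2013-01-01", periods=8, freq="30min")
y = pd.Series([0.2, np.nan, 0.4, 0.5, np.nan, np.nan, 0.8, 0.9], index=idx)

# Small gaps: linear interpolation between the previous and next readings.
# (For longer gaps, the paper uses the preceding and following days instead.)
y_filled = y.interpolate(method="linear")
```

For longer missing periods, the same `interpolate` call would be replaced by copying or blending the corresponding half-hours of the adjacent days.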
Since weather has a strong impact on energy consumption, we added half-hourly data points of the air temperature in London, obtained from hourly public observations by linear interpolation. Thus, for each household i ∈ I_ToU ∪ I_Std and for any day t of year 2013, we get three 48-vectors denoted by Y_t^1(i), . . . , Y_t^48(i), p_t^1, . . . , p_t^48, and τ_t^1, . . . , τ_t^48, which are the energy consumption profile, the tariff for ToU consumers, and the temperature, respectively. From now on, H = 48 represents the number of consumption readings per day. Since a smoothed temperature - which models the thermal inertia of buildings - is likely to improve forecasts (see among others Taylor [2003] and Goude et al. [2014]), a daily smoothed temperature τ̄_t is introduced (see Appendix A.1 for further details). Energy consumption also depends on calendar variables such as the type-of-day and position-in-the-year. Thus, two additional variables were created: (i) a binary variable w_t that takes the value 0 on weekends and 1 on working days; (ii) κ_t, a continuous variable that increases linearly from 0 (on January 1) to 1 (on December 31).
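The exact definition of the smoothed temperature τ̄_t is deferred to the paper's Appendix A.1; a common choice for modeling thermal inertia is exponential smoothing, sketched below with an illustrative smoothing coefficient (an assumption, not the paper's calibrated value):

```python
import numpy as np

def smoothed_temperature(tau_daily, alpha=0.9):
    """Exponentially smoothed daily temperature, a common proxy for the
    thermal inertia of buildings. alpha is an illustrative value; the
    paper's exact smoothing is defined in its Appendix A.1."""
    tau_bar = np.empty_like(tau_daily, dtype=float)
    tau_bar[0] = tau_daily[0]
    for t in range(1, len(tau_daily)):
        # Each day keeps a fraction alpha of yesterday's smoothed value.
        tau_bar[t] = alpha * tau_bar[t - 1] + (1 - alpha) * tau_daily[t]
    return tau_bar

temps = np.array([5.0, 7.0, 3.0, 10.0])  # illustrative daily temperatures (°C)
tau_bar = smoothed_temperature(temps)
```

The higher alpha is, the more inertia the smoothed series exhibits, which is the behavior the feature is meant to capture.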
The final data set (presented in Table 2) contains, for each of the 2 014 households (half Std, half ToU), T = 365 observations of the energy consumption, tariff, and temperatures profiles, the smoothed temperature, the type-of-day, and the position-in-the-year.
This data set is split into two sub-sets: a training set that contains about 75% of the original data - days are randomly sampled from those of 2013 - and a testing set made of the remaining data points. A perfect design of the experiments would require four data sets, but the size of the original data led us to exclude this possibility. As the household clustering is prior knowledge for the creation of the data generators (we create one generator per cluster), the entire data set is used to cluster the clients. The (non-parametric and semi-parametric) data generators are optimized on the training set. The testing set is used to calibrate the CVAE-based data generators and to choose the best combination of exogenous variables given as input. Moreover, the best CVAE among several executions of the training process (CVAEs may converge to local minima) is selected with this testing set. Finally, it also allows comparing the two approaches, non-parametric and semi-parametric, in the experiments of Section 7. To simplify notation, we re-index the observations of the original data set: observations from 1 to T_0 = 273 form the training set, and the ones from T_0 + 1 to T = 365 form the testing set. The data set division and use are summarized in Table 1.

Table 1: Summary of the use of the two data sets: the training set (75% of the original data) and the testing set (remaining data). The clustering of the households is detailed in Section 4. The training process for the CVAE-based generator is explained in Section 5; the calibration of the hyper-parameters and the selection of the best CVAE are detailed in Subsections 5.2 and 5.4, respectively. The training process for the semi-parametric generator is in Section 6. Both data generators are compared in the experiments of Section 7.

Table 2: Summary of the variables provided and created for each household i of the data set.

Causality model
To measure the impact of the tariff on energy consumption, a causality model similar to the one proposed by Ganesan et al. (see Ganesan et al. [2019]) is considered. The finite set of available tariffs is denoted by P = {Low, High, Normal} and its cardinality by |P|. For each household and each tariff, a daily profile of the mean and the standard deviation of its energy consumption is computed. For a household i, at a half-hour h, the random variable Y^h(i) refers to the individual energy consumption of household i. It depends on the chosen tariff p ∈ P but also on the exogenous variables gathered in a vector x^h. The aim is to estimate, for each tariff p and each half-hour h, the expectation and the standard deviation of Y^h(i). From the observations (Y_t^h(i), p_t^h, x_t^h), for t ∈ {1, . . . , T}, of energy consumption, tariffs, and exogenous variables, a model is trained that gives, for the tariff p and the exogenous variables x^h, a forecast of the expected consumption at h when tariff p is picked. In the original model, the authors used kernel regression and then an approach based on bootstrapping to provide an estimation of the standard deviation (see Ganesan et al. [2019] for further details). In this work, for any exogenous variables x_t^h and tariff p_t^h, the random energy consumption Y_t^h(i) is assumed to be Gaussian with mean µ_i(x_t^h, p_t^h) and standard deviation σ_i(x_t^h, p_t^h), and these mean and standard deviation depend on additive smooth predictors. They are estimated with generalized additive models (GAM), see Wood [2006]; full calculations are detailed in Appendix A.2. Therefore, for any tariff p, the trained model provides the estimations, denoted by µ̂_i(x_t^h, p) and σ̂_i(x_t^h, p). Then, an approximation of the impact of a tariff change is computed with the two following quantities, obtained by averaging these estimations over the observed days:

µ_i^h(p) = (1/T) Σ_{t=1}^{T} µ̂_i(x_t^h, p)   and   σ_i^h(p) = (1/T) Σ_{t=1}^{T} σ̂_i(x_t^h, p).   (1)

For simplicity of notation, these approximations, associated with a household i ∈ I_ToU ∪ I_Std, are denoted by µ_i^h(p) and σ_i^h(p), respectively. The vectors (µ_i^1(p), . . . , µ_i^H(p)) will be used to cluster the consumers, whereas the vectors (σ_i^1(p), . . . , σ_i^H(p)) will not be used until Section 6, for the creation of the benchmark data generator. Actually, they will not be directly useful, but a similar approach will be applied to compute the per-tariff standard deviation of the energy consumption of a consumer cluster, namely by replacing household i with a group of households.
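The averaging that produces the per-tariff daily profiles µ_i^h(p) can be sketched as follows; the fitted GAM is replaced here by a hypothetical linear stand-in, since only the averaging step is being illustrated:

```python
import numpy as np

H, T = 48, 365
tariffs = ["Low", "Normal", "High"]
rng = np.random.default_rng(0)

# Hypothetical stand-in for the fitted per-household model: it predicts the
# mean consumption at each half-hour given exogenous variables x and a
# tariff p (a toy linear form, for illustration only).
coef = {"Low": 0.10, "Normal": 0.15, "High": 0.20}
def mu_hat(x, p):
    return 0.5 + coef[p] * x

x = rng.normal(size=(T, H))  # observed exogenous variables per day/half-hour

# Counterfactual daily profile per tariff: average the model's predictions
# over all observed days while holding the tariff fixed.
mu_profile = {p: mu_hat(x, p).mean(axis=0) for p in tariffs}  # each an H-vector
```

In the paper the predictor is a fitted GAM per household; only the "average over observed exogenous conditions, tariff held fixed" pattern is shown here.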

Clustering Method
The proposed method used to cluster the households according to their consumption profile is very similar to the one used in Brégère and Huard [2020]. In this section, I refers indifferently to I_ToU or to I_Std. For any household i ∈ I, the causality model described in the previous section provides, for each tariff p ∈ P, a daily energy consumption profile, namely the H mean energy consumptions µ_i^1(p), . . . , µ_i^H(p). As the focus is on the shape of the profiles rather than on the amount of electricity consumed, the profiles of a household i are first rescaled by its average consumption associated with a base tariff, namely the Normal tariff.

Table 3: Calinski-Harabasz score for a random clustering ("Rd"), for a clustering based on classical features ("Features"), and for the clustering method proposed in Section 4 ("NMF"), computed for different consumption record series.
Then, these profiles are concatenated in a matrix M ∈ M_{|I|×H|P|} that gathers all the households. The dimension of M is reduced with a non-negative matrix factorization (NMF): with r a small integer, M is approximated by WH, where W and H are |I| × r and r × H|P| non-negative matrices, respectively. As soon as this approximation is good enough, line i of the matrix W is sufficient to reconstruct the profiles of household i (with the knowledge of matrix H, which is not used for the clustering). Thus, for each household i, r features are extracted from the H|P|-vector (µ_i^1(p), . . . , µ_i^H(p))_{p∈P}: line i of W. With this low-dimension representation of households in R^r, the k-medoids clustering algorithm provides the k clusters C_1, . . . , C_k, using the KMedoids function implemented in the Python library sklearn_extra. The diagram in Figure 1 summarizes the steps of the procedure, which is detailed in Appendix A.3.
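A minimal sketch of the NMF-plus-clustering pipeline on synthetic data is given below; KMeans stands in for the KMedoids of sklearn_extra so that the sketch needs only scikit-learn, and all dimensions are illustrative:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans  # the paper uses KMedoids from sklearn_extra

rng = np.random.default_rng(0)
n_households, H, n_tariffs, r, k = 60, 48, 3, 5, 4

# M: one row per household, concatenating its rescaled mean profiles for the
# |P| = 3 tariffs (synthetic non-negative data, for illustration only).
M = rng.random((n_households, H * n_tariffs))

# Low-rank factorization M ≈ WH; row i of W is the r-feature representation
# of household i (the factor H is not used for the clustering).
W = NMF(n_components=r, init="nndsvda", random_state=0, max_iter=500).fit_transform(M)

# Cluster the households in the r-dimensional feature space.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(W)
```

Swapping `KMeans` for `sklearn_extra.cluster.KMedoids` recovers the paper's setup; the rest of the pipeline is unchanged.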

Evaluation of the Households Clustering
Three different clusterings of the households of I_ToU and of I_Std, with k = 4 clusters, are compared. The first one is a random clustering: an integer between 0 and k − 1 is randomly assigned to each household. The second one relies on classical features used to define a household profile: the minimum, maximum, and average consumption in winter and in summer, the peak hour, and the off-peak hour (average instants of maximum and minimum consumption). From these rescaled features, the k-medoids algorithm is used to cluster the households. The third approach is the one proposed in this paper and described in the previous section. For a cluster C, and for any day t and half-hour h, we will, from now on, consider the average energy consumption Y_t^h(C) = (1/|C|) Σ_{i∈C} Y_t^h(i); normalized records are obtained by dividing the records of a day s by the daily total Σ_{j=1}^{H} Y_s^j(C). Classical features allow discriminating households depending on the amount of electricity they consume but do not really capture daily or weekly behavior. Conversely, profile types clearly come out with the proposed method.
The Calinski-Harabasz index (see Caliński and Harabasz [1974]) is a variance-ratio criterion that evaluates the relevance of a clustering. Denote by Y(i) the vector that contains some of the consumption records associated with household i, by Y(C) the vector with the average consumption records of cluster C, and by Y(I) the average consumption records of all households. With the intra-cluster variance Var(C) = Σ_{i∈C} ||Y(i) − Y(C)||² and the inter-cluster variance Var(C_1, . . . , C_k) = Σ_{ℓ=1}^{k} |C_ℓ| ||Y(C_ℓ) − Y(I)||², the score S_CH is defined as the ratio of inter-cluster to intra-cluster variances:

S_CH = [ Var(C_1, . . . , C_k) / (k − 1) ] / [ Σ_{ℓ=1}^{k} Var(C_ℓ) / (|I| − k) ].   (3)
To compute this score, three different vectors Y are considered. First, all the records of the data set are taken into account, namely the records of the entire year 2013; in this case, in Equation (3), the vector Y(i) is equal to (Y_1^1(i), Y_1^2(i), . . . , Y_T^H(i)). Then, the normalized energy consumption records are considered. Finally, the normalized records associated with the sending of incentive signals are selected: only the normalized records associated with the Low or High tariffs are kept and the others are removed. The results are presented in Table 3, where we observe a higher score on the non-standardized records for the "classical features" clustering, which is fully coherent with the curves of Figure 2. The proposed clustering method seems efficient at capturing household behavior: indeed, it gets the highest score for the standardized records. Moreover, the score is even higher when only the records associated with special tariffs are selected, and this increase is larger for ToU consumers than for Std ones. This suggests that the clustering captures not only a global behavior but also the reaction to a tariff change.
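The behavior of the index can be checked with scikit-learn's implementation: on synthetic feature vectors with a clear group structure, a clustering that matches the structure scores much higher than a random assignment:

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)

# Synthetic "household" feature vectors: four well-separated groups of 50.
Y = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 10))
               for c in (0.0, 2.0, 4.0, 6.0)])
true_labels = np.repeat(np.arange(4), 50)      # clustering matching the groups
random_labels = rng.integers(0, 4, size=200)   # random clustering ("Rd")

s_good = calinski_harabasz_score(Y, true_labels)
s_rand = calinski_harabasz_score(Y, random_labels)
```

This mirrors the comparison of Table 3, where the random clustering serves as a lower baseline for the score.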
It is important to mention that, since we want to simulate the energy consumption of quite large sub-groups of households (between one and five hundred households), we did not investigate the optimal number of clusters k (i.e., it was fixed to 4).
In the following sections we present the two data-driven methods that simulate the energy consumption profiles associated with the clusters of I_ToU obtained with the method described above. For both approaches, we train one data generator per cluster. So, from now on and for simplicity of notation, a record Y_t^h will refer to Y_t^h(C), where C denotes any cluster of the set I_ToU.

Energy consumption profile generation with conditional variational autoencoder
The training set made of the T 0 observations (Y 1 , X 1 ), (Y 2 , X 2 ), . . . , (Y T 0 , X T 0 ) is considered. For a day t, Y t = (Y 1 t , . . . , Y H t ) is the H-dimension vector which corresponds to the daily profile of the half-hour energy consumption of a household cluster. The vector X t gathers calendar, weather, and tariff information of day t, which will be detailed further.

Description
The proposed method to generate energy consumption profiles uses the conditional version of variational autoencoders (VAE), which are generative models introduced by Kingma and Welling in 2013 (see Kingma and Welling [2013] for further details). Autoencoders were mostly used for dimensionality reduction or feature learning (see, among others, Rumelhart et al. [1986] and Hinton and Zemel [1994]). They consist of two neural networks: an "encoder" E and a "decoder" D. An autoencoder learns a low-dimension representation of a set of H-dimension data points by training both networks at the same time. Indeed, the encoder transforms the H-dimension vectors into d-dimension vectors (with d ≪ H) and the decoder tries to rebuild the initial vectors from the encoder outputs. Considering Z = E(Y) as the d-dimension output of the encoder for the H-dimension input Y and D(Z) as the H-dimension output of the decoder for the d-dimension input Z, the autoencoder is trained to minimize the following "reconstruction loss":

Σ_{t=1}^{T_0} ||Y_t − D(E(Y_t))||²,

where ||·|| is the Euclidean norm. Therefore, a data point Y can be represented in a d-dimension latent space by E(Y).
In the autoencoder framework, there is no constraint on this latent space and the only guarantee is that the representation Z = E(Y) can be decoded into the original signal, D(Z) ≈ Y. Moreover, we have no idea what the decoded variable D(Z) would look like for a value of Z ∉ {E(Y_1), . . . , E(Y_T_0)}. Thus, there is no guarantee on the shape of the latent space. Without a regularization term, for any d ≥ 1, by increasing the number of neurons in both the encoder and the decoder networks, we can create an autoencoder with enough degrees of freedom to fully overfit the data, which points out the need for regularization. In VAEs, the introduction of a penalty on the latent space implicitly makes the strong assumption that the distribution of the data points E(Y) is close to a given prior distribution. This prior is often set to the standard normal distribution, which we also do in our experiments. From now on, the encoder encodes the distribution of Z|Y, which we want close to N(0, I_d). We consider that Z | Y ∼ N(µ(Y), Σ(Y)), where µ(Y) and Σ(Y) are the encoder outputs. The outputs Y_t of the decoder are now D(Z_t), where the random variable Z_t is sampled from a d-multivariate Gaussian with mean µ(Y_t) and covariance matrix Σ(Y_t). With D_KL(P || Q) denoting the Kullback-Leibler divergence from Q to P, the VAE is trained by minimizing the following loss:

Σ_{t=1}^{T_0} [ ||Y_t − D(Z_t)||² + η D_KL( N(µ(Y_t), Σ(Y_t)) || N(0, I_d) ) ].

The first term corresponds to the reconstruction error and the second one is a regularization penalty on the latent space. The coefficient η balances these two terms. The calculations from Kingma and Welling [2013] are recalled in Appendix A.4.
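When Σ(Y) is diagonal, the KL penalty has the standard closed form for the divergence between a diagonal Gaussian and N(0, I_d); a minimal numpy sketch of the resulting loss (η and the shapes are illustrative):

```python
import numpy as np

def vae_loss(Y, Y_hat, mu, log_var, eta=1.0):
    """Reconstruction error plus eta-weighted KL divergence between the
    encoded diagonal Gaussian N(mu, diag(exp(log_var))) of each sample
    and the standard normal prior N(0, I_d)."""
    recon = np.sum((Y - Y_hat) ** 2)
    # Closed form of D_KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over
    # latent dimensions and samples.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + eta * kl

# When reconstruction is perfect and the encoder outputs exactly the prior
# (mu = 0, log_var = 0), the loss vanishes.
Y = np.ones((2, 48))
loss = vae_loss(Y, Y, np.zeros((2, 3)), np.zeros((2, 3)))
```

Encoding ln Σ rather than Σ (as the paper does) makes this closed form numerically convenient, since exp(log_var) is always positive.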
They show how, under some assumptions on the existence of a representation of the data in a d-dimensional latent space, minimizing this loss corresponds to conjointly maximizing the likelihood of the observations under the density induced by the data generation process and minimizing a divergence between the latent distribution and the prior. Finally, conditional variational autoencoders (CVAE) are an extension of VAEs where a vector of exogenous variables X is given as input to both the decoder and the encoder. Adding this conditional information may improve the reconstruction. Figure 3 depicts a scheme of the CVAE architecture used in the experiments. The encoder takes as inputs a daily energy consumption profile Y (namely an H-vector gathering the half-hourly records of energy consumption) and an exogenous vector X (with calendar, weather, and tariff information), and outputs the d-dimension vectors µ and ln Σ (it is usual to consider a log-transformation, see Marot et al. [2019]). The vector ln Σ is also of dimension d. Indeed, only the diagonal of the covariance matrix Σ is encoded: both approaches (diagonal and full matrix) were tested and there was no major difference in the reconstruction loss (obviously, the regularization term is higher for a full covariance matrix). Since considering a full matrix (which is symmetric) increases the dimension of the encoder outputs (from 2d to d + d(d + 1)/2) and makes the CVAE converge more slowly, we decided to keep a diagonal encoding of the covariance matrix Σ. The random variable Z is then sampled and given to the decoder along with the vector of exogenous variables X. Finally, the decoder outputs Y.
Once the CVAE is trained, the decoder is isolated and used to generate data. For any day s, it is enough to sample a random variable Z_s ∼ N(0, I_d) in the latent space and give it as input to the decoder, combined with a vector of exogenous variables X_s (which could be taken from the original data set or newly created). Then, the decoder generates an H-vector Y_s that corresponds to a new randomly generated daily consumption profile for the day s and the contextual variables X_s.
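This generation step can be sketched as follows; the `decoder` function below is a hypothetical stand-in for the trained decoder network, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, H = 4, 48

def decoder(z, x):
    """Hypothetical stand-in for the trained decoder network: maps a latent
    d-vector z and an exogenous vector x to an H-dimension daily profile."""
    W = np.ones((d + len(x), H)) * 0.01
    return np.concatenate([z, x]) @ W

def generate_profile(x_s):
    """Sample Z_s ~ N(0, I_d) and decode it together with X_s."""
    z_s = rng.standard_normal(d)
    return decoder(z_s, x_s)

x_s = rng.random(10)      # exogenous variables for day s (toy values)
y_s = generate_profile(x_s)
print(y_s.shape)          # (48,) -> one half-hourly consumption profile
```

Each call with the same X_s but a fresh latent draw yields a different, plausible profile for the same context.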

Implementation details
The CVAEs were implemented with the TensorFlow and Keras software libraries in the Python programming language. The architecture of a CVAE is defined by the latent dimension d as well as the number of layers and units in the encoder and decoder neural networks. We use dense layers, i.e., fully connected neural network layers. Once the architecture of the CVAEs is set, hyper-parameters are chosen: the neural activation functions, the initialization method for the neural weights, and the parameter η, defined in Equation (4), that balances the two terms of the loss. The choice of the architecture and the hyper-parameter calibration are detailed in Section 5.2.
In order to optimize a CVAE, namely to compute the weights and biases of each neuron in both the encoder and the decoder, the loss is minimized with the Adam optimizer (see Kingma and Ba [2015]), an extension of the stochastic gradient descent method, which is commonly used in deep learning and already implemented in Keras. Note that the learning rate of this optimizer is also a hyper-parameter to set before training the CVAEs.
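Adam is available out of the box in Keras; purely as an illustration of what a single update does, a minimal numpy version of the update rule of Kingma and Ba [2015] is sketched below (toy objective, not the CVAE loss):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    of its square (v), bias correction, then a scaled gradient step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)          # bias-corrected first moment
    v_hat = v / (1 - beta2**t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimizing f(w) = ||w||^2 (gradient 2w) for a few steps moves w toward 0.
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(np.linalg.norm(w) < np.linalg.norm([1.0, -2.0]))  # True
```

The learning rate `lr` is the hyper-parameter referred to in the text.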
Finally, the energy consumption records are rescaled to values between 0 and 1 by computing the maximum Y_max and minimum Y_min of the energy consumption observed over the training period. The generated values are re-scaled back to obtain coherent profiles, mostly between Y_min and Y_max.
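This min-max rescaling and its inverse can be written in a few lines (a sketch with toy values):

```python
import numpy as np

def fit_minmax(y_train):
    """Extrema of the energy consumption observed over the train period."""
    return y_train.min(), y_train.max()

def scale(y, y_min, y_max):
    """Map consumption records to [0, 1] using the train-period extrema."""
    return (y - y_min) / (y_max - y_min)

def unscale(y01, y_min, y_max):
    """Map generated values back to the original consumption scale."""
    return y01 * (y_max - y_min) + y_min

y_train = np.array([0.2, 0.5, 1.4, 0.9])
y_min, y_max = fit_minmax(y_train)
z = scale(y_train, y_min, y_max)
print(z.min(), z.max())     # 0.0 1.0
```

Since the extrema come from the training period only, generated profiles can slightly exceed [Y_min, Y_max] after un-scaling, hence "mostly" in the text.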
We recall that the data described in Section 3 was divided into two data sets: the training set contains 75% of the observations (sampled randomly from the complete data set) and is used to train the CVAE (see Table 1); the testing set, made of the remaining daily observations, is used to calibrate the hyper-parameters (see Section 5.2). Finally, as CVAEs may converge to local minima, many CVAEs are trained and the testing set is also used to select the best one (see Section 5.4).

Hyper-parameters calibration
The process described below is applied to each of the clusters defined in Section 4, for which a half-hourly energy consumption profile for each day of 2013 is available.

Methodology
To perform the CVAE hyper-parameter calibration, we opt for a grid search approach, i.e., an exhaustive search through a manually specified subset of the hyper-parameter space. This optimization is guided by the performance metric detailed below, which is simply an evaluation on a held-out validation set. For each set of parameters, namely for each point of the grid, we train a CVAE and test it according to the following procedure. Once the CVAE has converged (training is stopped when the loss no longer decreases), we compute the mean squared error (which corresponds to the reconstruction loss) on the testing set made of the observations Y_{T_0+1}, . . . , Y_T: MSE = (1/(T − T_0)) Σ_{t=T_0+1}^{T} ‖Y_t − Ŷ_t‖², where Ŷ_t is the CVAE reconstruction of the observation Y_t. The architecture and hyper-parameters of the CVAE that reaches the lowest MSE are kept.
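The grid search itself is straightforward; the sketch below uses a dummy `train_and_eval` stand-in for "train a CVAE and return its test MSE", and the grid values are illustrative:

```python
import itertools

# Hypothetical grid over latent dimension, hidden units, and eta.
grid = {"d": [2, 4, 8], "units": [10, 15, 20], "eta": [1, 10]}

def train_and_eval(d, units, eta):
    """Dummy stand-in: returns a fake test MSE, minimized at (4, 15, 10)."""
    return (d - 4) ** 2 + (units - 15) ** 2 + (eta - 10) ** 2

best_params, best_mse = None, float("inf")
for d, units, eta in itertools.product(*grid.values()):
    mse = train_and_eval(d, units, eta)
    if mse < best_mse:
        best_params, best_mse = {"d": d, "units": units, "eta": eta}, mse

print(best_params)   # {'d': 4, 'units': 15, 'eta': 10}
```

Each grid point corresponds to one full training run, so the grid must stay small enough to be tractable.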

Results
We tested different values from 1 to 20 for the latent dimension d and reached a final value of 4, which is coherent with the results in Marot et al. [2019] for the daily energy consumption in France. Moreover, we also performed a principal component analysis (PCA) on the consumption data and found that 4 components were enough to explain more than 80% of the variance in the data. We tested CVAEs with one or two hidden layers of 10, 15, 20, or 25 units per layer and concluded that an architecture with a hidden layer of at least 15 neurons performed much better than smaller architectures. We continued to increase the number of layers and the number of neurons per layer, but without improvement in the MSE; moreover, the number of iterations necessary before convergence increased. So we decided to keep a single hidden layer of 15 units for both the encoder and the decoder.
Concerning the activation function of the neurons, rectified linear unit (ReLU), linear, and sigmoid functions were tested, and the best performance was clearly obtained with the ReLU activation function.
For the initialization of the network weights, we compared various Keras initializers (Glorot uniform, He normal, LeCun normal, Zeros, Ones) and a manual initialization with PCA (as described in Miranda et al. [2014]). We noticed that the weight initialization does not have a strong impact on the results, so the Glorot uniform initializer was selected (Glorot and Bengio [2010]).
For the regularization parameter η that balances the two terms (reconstruction and regularization) of the loss function, various tuning strategies already exist. For example, Higgins et al. [2017] showed that a constant η > 1 may outperform the classical VAE (defined with η = 1). Moreover, Liang et al. [2018] and Bowman et al. [2016] considered a moving parameter that gradually increases from 0 to 1 across iterations, linearly and according to a sigmoid, respectively. We tried the three approaches and opted for a constant regularization parameter equal to 10. Finally, we tested various learning rates for the Adam optimizer but did not notice major variations in performance, so we set it to 10⁻³.

Conditional variables preprocessing
We tried various combinations of the exogenous variables described in Table 2 and selected the one with the lowest MSE on the testing set. For a day t, the conditional vector X t gathers the variables described below.
Prices are categorical variables (Low, Normal, or High), so, for a day t and a half-hour h, the prices p_t^h are encoded into two binary variables 1{p_t^h = Low} and 1{p_t^h = High} (if these two variables are both zero, the tariff is Normal). The position-in-the-year κ_t ∈ [0, 1] and the binary type-of-day variable w_t are also considered.
Taking into account the half-hourly temperatures τ_t^1, . . . , τ_t^H significantly improves the MSE on the testing set, but the dimension of the conditional vector is then quite high. We therefore tried to reduce the dimension of the temperature profile and obtained better results. A PCA was performed on the vectors made of all the temperatures at day t (half-hourly records and smoothed temperature). Three components were enough to explain 98% of the variance. Therefore, we only keep the three components provided by the PCA and re-scale them between 0 and 1 to provide the three temperature variables included in X_t.
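The construction of the conditional vector can be sketched as follows; the random temperature data below is a toy stand-in (on real temperature profiles, which are strongly correlated across half-hours, very few components reach 98% of the variance), and the PCA is done via a plain SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
H, T = 48, 365                        # half-hours per day, number of days

def encode_tariff(p):
    """Two dummies per half-hour: 1{p=Low} and 1{p=High}; (0, 0) is Normal."""
    return np.concatenate([(p == "Low").astype(float),
                           (p == "High").astype(float)])

# PCA on the daily temperature vectors (toy data here).
temps = rng.normal(size=(T, H))
centered = temps - temps.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(explained, 0.98)) + 1  # components for 98% variance
components = centered @ vt[:k].T               # (T, k) reduced temperatures
comp01 = (components - components.min(axis=0)) / np.ptp(components, axis=0)

# Conditional vector for day t = 0: tariff dummies, reduced temperatures,
# position-in-the-year kappa_t, and type-of-day w_t.
p = np.array(["Normal"] * H); p[9:19] = "Low"
x_t = np.concatenate([encode_tariff(p), comp01[0], [0.1, 1.0]])
print(x_t.shape)
```

With real data, `k` would be 3 as reported in the text, giving a conditional vector of dimension 2H + 3 + 2.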

Simulator creation
Finally, we emphasize that CVAEs may converge to local minima. To avoid this, each CVAE is trained 50 times and the one with the lowest MSE on the testing set is selected. For each of the clusters presented in Section 4, we thus get a CVAE that takes as inputs the daily energy consumption profile Y_t = (Y_t^1, . . . , Y_t^H) of the considered cluster (which is rescaled during the training process) and the conditional vector X_t described above. Then, the decoder is isolated and enables the generation of new data. Indeed, for a new vector X_t at a day t, which can either be created or extracted from the testing data, we sample a vector Z_t ∼ N(0, I_d) and give these two vectors as inputs to the decoder, which outputs a daily energy consumption profile. The quality of the generated data is evaluated in two situations. First, samples for the conditional vectors X_{T_0+1}, . . . , X_T associated with the testing set are generated; we thereby measure the ability of the data generators to forecast energy consumption (we will see that a forecasted density can be deduced from the generated samples). Secondly, we create new vectors X_t in which we modify the variables 1{p_t^h = Low} and 1{p_t^h = High} in order to measure the impact of tariff changes. These results are presented in Section 7, where they are compared with data generated by the semi-parametric generator presented below.
Semi-parametric generator: Additive Model

The following semi-parametric method, based on generalized additive models (GAM), see Wood [2006], is proposed to generate new daily consumption profiles. GAMs form a powerful and efficient semi-parametric approach to model electricity consumption (see, among others, Gaillard et al. [2016]) as a sum of independent exogenous-variable effects. Here, we assume that there exists a class of functions F such that, for a given half-hour h and an instance t, with x_t^h a vector of exogenous variables and p_t^h the tariff, the energy consumption expectation satisfies E[Y_t^h] = f_h(x_t^h, p_t^h), with f_h ∈ F. After estimating the functions f_h (we detail further the set F and how GAMs may approximate these functions), we could compute the residuals and try to fit a model on them. They are centered, but a time dependence is observed, so adding an independent white noise to each forecast would not provide realistic profiles. It is important to note that the same problem is found in renewable energy uncertainty forecasting, with the need to generate scenarios (or trajectories) with an inter-temporal dependency structure for multi-period stochastic optimization (see Pinson et al. [2009] for more details).
In this paper, we propose an approach based on a conjoint estimation of both the mean and the variance of the energy consumption. We first tried to use a Gaussian copula to create trajectories, applying the methods proposed by Pinson et al. [2009], but faced an important problem: as soon as the function f_h is not very well estimated, the variance of the residuals mostly comes from the estimation error. More precisely, a bad estimation of the expected consumption leads to an increase of the estimated standard deviation.
As the focus is on generating realistic profiles (and not necessarily on having the best forecast in expectation), the standard deviation used to simulate data must reflect the variability observed in the energy consumption data. Thanks to the causality model of Section 4.1, which is now fitted on cluster consumptions (and not on individual ones), we can estimate the standard deviation of the noise as a function of the tariff and of the half-hour h. We recall that we denote by σ_h(p) the approximation of the standard deviation √Var[Y^h(i) | P = p] associated with the half-hour h and the tariff p (see Equation (1)). It is used to normalize the residuals, which should then be centered and of variance 1 (but not independent). Finally, we consider the standardized residual vectors and compute an estimation of their correlation matrix Σ. We can now generate new data points as the sum of the GAM forecast and the rescaled noise: Y_t^h = f̂_h(x_t^h, p_t^h) + σ_h(p_t^h) ε_t^h, where (ε_t^1, . . . , ε_t^H) ∼ N(0, Σ). The functions (f_h)_{1≤h≤H} are estimated with GAMs, and the exogenous vector x_t^h gathers the temperature at the considered half-hour τ_t^h, the smoothed temperature τ̄_t, the position in the year κ_t, and the binary variable w_t, which is equal to 1 if the considered day is a working day and 0 otherwise. For each half-hour h, we set the same underlying GAM: f_h(x_t^h, p_t^h) = s_τ^h(τ_t^h) + s_τ̄^h(τ̄_t) + s_κ^h(κ_t) + α_h w_t + ξ_Low^h 1{p_t^h = Low} + ξ_High^h 1{p_t^h = High}. Therefore, F is the set of functions that can be written this way. The s_τ^h, s_τ̄^h, and s_κ^h functions capture the effects of the temperatures and of the yearly seasonality. They are approximated by cubic splines, i.e., C²-smooth functions made up of sections of cubic polynomials joined together at the points of a grid (the knots). Fixing the number of knots k and their positions is sufficient to determine a linear basis of dimension k onto which these functions can be projected. The mgcv R-package estimates the coordinates of the splines in their basis and the coefficients α_h, ξ_Low^h, and ξ_High^h that capture the day-of-the-week and tariff effects.
Appendix A.5 provides details on the estimation of the correlation matrix Σ, which makes it possible to model the correlations between the consumption of two half-hours of the same day, while keeping a variance of the residuals that varies with the half-hour and the price.
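The generation step of this semi-parametric simulator can be sketched as follows; the forecast, standard deviations, and correlation matrix below are toy stand-ins for the fitted GAM outputs f̂_h, σ_h(p), and Σ:

```python
import numpy as np

rng = np.random.default_rng(3)
H = 48

# Stand-ins for the fitted quantities.
forecast = 1.0 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, H))  # hat f_h(x, p)
sigma_h = np.full(H, 0.05)                                    # sigma_h(p)
Sigma = 0.5 * np.eye(H) + 0.5 * np.ones((H, H))               # correlations

def generate_day(forecast, sigma_h, Sigma, rng):
    """Draw standardized residuals eps ~ N(0, Sigma) (unit variances,
    inter-half-hour correlations), rescale them, and add the forecast."""
    eps = rng.multivariate_normal(np.zeros(H), Sigma)
    return forecast + sigma_h * eps

samples = np.array([generate_day(forecast, sigma_h, Sigma, rng)
                    for _ in range(200)])
print(samples.shape)   # (200, 48)
```

Because the noise vector is drawn from a single multivariate Gaussian, the inter-temporal dependency within a day is preserved, unlike with independent white noise.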

Evaluation Metrics
By generating many energy consumption profiles from the simulators, an estimation of their densities can be obtained. Therefore, we use proper scoring rules from probabilistic forecast evaluation to assess the quality of our generators. The three scores detailed below make it possible to evaluate the data generated over the testing period and to compare both generators. For a day t of the testing set, from the vector of exogenous variables X_t, both generators output random H-vectors that are assumed to be drawn from an underlying distribution F̂_t. These distributions approximate the true and unknown H-dimensional distributions F_t from which the observations (Y_t^1, . . . , Y_t^H) are actually drawn. We generate N = 200 samples Y_t^(1), . . . , Y_t^(N) with each generator. From these random H-vectors, we can approximate the three scores described below, which measure the adequacy between the observation vectors Y_t and the distributions F̂_t.
First of all, for a distribution F and a vector of observations y, the root mean squared error RMSE(F, y) = (1/√H) ‖E[Y] − y‖ is considered, where Y is a random vector distributed according to F. The first score is thus the RMSE between the expectation of the distribution F̂_t (which we approximate with the empirical mean of the generated samples) and the observation Y_t. Here, the expectation of the distribution F̂_t is actually seen as a forecast of the energy consumption Y_t. But to evaluate the quality of F̂_t, a criterion including the variance and the shape of the densities is necessary.
The two other scores are proper scoring rules used to evaluate weather ensembles or temporal trajectories generated by a statistical method (e.g., a copula model). The energy score, introduced by Gneiting and Raftery [2007], generalizes the univariate continuous ranked probability score (CRPS) and is defined as ES(F, y) = E‖Y − y‖ − ½ E‖Y − Y′‖, where Y and Y′ are two independent random vectors distributed according to F. This score is approximated by splitting the N generated samples into two groups of equal size. Scheuerer and Hamill have shown that the ability of the energy score to correctly detect correlations between the components of a multivariate distribution is limited (see Scheuerer and Hamill [2015] for further details). To remedy this, they introduced the variogram score of order p: VS_p(F, y) = Σ_{h=1}^{H} Σ_{h′=1}^{H} ( |y^h − y^{h′}|^p − E|Y^h − Y^{h′}|^p )², where Y is a random vector distributed according to F. On simulated data, they compared the performance of different scores (including the energy score) with the variogram scores for various p. This score is approximated by replacing the expectation with the empirical mean over the N generated samples. We emphasize that, for all the scores above, the smaller the value, the better the forecast.
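Monte-Carlo approximations of these two scores can be sketched in a few lines (the split-in-two-halves estimator for the energy score and the empirical-mean estimator for the variogram score, on toy data):

```python
import numpy as np

def energy_score(samples, y):
    """Approximate ES(F, y) = E||Y - y|| - 0.5 * E||Y - Y'|| by splitting
    the N samples into two halves to estimate the second expectation."""
    n = len(samples) // 2
    term1 = np.mean(np.linalg.norm(samples - y, axis=1))
    term2 = np.mean(np.linalg.norm(samples[:n] - samples[n:2 * n], axis=1))
    return term1 - 0.5 * term2

def variogram_score(samples, y, p=0.5):
    """Approximate VS_p(F, y): squared gaps between observed and expected
    pairwise |Y^h - Y^h'|^p over all half-hour pairs (h, h')."""
    obs = np.abs(y[:, None] - y[None, :]) ** p
    exp = np.mean(np.abs(samples[:, :, None] - samples[:, None, :]) ** p,
                  axis=0)
    return np.sum((obs - exp) ** 2)

rng = np.random.default_rng(4)
y = rng.random(48)
good = y + 0.01 * rng.standard_normal((200, 48))  # samples close to y
bad = rng.random((200, 48))                       # unrelated samples
print(energy_score(good, y) < energy_score(bad, y))  # True: lower is better
```

As expected for proper scoring rules, samples concentrated around the observation score lower than unrelated ones.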

Numerical results
For each cluster and each day t of the testing set, we compute the three scores for both generators (CVAE-based and GAM-based), thanks to the 200 generated samples. Results are represented by boxplots in Figure 4. Moreover, for the first three days of the testing set (which are actually the first three days of 2013), 20 samples generated by the simulators for one of the 4 clusters, their empirical means (computed on all the samples), and the corresponding observations Y_t are plotted in Figure 5. Plots for each cluster can be found in Figure 7 of Appendix A.6. It is quite difficult to significantly discriminate between the two generators from these scores, but some conclusions may still be drawn. First, the RMSE boxplots and the plots suggest that the GAM-based generators work better than the CVAE-based ones when it comes to generating the average value of the original data (which is approximated by the empirical mean of the samples). However, the energy score is slightly lower for the non-parametric approach (namely the CVAE-based simulator) than for the semi-parametric one (GAM-based simulator). Thus, the method that consists in adding a noise term to a forecast in expectation may have some limits, whereas CVAEs seem to correctly capture the distributions of daily energy consumption.
The experiments of Scheuerer and Hamill [2015] highlight that, when the estimation of the average value of the original data is incorrect (namely, when the expectation of F differs from the expectation of y in Equation (12)), variogram scores increase. Moreover, a too low or a too high variance, i.e., when the variance of F differs from that of y, also increases the variogram score. Given the variogram scores and the plots, we conclude that the CVAE-based generators provide a worse estimation of the expected energy consumption than the semi-parametric generator, and also provide samples with a too low variance. Conversely, the GAM-based generators provide samples with too much variance, which also leads to a quite high variogram score.
Moreover, in the CVAE approach, consumption values from one half-hour to another are very correlated, whereas in the semi-parametric one, consumption profiles are more erratic. Observations suggest that the real variances and correlations lie somewhere in between. The semi-parametric method is very sensitive to the estimation of the standard deviations σ_h(p): over-estimating these variances yields very dispersed, and possibly very erratic, samples. Concerning the CVAE-based generator, the variance of the samples could be manually increased by generating the decoder inputs according to N(0, σ²I_d) with σ > 1.

Figure 6: Left: data generated with the CVAE-based simulator. Right: data generated with the GAM-based simulator. Black lines: for a single cluster on the first three days of the testing set, 20 energy consumption profiles and the empirical mean profile, computed over 200 samples (in bold), obtained by giving, to the two simulators, a Normal tariff for every half-hour and the weather and calendar variables observed over this period. Blue lines: same plots but with a High tariff in the evening and a Normal tariff otherwise. Yellow lines: same plots but with a Low tariff in the early morning and a Normal tariff otherwise.
Finally, we emphasize that, in the semi-parametric approach, the variance depends only on the tariff and on the half-hour, whereas in the CVAE, all the exogenous variables are taken into account. Moreover, the next section presents some strong advantages of the CVAE generator.

Impact of the tariff
In these last experiments, for a day t of the testing set, three different conditional vectors X_t^Normal, X_t^Low, and X_t^High are considered. For X_t^Normal, the tariff is Normal all day long. For X_t^Low, the Low tariff applies from 4:30 to 9:30 a.m. and the Normal one otherwise. Finally, for X_t^High, the tariff is Normal except from 7:30 to 10 p.m., where it is High. For all the other components, namely the calendar and weather variables, X_t^Normal, X_t^Low, and X_t^High are equal to X_t. Still for the first three days of the testing set, 20 samples generated by the generators for one of the 4 clusters and their empirical means (computed over all the samples) are plotted in Figure 6. Plots for each cluster can be found in Figure 8 of Appendix A.6.
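Building the three tariff scenarios amounts to setting the dummy variables over the relevant half-hours; in the sketch below, the mapping of the stated clock times onto half-hour indices (index h covering [h/2, (h+1)/2) hours) is our own assumption:

```python
import numpy as np

H = 48  # half-hour index h covers the interval [h/2, (h+1)/2) hours

def tariff_dummies(low_slots, high_slots):
    """Build the 1{p=Low} and 1{p=High} dummy vectors; Normal = both zero."""
    low, high = np.zeros(H), np.zeros(H)
    low[np.asarray(list(low_slots), dtype=int)] = 1.0
    high[np.asarray(list(high_slots), dtype=int)] = 1.0
    return low, high

# X_t^Normal: Normal tariff all day long
low_n, high_n = tariff_dummies([], [])
# X_t^Low: Low from 4:30 to 9:30 a.m. -> half-hours 9 to 18
low_l, high_l = tariff_dummies(range(9, 19), [])
# X_t^High: High from 7:30 to 10 p.m. -> half-hours 39 to 43
low_h, high_h = tariff_dummies([], range(39, 44))

print(int(low_l.sum()), int(high_h.sum()))  # 10 5
```

The calendar and weather components of X_t are left unchanged across the three scenarios, so any difference in the generated profiles is attributable to the tariff alone.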
For both data generators, an increase of the consumption when the Low tariff is applied and a decrease when the tariff is High are observed. For the GAM-based generator, the effect of the tariff is very interpretable: it is measured by the coefficients ξ_Low^h and ξ_High^h of Equation (8). This model actually assumes that the tariff effect only depends on the half-hour. Moreover, the matrix Σ models the correlations between the energy consumption at two half-hours of the same day, which implicitly assumes that these correlations do not change with the applied tariff profile. Conversely, the CVAE-based generator does not make these assumptions and the effect of a tariff may differ from one day to another.
Moreover, two effects that cannot be modelled by the semi-parametric approach are observed. First, the fall of the energy consumption occurs a little before the effective establishment of a special High tariff and continues a little after it is stopped. Thus, the effect of the High tariff exceeds the time window in which the special tariff is actually applied. This is called a side effect. Secondly, in comparison to a day of Normal tariff, when the Low tariff is applied in the morning, there is a drop of the consumption in the afternoon and evening. Similarly, we observe a slight increase of the consumption in the afternoon when the tariff is High during the evening. Therefore, the fall or rise in consumption shifts to another time of the day when a special tariff is applied over a time window. This is called a rebound effect. These side and rebound effects are well-known consumer behaviors, and it is very valuable that the generator detects them. The main drawback of this non-parametric generator is the generation of non-intuitive consumption profiles when the input is a tariff profile never observed in the training set, like an entire day of High tariff for example. This shows that the method has a limited generalization capacity. Enlarging the data set, especially the variety of price signals, would eliminate this limitation. On the other hand, for a full day of High tariff, the semi-parametric model generates samples with an energy consumption below the typical one for each half-hour, which is unrealistic since electricity uses cannot be delayed indefinitely. Figure 8 of Appendix A.6 shows that tariff-responsiveness varies from one cluster to another, i.e., rebound or side effects are not always observed and the amount of electricity over- or under-consumed also depends on the considered cluster. These results fully illustrate the motivation behind the use of the causality model to cluster consumers.

Conclusions
This paper proposed a data-driven and non-parametric methodology, based on CVAEs, for generating synthetic energy consumption profiles of households enrolled in a DR program with different tariff schemes. The results on the largest publicly available data set (released by UK Power Networks) show that the proposed non-parametric generator correctly captures the effect of the exogenous variables and performs almost as well as the benchmark semi-parametric generator at generating the mean value of the original data. Above all, the main strength of the CVAE approach comes from its ability to capture the effect of a daily tariff profile on the daily consumption profiles. Indeed, unlike the semi-parametric generator, which only captures the effect of a special tariff for the half-hours affected by this tariff change, the generator built from the decoder of a CVAE provides daily consumption samples for a whole daily tariff policy, including rebound and side effects. Moreover, for the same conditional variables as inputs, the generated samples differ from one group of consumers to another. Thus, the proposed clustering approach correctly segments the households according to their responsiveness to a tariff profile.
Finally, to deal with the lack of variability in the tariff profiles sent in the original data set, we could imagine an online data generator: when a new tariff profile is sent, the observed consumption is integrated into the training set and the data generator is updated. The use of transfer learning methods could also improve the realism of the generated data. This machine learning field focuses on storing the knowledge gained while solving one problem and applying it to a different but related problem. Therefore, by combining data sets of consumer responsiveness to various DR programs (i.e., by combining diverse knowledge of electricity demand in the face of tariff changes), a data set with a higher variability in the sent tariff profiles may be obtained. These data generators could be very useful to test potential future DR policies before deploying such solutions in consumer households. Another topic of interest is the extension of the proposed model to account for the privacy of the smart meter measurements, for which recent research in privacy-preserving machine learning is a promising approach (Al-Rubaie and Chang [2019]).