Framework for Extracting and Characterizing Load Profile Variability Based on a Comparative Study of Different Wavelet Functions

The penetration of distributed energy resources (DERs) on the electric power system is changing traditional power flow and analysis studies. DERs may cause the systems’ protection and control equipment to operate outside their intended parameters, due to DERs’ variability and dispatchability. As this penetration grows, hosting capacity studies as well as protection and control impact mitigation become critical components to advance this penetration. In order to conduct such studies accurately, the electric power system’s distribution components should be modeled correctly, and will require realistic time series loads at varying temporal and spatial conditions. The load component consists of the built environment and its load profiles. However, large-scale building load profiles are scarce, expensive, and hard to obtain. This article proposes a framework to fill this gap by developing detailed and scalable synthesized building load profile data sets. Specifically, a framework to extract load variability characteristics from a subset of buildings’ empirical load profiles is presented. Thirty-four discrete wavelet transform functions with three levels of decomposition are used to extract a taxonomy of load variability profiles. The profiles are then applied to modeled building load profiles, developed using the energy simulation program EnergyPlus®, to generate synthetic load profiles. The synthesized load profiles are variations of realistic representations of measured load profiles, containing load variabilities observed in actual buildings served by the electric power system. The paper focuses on the framework development with emphasis on variability extraction and application to develop 750 synthesized load profiles at a 15-minute time resolution.


I. INTRODUCTION
The penetration of distributed energy resources (DERs) on the electric power system, especially the distribution system segment, is changing the energy landscape (see Fig. 1 [1]) as well as traditional power flow and analysis studies. DERs may cause the systems' protection and control equipment to operate outside their intended parameters, due to DERs' predictability, variability, and dispatchability [2]. As this The associate editor coordinating the review of this manuscript and approving it for publication was Hui Ma . penetration grows, hosting capacity studies and protection and control impact mitigation become a critical component to advance this penetration. IEEE Std 1547.7 identifies conventional distribution studies and special system impact studies that should be considered as needed when considering DER interconnection on the distribution system [2]. Conventional distribution studies include: steady state simulation; system protection studies; short-circuit analysis; protective device coordination; automatic restoration coordination; area electric power system grounding; synchronization; unintentional islanding; arc flash hazard study; VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ and operational characteristics-loading, load shedding, etc. Special system impact studies include: quasi-static simulation; dynamic simulation; dynamic stability; system stability; stability analysis interpretation; voltage and frequency ride through; electromagnetic transient simulation; ferroresonance; interaction of different types of DER; temporary overvoltage; system grounding; DC injection; harmonics and flicker; harmonic analysis; harmonic problems; harmonic resonance; and flicker. In order to conduct such studies accurately, the distribution systems' components should be modeled correctly, and will require actual time series loads at varying temporal and spatial conditions. This is part of IEEE Std 1547.7, which identifies the need for actual load conditions and conditions under various load levels when conducting specific impact studies [2]. The load component at the secondary distribution system consists of the built environment, comprising residential, commercial, and industrial buildings. Buildings' electrical load profile data captured by advanced metering infrastructure (AMI) and similar data loggers are nonstationary time series data. This is because the time series contain trends (daily, weekly, and seasonally) and changes in level and slope [3]. These varying temporal and spatial time series load profiles and variability characteristics provide critical information for use with demand-side management, quasi-static simulation, DER planning, tools for short-term load forecasting, and synthetic load profile development, among others. In [4], the authors discuss the need for end-use load profiles to conduct an accurate analysis and understand the impact of DER and electric vehicles on the distribution system. They compare two methods for generating end-use load profiles, the ZIP model and the physical model. They develop models for residential load profiles using both approaches and analyzed them on the IEEE 8500-node distribution system. Their result shows the need for a physical model that shows the variability that takes place in actual building load profiles.
As such, buildings' load profiles at multiple time resolutions are critical to determine the impact of building loads on a given distribution system and understand the baseload of the distribution system. This baseload is then used for comparison when DERs are integrated onto the system. As stated in IEEE Std 1547.7 (2013), ''power flow simulations including time series or quasi-static simulations may be needed to fully study the impacts of DER on Area EPS [electric power system] voltage'' ( [2], page 8). Furthermore, the IEEE standard describes quasi-static simulations: ''Quasi-static simulation refers to a sequence of steady-state power flow conducted at a timestep of no less than 1 second but that can use a time step of up to one hour'' ( [2], page 77).
However, large-scale load profile data sets and load profile variability characteristics at varying temporal and spatial intervals are scarce, expensive, hard to obtain, and may create user privacy issues. To overcome this lack of data, building energy use/consumption at various levels of detail can be modeled, via mathematical models, using computer-based simulation software. The approach is referred to as building energy modeling (BEM). BEM can be used to model new buildings and/or evaluate existing building performance and the effect of new renovations to its subsystems to quantify the impact in terms of energy efficiency, capital cost, savings, and other parameters of interest [5]. One such simulation software program is EnergyPlus R , a U.S. Department of Energy (DOE) open-source software, that contains more than 500,000 lines of code to accomplish various modeling objectives for buildings [6]. EnergyPlus models whole-building energy consumption and water usage at different time-scale resolutions [7]. For example, EnergyPlus is used in [8] to develop a building energy model for a university administration building in Brazil. A total of 54 days are used in the simulation. The results are then compared with actual energy measurement, showing that 80% of the time Ener-gyPlus agrees well with the measured energy, even though the parameters used to build the simulated building were simplified. BEM has also been proposed as a tool that could be used for creating synthetic load profiles [9]. However, as noted, BEM does not currently represent variability well. DOE, in collaboration with its national laboratories, developed the commercial reference building model (CRBM). CRBM is a reference building for the 16 [10]. In addition, researchers at the National Renewable Energy Laboratory (NREL) are currently developing ResStock TM and ComStock TM tools that investigate large-scale residential and commercial building energy analysis [11], [12]. EnergyPlus can then be used to generate energy models for these benchmark buildings to represent a specific building in each spatial location at a given temporal resolution for a given U.S. weather data set and climate zone. Furthermore, EnergyPlus can provide yearly whole-building and building end-use categories (subsystems) time series load profile data at various timescales, ranging from 1-minute to 60-minute time resolutions, for the U.S. building stock. However, these load profiles represent building and occupant energy behaviors based on set schedules and parameters, yielding load profiles that contain less variability than what is encountered in actual buildings.
The lack of large-scale measured data sets for buildings, the availability of EnergyPlus, the availability of CRBM's benchmark building stock, and the proposed generation of taxonomy of load variability in this article present a compelling case for large-scale data set generation of synthetic yet accurate representations of building energy profiles for benchmark buildings that can be used for a variety of applications, including hosting capacity studies. This article provides a framework for generating such data sets by combining the low-frequency information in modeled load profiles from EnergyPlus with high-frequency variability extracted from measured data to create synthetic load profiles.
Although several approaches are possible to isolate the high-frequency variability in measured data, this article utilizes discrete wavelet transform (DWT)-an established technique most commonly used in load profile analysis and other applications to remove high-frequency fluctuations, but used here to isolate, analyze, and reintegrate this information. The extracted load variability profiles are then integrated with modeled building load profiles, generated using EnergyPlus, to develop detailed and scalable synthetic load profiles at varying temporal resolutions for the built environment for use by the scientific community for distribution system analysis. Description for the generation of 750 synthetic building load profiles at a 15-minute time resolution is presented to demonstrate the application of the proposed framework.
The major contribution and novelty of this work and the proposed framework is to generate synthetic load profiles with realistic variability resembling actual measured load profiles. This is an improvement to existing modeled load profiles that contain a limited amount of variability. While other literature work focuses on decomposition of the time series signal to remove variability (noise), our proposed work focuses on this variability and uses it to generate realistic models from modeled load profiles. The proposed framework is developed via the application of the following novel features: 1. Selection of DWT decomposition over other decomposition techniques such as empirical mode decomposition, discrete Fourier transform, and variational mode decomposition VMD, in part because DWT defines variability based on fixed frequencies (levels of decomposition). 2. Application of 34 different DWT functions with varying decomposition levels. These parameters provide different amounts of variability from the same underlying measured load profile data set, allowing for the generation of a taxonomy of load variability. 3. Normalization and scaling of the extracted variability to generate synthetic load profile data sets with realistic high-frequency behavior.
The paper is organized as follows: Section 2 discusses the current state of load variability characteristic extraction and the principle operation of DWT. Section 3 provides a framework for the proposed approach and description of the framework components. Section 4 provides detailed information on the application of the framework to generate synthetic yet realistic building load profiles, and describes the implementation of the framework to generate 750 synthetic load profiles. Section 5 provides a summary and conclusions, and Section 6 provides future work discussion.

II. LOAD VARIABILITY CHARACTERISTICS AND FEATURE EXTRACTION USING WAVELETS
Wavelet analysis has been used for feature extraction in multiple time series research domains, including power systems, image compression, pattern recognition, voice coding, signal processing, and hydrological time series data, among other fields [13]. It provides localized details in time and frequency and can detect changes and provide tools to extract these features. During the decomposition process of a given time series signal, wavelet functions provide an approximation signal and a detailed signal. These signals are then utilized depending on the objectives of the research analysis. The following subsections provide a synopsis of wavelet application in various domains and the principle operation of the DWT.

A. APPLICATIONS OF WAVELET TRANSFORM IN TIME SERIES DATA DOMAINS
In the hydrological field, wavelets are used extensively for prediction and forecasting. In [13], the authors propose DWT to reduce the dimension of hydrological time series data and then apply a clustering technique for a more efficient approach. Specifically, they decompose the daily precipitation of a yearly data sequence to use the approximation signal and discard the detail signal because the detail signal is considered the noise in the sequence. Clustering on the approximation signal is then carried out for a yearly precipitation series data from 1963-2001 to generate three clusters. The authors in [14] used the combination of a wavelet transform function and neural network to develop a multimodel combination for rainfall-runoff modeling. Specifically, a db8 wavelet function with nine levels is used to decompose the input data. By using nine levels, the eighth and ninth details provide the seasonal variation on an annual basis.
In [15], multiple types of wavelet families and subclasses are investigated, similar to the approach applied in this article, for hydrologic forecasting. The authors discuss the challenges in selecting the right wavelet to the application being addressed. Two main properties of wavelets are discussed, the region of support and the number of vanishing moments. The region of support indicates whether a wavelet is compact supported or supported at a longer span. Longer span will result in a higher averaging. The number of vanishing moments refers to the wavelet ability to represent the signal's polynomial structure that the wavelet is trying to decompose. Six different wavelet functions are used in the study: VOLUME 8, 2020 Daubechies db1 (Haar), db2, db3, db3, Sym4, and B-spline. Analysis conducted on synthetic time series shows how a wavelet with a reasonable support and good time-frequency localization property is able to capture the underlying trend and short time variability in the time series signal. In [16], a total of 32 wavelet functions from 6 wavelet families are applied as part of a methodology to predict sewage sludge parameters from a given sewage sludge data set. With respect to determining the decomposition level (L), the authors indicate that the number of data points (N) in the time series signal determines the maximum decomposition level based on the following equation: L = int(Log(N)).
In the field of load forecasting, wavelet transform functions have been used for short-term load forecast predication using dynamic features of the load characteristics. In [17], the authors propose a combination of Daubechies db4 with 2 decomposition levels and a committee of 13 recurrent neural networks to forecast 1 hour ahead, 1 day ahead, and 1 week ahead load forecasting. Various subclasses of the Daubechies db family, namely db2, db3, db4 and db5 at levels two and three, are investigated. Furthermore, the author of [17] indicated that the weather does not have any impact on the details of the db4 level two wavelet. The weather data is used with the approximation signal that represent the smoothed version of the load series. In [16], the authors use multiresolution analysis (MRA) to extract load characteristics and use for precise forecasting and provide tools for short-term load forecasting. The forecasting analysis is based on predicting the load's future behavior by independently forecasting the subseries of the load that is generated using wavelets.
In [18], the authors address the limitations of the time domain analysis and traditional load profile characteristics using typical load profiles with the proliferation of advanced metering infrastructure (AMI) and their different time domain profiles. Two spectral analysis techniques are thus introduced to address the limitation of the time domain analysis, including discrete Fourier transform and DWT. The authors compare the performance of each technique focusing on load characterization and low-order approximation; this shows the superior performance of the DWT when considering data with granular details. With respect to DWT, a Haar mother wavelet with three decomposition levels is selected because it articulates the on-off operation of typical appliances in a residential setting. For the smart meter data, 30-minute time resolution for 1 year for 6,369 customers is used in the study.
In [19], wavelet decomposition is used for feature extraction of load profile data of customers in order to identify nontechnical losses, specifically to identify fraudulent and nonfraudulent customers. The method is used to identify abrupt changes in the customer's load profiles. The time resolution of the data is the daily power consumption data collected between 2014 and 2016 for a total of 881 days for 2,271 customers. The accumulated data set is preprocessed to remove any outliers using the smoothing splines method. Then the change in the daily power consumption is computed by taking the difference between two successive days. A three-level maximal overlap discrete wavelet-packet transform is used to extract the features.
The authors in [20] propose a two-dimensional DWT to model customer load profiles. They discuss using wavelet to compress large amount of AMI data. The authors propose two wavelet-based load models, a multiresolution wavelet load model for each customer that uses DWT to transform the original load profile to the wavelet domain, and a classified wavelet load model where a single load model is used for customers with similar load behavior. The authors use 1-hour time series AMI data for 323 customers (residential, commercial, and other type customers) on a feeder of a distribution system. The measurements were recorded from May 27, 2013, to December 29, 2013 (31-week time window, 5,208 hourly kWh measurements for each customer). For the wavelet load model, the approximation signal from the last level DWT is used. The performance of the signal is then compared with the original AMI data using the following three parameters: the synthesized load profile's total energy content, the distribution of the absolute value of the hourly error, and the hourly percentage error. To classify the customers, a proposed two-dimensional DWT is used to generate a classified load model. The authors then perform a time series power flow analysis using the three models (original AMI data, the load profile generated from the approximation signal, and the classified wavelet model) as input into the distribution system. Their results show that the classified load model provides better results than the wavelet load model [20].
In the field of coherent structures, the authors in [21] use DWT and empirical mode decomposition for the extraction of coherent structures from a chaotic time series data. The authors compared the performance of the empirical mode decomposition method against DWT. A Daubechies db4 wavelet function is used in the analysis for three separate experiments. Furthermore, various decomposition levels from level 1 to level 12 are investigated, and the log-variance of the wavelet coefficient vs. levels are used to identify certain features. However, the authors identified that the selection of an appropriate DWT plays a critical role in the results. In [22], a Haar DWT is used for time series data dimension reduction and in content-based search and retrieval.
In the field of electricity price forecasting, [23] uses wavelet transforms coupled with machine learning for electricity price load forecasting. A Daubechies db5 wavelet function with combinations of decomposition levels 1 to 5 are used to provide details of the electricity price series signal and make it smoother. The authors considered this db wavelet function because it provides a compromise between the electricity price signal wavelength and smoothness. Similar work is carried out in [24] and [25]. Specifically, in [24], constitutive series generated by a Daubechies db5 with three levels is used with an ARIMA model for a day-ahead electricity price prediction. In [25], similar wavelet functions are used with an ARIMA and a neural network model for day-ahead electricity price and demand forecasting.
In the field of load characterization and load variability, the authors in [26] developed load variability models for residential loads using DWT. Specifically, 12-level Daubechies db2 is used to decompose a high-resolution time series load (1 second time resolution) into different frequencies to extract periodicity and bias components. The authors indicated that load profile characterization of less than 1 minute is not required because this data is well characterized by noise characteristics. In their study, the detail coefficients between the 7th and 11th levels are used to capture the 1-minute to 30-minute load variability changes.
In [27], the authors present a DTW method to generate time series load profiles for residential loads at varying time resolutions (1 second to 30 minutes) from actual utility transformer databases. The transformer is connected to 10-12 residential homes. This database contains 3 years' worth of data at 1-second time resolution intervals. The actual and modeled data are tested on two feeders to evaluate four critical metrics for the distribution system, namely voltage ramp distribution, minimum voltage, maximum voltage, and the number of regulator tap changes. The test result on the IEEE-123 feeder shows the modeled data accurately matched the real data in determining the number of voltage regulator operations in the time resolution of 5 minutes to 30 minutes. However, in the 1-second and 1-minute time resolution there was approximately a 27% and 23% error rate, respectively. This highlights the need of actual load profiles in the 1-second timescale to accurately model the distribution system to capture key component operations such as the number of regulator tap changes. The test result on an actual feeder in California with 619 load nodes and three regulators show similar performance in the 5-minute to 30-minute timescales. However, at 1-second and 1-minute time resolution, the error rate is approximately 5% and 10%, respectively. Test results for both feeders considering the other metrics show clear success in the modeled data to capture the influence.

B. WAVELET FUNCTIONS AND THEIR OPERATION
In this work, DWT functions are proposed to decompose time series data into a series of detail and approximation signals in the time and frequency domain to extract features at different timescales. Localizing key features of the signal at different timescales is a property of the applied mother wavelet and on the number of decomposition levels each selected mother wavelet allows.
The proposed approach uses the extracted detail signals to generate synthetic load profiles with more realistic variability. These synthetic profiles become critical to applications that are sensitive to high-resolution power variation. One application, for example, is the optimization formulation for unit commitment and economic dispatch (UC/ED). Because unit commitment and economic dispatch are optimization problems to minimize the cost associated with grid and generation, these load profiles containing more variability reflecting real systems will provide more accuracy addressing the optimization formulation in UC/ED [28]. Furthermore, when the variability that is extracted from the measured data is used to generate the synthetic load profiles, any gross errors in the measured data will carry over and affect the profiles. To address this shortcoming, robust estimation methods can be used to process the measured data before the application of the proposed method in this work. The work described in [29] and [30] proposed a novel method that provides improvements over other similar methods and can be applied to the measured data to remove any error data.
In this work, multiresolution analysis (MRA-DWT) is used, in which the scaling and position of the mother wavelet signal is based on the power of 2. The approximation signal output of the analysis is a low-frequency signal that captures the base load and holds general trends of the signal. The details signal output is a high-frequency signal that captures abrupt changes in the load profile at different timescales. Fig. 2 shows a three-level DWT. To generate the first level of information, the original time series signal is passed through a low-pass filter to generate the first level approximation signal (A1), and passed through a high-pass filter to generate the first level of details (D1). The approximation signal is then passed through a low-pass and a high-pass filter to generate A2 and D2, and so forth. To reconstruct the original time series signal, all the details and the last approximation level are used, as shown in equation (1).
The general form of this equation is shown in equation (2), where Dj and Aj are the details and approximation at the j level, and J is the total levels of the decomposition. The principle of operation consisting of a mother wavelet function ψ(t) is described mathematically as follows: where ψ i,j (t) stands for wavelet expansion functions. VOLUME 8, 2020 The coefficients of DWT are determined as follows: where ψ i,j (x) can gain its parameters through: The DWT coefficients are decomposed into two groups (low-pass and high-pass filter coefficients), corresponding to the details and approximation signals. They are determined in the following equations.
where h in (6) and g in (7) can be considered to be filters of the wavelet of the low-pass filter and high-pass filter, respectively. A DWT decomposition process corresponding to Fig. 2 can be mathematically shown as follows: where a 0,k , a j+1,k , d j+1,k are the coefficients at scale j+1 and can be determined as follows: where a j+1,n is the approximation coefficient, and d j+1,n is the detailed coefficient. Fig. 3 shows the application of a three level MRA-DWT function on a load profile time series signal. The time series signal is measured at a 15-minute time resolution for a primary school building type for one day, February 7, 2018. The following figure highlights the different detail and approximation signals at each level. Fig. 4 shows a block diagram of the proposed framework. The following subsections and further discussion in Section 4 will provide a description of the process and working principles of the framework.

A. SELECTION OF DWT FUNCTIONS
DWT operates by decomposing a time series signal (a load profile) into a detail signal that contains the high-frequency components of the signal (load variability signal) and an approximation signal that contains the low-frequency content  of the signal (base load signal in some applications). For the proposed framework in this research project, the focus is on capturing different detail signals, because they contain the sudden changes that take place when the building and/or the occupant(s) interact with the electrical system to start/stop/alter the energy consumption behavior.
The variability signal that is extracted is dependent not only on the underlying data, but on the parameters chosen for the DWT process, including level of decomposition and choice of wavelet function. There are many DWT families with subclasses for selection along with custom wavelet functions that can be designed and developed to address a given research task.
Within each DWT function, there are multiple levels of decomposition that can be used. The level of decomposition depends on the application and number of data points available in the measured time series load profile data set. Each decomposition level extracts a fixed frequency range of variability. For example, for a measured data set with a 15-minute time resolution, the first level contains the 15-to 30-minute high-resolution variability, whereas the second level contains the 15-to 60-minute variability. The highest level of decomposition to apply in a given application is the time-resolution above which the extracted signal is considered part of the base load profile. In practice, the decomposition level selection will vary for different applications.

B. MEASURED BUILDING LOAD PROFILE DATA SET
A 15-minute time resolution measured building load profile data set with five primary/secondary schools (commercial buildings) and five residential buildings is used to describe the framework for extracting load variability features using MRA-DWT. Fig. 5 shows the total daily kWh load profile usage for each of the five commercial buildings for the entire year. The data is measured starting at 12:00 AM, January 1, 2018, to 11:45 PM on December 31, 2018, for a total of 35,040 data points for each building. The commercial buildings are located in a Midwest state in the United States.   6 shows a typical load profile for one day for each building on September 12, 2018. Although all the measured building types come from one building sector (primary/secondary school), diversity and variability in the load exists, as shown in Fig. 6 and in Fig. 7 for July 9, 2018.     8 shows the total daily kWh load profile usage for the five residential buildings (homes). The data is measured from VOLUME 8, 2020  12:00 AM on April 13, 2012, to 11:45 PM on April 12, 2013, for a total of 35,040 data points for each home. Fig. 9 shows a typical load profile for one day for each home on July 9, 2012. Although all the measured building types come from one geographical location (U.S. Northwest), diversity and variability in the load exists as shown in this figure and in Fig. 10.

C. MEASURED LOAD PROFILE DECOMPOSITION AND FEATURE EXTRACTION
The process to extract various levels of load variabilities from the measured data from both building types starts with data preprocessing to clean the data by isolating missing and null data, and using interpolation to calculate and replace it. For contiguous missing data, scaled data from another building (the most similar) is used to substitute, ensuring the start and end of the substitutions took place at low-load nighttime points to avoid large discontinuities. After preprocessing, all 34 DWT functions listed in the previous section are applied to each building's load profile for the entire year. For each DWT function, three decomposition levels (level 1, 2 and 3) are applied. Equations (11)- (16) show the details and approximation when applied to the measured signal Pmeas, where Pmeas = [P0,P1,P2. . . PN −1] where N = sample size = 35,040 (see Algorithm 1 for more info) and i is the index for ith mother wavelet function among the 34 DWT functions.
This will generate a taxonomy of 102 unique load variability profiles (34 unique load variability signals at level 1, 34 unique load variability signals at level 2, and 34 unique load variability signals at level 3) for each building, for each day of the year. These load variability profiles are the detail components of the decomposed load profile signals. The level 2 and level 3 details are the total details, meaning at level 3 the load variability profile consists of the sum of the level 1, 2, and 3 details, or equivalently, the difference between the measured load profile signal and the level 3 approximation signal. Once the load variability profiles are extracted, a normalization process is applied by finding the maximum daily load of the level 3 approximation for each day, then dividing the load variability profile by that amount, yielding a pseudo-normalized variability profile represented as a percentage of the peak daily ''base'' load. The process is summarized in Algorithm 1. The following subsections provide examples of feature extraction from the 10 measured buildings.
The computational efficiency of this algorithm is O(N), where N is the number of data points in the measured load profile, meaning the cost scales linearly with N. This cost is multiplied by the number of total levels and wavelet functions to be considered.

1) EXAMPLES OF VARIABILITY FEATURES EXTRACTED FROM THE FIVE COMMERCIAL BUILDINGS
As stated earlier, 34 wavelet functions are applied to each of the five buildings to extract 102 load variability profiles each Algorithm 1 Generate daily load variability profile from measured data 1: Given a measured load profile P as a function of time, measured at T timesteps, let P t denote the load at time t: P = [P 1 , P 2 , . . . P T ] 2: Decompose the signal using wavelet w (w = db1-11, coif1-5, etc.) at level L (L = 1,2,3. . . The final normalized variability is a function of time, and is also dependent on wavelet choice w and level of decomposition L. 7: Repeat the above steps for all wavelets w and levels L of interest. day. For example, consider the load profile for one measured building, at a 15-minute time resolution, during the week of February 5, 2018, shown in Fig. 11. Fig. 12 shows the original load profile, the approximation, and the total details signal as a result of using a Daubechies-4 wavelet function with level 3 decomposition.
The following examples illustrate single-day analysis with the application of different wavelet functions for clarity and ease of interpretation. Fig. 13 shows a measured load profile from another measured building on February 8, 2018. The highlighted areas are the variability features to be extracted  as they represent realistic consumption behavior. Fig. 14,  Fig. 15, and Fig. 16 show extracted variabilities using various wavelet types and levels of decomposition from this load profile.

2) EXAMPLES OF VARIABILITY FEATURES EXTRACTED FROM THE FIVE RESIDENTIAL BUILDINGS
Similar approaches are applied to extract load variability from the five measured residential buildings. Fig. 17 shows the load profile for a residential building on November 17, 2012. Fig. 18, Fig. 19, and Fig. 20 show extracted variabilities using various wavelet types and levels of decomposition.

D. TAXONOMY OF LOAD VARIABILITIES AND VARIABILITY INDEX
A taxonomy of 37,230 daily load variability profiles are extracted from each of the five commercial buildings, yielding a total of 186,150 daily load variability profiles with 510 unique variability profiles for each day. The selection of a certain load variability profile to apply to a modeled   building load profile depends on the application and objectives. Possible approaches to consider include day-by-day selection (where machine learning techniques via clustering     modeled building load profile, in this case one of the 102 profiles are then selected for each day). This year-by-year approach is used in Section 4.
Furthermore, investigating the characteristics of the load variability signal itself reveals several findings to help in the selection process. All 34 wavelets functions with different levels are applied on a given one-day load profile to extract 102 load variability signals, and the standard deviations of each signal are shown in Fig. 21. The standard deviation of a variability signal is effectively a simple measurement of how much variability the signal contains. It is clear from the figure that for a given wavelet function, higher level decompositions yield a greater amount of variability.
Additionally, at a fixed level of decomposition, the choice of wavelet function can also greatly affect the amount of variability extracted. Fig. 22 shows the standard deviations of each wavelet function at level 1, sorted by increasing variability. Which wavelets yield more variability is different for each level of decomposition. These figures can aid in selecting one or more of the 510 or 102 profiles, depending on whether a day-by-day or year-by-year approach is used and on the level of desirable variability.
Determining how much of a signal should be considered variability, and how much is the base load, is an important decision that will differ with every profile and application. A multiwavelet, multilevel analysis such as this, however, can provide effective bounds for the variability signal as well as an effective index of variability from which to choose a wavelet function and level that provides a given magnitude of realistic variability. Of note from the previous figures, and highly consistent when examining any time period of the measured data available, is that the Daubechies-db1 level 3 decomposition yields the highest amount of variability. As this is one of the coarsest wavelet approximations, this can be considered a gross overestimation of the variability, thus providing a safe upper bound of variability for any application of this data.

IV. M SYNTHETIC BUILDING LOAD PROFILE GENERATION
EnergyPlus is used to generate modeled building load profiles for a primary/secondary building type at 15-minute time resolution. Synthetic building load profiles at the same time resolution are generated by applying a day-by-day or year-byyear approach, as discussed in the previous section. For the discussion in this section, year-by-year is selected to match a modeled building load profile type to one of the five measured building load profiles. Fig. 23 shows a graphical description of the general process. As discussed previously, with the application of the 34 wavelet decompositions at three levels to one measured school building, a total of 102 synthetic, unique, and realistic building load variability profiles are generated, then pseudo-normalized. The selection of any of these 102 synthetic load profiles depends on the application  purpose and level of variability desired. Fig. 24 shows the load profile for the modeled building on September 13, 2018, along with the load profile for the five measured buildings on the same day.
The general process to generate a synthetic load profile for one modeled school building starts by finding the maximum daily load for each day, then choosing the pseudo-normalized variability profile to be added (D1, D2, or D3) to the modeled data. After this selection, for each day, multiply the normalized daily variability profile by the maximum daily load of the modeled data and add this scaled variability profile to the modeled profile to obtain the final synthetic building load profile. The process is summarized in Algorithm 2. Fig. 25 shows the modeled load profile building with a sample of five different variability profiles for the application of the above process to select one profile and add it to generate the synthetic profile.  Although these figures show the process applied to wholebuilding data, it could also be applied to an end-use subset (e.g., energy used by heating, cooling, or lighting) of the whole-building data, with the resulting synthetic end-use profiles then aggregated to a synthetic whole-building load profile.

A. EXAMPLES OF MODELED LOAD PROFILES WITH ADDED VARIABILITY
To illustrate the application of load variabilities to a modeled building load profile, Fig. 26 shows the modeled building load profile of another primary school on February 12, 2018. This figure also shows a measured building primary school profile on the same day, scaled to match peak loads. As discussed in Section 3.3, different amounts of variability can be extracted from the measured profile, depending on the choice of wavelet and decomposition level. Fig. 27 shows the synthetic profile generated when a relatively low amount  of variability is extracted, using the Daubechies db4 wavelet at level 1. Fig. 28 shows the synthetic profile with a larger amount of variability, using the Coiflet-1 wavelet at level 3.
When year-by-year match is used, in some cases, the ''base'' load profile of measured data may not match the modeled profile well, presenting interesting findings and additional steps and challenges to consider. Fig. 29 shows a different modeled profile for a primary school on May 7, 2012, as well as measured data from a different primary school in 2018. Here the on and off times of the modeled and measured buildings do not match well, resulting in underand overestimation of the variability at various times of the day. Fig. 30 shows the synthesized profile using a low estimation of variability, the Fejér-Korovkin-8 wavelet at level 1. This synthetic profile contains more variability than the case in Fig. 27, because the measured data used exhibits more variability. Finally, Fig. 31 shows the synthetic profile with the highest amount of variability considered in this analysis,  using the Daubechies-1 wavelet at level 3. This is effectively adding a ''worst-case'' estimate of variability to the modeled data. Fig. 32 shows the distribution of yearly standard deviations for the 500 variability profiles applied to the commercial data sets at 3 levels, as well as a ''worst-case'' scenario where db1 level 3 is applied to each building. Fig. 33 shows this same data for the residential set.

B. APPLICATION OF LOAD VARIABILITIES TO THE 750 MODELED BUILDING DATA SET
A total of 750 modeled buildings (500 commercial and 250 residential buildings) are generated using EnergyPlus to simulate a segment of U.S. building stock. The buildings are composed of the DOE's reference building types, but with modified hours of operation and physical building characteristics. A process is developed to apply the taxonomy of load variability profiles extracted from the 10 measured building data sets to these 750 buildings to demonstrate the application VOLUME 8, 2020   of load variability and generation of synthetic load profiles for a large data set. The year-by-year approach is applied and repeated for the for the 750 modeled buildings, using all 34 wavelet functions at 3 different levels.

V. SUMMARY AND CONCLUSION
This article presents a framework to extract load variability profiles from measured building load profiles. Feature extraction is conducted using DWT functions. The extracted load variabilities are then applied to modeled building load profiles, generated using EnergyPlus, to generate synthetic building load profiles. These synthesized load profiles can then be used in distribution system analysis or any other application that requires a large set of diverse load profiles. This research focused on extracting load variability from five measured commercial buildings and five residential buildings at a 15-minute time resolution. The extracted load variabilities are then applied to 750 modeled load profile building models to demonstrate the application of the framework for a large-scale data set.

VI. FUTURE WORK
Although the variability in the synthetic load profiles developed using the framework and methodology described appear realistic to human eyes, this realism has not yet been quantified. Quantification would require a larger set of real data and would need to include buildings other than schools, which the authors currently do not have access to. As part of this quantification, it would be possible to evaluate and compare the different DWT functions and decomposition levels to determine which combinations create the most realistic synthetic profiles. Additionally, because the realism of the synthetic load profiles depends on both the measured variability and the accuracy of the modeled building load profile, such an evaluation would either need to investigate the pieces independently or determine a way to attribute results to either DWT selection or the building energy model. This work is part of a large project that is taking place at NREL to generate end-use load profiles [11]. As larger measured data sets become available, the proposed framework will be refined and applied to additional modeled data sets generated as part of ResStock and ComStock. The generated synthetic load profiles will be available for free online in a to-be-determined location at the conclusion of the project in 2021. DONGMING PENG received the Ph.D. degree in computer engineering from Texas A&M University. He has been an Associate Professor with the Electrical and Computer Engineering Department, University of Nebraska-Lincoln. He has published more than 100 technical articles in the research areas of digital image processing, multimedia security, wireless sensor networks, cybersecurity, computer architecture, and related topics. He has conducted various research projects in these areas with a special contribution in energy-efficient wireless and secure communications in multimedia sensor networks. He is currently an Associate Professor with the Durham School of Architectural Engineering and Construction, University of Nebraska-Lincoln. His research interests include electrified transportation, load profile analysis and real-time power monitoring in the built environment, battery power management, and renewable energy alternatives. His previous publications include several papers in the area of electrified transportation, power management, and microbattery testing and system design implementation. His industry experience includes electrical distribution system infrastructure planning and design for new and renovated facilities. He holds a Professional Engineering licensure since 1996. He is a peer reviewer for several journals and conferences, including the IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS.