Sensitivity Analyses of CVR Measurement and Verification Methodologies to Data Availability and Quality

Electric utilities deploy Conservation Voltage Reduction (CVR) and Volt-VAR Optimization (VVO) programs to reduce energy consumption and peak demand by lowering the voltage on the distribution system. These programs offer a cost-effective way to improve system-wide energy efficiency and to provide benefits to customers. This paper presents a comprehensive modeling, simulation, and comparison study to identify the sensitivity of various CVR Measurement and Verification (M&V) methodologies to data anomaly issues. A major challenge in evaluating the results of CVR M&V methodologies is the lack of benchmark load consumption measurements when CVR is active. Therefore, a benchmark test system is created in this paper to allow access to pre-CVR measurements and enable analyses of the impact of various data anomaly issues. This benchmark is built from real utility data (considered as pre-CVR data) through detailed ZIP load modeling and post-CVR data generation. The studies show that a time-varying ZIP load model, accompanied by a constrained and bounded Sequential Least-Squares Quadratic Programming (SLSQP) method for parameter identification, is suitable for precise load modeling. In this paper, SCADA data is used because it yields higher load modeling accuracy than the corresponding AMI data. The sensitivity of multiple commonly used CVR M&V methodologies, including regression-based, comparison-based, and constant CVR factor, to data anomaly issues is then examined using this benchmark system. The simulation results indicate that, regardless of the methodology used, data anomaly issues cause the results to diverge from their original values, although with varying degrees of sensitivity.


I. INTRODUCTION
The foundation of Conservation Voltage Reduction (CVR) is that the permissible voltage band for distribution consumers can be lowered, based on the ANSI standard [1], [2] or applicable state-level voltage bands, without adversely affecting customer appliances and utility assets. Under CVR, numerous customer devices draw less power at lower voltages, resulting in energy consumption reduction and savings [3]-[7]. For example, the US Department of Energy (DOE) reports savings from 1% to 4% based on the prior implementation of CVR and Volt-VAR Optimization (VVO) programs [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Feng Wu.
To report energy savings to the public utility commissions or examine the cost-benefit ratio to decide on further CVR deployment, the energy savings and benefits of CVR need to be quantified through Measurement and Verification (M&V) [9]-[12]. The M&V of CVR effects has always been a technical challenge in its application, considering the lack of benchmark load consumption measurements when CVR is active [13]. In addition, distinguishing the changes in load and energy consumption due to voltage reduction from other impact factors (e.g., weather) is challenging but is required for quantifying CVR effects [14]. CVR effects can be evaluated by a CVR factor, which indicates the relationship between energy savings and changes in voltage from CVR operations. The CVR factor is defined as the ratio between the percentage change in energy and the associated percentage change in voltage. Therefore, a substantial amount of load and voltage data over an extended period and for each CVR-enabled circuit must be collected to estimate the CVR factor.
Utilities face several challenges in applying CVR M&V. One of the main challenges is the discrepancy in data management. This discrepancy may cause significant divergence in the estimated CVR impacts and alter CVR calculation results. Most notably, inadequate and anomalous data can jeopardize the analysis regardless of the methodology used to derive the savings or CVR factor. A lack of defined guidelines on selecting the CVR M&V methodology is another major challenge in CVR deployment; however, standardization efforts are under way at IEEE.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Based on our previous benchmarking studies in [15], utilities primarily leverage three CVR M&V methodologies, as discussed in the following:
• Comparison-based methods: The comparison-based methods leverage operational data under CVR (treatment) and non-CVR (control) conditions and determine the CVR factor by comparing these two cases [16]. There are two general categories of comparison-based methods: correlated-feeder and correlated-weather. The comparison-based methods are straightforward and easy to implement [17], [18].
• Regression-based methods: Regression-based methods model load and nodal voltage as a function of various factors, including temperature and CVR impact [19]. The CVR factor is calculated by fitting this function to data associated with CVR-on and CVR-off conditions. Linear and nonlinear regressions are the approaches commonly used to estimate the load model in regression-based methods. Because physical interpretations are potentially embedded in the regression models, electric utilities can understand the model behavior based on impact factors [14], [20].
• Simulation-based methods: Simulation-based methods simulate the load consumption in the CVR-off condition and use this model within power flow calculations to determine the difference from the measured load consumption and calculate the CVR factor accordingly. Simulation-based methods show high precision if the load models are highly accurate, while allowing the system to run continually [21]-[23].
In addition to the methodologies mentioned above, there have been cases where utilities have calculated the CVR factor for a selected number of circuits and used the result, commonly an averaged CVR factor, for other circuits in their service territory.
Because there is no benchmark load consumption measurement during the CVR-on period, assessing and verifying the CVR factor and energy savings are challenging tasks. This paper aims to conduct a comprehensive study, modeling, simulation, and comparison to identify the sensitivity of various CVR M&V methodologies to data quality and availability. The paper has two main parts: first, a benchmark load consumption measurement is created, and then the sensitivity of various methods in finding CVR effects under various data quality and availability scenarios is investigated. The rest of the paper is organized as follows. Section II elaborates the model outline of the proposed benchmarking studies. Section III explains load modeling, and Section IV presents CVR M&V methodologies. Numerical simulations are presented in Section V. Section VI concludes the paper.
Figure 1 illustrates the outline of benchmark development and comparative analysis to identify the sensitivity of various CVR M&V methodologies to data quality and availability issues. A major challenge in evaluating the results of CVR M&V methodologies is the lack of benchmark load consumption measurements when CVR is active. In other words, when CVR is on, the pre-CVR data for that specific time is lost and there is no simple way to accurately recover that missing baseline data. Various CVR M&V methodologies attempt to estimate this pre-CVR data; however, there is no concrete way to evaluate which methodology has estimated it more accurately. To resolve this issue, and to ensure that we have a complete picture of the pre-CVR data before analyzing various methodologies, we create a benchmark test system that includes both pre-CVR data (from real utility feeders) and post-CVR data (generated using ZIP load modeling).

II. MODEL OUTLINE
Load modeling is the first step to create the benchmark load consumption measurement during the CVR-on period. The goal of load modeling is to create post-CVR data based on available pre-CVR data. A time-varying ZIP load model is proposed based on various time resolution scenarios. In terms of available data to create the benchmark, three datasets, i.e., single-customer AMI dataset, aggregated AMI dataset, and SCADA dataset, are inspected to determine the best dataset for further studies. The benchmarking is done by creating post-CVR data. The post-CVR data is used in (1) comparing the pre- and post-CVR data to find the benchmark CVR factor and energy savings; and (2) application in various CVR M&V methodologies to calculate the CVR factor and energy savings from each methodology. Together, these two applications enable a comparative analysis of the performance of various CVR M&V methodologies. Comprehensive information on the overall comparative analysis approach and the use of ZIP load models in creating post-CVR data is provided in the following sections. The studies are conducted in the following order, based on the outline in Figure 1:
• In Step 1, one feeder is selected. The power and voltage data associated with the feeder are cleaned and reconstructed to minimize errors in the subsequent steps. The feeder's dataset is used as ''pre-CVR data'' since no voltage change is yet applied.
• In Step 2, associated ZIP load models are generated.
• Step 3 uses pre-CVR data and the generated ZIP load models to create the post-CVR data. It is done by applying a voltage adjustment for selected CVR hours to pre-CVR data.
• Step 4 records the post-CVR data generated in Step 3.
• Step 5 compares the pre-CVR and post-CVR power data and finds the benchmark values of the CVR factor and energy savings. These values are utilized as a baseline to assess the sensitivity of various CVR M&V methodologies.
• In Step 6, various methodologies, including regression-based, comparison-based, and constant CVR factor, are applied to the post-CVR voltage and power data of Step 4 to estimate the CVR factor and energy savings.
The sensitivity of each methodology is evaluated by comparing the results of Step 5 (benchmark) and Step 6 (estimated). As both baseline and CVR-on results are available, the sensitivity of each method to data anomalies can be observed. This paper compares the results based on multiple anomalies, including: (i) sensitivity to data availability and completeness, i.e., missing data, (ii) sensitivity to bad data and outliers, and (iii) sensitivity to load shifts.

III. LOAD MODELING
Load modeling is an essential task in power system analysis, planning, and control. One of the load modeling applications is energy savings assessment from CVR and VVO programs [24], [25]. Load modeling is used in this paper to create a benchmark system for CVR M&V studies.
There are two main steps in load modeling. The first step is to determine and select the mathematical load model structure that properly represents the load characteristics. Once the load model structure is selected, the next step is to estimate the load model parameters. The load model structure explains the mathematical relationship between the power and voltage of a load (single or aggregate). Load models are categorized into static and dynamic models, and various load model structures can be derived from these models [26]. There are various load models in the literature [27]- [30]; however, this paper selects the time-varying ZIP load model, based on a comprehensive initial study by the authors, as the most relevant to create the benchmark system. A constrained and bounded Sequential Least-Squares Quadratic Programming (SLSQP) method is proposed in this paper to estimate model parameters.
In a practical network, load composition changes over seasons, days of the week, and hours of the day. In other words, various factors such as weather, customers' behavior, and switching on/off individual loads can impact the load. As a result, a fixed static load model may not be sufficient to model the dynamic nature of the load. Instead, time-varying load models can be used to capture the time-variant load behaviors. The time-varying ZIP load model is one of the most discussed load models in the literature and can be represented as in (1) and (2) [31].
$$P(t) = P_0(t)\left[ Z_p(t)\left(\frac{V(t)}{V_0}\right)^{2} + I_p(t)\,\frac{V(t)}{V_0} + P_p(t) \right] \tag{1}$$
$$Q(t) = Q_0(t)\left[ Z_q(t)\left(\frac{V(t)}{V_0}\right)^{2} + I_q(t)\,\frac{V(t)}{V_0} + P_q(t) \right] \tag{2}$$
where t is the time index based on the selected resolution; V(t) is the voltage at time t; V_0 is the nominal voltage; and P_0(t) and Q_0(t) are the real and reactive power at V_0, respectively. Z_p(t), I_p(t), and P_p(t) are the real power ZIP coefficients, and Z_q(t), I_q(t), and P_q(t) are the reactive power ZIP coefficients. ZIP coefficients show the proportion of the three parts of the composite load model, i.e., constant impedance, constant current, and constant power; thus, the summation of the ZIP coefficients should be one as in (3):
$$Z_p(t) + I_p(t) + P_p(t) = 1, \qquad Z_q(t) + I_q(t) + P_q(t) = 1 \tag{3}$$
As the CVR factor calculations are commonly done on real power, further discussion in this paper will focus on real power. However, similar studies can be done for reactive power. The voltage and real power datasets include erroneous data points that need to be removed before employing the CVR M&V methodologies, as explained below.

A. DATA PREPARATION
It is assumed that voltage and real power datasets are recorded at a proper time interval and correctly time-stamped to be usable for ZIP load modeling. We follow the cleaning process described below:
• Removing non-numeric, missing, and interpolated data points: The available datasets for voltage and real power are contaminated with non-numeric, missing, and interpolated data points. These data points are identified and removed from the datasets.
• Reconstructing removed data points: The removed data points mentioned above are reconstructed from the voltage and real power datasets. To this end, each removed data point is replaced with the nearest available prior or subsequent time-stamped data point.
• Removing outlier values in voltage dataset: Values less than 0.95 p.u. or above 1.05 p.u. of nominal voltage are assumed to be voltage outliers. These voltage outliers and the corresponding values in real power are identified and removed from the datasets.
• Removing outlier values in real power dataset: Values more than 5 standard deviations above or below the real power mean value are assumed to be real power outliers. These real power outliers and the corresponding values in voltage are identified and removed from the datasets.
It should be highlighted that erroneous data reduces the number of data points available to the M&V methodologies, which can adversely impact the accuracy of the CVR calculations.
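The cleaning steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `clean`, the tie-breaking rule (prefer the prior sample when two neighbors are equally close), and the use of the population standard deviation are assumptions, and detection of interpolated points is omitted because the text does not say how they are flagged.

```python
import math
import statistics


def clean(voltage_pu, power, v_min=0.95, v_max=1.05, n_sigma=5.0):
    """Return cleaned (voltage, power) lists following the four steps above."""

    def valid(x):
        # Non-numeric, missing (None), NaN, and infinite entries are rejected.
        return (isinstance(x, (int, float)) and not isinstance(x, bool)
                and math.isfinite(x))

    def fill_nearest(series):
        good = [k for k, x in enumerate(series) if valid(x)]
        if not good:
            raise ValueError("series has no valid samples")
        # Assumption: on a tie, the prior (smaller-index) sample is used.
        return [series[min(good, key=lambda g: (abs(g - k), g))]
                for k in range(len(series))]

    # Steps 1-2: remove invalid points and rebuild them from nearest neighbors.
    v = fill_nearest(voltage_pu)
    p = fill_nearest(power)

    # Step 3: drop samples whose voltage lies outside the 0.95-1.05 p.u. band.
    pairs = [(a, b) for a, b in zip(v, p) if v_min <= a <= v_max]

    # Step 4: drop samples whose power is beyond n_sigma standard deviations.
    powers = [b for _, b in pairs]
    mean = statistics.fmean(powers)
    std = statistics.pstdev(powers)
    pairs = [(a, b) for a, b in pairs if abs(b - mean) <= n_sigma * std]

    return [a for a, _ in pairs], [b for _, b in pairs]
```

Because voltage and power are filtered as pairs, an outlier in either series removes the matching sample from both, as the bullets require.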

B. SCENARIO CREATION
The cleaned datasets prepared in the above step are utilized to create various scenarios for use in the time-varying ZIP load model. The scenarios are listed below. Note that the studies are done for one year. This scenario will find an hourly function for each hour of each day of a week in each month. Therefore, there will be seven functions for each hour in each month, generating a total of 7 × 24 × 12 = 2016 functions for the year. Table 1 summarizes the scenarios described above. According to each scenario, voltage and real power datasets are created to be used in the load modeling approach.

C. CONSTRAINED BOUNDED REGRESSION-BASED ZIP LOAD MODEL
After selecting the load model structure, data cleaning, and scenario creation process, the next step is to estimate the load model parameters. Component-based and measurement-based load models are the two most common approaches in load model parameter estimation.
One of the most popular methods employed for ZIP load model parameter estimation is the regression-based least-squares method, which falls into the measurement-based load modeling category. A regression-based method is therefore used in this paper to estimate the coefficients of the time-varying ZIP load model shown in (1). In this model, P_0(t) is treated as a variable and is estimated alongside the ZIP coefficients. As a result, four parameters need to be estimated for real power, and the parameter vector θ_t is defined as a 4-dimensional vector as in (4):
$$\theta_t = \left[ Z_p(t),\ I_p(t),\ P_p(t),\ P_0(t) \right]^{T} \tag{4}$$
In addition to the equality constraint associated with the ZIP load model in (3), parameters in (4) need to be limited by their proper ranges. To satisfy both the equality constraint and bounds over the parameters, the constrained and bounded SLSQP is employed. SLSQP uses the Han-Powell quasi-Newton method with a Broyden-Fletcher-Goldfarb-Shanno (BFGS) update of the B-matrix and an L1-test function in the step-length method. A comprehensive discussion on SLSQP can be found in [32].
Sequential quadratic programming supports both equality and inequality constraints [33], where the equality constraint is represented in (3), and the inequality ones (bounds) are defined in (5) and (6):
$$P^{lb}_{Z,I} \le Z_p(t),\ I_p(t) \le P^{ub}_{Z,I}, \qquad P^{lb}_{p} \le P_p(t) \le P^{ub}_{p} \tag{5}$$
where P^{lb}_{Z,I} and P^{ub}_{Z,I} are the lower and upper bounds for Z_p(t) and I_p(t), while P^{lb}_{p} and P^{ub}_{p} are the lower and upper bounds for P_p(t), respectively. Moreover,
$$P_0(t) \ge P^{lb}_{0} \tag{6}$$
where P^{lb}_{0} is a positive lower bound for P_0(t).
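A constrained and bounded SLSQP fit of this kind can be sketched with SciPy's `scipy.optimize.minimize(method="SLSQP")`. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `fit_zip`, the starting point, the default P_0 lower bound, and the normalization of power by its mean (done only to improve numerical scaling) are all assumptions. The coefficient bounds are the values quoted in the simulation section.

```python
import numpy as np
from scipy.optimize import minimize


def fit_zip(v, p, v0=1.0, zi_bounds=(-8.48, 6.85), pp_bounds=(-2.69, 4.45),
            p0_lb=1e-3):
    """Least-squares fit of (Z_p, I_p, P_p, P_0) for one time bucket."""
    v = np.asarray(v, dtype=float)
    p = np.asarray(p, dtype=float)
    ratio = v / v0
    scale = p.mean()          # assumption: work in per-unit of mean power
    p_norm = p / scale

    def sse(theta):
        z, i, c, p0 = theta
        model = p0 * (z * ratio**2 + i * ratio + c)
        return float(np.sum((model - p_norm) ** 2))

    # Equality constraint (3): the three ZIP coefficients sum to one.
    sum_to_one = {"type": "eq", "fun": lambda th: th[0] + th[1] + th[2] - 1.0}

    result = minimize(
        sse,
        x0=np.array([1 / 3, 1 / 3, 1 / 3, 1.0]),   # hypothetical starting point
        method="SLSQP",
        bounds=[zi_bounds, zi_bounds, pp_bounds, (p0_lb / scale, None)],
        constraints=[sum_to_one],
        options={"ftol": 1e-12, "maxiter": 200},
    )
    z, i, c, p0 = result.x
    return z, i, c, p0 * scale   # P_0 returned in the original power units
```

Over a narrow voltage band the individual coefficients are only weakly identifiable, so it is the fitted power, rather than any single coefficient, that should be checked against the measurements.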
Additionally, for each selected resolution, a set of time-dependent model parameters would be obtained. The post-CVR dataset is formed using the obtained model parameters, a CVR schedule, and a voltage reduction percentage. Once post-CVR data is created from the time-varying ZIP load model, it is used to compare the CVR M&V methodologies explained in the following section.
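A minimal sketch of forming post-CVR data and the benchmark CVR factor is given below. It is not the authors' implementation. In particular, it assumes that post-CVR power is obtained by scaling the measured pre-CVR power by the ratio of the ZIP polynomial at the reduced voltage to its value at the original voltage; the text does not spell out this detail. The function names and argument layout are hypothetical.

```python
def zip_polynomial(v, v0, z, i, p):
    """Bracketed term of the ZIP model in (1) at voltage v."""
    r = v / v0
    return z * r * r + i * r + p


def make_post_cvr(v_pre, p_pre, cvr_on, coeffs, reduction_pct, v0=1.0):
    """Apply a voltage reduction during CVR-on samples and rescale power.

    coeffs holds one (Z_p, I_p, P_p) tuple per sample.
    """
    v_post, p_post = [], []
    for v, pw, on, (z, i, p) in zip(v_pre, p_pre, cvr_on, coeffs):
        v_new = v * (1.0 - reduction_pct / 100.0) if on else v
        # Assumption: measured power is scaled by the ZIP response ratio.
        ratio = zip_polynomial(v_new, v0, z, i, p) / zip_polynomial(v, v0, z, i, p)
        v_post.append(v_new)
        p_post.append(pw * ratio)
    return v_post, p_post


def benchmark_cvr_factor(v_pre, p_pre, v_post, p_post, cvr_on):
    """Percentage energy change divided by percentage voltage change."""
    idx = [k for k, on in enumerate(cvr_on) if on]
    e_pre = sum(p_pre[k] for k in idx)
    e_post = sum(p_post[k] for k in idx)
    v_mean_pre = sum(v_pre[k] for k in idx) / len(idx)
    v_mean_post = sum(v_post[k] for k in idx) / len(idx)
    dp_pct = 100.0 * (e_pre - e_post) / e_pre
    dv_pct = 100.0 * (v_mean_pre - v_mean_post) / v_mean_pre
    return dp_pct / dv_pct
```

As a sanity check on this sketch, a pure constant-power load gives a CVR factor of zero, while a pure constant-impedance load under a 3% reduction gives a value close to 2.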

IV. CVR M&V METHODOLOGIES
In this paper, the comparison-based, regression-based, and constant CVR factor methodologies are considered. Different utilities have widely used these methodologies in pilot and program level studies [11], [15]. These methodologies are explained below. It should be noted that this paper does not assess which methodology may be superior to others. Instead, the purpose of the study is to show how various data anomaly issues can impact the CVR M&V analysis.

A. COMPARISON-BASED METHODOLOGY
The comparison-based M&V methodologies include correlated-weather and correlated-feeder approaches. The correlated-weather approach conducts CVR-on (treatment group) and CVR-off (control group) testing on a single feeder to collect the power and voltage measurements for comparison. To determine the CVR effects resulting from CVR operation (i.e., different voltage levels), the treatment and control groups should share similar characteristics such as temperatures, time of the day, and day of the week. The correlated-feeder approach conducts CVR-on testing on one feeder (treatment group), and at the same time, compares its operation with another feeder (control group) where CVR is off. The feeders in the treatment and control groups should be geographically close to each other to experience similar temperatures. In addition, these feeders should have the same characteristics such as customer (RCI) mix, load behaviors, circuit miles, and feeder topologies.
Both comparison-based approaches are straightforward to implement. Ideally, the correlated-weather approach requires the treatment and control groups to have the same temperature data. However, temperature differences always exist between testing periods, and the control group required by the correlated-feeder method may not always exist. In addition, both approaches require that no load shifting occurs on the feeders during the test periods.
There are different ways to carry out time-period matching in the correlated-weather approach, with the rule-based approach being the most common. The rule-based correlated-weather approach is framed around available CVR-on data, specifically around the CVR-on temperature mean. To avoid potential skewing of CVR-on/off data points, all data points (during CVR-on and CVR-off conditions) outside ±1 standard deviation of the CVR-on temperature mean are eliminated. The voltage and power data are then segmented by hour to calculate the hourly mean voltage reduction percentage (ΔV%) and power reduction percentage (ΔP%). The hourly mean voltage reduction percentage is calculated as the difference between the CVR-off and CVR-on voltages, divided by the CVR-off voltage, and multiplied by 100. The hourly mean power reduction percentage is calculated in a similar way. Using these hourly voltage and power reduction percentages, hourly CVR factors are calculated as in (7):
$$CVR_f = \frac{\Delta P\%}{\Delta V\%} \tag{7}$$
The above steps are repeated for multiple iterations to ensure that an equal sample size of CVR-on and CVR-off hourly datasets is achieved, since during the hourly data segmentation one dataset may contain more data points than the other. The sample sizes are balanced by random selection so that all data points can be part of the sample. The final CVR factor and voltage reduction are the means over these iterations. This method is denoted as the comparison (rule-based) method.
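The rule-based procedure can be sketched as below. This is a hedged illustration: the record layout `(hour, temperature, voltage, power, cvr_on)`, the function name, the number of iterations, the fixed seed, and the use of the population standard deviation are assumptions introduced for the example.

```python
import random
import statistics
from collections import defaultdict


def rule_based_cvr_factors(records, iterations=20, seed=0):
    """Hourly CVR factors from (hour, temperature, voltage, power, cvr_on) rows."""
    records = list(records)

    # Keep only points within +/-1 standard deviation of the CVR-on temperature mean.
    on_temps = [t for _, t, _, _, on in records if on]
    mu = statistics.fmean(on_temps)
    sigma = statistics.pstdev(on_temps)
    kept = [r for r in records if abs(r[1] - mu) <= sigma]

    # Segment the remaining (voltage, power) pairs by hour and CVR status.
    groups = defaultdict(lambda: {True: [], False: []})
    for hour, _, v, p, on in kept:
        groups[hour][bool(on)].append((v, p))

    rng = random.Random(seed)
    factors = {}
    for hour, g in sorted(groups.items()):
        n = min(len(g[True]), len(g[False]))
        if n == 0:
            continue  # no CVR-on or no CVR-off data for this hour
        runs = []
        for _ in range(iterations):
            # Randomly balance the two groups to the same sample size.
            on_s = rng.sample(g[True], n)
            off_s = rng.sample(g[False], n)
            v_on = statistics.fmean(v for v, _ in on_s)
            p_on = statistics.fmean(p for _, p in on_s)
            v_off = statistics.fmean(v for v, _ in off_s)
            p_off = statistics.fmean(p for _, p in off_s)
            dv_pct = 100.0 * (v_off - v_on) / v_off
            dp_pct = 100.0 * (p_off - p_on) / p_off
            if dv_pct != 0:
                runs.append(dp_pct / dv_pct)   # equation (7)
        if runs:
            factors[hour] = statistics.fmean(runs)
    return factors
```

Repeating the random draw and averaging lets every retained point contribute even when one group is larger than the other.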
The optimization-based correlated-weather approach is a specific type of the comparison-based method that is under investigation by the authors and is further considered in this paper. This approach follows the same procedure as explained above but uses an optimization-based time-period matching process. In other words, to match every CVR-on time-period to a CVR-off time-period, in order to identify the pre-CVR voltage and power values and estimate hourly CVR factor and energy savings, an optimization model is run. This model minimizes the weighted temperature, season, day type, and time of the day difference between CVR-on and CVR-off time-periods and accordingly determines the best match. In this paper, this methodology is referred to as comparison opt-based.
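The matching idea can be illustrated with a small brute-force search. The optimization model itself is not given in the text, so everything here is an assumption: the weights, the mismatch function, the tuple layout `(temperature, season, day_type, hour)`, and the one-to-one matching requirement are hypothetical. Exhaustive enumeration is used only because it is exact and dependency-free for a toy example; a realistic dataset would need a proper assignment solver.

```python
from itertools import permutations

# Hypothetical weights for each mismatch term.
WEIGHTS = {"temp": 1.0, "season": 5.0, "day_type": 5.0, "hour": 0.5}


def mismatch(a, b, w=WEIGHTS):
    """Weighted difference between two (temperature, season, day_type, hour) periods."""
    return (w["temp"] * abs(a[0] - b[0])
            + w["season"] * (a[1] != b[1])
            + w["day_type"] * (a[2] != b[2])
            + w["hour"] * abs(a[3] - b[3]))


def match_periods(on_periods, off_periods):
    """Return, for each CVR-on period, the index of its matched CVR-off period.

    Each CVR-off period is used at most once, and the total weighted
    mismatch is minimized by exhaustive search (small inputs only).
    """
    if len(on_periods) > len(off_periods):
        raise ValueError("need at least as many CVR-off as CVR-on periods")
    best = min(
        permutations(range(len(off_periods)), len(on_periods)),
        key=lambda perm: sum(mismatch(on_periods[k], off_periods[j])
                             for k, j in enumerate(perm)),
    )
    return list(best)
```

The matched CVR-off voltage and power then stand in for the pre-CVR values of each CVR-on period when computing the hourly CVR factor.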

B. REGRESSION-BASED METHODOLOGY
Regression-based methods model the load and/or voltage as a function of different predictors or explanatory variables using multivariable linear regression. These predictors include, but are not limited to, temperature, season, type of day, hour of day, and the CVR status. Based on the captured power and/or voltage measurements and predictors, the coefficients of the corresponding power and voltage functions can be determined. Next, the counterfactual power and voltage can be estimated based on these coefficients and the opposite value of the explanatory variable of interest (i.e., CVR status). Then, the difference in energy consumption and voltage level reveals the CVR effects.
The multivariable linear regression has an advantage considering the physical meanings are embedded in the regression model itself, making the model and analysis results easier to interpret and understand. However, the regression model can have estimation errors due to inaccurate CVR effects estimation. In addition, the nonlinear effect of load consumption may not be captured precisely in linear regression.
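A multivariable linear regression of this kind can be sketched with NumPy's least-squares solver. The sketch is deliberately reduced and is not the paper's model: it uses only an intercept, the CVR status, a temperature deviation, and a weekend indicator, omitting hour and season terms. The sign convention (reduction equals minus the CVR-status coefficient divided by the mean CVR-off value) is an assumption.

```python
import numpy as np


def fit_cvr_regression(vo, temp_dev, weekend, power, voltage):
    """Fit power and voltage on CVR status plus two illustrative predictors.

    Returns (alpha_1, beta_1, cvr_factor), where alpha_1 and beta_1 are the
    CVR-status coefficients of the power and voltage regressions.
    """
    vo = np.asarray(vo, dtype=float)
    power = np.asarray(power, dtype=float)
    voltage = np.asarray(voltage, dtype=float)

    # Design matrix: intercept, CVR status, temperature deviation, weekend flag.
    X = np.column_stack([
        np.ones_like(vo),
        vo,
        np.asarray(temp_dev, dtype=float),
        np.asarray(weekend, dtype=float),
    ])
    alpha, *_ = np.linalg.lstsq(X, power, rcond=None)
    beta, *_ = np.linalg.lstsq(X, voltage, rcond=None)

    off = vo == 0
    # Assumed convention: a negative coefficient means a reduction under CVR.
    dp_pct = -100.0 * alpha[1] / power[off].mean()
    dv_pct = -100.0 * beta[1] / voltage[off].mean()
    return alpha[1], beta[1], dp_pct / dv_pct
```

Setting the CVR-status column to zero for every sample and predicting with the fitted coefficients gives the counterfactual (CVR-off) power and voltage.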
The two functions in (8) and (9) are defined to represent power and voltage, where VO represents the CVR status; H, D, and S denote the associated indices of time of day, type of day (i.e., weekday/weekend), and season of the year (i.e., spring, summer, fall, and winter); T corresponds to the absolute value of degrees above/below a reference temperature; MW_it and V_it designate the feeder-specific power and voltage data; and i and t represent the feeder and time indices, respectively. The CVR factor and energy savings are calculated by first utilizing (8) and (9) to estimate the coefficients. Using α_1 and β_1 along with the mean CVR-off energy (P_CVRoff) and voltage data (V_CVRoff), voltage reduction (ΔV%) and power reduction (ΔP%) are calculated, respectively, using (10) and (11), which are accordingly used to estimate the CVR factor as in (12):
$$CVR_f = \frac{\Delta P\%}{\Delta V\%} \tag{12}$$

C. CONSTANT CVR FACTOR METHODOLOGY
Several utilities consider a constant CVR factor as their system-wide or feeder-wide CVR factor. Based on this constant CVR factor, they obtain baseline energy and energy savings [15]. A constant or deemed CVR factor can be chosen based on studies conducted on a set of feeders and their simple or load-weighted average. Once the CVR factor is obtained from any of the methodologies, it can be used along with the voltage reduction to estimate energy savings and baseline energy.
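The last step can be made concrete with a short sketch. It assumes that the percentage reductions are expressed relative to the CVR-off (baseline) values, as in the comparison-based definition, so that ΔP% = CVR_f × ΔV% and the baseline energy follows by rearranging. The function name is hypothetical, and the default factor of 0.8 is the value used later in the simulations.

```python
def baseline_and_savings(post_cvr_energy, voltage_reduction_pct, cvr_factor=0.8):
    """Estimate baseline energy and savings from a deemed CVR factor.

    With reductions measured relative to the baseline,
        dP% = cvr_factor * dV%
        E_post = E_base * (1 - dP% / 100)
    so E_base = E_post / (1 - dP% / 100).
    """
    dp_fraction = cvr_factor * voltage_reduction_pct / 100.0
    if dp_fraction >= 1.0:
        raise ValueError("implied power reduction must be below 100%")
    baseline = post_cvr_energy / (1.0 - dp_fraction)
    return baseline, baseline - post_cvr_energy
```

For instance, a 3% voltage reduction with a factor of 0.8 implies a 2.4% energy reduction relative to the baseline.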

V. SIMULATION RESULTS
One feeder from a major electric utility in the U.S., here called feeder F1 to ensure confidentiality, is selected for this study.
In addition, a single customer from F1 is selected, here called meter M1. Aggregated AMI data for all customers in F1, here called S1, is also calculated for the studies in this section. No DER is connected to this feeder. Using the SLSQP method, the SCADA data measured for feeder F1, the AMI data collected from M1, and the aggregated AMI data S1 are used to create ZIP load models based on the nine predefined scenarios. The predefined lower and upper bounds for the Z and I coefficients are −8.48 and +6.85, respectively, while for the P coefficient they are −2.69 and +4.45, respectively. These assumptions are made based on the existing literature [34]. To evaluate the performance of the proposed ZIP load model under these scenarios, the Mean Absolute Percentage Error (MAPE) index is utilized. Table 2 summarizes the average and standard deviation of the MAPE for the three datasets under these scenarios. Figures 2, 3, and 4 summarize the MAPE results pertaining to the single-customer AMI, aggregated AMI, and SCADA data, respectively.
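For reference, the MAPE index in its standard form can be computed as in the sketch below. The function name is hypothetical, and skipping samples whose measured value is zero is an assumption made to avoid division by zero.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Samples with a zero measured value are skipped (assumption) because the
    percentage error is undefined there.
    """
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero measured values")
    return 100.0 * sum(terms) / len(terms)
```

Here `actual` would be the measured real power and `predicted` the power returned by the fitted ZIP model at the measured voltage.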
Following is a summary of the major findings of dataset and scenario selection:
• The simulations are performed using SCADA data, aggregated AMI data, and single-customer AMI data. SCADA data yields a smaller MAPE, i.e., a more accurate load model, than both the aggregated AMI data and the single-customer AMI data.
• The simulations show that scenario 8 has the best performance, evaluated based on the average and overall range of MAPE, amongst all the scenarios studied in this paper. The average MAPE, when using SCADA data, is consistent with the existing values in the literature.
Now that the best scenario and dataset are selected (i.e., Scenario 8 and SCADA data), the sensitivity of three methodologies, i.e., comparison-based, regression-based, and constant CVR factor, is investigated under data anomaly issues. Note that the weather data is available, and the time resolution associated with weather data is matched with post-CVR power and voltage data for the selected feeder. Datasets used in this study have 30-minute resolution, which is common practice in utilities. The constant CVR factor is taken as 0.8 in this study, based on the M&V benchmarking studies performed in [15] and the fact that numerous utilities calculate a CVR factor equal or close to this value.
The proposed testbed is employed to assess the impact of data anomaly issues on various CVR M&V methodologies. To evaluate the impact of data anomaly issues on these methodologies, two metrics, annual CVR factor and annual energy savings, are utilized. The following cases are studied:
Case 0: Post-CVR data creation
Case 1: Simulations based on no data anomalies
Case 2: Simulations based on missing data
Case 3: Simulations based on bad data (outliers)
Case 4: Simulations based on load shifting
Case 0 creates the post-CVR data for further studies in this paper. Post-CVR data is created by using the available pre-CVR data and the CVR schedule, and will further provide the baseline energy, CVR factor, and energy savings for the studied feeder. In Case 1, the created post-CVR data is utilized in each methodology to estimate the annual CVR factor and energy savings. No data anomaly is considered in this case. Cases 2-4 use modified post-CVR data that represents a specific data anomaly (missing data, outliers, and load shift, respectively) to further show the impact of various data anomaly issues on the methodologies. Note that the performance of each methodology under data anomaly cases is studied with respect to its own results in Case 1. In other words, this paper does not compare the methodologies against each other to show the superiority of one over the others; instead, the sensitivity of each methodology to various data quality and availability issues is studied.
Case 0: In this case, post-CVR data is generated based on the available pre-CVR data and the CVR schedule. Table 3 shows the CVR deployment schedules for the feeder. As shown in this table, CVR is active for approximately 50% of the year under study. A 4-day-on/4-day-off cycling is utilized for the feeder.
The baseline energy, post-CVR energy, annual CVR factor, and annual energy savings associated with this feeder are calculated and tabulated in Table 4.
Case 1: In this case, baseline energy, annual CVR factor, and annual energy savings associated with comparison-based, regression-based, and constant CVR factor methodologies are calculated for the selected feeder. Table 5 shows the simulation results for this case. Moreover, Figure 5 shows the percentage difference in CVR factor and energy savings of these three methodologies in Cases 0 and 1. The main purpose of this case is to show how each methodology performs when no data anomaly is considered in the simulations.
Case 2: In this case, a portion of the data is removed to show the sensitivity of each methodology to missing data. A random number generator is used to remove a known percentage of data. The following cases for missing data are investigated:
• Case 2A: Missing 5% of data (randomly)
• Case 2B: Missing 10% of data (randomly)
• Case 2C: Missing 20% of data (randomly)
• Case 2D: Missing 30% of data (randomly)
Table 6 shows the post-CVR energy under various missing data scenarios in Case 2. As expected, by increasing the percentage of missing data from Cases 2A to 2D, the post-CVR energy is reduced.
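Random removal of a known percentage of samples can be sketched as follows. The function name, the fixed seed, and the use of `None` as the missing-value marker are assumptions made for illustration.

```python
import random


def drop_random(series, fraction, seed=0):
    """Return a copy of series with a known fraction of samples set to None."""
    rng = random.Random(seed)            # fixed seed keeps the case reproducible
    n_drop = round(fraction * len(series))
    dropped = set(rng.sample(range(len(series)), n_drop))
    return [None if k in dropped else x for k, x in enumerate(series)]
```

Calling `drop_random(power, 0.05)` through `drop_random(power, 0.30)` would correspond to Cases 2A to 2D. Summing the remaining samples shows why the post-CVR energy falls as the missing fraction grows.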
The simulation results show that, with missing data, the results of each CVR M&V methodology change compared to Case 1. Therefore, regardless of the methodology used, missing data can cause the results to diverge from those in Case 1. Figure 6 summarizes the percentage difference in CVR factor and energy savings of these three methodologies in Case 2. Figure 6 shows that the regression-based and comparison-based (opt-based) methods have less overall sensitivity to missing data, while the comparison-based (rule-based) method is extremely sensitive. However, all methods deviate from their Case 1 results, and this discrepancy occurs regardless of the CVR M&V methodology utilized. The results further indicate that the amount of missing data impacts the CVR factor and energy savings calculations; the methodologies can still produce estimates when data is missing, although with some error. To achieve more accurate results, the gaps created by the missing data may need to be carefully and systematically filled.
Case 3: In this case, bad data (outliers) is added at selected times to investigate the performance of each methodology under data anomaly issues. The data cleaning process does not detect this bad data. The outlier data is generated by increasing the original data by 40% of its value. If the increased value exceeds 110% of the peak demand, it is reduced to prevent detection by the cleaning process. The following cases are investigated:
• Case 3A: Adding 1% of bad data (randomly): 1% of power data is randomly increased by 40% of its original value.
• Case 3B: Adding 2% of bad data (randomly): 2% of power data is randomly increased by 40% of its original value.
Table 11 shows the post-CVR energy under various bad data scenarios in Case 3. Tables 12 and 13 show the simulation results for Cases 3A and 3B, respectively.
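The outlier injection described for Case 3 can be sketched as below. This is an illustrative reading rather than the authors' code: the function name, the seeded generator, and in particular the interpretation of "reduced" as clipping to 110% of the original peak are assumptions.

```python
import random


def inject_outliers(power, fraction, increase=0.40, cap_ratio=1.10, seed=0):
    """Raise a random fraction of samples by `increase` of their value.

    Returns the corrupted series and the sorted indices that were changed.
    Assumption: a value that would exceed cap_ratio * peak is clipped to
    that limit so that a peak-based cleaning rule would not flag it.
    """
    rng = random.Random(seed)
    cap = cap_ratio * max(power)  # 110% of the original peak demand
    n_bad = round(fraction * len(power))
    bad_idx = rng.sample(range(len(power)), n_bad)
    corrupted = list(power)
    for i in bad_idx:
        corrupted[i] = min(power[i] * (1.0 + increase), cap)
    return corrupted, sorted(bad_idx)


# Placeholder ramp profile (not utility data), used only to exercise the code.
profile = [float(k) for k in range(1, 101)]
for label, fraction in [("3A", 0.01), ("3B", 0.02)]:
    corrupted, changed = inject_outliers(profile, fraction)
    print(label, changed, sum(corrupted) - sum(profile))
```

Returning the changed indices makes it possible to confirm afterwards that the injected samples stay below the cleaning threshold and therefore remain in the data set passed to each methodology.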
As shown in Tables 12 and 13, bad data changes the results of each CVR M&V methodology relative to Case 1; that is, the results differ from those obtained with the original data. Therefore, bad data scenarios impact the results of all three studied methodologies. The percentage differences in CVR factor and energy savings of the three methodologies are shown in Figure 7 for Case 3.
Figure 7 shows a discrepancy in the results that occurs regardless of the CVR M&V methodology employed. However, this discrepancy is more apparent in the comparison-based (rule-based) and constant CVR factor methods.
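The sensitivity metric plotted in the comparison figures can be sketched as a percentage difference relative to the Case 1 result. The exact formula is not stated in this section, so normalizing by the Case 1 value is an assumption; the CVR factor helper uses the standard definition (percentage energy change per percentage voltage change), and both function names are hypothetical.

```python
def percent_difference(case_value, baseline_value):
    """Percentage deviation of a scenario result from its Case 1 baseline.

    Assumption: the deviation is normalized by the magnitude of the
    Case 1 (original data) value.
    """
    if baseline_value == 0:
        raise ValueError("baseline value must be nonzero")
    return 100.0 * (case_value - baseline_value) / abs(baseline_value)


def cvr_factor(pct_energy_change, pct_voltage_change):
    """Standard CVR factor: percent energy change per percent voltage change."""
    if pct_voltage_change == 0:
        raise ValueError("voltage change must be nonzero")
    return pct_energy_change / pct_voltage_change
```

Applying `percent_difference` to the CVR factor and to the energy savings of each methodology, scenario by scenario, gives directly comparable sensitivity values across methods.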
Case 4: In this case, the impact of load shifting on the methodologies is investigated. The result of this case shows the sensitivity against load shifting and a sudden change of the load. The load shift is generated by increasing the original data by 150% of its value. If the increased value exceeds 110% of the peak demand, it would be increased by 110% to prevent removal by the cleaning process. The following load shift events are studied:
• Case 4A: Load shifting in 10% of the year (continuously): 10% of power data is consecutively increased by 150% of its original value.
• Case 4B: Load shifting in 20% of the year (continuously): 20% of power data is consecutively increased by 150% of its original value.
Table 14 shows the post-CVR energy under various load shifting events in this case. As expected, by increasing the percentage of load shifting, the post-CVR energy is accordingly increased.
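A continuous load shift of the kind studied in Case 4 can be sketched as below. Several details are assumptions of this illustration: the start position of the shifted block (not given in the text), the literal reading of "increased by 150% of its value" as multiplying by 2.5, and the handling of values above 110% of peak, which is read here as clipping to that limit. The function name is hypothetical.

```python
def apply_load_shift(power, fraction, start=0, increase=1.50, cap_ratio=1.10):
    """Raise one contiguous block covering `fraction` of the series.

    Assumptions of this sketch: the block begins at index `start`, each
    sample is scaled by (1 + increase), and results above
    cap_ratio * peak are clipped so a peak-based cleaning rule keeps them.
    """
    n_shift = round(fraction * len(power))
    if start < 0 or start + n_shift > len(power):
        raise ValueError("shift block does not fit inside the series")
    cap = cap_ratio * max(power)  # 110% of the original peak demand
    shifted = list(power)
    for i in range(start, start + n_shift):
        shifted[i] = min(power[i] * (1.0 + increase), cap)
    return shifted


# Placeholder two-level profile (not utility data), used only as a demo.
profile = [1.0] * 50 + [4.0] * 50
for label, fraction in [("4A", 0.10), ("4B", 0.20)]:
    shifted = apply_load_shift(profile, fraction)
    print(label, sum(shifted) - sum(profile))
```

Unlike the scattered outliers of Case 3, the modified samples here are consecutive, so the added energy grows with the length of the shifted block.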
Tables 15 and 16 demonstrate the simulation results for Cases 4A and 4B, respectively.
As shown in Tables 15 and 16, load shifting in the original data changes the results of each CVR M&V methodology relative to Case 1; that is, the results differ from those obtained with the original data. Load shifting events impact the results of all studied methodologies. Figure 8 demonstrates the differences in the annual CVR factor and energy savings in Case 4. Analyzing the sensitivity of the comparison-based, regression-based, and constant CVR factor methodologies to 10% and 20% load shift events reveals a deviation from the original results in Case 1. While this deviation is more evident in the comparison-based (rule-based) method, we conclude that it occurs regardless of the CVR M&V methodology employed.

VI. CONCLUSION
This paper conducted a comparative study to identify the sensitivity of various CVR M&V methodologies to data quality and availability, including missing data, outliers, and load shifts. Three widely used CVR M&V methodologies, i.e., comparison-based, regression-based, and constant CVR factor, were evaluated against these data anomaly issues. The optimization-based correlated-weather approach proposed by the authors was also studied and showed promising results against data anomaly issues. Numerical simulations based on real utility data captured from CVR-deployed feeders demonstrated that, regardless of the methodology used, missing data, outliers, and load shift events impact the CVR factor and energy savings calculations and cause divergence from baseline scenarios utilizing original data. This paper concludes that: (1) electric utilities could use the studies conducted in this paper to determine how sensitive each CVR M&V methodology is to data anomaly issues and accordingly decide on methodology selection and the handling of various data anomaly issues; and (2) since these data anomalies lead to different results across methodologies, a standard approach is required to define data management processes, including data collection, cleaning, and reconstruction. There is an ongoing effort within the IEEE to develop this standard, which will also be investigated in our future work.