Hierarchical Model-Free Transactional Control of Building Loads to Support Grid Services

A transition from generation on demand to consumption on demand is one of the solutions to overcome the many limitations associated with the higher penetration of renewable energy sources. Such a transition, however, requires a considerable amount of load flexibility on the demand side. Demand response (DR) programs can reveal and utilize this demand flexibility by enabling the participation of a large number of grid-interactive efficient buildings (GEB). Existing approaches to DR require significant modelling or training efforts, are computationally expensive, and do not guarantee the satisfaction of end users. To address these limitations, this paper proposes a scalable hierarchical model-free transactional control approach that incorporates elements of virtual battery, game theory, and model-free control (MFC) mechanisms. The proposed approach separates the control mechanism into upper and lower levels. The MFC modulates the flexible GEB in the lower level with guaranteed thermal comfort of end users, in response to the optimal pricing and power signals determined in the upper level using a Stackelberg game integrated with aggregate virtual battery constraints. Additionally, MFC has modest computational and communication requirements, so it is easily deployable even on small embedded devices. The effectiveness of this approach is demonstrated through a large-scale case study with 10,000 heterogeneous GEB. The results show that the proposed approach can achieve peak load reduction and profit maximization for the distribution system operator, as well as cost reduction for end users while maintaining their comfort.


I. INTRODUCTION
The increasing integration of renewable energy sources is reshaping the power grid of the future. Most notably, the intermittent nature of renewables such as solar and wind has led to a transition from the traditional ''generation on demand'' strategy to the ''consumption on demand'' strategy [1]. The consumption on demand strategy, however, requires a significant amount of load flexibility on the demand side. Demand response (DR) can reveal and utilize this demand flexibility by enabling the participation of a large number of grid-interactive efficient buildings (GEB) [2]. DR programs aim to modify the electricity consumption of end users, by means of incentives, in favor of the operational needs of the power grid [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Siqi Bu.
In a DR program, an operator, called the distribution system operator (DSO), purchases the electricity from different resources available in the wholesale market at a cost and sells the purchased electricity to end users at a price. When determining the price, the DSO will have several objectives such as increasing its profit and enhancing the reliability of the grid. The end users, on the other hand, adjust their electricity consumption in a way that minimizes their electricity cost while maintaining their comfort. In recent years, demand flexibility has attracted significant research attention, and the effectiveness of DR in many areas has been proven. In this regard, many review studies on DR have been conducted in an attempt to classify and compare existing methods. For example, [4] provided a comprehensive review on demand-side management (DSM). In [5], the authors conducted an exhaustive review on residential DSM architectures, approaches, optimization models, and methods. Accordingly, the DR programs can be classified based on either their ways to engage end users or control strategies. In terms of ways to engage end users, DR programs can broadly be categorized as price-based, direct-load control, and transactive control methods [6]-[8]. Price-based DR programs use dynamic price signals to incentivize or disincentivize consumption patterns of end users. However, these programs do not engage the end users in the pricing process and require an accurate modeling of end-user reactions in response to given electricity prices, and failure to do so can ultimately result in poor grid stability. Direct-load control programs enable DR operators to remotely control specific end-user loads. Such programs do not require any modelling efforts because the changes in electricity consumption are certain, as they are directly controlled by the DR operator. However, these programs may violate the preferences and privacy of end users. Transactive control programs use market mechanisms to attract end users to provide DR. Such programs engage the end users in the pricing process using a set of optimization (negotiation) mechanisms that comply with their preferences. The most commonly used optimization mechanisms are the game theoretical mechanisms [9]-[12], metaheuristic mechanisms [13], [14], and parameterization-based mechanisms [15]. The main challenge associated with transactive control is to coordinate and aggregate the loads of end users. For a successful DR program, effective and efficient coordination and control of a very large number of geographically distributed GEB is essential.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In terms of control strategies, DR programs can broadly be categorized as rule-based, model-based, and reinforcement learning (RL)-based methods [16], [17]. Rule-based controllers (e.g., [18]) are one of the most popular control approaches used in DR due to their simplicity. These approaches rely on a set of simple heuristics that are derived based on expert knowledge. For this reason, the success of rule-based controllers mainly depends on the expertise and knowledge of the users. However, these approaches may not be suitable for dealing with complex, multiple, and/or nonlinear objectives. Model-based controllers are planning-based approaches that optimize a control problem over a receding time horizon. These controllers require accurate models that define the system dynamics. However, obtaining such models requires a significant amount of time and effort. In contrast, RL-based approaches are potentially model free and can help alleviate the limitations associated with model-based approaches. Nevertheless, these approaches learn an optimal control policy by interacting with the surrounding environment and thus may take a long time to learn [19].
Towards addressing the aforementioned limitations related to the ways to engage end users and control strategies, this paper proposes a scalable hierarchical model-free transactional control approach that incorporates elements of virtual battery, game theory, and model-free control (MFC) mechanisms. The proposed approach separates control into upper and lower levels connected by load aggregators (LAs). The upper level is based on virtual battery and game theory. A one-leader (DSO) and multiple-follower (LAs) Stackelberg game is formulated to coordinate the aggregated GEB using day-ahead load scheduling and pricing. In addition, the concept of virtual battery is integrated into the game as a set of constraints for guaranteed thermal comfort of end users. The lower level is based on MFC, which is a novel online control strategy that does not require any modelling or training efforts and can be applied to both linear and nonlinear systems [20]. To date, it has already been successfully implemented in many other domains [21], [22].
This paper contributes to the body of knowledge in two main aspects. First, this paper couples the MFC with the game-theoretic control and proposes a scalable model-free transactive control approach. MFC does not require any modeling effort or model training for the various building loads. This is very beneficial since deriving an accurate model for every single unit participating in DR programs and obtaining all the parameters of the units (e.g., thermal coefficients, standby losses) is infeasible. Also, MFC is computationally efficient, easily deployable even on small embedded devices, and can be implemented in real time. Second, this paper integrates the concept of virtual battery into DR via the Stackelberg game. The concept of virtual battery enables efficient coordination and aggregation of a large number of flexible GEB with guaranteed thermal comfort of end users.
The rest of this paper is organized as follows. Section II presents an overview of the proposed model-free hierarchical transactional control approach. Section III describes the analytical formulations of the different components of the proposed approach including virtual battery, Stackelberg game, and MFC. Section IV presents a large-scale case study with ten LAs and 10,000 heterogeneous GEB to demonstrate the effectiveness of the proposed approach. Finally, Section V concludes this paper.

II. THE PROPOSED APPROACH: OVERVIEW
The proposed approach utilizes a two-level control architecture (see Fig. 1). The architecture includes three parties: the DSO, LAs, and end users.
The upper level is handled by a Stackelberg game and a set of virtual batteries. The Stackelberg game conducts the negotiation between the DSO and the LAs within the limits of the aggregate flexibility offered by the participating GEB. The virtual batteries assist with quantifying the aggregate flexibility that each LA can offer for a guaranteed thermal comfort of end users. The lower level has two objectives: (1) allocating a sufficient amount of power to all end users to ensure that their comfort is maintained, and (2) tracking the aggregated load profile determined in the upper level. Accordingly, the proposed hierarchical model-free transactional control approach includes three primary steps: determining the virtual battery constraints, computing the optimal power and price signals using a Stackelberg game, and allocating the optimal power to the loads of end users using MFC. In the next section, the concepts of virtual battery, Stackelberg game, and MFC are explained.

III. THE PROPOSED APPROACH: ANALYTICAL FORMULATION
A. VIRTUAL BATTERY
A virtual battery model assists in modelling the flexibility offered by a set of thermostatically controlled loads (TCLs), including residential and commercial heating, ventilation, and air conditioning (HVAC) and water heater (WH) units. For example, for a cooling scenario and assuming all TCLs have the same temperature setpoint ($T_r$) and comfort band ($\Delta$), a fully charged battery means that the temperatures of the TCLs are $T_r - \Delta$. On the other hand, an empty battery means that the temperatures of the TCLs are $T_r + \Delta$. The virtual battery model, adapted from [23], [24], determines whether a given power profile is feasible for a given set of TCLs. The feasibility of a power profile is defined by two criteria: (1) whether it can satisfy the comfort requirements of all TCLs, and (2) whether it can be tracked by the aggregate power consumption of all TCLs.
As mentioned earlier, the proposed approach is fully model-free, and therefore it does not require the mathematical models of any TCLs. The mathematical models presented in the remainder of this section are only used for simulating the considered TCLs (HVAC and WH units). In practice, the controller needs only the periodic input-output measurements (i.e., input powers and output indoor/water temperatures) of the units that are participating in the DR program. To simulate the residential and commercial HVAC units, the building thermal model depicted in Fig. 2, adapted from [25], is used.
The continuous-time dynamics of the model are written in linear state-space form as:

$$\dot{x} = Ax + Bu + Gw \tag{1}$$
$$y = Cx + Du \tag{2}$$

where $x$ is the system state vector ($x_1$: room air temperature, $x_2$: interior-wall surface temperature, and $x_3$: exterior-wall core temperature), $u$ is the on-off state of the HVAC, $w$ is the vector of external disturbances ($w_1$: outdoor temperature, $w_2$: solar irradiance), and $y$ is the system output (room air temperature). The system parameters $A$, $B$, $C$, $D$, and $G$ are defined in (3), where $K_1$, $K_2$, $K_3$, $K_4$, and $K_5$ are the thermal conductivity values of ceiling, floor, windows, external walls, and internal walls, respectively; $C_1$, $C_2$, and $C_3$ are the heat capacity values of the air, interior wall, and exterior wall, respectively; $P^m_1$, $COP_1$, and $\gamma$ are the rated power of the HVAC, coefficient of performance of the HVAC, and solar heat gain coefficient, respectively. To simulate the WH units, the model in (4), adapted from [26], is used, where $\Delta T$ is the temperature change for a given time period $\Delta t$, $\dot{Q}$ is the heat added to the water tank, $UA$ is the standby heat loss coefficient, $T_w$ is the water temperature, $T_{amb}$ is the ambient temperature, $\dot{m}$ is the mass flow rate, $c_p$ is the specific heat of the water, $T_{fresh}$ is the inlet water temperature, $m$ is the mass of water, and $P^m_2$ and $COP_2$ are the rated power and coefficient of performance of the WH, respectively.
The aggregate nominal power for a set of GEB is defined as the power profile that is required to keep the TCLs in the set of GEB at their setpoints and maintain the operation of non-TCLs, such as pool pumps, washers, and dryers, in the set of GEB as desired by the occupants. There exist many data-driven methods (e.g., [27]) that predict the aggregate nominal power using weather forecasts and historical power usage data. In this paper, the aggregate nominal power for a set of TCLs is determined by simply adding up the nominal power profiles for all HVACs and WHs. The nominal power for a single HVAC and a single WH can be computed by setting (1) and (4) to zero and extracting $u$ and $\dot{Q}$, respectively. Consequently, the aggregate nominal power for a set of TCLs can be computed by:

$$P_0 = \sum_{i=1}^{N} p^0_i \tag{5}$$

where $P_0$ is the aggregated nominal power for $N$ heterogeneous HVAC and WH units, and $p^0_i$ is the nominal power for TCL $i$ (HVAC or WH). On the other hand, the aggregate nominal power for a set of non-TCLs is obtained using the historical load profiles as:

$$P_1 = \sum_{i=1}^{I} p^1_i \tag{6}$$

where $P_1$ is the aggregated nominal power for $I$ heterogeneous non-TCLs, and $p^1_i$ is the nominal power for non-TCL $i$ obtained using the historical load profiles. The capacity of the aggregate virtual battery is affected by the quantity and dynamics of the TCLs and the comfort preferences of the end users. The aggregate demand flexibility of the system increases as the number of TCLs increases. The slower the TCL dynamics are, the more flexible the set of TCLs is. Also, the wider the comfort bands are, the more flexible the set of TCLs is. The battery capacity for a single HVAC and a single WH is defined as in (7) and (8), respectively, where $C_{HVAC}$ and $C_{WH}$ are the virtual battery capacities, $\Delta_{HVAC}$ and $\Delta_{WH}$ are the comfort bands for HVAC and WH units, respectively, and $B_D$ is a coefficient extracted by discretizing the building model (1) into intervals of $\Delta t$. The aggregate capacity for a set of HVAC and WH units is defined as:

$$C = \sum_{i=1}^{N_1} C_{HVAC,i} + \sum_{j=1}^{N_2} C_{WH,j} \tag{9}$$

where $C$ is the aggregate capacity of $N_1$ HVAC units and $N_2$ WH units. The capacity of the virtual battery represents the amount of aggregate flexibility of the HVAC and WH units. The level of charge of the virtual battery is given by (10), where $b(t-1)$ is the level of charge at the previous time step, $P(t-1)$ is the power consumed at the previous time step, $P_0(t-1)$ is the nominal power at the previous time step, and $\delta$ is the battery dissipation rate. The difference between the power consumed and the nominal power determines whether the virtual battery is being charged or discharged. When the power consumed is greater than the nominal power, the virtual battery is charged and the level of charge increases, and vice versa. The parameter $\delta$ depends on the properties of the TCLs (e.g., insulation characteristics) and can be determined empirically. The level of charge can be converted to the state of charge (SOC) as in (11). The $SOC(t)$ satisfies the following constraint:

$$0 \le SOC(t) \le 1 \tag{12}$$

Whether a power profile can be tracked by the total power consumption of a set of TCLs is determined by the minimum and maximum powers that the TCLs can consume. The minimum power that they can consume is zero, which occurs when all TCLs are turned off. On the other hand, the maximum power that they can consume occurs when all TCLs are turned on. Such limits lead to the following constraint:

$$0 \le P(t) \le \sum_{i=1}^{N} P^m_i \tag{13}$$

where $P^m_i$ is the power rating of the $i$-th TCL.
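
The two feasibility criteria can be illustrated with a minimal Python sketch. It is not the paper's implementation: the update rule `b[t] = delta * b[t-1] + (P[t-1] - P0[t-1]) * dt`, the symmetric range of the level of charge, and the mapping `SOC = 0.5 + b / (2 * capacity)` are assumptions chosen only so that a zero level of charge corresponds to an SOC of 50%; the exact forms are those of (10) and (11). All function and parameter names are hypothetical.

```python
def level_of_charge(power, nominal, delta=1.0, b0=0.0, dt=1.0):
    """Return the level-of-charge trajectory for a power profile.

    Assumed update (illustrative only): the battery charges when the
    consumed power exceeds the nominal power and dissipates with delta.
    """
    levels = [b0]
    for p, p0 in zip(power, nominal):
        levels.append(delta * levels[-1] + (p - p0) * dt)
    return levels


def is_feasible(power, nominal, capacity, rated_powers, delta=1.0, b0=0.0, dt=1.0):
    """Check the two criteria: trackability (power limits) and comfort (SOC limits)."""
    p_max = sum(rated_powers)  # all TCLs switched on
    if any(p < 0.0 or p > p_max for p in power):
        return False
    levels = level_of_charge(power, nominal, delta, b0, dt)
    # Assumed SOC mapping: b = 0 corresponds to SOC = 50%.
    socs = [0.5 + b / (2.0 * capacity) for b in levels]
    return all(0.0 <= s <= 1.0 for s in socs)


if __name__ == "__main__":
    profile = [3.0, 3.0, 1.0, 1.0]   # candidate power profile
    nominal = [2.0, 2.0, 2.0, 2.0]   # nominal power profile
    print(is_feasible(profile, nominal, capacity=2.0, rated_powers=[2.0, 2.0]))
```

Under these assumptions, a profile is rejected either because it exceeds what the TCLs can physically draw or because following it would push the aggregate temperature state outside the comfort band.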

B. STACKELBERG GAME
The interaction between the DSO and multiple LAs in a day-ahead pricing market is designed as a Stackelberg game, where the DSO acts as the leader and LAs are regarded as the followers. In the defined game, the DSO determines the electricity price, while the LAs modify their consumption in response to the given price. The DSO aims to maximize its profit and social satisfaction while minimizing the peak load, whereas the LAs aim to maximize their satisfaction and minimize their electricity cost. In order to ensure that the optimal power profile for each aggregator is feasible (i.e., the reference power profile generated from the game theoretic approach can be tracked and the temperature responses of the TCLs stay within the comfort bands), the virtual battery is integrated into the game as a set of constraints for the LAs. The Stackelberg game is applied using two types of electricity pricing structures: time-of-use (TOU) pricing and flat pricing. In the TOU pricing structure, the electricity prices vary according to the time of day. The DSO sets higher prices during the peak demand hours and lower prices during off-peak demand hours. In the flat pricing structure, the DSO sets a fixed electricity price throughout the day. For both pricing structures, it is assumed that all LAs are subject to the same prices as set by the DSO.
The objective function of the DSO for the TOU pricing structure is defined as in (14). The function consists of three terms. The first term is the profit of the DSO, generated through buying electricity from the wholesale market and selling to the LAs. The second term represents the overall satisfaction of the DSO, which changes in parallel with the satisfaction of the LAs. The DSO should pay attention to the satisfaction of the LAs, which are its customers. Thus, the DSO takes care of the satisfaction of the LAs as part of a customer fulfillment strategy. Finally, the third term is the amount of peak load. The objective of the DSO is to maximize the first and second terms and minimize the third term while fulfilling the constraints given in (14), where $p_t$ is the electricity price for all LAs at time $t$, $c_t$ is the marginal cost of the electricity generation at time $t$, $l_{n,t}$ is the load of LA $n$ at time $t$, which is the sum of the TCL ($dr_{n,t}$) and non-TCL ($hr_{n,t}$) power consumption, $S_{DSO}$ is the overall satisfaction value of the DSO, which is the sum of the satisfaction values of all LAs as per (15), $ST$ is the number of time decision periods, $k$ is the peak load, and $\theta$ is a weighting coefficient to prioritize or deprioritize peak load reduction. The satisfaction value of LA $n$ is computed as in (16) [28], [29], where $D_{n,t}$ is the nominal power of LA $n$ at time $t$, which is the sum of the nominal powers of TCLs and non-TCLs, and $\alpha_{n,t}$ is the sensitivity of LA $n$ at time $t$ towards consumption curtailment. For the flat pricing structure, a single additional equality constraint is added to the aforementioned set of constraints to ensure the price does not vary throughout the day.
The objective function of the LAs is given as in (17). The function consists of two terms. The first term is the cost of electricity to the LAs. The second term is the satisfaction of the LAs. The satisfaction of the LAs reduces as the load of the LAs ($l_{n,t}$) deviates from the nominal power of the LAs ($D_{n,t}$). The LAs may prefer not to deviate much from their nominal loads as it might disturb the routine of end users. The objective of the LAs is to minimize the first term and maximize the second term while fulfilling the constraints given as follows:

$$
\begin{aligned}
\min_{l_{n,t}} \quad & U_{LA} = \sum_{t} p_t \times l_{n,t} - S_{LA_n} \\
\text{s.t.} \quad & \underline{dr}_{n,t} \le dr_{n,t} \le \overline{dr}_{n,t}, \quad \forall n,t \\
& \underline{hr}_{n,t} \le hr_{n,t} \le \overline{hr}_{n,t}, \quad \forall n,t \\
& b_{n,t} = b_{n,t-1} + dr_{n,t-1} - P^0_{n,t-1}, \quad \forall n,t \\
& b_{n,1} = b_{n,\ldots}
\end{aligned}
\tag{17}
$$

where $\underline{dr}_{n,t}$ and $\overline{dr}_{n,t}$ are the minimum and maximum loads that the TCLs of LA $n$ at time $t$ can consume, respectively. Similarly, $\underline{hr}_{n,t}$ and $\overline{hr}_{n,t}$ are the minimum and maximum loads that the non-TCLs of LA $n$ at time $t$ can consume, respectively. The level of charge of LA $n$ at time $t$ ($b_{n,t}$) must be within the range of the aggregate capacity (9), which is equivalent to the SOC (11) being between 0% and 100%. The initial and end charge levels are assumed to be 0 (SOC = 50%). Note that the objective function of neither the DSO nor the LAs includes any parameters (e.g., $K_1$ and $UA$) from the mathematical models that are used for simulating the HVAC and WH units. This shows that the proposed approach is model free and does not require any mathematical modelling and/or model training. The Stackelberg game is solved using the backward induction method [28], [30], which follows two main steps. First, it derives the optimal load $l^*_{n,t}$ from the first-order derivative of the LAs' objective function (17). Second, the optimal load $l^*_{n,t}$, computed using (18), is plugged into the objective function of the DSO (14). This operation converts the bilevel optimization problem into a single-level optimization problem, in which the price $p_t$ is the only variable. The backward induction method is a simple yet effective method that scales to any number of LAs.
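
The two steps of backward induction can be sketched on a toy problem in Python. This is an illustration under strong assumptions, not the game solved in the paper: it uses a single LA, a single time period, no virtual battery or peak load terms, and a hypothetical logarithmic satisfaction function `alpha * nominal * ln(1 + l)` standing in for (16). With that stand-in, the follower's first-order condition gives `l* = alpha * nominal / price - 1`, and the leader then searches over the price alone.

```python
import math


def follower_best_response(price, alpha, nominal):
    """Step 1: optimal load of the follower for a given price.

    Derived from d/dl [price * l - alpha * nominal * ln(1 + l)] = 0,
    where the logarithmic satisfaction term is an illustrative assumption.
    """
    return max(alpha * nominal / price - 1.0, 0.0)


def leader_profit(price, cost, alpha, nominal):
    """Leader objective after substituting the follower's best response."""
    return (price - cost) * follower_best_response(price, alpha, nominal)


def solve_leader(cost, alpha, nominal, p_max, steps=10000):
    """Step 2: single-level problem in the price only, solved by grid search."""
    best_price, best_profit = cost, 0.0
    for k in range(1, steps + 1):
        price = cost + (p_max - cost) * k / steps
        profit = leader_profit(price, cost, alpha, nominal)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price, best_profit


if __name__ == "__main__":
    price, profit = solve_leader(cost=4.0, alpha=1.0, nominal=100.0, p_max=54.0)
    # For this toy model the optimum is also available in closed form.
    print(price, profit, math.sqrt(1.0 * 100.0 * 4.0))
```

Because the follower's response is substituted analytically, the leader's problem has one decision variable per time period regardless of how many followers are added, which is the property that makes the method scale.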

C. MODEL FREE CONTROL
The control of the end-user loads is not an easy task because of their large scale, heterogeneous nature, and sometimes uncertain behavior. The controller has to determine the units that need to be powered on from a large number of units. Also, the controller needs to track the optimal power profile closely to ensure that the outcomes anticipated by the DSO and LAs can be realized. Failure to do so may cause significant discomfort, severe financial losses, and/or poor grid stability. Thus, the selection of the controller is crucial. For this reason, the MFC is utilized for the control of the end-user loads. MFC is a control mechanism that does not require any modeling effort for the various TCLs such as HVAC systems, WHs, and others. It is based on approximating the TCL by an ultra-local model as [20]:

$$\dot{y} = F + \sigma u \tag{19}$$

where $u$ is the input power (or on/off state) of the TCL system, $\dot{y}$ is the rate of change in the indoor/water temperature of the TCL system, and $F$ describes the poorly known or unknown parts of the TCL system. The parameter $\sigma$ corrects the difference in the magnitudes of the input and the output. Equation (19) should not be confused with a ''black box model'' of the TCL system. In MFC, (19) is updated at each timestep from the knowledge of the input-output behavior of the unmodeled TCL system in order to estimate the quantity $F$, which is approximated by a piecewise constant function given in (20) [31]. Note that $F$ is estimated using the measurements of the system obtained in the last $L$ seconds and is updated accordingly. Unlike the classic proportional-integral-derivative (PID) controllers, MFC continuously updates the local model ($F$) in (20) using only knowledge of the input-output behavior, for both linear and nonlinear systems. For this reason, MFC has been shown to outperform the classic PID controllers and many others including fuzzy and neuro-fuzzy controllers [32], [33].
Based on the numerical knowledge of $F$, the control is computed using (21) as a simple cancellation of the nonlinear terms, as described in $F$, in addition to a closed-loop tracking of a reference trajectory. More specifically, using the latest $F$, the intelligent proportional control law is given by [34]:

$$u = -\frac{F - \dot{y}^* + K_p r}{\sigma} \tag{21}$$

where $y^*$ is the desired reference (temperature) trajectory and $K_p$ is the proportional control gain. Combining (19) and (21) provides the error dynamics as:

$$\dot{r} + K_p r = 0 \tag{22}$$

where $r = y - y^*$ is the tracking error and $F$ does not appear anymore. The solution of the first-order differential equation in (22) is given as:

$$r(t) = r(t_0)\, e^{-K_p (t - t_0)} \tag{23}$$

where $t_0$ is the initial time. Eq. (23) shows that the tracking error asymptotically decays to 0 for $K_p > 0$, which guarantees the asymptotic stability of the system and makes the tuning of the proportional gain straightforward. Thus, the tracking condition can be easily achieved by setting the value of $K_p$ to be the solution of (23), as given in (24). Note that $K_p$ is updated at every timestep according to (24). As a result, $\sigma$ and $L$ are the only two parameters to be tuned, and their selection is also straightforward. A sufficiently small value is needed for the parameter $L$, and $\sigma$ can be determined from collected input-output measurements as $\sigma = \dot{y}/u$. This makes the control design simple to implement. The asymptotic stability criterion, in addition to the closed-form solution of the control gain $K_p$, makes MFC more intelligent and beneficial than the classic PID controller.
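
The control loop described above can be sketched in a few lines of Python. The sketch is illustrative and rests on several assumptions that are not taken from the paper: $F$ is estimated as the average of `y_dot - sigma * u` over a short window of past samples (one common piecewise-constant estimator, not necessarily (20)), the gain `kp` is held fixed rather than updated through (24), the input is continuous rather than on/off, and the first-order plant inside `simulate` is a hypothetical stand-in used only to generate measurements. The controller never reads the plant parameters.

```python
from collections import deque


class ModelFreeController:
    """Intelligent proportional controller on the ultra-local model y_dot = F + sigma * u."""

    def __init__(self, sigma, kp, window):
        self.sigma = sigma                    # input-output scaling
        self.kp = kp                          # proportional gain (fixed here by assumption)
        self.samples = deque(maxlen=window)   # recent estimates of F
        self.prev_y = None
        self.prev_u = 0.0

    def step(self, y, y_ref, y_ref_dot, dt):
        """Return the control input from the latest measurement only."""
        if self.prev_y is not None:
            y_dot = (y - self.prev_y) / dt    # finite-difference derivative
            self.samples.append(y_dot - self.sigma * self.prev_u)
        f_hat = sum(self.samples) / len(self.samples) if self.samples else 0.0
        u = (-f_hat + y_ref_dot - self.kp * (y - y_ref)) / self.sigma
        self.prev_y, self.prev_u = y, u
        return u


def simulate(steps=300, dt=0.1, y0=15.0, y_ref=20.0):
    """Drive a hypothetical first-order thermal plant to a constant setpoint."""
    controller = ModelFreeController(sigma=0.5, kp=1.0, window=5)
    y = y0
    for _ in range(steps):
        u = controller.step(y, y_ref, 0.0, dt)
        # Plant dynamics are hidden from the controller (assumed for the demo).
        y += dt * (-0.1 * (y - 10.0) + 0.5 * u)
    return y


if __name__ == "__main__":
    print(simulate())
```

Each control step costs a handful of arithmetic operations and stores only a short window of samples, which is consistent with the claim that the controller suits small embedded devices.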
In summary, MFC is employed in this study because it has the following main characteristics:
a) It does not require any modeling effort for the different components and disturbances in the system.
b) It is straightforward to tune, in contrast to the commonly used classic PID controllers, which are very challenging to tune (usually relying on trial-and-error methods).
c) It is asymptotically stable, in contrast to the classic PID controllers.
d) It is very simple to implement in real time since it requires very light computations.
The additional power allocation and power tracking constraints in the MFC design are imposed using Algorithm 1, which works similarly to the priority-based control algorithm in [23]. The only difference is that, instead of priorities determined based on temperature deviations, Algorithm 1 uses the control input $u$ values to determine the units to be turned on or off.

Algorithm 1 Power Allocation of TCLs for LA n
FOR every time step, t
    FOR every TCL j with rated power of P_m(j)
        Compute u using (21)
    Sort u values in descending order and get the rank r of TCLs
    Initialize as P_consumed = 0 and r = 1
    WHILE P_consumed ≤ dr_{n,t}
        Find the TCL j with the rank of r
        Turn on the TCL j
        P_consumed := P_consumed + P_m(j)
        r := r + 1
    Simulate the TCLs using (1), (2), and (4)
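
The ranking and allocation part of Algorithm 1 for a single time step can be written as a short Python sketch. The function name and argument names are hypothetical, and the control inputs are assumed to have been computed beforehand with (21). The loop mirrors the WHILE condition as stated, so the last unit switched on may push the consumed power slightly above the target.

```python
def allocate_power(control_inputs, rated_powers, target_power):
    """Switch on TCLs in descending order of control input u.

    Returns the on/off state of each TCL and the total power consumed.
    """
    # Rank the TCLs by their control input, largest first.
    order = sorted(range(len(control_inputs)),
                   key=lambda j: control_inputs[j], reverse=True)
    is_on = [False] * len(control_inputs)
    consumed = 0.0
    for j in order:
        if consumed > target_power:   # WHILE P_consumed <= dr_{n,t}
            break
        is_on[j] = True
        consumed += rated_powers[j]
    return is_on, consumed


if __name__ == "__main__":
    states, total = allocate_power(
        control_inputs=[0.2, 0.9, 0.5, 0.1],
        rated_powers=[3.0, 4.0, 2.0, 5.0],
        target_power=5.0,
    )
    print(states, total)
```

Units with the largest control input are those the MFC law considers most in need of power, so ranking by $u$ serves the same role as ranking by temperature deviation in the priority-based scheme.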

IV. CASE STUDY
A large-scale case study is presented in this section to demonstrate the performance of the proposed hierarchical model-free transactional control approach with ten LAs, 10,000 heterogeneous building TCLs, and 10,000 non-TCLs. Each LA includes 1000 TCL and 1000 non-TCL units. For the TCL units, a random set of numbers adding up to 1000 is considered for the numbers of residential HVAC, commercial HVAC, and WH units. For the non-TCL units, two prototype building models provided by the U.S. Department of Energy are considered [35]. The non-TCL units include 500 single-family and 500 medium office buildings.
The properties of the TCLs are summarized in Table 1. As shown in the table, some properties of the HVAC and WH units are randomized within a 20% range to further differentiate the LAs.
In the case study, the Stackelberg game is conducted hourly, while the control of the TCLs using MFC is conducted at 10-minute intervals. Accordingly, the case study includes three main steps. First, the hourly nominal power profiles are generated using the day-ahead forecasts of the external temperature, solar radiation, and hot water usage profiles, and the corresponding battery constraints are determined for each LA. Second, as a result of the Stackelberg game, an optimal power profile and a price signal are generated for each LA. In computing the optimal power profiles, four different scenarios are considered. Table 2 summarizes the scenarios considered for the optimization. The scenarios aim to capture the impacts of the pricing strategy (flat or TOU) and the peak load reduction weight ($\theta$) on the resulting optimal loads. Third, the obtained power profiles are allocated across the different TCLs using the MFC at 10-minute intervals.
A. PARAMETER SETTING
Fig. 3 shows the disturbances, including weather conditions and hot water usage profiles, that the LAs are subjected to. The nominal power profiles and the battery constraints of the LAs are computed accordingly and passed to the Stackelberg game model. The outdoor weather conditions including temperature and solar radiation data are taken from the typical meteorological year 3 (TMY3) weather data of Las Vegas [36]. The hot water usage profile data are taken from a single-family detached house [37]. Prior to the case study, both outdoor weather conditions and hot water usage data profiles are converted into hourly intervals. Fig. 4 shows the hourly $\alpha$ values assumed for computing the satisfaction values of the LAs as per (16). The higher the value of $\alpha$ is, the less conservative the LAs are with respect to their satisfaction. By altering the $\alpha$ values, the potential variability that can occur in the preferences of the LAs is taken into account.

B. STACKELBERG GAME
Fig. 5 shows a comparison of hourly nominal and optimal power profiles for the four scenarios. It is observed that the scenarios with flat pricing (i.e., Scenarios 1 and 2) are not successful in reducing the peak load. However, the scenarios with TOU pricing (i.e., Scenarios 3 and 4) achieve a significant peak load reduction. The peak loads are 82.51 MW and 71.92 MW for Scenarios 3 and 4, respectively, as compared to the peak loads of 108.20 MW and 95.69 MW for Scenarios 1 and 2, respectively. This shows that the proposed Stackelberg game is able to optimize the power profiles for TOU pricing by shifting loads from peak to off-peak hours, and that the weight of load reduction is useful in prioritizing or deprioritizing the objective of peak load reduction. Fig. 6 shows the electricity prices across the different scenarios and the marginal cost of electricity. Scenarios 3 and 4 set higher prices between 12:00 and 18:00 to reduce the peak load that occurs during this period.
Overall, the prices in Scenario 4 are higher than the prices in Scenario 3, because Scenario 4 gives more importance to peak load reduction and therefore sets more aggressive prices. Likewise, Scenario 1 sets a lower price than Scenario 2, because Scenario 1 gives less importance to peak load reduction. Table 3 compares the overall results for the DSO and LA 1. For all scenarios, TOU pricing achieves higher peak load reduction than flat pricing, which shows the effectiveness of the TOU pricing structure in DR. Scenarios 1 and 3 result in more profit but higher peak loads. This shows the tradeoff between profit and peak load reduction from the DSO perspective. Scenarios 2 and 4 cost less to LA 1, because these scenarios attach more importance to peak load reduction, and therefore the DSO incentivizes LAs more to modify their consumption. The results for the other nine LAs are similar and therefore not included in the table for the sake of conciseness.

C. TEMPERATURE RESPONSES
Fig. 7 shows the resulting temperature responses for the optimal load profile for LA 1 in Scenario 4. The MFC is able to maintain temperatures within the comfort bounds for the given reference power profile with no prior information about the HVAC and WH units. Similarly, temperature responses for all scenarios and LAs are within the bounds. For this reason, only the results shown in Fig. 7 are presented here.
The case study also shows that MFC is a very computationally efficient algorithm and can be deployed at very small time scales. Specifically, each iteration (time step) of MFC requires 3 milliseconds to make control decisions using Matlab on a standard four-core personal computer, while traditional model-predictive control (MPC) requires 600 milliseconds [38]. Thus, MFC is 200 times faster. This attribute makes MFC an ideal controller for real-time or near real-time DR applications. In terms of power allocation and tracking, the MFC ensures that the aggregate power consumption of the LAs closely follows the reference optimal power profiles. Fig. 8 shows the tracking performance of the MFC for LA 1 in Scenario 4. Note that the MFC takes control after 20 minutes because it uses the previous 20 minutes of measurements before starting to make decisions; there is therefore no control in the first 20 minutes of the simulation. Fig. 9 shows the changes in the SOC of the virtual battery in LA 1 for Scenario 4. The virtual battery is charged until 6:00 as the optimal load is greater than the nominal load. Consequently, residential and commercial buildings' indoor temperatures decrease, and WH temperatures increase. After that, the optimal load is less than the nominal load and the virtual battery is discharged until 18:00. In this period, residential and commercial buildings' indoor temperatures increase, and WH temperatures decrease. Finally, the virtual battery is charged again until 00:00. As shown in the figure, the battery constraints are never violated and therefore the temperatures stay in bounds.

V. CONCLUSION
This paper proposed a scalable hierarchical model-free transactional control approach, which consists of upper and lower levels connected by LAs. In the upper level, the interaction between the DSO and the LAs is designed as a virtual battery-integrated Stackelberg game, where the DSO acts as the leader and the LAs are the followers. In this game, the DSO determines electricity prices, while the LAs modify their consumption in response to the given prices. In the lower level, the MFC is responsible for allocating a sufficient amount of power to all end users to guarantee their comfort, and for tracking the aggregated load profiles determined in the upper level.
The effectiveness of this approach was demonstrated through a large-scale case study including 10,000 GEB. The results of the case study show that the proposed approach can achieve peak load reduction, increased profit, and improved satisfaction for the DSO, while reducing the cost for LAs and increasing their satisfaction. In addition, when the generated optimal load profile is allocated using the MFC, the resulting indoor temperatures stay within the comfort bounds. This shows the usefulness of integrating the virtual battery model into the Stackelberg game to ensure the thermal comfort of end users.
The results also show that the proposed approach is computationally efficient and scalable. For example, MFC requires only 3 milliseconds per iteration to make control decisions on a standard four-core personal computer. Overall, the results show that the proposed approach has great potential for use in DR programs.
In future work, the authors plan to further improve and validate this approach in a real-life context using data collected from real buildings.

ACKNOWLEDGMENT
This manuscript has been authored by UT-Battelle, LLC under Contract DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

He is also an Adjunct Associate Professor with the Electrical Engineering and Computer Science Department, The University of Tennessee. He has published more than 150 archival publications, including journals, conference proceedings, book chapters, and technical reports, in addition to numerous presentations at professional conferences and international symposia. His research interests include smart grid and smart buildings, smart grid communications and control, building-to-grid integration, cyber-physical systems, complex systems, wireless communications, 5G wireless networks, wireless sensor networks, wireless security, big data integration and analytics, artificial intelligence, statistical signal processing, and discrete-event simulation. He is a member of the Phi Kappa Phi Honor Society. He was a recipient of the Significant Event Award by ORNL, in 2007, the Mediterranean Conference on Intelligent Systems and Automation, in 2008, the best paper awards at the IEEE International Symposium on Power Electronics for Distributed Generation Systems, in 2018, and the Best Mentor Award by ORNL, in 2019.