Peer-to-Peer Energy Trading in Smart Energy Communities: A Lyapunov-Based Energy Control and Trading System

This paper studies the real-time energy trading problem in a smart community consisting of a group of grid-connected prosumers with controllable loads, renewable generation and energy storage systems. We propose a peer-to-peer (P2P) energy trading system that integrates energy trading with energy management, enabling each prosumer to jointly manage its energy consumption, storage scheduling and energy trading in a dynamic manner. The proposed community-based P2P energy trading system combines an online energy control and trading algorithm with a double auction mechanism. The energy control and trading algorithm is designed based on Lyapunov theory, allowing each prosumer to independently determine its bid in each time slot based only on its current energy supply condition, while the trading price, which is determined via the double auction mechanism, reflects the collective energy supply conditions of all prosumers participating in energy trading. The integration of the Lyapunov-based energy control and trading algorithm and the double auction mechanism yields a dynamic energy trading pricing mechanism that induces the prosumers to participate in energy trading in a coordinated manner by influencing their energy consumption, energy charging/discharging and energy trading decisions. Numerical simulation results demonstrate that energy exchange in the proposed scalable energy trading system yields significant improvements in terms of energy cost savings and renewable energy utilization efficiency, while ensuring the fair sharing of the benefits reaped from energy trading among the prosumers.

INDEX TERMS Demand side management, double auction, energy management, energy trading, Lyapunov optimization, peer-to-peer, smart grids.

LIST OF MAIN SYMBOLS
$t$  Time slot index.
$D_i(t)$  Prosumer i's served load demand.
$\overline{D}_i(t)$  Maximum energy demanded by prosumer i.
$\underline{D}_i(t)$  Minimum energy demanded by prosumer i that cannot be shed.
$\delta_i(t)$  Indicator of the sensitivity of prosumer i towards its energy consumption deviation $\overline{D}_i(t) - D_i(t)$.
$\epsilon_i$  Upper bound of prosumer i on the long-term time-averaged load shedding ratio.
$g^{R}_i(t)$  Prosumer i's harvested renewable energy.
$g^{l}_i(t)$  Energy purchased from the utility grid by prosumer i that directly supplies its load.
$g^{s}_i(t)$  Energy purchased from the utility grid that is stored into prosumer i's ESS.
$g^{dis}_i(t)$  Total energy discharged from prosumer i's ESS.
$p(t)$  Unit energy price from the utility grid.
$p^{max}$  Maximum unit energy price from the utility grid.
$p^{min}$  Minimum unit energy price from the utility grid.
$p^{ask}_{es,i}(t)$  Ask price of prosumer i.
$p^{bid}_{eb,i}(t)$  Bid price of prosumer i.
$p^{ET}(t)$  Unit energy trading price.
$g^{ET}_{es,i}(t)$  Energy sold by prosumer i in energy trading.
$g^{ET}_{eb,i}(t)$  Energy bought by prosumer i in energy trading.
$C_i(t)$  Energy cost of prosumer i.
$c^{R}_i(t)$  Average per unit cost of prosumer i's energy available for trading.

The associate editor coordinating the review of this manuscript and approving it for publication was Salvatore Favuzza.

I. INTRODUCTION
Peer-to-peer (P2P) energy trading enabling energy sharing among multiple interconnected distributed energy resource (DER) owners is envisaged to be a next-generation energy management mechanism for collaborative energy communities [1]- [5]. In P2P energy trading communities, proactive electricity consumers, so-called prosumers, with renewable energy production and storage capabilities, actively manage their production and consumption and trade their excess energy with other interested consumers at a relatively cheaper rate. The development of community-based P2P energy trading has the potential to benefit the prosumers in earning revenues, reducing electricity costs and improving returns on investments in distributed generation [1]. On the other hand, the utility grid can benefit from the lowered reliance of the prosumers on the main electricity grid [6]- [8], such as reducing peak demand, improving reliability, etc. In addition, increasing local consumption of renewable generation through energy trading in a community is more attractive than the conventional peer-to-grid (P2G) trading, which could lead to adverse impacts on the utility grid stability [9]. Nonetheless, it is challenging to design a proper energy management mechanism to motivate prosumers, who only aim to maximize their own benefits, to participate in energy trading so as to facilitate a sustainable and reliable balance between the generation and consumption of renewable energy within energy communities.
In this paper, we study the energy trading problem in a community with grid-connected prosumers in close proximity. Specifically, prosumers with excess renewable energy have to decide whether to store the extra energy in their energy storage systems (ESSs) or to sell it to other prosumers, and at what prices, while prosumers with an energy deficit have to decide whether to buy energy from other prosumers, and at what prices, or to purchase energy from the utility grid. Due to the finite capacity of ESSs, all the charging/discharging actions are coupled across time. Inevitably, this inherent time-coupling feature of ESSs affects the energy trading and purchasing actions. Furthermore, the ESS scheduling, energy trading and energy purchasing decisions of a prosumer not only affect each other, but also impact those of other prosumers. In addition, the role of a prosumer in the P2P trading system varies with its energy supply condition, which changes with its time-varying demand and renewable generation. A prosumer has to make energy control and trading decisions based on its current energy supply condition, while its time-varying energy supply condition depends not only on its own energy control and trading decisions but also on the energy trading decisions of other prosumers. Therefore, the energy trading problem and the energy control problem need to be considered jointly, since the energy control and energy trading decisions are interrelated over time.
A properly designed trading pricing mechanism that facilitates energy sharing among prosumers through financial incentives is important for the implementation of the P2P energy trading system. Auctions that handle situations where multiple buyers and sellers bid to exchange a designated good have been widely applied in P2P energy trading studies to model the interactions among autonomous self-interested prosumers. Intensive research and development efforts have been conducted in this regard. Various auction mechanisms presented for P2P energy trading in energy communities in literature can be broadly divided into two categories: (a) autonomous mode, where prosumers trade energy with each other directly under an internal energy price mechanism aiming to optimize their own benefits individually [10]; and (b) coordinated mode, where prosumers trade energy via a third-party entity that coordinates energy trading in a centralized or distributed way aiming to optimize the overall economic benefits of the trading system.
Most of the studies on autonomous energy trading have mainly focused on prosumer-centric interaction mechanisms for decentralized negotiation processes between prosumers and complicated decentralized iterative algorithms are developed to determine trading decisions with short-term optimization models [11]- [15]. A real-time P2P energy trading model with the goal of maximizing per-slot individual welfare is established in [11], [12] without fully exploiting the potential of ESSs in energy cost saving from a long-term point of view. In [13], [14], to minimize daily energy cost, a day-ahead optimal dispatch model is developed to determine hourly trading prices and optimal scheduling of DERs in advance. It is assumed that renewable generation and load demand are perfectly known a day-ahead, which is difficult to achieve in practice due to the stochastic nature of renewable generation and load demand.
On the other hand, the studies on coordinated energy trading have mainly focused on coordination methods for controlling DERs of prosumers [16]- [19] or inducing prosumers to reach optimal solutions (from the community perspective) by indirectly influencing prosumers' energy trading decisions and demand response actions via certain pricing signals [20]- [24]. Energy trading prices are simply defined based on supply-demand ratio (SDR) [18], [19] or mid-market rate (MMR) [21]- [23], which is the mid-value of the buying and selling prices set by the utility grid. However, without considering the utility maximization objective of prosumers, both SDR and MMR pricing mechanisms may not be able to financially incentivize self-interested prosumers to participate in P2P trading.
Given the inherent time-coupling feature of ESSs, the randomness of renewable energy generation, arbitrary changes in load demand and the time-varying energy consumption preferences of individual prosumers, an effective and systematic control strategy is necessary to determine the above-mentioned interrelated decisions optimally. Such a strategy should jointly carry out energy control and energy trading while efficiently adapting to the dynamic changes in the P2P trading system, with the objective of maximizing the benefits of individual prosumers.
Lyapunov optimization, a technique that provides a per-slot optimal solution with low computational complexity for time-average stochastic optimization problems without requiring any knowledge of the probability distributions of the random event processes, has been widely applied in designing online energy management mechanisms, such as demand side management (DSM) and energy storage management [25]-[29]. There have been attempts to employ Lyapunov optimization techniques in studying the joint optimization of energy control and energy trading. A Lyapunov-based online energy sharing method is proposed in [30], aiming to improve the self-sufficiency of a nanogrid cluster (NGC) without considering the economic benefits of individual nanogrids. A joint energy scheduling and trading algorithm based on Lyapunov optimization and a double-auction mechanism is proposed for multiple microgrids with multi-energy resources in [31], mainly focusing on the synergies among various energy systems.
In this paper, energy trading is integrated with DSM and energy storage management for energy cost optimization. The joint energy control and trading optimization problem is addressed in the presence of the randomness of renewable energy generation and arbitrary changes in load demand. The main contributions of this paper are as follows:
• We propose an energy management framework that incorporates energy trading with demand response and energy storage management. Based on Lyapunov optimization, we develop a joint energy control and trading algorithm for each prosumer to individually and dynamically determine its energy trading parameters (how much energy to trade and at what price) along with its energy consumption and storage scheduling decisions, based only on its current energy supply condition and energy consumption preference.
• We design a double auction based energy trading system. The scalable energy trading system combines a double auction mechanism with the Lyapunov-based energy control and trading algorithm, enabling the cost-minimizing bids of individual prosumers, which are decided based on the changes in their individual energy supply conditions over time, to contribute to the decisions of the final trading prices. The energy management actions of the prosumers are accordingly influenced by the final trading prices, which reflect the time-varying energy supply conditions of all prosumers participating in energy trading. Thus, the dynamic online double auction mechanism yields a dynamic energy trading pricing mechanism reflecting the changes in the energy supply state of the system over time. The dynamic price signal is essential to inducing the prosumers to participate in energy trading in a collaborative and coordinated manner by influencing the demand response and energy trading decisions of individual prosumers.
• In the distributed coordination energy trading system, where the bid of each prosumer is private and independent of the bids from other prosumers, the double auction mechanism determines the trading price with little global information, while ensuring the truthfulness of the ask and bid prices that the prosumers submit in energy trading and guaranteeing that the participating prosumers benefit economically from energy trading.

The rest of the paper is organized as follows: Section II briefly discusses the related work. A joint energy control and trading system model is presented in Section III. In Section IV, a double auction based online P2P energy trading system is developed, where a Lyapunov-based online energy control and trading algorithm is integrated with a double auction mechanism. Simulation evaluations are presented in Section V. Finally, Section VI provides concluding remarks.
Notations: The main symbols used in this paper are summarized in List of Main Symbols.

II. BACKGROUND ON COMMUNITY-BASED P2P ENERGY TRADING AND RELATED WORK
Energy trading has been viewed as an effective solution that allows prosumers to trade their surplus renewable generation within their local community market, improves self-consumption and self-sufficiency of local renewable generation, and reduces energy costs. Various bidding policies have been applied to develop P2P energy trading mechanisms and approaches for community-based energy trading. Most of the studies on autonomous P2P energy trading focus on different prosumer-centric interaction mechanisms for negotiation processes between prosumers. In autonomous-mode P2P energy sharing systems, intensive sensing and communication infrastructures are required for direct information exchange between prosumers, and complicated decentralized iterative algorithms are required to determine trading prices and optimal schedules of DERs.
In [11], the authors proposed a game-theoretic real-time P2P energy trading model for small community microgrids with PVs and ESSs, where the seller selection competition among buyers is modeled as an evolutionary game and the direct negotiating interaction between the sellers and buyers is modeled as an M-leader and N-follower Stackelberg game. Two iterative algorithms were used to find the equilibrium states of the games. Such iterative frameworks are subject to divergence concerns and require intensive communication and computation for energy price bidding. In [12], two computationally efficient mechanisms were proposed to construct a stable grand coalition of prosumers and optimize the operation of their ES units cooperatively at each time interval. A two-stage optimization strategy with two iterative algorithms was proposed based on the Nash bargaining theory for day-ahead energy trading and scheduling in [13]. The authors in [14] proposed a multiagent-based framework, where multiple autonomous agents interact, negotiate and cooperate with each other to achieve their individual objectives. An auction-based energy trading mechanism, where sellers provide bids by announcing their available capacities and linear cost models, and passive buyers announce the amount to purchase, was designed to allow agents to dispatch their DERs with a day-ahead dispatch schedule.
Compared to the autonomous mode of energy trading, which involves direct negotiation processes, the coordinated mode requires simpler communication systems (only bidirectional communication links between the central coordinator and prosumers are required) and less data processing. However, well-designed incentive mechanisms are required to encourage prosumers to participate in energy sharing while ensuring the fairness of energy sharing.
Under centralized coordination [16]-[20], while prosumers trade energy with others via a third party, each prosumer directly controls and manages its DER. Hence, direct control is extensively studied. For instance, in [17], the authors integrated DSM with P2P energy trading and proposed a centralized coordinated model to schedule the loads and DERs of smart homes within the connected community, aiming to maximize the benefit of all participants in P2P trading. The authors in [18] proposed a time-ahead internal pricing model, where the internal trading prices are defined as a function of the feed-in tariff of the utility grid and the SDRs of PV prosumers, to allow PV prosumers to carry out price-based demand response after the energy trading prices are set. An energy sharing provider (ESP) was proposed to coordinate the trading processes among prosumers and a distributed iterative algorithm was developed to solve the optimization problem of energy sharing and demand response. However, the iterative pricing process, which is affected by the demand response participation level, might not converge. A variant of the SDR method was used in [19], where an energy storage (ES)-equipped ESP facilitates energy trading among neighboring PV prosumers to ensure cost fairness among the prosumers. Similarly, an MMR trading pricing scheme was used in [20], where a centralized P2P energy trading model was proposed based on cooperative game theory to encourage all peers to form a grand coalition by maximizing the total social welfare of the coalition.
Under distributed coordination [21]- [24], prosumers trade energy with others and manage their loads and DERs via a third party, e.g., a coordinator, which usually influences prosumers' energy trading decisions and demand response actions indirectly via energy trading price signals [32]. The distributed coordination model combines the features of the autonomous mode and the centralized coordination model. Hence, it provides a higher level of privacy and autonomy for prosumers and delivers a higher level of flexibility as compared to the centralized coordination model, while the behaviors of prosumers can be better coordinated to improve social welfare as compared to the autonomous model.
One of the core issues of P2P energy trading under distributed coordination is the pricing mechanism for energy trading. In [21], with the objective of improving the participation of prosumers, the authors developed a coalition game based P2P trading approach employing a rule-based MMR pricing scheme, in which the trading (buying and selling) prices are set based on the grid selling and buying prices depending on the difference between the total surplus energy and the load demands of the prosumers. Similarly, a coalition formation game model was designed utilizing the MMR as a pricing mechanism for P2P trading to form a grand coalition in [22]. The authors proposed a multi-cluster deep reinforcement learning approach in [23] to motivate households to engage in P2P trading through an incentive-driven market mechanism based on the MMR pricing scheme. Despite the simplicity of implementation of the MMR pricing scheme, the resulting energy trading and control decisions may not be utility-maximizing. In [33], the authors proposed a game-theoretic energy trading framework, which combines a double auction mechanism with a non-cooperative game, allowing a number of storage unit owners to strategically and individually decide the amount of stored energy to sell to a number of buyers who need a certain amount of energy, without considering DSM. An iterative algorithm, under which the sellers can reach a Nash equilibrium point, was designed to solve the non-cooperative game, in which the reservation prices/bids of the sellers/buyers are predefined.
In the online P2P energy trading system proposed in this paper, where a double auction mechanism is combined with a Lyapunov based energy control and trading algorithm, each prosumer dynamically decides its cost-minimizing bidding prices and quantities based on the changes in its energy supply condition over time. The cost-minimizing energy management actions of each prosumer are accordingly influenced by the final trading prices, which reflect the time-varying energy supply conditions of all prosumers participating in energy trading. The resulting dynamic energy trading pricing mechanism provides a simple tool to enable efficient distributed coordination of self-interested prosumers.

III. SYSTEM MODEL
In this paper, we consider a smart community with a set $\mathcal{I} = \{1, 2, \ldots, I\}$ of prosumers in close proximity, which are interconnected through bi-directional power links and connected to the utility grid. The prosumers, each of which is equipped with a small-scale renewable energy system and a finite-capacity ESS to store energy for future use, can trade energy with each other via an auctioneer, which manages double auctions among the prosumers. Each prosumer independently determines and submits its bid (the price and amount of energy to sell/buy) to the auctioneer, who computes the trading price based on all the submitted bids using the proposed double auction mechanism. Thus, only bi-directional communications between the auctioneer and the prosumers are required for the auction in the energy trading system. A cloud infrastructure can be a suitable platform to implement the proposed double auction mechanism. As illustrated in Fig. 1, the bids submitted by the prosumers are processed in a cloud using the proposed double auction mechanism and the resulting trading price and quantities are sent to the respective prosumers. The traded energy can be used to serve a prosumer's loads and/or stored into its ESS. The power system operates in slotted time $t \in \{0, 1, \ldots, T - 1\}$.
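To make the bid-collection and price-computation flow concrete, the following minimal sketch clears one time slot of a generic uniform-price double auction. It is not the mechanism proposed in this paper: the `Order` type, the `clear_double_auction` function and the midpoint pricing rule are assumptions chosen for illustration, and this simple rule makes no attempt to guarantee truthful bidding.

```python
from dataclasses import dataclass


@dataclass
class Order:
    prosumer: int    # hypothetical prosumer identifier
    price: float     # price per unit of energy
    quantity: float  # energy offered or requested in one time slot


def clear_double_auction(bids, asks):
    """Generic uniform-price clearing (illustrative, not the paper's rule)."""
    bids = sorted(bids, key=lambda o: o.price, reverse=True)  # highest bid first
    asks = sorted(asks, key=lambda o: o.price)                # lowest ask first
    k = 0
    # Count the leading bid/ask pairs in which the buyer's price covers the seller's.
    while k < min(len(bids), len(asks)) and bids[k].price >= asks[k].price:
        k += 1
    if k == 0:
        return None, 0.0  # no bid meets any ask, so nobody trades
    # Split the spread of the last matched pair (an assumed pricing rule).
    price = (bids[k - 1].price + asks[k - 1].price) / 2.0
    demand = sum(o.quantity for o in bids[:k])
    supply = sum(o.quantity for o in asks[:k])
    return price, min(demand, supply)
```

Each prosumer only sends its own price and quantity to the auctioneer, which matches the bi-directional communication pattern described above; unmatched prosumers would fall back on their ESS or the utility grid.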

A. LOAD DEMAND AND SERVING
The time-varying load demand of each prosumer can be served with energy harvested from its own renewable energy generator, drawn from its ESS, bought from other prosumers, and/or purchased from the utility grid. Note that all power quantities are in the unit of energy per time slot in this paper. We consider a DSM strategy, where flexible loads can be shed in response to supply conditions. Thus, prosumer i's load that is served in time slot t, $D_i(t)$, is bounded by
$$\underline{D}_i(t) \le D_i(t) \le \overline{D}_i(t), \qquad (1)$$
where $\overline{D}_i(t)$ is the maximum energy demanded by prosumer i in time slot t, i.e., the most preferred energy consumption of prosumer i, and $\underline{D}_i(t)$ is the minimum energy demanded by prosumer i in time slot t that cannot be shed. Note that $\overline{D}_i(t)$ and $\underline{D}_i(t)$ are the demand requests decided by prosumer i based on its energy consumption preference. If a prosumer refuses load shedding in time slot t, $\overline{D}_i(t)$ and $\underline{D}_i(t)$ will be the same. The maximum and minimum demand requests of each prosumer in each time slot are assumed to be stochastic.
However, load shedding used for cost saving may cause discomfort to the prosumers. The discomfort experienced by prosumer i is represented by a discomfort cost function weighted by the coefficient $\delta_i(t)$, a positive value that represents the sensitivity of prosumer i towards the power consumption deviation $\overline{D}_i(t) - D_i(t)$ in time slot t: the higher the value of $\delta_i(t)$, the more sensitive prosumer i is towards the power consumption deviation. Meanwhile, in order to control the quality-of-service (QoS) [28] for each prosumer, an upper bound is imposed on the long-term time-averaged load shedding ratio (the ratio of the shed elastic loads to the elastic loads), which can be formally expressed by [34]
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \frac{\overline{D}_i(t) - D_i(t)}{\overline{D}_i(t) - \underline{D}_i(t)} \le \epsilon_i, \qquad (3)$$
where $\overline{D}_i(t) - \underline{D}_i(t)$ is the total elastic load demand that can be shed in time slot t, and $\epsilon_i \in (0, 1]$ is a pre-designed threshold for controlling the QoS. The threshold $\epsilon_i$ reflects the tolerance of prosumer i to the energy consumption deviation. A smaller $\epsilon_i$ indicates a tighter QoS control. Note that $\delta_i(t)$ and $\epsilon_i$ are decided by prosumer i based on its energy consumption preference and $\delta_i(t)$ could vary over time in a stochastic manner.
Prosumer i's harvested renewable energy in time slot t is denoted by $g^{R}_i(t)$. We assume a priority of using the harvested renewable energy $g^{R}_i(t)$ to directly supply $D_i(t)$ and consider the following two cases:
• If $g^{R}_i(t) < D_i(t)$, i.e., energy deficit, all the harvested renewable energy is used to serve the load and the residual, $D_i(t) - g^{R}_i(t)$, can be served by discharging energy, $g^{dis-D}_i(t)$, from its own ESS; buying energy, $g^{eb-D}_i(t)$, from other prosumers; and purchasing energy, $g^{l}_i(t)$, from the utility company in case the energy drawn from its ESS and bought from other prosumers is insufficient. Thus, a balance between purchasing energy and discharging energy must be struck under the following feasibility condition:
$$D_i(t) - g^{R}_i(t) = g^{dis-D}_i(t) + g^{eb-D}_i(t) + g^{l}_i(t). \qquad (4)$$
• If $g^{R}_i(t) \ge D_i(t)$, i.e., energy surplus, prosumer i can store the excess renewable energy into its own ESS and/or sell it to other prosumers. Let $g^{ch-R}_i(t)$ denote the amount of excess renewable energy charged into prosumer i's ESS in time slot t, and let $g^{es-R}_i(t)$ denote the amount of excess renewable energy sold to other prosumers by prosumer i in time slot t. We then have
$$g^{ch-R}_i(t) + g^{es-R}_i(t) \le g^{R}_i(t) - D_i(t). \qquad (5)$$
Note that, due to the finite storage capacity, a portion of the excess renewable energy could be curtailed if there is not enough storage space.
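The demand-side quantities above can be sketched in a few lines. The helper names below are hypothetical, and the linear form of the discomfort cost is an assumption: this excerpt only states that the cost is weighted by the sensitivity coefficient and depends on the deviation between the maximum demand and the served load. The linear form is chosen here because the per-slot problem is later described as a linear program.

```python
def clamp_served_load(d_served, d_min, d_max):
    """Project a served load onto the interval [d_min, d_max]."""
    if d_min > d_max:
        raise ValueError("d_min must not exceed d_max")
    return max(d_min, min(d_served, d_max))


def discomfort_cost(delta, d_max, d_served):
    """ASSUMED linear discomfort cost: delta * (d_max - d_served)."""
    return delta * (d_max - d_served)


def shedding_ratio(d_served, d_min, d_max):
    """Shed elastic load divided by the total elastic load in one slot."""
    elastic = d_max - d_min
    if elastic <= 0:
        return 0.0  # the prosumer refuses shedding, so nothing can be shed
    return (d_max - d_served) / elastic


def average_shedding_ratio(ratios):
    """Finite-horizon stand-in for the long-term time average."""
    return sum(ratios) / len(ratios)
```

A QoS check over a finite horizon would then compare `average_shedding_ratio(...)` with the prosumer's threshold; the long-term limit in the text is what the algorithm enforces asymptotically.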

B. ENERGY STORAGE DYNAMICS
In each time slot, prosumer i can store its own extra renewable energy, $g^{ch-R}_i(t)$, the energy bought through energy trading, $g^{eb-E}_i(t)$, and/or the energy purchased from the utility company, $g^{s}_i(t)$, into its ESS. Each prosumer can then draw the stored energy from its ESS to serve its loads, $g^{dis-D}_i(t)$, and/or sell it to other prosumers, $g^{es-E}_i(t)$. We now consider the energy storage model of the ESS at each prosumer.
In practice, energy conversion losses occur during the charging and discharging processes. Let $S_i(t)$ denote the energy state of prosumer i's ESS, i.e., its state of charge (SoC), at the beginning of time slot t. The SoC evolves from slot to slot according to the storage dynamics (6), driven by the total charging amount $g^{ch}_i(t) = g^{ch-R}_i(t) + g^{eb-E}_i(t) + g^{s}_i(t)$ and the total discharging amount $g^{dis}_i(t) = g^{dis-D}_i(t) + g^{es-E}_i(t)$ in time slot t, scaled by the charging and discharging efficiency coefficients of prosumer i's ESS, respectively.
Note that energy charging and discharging should not happen simultaneously, i.e.,
$$g^{ch}_i(t)\, g^{dis}_i(t) = 0. \qquad (7)$$
Due to the limitation imposed by charging and discharging circuits, the amount of energy that can be charged into or discharged from prosumer i's ESS is upper bounded. The maximum charging and discharging rates of prosumer i's ESS are denoted by $R^{ch}_i$ and $R^{dis}_i$, respectively. We have
$$0 \le g^{ch}_i(t) \le R^{ch}_i, \quad 0 \le g^{dis}_i(t) \le R^{dis}_i. \qquad (8)$$
Charging an ESS near its capacity or discharging it close to zero will significantly reduce its lifetime [35]. Thus, the SoC of prosumer i's ESS in time slot t is bounded by
$$S^{min}_i \le S_i(t) \le S^{max}_i, \qquad (9)$$
where $S^{min}_i$ and $S^{max}_i$ are the preferred lower and upper energy bounds, respectively.
Combining (6), (8) and (9), the amounts of charging and discharging energy in time slot t are bounded by a capacity constraint (10a) and an energy-availability constraint (10b). Note that all feasible control decisions on charging and discharging energy must ensure that both the capacity constraint in (10a) and the energy-availability constraint in (10b) are satisfied in every time slot.
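The storage model can be illustrated with a short sketch. The efficiency convention is an assumption, since the dynamics equation is not recoverable from this excerpt: charged energy is multiplied by a charging efficiency and discharged energy is divided by a discharging efficiency. The function names and default efficiency values are hypothetical.

```python
def soc_step(soc, g_ch, g_dis, eta_ch=0.95, eta_dis=0.95):
    """ASSUMED dynamics: S(t+1) = S(t) + eta_ch * g_ch - g_dis / eta_dis."""
    if g_ch > 0 and g_dis > 0:
        raise ValueError("charging and discharging must not happen simultaneously")
    return soc + eta_ch * g_ch - g_dis / eta_dis


def charge_limit(soc, s_max, r_ch, eta_ch=0.95):
    """Largest charge allowed by both the rate limit and the remaining capacity."""
    return max(0.0, min(r_ch, (s_max - soc) / eta_ch))


def discharge_limit(soc, s_min, r_dis, eta_dis=0.95):
    """Largest discharge allowed by both the rate limit and the stored energy."""
    return max(0.0, min(r_dis, (soc - s_min) * eta_dis))
```

Under this convention, charging or discharging at exactly the returned limit moves the SoC to its upper or lower bound at most, which is the role the combined capacity and energy-availability constraints play in the text.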

IV. ONLINE P2P ENERGY TRADING SYSTEM

A. ENERGY CONTROL AND TRADING ALGORITHM BASED ON LYAPUNOV OPTIMIZATION

1) ENERGY COST MINIMIZATION OF INDIVIDUAL PROSUMERS
In each time slot, each prosumer can purchase energy from the utility company at the time-varying unit price $p(t)$, with $p^{min} \le p(t) \le p^{max}$, to supply its load and/or store in its ESS to take advantage of price variations. In addition, each prosumer can trade energy with other prosumers at the buying or selling price $p^{eb}_i(t)$ or $p^{es}_i(t)$. To encourage energy trading among prosumers so as to reduce conventional energy purchase from the utility company, the buying and selling prices are capped by $p(t)$, i.e., $p^{es}_i(t) \le p(t)$ and $p^{eb}_i(t) \le p(t)$. Thus, the energy cost of prosumer i in time slot t consists of the cost incurred for energy purchase from the utility company, the expense/revenue incurred/generated in energy trading with other prosumers, and the discomfort cost of load shedding.
The objective of each prosumer is to minimize its long-term time-averaged energy cost subject to the time-varying renewable energy generation and load demand along with the operational constraints of its ESS, by jointly managing its energy purchasing, energy trading and energy charging/discharging actions. These decisions in time slot t form the strategy set of prosumer i, denoted by $Y_i(t)$. The optimization problem of prosumer i is then to find a control strategy that determines the optimal strategy set based on its current renewable energy generation, load demand and the SoC of its ESS in each time slot so as to minimize its time-averaged energy cost. This can be formulated as a stochastic control optimization problem, called P1, in which the expectation $\mathbb{E}\{\cdot\}$ is taken with respect to prosumer i's energy supply state $X_i(t)$. We assume that the statistics of $g^{R}_i(t)$, $\overline{D}_i(t)$ and $\underline{D}_i(t)$ are unknown and their dynamics are arbitrary. Taking into account the system dynamics, the stochastic optimization problem P1 seeks control decisions for the whole process. However, the control actions $Y_i(t)$ are correlated over time due to the time-coupling constraints, which makes P1 a particularly challenging problem to solve.
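For orientation, one plausible way to write the per-slot cost and the problem P1 from the verbal description above is sketched below. This is a reconstruction under stated assumptions, not the paper's own display: the symbol $C_i(t)$, the aggregates $g^{eb}_i(t) = g^{eb-D}_i(t) + g^{eb-E}_i(t)$ and $g^{es}_i(t) = g^{es-R}_i(t) + g^{es-E}_i(t)$, and the linear discomfort term are all assumed.

```latex
% ASSUMED reconstruction from the prose; not copied from the paper.
\begin{align*}
C_i(t) &= p(t)\,\bigl[g^{l}_i(t) + g^{s}_i(t)\bigr]           % purchase from the utility
        + p^{eb}_i(t)\,g^{eb}_i(t) - p^{es}_i(t)\,g^{es}_i(t) % trading expense minus revenue
        + \delta_i(t)\,\bigl[\overline{D}_i(t) - D_i(t)\bigr], % discomfort of load shedding
\\[4pt]
\textbf{P1:}\quad
&\min_{\{Y_i(t)\}} \; \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\{C_i(t)\}
\quad \text{s.t. the load-serving, QoS and ESS constraints of Section III.}
\end{align*}
```

The time-coupling difficulty mentioned above enters through the ESS constraints, because the SoC at slot $t+1$ depends on the charging and discharging decisions taken at slot $t$.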

2) ONLINE ENERGY CONTROL AND TRADING ALGORITHM DESIGN
The Lyapunov optimization theory [36] provides simple online solutions based on the current information of the system state as opposed to approaches like Markov decision processes and dynamic programming, which require statistical information of the random variables for forecasting future information and suffer from high computational complexity. In this paper, the Lyapunov drift optimization theory is applied to solve the time-coupling optimization problem P1.
Employing the concept of one-slot look-ahead queue stability to handle the time-coupling constraints through successive problem relaxation and transformation, the Lyapunov-based optimization algorithm determines the control vector $Y_i(t)$ for each prosumer in each time slot based only on its current energy supply state $X_i(t)$, without requiring any statistical knowledge of its renewable energy generation and load demand.
In general, for complex dynamic systems, time-averaged constraints are transformed into queue stability constraints, and simple real-time algorithms can be constructed based on the virtual queues to achieve system optimization using the Lyapunov optimization theory. However, the constraint in (9) couples the charging and discharging decisions across time slots, so the standard Lyapunov optimization technique is not directly applicable to the problem P1. To overcome such time-coupling, the constraint (6) can be relaxed to the soft constraint (13). Instead of bounding the energy state, $S_i(t)$, in each time slot, (13) maintains the stability of the mean rate of the effective charging and discharging amounts over the whole process. The derivation of (13) follows the framework of Lyapunov optimization [36] and is given in our previous work [37].
In the relaxed problem P2, the dependency of the per-slot control decisions on the battery state is removed, so that standard Lyapunov optimization techniques can be applied. As in [37], we now introduce a virtual energy queue $E_i(t)$ and a QoS-control load queue $Q_i(t)$ to transform the time-averaged constraints (13) and (3) in P2 into queue stability constraints. The virtual energy queue is defined as $E_i(t) = S_i(t) - \theta_i$, where $\theta_i$ is a perturbation parameter that can be designed to guarantee that the energy state constraint in (9) is satisfied. Because $\theta_i$ is a constant, the dynamics of $E_i(t)$, given in (15), follow directly from the SoC dynamics. The QoS-control load queue $Q_i(t)$, defined in (16), has the shedding percentage as its arrival rate and $\epsilon_i$, the upper bound on the long-term time-averaged load shedding ratio, as its departure rate. For $Q_i(t)$ to be stable, the time-averaged load shedding percentage must be less than or equal to $\epsilon_i$. Hence, maintaining the stability of $Q_i(t)$ is equivalent to keeping the constraint (3) satisfied [36]. We then define $\Theta_i(t) \triangleq [E_i(t), Q_i(t)]$ as the concatenated vector of the virtual queues and associate with it a quadratic Lyapunov function $L_i(\Theta_i(t))$. In a decision-making algorithm that minimizes the drift of the quadratic Lyapunov function of $E_i(t)$, keeping the Lyapunov function small pushes the value of $S_i(t)$ towards $\theta_i$. Hence, carefully choosing the value of the perturbation parameter ensures that the energy state of the ESS always lies in the feasible region.
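The two virtual queues can be sketched as follows. The shifted energy queue follows the definition in the text; the QoS queue update and the exact quadratic Lyapunov function are assumptions based on the standard virtual-queue construction, because their displays are not recoverable from this excerpt. All function names are hypothetical.

```python
def energy_queue(soc, theta):
    """Virtual energy queue: the state of charge shifted by the perturbation theta."""
    return soc - theta


def qos_queue_step(q, shed_ratio, epsilon):
    """ASSUMED update: Q(t+1) = max(Q(t) - epsilon, 0) + shed_ratio."""
    return max(q - epsilon, 0.0) + shed_ratio


def lyapunov(e, q):
    """ASSUMED quadratic Lyapunov function 0.5 * (E^2 + Q^2)."""
    return 0.5 * (e * e + q * q)


def simulate_qos_queue(shed_ratio, epsilon, slots):
    """Run the QoS queue from empty with a constant shedding ratio."""
    q = 0.0
    for _ in range(slots):
        q = qos_queue_step(q, shed_ratio, epsilon)
    return q
```

With a constant shedding ratio below the threshold the queue stays bounded, while a ratio above the threshold makes it grow without limit, which is why queue stability stands in for the time-averaged QoS constraint.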
Define the conditional one-slot Lyapunov drift, which represents the expected change in the Lyapunov function from one time slot to the next, as follows: where the expectation is taken with respect to all the random processes associated with the energy supply state, , given the current virtual queue states of E i (t) and Q i (t).
We now incorporate a weighted version of the time-averaged energy cost into the Lyapunov drift and obtain the following drift-plus-penalty expression: where the Lyapunov drift in the first term represents the stability of the virtual energy queue, and V i in the second term serves as a weight controlling the performance tradeoff between minimizing the queueing delay and minimizing the energy cost. A larger V i gives greater priority to minimizing the energy cost at the expense of a larger virtual energy queue, and vice versa.
Based on the drift-plus-penalty minimization method [36], the control decisions are chosen to minimize the upper bound on the drift-plus-penalty expression, which is given in Lemma 1, to jointly maintain the stability of the virtual energy queue and minimize the time-averaged energy cost of prosumer i.
Lemma 1: For any possible control decision, the drift-plus-penalty expression for all t is upper bounded by:
The energy control and trading algorithm is then constructed as follows: in each time slot t, the control decision Y i (t) of each prosumer i is determined based on its current virtual queue state [E i (t), Q i (t)] and energy state X i (t) by solving the following linear programming problem P3: s.t. (4) (5) (7) (8) (15) (16).
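To make the role of the weight V i concrete, the following toy sketch applies the drift-plus-penalty idea to a heavily simplified prosumer that can only discharge its battery or buy from the grid. It is not the problem P3: the objective, the candidate grid of actions and the battery convention (the queue loses `discharge / eta_dis` while the load receives `discharge`) are assumptions made purely for illustration, and the function names are hypothetical.

```python
def drift_plus_penalty(E, V, price, load, discharge, eta_dis=0.9):
    """Per-slot objective: (queue state x queue change) + V x (grid cost).

    ASSUMPTION: discharging `discharge` units serves that much load and
    lowers the virtual energy queue by discharge / eta_dis.
    """
    grid_purchase = max(load - discharge, 0.0)
    queue_change = -discharge / eta_dis
    return E * queue_change + V * price * grid_purchase


def best_discharge(E, V, price, load, candidates, eta_dis=0.9):
    """Pick the candidate discharge amount with the smallest objective."""
    return min(candidates,
               key=lambda d: drift_plus_penalty(E, V, price, load, d, eta_dis))


if __name__ == "__main__":
    options = [0.0, 0.5, 1.0]
    # Same queue state, two different weights on the energy cost.
    print(best_discharge(E=-2.0, V=1.0, price=1.5, load=1.0, candidates=options))
    print(best_discharge(E=-2.0, V=2.0, price=1.5, load=1.0, candidates=options))
```

With the virtual queue at the same negative value, the smaller weight keeps the battery untouched and buys from the grid, while the larger weight discharges to avoid the purchase. This mirrors the statement that a larger V i favors cost minimization at the expense of the queue.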
Selling energy in energy trading reduces a prosumer's instantaneous energy cost in a time slot. However, an optimal solution to P3 could lead to a situation in which a prosumer sells too much energy to other prosumers at relatively low trading prices and then has to purchase more energy from the utility company to serve its own load demand later on. To address this over-selling problem, the accumulated energy gap between prosumer i's stored energy available for trading and its discharged energy for load serving is bounded: In time slot t, if EG i (t) < φ i , prosumer i will not sell its stored energy in energy trading, i.e., g es−E i (t) = 0.
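The selling rule stated above amounts to a simple per-slot guard. The sketch below assumes that the accumulated energy gap EG i (t) has already been computed elsewhere (its defining equation is not reproduced here) and only shows how the threshold φ i gates the sale of stored energy. The function and argument names are hypothetical.

```python
def stored_energy_offer(energy_gap: float, threshold: float,
                        desired_sale: float) -> float:
    """Stored energy a prosumer may offer for trading in the current slot.

    energy_gap   -- accumulated energy gap EG_i(t), computed elsewhere
    threshold    -- the bound phi_i
    desired_sale -- amount the prosumer would otherwise sell from storage
    """
    if energy_gap < threshold:
        # Selling is blocked: g^{es-E}_i(t) = 0.
        return 0.0
    return desired_sale
```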

B. DOUBLE AUCTION MECHANISM
Due to the intermittent renewable energy generation, time-varying electricity demand and finite energy storage capacity, a prosumer might be unable to meet its load demand with only its own DER. In this respect, the prosumers with an energy deficit can acquire energy from other prosumers who are willing to sell their extra renewable energy or the energy stored in their ESSs. In each time slot, each prosumer determines its ask/bid price and the amount of energy to sell/buy, with which it is willing to participate in energy trading with other prosumers, by solving the optimization problem P3 based only on its current energy supply state. The prosumers who are willing to trade energy with others report their ask/bid prices, which are given in Lemma 2, to the auctioneer, who helps the prosumers determine the trading price and the amounts of energy to trade using a double-auction mechanism based on the strategy-proof double auction schemes proposed in [33], [38].
Lemma 2: In time slot t, the ask/bid price p ask es,i (t)/p bid eb,i (t) of prosumer i is given by where c R i (t) is the average per unit cost of prosumer i's stored energy available for trading in time slot t, which refers to the total cost of stored energy available for trading divided by the amount of stored energy available for trading, i.e., , where c DER i is the per unit cost of prosumer i's stored renewable energy, which reflects prosumer i's capital costs and maintenance costs of its renewable energy generator and ESS.
Proof: See Appendix B.
In time slot t, assume that M prosumers, referred to as potential sellers, submit their offers stating the amounts of energy they want to sell and at what prices, and N prosumers, referred to as potential buyers, submit their bids stating the amounts of energy they seek to acquire and at what prices. The M potential sellers and N potential buyers form a P2P energy trading market. Note that M and N vary in each time slot and could be zero.
The auctioneer first sorts the offer prices in increasing order and the bid prices in decreasing order as follows:
$$p^{ask}_{es,1}(t) < p^{ask}_{es,2}(t) < \dots < p^{ask}_{es,m}(t) < \dots < p^{ask}_{es,M}(t),$$
$$p^{bid}_{eb,1}(t) > p^{bid}_{eb,2}(t) > \dots > p^{bid}_{eb,n}(t) > \dots > p^{bid}_{eb,N}(t),$$
to determine the ask and bid intersection point, which corresponds to a seller $K$ and a buyer $L$ with $p^{ask}_{es,K}(t) > p^{bid}_{eb,L}(t)$ and $p^{ask}_{es,K-1}(t) \le p^{bid}_{eb,L-1}(t)$. Once the intersection point is identified, it implies that the sellers $\mathcal{K} = \{1, 2, \dots, K-1\}$ and the buyers $\mathcal{L} = \{1, 2, \dots, L-1\}$ will participate in energy trading and the trading price can be selected within the interval $[p^{ask}_{es,K}(t), p^{bid}_{eb,L}(t)]$ [39]. As in [33], the $K-1$ participating sellers and $L-1$ participating buyers will exchange energy at a trading price given by
$$p^{ET}(t) = \frac{p^{ask}_{es,K}(t) + p^{bid}_{eb,L}(t)}{2}. \tag{22}$$
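The sorting and midpoint pricing steps can be sketched as follows. The sketch makes two assumptions that should be read as such: sellers and buyers are paired by rank, so that K = L, and the marginal pair is taken to be the last rank at which the sorted ask does not exceed the sorted bid, so that the midpoint price falls inside the interval between the marginal ask and the marginal bid. The marginal pair sets the price but is excluded from trading. The function name `clearing_price` is hypothetical.

```python
def clearing_price(asks, bids):
    """Return (trading_price, number_of_trading_pairs), or None if no match.

    ASSUMPTIONS: rank-wise pairing (K = L) and a marginal pair defined as
    the last rank with ask <= bid; that pair determines the price but does
    not trade.
    """
    asks_sorted = sorted(asks)                # offer prices, increasing
    bids_sorted = sorted(bids, reverse=True)  # bid prices, decreasing
    marginal = 0  # 1-based rank of the marginal seller/buyer pair
    for rank, (ask, bid) in enumerate(zip(asks_sorted, bids_sorted), start=1):
        if ask <= bid:
            marginal = rank
        else:
            break
    if marginal == 0:
        return None  # even the cheapest ask is above the highest bid
    price = (asks_sorted[marginal - 1] + bids_sorted[marginal - 1]) / 2
    return price, marginal - 1


if __name__ == "__main__":
    # Hypothetical prices per kWh for three sellers and three buyers.
    print(clearing_price([1.0, 1.2, 1.5], [1.6, 1.3, 1.1]))
```

In the hypothetical call, the second-ranked seller and buyer form the marginal pair, so only the first-ranked pair trades, at the midpoint of the marginal ask and bid.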
Note that $p^{ask}_{es,K}(t) < p^{ET}(t) < p^{bid}_{eb,L}(t)$. Given the trading price in (22), the participating sellers $\mathcal{K}$ and buyers $\mathcal{L}$ redetermine the amounts of energy to sell and buy, $g^{adj}_{es,k}(t)$ and $g^{adj}_{eb,l}(t)$, respectively, by solving the optimization problem P3. Then, in order to match the supply and demand, the market clearing scheme developed in [38] is adopted to decide the amount of energy traded between each of the $K-1$ participating sellers and $L-1$ participating buyers. The scheme is given in the following two rules, under which the total energy demand and supply balance while the double auction remains strategy-proof.
• Rule 1: if the total supply exceeds the total demand, the amount of energy traded by each participating seller $k \in \mathcal{K}$ and each participating buyer $l \in \mathcal{L}$ is given by
• Rule 2: if $\sum_{k=1}^{K-1} g^{adj}_{es,k}(t) < \sum_{l=1}^{L-1} g^{adj}_{eb,l}(t)$, i.e., the total demand exceeds the total supply, the amount of energy traded by each participating seller $k \in \mathcal{K}$ and each participating buyer $l \in \mathcal{L}$ is given by
As elaborated in [38], using Rule 1 or Rule 2 to balance the demand and supply among the participating sellers and buyers, we have
Lemma 3: In each time slot, no prosumer participating in the energy trading auction benefits by deviating from its truthful offer price $p^{ask}_{es,i}(t)$ or bid price $p^{bid}_{eb,i}(t)$ given in (20) and (21), and the double auction is strategy-proof.
Proof: See Appendix C. After balancing the demand and supply, the participating sellers and buyers decide their optimal control actions Y i (t) using the finalized trading price, p ET (t), given in (22), and the quantities of energy to trade, g ET es,k (t)/g ET eb,l (t), obtained by (23)/ (24).
The online P2P energy trading system is summarized in Fig.2. In summary, by transforming the original stochastic control optimization problem P1 into the linear programming problem P3, the Lyapunov-based energy control and trading algorithm provides a low-complexity way for each prosumer to independently determine its energy control and energy trading decisions. Each prosumer solves its optimization problem P3 on a per-slot basis, with all information obtained locally or through simple two-way communication with the auctioneer, and without requiring any statistical information about the system. The integration of the energy control algorithm with the double auction mechanism allows each prosumer to determine its energy management decisions in response to its own energy supply condition as well as the collective energy supply condition of other prosumers, which is reflected through the energy trading price. The proposed energy trading framework is relatively simple to implement and can cope with an arbitrary number of prosumers while preserving the privacy of the prosumers.

C. PERFORMANCE ANALYSIS
Since the time-coupling constraint (6) is replaced with the time-average constraint (13), the solution to P3 might not be feasible for P1. The following lemma shows that the boundedness of the energy states (8) in P1 can be satisfied by appropriately designing the perturbation parameter, θ i , and the control parameter, V i , so that the solution to P3 meets all constraints of P1. Thus, the control decisions Y(t) derived from P3 are feasible for P1.
Lemma 4: Set the perturbation parameter θ as 42924 VOLUME 10, 2022 where Then, under the energy control and trading algorithm, we have 1) In each time slot t, i.e., the control decision Y i (t) derived from P3 is feasible to P1.
2) The resulting time-averaged cost under the proposed energy control and trading algorithm by solving P3, . Proof: See Appendix D. Lemma 4.1 indicates that the control decisions Y i (t) derived by the proposed energy control and trading algorithm are a feasible set of P1. Lemma 4.2 further characterizes the gap between the expected time-averaged energy cost achieved by P3 and the optimal energy cost of P1, which implies that this performance gap can be minimized by setting the control parameter V as V max

V. NUMERICAL SIMULATION

A. SIMULATION SETUP
In order to evaluate the effectiveness of the proposed energy trading system, a residential microgrid consisting of 10 interconnected prosumers with solar systems is considered. The prosumers are classified into 3 types: Type I with low electricity consumption and PV generation capacity, Type II with medium electricity consumption and PV generation capacity, and Type III with high electricity consumption and PV generation capacity. The prosumers of the same type produce a similar amount of solar energy in each time slot, while the load demand profiles of different prosumers, which result from the operation of various appliances, vary temporally, as shown in Fig.3. The time-varying energy consumption of household appliances is simulated using the appliance demand profile generator (ADPG) developed in [37] to synthesize the variability in the load demands among prosumers at different times of day. For each prosumer, the total load demand generated by the ADPG in each time slot is used as the most preferred load request D i (t), while the inelastic load D i (t) is randomly set from [0.3D i (t), 0.9D i (t)]. The QoS related parameters δ i (t) and i are randomly selected from [1.8, 3.8] and [0.4, 0.9], respectively. The values of the parameters δ i (t) are chosen to ensure that the weighted discomfort cost is comparable to the energy cost in the objective function of the optimization problem P1, so that both the energy cost and the discomfort cost are active factors in the optimization problem. We randomly generate 10 prosumers: 3 Type I prosumers with 26.30 kWh of average load demand and 8.51 kWh of average solar generation per day, 3 Type II prosumers with 31.97 kWh of average load demand and 14.83 kWh of average solar generation per day, and 4 Type III prosumers with 39.11 kWh of average load demand and 29.40 kWh of average solar generation per day.
Note that there are just slight differences in the load profiles of different types of prosumers, while the solar generation profiles of different types of prosumers differ considerably. For the sake of easy comparison, the average monthly solar generations and load demands of individual prosumers are listed in Table 1. In addition, the corresponding average monthly costs without any DSM mechanism are listed in Table 1 as lower benchmarks.
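As a quick arithmetic check of how the three prosumer types differ, the short sketch below computes the ratio of average daily solar generation to average daily load demand for each type, plus the community-wide ratio, from the per-type averages quoted in the setup. The dictionary layout and function names are hypothetical; only the numbers come from the text.

```python
# Average daily figures per prosumer type, as quoted in the simulation setup.
PROSUMER_TYPES = {
    "Type I":   {"count": 3, "load_kwh": 26.30, "solar_kwh": 8.51},
    "Type II":  {"count": 3, "load_kwh": 31.97, "solar_kwh": 14.83},
    "Type III": {"count": 4, "load_kwh": 39.11, "solar_kwh": 29.40},
}


def pv_to_load_ratio(spec):
    """Share of the average daily load that average solar output could cover."""
    return spec["solar_kwh"] / spec["load_kwh"]


def community_totals(types):
    """Total average daily load and solar generation over all prosumers."""
    load = sum(s["count"] * s["load_kwh"] for s in types.values())
    solar = sum(s["count"] * s["solar_kwh"] for s in types.values())
    return load, solar


if __name__ == "__main__":
    for name, spec in PROSUMER_TYPES.items():
        print(name, round(pv_to_load_ratio(spec), 2))
    load, solar = community_totals(PROSUMER_TYPES)
    print("community", round(solar / load, 2))
```

The ratios come out at roughly 0.32, 0.46 and 0.75 for Types I, II and III, which is consistent with the remark that the solar generation profiles differ far more across types than the load profiles do.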
Each prosumer is equipped with a battery with S max i listed in Table 1. The simulation is performed for a duration of 90 days with T = 2160, using the Time-of-Use tariff of Johannesburg City Power, in which the peak, standard and off-peak energy prices are R1.7268, R1.3660 and R1.0746 per kWh, respectively. Fig.4 and Fig.5 illustrate how the trading price and the traded energy are influenced by the energy supply condition of the system and the utility grid prices. As can be observed, the spikes in the trading prices coincide with a) the drops in the stored energy; and b) the rises in the energy deficit. In other words, the P2P trading price reflects the changes in the energy supply condition of the system: the more energy is available for trading, the lower the trading price, which encourages local energy trading and consumption, as shown in Fig.5. In contrast, when less energy is available for trading, the trading price rises, which in turn induces the prosumers to adjust their energy consumption in response to the higher trading price. Additionally, the trading prices, which are capped by the utility grid prices, fluctuate with the utility grid prices.
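The horizon and tariff used in the setup can be made concrete with a small sketch. The three prices are the ones quoted above, and T = 2160 corresponds to 90 days of hourly slots. The assignment of hours to the peak, standard and off-peak periods is not given in the text, so the schedule in `tariff_period` is a purely hypothetical placeholder, as are the function names.

```python
TARIFF_R_PER_KWH = {"peak": 1.7268, "standard": 1.3660, "off_peak": 1.0746}

DAYS = 90
SLOTS_PER_DAY = 24
T = DAYS * SLOTS_PER_DAY  # number of hourly time slots in the simulation


def tariff_period(hour_of_day: int) -> str:
    """HYPOTHETICAL hour-to-period mapping, for illustration only."""
    if 7 <= hour_of_day < 10 or 18 <= hour_of_day < 20:
        return "peak"
    if hour_of_day >= 22 or hour_of_day < 6:
        return "off_peak"
    return "standard"


def grid_cost(purchases_kwh):
    """Cost in rand of hourly grid purchases; slot 0 is midnight of day 1."""
    return sum(kwh * TARIFF_R_PER_KWH[tariff_period(t % SLOTS_PER_DAY)]
               for t, kwh in enumerate(purchases_kwh))
```

Under this placeholder schedule, shifting a purchase from a peak slot to an off-peak slot saves the difference between the two quoted prices per kWh, which is the incentive that the storage scheduling exploits.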

B. SIMULATION RESULTS AND ANALYSIS
The traded energy can be classified into the following four cases: 1. the sold PV generation is used to serve buyers' loads directly; 2. the sold PV generation is stored into buyers' ESSs; 3. the sold energy from sellers' ESSs is used to serve buyers' loads; 4. the sold energy from sellers' ESSs is stored into buyers' ESSs. It can be observed in Fig.5 that, while the incidence of case 1 is much lower because net demand patterns among prosumers are relatively similar, a large portion of surplus energy production is traded in case 2, taking advantage of the increased flexibility brought by the ESSs. In particular, more energy is traded in case 2 in response to higher utility grid prices. As for the stored energy, a large portion of the energy bought at relatively cheaper prices in energy trading is used for load serving (case 3) to reduce the energy costs of buyers. Since charging and discharging cannot happen simultaneously, under the proposed energy control algorithm a prosumer can sell its stored energy only in the case of an energy deficit, where the energy available for discharging is allocated to load serving first. Hence, only a smaller portion of the stored energy is sold in case 4 when the grid prices peak, so that both sellers and buyers benefit from energy trading.
To verify the effectiveness of the proposed energy trading mechanism, comparisons are drawn with the scenario without energy sharing, where each prosumer operates independently under a similar Lyapunov-based energy control algorithm and does not share energy with the others. Fig.6 compares the real-time energy storage scheduling and energy purchasing actions with and without energy trading. As shown in Fig.6, since stored energy in ESSs can be traded between prosumers, the proposed energy trading mechanism allows more surplus solar generation to be stored into the ESSs, which in turn reduces the storage space available for the prosumers to store energy purchased from the utility company. Moreover, under the proposed energy trading mechanism, a prosumer with an energy deficit is able to serve its load not only with its stored energy but also with energy bought via energy trading, thereby purchasing less energy from the utility company, as shown in Fig.5.
Although the role of each prosumer in energy trading dynamically changes with its net demand, it can be observed in Fig.7(b) that, in energy trading, the Type I prosumers, with low ratios of PV generation to demand, buy more energy than they sell, whereas the Type III prosumers, who produce more solar energy, sell more energy. In addition, the energy traded by a prosumer depends not only on its energy generation and storage capacities but also on other prosumers' real-time net demands. In particular, prosumer 1, with a larger storage capacity, buys more energy than other Type I prosumers, while prosumer 7, with a lower ratio of PV generation to demand, sells more energy than other Type III prosumers. Fig.7(c) further demonstrates that the prosumers with lower energy generation reduce their energy purchases from the utility company after buying more energy via energy trading, while the prosumers selling more energy via energy trading are able to keep enough storage space to store their surplus solar energy with almost zero solar generation curtailment, as shown in Table 1, which in turn lowers their reliance on the utility grid. As illustrated in Fig.7(c), thanks to energy trading, the energy that each individual prosumer purchases from the utility grid is reduced by 2.03%-15.67%, and the total energy purchased from the utility company is reduced by 8.17%, compared to the scenario without energy sharing. Accordingly, the monthly energy consumption cost of each prosumer decreases by 4.80%-15.93%, as shown in Fig.7(d).
To evaluate the fairness of the proposed energy trading mechanism, it is compared with a similar energy trading mechanism without an energy selling control scheme. As shown in Fig.7, without a scheme to control energy selling in energy trading, the Type III prosumers with more energy available for trading sell as much energy as possible, so they have to purchase 5.83%-9.41% more energy at higher prices from the utility company to serve their loads, which leads to a 0.77%-3.89% increase in their energy costs compared to the proposed energy trading mechanism. In contrast, due to the incorporation of constraint (19), the proposed energy control and trading algorithm ensures that the Type III prosumers, who contribute more energy, benefit more (11.84%-15.93% cost reduction compared to the scenario without energy sharing) in energy trading compared to others, as can be observed in Fig.7(d).
According to Table 1, compared with the lower benchmark case, the independent energy control algorithm reduces the monthly energy costs of the prosumers by 29.83%-54.47% with 6.82%-9.62% load demands being shed and 0.05%-17.46% solar production being curtailed. Under the proposed energy trading framework, each prosumer achieves a further 2.03%-15.67% cost reduction with a slightly lower demand shedding rate (6.53%-9.62%) and near-zero solar generation curtailment (0-3.69%).

VI. CONCLUSION
This paper studies the real time energy trading problem in smart energy communities and presents a double auction based energy trading system that integrates energy consumption management, energy storage control and energy trading, aiming to minimize the long-term time-averaged costs of individual prosumers while maintaining customer comfort. Based on the Lyapunov theory, we propose an online energy control and trading algorithm, under which each prosumer optimizes its energy trading decision along with its energy consumption management and storage charging/discharging decisions in an independent manner without requiring any statistical knowledge of the system. Since the double auction mechanism enables all prosumers participating in energy trading to contribute to the decisions of the trading prices, under the proposed energy control and trading algorithm, the energy management decisions of each prosumer depend not only on its own energy supply condition, but also on the energy supply conditions of others. Numerical evaluations provide a more comprehensive insight into the interactions among the self-interested prosumers with various energy generation and storage capacities and diverse load demand profiles. Simulation results show that, compared to the scenario without energy sharing, energy exchange via energy trading reduces the energy costs of individual prosumers and improves the utilization efficiency of local renewable generation, while ensuring the fair sharing of the benefits reaped from energy trading. Since the proposed online energy control and trading algorithm allows each prosumer to independently control its DER and determine its bid on a per slot basis, prosumers are able to freely join or leave the proposed scalable energy trading system anytime without increasing computational complexity.

Proof of Lemma 1:
According to the definition of L i ( i (t)), Based on the queue update rule in (15), we have Similarly, based on the queue update rule of Q i (t) in (16), Applying inequalities (30) and (31) to (29), taking the conditional expectation over L i ( i (t + 1)) − L i ( i (t)) given i (t) and adding the penalty term V i E{C i (t)} yield the upper bound in (17).

Proof of Lemma 2
We first rearrange the optimization problem P3 to P4 : min We consider the following two cases: • Energy Deficit: when g R i (t) < D i (t), according to (4), we have g l (t) = 0 and g es−R i (t) = 0. Then, the optimization problem P4 can be written as follows: As can be seen is the average per unit cost of prosumer i's energy available for trading. In case Since energy charging and discharging cannot happen simultaneously, prosumer i is not able to buy energy via energy trading to store into its battery, i.e., g eb−E (t) = 0. Then, the optimization problem P4 can be written as follows: As can be seen In case E i (t) ≥ 0, we have g ch−R i (t) = 0, i.e., prosumer i's surplus renewable energy cannot be stored into its battery. Prosumer i tends to increase g es−R i (t) by choosing the lowest possible ask price, i.e., the average per unit cost of prosumer i's energy available for trading, c R i (t), to reduce its energy cost and avoid the waste of renewable energy. On the other hand, in case E i (t) < 0, prosumer i tends to increase g ch−R i (t) and g es−R i (t) with any ask price p ask es,i (t) ≥ c R i (t). To encourage energy trading among prosumers and thereby reduce conventional energy purchases from the utility company, the ask price is set as the lowest possible price: p ask es,i (t) = c R i (t).

Proof of Lemma 3
Prosumers are assumed to be rational so that each prosumer chooses a strategy that minimizes its energy cost. The proof of strategy-proofness is the same as Vickrey's argument. In the proposed double-auction mechanism, in each time slot, the ask/bid price submitted by prosumer i, p ask es,i (t)/p bid eb,i (t), is its reservation price that minimizes its energy cost, and the trading price p ET (t) determined by the double-auction mechanism is the actual energy trading price. Suppose prosumer i submits ξ p bid eb,i (t) as its bid price. In case of p bid eb,i (t) > p ET (t), i.e., prosumer i is supposed to win the trade, overbidding, i.e., ξ > 1, results in the same benefit as if it bids p bid eb,i (t). On the other hand, underbidding, i.e., ξ < 1, may cause it to lose the trade. Even if it wins the trade with ξ p bid eb,i (t), it gets the same benefit as if it bids p bid eb,i (t). In case of p bid eb,i (t) < p ET (t), overbidding may lead to a negative benefit if prosumer i's bid is included in the final trade, or to zero benefit, which is the same as if it bids p bid eb,i (t), if its bid is not included. Conversely, underbidding just leads to zero benefit, which is the same as if it bids p bid eb,i (t). For the same reasons, each prosumer who intends to sell its energy will report its true reservation price.
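The case analysis above can be checked numerically with a small sketch. It assumes, as the argument does, that a buyer's own bid does not change the trading price, that a buyer trades exactly when its submitted bid is above that price, and that its benefit per unit is its true reservation price minus the trading price. The function name and the numbers in the example are hypothetical.

```python
def buyer_benefit(reservation_price, submitted_bid, trading_price, quantity=1.0):
    """Benefit of a buyer whose bid does not influence the trading price.

    ASSUMPTION: the buyer is included in the trade exactly when its
    submitted bid exceeds the trading price.
    """
    if submitted_bid > trading_price:
        return (reservation_price - trading_price) * quantity
    return 0.0


def truthful_is_weakly_best(reservation_price, trading_price, scalings):
    """Compare bidding xi * reservation_price against truthful bidding."""
    truthful = buyer_benefit(reservation_price, reservation_price, trading_price)
    return all(
        buyer_benefit(reservation_price, xi * reservation_price, trading_price)
        <= truthful
        for xi in scalings
    )


if __name__ == "__main__":
    scalings = [0.5, 0.8, 1.0, 1.2, 1.5]
    print(truthful_is_weakly_best(1.5, 1.2, scalings))  # reservation above price
    print(truthful_is_weakly_best(1.0, 1.2, scalings))  # reservation below price
```

In both situations no scaled bid does better than the truthful one: underbidding can only forfeit a profitable trade, and overbidding can only add an unprofitable one.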

Proof of Lemma 4
Proof of Lemma 4.1: The per-slot problem P3 includes all constraints of the original problem P1 except for the energy state constraint. Hence, proving that the solution derived from P3 is feasible for P1 amounts to showing that the energy state of prosumer i, S i (t), is bounded within [S min i , S max i ]. The proof proceeds by induction. First, it is obvious that the lower and upper bounds hold for t = 0. We now suppose that S min (t), g l * i (t), g s * i (t) and D * i (t) be the optimal solution to (33) and (34). Note that D * i (t) does not directly affect the virtual energy queue E i (t). Hence, D * i (t) can be treated as a given load when determining how to schedule prosumer i's battery charge/discharge. We now study the energy deficit and energy surplus cases separately.
• Energy Deficit: We prove the upper and lower bounds considering the following cases: -Case 1: E i (t) ≥ −V i p es i (t)/η dis i , as p es i (t) < p(t), we have g s * i (t) = 0 and 0 < g es−E * i (t) + g dis−D * i (t) ≤ R dis i . Based on the update equation (15), we have E i (t + 1) < E i (t) ≤ S max i − θ i ; In addition, as p es i (t) < p(t) < p max , we have E i (t) > −V i p max /η dis i . Then, we get On the other hand, as E i (t) < 0, similar to Case 2, we have E i (t + 1) < η ch i R ch i ≤ S max i − θ i .

Proof of Lemma 4.2:
The proof of the performance boundary follows the performance result derivation in the Lyapunov optimization framework and is similar to that of our previous work. Interested readers may refer to [37] for details.
HAILING ZHU received the M.Ing. degree in electrical and electronic engineering and the Ph.D. degree in engineering management from the University of Johannesburg, South Africa. She is currently a Research Fellow with the Department of Electrical and Electronic Engineering Science, University of Johannesburg. Her research interests include game theory, resource allocation, queuing theory, network economics, system optimization, and energy management. Her research is in the broad area of game theoretical applications in communication systems and modeling and optimization of communication networks.
ADNAN M. ABU-MAHFOUZ (Senior Member, IEEE) received the M.Eng. and Ph.D. degrees in computer engineering from the University of Pretoria. He is currently a Chief Researcher and the Centre Manager of the Emerging Digital Technologies for 4IR (EDT4IR) Research Centre, Council for Scientific and Industrial Research (CSIR), an Extraordinary Professor at the University of Pretoria, a Professor Extraordinaire at the Tshwane University of Technology, and a Visiting Professor at the University of Johannesburg. He participated in the formulation of many large and multidisciplinary research and development successful proposals (as a principal investigator or main author/contributor). He is the Founder of the Smart Networks collaboration initiative that aims to develop efficient and secure networks for the future smart systems, such as smart cities, smart grid, and smart water grid. His research interests include wireless sensor and actuator networks, low power wide area networks, software-defined wireless sensor networks, cognitive radio, network security, network management, and sensor/actuator node development. He is a member of many IEEE technical communities. He is a Section Editor-in-Chief at the Journal of Sensor and Actuator Networks and an Associate Editor of IEEE ACCESS, IEEE INTERNET OF THINGS JOURNAL, and the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS.