Transmission Loss-Aware Peer-to-Peer Energy Trading in Networked Microgrids

Networked microgrids (MGs) have great potential to improve the efficiency, reliability, resilience, security, and sustainability of power supply services. Peer-to-peer (P2P) energy trading built on a smart information system in networked MGs is an emerging economic approach to facilitate energy sharing among networked MGs to achieve mutual cost-effective operation and improve the reliability and stability of energy supply service. Such a distributed and competitive energy trading market calls for an efficient energy trading strategy that incentivizes self-interested MGs with various energy production and consumption profiles to participate in energy trading. In this paper, we propose a distributed real-time P2P energy trading strategy that integrates energy trading into energy management and enables MGs with renewable energy sources (RESs) and energy storage systems (ESSs) to manage their storage scheduling, energy supply, and energy trading in a dynamic manner, jointly considering the randomness of renewable energy generation and load demand, the operational constraints of ESSs, and the transmission losses associated with energy exchange. The proposed energy control and bidding algorithm allows each MG to dynamically and independently determine its energy control actions and price-quantity bids/offers, while the proposed trading pair matching algorithm matches the MGs on a many-to-many basis with respect to their individual payoffs, which couple the price-quantity bids/offers of the MGs with the distance-dependent energy transmission losses associated with energy exchange. Numerical simulation results demonstrate that the proposed distributed energy trading system yields significant improvements in energy cost savings and renewable energy utilization efficiency, while reducing energy transmission losses within the system.


I. INTRODUCTION
Renewable energy sources (RESs), e.g., photovoltaic (PV) arrays and wind turbines, are seen as a feasible solution to the energy scarcity and environmental problems arising from increasing power demand. The microgrid (MG) framework, defined as a local distribution system that integrates RES-based distributed generators (RDGs) and energy storage systems (ESSs) to serve different local loads (domestic, industrial and commercial) at medium and low voltage levels, has been conceived as one of the critical components on the distribution side of the smart grid [1]. The decentralized MGs help relieve the burden on the main grid, enhance local reliability and reduce power losses in distribution networks [2]. Unfortunately, the fluctuation and uncertainty of renewable energy generation could result in a significant temporal mismatch between energy supply and demand in an MG, thus posing challenges to the operation and control of MGs. In addition, the non-dispatchable and distributed characteristics of renewable energy pose technical and economic challenges in effectively integrating MGs with time- and weather-dependent RESs and limited-capacity ESSs into traditional power grids [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Jamshid Aghaei.
Advances in information and communications technologies, as well as in embedded systems, have led to the emergence of the Internet of Things (IoT). In smart grids, where MGs play a key role, bidirectional communication among all interconnected components enables advanced IoT technologies to monitor, control and coordinate smart devices in real time, thus providing a dynamic energy management infrastructure and delivering new energy management capabilities to MGs. In such an IoT-aided intelligent energy system, the integration of energy and information technologies facilitates real-time interactions among MGs, e.g., energy exchange, with enhanced interaction capabilities [5], [6].
The advanced concept of networked MGs has emerged, in which several adjacent MGs are interconnected to form a distribution network so that MGs can share energy mutually, taking advantage of diverse supply and demand patterns in different MGs, thus improving self-consumption of local RESs [7], [8]. The clustered arrangement of MGs as networked MGs allows MGs to balance demand with production in a more flexible and cost-efficient manner through mutual energy sharing among neighboring MGs, thus achieving mutual cost-effective operation, improving the reliability and stability of energy supply service and smoothing the incorporation of distributed generation into power systems [9], [10], [11].
Peer-to-peer (P2P) energy trading among geographically correlated MGs has been regarded as a promising economic approach to facilitate energy exchange among networked MGs in a decentralized way to achieve energy supply reliability and economic benefits, leveraging advanced information and communication technologies [12], [13], [14]. The distributed structure of networked MGs, which are subject to technical constraints and limitations, creates a competitive energy trading market. Thus, it is challenging to develop an appropriate P2P trading strategy that is capable of coordinating and motivating MGs, which only aim to maximize their own benefits, to participate in energy trading, thereby ensuring an applicable, continuous and sustainable operation of P2P energy trading in which a central controller has little influence on the decisions of participants [15]. Furthermore, decentralized energy management methods are required in power system operation and control to accommodate the decentralized characteristics of P2P energy trading.
In this paper, we study the P2P energy trading problem in a smart distribution network consisting of a group of networked MGs, each serving a group of users with random and time-dependent demands and operating in grid-connected mode. In such a P2P energy trading market, each MG faces the problem of balancing energy generation, supply and trading in the presence of the randomness of renewable energy generation and arbitrary changes in load demand. While demand-supply balancing is a priority for the MGs, maximizing economic benefits in power dispatch is another control objective. Since the finite capacity of an ESS couples the energy control actions over time, the storage scheduling, energy trading and energy purchasing decisions of an MG are interrelated over time. The interrelated energy control and trading decisions of an MG also impact those of other MGs through energy exchange. Therefore, the energy trading and energy control problems need to be jointly considered. In addition, any energy transfer between MGs in energy trading is associated with power losses in the network lines [3], which impact the trading benefits and decisions of MGs. It is essential to properly integrate power losses into a P2P energy trading strategy to ensure the economic benefits of MGs in the P2P market.
A properly designed trading mechanism that motivates networked MGs to participate in energy trading through financial benefits is important for the implementation of a P2P energy trading system, where MGs engage directly in bilateral energy transactions. Game theory and auctions are considered viable techniques for modeling the interactions and negotiations between selfish participants. Considering the decentralized characteristics of networked MGs and P2P energy trading, various approaches based on these techniques are presented in [16], [17], [18], and [19] to incentivize local energy trading between MGs. In these approaches, where the response of a seller depends on the responses of buyers and vice versa, complicated decentralized iterative algorithms are developed to obtain per-slot equilibrium strategies that maximize either the individual benefit of each participant involved in energy trading or the social welfare of all participants. However, the energy dispatch management problems of individual MGs are not considered when determining their bidding prices.
To incorporate the optimal energy management problem of MGs into the P2P energy trading framework design, hierarchical approaches have been proposed, where the local energy management problem at the lower level is integrated with the high-level P2P trading problem [20], [21], [22], [23], [24]. Short-term optimization models are developed to determine day-ahead optimal energy bidding and scheduling decisions at the lower level. Day-ahead trading schemes, such as auction-based methods, are designed at the upper level to determine intraday clearing solutions for the next day. However, it is difficult to accurately predict day-ahead renewable generation and energy demand, especially energy demand. A slight variation between day-ahead forecast and real-time information (e.g., hour-ahead forecast) could significantly impact the implementation of day-ahead optimization strategies, thereby affecting the effectiveness of energy trading systems. As a result, complicated mechanisms are considered to deal with the deviations from day-ahead optimal energy scheduling plans during intraday trading [20], [22].
To tackle unknown arbitrary dynamics of renewable generation and demand, Lyapunov optimization [25], a technique that provides a per-slot optimal solution with lower computation complexity for time-average stochastic optimization problems without requiring any knowledge of the probability distributions of the random event processes, has been widely applied in designing online energy management mechanisms, such as demand side energy management and energy storage management [26], [27], [28], [29], [30]. Most of these studies primarily focus on real-time coordination between distributed RESs and ESSs without considering direct energy exchange. In this paper, we adopt the Lyapunov optimization technique to design an online energy control algorithm that jointly considers the energy dispatch and trading problems.
Trading pair matching, the core process in P2P energy trading, determines the efficiency of energy trading and how individual MGs benefit from energy trading. An important component for trading pair matching is an efficient mechanism allowing participants to negotiate mutually beneficial transactions, given their bidding prices and quantities. In addition, power losses occurring due to energy trading need to be considered when determining trading pairs. Matching theory, an approach to providing low complexity and tractable solutions for the combinatorial problem of matching players from two distinct sets while considering the preference of each player, has emerged as a promising technique for resource management, such as wireless resource allocation [31]. Different from typical game theory based auction schemes, in which each player has to determine its own best responses based on other players' actions through frequent information exchange during the converging process, matching theory based approaches characterize interactions between heterogeneous players using their preferences that can handle heterogeneous and complex considerations related to their individual objectives, thereby providing an efficient and scalable means to reaching a two-sided stable matching that achieves stability and optimality. However, most existing matching models are designed for exchange markets with indivisible goods, whereas the P2P energy trading problem could be a many-to-many matching problem, where at least one player within each of the two sets could be matched to more than one member in the other set.
In this paper, we develop a distributed real-time energy trading system for a smart energy system, where multiple interconnected MGs with RESs and ESSs trade energy with each other aiming to minimize their individual operational costs, including the energy provision costs, the energy transmission costs and the operational costs of ESSs. The main contributions of this paper are as follows:
• Based on Lyapunov optimization, we design a joint real-time energy control and bidding algorithm for such a time-varying P2P energy trading system with high uncertainty to allow each MG to determine its energy control and bidding decisions in a dynamic manner based only on its current energy supply condition.
• Taking into consideration individual payoffs of involved MGs yielded from energy trading, we design a distributed many-to-many pair matching mechanism based on matching theory to facilitate the MGs to reach a stable match, which is individually beneficial to them. To capture important characteristics particular to P2P energy trading, the payoff preferences of MGs are subject to several relevant factors, including energy transmission line losses, bidding prices and quantities.
• The proposed distributed energy trading system integrates the Lyapunov-based energy control algorithm with the matching theory based trading pair matching mechanism, which allows each MG to independently determine its energy control and trading decisions on a per-slot basis using only information that can be obtained locally or through simple communication. MGs are therefore able to freely join or leave the proposed energy trading system at any time without increasing computational complexity. Thus, the proposed energy trading system is scalable, which makes it applicable in real systems.
The rest of the paper is organized as follows: Section II briefly discusses the related work. A joint energy control and trading system model is presented in Section III. In Section IV, an online P2P energy trading system is developed, where a Lyapunov-based online energy control and bidding algorithm is integrated with a pair matching mechanism. Section V presents simulation evaluations. Finally, concluding remarks are provided in Section VI.

II. RELATED WORK
Several studies have employed Lyapunov optimization techniques in the joint optimization of energy control and energy trading/sharing. In [32], a Lyapunov-based online energy sharing method was proposed to improve the self-sufficiency of a nanogrid cluster. However, the economic benefits of individual nanogrids are not considered. A joint energy control and trading system is developed for smart communities in [33], where a Lyapunov-based online energy control and trading algorithm is combined with a double auction mechanism, assuming transmission losses due to energy exchange are negligible within energy communities in close proximity. The proposed double auction mechanism clears the energy trading market at the equilibrium price where the quantity demanded equals the quantity offered. Trading pair matching is not considered in the uniform-price auction mechanism. Matching theory has been applied in developing distributed P2P energy trading schemes [18], [34], [35], which mainly focus on interactions between agents without considering the energy dispatch management problems of individual agents when determining their bid/offer prices. For instance, in the cooperative electric vehicle (EV)-to-EV charging based energy management protocol proposed in [34], the trading price in the one-to-one matching model is set as the mean of the buying and selling prices of the power grid. In [35], an iterative price-negotiation mechanism is proposed to search for the equilibrium trading price by adjusting the seller/buyer price in each iteration. Additionally, the aforementioned literature paid little attention to energy transmission losses associated with energy exchange.

Power losses occurring due to energy trading in the network directly impact the market outcome. Some energy-exchanging algorithms have been proposed based on coalition formation games to facilitate cooperative local power exchange, taking into consideration power losses in networked MGs [3], [10], [36], [37]. Assuming a priority of choosing geographically closer MGs to exchange energy first and the same energy trading price for all MGs, these cooperative energy trading schemes mainly focus on reducing power losses in the distribution network without considering MGs' incentives to cooperate in energy trading. To improve the individual utilities of MGs, the authors in [18] proposed a coalition-based energy trading algorithm where a second-price sealed-bid auction based matching algorithm is employed and incentives are designed for the coalitional operation of networked MGs. The bidding price of an MG is set based on the market prices of the main grid. In [17], taking transmission losses and wheeling costs into account, the authors propose an energy trading framework using a credit rating based multi-leader multi-follower game model. An iterative best response algorithm is designed to search for equilibrium strategies for each time slot by adjusting the bidding prices, without considering the impact of the time-coupling constraint of ESSs on MGs' energy scheduling and bidding prices. A similar iterative algorithm is employed in [19] to realize the coordination among sellers and buyers in a decentralized P2P energy trading market clearing mechanism that considers power losses and network fees.
This paper proposes a real-time P2P energy trading system that integrates a Lyapunov-based energy control and trading algorithm with a matching theory based trading pairing mechanism. Each MG, as an independent entity with its individual objective, independently controls and dispatches its energy resource, considering the randomness of renewable energy output and arbitrary changes in energy demand along with the operational constraints of its ESS and energy transmission losses caused by P2P energy trading and transfer.

III. SYSTEM MODEL
In this paper, we consider a smart distribution system consisting of a set $\mathcal{I} = \{1, 2, \ldots, I\}$ of MGs that are interconnected through bi-directional power links and connected to the utility grid through a distribution substation (DS), as illustrated in Fig. 1. Each MG typically contains an RDG, e.g., a wind or solar power generator, a finite-capacity ESS and electrical loads. The MGs can trade energy with each other with the assistance of a virtual trading agent (VTA), which manages information sharing between MGs and helps clear the local P2P energy trading market in the virtual layer. Thus, only bi-directional communications between the VTA and MGs are required in the energy trading system. Note that the proposed P2P energy trading system can also be implemented in a full P2P mode, where the MGs directly interact with each other through bi-directional communication links and energy trading pairs can be matched without the assistance of a VTA. The power system operates in slotted time $t \in \{0, 1, \ldots, T-1\}$.

A. LOAD DEMAND AND SERVING
In time slot t, MG i serves a set of users whose aggregate load demand is $D_i(t)$, and its harvested renewable energy is denoted by $g^{R}_i(t)$. Note that all power quantities in this paper are in units of energy per time slot. We assume a priority of using the harvested renewable energy $g^{R}_i(t)$ to directly supply the time-varying load demand $D_i(t)$ and consider the following two cases:
1) $g^{R}_i(t) < D_i(t)$, i.e., energy deficit: all the harvested renewable energy is used to serve the load demand, and the residual, $D_i(t) - g^{R}_i(t)$, can be served by discharging energy, $g^{dis,D}_i(t)$, from its own ESS; buying energy, $g^{eb,D}_i(t)$, from other MGs via energy trading; and purchasing energy, $g^{u}_i(t)$, from the utility company when the energy drawn from its ESS and bought from other MGs is insufficient. Thus, a balance between purchasing energy and discharging energy must be struck under the following feasibility condition:
$$g^{dis,D}_i(t) + g^{eb,D}_i(t) + g^{u}_i(t) = D_i(t) - g^{R}_i(t). \tag{1}$$
We assume a priority of discharging energy from its own ESS to serve the residual.
2) $g^{R}_i(t) \ge D_i(t)$, i.e., energy surplus: MG i can store the excess renewable energy in its own ESS, with $g^{ch,R}_i(t)$ denoting the amount of excess renewable energy charged into its ESS in time slot t, and/or sell the excess renewable energy to other MGs, with $g^{es,R}_i(t)$ denoting the amount of excess renewable energy sold to other MGs. Due to the finite storage capacity, a portion of the excess renewable energy could be curtailed if there is not enough storage space. We then have
$$0 \le g^{ch,R}_i(t) + g^{es,R}_i(t) \le g^{R}_i(t) - D_i(t). \tag{2}$$
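The priority order described for the deficit case can be illustrated with a small greedy sketch in Python. This is only an illustration under stated assumptions, not the paper's algorithm: in the proposed system the amounts are chosen by the optimization in Section IV. The function names and the arguments `ess_available` and `traded_available` (the energy that could be drawn from the ESS and bought from other MGs in this slot) are hypothetical.

```python
def slot_case(demand: float, renewable: float):
    """Classify a slot and return the residual demand or the excess energy."""
    if renewable < demand:
        return "deficit", demand - renewable
    return "surplus", renewable - demand


def serve_deficit(residual: float, ess_available: float, traded_available: float):
    """Greedy split of the residual: own ESS first, then energy bought from
    other MGs, then the utility company for whatever is still unserved.
    (Assumed priority order, used here for illustration only.)"""
    g_dis = min(residual, ess_available)            # discharged from own ESS
    g_eb = min(residual - g_dis, traded_available)  # bought via energy trading
    g_u = residual - g_dis - g_eb                   # purchased from the utility
    return g_dis, g_eb, g_u


case, amount = slot_case(demand=8, renewable=3)
split = serve_deficit(amount, ess_available=2, traded_available=1)
```

By construction the three returned amounts always sum to the residual, which is the balance that the deficit case requires.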

B. ENERGY STORAGE
In time slot t, each MG with energy surplus can store its own extra renewable energy generation, $g^{ch,R}_i(t)$, and/or energy bought through energy trading with other MGs, $g^{eb,E}_i(t)$, into its ESS. An MG can then draw the stored energy from its ESS to serve its load and/or sell to other MGs. Let $g^{dis,D}_i(t)$ denote the amount of energy discharged by MG i in time slot t to supply its load and $g^{es,E}_i(t)$ denote the amount of energy discharged by MG i to sell to other MGs. Considering the relatively high cost of energy provision due to charge/discharge losses and depreciation linked with lifetime degradation, making effective use of ESSs is critical [38]. We now consider the energy model of the ESS at each MG.
In practice, energy conversion losses occur during the charging and discharging processes. Denote by $S_i(t)$ the energy state of MG i's ESS, i.e., its state of charge (SoC), at the beginning of time slot t, which evolves as follows:
$$S_i(t+1) = S_i(t) + \eta^{ch}_i g^{ch}_i(t) - \eta^{dis}_i g^{dis}_i(t), \tag{3}$$
where $\eta^{ch}_i \in (0, 1]$ and $\eta^{dis}_i \in [1, \infty)$ are the charging and discharging efficiency coefficients of MG i's ESS, respectively, and $g^{ch}_i(t) = g^{ch,R}_i(t) + g^{eb,E}_i(t)$ and $g^{dis}_i(t) = g^{dis,D}_i(t) + g^{es,E}_i(t)$ are the total charging and discharging amounts in time slot t, respectively. Note that energy charging and discharging should not happen simultaneously, i.e.,
$$g^{ch}_i(t) \cdot g^{dis}_i(t) = 0. \tag{4}$$
Due to limitations imposed by the charging and discharging circuits, the amount of energy that can be charged into or discharged from MG i's ESS is upper bounded. The maximum charging and discharging rates of MG i's ESS are denoted by $R_{ch,i}$ and $R_{dis,i}$, respectively. We have
$$0 \le g^{ch}_i(t) \le R_{ch,i}, \quad 0 \le g^{dis}_i(t) \le R_{dis,i}. \tag{5}$$
Charging an ESS near its capacity or discharging it close to zero significantly reduces its lifetime [39]. Thus, the SoC of MG i's ESS in time slot t is bounded by
$$S^{\min}_i \le S_i(t) \le S^{\max}_i, \tag{6}$$
where $S^{\min}_i$ and $S^{\max}_i$ are the preferred energy lower and upper bounds, respectively.
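The ESS model above can be sketched as a small Python class, assuming the SoC evolves by adding the charged amount scaled by the charging coefficient and subtracting the discharged amount scaled by the discharging coefficient. The class name `ESS`, its field names and the numerical values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ESS:
    soc: float      # energy state at the start of the slot
    eta_ch: float   # charging coefficient, in (0, 1]
    eta_dis: float  # discharging coefficient, in [1, inf)
    r_ch: float     # maximum charging amount per slot
    r_dis: float    # maximum discharging amount per slot
    s_min: float    # preferred lower SoC bound
    s_max: float    # preferred upper SoC bound

    def next_soc(self, g_ch: float, g_dis: float) -> float:
        """Assumed SoC update for one slot."""
        return self.soc + self.eta_ch * g_ch - self.eta_dis * g_dis

    def feasible(self, g_ch: float, g_dis: float) -> bool:
        """Check the per-slot operational constraints for one action."""
        if g_ch < 0 or g_dis < 0:
            return False
        if g_ch > 0 and g_dis > 0:                  # no simultaneous charge/discharge
            return False
        if g_ch > self.r_ch or g_dis > self.r_dis:  # rate limits
            return False
        return self.s_min <= self.next_soc(g_ch, g_dis) <= self.s_max

    def step(self, g_ch: float, g_dis: float) -> float:
        """Apply a feasible action and return the new SoC."""
        if not self.feasible(g_ch, g_dis):
            raise ValueError("infeasible charging/discharging action")
        self.soc = self.next_soc(g_ch, g_dis)
        return self.soc


ess = ESS(soc=5.0, eta_ch=0.9, eta_dis=1.1, r_ch=2.0, r_dis=2.0, s_min=1.0, s_max=10.0)
soc_after_charge = ess.step(g_ch=2.0, g_dis=0.0)
soc_after_discharge = ess.step(g_ch=0.0, g_dis=2.0)
```

Because the discharging coefficient is at least one, drawing a given amount for the load depletes the stored energy by more than that amount, which is how conversion losses enter the model.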

IV. ONLINE P2P ENERGY TRADING ALGORITHM

A. ENERGY PROVISIONING COST MINIMIZATION OF INDIVIDUAL MGs
The operational cost of each MG comprises energy procurement and battery degradation costs. As described in Section III-A, in each time slot, each MG can purchase energy from the utility company at the time-varying unit price $p^{u}(t)$, with $p^{u}_{\min} \le p^{u}(t) \le p^{u}_{\max}$, to supply its loads. In addition, each MG can trade energy with other MGs at the buying/selling price $p^{eb}$. Hence, the energy procurement cost of MG i in time slot t consists of the cost incurred for energy purchase from the utility company and the expense/revenue incurred/generated in energy trading with other MGs, which is given by (7), where $g^{eb}_i(t) = g^{eb,D}_i(t) + g^{eb,E}_i(t)$ and $g^{es}_i(t) = g^{es,R}_i(t) + g^{es,E}_i(t)$ are the total amounts of energy bought and sold by MG i in energy trading in time slot t. The time-average energy procurement cost is then defined as the long-term time average of the expected per-slot cost in (7).
Frequent charging/discharging activities cause battery degradation, which shortens battery lifetime [40]. Let $\xi_i(t) \triangleq |g^{ch}_i(t) - g^{dis}_i(t)|$ denote the net amount of battery charging and discharging in time slot t. Based on (4) and (5), $0 \le \xi_i(t) \le \max\{R_{ch,i}, R_{dis,i}\}$. In practice, faster/deeper charging/discharging generally has a more detrimental effect on the battery lifetime. To model the cost of charging/discharging activities that cause battery degradation, we define the degradation cost function, $Q_i(\cdot)$, as a function of the time-average net charging/discharging amount, which is defined by
$$\lambda_i \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\xi_i(t)].$$
The battery degradation cost function $Q_i(\cdot)$ is assumed to be a continuous, strictly convex and increasing function over $[0, \max\{R_{ch,i}, R_{dis,i}\}]$.
The objective of each MG is to minimize its long-term time-averaged operational cost subject to its time-varying renewable energy generation and load demand along with the operational constraints of its ESS, by jointly managing its energy purchasing, energy trading and energy charging/discharging actions. Denote the control action set (strategy set) of MG i by $Y_i(t)$. Then the optimization problem of MG i is to find a control strategy that determines the optimal strategy set based on its current state $X_i(t) \triangleq [g^{R}_i(t), D_i(t), p^{u}(t)]$ to minimize its time-averaged operational cost, which can be formulated as a stochastic control optimization problem, called P1. We assume that the statistical information of $g^{R}_i(t)$ and $D_i(t)$ is unknown and that their dynamics are arbitrary. Taking into account the system dynamics, the stochastic optimization problem P1 seeks control decisions for the whole process. However, the control actions $Y_i(t)$, which are correlated over time due to the time-coupling constraints, make P1 a particularly challenging problem to solve.

B. REAL-TIME ENERGY CONTROL BASED ON LYAPUNOV OPTIMIZATION
In this section, we use the idea of Lyapunov optimization [25] to solve the time-coupling optimization problem P1. Employing the concept of one-slot look-ahead queue stability to handle the time-coupling constraints through successive problem relaxation and transformation, we propose a Lyapunov-based optimization method that determines the control vector $Y_i(t)$ for each MG in each time slot based only on its current system state $X_i(t)$, without requiring any statistical knowledge of its renewable energy generation and load demand.

1) PROBLEM MODIFICATION AND TRANSFORMATION
The constraint in (6), which couples the charging and discharging decisions across time slots, makes the standard Lyapunov optimization technique not directly applicable to problem P1. To overcome such time coupling, similar to the technique used in our previous work [41], instead of the finite battery capacity constraint (6), we impose the following soft constraint:
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[\eta^{ch}_i g^{ch}_i(t) - \eta^{dis}_i g^{dis}_i(t)\right] = 0. \tag{9}$$
The derivation of (9) follows the framework of Lyapunov optimization [25] and is given in our previous work [41]. Accordingly, P1 is relaxed to problem P2, in which constraint (6) is replaced by (9) and the dependency of the per-slot control decisions on the battery state is removed.
The degradation cost function $Q_i(\lambda_i)$, which is defined as a function of the time-average expectation $\lambda_i$, does not conform to the structure required for the standard Lyapunov optimization technique. As in [42], to transform P2 into an optimization problem involving only time averages of functions, we first introduce an auxiliary variable $\gamma_i(t)$, which is bounded within the same range as $\xi_i(t)$ (constraint (11)). Additionally, the time-average expectation $\bar{\gamma}_i \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\gamma_i(t)]$ is required to satisfy $\bar{\gamma}_i = \lambda_i$ (constraint (12)). By replacing $Q_i(\lambda_i)$ with the time average $\overline{Q_i(\gamma_i)} \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[Q_i(\gamma_i(t))]$ and adding the constraints (11) and (12) associated with $\gamma_i(t)$, we then transform P2 into problem P3.
VOLUME 10, 2022
Following the general arguments of Lyapunov optimization [25], P3 is equivalent to P2, which can be shown as follows. Let $C^{*}_{i,P2}$ and $C^{*}_{i,P3}$ be the resulting minimum costs of P2 and P3, respectively. Any optimal solution of P2, together with the choice $\gamma_i(t) = \lambda_i$ for all t, satisfies all constraints of P3 with the same value of the cost objective. Thus, $C^{*}_{i,P3} \le C^{*}_{i,P2}$. On the other hand, by Jensen's inequality and the convexity of $Q_i(\cdot)$, for any solution of P3, we have $Q_i(\lambda_i) = Q_i(\bar{\gamma}_i) \le \overline{Q_i(\gamma_i)}$, which implies that $C^{*}_{i,P3} \ge C^{*}_{i,P2}$. Therefore, P3 is equivalent to P2. The transformed problem P3 involves only time averages of functions, rather than functions of time averages, in the objective, so that the standard Lyapunov optimization techniques can be applied to design a real-time energy control policy to tackle P3.
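The Jensen step in the equivalence argument can be checked numerically with a short Python sketch. The quadratic cost `q` below is a hypothetical stand-in for a convex, increasing degradation cost (the paper does not specify its form here), and the sample paths are arbitrary illustrative values of the auxiliary variable.

```python
def q(x: float) -> float:
    """Hypothetical degradation cost: convex and increasing for x >= 0."""
    return 0.5 * x * x + 0.1 * x


def mean(values):
    return sum(values) / len(values)


def jensen_gap(samples):
    """Time average of Q(gamma(t)) minus Q(time average of gamma(t)).

    For a convex Q this gap is non-negative (Jensen's inequality)."""
    return mean([q(g) for g in samples]) - q(mean(samples))


varying_path = [0.0, 2.0, 1.0, 3.0]   # auxiliary variable changing over slots
constant_path = [1.5, 1.5, 1.5, 1.5]  # same time average, held constant

gap_varying = jensen_gap(varying_path)
gap_constant = jensen_gap(constant_path)
```

The gap vanishes for the constant path, which mirrors the observation that holding the auxiliary variable at its time average attains the same cost in both problems.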

2) VIRTUAL QUEUES
We now introduce two virtual queues $E_i(t)$ and $K_i(t)$ to transform the time-averaged constraints (9) and (12) in P3 into queue stability constraints, respectively, as follows:
• Virtual energy queue $E_i(t) \triangleq S_i(t) - \theta_i$, where $\theta_i$ is a perturbation parameter that can be designed to guarantee that the energy state constraint in (3) is satisfied. The dynamics of $E_i(t)$ are given by
$$E_i(t+1) = E_i(t) + \eta^{ch}_i g^{ch}_i(t) - \eta^{dis}_i g^{dis}_i(t). \tag{14}$$
• Virtual net charge queue $K_i(t)$, which is associated with constraint (12).
Note that both $E_i(t)$ and $K_i(t)$ are associated with the battery charging/discharging activities.

3) REAL-TIME ENERGY CONTROL ALGORITHM
Let $\Theta_i(t) \triangleq [E_i(t), K_i(t)]$ denote the virtual queue vector. We define a Lyapunov function associated with the virtual queues $\Theta_i(t)$ as follows: $L_i(\Theta_i(t)) \triangleq \frac{1}{2}\left(E_i(t)^2 + K_i(t)^2\right)$, which represents a scalar measure of stored energy. In a decision-making algorithm that minimizes the quadratic Lyapunov function of $\Theta_i(t)$, keeping $L_i(\Theta_i(t))$ small keeps all virtual queues small, thereby pushing the value of $S_i(t)$ towards $\theta_i$. Hence, carefully choosing the value of the perturbation parameter ensures that the battery energy state always lies in the feasible region.
Define the conditional one-slot Lyapunov drift, which represents the expected change in the Lyapunov function from one time slot to the next, as follows:
$$\Delta_i(\Theta_i(t)) \triangleq \mathbb{E}\left[L_i(\Theta_i(t+1)) - L_i(\Theta_i(t)) \mid \Theta_i(t)\right],$$
where the expectation is taken over the randomness of the system state $X_i(t)$, given the current virtual queue state $\Theta_i(t)$.
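A minimal Python sketch of the quadratic Lyapunov function and a realized (sample-path) one-slot drift is given below. It rests on several assumptions that are not confirmed by the text: the energy queue is taken to be the SoC shifted by the perturbation parameter, so it changes by the battery's net energy change; the net charge queue is assumed to accumulate the gap between the auxiliary variable `gamma` and the net charge amount; and the efficiency values are hypothetical. The conditional expectation in the drift definition is not computed here, only the change for one given action.

```python
def lyapunov(e: float, k: float) -> float:
    """Quadratic Lyapunov function 0.5 * (E^2 + K^2)."""
    return 0.5 * (e * e + k * k)


def next_queues(e, k, g_ch, g_dis, gamma, eta_ch=0.9, eta_dis=1.1):
    """One-slot update of the two virtual queues (assumed forms).

    E follows the battery's net energy change; K accumulates gamma - xi,
    where xi is the net charging/discharging amount of the slot."""
    xi = abs(g_ch - g_dis)
    e_next = e + eta_ch * g_ch - eta_dis * g_dis
    k_next = k + gamma - xi
    return e_next, k_next


def one_slot_drift(e, k, g_ch, g_dis, gamma):
    """Realized change of the Lyapunov function for one action."""
    e_next, k_next = next_queues(e, k, g_ch, g_dis, gamma)
    return lyapunov(e_next, k_next) - lyapunov(e, k)


# Battery below its target level: E = S - theta = -3, K = 0.
drift_if_charging = one_slot_drift(-3.0, 0.0, g_ch=2.0, g_dis=0.0, gamma=2.0)
drift_if_discharging = one_slot_drift(-3.0, 0.0, g_ch=0.0, g_dis=2.0, gamma=2.0)
```

In this example charging yields a negative drift and discharging a positive one, which illustrates how keeping the Lyapunov function small steers the SoC back towards the perturbation parameter.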
We now incorporate a weighted version of the time-averaged energy provisioning cost into the Lyapunov drift and obtain the drift-plus-penalty expression, in which the time-averaged constraints and the objective function in P3 are jointly considered. The Lyapunov drift in the first term represents the stability of the virtual queues, while V in the second term serves as a weight controlling the performance tradeoff between minimizing the queueing delay and minimizing the operational cost.
Based on the drift-plus-penalty minimization method [25], the control decisions are chosen to minimize the upper bound on the drift-plus-penalty expression, which is given in Lemma 1, to jointly maintain the stability of the virtual queues and minimize the time-averaged energy cost of MG i.
Lemma 1: For any possible control decision, the drift-plus-penalty expression is upper bounded for all t.
The energy control algorithm is then constructed as follows: in each time slot t, the control decision $Y_i(t)$ of each MG i is determined based on its current virtual queue state $\Theta_i(t)$ and system state $X_i(t)$ by solving the linear programming problem P4.

C. PRICE-QUANTITY BID/OFFER
In each time slot, each MG determines its offer/bid price and the quantity of energy to sell/buy, with which it is willing to participate in energy trading with other MGs, by solving the optimization problem P4 based only on its current state. The MGs that are willing to trade energy with others report their price-quantity offers/bids, $\{p^{ask}_i(t)/p^{bid}_i(t),\ g^{ask}_i(t)/g^{bid}_i(t)\}$, which are given in Lemma 2, to the VTA, which facilitates matching the buyers to the sellers with the aim of minimizing the energy transmission losses during P2P trading. Note that the energy generation and storage of each MG are associated with a cost due to its initial investment, operation and maintenance costs. This cost varies from one MG to another depending on the geographical location, the weather, the method of power generation, the types of RDGs and ESSs, etc. Therefore, the levelized cost of MG i's renewable energy generation and storage differs from that of other MGs. The variation in the cost of renewable energy has an impact on the trading decisions of the MGs.
Lemma 2: In time slot t, the offer/bid price $p^{ask}_i(t)$/$p^{bid}_i(t)$ of MG i is expressed using the operator $[a]^{+} \triangleq \max(a, 0)$ and the quantity $c^{R}_i(t)$, the average per-unit cost of MG i's energy available for trading, i.e., the total cost of energy available for trading divided by the amount of energy available for trading. Here, $Q_i(\xi(\tau))$ is the battery degradation cost of the traded energy $\xi(\tau)$, and $L^{R}_i$ is the levelized cost of MG i's renewable energy generation and storage, which reflects MG i's capital and maintenance costs of its RDG and ESS.
Proof: See Appendix B.

D. PAIR MATCHING ALGORITHM
In time slot t, assume that a set $\mathcal{M}$ of $M$ MGs, referred to as sellers, submit their price-quantity offers and a set $\mathcal{N}$ of $N$ MGs, referred to as buyers, submit their price-quantity bids. The potential sellers in $\mathcal{M}$ and potential buyers in $\mathcal{N}$ form a P2P energy trading market. Note that $M$ and $N$ vary in each time slot and could be zero. In this section, we investigate the trading pair matching problem between the potential sellers and buyers.
We first introduce some basic concepts of stable matching theory [35], which are the basis of our algorithm.
Definition 1: Each MG on one side (buyer or seller set) has preferences over the MGs on the other side, which can be represented by a rank order list.
In the energy trading pair matching problem, each MG aims to maximize its payoff through energy trading. Therefore, based on the payoffs of MGs as buyers and sellers, we define the preference relation for seller m and buyer n as follows:
• Seller m prefers buyer n to buyer n′ if $\pi_{mn}^{\mathrm{ask}} > \pi_{mn'}^{\mathrm{ask}}$, where $m \in \mathcal{M}$, $n, n' \in \mathcal{N}$ and $n \neq n'$;
• Buyer n prefers seller m to seller m′ if $\pi_{nm}^{\mathrm{bid}} > \pi_{nm'}^{\mathrm{bid}}$, where $m, m' \in \mathcal{M}$, $n \in \mathcal{N}$ and $m \neq m'$.
Definition 2: In a matching ω, if two MGs are not matched with each other but each prefers the other over the MG it is paired with in ω, such a pair is called a blocking pair for matching ω. A matching ω with a blocking pair is unstable, because the blocking pair would prefer to deviate from the matching and pair with each other.
Definition 3: A matching ω is said to be two-sided stable if and only if there is no blocking pair.
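Definitions 2 and 3 can be made concrete with a small check for blocking pairs. The sketch below is only illustrative and assumes a one-to-one matching for readability (the algorithm in this paper is many-to-many); the function name, the dictionary layout, and the convention that an unmatched MG prefers any partner to staying unmatched are assumptions, not part of the paper.

```python
from typing import Dict, List, Optional, Tuple


def find_blocking_pairs(
    seller_payoff: Dict[str, Dict[str, float]],  # seller_payoff[m][n]
    buyer_payoff: Dict[str, Dict[str, float]],   # buyer_payoff[n][m]
    matching: Dict[str, Optional[str]],          # seller -> buyer, or None
) -> List[Tuple[str, str]]:
    """Return all (seller, buyer) pairs that block the given matching."""
    partner_of_buyer = {n: m for m, n in matching.items() if n is not None}
    blocking = []
    for m, prefs in seller_payoff.items():
        for n in prefs:
            current_n = matching.get(m)
            if current_n == n:
                continue  # already paired with each other
            current_m = partner_of_buyer.get(n)
            # Assumed convention: an unmatched MG prefers any partner.
            seller_prefers = (
                current_n is None
                or seller_payoff[m][n] > seller_payoff[m][current_n]
            )
            buyer_prefers = (
                current_m is None
                or buyer_payoff[n][m] > buyer_payoff[n][current_m]
            )
            if seller_prefers and buyer_prefers:
                blocking.append((m, n))
    return blocking


def is_stable(seller_payoff, buyer_payoff, matching) -> bool:
    """Two-sided stability (Definition 3): no blocking pair exists."""
    return not find_blocking_pairs(seller_payoff, buyer_payoff, matching)
```

A matching is reported as stable exactly when the returned list is empty, which mirrors Definition 3.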
According to the above stable matching definitions, we now define the payoff that an MG receives from energy trading. In the local low-medium voltage MG network, any energy transfer between an MG and the DS or between two MGs is accompanied with transmission losses over the distribution line. In this paper, we restrict our attention to transmission losses associated with energy transfer inside the MG network and do not consider transmission losses between the utility grid and the MG network.
In general, energy transfer between the DS and the MGs is done at a medium voltage $U_0$, while energy transfer between MGs is done at a low-to-medium voltage $U_M$, which is smaller than $U_0$. Transferring energy $e_i$ between MG i and the DS incurs a transmission loss $q_{i0}^{\mathrm{loss}}$, which is given by (20) [43], where $R_{i0}$ is the resistance of the distribution line between MG i and the DS and β is the fraction of energy lost in the transformer at the DS. Thus, in time slot t, to ensure that MG i receives $g_i^u(t)$, the actual amount of energy that MG i acquires from the UG through the DS, $g_{i0}^{r,u}(t)$, is given by the solution to (21).
In P2P energy trading, the energy lost in the distribution lines during the local power transfer between a seller MG and a buyer MG is also given by (20) with β = 0, since the local energy transfer between MGs yields no transformer losses. Then, when seller m sells energy $g_m^{\mathrm{ask}}(t)$ to buyer n, the actual amount of energy that reaches buyer n after the incurred transmission loss, $g_{nm}^{r,es}(t)$, is obtained from (20) with $R_{mn}$, the resistance of the distribution line between seller m and buyer n. In addition, the actual amount of energy that buyer n requires from seller m to ensure that it receives $g_n^{\mathrm{bid}}(t)$, $g_{mn}^{r,eb}(t)$, is given by the solution to an analogous equation.
For a seller-buyer pair {m, n}, let the transaction price be the mean of their offer and bid prices, i.e., $p_{mn}^{ET}(t) = \frac{1}{2}\left(p_m^{\mathrm{ask}}(t) + p_n^{\mathrm{bid}}(t)\right)$.
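To illustrate how a loss model of this kind turns a requested amount of energy into the amount that must actually be sent, the sketch below assumes a resistive-plus-transformer loss of the form R e^2 / U^2 + β e. This form is an assumption made for illustration and is not necessarily the exact expression used in (20); the function names are hypothetical and units are assumed to be consistent.

```python
import math


def line_loss(e: float, resistance: float, voltage: float, beta: float = 0.0) -> float:
    """Assumed loss model: resistive line loss plus a transformer fraction beta."""
    return resistance * e ** 2 / voltage ** 2 + beta * e


def delivered_energy(sent: float, resistance: float, voltage: float, beta: float = 0.0) -> float:
    """Energy that arrives at the receiver when `sent` is injected."""
    return sent - line_loss(sent, resistance, voltage, beta)


def required_energy(target: float, resistance: float, voltage: float, beta: float = 0.0) -> float:
    """Smallest x with x - line_loss(x) = target.

    Solves a*x^2 - b*x + target = 0 with a = R/U^2 and b = 1 - beta,
    using a form of the smaller root that stays valid when a = 0.
    """
    a = resistance / voltage ** 2
    b = 1.0 - beta
    disc = b * b - 4.0 * a * target
    if disc < 0.0:
        raise ValueError("target cannot be delivered over this line")
    return 2.0 * target / (b + math.sqrt(disc))


# Illustrative numbers only: deliver 10 units over an MG-to-MG line (beta = 0).
sent = required_energy(10.0, resistance=2.0, voltage=22.0)
```

Setting `beta=0.0` corresponds to the MG-to-MG case described above, where no transformer loss is incurred.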
Consequently, buyer n's expected payoff from trading with seller m, which is defined as the expense incurred in energy trading, is given by
$$\pi_{nm}^{\mathrm{bid}}(t) = -p_{mn}^{ET}(t)\,\min\left(g_{nm}^{r,es}(t),\ g_n^{\mathrm{bid}}(t)\right), \quad (24)$$
and seller m's expected payoff from trading with buyer n, which is defined as the revenue earned in energy trading, is given by
$$\pi_{mn}^{\mathrm{ask}}(t) = p_{mn}^{ET}(t)\,\min\left(g_m^{\mathrm{ask}}(t),\ g_{mn}^{r,eb}(t)\right). \quad (25)$$
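As a small numerical illustration of the mid-point transaction price and the two payoffs, the following sketch takes the loss-adjusted energy amounts as plain inputs. The function and argument names are hypothetical; only the formulas follow the text.

```python
def transaction_price(p_ask: float, p_bid: float) -> float:
    """Mean of the seller's ask price and the buyer's bid price."""
    return 0.5 * (p_ask + p_bid)


def buyer_payoff(p_ask: float, p_bid: float, g_bid: float, g_received: float) -> float:
    """Buyer's payoff: the (negative) expense for the energy it can obtain.

    g_received is the energy that would reach the buyer after losses.
    """
    return -transaction_price(p_ask, p_bid) * min(g_received, g_bid)


def seller_payoff(p_ask: float, p_bid: float, g_ask: float, g_required: float) -> float:
    """Seller's payoff: the revenue for the energy it can supply.

    g_required is the energy the seller must send so that the buyer's
    bid quantity arrives after losses.
    """
    return transaction_price(p_ask, p_bid) * min(g_ask, g_required)


# Illustrative numbers: ask 1.0, bid 2.0, so the pair trades at 1.5 per unit.
example_buyer = buyer_payoff(1.0, 2.0, g_bid=5.0, g_received=4.0)
example_seller = seller_payoff(1.0, 2.0, g_ask=6.0, g_required=5.5)
```

Because the buyer's payoff is a negative expense, a larger (less negative) value means a cheaper purchase.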
We now design a many-to-many matching algorithm, where each MG can trade with multiple MGs simultaneously in each time slot, to form a stable matching between buyers and sellers based on their mutual payoff preferences to maximize their individual payoffs, while ensuring fairness among participants. Under the pair matching algorithm, the matching process follows the mutual preferences between buyers and sellers based on their payoffs from energy trading as follows:
• Each buyer with non-zero bid energy establishes its payoff preference list by calculating, using (24), its expected payoff from trading with each seller whose ask price is lower than its bid price, and sorts the expected payoffs in decreasing order. The buyer then compares its expected payoff from trading with its most preferred seller $m^*$, i.e., the seller ranked first in its list, with the expected payoff $\pi_{n0}^{\mathrm{acq}}(t)$ from acquiring the same amount of energy, $g_{n0}^{\mathrm{acq}}(t) = \min\left(g_n^{\mathrm{bid}}(t),\ g_{nm^*}^{r,es}(t)\right)$, from the utility company via the DS; the actual amount of energy acquired from the utility company, $g_{n0}^{r,\mathrm{acq}}(t)$, is obtained using (21). If $\pi_{n0}^{\mathrm{acq}}(t) > \pi_{nm^*}^{\mathrm{bid}}(t)$, the buyer will not participate in P2P energy trading. Otherwise, the buyer submits a matching offer to its most preferred seller $m^*$.
• A seller who receives a matching offer from only one buyer pairs with that buyer. A seller who receives matching offers from more than one buyer establishes its payoff preference list by calculating, using (25), its expected payoff from trading with each buyer that has proposed to it, sorts the expected payoffs in decreasing order, and pairs with the buyer ranked first in its list.
• For a matched seller-buyer pair $\{m^*, n^*\}$:
- If seller $m^*$ can satisfy buyer $n^*$'s energy need, buyer $n^*$ is paired with seller $m^*$ and its available bid energy is updated with $g_{n^*}^{\mathrm{bid}}(t) = 0$. Meanwhile, seller $m^*$'s available ask energy is updated with $g_{m^*}^{\mathrm{ask}}(t) = g_{m^*}^{\mathrm{ask}}(t) - g_{m^*n^*}^{r,eb}(t)$.
- Otherwise, buyer $n^*$ buys as much energy as possible from seller $m^*$ and updates its available bid energy with $g_{n^*}^{\mathrm{bid}}(t) = g_{n^*}^{\mathrm{bid}}(t) - g_{n^*m^*}^{r,es}(t)$. Meanwhile, seller $m^*$ is paired with buyer $n^*$ and its available ask energy is updated with $g_{m^*}^{\mathrm{ask}}(t) = 0$.
The matching process is repeated until all buyers in $\mathcal{N}$ have satisfied their energy needs or there is no available ask energy from sellers in $\mathcal{M}$. Note that, since ask/bid energy is divisible, one seller/buyer could be paired with more than one buyer/seller.
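The three steps above can be sketched as a short proposal loop. This is a minimal illustration under simplifying assumptions, not the authors' implementation: transmission losses are modelled by a hypothetical per-line loss fraction `loss[m][n]` instead of (20), the utility purchase is grossed up by a hypothetical `loss_utility` fraction instead of (21), and all names are illustrative.

```python
from typing import Dict, List, Tuple

EPS = 1e-9  # quantities below this are treated as zero


def match_pairs(
    ask_qty: Dict[str, float], ask_price: Dict[str, float],
    bid_qty: Dict[str, float], bid_price: Dict[str, float],
    loss: Dict[str, Dict[str, float]],  # assumed fraction lost on line m -> n
    p_utility: float, loss_utility: float = 0.0,
) -> List[Tuple[str, str, float, float]]:
    """Return trades as (seller, buyer, energy_sent, energy_received)."""
    ask, bid = dict(ask_qty), dict(bid_qty)
    active = set(bid)  # buyers still taking part in P2P trading
    trades = []
    while True:
        proposals: Dict[str, List[str]] = {}
        for n in sorted(active):
            if bid[n] <= EPS:
                continue
            best = None  # (buyer payoff, seller, energy received)
            for m in ask:
                if ask[m] <= EPS or ask_price[m] >= bid_price[n]:
                    continue
                price = 0.5 * (ask_price[m] + bid_price[n])
                received = min(ask[m] * (1.0 - loss[m][n]), bid[n])
                payoff = -price * received  # buyer payoff, cf. (24)
                if best is None or payoff > best[0]:
                    best = (payoff, m, received)
            if best is None:
                continue
            payoff, m_star, amount = best
            # Same amount bought from the utility, grossed up for the DS loss.
            utility_payoff = -p_utility * amount / (1.0 - loss_utility)
            if utility_payoff > payoff:
                active.discard(n)  # buyer leaves P2P trading
                continue
            proposals.setdefault(m_star, []).append(n)
        if not proposals:
            break
        for m, buyers in proposals.items():
            def seller_payoff(n: str, m: str = m) -> float:  # cf. (25)
                price = 0.5 * (ask_price[m] + bid_price[n])
                return price * min(ask[m], bid[n] / (1.0 - loss[m][n]))
            n_star = max(buyers, key=seller_payoff)
            required = bid[n_star] / (1.0 - loss[m][n_star])
            if ask[m] >= required:  # seller covers the whole bid
                sent, received = required, bid[n_star]
                ask[m] -= sent
                bid[n_star] = 0.0
            else:  # seller sells everything it has left
                sent = ask[m]
                received = sent * (1.0 - loss[m][n_star])
                bid[n_star] -= received
                ask[m] = 0.0
            trades.append((m, n_star, sent, received))
    return trades
```

Every accepted proposal sets either a bid or an ask to zero, so the loop stops after at most as many trades as there are participants. In the distributed setting described in the text, the buyer loop and the seller loop would run locally at each MG; they are written in one function here only to keep the sketch compact.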
According to the procedure of the proposed pair matching algorithm, each seller/buyer only needs to collect the price-quantity offers/bids and the resistances of the distribution lines to establish its payoff preferences. The sellers/buyers are then able to take actions, i.e., proposing to their most preferred buyer/seller or deciding to accept or reject the received proposal(s), in an independent manner. Therefore, the proposed pair matching algorithm can be implemented in a distributed way.

E. PERFORMANCE ANALYSIS
Since the time-coupling constraint (3) is replaced with the time-average constraint (9), the solution to P4 might not be feasible for P1. In the following lemma, we show that the boundedness of the energy states (5) in P1 can be satisfied by appropriately designing the perturbation parameter $\theta_i$ and the control parameter $V_i$, i.e., the solution to P4 satisfies all constraints of P1. Thus, the control decisions Y(t) derived from P4 are feasible for P1. The performance of the algorithm P4 is also analyzed with respect to the original problem P1.
Lemma 3: Set the perturbation parameter $\theta_i$ as where Then, under the energy control algorithm, we have
1) In each time slot t, $S_i^{\min} \le S_i(t) \le S_i^{\max}$ (28), i.e., the control decision $Y_i(t)$ derived from P4 is feasible for P1.
2) The resulting time-averaged cost under the proposed algorithm by solving P4, $C^*_{i,P4}$, is within a bound $B_i/V_i$ of the optimal cost of P1, $C^*_{i,P1}$, i.e., $C^*_{i,P4} \le C^*_{i,P1} + B_i/V_i$, where B i .
Proof: See Appendix C.
Lemma 3.2 characterizes the gap between the expected time-averaged cost achieved by the proposed algorithm P4 and the optimal cost of the original problem P1, which implies that, setting the control parameter $V_i$ as
By transforming the original problem P1 into the linear programming problem P4, the proposed algorithm provides a low-complexity alternative, which achieves sub-optimal performance, without requiring any statistical information of the system. It can easily cope with an arbitrary number of MGs with different levels of demand. With all information that can be obtained locally or through simple communication, each MG can independently determine its energy control and trading decisions while avoiding disclosure of private information.

V. NUMERICAL SIMULATION
In order to evaluate the performance of the proposed energy trading model, we set up a distribution network of 10 interconnected MGs that are randomly deployed within a square of 50 km × 50 km with the DS located at the center, as illustrated in Fig.2(a). The resistance of the distribution line between any two nodes is R = 0.2 Ω/km and the transformer loss fraction is β = 0.02. The voltages $U_0$ and $U_M$ are set to 50 kV and 22 kV, respectively. Each MG includes a photovoltaic (PV) system and an ESS with charging and discharging efficiencies of $\eta_i^{\mathrm{ch}} = 0.8$ and $\eta_i^{\mathrm{dis}} = 1.25$, respectively, corresponding to 80% efficiency for both charging and discharging. For simplicity's sake, we assume that S min , and set the initial battery energy level as $0.5 S_i^{\max}$. The degradation cost function of an ESS is assumed to be a quadratic function $Q_i(x) = 0.01x^2$ [42]. For the purpose of simple illustration, we choose the same battery degradation cost function for all MGs. The simulation is performed for a duration of 90 days with T = 4320 time slots, i.e., 30-minute slots. The Time-of-Use tariff of Johannesburg City Power, in which the peak, standard and off-peak energy prices are R2.0019, R1.5072 and R1.1586 per kWh, respectively, is used in the simulation.
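The setup parameters above can be collected in a small configuration sketch. The values are taken from the text; the helper names, the random seed, and the use of straight-line distance as the line length are assumptions made only for illustration.

```python
import math
import random
from typing import List, Tuple

SIDE_KM = 50.0          # side of the deployment square
R_OHM_PER_KM = 0.2      # line resistance per kilometre
BETA = 0.02             # transformer loss fraction at the DS
U0_KV, UM_KV = 50.0, 22.0
NUM_MGS = 10
DAYS, NUM_SLOTS = 90, 4320
TOU_PRICE_R_PER_KWH = {"peak": 2.0019, "standard": 1.5072, "off_peak": 1.1586}
DS_POSITION = (SIDE_KM / 2.0, SIDE_KM / 2.0)  # DS at the centre of the square


def slot_minutes(days: int = DAYS, slots: int = NUM_SLOTS) -> float:
    """Length of one time slot in minutes."""
    return days * 24 * 60 / slots


def degradation_cost(x: float) -> float:
    """Quadratic battery degradation cost Q_i(x) = 0.01 x^2."""
    return 0.01 * x ** 2


def deploy_mgs(num: int = NUM_MGS, side: float = SIDE_KM, seed: int = 0) -> List[Tuple[float, float]]:
    """Place MGs uniformly at random in the square (seed is arbitrary)."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(num)]


def line_resistance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Resistance of a line between two nodes, assuming straight-line length."""
    return R_OHM_PER_KM * math.dist(a, b)


positions = deploy_mgs()
resistance_to_ds = [line_resistance(p, DS_POSITION) for p in positions]
```

With these numbers, `slot_minutes()` evaluates to 30 minutes per slot, which matches 90 days divided into 4320 slots.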
We randomly generate 10 MGs: 4 Type I MGs with low energy generation, 3 Type II MGs with medium energy generation and 3 Type III MGs with high energy generation. The PV systems of the MGs of the same type generate a similar amount of renewable energy every day. The daily solar energy generation of each MG is then converted into hourly solar energy generation. Similarly, the stochastic energy demand profile of each MG is simulated using the appliance demand profile generator developed in [41] to synthesize the variability in load demand at different times of day. The average daily solar generations and load demands of individual MGs are listed in Table 1. Note that there are just slight differences in the load demands of different types of MGs, while the solar generations of different types of MGs differ considerably. An illustrative example of the net demand (load demand minus renewable generation) profiles of different types of MGs is shown in Fig.3. Due to the limited capacity of its PV system and the operational constraints of its ESS, an MG has to purchase energy from the utility company whenever its demand cannot be fulfilled with its own PV generation. For ease of comparison, the corresponding average daily costs incurred in purchasing energy from the utility company to fulfill the demand-supply gaps without any energy scheduling mechanism are listed in Table 1 as lower benchmarks.
As observed in Fig.4(a) and (b), as the net demand patterns of the MGs are relatively similar, only a small portion of the surplus PV generation sold in trading can be used to directly fulfill the buyers' load demands. A large portion of the sold PV energy is stored in the buyers' ESSs. The stored energy can be sold later in trading. Thus, energy trading among the MGs further improves the flexibility brought by the ESSs, which allow for energy time-shifting. Meanwhile, a large portion of the energy bought at relatively lower prices in energy trading is used for load-serving to reduce the operational costs of the buyers. In addition, in each time slot, each pair of MGs in energy trading trades energy at a different trading price. The range of P2P trading prices and the corresponding mean trading price in each time slot are illustrated in Fig.4(c). As can be seen, although P2P trading prices are determined based on the energy supply conditions of each pair of MGs, the mean trading prices (marked with asterisks) reflect the changes in the energy supply condition of the system: the mean trading price drops when more energy is available for trading and vice versa, which encourages local energy trading and consumption.
To verify the effectiveness of the proposed energy trading mechanism, comparisons are drawn with the scenario without energy sharing, where each MG operates independently under the same Lyapunov-based energy control algorithm and does not share energy with the others. In the case without energy trading, each MG acquires energy from the utility company via the DS and the actual energy that an MG acquires from the utility company is obtained using (21). Fig.5 compares real-time energy storage scheduling and energy purchasing actions with and without energy trading. As can be observed, since stored excess energy in the ESSs can be traded between the MGs, the proposed energy trading mechanism allows more surplus solar generation to be used to fulfill load demands, thereby reducing energy purchases from the utility company.
Moreover, under the proposed energy trading mechanism, the MGs not only trade their stored energy but also share the storage space of their ESSs in P2P energy trading. Because the battery degradation costs and charging/discharging losses are taken into consideration when making bidding decisions, the use of the distributed ESSs to provide substantial energy shifting is compensated, which incentivizes the MGs to utilize their ESSs in a collaborative but competitive way. The collective use of ESSs in energy trading can be considered analogous to the case where the MGs share their ESSs, which leads to a significant improvement in PV generation utilization, especially for the MGs with higher ratios of PV generation to load demand (P/L), as demonstrated in Table 1. The solar generation curtailment rates of Type III, II and I MGs drop by 48.90%-85.64%, 29.62%-40.74% and 8.53%-47.54%, respectively. This indicates that, with local energy trading, it may not be necessary for the MGs to invest in larger ESSs.
Although the role of each MG in energy trading dynamically changes with its net demand, it can be observed in Table 1 that, in energy trading, Type I MGs with low P/L ratios buy more energy than they sell, while Type III MGs with more solar energy production sell more energy. The energy trading decisions of the MGs determine how individual MGs benefit from energy trading. In energy trading, since the MGs are paired based on their payoff preference lists, which couple the price-quantity bids/offers of the MGs with the transmission losses dependent on the distances between them, their locations are one of the main factors contributing to the resulting energy trading decisions. However, the energy trading decisions of an MG also depend on other time-varying factors, such as the net demands of the MGs, the available ESS storage capacities and the per unit costs of the MGs' energy available for trading. To investigate the impact of the locations of the MGs on the performance of the proposed energy trading mechanism, we compare three scenarios, where the locations of MG3, MG5 and MG6 vary, as illustrated in Fig.2. As observed in Fig.6, in Scenario1, among the Type III MGs, although MG3 with a lower P/L ratio and a higher levelized cost is closer to other MGs, as shown in Fig.2(a), it sells much less energy than other Type III MGs that are more distant. In particular, MG5 with a higher P/L ratio and a lower levelized cost, which is relatively close to Type I and II MGs that are more likely to buy energy in energy trading, sells more energy in comparison to other Type III MGs, thereby gaining more benefit from energy trading in terms of operational cost reduction. Similarly, in Scenario3 (Fig.2(c)), where the locations of MG5 and MG6 are switched, MG6 with a higher P/L ratio and a larger ESS offers more surplus solar energy production and flexibility in energy shifting, thus gaining more benefit from energy trading.
Since the real-time net demand of an MG as a seller affects the energy trading decisions of other MGs, compared to Scenario1, switching MG5 and MG6 results in less energy exchange among the MGs while the resulting transmission loss rate remains 3.06%, as shown in Table 2. In Scenario2 (Fig.2(b)), the locations of MG3 and MG5 are switched. MG5 with more surplus solar energy production available for energy trading is more likely to be chosen as a seller in energy trading. As a result, the transmission loss decreases to 2.85%, thereby slightly reducing the total energy cost of the system in comparison to Scenario1 and 3 as shown in Table 2.
In what follows, we investigate the performance of the proposed distributed pair matching algorithm in terms of transmission loss under Scenario2 by a comparison drawn with a centralized transmission loss minimization (CTLM) algorithm, where a central controller chooses geographically closer MGs to exchange energy first, i.e., a seller MG will be matched to the closest buyer MG whose bid price is greater than its offer price. As illustrated in Table 2, compared to the proposed pair matching algorithm, the transmission loss associated with energy trading under the CTLM algorithm is 0.95% lower and the total energy cost of all MGs is 1.04% less. As illustrated in Fig.7, under the CTLM algorithm, without considering the financial benefits of the MGs, MG9 with a higher levelized cost sells more energy as it is closer to Type I MGs, which in turn reduces its solar generation curtailment and increases its revenues from energy trading. However, energy sold by other Type III MGs drops, resulting in declines in their energy trading revenues. In contrast, compared to the CTLM algorithm, the proposed matching algorithm increases the total traded energy in the system, which slightly increases the total energy cost of the system, while incentivizing MGs with excess energy to participate in energy trading in such a competitive energy trading market, as illustrated in Table 2.
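For intuition, a distance-first baseline in the spirit of CTLM can be sketched as a greedy pass over eligible seller-buyer pairs sorted by distance. This is an interpretation of the one-sentence description above, not the benchmark code used in the simulations; quantities are traded without loss adjustment for brevity, and all names and the tuple layout are assumptions.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]
# name -> (position, quantity, price)
Offer = Tuple[Position, float, float]


def ctlm_match(sellers: Dict[str, Offer], buyers: Dict[str, Offer]) -> List[Tuple[str, str, float]]:
    """Greedy distance-first matching: closest eligible pairs trade first."""
    ask = {m: s[1] for m, s in sellers.items()}
    bid = {n: b[1] for n, b in buyers.items()}
    # A pair is eligible only if the bid price exceeds the offer price.
    pairs = sorted(
        (math.dist(s[0], b[0]), m, n)
        for m, s in sellers.items()
        for n, b in buyers.items()
        if b[2] > s[2]
    )
    trades = []
    for _, m, n in pairs:
        qty = min(ask[m], bid[n])
        if qty <= 0.0:
            continue  # one side is already exhausted
        ask[m] -= qty
        bid[n] -= qty
        trades.append((m, n, qty))
    return trades
```

Unlike the payoff-based matching, this baseline ignores prices beyond the eligibility check, which is why it can lower losses while shifting revenue between sellers.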

VI. CONCLUSION
This paper studies the real-time energy trading problem in a smart energy distribution system with interconnected MGs subject to transmission losses. We present a Lyapunov-based energy trading system that integrates energy control and energy bidding, aiming to minimize the long-term time-averaged operational costs of individual MGs. The proposed online energy control and bidding algorithm allows each MG to independently and dynamically optimize its energy bidding decisions along with its energy control decisions, taking into account the operational constraints of its ESS, without requiring any statistical knowledge of the system. The trading pair matching algorithm allows MGs to pair with each other based on their individual payoffs, which couple the price-quantity bids/offers of the MGs with the distance-dependent energy transmission losses associated with energy exchange. Numerical evaluations provide insight into the interactions among the self-interested MGs with various energy generation and storage capacities and diverse load demand profiles. Simulation results show that, compared to the scenario without energy sharing, energy exchange via energy trading reduces the operational costs of individual MGs, improves the utilization efficiency of local renewable generation, and reduces dependency on the utility company. The payoff-preference-based trading pair matching algorithm ensures that each MG benefits from energy trading, taking into account the associated transmission losses, thereby incentivizing the MGs to participate in energy trading.

Proof of Lemma 1:
According to the definition of L i ( i (t)), Based on the queue update rules in (14), we have Since battery charging and discharging can not happen simultaneously, we have . Then, based on the queue update rules in (15), we have Applying inequalities (31) and (32) to (30), taking the conditional expectation over L i ( i (t + 1)) − L i ( i (t)) given i (t) and adding the penalty term V i E{C i (t)| i (t)} yield the upper bound in (16).

Proof of Lemma 2:
The optimization problem P4 can be decomposed into the following two sub-problems to determine the optimal Y i (t) and γ i (t) separately: We now only study the sub-problem P4.1, from which the optimal Y i (t) is obtained. We first rearrange P4.1 to The following two cases are considered: i (t) = 0 and g es,R i (t) = 0. Then, the optimization problem P4 can be written as follows: As can be seen, -If p eb i (t)−p u (t) ≤ 0, i.e., p eb i (t) ≤ p u (t), MG i tends to increase g eb,D i (t). Otherwise, g eb,D Note that, to encourage energy trading among MGs so as to reduce conventional energy purchase from the utility company, the ask price is capped: p ask i (t) ≤ p u (t). In case , we have g dis,D i (t) > 0. Since energy charging and discharging cannot happen simultaneously, MG i is not able to buy energy via energy trading to store in its ESS, i.e., g eb,E i (t) = 0.
• Energy Surplus: when g R i (t) ≥ D i (t), we have g u i (t) = 0, g dis,D i (t) = 0, g es,E i (t) = 0 and g eb,D i (t) = 0. Then, the optimization problem P4 can be written as follows: As can be seen, i.e., MG i's surplus renewable energy cannot be stored in its battery. MG i tends to increase g es,R i (t) by choosing the lowest possible ask price c R i (t), which is the average per unit cost of MG i's energy available for trading, to reduce its energy cost and avoid wasting renewable energy. On the other hand, in the case of E i (t) − K i (t) ≤ 0, MG i tends to store as much of its surplus renewable energy as possible in its ESS and to increase g es,R i (t) with any ask price p ask i (t) ≥ c R i (t) in case there is not enough storage space in the ESS to store all surplus renewable energy. To encourage energy trading among MGs and reduce conventional energy purchase from the utility company, the ask price is set as the lowest possible price: p ask i (t) = c R i (t).

Proof of Lemma 3: Proof of Lemma 3.1:
To prove Lemma 3.1, we first introduce Lemma 4 below: Lemma 4: The optimal solution γ * i (t) of P4.2 is given by where Q′ i (·) is the first derivative of Q i (·) and Q′ −1 i (·) is the inverse function of Q′ i (·). Then, we have Proof: Since Q i (·) is assumed to be a continuous, convex and increasing function with Q i (0) = 0, for γ i (t) ∈ [0, i ], the first derivative of Q i (γ i (t)), Q′ i (γ i (t)) ≥ 0, increases with γ i (t) and Q′ i (0) = 0. Thus, we have 0 . We then study the first derivative of J i (γ i (t)), which is given by J′ i (γ i (t)) = K i (t) + V i Q′ i (γ i (t)).
• If K i (t) ≥ 0, we have J′ i (γ i (t)) > 0, which indicates J i (γ i (t)) monotonically increases. Thus, its minimum occurs at γ * i (t) = 0. indicates J i (γ i (t)) monotonically decreases. Thus, its minimum occurs at γ * i (t) = i . We now prove the upper bound of K i (t) considering the following cases.
• K i (t) ≥ 0: We have γ * i (t) = 0. Since λ i (t) ≥ 0, based on the dynamics of K i (t) in (15), K i (t +1) = K i (t)−λ i (t) ≤ K i (t), i.e., non-increasing; according to (4), the maximum increment of K i (t + 1) from K i (t) in (15) occurs when γ * i (t) = i and λ i (t) = 0. Thus, Since the maximum increment of K i (t + 1) from K i (t) in (15) occurs when λ i (t) = 0, The per-slot problem P4 includes all constraints of the original problem P1 except for the energy state constraint. Hence, to prove the solution derived from P4 is feasible to P1 is to show the energy state of MG i, S i (t), is bounded within [S min i , S max i ]. The proof proceeds by induction. First, it is obvious that the lower and upper bounds hold for t = 0. We now suppose that S min i ≤ S i (t) ≤ S max i holds for time slot t, which in turn indicates S min i − θ i ≤ E i (t) ≤ S max i − θ i . Hence, to prove the boundary of S i (t) in (28) also holds for time slot t + 1, we need to prove S min i − θ i ≤ E i (t + 1) ≤ S max i − θ i holds. We now study the energy deficit and energy surplus cases separately. Let g eb,D * i (t), g eb,E * i (t), g es,R * i (t), g es,E * i (t), g ch,R * i (t) and g dis,D * i (t) be the optimal solution to P4.
• Energy Deficit: We prove the upper and lower bounds considering the following cases:
-Case 1. E i (t) ≥ −V i p es i (t)/η dis i − K i (t): as p es i (t) < p u (t), we have 0 < g es,E * i (t) + g dis,D * i (t) ≤ R dis i . Since we assume a priority of using the stored energy from MG i's ESS to serve its load demand, we have g eb,E * i (t) = 0 according to (4). Based on the update equation (14), we have E i (t + 1) < E i (t) ≤ S max i − θ i . In addition, as p es i (t) < p u (t) < p u max and K i (t) ≤ i , we have E i (t) > −V i p max /η dis i − i . Then, based on the definition of θ i , we get

, an Extraordinary Professor with the University of Pretoria, a Professor Extraordinaire with the Tshwane University of Technology, and a Visiting Professor with the University of Johannesburg. His research interests include wireless sensor and actuator networks, low power wide area networks, software defined wireless sensor networks, cognitive radio, network security, network management, and sensor/actuator node development.
He is a member of many IEEE Technical Communities. He participated in the formulation of many large and multidisciplinary successful Research and Development proposals (as a principal investigator or a main author/contributor). He is the Founder of the Smart Networks Collaboration Initiative that aims to develop efficient and secure networks for future smart systems, such as smart cities, smart grid, and smart water grid. He is a Section Editor-in-Chief of Journal of Sensor and Actuator Networks, and an Associate Editor of IEEE ACCESS, IEEE INTERNET OF THINGS JOURNAL, and IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS.
VOLUME 10, 2022