Decentralized Active Power Management in Multi-Agent Distribution Systems Considering Congestion Issue

Recently, due to the restructuring of power systems and the high penetration of local renewables, distribution systems have faced increasingly complex power management. Modern systems are therefore expected to operate in a multi-agent structure, which facilitates power management as well as the privacy protection of independent entities. In this structure, the distribution system is assumed to be composed of several agents that independently schedule their local resources in order to maximize their own profits. Consequently, this paper provides an efficient peer-to-peer (P2P) active power management framework for a multi-agent distribution system that considers network constraints (i.e., line loadings and losses). In the proposed P2P scheme, the distribution system operator (DSO) models the network constraints in the form of line-usage costs within the transactive signals. The developed transactive control signals enable the DSO to model the power losses as well as alleviate congestion in the grid. Therefore, the agents automatically consider the network constraints in their power transaction management without any direct interference of the DSO in their resource scheduling. Finally, the proposed model is implemented on the modified IEEE 37-bus test system in order to investigate its effectiveness in the energy management of multi-agent systems.


to the introduction of restructuring and privatization in power systems. In this regard, one of the developments in the power system structure is the advent of multi-agent-based management, where agents schedule their local resources independently [1]. Furthermore, multi-agent system (MAS) structures facilitate the privacy protection of agents [2]; therefore, MASs are expected to become more prevalent in future distribution systems. In addition, implementing the MAS structure avoids the need for central management of a large number of local resources (i.e., renewable energy sources (RESs), storage units, and demands) as well as the collection and analysis of a huge amount of system data, both of which are indispensable in a centrally managed system [1], [2]. The increasing integration of RESs in distribution systems has enabled local agents to partially supply their respective local demands. In this new environment, agents have the opportunity to exchange power with each other at a lower price than that of purchasing power from the upstream grid. In this context, such energy exchanges in future distribution systems with MAS structures would lead to the formation of a power market in local energy systems [3]. It is noteworthy that the development of local power markets not only gives sellers and buyers the opportunity to achieve more benefits but also increases the independence of distribution systems from the upstream network as well as the efficiency of the power grid [4].
A decentralized peer-to-peer (P2P) framework is well suited to meet the preliminary conditions required for the development of local power markets [5]. In a P2P framework, agents are able to determine their power exchanges with each other without any need for a central server. In this regard, agents employ intelligent software platforms that analyze the market information and consequently make the best decision based on the agents' preferred settings [6].
Developing local power markets in multi-agent structures has been addressed in previous research works. In this respect, Table I compares research works on the operational management of distribution systems from different perspectives. Notably, the approaches developed in [4], [7]-[13] either neglect the technical constraints associated with the network (i.e., line loadings and line losses) or consider only one of them. In [14], although the authors have considered the technical constraints of the grid in their model, they aim to block the power transactions that pose high risks to the network, which eliminates the related households' opportunity to modify or revise their transactions. Furthermore, the models described in [15]-[17] have investigated the power losses of the network, but they have not considered an effective way to control the line loadings.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This paper provides a new framework for running a decentralized P2P market considering line loadings and losses, as technical constraints in the management of multi-agent distribution systems. Furthermore, the model predictive control (MPC) technique is deployed in the active power management scheme in order to connect the future decision variables to those of the current time interval for maximizing the profit of the agents.
It is noteworthy that while [14] blocks the power transactions that put the network constraints at risk, this paper aims to incentivize the agents to revise their operational scheduling based on the received transactive signals in order to ensure convergence to the optimum operating point of the system. Accordingly, this paper strives to shape the scheduling of agents based on the operational constraints of the grid. Moreover, unlike ADMM frameworks [18], [19], in which the information exchange between agents in a distribution grid is limited to neighboring entities, this paper aims to develop an efficient P2P framework that facilitates the interaction of independent agents to determine their respective power transactions in the next time interval. As a result, power transactions in the system are determined without limitations on the information/power exchanges between agents.
Furthermore, the developed approach strives to address the operational constraints of the grid while respecting the distributed nature of the system. In other words, the proposed paradigm enables the agents to interact in the P2P power market, while the system operator relieves the operational constraints of the grid by means of transactive signals. In this regard, the technical constraints of the grid are considered in the P2P transactions between the agents without requiring agents to model the structure of the network and its bottlenecks in their operational scheduling. It should be noted that recently published research works on the operational management of distribution systems considering congestion, i.e., [2], [20]-[25], have all relied on a central optimization for scheduling the transactions between agents and the upper-level network. In other words, while this paper employs the P2P concept to enable transactions between agents, previous research works have not considered the possibility of P2P transactions in the system. Specifically, the authors of [20]-[22] have tried to determine appropriate tariffs to steer the power exchange of local resources with the energy grid. Moreover, in [2] and [23], it is assumed that agents have already participated in the wholesale market, and the proposed schemes merely use flexible resources to alleviate the potential congestion resulting from the market clearing. Accordingly, these works have not considered the possibility of energy exchanges among agents in the distribution system. The scheme proposed in [24] assumes that all aggregators announce their power requests to the distribution market operator, which is a central entity that clears the 'pay-as-bid' market. As a result, market clearing in [24] is conducted by a central entity. Furthermore, in [25], the operator is considered the party responsible for alleviating congestion. In this regard, the operator conducts a robust optimization based on the predicted power requests of local resources to optimize the day-ahead operation of the system.
The proposed scheme aims to provide a detailed step-wise algorithm that facilitates the implementation of the P2P market concept in the multi-agent distribution systems. In the proposed framework, it is assumed that each agent besides its respective load demands could independently operate some photovoltaic (PV) and/or wind power units as well as energy storage systems (ESSs), which would improve its respective flexibility as well as increase its profits. Furthermore, the MPC methodology is taken into consideration in order to enable the agents to consider the upcoming operational time periods in their current operational scheduling optimization.
Based on the literature review and the above discussion, the following points can be highlighted:
• The high integration of distributed energy resources as well as the fit-and-forget paradigm in the investment management of distribution grids could result in congestion. In this regard, previous research works on congestion alleviation in distribution systems [2], [20]-[23] have merely considered the power exchange of agents/prosumers with the grid, while the scheme proposed in this paper facilitates P2P power transactions as well as power exchanges with the upper-level system.
• On the one hand, most previous research works on P2P energy management in distribution systems have overlooked network constraints, specifically the potential congestion in the grid. On the other hand, while the model proposed in [14] blocks the power transactions that violate the network constraints, this paper aims to incentivize the agents to revise their operational scheduling based on the received transactive signals to maximize the social welfare and converge to the optimal solution.
• The proposed model provides a step-wise algorithm for P2P energy management of multi-agent distribution systems while addressing the network constraints. Accordingly, the transactive energy (TE) concept is employed to ensure decentralized management of power transactions between independent agents. Moreover, as the transactive control signals are updated in an iterative, discontinuous way, a 'finalizing process' step is developed to ensure that the obtained active power management satisfies the demand-supply balance constraint at each point of the grid, which has not been investigated in previous research works in a similar context.
In this paper, the multi-agent structure of the distribution system and the proposed P2P market framework are discussed in Sections II-A and II-B, respectively. Furthermore, the mathematical modeling of the optimization conducted by each agent, including the detailed modeling of each item in that optimization, is described in Section II-C. The process of conducting the P2P market is explained in Section II-D. Finally, the results of implementing the proposed scheme on the IEEE 37-bus test system and its effectiveness are demonstrated and discussed in Section III, followed by the conclusion in Section IV.

A. System Modeling
A simplified model of the multi-agent distribution system considered in this paper is shown in Fig. 1. While this structure could mitigate the privacy concerns associated with centrally operated systems, an applicable framework that can cope with its distributed nature as well as the operational constraints of distribution grids is indispensable. Hence, in this paper, a framework is developed that facilitates P2P interaction between agents, while the distribution system operator (DSO) utilizes transactive signals to model the operational constraints of the distribution grid. Moreover, a new entity called the wholesale market aggregator (WMA) is introduced that enables multi-agent systems to exchange power with the upper-level system. In this regard, the DSO is responsible for the reliable operation of the distribution system, while the WMA could benefit from participating in the P2P market as well as the upper-level market. Note that the DSO may also act as the WMA if regulations permit. Additionally, without loss of generality, the DSO is taken to be the P2P market operator to ease the modeling of the system. In the proposed P2P scheme, the system agents, modeled as $N = \{1, 2, 3, \ldots, n\}$, are categorized as buyers $B = \{b_1, b_2, b_3, \ldots, b_{n_B}\}$ and sellers $S = \{s_1, s_2, s_3, \ldots, s_{n_S}\}$ in each time interval, where $n_B$ and $n_S$ are the number of buyers and sellers, respectively. In other words, in each time interval, agents are categorized as buyers/sellers when they purchase/sell energy from/to other agents or the upper-level system. It is noteworthy that agents can contact each other to receive an offer for purchasing/selling energy in the next time interval; therefore, each agent finally acts as either a buyer or a seller at the equilibrium point.
Furthermore, in the case of communication constraints, the DSO could also act as a mediator entity that facilitates the communication between the agents; which could be considered as an alternative to direct communication between agents.

B. Proposed P2P Market Framework
In this paper, a new step-wise transactive distributed control framework based on the P2P market concept is developed to schedule the MAS operation for the next time interval. The proposed scheme has been developed in a way that respects the independence of the agents as well as the operational constraints associated with the network. The proposed framework for implementing the P2P market is structured in an iterative way. In each iteration, the DSO determines and announces the network costs associated with power transactions in the system, while the agents optimize their power purchases from the WMA and other agents considering their respective network costs. It is noteworthy that the P2P market is conducted to determine power transactions for the next time interval. In this regard, agents employ the MPC concept in order to account for the states of the system as well as of their resources in future time intervals in their ongoing optimization to maximize their profits.
In the proposed scheme, the WMA first announces the prices at which it purchases power from (i.e., $\lambda_t^{\mathrm{WMA,buy}}$) and sells power to (i.e., $\lambda_t^{\mathrm{WMA,sell}}$) the agents; the agents then specify their role in the P2P market, i.e., buyer/seller, and announce it to the DSO (i.e., the P2P market operator). In this paper, it is considered that agents utilize the prices announced by the WMA to determine their role at the first iteration of the P2P market. However, agents could adopt different learning approaches to improve their forecasts of the input data required for their respective operational optimizations. In other words, this paper aims to develop an applicable step-wise approach for the transactive P2P market in MASs rather than merely to investigate efficient optimization processes from the agents' perspectives.
The proposed P2P market framework is built on the selling prices announced by seller agents and the power amounts requested by buyers. In this framework, sellers determine their selling prices in each iteration, while buyers update their power purchasing plans based on the updated selling prices and network costs. Afterward, based upon the received power transactions, the DSO checks the convergence criteria and runs the load flow to ensure that the grid does not become congested.

C. Mathematical Modeling of Agents Optimization
In the following sections, the way that each agent manages its respective local resources, as well as the interaction with other agents, is investigated from the mathematical optimization point of view.
1) Operational Scheduling of Demands: In this paper, in order to generalize the approach, the consumption of each agent at each time interval is modeled by a utility function as follows:
where $\kappa_n^t > 0$ is the consumption parameter, $\alpha_n > 0$ is a fixed predetermined parameter, $P_{n,t}^{\mathrm{load}}$ is the power consumption of agent $n$ at time interval $t$, and $P_{n,t}^{\mathrm{load,min}}$/$P_{n,t}^{\mathrm{load,max}}$ are the lower/upper limits of the power consumption [26].
2) Operational Scheduling of ESSs: It is considered that agents operate ESSs to improve their flexibility towards high prices in the market, which in turn improves the flexibility of the system. In this regard, the operational costs of the ESSs of agent $n$, based upon the associated charging/discharging power and the related constraints in each time interval, are modeled as follows [11]:
In the equations above, $t$ is the index of the time interval, $n$ is the index of the agent, $\varepsilon_n^{\mathrm{ch}}$/$\varepsilon_n^{\mathrm{dis}}$ are the charging/discharging depreciation costs, $P_{n,t}^{\mathrm{ch}}$/$P_{n,t}^{\mathrm{dis}}$ are the charging/discharging powers, $P_n^{\mathrm{ch,max}}$/$P_n^{\mathrm{dis,max}}$ are the maximum charging/discharging rates, $E_{n,t}^{\mathrm{ESS}}$ is the stored energy of the ESS, $\eta_n^{\mathrm{ch}}$/$\eta_n^{\mathrm{dis}}$ are the charging and discharging efficiencies, and $E_n^{\mathrm{ESS,max}}$/$E_n^{\mathrm{ESS,min}}$ are the limits on the stored energy in the ESS.
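The ESS relations referred to above are not reproduced in this text. As a minimal sketch, assuming the common linear storage model (a depreciation cost proportional to the charged and discharged power, and a stored-energy update scaled by the charging/discharging efficiencies), the defined quantities could interact as follows. The class and function names, the interval length `dt`, and the exact form of the cost and update are assumptions made for illustration and may differ from the formulation in [11].

```python
from dataclasses import dataclass


@dataclass
class Ess:
    """Parameters of one agent's storage unit (field names are illustrative)."""
    eps_ch: float      # charging depreciation cost per unit of power
    eps_dis: float     # discharging depreciation cost per unit of power
    p_ch_max: float    # maximum charging rate
    p_dis_max: float   # maximum discharging rate
    eta_ch: float      # charging efficiency, 0 < eta_ch <= 1
    eta_dis: float     # discharging efficiency, 0 < eta_dis <= 1
    e_min: float       # minimum stored energy
    e_max: float       # maximum stored energy


def ess_cost(ess, p_ch, p_dis):
    """Assumed linear operating cost of the ESS for one interval."""
    return ess.eps_ch * p_ch + ess.eps_dis * p_dis


def ess_step(ess, e_prev, p_ch, p_dis, dt=1.0):
    """Return the stored energy after one interval, or raise if infeasible."""
    if not (0.0 <= p_ch <= ess.p_ch_max and 0.0 <= p_dis <= ess.p_dis_max):
        raise ValueError("charging/discharging rate out of bounds")
    # Assumed update: charging is derated, discharging drains more than delivered.
    e_next = e_prev + (ess.eta_ch * p_ch - p_dis / ess.eta_dis) * dt
    if not (ess.e_min <= e_next <= ess.e_max):
        raise ValueError("stored energy out of bounds")
    return e_next
```

In an agent's optimization these checks would appear as constraints rather than exceptions; the sketch only makes the bookkeeping between power, efficiency, and stored energy explicit.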
3) Operational Scheduling of RESs: As illustrated in (7)-(8), agents model the cost associated with the operation of their local RESs in order to include them in their respective operational optimization.

$$C^{\mathrm{RES}}_{n,t} = k^{\mathrm{RES}}_{n} P^{\mathrm{RES}}_{n,t} \tag{7}$$
$$0 \le P^{\mathrm{RES}}_{n,t} \le P^{\mathrm{RES,max}}_{n,t} \tag{8}$$
where $k^{\mathrm{RES}}_{n}$, $P^{\mathrm{RES}}_{n,t}$, and $P^{\mathrm{RES,max}}_{n,t}$ represent the operational cost, power generation, and maximum power production of the RES units of agent $n$ at time interval $t$, respectively.

4) Trading With Other Agents in the P2P Market Framework:
In the proposed P2P market structure, agents could negotiate with each other in an iterative algorithm. The following equations illustrate the trading cost functions associated with the buyers/sellers in each step of the P2P market framework.
where $C^{\mathrm{P2P,buyer}}_{k,t}$ and $C^{\mathrm{P2P,seller}}_{m,t}$ are the costs of buyer $k$ and seller $m$ due to trading with other agents, $\pi_{m,t}$ is the price offered by seller $m$, $P^{\mathrm{buy}}_{k,m,t}$ is the amount of power that buyer $k$ purchases from seller $m$, and $P^{\mathrm{sell}}_{m,t}$ is the total amount of power that the $m$th seller prefers to sell at time $t$. It is noteworthy that the sellers determine their preferred prices and the buyers optimize their power requests based upon the given prices.
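The trading-cost expressions themselves are not reproduced in this text. A minimal sketch consistent with the definitions above, assuming that a buyer pays the offered price for each unit purchased from each seller and that a seller's revenue enters its cost with a negative sign, is given below. The function names and the sign convention are assumptions.

```python
def buyer_trading_cost(prices, purchases):
    """Assumed P2P cost of one buyer: offered price times purchased power, per seller.

    prices    -- {seller_id: pi_m_t}, price offered by each seller
    purchases -- {seller_id: P_buy_k_m_t}, power this buyer takes from each seller
    """
    return sum(prices[m] * p for m, p in purchases.items())


def seller_trading_cost(price, p_sell):
    """Assumed P2P cost of one seller; revenue counts as a negative cost."""
    return -price * p_sell
```

For example, a buyer taking 3 units at 0.10 and 1 unit at 0.12 would incur a trading cost of 0.42 under these assumptions, while a seller placing 4 units at 0.10 would see a cost of -0.40.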

5) Costs of Utilizing the Distribution Network:
As mentioned, the DSO is responsible for ensuring the reliable operation of the distribution grid, while independent agents merely consider their respective profits in the multi-agent system. In this paper, it is considered that the DSO can efficiently relieve the operational constraints of the grid by allocating transactive control signals to system agents. As a result, the DSO can control the loading of the lines as well as the losses in the system by employing transactive control signals. Moreover, the proposed transactive control concept can be utilized to assign the costs associated with using the distribution system (i.e., network usage costs) to each agent based upon its reliance on the distribution grid to exchange power with other agents in the P2P market structure. In this context, the transactive signals that represent the network usage costs enable the DSO to fairly allocate the costs of the operation and expansion of the distribution grid to agents. It is noteworthy that the transactive signals employed to represent the network losses, congestion, and network fixed costs are monetary quantities and are therefore updated step by step during the proposed iterative P2P market framework. In the proposed framework, without loss of generality, the transactive signals are allocated to the buyers to simplify the application of the proposed scheme. In other words, if sellers received the allocated costs, they would increase their offered prices during the P2P market implementation to cover the resulting profit losses. In this regard, the transactive signals announced by the DSO regarding the fixed costs, the costs associated with network congestion, and the power losses are formulated as follows:
• Transactive signals associated with fixed costs: As mentioned, the network usage costs include the costs associated with the operation of and investment in the distribution grid. Therefore, the fixed operation costs are collected in the matrix $C^{\mathrm{operation}}$, in which each element represents the cost of the lines that are used by different agents to transfer energy to each other. Moreover, a similar matrix $C^{\mathrm{impossible}}$ is defined to model the connectivity of the network. In other words, if the network is composed of isolated areas that prevent power exchange with particular agents, the associated elements of $C^{\mathrm{impossible}}$ are set to infinity; otherwise, the elements are set to zero. Therefore, the overall fixed costs are modeled as follows:
• Transactive signals associated with active power congestion: In addition to the fixed-cost transactive signals, the DSO employs another transactive signal as a penalty factor to alleviate congestion in the network. In this regard, $C^{\mathrm{congestion}}$ represents the matrix of transactive signals associated with congestion in the grid as below:
where each element of $C^{\mathrm{congestion}}$ represents the transactive congestion cost relevant to the power exchange from agent $m$ to agent $k$. Finally, $m_{\mathrm{slope}}$ is the cost associated with network congestion and is determined by (14).
where $a_1$, $a_2$, and $a_3$ are parameters determined by the DSO, and $i$ is the iteration index of the P2P scheme. According to (14), as the P2P framework progresses, $m_{\mathrm{slope}}$ increases for power transactions that result in network congestion. Note that this formulation is chosen to facilitate the convergence of the P2P algorithm by increasing $m_{\mathrm{slope}}$. It enables the algorithm to converge faster than with a constant $m_{\mathrm{slope}}$, and the DSO can revise the values of the parameters based upon the convergence rate.
• Transactive signals associated with power losses: In the proposed framework, transactive signals associated with the power losses (i.e., $C^{\mathrm{loss}}$) are deployed in order to enable the agents to include the costs of power losses in their operational scheduling. In this regard, in each iteration, after determining the active power losses in each line of the network, the loss cost of the power transaction from agent $m$ to agent $k$ in the previous iteration is defined as follows:
where $P^{\mathrm{TL}}_{k,m}$ is the share of power losses related to the power transaction from agent $m$ to agent $k$, $P^{\mathrm{TL}}$ is the total active power loss, $C^{\mathrm{TL}}$ is the cost associated with active power losses in the network, $N^{\mathrm{br}}$ is the set of network lines, $P^{\mathrm{tr}}_{k,m,l}$ is the transaction power that flows from seller $m$ to buyer $k$ through line $l$, $P^{\mathrm{br}}_{l}$ is the total power flow through line $l$, and $P^{\mathrm{loss}}_{l}$ is the total active power loss in line $l$ [16].
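The loss-allocation expressions are not reproduced in this text. One plausible reading of the definitions above is a proportional-sharing rule, in which each transaction is charged a fraction of every line's loss equal to its share of that line's flow. The sketch below illustrates that rule only; the function name, the data layout, and the proportional rule itself are assumptions and may differ from the formulation in [16].

```python
def loss_shares(transaction_flows, line_flows, line_losses):
    """Split line losses among transactions in proportion to line usage (assumed rule).

    transaction_flows -- {(k, m): {line: P_tr_k_m_l}}, flow of each transaction per line
    line_flows        -- {line: P_br_l}, total power flow on each line
    line_losses       -- {line: P_loss_l}, total active power loss on each line
    """
    shares = {}
    for pair, flows in transaction_flows.items():
        share = 0.0
        for line, p_tr in flows.items():
            p_br = line_flows[line]
            if p_br != 0.0:  # an unloaded line has no loss to allocate
                share += abs(p_tr) / abs(p_br) * line_losses[line]
        shares[pair] = share
    return shares
```

When the transactions account for the whole flow on every line, the shares computed this way add up to the total active power loss, which is a convenient sanity check on the allocation.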

6) Cost Functions of Buyers Based on Transactive Signals:
In the suggested model, after the announcement of the transactive signals by the DSO, buyers utilize them to calculate the cost that should be paid to the DSO for network usage, congestion, and power losses as follows:
where the two defined quantities are the overall transactive signal associated with the power transaction from agent $m$ to agent $k$ and the network cost that buyer $k$ should pay to the DSO, respectively.
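As a rough illustration of how a buyer might combine the announced signals, the sketch below assumes that the overall signal for a transaction is the sum of the fixed (operation plus connectivity), congestion, and loss signals, and that each signal is a cost per unit of purchased power. Both assumptions, as well as the function names and the nested-dictionary layout, are hypothetical.

```python
def overall_signal(c_operation, c_impossible, c_congestion, c_loss, k, m):
    """Assumed additive per-unit signal for the transaction from seller m to buyer k."""
    return (c_operation[k][m] + c_impossible[k][m]
            + c_congestion[k][m] + c_loss[k][m])


def buyer_network_cost(signals, purchases):
    """Assumed network payment of one buyer to the DSO.

    signals   -- {seller_id: overall per-unit signal for that seller}
    purchases -- {seller_id: power purchased from that seller}
    """
    total = 0.0
    for m, p in purchases.items():
        if p == 0.0:
            continue  # skip unused sellers; avoids inf * 0 = nan for unreachable ones
        total += signals[m] * p
    return total
```

The explicit skip for zero purchases matters because an infinite connectivity entry multiplied by a zero purchase evaluates to NaN in floating-point arithmetic; with the skip, an unreachable seller makes the cost infinite only if the buyer actually requests power from it.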

7) Trading With the Upper-Level Network:
In the designed framework, every buyer in the P2P market is able to purchase an arbitrary amount of energy from the upper-level network (i.e., the main grid) at a fixed price determined by the WMA, and similarly, every seller can sell energy to the WMA. Without loss of generality, as in current power systems, it is considered that agents are able to exchange power with the upper-level system without limitation, based on the prices announced by the WMA. In this context, before running the P2P market, the WMA announces to the agents the prices for purchasing/selling power from/to the upper-level network (i.e., $\lambda_t^{\mathrm{WMA,sell}}$/$\lambda_t^{\mathrm{WMA,buy}}$). In this regard, buyers have the option to cover their power shortage from the WMA to fulfill the supply-demand balance, and sellers can sell their surplus power to the WMA to maximize their respective profits. Additionally, trading with the WMA confines the price of power exchange between agents in the P2P structure to the range $[\lambda_t^{\mathrm{WMA,buy}}, \lambda_t^{\mathrm{WMA,sell}}]$ (provided that no line is congested from the start of the procedure), since no seller would accept less than the WMA pays and no buyer would pay more than the WMA charges. It is noteworthy that the WMA prices can differ between time intervals, and the WMA can use this to influence the sellers' prices in different time intervals, which is an advantage of the proposed framework. Finally, the cost associated with the power trade between agent $n$ and the upper-level system in time interval $t$ is as follows:
In these equations, $P^{\mathrm{wb}}_{n,t}$/$P^{\mathrm{ws}}_{n,t}$ are the amounts of power purchased/sold from/to the WMA, and $C^{\mathrm{WMA,buyer}}_{n,t}$ and $C^{\mathrm{WMA,seller}}_{n,t}$ are the costs of the $n$th buyer/seller for power exchange with the WMA. Moreover, $C^{\mathrm{FCL}}_{n,\mathrm{WMA}}$ and $C^{\mathrm{FCL}}_{\mathrm{WMA},n}$ are the overall network costs associated with purchasing/selling power from/to the WMA, respectively.

8) Modeling the Cost Function Associated With Each Agent: As discussed in the previous sections, each agent should take into account the different kinds of power exchanges and their associated costs, as well as the operational costs of local resources, to determine its operational scheduling in the next time interval. Moreover, each agent can take the role of a buyer or a seller in each step of the P2P market based upon its forecast of the cost of exchanging power with other agents and with the WMA. In this context, the cost functions corresponding to buyers and sellers are modeled as follows:
where $C^{\mathrm{buyer}}_{n,t}$/$C^{\mathrm{seller}}_{n,t}$ are the cost functions associated with the $n$th buyer/seller at time interval $t$. Note that network costs are included only in the cost function of the buyer. Finally, as the P2P market framework is conducted iteratively, agents have to calculate their respective costs (i.e., (21)-(22)) in each step to optimize their power exchanges with the WMA and other agents in the system.
9) MPC Method: In the proposed methodology, the MPC concept is adopted in order to enable the agents to consider future time intervals when scheduling their resources (i.e., storage units). In this regard, agents decide on the operational scheduling of their units in the ongoing P2P market for the current time interval and the future ones [27]. As a result, it is considered that agent $n$ takes $H_n^t$ time intervals into consideration in its optimization while participating in the P2P market at time interval $t$. In this context, agent $n$ considers the future $H_n^t$ periods in its optimization model, while the P2P market is conducted between agents to determine their power transactions at the $t$th time interval only. Finally, agents can apply different forecasting and learning algorithms to improve their forecasts and take into account the following cost function for future time steps.
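The receding-horizon idea behind the MPC step can be sketched with a toy storage problem: at every interval the agent plans over the next few intervals but commits only the first action, then re-plans. The brute-force enumeration below is a stand-in for the agent's actual optimization model and is not the paper's formulation; the lossless storage, the three discrete actions, the grid-only purchasing, and all function names are assumptions chosen to keep the example self-contained.

```python
from itertools import product


def plan_first_action(prices, loads, e0, e_max, horizon, actions=(-1.0, 0.0, 1.0)):
    """Enumerate storage schedules over the horizon and return the first action
    of the cheapest feasible one (positive = charge, negative = discharge)."""
    h = min(horizon, len(prices))
    best_cost, best_first = float("inf"), 0.0
    for schedule in product(actions, repeat=h):
        energy, cost, feasible = e0, 0.0, True
        for t, a in enumerate(schedule):
            energy += a
            grid = loads[t] + a  # power bought from the grid in this interval
            if not (0.0 <= energy <= e_max) or grid < 0.0:
                feasible = False
                break
            cost += prices[t] * grid
        if feasible and cost < best_cost:
            best_cost, best_first = cost, schedule[0]
    return best_first


def run_mpc(prices, loads, e0, e_max, horizon):
    """Receding horizon: re-plan at every interval, apply only the first action."""
    energy, applied = e0, []
    for t in range(len(prices)):
        a = plan_first_action(prices[t:], loads[t:], energy, e_max, horizon)
        energy += a
        applied.append(a)
    return applied
```

With alternating low and high prices, a two-interval horizon lets the toy agent charge before each expensive interval, whereas a one-interval horizon never charges because charging only looks like an extra cost in the current interval.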

D. The Procedures for Implementing the P2P Market Model
As previously mentioned, the developed structure for implementing the P2P market scheme is composed of four different entities including seller and buyer agents, DSO, and WMA. In this context, this section aims to model the procedure conducted by each entity in each step of the P2P market framework.
1) Operational Optimization by Each Agent: As mentioned, each seller agent determines its desired selling price (i.e., $\pi_{m,t}$) at each iteration, and buyer agents optimize their purchasing plans based on the selling prices announced by the WMA and seller agents. In this regard, the operational optimization model of system agents for participation in the P2P market at time period $t$ is as follows:
- Buyer agents: Subject to the operational constraints of the local resources and the power balance constraint as follows:
- Seller agents: Subject to the operational constraints of the local resources and the power balance constraint as follows:
It is noteworthy that $P^{\mathrm{sell}}_{m,t}$ represents the power that seller $m$ wants to sell at the price $\pi_{m,t}$, while $P^{\mathrm{buy}}_{k,m,t}$ is the amount of power that buyer $k$ wants to buy from seller $m$. Moreover, for future time periods, the power balance constraint is formed as follows:
$$P^{\mathrm{load}}_{n,t} - P^{\mathrm{wb}}_{n,t} + P^{\mathrm{ws}}_{n,t} + P^{\mathrm{ch}}_{n,t} - P^{\mathrm{dis}}_{n,t} = P^{\mathrm{RES}}_{n,t}$$
In these optimization models, agents minimize their operational costs subject to the operational constraints of their resources as well as the supply-demand balance constraints. Each seller determines the amount of power it prefers to sell at the announced selling price in order to benefit from power exchange with other agents, while buyers determine the amount of power to be purchased from each seller. Finally, in each iteration of the P2P market, all agents announce their desired amounts of power exchange (i.e., $P^{\mathrm{sell}}_{m,t}$/$P^{\mathrm{buy}}_{k,m,t}$) to the DSO as the operator of the P2P market and the distribution network.
Each seller, based on the power requested by buyer agents (i.e., $P^{\mathrm{Request}}_{m,t}$) and the amount of available power determined from its operational optimization model (i.e., $P^{\mathrm{sell}}_{m,t}$), updates its announced selling price as follows:
where $i$ is the iteration index of the P2P market scheme. Moreover, $\rho_m$ is a penalty parameter that translates the difference between the preferred selling amount and the requested power into a change in the selling price. In this regard, $\rho_m$ is a progress-rate factor that is set by each seller agent based on its attitude towards risk [10]. Based on the formulation in (30), a seller agent decreases its offered price if the overall power request of buyers is lower than its preferred selling power. Accordingly, the seller agent incentivizes the buyers to purchase more power from it. It is noteworthy that RESs are the main power resource of the agents, so, given their negligible operational costs, an agent prefers to sell the entire amount of the determined selling power. Similarly, the seller agent increases its announced selling price if $P^{\mathrm{Request}}_{m,t}$ exceeds $P^{\mathrm{sell}}_{m,t}$ in order to increase its profit. The P2P market therefore continues until the step in which the overall power request of buyers meets the power offered by the seller agent. Note that if no congestion occurs in the grid, the final prices of power exchange between sellers and buyers lie between $\lambda_t^{\mathrm{WMA,buy}}$ and $\lambda_t^{\mathrm{WMA,sell}}$, because buyer/seller agents are able to buy/sell power from/to the upper-level network without limitation. The newly derived prices of seller agents are then used by buyer agents to conduct their operational optimization and determine their desired power exchange with each seller. Moreover, seller agents also update the amount of power they wish to sell based on the updated prices. It is noteworthy that this iterative process continues until the termination criteria are satisfied and potential violations of the operational constraints of the distribution grid are relieved.
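The price-update expression (30) is not reproduced in this text. The behavior described above (lower the price when requests fall short of the offered power, raise it when requests exceed it, with a seller-chosen rate) is consistent with a simple linear update, which the sketch below assumes. The linear form, the function names, and the toy demand curve used to exercise it are assumptions.

```python
def update_price(price, p_request, p_sell, rho):
    """One assumed price update of a seller.

    price     -- price announced in the current iteration
    p_request -- total power requested by buyers from this seller
    p_sell    -- power the seller prefers to sell at the current price
    rho       -- progress-rate factor chosen by the seller (> 0)
    """
    return price + rho * (p_request - p_sell)


def iterate_prices(price, p_sell, demand, rho, steps):
    """Repeat the update against a demand curve; `demand` maps price to requested power."""
    for _ in range(steps):
        price = update_price(price, demand(price), p_sell, rho)
    return price
```

With a hypothetical linear demand of `20 - 100 * price` and 10 units on offer, the iteration settles at the price where the requested power equals the offered power (0.10 in this toy case). A larger `rho` moves faster but can overshoot, which is one way to read the remark that the factor reflects the seller's attitude towards risk.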
3) Termination Criteria: In the proposed framework, the market operator terminates the iterative process of the P2P market once the operational conditions of the grid are addressed and one of the following conditions holds for each seller agent:
1. The change in the price announced by the seller agent between consecutive iterations is negligible, i.e., below the small constant $\varepsilon_{\pi}$.
2. The prices announced by the seller agent in the last $\vartheta$ iterations fluctuate within the range $[\bar{\pi}_f - \tau, \bar{\pi}_f + \tau]$.
Note that $i$ represents the iteration index of the P2P market model, $\varepsilon_{\pi}$ and $\tau$ are small constants, and $\bar{\pi}_f$ is the average of the prices of the last $\vartheta$ iterations. The developed scheme thus also handles possible fluctuations in the P2P power transaction optimizations, which ensures the convergence of the proposed framework. Such fluctuations can occur due to network costs or in the vicinity of the WMA prices. In these situations, buyer agents change their purchasing plans, which in turn changes the price announced by the seller agents. Consequently, when the selling price announced by a seller fluctuates over recent iterations within the narrow range described in condition 2, the DSO sets that seller's price to the last announced value that buyers prefer over the WMA's price. Note that once the criteria are satisfied, the market operator announces the finalized transaction prices, and agents finalize their preferred amounts of power transaction based on their operational optimization models.
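The two termination conditions can be sketched as a per-seller check. This is a minimal illustration under stated assumptions: condition 1 is taken to be an absolute price change no larger than `eps_pi` between the last two iterations, and the function names are hypothetical.

```python
def seller_has_converged(prices, eps_pi, tau, window):
    """Check the two termination conditions for one seller.

    prices -- announced prices of the seller, oldest first
    eps_pi -- tolerance on the change between consecutive prices
    tau    -- half-width of the allowed fluctuation band
    window -- number of recent iterations inspected (theta in the text)
    """
    # Condition 1 (assumed form): negligible change between iterations.
    if len(prices) >= 2 and abs(prices[-1] - prices[-2]) <= eps_pi:
        return True
    # Condition 2: the last `window` prices stay within mean +/- tau.
    if len(prices) >= window:
        recent = prices[-window:]
        mean = sum(recent) / window
        return all(abs(p - mean) <= tau for p in recent)
    return False


def market_can_terminate(price_histories, eps_pi, tau, window):
    """The market stops only when every seller satisfies a condition."""
    return all(seller_has_converged(h, eps_pi, tau, window)
               for h in price_histories.values())
```

Condition 2 is what stops a seller whose price keeps bouncing inside a narrow band, a case that condition 1 alone would never catch.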

4) Line Loading and Loss Check by the DSO:
In the proposed scheme, once the termination criteria are satisfied, the DSO checks the change in the line losses based on (32). If this change is smaller than a small constant (i.e., $\varepsilon_{\mathrm{loss}}$), it is considered acceptable. The DSO also checks the line loadings to detect any loading violation in the system. Note that if the losses or the line loadings violate their limits, the DSO updates the respective transactive signals based on the formulations developed in the previous section.
In (32), $j$ is the index of the iteration associated with the loss-change check.
5) Finalizing Process: Because the developed P2P market is an iterative process in which each agent optimizes its own selling/purchasing power, the overall power request received by a seller agent may differ slightly from its preferred selling power. This can occur in particular when the market terminates under condition 2 of the termination criteria. Clearing the market, however, requires that the power request (i.e., $P^{\mathrm{Request}}_{m,t}$) equal $P^{\mathrm{sell}}_{m,t}$ for each seller agent. To this end, two extra steps are added to the P2P market framework so that the power requested by buyers becomes equal to the preferred selling power of each seller agent. Note that the termination criteria determine the optimal prices of power transactions between agents, while these steps ensure that the requested power meets the selling power for each seller agent.
In the first step after satisfaction of the termination criteria, if $P^{\mathrm{Request}}_{m,t}$ is larger than $P^{\mathrm{sell}}_{m,t}$, the DSO proportionally allocates the buyers' requests to purchase power from seller $m$, where $P^{\mathrm{buy,Allocated}}_{k,m,t}$ denotes the purchasing power allocated to buyer $k$ from seller $m$. After this procedure is completed for all seller agents, buyer agents run a new optimization model to determine a new purchasing plan for the power difference between $P^{\mathrm{buy,Allocated}}_{k,m,t}$ and $P^{\mathrm{buy}}_{k,m,t}$. This step is then repeated until $P^{\mathrm{sell}}_{m,t}$ is equal to or larger than $P^{\mathrm{Request}}_{m,t}$ for all seller agents. Once the buyers' power requests are finalized in the first step, seller agents receive permission to optimize their extra power (i.e., $P^{\mathrm{sell}}_{m,t} - P^{\mathrm{Request}}_{m,t}$). Seller agents can therefore revise the scheduling of their local resources or increase the power sold to the WMA in order to absorb their extra power. Afterward, the process is over, and all agents' bids are settled and ready for exchange. It is noteworthy that the finalizing process ensures that the P2P market framework converges in all operational circumstances, because the steps defined in the 'Finalizing Process' stage enforce the demand-supply balance at each node of the system. Consequently, while the WMA prices and transactive signals ensure the relative convergence of the proposed framework, the 'Finalizing Process' stage handles any remaining energy imbalance in the system. Finally, unlike previously proposed frameworks, this stage is included in the P2P management paradigm so that the market coordinator can clear the market in each iteration while maintaining the demand-supply balance in the system.
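The first finalizing step can be illustrated with a short Python sketch. It assumes a simple pro-rata rule, in which each buyer's request is scaled by the ratio of the seller's preferred selling power to the total request; the exact allocation formula of the paper is not reproduced here, and the function and buyer names are hypothetical.

```python
def allocate_proportionally(requests, available):
    """Scale buyers' requests so that their sum does not exceed `available`.

    requests  -- mapping buyer id -> requested power from one seller (P_buy)
    available -- power the seller prefers to sell (P_sell)
    Returns a mapping buyer id -> allocated power (P_buy_Allocated).
    """
    total = sum(requests.values())
    if total <= available:
        # No over-subscription: every request is granted in full.
        return dict(requests)
    scale = available / total
    return {buyer: amount * scale for buyer, amount in requests.items()}


allocated = allocate_proportionally({"k1": 30.0, "k2": 10.0}, available=20.0)
# Power each buyer must re-plan in the follow-up optimization.
shortfall = {k: 30.0 - allocated[k] if k == "k1" else 10.0 - allocated[k]
             for k in allocated}
print(allocated, shortfall)
```

In this example the seller is over-subscribed by a factor of two, so each buyer receives half of its request and must cover the remainder elsewhere, for instance from another seller or the WMA.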
6) Convergence Improvement Techniques: In order to improve the convergence of the market-clearing algorithm, the following measures are considered:
• The market operator could limit the change in the purchasing power of buyer agents between consecutive iterations. This limitation results in a smooth change of the operational point of the system [12], and is formulated in terms of $i$ and $\zeta$, which are the P2P market iteration index and a confining constant, respectively. A similar limitation could be imposed on the selling prices announced by the seller agents through the constraint in (34), which the announced selling prices should satisfy in each iteration and in which $\xi$ is the allowed percentage of deviation from the former value.
• Sellers could take into account the states of the previous iterations of the P2P market as a learning process to update their respective selling prices [12]. In the corresponding update equation, $\gamma$, $\omega_j$, and $\upsilon$ are the learning coefficient, the weighting coefficient for $\pi_{m,t}(j)$, and the number of previous iterations considered in the learning process, respectively.
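The measures above can be sketched as small helper functions. The forms used here are assumptions made for illustration only: the power limit is taken as an absolute bound `zeta` on the change between iterations, the price limit as a relative band of `xi` around the previous price, and the learning step as a blend of the candidate price with a weighted average of recent prices. The actual equations of the paper may differ, and all function names are hypothetical.

```python
def limit_power_change(new_power, prev_power, zeta):
    """Clamp a buyer's purchasing power to within +/- zeta of its last value."""
    return min(max(new_power, prev_power - zeta), prev_power + zeta)


def limit_price_change(new_price, prev_price, xi):
    """Clamp a seller's price to within a fraction xi of its last value.

    Assumes prev_price >= 0 so that the lower bound is below the upper one.
    """
    low, high = prev_price * (1.0 - xi), prev_price * (1.0 + xi)
    return min(max(new_price, low), high)


def smooth_price(candidate, history, weights, gamma):
    """Blend a candidate price with a weighted average of recent prices.

    history -- previous prices, oldest first (must be non-empty)
    weights -- weights for the most recent len(weights) prices
    gamma   -- learning coefficient in [0, 1]; 0 ignores the history
    """
    recent = history[-len(weights):]
    used = weights[-len(recent):]
    average = sum(w * p for w, p in zip(used, recent)) / sum(used)
    return (1.0 - gamma) * candidate + gamma * average
```

For example, `limit_power_change(50.0, 40.0, 5.0)` returns 45.0, so a buyer that wants to jump by 10 units is held to a step of 5. Both clamps trade speed for stability: tighter bounds damp oscillations but require more iterations to reach the same operating point.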

7) The Complete Step-Wise Procedure of Implementing the Proposed P2P Framework: In the previous sections, the steps of the P2P market scheme and their mathematical modeling were described. The step-wise procedure of implementing the P2P market framework is presented in Fig. 2. According to the flowchart, two main conditions should be satisfied before the finalizing process step. First, the variation in the network losses in the current iteration, compared with the previous iteration, should be negligible, which implies that the agents do not want to modify their loss amounts. Second, there should be no congested line in the power grid. Note that the step-wise P2P management model for MAS enables the agents to independently optimize their operational plans while addressing the grid operational constraints. Finally, the information associated with running the algorithm in each step is presented in [31].
Fig. 2. The step-wise procedure of implementing the P2P management paradigm.
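The two conditions that gate the finalizing step can be expressed as a single check. This is a minimal sketch, assuming that the loss condition compares the absolute change in total losses between two consecutive DSO iterations with a tolerance and that a line is congested when the magnitude of its flow exceeds a common capacity; the names and data layout are hypothetical.

```python
def grid_checks_pass(loss_now, loss_prev, eps_loss, line_flows, capacity):
    """Return (ok, congested_lines) for the DSO check before finalizing.

    loss_now, loss_prev -- total network losses in iterations j and j-1
    eps_loss            -- tolerance on the change in losses
    line_flows          -- mapping line id -> active power flow (p.u.)
    capacity            -- maximum allowed flow magnitude per line (p.u.)
    """
    loss_settled = abs(loss_now - loss_prev) <= eps_loss
    congested = [line for line, flow in line_flows.items()
                 if abs(flow) > capacity]
    return loss_settled and not congested, congested


ok, congested = grid_checks_pass(
    loss_now=0.1010, loss_prev=0.1005, eps_loss=1e-3,
    line_flows={"Line-21-22": 6.4, "Line-0-1": 3.0}, capacity=6.0)
print(ok, congested)
```

When the check fails, the returned list identifies the lines whose transactive signals the DSO would update before the P2P iterations are run again.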

III. CASE STUDIES
In this section, the simulation results of implementing the proposed P2P management paradigm on a multi-agent distribution test system are discussed. To this end, the modified IEEE 37-bus test system is employed, where each bus of the system is considered an independent agent. Moreover, each agent of the system operates its local resources, i.e., PV units, wind power units, load demands, and ESSs. The operational data of the test system are adapted from [2], [28]-[30] and are presented in [31]. Furthermore, the P2P framework is applied to determine the power transactions between agents for the next hour. As mentioned, agents employ the MPC concept to consider future time intervals in their current operational optimization; accordingly, it is assumed that agents consider the next 5 hours in their ongoing optimization models. In the rest of this section, two case studies are presented to investigate the operational results of the system as well as the efficiency of the model from the congestion alleviation and convergence perspectives.

A. 24-Hour Simulation Results
This paper primarily deals with the condition in which, while the P2P market scheme is running, the active power requested by the load demands or the power produced by RESs causes congestion in the distribution network. In this regard, it is assumed that the maximum active power flow capacity of the test system's lines is 6 p.u. (= 600 kW), and the simulation has been run for the 24 hours of a sample day. In this context, Fig. 3 indicates the power exchange with the upstream network over the 24 hours of the day. Moreover, Fig. 4 shows the status of the grid in the 12th hour of the day, as an example of the time periods in which the grid congestion has been alleviated by incorporating the transactive signals. The congestion in Line-21-22 divides the grid into two sections, in which the converged prices on the left and right sides of Line-21-22 are approximately 36.65¢/kW and 20.35¢/kW, respectively. The difference between the converged prices is approximately equal to the cost of Line-21-22 (i.e., 16.20¢/kW), which shows the importance of incorporating the transactive signals to alleviate the congestion in the system. The congestion in Line-21-22 limits the power that sellers on the right side of the line can sell to other agents and the WMA. Furthermore, the selling price of the WMA is 36.65¢/kW, which approximately equals the converged selling prices of the seller agents on the left side of Line-21-22. In other words, due to the congestion, the buyer agents in the left section have to purchase power from the WMA, so the prices of the seller agents there have converged to approximately 36.65¢/kW, which can be considered the marginal price of purchasing power at hour 12. Note that the slight differences between the selling prices of the seller agents and the WMA are due to the network costs.
According to the obtained results, the developed scheme would be able to alleviate the potential congestion issue in the grid while facilitating the P2P active power management between agents.
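The figures quoted for hour 12 can be checked with a few lines of Python. The variable names are illustrative, and the 100 kW base power is inferred from the stated equivalence 6 p.u. = 600 kW rather than given explicitly in the text.

```python
PRICE_LEFT = 36.65    # converged price left of Line-21-22 (cents/kW)
PRICE_RIGHT = 20.35   # converged price right of Line-21-22 (cents/kW)
LINE_COST = 16.20     # line-usage cost of Line-21-22 (cents/kW)

price_gap = PRICE_LEFT - PRICE_RIGHT   # price difference across the line
residual = price_gap - LINE_COST       # part not explained by the line cost

BASE_POWER_KW = 600.0 / 6.0            # base implied by 6 p.u. = 600 kW

print(round(price_gap, 2), round(residual, 2), BASE_POWER_KW)
```

The price gap across the congested line is about 16.30¢/kW, which differs from the quoted line cost by roughly 0.10¢/kW and is therefore consistent with the statement that the two are approximately equal.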
In order to investigate the resource scheduling of each agent, the scheduling of resources for agent 30 is presented in this section as an example. The power generation by PV and wind power units is shown in Fig. 5. Furthermore, the power consumed by loads and the charging/discharging power of ESSs are presented in Figs. 6 and 7. Moreover, the total power traded with the other agents and the WMA is presented in Fig. 8. According to the obtained results, the agent acts as a seller at hours 1, 3, 4, 5, 7, 9, 11, 12, 15, 19, 20, 21, 22, and 23, while in the other time periods it is a buyer. Fig. 9 shows the average converged price together with the purchasing and selling prices of the WMA during the 24 hours. According to this figure, as expected, all the converged prices (except those of the 10th and 14th hours) lie between the purchasing and selling prices of the WMA since, as mentioned before, the WMA can keep the sellers' prices within this range. The converged prices in the 10th and 14th hours are outside the WMA's range because, in these hours, the lines between the upstream network and node 2 (i.e., Line-0-1 and Line-1-2) are congested. As the results show, when at least one line is congested, the WMA cannot control the prices of the sellers on the other side of the congested line, since it cannot control the cost of the congested line, which affects the prices of these sellers.
Moreover, Fig. 9 shows that at hours 10, 14, 21, 22, and 23 the power price is higher than at other hours. Therefore, the ESS discharging of agent 30 at these hours in Fig. 7 is rational, considering that agent 30 is assumed to predict the WMA prices of future hours with a 3% error. Similarly, the ESS charging of agent 30 at hours 5, 7, 11, 12, and 15 in Fig. 7 is due to lower power prices. Also, for hour 12, Fig. 9 considers only the prices of the left side of the grid, which indicates that the WMA has control only over this part of the network.
It is noteworthy that, as Fig. 7 demonstrates, the MPC method enables the agents to decide on their current ESS charging/discharging by considering future time intervals. By utilizing the MPC method, the agents strive to charge their respective ESSs when the power price is low and discharge them when the power price is high.

B. One-Hour Simulation
In this section, two cases are studied as follows:
• Case 1: the line capacities are equal to 6 p.u.; therefore, line congestion occurs while the P2P management paradigm is implemented.
• Case 2: the capacities of the grid's lines are assumed to be 10 p.u.; hence, no line congestion occurs owing to the high capacity of the lines.
The aim of this section is to compare these two cases in terms of convergence and congestion clearance.
1) Case 1: As mentioned, the system faces congestion in Case 1. Based on the step-wise flowchart presented in Fig. 2, after the termination criteria step, the overloading of the network is checked and the transactive signals are updated. In this context, the change in the loading of Line-21-22 during the implementation of the P2P scheme is shown in Fig. 10. Note that, in each step, after the transactive signal associated with the congestion is updated, the iterative procedure of optimizing the scheduling of agents and checking the termination criteria is conducted again. In this regard, the iterative price updates of seller agents 4, 9, 30, and 36 at hour 12, after updating the transactive signal associated with the congestion in Line-21-22, are presented in Fig. 11. Based on the obtained results, the agents' prices converge reasonably during the implementation of the P2P framework. According to the results, agents 4 and 9 have converged to approximately 36.65¢/kW, while agents 30 and 36 have converged to approximately 20.35¢/kW. The difference between the converged prices arises from the congestion in Line-21-22 discussed in the previous section. Furthermore, the preliminary and final loadings of the grid's lines are presented in Fig. 12, which demonstrates the congestion alleviation of the network.
2) Case 2: When line capacities of 10 p.u. are considered, the selling prices of agents 23, 28, 30, and 35, as shown in Fig. 13, converge to about 35¢/kW, which lies between the selling/purchasing prices announced by the WMA (i.e., 36.65¢/kW and 33.2¢/kW).
]. In this formulation, $\upsilon$ indicates the percentage decrement/increment of the WMA's purchase/selling prices. It is noteworthy that the presence of zero in this price interval guarantees $\lambda^{\mathrm{WMA,buy}}_{t} \geq 0$. In this section, the market prices at the 9th and 20th hours are taken from Fig. 9 as sample time intervals at which the prices of the seller agents approach their maximum and minimum limits, respectively. The simulation has been carried out for these two hours using various values of $\upsilon$, and the results are shown in Figs. 14 and 15 for the 9th and 20th hours, respectively. It should be noted that for $\upsilon = 0$, the purchasing and selling prices of the WMA are the same as in the previously presented 24-hour simulation results (as can be observed from Fig. 9). That is why the average converged price is also approximately the same as in the previous results.
According to Fig. 14, at hour 9, as the domain factor increases, the average converged price also increases and the power received from the upstream network decreases. This is because the total demand is greater than the power supply, as can be inferred from Fig. 3. Moreover, whenever the WMA widens its price interval, the market prices tend to rise until supply and demand are balanced and the power received from the upstream network becomes almost zero.
Similarly, at hour 20, since the power supply in the network is greater than the demand (which can be inferred from Fig. 3), the market prices show a decreasing trend, and their reduction with the increase of the domain factor is shown in Fig. 15. Also, the average converged price almost stops decreasing when the power exchanged with the upstream network approaches zero. Thus, the presence of the WMA provides the benefit of price control in the proposed scheme.

IV. CONCLUSION
This paper proposes a step-wise P2P management scheme to facilitate the decentralized operation of distribution systems with multi-agent structures. In this scheme, each agent of the system is able to independently schedule its own resources as well as its power transactions with the upper-level system and other agents. Furthermore, different transactive signals are developed to enable the system operator to steer the power transactions between agents in order to address the grid operational constraints (i.e., fixed costs, line loadings, and power losses). Additionally, introducing a new role into the designed P2P market (i.e., the WMA) enables power supply from the upstream network. In other words, the presence of the WMA allows power to be exchanged with the upstream network, either to trade surplus power or to purchase power that compensates for a possible shortage in the downstream P2P market. Consequently, the developed framework facilitates the efficient energy management of multi-agent systems while addressing the privacy concerns of independent agents. Finally, the proposed scheme is implemented on the modified IEEE 37-bus test system, which demonstrates the effectiveness of the proposed P2P management paradigm for the energy management of MASs while taking into account the grid's operational constraints. Moreover, the results show that the WMA prices make it possible to control the market prices within a certain range, which is another advantage of the proposed framework; inevitably, however, congestion in the network can challenge this control action, as investigated in the results.