Selection Game for Consensus-Based Decentralized Aggregators of Distributed Energy Resources in a Blockchain Ecosystem

Aggregators can be effective in organizing distributed energy resources (DERs) for smart grids and electricity markets. The recent development of blockchain and peer-to-peer (P2P) networks provides a new ecosystem for aggregating DERs. Initial studies have mainly used off-the-shelf consensus protocols, which may struggle to balance node sizes and computational intensities. Moreover, the dynamics of DERs changing their selection among multiple aggregators over time are rarely considered in the related literature. This freedom of selection, which is encouraged by the electricity market, can be better activated by a blockchain. In this study, a game-dynamic-based selection framework for multiple aggregators of DERs is proposed in a decentralized blockchain ecosystem. First, a proof-of-dual-credibility (Po2C) protocol is established so that DERs in such an aggregator can reach consensus. For one DER node, both an objective credit for unavoidable uncertainties and a subjective credit for malicious behaviors constitute its credibility with different weights. Second, a function with triple payoffs motivates DERs in terms of their characteristics as both power supply devices and P2P nodes, the latter including consensus winning and data propagation. Third, the selection game of DERs among aggregators is modeled as an evolutionary game under replicator dynamics to find equilibrium. Numerical simulations with two and four aggregators show general stability in the selection game of DERs. Performance achieved with different consensus protocols and incentives is compared as well. The framework shows great potential to organize DERs in a decentralized but aggregated mechanism in open electricity markets.


I. INTRODUCTION
A. MOTIVATIONS
Developing a smart, low-carbon power grid involves the integration of distributed energy resources (DERs), such as micro renewable energy generators, flexible loads, electric vehicles, energy storage systems, etc. [1]. The rapid diffusion of these various and numerous DERs poses a great challenge to power systems. Compared with traditional grid-side generators, these consumer-side DERs are much smaller in capacity, more geographically dispersed, and more complex to schedule due to different uncertainties [2]. Thus, aggregators are widely adopted to act as intermediaries between power system operators and DERs [3]. For instance, the Federal Energy Regulatory Commission (FERC) in the U.S. issued a notice that would require transmission organizations and system operators to create rules to enable aggregated DERs to participate in wholesale markets. An open market allows each DER to select, among many aggregators, the one in which it is willing to participate [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Vitor Monteiro.
With the help of DER aggregators, the scheduling pressure of power system operators can be shifted downstream [5]. However, aggregators themselves have to find an effective way to organize profit-seeking DER individuals and their volatile outputs. At the same time, competitors emerge in the open electricity market environment. In the traditional form of centralized organization, there must be a central unit with which each DER has to establish interactive communication, putting significant pressure on reliability and expandability. In addition, the cybersecurity of centralized aggregators must be carefully addressed [6].
Recently, blockchain-enabled ecosystems in the energy sector, especially in smart grids, have been developed, and they can be utilized as a decentralized and self-organized intermediary with a guarantee of transparency and tamper resilience [7]. This thriving technology is envisaged as promising to overcome the challenges of centralized systems. At the same time, the peer-to-peer (P2P) network allows one individual node to share resources or data directly with others, without relying on a centralized controller [8]. That is, DERs can organize as nodes that allow P2P communication among any individuals. An aggregator formed from these DERs is a typical P2P network. A blockchain-enabled P2P network can theoretically provide an equal, trusted, and open-access environment for power trading and settlement. Thus, in such a blockchain-enabled aggregator, a centralized organization is no longer needed, and DERs can autonomously organize themselves and the aggregator they select to join [9].
A growing number of studies are interested in designing blockchain-enabled energy communities. Related topics include trading energies [10], supporting P2P trades [11], motivating prosumers [12], etc. Pilot projects, e.g., the Brooklyn microgrid [13], have also been reported. Meanwhile, there are still some questions that need to be further answered, which constitute the main motivations of this paper. (Q1) Which kind of consensus should be adopted for DER peers in a blockchain-enabled aggregator? (Q2) What properties should be incentivized for DERs in a blockchain-enabled aggregator? (Q3) Do DER individuals behave as they would in an open electricity market when selecting among multiple aggregators?
B. RELATED WORKS
Some recent works are reviewed here to further frame the three questions above, which are essential to establishing a decentralized aggregator of DERs.
For an arbitrary blockchain ecosystem, one of the essential problems is to find an appropriate consensus protocol. This also corresponds to the first question, Q1, here. Individuals are expected to reach an agreement, i.e., consensus, based on specific properties. A straightforward approach is to directly adopt an off-the-shelf protocol, e.g., the Byzantine fault tolerant (BFT) protocol in [10], [12] or the proof-of-work (PoW) protocol applied in [11] and [14].
Various BFT protocols are developed to tolerate Byzantine attacks and random failures. On the other hand, the PoW protocol achieves consensus by making nodes compete to win a puzzle-solving process [15]. The different principles result in advantages and drawbacks for each of these two types of protocols, and it is difficult to strike a balance between complexity and intensity [16]. For a large population of DERs, an appropriate consensus paradigm must be carefully chosen and improved [17].
The second question, Q2, further affects the design of consensus and is therefore closely related to Q1. While traditional centralized aggregators are generally concerned with the energy characteristics of DERs, most studies with the blockchain structure tend to focus on their nature as nodes. Considering that the original PoW wastes too many resources on solving an unnecessary hash puzzle, the puzzle-solving race can be replaced by a ranking of another characteristic. Specifically, when this characteristic is a certain type of credit, a Proof-of-Credit (PoC) consensus can be developed under the PoW paradigm. For Internet-of-Things (IoT) devices, a credit-based PoW consensus is proposed in [18], where the credit is defined as their obedience to system rules. Another PoC blockchain protocol for electronic currency transactions is presented in [19], where the credit is defined as a special stake that quantifies how much the activities of nodes are beneficial to the whole system.
Although there are some consensus protocols for DERs, they are still in the early stage of development and need specific improvements. On the one hand, unlike traditional incentive models in centralized aggregators, DERs in decentralized P2P networks need to take on more functions in terms of information propagation and consensus [20], which should also be reflected. On the other hand, uncertainties in DER outputs are inevitable, so it may not be appropriate to use only this single dimension to evaluate their credits [21].
The third question, Q3, involves the dynamics in which the DERs select among multiple aggregators and the corresponding equilibrium over time. Blockchain-enabled aggregators can better establish an open environment, which has a great positive effect on non-monopolistic electricity markets in the real world.
Existing studies generally do not focus on the freedom of selection for DERs. It is a general default that DERs always participate in the same aggregator given initially and have no freedom to change, e.g., in [22] and [23]. In another case, there is only one aggregator, either for the traditional centralized structure, e.g., in [24], or the P2P structure, e.g., in [25].
Once the aggregator-selection of DERs is considered, the equilibrium dynamics should be studied as well. Although there are few studies specifically on this topic, the balance of energy production and consumption within an aggregator or among multiple aggregators can provide some examples. Game dynamics are considered in [26], while equilibrium is obtained inside one P2P trading network. By using a leading energy sharing node, the equilibrium of the demand response of DERs is achieved under a leader-follower game [27]. The evolutionary game in [28] shows the solvability of curtailed load amounts for DERs to obtain evolutionary stability.

C. CONTRIBUTIONS
This paper is motivated to fill the gaps when answering the three inevitable questions. The main contributions of this paper are organized as follows.
1) A novel framework is organized to aggregate DERs in a decentralized manner with the freedom of selection. Derived from the PoC consensus, a more comprehensive proof-of-dual-credibility (Po2C) is proposed along with its payoff mechanism. Specifically, the dual credibility covers two inherent features of DERs, as power devices and as P2P nodes, and is measured by the uncertainties that DERs may exhibit.
2) To better reflect the freedom and dynamics that DERs should have among aggregators, their aggregating process is mathematically described as game dynamics. Furthermore, the proportional imitation rule is used to obtain evolutionary stability among a population of DERs.
3) Numerical simulations verify that, in the proposed framework, free selections of DERs among multiple aggregators can be activated. This process is independent of the distributions at previous time intervals. Moreover, the prescheduled powers of aggregators and the payoffs of being power supply devices and P2P nodes for DERs are the factors that determine the equilibrium states of each selection.
The remainder of this paper is organized as follows.
To answer Q1 along with Q2, Section 2 presents the proposed framework of blockchain-enabled aggregators for DERs and formulations of the Po2C consensus protocol and the incentive model. As a response to Q3, a theoretical analysis of equilibrium for aggregator-selection dynamics by DERs is provided in Section 3. Section 4 demonstrates the case study and results. The conclusion is discussed in Section 5.

II. FRAMEWORK AND CONSENSUS OF BLOCKCHAIN-ENABLED AGGREGATORS FOR DERs
This section presents how DERs can perform and select aggregators as a blockchain ecosystem. Necessary models, i.e., consensus and incentives for DERs, are also provided.

A. FRAMEWORK
DERs include not only distributed generators (DGs) in the distribution networks, e.g., rooftop photovoltaic units, microturbines, and stationary energy storage devices, but also demand-side resources (DRs) that can adjust load profiles, e.g., electric vehicles, temperature-controlled loads, dryers, etc. DERs are often equipped with advanced controlling, metering, and communicating functionalities. Therefore, DERs can be organized autonomously through P2P and blockchain technologies for a decentralized aggregator.
The P2P links can be built over nodes in a blockchain-enabled network. The term node refers to a logical entity, which means a physical device is allowed to be associated with different functionalities and with multiple networks [16]. A node can not only provide the primary functionality of data propagation by maintaining routes but can also win the leadership competition in a decentralized consensus protocol. With the development of smart-grid-related technologies, e.g., IoT devices and the edge computing infrastructure, DERs can behave as autonomous, distributed, and multifunctional nodes in a blockchain ecosystem [18].
Blockchain networks were initially intended to be completely decentralized. Due to the explosion of computational capability, it is common for individual nodes to join a so-called mining pool to increase their chance of sharing the winning of mining competitions. Thus, many theoretical studies and practices suggest that individuals are free to choose and join different mining pools. This is similar to the organization of power scheduling, e.g., aggregators in electricity markets. The decentralized self-regulating ability enabled by the blockchain ecosystem can be adapted for aggregating autonomous DERs.
Specifically, enabled by a blockchain, an aggregator of autonomous DERs can operate as a network without any single authority. Moreover, an arbitrary DER in one of these aggregators can choose to join another aggregator as desired. The proposed framework for the DER components in a blockchain-enabled decentralized aggregator is shown in Figure 1. The following settings are introduced without disrupting the equity of decentralized DERs or being incompatible with the rules of mainstream electricity markets.
DERs can constitute nodes in the ecosystem by adapting their consensus. They are not only devices capable of supplying power but also nodes that provide the necessary network functionalities. Each becomes a peer in the network that takes part in propagation, consensus participation, etc. Codes can be implemented in DERs to make the process fully automated and unmanned.
A P2P network is created among multiple DER nodes as peers. To boost participation without undermining scalability, peers communicate in a gossip manner, similar to the Bitcoin network. Instead of communicating with a centralized node or with each other, each DER selects a small number of random peers to gossip with and does not require a full connection as topology. In addition, one peer can act as a public bulletin board between an external electricity market or grid and the aggregator, exchanging information about the power supply, electricity prices, etc. However, decisions are still decentralized and made by individuals with no centralized control node.
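The gossip-style communication described above can be illustrated with a minimal sketch. It assumes a simple push model in which every informed peer forwards a message to a fixed number of randomly chosen peers per round; the function name `gossip_rounds`, the `fanout` parameter, and the round-based timing are illustrative assumptions, not part of the proposed framework.

```python
import random


def gossip_rounds(num_nodes, fanout, seed=0, max_rounds=1000):
    """Count push-gossip rounds until a message from node 0 reaches all nodes.

    Hypothetical model: in each round, every informed node forwards the
    message to `fanout` distinct random peers (no full connectivity needed).
    """
    rng = random.Random(seed)
    fanout = min(fanout, num_nodes - 1)  # cannot gossip to more peers than exist
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes and rounds < max_rounds:
        newly_informed = set()
        for node in informed:
            others = [p for p in range(num_nodes) if p != node]
            newly_informed.update(rng.sample(others, fanout))
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(gossip_rounds(num_nodes=50, fanout=3))
```

Each peer sends only `fanout` messages per round, so the per-round traffic grows linearly with the number of informed peers rather than quadratically with the network size.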
Once such blockchain-enabled aggregators are established, DERs can freely choose to participate in an aggregator. As time changes, they can also select another aggregator. For one time interval, a DER can select only one aggregator to participate with its power supply sold entirely to it. Naturally, each DER can be seen as a rational and profit-driven individual to maximize its payoff. Thus, the Nash equilibrium among aggregators can be analyzed as an evolutionary game with stability.

B. Po2C CONSENSUS
The organization of P2P nodes is related to the consensus adopted in a blockchain. Many consensus mechanisms have emerged in the past decade for different applications. A summary of the two most widely adopted consensus paradigms in various blockchains, i.e., BFT and proof-of-concept (PoX), is provided in Table 1. BFT consensus protocols, e.g., PBFT and Paxos, are widely adopted in blockchains for government and corporate affairs. This type of protocol may award a first-mover advantage for adopters, e.g., PBFT applied in Hyperledger Fabric [29]. In some energy-related blockchains, e.g., for energy trading transactions, as aforementioned, it can also be adopted by default. However, for these BFT protocols, a fully connected topology among consensus nodes and a leader-peer hierarchy with three-way handshakes are necessary. This leads to a communication complexity of $O(n^2)$, where $n$ is the total number of nodes in a network [16]. In the past few years, with the advent of the Nakamoto protocol in Bitcoin, also known as PoW, various PoX-based protocols have rapidly emerged. In each consensus round of a PoX-based protocol, nodes prove that they have a particular capability, usually by competing to perform a complex but easily verifiable task, and the winning node obtains credit. In contrast to BFT-based protocols, P2P nodes accept the received block proposal following the longest-chain rule after they verify the validity of the block. Since no all-to-all messaging phase in three-way handshakes is needed, a PoX-based protocol may have a complexity of $O(n)$, much smaller than the $O(n^2)$ complexity in BFT [16].
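To make the gap between the two complexity classes concrete, the following sketch compares rough per-round message counts. It is a simplified count under stated assumptions: the BFT side treats one leader broadcast plus two all-to-all phases as the whole round, and the PoX side assumes each node forwards a block to a fixed number of gossip peers. Both function names and the `fanout` parameter are hypothetical.

```python
def bft_messages(n):
    """Simplified BFT round: one leader broadcast plus two all-to-all phases."""
    leader_broadcast = n - 1
    all_to_all = n * (n - 1)
    return leader_broadcast + 2 * all_to_all  # grows as O(n^2)


def pox_gossip_messages(n, fanout):
    """Simplified PoX round: every node forwards the block to `fanout` peers."""
    return n * fanout  # grows as O(n) for a fixed fanout


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, bft_messages(n), pox_gossip_messages(n, fanout=8))
```

For 100 nodes this simplified count gives 19,899 messages on the BFT side against 800 on the gossip side, which is the kind of gap that makes the PoX paradigm attractive for large DER populations.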
As shown in Table 1, the BFT paradigm may not be suitable for aggregating DERs that are widely distributed and have a large population. The PoX-based paradigm is a more appropriate solution but needs to address several issues. The first is the high degree of electricity depletion and computation intensity in the traditional PoW protocol. The second is the ability to handle subjective deceptions and objective errors. Unlike financial applications such as virtual currency, DERs may suffer from unintentional and uncontrollable errors and deviations in forecasts, fluctuations, etc. [30], which are often neglected in most existing PoX-based protocols for DERs. Thus, in the PoX-based paradigm, a Po2C consensus is organized as follows.
In PoX-based protocols, each node should prove to what extent it can contribute and win in a consensus round, e.g., through the hash rate in crypto mining. In the proposed Po2C, the required proof is determined by a dual-credibility rate, which considers two types of credits regarding objective and subjective uncertainties. For example, prediction deviations between planned and actual powers are inevitable for some DERs, and thus they belong to the objective type. As another example, as a P2P node, a DER may randomly tamper with the message that it propagates, in which case the uncertainty is of the subjective type. Four typical kinds of uncertainties are categorized in Table 2. Other types can be incorporated by adding more sub-credits. Thus, for DER node $i$, its dual-credibility rate $C_{ivt}$ when selecting aggregator $v$ at time interval $t$ is represented as a weighted combination of its credits, where $\phi_v$ and $C_{it}$ are the vectors of coefficients and credits, respectively. While the values of $\phi_v$ can be given, the credits in $C_{it}$ are updated at each consensus round according to how the DER deals with uncertainties, as in (4)-(5), where $x$ refers to the index of 1 and 2 in the respective sub-credit, $\eta^{\mathrm{obj}}$ is a penalty factor in the objective credit, $\varepsilon^{\mathrm{obj},x}_{it}$ is the occurrence rate of prediction deviations and communication failures for DER $i$ at time interval $t$, $\eta_h$ and $\eta_m$ are a reward factor and a penalty factor, respectively, and $h$ and $m$ denote the sets of honest nodes and malicious nodes.
Regardless of the aggregator chosen by DER $i$, its values of $C_{it}$ in (4)-(5) are utilized in a tamper-evident manner by a blockchain. The logarithmic and linear forms in (5) can control the rate of growth or decline in subjective credits, respectively, while the cumulative effect of historical credit decreases over time.
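As a small illustration of how a dual-credibility rate could combine the sub-credits, the sketch below computes a weighted sum of a credit vector. It assumes that $C_{ivt}$ is the inner product of the aggregator's coefficient vector and the DER's credit vector; the four-entry layout (two objective and two subjective sub-credits, following Table 2) and all numeric values are hypothetical, and the update rules in (4)-(5) are not modeled.

```python
def dual_credibility(weights, credits):
    """Weighted combination of sub-credits (assumed form of C_ivt)."""
    if len(weights) != len(credits):
        raise ValueError("weights and credits must have the same length")
    return sum(w * c for w, c in zip(weights, credits))


if __name__ == "__main__":
    # Hypothetical coefficients of aggregator v: [obj,1  obj,2  sub,1  sub,2]
    phi_v = [0.3, 0.2, 0.3, 0.2]
    # Hypothetical current sub-credits of DER i
    c_it = [0.90, 0.98, 0.80, 0.70]
    print(dual_credibility(phi_v, c_it))
```

Because the coefficients belong to the aggregator while the credits belong to the DER, the same DER can obtain a different rate in each aggregator it considers.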

C. WINNING PROBABILITY OF DERs IN Po2C
Unlike the classical PoW protocol, which solves hash puzzles with great computational resource demands, the Po2C protocol uses the dual-credibility rate to directly represent the willingness and capability of any DER that can contribute to the blockchain. After updating the quantified credits in the last consensus round, DERs in the same blockchain need to compete for the right of book-keeping in the next round. The necessary processes for a PoX-based protocol, i.e., ability proving, P2P propagating, and transaction verifying, are adapted as follows.
Most BFT-based and PoX-based protocols need to find a leader node as the first step in initiating a consensus round. In PoX-based protocols, the probability for a node to win as a leader is proportional to the ratio between its stake and the overall stakes in the whole network. The Po2C protocol reduces the waste of solving the hash function in PoW by directly quantifying each node's dual-credibility rate, which does not affect this probability. Thus, the probability of DER $i$ successfully winning as a leader in aggregator $v$ at time interval $t$, i.e., $p^{\mathrm{leader}}_{ivt}$, can be estimated as the ratio between its dual-credibility rate and the sum of the dual-credibility rates of all DERs in that aggregator, where $N_{vt}$ is the total number of DERs in aggregator $v$ at time interval $t$, and $j$ indexes these DERs.
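The stake-proportional leader election described above can be sketched as a simple normalization. This is a minimal illustration that assumes the dual-credibility rate plays the role of the stake, so each DER's leader probability is its rate divided by the sum of rates in the same aggregator; the function name and the sample rates are hypothetical.

```python
def leader_probabilities(rates):
    """Normalize dual-credibility rates into leader-winning probabilities.

    `rates` holds the (assumed positive) rates of all DERs in one aggregator.
    """
    total = sum(rates)
    if total <= 0:
        raise ValueError("the sum of dual-credibility rates must be positive")
    return [rate / total for rate in rates]


if __name__ == "__main__":
    probs = leader_probabilities([0.85, 0.60, 0.95, 0.40])
    print(probs, sum(probs))
```

A DER that joins a smaller aggregator faces fewer competitors in the denominator, which is one reason the population share of an aggregator feeds back into the payoff of selecting it.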
Then, DER nodes propagate information to neighboring nodes and hope to be confirmed. Owing to the limited efficiency of P2P networks and the lags among geographically dispersed DERs, the propagation delay is nonnegligible. Based on an empirical fitting curve for a P2P network in the PoW protocol, the average propagation time is determined by the sum of a round-trip delay and a block verification delay [24]. This propagation delay for aggregator $v$ at time interval $t$, i.e., $\tau_{vt}$, can be modeled with coefficients $a$ and $b$, which reflect the round-trip and verification properties of the P2P network, respectively. The incidence rate at which a valid block is orphaned by the propagation delay, i.e., $r(\tau_{vt})$, can be modeled by a Poisson process from a network-level probability [16], where $\rho_v$ is the mean of the Poisson distribution in aggregator $v$, whose value is fixed by the average block arrival rate in the P2P network. At the end of each round, the probability that DER node $i$ can ultimately win the consensus in aggregator $v$ at time interval $t$, i.e., $p^{\mathrm{win}}_{ivt}$, can then be derived.
DERs in blockchain-enabled aggregators can earn incentives from the perspectives of both supplying energy as power nodes and supporting transactions as consensus nodes. The latter includes participating in consensus and in the propagation of data. The three types of incentives are described below.
1) Power supply incentives. They are used to reward the successful supply of electricity and are similar to those of traditional aggregators. They can be modeled by a power supply function of DERs and the probability of being identified in the aggregator.
2) Consensus winning incentives. They are assigned to the particular node in (8) that wins the book-keeping right in a consensus round. They are also the only incentives that a blockchain provides to nodes in most PoX-based protocols. DERs with higher dual-credibility rates also have a higher chance of gaining this type of incentive.
3) Data propagation incentives. Because most nodes cannot be rewarded for winning, a propagation incentive is introduced not only to encourage DERs to keep P2P transactions active but also to attract DERs to select this aggregator. As mentioned above, the communication complexity of a PoX-based protocol is $O(n)$. Thus, this type of incentive is linearly related to the number of DERs within the aggregator.
A DER hopes to schedule the largest amount of power supply that it can within the maximum limits it can provide. Meanwhile, the aggregator is likely to have a contract with an external power market that binds the total amount. The aggregator can determine the powers that DERs can supply according to Po2C values. The power that DER $i$ can supply when it selects to participate in aggregator $v$ at time interval $t$ is given by an allocation function, in which $P_{ivt}$ is the power that DER $i$ can schedule to supply when it selects aggregator $v$ at time interval $t$, $f(v, P_{it})$ is the utility factor for DER $i$ in aggregator $v$ at time interval $t$, $P^{\max}_{it}$ is the predicted maximum power that DER $i$ can supply at time interval $t$, and $P^{\max}_{vt}$ is the total amount of power that aggregator $v$ should supply at time interval $t$.
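The consensus-winning probability discussed earlier in this subsection, which underlies the consensus winning incentive, can be sketched numerically. The sketch rests on two assumptions that are common in the blockchain literature but are not confirmed here as the exact forms used in this paper: the orphaning rate of a Poisson block-arrival process is taken as $1 - e^{-\rho \tau}$, and a DER wins only if it is elected leader and its block is not orphaned. The delay $\tau$ is passed in directly because its dependence on $a$ and $b$ is not reproduced; all names are hypothetical.

```python
import math


def orphan_rate(tau, rho):
    """Assumed Poisson form: probability that a block is orphaned after delay tau."""
    return 1.0 - math.exp(-rho * tau)


def win_probability(p_leader, tau, rho):
    """Assumed form: leader probability discounted by the orphaning rate."""
    return p_leader * (1.0 - orphan_rate(tau, rho))


if __name__ == "__main__":
    for tau in (0.0, 5.0, 10.0):
        print(tau, win_probability(p_leader=0.25, tau=tau, rho=0.1))
```

Under these assumptions a longer propagation delay lowers the winning probability of every DER in the aggregator, so large aggregators with slow propagation become less attractive.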
Some customized types of DERs, e.g., electric vehicles, energy storage devices, and temperature-controlled loads, can also be utilized in (11) with more constraints, e.g., as in [32].
Therefore, for one DER, by simultaneously being a power provider, a consensus-winning node, and a communicating peer, it can expect a comprehensive payoff, described in (12)-(15), where $R(v, P_{ivt})$ is the payoff when DER $i$ selects aggregator $v$ and provides $P_{it}$ at time interval $t$, $\lambda_t$ is an external market clearing price at time interval $t$, $\alpha_{vt}$ is a price that rewards power supply in aggregator $v$ at time interval $t$, $\beta_v$ and $\gamma_{vt}$ are the given rates in the market clearing price that reward consensus leader winning and P2P propagation in aggregator $v$, respectively, $\gamma^{\max}_v$ is the maximum rate to a market clearing price that rewards all the P2P gossips, and $z_{ivt}$ is the number of nodes that DER $i$ gossips to in blockchain-enabled aggregator $v$ at time interval $t$.
To focus on how DERs select among multiple aggregators and the feasibility of changing their selections, biddings of aggregators and locational differences in an electricity market are not introduced. Thus, a unified external market clearing price for all aggregators is given as known. Equation (13) ensures that the benefits received by an aggregator from the external electricity market are fully shared by DERs in it.
In (12)-(15), $P^{\max}_{it}$, $\lambda_t$, $\beta_v$, and $\gamma^{\max}_v$ are taken as parameters instead of variables, while $\alpha_{vt}$ and $\gamma_{vt}$ are constrained variables. In the last term of (12), the propagation incentive is linearly positively correlated to $z_{ivt}$ since the complexity of the PoX-based protocol is $O(n)$.
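The three incentive terms can be combined in a short sketch. The exact expressions of (12)-(15) are not reproduced here, so the additive form below is an assumption built only from the symbol definitions: a supply term priced at $\alpha_{vt}$, a consensus term equal to the rate $\beta_v$ times the clearing price times the winning probability, and a propagation term that is linear in $z_{ivt}$. The function name, argument names, and numeric values are hypothetical, and the constraints on $\alpha_{vt}$ and $\gamma_{vt}$ are not enforced.

```python
def triple_payoff(power_kw, alpha, lam, beta, p_win, gamma, gossip_peers):
    """Assumed additive payoff of one DER in one aggregator.

    supply:      alpha * power_kw            (power supply incentive)
    consensus:   beta * lam * p_win          (expected consensus winning incentive)
    propagation: gamma * lam * gossip_peers  (linear in the number of gossip peers)
    """
    supply = alpha * power_kw
    consensus = beta * lam * p_win
    propagation = gamma * lam * gossip_peers
    return supply + consensus + propagation


if __name__ == "__main__":
    print(triple_payoff(power_kw=10.0, alpha=0.5, lam=1.0,
                        beta=2.0, p_win=0.1, gamma=0.01, gossip_peers=8))
```

In this form, one extra gossip peer raises the payoff by exactly `gamma * lam`, which mirrors the linear relation to $z_{ivt}$ stated for the last term of (12).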

III. SELECTION GAME AMONG AGGREGATORS
This section studies how DER individuals select a specific aggregator that they prefer. The equilibrium process of DERs in multiple aggregators is organized as an evolutionary game with additional evolutionary stability.

A. GAME DYNAMICS OF DERs AND AGGREGATORS
Each DER seeks to select one aggregator by comparing its expected payoffs in different aggregators, and it is rational to regard the selection with the highest payoff as better. This process differs from the traditional noncooperative game that analyzes how players behave through static solution concepts, in which no individual has a unilateral incentive to change their behaviors [33]. Evolutionary game theory is suitable for describing the dynamics in these DERs and blockchain-enabled aggregators.
From the perspective of the evolutionary game, DERs can be taken as one single large-populated, well-mixed species in the aggregator. Their rates of reproduction, i.e., fitness, are translated as payoffs that each DER individual tries to maximize [34]. The game dynamics can model how DER individuals change their strategy for selecting aggregators and finally achieve stability.
The evolutionary game dynamics for DERs to select an aggregator at time interval $t$ can be mathematically defined with the 4-tuple $G_t = \langle \mathcal{I}, \mathcal{V}, \Omega_t, \{R(v, P_{it}; \Omega_t)\}_{v \in \mathcal{V}} \rangle$ as follows.
1) $\mathcal{I} = \{1, 2, \ldots, i, \ldots, I\}$ is the finite population of DER individuals.
2) $\mathcal{V} = \{1, 2, \ldots, v, \ldots, V\}$ is the finite set of aggregators, which represent the strategies that are available for DERs to select.
3) $\Omega_t \equiv \{(\omega_{1t}, \omega_{2t}, \ldots, \omega_{vt}, \ldots, \omega_{Vt}) \mid \sum_{v=1}^{V} \omega_{vt} = 1,\ 0 \le \omega_{vt} \le 1\}$ is the vector of population states at time interval $t$, where $\omega_{vt}$ indicates the proportion of the DER population selecting aggregator $v$ at time interval $t$. The relationships $\sum_{v=1}^{V} N_{vt} = I$ and $\omega_{vt} = N_{vt}/I$ hold.
4) $\{R(v, P_{it}; \Omega_t)\}_{v \in \mathcal{V}}$ is the set of each DER payoff in each aggregator with different population states, in which each value can be calculated by (12).
VOLUME 10, 2022
The classical replicator dynamics (RD) is suitable to interpret this proposed noncooperative aggregator selection dynamics [35]. It describes how DER individuals, called replicators, make rational decisions by observing and transferring to aggregators that presently provide a higher payoff. For an arbitrary aggregator, the rate of changing the selection strategy by the population of DER individuals is called the per capita growth rate, which can be derived from the difference between the expected payoff and the population's average payoff [36]. The dynamics for the evolution of the population states of aggregators $\forall \omega_{vt} \in \Omega_t$ can be organized as an ordinary differential equation, i.e., the replicator equation in (16), together with the average payoff in (17), where $\dot{\omega}_{vt}$ is the per capita growth rate of aggregator $v$ at time interval $t$ and $\bar{R}(\Omega_t)$ is the average payoff of a DER selected at random, i.e., the mean payoff of the whole population. The naive evolutionary logic determines that a DER individual switches to the candidate strategy only if its payoff is higher than the payoff of its current strategy. Thus, a more straightforward way to interpret (16) and (17) is as follows.
1) A scenario where $\dot{\omega}_{vt} > 0$: The fitness (i.e., payoff) from selecting $v$ is above average; then, DER individuals will change their selections to aggregator $v$, so the $\omega_{vt}$ for aggregator $v$ will increase in the population.
2) A scenario where $\dot{\omega}_{vt} < 0$: Based on the same logic, the dynamics are opposite to those in scenario 1), so the $\omega_{vt}$ for aggregator $v$ will decrease.
3) A scenario where $\dot{\omega}_{vt} = 0$: There is no evolutionary motivation for the corresponding DERs to change their selected aggregators.
The flowchart for the game among DERs to select aggregators is shown in Figure 2. In (16), the per capita growth rate of aggregator $v$ is determined by the difference between the payoff obtained when aggregator $v$ is selected and the average payoff of the whole set of DERs. The average payoff in (17) is taken over the selection strategies of all the aggregators and is determined by the payoffs and the population proportion in each aggregator. Features of DER individuals in a blockchain environment, i.e., autonomous, noncooperative, etc., can be fully retained in the evolutionary dynamics in (16) and (17). According to the theory of dynamic systems, trajectories in (16) leave the interior of $\Omega_t$ invariant as well as each of its faces [37].
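The three scenarios above can be observed in a small numerical sketch of replicator dynamics. It uses the textbook replicator form, in which the share of a strategy grows in proportion to its current share times the gap between its payoff and the population average, integrated with forward Euler steps. The payoff function is a hypothetical stand-in in which each aggregator shares a fixed budget equally among its members; it is not the payoff in (12), and the step size and budgets are illustrative.

```python
def replicator_step(state, payoffs, dt):
    """One forward-Euler step of the textbook replicator equation."""
    average = sum(w * r for w, r in zip(state, payoffs))
    stepped = [w + dt * w * (r - average) for w, r in zip(state, payoffs)]
    total = sum(stepped)
    return [w / total for w in stepped]  # renormalize against rounding drift


def shared_budget_payoffs(state, budgets):
    """Hypothetical payoff: a fixed budget split among an aggregator's members."""
    return [b / w for b, w in zip(budgets, state)]


def evolve(state, budgets, dt=0.05, steps=500):
    """Iterate the dynamics from an interior initial state (all shares > 0)."""
    for _ in range(steps):
        payoffs = shared_budget_payoffs(state, budgets)
        state = replicator_step(state, payoffs, dt)
    return state


if __name__ == "__main__":
    print(evolve([0.5, 0.5], budgets=[2.0, 1.0]))
```

With budgets of 2 and 1, the shares settle where both aggregators offer the same per-member payoff, i.e., at two thirds and one third, at which point no DER has a reason to switch.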

B. EQUILIBRIUM WITH EVOLUTIONARY STABILITY
The stabilization of this dynamical system is analyzed considering that the payoff of a DER depends on the selections of other DERs. Considering all the replicator equations in the whole DER population, an evolutionarily stable strategy (ESS) for the replicator dynamics is reached, which means that none of the populations will evolve. A fixed point is defined for the ESS such that the scenario with $\dot{\omega}_{vt} = 0$ for $\forall v \in \mathcal{V}$ is satisfied. It is often described as a Nash equilibrium (NE) with additional ESS properties [38]. To obtain the NE of this game, $G_t = \langle \mathcal{I}, \mathcal{V}, \Omega_t, \{R(v, P_{it}; \Omega_t)\}_{v \in \mathcal{V}} \rangle$, there are two necessary theorems, i.e., the theorem for NE and the theorem for the ESS, and their definitions need to be stated [39].
Theorem for NE. The replicator equation for an evolutionary game satisfies the following: i) a stable fixed point is in NE and ii) a convergent trajectory in the interior of the strategy space evolves toward NE.
The definition of NE is as follows. A population state $\Omega_t^* = (\omega^*_{1t}, \omega^*_{2t}, \ldots, \omega^*_{vt}, \ldots, \omega^*_{Vt})$ is the NE state of this evolutionary game $G$ if the inequality in (18) holds, where $\Omega_t$ represents all the feasible population states and $R(v, P_{it}; \Omega_t)$ and $R(v, P_{it}; \Omega_t^*)$ are the vectors for payoffs of aggregator $v$ in population states $\Omega_t$ and $\Omega_t^*$, respectively.
The theorem for NE is the application of the traditional folk theorem in evolutionary game theory. NE is determined by the fixed point of the replicator dynamics. Next, the stability of the NE state, i.e., the ESS, is further explored with the generalization form of evolutionary games.
Theorem for the ESS. For the whole population of DERs: i) $\Omega_t^*$ is an ESS if and only if $R(v, P_{it}; \Omega_t^*) > R(v, P_{it}; \Omega_t)$ for all population states $\Omega_t$ that are sufficiently close but not equal to $\Omega_t^*$ and ii) an ESS $\Omega_t^*$ in the interior of $\Omega_t$ is a globally asymptotically stable fixed point of the replicator equation.
The ESS is defined as follows. Population state $\Omega_t^* = (\omega^*_{1t}, \omega^*_{2t}, \ldots, \omega^*_{vt}, \ldots, \omega^*_{Vt})$ is an ESS of this evolutionary game $G$ if (19) holds for a neighboring state that is sufficiently close but not equal to $\Omega_t^*$. For an arbitrary aggregator $v$, assume that there is another population state $\omega_{vt}$ that attempts to intrude upon $\omega^*_{vt}$ by attracting a fringe of DER individuals to switch. Based on (19), this $\omega_{vt}$ could be an ESS if the inequality in (20) holds, where $R(v, P_{it}; \omega_{vt})$ and $R(v, P_{it}; \omega^*_{vt})$ are the payoffs for DER $i$ with power $P_{it}$ in aggregator $v$ when aggregator $v$'s population states are $\omega_{vt}$ and $\omega^*_{vt}$, respectively. The implications from (18) to (20) demonstrate that the RD ends up in NE, and not just any NE state but an ESS. Proofs based on the Lyapunov function for these two theorems can be found in [37], [39].

C. AGGREGATOR SELECTION PROCEDURE BY DERs
The procedure for evaluating the evolution of DER aggregator selection strategies is given in Table 3. It begins by assuming that each DER individual occasionally selects one aggregator at random. By comparing its payoff with the average payoff, each DER determines whether it needs to change the aggregator it currently selects. If so, the DER switches to another aggregator in the next iteration. When there are more than two blockchain-enabled aggregators, which is very likely, a random switch choice is less efficient. A more efficient switch choice, which considers multiple simultaneous comparisons based on a probabilistic criterion, is proposed in (21), where $k$ is the iteration number, $\rho_{i,v \to u,t}(k)$ is the probability of DER $i$ in aggregator $v$ switching to aggregator $u$ for time interval $t$ in iteration $k$, and $K$ is the maximum number of iterations to find equilibrium.
Criterion (21) is adapted from the pairwise comparisons for proportions, i.e., a proportional imitation rule [40], which is widely used in population games. In particular, DER individuals imitate strategies with a higher payoff with a probability that is proportional to the expected payoff obtained by switching to another aggregator.
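The proportional imitation idea described above can be illustrated with a minimal sketch. It is a generic textbook-style rule, not a reproduction of criterion (21): the function names, the `scale` normalization, and the example payoffs and population shares are all hypothetical. A DER in aggregator `current` meets aggregator `u` with a probability equal to that aggregator's population share and switches only if the payoff there is higher, with a probability proportional to the payoff gain.

```python
import random


def switch_probabilities(payoffs, shares, current, scale):
    """Hypothetical proportional imitation rule (an assumption, not criterion (21)).

    payoffs[u] : payoff a DER would obtain in aggregator u
    shares[u]  : fraction of the DER population currently in aggregator u
    scale      : normalizer, at least the largest payoff gap, so that the
                 switching probabilities sum to at most 1
    """
    probs = {}
    for u, payoff_u in enumerate(payoffs):
        if u == current:
            continue
        if scale <= 0.0:
            # All payoffs are equal: nobody has a reason to switch.
            probs[u] = 0.0
            continue
        # Only strictly better aggregators are imitated.
        gain = max(payoff_u - payoffs[current], 0.0)
        probs[u] = shares[u] * gain / scale
    return probs


def next_aggregator(payoffs, shares, current, scale, rng):
    """Draw the aggregator selected in the next iteration."""
    draw = rng.random()
    cumulative = 0.0
    for u, p in switch_probabilities(payoffs, shares, current, scale).items():
        cumulative += p
        if draw < cumulative:
            return u
    return current  # no switch


# Illustrative numbers only.
payoffs = [4.0, 6.0, 5.0]
shares = [0.5, 0.3, 0.2]
scale = max(payoffs) - min(payoffs)
print(switch_probabilities(payoffs, shares, 0, scale))
print(next_aggregator(payoffs, shares, 0, scale, random.Random(0)))
```

With these illustrative numbers, a DER in the first aggregator switches to the second with probability 0.3 and to the third with probability 0.1, while a DER already in the best-paying aggregator never switches.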
The final result of evolution is that every DER individual uses the strategy with its highest payoff. The evolutionary game is solved locally by DERs, and the parameters they obtain and the choices they finally make are secured by the blockchain. A further advantage of the blockchain environment is that it keeps truthful and open records of information. General underlying technologies for blockchains, e.g., cryptographic data structures, datagram transfer protocols, and distributed ledger storage, are off-the-shelf technologies and thus are not covered here.

IV. CASE STUDY
the latter complements the historical values [41]. The rated power of DERs $P_i^{\max}$ is set from $U(5, 25)$ with a unit of kW, which is a very common range for distributed photovoltaics, electric vehicles, flexible loads, etc. [42]. The expected outputs $P_{it}^{\max}$ are then sampled from $U(0.4P_i^{\max}, P_i^{\max})$. For objective uncertainties, the probabilities for inevitable errors in DER outputs and for communication failures, i.e., $\varepsilon_{it}^{\text{obj},1}$ and $\varepsilon_{it}^{\text{obj},2}$, are set as the normal distributions $N(0.5P_{it}^{\max}, 2^2)$ and $N(0.02, 0.01^2)$, respectively. For subjective uncertainties, the probability that DER $i$ behaves maliciously at $t$ is assumed to fit a gamma distribution $\Gamma(3, 2)$ with a unit of % [43].
For aggregators, parameter settings are as follows. P2P-propagation-related parameters $a + b$ and $\rho_v$ are set to 1,000 and 2, respectively [35]. The values of $P_{vt}^{\max}$ in different cases will be given along with simulations. The external market clearing price $\lambda_t$ is shown in Table 5 [44]. The empirical distribution of $z_{ivt}$ fits a power-law distribution with an exponent value of 1.2 [45]. The default parameters in each aggregator, including penalty factors in (2) and (4)-(5) as well as unit prices for rewards in (12), are listed in Table 6. The rules for setting penalty factors in (2) are explained as follows. First, from the perspective of trustworthiness, which is encouraged by blockchains, the objective uncertainties are less hostile than subjective uncertainties, so that the values of ϕ

Simulations are conducted in Jupyter Notebook on the Anaconda platform with an Intel Core i7 CPU at 2.4 GHz with 8 GB of memory. The maximum number of iterations is set to $K = 1{,}000$.
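The DER-side distributions listed in the parameter settings can be sampled with a short sketch such as the following. It is not the authors' simulation code. Several readings are assumptions: the second arguments of the normal distributions are taken as variances (standard deviations of 2 and 0.01), the gamma distribution is taken as shape 3 and scale 2, and the function and dictionary key names are hypothetical.

```python
import random


def sample_der(rng):
    """Draw one DER's parameters from the distributions stated in the text."""
    p_rated = rng.uniform(5.0, 25.0)                  # rated power, kW
    p_expected = rng.uniform(0.4 * p_rated, p_rated)  # expected output, kW
    # Objective uncertainties; the standard deviations 2 and 0.01 assume the
    # text's N(., 2^2) and N(., 0.01^2) give variances.
    eps_obj_1 = rng.gauss(0.5 * p_expected, 2.0)
    eps_obj_2 = rng.gauss(0.02, 0.01)
    # Subjective uncertainty in percent; the shape/scale reading of (3, 2)
    # is an assumption.
    malicious_pct = rng.gammavariate(3.0, 2.0)
    return {
        "p_rated_kw": p_rated,
        "p_expected_kw": p_expected,
        "eps_obj_1": eps_obj_1,
        "eps_obj_2": eps_obj_2,
        "malicious_pct": malicious_pct,
    }


rng = random.Random(2022)  # fixed seed so that runs are repeatable
population = [sample_der(rng) for _ in range(5)]
for der in population:
    print({name: round(value, 3) for name, value in der.items()})
```

If the gamma parameters were instead meant as shape and rate, the second argument of `gammavariate` would be 0.5 rather than 2.0.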

B. EVOLUTIONARY ANALYSIS FOR AGGREGATOR SELECTIONS
Because the NE has a significant effect on the proposed framework and on the results for Q1 and Q2, Q3 is answered first by verifying the theoretical analysis of the game dynamics. In this subsection, to observe the evolution with different parameters, the $P_{vt}^{\max}$ values in aggregators are set large enough that no curtailment is triggered in (11).
Case A with two aggregators is observed first. The initial DER populations in the two aggregators are set to (75%, 25%), (50%, 50%), and (10%, 90%). The iteration processes of the DER populations in each aggregator at $t = 1$ are depicted in Figure 3. The ESSs at $t = 1$ in all three simulations are approximately 66% and 34%, which shows that the equilibrium results are not significantly correlated with the initial distributions. This indicates the feasibility of launching the proposed framework, which allows DERs to select freely among aggregators.

DER selection strategies continue to evolve over time. Figure 4 takes $t = 9$ and $t = 10$ as examples. When the parameters in the aggregators remain unchanged, the population states change from (66%, 34%) at $t = 1$ in Figure 3(a) to (82%, 18%) at $t = 9$ in Figure 4(a). The ESS point may change due to the accumulation of credit values and the fluctuation of powers. At the very next time interval after Figure 4(a), i.e., $t = 10$, aggregator $v = 2$ increases the ratios of incentives for being P2P nodes, i.e., $\beta_v$ and $\gamma_v^{\max}$. As shown in Figure 4(b), the situation in which many DERs select aggregator $v = 1$ changes immediately, and the ESS point becomes (43%, 57%). This is most likely because, with a limited total incentive, the larger the number of DERs in an aggregator, the smaller the propagation incentive that each individual can share and the lower its probability of winning consensus. Once aggregator $v = 2$, which has fewer DERs, raises the values of $\beta_v$ and $\gamma_v^{\max}$, the DERs that select it are more likely to receive an increase in total payoff. The slope fields in Figure 4(c) and Figure 4(d) show that, at the equilibrium, no individual DER can gain by unilaterally changing its selection, which is consistent with the theoretical analysis of NE. The framework is therefore able to support free selection by DERs with an obtainable ESS point when an electricity market requires openness.

Case B with four aggregators is simulated to further test the generality.
As mentioned above, the initial population distributions do not affect the NE results. Thus, the initial population states can be set at random. Figure 6 depicts the evolution processes of DER population states and per capita growth rates at the next time interval. Since aggregator $v = 3$ sets a higher ratio of incentives for propagation, i.e., $\gamma_v^{\max}$, its population grows and exceeds that of aggregator $v = 2$, whose other parameters are the same, despite its lower initial share. In general, the per capita growth rates in Figure 6(b) converge to zero in a process consistent with Figure 6(a). This indicates that when the number of aggregators with different parameters increases, the proposed algorithm still obtains feasible solutions for the evolutionary game.
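The qualitative behavior reported in this subsection, i.e., convergence to the same interior equilibrium from different initial shares with per capita growth rates tending to zero, can be reproduced with a toy discrete-time replicator sketch. The payoff model below is a hypothetical stand-in for (12): each aggregator's payoff simply decreases linearly with its own population share, mimicking a limited incentive being shared among more DERs. The coefficients are chosen only so that the interior fixed point, (2/3, 1/3), is easy to verify by hand, and the step size is an assumption.

```python
def payoffs(shares, base, crowding):
    """Hypothetical payoff: decreases as more DERs crowd into an aggregator."""
    return [b - c * x for b, c, x in zip(base, crowding, shares)]


def replicator(shares, base, crowding, step=0.05, iterations=1000):
    """Discrete-time replicator dynamics on the aggregators' population shares.

    Returns the final shares and the per capita growth rates (payoff minus
    population-average payoff) at the final state.
    """
    x = list(shares)
    for _ in range(iterations):
        pi = payoffs(x, base, crowding)
        average = sum(xv * pv for xv, pv in zip(x, pi))
        # Shares grow when their payoff exceeds the population average.
        x = [xv + step * xv * (pv - average) for xv, pv in zip(x, pi)]
    pi = payoffs(x, base, crowding)
    average = sum(xv * pv for xv, pv in zip(x, pi))
    growth = [pv - average for pv in pi]
    return x, growth


# Illustrative coefficients: payoffs equalize where 3 * x1 = 6 * x2.
base = [10.0, 10.0]
crowding = [3.0, 6.0]
for start in ([0.75, 0.25], [0.50, 0.50], [0.10, 0.90]):
    final, growth = replicator(start, base, crowding)
    print(start, [round(v, 4) for v in final], [round(g, 6) for g in growth])
```

At the fixed point both payoffs coincide, so no DER gains by switching, which is the NE condition discussed in the theoretical analysis. The same function accepts longer `base` and `crowding` lists for more than two aggregators.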

C. PERFORMANCE WITH DIFFERENT CONSENSUSES AND INCENTIVES
After showing that the evolutionary processes of selection are not a barrier to the proposed framework, more comparisons are analyzed to answer Q1 and Q2, focusing on the consensus and incentives. Case A with constrained $P_{vt}^{\max}$ is taken as the environment for this subsection. First, consensus protocols are compared, while the triple-incentive payoff function and game dynamics in the framework remain unchanged. Using different values for subcoefficients ϕ To distinguish it from Po2C, the consensus that does not measure objective uncertainties is referred to as Po1C here. Accommodating uncontrollable objective uncertainties in a consensus helps attract a larger population of DERs to an aggregator. Corresponding to the adjustment in the consensus and the resulting changes in DER populations in Figure 7, the total power outputs in aggregators change as well. The reduction in (11) applies only to the preschedule; if the actual output exceeds it, the output is still retained in the cumulative measurement. Moreover, because the impact of a power surplus can be removed by curtailment, it is the shortage that affects the performance of aggregators in the external market.
The prescheduled and actual cumulative powers are shown in Figure 8. In general, the actual cumulative powers are close to the prescheduled powers in each aggregator, regardless of the variations in the DER population. This demonstrates the validity of the proposed framework. The overall differences using the original Po2C with ϕ For aggregator $v = 1$ in Figure 8(a), the increase in the power deficit is more evident with Po1C, which may be caused by too few DERs selecting it. Meanwhile, the fluctuation in the power differences in Figure 8 Figure 8(b), as the population of DERs grows with Po1C, the power deficit continues to decrease, but the surplus is higher. This may be because DERs with more objective uncertainties prefer the Po1C consensus. In addition, the prescheduled cumulative powers also affect the population states. The higher the value is, the more DERs choose that aggregator, e.g., aggregator $v = 1$ at time interval $t = 3$ and aggregator $v = 2$ at time interval $t = 2$. This is probably because less power will be curtailed by (11). However, because external electricity markets evaluate the deviations of aggregators, an aggregator should not pursue a larger population share in Figure 7 alone but should also weigh the power differences in Figure 8.
Relationships between DER populations and cumulative powers can be further observed by examining Figure 7 and Figure 8 together. Comparisons within an aggregator show that if it overemphasizes the penalty for objective uncertainties, i.e., uses a Po2C with higher $\phi_v^{\text{obj},1}$ and $\phi_v^{\text{obj},2}$ for credits, an insufficient DER population results, which instead exacerbates the bias in cumulative powers. Comparisons among aggregators suggest that when the prescheduled cumulative power of an aggregator is limited, not all DERs select the aggregator that places less emphasis on objective credits. This may also be the reason why the populations in NE are not disproportionately large in Figure 7.

Second, different incentives are removed from the payoff for aggregator $v = 1$, while Po2C and the game dynamics remain. The payoff function (12) with triple incentives is replaced first by double incentives that reward both power supply and consensus leader winning and then by a single incentive that rewards power supply only. Figure 9 shows the corresponding population states and differences in cumulative powers. The performance of aggregator $v = 1$ shows that if the P2P propagation incentive is removed, the number of DERs in it decreases, and a shortage emerges. If the incentive for winning the leader election is further removed, greater deviations appear in the aggregated power outputs. In particular, these deviations are unfavorable shortages, which cannot be curtailed in the way that a power surplus can. This indicates that, considering the need for consensus and data propagation in the blockchain ecosystem, a comprehensive payoff function helps the organization of aggregators.

V. CONCLUSION
This study focuses on providing a decentralized framework for aggregating DERs, in which DER individuals can freely and dynamically select the aggregators they participate in and achieve dynamic equilibrium. While the blockchain ecosystem provides necessary transparency and security, a Po2C consensus protocol for the P2P network is proposed, in which quantitative models are separately given for objective credits and subjective credits. The dual properties of DERs as power nodes and information nodes are also considered in an incentive model. The game dynamics for DER selection among multiple aggregators are studied, and then, an algorithm using pairwise comparisons is proposed to find equilibrium. Simulation results show that the proposed framework effectively enables DERs to organize themselves autonomously with evolutionary stability. The proposed consensus is fully decentralized and thus does not require centralized control from aggregators.
Because this study focuses on consensus and game dynamics, the interaction between aggregators and electricity markets or power grids is simplified. Moreover, most electricity markets currently allow DERs to change selections only at the end of a trading day, which is simplified in the simulation by allowing them to change after each time interval. In future studies, the bidding processes of decentralized aggregators in electricity markets can be further modeled, and market clearing prices as variables determined by the bilevel equilibrium can be investigated. Along with the bidding processes, more detailed models for uncertainties, e.g., RES production profiles, different load shift ratios, and various communication failures, can be further introduced. In addition, the optimization of parameters in the consensus for various components of DERs is an important issue that is worth studying in future work.
YUQING BAO (Member, IEEE) was born in Zhenjiang, China, in 1987. He received the Ph.D. degree from Southeast University (SEU), Nanjing, China, in 2016. He has been working as a Faculty Member at Nanjing Normal University (NJNU), since 2015. His current research interests include power system operation and scheduling, power demand side management, and the frequency control of the power systems.