A Fair and Scalable Mechanism for Resource Allocation: The Multilevel QPQ Approach

In this paper, the problem of distributing resources among a collection of users (or players) is explored. These players have independent preferences over these resources and can be dishonest about their preferences in order to increase their utility (their preference for the resources they are allocated). The objective is to design a mechanism that allocates resources to players so that all of them get the same amount of resources (fairness), the total utility is maximized (optimality), and no player has an incentive to be dishonest (strategy proofness). Santos et al. proposed the Quid Pro Quo (QPQ) mechanism to solve this problem. In this paper, a generalization of the QPQ mechanism is proposed that, in addition to the above properties, has a very high degree of scalability. The proposed multilevel QPQ mechanism divides the players into disjoint clusters and runs a mechanism similar to QPQ inside each cluster and across selected players from each cluster. As a consequence, the amount of communication required is drastically reduced. Similarly, the storage used by each player is also significantly reduced, which in a practical setting can be used to improve the ability to detect dishonest players.


I. INTRODUCTION
A. MOTIVATION
Resources need to be assigned to users in many situations. A resource could be, for instance, the processing capacity of a computer system, the power of wireless transmitters, or the bandwidth of communication paths. The way in which these resources are allocated to users determines the performance of the system. Therefore, a lot of research has been devoted to mechanisms that achieve an efficient and fair resource allocation [1]-[5].
Resources could be assigned under the assumption that a central agent establishes optimal allocation policies that the users follow (this can also be seen as a situation where users coordinate). However, current telecommunication networks are decentralized, and the users present in the system very often make self-interested decisions [6]-[8]. Therefore, from a practical point of view, it is crucial to design mechanisms that assign resources to selfish users, or players, in an efficient manner. (The associate editor coordinating the review of this manuscript and approving it for publication was Shihong Ding.)
The Quid Pro Quo (QPQ) mechanism [9] is a distributed resource allocation algorithm without payments with which a set of resources is allocated to a set of users. Players declare their preferences for the resources, and each resource is assigned to the player with the largest declared value. The main particularity of the model is that each player can misreport its preferences, i.e., users can cheat to get more resources. To prevent that, the mechanism checks that the preferences declared by a player follow a uniform distribution. If this test fails, the declared preference is replaced with a random value. One of the main results of [9] states that this mechanism is fair in the sense that the expected utility of a player that declares true preferences is larger than the expected utility of a player that declares false preferences. This implies that, even though users can cheat on the declared values, they do not benefit from doing so. The main disadvantage of the QPQ mechanism is that the exchange of information is very large when the number of players is large. This is because the implementation of this mechanism requires that each user send its preference for each resource to the rest of the players, and these messages have to be processed. As a result, the QPQ mechanism [9] presents a clear scalability problem, i.e., its implementation is very expensive in communication and computation when the number of players is large. (VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
To overcome the aforementioned problem, a multilevel approach to the QPQ mechanism is studied in this article. The idea behind this model is presented now. The players are divided into clusters and, in each cluster, the QPQ mechanism of [9] is run independently to determine that cluster's winner, i.e., the player with the largest declared value for each resource. Then, the QPQ mechanism is applied again to the set of players formed by the winners of the clusters to determine who gets the resource. The mechanism used among winners has to be adapted, because all the players of the same cluster operate as a single player whose declared preferences follow a Beta distribution. Using this approach, preferences need to be exchanged only among the players that belong to the same cluster and with the winners of the rest of the clusters. This is a clear advantage with respect to the mechanism of [9], since the amount of information to be shared by the users is much smaller.

B. CONTRIBUTIONS
In this work, the goal is to show that the nice properties of the QPQ mechanism of [9] are complemented in the multilevel QPQ approach with some additional new features. Specifically, the main contributions of this article are the following:
• First, the features of the multilevel QPQ are studied when the cheating detection is perfect. It is shown that multilevel QPQ is optimal in the sense that, when players are honest (i.e., they declare their true preferences), no mechanism achieves larger total utility.
• Second, it is also shown that the multilevel QPQ mechanism is strategy proof, i.e., any cheating strategy leads to an expected utility that is not larger than that of being honest. Finally, it guarantees fairness in the assignment of resources and in their expected value.
• The benefits of the multilevel QPQ approach with respect to QPQ are also studied when all the players are honest. First, the communication cost is analyzed, and it is shown that the reduction in the volume of communication is largest when the number of clusters equals the square root of the number of players. It is also shown that providing the same memory to multilevel QPQ and QPQ leads to a significant improvement in the number of values that are used to detect dishonest behaviour and in the expected utility.
• In the numerical experiments, a system with honest and dishonest players (i.e., players that do not declare their real preferences) is considered in which the detection of cheating behavior is not perfect, and its performance is evaluated. The goal is to compare the utilities achieved by the multilevel QPQ approach with those of the QPQ mechanism of [9]. The main conclusion from this analysis is that, for honest players, the expected utility of both mechanisms is very similar, whereas for dishonest players the expected utility of the multilevel approach is smaller than that of the QPQ mechanism of [9]. This means that the multilevel QPQ approach penalizes dishonest players more.

C. STRUCTURE
The rest of the article is organized as follows. First, the QPQ mechanism is described in Section II and the multilevel QPQ mechanism in Section III. Then, the main results of our work are presented: in Section IV, it is shown that multilevel QPQ is strategy proof, and the utility of honest and dishonest players is obtained analytically under different assumptions. In Section V, an analytical study of the benefits of multilevel QPQ with respect to QPQ [9] is presented. In Section VI, the simulations that have been carried out are presented. Finally, the related work is discussed in Section VII and the main conclusions of this work in Section VIII.

II. THE QPQ MECHANISM
We define the following resource allocation problem:
Definition 1 [9]: The resource allocation problem is a tuple ⟨R, N, Θ⟩, where:
1) R = {r_1, r_2, . . .} is a (potentially infinite) ordered list of resources.
2) N = {1, 2, . . . , n} is a set of nodes or players, where n > 1 is assumed to be finite. We assume that players are well identified.
3) Θ = (θ_j)_{j∈N} is a vector of continuous random variables, where θ_j represents the preferences of player j for the different resources. This information is private, i.e., it is only known by player j. The preference of j for a resource r is denoted θ_j(r).¹
A solution of a resource allocation problem ⟨R, N, Θ⟩ assigns each resource to a single player. The utility of a player j is then the sum of the preferences θ_j(r) of the resources it gets, and the utility of the resource allocation system is the sum of the utilities of all players. The objective is to find solutions that maximize the system utility.
We assume that the players' preferences are independent. This means that (1) for a resource r, the preferences θ_1(r), . . . , θ_n(r) are mutually independent, and (2) the preferences θ_j(r_s) and θ_j(r_t) of the same player j for different resources r_s and r_t are also independent. Moreover, it may not be possible to compare these preferences with each other, as each player could measure this parameter in her own metric and units. This may imply that the system utility cannot be computed or that its value is meaningless. (There can be many factors that influence a player's preference, and they can act in different ways depending on her personality.) To solve this problem, we apply the Probability Integral Transformation (PIT) function to the random variables θ_1, . . . , θ_n [9].
Definition 2 [10]: (Probability Integral Transformation) Let X be a continuous random variable with a Cumulative Distribution Function (CDF) F; that is, X ∼ F. Then, the Probability Integral Transformation (PIT) defines a new random variable Y as Y = F(X ).
An interesting property of the PIT is that Y follows a uniform distribution in [0, 1], independently of the distribution of X. Let us define the normalized preference of player j ∈ N as θ̄_j = PIT_j(θ_j). Then, by construction, θ̄_j follows a uniform distribution in [0, 1], and this fact makes possible the comparison among θ̄_1, . . . , θ̄_n.
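As a concrete illustration, the following sketch applies the PIT to preferences drawn from an exponential distribution. The distribution, its rate, and the helper names `exponential_cdf` and `pit_normalize` are assumptions of this example, not part of the model.

```python
import math
import random

def exponential_cdf(x, lam=2.0):
    """CDF of an Exponential(lam) distribution; an assumed example,
    since any continuous CDF works for the PIT."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def pit_normalize(raw_preferences, cdf=exponential_cdf):
    """Probability Integral Transformation: Y = F(X)."""
    return [cdf(x) for x in raw_preferences]

# Raw preferences drawn from the player's (private) exponential distribution.
rng = random.Random(42)
raw = [rng.expovariate(2.0) for _ in range(100_000)]
norm = pit_normalize(raw)
# After the PIT, the normalized preferences are Uniform[0, 1].
```

Any continuous preference distribution yields the same Uniform[0, 1] result, which is what makes the normalized preferences comparable across players.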
From the properties of the PIT and Proposition 1, we have the following facts.
Proposition 2: The normalized preferences θ̄_1, . . . , θ̄_n of the players
• follow a uniform distribution in [0, 1], and
• are independent; i.e., (1) for a resource r, the preferences θ̄_1(r), . . . , θ̄_n(r) are mutually independent, and (2) the preferences θ̄_j(r_s) and θ̄_j(r_t) of the same player j for different resources r_s and r_t are also independent.
Given these properties, in the rest of the paper we will use only normalized preferences. Using the theory of Mechanism Design [12], note that the Resource Allocation Problem can be reformulated as a mechanism whose message space is the space of normalized preferences θ̄_j(r) and whose decision function D is defined as D : Θ̄ → N. That is, for each resource r, the mechanism asks each player j for her normalized preference and assigns this resource to the player returned by the decision function applied to the preferences declared by the players.
Since the normalized preference is private information of player j, she may choose (strategically) to declare a value different from θ̄_j. We denote the value declared by player j for resource r as θ̇_j(r). For each resource r, the values θ̇_1(r), . . . , θ̇_n(r) are common knowledge once they are declared. If player j is honest, θ̇_j(r) will coincide with the real normalized preference θ̄_j(r). Otherwise, if the player is not honest, the value θ̇_j(r) declared by j may not represent the real normalized preference. Moreover, the set of declared values θ̇_j may not even follow a uniform distribution in [0, 1].
We assume that the mechanism used to declare the θ̇_j(r) values for a given resource r guarantees that players do not have access to the declared values of other players before they declare their own value. Hence, for every resource r, the preferences θ̇_1(r), . . . , θ̇_n(r) are mutually independent. Moreover, we assume that the players' strategies and declared values cannot change the beliefs of other players. Intuitively, this means that the declared values of a player j are independent of the previous preferences of j and the previous declared values of the other players. Formally, a strategy for player j is any map σ_j : Θ̄_j → Δ(Θ̇_j), where σ_j(θ̄_j, θ̇_j) is the conditional probability that the player reports θ̇_j when her true type is θ̄_j. Observe that this formulation assumes that dishonest players do not collude.
Using the normalized preferences θ̄(r), the declared preferences θ̇(r), and the decision function D(·), we can define the normalized utility of a mechanism for player j with respect to resource r: the player receives θ̄_j(r) if D(θ̇(r)) = j, and 0 otherwise.
Hence, we want to find mechanisms that maximize the utility of all players. When player j is honest, we denote this utility as ū_j. The authors in [9] present the QPQ mechanism, which solves this resource allocation problem. QPQ is based on the Linking Mechanism Design proposed in [13]. This solution has some nice properties, like approximate truthfulness, expected utilities that converge to an efficient allocation, and no payments. In Table 1, we present the QPQ mechanism applied to our family of resource allocation problems.
As can be seen in Line 6 of Table 1, the basic QPQ algorithm executed by player i applies a Goodness of Fit (GoF) test² to each declared value θ̇_j(r) (with the aid of a repository History_j in which the values previously used by player j are stored). This test evaluates whether the declared value θ̇_j(r) matches the appropriate probability distribution (a uniform distribution in [0, 1]). In the analytical sections of this paper, we will assume that this GoF test is perfect, i.e., θ̇_j(r) passes the test if and only if it has been drawn from the appropriate probability distribution.
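The paper does not prescribe a particular GoF test. As a hypothetical sketch, a one-sample Kolmogorov-Smirnov check of the declared history against Uniform[0, 1] could play this role; the helper names and the approximate 5% critical value 1.36/√n are assumptions of this example.

```python
def ks_uniform_statistic(samples):
    """One-sample Kolmogorov-Smirnov statistic against Uniform[0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; the target CDF is x.
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d

def passes_gof(history, new_value, c_alpha=1.36):
    """Accept new_value if the augmented history is still consistent with
    Uniform[0, 1] at roughly the 5% level (critical value c_alpha/sqrt(n)).
    Illustrative only: [9] does not prescribe this particular test."""
    data = list(history) + [new_value]
    return ks_uniform_statistic(data) <= c_alpha / len(data) ** 0.5
```

A well-spread history keeps passing, while a history concentrated far from uniform is rejected, which is the behavior the mechanism relies on.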
The outcome of the GoF evaluation (Lines 6-10) is the value θ̃_j(r) for each player j. If a value θ̇_j(r) passes the test, then simply θ̃_j(r) = θ̇_j(r). Otherwise, θ̃_j(r) = θ̂_j(r), where θ̂_j(r) is a pseudo-random value generated deterministically from the rest of the declared values, so that it follows the appropriate probability distribution. Since all players use the same GoF test and the same pseudo-random function, the values θ̃_j(r) are the same at all players.
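One practical way to realize the shared pseudo-random replacement θ̂_j(r) is to hash the other players' declared values with a cryptographic hash and map the digest to [0, 1), so every player deterministically derives the same substitute. This is an illustrative sketch; the helper name and the SHA-256 construction are our assumptions, since [9] does not fix a concrete generator.

```python
import hashlib
import struct

def replacement_value(other_values, player_id, resource_id):
    """Deterministically derive a uniform-looking value in [0, 1) from the
    other players' declared values, the cheating player's id, and the
    resource id, so every honest player computes the same substitute."""
    h = hashlib.sha256()
    h.update(f"{player_id}:{resource_id}".encode())
    for v in sorted(other_values):        # sort so message order is irrelevant
        h.update(struct.pack(">d", v))
    # Map the first 8 bytes of the digest to [0, 1).
    return int.from_bytes(h.digest()[:8], "big") / 2**64
```

Since the input is identical at every player, all players compute the same replacement value, which is required for them to agree on the allocation.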
Finally, the values θ̃_j(r) are used to select the player that receives resource r (Lines 13-16). The player d with the largest such value gets the resource. Since all players have the same set of values θ̃_j(r), they all agree on the decision. Hence, the decision function used is D(θ̃(r)) = arg max_{j∈N} θ̃_j(r).

III. THE MULTILEVEL QPQ APPROACH
We observe that the QPQ mechanism of Table 1 presents issues when the number of players is large, because all the players must know the declared values of the rest. First, the number of communications that must be established for each resource is of the order of n². Second, each player must store in History_j all the values declared by player j, for every j ∈ N. This requires an overall amount of memory of n² times the number of resources. This clearly indicates that the mechanism does not scale properly; we deal with this issue by splitting the set of players into clusters organized hierarchically.
In the multilevel QPQ mechanism, we consider that the set of players N is partitioned into k > 1 subsets, or clusters, N_1, . . . , N_k. The size of cluster c is n_c = |N_c|, and its membership is N_c = {(c, 1), . . . , (c, n_c)}. Hence, N = N_1 ∪ · · · ∪ N_k, N_i ∩ N_j = ∅ for i ≠ j, and n = Σ_{j=1}^{k} n_j. Each of the clusters N_1, . . . , N_k behaves as one player of a supercluster.
We consider that the clusters are fixed, that is, each player belongs to the same cluster in all the rounds. In addition to this, we assume that the size and membership of all the clusters are known by all the players.
We now explain how we adapt the notation of the previous section to the multilevel QPQ mechanism. We denote by (c, i) the player i of cluster c. The preference of player (c, i) for a given resource r is denoted by θ_{c,i}(r). Besides, (c, i) applies the PIT function to its preferences and, thus, its normalized preference for a resource r is denoted by θ̄_{c,i}(r) and its declared preference by θ̇_{c,i}(r).
The multilevel QPQ mechanism for player (c, i) and resource r is presented in Table 2, and we describe it briefly here. First, from Line 5 to Line 13, the QPQ algorithm of [9] is applied to the players of cluster c. Hence, for each player in cluster c, if the declared value passes the test, θ̃_{c,i}(r) equals the declared value, that is, θ̃_{c,i}(r) = θ̇_{c,i}(r), and θ̃_{c,i}(r) gets a uniform random value otherwise. We denote by N^(1) the set of winners of the clusters. Then, from Line 19 to Line 27, the QPQ algorithm of [9] is applied to the set of winners, that is, to N^(1). Hence, for each player (w, b_w) of N^(1), if the declared value passes the test, then θ⃛_{w,b_w}(r) = θ̃_{w,b_w}(r), and θ⃛_{w,b_w}(r) gets a pseudorandom value that follows an appropriate distribution otherwise. Finally, the resource is allocated to the player of N^(1) with the largest value θ⃛_{w,b_w}(r).
We define for each player (c, i) the following random variable to quantify her normalized utility associated to resource r ∈ R: ū_{c,i}(r) = θ̄_{c,i}(r) if resource r is allocated to (c, i), and ū_{c,i}(r) = 0 otherwise. Note that this value depends on the real normalized preference θ̄_{c,i}(r) of the player.³
The authors in [9] showed that the QPQ mechanism is optimal in the sense that, if all the players are honest, the total utility generated is maximized. In the following result, we generalize this result to a system with clusters. That is, we show that, when the players are honest, the total utility of any mechanism that assigns a set of resources to players (divided into clusters or not) is smaller than or equal to the total utility of the multilevel QPQ mechanism.
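With perfect GoF detection and honest players, the two-level allocation reduces to a nested argmax. A minimal sketch (the helper names are ours, and the GoF machinery is omitted):

```python
import random

def multilevel_winner(clusters):
    """Two-level winner selection with perfect GoF and honest players.
    `clusters` maps a cluster id to a {player: declared value} dict.
    Returns the (cluster, player) pair that receives the resource."""
    finalists = {}
    for c, members in clusters.items():
        b_c = max(members, key=members.get)      # winner of cluster c
        finalists[(c, b_c)] = members[b_c]
    return max(finalists, key=finalists.get)     # winner among cluster winners

rng = random.Random(7)
clusters = {c: {i: rng.random() for i in range(4)} for c in range(3)}
winner = multilevel_winner(clusters)
```

Because each cluster forwards its maximum, the two-level winner coincides with the global argmax over all players, which is the key fact behind the optimality result (Proposition 3).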
Proposition 3: Assume that all players are honest. For any resource r ∈ R and any normalized preferences θ̄(r), every mechanism M (which may be probabilistic) verifies that Σ_{j∈N} ū^M_j ≤ Σ_{j∈N} ū_j, where ū^M_j is any realization of the normalized utility of player j with respect to resource r and normalized preferences θ̄(r) when mechanism M is applied.
Proof: Let us assume, by way of contradiction, that there exist a resource r and a mechanism M such that the above inequality is not satisfied, i.e., Σ_{j∈N} ū^M_j > Σ_{j∈N} ū_j. When the multilevel QPQ mechanism of Table 2 is used, the following holds. Since all players (c, i) are honest, θ̇_{c,i}(r) = θ̄_{c,i}(r). Since the GoF test is perfect, θ̃_{c,i}(r) = θ̇_{c,i}(r). Hence, the winner of each cluster is the player in the cluster with the largest normalized preference. Similarly, for each winner it holds that θ⃛_{w,b_w}(r) = θ̃_{w,b_w}(r). This implies that resource r is assigned in Lines 27-30 to the player w* = arg max_{j∈N} θ̄_j(r). Let us assume that, when we apply mechanism M, resource r is assigned to player w^M. With both mechanisms, resource r is assigned to a single player, and hence all players have zero normalized utility except for that player (w* and w^M, for multilevel QPQ and M, respectively). Thus, it results that Σ_{j∈N} ū^M_j = θ̄_{w^M}(r) ≤ θ̄_{w*}(r) = Σ_{j∈N} ū_j, and we have found a contradiction.
The following proposition follows from the definition of normalized utility and the fact that a resource r is assigned to only one player.
Proposition 4: For any resource r ∈ R, any normalized preferences θ̄(r), and any declared preferences θ̇(r), Σ_{j∈N} ũ_j(r) ≤ max_{j∈N} θ̄_j(r), where ũ_j(r) is the real utility of player j when the declared preferences are θ̇(r).

IV. THE BENEFITS OF BEING HONEST
In this section, we prove that, for any player, being honest is the strategy that maximizes its utility. Before presenting the analysis, we give the following result.
Proposition 5: The preferences θ̃_{c,i}(r) of players (c, i) that the multilevel QPQ mechanism uses to assign resource r are drawn from independent uniform distributions in [0, 1].
Proof: We distinguish the following three cases, depending on the behavior of the players:
• The player (c, i) is honest. In this case, θ̃_{c,i}(r) = θ̄_{c,i}(r) and, therefore, the preferences follow a uniform distribution, from Proposition 2. Besides, they are independent, since players declare their preferences before receiving the values of the others.
• The player declares values that do not follow a uniform distribution. In this case, the declared value θ̇_{c,i}(r) does not pass the GoF test and the algorithm assigns a random value θ̂_{c,i}(r) to player (c, i), which is uniformly distributed in [0, 1] and independent of the preferences of the others.
• The player is dishonest and passes the GoF test. This occurs when the declared values θ̇_{c,i}(r) follow a uniform distribution in [0, 1] but are different from θ̄_{c,i}(r). Besides, the declared values are independent of the other players' preferences, since each value is sent before the others are received.
Since, in all three cases, the values are independent and uniformly distributed in [0, 1], the desired result follows.
The authors in [9] use the notion of an aggregated player to compute the expected utility of a player. The idea is the following: they consider the rest of the players as a single fictitious player whose preference is the maximum of all of theirs. As a result, the computation of the expected utility of a player is simplified, since it reduces to calculating the probability that its preference is larger than that of the aggregated player. In the following subsection, we show how we adapt the concept of aggregated player to the multilevel QPQ mechanism.

A. AGGREGATED PLAYER
In the multilevel QPQ mechanism, we consider three aggregated players. The first one is inside the clusters, i.e., we consider that each player (c, i) competes against a fictitious aggregated player, denoted (c, −i), whose preference is the maximum of the preferences of the rest of the players in cluster c. We denote by θ̃_{c,−i} the preference of this aggregated player at the cluster level, i.e., θ̃_{c,−i} = max_{j∈N_c, j≠i} θ̃_{c,j}. Hence, player (c, i) is the winner of cluster c when θ̃_{c,i} > θ̃_{c,−i}. When player (c, i) is the winner of cluster c (i.e., b_c = i), its preference θ⃛_{c,i} is compared with those of the winners of the other clusters. Therefore, the second aggregated player we consider, denoted (−c), is for the cluster winners: when player (c, i) is the winner of cluster c, it competes against the aggregated player formed by the rest of the winners, whose preference is denoted by θ⃛_{−c} and defined as θ⃛_{−c} = max_{c'≠c} θ⃛_{c',b_{c'}}.
We also define the aggregated player of (c, i) for the entire system, denoted by −(c, i), as the player whose preference is the maximum between θ̃_{c,−i} and θ⃛_{−c}, i.e., θ⃛_{−(c,i)} = max{θ̃_{c,−i}, θ⃛_{−c}}.
The objective of this section is to calculate the distributions of the preferences θ̃_{c,−i}, θ⃛_{−c}, and θ⃛_{−(c,i)}. The next result is the key to quantifying them.
Proposition 6: Let X_1, . . . , X_κ, κ > 1, be independent continuous random variables such that X_j follows a Beta(p_j, 1) distribution, with p_j ∈ ℕ for all j ∈ {1, . . . , κ}. Then, the random variable X = max{X_1, . . . , X_κ} follows a Beta(Σ_{j=1}^{κ} p_j, 1) distribution.
Proof: First, from Proposition 1, X is a continuous random variable, since the maximum is a measurable function. Let F_X be the CDF of X. For any y ∈ ℝ, F_X(y) = P(max{X_1, . . . , X_κ} ≤ y) = Π_{j=1}^{κ} F_{X_j}(y), since the random variables X_1, . . . , X_κ are independent. Because X_j follows a Beta(p_j, 1) distribution, F_{X_j}(y) = y^{p_j} for y ∈ [0, 1]. Depending on the value of y, we differentiate three cases: F_X(y) = 0 for y < 0, F_X(y) = y^{Σ_j p_j} for y ∈ [0, 1], and F_X(y) = 1 for y > 1. Differentiating F_X with respect to y, we obtain the density function of X: f_X(y) = (Σ_{j=1}^{κ} p_j) y^{(Σ_{j=1}^{κ} p_j) − 1} for y ∈ [0, 1], and f_X(y) = 0 otherwise. Recall that the density function associated to a Beta(p, q) distribution is f_β(y) = y^{p−1}(1 − y)^{q−1}/B(p, q) for y ∈ [0, 1] and f_β(y) = 0 otherwise, where B(p, 1) = 1/p. Therefore, the desired result follows, since f_X equals f_β with p = Σ_{j=1}^{κ} p_j and q = 1.
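A quick Monte Carlo sanity check of Proposition 6, sampling Beta(p, 1) by inversion (its CDF is y^p, so y = U^{1/p}); the helper names are ours.

```python
import random

def beta_p1_sample(rng, p):
    """Sample from Beta(p, 1) via inverse CDF: F(y) = y^p  =>  y = U**(1/p)."""
    return rng.random() ** (1.0 / p)

def empirical_cdf_of_max(ps, y, rounds=200_000, seed=3):
    """Estimate P(max_j X_j <= y) for independent X_j ~ Beta(p_j, 1)."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(rounds)
        if max(beta_p1_sample(rng, p) for p in ps) <= y
    )
    return hits / rounds

# Proposition 6 predicts that the max of Beta(2,1) and Beta(3,1) is Beta(5,1),
# whose CDF at y = 0.8 is 0.8**5.
est = empirical_cdf_of_max([2, 3], 0.8)
```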
From Proposition 5, we know that the preferences θ̃_{c,i} of the players follow independent uniform distributions in [0, 1]. A uniform distribution in [0, 1] coincides with the Beta(1, 1) distribution. Therefore, from Proposition 6, we derive the distribution of the preferences of the first aggregated player in the following result.
Corollary 1: The preferences θ̃_{c,−i} of the aggregated player (c, −i) follow a Beta(n_c − 1, 1) distribution. Moreover, this distribution is independent of the distribution of θ̃_{c,i}, which is uniform in [0, 1].
Using the same arguments as in the above result, one can easily derive the distribution followed by the preferences of the winner of cluster c.
Corollary 2: The preferences θ̃_{c,b_c} of the winner b_c of cluster c follow a Beta(n_c, 1) distribution. Moreover, this distribution is independent of the distribution of θ̃_{c',b_{c'}}, for every cluster c' ≠ c.
As a result of this corollary and Proposition 6, the next result follows.
Corollary 3: The preference θ⃛_{−c} of the aggregated player (−c) follows a Beta(n − n_c, 1) distribution.
As a result of Corollary 3 and Corollary 1, we have the following result.
Corollary 4: The preference θ⃛_{−(c,i)} of the aggregated player −(c, i) follows a Beta(n − 1, 1) distribution.

B. HONEST STRATEGY
When a player (c, i) declares its real preference for a resource r, since we assume that the GoF test is perfect, we have that θ̃_{c,i} = θ̄_{c,i} and, if it is the winner of cluster c, also θ⃛_{c,i} = θ̄_{c,i}. Hence, the utility obtained by player (c, i) for resource r ∈ R (omitted from now on) can be written as ū_{c,i} = θ̄_{c,i} if θ̄_{c,i} > θ̃_{c,−i} and θ̄_{c,i} > θ⃛_{−c}, and ū_{c,i} = 0 otherwise. Observe that, from the definition of θ⃛_{−(c,i)} (Eq. 5), this is equivalent to ū_{c,i} = θ̄_{c,i} if θ̄_{c,i} > θ⃛_{−(c,i)}, and 0 otherwise. We use this expression to compute the expected utility of an honest player in the following result.
Proposition 7: The expected normalized utility of an honest player (c, i) is E[ū_{c,i}] = ∫₀¹ z z^{n−1} dz = 1/(n + 1), since, by Corollary 4, θ⃛_{−(c,i)} follows a Beta(n − 1, 1) distribution and hence P(θ⃛_{−(c,i)} < z) = z^{n−1}.

C. ARBITRARY RATIONAL STRATEGIES
In this section, we show that, if all players are rational, no strategy allows a player to obtain an expected real normalized utility higher than being honest. From Propositions 7 and 4, we have the following corollary, which gives an upper bound on the sum of the real normalized utilities of all players.
Corollary 5: For any set of strategies σ̇ used by the players, where θ̇ are the declared preferences of the players under σ̇, Σ_{j∈N} E[ũ_j] ≤ n/(n + 1).
The following result shows that no rational player will ever have less expected utility than that obtained by being honest: if some player obtained strictly more than 1/(n + 1) while no rational player accepts less, then Σ_{j∈N} E[ũ_j] > n/(n + 1), which violates Corollary 5.

D. DISHONEST STRATEGIES THAT DO NOT PASS THE GOF TEST
In this section, we prove that the utility of a player that declares its real preferences is larger than that of a player whose declared preferences do not pass the GoF test.
We study the utility of a dishonest player whose declared values do not pass the GoF test. This occurs when the player declares non-uniform values. In this case, the player is assigned a new random preference, i.e., θ̃_{c,i} = θ̂_{c,i}. We assume that the generator of the random values is perfect and, hence, as shown in Proposition 5, the preferences θ̃_{c,i} are independent and follow a uniform distribution in [0, 1]. As a result, the GoF test applied to the winners of the clusters (see Line 06 of Table 2) is also passed. The normalized utility û_{c,i} of player (c, i) when it declares values that do not pass the GoF test is given by û_{c,i} = θ̄_{c,i} if the assigned value θ̂_{c,i} wins the resource, and û_{c,i} = 0 otherwise. Notice that, from the definition of θ⃛_{−(c,i)} (Eq. 5), this is equivalent to û_{c,i} = θ̄_{c,i} if θ̂_{c,i} > θ⃛_{−(c,i)}, and 0 otherwise. In the next proposition, we compute the expected utility of a player when its declared values do not pass the GoF test. From (9), it follows that the expected utility of a player (c, i) associated to resource r is given by E[û_{c,i}] = ∫₀¹ E[θ̄_{c,i}] P(θ⃛_{−(c,i)} < z) dz, where z represents the value θ̂_{c,i} assigned to player (c, i). From Corollary 4, we know that the distribution of θ⃛_{−(c,i)} is Beta(n − 1, 1), and therefore E[û_{c,i}] = ∫₀¹ (1/2) z^{n−1} dz = 1/(2n).
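Under perfect GoF detection, the aggregated opponent of a player is the maximum of n − 1 independent uniforms, so an honest player should earn 1/(n + 1) per resource in expectation, while a player whose declared values are replaced by fresh uniforms should earn 1/(2n). A seeded Monte Carlo sketch of both quantities (the helper name `simulate` is ours):

```python
import random

def simulate(n=8, rounds=200_000, seed=1):
    """Estimate the per-resource expected utility of (a) an honest player
    and (b) a player whose declared value is replaced by an independent
    uniform, among n players with perfect GoF detection."""
    rng = random.Random(seed)
    honest_total = 0.0
    replaced_total = 0.0
    for _ in range(rounds):
        prefs = [rng.random() for _ in range(n)]
        # (a) Honest player 0 wins when its true preference is the largest.
        if prefs[0] == max(prefs):
            honest_total += prefs[0]
        # (b) Replaced player 0 competes with an independent uniform value,
        # but its realized utility is still its true preference prefs[0].
        replaced = rng.random()
        if replaced > max(prefs[1:]):
            replaced_total += prefs[0]
    return honest_total / rounds, replaced_total / rounds

h_est, d_est = simulate()
# For n = 8: honest ~ 1/9, replaced ~ 1/16 -- being detected roughly halves
# the expected utility (and more, since 1/(2n) < 1/(n+1) for n > 1).
```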

E. DISHONEST STRATEGIES THAT PASS THE GOF TEST
In Section IV-D, we have shown that it is preferable for a player to be honest than to declare values that do not pass the GoF test. However, players can declare values that pass the GoF test but are different from the real preferences. This occurs, for instance, if the declared values follow a uniform distribution different from the real uniform distribution of the player. Therefore, in this section, we generalize the result of Corollary 7 to any dishonest strategy.
In the remainder of this section, we use the following concepts. The strategy of player (c, i) is realized through its declared values θ̇_{c,i}. Since these values pass the GoF test, so that θ̃_{c,i} = θ̇_{c,i} = z, they follow a uniform distribution in [0, 1]. We assume that they are defined by a bivariate density function σ_j(x, z) = σ_j(θ̄_j, θ̇_j) that relates the real preferences x and the declared values z. That is, θ̇_{c,i}(r) is chosen randomly from a distribution with density function σ_j(θ̄_{c,i}(r), z) on z.
Since these declared values θ̇_{c,i} pass the GoF test, the marginal distribution f(z) of σ_j(x, z) on z satisfies f(z) = ∫₀¹ σ_j(x, z) dx = 1 for z ∈ [0, 1], i.e., it is uniform. A similar result can be obtained for the aggregated player −(c, i): from Corollary 4, given that its value is the maximum of n − 1 independent uniform random variables, it follows a Beta(n − 1, 1) distribution. Hence, we have the following result.
Proposition 10: The expected utility E[ũ_{c,i}] of player (c, i) does not depend on the strategies of the players in the aggregated player −(c, i).
Proof: The expected utility can be computed as the expectation of θ̄_{c,i} over the event that z exceeds θ⃛_{−(c,i)}, where the equality follows from Lemma 2 and the independence between the values θ̇_{c,i} (i.e., z) declared by (c, i) and θ⃛_{−(c,i)}. Since this expression does not depend on the strategies of the players in the aggregated player −(c, i), the claim follows. Finally, using the above results, we now prove the main result of this section.
Theorem 2: A player (c, i) never obtains less normalized utility (in expectation) by being honest, i.e., E[ū_{c,i}] ≥ E[ũ_{c,i}], where ũ_{c,i} is the utility obtained with the declared values θ̇_{c,i} (which can be different from θ̄_{c,i}). Moreover, this is true for any number of clusters and any number of players in each cluster.
Proof: Let us first assume that the values θ̇_{c,i} declared by (c, i) do not pass the GoF test. Then, the desired result follows from Corollary 7.
We now consider that the values θ̇_{c,i} declared by (c, i) pass the GoF test. Let us suppose that there exists a set of declared values θ̇_{c,i} (different from the true preferences θ̄_{c,i}) such that player (c, i) gets less utility by being honest than by declaring the values θ̇_{c,i}, i.e., E[ū_{c,i}] < E[ũ_{c,i}].
The intuition for the rest of the proof is as follows. In this scenario, since θ̇_{c,i} follows a uniform distribution independent of the other players, the strategy used by (c, i) does not affect the utility of the other players. Let us assume that the other players are all honest. Then, the total utility is higher than when all players are honest. The next step is to create a mechanism M that reproduces the allocation of multilevel QPQ with (c, i) using θ̇_{c,i} and all other players honest, when also (c, i) is honest. The existence of this mechanism leads to a contradiction, which completes the proof that the strategy θ̇_{c,i} does not exist.
More formally, let us assume that the values θ̇_{c,i} follow a strategy σ(θ̄, θ̇), which is the probability density function of announcing θ̇ when the real preference is θ̄. From Proposition 10, the above inequality holds for any strategy of the aggregated player −(c, i) (i.e., any strategy of the rest of the players). For that reason, we consider in the rest of the proof that, except for player (c, i), all the players behave honestly. Moreover, by Proposition 9, we know that the aggregated player −(c, i) (i.e., each of the other players) obtains the same expected utility independently of whether player (c, i) is honest or dishonest. As a result, the total utility in a system in which all players are honest is smaller than the total utility in a system in which player (c, i) declares the preferences θ̇_{c,i} and the rest of the players are honest. Now, we define a mechanism M that behaves exactly like multilevel QPQ, but assigning to player (c, i) a preference θ̇_{c,i} chosen with density σ(θ̄, θ̇) when it declares θ̄_{c,i} in Line 3. With this transformation, the probability that resource r is assigned to player (c, i) when it is honest coincides with the probability that our mechanism allocates the resource to the player when it declares the values θ̇_{c,i}. Observe that this new mechanism M does not alter the assignment to the other players, and the values θ̇_{c,i} are still independent of the preferences of the other players. Replacing these values in (10), we obtain that the total utility of M with honest players is larger than that of multilevel QPQ with honest players. However, this contradicts Proposition 3, which shows that our algorithm is optimal for honest players. Therefore, the strategy that maximizes the expected profit of player (c, i) is to be honest.
Remark 1: Observe that this theorem generalizes the result of Theorem 10 of [9], since the model of QPQ is a particular case of the multilevel QPQ.

V. THE BENEFITS OF CLUSTERING IN PRACTICE
In this section we compare multilevel QPQ with QPQ in several dimensions: communication cost, memory used, and the expected utility of honest players. For simplicity, unless otherwise stated, we will assume in this section that all clusters have the same size n/k.

A. BENEFITS IN COMMUNICATION COST
The first dimension in which multilevel QPQ improves over QPQ is the total communication volume required per resource assignment. Let us assume, for instance, that the declared values are sent to a central relay R, which then sends them to the players. QPQ has the following sequence of actions:
• Each player i sends its declared value to R.
• Relay R sends the set of values declared by all players to every player.
The total volume of communication is V_QPQ = n + n² values, in 2n messages.
In multilevel QPQ the sequence is as follows.
• Each player (c, i) sends its declared value to R.
• Relay R sends, for each cluster c, the values declared by all the players in cluster c to all the players in cluster c.
• The winners (c, b_c) from all clusters send their declared values in Line 15 of Table 2 to R.
• Relay R sends the set of values declared by all winners to all the players.
The total volume is V_mQPQ = n + Σ_{j=1}^{k} n_j² + k + nk values, in a total of 3n + k messages. Let us consider the case in which all clusters have the same size, n_j = n/k for all j. Then, V_mQPQ = n + n²/k + k + nk. Let us obtain the value of k that minimizes V_mQPQ. The derivative of the above expression with respect to k is ∂V_mQPQ/∂k = −n²/k² + 1 + n, and this is zero when k = n/√(n + 1), which means that, when n is large enough, the optimal number of clusters is approximately k* ≈ √n. This leads to an asymptotic improvement in the complexity of the volume of communication, as follows.
Proposition 11: Multilevel QPQ with k clusters of the same size n/k reduces the volume of data communication with respect to QPQ by a factor V_QPQ/V_mQPQ = (n + n²)/(n + n²/k + k + nk) = Θ(n/(k + n/k)), which becomes Θ(n^{1/2}) for k = n^{1/2}.
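The volume expressions above can be checked numerically. The following sketch (our own illustration, not part of the mechanism; the value n = 1024 is arbitrary) scans all cluster counts for a given n and confirms that the minimizer is close to √n and that the saving factor is roughly √n/2:

```python
def v_qpq(n):
    # QPQ: n values up to the relay, then n values down to each of n players
    return n + n * n

def v_mqpq(n, k):
    # Multilevel QPQ with k equal clusters of size n/k (values per assignment)
    return n + n**2 / k + k + n * k

n = 1024
# Scan all cluster counts and pick the one minimizing the volume
best_k = min(range(1, n + 1), key=lambda k: v_mqpq(n, k))
saving = v_qpq(n) / v_mqpq(n, best_k)
print(best_k)              # prints 32, i.e., sqrt(1024)
print(round(saving, 2))    # roughly sqrt(n)/2 = 16
```
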

B. BENEFITS DUE TO IMPROVED MEMORY USE
In the analysis of Section IV, we have assumed that the GoF test is perfect. In practice, however, it is not: it can accept values that do not follow the reference distribution (i.e., there are false positives), and it can reject values that do follow it (i.e., there are false negatives). We claim that the performance of the GoF test improves with the length of the available history, i.e., with a longer history the GoF test makes fewer errors. Ideally, all players would maintain the full history of values used in all prior resource-assignment rounds. However, this may not be possible, since the required memory would grow without bound.

1) INCREASE OF THE HISTORY LENGTH
Let us first assume that the available memory to store the history is fixed, and compare the length of the history used in a GoF test with multilevel QPQ versus the original single-level QPQ. Let S_1 be the memory available to store the history values at each player in the single-level QPQ. Thus, if H_1 is the length of the history used for each GoF test, it follows that S_1 = n·H_1, since in the one-level model a player performs the GoF test on n players.
In the multilevel QPQ, let the memory available to store history values be S_2. Then, when the history length used per player at the cluster level is H_c and at the upper level is H_u, we have that S_2 = (n/k)·H_c + k·H_u. Let us assume that H_u = αH_c, for a constant value α > 0. Then, we have that S_2 = H_c·(n/k + αk). We are interested in studying the relationship between H_1, H_u, and H_c when S_1 = S_2. If we equalize the memories, we have that
H_c = n·H_1/(n/k + αk), (11)
H_u = α·n·H_1/(n/k + αk). (12)
In the following result, we present the relation between H_1, H_u and H_c for α = 1.
Proposition 12: For α = 1, H_c = H_u = (n/(n/k + k))·H_1, so that H_1 < H_c < k·H_1 whenever n ≥ 2k.
Proof: For a fixed value of k, we consider the function g(n) = n/(n/k + k). Since g(n) is increasing with n and lim_{n→∞} g(n) = k, we have that H_c is, at most, k times larger than H_1. For the lower bound, we observe that g(k) < 1, whereas g(mk) > 1, for all m = 2, 3, . . .
We now fix k = √n, which we proved in the previous section is the choice of k that minimizes the communication volume. For this case, (11) and (12) give, respectively, H_c = n^{3/2}·H_1/(n(α + 1)) = √n·H_1/(α + 1) and H_u = n^{3/2}·H_1/(n(1 + 1/α)) = √n·H_1/(1 + 1/α). Hence, we have the following result.
Proposition 13: For k = √n and constant α, the history lengths satisfy H_c = Θ(√n)·H_1 and H_u = Θ(√n)·H_1, i.e., with the same memory, multilevel QPQ can use histories a factor Θ(√n) longer than single-level QPQ.
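Equations (11) and (12) are easy to evaluate. The sketch below (our own check, with illustrative parameters) computes H_c and H_u for k = √n and α = 1, and verifies that the total memory budget matches that of single-level QPQ:

```python
import math

def history_lengths(n, k, H1, alpha=1.0):
    # Eqs. (11) and (12): history lengths per GoF test when multilevel QPQ
    # uses the same total memory S = n * H1 as single-level QPQ
    Hc = n * H1 / (n / k + alpha * k)
    Hu = alpha * Hc
    return Hc, Hu

n, H1 = 64, 100
k = int(math.sqrt(n))                    # k = sqrt(n) = 8
Hc, Hu = history_lengths(n, k, H1)
print(Hc, Hu)                            # 400.0 400.0, i.e., 4x longer than H1
print((n / k) * Hc + k * Hu == n * H1)   # memory budget respected: True
```
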

2) INCREASE OF EXPECTED UTILITY OF HONEST PLAYERS
We now focus on a system with only honest players. As we just showed, it is possible to have a longer history using multilevel QPQ than with QPQ [9]. This means that, in practice, honest players will suffer fewer false negatives in the GoF test (by definition, an honest player can never cause a false positive) using multilevel QPQ. We show here that this leads to a higher practical expected utility of honest players with multilevel QPQ than with QPQ. Let us first provide the expression of the expected utility of QPQ [9]. We denote by q the probability of a false negative. Hence, the expected utility for a given honest player j is given by
E[u_j] = (1 − q)·1/(n + 1) + q·1/(2n). (13)
The computation of the above expression is available in the Appendix.
For the multilevel QPQ, the test of GoF is carried out in the cluster level and in the upper level. We denote by p c and p u , respectively, the probability of false negative in the cluster level and in the upper level.
We now introduce the following notation: U_H is the expected utility when a player is honest and neither GoF test fails; U_u is the expected utility when the declared value passes the cluster-level test but not the upper-level one; U_c is the expected utility when the declared value does not pass the cluster-level test but does pass the upper-level one; and U_{u,c} is the expected utility when the declared value passes neither the cluster-level nor the upper-level test. Therefore, for the multilevel QPQ, since the GoF test is done at the cluster and at the upper level, the expected utility of a player is given by
E[u_{c,i}] = (1 − p_c)(1 − p_u)·U_H + (1 − p_c)p_u·U_u + p_c(1 − p_u)·U_c + p_c·p_u·U_{u,c}. (14)
The values of U_H, U_u, U_c and U_{u,c} are given in the Appendix.
When all the clusters are of the same size, the terms of (14) take the closed-form values given in the Appendix. We now show that (14) is, at most, two times (13). To do so, we study the ratio (15) of the utility of the two-level model over the utility of the one-level model. We note that (14) decreases with p_u and p_c, and that (13) decreases with q. Therefore, the maximum over p_u, p_c and q of (15) is attained when p_u = p_c = 0 and q = 1.
When p_u = 0, p_c = 0 and q = 1, (15) gives 2n/(n + 1). We observe that this ratio increases with n: it is equal to one when n = 1 and tends to 2 when n → ∞. As a consequence, we have the following result:
Proposition 14: The utility of the two-level system is, at most, two times the utility of the one-level system.
Let us now show that, in fact, multilevel QPQ can achieve a higher expected utility than QPQ. We show a relation between p_c, p_u, and q that guarantees this property.
Proposition 15: If n > k and (1 − p_c)(1 − p_u) ≥ 1 − q, then an honest player achieves a higher expected utility with multilevel QPQ than with QPQ, i.e., (14) is larger than (13).
Proof: Expanding the assumption (1 − p_c)(1 − p_u) ≥ 1 − q, we have that 1 − p_c − (1 − p_c)p_u ≥ 1 − q, i.e., p_c + (1 − p_c)p_u ≤ q. Then, to have the ratio (15) larger than 1, it is enough to bound the utility lost to failed tests in each system (see Eq. (15)). Since 1/(n + k) > 1/(2n) when n > k, this reduces to requiring p_c + (1 − p_c)p_u ≤ q, which, as shown above, always holds under the assumption (1 − p_c)(1 − p_u) ≥ 1 − q.
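The algebraic step in the proof, namely that (1 − p_c)(1 − p_u) ≥ 1 − q is equivalent to p_c + (1 − p_c)p_u ≤ q, can also be checked exhaustively. The sketch below (our own check, using exact rational arithmetic to avoid floating-point boundary effects) verifies the equivalence on a grid of probabilities:

```python
import itertools
from fractions import Fraction

# Probabilities 0, 1/20, ..., 1, represented exactly
grid = [Fraction(i, 20) for i in range(21)]

for pc, pu, q in itertools.product(grid, repeat=3):
    lhs = (1 - pc) * (1 - pu) >= 1 - q    # condition of Proposition 15
    rhs = pc + (1 - pc) * pu <= q         # equivalent reformulation
    assert lhs == rhs, (pc, pu, q)
print("equivalence verified on", len(grid) ** 3, "triples")
```
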

VI. SIMULATION RESULTS
We now focus on the simulations we have carried out in this work. The goal of this study is to compare the performance of multilevel QPQ with QPQ under several parameters, and to evaluate their impact on the utility of honest and dishonest players. While there are other mechanisms without payments that could be used to solve the problem [15], [16], they are designed for system models more general than the one considered here, and are equivalent to QPQ when deployed under the model and assumptions used in this paper.
In the experiments that we have conducted, we have used Kolmogorov-Smirnov as the GoF test. The Kolmogorov-Smirnov GoF test is given a history of values, a new value θ to test, and the reference distribution with which to compare. Then, it returns a p-value. This p-value is compared with a threshold τ to determine whether θ passes the GoF test or not. If the p-value is smaller than the threshold τ, the value θ is considered to fail the GoF test.
When the history provided is in fact extracted from the reference distribution, the p-values returned are drawn from a [0, 1] uniform distribution. By construction, when the θ values provided are the preferences of an honest player, the probability of the GoF test failing (i.e., a false negative) is τ , independently of the history length (see Figure 1).
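This behaviour of honest players is easy to reproduce. The following sketch (our own illustration using scipy.stats.kstest, not the simulator of this paper; the threshold, history length, and trial count are arbitrary) draws honest histories from U(0, 1) and checks that the fraction of GoF failures matches the threshold τ:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
tau = 0.05       # GoF threshold (false-negative rate for honest players)
H = 100          # history length
trials = 2000

# Honest player: history drawn from the reference U(0, 1) distribution
rejections = 0
for _ in range(trials):
    history = rng.uniform(0.0, 1.0, size=H)
    p_value = kstest(history, "uniform").pvalue
    if p_value < tau:        # GoF test fails: a false negative
        rejections += 1
print(rejections / trials)   # close to tau, independently of H
```
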
If τ_q is the threshold of QPQ, and τ_c and τ_u are the thresholds in the cluster level and in the upper level of multilevel QPQ, respectively, then by definition τ_q = q, τ_c = p_c, and τ_u = p_u. In our experiments, we have set these thresholds of the GoF test of multilevel QPQ and QPQ such that the impact on the utility due to the false negatives in both systems is the same. Then, following Proposition 15, we consider that (1 − p_c)(1 − p_u) = 1 − q. Additionally, to simplify the analysis, we have considered in our simulations that p_c = τ_c = p_u = τ_u. In the Kolmogorov-Smirnov GoF test, the probability of detecting that the declared preferences are different from the reference distribution (i.e., true negatives) increases with the length of the history (i.e., with the number of values used to perform the GoF test; see Figure 2). These values require some memory space. (The false-negative rate is not significantly affected by the history length; see Figure 1.) Therefore, the structure of the multilevel QPQ (i.e., the number of clusters and the number of players in each cluster) determines the memory available in the system. We fix the memory in both systems (QPQ and multilevel QPQ) to be the same.
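Under these choices, p_c = p_u = p with (1 − p)² = 1 − τ_q, so the per-level threshold follows directly. A minimal sketch (our own, using the baseline value τ_q = 0.03):

```python
import math

def per_level_threshold(tau_q):
    # Solve (1 - p)^2 = 1 - tau_q for p, with p_c = p_u = p, so that both
    # systems lose the same utility to false negatives
    return 1.0 - math.sqrt(1.0 - tau_q)

p = per_level_threshold(0.03)
print(round(p, 5))               # per-level threshold, about 0.01511
print(round((1 - p) ** 2, 10))   # recovers 1 - tau_q = 0.97
```
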
Taking into account the previous considerations (thresholds and length of history), we define a baseline scenario with n = 64 players and, following Proposition 11, we consider k = √n = 8 clusters. We assume a single dishonest player whose strategy consists of declaring preferences that follow a Beta(1.2, 1) distribution. The threshold value is τ_q = 0.03 and the length of the history of the GoF test in QPQ for each player is H_1 = 100 values (so the total memory is S_1 = 100n). In multilevel QPQ, the total memory available S_2 = S_1 is distributed between the cluster level and the upper level. We assume α = 1 (see Section V-B1), so that the history length per player at the cluster level H_c is the same as the history length per cluster H_u at the upper level. The values of H_c and H_u are obtained with Eqs. (11) and (12), and from Proposition 12 they are larger than H_1, since in all the considered cases n/k > 1. Moreover, for each set of experiments, we vary one of the parameters while the rest of the parameters are fixed. In all the experiments, we consider the following utilities: (i) the utility under the QPQ approach of the honest players (labelled as QPQ honest) and of the dishonest players (labelled as QPQ dishonest), and (ii) the utility under the multilevel QPQ approach of the honest players (labelled as ML-QPQ honest) and of the dishonest players (labelled as ML-QPQ dishonest). In both cases, we represent in each plot the mean and the 95% confidence interval of, at least, 500 values obtained for the normalized utility, which is u*/E[ū_{c,i}], where E[ū_{c,i}] = 1/(1 + n) is the expected utility of an honest player (c, i) and u* is each of the aforementioned utilities.
In Figure 3, we compare the performance of multilevel QPQ with QPQ for different strategies of the dishonest players. The strategies under consideration consist of declaring values that follow a Beta(b, 1) distribution, where b changes from 1.05 to 1.5. We note that the Beta(b, 1) distribution with b = 1 coincides with the uniform distribution in [0, 1]. We observe that the utility of the honest players does not change substantially between multilevel QPQ and QPQ. This is as expected, since we have fixed τ_q = 0.03 and hence (1 − τ_q) = (1 − q) = (1 − p_c)(1 − p_u) = 0.97. However, the normalized utility of the dishonest player is smaller with multilevel QPQ than with the basic QPQ. This means that the multilevel QPQ approach penalizes the dishonest player more than QPQ does.
In this figure, it can also be seen that, for the multilevel QPQ, the utility of a dishonest player is larger than that of an honest player when the parameter of the beta distribution is smaller than or equal to 1.1, whereas for QPQ this occurs when this parameter is smaller than 1.2. The main reason is that, when the parameter of the beta distribution is very close to one, the beta distribution and the uniform distribution are very similar, and the GoF test is not perfect. We also observe that, when the strategy of the dishonest player is far from the uniform distribution (beta parameter equal to 1.5), the utilities of the dishonest player in QPQ and multilevel QPQ are very close. We have seen that the former issue can be mitigated by changing the threshold τ_q. In fact, in Figure 4, we consider τ_q = 0.1 and we note that, in this case, the utility of the dishonest player for multilevel QPQ and QPQ is almost always smaller than the utility of the honest players in the scenarios considered (except for the parameter 1.05). This comes at the cost of reducing the expected utility of the honest players, since they suffer more false negatives in the GoF test. Observe that honest players incur a smaller reduction with the increase of τ_q with multilevel QPQ than with QPQ. Another effect of considering a larger value of τ_q is that the utilities of the dishonest player with QPQ and multilevel QPQ become closer, especially when the beta parameter is large. We now study the utility when we vary the number of clusters k. In Figure 5, we observe that the utility of the honest players is again very similar for multilevel QPQ and QPQ, and that the multilevel QPQ approach penalizes the dishonest player more than QPQ does.
Furthermore, the penalty that the dishonest player suffers in multilevel QPQ is maximal when the number of clusters is k = 8, which is the square root of the number of players. In Figure 6, we consider a similar scenario with n = 256 players and we observe that the utility of the dishonest player follows the same pattern, that is, when k = √n = 16, the penalty suffered by the dishonest player in multilevel QPQ is the largest. Therefore, from these simulations, we conclude that, for k = √n, not only is the volume of data communication reduced, as stated in Proposition 11, but the penalty suffered by the dishonest player is also maximized.
We now focus on the utilities when we vary the number of dishonest players from 0 to 8. We show in Figure 7 that the utilities of multilevel QPQ and QPQ are very similar for the honest players, whereas the dishonest players are penalized more under multilevel QPQ. Moreover, we also see that the utilities of the honest and dishonest players do not change substantially with the number of dishonest players, as expected. We also observe in this plot that the utility of an honest player under multilevel QPQ is slightly larger than under QPQ.
We also analyze the influence of the number of players on the utility, varying it from 16 to 256 in Figure 8. We remark that, in all the scenarios, the number of clusters is set to the square root of the number of players. As can be seen in this figure, for the honest players, the utility of multilevel QPQ and the utility of QPQ are again very similar. However, for the dishonest player, the utility with multilevel QPQ is smaller than with QPQ, i.e., multilevel QPQ penalizes the dishonest player more than QPQ. We observe that the utility of the dishonest player with QPQ does not vary substantially with the number of players. Moreover, the utility of the dishonest player in multilevel QPQ decreases with the number of players, because the total available memory increases and can be used to improve the GoF test.
We study the influence of the memory used to perform the GoF test (or history length) on the utility of honest and dishonest players for multilevel QPQ and QPQ in Figure 9. For this case, we consider three different values of H_1: 100, 300 and 1000. We observe that the utility of the honest players does not change with the history length H_1, whereas that of the dishonest player decreases with the history length for both QPQ and multilevel QPQ, as expected. The main reason is that, with a larger memory, the GoF test performs better at detecting dishonest behaviour, which leads to a smaller utility for the dishonest player. Observe that, for history length 1000, the dishonest player has a normalized utility of roughly 0.5, because its utility is very close to 1/(2n), which is the expected utility of a dishonest player when the GoF test is perfect, from Proposition 8. This holds for both QPQ and multilevel QPQ and, therefore, we conclude that the utilities of the dishonest player for QPQ and multilevel QPQ are very close when the history length is large.
In Figure 9 it can be observed that, with QPQ, for H 1 = 100 the dishonest player has higher utility than the honest players. As was shown in Figure 4, it is possible to deal with dishonest players that follow a strategy close to uniform by increasing the threshold τ q , at the cost of reducing the utility of the honest players. We observe in Figure 10 that this reduction can be compensated by increasing the memory. The figure shows the impact of memory size when a threshold τ q = 0.10 is used. We observe that, when H 1 is small, the utility of the honest players is larger than that of the dishonest player for QPQ and multilevel QPQ. However, when the history length is large, for QPQ and multilevel QPQ, the utility of the dishonest player is close to 1/(2n).
We also consider the effect of the history length on the utilities for a threshold value τ_q = 0.1 and a parameter of the beta distribution of the dishonest players equal to 1.05. As we saw in Figure 4, the utilities of honest and dishonest players are very similar in this case. In Figure 11, we show that, with more available memory, the utility of the dishonest player in multilevel QPQ is smaller than in QPQ, while the utility of the honest players is still larger in multilevel QPQ than in QPQ.

VII. RELATED WORK
The problem of how to assign a set of resources to a fixed number of agents that act rationally has been studied in the context of Mechanism Design by different authors. In [12], [17], mechanisms based on payment systems are considered. We remark that the QPQ mechanism does not consider payments among users (as in [18], [19]). Other models in the literature, such as [13], [20]-[22], assume the existence of a central agent that handles the probability distribution that characterizes the rational behaviour of the agents. This assumption has already been criticized in [23], arguing that it is not applicable in real environments.
In this work, we study an extension of the QPQ mechanism, which was introduced in [9] and further analyzed in [15] and [16]. In [15], the authors relax the assumption that the preferences of the players are i.i.d. by considering a correlation between the preferences of the players for the resources. On the other hand, [16] considers that in each round there are k resources to be assigned to the set of players (whereas in [9] a single resource is considered in each round). The main feature of these models is that the properties of efficiency, fairness, etc. given in [9] are also achieved in the extensions under consideration.

VIII. CONCLUSIONS AND FUTURE WORK
In this article, we generalize the QPQ approach of [9] to a system with two levels. In the multilevel QPQ technique, players are divided into clusters, and the QPQ approach is applied first at the intra-cluster level and then at the inter-cluster level. More precisely, the multilevel QPQ first uses the QPQ approach to determine the winner of each cluster independently; then, it uses the QPQ mechanism again among the winners of all the clusters to determine who gets the resource.
We show that the positive properties of QPQ extend to the multilevel QPQ. First, we show that the utility of an honest player is always larger than the utility of a player that declares values different from its real distribution, if a perfect detection mechanism is available. We also show that the multilevel QPQ has several advantages with respect to QPQ in terms of reduced communication cost and memory use, which means that multilevel QPQ is more scalable. We also study with simulations the performance of multilevel QPQ (and QPQ) when the detection of a dishonest player is not perfect. We show that in most cases dishonest players have lower utility than honest players, and that, with a similar amount of memory, multilevel QPQ has this property in more cases.
For future work, we would like to generalize the results of this article to a multilevel QPQ mechanism with an arbitrary number of levels. We would also like to consider correlated players in multilevel QPQ. Another line worth exploring in practice is adapting the parameters of the GoF test (e.g., τ_q) to the evolution of the system (for instance, the balance of resources assigned among players). Finally, we are extremely interested in studying QPQ and multilevel QPQ for unknown and variable player sets.

B. COMPUTATION OF UTILITIES OF MULTILEVEL QPQ
We now compute the values of U_H, U_u, U_c and U_{u,c} for player (c, i). We observe that, from Proposition 7, we have that U_H = 1/(n + 1) and, from Proposition 8, that U_c = 1/(2n). The rest of the expressions are given below. We can express the utility of the player (c, i) as an integral in which z_u represents the value regenerated in the upper-level test, f(x) = 1 is the density function associated to the normalized preferences of player (c, i), and g_u(z_u) is the density function associated to the regenerated value. The preferences θ̂_{c,−i} of the aggregated player (c, −i) and the preferences θ̂_{−c} of the aggregated player (−c) are independent, and they follow, respectively, a Beta(n_c − 1, 1) and a Beta(n − n_c, 1) distribution. Additionally, they are independent of the values θ̂_{c,i}, which follow a Beta(n_c, 1) distribution. Therefore, the utility of player (c, i) when it passes the cluster-level test but not the upper-level one is an integral in which z_c and z_u represent the values regenerated at the cluster level and at the upper level, f(x) = 1 is the density function associated to the normalized preferences θ̃_{c,i}, which follow a U(0, 1) distribution, g_c(z_c) = 1 is the density function of the value regenerated at the cluster level, and g_u(z_u) = n_c·z_u^{n_c − 1} is the density function of the value regenerated at the upper level. The preferences θ̂_{c,−i} of the aggregated player inside cluster c of player (c, i) and the preferences θ̂_{−c} of the aggregated player of the winner of cluster c are independent, and they follow, respectively, a Beta(n_c − 1, 1) and a Beta(n − n_c, 1) distribution (see Corollary 1 and Corollary 3). Then, the utility of player (c, i) when it does not pass any test is obtained analogously.
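The two closed-form values above admit a quick Monte Carlo sanity check. The sketch below is our own illustration under a simplified one-level model (i.i.d. U(0, 1) preferences, allocation to the highest effective value, and a failed test replacing the declared value by a fresh uniform draw); it reproduces U_H = 1/(n + 1) and the regenerated-value utility 1/(2n) of Proposition 8:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 8, 200_000                     # players, Monte Carlo rounds

theta = rng.uniform(size=(T, n))      # true preferences, i.i.d. U(0, 1)

# Honest player 0, all tests pass: it wins when its value is the maximum
wins = theta.argmax(axis=1) == 0
u_honest = np.where(wins, theta[:, 0], 0.0).mean()
print(u_honest, 1 / (n + 1))          # both close to 0.111

# Failed test: player 0's declared value is regenerated as a fresh uniform,
# so winning is independent of its true preference theta[:, 0]
z0 = rng.uniform(size=T)
effective = np.column_stack([z0, theta[:, 1:]])
wins_r = effective.argmax(axis=1) == 0
u_regen = np.where(wins_r, theta[:, 0], 0.0).mean()
print(u_regen, 1 / (2 * n))           # both close to 0.0625
```
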