Best Effort Workload Disparity Minimization in Multi-Agent Systems With Capacity Constraints

In this article, we propose a load balancing problem formulation where agents cooperate with the aim of simultaneously minimizing both the workload disparity among the agents and the overall workload transfer, under network capacity constraints. Notably, in our computational setting, the network is not just a device for the distributed solution of an optimization problem; on the contrary, the problem shares the same sparsity pattern as the network, and this aspect allows us to solve it without the need for the agents to store large amounts of data. In particular, while the load balancing process occurs over directed links, agents' communication is assumed to be bidirectional. For this optimization setting, first, an optimality condition is derived; then, a provably convergent distributed algorithm to compute the optimal solution is developed, and an upper bound on the convergence rate is characterized. Simulation results are provided to corroborate the validity and performance of our theoretical findings.

out in a centralized fashion, such as [15] and [16], where the concept of a load balancer, i.e., an entity that performs all the computations, is introduced.
In recent times, due to the increasing computational capability of computers and industrial machines, and the pervasiveness of information and communication technologies, we are observing a paradigm shift from centralized to more agile and horizontal schemes (e.g., in the cases of the Internet of Things [17] and Industry 4.0 [18]), where entities cooperatively make decisions with no need for central coordination, for scalability and complexity reasons. In the context of load balancing, this allows us to scale up the dimension of the balancing problem without the issue of having a single point of failure (i.e., the monolithic data center responsible for the computations).

A. State of the Art
Motivated by such an operational setting, several distributed load balancing approaches have been provided in the literature. Indicatively, in [19], a hierarchical approach that relies on the construction of virtual trees among the nodes in the network is provided; in [20], an algorithm is provided that balances the loads in spite of network unreliability; in [21], an algorithm is proposed to balance the loads of the base stations of a cellular network by suitably assigning the users; in [22], a noncooperative game theoretic approach is used to model and solve the load balancing problem and derive a distributed algorithm, which provides a near-optimal solution; in [23] and [24], noncooperative algorithms based on Wardrop equilibria and mean field game theory are developed; in [25], an algorithm is proposed for workload balancing on nonidentical parallel machines, based on a min-max optimization strategy. In [26], the problem is cast in the field of power dispatching. Specifically, the objective is to find a generation plan for n generators in a grid, characterized by a capacity and with constraints on the total required load; from a technical standpoint, a distributed solution is provided based on an average consensus protocol. Notice that, in the literature, several distributed optimization algorithms have been developed (e.g., [27], [28], [29]). However, within such methods, the nodes attempt to solve a local problem while being coupled by global constraints, and the network is just a tool for solving the problem. In other cases, each node is in charge of deciding local variables and the coupling occurs at the level of the constraints (e.g., [30], [31]). However, to the best of the authors' knowledge, such approaches are fundamentally different from the proposed load balancing formulation. In fact, in [30], each variable appears in at most one coupling constraint, and in [31], the coupling amounts to a constraint featuring the sum of all variables.

B. Contributions
Existing approaches typically enforce balancing without taking into account the effort required to transport or redistribute the loads across the agents. Nevertheless, in real-world (networked) scenarios, load transfer may require an effort in terms of bandwidth (e.g., in the case of computer networks) or actual transportation costs (e.g., in the case of flexible manufacturing), and therefore, it might be desirable to achieve balancing in a best effort sense, e.g., seeking a tradeoff between workload balance and transfer costs. Motivated by such considerations, our contributions are the following. First, we formulate a variation of the classical load balancing optimization problem for networked multi-agent systems where agents cooperate with the aim of simultaneously minimizing the workload disparity among agents and the overall workload transfer, under network capacity constraints. In particular, while the load balancing process occurs over directed links, agents' communication is assumed to be bidirectional. Second, we derive a necessary and sufficient global optimality condition for the optimization problem considered. Next, we develop a provably convergent distributed algorithm that lets the agents compute the optimal solution. Finally, simulation results are provided to corroborate the validity and efficacy of our theoretical findings. Notice that, in our computational setting, the problem shares the same sparsity pattern as the network: this aspect allows us to solve it without the need for the agents to store large amounts of data.

A. Notation and Definitions
We denote vectors (and vector-valued functions) by boldface lowercase letters and matrices by uppercase letters. The set of nonnegative real numbers is denoted by R_{≥0}. Let x_i denote the ith component of a vector x. We use the notation x ≥ 0 (x > 0) when all of the components of x are nonnegative (positive). The inequality x ≥ y implies that x_i ≥ y_i for all components i.
We use ‖v‖_2 to denote the Euclidean norm of a vector v. Given a vector v ∈ R^n with v > 0_n, ‖·‖_∞^v stands for the weighted maximum norm, i.e., ‖x‖_∞^v = max_{i=1,...,n} |x_i|/v_i. We refer to the (i, j)th entry of a matrix A by A_ij. We represent by 0_n and 1_n vectors with n entries, all equal to zero and to one, respectively. Moreover, we denote by 0_{n×m} and 1_{n×m} the n × m matrices containing just zeros and just ones, respectively.
A function f : R n → R m is said to be positive if f (x) ≥ 0 m ∀x ≥ 0 n ; moreover, f is said to be monotonically nondecreasing if the fact x ≥ y implies f (x) ≥ f (y).
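For concreteness, the weighted maximum norm can be computed in a couple of lines; the following sketch (the helper name `weighted_max_norm` is ours) illustrates the definition ‖x‖_∞^v = max_i |x_i|/v_i.

```python
import numpy as np

def weighted_max_norm(x, v):
    """||x||_inf^v = max_i |x_i| / v_i, defined for a weight vector v > 0."""
    v = np.asarray(v, dtype=float)
    assert np.all(v > 0.0), "the weight vector must be strictly positive"
    return float(np.max(np.abs(np.asarray(x, dtype=float)) / v))

# With v = 1_n, the weighted norm reduces to the ordinary infinity norm.
x = np.array([3.0, -4.0, 1.0])
assert weighted_max_norm(x, np.ones(3)) == 4.0
```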

B. Graph Theory
A graph G = {V, E} consists of a set of nodes V = {v_1, ..., v_n} and a set of edges E ⊆ V × V. A graph is said to be undirected if (v_i, v_j) ∈ E implies (v_j, v_i) ∈ E, and is said to be directed otherwise.
A graph is said to be strongly connected if each node can be reached by each other node via a path that respects the edges' orientation.
Given a graph G = {V, E}, let us define the set of matrices compatible with G as

A_G = {W ∈ R^{n×n}_{≥0} : W_ij = 0 whenever (v_i, v_j) ∉ E}.

III. PROBLEM FORMULATION
Let us consider a set of n agents, each provided with an initial load x_i ≥ 0 representing goods, jobs, or data to be processed. Moreover, we assume that each agent i has a capacity b_i ≥ 0, which represents the maximum load that it can process. In this setting, we assume that each agent i is able to transfer a portion of its load to a subset of the other agents, and we denote the load transferred from agent i to agent j by W_ij. The aim of the agents is to redistribute the loads in order to fulfill two conflicting objectives: a) minimize the difference among the agents' loads; b) minimize the effort required for the redistribution of the loads. Furthermore, we consider transmissions with limited capacities W^UB_ij > 0 that cannot be exceeded. In order to model such a scenario, we consider a directed and strongly connected graph G = {V, E} with n nodes, where each node v_i ∈ V represents an agent, while a link (v_i, v_j) captures the ability to transmit loads from the ith to the jth agent. In order to accomplish the abovementioned task, each node aims at selecting a nonnegative value W_ij for all its outgoing edges (v_i, v_j) (from v_i to v_j), which represents the load that has to be sent to each out-neighbor v_j over G. As a result of the choice of the terms W_ij for all edges, the load ℓ_i(W) at each node is given by

ℓ_i(W) = x_i − Σ_{j : (v_i, v_j) ∈ E} W_ij + Σ_{j : (v_j, v_i) ∈ E} W_ji

i.e., it corresponds to the original load, minus the outgoing transfers, plus the incoming ones. As for the effort, we define it as

(1/2) Σ_{(v_i, v_j) ∈ E} c_ij W_ij^2

where the coefficients c_ij ≥ 0 account for heterogeneous costs associated to different edges. In order to mediate between such conflicting objectives, we consider an overall objective that is a convex combination of the load balancing and effort objectives, based on a parameter α ∈ (0, 1). Remark 1: The proposed load balancing problem fundamentally differs from weight balancing (e.g., see [32], [33]).
In fact, within the latter class of problems, the aim is to make the sum of the incoming weights equal to the sum of the outgoing ones and the loads are, thus, equal to zero. Conversely, within the proposed formulation, the loads (i.e., the initial load plus the loads associated to the incoming edges minus the loads related to the outgoing ones) are, in general, different from zero.
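To make the formulation concrete, the following Python sketch evaluates the loads ℓ(W) and an overall objective of the convex-combination type described above. The quadratic form of both terms is our assumption (consistent with the expression of κ in Appendix A), and the helper names are ours.

```python
import numpy as np

def loads(x, W):
    """l_i(W) = x_i - sum_j W_ij + sum_j W_ji: the initial load, minus the
    outgoing transfers, plus the incoming ones (W_ij = 0 on non-edges)."""
    W = np.asarray(W, dtype=float)
    return np.asarray(x, dtype=float) - W.sum(axis=1) + W.sum(axis=0)

def objective(x, W, C, alpha):
    """alpha-weighted combination of load disparity and transfer effort;
    both terms are assumed quadratic, consistently with the expression of
    kappa in Appendix A."""
    l = loads(x, W)
    disparity = 0.5 * np.sum(l ** 2)
    effort = 0.5 * np.sum(np.asarray(C) * np.asarray(W) ** 2)
    return alpha * disparity + (1.0 - alpha) * effort

# Two agents, one transfer of 5 units from v_1 to v_2 equalizes the loads.
x = np.array([10.0, 0.0])
W = np.array([[0.0, 5.0], [0.0, 0.0]])
assert np.allclose(loads(x, W), [5.0, 5.0])
```

Note that transfers preserve the total load: `loads(x, W).sum()` always equals `x.sum()`, a fact used by the feasibility condition of Remark 2.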
The problem under consideration is cast as the following optimization problem.
Problem 1:

min_{W ∈ A_G}  (α/2) Σ_{i=1}^n ℓ_i(W)^2 + ((1 − α)/2) Σ_{(v_i, v_j) ∈ E} c_ij W_ij^2
subject to  0 ≤ ℓ_i(W) ≤ b_i for all v_i ∈ V,  0 ≤ W_ij ≤ W^UB_ij for all (v_i, v_j) ∈ E.
For the sake of later developments in this article, it is convenient to consider the following equivalent formulation, in which (with a slight abuse of notation) we consider additional variables ℓ_i and we introduce an additional constraint enforcing that ℓ_i = ℓ_i(W).
Problem 2:

min_{ℓ ∈ R^n, W ∈ A_G}  (α/2) Σ_{i=1}^n ℓ_i^2 + ((1 − α)/2) Σ_{(v_i, v_j) ∈ E} c_ij W_ij^2
subject to  ℓ_i = ℓ_i(W),  ℓ_i ≤ b_i,  ℓ_i ≥ 0 for all v_i ∈ V,  W_ij ≤ W^UB_ij,  W_ij ≥ 0 for all (v_i, v_j) ∈ E.
Before characterizing the structure of the global optimal solution to Problem 2, we provide necessary and sufficient conditions for the existence of an admissible solution.
Remark 2: A necessary condition for the existence of an admissible solution to Problem 2 is that

Σ_{i=1}^n x_i ≤ Σ_{i=1}^n b_i.    (3)

In fact, by construction, it holds Σ_{i=1}^n ℓ_i(W) = Σ_{i=1}^n x_i for every choice of W, since transfers preserve the total load.
Remark 3: Let us define

ν_i = x_i + Σ_{j ∈ N_i^-} W^UB_ji − b_i.

A sufficient condition for the existence of an admissible solution to Problem 2 is that (3) holds true and

ν_i ≤ Σ_{j ∈ N_i^+} W^UB_ij for all v_i ∈ V.    (4)

In fact, it can be noted that ν_i represents the maximum amount of load in excess at node v_i with respect to the capacity b_i (i.e., in the worst case where the load transferred along each incoming link is equal to the upper bound); if such a quantity is smaller than the sum of the upper bounds over all edges outgoing from node v_i, then node v_i can satisfy the capacity constraint. From now on, we make the following assumption. Assumption 1: A feasible solution exists for Problem 2.
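The two feasibility conditions of Remarks 2 and 3 are straightforward to check numerically. The sketch below encodes our reading of them: the total-load inequality and the worst-case-excess inequality described in the text (the exact displays are reconstructions).

```python
import numpy as np

def necessary_condition(x, b):
    """Remark 2: transfers preserve the total load, so the total initial
    load cannot exceed the total capacity."""
    return float(np.sum(x)) <= float(np.sum(b))

def sufficient_condition(x, b, W_ub):
    """Remark 3 (our reading): nu_i, the worst-case excess at v_i when every
    incoming link carries its upper bound, must be coverable by the upper
    bounds of the outgoing links.  W_ub[i, j] = 0 encodes a missing edge."""
    W_ub = np.asarray(W_ub, dtype=float)
    nu = np.asarray(x, dtype=float) + W_ub.sum(axis=0) - np.asarray(b, dtype=float)
    return necessary_condition(x, b) and bool(np.all(nu <= W_ub.sum(axis=1)))
```

As noted in the text, the sufficient condition is conservative: it may fail on feasible instances because it assumes every incoming link is saturated.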

A. Global Optimality
Let us now establish that the globally optimal solution to Problem 2 corresponds to a solution to the following problem.
Problem 3: Find ℓ* ∈ R^n, W* ∈ A_G, γ* ∈ R^n that satisfy

ℓ*_i = min{γ*_i / α, b_i} for all v_i ∈ V    (6a)
W*_ij = min{max{0, (γ*_i − γ*_j) / ((1 − α) c_ij)}, W^UB_ij} for all (v_i, v_j) ∈ E    (6b)
ℓ*_i = ℓ_i(W*) for all v_i ∈ V.    (6c)

Theorem 1: A solution ℓ*, W* is a globally optimal solution for Problem 2 if and only if there exists a vector γ* ∈ R^n such that ℓ*, W*, γ* solve Problem 3. Moreover, the global optimal solution ℓ*, W* to Problem 2 is unique.
The abovementioned theorem establishes that the optimal solution to Problem 2 corresponds to a zero of the function defined by (6a)-(6c). Therefore, in the following, we design a distributed algorithm to compute a zero of such a function.
Notice that, later in the article, while proving convergence of our algorithm, we establish that (6a)-(6c) have a unique zero; thus, since γ* is a zero of (6a)-(6c), γ* is unique.
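Given a candidate multiplier vector γ, the conditions of Problem 3 can be evaluated numerically. The following sketch assumes the closed forms suggested by the KKT analysis in Appendix A, namely ℓ_i = min{γ_i/α, b_i} and a clipped pairwise-difference expression for W_ij; this is our reconstruction, and the helper names are ours.

```python
import numpy as np

def W_of_gamma(gamma, C, W_ub, alpha, mask):
    """Candidate transfers W_ij = clip((gamma_i - gamma_j)/((1-alpha)c_ij),
    0, W_ub_ij) on the edges of G (our reconstruction from Appendix A).
    mask[i, j] = 1 iff (v_i, v_j) is an edge; C must be strictly positive."""
    g = np.asarray(gamma, dtype=float)
    raw = (g[:, None] - g[None, :]) / ((1.0 - alpha) * np.asarray(C, dtype=float))
    return np.clip(raw, 0.0, W_ub) * mask

def l_of_gamma(gamma, b, alpha):
    """Candidate loads l_i = min(gamma_i / alpha, b_i)."""
    return np.minimum(np.asarray(gamma, dtype=float) / alpha, b)

def kkt_residual(gamma, x, b, C, W_ub, alpha, mask):
    """Residual of the consistency condition l_i = l_i(W): it vanishes at a
    solution of Problem 3 (hence, by Theorem 1, at the optimum of Problem 2)."""
    W = W_of_gamma(gamma, C, W_ub, alpha, mask)
    return x - W.sum(axis=1) + W.sum(axis=0) - l_of_gamma(gamma, b, alpha)
```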

IV. DISTRIBUTED IMPLEMENTATION
In the previous section, we established that the optimal solution and the optimal Lagrange multipliers {W*_ij, ℓ*_i, γ*_i} correspond to a zero of the function defined by (6a)-(6c). Exploiting the fact that such a function shares the same sparsity pattern as the network, we develop a distributed algorithm where each agent is required to maintain just a scalar dynamic variable, updated via local communication only.
In particular, although loads are exchanged according to the directed graph G, we assume that the nodes are able to communicate over a communication graph that corresponds to the undirected version of G, i.e., the nodes are able to communicate with both their in- and out-neighbors over G. From the point of view of the ith agent, the distributed algorithm is given by

γ_i(k+1) = f_i(γ(k))

where γ(k) = [γ_1(k) γ_2(k) ... γ_n(k)]^T and the function f_i : R^n → R depends only on γ_i(k) and on the entries γ_j(k) associated to the in- and out-neighbors of v_i, with β_i ∈ R_{≥0} a convergence gain to be defined later. Alternatively, the dynamics for all agents can be written in the compact form

γ(k+1) = f(γ(k))

where f : R^n → R^n is the stack of the functions f_i. Notice that, within the proposed algorithm, at each time instant each agent broadcasts γ_i(k) to all its in- and out-neighbors (over the undirected communication graph) and receives, overall, |N_i^+| + |N_i^-| messages at each time instant.
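Since the exact expression of f_i relies on displays not reproduced here, the following Python sketch uses a plausible stand-in: each agent moves its multiplier along the local load-consistency residual and keeps it in [0, αb_i], matching the compact form γ(k+1) = f(γ(k)) and the claimed sparsity pattern. It is an illustration under our assumptions, not the paper's exact update.

```python
import numpy as np

def step(gamma, x, b, C, W_ub, alpha, beta, mask):
    """One synchronous round of an update of the compact form
    gamma(k+1) = f(gamma(k)) (a stand-in for the paper's f_i): agent i moves
    gamma_i along its local load-consistency residual, clipped to
    [0, alpha * b_i].  Only the gamma_j of in-/out-neighbors of v_i enter the
    ith update.  mask[i, j] = 1 iff (v_i, v_j) is an edge of G; C should be
    strictly positive (non-edge entries are masked out after the division)."""
    g = np.asarray(gamma, dtype=float)
    # Candidate transfers driven by multiplier differences, clipped to [0, W_ub].
    W = np.clip((g[:, None] - g[None, :]) / ((1.0 - alpha) * C), 0.0, W_ub) * mask
    loads = x - W.sum(axis=1) + W.sum(axis=0)
    residual = loads - np.minimum(g / alpha, b)
    return np.clip(g + beta * residual, 0.0, alpha * np.asarray(b, dtype=float))
```

Iterating `step` from γ(0) = 0 drives the residual to zero for sufficiently small β, after which the transfers W and the loads ℓ are recovered from the limit γ.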
In order to prove asymptotic convergence of the proposed distributed algorithm to the global optimal solution, we now prove some properties, namely, positivity and monotonicity.
Proposition 1 (Positivity): Let β_i < β̄_i, with β̄_i as in (11). Then, for all γ ≥ 0_n, it holds f_i(γ) ≥ 0.
Proof: See Appendix B.
As a consequence of the abovementioned proposition, we now establish monotonicity of f i (γ).
Proposition 2 (Monotonicity): For any γ', γ ≥ 0_n with γ' ≥ γ, it holds f(γ') ≥ f(γ).
Proof: See Appendix C.
Remark 4: Note that, since we assumed that the nodes are able to communicate with both their in- and out-neighbors over G, the upper bound β̄_i given in (11) can be computed locally at each node.
Before we proceed with the proof, we provide a definition and a theorem that are used for the development of our proof.
Definition 1 (Definition 2, [34]): A function f : R^n_+ → R^n_+, defining the iteration γ(k+1) = f(γ(k)), is said to be a contractive function if, for all γ ≥ 0_n, it satisfies the following conditions.
1) Positivity: f(γ) ≥ 0_n.
2) Monotonicity: γ' ≥ γ implies f(γ') ≥ f(γ).
3) Contractivity: there exist a constant c ∈ [0, 1) and a vector v > 0_n such that f(γ + εv) ≤ f(γ) + cεv for all ε > 0.
In Theorem 2, it is shown that contractive functions f(·) define contraction mappings, which implies that such functions have unique fixed points and linear convergence rates.
Theorem 2 (Theorem 1, [34]): If f is a contractive function with constant c and vector v, then it has a unique fixed point γ* and, for every initial vector γ(0) ≥ 0_n, the sequence γ(k+1) = f(γ(k)) converges linearly to γ*, with

‖γ(k) − γ*‖_∞^v ≤ c^k ‖γ(0) − γ*‖_∞^v.

We are now in a position to prove convergence. Theorem 3: Suppose Assumption 1 holds true and let β_i < β̄_i, where β̄_i is defined in (11). Then, f(·) is a contraction mapping and, for all γ(0) ≥ 0_n, the proposed distributed algorithm converges linearly to the unique fixed point γ*, which corresponds to the optimal Lagrange multiplier for Problem 2.
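The linear-rate bound of Theorem 2 can be illustrated on a toy contractive map: for a nonnegative matrix A with ρ(A) < 1 and d ≥ 0, the affine map f(γ) = Aγ + d is positive, monotone, and contractive, and the error decays at rate ρ(A). The instance below is ours, chosen so that the Perron vector of A is 1_2 and the weighted maximum norm reduces to the plain infinity norm.

```python
import numpy as np

# Toy contractive map f(g) = A g + d with A >= 0, rho(A) < 1, d >= 0.
A = np.array([[0.3, 0.2], [0.1, 0.4]])
d = np.array([1.0, 1.0])
f = lambda g: A @ g + d
g_star = np.linalg.solve(np.eye(2) - A, d)          # unique fixed point, here (2, 2)
rho = float(np.max(np.abs(np.linalg.eigvals(A))))   # spectral radius, here 0.5

g = np.zeros(2)
errors = []
for _ in range(50):
    errors.append(float(np.max(np.abs(g - g_star))))
    g = f(g)
# Since both row sums of A equal rho, errors[k+1] <= rho * errors[k]:
# linear convergence at rate rho(A), as in Theorem 2.
```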
Proof: Since the function f(γ) is positive (Proposition 1) and monotone (Proposition 2), it remains to show that f also satisfies the contractivity condition in Definition 1, i.e., that there exist c ∈ [0, 1) and v > 0_n such that f(γ + εv) ≤ f(γ) + cεv. For node i, the corresponding componentwise inequalities can be derived by direct computation; note that, if γ_i > γ_j, the coefficients relating v_i and v_j can, in general, be functions of γ.
In vector form, the resulting inequalities are captured by a nonnegative matrix M(γ), and contractivity follows by choosing c = ρ(M) and v as the corresponding Perron-Frobenius eigenvector.
At this point, we observe that (3) guarantees that not all ℓ_i are equal to 0. Therefore, it can be deduced that ρ(M) < 1 holds for all M ∈ M.
We established contractivity of f . Since f is also positive (Proposition 1) and monotone (Proposition 2), all conditions of Theorem 2 are satisfied, and f is a contraction mapping with linear convergence rate and a unique fixed point γ * . By Theorem 1, γ * is a fixed point of f if and only if it corresponds to an optimal Lagrange multiplier for Problem 2. Therefore, we conclude that Problem 2 has a unique optimal Lagrange multiplier γ * .
Remark 5: In order to characterize an upper bound on the convergence rate, we consider the exact value of ρ(M†); hence, the convergence rate depends on the activated constraints, which generate the set M_s.

V. NUMERICAL VALIDATION
In this section, we numerically demonstrate the effectiveness of the proposed distributed algorithm. Specifically, we consider a graph G = {V, E} composed of |V| = 8 nodes and |E| = 16 edges, as depicted in Fig. 1. In more detail, the stacked vector of the initial loads is x = [10, 20, 30, 40, 50, 60, 70, 80]^T, while the stacked vector collecting the nodes' capacities is b = [44, 44, 44, 44, 80, 80, 80, 80]^T. As for the links, for the sake of simplicity, we assume that each link (v_i, v_j) is characterized by a maximum capacity W^UB_ij = 100 and a cost c_ij = 1. In the example, we choose α = 0.7 in order to model a situation where the load balancing objective is slightly more important than the minimization of the effort.
Notice that the condition on β_i given by (11) corresponds to the requirement that β_i < β̄_i, where min_{i=1,...,n} β̄_i = 0.1235; in our simulation, we consider β_i = 0.07 for all nodes v_i. In order to have a baseline, we compute the global optimal solution via a centralized quadratic programming solver; the corresponding objective function value is 5916.22. In Fig. 2, we show the evolution of the error ‖γ(k) − γ*‖_∞^{v†} and of the upper bound ρ(M†)^k ‖γ(0) − γ*‖_∞^{v†} obtained considering the worst-case scenario discussed in the proof of Theorem 3. As shown by Fig. 2, the proposed algorithm converges to the global optimal solution. Moreover, we observe that the upper bound is quite tight. Let us now compare the proposed algorithm against the distributed dual subgradient method (DDSM) [35]. It should be noticed that, within DDSM, the agents need to resort to point-to-point communication; in fact, a node i must selectively send its local estimate of the multiplier associated to the jth constraint to the specific neighbor j. Conversely, within our algorithm, at each time instant node i communicates by broadcasting its value γ_i(k) to all its neighbors, i.e., the transmitter simply broadcasts a message that is received by any agent within its transmission range. In our opinion, this latter communication approach represents a more effective choice, since it can be more easily implemented and provides greater robustness. Moreover, our approach at each iteration only requires the computation of a scalar equation at each agent, i.e., γ_i(k+1) = f_i(γ(k)), while DDSM at each iteration also requires all agents to maintain a larger set of variables (on the order of the number of neighbors for each node), to locally solve an optimization problem, and to execute a projection, thus having, in general, higher computational complexity.
However, it is worth noting that while the proposed algorithm is tailored to the particular optimization problem considered in this article, DDSM is a general solver.
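As an illustration of the centralized baseline, the following sketch solves a small instance of the problem with an off-the-shelf solver. The 3-node graph, loads, and capacities are hypothetical (the Fig. 1 instance is not reproduced here), and the objective follows our quadratic reading of the formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 3-node instance: directed ring (v1, v2), (v2, v3), (v3, v1),
# unit costs, generous link capacities.
edges = [(0, 1), (1, 2), (2, 0)]
x = np.array([9.0, 3.0, 0.0])       # initial loads
b = np.array([10.0, 10.0, 10.0])    # node capacities
alpha, c, w_ub = 0.7, 1.0, 100.0

def loads(w):
    """l_i(W) = x_i - outgoing + incoming, with w stacking the edge variables."""
    l = x.copy()
    for (i, j), wij in zip(edges, w):
        l[i] -= wij
        l[j] += wij
    return l

def J(w):
    """Convex combination of load disparity and transfer effort."""
    l = loads(w)
    return 0.5 * (alpha * np.sum(l ** 2) + (1 - alpha) * c * np.sum(np.asarray(w) ** 2))

res = minimize(J, np.zeros(len(edges)), method="SLSQP",
               bounds=[(0.0, w_ub)] * len(edges),
               constraints=[{"type": "ineq", "fun": lambda w: b - loads(w)},  # l <= b
                            {"type": "ineq", "fun": lambda w: loads(w)}])     # l >= 0
```

Here `res.x` collects the optimal transfers edge by edge and serves the same role as the centralized baseline used above, against which a distributed solution can be checked.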
In Figs. 3 and 4, we compare the performance of the proposed algorithm against DDSM (considering a step size that decays linearly with time and neglecting the computational time required by DDSM to solve the local optimization problems at each step). As shown by the figures, the proposed algorithm exhibits remarkably faster convergence than DDSM. In Fig. 5, we consider randomly generated and connected Erdős-Rényi graphs with density equal to 0.5 (i.e., 50% of the edges in a complete graph) and |V| ∈

VI. CONCLUSION
In this article, we proposed a variation of the classical load balancing problem where a multi-agent system cooperates in order to simultaneously minimize the workload disparity among agents and the overall workload transfer cost, while taking into account network capacity constraints. This work is motivated by the observation that, in real-world scenarios, load transfer may require an effort in terms of bandwidth (e.g., in the case of computer networks) or actual transportation costs (e.g., in the case of flexible manufacturing), and therefore, it might be desirable to achieve balancing in a best effort sense, e.g., seeking a tradeoff between workload balance and transfer costs. For this problem setting, first, an optimality condition has been derived; then, a provably convergent distributed algorithm to compute the optimal solution has been developed.
Future work will focus on extending the proposed approach along the following directions. 1) Consider the case in which the communication network is also directed. In this case, agents will not be able to update their state based on information from out-neighbors, and other distributed approaches should be adopted. 2) Consider time-varying workloads (i.e., the workload at each node may vary over time), for which dynamic balancing is required.
3) Allow for a time-varying network in which agents may arrive and/or leave over time.

APPENDIX A PROOF OF THEOREM 1-OPTIMALITY
In order to prove the result, we observe that, by construction, the objective function of Problem 2 is convex and all constraints are linear. Hence, Problem 2 has a unique global optimal solution that satisfies the conditions of the Karush-Kuhn-Tucker (KKT) theorem (e.g., see [36]). Specifically, let us consider Lagrange multipliers γ_i, λ_i, μ_i, Φ_ij, and Ψ_ij, each associated to a constraint belonging to the first, second, third, fourth, and fifth sets of constraints of Problem 2, respectively. In the following, we use g(·)|* to denote the evaluation of a function at the optimal value of the variables and Lagrange multipliers.
Notably, due to the complementary slackness condition μ*_i ℓ*_i = 0, the optimal solution must be such that either μ*_i = 0 or ℓ*_i = 0 holds. We claim that ℓ*_i ≠ 0. In fact, when ℓ*_i = 0 and x_i > 0, an infinitesimal increase in ℓ*_i would result in a reduction of both objectives (i.e., load balancing and effort); hence, a solution featuring a term ℓ*_i = 0 would not be optimal in this case. Similarly, when ℓ*_i = 0 and x_i = 0, two cases are possible. Case (i) would not correspond to a global optimal solution, as an infinitesimal increase in ℓ*_i would result in a reduction of both objectives in Problem 1; hence, case (ii) must hold. However, also in this case it is possible to slightly modify the solution in order to improve it. Specifically, it is sufficient to select a node v_j such that (v_j, v_i) ∈ E (we are guaranteed that v_j exists since the graph G is strongly connected) and transfer a quantity ω ∈ (0, ℓ*_j) from v_j to v_i. As a result, the new loads at nodes v_i and v_j are equal to ω and ℓ*_j − ω, respectively, while all other loads are unchanged; moreover, the new amount of load transferred from v_j to v_i is equal to ω, while all other entries of W* are unchanged. Notice that the difference κ between the objective function attained for the new solution and the original one is equal to κ = (α(ℓ*_j − ω)^2 − α(ℓ*_j)^2 + (1 − α)c_ji ω^2)/2, and therefore the new solution corresponds to an improvement with respect to the original one if κ < 0, i.e., if ω ∈ (0, 2αℓ*_j/(α + (1 − α)c_ji)). We established that case (ii) would not correspond to a global optimal solution. Therefore, it must hold ℓ*_i ≠ 0. This, in turn, implies that μ*_i = 0. In light of this, the requirement that ∂L(·)/∂ℓ_i|* = 0, where L(·) is the Lagrangian function (not reported here for the sake of brevity), reduces to

αℓ*_i − γ*_i + λ*_i = 0    (15)

and we observe that, since the KKT conditions require that λ*_i ≥ 0 and ℓ*_i ≥ 0, it holds γ*_i ≥ 0.
Moreover, by (15) and by the complementary slackness condition λ*_i(ℓ*_i − b_i) = 0, it follows that either ℓ*_i = γ*_i/α (when λ*_i = 0) or ℓ*_i = b_i. Since γ*_i ≥ 0, we conclude that ℓ*_i = min{γ*_i/α, b_i}. Now, by using the KKT condition ∂L(·)/∂W_ij|* = 0, for all (v_i, v_j) ∈ E we obtain

(1 − α)c_ij W*_ij − γ*_i + γ*_j + Φ*_ij − Ψ*_ij = 0.    (17)

By the KKT complementary slackness condition Ψ*_ij W*_ij = 0, we have that Ψ*_ij = 0 when W*_ij ≠ 0; similarly, by the KKT complementary slackness condition Φ*_ij(W*_ij − W^UB_ij) = 0, we have that Φ*_ij = 0 when W*_ij ≠ W^UB_ij. Therefore, (17) is equivalent to

W*_ij = min{max{0, (γ*_i − γ*_j)/((1 − α)c_ij)}, W^UB_ij}.
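The sign analysis of κ can be verified numerically: solving κ < 0 for ω yields the threshold 2αℓ*_j/(α + (1 − α)c_ji). The sketch below encodes the expression of κ from the proof.

```python
import numpy as np

def kappa(l_j, omega, alpha, c_ji):
    """Objective change when a quantity omega is rerouted from v_j to a
    zero-load in-neighbor v_i, as in the expression of Appendix A."""
    return (alpha * (l_j - omega) ** 2 - alpha * l_j ** 2
            + (1.0 - alpha) * c_ji * omega ** 2) / 2.0

# kappa < 0 exactly for omega in (0, 2*alpha*l_j / (alpha + (1 - alpha)*c_ji)).
l_j, alpha, c_ji = 4.0, 0.5, 1.0
threshold = 2.0 * alpha * l_j / (alpha + (1.0 - alpha) * c_ji)  # = 4.0 here
```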

APPENDIX B PROOF OF PROPOSITION 1-POSITIVITY
Ignoring the upper bound on γ_i (for simplicity of exposition), we have that f_i(γ) = f̃_i(γ), where f̃_i is expressed in terms of I_ij(γ), defined in (14), and of J_ij(γ). In the following, where understood, we omit the dependency of I_ij(γ) and J_ij(γ) on γ. Notice that, by construction, the terms I_ij and J_ij are bounded; hence, f_i(γ) satisfies a lower bound which is positive for β_i < β̄_i. When we invoke the upper bound on γ_i(k) (i.e., γ_i ≤ αb_i), the positivity of f_i(γ) is not affected (the bound was initially ignored only to ease the exposition).

APPENDIX C PROOF OF PROPOSITION 2-MONOTONICITY
From Proposition 1, we have that all the terms involved are nonnegative or positive. Assuming that no constraint on γ_i due to (9a) and (9b) is active, f_i(γ) is also linear. Due to positivity and linearity of f_i(γ) on the set of γ for which no constraint is activated, for γ' ≥ γ it holds f_i(γ') ≥ f_i(γ), i.e., the system is monotone. Having established monotonicity for the linear case, let us examine what happens when constraints (9a) and (9b) are active. First, observe that if γ'_i ≥ γ_i and all the other entries of γ' and γ coincide, then it can be easily shown that f_i(γ') ≥ f_i(γ), even when (9a) and (9b) are invoked. Second, for the other nodes, it can again be easily shown that if γ'_j ≥ γ_j (and all the other entries remain the same), then W_ij(·) − W_ji(·) decreases, and therefore f_i(γ') ≥ f_i(γ). Given these two observations, it can be deduced that f_i(γ) is monotone for all i = 1, ..., n.