Distributed Anytime-Feasible Resource Allocation Subject to Heterogeneous Time-Varying Delays

This paper considers distributed allocation strategies, formulated as a distributed sum-preserving (fixed-sum) allocation of resources over a multi-agent network in the presence of heterogeneous, arbitrary, time-varying delays. We propose a double time-scale scenario for unknown delays and a faster single time-scale scenario for known delays. Further, the links among the nodes are considered subject to certain nonlinearities (e.g., quantization and saturation/clipping). We discuss different models of nonlinearity and how they may affect convergence, the sum-preserving feasibility constraint, and solution optimality over general weight-balanced uniformly strongly connected networks and, further, over time-delayed undirected networks. Our proposed scheme works in a variety of applications with general non-quadratic strongly-convex smooth objective functions. The non-quadratic part, for example, can be due to additive convex penalty or barrier functions that address local box constraints. The network can change over time and is not necessarily connected at all times, but is only assumed to be uniformly connected. The novelty of this work is to address all-time feasible Laplacian gradient solutions in the presence of nonlinearities, switching digraph topology (not necessarily all-time connected), and heterogeneous time-varying delays.

not practical as communication links are digital and of limited bandwidth. In addition, communication links may experience delays (e.g., due to packet re-transmission). Under limited network bandwidth and/or latency, the existing solutions may not be optimal or may lose feasibility. The feasibility constraint ensures the balance between assigned resources and the overall demand. The proposed strategies can be used for resource management over Cloud infrastructure (as in [1]) while considering heterogeneous delays over links, local box constraints on the states, and quantized (discrete-valued) communications among the servers.

A. RELATED LITERATURE
The classical works in the context of constrained distributed optimization mainly assume linear and ideal communication and data transmission; see, e.g., the seminal work [12], which considers a Laplacian-based constraint on the states. Some recent works consider unconstrained distributed optimization [13] and consensus optimization [14] subject to communication delays, resource allocation over open networks subject to arrivals/departures of nodes [15], and double averaging and projection-based solutions over static digraphs [16]. Bit allocation for distributed optimization setups is also considered in [17]. In resource allocation, solution feasibility is crucial for the resource-demand balance (at the termination point of the algorithm) to avoid service disruption and even system breakdown [18]. The Laplacian gradient solutions benefit from anytime feasibility [10], [18], i.e., the sum-preserving equality constraint holds at all iterations of the algorithm, in contrast to the asymptotic feasibility of the primal-dual and ADMM-based solutions [19], [20], [21], [22], [23]. Some other concerns in distributed resource allocation are: (i) uniform-connectivity, in the case of dynamic and sparse mobile networks, in contrast to all-time connectivity [19], [20], [21], [22], [23]; (ii) latency over the network, to account for possible time-varying heterogeneous delays due to data exchanges over unreliable communication links between agents or even asynchronicity between the nodes. The time-delays may even cause a feasibility gap in the allocation algorithm [24]. Finally, (iii) possible nonlinearities in the model, mainly due to quantization [1] and discrete-value optimization [2], in contrast to ideal linear models.
Some example resource-allocation applications include economic dispatch and generation control over the smart grid [10], [25], [26], [27] and CPU scheduling over a network of data centers [1]. Apart from these quadratic models, some other works also address possible non-quadratic objectives [28], [29], [30] to span more application scenarios. In general, nonlinear dynamics (either inherent to the system model or additive constraints due to limited capacity/storage) are prevalent in practical applications and cannot be addressed with the existing linear algorithms. Some examples are: the ramp-rate limit on the generators' dynamics for automatic generation control [27], impulsive-noise resiliency in consensus algorithms [31], and convergence in finite/fixed time [4].
Quantization or clipping (and general strongly sign-preserving odd nonlinearities) over dynamic networks, while satisfying distributed anytime-feasibility in the presence of (possible) time-delays, are not addressed in the existing optimization works (to the best of our knowledge). For example, the work [24] addresses homogeneous time-delays at all links, with some feasibility gap, over switching undirected (all-time) connected graphs. Possible local box constraints on the states may add even more complexity to the model [18]. Recall that some of the mentioned model constraints are discussed in the consensus literature [2], [3], [4], [31], [32], [33], but are not well-addressed in their general form in the networked optimization research; this paper fills that gap.

B. MAIN CONTRIBUTIONS
The proposed distributed allocation protocol in this work is (i) anytime-feasible (or primal-feasible) and (ii) able to address nonlinear factors on the data exchanged over the network, due to, e.g., the use of quantized values for more efficient usage of network resources, or limited available bandwidth that may cause clipping. In such nonlinear setups, the existing linear methods may violate the feasibility constraint or converge to a sub-optimal solution (or even diverge). Our nonlinear model converges (a) exactly under logarithmic quantization (as a sector-bounded nonlinearity) and (b) with ε-accuracy under uniform quantization (as a non-sector-bounded nonlinearity). In the uniform quantization case, we find the quantization level that ensures convergence to the ε-neighborhood of the optimizer and meets a given ε-accuracy. Our solution paves the way for bandwidth-limited or fast bandwidth-efficient algorithms subject to quantized values, and addresses the trade-off between ε-accuracy and the limit on the network bandwidth. We derive a general sufficient condition on the nonlinear mapping to preserve all-time feasibility in the presence of latency and nonlinearities, and to converge to the exact optimizer or within its ε-neighborhood. Further, (iii) we take possible data-exchange delays into account and provide two solutions for known and unknown heterogeneous time-delays over the network. We explicitly find a (sufficient) maximum bound on the time-varying delays that does not violate the algorithm's convergence (for a given step rate). Our delay-tolerant algorithm leads to no feasibility gap under a general heterogeneous framework (and odd sign-preserving nonlinearities). Further, (iv) this work accounts for possible changes and disconnectivity of the network, i.e., uniform-connectivity instead of all-time network connectivity as in [1], [18], [24].
We provide (quadratic) CPU scheduling subject to quantized data transmission as an example application, even though the solution works for general non-quadratic models. To the best of our knowledge, no work in the literature addresses the contributions (i)-(iv) altogether.

C. PAPER ORGANIZATION
Section II states the problem, definitions, and preliminary lemmas. Sections III and IV provide the proposed distributed nonlinear protocols and the proof of feasibility and convergence under latency. Section V presents the simulation results and Section VI concludes the paper.

II. PROBLEM STATEMENT
General Notation: ‖·‖_2 denotes the 2-norm. ";" denotes column vector concatenation. ∇F denotes the gradient of F. The symbols ≺, ⪯, ≻, ⪰ denote the element-wise versions of the <, ≤, >, ≥ operators for vectors. span{·} denotes the linear span of a vector. 1_n and 0_n are the vectors of all 1s and 0s of size n, respectively. RHS and LHS abbreviate the right-hand side and left-hand side (of an equation). (·)^T denotes the transpose.
The constrained optimization problem considered in this paper takes the following general mathematical form, with y = [y_1; ...; y_n] ∈ R^n and y_i as the resource assigned (or to be assigned) to agent i. The fixed parameter b ∈ R represents the total amount of resources, and a = [a_1; ...; a_n] ∈ R^n_+ is a general weighting vector. The function f_i(y_i) at agent i represents the cost as a function of the resources assigned to agent i. By the change of variable x_i = a_i y_i, the problem turns into a simpler (sum-preserving) form. The cost functions f_i : R → R are strongly convex and smooth at all agents, as formalized in the following assumption.
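Concretely, and consistent with Definition 1 below, the problem and its sum-preserving form can be written as follows, where the reparametrized costs f̃_i(x_i) := f_i(x_i/a_i) are introduced notation:

```latex
\mathbf{P}_1:\quad \min_{\mathbf{y}\in\mathbb{R}^n}\ \sum_{i=1}^{n} f_i(y_i)
\quad \text{s.t.}\quad \mathbf{a}^\top \mathbf{y} = b,
```

and, with $x_i = a_i y_i$,

```latex
\min_{\mathbf{x}\in\mathbb{R}^n}\ F(\mathbf{x}) := \sum_{i=1}^{n} \widetilde{f}_i(x_i)
\quad \text{s.t.}\quad \mathbf{1}_n^\top \mathbf{x} = b,
\qquad \widetilde{f}_i(x_i) := f_i\!\left(\tfrac{x_i}{a_i}\right).
```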
Assumption 1: The local objectives f_i(x_i) : R → R, i ∈ {1, ..., n}, are strongly convex, proper, and closed with locally Lipschitz derivatives such that 2v ≤ ∂²f_i(x_i) ≤ 2u for some 0 < v ≤ u. Note, however, that the assumption of strong convexity is used to determine the algorithm's convergence rate, while strict convexity (∂²f_i(x_i) > 0) is sufficient for the proof of convergence. For quadratic costs, strong and strict convexity are equivalent. The ratio u/v ≥ 1 in Assumption 1 is referred to as the condition number of f_i and, for example, equals 1 for quadratic objective functions (e.g., for CPU scheduling); it may affect the rate of convergence in general distributed optimization problems [34].
In some applications, there are certain box constraints on the states, m_i ≤ x_i ≤ M_i. One can remove such constraints in P_1 by adding convex penalty functions [35] or barrier functions [18], [36] (as discussed later). Recall that the sum of the strongly convex f_i(·) and a convex penalty function is strongly convex. In general, the penalty and barrier functions are convex but not necessarily quadratic, and adding them to the objective function makes it non-quadratic. Therefore, such problems cannot be addressed by general consensus-based solutions assuming a quadratic cost model, e.g., the solution in [1]. Some examples of general non-quadratic costs are given in [38] for linear dynamics, where no node/link nonlinearity constraint is addressed.
In distributed resource allocation, the idea is to assign resources to the agents in order to solve P_1 in a distributed fashion, based on local data exchange in the neighborhood of each agent (see examples in Section V). However, in practical applications, some constraints on the states or nonlinearities in the agents' dynamics may affect convergence and solution optimality; for example, when the states take discrete (quantized) values or the communication bandwidth is limited. The main contribution of this work is to address how such possible nonlinearities and constraints can be handled in the proposed distributed solution. Further, the conditions for reaching the exact optimizer of P_1 need to be defined. For example, if the exact optimizer cannot be reached under certain conditions, we need to determine the ε-bound on the convergence, i.e., the furthest distance from the optimizer x* at which the algorithm may converge (known as the ε-neighborhood bound [39]).
In many existing solutions, participating nodes are assumed to be interconnected with undirected communication links. This means that the network topology forms a connected undirected graph. Note, however, that the results of this paper are suitable for balanced dynamic directed graphs as well, where the network topology may change over time, i.e., our results are valid over uniformly-connected digraphs with balanced (not necessarily bi-stochastic) weights on the incoming and outgoing links. Considering (possibly delayed) information exchange due to time delays while simultaneously handling anytime-feasibility is another contribution of our work. Anytime (or all-time) feasibility implies that the coupling resource-demand constraint in (2) holds at all times and at any termination point of our algorithm.

A. PRELIMINARY DEFINITIONS AND LEMMAS
The network of agents is represented by a graph G with weight matrix W. Define its Laplacian matrix as L = D − W, with D = diag(Σ_{j=1}^n W_ij) and positive link weights W_ij > 0.
Assumption 2: The network G(k) is weight-balanced, i.e., 1_n^T W(k) = (W(k) 1_n)^T. Further, W_ii = 0 and W_ij > 0 for existing links. The union of the network over every finite time-interval of length B, i.e., G_B = ∪_{k}^{k+B} G(k) for k ≥ 0, is strongly connected, which is known as uniform-connectivity or B-connectivity.
Such weight-balanced digraphs (and their weight matrices W ) can be designed using, e.g., the algorithm in [40].
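As a quick numerical sanity check of Assumption 2 (the 3-node directed cycle and its unit weights are illustrative, not from the paper), one can verify the weight-balance condition, the Laplacian null-space properties, and positive semi-definiteness of the symmetric part L_s = (L + L^T)/2 used in Lemma 1 below:

```python
import numpy as np

# A small weight-balanced digraph on 3 nodes (directed cycle, unit weights):
# each node's in-weight sum equals its out-weight sum.
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
assert np.allclose(W.sum(axis=0), W.sum(axis=1))  # weight-balanced

D = np.diag(W.sum(axis=1))
L = D - W                                 # Laplacian L = D - W
assert np.allclose(L @ np.ones(3), 0.0)   # L 1_n = 0_n
assert np.allclose(np.ones(3) @ L, 0.0)   # 1_n^T L = 0_n^T (needs balance)

Ls = (L + L.T) / 2                        # symmetric part of L
eigs = np.linalg.eigvalsh(Ls)
assert eigs.min() >= -1e-12               # L_s is positive semi-definite
```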
Lemma 1: For a weight-balanced graph G, its Laplacian matrix L is positive semi-definite. Let y_1 ∈ R^n, ŷ_1 := y_1 − (1_n^T y_1 / n) 1_n, and let λ_n, λ_2 be the largest and smallest non-zero (real) eigenvalues of L_s = (L + L^T)/2. Then, the Laplacian disagreement function satisfies (3)-(4). Further, given y_2 = g(y_1) ∈ R^n as an (element-wise) monotonic function of y_1 such that, for the ith elements of y_1, y_2, 0 < κ ≤ y_2i / y_1i ≤ K and (y_2i − y_2j)(y_1i − y_1j) ≥ 0 for all i, j, inequality (5) holds.
Proof: The proof of (3) and (4) follows from [41]. We prove (5) in the following.
with the symmetric matrix defined as (W + W^T)/2. The first equality above follows from the fact that 1_n^T L_s = L_s 1_n = 0_n. Following the (element-wise) monotonic property of y_2 with respect to y_1, κ(y_1i − y_1j)² ≤ (y_2i − y_2j)(y_1i − y_1j) ≤ K(y_1i − y_1j)². Using the above in (6), along with (4), proves (5).
For more information on the above, the notion of mirror digraphs in consensus literature is insightful.
Corollary 1: For a uniformly-connected network G_B with B > 0 satisfying Assumption 2, Lemma 1 can be restated for its Laplacian L_B and its largest/smallest non-zero eigenvalues λ_nB, λ_2B. Inequalities (3)-(5) hold for any symmetric positive semi-definite matrix satisfying L̄ 1_n = 0_n and 1_n^T L̄ = 0_n^T, for example L̄ = L^T L; then, x^T L̄ x ≤ λ_n² ‖x‖_2².
Definition 1: Define S_b = {x ∈ R^n | 1_n^T x = b} as the set of all feasible values of x.
Lemma 2: Under Assumption 1, P_1 has a unique feasible optimizer x* ∈ S_b satisfying ∇F(x*) = ψ* 1_n, ψ* ∈ R, and ∇F(x) ∉ span{1_n}, ∀x ≠ x*, x ∈ S_b.
Proof: The proof follows from the strong convexity of F(x) in Assumption 1 and the KKT condition [42]. For completeness, we give another proof based on level-set analysis. Define the level set for γ ∈ R as L_γ(F) := {x ∈ R^n | F(x) ≤ γ}. Assumption 1 implies that all the level sets of F(·) are strongly convex [42]; thus, only one of them, say L_γ(F), touches the affine feasibility facet S_b, and at only one point, say y. Then, ∇F(y) is orthogonal to the facet S_b, i.e., ∇F(y) ∈ span{1_n}. For two points z, y ∈ S_b, by contradiction, consider both ∇F(y) ∈ span{1_n} and ∇F(z) ∈ span{1_n}. This implies either (i) two points on the same level set L_γ(F), γ = F(y) = F(z), both adjacent to the affine feasibility set S_b, or (ii) two points on two different level sets. Since S_b is affine, (i) contradicts Assumption 1 on the strong convexity of the level sets. In case (ii), y, z ∈ S_b implies 1_n^T (y − z) = 0 and (∇F(y) − ∇F(z))^T (y − z) = 0, which contradicts (7).
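The KKT argument can be spelled out in one line: with the Lagrangian of the equality-constrained problem,

```latex
\mathcal{L}(\mathbf{x},\psi) = F(\mathbf{x}) - \psi\left(\mathbf{1}_n^\top \mathbf{x} - b\right),
\qquad
\nabla_{\mathbf{x}}\mathcal{L}(\mathbf{x}^*,\psi^*) = \nabla F(\mathbf{x}^*) - \psi^* \mathbf{1}_n = \mathbf{0}_n,
```

so stationarity forces $\nabla F(\mathbf{x}^*) = \psi^* \mathbf{1}_n$, i.e., $\nabla F(\mathbf{x}^*) \in \mathrm{span}\{\mathbf{1}_n\}$.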
Note that x* defined above is assumed to satisfy the box constraints, i.e., m_i ≤ x_i* ≤ M_i.
Lemma 3 ([43]): Consider a strictly convex continuous function F : R^n → R, two points x_1, x_2 ∈ R^n, and δx := x_2 − x_1. Then, from Assumption 1, for the strongly convex function F, the bounds (8)-(9) hold. Inequalities (8)-(9) are also known as the quadratic lower and upper bounds. For Lipschitz-continuous functions, (9) is referred to as the generalized Cauchy-Schwarz inequality.
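Given the curvature bounds 2v ≤ ∂²f_i(x_i) ≤ 2u of Assumption 1, the quadratic lower and upper bounds take the standard form (stated here with the constants v, u used throughout):

```latex
F(\mathbf{x}_1) + \nabla F(\mathbf{x}_1)^\top \delta\mathbf{x} + v\,\|\delta\mathbf{x}\|_2^2
\;\le\; F(\mathbf{x}_2) \;\le\;
F(\mathbf{x}_1) + \nabla F(\mathbf{x}_1)^\top \delta\mathbf{x} + u\,\|\delta\mathbf{x}\|_2^2,
\qquad \delta\mathbf{x} := \mathbf{x}_2 - \mathbf{x}_1 .
```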
with F := F(x) − F* and F* as the optimal cost F(x*).
Proof: The proof directly follows from Lemma 2; see, e.g., [10] for the proof of (10). The proof of (11) is as follows, where the latter follows from 1_n^T δx = 1_n^T x_1 − 1_n^T x_2 = 0 for any two feasible x_1, x_2 ∈ S_b. Putting x_2 = x* and considering x_1 as any feasible x ∈ S_b, (11) follows. Note that, for notational simplicity, we drop the dependence on x (also in the rest of the paper unless needed). The next lemma follows from Assumption 1 and the fact that strong convexity implies strict convexity.
Proof: From Lemma 3, substituting x_1 = x and x_2 = x*, we get (12) from the definition, which, along with (10) and taking square roots, results in (13).

III. PROPOSED DISCRETE-TIME NONLINEAR SOLUTION
In this section, we introduce a first-order protocol to update the states of the agents at every time-step k, while considering possible nonlinear models on the data transmissions. We consider a group of n agents sharing information over nonlinear (possibly) delayed links. Following a common assumption in the literature, we assume synchronized clocks over the multi-agent network. This can be implemented by, e.g., the fully distributed algorithms proposed in [44], [45], [46] for synchronization over (wireless) sensor networks. At time-step k, every agent i applies a nonlinear mapping due to, e.g., signal clipping or logarithmic quantization over the channel. The (delay-free) information update at agent i is given by (14), with k ∈ Z_{≥0} as the time-index, η > 0 as the step size, and W = [W_ij] satisfying Assumption 2. In terms of implementation, at the beginning of each time slot, each node i receives the states of its in-neighbors j ∈ N_i^− and multi-casts (or broadcasts) its own state to its out-neighbors j ∈ N_i^+. Then, it updates its state x_i(k+1) based on the received information (and its own previous state x_i(k)) as in (14). Similar to the existing literature, we assume collision-free packets and contention mechanisms to resolve collisions over the network buses; the details are out of the scope of this work and are skipped here.
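A minimal sketch of one plausible reading of the delay-free update (14), assuming the link nonlinearity g acts on gradient differences over symmetric links; the quadratic costs, the saturation choice of g, and all numerical values are illustrative, not from the paper:

```python
import numpy as np

def saturation(z, limit=0.5):
    """Odd, sector-bounded link nonlinearity (clipping), per Assumption 3."""
    return np.clip(z, -limit, limit)

def step(x, grads, W, eta, g):
    """One iteration of the (delay-free) Laplacian-gradient update:
    x_i(k+1) = x_i(k) - eta * sum_j W_ij * g(grad_i - grad_j)."""
    n = len(x)
    x_new = x.copy()
    for i in range(n):
        for j in range(n):
            if W[i, j] > 0:
                x_new[i] -= eta * W[i, j] * g(grads[i] - grads[j])
    return x_new

# Quadratic costs f_i(x) = (x - c_i)^2, so grad f_i(x) = 2*(x - c_i).
c = np.array([1.0, 3.0, 5.0, 7.0])
W = np.ones((4, 4)) - np.eye(4)       # symmetric weights (undirected)
x = np.full(4, 4.0)                   # feasible start: sum(x) = b = 16
for _ in range(300):
    x = step(x, 2 * (x - c), W, eta=0.05, g=saturation)

assert abs(x.sum() - 16.0) < 1e-9     # anytime-feasible (sum-preserving)
assert np.ptp(2 * (x - c)) < 1e-3     # gradients reach consensus (Lemma 2)
```

With symmetric weights and odd g, the pairwise terms cancel in the sum, so feasibility holds at every iteration regardless of how far the state is from the optimizer.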
The vector form of the coordination protocol (14) is given in (15), with L as the graph Laplacian. One can consider a more general formulation (over undirected weight-symmetric networks) by adding a post-processing step as a node nonlinearity along with the nonlinearity on the links in (14). This gives the more general formulation (16), where the nonlinearities g(·) in (14) and (16) are different in general; however, both satisfy the assumption below.
Assumption 3: The nonlinear mapping g : R → R is sign-preserving and odd, i.e., g(z)z > 0 for z ≠ 0 and g(0) = 0. Further, there exist 0 < κ ≤ K such that κ ≤ g(z)/z ≤ K for all z ≠ 0; in particular, lim_{z→0} g(z)/z > 0, i.e., g(·) is "strongly" sign-preserving. Further, g(·) is monotonically non-decreasing.
Example nonlinearities satisfying Assumption 3 are logarithmic quantization [33], [47] and saturation (or clipping) [27]. In this work we focus more on quantization, which is an inherent property of the network and is present at all links. These nonlinearities are typically assumed to be the same at all links; see more examples for nonlinear robust consensus in [31]. As shown later in this section, the convergence of our algorithm under dynamics (14) and (16) is proved under a fixed step size η. This is an advantage in terms of convergence rate over the diminishing step sizes of some algorithms, as in [20]; see detailed discussions in [30]. Note that the exact convergence rate of (16) (and (14)) depends on the choice of the nonlinearity g(·). For example, it is even possible to achieve convergence in fixed or finite time by choosing sign-based nonlinearities for g(·); see, for example, [4], [27], [48].
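As an example of Assumption 3, a logarithmic quantizer with ratio δ ∈ (0, 1) (the value δ = 0.5 and the helper name below are illustrative) is odd, strongly sign-preserving, and sector-bounded with κ = √δ and K = 1/√δ:

```python
import numpy as np

def log_quantize(z, delta=0.5):
    """Logarithmic quantizer: maps z to the nearest level ±delta^m.
    Odd and strongly sign-preserving (satisfies Assumption 3)."""
    if z == 0.0:
        return 0.0
    m = round(np.log(abs(z)) / np.log(delta))  # nearest exponent
    return np.sign(z) * delta ** m

delta = 0.5
kappa, K = np.sqrt(delta), 1.0 / np.sqrt(delta)  # sector bounds
for z in np.linspace(-5.0, 5.0, 1001):
    if z != 0.0:
        ratio = log_quantize(z, delta) / z
        # 0 < kappa <= g(z)/z <= K, as required for exact convergence
        assert kappa - 1e-9 <= ratio <= K + 1e-9
```

Since the rounding error of the exponent is at most 1/2, the ratio g(z)/z = δ^e with |e| ≤ 1/2 always stays inside the sector [√δ, 1/√δ].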
One immediate implication of Assumption 3 is the following, where we drop the dependence on the time-index k for notational simplicity (also in the rest of the paper unless needed).

A. PROOF OF FEASIBILITY AND CONVERGENCE
First, we discuss anytime-feasibility, i.e., under an initialization x(0) ∈ S_b, the solution preserves its feasibility at every k (referred to as the sum-preserving property). Under similar box constraints at all nodes, a simple local initialization is x_i(0) = b/n. Under heterogeneous box constraints, one can use existing results for establishing a feasible initialization in a distributed way. For example, [10, Algorithm 2] provides a finite-time algorithm to re-adjust the initial guesses (within the box constraints), even for a network of time-varying size.
Lemma 6 (Feasibility & Uniqueness): Let Assumptions 1, 2, and 3 hold. For any feasible initialization x(0) ∈ S_b, the solution under dynamics (14) remains feasible, i.e., x(k) ∈ S_b, ∀k > 0. Further, ∇F(x*) = ψ* 1_n, with x* as the equilibrium of dynamics (14) and ψ* ∈ R from Lemma 2.
Proof: Following Assumptions 2 and 3, we have 1_n^T Lϕ(k) = 0, where the gradient is well-defined from Assumption 1. Note that this holds irrespective of network connectivity and is a direct result of the symmetric weights and the oddness of g(·). Then, from (15), 1_n^T x(k+1) = 1_n^T x(k), and feasibility follows. Next, for any x* with ∇F(x*) ∉ span{1_n}, the RHS of (14) is nonzero, implying that such an x* is not an equilibrium of (14), which is a contradiction. Thus, for the equilibrium x* we have ∇F(x*) ∈ span{1_n}. Using Lemma 2 and the anytime-feasibility above, ∀x ∈ S_b there is no other x ≠ x* satisfying ∇F(x) ∈ span{1_n}. This implies that x* with ∇F(x*) = ψ* 1_n is the unique equilibrium of (14).
Proof: Consider the positive Lyapunov-type function F(k) := F(x(k)) − F* (as in Lemmas 4 and 5), representing the residual cost. We prove F(k+1) ≤ F(k), where the state change follows from dynamics (15). From Assumption 3 and (18)-(19), and Lemma 1 (Corollary 1 for uniformly-connected networks), this is satisfied if the term −κηλ_2 + uλ_n²K²η² is non-positive; the inequality is strict for ξ ≠ 0 and holds with equality for ξ = 0. In other words, F(x(k+1)) ≤ F(x(k)), and from Lemma 2 the equality uniquely holds at the equilibrium. Therefore, under the proposed dynamics (14), the function F decreases in time until x reaches the unique equilibrium point x* satisfying (24), which is the optimizer of P_1, and the theorem follows. Note that for a non-Lipschitz mapping g(·), a similar line of reasoning results in (25) instead of (23), with ϕ̂ = ϕ − (1/n) Σ_{i=1}^n ϕ_i 1_n.
Lemma 7: Let Assumptions 1, 2, and 3 hold and x(0) ∈ S_b. Following the notation of Theorem 1, for η < η̄ the rate of decrease in F(k) under protocol (14) satisfies (26), and x(k) converges to x* with the rate (27).
Proof: From Lemmas 4 and 5, and using the RHS of (10), we have 4vF ≤ ξ^T ξ, which results in (26). Here, we used the fact that the term −κηλ_2 + uλ_n²K²η² is non-positive. Combining (26) with (12) gives (27). For a B-connected network, from Corollary 1, the proof can be restated over a time-scale of size B, i.e., one can prove F(k+B) < F(k) for the B-connected network G_B with eigenvalues λ_2B, λ_nB; then the proof similarly follows. Note that the RHS of (26) and (27) give a bound on the convergence rate as a function of η. The RHS of (27) is quadratic in η and, for example, gives the tightest bound for η = η̄/2 = κλ_2/(2uλ_n²K²). Similar results are given for unconstrained distributed optimization, e.g., in [30].
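The step-size bound invoked in the proof can be made explicit from the non-positivity of the quadratic term appearing in Lemma 7:

```latex
-\kappa\eta\lambda_2 + u\lambda_n^2 K^2 \eta^2 \le 0
\;\Longleftrightarrow\;
0 < \eta \le \overline{\eta} := \frac{\kappa\lambda_2}{u\lambda_n^2 K^2},
```

and minimizing the quadratic RHS over $\eta$ gives the tightest-bound choice $\eta = \overline{\eta}/2 = \frac{\kappa\lambda_2}{2u\lambda_n^2 K^2}$.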

B. UNIFORM QUANTIZATION AND ε-ACCURACY
Next, we consider the case where the nonlinear mapping g(·) is sign-preserving, but not "strongly" sign-preserving. A simple example is when dg/dz|_{z=0} = 0, as in uniform quantization (in the case of finite packet size for the networking links); see Fig. 1. Uniform quantization is motivated by recent digital communication devices with a limited number of bit-transmissions. The number of bits (the quantization level) determines the efficiency and accuracy, and depends on the limitations of the communication equipment. For this case, the formulation (14) can be written as (28), with W_ij > 0 for j ∈ N_i^−, in general. One can consider (possibly) non-negative integer weights W_ij ∈ N, e.g., using the distributed integer weight-balancing algorithm in [40]. These integer weights result in quantized (or discrete) allocated values x_i (with level μ), e.g., for task allocation in CPU scheduling [1].
Remark 1: For g(·) representing the uniform quantizer, we have dg(z)/dz|_{z=0} = 0 and g(z) = 0 for −0.5μ < z < 0.5μ. This implies that for −0.5μ 1_n ≺ ∇F − ψ* 1_n ≺ 0.5μ 1_n we have x(k+1) = x(k) in (28). In this case, the solution reaches the ε-neighborhood of x* instead. This ε can be defined as follows. Since −0.5μ + ψ* < df_i/dx_i < ψ* + 0.5μ, and following the definition of ξ, we have |ξ| ≺ 0.5μ 1_n and ξ^T ξ < 0.25μ² 1_n^T 1_n = 0.25nμ². From (13), this gives the estimate (29) of how close we can get to the optimizer x*.
The above remark implies that, by choosing fine uniform quantization (i.e., smaller μ), the solution can get arbitrarily close to the optimizer x*. From this perspective, one can define the notion of ε-accuracy [39], i.e., the upper bound on the quantization level μ that guarantees convergence to the ε-neighborhood of the optimizer x*. The ε-accuracy of the solution then follows (29), i.e., for the quantization level μ, the solution is guaranteed to reach the ε-neighborhood of x* with ε ≤ √n μ/(4v).
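The dead zone of the uniform quantizer and the μ-to-ε mapping of Remark 1 can be sketched as follows (the helper names and numbers are illustrative): inverting ε ≤ √n μ/(4v) gives the level μ ≤ 4vε/√n that guarantees a target accuracy ε:

```python
import numpy as np

def uniform_quantize(z, mu):
    """Uniform quantizer with level mu: g(z) = mu * round(z / mu).
    Note g(z) = 0 for |z| < 0.5*mu, so g is NOT strongly sign-preserving."""
    return mu * np.round(z / mu)

def quantization_level(eps, v, n):
    """Largest mu guaranteeing the eps-neighborhood bound eps <= sqrt(n)*mu/(4v)."""
    return 4.0 * v * eps / np.sqrt(n)

mu = 1.0
assert uniform_quantize(0.3, mu) == 0.0     # dead zone: |z| < 0.5*mu maps to 0
assert uniform_quantize(0.7, mu) == 1.0
assert uniform_quantize(-0.7, mu) == -1.0   # odd symmetry

# e.g., n = 10 agents, strong-convexity constant v = 0.5, target eps = 0.01:
assert abs(quantization_level(0.01, 0.5, 10) - 0.02 / np.sqrt(10)) < 1e-12
```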

C. DISCUSSIONS AND MORE APPLICATIONS
There exist many other applications of finite-sum resource allocation; many works [10], [49], [50] are dedicated to generator coordination over smart grids (known as the economic dispatch problem). In contrast to many existing solutions which are limited to quadratic cost models, as in economic dispatch, CPU scheduling, and general consensus problems [51], the cost function in this paper only needs to be strongly convex. This allows the solution to be used in many applications with non-quadratic costs, e.g., the cost function in [38], or with extra objective terms, e.g., penalty terms of the form σ([x_i − M_i]^+)^c + σ([m_i − x_i]^+)^c for the so-called box constraints, with [u]^+ = max{u, 0}. Note that ([u]^+)^c is smooth for c ≥ 2, and one can also use smooth equivalents for the non-smooth case c = 1. In this case, non-quadratic (but strongly convex) penalty terms in the logarithmic form L(u, ρ) = (σ/ρ) log(1 + exp(ρu)) are typically proposed [52] to replace the non-smooth penalties for the case c = 1. Here, σ is a weighting factor to tune the weight on the box constraint. It can be shown that, by setting ρ large enough, L(u, ρ) becomes arbitrarily close to max{u, 0}; in fact, the maximum gap between the two functions inversely scales with ρ, i.e., for σ = 1, L(u, ρ) − max{u, 0} ≤ 1/ρ, and the two can be made arbitrarily close by selecting ρ sufficiently large [52]. Similarly, some smooth and convex barrier functions are proposed in the literature (e.g., see [18], [36]). Let h_i(x_i) represent a general local constraint at agent i. Then, example barrier functions include the logarithmic and inverse barrier functions. Our results can, further, help reach a faster rate of convergence by using sign-based dynamics [27] for the nonlinear node-based solution. In discrete time, such non-Lipschitz dynamics mandate a sufficiently small step rate to reduce the so-called chattering effect.
There is a trade-off (to be properly adjusted) between the steady-state residual around the equilibrium and the step size. In applications, one can reach convergence in finite, fixed, or prescribed time irrespective of the chattering (e.g., in continuous-time [27], [48]).

IV. SOLUTION IN THE PRESENCE OF TIME-DELAYS

A. DOUBLE TIME-SCALE SCHEME: UNKNOWN DELAYS
Our first solution is to update the states on a longer time-scale such that at least one message is delivered over every link. The following assumption (as in [32]) holds throughout this subsection for the time-delay τ_ij(k) on every link (j, i).
Assumption 4: τ_ij(k) ≤ τ, where 1 ≤ τ < ∞ represents the upper bound on the time-delays (τ = 0 implies no delay). The finite integer bound τ ensures that data sent from agent i at time k eventually reaches agent j at most by time k + τ, and also implies no missing packets.
The delay τ_ij(k) is, in general, heterogeneous, arbitrary, time-varying, and not necessarily known. The proposed state-update under Assumption 4 is given by (31), with k̄ = k/(τ+1) as the new time-scale. In this method, the states are updated not at every time-step k, but every τ+1 time-steps, representing the longer time-scale k̄. In other words, after initializing and sending the first messages at step k = k̄ = 0, the next state-update and communication occur at k = τ+1 (i.e., k̄ = 1) and every τ+1 steps of k afterwards, while the states remain unchanged between consecutive updates k̄(τ+1). Following Assumption 4 on the time-delays, it is clear that over every link (j, i) one data-packet is received by agent i from in-neighbor j within τ+1 steps of k. The feasibility and convergence under the delayed model are proved next.
Theorem 2: Under Assumptions 1, 2, 3, and 4, with feasible initialization x(0) ∈ S_b, the states x(k) (and x(k̄)) under protocol (31) remain feasible and converge to the optimal solution of (2) for 0 < η < η̄, with η̄ given in (23).
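A toy simulation of the double time-scale mechanism (topology, quadratic costs, identity g, and the random delays are all illustrative): states are held constant over windows of τ+1 fast steps, so every packet sent at a window start arrives before the next update, and each update effectively uses the window-start states regardless of the (unknown) delays:

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 3                                  # delay upper bound (Assumption 4)
eta = 0.05
c = np.array([1.0, 3.0, 5.0, 7.0])       # f_i(x) = (x - c_i)^2
W = np.ones((4, 4)) - np.eye(4)          # symmetric link weights
x = np.full(4, 4.0)                      # feasible start
b = x.sum()

for _ in range(200):                     # slow time-scale k_bar
    sent = 2 * (x - c)                   # gradients multicast at window start
    # each packet suffers an arbitrary unknown delay <= tau, hence it is
    # delivered within the window of tau + 1 fast steps before the update
    delays = rng.integers(0, tau + 1, size=(4, 4))
    assert np.all(delays <= tau)         # all packets arrive in time
    # window-end update uses the delivered (window-start) gradients;
    # g is taken as the identity here for simplicity
    x = x - eta * (W * np.subtract.outer(sent, sent)).sum(axis=1)

assert abs(x.sum() - b) < 1e-9           # feasibility preserved at every update
assert np.ptp(2 * (x - c)) < 1e-3        # gradient consensus: optimizer reached
```

The price of tolerating unknown delays is speed: wall-clock convergence slows by a factor of τ+1, which motivates the single time-scale scheme of the next subsection when delays are known.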
Proof: The proof of feasibility follows similarly to Lemma 6 by considering the state-update over the time-scale k̄. The uniqueness of the optimizer follows similarly by considering uniform-connectivity of the network over windows of length B + τ. The convergence to the optimizer follows an analysis similar to Theorem 1 over the time-scale k̄. For two feasible states x(k̄+1), x(k̄) ∈ S_b, define δx(k̄) := x(k̄+1) − x(k̄) and δF(k̄) := F(x(k̄+1)) − F(x(k̄)). Then, following the same line of reasoning as in (20)-(25), one finds the same bound η < η̄ on the convergence step rate.
Note that the convergence of this double time-scale scheme is slow, as agents need to wait for a while to receive the delayed information and then update their states. However, if delays are known and symmetric over bidirectional links, states can get updated at the same time scale. It is clear that this gives faster convergence, as discussed next.

B. UPDATE AT THE SAME TIME-SCALE: KNOWN DELAYS
Our other solution updates the states on the same time-scale k. We make the following assumption for this case.
Assumption 5: The network G is undirected with symmetric link weights, and the time-delays on the two directions of each link are the same, i.e., W_ij = W_ji and τ_ij(k) = τ_ji(k). Uniform connectivity holds similarly to Assumption 2.
The state of every agent i is updated based on all the (possibly delayed) information received from its neighbors at time k+1, as it arrives. Note that, since the delays are assumed to be heterogeneous, the information received at time k is, in general, sent over the range [k − τ, k] (the last τ time-steps). Also, with time-varying delays, it is possible that at time-step k agent i receives more than one packet from an in-neighbor j. This makes the solution more challenging in terms of satisfying anytime feasibility. Recall that, for anytime feasibility, Σ_{i=1}^n x_i(k+1) = Σ_{i=1}^n x_i(k) needs to hold at every time k, which is satisfied by synchronous messaging over both directions (i, j) and (j, i) of every link. For the same reason, the weights of all bidirectional links are designed to be symmetric. We discuss this further in the feasibility analysis in Lemma 8. The proposed single time-scale protocol in the presence of time-delays is as follows.
where I is the indicator function. Note that I_{k−r,ij}(r) = 1 indicates a message received at time k with time-delay τ_ij = r over the link (j, i) (i.e., sent at time k − r). In general, we assume I_{k−r,ij}(r) ≠ 0 for at least one pair (i, j ∈ N_i^−) at every time k. This assumption simply means that at least one message is delivered over the network at every time k, and it is only required to ensure that the cost monotonically decreases at every time-step under the proposed dynamics; without it, the solution still converges over time. The following remark relaxes Assumption 5 by using definition (33).
Remark 2: As a follow-up to Assumption 5, in the case of known but asymmetric time-delays over bidirectional links, say τ_ij ≠ τ_ji < τ, both agents i, j can process their mutual information based on the known maximum delay τ (or the possibly known max{τ_ij, τ_ji} ≤ τ) on the shared link, i.e., I_{k,ij}(τ) = I_{k,ji}(τ) = 1 instead of I_{k,ij}(τ_ij) = 1, I_{k,ji}(τ_ji) = 1 with τ_ij ≠ τ_ji. This implies that both agents apply (process) their shared information at the same time. This can be thought of as a combination of the two schemes in Subsections A and B.
Lemma 8: The solution under (32) and Assumptions 1, 2, 3, and 5 is anytime feasible, with unique equilibrium x* as the optimizer of P_1.
Proof: Following Assumption 5, for every pair of links (j, i) and (i, j) we have W_ij = W_ji, τ_ij = τ_ji, I_{k,ij}(τ_ij) = I_{k,ji}(τ_ji) = 1, and I_{k,ij}(r) = I_{k,ji}(r) = 0 for r ≠ τ_ij = τ_ji. Therefore, the (delayed) updates over the two directions of every link cancel out in the network-wide sum. This implies that Σ_{i=1}^n x_i(k+1) = Σ_{i=1}^n x_i(k), and feasibility follows for all k ≥ 0. Uniqueness follows similar to Theorem 2 by considering uniform connectivity over B + τ̄. This completes the proof.
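To make the sum-cancellation argument concrete, the following toy simulation is a sketch, not the paper's exact protocol (32): it assumes quadratic local costs, an identity nonlinearity g, a 5-node cycle with unit symmetric weights, and symmetric random delays; all numerical values are illustrative.

```python
import random

# Sketch of a single time-scale, sum-preserving update over an undirected
# network with symmetric weights and symmetric bounded random delays
# tau_ij(k) = tau_ji(k) <= tau_bar (all parameters assumed for illustration).
random.seed(0)
n, tau_bar, eta, T = 5, 3, 0.05, 800
a = [1.0 + i for i in range(n)]          # local quadratic cost gains (assumed)
rho = [2.0 * i for i in range(n)]        # local demands (assumed)
grad = lambda i, x: a[i] * (x - rho[i])  # gradient of f_i(x) = a_i/2 (x - rho_i)^2

edges = [(i, (i + 1) % n) for i in range(n)]               # cycle, W_ij = W_ji = 1
hist = [[sum(rho) / n] * (tau_bar + 1) for _ in range(n)]  # delayed states, feasible start

total0 = sum(h[-1] for h in hist)
for k in range(T):
    upd = [0.0] * n
    for (i, j) in edges:
        r = random.randint(0, tau_bar)   # shared delay tau_ij(k) = tau_ji(k)
        xi, xj = hist[i][-1 - r], hist[j][-1 - r]
        # step-rate down-scaled by tau_bar + 1, as in Theorem 3
        flow = eta / (tau_bar + 1) * (grad(j, xj) - grad(i, xi))
        upd[i] += flow                   # i and j apply +/- the same term
        upd[j] -= flow                   # at the same step -> sum preserved
    for i in range(n):
        hist[i].append(hist[i][-1] + upd[i])
        hist[i].pop(0)

assert abs(sum(h[-1] for h in hist) - total0) < 1e-9  # anytime feasibility
```

Because agents i and j apply the plus/minus of the same delayed term at the same step k, the network sum is preserved at every iteration, mirroring the cancellation in the proof of Lemma 8.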
Theorem 3: Under Assumptions 1, 2, 3, and 5, with x(0) ∈ S_b, the solution under protocol (32) converges to the optimizer of P_1 for step-rates η_τ satisfying (34), i.e., η_τ(τ̄ + 1) < η̄, with η̄ given in (23). Proof: The proof follows from Lemma 1. First, consider the homogeneous case τ_ij = τ̄, i.e., the state of any agent at iteration k is next updated at k + τ̄ + 1 (and every τ̄ + 1 steps afterwards). The bound on η then follows from Theorem 2 and (23). Next, for general (heterogeneous) delays, two cases are possible. Case (i): time-invariant (fixed) delays at all links, where every node i receives only one (possibly delayed) packet from each j ∈ N_i^− and δx remains the same as in (20)-(25); this gives the same bound η_τ < η̄. Case (ii): for general time-varying delays (satisfying Assumption 4), node i receives at most τ̄ + 1 delayed packets from the nodes j ∈ N_i^−; thus, η needs to be down-scaled by τ̄ + 1 to ensure convergence. This is because (15) is scaled as δx = −(τ̄ + 1)η_τ Lϕ(k) in the proof of Theorem 1 at step k and, following the same line of reasoning as in (21)-(25), (τ̄ + 1)η_τ < η̄ guarantees F(k) < F(k − 1) for x(k − 1) ≠ x*. This completes the proof.

C. CONVERGENCE AND OPTIMALITY DISCUSSIONS
Remark 3: The following points are noteworthy:
i) For time-invariant delays, one can further relax the upper-bound in (34). In this case, at every time-step k ≥ τ̄ + 1, agent i receives only one packet from each agent j ∈ N_i^−. Following the same line of reasoning as in the proofs of Theorems 1 and 3, the solution (32) converges for η_τ < η̄.
ii) Convergence under protocol (32) is faster than under (31) for the same step rate η = η_τ, since the (slower) time-scale of (31) is τ̄ + 1 times longer than the time-scale k of (32). However, for general time-varying delays, the solution (32) may not converge for η̄/(τ̄ + 1) ≤ η_τ ≤ η̄, while the solution (31) does.
iii) Recall that, in contrast to logarithmic quantization, uniform quantization is not "strongly" sign-preserving (see Fig. 1 for 0 < μ < 1). In this case, the lower-bound in (17) does not hold for any κ > 0. Similarly, in case g(z)/z is not upper-bounded, e.g., g(z) = z|z|^{ν−1} as in finite-time control/consensus [4], the bound (17) in Assumption 3 does not hold. In such cases, and for similar sign-preserving odd mappings, convergence of the discrete-time protocol (14) to the exact optimizer cannot be guaranteed, and one can instead prove convergence to an ε-neighborhood of the optimizer, e.g., see [53]. Note that, in this case, the function g(·) is non-Lipschitz at the origin, i.e., lim_{z→0} g(z)/z = K → ∞. From (25), the solution converges for all η satisfying (35). For the non-Lipschitz case, if ‖x(k) − x*‖₂² → 0, then (35) implies that reaching the exact optimizer x* requires η → 0. Next, given 0 < η < η̄, we ask how close we can get to the optimizer x* and the optimal value F*. Using (13), for a given η < η̄ we obtain (36). This means that we cannot get arbitrarily close to the optimizer. Recall that in the case where the nonlinearity is Lipschitz, as ‖x(k) − x*‖₂² < ε → 0, the RHS of (36) always remains greater than 1/(16u²K²), which is satisfied for 0 < η < η̄ via (23) (as shown in the proof of Theorem 1). However, for a non-Lipschitz mapping g(·), the RHS of (36) tends to 0, and therefore the inequality cannot be satisfied as ε → 0 since the LHS is a positive number; a non-zero steady-state residual then follows from (35). For continuous-time dynamics, however, convergence to the optimizer can be proved, e.g., see the results in [27], [48].
iv) The upper-bound η̄ on the step-rate depends inversely on the Lipschitz constant u of the objective functions f_i. For a fixed u, a larger value of v ≤ u implies a tighter bound on the convergence rate in the RHS of (26). For the quadratic cost in (41) (with u = v) we get the tightest bound on the convergence rate.
v) For the nonlinear function g, the ratio κ/K² < 1 appears in (23), while in (26) the gap between κ and K² affects the convergence rate. From Fig. 1, coarser quantization (larger μ) implies a smaller κ = 1 − μ/2 and a larger K = 1 + μ/2. This implies a tighter bound on η in (23) and a looser bound on the convergence rate in (26). On the other hand, as given by (29), coarser quantization results in a higher ε-bound and possibly a larger steady-state residual. See the simulations for a better illustration.
vi) In the presence of time-delays, feasible initialization can be done, for example, using [10, Algorithm 2] over the longer time-scale as in Subsection A. This finite-time algorithm works irrespective of discrete (quantized) or real-valued information exchange and admits many possible (quantized plus real-valued) outputs.
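The sign-preservation contrast in Remark 3-iii can be illustrated numerically. The quantizer definitions below are common textbook forms and an assumption on our part (the paper's Fig. 1 fixes its own μ-parameterization); they show a sector bound holding for the logarithmic quantizer and failing for the uniform one near the origin.

```python
import math

# Sketch: a logarithmic quantizer stays inside a sector
# kappa <= q(z)/z <= K for all z != 0, while a uniform quantizer maps
# small inputs to zero, so it is not "strongly" sign-preserving.
mu = 0.5  # quantization level (assumed)

def q_log(z, mu=mu):
    # nearest level on a logarithmic grid (one common definition; a sketch)
    if z == 0.0:
        return 0.0
    s = math.copysign(1.0, z)
    return s * math.exp(mu * round(math.log(abs(z)) / mu))

def q_uni(z, mu=mu):
    # uniform quantizer with step mu
    return mu * round(z / mu)

# logarithmic: q(z)/z stays inside [e^{-mu/2}, e^{mu/2}] for all z != 0
lo, hi = math.exp(-mu / 2), math.exp(mu / 2)
for z in [1e-6, 0.03, 0.7, 5.0, -2.3, -1e-4]:
    ratio = q_log(z) / z
    assert lo - 1e-12 <= ratio <= hi + 1e-12

# uniform: any |z| < mu/2 maps to 0, so no kappa > 0 lower-bounds q(z)/z
assert q_uni(0.2) == 0.0 and q_uni(-0.24) == 0.0
```

With this definition the log-quantizer ratio equals exp(μ(round(t) − t)) for t = ln|z|/μ, hence it is bounded away from zero for every nonzero input, which is exactly the property the uniform quantizer lacks.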

A. APPLICATION: OPTIMIZING THE CPU UTILIZATION
This application focuses on optimizing the CPU utilization across servers (computing nodes) in data centers by carefully allocating CPU resources to workloads in a distributed fashion. The data centers are modelled as a set V of nodes. Each node v_i ∈ V can operate as a resource scheduler (which is a standard practice in modern data centers). The set of all jobs to be scheduled is J. Each job b_j ∈ J (where j ∈ {1, . . . , |J|}) is a group of tasks and requires ρ_j cycles to be executed. The amount of ρ_j cycles required for job b_j is known before the optimization operation. At each node v_i, the total workload due to arriving jobs is denoted by l_i. Furthermore, the time period for which the optimization runs the jobs on the servers (before the next optimization operation for a new set of resource allocations) is denoted T_h. At each node v_i, the CPU capacity during the optimization operation is equal to π_i^max := c_i T_h, where c_i is the sum of the clock rate frequencies of all processing cores of node v_i given in cycles/second. For each node v_i, the CPU availability at optimization step m (i.e., at time-step [m]) is denoted by π_i^avail[m]. The horizon T_h at step m is chosen such that ρ[m] ≤ π^avail[m], i.e., the total amount of resources demanded meets the total available resources. This indicates that the demand does not exceed the available resources in the network; T_h can be chosen appropriately to fulfill this (as with the box constraints). Each node needs to calculate the optimal solution at every optimization step m by executing a distributed algorithm (with time-step k). The information exchanged over the network (and the allocated CPU workloads) are quantized; the proposed algorithm can take such nonlinearities into account. In [1], every node balances its CPU such that each node utilizes the same percentage of its own capacity (under the feasibility constraint).
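As a concrete instance of the capacity bookkeeping above (all numbers hypothetical), π_i^max = c_i T_h is node i's capacity in cycles over one horizon, and the horizon is feasible when the total demand does not exceed the total availability:

```python
# Hypothetical capacity bookkeeping for three servers; pi_max_i = c_i * T_h
# is node i's CPU capacity (in cycles) over one optimization horizon T_h.
T_h = 2.0                                  # horizon in seconds (assumed)
c = [2.4e9, 3.0e9, 2.0e9]                  # summed core clock rates, cycles/s
l = [1.0e9, 2.5e9, 0.5e9]                  # workload already on each node
rho = [1.5e9, 0.8e9, 1.2e9]                # cycles demanded by arriving jobs

pi_max = [ci * T_h for ci in c]            # per-node capacity over the horizon
pi_avail = [pm - li for pm, li in zip(pi_max, l)]  # remaining capacity
assert all(p >= 0 for p in pi_avail)
assert sum(rho) <= sum(pi_avail)           # feasibility: demand <= availability
```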
This balancing strategy calculates the optimal workload w*_i[m] to be received at optimization step m such that the balancing condition (37) holds, where π^max := Σ_{v_i∈V} π_i^max. Note that in the remainder we drop the index m, since we consider a single optimization step. Each node maintains a scalar quadratic local cost function of the form (38), with α_i > 0, ρ_i ∈ R as the positive demand at node v_i, and z as a global optimization parameter (that determines the optimal workload at each node). Each node needs to calculate the optimal parameter z* ∈ Z such that z* = arg min_{z∈Z} Σ_{v_i∈V} f_i(z), where Z denotes the set of all feasible values of z, e.g., given by the box constraints m_i ≤ z_i ≤ M_i. The closed-form solution for the quadratic cost (38) is given in (39). From [1], in order to calculate the optimal balancing workload according to (37), we need the solution z* to be as in (40). From (40), we modify (38) as (41). This means that each node (i) computes its proportion of the workload, and (ii) from its workload proportion calculates the optimal workload w*_i to receive, equal to (42). Recall, however, that the workload allocated by (42) is optimal only subject to the balancing constraint in (37). In other words, it is possible to reach lower CPU allocation costs by disregarding this balancing condition and considering the more general cost model (43), where z_i ≠ z_j in general. Note the subtle difference here: the factors z_i in (43) can be unequal (compared to the single common z in formulation (38)). Substituting w_i from (37) gives (44). This convex formulation gives a lower cost by relaxing the balancing constraint in (37), while assigning the same amount of overall workload as given by (40). The modified version of (41) is then (45), with z ∈ Z as the box constraints. Note that this box constraint makes the solution non-trivial in general. The formulation (45) reshapes the problem in the form of P_1. In general, the cost of the workloads assigned by (45) is less than (or equal to) the cost associated with the balanced model.
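Since the display equations (37)-(40) are referenced but not reproduced here, the following sketch illustrates the balancing idea with assumed formulas (treat them as an illustration, not the paper's equations): every node ends at the same utilization ratio z* of its capacity, and the total assigned workload is preserved.

```python
# Hedged sketch of CPU balancing: each node's utilization ratio
# (w_i + rho_i) / pi_max_i equals a common z*. All numbers and the exact
# form of z* are assumptions for illustration only.
pi_max = [4.8e9, 6.0e9, 4.0e9]   # per-node CPU capacities (hypothetical)
rho = [1.5e9, 0.8e9, 1.2e9]      # per-node demands (hypothetical)
W_total = 3.0e9                  # total workload to distribute (hypothetical)

z_star = (W_total + sum(rho)) / sum(pi_max)               # common ratio
w_star = [z_star * pm - r for pm, r in zip(pi_max, rho)]  # balanced workloads

assert abs(sum(w_star) - W_total) < 1e-3                  # sum preserved
for pm, r, w in zip(pi_max, rho, w_star):
    assert abs((w + r) / pm - z_star) < 1e-12             # equal utilization
```

Relaxing the common-ratio requirement (allowing unequal z_i, as in (43)-(45)) can only reduce the total cost, since the balanced solution remains feasible for the relaxed problem.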
We add penalty functions to keep the servers at an operating point away from their capacity, in fact below 70-80% of it (due to the uncertainty of the processing times), since the Mean Response Time of the servers grows (exponentially) beyond some point [54]. As a rule-of-thumb, we address this concern by box constraints on the load-to-capacity ratios. For the numerical simulation, we consider a network of n = 12 servers with the following parameters for cost function (41). The residuals F̃ = F(x) − F*, taken as the Lyapunov function, under the two quantization schemes are compared in Fig. 2; the residual decreases towards zero under logarithmic quantization for η < η̄. The residual ε-accuracy bound from (29) is equal to ε = 9.3, which explains the worse performance under uniform quantization. The average of the states (black dashed lines) is constant, ensuring all-time feasibility. In addition, to give an idea of the computational complexity of the proposed solution (14) (with quantized nonlinearities), Table 2 compares the running time of the algorithm in MATLAB R2021b on an Intel Core i5 @ 2.4 GHz processor with 8 GB RAM, using the tic-toc functions.
Over the same setup and parameters, CPU allocation under time-delays over the data-transmission network is given in Fig. 3. We run the simulation over the weight-symmetric cyclic graph (with λ_2 = 0.09, λ_n = 1.33) of Section IV-B, assuming known but random delays. From (23), the (sufficient) bound on the step-rate is η̄ = 7.26 with no latency and, from Theorem 3, η(τ̄ + 1) < 7.26 in the presence of delays. The simulations for both quantization schemes and both (single and double time-scale) delay scenarios with τ̄ = 1, 2, 4, 6 and η = 0.5 are given in Fig. 3. The arbitrary delays in the range [0, τ̄] are generated via MATLAB's rand. As is clear from the figure, due to the longer waiting time, the double time-scale scenario converges more slowly; however, from Theorem 2, it always converges for any τ̄ (and η < 7.26). In contrast, the single time-scale scenario may not converge (for large τ̄) if η(τ̄ + 1) > 7.26.

B. LOGARITHMIC PENALTY + NON-QUADRATIC COST: CONDITION NUMBER AND QUANTIZATION LEVEL
Recall that the proposed solution in this paper, solving P_1 based on the cost model (45), can optimize general non-quadratic cost functions, e.g., with additive logarithmic or max-based penalty/barrier functions added to the cost (45). For this simulation, we consider the non-quadratic cost (46) as in [38], with cost parameters b = 20, random α_i ∈ [−5, 5], random ω_i ∈ [0, 0.5], and σ = 1. The logarithmic penalty term with ρ = 1 is added to the objective function with weight factor σ = 1, m_i = 0, M_i = 5 (implying penalty terms u_1 = [x_i − 5]^+, u_2 = [0 − x_i]^+). The topology switches every 20 steps between 4 (disconnected) undirected network topologies whose union is a connected undirected cycle, i.e., Assumption 2 holds for B = 80. The simulation results are shown in Fig. 4 for η = 0.0005. The simulation is done for logarithmic and uniform quantization (with μ = 0.5), compared with the single-bit protocol of [4] (with a single bit of data exchange), the protocol subject to saturation of [27] with saturation level equal to 1, the classic linear solution [43], and the signum-based solution for faster convergence [27], [31] with g(z) = sgn^{ν_1}(z) + sgn^{ν_2}(z), where sgn^ν(z) := z|z|^{ν−1}, for ν_1 = 0.5, ν_2 = 1.5. Since the sign-based nonlinearity is not upper-sector-bounded, we consider small values of η to resemble continuous-time dynamics; see [27] for details. We further compare the residuals under different quantization levels μ for both logarithmic and uniform quantization. From the figure, for larger μ, uniform quantization results in a larger steady-state residual (Remark 3-(v) and (29)), while logarithmic quantization gives faster convergence. For logarithmic quantization, we further compare the convergence rate for different condition numbers by tuning σ. The simulation parameters are: b = 10, random α_i ∈ [−5, 5], random ω_i ∈ [0, 0.5], μ = 1 (i.e., κ = 0.5 and K = 1.5), m_i = 0, M_i = 5, η = 0.0005.
The network is considered as an undirected cycle with λ_2 = 0.38, λ_n = 4. We change the factor σ over [0.05, 0.25, 0.5, 1], which results in the different condition numbers u/v shown in the figure. In this example, convergence is faster for larger condition numbers.
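The bracket penalty terms u_1 = [x − M]^+ and u_2 = [m − x]^+ above can be sketched as follows; squaring the brackets is one common smooth choice and is an assumption here, as are the toy cost and weights.

```python
# Sketch of soft box-constraint penalties for m_i <= x <= M_i; the squared
# bracket terms below mirror u1 = [x - M]^+ and u2 = [m - x]^+ (squaring
# and the weight sigma are assumed for illustration).
m_i, M_i, sigma = 0.0, 5.0, 1.0

relu = lambda z: max(z, 0.0)  # the bracket operator [z]^+

def penalized(f, x):
    # local objective plus penalties that activate only outside [m_i, M_i]
    return f(x) + sigma * (relu(x - M_i) ** 2 + relu(m_i - x) ** 2)

f = lambda x: 0.5 * (x - 7.0) ** 2   # toy cost whose minimizer lies above M_i

assert penalized(f, 4.0) == f(4.0)   # inside the box: no penalty
assert penalized(f, 6.0) > f(6.0)    # above M_i: penalty active
```

The penalized objective stays convex and smooth (the squared bracket is continuously differentiable), so it fits the strongly-convex smooth setting of P_1.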

VI. CONCLUSION
The optimal allocation of resources over a weight-balanced directed network is addressed. Our nonlinear solution provides quantized coordination with resource-demand feasibility at all times. The solution advances the state-of-the-art by simultaneously addressing (i) anytime feasibility, (ii) quantization and ε-accuracy, (iii) latency, and (iv) uniform connectivity. The results therefore allow the design of algorithms for limited (or more cost-efficient) bandwidth via proper quantization over dynamic networks (with intermittent connectivity). Overall, the solution is applicable in more general nonlinear setups; addressing convergence rate and robustness in such setups is a future direction. Further, the weight-balance and uniform-connectivity assumptions (in contrast to weight-stochasticity and all-time connectivity) allow for convergence under link removal and packet drops over switching networks, which is worth investigating.
Recall that, as discussed in [18] and our previous work [27], the sum-preserving problem formulation (2) (and (1)) can be extended to y_i ∈ R^m, b ∈ R^p with p, m > 1 in general, and a coupling constraint of the form Σ_{i=1}^n A_i y_i = b with A_i ∈ R^{p×m}. Using the notion of slack variables, it is possible to address inequality coupling constraints Σ_{i=1}^n A_i y_i ≤ b as in [23], [55]. This is done by defining n additional (auxiliary) slack variables s_i ∈ R^p such that Σ_{i=1}^n (A_i y_i + s_i) = b with 0 ⪯ s_i ⪯ s̄ as extra box constraints; see [23, Eq. (16)]. Each node (and local constraint) is associated with one slack variable, and the new state becomes ỹ_i = [y_i; s_i] with the new constraint Σ_{i=1}^n Ã_i ỹ_i = b and Ã_i := [A_i I_p] ∈ R^{p×(m+p)}. Certain convexity and connectivity assumptions need to be addressed for this formulation; see details in [23], [55]. This is one direction of our future research.
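The slack-variable construction can be sketched with scalar toy numbers (p = m = 1; all values assumed for illustration): the inequality coupling constraint becomes an equality once each node carries a nonnegative slack.

```python
# Toy sketch of the slack-variable idea: rewrite sum_i A_i y_i <= b as the
# equality sum_i (A_i y_i + s_i) = b with slacks s_i >= 0 (one per node).
# Scalars (p = m = 1) keep the arithmetic readable; all numbers are made up.
A = [1.0, 2.0, 0.5]
y = [1.0, 0.5, 2.0]          # a point satisfying the inequality
b = 4.0

lhs = sum(a * yi for a, yi in zip(A, y))     # = 3.0 <= b
assert lhs <= b

slack_total = b - lhs                        # distribute the gap as slacks
s = [slack_total / len(A)] * len(A)          # one slack variable per node
assert all(si >= 0 for si in s)
assert abs(sum(a * yi + si for a, yi, si in zip(A, y, s)) - b) < 1e-12
```

In the vector case the same step stacks each node's state as ỹ_i = [y_i; s_i] and augments A_i with an identity block, exactly as described above.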