Improving the Transient Times for Distributed Stochastic Gradient Methods

We consider the distributed optimization problem where <inline-formula><tex-math notation="LaTeX">$n$</tex-math></inline-formula> agents, each possessing a local cost function, collaboratively minimize the average of the <inline-formula><tex-math notation="LaTeX">$n$</tex-math></inline-formula> cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS) adapted from the Exact Diffusion method (Yuan et al., 2019) and NIDS (Li et al., 2019) and perform a nonasymptotic convergence analysis. We not only show that EDAS asymptotically achieves the same network independent convergence rate as centralized stochastic gradient descent for minimizing strongly convex and smooth objective functions, but also characterize the transient time needed for the algorithm to approach the asymptotic convergence rate, which behaves as <inline-formula><tex-math notation="LaTeX">$K_{T}=\mathcal {O}\left(\frac{n}{1-\lambda _{2}}\right)$</tex-math></inline-formula>, where <inline-formula><tex-math notation="LaTeX">$1-\lambda _{2}$</tex-math></inline-formula> stands for the spectral gap of the mixing matrix. To the best of our knowledge, EDAS achieves the shortest transient time when the average of the <inline-formula><tex-math notation="LaTeX">$n$</tex-math></inline-formula> cost functions is strongly convex and each cost function is smooth. Numerical simulations further corroborate and strengthen the obtained theoretical results.

1. Introduction. In this paper, we consider the distributed optimization problem, where a set of $n$ agents, each possessing a local cost function $f_i$, seek to collaboratively solve the following optimization problem:
$$\min_{x\in\mathbb{R}^p} f(x) := \frac{1}{n}\sum_{i=1}^{n} f_i(x), \tag{1.1}$$
where each $f_i:\mathbb{R}^p\to\mathbb{R}$ is assumed to have Lipschitz continuous gradients, and $f$ is strongly convex. The agents are connected over a network, where they may only communicate and exchange information with their direct neighbors.
To solve problem (1.1), we assume that at each iteration k of the algorithm we study, each agent i ∈ N = {1, 2, ..., n} is able to obtain a stochastic gradient sample of the form g_i(x_{i,k}, ξ_{i,k}), given input x_{i,k} ∈ R^p, that satisfies the following condition.

Assumption 1.1. For all k ≥ 0, each random vector ξ_{i,k} ∈ R^q is independent across i ∈ N. Denote by F_k the σ-algebra generated by {x_{i,0}, x_{i,1}, ..., x_{i,k} | i ∈ N}. Then,
$$\mathbb{E}\left[g_i(x_{i,k},\xi_{i,k})\mid \mathcal{F}_k\right] = \nabla f_i(x_{i,k}), \qquad \mathbb{E}\left[\left\|g_i(x_{i,k},\xi_{i,k})-\nabla f_i(x_{i,k})\right\|^2 \mid \mathcal{F}_k\right] \le \sigma_i^2, \tag{1.2}$$
for some σ_i > 0.

Stochastic gradients appear in many machine learning problems. For instance, suppose f_i(x) := E_{ξ_i}[F_i(x, ξ_i)] represents the expected loss function for agent i, where ξ_i are independent data samples gathered continuously over time. Then for any x and ξ_i, g_i(x, ξ_i) := ∇F_i(x, ξ_i) is an unbiased estimator of ∇f_i(x). For another example, let f_i(x) := (1/|S_i|) Σ_{ζ_j ∈ S_i} F(x, ζ_j) denote an empirical risk function, where S_i is the local dataset of agent i. Under this setting, the gradient estimation of f_i(x) can incur noise from various sources such as sampling and discretization [10].
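To make Assumption 1.1 concrete, the following minimal Python sketch implements a stochastic gradient oracle for the illustrative choice f_i(x) = (1/2)‖x − b_i‖² with additive Gaussian noise; the quadratic objective, the Gaussian noise model, and all variable names are assumptions for illustration, not part of the analysis above.

```python
import numpy as np

def grad_oracle(x, b_i, sigma_i, rng):
    """Stochastic gradient of the illustrative local cost f_i(x) = 0.5*||x - b_i||^2.

    Returns the exact gradient (x - b_i) plus zero-mean Gaussian noise, so the
    oracle is unbiased with per-coordinate noise level sigma_i, matching the
    unbiasedness / bounded-variance conditions in (1.2).
    """
    return (x - b_i) + sigma_i * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = np.zeros(3)
b_i = np.ones(3)
samples = np.stack([grad_oracle(x, b_i, 0.1, rng) for _ in range(20000)])
# The empirical mean of many samples approaches the true gradient x - b_i.
print(np.allclose(samples.mean(axis=0), x - b_i, atol=0.01))
```

Averaging many oracle calls recovers ∇f_i(x), while each individual call deviates from it with variance bounded by σ_i², as required by (1.2).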
Problem (1.1) has been studied extensively in the literature using various distributed algorithms [32,18,19,13,6,7,28,5,26,17,34,25], among which the distributed gradient descent (DGD) method proposed in [18] has drawn the greatest attention. Recently, distributed implementations of stochastic gradient algorithms have received considerable interest. Several works have shown that distributed methods can perform comparably to their centralized counterparts under certain conditions. For example, the work in [2,3,4] first showed that, with a sufficiently small constant stepsize, a distributed stochastic gradient method achieves comparable performance to a centralized method in terms of the steady-state mean-square error.
More recent works have demonstrated that various distributed stochastic gradient methods enjoy the so-called "asymptotic network independence" property, that is, the generated iterates asymptotically converge at the same network independent rate as centralized stochastic gradient descent (SGD) for minimizing smooth objective functions [29,24,23,12,8,30,33]. Nevertheless, it always takes a certain number of updates before this network independent convergence rate can be reached; this number is typically network dependent and is referred to as the transient time of the algorithm. For example, when considering smooth and strongly convex objective functions, the shortest transient time that has been shown so far is achieved by the distributed stochastic gradient descent (DSGD) method [23], which behaves as $\mathcal{O}\left(\frac{n}{(1-\lambda_2)^2}\right)$, where $1-\lambda_2$ stands for the spectral gap of the mixing matrix. For general undirected networks such as ring graphs, the transient time of DSGD is as large as $\mathcal{O}(n^5)$, which can be significant for large-scale networks. Therefore, it is critical to develop new algorithms with shorter transient times.
Recently, a class of closely related algorithms, termed EXTRA [28], D$^2$ [30], Exact Diffusion [37,36], and NIDS [11], has been proposed to solve problem (1.1) under both exact and stochastic gradient settings. These methods achieve linear convergence for minimizing smooth and strongly convex objective functions under exact gradient information [11]. When only stochastic gradients are available, the work in [36] compared the performance of various distributed optimization methods under constant stepsizes and demonstrated the effectiveness of Exact Diffusion compared to other distributed stochastic gradient algorithms.
In this work, we consider a distributed stochastic gradient algorithm adapted from NIDS/Exact Diffusion/D$^2$, termed EDAS (Exact Diffusion with Adaptive Stepsizes), and perform a non-asymptotic analysis. In addition to showing the asymptotic network independence property of EDAS, we carefully identify its non-asymptotic convergence rate as a function of the characteristics of the objective functions and the underlying network. Moreover, we derive the transient time needed for EDAS to achieve the network independent convergence rate, which behaves as $\mathcal{O}\left(\frac{n}{1-\lambda_2}\right)$. This is the shortest transient time for minimizing strongly convex and smooth objective functions in a distributed fashion to the best of our knowledge. Numerical experiments corroborate and strengthen the theoretical findings.

1.1. Related Works. Early work showed that distributed stochastic approximation methods perform asymptotically as well as centralized schemes by means of a central limit theorem. The papers [20,21] demonstrated the advantages of distributively implementing a stochastic gradient method assuming that sampling times are random and non-negligible. The first "asymptotic network independence" result appeared in [31], where it was assumed that all the local functions f_i have the same minimum. A recent paper [29] discussed an algorithm that asymptotically performs as well as the best bounds on centralized stochastic gradient descent subject to possible message losses, delays, and asynchrony. In a parallel recent work [8], a similar result was presented with a further compression technique which allowed agents to save on communication. The work in [22] considered a distributed stochastic gradient tracking method (DSGT) which asymptotically performs as well as centralized stochastic gradient descent. The result was generalized to the setting of directed communication graphs in [27]. For nonconvex objective functions, the papers [12,30] proved that decentralized algorithms can achieve a linear speedup similar to a centralized algorithm when the number of iterates k is large
enough. The work in [1] further extended the theory to directed communication networks for training deep neural networks. A recent survey [24] provided a detailed discussion on this topic.
When restricted to strongly convex and smooth objective functions, several recent works have not only shown asymptotic network independence for the proposed algorithms, but also provided the transient times for the algorithms to reach the network independent convergence rate. Table 1 compares the transient times for some existing algorithms, which are functions of the network characteristics. In particular, the paper [23] showed that for the distributed stochastic gradient descent (DSGD) method, it takes $\mathcal{O}\left(\frac{n}{(1-\lambda_2)^2}\right)$ iterations (where $1-\lambda_2$ stands for the spectral gap of the mixing matrix) to achieve network independence. This result was also proved to be sharp. For another method, distributed stochastic gradient tracking (DSGT), the paper [33] showed a corresponding transient time bound, which is listed in Table 1.

Table 1
Transient times of distributed stochastic gradient methods for minimizing strongly convex and smooth objective functions.

1.2. Main Contribution.
We summarize the main contribution of the paper as follows. First, we consider a distributed stochastic gradient algorithm (EDAS) adapted from NIDS/Exact Diffusion and perform a non-asymptotic analysis. For strongly convex and smooth objective functions, we show in Theorem 4.4 that EDAS asymptotically achieves the same network independent convergence rate as centralized stochastic gradient descent (SGD). We also carefully characterize the convergence rate of EDAS as a function of the characteristics of the objective functions and the communication network.
Second, by comparing the convergence result of EDAS to that of SGD, we derive the transient time needed for EDAS to achieve the network independent rate, which is upper bounded by $\mathcal{O}\left(\frac{n}{1-\lambda_2}\right)$ under mild conditions. This is the shortest transient time obtained so far to the best of our knowledge, and it significantly improves upon the shortest transient time previously known, i.e., $\mathcal{O}\left(\frac{n}{(1-\lambda_2)^2}\right)$. This is the main contribution of the paper.
Finally, we provide two simulation examples that corroborate and strengthen the obtained theoretical results. For solving a "hard" problem with EDAS, the observed transient times are consistent with the theoretical upper bounds for both the ring network topology and the square grid network topology, which verifies the sharpness of the theoretical results. For the problem of classifying handwritten digits with logistic regression, the observed transient times are even better than the upper bounds. For both problems, we compare the performance of EDAS with other existing algorithms and show the superiority of EDAS.
1.3. Notation. Throughout the paper, we use column vectors if not otherwise specified. Let each agent i hold a local copy x_{i,k} ∈ R^p of the decision variable at the k-th iteration. We denote
$$\mathbf{x}_k := [x_{1,k}, x_{2,k}, \ldots, x_{n,k}]^{\top} \in \mathbb{R}^{n\times p}, \qquad \bar{x}_k := \frac{1}{n}\sum_{i=1}^{n} x_{i,k}.$$
We also define the following aggregative functions,
$$F(\mathbf{x}) := \sum_{i=1}^{n} f_i(x_i), \qquad \nabla F(\mathbf{x}) := [\nabla f_1(x_1), \nabla f_2(x_2), \ldots, \nabla f_n(x_n)]^{\top} \in \mathbb{R}^{n\times p}.$$
For two vectors a, b of the same dimension, ⟨a, b⟩ denotes the inner product. For two matrices A, B ∈ R^{n×p}, we define ⟨A, B⟩ := Σ_{i=1}^{n} ⟨A_i, B_i⟩, where A_i (respectively, B_i) represents the i-th row of A (respectively, B). ‖·‖ denotes the 2-norm for vectors and the Frobenius norm for matrices.

Assumption 1.2. Each f_i : R^p → R has L-Lipschitz continuous gradient, and the average function f is μ-strongly convex for some μ > 0. Under Assumption 1.2, problem (1.1) has a unique optimal solution x* ∈ R^p.
The agents are connected through a graph G = (N, E), where N is the set of nodes and E ⊆ N × N denotes the set of edges. Regarding the graph G and the corresponding mixing matrix W = [w_{ij}] ∈ R^{n×n}, we make the following standing assumption.
Assumption 1.3. The graph G is undirected and connected. There exists a link between i and j (i ≠ j) in G if and only if w_{ij}, w_{ji} > 0; otherwise, w_{ij} = w_{ji} = 0. The mixing matrix W is nonnegative, symmetric, and stochastic, i.e., W1 = 1. In addition, suppose the smallest eigenvalue of W satisfies λ_n ≥ λ > 0 for some fixed λ independent of n.

The rest of this paper is organized as follows. In section 2 we present the EDAS algorithm and its transformed error dynamics in the form of a primal-dual-like algorithm. Next we provide some supporting lemmas and prove the sublinear convergence rate of EDAS in section 3. The main convergence results of the algorithm are presented in section 4, where we derive the convergence rate of EDAS as a function of the characteristics of the objective functions and the mixing matrix, and subsequently obtain the transient time of EDAS. Numerical experiments are provided in section 5, and we conclude the paper in section 6.
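As a concrete illustration of Assumption 1.3, the sketch below builds a mixing matrix for a ring graph under the Lazy Metropolis rule (the construction used in section 5) and computes its spectral gap 1 − λ₂; the helper names are ours, and the roughly 1/n² decay of the gap on rings is the standard behavior underlying the $\mathcal{O}(n^5)$ discussion above.

```python
import numpy as np

def lazy_metropolis_ring(n):
    """Lazy Metropolis mixing matrix for a ring of n agents.

    On a ring every agent has degree 2, so each neighbor weight is
    1/(2*max(d_i, d_j)) = 1/4, and the self-weight 1/2 makes every
    row sum to one. The result is nonnegative, symmetric, stochastic.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
        W[i, i] = 0.5
    return W

def spectral_gap(W):
    """1 - lambda_2 for a symmetric mixing matrix W."""
    eig = np.sort(np.linalg.eigvalsh(W))[::-1]  # decreasing order
    return 1.0 - eig[1]

W = lazy_metropolis_ring(20)
print(np.allclose(W, W.T), np.allclose(W.sum(axis=1), 1.0))
# Doubling n shrinks the gap by roughly a factor of 4 (gap ~ 1/n^2 on rings):
print(spectral_gap(lazy_metropolis_ring(40)) < spectral_gap(W))
```

The shrinking spectral gap on poorly connected topologies is exactly what makes the dependence of the transient time on 1/(1 − λ₂) matter at scale.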

2. The EDAS Algorithm.

Algorithm 2.1 (EDAS).

Using the notation in subsection 1.3, EDAS can be written in a compact form. For further investigation, note that since W is nonnegative, symmetric, and stochastic, it has a spectral decomposition given by
$$W = Q\Lambda Q^{\top}, \tag{2.2}$$
where Q is an orthogonal matrix and Λ := diag(1, λ₂, ..., λ_n). EDAS can be rewritten in the following equivalent form that resembles a primal-dual algorithm (see [37]): let y_0 = 0 and update (x_k, y_k) according to (2.3) for k ≥ 0. To verify the equivalence relationship, note that from (2.3), when k ≥ 1, we have
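Since the pseudocode listing of Algorithm 2.1 is not reproduced above, the following sketch shows the adapt/correct/combine form of Exact Diffusion with a k-dependent stepsize, which is the structure EDAS is adapted from; the combine matrix W̄ = (I + W)/2 follows the Exact Diffusion literature [37], while the initialization and all function names here are our assumptions.

```python
import numpy as np

def edas(W, grad, x0, stepsize, num_iters, rng):
    """Sketch of Exact Diffusion with adaptive (k-dependent) stepsizes.

    W: n-by-n mixing matrix; grad(X, k, rng): stacked (stochastic) gradients,
    one row per agent; stepsize(k): the stepsize alpha_k at iteration k.
    The combine step uses W_bar = (I + W)/2 as in Exact Diffusion / NIDS.
    """
    n = W.shape[0]
    W_bar = 0.5 * (np.eye(n) + W)
    X = x0.copy()
    psi_prev = X.copy()                           # correction vanishes at k = 0
    for k in range(num_iters):
        psi = X - stepsize(k) * grad(X, k, rng)   # adapt (local gradient step)
        phi = psi + X - psi_prev                  # correct
        X = W_bar @ phi                           # combine (local averaging)
        psi_prev = psi
    return X

# Sanity check with exact gradients of f_i(x) = 0.5*||x - b_i||^2 on a ring:
# all agents should agree on the global minimizer, the mean of the b_i.
rng = np.random.default_rng(1)
n, p = 10, 3
b = rng.standard_normal((n, p))
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25
X = edas(W, lambda X, k, rng: X - b, np.zeros((n, p)), lambda k: 0.2, 1000, rng)
print(np.allclose(X, b.mean(axis=0), atol=1e-6))
```

With exact gradients and a constant stepsize, this recursion converges to the exact consensus minimizer, which is the feature distinguishing Exact Diffusion-type methods from plain DGD.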

2.1. Optimality Condition.
To facilitate the analysis of EDAS, we consider the optimality condition for problem (1.1). From the condition we are able to construct the error dynamics for Algorithm 2.1.
Lemma 2.1 (Optimality Condition [35]). Under Assumptions 1.2 and 1.3, suppose there exists some (x*, y*_k) that satisfies the optimality system (2.4).

Proof. We follow the proof of Lemma 2 in [37], where the stepsize α_k was assumed to be a constant. First, since null(V) = span{1_n}, we have x*_1 = x*_2 = ... = x*_n. Additionally, multiplying 1ᵀ on both sides of (2.4a) and noticing that 1ᵀV = 0, we obtain the averaged relation, where we use the column stochastic property of W, i.e., 1ᵀW = 1ᵀ.

We next show the existence of (x*, y*_k) and how to select a unique pair (x*, y*_k) in light of the singularity of V.

Lemma 2.2. Under Assumptions 1.2 and 1.3, there exists (x*, y*_k) that satisfies (2.4). In particular, we can choose the pair in (2.7), where V⁻ is the pseudoinverse of V, in which Q is the same orthogonal matrix as in the decomposition of W.

Proof. The proof for the existence of (x*, y*_k) is inspired by Lemma 3 in [37]. Let x* = 1(x*)ᵀ. Then, noticing that V1_n = 0, we have Vx* = 0. To prove that y*_k exists for all k ≥ 0 is equivalent to showing that the following linear system is consistent w.r.t. y*_k: (2.8). Hence we need only show that −α_k W∇F(x*) lies in range(V) for all k ≥ 0, which follows from Assumption 1.3 and proves the existence of y*_k. We next verify the particular choice of y*_k in (2.7). First, it is easy to see that the choice of V⁻ is indeed a pseudoinverse of V. Then, since the linear system (2.8) is consistent, the stated choice follows.

Remark 2.3. Throughout the paper, we let (x*, y*_k) be the unique pair given in (2.7).

2.2. Preliminary Analysis.
In light of the optimality condition of problem (1.1) given in Lemma 2.1, we consider the error dynamics (2.9) as the starting point of the convergence analysis for EDAS.

Proof. From relation (2.3) and the definitions of x̃_k, ỹ_k, and s_k, we obtain (2.12). Substituting (2.11) into relation (2.12), we obtain (2.13).

In the next lemma, we show that the matrix B ∈ R^{2n×2n} has an eigendecomposition of the form B = UDU⁻¹, inspired by the arguments in [37]. In light of this decomposition, we are able to derive the transformed error dynamics for EDAS, which are then used for performing the non-asymptotic analysis in section 3.
Lemma 2.5. Under Assumption 1.3, the matrix B has an eigendecomposition B = UDU⁻¹ with U ∈ R^{2n×2n}, where c > 0 is an arbitrary scaling parameter. Denote by U_R and U_L the corresponding blocks of U and U⁻¹, respectively; then ‖U_R‖₂‖U_L‖₂ has an upper bound that is independent of n.

Based on Lemma 2.5, we derive the transformed error dynamics for EDAS by multiplying U⁻¹ on both sides of the error dynamics (2.9). In this way, we decompose the error (x̃_k, ỹ_k) into two parts: z̄_k, which measures the difference between the average iterate and x*, and ž_k, which represents the remaining error. We then further study the relationship between z̄_k and ž_k in section 3.
Lemma 2.6. Under Assumptions 1.2 and 1.3, the transformed error dynamics for Algorithm 2.1 take the form stated below.

Proof. Denote the quantities in (2.16). Multiplying U⁻¹ on both sides of (2.9) leads to the result.

3. Analysis. In this section, we provide some useful supporting lemmas based on Lemma 2.6 and then prove the sublinear convergence rate of EDAS in Lemma 3.7. For the proof of sublinear convergence, we first establish Lemmas 3.4 and 3.5 to derive two coupled recursions for E‖z̄_k‖² and E‖ž_k‖², respectively, according to Lemma 2.6.
Then we introduce a Lyapunov function H_k and, by constructing a recursion for H_k, we derive the sublinear convergence rate.
3.1. Supporting Lemmas. In this part, we provide two important coupled recursions regarding E‖z̄_k‖² and E‖ž_k‖², respectively. Before introducing these results, we state some preliminary lemmas.
Lemma 3.1. Suppose Assumption 1.1 holds. The results of Lemma 3.1 come directly from Assumption 1.1.

Lemma 3.2. Under Assumption 1.2, there holds the corresponding smoothness bound. In light of Lemma 10 in [26], we also have the following contraction result.

With the above results in hand, we provide an upper bound for E‖z̄_{k+1}‖² in the next lemma.

Proof. Denote s̄_k := (1/n) Σ_{i=1}^{n} s_{i,k}, where s_{i,k} is the i-th row of s_k. We have the first recursion from Lemma 2.6, where the second equality follows from 1ᵀ∇F(x*) = Σ_{i=1}^{n} ∇f_i(x*) = 0 in light of the optimality of x*. Hence, for the first term on the right hand side of (3.8), we obtain the stated bound, where we invoked relations (3.6) and (3.1b) and Lemma 3.2. Let γ₁ = (3/8)α_k μ and notice that α_k ≤ 1/(3μ). We then conclude the desired result from (3.11), which finishes the proof.
We next obtain the recursion for E‖ž_{k+1}‖², stated in Lemma 3.5 below.
Lemma 3.5. Under Algorithm 2.1 with Assumptions 1.1 to 1.3, suppose the stepsize condition below holds, where λ₂ is the second largest eigenvalue of W. Then for all k ≥ 1, we have the stated recursion.

Proof. By squaring and taking conditional expectations on both sides of the second recursion in Lemma 2.6, we obtain (3.12), where γ > 0 is arbitrary. To further bound the right hand side of (3.12) in terms of ‖z̄_k‖² and ‖ž_k‖², first consider (3.13), where γ₂ > 0 is arbitrary. Since λ₂ := |λ₂(W)|, the claim can be seen from (A.4). Relation (3.13) then leads to (3.15). We then bound the term y*_k − y*_{k+1} on the right hand side of (3.12). As stated in Lemma 2.2, we obtain (3.16), where (3.16) holds since ‖W‖₂ = 1.
Substituting (3.15) and (3.16) into (3.12) and taking full expectations on both sides of the inequality, we obtain (3.17). Given the stepsize condition, we obtain the desired result.

3.2. Preliminary Convergence Results. We now prove the sublinear convergence rate of the EDAS algorithm under the diminishing stepsize policy
$$\alpha_k = \frac{\theta}{\mu(k+m)},$$
where θ, m > 0 are to be determined later. For simplicity, we introduce some shorthand notation. Our analysis builds upon constructing a Lyapunov function (3.20), which is inspired by the arguments in [23]. The sublinear convergence results will then be combined with Lemma 3.5 and Lemma 3.4, respectively, so as to derive the improved convergence rates for M_k and T_k stated in the next section.
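The diminishing stepsize policy α_k = θ/(μ(k + m)) can be sketched as follows; the values θ = 6, μ = 1, m = 20 match the choice α_k = 6/(k + 20) used in the experiments of section 5, and the 1-D quadratic used for the sanity check is an assumed illustration.

```python
def stepsize(k, theta=6.0, mu=1.0, m=20.0):
    """Diminishing stepsize alpha_k = theta / (mu * (k + m)).

    The analysis requires theta > 4 (theta > 5 for the refined results)
    and m sufficiently large; theta = 6, m = 20 mirror the experiments.
    """
    return theta / (mu * (k + m))

# Sanity check on the 1-D quadratic f(x) = 0.5*(x - 3)^2 (mu = L = 1):
# the error contracts by (1 - alpha_k) per step and vanishes sublinearly.
x = 0.0
for k in range(200):
    x -= stepsize(k) * (x - 3.0)
print(abs(x - 3.0) < 1e-5)
```

Because α_k decays like 1/k, the accumulated contraction behaves like (m/(k + m))^θ, which is what drives the O(1/k) rates in the theorems below.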
The following result comes from Lemma 4.1 in [23] and will be used repeatedly for bounding specific terms in the analysis.
The Lyapunov function H_k is defined as in (3.20). It will be made clear why ω_k takes this particular form in the proof of Lemma 3.7.
Step 1: bounding H_k. From Lemmas 3.4 and 3.5 and the definition of H_{k+1}, we have (3.25) for all k ≥ 0. We now show that the inequalities (3.26a) and (3.26b) hold for all k ≥ 0.
For (3.26b) to hold, it is sufficient that the stated condition is satisfied, which follows from the definition of ω_k in (3.21). Second, condition (3.26a) requires a bound on the stepsize, for which the stated condition is sufficient. We have thus verified condition (3.26), and we obtain the recursion (3.30) for H_{k+1} from (3.25). Then for all k ≥ 1, the corresponding bound holds, where we invoked Lemma 3.6 for the second inequality. Additionally, since θ > 4, we obtain a further estimate.

Step 2: bounding T_k. To bound T_k, we substitute (3.32) into Lemma 3.5 and obtain the recursion (3.33). Hence for all k ≥ 0, the stated bound holds. Note that for i = 2, 3, 4,

By induction we obtain (3.35)
We conclude that for all k ≥ 0, there holds (3.37).

4. Main Results. In this section, we present the main convergence results for EDAS. First, we show in Theorem 4.1 that EDAS performs asymptotically as well as the centralized stochastic gradient descent (SGD) method; in other words, EDAS enjoys the so-called "asymptotic network independence" property. Then we refine the convergence results by iteratively combining the existing results with Lemmas 3.4 and 3.5. Finally, we derive the transient time for EDAS to approach the network independent rate in Theorem 4.6 by comparing with the performance of centralized SGD.

Theorem 4.1. Under Algorithm 2.1 with Assumptions 1.1 to 1.3, suppose θ > 4 and m is chosen according to (3.22). Then the bounds stated below hold.

Proof. From Lemma 3.4, we have the first recursion. Thus for all k ≥ 0, the corresponding bound holds, where Lemma 3.6 was invoked for obtaining the second inequality. In light of (3.24), and since (m + t)^{3θ/2−1} q₀ᵗ is decreasing in t, the sum can be bounded by the corresponding integral. In addition, from an argument similar to (3.31), we obtain the remaining estimate. Hence the claim holds for all k ≥ 0.

From Theorem 4.1, we can see that the asymptotic convergence rate of EDAS, which behaves as $\frac{4\theta^{2}\sigma^{2}}{(3\theta-2)n\mu^{2}(k+m)}$, is of the same order as that of the centralized SGD method [23]. Our next goal is to derive the transient time needed for EDAS to reach this asymptotic convergence rate. In particular, we would like to characterize the dependence of the transient time on the network characteristics as well as the function properties. For this purpose, we first simplify the presentation of Theorem 4.1 with O(·) notation in Corollary 4.3, and then improve the convergence results by iteratively substituting the existing upper bounds into Lemmas 3.4 and 3.5.
In the following lemma, we estimate the constants q_i (i = 2, 3, 4), M_0, T_0, H_0, and c_0 appearing in the statement of Theorem 4.1 with O(·) notation.

Lemma 4.2. Let the free scaling parameter c in Lemma 2.5 be chosen appropriately. Then we have q_2 = O(1), together with the corresponding estimates for the remaining constants.

In light of Lemma 4.2, we again iteratively apply Lemmas 3.4 and 3.5 to derive Corollary 4.3 based on Theorem 4.1.
We now state the convergence rate for the EDAS algorithm under the measure $\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\|x_{i,k}-x^*\|^2$, i.e., the average expected optimization error over all the agents in the network. Note that from Corollary 4.3, the error x̃_k can be decomposed as x̃_k = 1z̄_k + cU_{R,u} ž_k. Therefore, relation (4.5) holds. Taking full expectations on both sides of (4.5) and noting the choice of c from Lemma 4.2, we obtain (4.6). Combining Corollary 4.3 and (4.6) leads to the improved convergence rate of EDAS stated in Theorem 4.4.
Theorem 4.4. Under Algorithm 2.1 with Assumptions 1.1 to 1.3, suppose θ > 5 and m is chosen as in (3.22). Then the bound stated below holds for all k ≥ 1.

Proof. Substitute the bounds on M_k and T_k from Corollary 4.3 into (4.6), and notice that the product ‖U_R‖₂‖U_L‖₂ is independent of n according to Lemma 2.5. We obtain the result.

4.1. Transient Time.
In this part, we estimate how long it takes for EDAS to achieve the convergence rate of the centralized stochastic gradient descent (SGD) method, i.e., the transient time of the algorithm. First, we state the convergence rate of SGD [23].
Theorem 4.5. Under the centralized stochastic gradient descent (SGD) method with stepsize policy α_k = θ/(μ(k + m)), suppose m ≥ θL/μ. Then the bound stated below holds. In the next theorem, we derive the transient time for EDAS.
Theorem 4.6. Under Algorithm 2.1 with Assumptions 1.1 to 1.3, suppose θ > 5 and m is chosen according to (3.22). Then it takes K_T iterations for Algorithm 2.1 to reach the asymptotic, network independent convergence rate; that is, when k ≥ K_T, we have the stated guarantee.

Proof. From Theorem 4.4, we have the bound on the error of EDAS. Comparing with Theorem 4.5, we obtain K_T, which finishes the proof.
Under mild additional conditions, we can obtain a cleaner expression for the transient time of EDAS in the following corollary.
Corollary 4.7. Under Algorithm 2.1 with Assumptions 1.1 to 1.3, suppose θ > 5 and m is chosen according to (3.22). Assume in addition that ‖x_0 − x*‖² = O(n³), ‖∇F(x*)‖² = O(n³), and 1/(1−λ₂) = O(n^q) for some q > 0. Then it takes $K_T = \mathcal{O}\left(\frac{n}{1-\lambda_2}\right)$ steps for Algorithm 2.1 to reach the asymptotic, network independent convergence rate.
Remark 4.8. The assumption that ‖x_0 − x*‖² = O(n³) and ‖∇F(x*)‖² = O(n³) is mild and can be satisfied in many problem settings, including the ones we consider in section 5. In addition, 1/(1−λ₂) = O(n^q) is generally satisfied, since 1/(1−λ₂) = O(n²) holds under common choices of the mixing weights for undirected graphs [16].

5. Numerical Examples.
In this section, we present two numerical examples to verify and complement our theoretical results. To begin with, for practical considerations we define the transient time of an algorithm of interest as the last iteration at which the average expected error of the distributed method exceeds the corresponding error of centralized SGD, where x_k ∈ R^p stands for the k-th iterate of the centralized SGD algorithm.
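The practical definition above can be read as the first iteration after which the distributed error stays at or below the centralized SGD error for all subsequent iterations. A minimal sketch of this reading (the function name and the finite-horizon convention are ours):

```python
def transient_time(dist_err, cent_err):
    """First iteration after which the distributed error stays at or below
    the centralized SGD error; returns the horizon length if never reached."""
    K = len(dist_err)
    for k in range(K):
        if all(d <= c for d, c in zip(dist_err[k:], cent_err[k:])):
            return k
    return K

print(transient_time([5, 3, 1, 1], [2, 2, 2, 2]))  # -> 2
```

Applied to the error trajectories of EDAS and SGD, this gives the observed transient times plotted against the network size n in the figures below.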
In the first experiment, we construct a "hard" optimization problem for which the experimental transient time of EDAS is of the order $\mathcal{O}\left(\frac{n}{1-\lambda_2}\right)$, which agrees with the upper bound given in Theorem 4.6. This verifies the sharpness of the obtained theoretical results. Then we consider the problem of logistic regression for classifying handwritten digits from the MNIST dataset [9]. In this case, EDAS achieves a transient time close to $\mathcal{O}\left(\frac{n}{(1-\lambda_2)^{0.5}}\right)$, which is better than the upper bound in Corollary 4.7. Hence, in practice, the performance of EDAS depends on the specific problem instance and can be better than the worst-case scenario.
For both problems, we consider two different network topologies: the ring network shown in Figure 1a and the grid network shown in Figure 1b. The mixing matrices compliant with the networks are constructed under the Lazy Metropolis rule [16]. In addition to presenting the transient times for EDAS, we also compare its performance with other algorithms enjoying the asymptotic network independence property, i.e., distributed stochastic gradient descent (DSGD) [23,37] and the distributed stochastic gradient tracking method (DSGT) [22].

First, we illustrate the property of "asymptotic network independence" by comparing the performance of EDAS and SGD in Figure 2a. We can see that the error of EDAS gets close to that of centralized SGD as the number of iterations grows. Eventually, it is as if the network were not there, and the convergence rate of EDAS is exactly the same as that of SGD; hence the phenomenon is called "asymptotic network independence". In Figure 3a, we plot the observed transient times of EDAS and 6n/(1−λ₂) against the network size n for the ring network topology. We then consider the grid network topology in Figure 3b. It can be seen that the two curves are close in both cases, which suggests that the theoretical upper bound $K_T = \mathcal{O}\left(\frac{n}{1-\lambda_2}\right)$ is tight. In Figure 4, we further compare the early stage performance of DSGT, DSGD, and EDAS with the same initialization. It can be seen that the error of EDAS decreases faster than that of the other two decentralized methods and quickly gets close to the error of SGD. This is particularly true when the network is not well connected, e.g., when the network has a ring topology with a large number of nodes (see Figure 4c).

Logistic Regression.
We consider classifying the handwritten digits 1 and 2 in the MNIST dataset using logistic regression. There are 12700 data points S = {(u, v)}, with u ∈ R^785 denoting the image input and v ∈ {−1, 1} being the label. Each agent possesses a distinct local dataset S_i randomly selected from S. The classifier can then be obtained by solving the following optimization problem using all the agents' local datasets S_i, i = 1, 2, ..., n:
$$\min_{x\in\mathbb{R}^{785}} \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad f_i(x) := \frac{1}{|S_i|}\sum_{(u,v)\in S_i} \ln\left(1+\exp(-v u^{\top} x)\right) + \frac{\rho}{2}\|x\|^2,$$
where ρ > 0 is a regularization parameter (hyperparameter). To solve the problem, each agent can obtain an unbiased estimate of ∇f_i(x) using a minibatch of randomly selected data points from its local dataset S_i. In the experiment, we let ρ = 1 and α_k = 6/(k + 20) for all k and for all the considered algorithms, with the same initial solutions.
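For the experiment above, a local minibatch gradient oracle can be sketched as follows, assuming the ℓ2-regularized logistic loss f_i(x) = (1/|S_i|) Σ ln(1 + exp(−v uᵀx)) + (ρ/2)‖x‖²; the exact regularizer scaling is our assumption, and the synthetic data stand in for the MNIST samples.

```python
import numpy as np

def logistic_grad(x, U, v, rho, batch, rng):
    """Minibatch stochastic gradient of the (assumed) local objective
    f_i(x) = (1/|S_i|) * sum_j ln(1 + exp(-v_j * u_j^T x)) + (rho/2)*||x||^2,
    where U stacks the inputs u_j row-wise and v holds the +/-1 labels.
    """
    idx = rng.choice(len(v), size=batch, replace=False)
    Ub, vb = U[idx], v[idx]
    margins = vb * (Ub @ x)
    coeff = -vb / (1.0 + np.exp(margins))   # d/dm ln(1 + e^{-m}) = -1/(1 + e^m)
    return Ub.T @ coeff / batch + rho * x

# With a full batch the estimate coincides with the exact gradient:
rng = np.random.default_rng(0)
U = rng.standard_normal((50, 4))
v = np.sign(rng.standard_normal(50))
x = rng.standard_normal(4)
g = logistic_grad(x, U, v, rho=1.0, batch=50, rng=rng)
exact = U.T @ (-v / (1.0 + np.exp(v * (U @ x)))) / 50 + x
print(np.allclose(g, exact))
```

Sampling the minibatch uniformly without replacement keeps the estimator unbiased, so Assumption 1.1 holds for this oracle with a variance determined by the minibatch size.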
Fig. 2. (a) Error terms for centralized and decentralized methods.

Fig. 3. Comparison of the transient time for Algorithm 2.1 and multiples of n/(1−λ₂). The expected errors are approximated by averaging 500 simulation results. (a) Transient time for the ring network topology. (b) Transient time for the grid network topology.

Fig. 4. (a) Error terms over a ring network with 20 nodes. (b) Error terms over a ring network with 40 nodes. (c) Error terms over a ring network with 60 nodes. (d) Error terms over a grid network with 49 nodes. (e) Error terms over a grid network with 81 nodes. (f) Error terms over a grid network with 121 nodes.

Fig. 5. Comparison between the transient times for EDAS and the corresponding theoretical scalings. (a) Transient time for the ring network topology. (b) Transient time for the grid network topology.