Distributed Data-Driven Control of Network Systems

Imperfect models lead to imperfect controllers, and deriving accurate models from first principles or system identification is especially challenging in networked systems. Instead, data can be used to directly compute controllers, without requiring any system identification or modeling. In this paper we propose a strategy to directly learn control actions when data from past system trajectories are distributed among multiple agents in a network. The approach we develop provably converges to a suboptimal solution in a finite number of steps, bounded by the diameter of the network, and with a sub-optimality gap that can be characterized as a function of the data and made arbitrarily small. We further characterize the robustness properties of our approach and give provable guarantees on its performance when data are affected by noise or by a class of attacks.


I. INTRODUCTION
Networks and multi-agent systems have been studied extensively in the control community in the context of formation control [1], [2], consensus algorithms [3], and coordination of multi-agent systems [4], to cite a few. These results have seen application in various real-world scenarios, from robotics [5], [6] to power grids [7]. Alongside its opportunities, modeling the interaction between possibly heterogeneous systems through networks presents practical challenges. In fact, building accurate models of large networks is a burdensome task, and modeling errors (e.g., missing or extra links, incorrect link weights) are often unavoidable [8], [9]. Network models can be built either through first principles or via system identification. In the former, a dynamical model is deduced from the physical properties of the system, which are often not fully characterized. Alternatively, system identification can be used when data are available [10], [11]. Unfortunately, this approach, too, has limitations and has yielded mixed results when applied to modern network systems [12], [13], [14]. This is especially problematic when the end goal of system identification is to provide the basis for controller design, since errors in the modeling phase can compound in the controller design phase. In this work, we propose a model-agnostic control approach for a network system. That is, we bypass the system modeling phase altogether by using data to directly learn suitable control actions.
The direct data-driven control literature leverages data to design controllers in a one-step solution, as opposed to the indirect approach, where data are used to identify a model that is then used to design a controller. Although direct data-driven control dates back at least to [15], it has recently been the subject of renewed interest from the control community. Modern approaches to designing direct data-driven controls include data-driven open-loop optimal control [16], [17], [18], closed-loop and robust control [19], [20], [21], [22], and predictive and nonlinear control [23], [24], [25], [26], [27]. The problem of learning optimal controls in network systems has remained, however, relatively unexplored. To the best of our knowledge, the first result in this direction is [28], soon followed by [29] and [30]. In [28], the authors propose a modified version of the DeePC algorithm [23] to stabilize a network system through a primal-dual flow (while minimizing a quadratic function of the inputs and the states). This strategy is based on a Model Predictive Control (MPC) approach where a behavioral systems representation built upon a recorded system trajectory is used in place of the system's model. A limit of this approach is that it assumes that input-state data from past trajectories, as well as the current state, can be freely shared among neighboring agents. In addition, the distribution of the primal-dual flow among the agents in the network requires a substantial number of messages to be shared [35]. This assumption is limiting because it presupposes that the time scales of communication and computation processes are significantly shorter than that of the controller action. An alternative approach appeared in [30], where the same problem is tackled through the system level synthesis framework (SLS) [36]. SLS parameterizes closed-loop system responses directly from collected open-loop trajectories.
The result of [30] reduces to a distributed computation of a dynamic state-feedback controller via the alternating direction method of multipliers (ADMM) [37]. This approach shares many of the limitations of [28], namely the necessity of high-frequency communication between nodes in the network and the need to share input-state trajectories among the network's nodes. Further recent results on the control of interconnected systems include [31], [32], [33], [34]. However, these works focus on finding stabilizing controllers, and provide neither optimization of an objective function, nor robustness bounds, nor explicit data-driven formulas.
The contribution of this paper is a data-driven and distributed algorithm to learn optimal controls in a multi-agent environment. Our approach is data-driven as the control design relies exclusively on prerecorded input-state trajectories of an unknown linear system, and it is distributed as it assumes that the recorded trajectories are not available to any single agent, but are partitioned throughout the network. The goal of the agents is to compute the control action which globally minimizes a given quadratic cost function of the states and inputs, from a given initial condition. To do this, we leverage an algorithm based on iterative projections in order to distribute the computation of the control in a network of agents with partial access to data. This work builds upon and significantly expands the approach presented in [29], and departs from the cited literature in a number of ways. First, it does not require data from past recordings to be shared (at least directly) among the network's nodes, offering an implicit layer of privacy. Moreover, it relies on a closed-form expression that provably converges to a solution, with a prescribed distance from optimality, after a finite number of iterations of the algorithm bounded by the diameter of the network. Finally, while the aforementioned papers consider the case of noiseless data, in this work we also study the robustness of the approach when the collected data are injected with noise or adversarial attacks.
The remainder of the paper is organized as follows. Section II contains the problem setup and necessary preliminary notions. In Section III we derive closed-form solutions to the problem of designing optimal data-driven controls and present our distributed algorithm. In Section IV we derive results on the robustness of the proposed approach to noisy data, and in Section V we numerically validate our results and compare them with alternative approaches. We conclude our paper in Section VI and leave the proofs to the Appendices.

A. NOTATION
Let R (N) and R_+ (N_+) denote the set of real (integer) and strictly positive real (integer) numbers, respectively. Given a matrix A ∈ R^{n×m}, we let Rank(A), Basis(A), Ker(A), A^T, σ_min(A) denote the rank, a basis of the column space, the kernel, the transpose, and the smallest singular value of A, respectively. We let blkdiag(A_1, ..., A_n) be the block diagonal matrix with blocks A_i ∈ R^{n_i×m_i}. For a matrix A ∈ R^{n×m}, and for p < n and q < m, we let [A]_{1:p,1:q} be the matrix formed by the first p rows and q columns of A. For any matrix A, we denote its Moore-Penrose pseudoinverse as A^†. We let A ≻ 0 (A ⪰ 0) denote a positive definite (positive semidefinite) matrix. The spectral norm of a matrix A is ‖A‖, and the Kronecker product between matrices A and B is denoted by A ⊗ B. I_n and 0_{n,m} stand for the n × n identity matrix and the n × m zero matrix, respectively (subscripts will be omitted when clear from the context). Given a sequence of vectors v_1, ..., v_k, we let col(v_1, ..., v_k) denote the vector obtained by stacking them vertically. Let vec(·) : R^{n×m} → R^{nm} denote the vectorization operator of a matrix. Given two vectors v, w ∈ R^n, we say that they are orthogonal if v^T w = 0, and indicate this as v ⊥ w. To stress the fact that two vectors are not orthogonal we shall write v ⊥̸ w. For a random variable x : Ω → R, we let P[x ∈ S] and E[x] be the probability that x takes a value in a set S ⊆ R and the mean or expected value of x, respectively. Finally, we say that a function f : R → R grows sublinearly with x if lim_{x→∞} |f(x)|/x = 0.

B. PROBLEM DEFINITION
We study the controllable linear system

x(t + 1) = A x(t) + B u(t),    (1)

where x ∈ R^n and u ∈ R^m are the state and the input vectors at time t ∈ N_+, respectively, and with A ∈ R^{n×n} and B ∈ R^{n×m}. We seek to compute the input sequence u_T = col(u(0), ..., u(T − 1)) that minimizes the following finite-horizon quadratic objective function:

min_{u_T}  x_T^T Q x_T + u_T^T R u_T,    (2)

where Q ∈ R^{n(T+1)×n(T+1)}, Q ⪰ 0, and R ∈ R^{mT×mT}, R ≻ 0, are the state and input weighting matrices, respectively, and x_T = col(x(0), ..., x(T)). When the matrices Q and R in (2) are block diagonal, the minimization problem (2) is referred to as the finite-horizon discrete-time linear quadratic regulator (LQR) problem [38].
In solving the minimization problem (2) we assume that the pair (A, B) is not known. We assume, however, that a series of experiments has been performed to learn the solution of (2) directly through experimental data. In particular, we assume that N experiments of length T have been performed and that the experimental data are

(X_{1:T}, U, X_0),    (3)

where X_{1:T} ∈ R^{nT×N} contains the state trajectories, U ∈ R^{mT×N} the input sequences, and X_0 ∈ R^{n×N} the initial states of the N experiments. Notice that we can write the state evolution of (1) from initial condition x_0 and subject to the input sequence u_T as

x_T = O_T x_0 + C_T u_T,    (4)

where O_T = col(I, A, ..., A^T) is the free-response matrix and C_T is the block lower triangular forced-response matrix, with block (i, j) equal to A^{i−1−j} B for j < i and zero otherwise. Thus, without measurement noise, the data matrices satisfy

X = O_T X_0 + C_T U,  with  X = col(X_0, X_{1:T}).    (5)

We assume that the following holds throughout the paper.

Assumption II.1 (Persistency of excitation of data): The experimental inputs and initial conditions satisfy Rank(col(U, X_0)) = mT + n.

Assumption II.1 is standard in data-driven studies [20], [21] and places a lower bound on the number of experiments needed to build exact data-driven expressions, namely, N ≥ n + mT, where T is the length of each experiment [39].
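To make the data setup concrete, the following sketch (our own illustration, not code from the paper) generates N experiments from a randomly chosen system, verifies the rank condition of Assumption II.1, and checks the stacked free/forced-response relation between the data matrices; the names `OT` and `CT` follow the notation of (4) and (5).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T, N = 3, 2, 4, 20          # N >= n + m*T = 11 (Assumption II.1)

A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

# N experiments of length T from random initial states and random inputs.
X0 = rng.standard_normal((n, N))            # initial conditions, one per column
U = rng.standard_normal((m * T, N))         # stacked input sequences
X1T = np.zeros((n * T, N))                  # stacked states x(1), ..., x(T)
for e in range(N):
    x = X0[:, e]
    for t in range(T):
        x = A @ x + B @ U[t * m:(t + 1) * m, e]
        X1T[t * n:(t + 1) * n, e] = x

# Persistency of excitation: Rank(col(U, X0)) = mT + n.
rank = np.linalg.matrix_rank(np.vstack([U, X0]))
print(rank == m * T + n)                    # True for generic random data

# Free/forced-response matrices and the noiseless data relation.
OT = np.vstack([np.linalg.matrix_power(A, k) for k in range(T + 1)])
CT = np.zeros((n * (T + 1), m * T))
for i in range(1, T + 1):
    for j in range(i):
        CT[i * n:(i + 1) * n, j * m:(j + 1) * m] = \
            np.linalg.matrix_power(A, i - 1 - j) @ B
X = np.vstack([X0, X1T])                    # X = col(X0, X_{1:T})
print(np.allclose(X, OT @ X0 + CT @ U))     # True: X = OT X0 + CT U
```

Note that the rank check succeeds with probability one for Gaussian inputs and initial conditions, consistent with the lower bound N ≥ n + mT.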
In the absence of a model, the minimization problem (2) can be solved using experimental data as shown in [20], [23], [40]. However, these works assume that the experimental data are collectively available. Instead, motivated by the increasing interest in data-driven control of multi-agent and network systems, we assume that the experimental data are distributed across a set of agents, and we seek a solution to the minimization problem (2) that relies on distributed computation. Let the node set V of the network be partitioned as V = V_1 ∪ ··· ∪ V_M, with |V_i| = n_i. Then, after reordering the nodes, the matrix A reads as the block matrix A = [A_ij], i, j ∈ {1, ..., M}, with blocks A_ij ∈ R^{n_i×n_j}.

C. SETUP FOR MULTIAGENT LEARNING
We assume that the input matrix in (1) can be written as B = blkdiag(B_1, ..., B_M), with B_i ∈ R^{n_i×m_i}. Then, system (1) can equivalently be written as the interconnection of M subsystems of the form

x_i(t + 1) = A_ii x_i(t) + B_i u_i(t) + Σ_{j≠i} A_ij x_j(t),    (7)

where x_i ∈ R^{n_i} and u_i ∈ R^{m_i} are the states and inputs, respectively, of the nodes in V_i. We assume the presence of M agents, each one responsible for one subsystem (see also Fig. 1). In particular, agents are interconnected according to a directed communication graph G_c = (V_c, E_c), and agent i selects the input u_i to the ith subsystem using local data and information exchanged with the neighboring agents N_i = {j : (i, j) ∈ E_c}. The local data available to agent i are

(X_i, U_i, X_{0,i}),    (8)

that is, the components indexed by V_i of the state trajectories, control inputs, and initial conditions of the N experiments. Thus, after a possible reordering of the system states, X = col(X_1, ..., X_M), U = col(U_1, ..., U_M), and X_0 = col(X_{0,1}, ..., X_{0,M}). Agents cooperate to collectively compute the local inputs that solve the minimization problem (2). A naive solution to this problem requires the agents to share all their local experimental data with all other agents, and then employ centralized data-driven formulas for the solution of the minimization problem (2) (see [20], [23], [40] and Theorem III.1 below). Instead, we will develop algorithms that require the agents to exchange reduced information to compute, in finite time, a solution to the minimization problem (2) in a distributed manner.
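The row-partition of the data in (8) can be made explicit with a short sketch. The partition sizes and the index bookkeeping below are illustrative assumptions of ours; the point is only that each agent keeps the rows of the global data matrices indexed by its own nodes V_i.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3                                  # number of agents / subsystems
n_i, m_i = [2, 1, 2], [1, 1, 1]        # local state/input sizes (illustrative)
n, m, T, N = sum(n_i), sum(m_i), 3, 16 # N >= n + m*T = 14

A = 0.4 * rng.standard_normal((n, n))  # coupled dynamics (off-diagonal blocks)
# B = blkdiag(B_1, ..., B_M): each input acts only on its own subsystem.
B = np.zeros((n, m))
r0 = c0 = 0
for i in range(M):
    B[r0:r0 + n_i[i], c0:c0 + m_i[i]] = rng.standard_normal((n_i[i], m_i[i]))
    r0 += n_i[i]; c0 += m_i[i]

# Global experiments as in (3); each agent will keep only its own rows.
X0 = rng.standard_normal((n, N))
U = rng.standard_normal((m * T, N))
X1T = np.zeros((n * T, N))
for e in range(N):
    x = X0[:, e]
    for t in range(T):
        x = A @ x + B @ U[t * m:(t + 1) * m, e]
        X1T[t * n:(t + 1) * n, e] = x

# Local data (X_i, U_i, X_{0,i}) of agent i: rows indexed by V_i at each time.
s_ofs = np.cumsum([0] + n_i)           # state offsets of the partition V_i
u_ofs = np.cumsum([0] + m_i)           # input offsets
def local_data(i):
    rx = [t * n + k for t in range(T) for k in range(s_ofs[i], s_ofs[i + 1])]
    ru = [t * m + k for t in range(T) for k in range(u_ofs[i], u_ofs[i + 1])]
    return X1T[rx, :], U[ru, :], X0[s_ofs[i]:s_ofs[i + 1], :]

Xi, Ui, X0i = local_data(0)
print(Xi.shape, Ui.shape, X0i.shape)   # (6, 16) (3, 16) (2, 16)
```

No agent ever sees another agent's rows, which is the privacy feature exploited later in the paper.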

III. MULTIAGENT LEARNING OF OPTIMAL CONTROLS
We start by providing a data-driven solution to the minimization problem (2) when there is only one agent. This solution is general, and appeared previously in [40]. In particular, the optimal control u*_T minimizing (2) can be computed in closed form as in (9), with K_0 = Basis(Ker(X_0)) and K_U = Basis(Ker(U)), highlighting the dependency of the optimal controls on the data. We now give an alternative formula to compute u*_T in closed form. This formula requires a stronger assumption on Q, but is more compact and holds in most scenarios.
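The kernel-based parameterization above can be turned into a numerical recipe. The sketch below is our own reading of the construction (based on Lemma III.2 and Appendix C, not the paper's exact formula (9)): the free response is recovered from data through a pseudoinverse, the forced response is parameterized over Ker(X_0), and the resulting least-squares problem is checked against the model-based optimum.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, T, N = 3, 2, 4, 20               # N >= n + m*T

A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

def simulate(x0, u_seq):
    """Stacked trajectory col(x(0), ..., x(T)) of (1) under input u_seq."""
    xs, x = [x0], x0
    for t in range(T):
        x = A @ x + B @ u_seq[t * m:(t + 1) * m]
        xs.append(x)
    return np.concatenate(xs)

# Offline experiments (3) and stacked data X = col(X0, X_{1:T}).
X0 = rng.standard_normal((n, N))
U = rng.standard_normal((m * T, N))
X = np.column_stack([simulate(X0[:, e], U[:, e]) for e in range(N)])

def null_basis(Ma):
    """Orthonormal basis of Ker(Ma) via SVD."""
    _, s, vt = np.linalg.svd(Ma)
    return vt[np.sum(s > 1e-10 * s[0]):].T

Q, R = np.eye(n * (T + 1)), np.eye(m * T)
x0 = rng.standard_normal(n)

# Free response from data: X @ alpha with X0 @ alpha = x0 and U @ alpha = 0.
alpha = np.linalg.pinv(np.vstack([X0, U])) @ np.concatenate([x0, np.zeros(m * T)])
free = X @ alpha
# Forced response parameterized by beta = K0 w (so X0 @ beta = 0), u_T = U @ beta.
K0 = null_basis(X0)
design = np.vstack([X @ K0, U @ K0])   # Q = I, R = I weighting
target = np.concatenate([-free, np.zeros(m * T)])
w = np.linalg.lstsq(design, target, rcond=None)[0]
u_dd = U @ K0 @ w                      # data-driven optimal input

# Model-based check: minimize (OT x0 + CT u)' Q (.) + u' R u in closed form.
OT = np.vstack([np.linalg.matrix_power(A, k) for k in range(T + 1)])
CT = np.zeros((n * (T + 1), m * T))
for i in range(1, T + 1):
    for j in range(i):
        CT[i * n:(i + 1) * n, j * m:(j + 1) * m] = \
            np.linalg.matrix_power(A, i - 1 - j) @ B
u_mb = -np.linalg.solve(CT.T @ Q @ CT + R, CT.T @ Q @ OT @ x0)
print(np.allclose(u_dd, u_mb, atol=1e-6))   # True: same optimal input
```

The model (A, B) appears only in the final check; the control itself is computed from (X_0, U, X) alone.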
Theorem III.1 (Single-agent data-driven solution of (2)): Let X_0, X and U be as in (3) and (5), and satisfying Assumption II.1. Define P ∈ R^{(n+mT)×(n+mT)} as in (10). If Q^{1/2} O_T has full column rank, then the matrix P is invertible and the solution u*_T to (2) can be extracted from

γ* = P^{−1/2} ([I 0] P^{−1/2})^† x_0,    (11)

with x(0) = x_0.

Theorem III.1 provides a data-driven expression of the optimal control input for the minimization problem (2), alternative to (9). The assumption that Q^{1/2} O_T has full column rank is required to ensure the invertibility of P and is typically satisfied in practice. For instance, it is satisfied when Q ≻ 0 or, in the case of block diagonal Q with diagonal blocks Q_d ∈ R^{n×n} (standard LQR setup), when the pair (A, Q_d^{1/2}) is observable. This assumption on Q^{1/2} O_T is made in order to keep expression (11) more compact, which helps simplify notation in the rest of the paper. If needed, however, this assumption can be lifted, and we refer the interested reader to Appendix B. Both expressions (9) and (11) require the knowledge of all the experimental data, as well as the knowledge of the cost function, which is undesirable for interconnected and possibly large systems. While a distributed data-driven solution to the minimization problem (2) could be obtained using standard techniques for distributed optimization, e.g., see [28], [30], we follow a different approach that will lead to an algorithm with finite-time convergence guarantees. To this aim, we define the auxiliary problems (12) and (13), where ε ∈ R_+ is a tunable parameter and I_X = I_{n(T+1)}. We now characterize the feasibility and optimality properties of (12) and (13).

Lemma III.2 (Relationship between the solutions of (2) and (12)): If α* and β* are minimizers of problem (12), then u*_T = U β* is the minimizer of problem (2).

Lemma III.3 (Relationship between the solutions of (2) and (13)): The minimization problem (13) is feasible and admits a unique solution when ε > 0. Furthermore, if α*(ε), β*(ε), v*(ε) and w*(ε) are minimizers of problem (13), then (14) holds, where ε > 0 and u*_T is the minimizer of problem (2).

Since the constraints in the minimization problem (13) can be partitioned row-wise in a way that each row depends only on the data available to a single agent (see below), a distributed algorithm can be readily obtained. Further, given the equivalence between the minimization problems (2), (12), and (13), as stated in Lemma III.2 and Lemma III.3, a distributed solution to (2) can be obtained by solving (13) in a distributed manner. Our distributed algorithm for the agents to solve the minimization problem (2) via distributed computation is Algorithm 1, where W_i is defined in (15) and d denotes the diameter of G_c. We now provide an informal description of the algorithm:

(S1) Initially, each agent i computes the minimum norm solution γ_i = W_i^† col(x_{0,i}, 0) and K_i = Basis(Ker(W_i)). Here, I_X^i (resp. I_U^i) is a matrix whose rows are the rows of I_{nT} (resp. I_{mT}) corresponding to the indices of the entries associated with agent i.

(S2) At each iteration, each agent i transmits γ_i and K_i to its neighboring agents j, and receives γ_j and K_j from each neighbor j.

(S3) At each iteration, each agent i updates γ_i and K_i using the received quantities, so that the new iterate is feasible for agent i and all of its neighbors (see (16) below).

(S4) Convergence of this iterative procedure is guaranteed after a number of steps equal to the diameter of the communication graph (see below). Upon convergence, each agent returns the vector β_i = [γ_i]_{N+1:2N}, extracted from γ_i at the algorithm's final iteration.
A high-level walkthrough of the algorithm is in order.
Step (S1) is simply the initialization step, in which each agent computes a preliminary solution γ_i = W_i^† col(x_{0,i}, 0), which uses only local data and is feasible for agent i. In step (S2) neighboring agents share the information that is needed to update the provisional solution γ_i. In step (S3) γ_i is updated to γ_i⁺ with information from its neighbors, in such a way that γ_i⁺ is a feasible solution for agent i and its neighbors. More specifically, we find the new γ_i⁺ to satisfy (16) for all j ∈ N_i, where κ_i and κ_j are vectors of appropriate dimension. In the proof (cf. Appendix E) we show that κ_i and κ_j such that γ_i⁺ satisfies (16) always exist, and show how they can be computed through the procedure described in (S3). Finally, (S4) gives one condition for ending the algorithm once it converges, which is also detailed in the proof.
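The update (S1)-(S4) can be viewed as each agent repeatedly intersecting its affine solution set {γ_i + range(K_i)} with those of its neighbors. The sketch below implements this set-intersection view with orthogonal projectors; this is our own simplification of the κ-based update, not the paper's exact algorithm, but it exhibits the same diameter-bounded convergence on a path graph.

```python
import numpy as np

rng = np.random.default_rng(3)

def null_basis(Ma):
    """Orthonormal basis of Ker(Ma) via SVD."""
    _, s, vt = np.linalg.svd(Ma, full_matrices=True)
    return vt[np.sum(s > 1e-10):].T

# Global consistent linear system W gamma = b, split row-wise among M agents
# on a path graph 1-2-3-4 (diameter d = 3).
q = 6
gamma_true = rng.standard_normal(q)
blocks = [rng.standard_normal((2, q)), rng.standard_normal((1, q)),
          rng.standard_normal((2, q)), rng.standard_normal((1, q))]
b = [Wi @ gamma_true for Wi in blocks]
M, d = 4, 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# (S1) Each agent represents its local solution set as {gamma_i + range(K_i)}.
gamma = [np.linalg.pinv(Wi) @ bi for Wi, bi in zip(blocks, b)]
K = [null_basis(Wi) for Wi in blocks]

for _ in range(d):                     # (S2)-(S3): exchange and intersect
    new_gamma, new_K = [], []
    for i in range(M):
        # Agent j's set is {g : P_j g = P_j gamma_j}, with P_j the projector
        # onto the orthogonal complement of range(K_j).
        Ps, rhs = [], []
        for j in [i] + neighbors[i]:
            Pj = np.eye(q) - K[j] @ K[j].T
            Ps.append(Pj)
            rhs.append(Pj @ gamma[j])
        Pstack, rstack = np.vstack(Ps), np.concatenate(rhs)
        # New iterate: a point in the intersection of the neighborhood sets,
        # plus a basis of the intersection's direction space.
        new_gamma.append(np.linalg.lstsq(Pstack, rstack, rcond=None)[0])
        new_K.append(null_basis(Pstack))
    gamma, K = new_gamma, new_K

# (S4) After d iterations every agent satisfies ALL the global constraints.
Wglob, bglob = np.vstack(blocks), np.concatenate(b)
ok = all(np.allclose(Wglob @ g, bglob, atol=1e-6) for g in gamma)
print(ok)   # True
```

After k iterations, agent i's set equals the intersection of the sets of all agents within graph distance k, which is why d iterations suffice.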
Theorem III.4 (Distributed learning of data-driven optimal controls): Let G_c be a strongly connected communication graph, and let β_i(ε) be the value returned by Algorithm 1 when W_i is as in (15). Then, for every agent i, β_i(ε) equals the minimizer β*(ε) of problem (13).

From Theorem III.4, Algorithm 1 returns in a finite number of steps the solution of the minimization problem (13). Due to Lemma III.3, this solution yields the minimizer of (2) as the parameter ε decreases to zero. In fact, for any finite value of ε, the sub-optimality gap between the minimizer of (2) and the input U β*(ε) reconstructed from the minimizer of (13) can also be quantified.

Lemma III.5 (Optimality gap of (14) for finite ε): Let β*(ε) be the minimizer of problem (13), and let δ(ε) denote the distance between U β*(ε) and the minimizer u*_T of (2). Then δ(ε) can be explicitly bounded as a function of ε and of the data.

Using Lemma III.5, Algorithm 1 can be used to compute a sub-optimal solution to (2) in a finite number of distributed calculations and within any desired sub-optimality guarantee. In fact, once β_i is computed by each agent i, the sub-optimal and local control at each agent is simply u_{T,i} = U_i β_i.
Remark III.6 (Convergence of Algorithm 1): Algorithm 1 converges after d iterations, where d is the diameter of G c (cf. Appendix E). When d is not available the algorithm can be stopped after n iterations, since n is always available to each agent through the size of the vector v i , and d ≤ n always holds.

IV. ROBUSTNESS OF DATA-DRIVEN CONTROL INPUTS
So far, we discussed a distributed approach to learn optimal controls in a multiagent environment through noiseless data. This assumption on the data is often too restrictive in practice.
In what follows we study how noisy datasets affect the results of Section III, assuming different degrees of knowledge on the noise distribution. In particular, we are interested in characterizing the robustness of our approach by bounding, in probability, the distance between the true cost of the optimal control problem in (2) and the one computed through our data-driven approach with a noisy dataset.
Assume that the state trajectory in (3) is collected as X̃_{1:T} = X_{1:T} + ΔX, where X_{1:T} is the ground-truth data and ΔX is a matrix containing stochastic perturbations. We highlight that the perturbation ΔX affects the state trajectory at times t ∈ {1, ..., T}. However, the analysis that follows can be carried out in a similar fashion for noisy initial states and inputs, as well as for alternative data-driven expressions, e.g., (9). Let F : R^{mT×N} × R^{n×N} × R^{nT×N} → R^{mT+n} be the data-driven map (11). We define the perturbed control map ũ_T = F(U, X_0, X̃_{1:T}), as in (19), where P̃ denotes the matrix P in (10) evaluated at the perturbed data.

Assumption IV.1 (Invertibility of P̃): We assume that the matrix P̃ is almost surely invertible.
Assumption IV.1 is required for the (almost sure) existence of the map (19). This is a technical assumption which is typically verified in practice. Following a procedure similar to [?], we let supp(ΔX) denote the set of corrupted entries of X̃_{1:T}, that is, supp(ΔX) = {i : δ_{X,i} ≠ 0}, where δ_{X,i} = vec(ΔX)_i is the ith entry of vec(ΔX). Further, since F(U, X_0, X̃_{1:T}) is Fréchet-differentiable with respect to X̃_{1:T} [41], we can write it through its Taylor expansion

F(U, X_0, X_{1:T} + ΔX) = F(U, X_0, X_{1:T}) + ∇F_X vec(ΔX) + r(U, X_0, X_{1:T}, ΔX),    (20)

with lim_{ΔX→0} ‖r(U, X_0, X_{1:T}, ΔX)‖ / ‖ΔX‖ = 0, and where ∇F_X is the Jacobian matrix of F(U, X_0, X_{1:T}) with respect to X_{1:T}.
Finally, we let J and J̃ be the cost function of (2) obtained from x_0 through u_T and ũ_T in (11) and (19), respectively, and we let the error on the cost function be ΔJ = |J̃ − J|, as in (21). We are now ready to give probabilistic bounds with respect to J on the control (11) for different assumptions on the noise ΔX. In particular, given a tolerance τ, and assuming that r(·) = 0 in (20), we characterize the probability that ΔJ is greater than such tolerance. In this context, a smaller probability P[ΔJ ≥ τ] translates to a more robust controller, while a probability close to one implies poor robustness with respect to the desired tolerance. In the following we let V be as in (22), and we refer the reader to Appendix G for a detailed derivation of the results that follow.

Theorem IV.2 (Probabilistic upper bound on ΔJ): Let ΔJ be as in (21). Then, for any τ > 0, the bound (23) holds, where c_{X,i} = ‖V^{1/2} ∇F_{X,i}‖ and ∇F_{X,i} is the ith column of the Jacobian matrix of (19) with respect to X_{1:T}.
Theorem IV.2 provides a simple upper bound on the accuracy of the data-driven control (11). In fact, bound (23) captures the sensitivity of (11) to perturbations of the dataset: the less sensitive the data-driven map is, i.e., the smaller c_{X,i} is in absolute value, the closer the control input computed through the perturbed dataset in (19) is to the optimal input in (11) and, in turn, the smaller ΔJ is. The bound (23) requires only that the noise distribution has a well-defined variance, but it is typically loose. For instance, in the case of i.i.d. Gaussian perturbations δ_{X,i} ∼ N(0, σ²), it holds that E[|δ_{X,i}|] = σ√(2/π), and (23) reads as (24), which exhibits a square-root decay in the ratio τ/σ². More informative bounds can be established for specific classes of perturbations, as we show next for the Gaussian case.
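The tail probability P[ΔJ ≥ τ] can also be estimated empirically. The Monte Carlo sketch below (our own illustration; the kernel-based solver is our reading of the data-driven construction, with Q = R = I) perturbs the state data with Gaussian noise of two different variances and counts how often the cost error exceeds the tolerance.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, T, N = 2, 1, 3, 12

A = np.array([[0.9, 0.3], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, -1.0])

def simulate(x_init, u_seq):
    xs, x = [x_init], x_init
    for t in range(T):
        x = A @ x + B @ u_seq[t * m:(t + 1) * m]
        xs.append(x)
    return np.concatenate(xs)

# Noiseless offline experiments.
X0 = rng.standard_normal((n, N))
U = rng.standard_normal((m * T, N))
X1T = np.column_stack([simulate(X0[:, e], U[:, e])[n:] for e in range(N)])

def dd_input(X1T_data):
    """Data-driven LQ input (Q = I, R = I) from (possibly noisy) state data."""
    Xs = np.vstack([X0, X1T_data])
    alpha = np.linalg.pinv(np.vstack([X0, U])) @ np.concatenate([x0, np.zeros(m * T)])
    _, s, vt = np.linalg.svd(X0)
    K0 = vt[np.sum(s > 1e-10):].T                    # basis of Ker(X0)
    w = np.linalg.lstsq(np.vstack([Xs @ K0, U @ K0]),
                        np.concatenate([-Xs @ alpha, np.zeros(m * T)]),
                        rcond=None)[0]
    return U @ K0 @ w

def cost(u_seq):
    xT = simulate(x0, u_seq)                         # true system rollout
    return xT @ xT + u_seq @ u_seq

J_opt = cost(dd_input(X1T))                          # noiseless data: optimal cost

tau, trials, probs = 0.1, 200, {}
for sigma in (0.01, 0.5):
    hits = 0
    for _ in range(trials):
        noisy = X1T + sigma * rng.standard_normal(X1T.shape)
        hits += (cost(dd_input(noisy)) - J_opt) >= tau
    probs[sigma] = hits / trials
    print(sigma, probs[sigma])   # empirical P[dJ >= tau], typically growing in sigma
```

The empirical probabilities can then be compared against the analytical bounds (23)-(24) for the same τ and σ.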

Theorem IV.3 (Probabilistic upper bound on ΔJ for Gaussian perturbations): Let ΔJ be as in (21) and assume that the entries of ΔX are i.i.d. Gaussian random variables with zero mean and variance σ². Then, for any τ > 0, the bound (25) holds.

Notice that the bound in Theorem IV.3 decays exponentially in the ratio τ/σ², and is therefore tighter than (24), as shown in the numerical simulations in Fig. 2.
We next investigate how the sensitivity of the data-driven map (as quantified by the norm of the Jacobian matrices ∇F_{X,i}) is related to the number of experiments N. This study is especially interesting in a scenario in which an attacker can deliberately corrupt some entries of X̃_{1:T}, while supp(ΔX) grows sublinearly with respect to N.
Lemma IV.4 (Asymptotic behavior of ∇F_{X,i} for large N): Assume that the entries of X_0 and U are independent of N (i.e., their distributions are not functions of N) and that σ²_min([X_0^T U^T]^T) ≥ cN for some constant c > 0. Then ‖∇F_{X,i}‖ → 0 as N → ∞, for every i.

We remark that the condition σ²_min([X_0^T U^T]^T) ≥ cN is typically satisfied for random i.i.d. initial conditions and inputs. Thus, Lemma IV.4 shows that all ∇F_{X,i} typically converge to zero as the number of experiments N increases. Under this scenario, the map F becomes less sensitive to corrupted data as more data become available. This conclusion is instrumental to prove the following result.
Theorem IV.5 (Asymptotically vanishing perturbation for large N): In addition to the assumptions of Lemma IV.4, assume that the distributions of the entries of ΔX are independent of N and that the |δ_{X,i}| have finite mean. Then, if supp(ΔX) grows sublinearly with N, for any τ > 0, lim_{N→∞} P[ΔJ ≥ τ] = 0.

Under the assumptions of Theorem IV.5 we can guarantee that the error ΔJ goes to zero for increasing N, regardless of ΔX, as shown in Fig. 3, ensuring the robustness of the control action under attacks on the data set.

V. NUMERICAL RESULTS AND ILLUSTRATIVE EXAMPLES
In this section we provide numerical validations of the results presented in this paper. First, we show how Algorithm 1 and Theorem III.4 can be used to solve the problem in (2) when the system (1) is unknown but data (8) are available. We further discuss how our formulas can be used directly, for a fixed-window control, in a receding horizon fashion. Finally, we compare and discuss our approach against alternative approaches in the literature.

Footnote 4: We say that a random variable X is independent of a deterministic parameter N if the distribution of X is not a function of N.

Footnote 5: If U and X_0 have i.i.d. zero-mean entries, then P[lim_{N→∞} (1/N)[X_0^T U^T]^T [X_0^T U^T] = I] = 1 by the (strong) Law of Large Numbers [42]. We remark that N is the number of experiments (columns of X_0, U) and I is the identity matrix.

(Caption of Fig. 3: attack with δ_i ∼ N(1, 0); equivalent results can be obtained for any distribution on supp(ΔX) that satisfies the conditions of Theorem IV.5. Data are gathered from a randomly generated unstable system with n = 5, m = 3 and T = 15. Assumption II.1 is met for all N to the right of the red solid vertical line.)

A. AN APPLICATION TO SCALING RING NETWORKS
We assess the results of Section III by analyzing the convergence of Algorithm 1 for networks of varying diameter. We consider randomly generated ring networks, with communication graph G_c as in Fig. 4(a), here shown for a ring network with M = 6 and d = 3. In Fig. 4(b) we plot the error between the solution of (2) and the result of Algorithm 1. At the kth iteration of Algorithm 1, subnetwork i computes its own control inputs as u_i[k] = U_i β_i[k], where β_i[k] is the interim value of β_i at iteration k. The dynamics of (1) when u_i[k] is injected for all i ∈ {1, ..., M} are compared to those induced by the optimal control (11). As discussed in Theorem III.4, Algorithm 1 converges to the solution in a number of steps equal to the diameter of the communication graph G_c.

B. AN APPLICATION TO WATTS-STROGATZ NETWORKS
To demonstrate the effectiveness of this approach on more complex network structures, we test Algorithm 1 on a Watts-Strogatz network of larger size [43]. In Fig. 5 we run Algorithm 1 over a randomly generated Watts-Strogatz network with M = 30, mean node degree 4, and rewiring probability 0.15; see the documentation for the WattsStrogatz function in MATLAB [44]. The particular realization of Fig. 5(a) is a network with diameter 5, as confirmed by the convergence of Algorithm 1 shown in Fig. 5(b).

C. RECEDING HORIZON IMPLEMENTATION
Theorem III.1 assumes that T is the control horizon of problem (2) as well as of the dataset (8). It is possible to lift this requirement by implementing Algorithm 1 in a receding horizon fashion. That is, once Algorithm 1 is executed, each agent applies only a finite horizon h ≤ T of the computed controller u_{T,i} = U_i β_i. After h time steps, Algorithm 1 is executed again, and a new controller is found for the subsequent horizon. This approach is formally described in Algorithm 2. Clearly, there is no limit to the number of times that this algorithm can be run, i.e., we can use this approach to design an arbitrarily long controller. A receding horizon implementation is common throughout the literature, often termed Model Predictive Control (MPC), and is also implemented in related works [28], [30]. Comparisons of these approaches with our method are discussed next.
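The receding-horizon loop is easy to sketch in the single-agent case (our own illustration, with Q = R = I and h = 1; the dataset is collected once offline and reused at every re-solve, as the text describes).

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, T, N, h = 2, 1, 8, 20, 1      # plan over T steps, apply h of them
A = np.array([[1.05, 0.10], [0.00, 0.97]])   # unstable open loop
B = np.array([[0.0], [1.0]])

def simulate(x_init, u_seq, steps):
    xs, x = [x_init], x_init
    for t in range(steps):
        x = A @ x + B @ u_seq[t * m:(t + 1) * m]
        xs.append(x)
    return np.concatenate(xs)

# One offline dataset, reused at every receding-horizon step.
X0 = rng.standard_normal((n, N))
U = rng.standard_normal((m * T, N))
X1T = np.column_stack([simulate(X0[:, e], U[:, e], T)[n:] for e in range(N)])
Xs = np.vstack([X0, X1T])
_, s, vt = np.linalg.svd(X0)
K0 = vt[np.sum(s > 1e-10):].T
pinv_stack = np.linalg.pinv(np.vstack([X0, U]))

def dd_input(x_now):
    """Horizon-T data-driven LQ input (Q = I, R = I) from the current state."""
    alpha = pinv_stack @ np.concatenate([x_now, np.zeros(m * T)])
    w = np.linalg.lstsq(np.vstack([Xs @ K0, U @ K0]),
                        np.concatenate([-Xs @ alpha, np.zeros(m * T)]),
                        rcond=None)[0]
    return U @ K0 @ w

# Receding-horizon loop: solve, apply the first h inputs, re-solve.
x = np.array([2.0, -1.0])
norms = [np.linalg.norm(x)]
for _ in range(15):
    u_plan = dd_input(x)
    for t in range(h):
        x = A @ x + B @ u_plan[t * m:(t + 1) * m]
    norms.append(np.linalg.norm(x))
print(norms[-1] < norms[0])   # the receding-horizon law regulates the state
```

Only the data-driven solve is repeated; no model of (A, B) is ever identified, even though the open loop is unstable.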

D. COMPARISON WITH SPLITTING METHODS
Several strategies have been proposed to solve MPC problems in a distributed fashion. Among these, splitting methods [35], [?] have been explored in model-based [45] and data-driven [28] settings. These consist in splitting one optimization problem into a family of smaller optimization problems. In the context of the problem setup (2), each agent needs to solve a local optimization problem, while iteratively exchanging information with other agents in order to converge to the global optimal solution. The accuracy of the approach is closely related to the number of iterations of the algorithm, a higher number of iterations leading to a more accurate solution. A known downside of these approaches is their slow convergence: typically, a large number of iterations is needed to converge to an acceptable solution (measured by its distance from the exact optimal solution). Recently, [28] proposed splitting (2) through a primal-dual flow. In Fig. 6 we compare the convergence properties of Algorithm 2 with a primal-dual algorithm based on [28]. In particular, we keep the problem formulation and data-collecting phase of [28], while modifying the cost function through the augmented Lagrangian method, which is known to improve the convergence speed of the flow. As is standard practice, we distribute the augmented Lagrangian through the alternating direction method of multipliers (ADMM), see [?, Chapter 8]. As highlighted in Fig. 6, Algorithm 2 returns a stabilizing controller even for higher noise values.

(Caption of Fig. 6: the experiments are performed with noiseless data from the same system, a randomly generated network with M = 3 agents, and varying state size. The comparison highlights the computational time-to-accuracy tradeoff of the two approaches with respect to a design parameter, namely the number of iterations for the splitting method and the parameter ε for Algorithm 2. Runtime is reported in seconds (lower is better), and the distance from the optimal input is computed as the norm of the difference between the solution and the exact optimal control computed through a model-based LQR solver (lower is better). Additional simulation parameters: T = 5, tol = 10^−8.)

E. COMPARISON WITH [30] WITH NOISY DATA
We now compare the performance of our approach with that of [30] when data are collected with noise. We consider the same system used in [30], where each subsystem in (7) is a two-dimensional linearized and discretized (Δt = 0.2) swing dynamics, with k_i = Σ_{j∈N_i} k_ij. We run the same experiment three times with Q = I and R = I: first assuming that ΔX = 0 (i.e., the noiseless case), second with ΔX ∼ N(0, σ²I) and σ² = 0.1, and finally with the same noise distribution but σ² = 5.0. We run our algorithm with the receding horizon implementation described in Algorithm 2, with h = 1. We notice from Fig. 7 that both our method and that of [30] perform well when ΔX = 0. Crucially, however, the convergence of [30] to a stabilizing controller is significantly slower than that of Algorithm 2 when data are corrupted by noise.

VI. CONCLUSION
In this paper we propose an algorithm to distributedly learn optimal controllers for an unknown network system through data. Our approach provably converges to a suboptimal solution in a finite number of steps, with a suboptimality gap that can be characterized as a function of the available data. Moreover, although data are distributed among multiple agents in the network, and communication between agents is therefore necessary to find a globally optimal control, this approach does not require agents to directly share trajectory data. We discuss how these features are attractive, especially when compared to alternative approaches in the literature. Finally, we characterize the robustness properties of our approach and show that we can bound, in probability, the error on the cost function of the control computed with corrupted data.

A. DATA-DRIVEN TRAJECTORIES OF (1)
We start by recalling a structural lemma, which appeared for input-output trajectories in [39], and is here adapted to input-state trajectories.

Lemma A.1 (Data-driven trajectories of (1)): Let (3) be the data generated by the system (1) with T ≥ n, and let x̄_T be the state trajectory of (1) generated by some initial condition and control input. Then,

x̄_T = X K_U α + X K_0 β,    (27)

for some vectors α and β.

Proof: Let x̄_0 and ū_T be the initial condition and input to (1). Since the matrices X_0 K_U and U K_0 are full-row rank (see Assumption II.1), there exist α and β such that

X_0 K_U α = x̄_0   and   U K_0 β = ū_T.

From (4) we have

x̄_T = O_T x̄_0 + C_T ū_T = O_T X_0 K_U α + C_T U K_0 β = X K_U α + X K_0 β,

where the last equality follows from (5), together with U K_U = 0 and X_0 K_0 = 0.

Lemma A.1 shows how any trajectory of (1) can be written as a linear combination of the available data. In particular, state trajectories are obtained in (27) as the sum of the free (X K_U α) and forced (X K_0 β) responses, which are reconstructed from data of arbitrary control experiments.
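Lemma A.1 can be checked numerically. The sketch below (our own illustration) builds K_U and K_0 from a random dataset and reconstructs a fresh trajectory, never seen in the experiments, as the sum of the free and forced responses of (27).

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, T, N = 3, 2, 4, 20

A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

def simulate(x0, u_seq):
    """Stacked trajectory col(x(0), ..., x(T)) of (1)."""
    xs, x = [x0], x0
    for t in range(T):
        x = A @ x + B @ u_seq[t * m:(t + 1) * m]
        xs.append(x)
    return np.concatenate(xs)

# Experimental data (3) and the stacked matrix X = col(X0, X_{1:T}).
X0 = rng.standard_normal((n, N))
U = rng.standard_normal((m * T, N))
X = np.column_stack([simulate(X0[:, e], U[:, e]) for e in range(N)])

def null_basis(Ma):
    _, s, vt = np.linalg.svd(Ma)
    return vt[np.sum(s > 1e-10):].T

KU, K0 = null_basis(U), null_basis(X0)   # K_U = Basis(Ker(U)), K_0 = Basis(Ker(X0))

# A new, arbitrary trajectory.
xbar0 = rng.standard_normal(n)
ubar = rng.standard_normal(m * T)
xbarT = simulate(xbar0, ubar)

# Lemma A.1: solve X0 KU a = xbar0 and U K0 b = ubar (full row rank by
# Assumption II.1), then xbarT = X KU a + X K0 b.
a = np.linalg.pinv(X0 @ KU) @ xbar0
b = np.linalg.pinv(U @ K0) @ ubar
print(np.allclose(xbarT, X @ KU @ a + X @ K0 @ b))   # True
```

The two terms are exactly the free and forced responses: multiplying the data relation (5) by K_U kills the input contribution, and by K_0 kills the initial-condition contribution.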

B. PROOF OF THEOREM III.1
Consider the following problem:

γ* = arg min_γ γ^T P γ subject to [I 0] γ = x̄_0, (29)

where P is as in (10). Note that the solution to the above problem is γ* = col(x̄_0, u*_T), with u*_T being the solution to (2). This follows from (4) and from Assumption II.1. If P is invertible, the change of variables v_T = P^{1/2} γ transforms (29) into a minimum-norm problem whose solution is v*_T = ([I 0] P^{-1/2})† x̄_0. Thus, the solution to (29) is

γ* = P^{-1/2} ([I 0] P^{-1/2})† x̄_0. (32)

We show that if Q^{1/2} O_T has full column rank, then P ≻ 0, which in turn implies that P is invertible. To this end, recall the decomposition of P in (30). Since O_T^T Q O_T ≻ 0 (which directly follows from the assumption that Q^{1/2} O_T has full column rank) and P_1 ⪰ 0, the Schur complement of the block O_T^T Q O_T of P_1 must be positive semidefinite [?, Theorem 1.12]. Next, since S ⪰ 0 and R ≻ 0, the Schur complement of the block O_T^T Q O_T of P, namely S + R, is positive definite. Since the block O_T^T Q O_T of P and its Schur complement are both positive definite, it follows that P ≻ 0 [?, Theorem 1.12].

We complement the proof by noticing that, for P not invertible and for any Q ⪰ 0, problem (29) can still be solved through a pseudoinverse-based parameterization of its minimizers in terms of a vector w of appropriate size, from which a solution alternative to (32) can be found.
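A quick numerical sanity check of the closed-form minimizer v*_T = ([I 0]P^{-1/2})† x̄_0 used above: a sketch with a randomly generated P ≻ 0, compared against the standard Lagrange-multiplier solution of an equality-constrained quadratic program (the latter formula is textbook material, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2, 6                               # constraint rows, variable size
M = rng.standard_normal((d, d))
P = M @ M.T + np.eye(d)                   # a random P ≻ 0
E = np.hstack([np.eye(n), np.zeros((n, d - n))])   # the selector [I 0]
x0 = rng.standard_normal(n)

# Symmetric square root of P (and its inverse) via eigendecomposition.
w, V = np.linalg.eigh(P)
P_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Candidate from the proof: γ* = P^{-1/2} ([I 0] P^{-1/2})† x̄_0.
gamma_proof = P_half_inv @ np.linalg.pinv(E @ P_half_inv) @ x0

# Baseline via Lagrange multipliers: γ* = P^{-1} Eᵀ (E P^{-1} Eᵀ)^{-1} x̄_0.
Pinv = np.linalg.inv(P)
gamma_kkt = Pinv @ E.T @ np.linalg.solve(E @ Pinv @ E.T, x0)

assert np.allclose(gamma_proof, gamma_kkt)
assert np.allclose(E @ gamma_proof, x0)   # constraint is satisfied
```

Both expressions pick the unique minimizer of the strictly convex problem, so they must coincide whenever P ≻ 0.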

C. PROOF OF LEMMA III.2
From Lemma A.1 we can write x̄_T = X K_U ᾱ + X K_0 β̄, with x̄_0 = X_0 K_U ᾱ and ū_T = U K_0 β̄. Equivalently, we can write x̄_T = X α + X β, where α = K_U ᾱ (i.e., U α = 0) and β = K_0 β̄ (i.e., X_0 β = 0); consequently, x̄_0 = X_0 α, ū_T = U β, and the cost function of (12) equals that of (2). When u*_T is the solution to (2) and (α*, β*) is the solution to (12), it is then straightforward to see that u*_T = U β*. We can further derive an explicit form of the optimal vectors α* and β*, instrumental to other results in the paper. By substituting the constraints of (12) into the cost function, and for H, Y, and x̄_0 as in (17), problem (12) can be written as in (34), where K_Y = Basis(Ker(Y)). The minimizers of (34) define the set of optimal vectors (α*, β*), parameterized by a vector r ∈ Ker(H) ⊆ Ker(U). The resulting, unique, optimal control u_T = U β* is the solution to (2).

E. PROOF OF THEOREM III.4
The proof follows an argument similar to that of [46, Theorem 3.3]. Let γ_i be the estimate of agent i, let W_i be defined as in (15), and let K_i = Basis(Ker(W_i)). Observe that γ_i = W_i† col(x_{i,0}, 0, 0, 0, 0) ⊥ Ker(W_i). Let i and j be two neighboring agents, i.e., (i, j) ∈ E_c; then there exist two vectors κ_i and κ_j such that γ_i + K_i κ_i = γ_j + K_j κ_j, and such vectors can be computed by the agents from γ_i, γ_j, K_i, and K_j. Substituting κ_i back into γ_i yields the updated vector γ_i^+ = γ_i + K_i κ_i, which verifies the measurements of both agents (otherwise the hypothesis would be contradicted). Since γ_i ⊥ Im(K_i), we can conclude that γ_i^+ ⊥ (Im(K_i) ∩ Im(K_j)). The theorem follows by noticing that, after a number of steps equal to the diameter of G_c, each vector γ_i verifies all the measurements, since we will have γ_i ⊥ ⋂_j Im(K_j). Finally, we remark that the solution we are interested in involves only the second block of N elements of γ_i, corresponding to β_i.
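The fusion mechanism in the proof can be illustrated numerically. The sketch below is not the paper's algorithm: it uses hypothetical synthetic measurements W_i γ = b_i on a three-agent path graph, represents each agent's solution set as the affine set γ_i + Im(K_i), and applies the pairwise update γ_i^+ = γ_i + K_i κ_i together with an intersection of the kernel bases; after diameter-many synchronous rounds every agent recovers the globally consistent solution.

```python
import numpy as np

def null_basis(M, tol=1e-10):
    """Orthonormal basis of Ker(M), one column per kernel dimension."""
    _, s, Vt = np.linalg.svd(M)
    return Vt[int((s > tol).sum()):].T

def fuse(gi, Ki, gj, Kj):
    """Intersect the affine sets g_i + Im(K_i) and g_j + Im(K_j):
    solve K_i κ_i - K_j κ_j = g_j - g_i, update γ_i^+ = γ_i + K_i κ_i,
    and return a basis of Im(K_i) ∩ Im(K_j)."""
    ci = Ki.shape[1]
    S = np.hstack([Ki, -Kj])
    z, *_ = np.linalg.lstsq(S, gj - gi, rcond=None)
    M = null_basis(S)
    return gi + Ki @ z[:ci], Ki @ M[:ci]

rng = np.random.default_rng(3)
d = 6
gamma_true = rng.standard_normal(d)

# Path graph 1 - 2 - 3 (diameter 2); each agent holds two linear measurements.
W = [rng.standard_normal((2, d)) for _ in range(3)]
b = [Wi @ gamma_true for Wi in W]
nbrs = {0: [1], 1: [0, 2], 2: [1]}

# Initial estimates: minimum-norm solutions, so γ_i ⊥ Ker(W_i).
state = [(np.linalg.pinv(Wi) @ bi, null_basis(Wi)) for Wi, bi in zip(W, b)]

for _ in range(2):                        # diameter-many synchronous rounds
    old = state
    state = []
    for i in range(3):
        gi, Ki = old[i]
        for j in nbrs[i]:                 # fuse with each neighbor's old set
            gi, Ki = fuse(gi, Ki, *old[j])
        state.append((gi, Ki))

assert all(np.allclose(g, gamma_true) for g, _ in state)
```

Here the stacked measurements determine γ uniquely, so the global intersection is the single point γ_true and all agents reach it in two rounds.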

G. CHARACTERIZING BOUNDS FOR ΔJ
Since F(U, X_0, X̃_{1:T}) is Fréchet-differentiable with respect to X_{1:T} [41], we rewrite F as in (20), which we write in compact form as F̃. We use ΔG = G̃ − G as a proxy for the optimality error ΔJ in (21). In particular, in the following we study bounds on ΔG, which we can then easily express as a function of ΔJ. To do this we need the following lemma. Lemma G.1 (Relating ΔG and ΔJ): It holds that ΔJ = ‖ΔG‖² almost surely (a.s.).
Proof: We recall that ΔJ = J̃ − J = |G̃^T G̃ − G^T G|. Condition (43) holds when G̃^T G = G^T G, in which case ΔJ = |ΔG^T ΔG| = ‖ΔG‖². We can therefore conclude that (44) holds. Lemma G.2 confirms that when the expected value of the perturbation on X_{1:T} is small, so is the residual r of the expansion (41). In the following we assume V^{1/2} r = 0. We highlight that the vector ∇F_{X,i}, consisting of the partial derivatives of the map F with respect to the entries of X_{1:T}, captures the sensitivity of the data-driven control inputs to perturbations of X_{1:T}.
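A one-line numerical check of the mechanism behind Lemma G.1, with hypothetical vectors G and ΔG, where ΔG is constructed orthogonal to G so that the reconstructed condition G̃^T G = G^T G holds.

```python
import numpy as np

rng = np.random.default_rng(4)
G = rng.standard_normal(5)

# Build a perturbation ΔG orthogonal to G: then G̃ᵀG = GᵀG, and
# ΔJ = |G̃ᵀG̃ - GᵀG| collapses to ‖ΔG‖².
v = rng.standard_normal(5)
dG = v - (v @ G) / (G @ G) * G            # project out the G-component
G_tilde = G + dG

dJ = abs(G_tilde @ G_tilde - G @ G)       # |G̃ᵀG̃ - GᵀG|
assert np.isclose(dJ, dG @ dG)            # ΔJ = ‖ΔG‖²
```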
The results that follow are given as a function of ΔG, but can easily be given as a function of ΔJ by noticing that P[x² > τ] = P[x > √τ] when x > 0.
is a constant independent of N. Thus, from (56), ‖∇F^(2)_{X,i}‖ ≤ k^(2)_{X,i}/N, where k^(2)_{X,i} > 0 does not depend on N. Finally, from (51) and ‖∇F_{X,i}‖ ≤ ‖∇F^(1)_{X,i}‖ + ‖∇F^(2)_{X,i}‖, we conclude that ‖∇F_{X,i}‖ ≤ k_{X,i}/N, where k_{X,i} is independent of N.

K. PROOF OF THEOREM IV.5
The claim follows from Theorem IV.2, where in the second step we use that c_{X,i} = ‖V^{1/2} ∇F_{X,i}‖ ≤ ‖V^{1/2}‖ ‖∇F_{X,i}‖. Since the distributions of the δ_{X,i} are independent of N, so are the E[|δ_{X,i}|]. Hence, by Lemma IV.4, it follows that max_i {‖∇F_{X,i}‖ E[|δ_{X,i}|]} ≤ max_i ‖∇F_{X,i}‖ max_i E[|δ_{X,i}|] ≤ κ_{X,i}/N, where κ_{X,i} is independent of N and we used the assumption E[|δ_{X,i}|] < ∞. If supp(Δ_X) is a sublinear function of N, (26) follows from the latter inequalities and (58).