Exact Decomposition of Optimal Control Problems via Simultaneous Block Diagonalization of Matrices

In this paper, we consider optimal control problems (OCPs) applied to large-scale linear dynamical systems with a large number of states and inputs. We attempt to reduce such problems into a set of independent OCPs of lower dimensions. Our decomposition is ‘exact’ in the sense that it preserves all the information about the original system and the objective function. Previous work in this area has focused on strategies that exploit symmetries of the underlying system and of the objective function. Here, instead, we implement the algebraic method of simultaneous block diagonalization of matrices (SBD), which we show provides advantages both in terms of the dimension of the subproblems that are obtained and of the computation time. We provide practical examples with networked systems that demonstrate the benefits of applying the SBD decomposition over the decomposition method based on group symmetries.


I. INTRODUCTION
Optimal control of high-dimensional dynamical systems and networks has recently attracted much attention from the scientific community, see e.g., [1], [2], [3], [4], [5], [6], [7], [8]. A major issue with computing the solution of optimal control problems with many states and inputs is the computational complexity of the matrix operations that must be performed. In addition, running these operations in parallel is often nontrivial. Here we propose a novel method to decouple large optimal control problems into independent subproblems of the lowest dimension, which also provides a way to parallelize the solution. A common strategy for dealing with high dimensionality is model order reduction, which approximates the original system with one of lower dimension [16], [17]. Typically, this implies that information about the dominant modes/components of the system is preserved, while information about the non-dominant modes/components is removed. Our approach in this paper is instead to consider decompositions that are 'exact' in the sense that all the information about the original system and the original objective function is preserved.
Several studies have exploited symmetry to decouple linear OCPs involving large-scale systems into simpler subsystems. For example, Ref. [32] used symmetry to reduce the computational complexity involved in the controller synthesis of a linear system. Ref. [33] analyzed networked systems with symmetries in their interconnection topology and obtained a dimension reduction in their certification performance. Ref. [34] exploited symmetries in interior-point algorithms involved in the solution of optimization problems. References [35], [36], [37] used symmetry to reduce the computational complexity required by the solution of model predictive control problems. While all these papers have exploited symmetry to reduce the computational complexity involved in the solution of problems of a different nature, in this paper we recognize that dimensionality reductions are sometimes possible even in systems that do not possess symmetry. As an example of this, the reader may consider the distinction between 'orbit partitions' and 'equitable partitions' in graph theory [38], where the former are generated by symmetry, while the latter need not be. References [26], [32], [36], [39], [40], [41] have focused on a decomposition based on the symmetries of the system and, if applicable, of the objective function, which requires calculation of the irreducible representations (IRR) of the symmetry group. The goal of this paper is to compare the symmetry approach with an alternative symmetry-independent approach, which we will show yields better and faster decompositions.
In particular, we present a decomposition strategy based on the techniques for simultaneous block diagonalization (SBD) of matrices, which is exact in the sense that it preserves all the available information about the system and the objective function (in contrast to model reduction methods) and requires fewer assumptions in comparison with the group theory approach. One of our examples shows that the SBD approach can decompose an OCP into smaller subproblems than the symmetry-based decomposition. Furthermore, the decoupled OCPs resulting from the application of the SBD transformation achieve the finest block structure; hence, the SBD transformation is guaranteed to reduce the original problem into subproblems of the lowest dimension. Computing the decomposition with our method is also typically faster than with the symmetry approach.
In Section II, we introduce the preliminaries that are used throughout this paper. In Section III, we discuss the fundamentals of the SBD problem and briefly describe how to find the proper transformation matrix. In Section IV, we decouple the OCP for large-scale systems into a set of lower-dimensional OCPs. In Section V, we discuss how the SBD approach can be applied to a large-scale system in the form of a network of coupled sub-systems. In Section VI, we demonstrate the effectiveness of our method when compared to the symmetry-based approach by using three examples. Finally, the conclusions are presented in Section VII.

II. PRELIMINARIES
Here, we introduce the mathematical notation that will be used throughout this paper. We denote by N( ⋅ ) the null space of a matrix.

Definition 1:
The vectorizing function vec : ℝ n × m → ℝ nm takes a matrix as input and returns a vector by stacking the columns of the matrix on top of each other.
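For instance (using NumPy, whose default flattening is row-major, so the column-major order required by vec must be requested explicitly):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# vec stacks the columns of A on top of each other; in NumPy this is
# flattening in column-major ("Fortran") order.
vec_A = A.flatten(order="F")
print(vec_A)  # [1 3 2 4]
```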

Given two matrices A ∈ ℝ n × m and B ∈ ℝ p × q , the Kronecker product A ⊗ B ∈ ℝ np × mq and the direct sum A ⊕ B ∈ ℝ (n + p) × (m + q) are defined as follows: A ⊗ B = [a_ij B] is the block matrix whose (i, j)-th block is a_ij B, and A ⊕ B = [A 0; 0 B] is the block-diagonal matrix with A and B on the main diagonal. An important property of the Kronecker product is the mixed-product property: taking two square matrices C and D with the same dimension m, and two other square matrices E and F with the same dimension n, the nm-dimensional square matrices satisfy (C ⊗ E)(D ⊗ F) = (CD) ⊗ (EF). A property of the direct sum is that if G and H are square and invertible matrices, then (G ⊕ H)⁻¹ = G⁻¹ ⊕ H⁻¹.
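Both properties are easy to check numerically; a minimal NumPy sketch (random matrices; the helper `direct_sum` is our own):

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = rng.standard_normal((2, 3, 3))   # two square matrices of dimension m = 3
E, F = rng.standard_normal((2, 2, 2))   # two square matrices of dimension n = 2

# Mixed-product property: (C ⊗ E)(D ⊗ F) = (CD) ⊗ (EF).
lhs = np.kron(C, E) @ np.kron(D, F)
rhs = np.kron(C @ D, E @ F)
assert np.allclose(lhs, rhs)

# Direct sum as a block-diagonal matrix, and (G ⊕ H)^(-1) = G^(-1) ⊕ H^(-1).
def direct_sum(G, H):
    Z1 = np.zeros((G.shape[0], H.shape[1]))
    Z2 = np.zeros((H.shape[0], G.shape[1]))
    return np.block([[G, Z1], [Z2, H]])

G = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # well-conditioned, invertible
H = np.eye(2) + 0.1 * rng.standard_normal((2, 2))
assert np.allclose(np.linalg.inv(direct_sum(G, H)),
                   direct_sum(np.linalg.inv(G), np.linalg.inv(H)))
```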

III. BACKGROUND
In this section, the SBD decomposition is introduced, and an algorithm to calculate the similarity transformation, which we will use throughout this paper, is provided.

Definition 2.
Simultaneous Block Diagonalization of Matrices (SBD) [43], [44], [45], [46]: Given a set of M square n-dimensional matrices S = {A_1, A_2, …, A_M}, an SBD transformation is an orthogonal square matrix T (hereafter, transformation matrix) of dimension n such that

T ⊤ A_k T = B_1^k ⊕ B_2^k ⊕ ⋯ ⊕ B_l^k ,  k = 1, …, M,

where the symbol ⊕ is the direct sum of matrices, l denotes the number of blocks, each block B_j^k is a square matrix of dimension b_j , and ∑_{j=1}^{l} b_j = n.

Definition 3.
Finest SBD: The finest SBD is an SBD for which the resulting blocks cannot be further refined by any other similarity transformation [43].
Maehara, Murota et al. proposed algebraic techniques to simultaneously block diagonalize a set of matrices in [44], [45], [46]. Their approach is to find a basis that block-diagonalizes the matrix *-algebra generated by S. Among other applications, the SBD approach has been used to decouple the stability analysis of networks of oscillators, see for example [47], [48], [49], [50]. We present below Algorithm 1, introduced in [46], to find the transformation T. In Section I of the Supplementary Information, we discuss the effects of the sparsity level of the input matrices on the computation time of Algorithm 1.
The mathematical proofs that support Algorithm 1 can be found in [46]. References [46], [47], [48] also provide MATLAB implementations of numerical SBD algorithms.
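As a concrete (simplified) illustration of how such a transformation can be computed, the following NumPy sketch implements a commutant-based SBD in the spirit of Algorithm 1: it builds a random symmetric matrix U that commutes with every matrix in S and takes T as the eigenvector matrix of U. This is our own minimal re-implementation for symmetric inputs, not the code of [46]; the function name and tolerance are our choices.

```python
import numpy as np

def sbd_transform(mats, seed=0, tol=1e-10):
    """Sketch of a commutant-based SBD for a set of symmetric matrices.

    A random symmetric matrix U commuting with every A in `mats` is built
    from the null space of the vectorized commutation constraints
    A U - U A = 0; the eigenvectors of U then give an orthogonal T such
    that each T.T @ A @ T is block diagonal (up to a permutation of the
    eigenvector ordering)."""
    n = mats[0].shape[0]
    # vec(A U - U A) = (I ⊗ A - A^T ⊗ I) vec(U): stack one block per matrix.
    M = np.vstack([np.kron(np.eye(n), A) - np.kron(A.T, np.eye(n))
                   for A in mats])
    _, s, Vt = np.linalg.svd(M)
    basis = Vt[np.sum(s > tol):]              # basis of the commutant of S
    rng = np.random.default_rng(seed)
    U = (basis.T @ rng.standard_normal(basis.shape[0])).reshape(n, n)
    U = (U + U.T) / 2                         # symmetrize; stays in the
                                              # commutant for symmetric mats
    _, T = np.linalg.eigh(U)                  # orthogonal transformation T
    return T
```

For sets of symmetric matrices that admit a common block structure, grouping the columns of T according to the supports of T⊤A_kT then reveals the common blocks.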

IV. DECOUPLING OPTIMAL CONTROL PROBLEMS
Consider the OCP

min_u J = 1/2 ∫_0^{t_f} [x ⊤ (t) Q x(t) + u ⊤ (t) R u(t) + 2 x ⊤ (t) F u(t)] dt  (2a)
subject to ẋ(t) = A x(t) + B u(t),  (2b)

where x ∈ ℝ n is the vector of the system states, u ∈ ℝ m (n ≥ m) is the vector of the system inputs, A ∈ ℝ n × n , B ∈ ℝ n × m , Q ∈ ℝ n × n , Q ≻ 0, R ∈ ℝ m × m , R ≻ 0, and F ∈ ℝ n × m . We recast (2) into the equivalent OCP

min_u J = 1/2 ∫_0^{t_f} [x ⊤ (t) Q x(t) + u ⊤ (t) R u(t) + 2 x ⊤ (t) F u(t)] dt  (3a)
subject to ẋ(t) = A x(t) + B u(t),  (3b)

where now all the matrices A, B, Q, R, F ∈ ℝ n × n , and both vectors x ∈ ℝ n and u ∈ ℝ n . In (3) we set

B ← [B 0],  F ← [F 0],  R ← R ⊕ R_O ,  u(t) ← [u ⊤ (t) u_O ⊤ (t)] ⊤ ,

where the additional input vector u_O(t) ∈ ℝ n − m is appended to u(t) so that u ∈ ℝ n , and R_O ∈ ℝ (n − m) × (n − m) is an arbitrary positive definite matrix. The additional inputs do not affect the dynamics since their corresponding coefficients in B and F are zero.

Remark 1:
The solution u*(t) of the OCP with (2) coincides with the solution u*(t) of the OCP with (3). This follows from the fact that the particular choice of R_O and u_O is independent of the original OCP: the modified OCP breaks into two independent OCPs, one in u and one in u_O .
The orthogonal similarity transformation matrix T is applied to decouple the original large problem into a set of L lower-dimensional problems by simultaneously block diagonalizing the following matrices,

T ⊤ A T = ⊕_{l=1}^{L} A_l ,  T ⊤ B T = ⊕_{l=1}^{L} B_l ,  T ⊤ Q T = ⊕_{l=1}^{L} Q_l ,  T ⊤ R T = ⊕_{l=1}^{L} R_l ,  T ⊤ F T = ⊕_{l=1}^{L} F_l .  (6)

The blocks A_l , B_l , Q_l , R_l , F_l all have the same size n_l , with ∑_{l=1}^{L} n_l = n. Equation (3b) is pre-multiplied by T ⊤ ,

T ⊤ ẋ(t) = T ⊤ A T T ⊤ x(t) + T ⊤ B T T ⊤ u(t).  (7)

By defining z(t) := T ⊤ x(t) and v(t) := T ⊤ u(t), and using the results from (6) and (7), (3b) is rewritten

ż(t) = (⊕_{l=1}^{L} A_l) z(t) + (⊕_{l=1}^{L} B_l) v(t).  (8)

The cost J from (3a) is rewritten in terms of the new variables by setting x(t) = T z(t) and u(t) = T v(t),

J = 1/2 ∫_0^{t_f} [z ⊤ (t) (⊕_{l=1}^{L} Q_l) z(t) + v ⊤ (t) (⊕_{l=1}^{L} R_l) v(t) + 2 z ⊤ (t) (⊕_{l=1}^{L} F_l) v(t)] dt.

So, the transformed OCP is

min_v J subject to (8).  (9)

Lemma 1:
The large dimensional optimization problem from (9) decouples into the set of L smaller independent sub-problems

min_{v_l} J_l = 1/2 ∫_0^{t_f} [z_l ⊤ Q_l z_l + v_l ⊤ R_l v_l + 2 z_l ⊤ F_l v_l] dt subject to ż_l(t) = A_l z_l(t) + B_l v_l(t),  l = 1, …, L.  (10)

Proof: After applying the transformation T, the OCP from (3) is transformed into (9), whose cost satisfies J = ∑_{l=1}^{L} J_l with J_l = 1/2 ∫_0^{t_f} [z_l ⊤ Q_l z_l + v_l ⊤ R_l v_l + 2 z_l ⊤ F_l v_l] dt. For all i, j, i ≠ j, J_i and J_j are independent from one another, with J_i = J_i(z_i , v_i) ≥ 0 and J_j = J_j(z_j , v_j) ≥ 0. Also, the states z_i , z_j and the inputs v_i , v_j are decoupled, so we conclude that the large OCP is decoupled into L independent OCPs.
This concludes the proof.

Remark 2:
The dimension of the controllable and observable subspaces and the open-loop poles of the transformed system (8) are the same as those of the system (3).
This is due to the fact that we have obtained the transformed system by application of a similarity transformation, which preserves the rank and the eigenvalues of the matrices [51].
We have provided an algebraic method to decouple a large OCP into a set of L smaller OCPs. There are two advantages to this decomposition. The first one is that the lower-dimension OCPs are easier to solve, since the algebraic operations have lower computational complexity for lower-dimension matrices; the second one is that the lower-dimension OCPs can be solved in parallel (since they are independent of one another.) Next, we discuss in detail how the closed-form solution of the sample OCP (3) is affected by the SBD transformation. The Hamiltonian for the OCP (3) is

H(x, u, λ) = 1/2 [x ⊤ Q x + u ⊤ R u + 2 x ⊤ F u] + λ ⊤ [A x + B u],  (11)

where λ(t) ∈ ℝ n is the time-varying costate vector. By solving the following equations simultaneously, the optimal solution can be found,

ẋ(t) = ∂H/∂λ = A x(t) + B u(t),  (12a)
λ̇(t) = −∂H/∂x = −Q x(t) − F u(t) − A ⊤ λ(t),  (12b)
0 = ∂H/∂u = R u(t) + F ⊤ x(t) + B ⊤ λ(t).  (12c)

From (12c), we obtain:

u*(t) = −R⁻¹ [F ⊤ x(t) + B ⊤ λ(t)].  (13)

Now, (12a) and (12b) should be solved for x(t) and λ(t). By assuming λ(t) = P x(t) + ξ(t), with P constant, we write

ξ̇(t) = −[A − B R⁻¹ (B ⊤ P + F ⊤)] ⊤ ξ(t) − 𝒬 x(t),  (14a)
ẋ(t) = [A − B R⁻¹ (B ⊤ P + F ⊤)] x(t) − B R⁻¹ B ⊤ ξ(t).  (14b)

To decouple x and ξ, we find P that sets 𝒬 = 0. Note that 𝒬 = 0 is the standard form of the algebraic Riccati equation in the matrix P [52],

𝒬 ≜ P A + A ⊤ P − (P B + F) R⁻¹ (B ⊤ P + F ⊤) + Q = 0.  (15)

After solving (15) for P, (14a) can be solved for ξ and then (14b) for x, independently; the analytical expression for x(t) also involves the n × n controllability Gramian W defined in (16). Finding u*(t) is now straightforward: by solving (15) for P, and then solving (14) for x(t) and ξ(t) to evaluate λ(t), we can substitute back in (13) to find u*(t).
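The stationarity condition (12c) and the resulting expression (13) admit a quick numerical sanity check (randomly generated problem data; all names here are our own): a finite-difference gradient of the Hamiltonian with respect to u vanishes at u*.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
F = rng.standard_normal((n, n))
Qm = np.eye(n)                    # Q positive definite
R = 2.0 * np.eye(n)               # R positive definite

x = rng.standard_normal(n)
lam = rng.standard_normal(n)      # costate sample

def hamiltonian(u):
    # H = 1/2 (x'Qx + u'Ru + 2 x'Fu) + lam' (Ax + Bu), as in (11).
    return (0.5 * (x @ Qm @ x + u @ R @ u + 2 * x @ F @ u)
            + lam @ (A @ x + B @ u))

# Candidate optimum from (13): u* = -R^{-1} (F' x + B' lam).
u_star = -np.linalg.solve(R, F.T @ x + B.T @ lam)

# Central finite-difference gradient of H at u*; should vanish.
eps = 1e-6
grad = np.array([(hamiltonian(u_star + eps * e) - hamiltonian(u_star - eps * e))
                 / (2 * eps) for e in np.eye(n)])
print(np.max(np.abs(grad)))       # ≈ 0 (up to roundoff)
```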
In general, solving (15) and (16) can be computationally challenging for large n. Using the SBD decomposition, the n × n algebraic Riccati equation from (15) and the n × n controllability Gramian from (16) are broken into L smaller independent equations in the blocks of the matrices P̄ = ⊕_{l=1}^{L} P_l and W̄ = ⊕_{l=1}^{L} W_l ,

P_l A_l + A_l ⊤ P_l − (P_l B_l + F_l) R_l⁻¹ (B_l ⊤ P_l + F_l ⊤) + Q_l = 0,  l = 1, …, L.  (17)

The computational complexities associated with solving the Riccati equation and computing the controllability Gramian are O_R(n³) [53] and O_G(n⁴) [54], respectively. By means of the decomposition, the overall complexity reduces to ∑_{l=1}^{L} O_R(n_l³) ≤ O_R(n³) and ∑_{l=1}^{L} O_G(n_l⁴) ≤ O_G(n⁴), where the equality corresponds to the case L = 1.
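The effect of the decomposition on the Riccati equation can be verified numerically. The sketch below (our own toy example, with the cross term F set to zero for simplicity) checks with SciPy that, for block-diagonal problem data, the stabilizing solution of the full continuous-time ARE is the direct sum of the blockwise solutions:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, block_diag

# Two independent blocks (A_l, B_l, Q_l, R_l), l = 1, 2.
A1, B1 = np.array([[0., 1.], [0., 0.]]), np.eye(2)
Q1, R1 = np.eye(2), np.eye(2)
A2, B2 = np.array([[-1.]]), np.array([[1.]])
Q2, R2 = np.array([[1.]]), np.array([[1.]])

# Blockwise Riccati solutions P_l ...
P1 = solve_continuous_are(A1, B1, Q1, R1)
P2 = solve_continuous_are(A2, B2, Q2, R2)

# ... versus the solution of the full n x n Riccati equation.
P_full = solve_continuous_are(block_diag(A1, A2), block_diag(B1, B2),
                              block_diag(Q1, Q2), block_diag(R1, R2))

# The full solution is the direct sum of the blockwise solutions.
assert np.allclose(P_full, block_diag(P1, P2))
```

The scalar second block can even be checked by hand: its ARE reads −2p − p² + 1 = 0, whose stabilizing root is p = √2 − 1.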

Remark 3:
The solution P̄ = ⊕_{l=1}^{L} P_l of (17) can be transformed back to obtain the solution of (15), i.e., P = T P̄ T ⊤ .
To show this, consider (17) and construct P̄, which satisfies

P̄ (T ⊤ A T) + (T ⊤ A T) ⊤ P̄ − (P̄ T ⊤ B T + T ⊤ F T)(T ⊤ R T)⁻¹ ((T ⊤ B T) ⊤ P̄ + (T ⊤ F T) ⊤) + T ⊤ Q T = 0.  (21)

After pre-multiplying and post-multiplying (21) by T and T ⊤ , respectively, and using T ⊤ T = I, we get

(T P̄ T ⊤) A + A ⊤ (T P̄ T ⊤) − ((T P̄ T ⊤) B + F) R⁻¹ (B ⊤ (T P̄ T ⊤) + F ⊤) + Q = 0.  (22)

Hence, P = T P̄ T ⊤ is the transformed solution of the original Riccati equation (15). The controllability Gramian also reduces to smaller-dimension Gramians after the application of the transformation T; a similar derivation for W̄ = ⊕_{l=1}^{L} W_l shows that W = T W̄ T ⊤ .

Remark 4:
We note here that the case of the infinite horizon OCP can be decomposed in a similar fashion, see Section II of the Supplementary Information.
To conclude, the large OCP in (3) has been decoupled into a set of smaller OCPs in (10) by the simultaneous application of the orthogonal similarity transformation matrix T to the system dynamics and the quadratic cost. By solving the lower dimensional OCPs separately, the control law for the large system is then recovered.

V. DECOUPLING NETWORKED SYSTEMS
Networked systems are formed of individual systems coupled together to form a network. Many real-world systems are naturally described as networked systems. For instance, social networks, power grids, and computer networks are all examples of networked systems. A large literature has dealt with optimal control of networked systems [55], [56], [57], [58], [59], [60], [61], [62], [63].
In this paper, we consider a networked system defined by the graph G(V, ℰ), where V = {1, …, N} is the set of N nodes/vertices and ℰ ⊆ V × V is the set of edges, i.e., ordered pairs of nodes (i, j) ∈ V × V. Each node of the graph is a subsystem, and an edge between two nodes represents coupling between them. We also assume that the graph G is heterogeneous, i.e., it contains different types of nodes. Thus, the set of nodes V can be partitioned into S clusters C_1 , …, C_S formed of nodes of the same type. The dynamics of this networked system, consisting of N n-dimensional interacting sub-systems, obeys [64]

ẋ_i(t) = A_{i*} x_i(t) + B_{i*} u_i(t) + ∑_{j=1}^{N} G_ij H_{j*} x_j(t),  (23)

i = 1, …, N and i* = 1, …, S. The subscript i* labels the unique cluster C_{i*} to which node i belongs; hence, i* also identifies the dynamical matrices A_{i*} , B_{i*} , and H_{i*} for the sub-system i. The term A_{i*} x_i(t), A_{i*} ∈ ℝ n × n , describes the dynamics of each sub-system; the term B_{i*} u_i(t), B_{i*} ∈ ℝ n × m , describes the effect of the control signals on each subsystem; and the term ∑_{j=1}^{N} G_ij H_{j*} x_j(t), with G ∈ ℝ N × N and H_{j*} ∈ ℝ n × n , is the overall coupling that sub-system i receives from the other sub-systems in the network. The symmetric matrix G = [G_ij] is the adjacency matrix that describes how the individual network systems are coupled to one another, namely, if systems i and j are coupled, G_ij = G_ji ≠ 0; otherwise, G_ij = G_ji = 0.

Definition 4:
We define the N-dimensional indicator matrix E_s as the diagonal matrix with entries

(E_s)_ii = 1 if node i ∈ C_s , and (E_s)_ii = 0 otherwise.  (24)

Given a general networked system represented by the graph G and the system dynamics (23), one can follow the 4 steps below to decouple the system into multiple OCPs and solve them in parallel to reduce the computation time.

1) Obtain the State Space Representation:
The first step is to convert the dynamics of the system to a standard state space form. We do so by first reordering the system states. Without loss of generality, the first N_1 sub-systems are set to be of type 1, the next N_2 sub-systems to be of type 2, etc. This results in the following indicator matrices E_s for s = 1, …, S,

E_1 = I_{N_1} ⊕ 0_{N_2} ⊕ ⋯ ⊕ 0_{N_S} ,  E_2 = 0_{N_1} ⊕ I_{N_2} ⊕ ⋯ ⊕ 0_{N_S} ,  …,  E_S = 0_{N_1} ⊕ 0_{N_2} ⊕ ⋯ ⊕ I_{N_S} .  (25)

The re-ordered system can now be easily converted to the desired form. Using (25), (23) can be rewritten in the following compact form,

Ẋ(t) = [∑_{s=1}^{S} (E_s ⊗ A_s + (G E_s) ⊗ H_s)] X(t) + [∑_{s=1}^{S} E_s ⊗ B_s] U(t),  (26)

where X(t) = [x_1 ⊤ (t), …, x_N ⊤ (t)] ⊤ and U(t) = [u_1 ⊤ (t), …, u_N ⊤ (t)] ⊤ . It can be seen that (26) is a standard state space representation with the vector X representing the states and the vector U representing the inputs.
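This assembly of the compact form can be verified numerically. The small self-contained NumPy sketch below (cluster sizes, matrices, and names are all illustrative assumptions of ours) checks that the Kronecker-assembled state matrix reproduces the node-wise coupling sum of (23), here without the input term:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2                                  # sub-system dimension
N1, N2 = 2, 3                          # cluster sizes; N = 5 nodes
N = N1 + N2
types = [0] * N1 + [1] * N2            # node i belongs to cluster types[i]

A = [rng.standard_normal((n, n)) for _ in range(2)]   # A_s, s = 1, 2
H = [rng.standard_normal((n, n)) for _ in range(2)]   # H_s, s = 1, 2
G = rng.standard_normal((N, N))
G = (G + G.T) / 2                      # symmetric coupling/adjacency matrix

# Indicator matrices E_s: diagonal 0/1 matrices marking each cluster.
E = [np.diag([1.0 if t == s else 0.0 for t in types]) for s in range(2)]

# Kronecker-assembled state matrix of the compact form.
A_big = sum(np.kron(E[s], A[s]) + np.kron(G @ E[s], H[s]) for s in range(2))

# Node-wise right-hand side of (23) (without inputs) for a random state X.
X = rng.standard_normal(N * n)
x = X.reshape(N, n)                    # x[i] is the state of node i
rhs = np.vstack([A[types[i]] @ x[i]
                 + sum(G[i, j] * (H[types[j]] @ x[j]) for j in range(N))
                 for i in range(N)]).reshape(-1)

assert np.allclose(A_big @ X, rhs)
```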

2) Define the Cost Function:
The second step is to define a cost function depending on the particular OCP. Given that the system is in the standard state space representation, a quadratic cost function can be written as

J = 1/2 ∫_0^{t_f} [X ⊤ (t) (Q ⊗ W_Q) X(t) + U ⊤ (t) (R ⊗ W_R) U(t)] dt.  (27)

The matrices Q and W_Q are positive semi-definite (and so is the matrix Q ⊗ W_Q). The matrices R and W_R are positive definite (and so is the matrix R ⊗ W_R). In particular, Q ∈ ℝ N × N and R ∈ ℝ N × N are weighting matrices for the network systems: they contain information on the weights given to the states and inputs of each sub-system relative to the other sub-systems. Instead, the matrices W_Q ∈ ℝ n × n and W_R ∈ ℝ m × m are weighting matrices for the individual sub-system states and inputs, respectively: they measure how the states (inputs) of a single sub-system are weighted relative to the other states (inputs) of the same sub-system. For example, by properly choosing these matrices, one can drive individual sub-system states towards desirable values while limiting the effort required to do so.
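The definiteness claims in parentheses follow from the fact that the eigenvalues of a Kronecker product are the pairwise products of the eigenvalues of its factors; a quick numerical confirmation (random matrices of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(k, rank=None):
    X = rng.standard_normal((k, rank or k))
    return X @ X.T                      # positive semi-definite by construction

Q = random_psd(4, rank=2)               # PSD (rank deficient)
WQ = random_psd(3)                      # PSD
R = random_psd(4) + 4 * np.eye(4)       # positive definite
WR = random_psd(3) + 3 * np.eye(3)      # positive definite

# Q ⊗ W_Q inherits positive semi-definiteness ...
assert np.min(np.linalg.eigvalsh(np.kron(Q, WQ))) > -1e-9
# ... and R ⊗ W_R inherits positive definiteness.
assert np.min(np.linalg.eigvalsh(np.kron(R, WR))) > 0
```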

3) Add Additional Input Channels if needed:
This is done in a way analogous to the process used to obtain (3) from (2), described in Section IV. Namely, a sub-system with m input channels, m < n, needs to be transformed into a sub-system with n input channels. This can be done by adding n − m input channels, so that u_i ∈ ℝ n and U ∈ ℝ Nn , and setting their corresponding coefficients to zero in the input matrix B_{i*} . Doing so does not change the original dynamics of the system.

4) Apply the SBD Transformation:
The final step is to apply the SBD transformation. In order to show how the system dynamics and the objective function are decoupled by our proposed method, we present some key properties of the SBD transformation matrix. In particular, we define the concept of a canonical SBD transformation.

Definition 5:
The transformation matrix T is a canonical SBD transformation if each indicator matrix E_s , s = 1, …, S, maps back to itself (but with permuted rows and columns) after the application of the transformation T. A canonical transformation matrix is block-diagonal with S blocks, of sizes N_1 , N_2 , …, N_S .
This follows from the property that, in order to commute with E_1 , E_2 , …, E_S , the matrix U from Algorithm 1 must have the same block-diagonal structure as the E_s , s = 1, …, S. Hence, since the columns of the matrix T are the eigenvectors of U, T must have the same block structure as U. Moreover, because of the block structure of both the matrix T and the matrices E_s , s = 1, …, S, we have that T ⊤ E_s T maps back to E_s , but possibly with permuted rows and columns.
Next, we consider the application of a canonical SBD transformation matrix T to the optimal control problem (26). The matrix

T = SBD(E_1 , …, E_S , G, Q, R)  (28)

transforms the matrices E_1 , …, E_S , G, Q, R into block diagonal form with P blocks of dimension n_p , p = 1, …, P , ∑_{p=1}^{P} n_p = N, and, in particular, G, Q, and R into G̅, Q̅, and R̅, respectively.
Specifically,

T ⊤ E_s T = M_s ,  s = 1, …, S,  (29)

where the M_s matrices are the transformed indicator matrices. Note that each one of the matrices M_s is a diagonal matrix whose elements on the main diagonal are the same as those of E_s , but permuted; consequently, the M_s have the same block-diagonal structure as G̅ = T ⊤ G T , Q̅ = T ⊤ Q T , and R̅ = T ⊤ R T . We now pre-multiply (26) by T ⊤ ⊗ I_n and define the following:

z(t) := (T ⊤ ⊗ I_n) X(t),  (30)
v(t) := (T ⊤ ⊗ I_n) U(t).  (31)

Thus, using the mixed-product property of the Kronecker product, (26) can be re-written

ż(t) = [∑_{s=1}^{S} (M_s ⊗ A_s + (G̅ M_s) ⊗ H_s)] z(t) + [∑_{s=1}^{S} M_s ⊗ B_s] v(t).  (32)

From (30) and (31), it is clear that X(t) = (T ⊗ I_n) z(t) and U(t) = (T ⊗ I_n) v(t). Therefore, the cost function from (27) is rewritten as

J = 1/2 ∫_0^{t_f} [z ⊤ (t) (Q̅ ⊗ W_Q) z(t) + v ⊤ (t) (R̅ ⊗ W_R) v(t)] dt.  (33)

Since the states and the inputs are partitioned according to the P blocks,

z(t) = [z_1 ⊤ (t), …, z_P ⊤ (t)] ⊤ ,  v(t) = [v_1 ⊤ (t), …, v_P ⊤ (t)] ⊤ ,  (34)

it is clear that the system dynamics has been decoupled into P independent equations and the cost function into P independent cost functions,

ż_p(t) = A̅_p z_p(t) + B̅_p v_p(t),  J_p = 1/2 ∫_0^{t_f} [z_p ⊤ (Q̅_p ⊗ W_Q) z_p + v_p ⊤ (R̅_p ⊗ W_R) v_p] dt,  (35)

where A̅_p and B̅_p denote the p-th diagonal blocks of the matrices in (32), Q̅_p and R̅_p denote the p-th diagonal blocks of Q̅ and R̅, p = 1, …, P , and J = ∑_{p=1}^{P} J_p .
Each OCP described by (35) can be solved independently and in parallel. Furthermore, the solution to the original OCP can be recovered by inverting the SBD transformation.

VI. CASE STUDIES
The computation time for solving OCPs using our proposed approach depends on two major factors: 1) the dimensions of the OCPs resulting from the decomposition, and 2) the time needed to compute the transformation matrix.
The two examples that follow show that our proposed method based on the SBD transformation outperforms methods based on symmetry in both these aspects. A third example of HVAC control is used to demonstrate the effectiveness of our method.

A. EXAMPLE 1: MASS-SPRING-DAMPER SYSTEM
With this example, we show that the reduction based on our method produces finer blocks than the symmetry method. We compare the reduction from the SBD transformation with that of an IRR transformation applied to a system of connected mass-spring-dampers. Fig. 1 shows the topology of the connections between the masses. Here each node represents a mass and each edge represents a connection (a spring and a damper) between the connected masses.
Following the steps outlined in Section V, we first derive the state space representation of the system. Since all the masses are identical, S = 1, so we set E_1 = I_8 , A_1 = A, H_1 = H, and B_1 = B, with A, H, and B given in (36). Note that the second column of the matrix B, with entries that are all zeros, is used here to make the dimensions of the matrices in (36) consistent. The adjacency matrix G corresponds to the graph in Fig. 1, and the compact dynamical equation of the masses coupled via springs and dampers then follows. For step 2, we consider a quadratic cost function as in (27). We set R = I_8 and choose Q as given in (39). This choice of Q indicates that we want to force masses 5 and 6 to go to the origin, while we want masses 1 and 2, masses 3 and 4, and masses 7 and 8 to follow the same trajectory (i.e., we want the difference between their positions and velocities to go to zero.) We note that the matrices W_Q and W_R do not play a role in finding the transformation T, but they are required to find the optimal control input U*(t).
Next, we compare our approach with the symmetry based approach. In order to compute the transformation matrix P that transforms the system in the irreducible representation coordinates, we use the algorithm proposed in [27]. A Python code of this algorithm is also provided in [65].
The transformation matrix P transforms G and Q into Ĝ = P ⊤ G P and Q̂ = P ⊤ Q P . We see that the original N = 8-dimensional problem is decoupled into two 3-dimensional problems and two 1-dimensional problems by using the symmetry-based decomposition.
We now consider the SBD approach and, using the algorithm presented in [46], compute the transformation matrix T = SBD(G, Q), given in (42), which transforms the matrices G and Q into G̅ = T ⊤ G T and Q̅ = T ⊤ Q T . We see that the transformation derived from the SBD decomposes the N = 8-dimensional problem into one 3-dimensional, one 2-dimensional, and three 1-dimensional problems. Therefore, this example shows that the decomposition using the SBD transformation is finer (the blocks generated are smaller) than the one using the symmetry-based transformation.

B. EXAMPLE 2: POWERGRID SYSTEM
The purpose of this example is to show that the SBD decomposition can be computed in less time than the symmetry-based decomposition. Here, we use the Irreducible Representation (IRR) code from Ref. [27] and the SBD code from Ref. [47]. Our comparison is applied to real power grid networks of increasing dimension, i.e., with an increasing number of buses. For all the networks that we consider, the equitable and orbital partitions are the same [28]; therefore, the inputs provided to both codes are exactly the same, i.e., one adjacency matrix that describes the topology of the network and indicator matrices that provide information on the cluster to which each node belongs. Table 1 contains information about the selected power grids. Fig. 2 shows the CPU time required for running both algorithms (SBD and IRR) on an Intel Core i7-10700 CPU. Power grids are chosen such that at least one non-trivial orbital partition (a cluster with more than one node in it) is present. The computational complexities associated with the SBD code (Algorithm 1) and the IRR code are O(N⁴) and O(N!), respectively, where N is the number of nodes in the network. Fig. 2 shows that, as N increases, the CPU time grows faster for the IRR code than for the SBD code.

C. EXAMPLE 3: HVAC SYSTEM
Here we show how the run-time required by the solution of a large-dimensional HVAC optimal control problem is reduced by using the SBD decomposition method. We consider a graph-based HVAC model that was originally introduced in [66]. This model is linear time-invariant and continuous time. It takes the temperature of the walls and zones as states, and the control strategy is to regulate the zones' temperatures while using the minimum amount of energy. The time evolution of the temperatures is modeled with an RC circuit consisting of lumped thermal resistors and capacitors, whose resistances and capacitances are calculated based on [18]. The procedure to find the linear model is described in Algorithm 1 of [66]. First, an undirected weighted graph is created, for which nodes represent either the walls, the zones, or the ambient. Nodes that are thermally connected share an edge. Connections are between walls and zones, and between the walls and the ambient. Internal walls connect zones and external floor/ceiling walls connect the zones and the ambient. The weight of a connection is the inverse of the thermal resistance between the two connected nodes. We assumed that the resistance of the floor/ceiling walls is greater than that of the internal walls by a factor of 4, and the resistance of the external walls is greater than that of the internal walls by a factor of 1.1. The Laplacian matrix corresponding to the graph of the network formed of walls, zones, and ambient is equal to L = D − A where A is the adjacency matrix that describes the topology of the HVAC system, and D is a diagonal matrix with entries on the main diagonal equal to the row sums of the matrix A.
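As a minimal illustration of the L = D − A construction described above (a toy weighted graph, not the actual HVAC topology):

```python
import numpy as np

# Toy weighted adjacency matrix A: A[i, j] = 1 / (thermal resistance between
# nodes i and j), with A[i, j] = 0 when i and j are not thermally connected.
A = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])

D = np.diag(A.sum(axis=1))   # degree matrix: row sums of A on the diagonal
L = D - A                    # graph Laplacian

print(L.sum(axis=1))         # rows of a Laplacian sum to zero: [0. 0. 0.]
```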
Next, the continuous time model for the HVAC problem is generated based on the remaining steps of Algorithm 1 from [66]. Discretization of the model is done using a 10 min sampling time. For simplicity, since our focus here is on comparing run-times between the original and the transformed OCP, we assume that the ambient temperature is 0 °C and that no disturbance is present. Hence, the dynamical equation of the system is

x^{k+1} = A x^k + B u^k ,  x^k = [(T_wall^k) ⊤ (T_zone^k) ⊤] ⊤ ,

where T_wall^k is the vector of the temperatures of all the walls and T_zone^k is the vector of the temperatures of all the zones at time k. Note that we have added an appropriate number of zero columns to the matrix B in order to make it square. The cost function is quadratic, in the form

J = 1/2 ∑_{k=0}^{∞} [(x^k) ⊤ Q x^k + (u^k) ⊤ R u^k],

where Q is a positive semi-definite diagonal matrix, with entries on the main diagonal that are either one for the zone states, or zero for all the other states. The matrix R is a positive definite diagonal matrix of the form R = R̃ ⊕ I, where R̃ weights the original inputs and I is an identity matrix of appropriate dimension corresponding to the appended inputs; for simplicity, we set R̃ to be a diagonal matrix with all entries on the main diagonal equal to 0.1. As a result, the original OCP can be written

min_u J subject to x^{k+1} = A x^k + B u^k .  (46)

The total number of states is equal to the total number of elements coupled together, i.e., the number of ceiling/floor walls, the number of ambient walls, the number of internal walls, and the number of zones. For the examples in Fig. 3, the number of ceiling/floor walls is equal to 2N_z ; the number of ambient walls is equal to 4√N_z ; the number of internal walls is equal to 2N_z − 2√N_z ; and the number of zones is equal to N_z . The total number of states of the linear discrete-time system is then N_sys = 5N_z + 2√N_z . As a benchmark, we timed the Matlab command dlqr using the Matlab command timeit. The command dlqr finds the optimal feedback gain matrix for the linear quadratic regulator problem in discrete time.
This command first solves the algebraic discrete-time Riccati equation and then computes the optimal gain matrix. The average CPU times over 20 simulations for the original discrete-time OCP and the transformed OCP are shown in Fig. 4. We see from Fig. 4 that the CPU times for the transformed problems are significantly lower than those of the original problem, since the decoupled OCPs are lower in dimension and can be solved using parallel computing. For this particular problem, we were able to break the original large OCP into several smaller OCPs with dimensions equal to or smaller than 2. As an example, in Fig. 5 the matrices A, B, Q, and R from (46) for the case with N_z = 2² are visualized before and after application of the transformation T.
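The paper's benchmark uses MATLAB; as an illustrative Python analogue (our own helper built on scipy.linalg.solve_discrete_are, mimicking what dlqr computes, with a toy stable system), one can verify that the blockwise discrete-time solutions reproduce the full solution:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, block_diag

def dlqr_gain(A, B, Q, R):
    """Discrete-time LQR, analogous to MATLAB's dlqr:
    solve the discrete ARE for P, then K = (R + B'PB)^{-1} B'PA."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Two decoupled blocks of a stable discrete-time system (toy data).
A1, B1 = np.array([[0.9, 0.1], [0.0, 0.8]]), np.eye(2)
Q1, R1 = np.eye(2), 0.1 * np.eye(2)
A2, B2 = np.array([[0.5]]), np.array([[1.0]])
Q2, R2 = np.array([[1.0]]), np.array([[0.1]])

K1, P1 = dlqr_gain(A1, B1, Q1, R1)
K2, P2 = dlqr_gain(A2, B2, Q2, R2)
K, P = dlqr_gain(block_diag(A1, A2), block_diag(B1, B2),
                 block_diag(Q1, Q2), block_diag(R1, R2))

# The full solution and gain are the direct sums of the blockwise ones.
assert np.allclose(P, block_diag(P1, P2))
assert np.allclose(K, block_diag(K1, K2))
```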
We remark that the dimensionality reduction achieved in Fig. 5 is very good as it enables the solution of several OCPs with much reduced computational effort. Table 2 provides detailed information about the dimension of the linear systems, and the number of subsystems of dimension 1 and 2 obtained after the application of the transformation. Overall the table demonstrates the usefulness of the SBD approach in decomposing these OCPs.
We note that in the process of generating the dynamical matrices A and B, the Laplacian matrix that describes the connectivity of the zones, walls, and the ambient is broken into pieces; see Step 2 of Algorithm 1 in [66]. Since the IRR code [27] requires either the adjacency matrix or the Laplacian matrix as an input, we were unable to use the IRR code to decompose this HVAC problem.
In Fig. 6, the memory, in kilobytes (kB), required to store the data for running dlqr in MATLAB, as measured by the MATLAB command whos, is plotted as the number of zones increases. Since the application of the SBD transformation puts the transformed matrices into block diagonal form, it results in substantial memory savings as the dimension of the matrices increases.

VII. CONCLUSION
In this paper, we have shown that an SBD-based strategy can be conveniently adopted to decouple optimal control problems into lower dimensional problems. These lower dimensional OCPs can then be solved in parallel to achieve faster computation runtimes. In particular, we have shown that the SBD transformation can be used to break the original Riccati equation and the original controllability Gramian into a set of lower-dimensional (and easier to solve) Riccati equations and controllability Gramians, respectively. We have called these reductions 'exact' because no information about the original system and the objective function is lost with the decomposition. Therefore, important properties such as stability and the dimension of the controllable and observable subspaces are preserved while the computation time is reduced.
We have used three examples with application to networked systems to demonstrate the advantages of our proposed method when compared to the IRR reduction method. The first example of a simple spring-damper mechanical system demonstrates that the SBD approach decouples the system into lower-dimensional subproblems than the IRR decomposition.
In the second example, we have used real data about the topology of several power grid networks of various sizes, and shown that the computation time to compute the SBD transformation is always lower than the IRR method. Finally, our third example combines both these aspects and demonstrates the application of the method to a real-life example; an HVAC control problem.
FRANCESCO SORRENTINO (Senior Member, IEEE) received the master's degree in industrial engineering and the Ph.D. degree in control engineering from the University of Naples Federico II, Naples, Italy, in 2003 and 2007, respectively. His expertise is in dynamical systems and controls, with particular emphasis on nonlinear dynamics and adaptive decentralized control. His work includes studies on dynamics and control of complex dynamical networks and hypernetworks, adaptation in complex systems, sensor adaptive networks, coordinated autonomous vehicles operating in a dynamically changing environment, and identification of nonlinear systems. He has authored or coauthored more than 50 papers in international scientific peer-reviewed journals. His research interests include applying the theory of dynamical systems to model, analyze, and control the dynamics of complex distributed energy systems such as power networks and smart grids. Subjects of the current investigation are evolutionary game theory on networks (evolutionary graph theory), the dynamics of large networks of coupled neurons, and the use of adaptive techniques for dynamical identification of communication delays between coupled mobile platforms.

Fig. 2. CPU time to run the SBD code and the IRR code for 11 selected power grids with different numbers of nodes. Additional information on the topology of these networks is provided in Table 1. The SBD algorithm is faster in all cases; as the number of nodes increases, the CPU time for both codes increases, but the CPU time for the IRR algorithm grows more rapidly.

Fig. 3. Base structure of the zones and walls for three different cases corresponding to different numbers of zones: N_z = 2², N_z = 3², and N_z = 4². Narrow black lines indicate the internal walls, and thick black lines the external walls. Floor/ceiling walls are not shown here.

Fig. 4. CPU time for running the dlqr Matlab command for the original and transformed OCPs for the HVAC example. The CPU time for the transformed OCP is obtained by running the decomposed OCPs in parallel on 16 workers.

Fig. 5. HVAC problem with N_z = 2²: comparison between the matrices A, B, Q, and R from (46) before and after application of the transformation matrix T. The nonzero entries of the original matrices and of the transformed matrices are shown as blue squares. A very good reduction is achieved in this case, as the original 24-dimensional system is decoupled into 20 subsystems, of which 16 …

Table 1. Real power grids. For each power grid network, we report the number of nodes N, the number of edges E, and the number of non-trivial clusters C. All networks are undirected and unweighted.