Connectivity of the Feasible and Sublevel Sets of Dynamic Output Feedback Control with Robustness Constraints

This paper considers the optimization landscape of linear dynamic output feedback control with $\mathcal{H}_\infty$ robustness constraints. We consider the feasible set of all the stabilizing full-order dynamical controllers that satisfy an additional $\mathcal{H}_\infty$ robustness constraint. We show that this $\mathcal{H}_\infty$-constrained set has at most two path-connected components that are diffeomorphic under a mapping defined by a similarity transformation. Our proof technique utilizes a classical change of variables in $\mathcal{H}_\infty$ control to establish a surjective mapping from a set with a convex projection to the $\mathcal{H}_\infty$-constrained set. This proof idea can also be used to establish the same topological properties of strict sublevel sets of linear quadratic Gaussian (LQG) control and optimal $\mathcal{H}_\infty$ control. Our results bring positive news for gradient-based policy search on robust control problems.


I. INTRODUCTION
Inspired by the impressive successes of reinforcement learning, model-free policy optimization techniques are receiving renewed interest from the controls field. Indeed, we have seen significant recent advances in understanding the theoretical properties of policy optimization methods on benchmark control problems, such as linear quadratic regulator (LQR) [1]-[4], linear robust control [5]-[8], and Markov jump linear quadratic control [9]-[11].
It is well known that all these control problems are nonconvex in the policy space. Classical control theory typically parameterizes the control policies into a convex domain over which efficient optimization algorithms exist [12]. An important recent discovery is that, despite nonconvexity, many state-feedback control problems (e.g., LQR) admit a useful gradient dominance property [1]. Therefore, model-free policy search methods are guaranteed to enjoy global convergence for these problems [1], [4], [9]. Note that most convergence results require direct access to the underlying system state, in which case a simple change of variables exists that yields a convex reformulation of the control problem [13].
For real-world control applications, however, we may only have access to partial output measurements. In the output feedback case, the theoretical results for direct policy search are much fewer and far less complete [14]-[18]. It remains unclear whether model-free policy gradient methods can be modified to yield global convergence guarantees. It has been revealed that the set of stabilizing static output-feedback controllers can be highly disconnected [14]. This is quite different from the state feedback case [19]. Such a negative result indicates that the performance of gradient-based policy search on static output feedback control highly depends on the initialization, and only convergence to stationary points has been established [15]. It is thus natural to investigate dynamical controllers for the output feedback case, and to see whether the corresponding optimization landscape is more favorable for direct policy search methods. The very recent work [16] shows that the set of stabilizing full-order dynamical controllers has at most two path-connected components that are identical in the frequency domain. This brings some positive news and opens the possibility of developing globally convergent policy search methods for dynamical output feedback problems, such as linear quadratic Gaussian (LQG) control [16]. Two other recent studies are [17], [18]. In [18], the global convergence of policy search over dynamical filters was proved for a simpler estimation problem.

B. Hu is generously supported by the NSF award CAREER-2048168 and the 2020 Amazon research award. Bin Hu is with the Coordinated Science Laboratory (CSL) and the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, binhu7@illinois.edu. Yang Zheng is with the Department of Electrical and Computer Engineering, University of California San Diego, zhengy@eng.ucsd.edu.
It is well known that the optimal LQG controller has no robustness guarantee [20]. It is thus important to explicitly incorporate robustness constraints into the search for dynamical controllers. In this paper, we study the topological properties of the feasible set for linear dynamical output feedback control with $\mathcal{H}_\infty$ robustness constraints. The $\mathcal{H}_\infty$ constraints have been widely used in robust control [12], [21] and risk-sensitive control [22]. Our main result shows that the set of all stabilizing full-order dynamical controllers satisfying an additional input-output $\mathcal{H}_\infty$ constraint has at most two path-connected components, and they are diffeomorphic under a mapping defined by a similarity transformation. Our proof technique is inspired by [16] and relies on a non-trivial but known change of variables for $\mathcal{H}_\infty$ control [23], [24]. If the control cost is invariant under similarity transformations, one can initialize the local policy search anywhere within the feasible set, and there is always a continuous path connecting the initial point to a global minimum whenever one exists. Our result sheds new light on model-free policy search for robust control tasks.
The rest of this paper is organized as follows. In Section II, we formulate linear dynamic output feedback control with $\mathcal{H}_\infty$ constraints as a constrained policy optimization problem. Section III presents our main theoretical results. We revisit connectivity of strict sublevel sets for LQG and $\mathcal{H}_\infty$ control in Section IV. Some illustrative examples are shown in Section V. We conclude the paper in Section VI. Some auxiliary proofs and results are provided in the appendix.
Notations: The set of $k \times k$ real symmetric matrices is denoted by $\mathbb{S}^k$, and the determinant of a square matrix $M$ is denoted by $\det M$. We use $I_k$ to denote the $k \times k$ identity matrix, and use $0_{k_1 \times k_2}$ to denote the $k_1 \times k_2$ zero matrix; we sometimes omit their dimensions if they are clear from the context. Given a matrix $M \in \mathbb{R}^{k_1 \times k_2}$, $M^{\mathsf T}$ denotes the transpose of $M$. For any $M_1, M_2 \in \mathbb{S}^k$, we use $M_1 \prec M_2$ ($M_1 \preceq M_2$) and $M_2 \succ M_1$ ($M_2 \succeq M_1$) to mean that $M_2 - M_1$ is positive (semi)definite.

II. PRELIMINARIES AND PROBLEM STATEMENT
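A. Dynamic output feedback with $\mathcal{H}_\infty$ constraints

We consider a continuous-time linear dynamical system¹
$$\begin{aligned} \dot{x}(t) &= A x(t) + B_1 w(t) + B_2 u(t), \\ z(t) &= C_1 x(t) + D_{11} w(t) + D_{12} u(t), \\ y(t) &= C_2 x(t) + D_{21} w(t), \end{aligned} \tag{1}$$
where $x(t) \in \mathbb{R}^{n_x}$ is the state, $u(t) \in \mathbb{R}^{n_u}$ is the control action, $w(t) \in \mathbb{R}^{n_w}$ is the exogenous disturbance, $y(t) \in \mathbb{R}^{n_y}$ is the measured output, and $z(t) \in \mathbb{R}^{n_w}$ is the regulated performance output. We make the following assumption.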
We aim to design a controller that maps the measured output to the control action, in order to minimize some control performance metric while satisfying stability and/or robustness constraints. Such control design problems can be formulated as a constrained policy optimization of the form
$$\min_{K} \; J(K) \quad \text{subject to} \quad K \in \mathcal{K}, \tag{2}$$
where the decision variable $K$ is determined by the policy parameterization, the objective function $J(K)$ measures the closed-loop performance, and the feasible set $\mathcal{K}$ is specified by some stability/robustness requirements. We consider the following policy parameterization and robustness constraint:
• Decision variable $K$: Output feedback control problems typically require dynamical controllers, and we consider the full-order dynamical controller of the form
$$\begin{aligned} \dot{\xi}(t) &= A_K \xi(t) + B_K y(t), \\ u(t) &= C_K \xi(t) + D_K y(t), \end{aligned} \tag{3}$$
where $\xi(t)$ is the controller state with the same dimension as $x(t)$, and the matrices $(A_K, B_K, C_K, D_K)$ specify the controller dynamics. For convenience, we denote
$$K := \begin{bmatrix} D_K & C_K \\ B_K & A_K \end{bmatrix}, \tag{4}$$
but this matrix $K$ should be interpreted as the dynamical controller in (3).
• Feasible region: The controller $K$ needs to stabilize the closed-loop system and satisfy a robustness constraint that enforces the $\mathcal{H}_\infty$ norm of the transfer function from $w(t)$ to $z(t)$ to be smaller than a pre-specified level $\gamma$.

We allow a general cost function $J(K)$, which can be an $\mathcal{H}_2$ performance on some other performance channel, or a more general user-specified performance metric. One advantage of the policy optimization formulation (2) is that it opens the possibility of solving robust control design via model-free policy search methods. This paper aims to characterize the connectivity of $\mathcal{K}$ and of the strict sublevel sets of $J(K)$.

¹ All topological results can be extended to the discrete-time domain.

B. Problem statement
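We denote the state of the closed-loop system as $\zeta = \begin{bmatrix} x^{\mathsf T} & \xi^{\mathsf T} \end{bmatrix}^{\mathsf T}$ after combining (3) with (1). It is not difficult to derive the closed-loop system
$$\dot{\zeta}(t) = A_{cl}\,\zeta(t) + B_{cl}\,w(t), \qquad z(t) = C_{cl}\,\zeta(t) + D_{cl}\,w(t), \tag{5}$$
where the matrices $(A_{cl}, B_{cl}, C_{cl}, D_{cl})$ are given by
$$\begin{aligned} A_{cl} &= \begin{bmatrix} A + B_2 D_K C_2 & B_2 C_K \\ B_K C_2 & A_K \end{bmatrix}, & B_{cl} &= \begin{bmatrix} B_1 + B_2 D_K D_{21} \\ B_K D_{21} \end{bmatrix}, \\ C_{cl} &= \begin{bmatrix} C_1 + D_{12} D_K C_2 & D_{12} C_K \end{bmatrix}, & D_{cl} &= D_{11} + D_{12} D_K D_{21}. \end{aligned} \tag{6}$$
The closed-loop system is internally stable if and only if $A_{cl}$ is Hurwitz [12]. The set of full-order stabilizing dynamical controllers is thus defined as
$$\mathcal{C}_{\mathrm{stab}} := \left\{ K \in \mathbb{R}^{(n_u + n_x) \times (n_y + n_x)} \;\middle|\; A_{cl} \text{ is Hurwitz} \right\}. \tag{7}$$
The transfer function from $w(t)$ to $z(t)$ is
$$T_{zw}(s) = C_{cl}(sI - A_{cl})^{-1} B_{cl} + D_{cl}. \tag{8}$$
Then, the feasible set is formally specified as
$$\mathcal{K}_\gamma := \left\{ K \in \mathcal{C}_{\mathrm{stab}} \;\middle|\; \|T_{zw}\|_\infty < \gamma \right\}, \tag{9}$$
where $\|T_{zw}\|_\infty$ denotes the $\mathcal{H}_\infty$ norm of $T_{zw}$, and can be calculated as $\|T_{zw}\|_\infty := \sup_{\omega} \sigma_{\max}(T_{zw}(j\omega))$, with $\sigma_{\max}(\cdot)$ denoting the maximum singular value. In (9), we explicitly highlight the robustness level $\gamma$ via the subscript. Under Assumption 1, there exists a finite positive value
$$\gamma^\star := \inf_{K \in \mathcal{C}_{\mathrm{stab}}} \|T_{zw}\|_\infty.$$
Then, $\mathcal{K}_\gamma$ is non-empty if and only if $\gamma > \gamma^\star$. Obviously, we have $\mathcal{K}_{\gamma_0} \subset \lim_{\gamma \to \infty} \mathcal{K}_\gamma = \mathcal{C}_{\mathrm{stab}}$ for any positive $\gamma_0$.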
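As a minimal numerical sketch of the membership test for the feasible set, the code below assembles closed-loop matrices, checks the Hurwitz condition, and estimates the $\mathcal{H}_\infty$ norm on a frequency grid. It assumes the standard interconnection in which the plant matrices are ordered as (A, B1, B2, C1, C2, D11, D12, D21) and the controller is stored as K = [[D_K, C_K], [B_K, A_K]]. The scalar plant uses the data quoted in Section V; the controller values and all function names are hypothetical, chosen only for illustration. A grid search yields a lower bound on the supremum, not a certified norm.

```python
import numpy as np

def closed_loop(plant, K):
    """Closed-loop matrices for K = [[D_K, C_K], [B_K, A_K]] (assumed block layout)."""
    A, B1, B2, C1, C2, D11, D12, D21 = plant
    nu, ny = B2.shape[1], C2.shape[0]
    DK, CK = K[:nu, :ny], K[:nu, ny:]
    BK, AK = K[nu:, :ny], K[nu:, ny:]
    Acl = np.block([[A + B2 @ DK @ C2, B2 @ CK], [BK @ C2, AK]])
    Bcl = np.vstack([B1 + B2 @ DK @ D21, BK @ D21])
    Ccl = np.hstack([C1 + D12 @ DK @ C2, D12 @ CK])
    Dcl = D11 + D12 @ DK @ D21
    return Acl, Bcl, Ccl, Dcl

def is_hurwitz(M):
    """True if every eigenvalue of M has a negative real part."""
    return bool(np.max(np.linalg.eigvals(M).real) < 0.0)

def hinf_norm_grid(Acl, Bcl, Ccl, Dcl, omegas):
    """Grid estimate (a lower bound) of sup_w sigma_max(T_zw(jw))."""
    n = Acl.shape[0]
    peak = 0.0
    for w in omegas:
        T = Ccl @ np.linalg.solve(1j * w * np.eye(n) - Acl, Bcl) + Dcl
        peak = max(peak, float(np.linalg.svd(T, compute_uv=False)[0]))
    return peak

one, zero = np.array([[1.0]]), np.array([[0.0]])
plant = (one, one, one, one, one, zero, one, one)  # A, B1, B2, C1, C2, D11, D12, D21
K = np.array([[0.0, -3.0], [3.0, -5.0]])           # hypothetical: D_K=0, C_K=-3, B_K=3, A_K=-5
Acl, Bcl, Ccl, Dcl = closed_loop(plant, K)
norm_est = hinf_norm_grid(Acl, Bcl, Ccl, Dcl, np.logspace(-3, 3, 4000))
print(is_hurwitz(Acl), norm_est)
```

For this particular controller the closed-loop transfer function is $-(8s+4)/(s+2)^2$, whose peak gain is about 2.07, so the controller would belong to the feasible set for any $\gamma$ above that value.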
In (2), it is possible to estimate the gradients of $J(K)$ and $\|T_{zw}\|_\infty$ from sampled system trajectories, and one may apply model-free gradient-based barrier algorithms to find a solution in an iterative fashion. To understand the performance of such model-free policy search algorithms, we need to characterize the optimization landscape of (2). In particular, we focus on some geometrical properties of the feasible region $\mathcal{K}_\gamma$ and of the strict sublevel sets of $J(K)$. It is well known that $\mathcal{K}_\gamma$ is in general non-convex, but little is known about its other geometrical properties. Even for the case $\gamma \to \infty$, only a very recent work shows that $\mathcal{C}_{\mathrm{stab}}$ has at most two path-connected components that are identical up to similarity transformations [16]. In many cases, it is desirable to explicitly encode some robustness guarantee for the feasible region [20]-[22]. However, the connectivity of the $\mathcal{H}_\infty$-constrained set $\mathcal{K}_\gamma$ remains unknown. In this paper, we focus on topological properties of $\mathcal{K}_\gamma$ and their implications for gradient-based policy search. We will show that $\mathcal{K}_\gamma$ shares similar properties with $\mathcal{C}_{\mathrm{stab}}$.
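Remark 1: The dynamical controller (3) is proper. Depending on the cost function $J(K)$ (e.g., LQG [12]), we may want to confine the policy space to strictly proper dynamical controllers. Then the feasible set is defined as
$$\tilde{\mathcal{K}}_\gamma := \left\{ K \in \mathcal{K}_\gamma \;\middle|\; D_K = 0 \right\}. \tag{10}$$
Our analysis technique works for both $\tilde{\mathcal{K}}_\gamma$ and $\mathcal{K}_\gamma$, and we show that $\tilde{\mathcal{K}}_\gamma$ and $\mathcal{K}_\gamma$ have similar topological properties.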
III. MAIN THEORETICAL RESULTS

In this section, we present our main results on the topological properties of $\mathcal{K}_\gamma$. We first have a simple observation.
This fact is well known. Then openness of $\mathcal{K}_\gamma$ follows from the continuity of the $\mathcal{H}_\infty$ norm. The set is unbounded since the $\mathcal{H}_\infty$ norm is invariant under similarity transformations, which are unbounded in the state-space domain. The non-convexity is also known, and we illustrate it using the example below.
Example 1: Consider an open-loop unstable dynamical system (1) It is easy to verify that the following dynamical controllers .33, and thus we have K (1) ∈ K 3.33 , K (2) ∈ K 3.33 . However, fails to stabilize the system, and thus is outside K 3.33 . Despite the non-convexity, K γ has some nice connectivity property which will be established next.
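The non-convexity can be checked numerically with a small sketch. The controllers below are hypothetical illustrations, not the ones used in Example 1; they are strictly proper first-order controllers, stored as triples (A_K, B_K, C_K), for the scalar plant data quoted in Section V (A = B_1 = B_2 = C_1 = C_2 = D_12 = D_21 = 1, D_11 = 0). The closed-loop matrices are written out under the assumption of the standard plant/controller interconnection, and the norm is a grid estimate.

```python
import numpy as np

def a_cl(AK, BK, CK, A=1.0):
    """Closed-loop state matrix for the scalar plant and a strictly proper controller."""
    return np.array([[A, CK], [BK, AK]])

def stabilizes(ctrl):
    return bool(np.max(np.linalg.eigvals(a_cl(*ctrl)).real) < 0.0)

def hinf_grid(ctrl, omegas=np.logspace(-3, 3, 2000)):
    """Grid estimate of the peak gain from w to z (a lower bound on the true norm)."""
    AK, BK, CK = ctrl
    Acl = a_cl(AK, BK, CK)
    Bcl = np.array([[1.0], [BK]])   # B_cl = [B1; B_K D21]
    Ccl = np.array([[1.0, CK]])     # C_cl = [C1, D12 C_K]
    return max(abs((Ccl @ np.linalg.solve(1j * w * np.eye(2) - Acl, Bcl))[0, 0])
               for w in omegas)

K1 = (-5.0, 3.0, -3.0)    # hypothetical controller (A_K, B_K, C_K)
K2 = (-5.0, -3.0, 3.0)    # its image under the similarity transformation T = -1
K_mid = tuple(0.5 * (a + b) for a, b in zip(K1, K2))  # convex combination

print(stabilizes(K1), stabilizes(K2), stabilizes(K_mid))
print(hinf_grid(K1), hinf_grid(K2))
```

Both controllers stabilize the plant with the same closed-loop transfer function, while their midpoint has $B_K = C_K = 0$ and leaves the unstable plant mode untouched, so the feasible set cannot be convex.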

A. Main results
Our first main technical result is stated as follows.

Theorem 1: Given any $\gamma > \gamma^\star$, the set $\mathcal{K}_\gamma$ has at most two path-connected components.
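Before presenting a formal proof for Theorem 1, we first give some high-level ideas. Based on the bounded real lemma [21], we have $K \in \mathcal{K}_\gamma$ if and only if the matrix inequality
$$P \succ 0, \qquad \begin{bmatrix} A_{cl}^{\mathsf T} P + P A_{cl} & P B_{cl} & C_{cl}^{\mathsf T} \\ B_{cl}^{\mathsf T} P & -\gamma I & D_{cl}^{\mathsf T} \\ C_{cl} & D_{cl} & -\gamma I \end{bmatrix} \prec 0 \tag{11}$$
is feasible. Clearly, the condition (11) is not convex in $K$ and $P$. Our result in Theorem 1 relies on the fact that (11) can be convexified into a linear matrix inequality (LMI), whose solution set is convex and hence path-connected, using a non-trivial but known change of variables for $\mathcal{H}_\infty$ control [23], [24]. The only potential source of disconnectivity is the fact that the set of invertible matrices corresponding to similarity transformations has two path-connected components. Our proof is inspired by the recent work [16] that characterizes $\mathcal{C}_{\mathrm{stab}}$ only, with the main difference being that we need to analyze a more complicated $\mathcal{H}_\infty$ constraint (11).

We now illustrate this idea for the case of state feedback (i.e., $y(t) = x(t)$ and $u(t) = Kx(t)$ with $K \in \mathbb{R}^{n_u \times n_x}$). In this case, the closed-loop matrices are $A_{cl} = A + B_2 K$, $B_{cl} = B_1$, $C_{cl} = C_1 + D_{12} K$, and $D_{cl} = D_{11}$, and it is known that (11) is feasible if and only if there exists $Q \succ 0$ such that
$$\begin{bmatrix} (A + B_2 K) Q + Q (A + B_2 K)^{\mathsf T} & B_1 & Q (C_1 + D_{12} K)^{\mathsf T} \\ B_1^{\mathsf T} & -\gamma I & D_{11}^{\mathsf T} \\ (C_1 + D_{12} K) Q & D_{11} & -\gamma I \end{bmatrix} \prec 0.$$
Using a simple change of variables $K = LQ^{-1}$, we have
$$Q \succ 0, \qquad \begin{bmatrix} AQ + QA^{\mathsf T} + B_2 L + L^{\mathsf T} B_2^{\mathsf T} & B_1 & (C_1 Q + D_{12} L)^{\mathsf T} \\ B_1^{\mathsf T} & -\gamma I & D_{11}^{\mathsf T} \\ C_1 Q + D_{12} L & D_{11} & -\gamma I \end{bmatrix} \prec 0. \tag{12}$$
Since the set of $(Q, L)$ satisfying LMI (12) is convex and the map $K = LQ^{-1}$ is continuous, the set $\{K \in \mathbb{R}^{n_u \times n_x} \mid \text{(11) is feasible}\}$ is path-connected.

The analysis above hinges upon the fact that in the state-feedback case, the non-convex condition (11) can be convexified using the simple change of variables $K = LQ^{-1}$. In the output feedback case, a similar condition can be derived using a more complicated change of variables in [24]. We will leverage this fact to prove Theorem 1. Specifically, it is known that a controller $K \in \mathcal{K}_\gamma$ can be constructed from the solution of the following LMI condition:
$$\begin{bmatrix} X & I \\ I & Y \end{bmatrix} \succ 0, \qquad M_\gamma(X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}) \prec 0, \tag{13}$$
where $X \in \mathbb{S}^{n_x}$, $Y \in \mathbb{S}^{n_x}$, $\hat{A} \in \mathbb{R}^{n_x \times n_x}$, $\hat{B} \in \mathbb{R}^{n_x \times n_y}$, $\hat{C} \in \mathbb{R}^{n_u \times n_x}$, and $\hat{D} \in \mathbb{R}^{n_u \times n_y}$ are decision variables. The linear mapping $M_\gamma(X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D})$ is defined as
$$M_\gamma(X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}) := \begin{bmatrix} M_{11} & M_{12} & M_{13} & M_{14} \\ M_{12}^{\mathsf T} & M_{22} & M_{23} & M_{24} \\ M_{13}^{\mathsf T} & M_{23}^{\mathsf T} & -\gamma I & M_{34} \\ M_{14}^{\mathsf T} & M_{24}^{\mathsf T} & M_{34}^{\mathsf T} & -\gamma I \end{bmatrix}, \tag{14}$$
where the blocks $M_{ij}$ are given by
$$\begin{aligned} M_{11} &= AX + XA^{\mathsf T} + B_2 \hat{C} + \hat{C}^{\mathsf T} B_2^{\mathsf T}, & M_{12} &= \hat{A}^{\mathsf T} + A + B_2 \hat{D} C_2, \\ M_{13} &= B_1 + B_2 \hat{D} D_{21}, & M_{14} &= (C_1 X + D_{12} \hat{C})^{\mathsf T}, \\ M_{22} &= YA + A^{\mathsf T} Y + \hat{B} C_2 + C_2^{\mathsf T} \hat{B}^{\mathsf T}, & M_{23} &= Y B_1 + \hat{B} D_{21}, \\ M_{24} &= (C_1 + D_{12} \hat{D} C_2)^{\mathsf T}, & M_{34} &= (D_{11} + D_{12} \hat{D} D_{21})^{\mathsf T}. \end{aligned} \tag{15}$$
Based on LMI (13), we introduce two useful sets:
$$\mathcal{F}_\gamma := \left\{ (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}) \;\middle|\; \text{(13) holds} \right\}, \tag{16}$$
$$\mathcal{G}_\gamma := \left\{ (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi, \Xi) \;\middle|\; (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}) \in \mathcal{F}_\gamma, \; \Pi, \Xi \in \mathbb{R}^{n_x \times n_x}, \; \Xi \Pi = I - YX \right\}. \tag{17}$$
It is obvious that $\mathcal{F}_\gamma$ is convex and hence path-connected. Together with the fact that the set of $n_x \times n_x$ invertible matrices has two path-connected components, this guarantees that $\mathcal{G}_\gamma$ has exactly two path-connected components. We shall see that there exists a continuous surjective map from $\mathcal{G}_\gamma$ to $\mathcal{K}_\gamma$, and thus $\mathcal{K}_\gamma$ has at most two path-connected components. A detailed proof is provided in the next subsection.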
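The state-feedback argument can be illustrated with a short sketch: take two feasible pairs $(Q, L)$, interpolate linearly between them (which stays inside a convex LMI set), and map each point to a gain through $K = LQ^{-1}$. The LMI below is the standard bounded-real form after the substitution $L = KQ$, which is an assumption about the exact form of (12); the scalar plant data, the level `gamma = 2`, and the two endpoint pairs are hypothetical.

```python
import numpy as np

A = B1 = B2 = C1 = D12 = np.array([[1.0]])
D11 = np.array([[0.0]])
gamma = 2.0

def lmi(Q, L):
    """Left-hand side of the state-feedback LMI in (Q, L) (assumed standard form)."""
    top = A @ Q + Q @ A.T + B2 @ L + L.T @ B2.T
    out = C1 @ Q + D12 @ L
    I = np.eye(1)
    return np.block([[top, B1, out.T],
                     [B1.T, -gamma * I, D11.T],
                     [out, D11, -gamma * I]])

def feasible(Q, L):
    """Q must be positive definite and the LMI matrix negative definite."""
    return bool(np.min(np.linalg.eigvalsh(Q)) > 0
                and np.max(np.linalg.eigvalsh(lmi(Q, L))) < 0)

Q0, L0 = np.array([[1.0]]), np.array([[-2.0]])   # hypothetical feasible pair, K = -2
Q1, L1 = np.array([[4.0]]), np.array([[-6.0]])   # hypothetical feasible pair, K = -1.5

path = []
for t in np.linspace(0.0, 1.0, 21):
    Q, L = (1 - t) * Q0 + t * Q1, (1 - t) * L0 + t * L1
    K = L @ np.linalg.inv(Q)                      # continuous image of the straight line
    worst_pole = float(np.max(np.linalg.eigvals(A + B2 @ K).real))
    path.append((feasible(Q, L), worst_pole))

print(all(ok for ok, _ in path), max(pole for _, pole in path))
```

Every interpolated pair remains feasible and every resulting gain is stabilizing, which is the path-connectedness mechanism used above; the output-feedback proof follows the same pattern with a more involved change of variables.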
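B. Proof of Theorem 1

For any $Z = (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi, \Xi) \in \mathcal{G}_\gamma$, the matrices $\Pi$ and $\Xi$ are invertible because $I - YX$ is invertible, and we define the mapping
$$\Phi(Z) := \begin{bmatrix} I & 0 \\ Y B_2 & \Xi \end{bmatrix}^{-1} \begin{bmatrix} \hat{D} & \hat{C} \\ \hat{B} & \hat{A} - YAX \end{bmatrix} \begin{bmatrix} I & C_2 X \\ 0 & \Pi \end{bmatrix}^{-1}. \tag{18}$$
We now present the following result which is essential for the proof of Theorem 1.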
Proposition 1: The mapping $\Phi$ in (18) is a continuous and surjective mapping from $\mathcal{G}_\gamma$ to $\mathcal{K}_\gamma$.
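Proof: It is clear that $\Phi(\cdot)$ is a continuous mapping. To show that $\Phi$ is a mapping onto $\mathcal{K}_\gamma$, we need to prove the following statements:
1) For any arbitrary controller $K \in \mathcal{K}_\gamma$, there exists $Z = (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi, \Xi) \in \mathcal{G}_\gamma$ such that $\Phi(Z) = K$.
2) For all $Z \in \mathcal{G}_\gamma$, we have $\Phi(Z) \in \mathcal{K}_\gamma$.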
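To show the first statement, let $K = \begin{bmatrix} D_K & C_K \\ B_K & A_K \end{bmatrix} \in \mathcal{K}_\gamma$ be arbitrary. By the bounded real lemma [21], there exists $P \succ 0$ such that (11) is feasible. We partition the matrix $P$ as
$$P = \begin{bmatrix} Y & \Xi \\ \Xi^{\mathsf T} & \hat{Y} \end{bmatrix}. \tag{19}$$
Without loss of generality, we assume that $\det \Xi \neq 0$ (otherwise we can add a small perturbation on $\Xi$ thanks to the strict inequality in (11)). We further define
$$\begin{bmatrix} X & \Pi^{\mathsf T} \\ \Pi & \hat{X} \end{bmatrix} := P^{-1}, \qquad T := \begin{bmatrix} X & I \\ \Pi & 0 \end{bmatrix}. \tag{20}$$
We can verify that
$$YX + \Xi \Pi = I, \qquad T^{\mathsf T} P T = \begin{bmatrix} X & I \\ I & Y \end{bmatrix} \succ 0. \tag{21}$$
Now we choose $(\hat{A}, \hat{B}, \hat{C}, \hat{D})$ as
$$\begin{aligned} \hat{D} &= D_K, \\ \hat{C} &= D_K C_2 X + C_K \Pi, \\ \hat{B} &= Y B_2 D_K + \Xi B_K, \\ \hat{A} &= Y (A + B_2 D_K C_2) X + Y B_2 C_K \Pi + \Xi B_K C_2 X + \Xi A_K \Pi. \end{aligned} \tag{22}$$
We can then verify that $M_\gamma(X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D})$ is exactly the same as
$$\begin{bmatrix} T & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}^{\mathsf T} \begin{bmatrix} A_{cl}^{\mathsf T} P + P A_{cl} & P B_{cl} & C_{cl}^{\mathsf T} \\ B_{cl}^{\mathsf T} P & -\gamma I & D_{cl}^{\mathsf T} \\ C_{cl} & D_{cl} & -\gamma I \end{bmatrix} \begin{bmatrix} T & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix},$$
which is clearly negative definite due to (11) and the invertibility of $T$. Thus, we have $Z = (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi, \Xi) \in \mathcal{G}_\gamma$ by the definition of $\mathcal{G}_\gamma$. Note that (22) can be compactly rewritten as
$$\begin{bmatrix} \hat{D} & \hat{C} \\ \hat{B} & \hat{A} \end{bmatrix} = \begin{bmatrix} I & 0 \\ Y B_2 & \Xi \end{bmatrix} \begin{bmatrix} D_K & C_K \\ B_K & A_K \end{bmatrix} \begin{bmatrix} I & C_2 X \\ 0 & \Pi \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & YAX \end{bmatrix}.$$
Based on Lemma 2, we have $\Phi(Z) = K$. Therefore, the first statement is true. The second statement reduces to the standard controller construction for LMI-based $\mathcal{H}_\infty$-synthesis [24]. We complete the proof.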
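The algebra behind this change of variables can be sanity-checked numerically. The sketch below assumes the partition $P = [[Y, \Xi], [\Xi^{\mathsf T}, \cdot]]$, $P^{-1} = [[X, \Pi^{\mathsf T}], [\Pi, \cdot]]$ (consistent with the identity $YX + \Xi\Pi = I$ used in the proof) and the standard formulas for the transformed variables from LMI-based synthesis; it then recovers the controller by inverting them. All numerical values are hypothetical, and the identities checked are purely algebraic: they hold for any positive definite $P$ with invertible $\Xi$, whether or not the $\mathcal{H}_\infty$ constraint is satisfied.

```python
import numpy as np

nx = 2
A = np.array([[0.0, 1.0], [-1.0, 0.5]]); B2 = np.array([[0.0], [1.0]]); C2 = np.array([[1.0, 0.0]])
AK = np.array([[-1.0, 0.3], [0.2, -2.0]]); BK = np.array([[1.0], [0.5]])
CK = np.array([[-0.7, 0.4]]); DK = np.array([[0.1]])

# Hypothetical positive definite P, partitioned as [[Y, Xi], [Xi^T, Yhat]].
Y = np.array([[2.0, 0.5], [0.5, 2.0]]); Xi = np.array([[1.0, 0.2], [-0.3, 1.0]])
P = np.block([[Y, Xi], [Xi.T, 3.0 * np.eye(nx)]])
Pinv = np.linalg.inv(P)
X, Pi = Pinv[:nx, :nx], Pinv[nx:, :nx]
T = np.block([[X, np.eye(nx)], [Pi, np.zeros((nx, nx))]])

# Forward change of variables (assumed standard form).
D_hat = DK
C_hat = DK @ C2 @ X + CK @ Pi
B_hat = Y @ B2 @ DK + Xi @ BK
A_hat = Y @ (A + B2 @ DK @ C2) @ X + Y @ B2 @ CK @ Pi + Xi @ BK @ C2 @ X + Xi @ AK @ Pi

# Congruence identity: T^T P A_cl T is affine in the new variables.
Acl = np.block([[A + B2 @ DK @ C2, B2 @ CK], [BK @ C2, AK]])
lhs = T.T @ P @ Acl @ T
rhs = np.block([[A @ X + B2 @ C_hat, A + B2 @ D_hat @ C2], [A_hat, Y @ A + B_hat @ C2]])

# Inverse map: recover the controller from the transformed variables.
CK_r = (C_hat - D_hat @ C2 @ X) @ np.linalg.inv(Pi)
BK_r = np.linalg.inv(Xi) @ (B_hat - Y @ B2 @ D_hat)
AK_r = np.linalg.inv(Xi) @ (A_hat - Y @ (A + B2 @ D_hat @ C2) @ X
                            - Y @ B2 @ CK_r @ Pi - Xi @ BK_r @ C2 @ X) @ np.linalg.inv(Pi)

print(np.allclose(lhs, rhs), np.allclose(AK_r, AK))
```

The round trip returning the original $(A_K, B_K, C_K)$ is the numerical counterpart of the first statement in the proof: every controller is the image of some point in the lifted set.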
Remark 2: The main difference between our proof and that in [16, Proposition 3.1] is that we have used the larger LMI conditions (11) and (13) for $\mathcal{H}_\infty$ control. Simpler LMI conditions were used in [16] since it focuses on stability only.
Based on Proposition 1, any path-connected component of $\mathcal{G}_\gamma$ has a path-connected image under the surjective mapping $\Phi$. Consequently, the number of path-connected components of $\mathcal{K}_\gamma$ is no more than the number of path-connected components of $\mathcal{G}_\gamma$. The number of path-connected components of the set $\mathcal{G}_\gamma$ is given below.
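Proposition 2: The set $\mathcal{G}_\gamma$ has two path-connected components, given by
$$\mathcal{G}_\gamma^{+} := \{ Z \in \mathcal{G}_\gamma \mid \det \Pi > 0 \}, \qquad \mathcal{G}_\gamma^{-} := \{ Z \in \mathcal{G}_\gamma \mid \det \Pi < 0 \}.$$

Proof: First, $\mathcal{F}_\gamma$ is path-connected since it is convex. The set of real invertible matrices $\mathrm{GL}_{n_x} = \{\Pi \in \mathbb{R}^{n_x \times n_x} \mid \det \Pi \neq 0\}$ has two path-connected components [25]
$$\mathrm{GL}_{n_x}^{+} = \{\Pi \in \mathbb{R}^{n_x \times n_x} \mid \det \Pi > 0\}, \qquad \mathrm{GL}_{n_x}^{-} = \{\Pi \in \mathbb{R}^{n_x \times n_x} \mid \det \Pi < 0\}.$$
Thus, the Cartesian product $\mathcal{F}_\gamma \times \mathrm{GL}_{n_x}$ has two path-connected components. We further observe that the mapping from $(X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi)$ to $(X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi, (I - YX)\Pi^{-1})$ is a continuous bijection from $\mathcal{F}_\gamma \times \mathrm{GL}_{n_x}$ to $\mathcal{G}_\gamma$, whose inverse (dropping the last component) is also continuous. This immediately leads to the desired conclusion.

We note that the proofs for Proposition 2 and [16, Proposition 3.2] are similar. As a matter of fact, Proposition 3.2 in [16] can be viewed as a special case of Proposition 2 with $\gamma \to +\infty$. Now Theorem 1 can be proved by combining Proposition 1 with Proposition 2.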
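Proof of Theorem 1: We define
$$\mathcal{K}_\gamma^{+} := \Phi(\mathcal{G}_\gamma^{+}), \qquad \mathcal{K}_\gamma^{-} := \Phi(\mathcal{G}_\gamma^{-}).$$
By Propositions 1 and 2, we have $\mathcal{K}_\gamma = \mathcal{K}_\gamma^{+} \cup \mathcal{K}_\gamma^{-}$, and each of $\mathcal{K}_\gamma^{+}$ and $\mathcal{K}_\gamma^{-}$ is path-connected as the continuous image of a path-connected set. Hence Theorem 1 holds. If $\mathcal{K}_\gamma$ is not path-connected, the two path-connected components of $\mathcal{K}_\gamma$ are exactly $\mathcal{K}_\gamma^{+}$ and $\mathcal{K}_\gamma^{-}$. In the next subsection, we further discuss some implications of Theorem 1 on $\mathcal{H}_\infty$-constrained policy optimization.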

C. Implications for $\mathcal{H}_\infty$-constrained policy optimization
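To understand the implications of Theorem 1 for policy optimization, we need to formalize the relationship between $\mathcal{K}_\gamma^{+}$ and $\mathcal{K}_\gamma^{-}$. For this, we introduce the notion of similarity transformation that is widely used in control. For any $T \in \mathrm{GL}_{n_x}$, let $\mathscr{T}_T : \mathcal{C}_{\mathrm{stab}} \to \mathcal{C}_{\mathrm{stab}}$ denote the mapping given by
$$\mathscr{T}_T(K) := \begin{bmatrix} I & 0 \\ 0 & T \end{bmatrix} \begin{bmatrix} D_K & C_K \\ B_K & A_K \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & T \end{bmatrix}^{-1} = \begin{bmatrix} D_K & C_K T^{-1} \\ T B_K & T A_K T^{-1} \end{bmatrix},$$
which represents similarity transformations on $\mathcal{C}_{\mathrm{stab}}$.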
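A short sketch makes the effect of a similarity transformation concrete: the controller matrices change, but the controller's input-output behavior, and therefore the closed-loop transfer function $T_{zw}$, does not. The code assumes the usual action $(A_K, B_K, C_K, D_K) \mapsto (T A_K T^{-1}, T B_K, C_K T^{-1}, D_K)$ on the stacked matrix K = [[D_K, C_K], [B_K, A_K]]; the controller values and function names are hypothetical.

```python
import numpy as np

def similarity(K, T, nu, ny):
    """Apply diag(I, T) K diag(I, T)^{-1} to K = [[D_K, C_K], [B_K, A_K]]."""
    nx = T.shape[0]
    left = np.block([[np.eye(nu), np.zeros((nu, nx))], [np.zeros((nx, nu)), T]])
    right = np.block([[np.eye(ny), np.zeros((ny, nx))],
                      [np.zeros((nx, ny)), np.linalg.inv(T)]])
    return left @ K @ right

def controller_tf(K, nu, ny, s):
    """Controller transfer matrix C_K (sI - A_K)^{-1} B_K + D_K evaluated at s."""
    DK, CK, BK, AK = K[:nu, :ny], K[:nu, ny:], K[nu:, :ny], K[nu:, ny:]
    return CK @ np.linalg.solve(s * np.eye(AK.shape[0]) - AK, BK) + DK

K = np.array([[0.1, -0.7, 0.4],
              [1.0, -1.0, 0.3],
              [0.5, 0.2, -2.0]])            # hypothetical controller with two states
T = np.array([[0.0, 1.0], [1.0, 0.0]])      # swaps the controller states, det T = -1
K_new = similarity(K, T, nu=1, ny=1)

same_tf = all(np.allclose(controller_tf(K, 1, 1, 1j * w), controller_tf(K_new, 1, 1, 1j * w))
              for w in (0.0, 0.5, 2.0, 10.0))
print(same_tf, np.allclose(K, K_new))
```

Here the two parameterizations differ as matrices yet agree in the frequency domain, which is why any cost that depends only on closed-loop transfer functions takes the same value on both.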
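We have a result that is similar to [16, Theorem 3.2].

Theorem 2: If $\mathcal{K}_\gamma$ has two path-connected components $\mathcal{K}_\gamma^{+}$ and $\mathcal{K}_\gamma^{-}$, then $\mathcal{K}_\gamma^{+}$ and $\mathcal{K}_\gamma^{-}$ are diffeomorphic under the mapping $\mathscr{T}_T$, for any $T \in \mathrm{GL}_{n_x}$ with $\det T < 0$.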
Furthermore, similar to [16, Theorem 3.3], we have sufficient conditions to certify the path-connectedness of $\mathcal{K}_\gamma$.

Theorem 3:
1) $\mathcal{K}_\gamma$ is path-connected if it has one non-minimal dynamical controller.
2) Suppose the plant (1) is single-input or single-output, i.e., $n_u = 1$ or $n_y = 1$. The set $\mathcal{K}_\gamma$ is path-connected if and only if it has a non-minimal dynamical controller.

The proofs of Theorems 2 and 3 are adapted from [16], and we provide them in the appendix for completeness. Theorems 2 and 3 bring positive news on local policy search methods for the $\mathcal{H}_\infty$-constrained optimization (2). If $\mathcal{K}_\gamma$ is path-connected, it makes sense to initialize the policy search from any point in the feasible set. If $\mathcal{K}_\gamma$ has two path-connected components, then the initial point may fall into either of the components. If the cost function $J(K)$ is invariant with respect to similarity transformations (e.g., the LQG cost) and a global minimum exists, then both components include global minima. It is then reasonable to initialize the policy search within either path-connected component. The following corollary is immediate.
Corollary 1: Suppose the cost function $J(K)$ is invariant with respect to similarity transformations. Then there exists a continuous path in $\mathcal{K}_\gamma$ connecting any feasible point $K \in \mathcal{K}_\gamma$ to a global minimum of (2), if one exists.

D. The case of strictly proper controllers
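We briefly discuss the case of strictly proper dynamical controllers with $D_K = 0$, which is required in some classical control problems, including the continuous-time LQG problem [12]. The topological properties of $\mathcal{K}_\gamma$ in (9) and $\tilde{\mathcal{K}}_\gamma$ in (10) are identical. To see this, we let
$$\tilde{\mathcal{F}}_\gamma := \{ (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}) \in \mathcal{F}_\gamma \mid \hat{D} = 0 \}, \qquad \tilde{\mathcal{G}}_\gamma := \{ Z \in \mathcal{G}_\gamma \mid \hat{D} = 0 \}.$$
Minor modifications of the proofs in Sections III-B and III-C show that $\tilde{\mathcal{F}}_\gamma$ is path-connected, and that $\tilde{\mathcal{G}}_\gamma$ has two path-connected components. The same mapping $\Phi$ in (18) is a continuous and surjective mapping from $\tilde{\mathcal{G}}_\gamma$ to $\tilde{\mathcal{K}}_\gamma$. Therefore, we conclude that $\tilde{\mathcal{K}}_\gamma$ has at most two path-connected components, and that they are diffeomorphic under the similarity transformation with $\det T < 0$.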

IV. REVISIT SUBLEVEL SETS IN LQG AND $\mathcal{H}_\infty$ CONTROL
The results in Section III can also be interpreted as the connectivity of strict sublevel sets in optimal $\mathcal{H}_\infty$ control. Based on (8), $T_{zw}$ can be viewed as a function of $K$, and the optimal $\mathcal{H}_\infty$ synthesis [12] can be formulated as
$$\min_{K \in \mathcal{C}_{\mathrm{stab}}} \; \|T_{zw}(K)\|_\infty. \tag{23}$$
Now, $\mathcal{K}_\gamma$ in (9) is exactly the $\gamma$-level strict sublevel set of the optimal $\mathcal{H}_\infty$ control (23). Thus, Theorems 1 to 3 characterize the strict sublevel sets of optimal $\mathcal{H}_\infty$ control.
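In addition to (23), the proof idea of using the change of variables (18) can be applied to other output feedback control problems to establish connectivity of their strict sublevel sets. For example, we can consider the $\mathcal{H}_2$ formulation (24) of LQG control [16], whose cost is expressed through $\|T_{zw}\|_2$, the $\mathcal{H}_2$ norm of $T_{zw}$, over strictly proper stabilizing controllers. Problem (24) covers LQG control as a special case when the dynamics in (1) are chosen appropriately (see the appendix for details). The same proof techniques as in Section III establish the connectivity of the strict sublevel sets $\mathcal{L}_\gamma$ of (24). We have the following result (see the appendix for details).

Theorem 4: The strict sublevel set $\mathcal{L}_\gamma$ has at most two path-connected components. If $\mathcal{L}_\gamma$ has two path-connected components $\mathcal{L}_\gamma^{(1)}$ and $\mathcal{L}_\gamma^{(2)}$, then $\mathcal{L}_\gamma^{(1)}$ and $\mathcal{L}_\gamma^{(2)}$ are diffeomorphic under the mapping $\mathscr{T}_T$, for any $T \in \mathrm{GL}_{n_x}$ with $\det T < 0$.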
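For readers who want to evaluate the $\mathcal{H}_2$ cost numerically, the following sketch computes $\|T_{zw}\|_2$ from the controllability Gramian, assuming a Hurwitz $A_{cl}$ and $D_{cl} = 0$. The Lyapunov equation is solved by vectorization with NumPy only, which is adequate for small systems. The closed-loop matrices are those of a hypothetical strictly proper controller $(A_K, B_K, C_K) = (-5, 3, -3)$ applied to the scalar plant data of Section V, and the function names are illustrative.

```python
import numpy as np

def lyap(Acl, Qm):
    """Solve Acl W + W Acl^T + Qm = 0 by row-major vectorization."""
    n = Acl.shape[0]
    I = np.eye(n)
    lhs = np.kron(Acl, I) + np.kron(I, Acl)
    return np.linalg.solve(lhs, -Qm.reshape(-1)).reshape(n, n)

def h2_norm(Acl, Bcl, Ccl):
    """H2 norm sqrt(trace(C W C^T)), with W the controllability Gramian."""
    W = lyap(Acl, Bcl @ Bcl.T)
    return float(np.sqrt(np.trace(Ccl @ W @ Ccl.T)))

Acl = np.array([[1.0, -3.0], [3.0, -5.0]])   # hypothetical closed loop, poles at -2
Bcl = np.array([[1.0], [3.0]])
Ccl = np.array([[1.0, -3.0]])
value = h2_norm(Acl, Bcl, Ccl)
print(value)
```

For this example the squared norm equals 8.5, which matches the closed-form value for the transfer function $-(8s+4)/(s+2)^2$. Whether the sublevel sets are defined through the norm or its square does not affect their connectivity.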
Remark 3: Path connectivity of sublevel sets may imply some further landscape properties (e.g., critical points and uniqueness of minimizing sets) [26], [27]. In particular, using a special definition of minimizing sets in [27, Definition 5.1], Theorem 5.4 in [27] guarantees that the $\mathcal{H}_\infty$ control (23) and LQG control (24) have a unique global minimizing set in some weak sense (see the definition of LTMS in [27]). See [27] for detailed discussions. Indeed, it is shown in [16] that saddle points exist in (24). A rigorous definition of strict local minima for (23) or (24) requires some extra work due to the unboundedness of similarity transformations.

V. NUMERICAL EXAMPLES
We present two simple examples to illustrate our main result (Theorem 1). We consider the open-loop unstable system in Example 1 with $A = B_1 = B_2 = C_1 = C_2 = D_{21} = D_{12} = 1$ and $D_{11} = 0$. To ease visualization, we consider strictly proper controllers. In the left plots of Figure 1, we visualize $\tilde{\mathcal{K}}_\gamma$ for $\gamma = 50$ and $\gamma = 2$. We can see that $\tilde{\mathcal{K}}_\gamma$ has two path-connected components. As we decrease $\gamma$, the feasible region shrinks. In the right plots of Figure 1, we change the value of $A$ to $-1$, and visualize $\tilde{\mathcal{K}}_\gamma$ for the resultant system. In this case, $\tilde{\mathcal{K}}_\gamma$ is path-connected for both values of $\gamma$.
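The qualitative picture described above can be reproduced with a coarse grid scan, sketched below under stated assumptions: strictly proper first-order controllers $(A_K, B_K, C_K)$, the scalar plant data of this section, the closed-loop transfer function written out by hand from the standard interconnection, and a frequency-grid estimate of the norm. The grid ranges and function names are hypothetical; the scan is illustrative and is not the procedure used to generate Figure 1.

```python
import numpy as np

OMEGAS = np.logspace(-2, 2, 200)

def in_feasible_set(AK, BK, CK, A, gamma):
    """Membership test for strictly proper controllers on the scalar plant."""
    bc = BK * CK
    # Hurwitz test for the 2x2 matrix [[A, CK], [BK, AK]]: trace < 0 and det > 0.
    if not (A + AK < 0 and A * AK - bc > 0):
        return False
    s = 1j * OMEGAS
    # T_zw(s) = C_cl (sI - A_cl)^{-1} B_cl with B_cl = [1, BK]^T and C_cl = [1, CK].
    T = ((1 + bc) * s - AK + 2 * bc - A * bc) / ((s - A) * (s - AK) - bc)
    return bool(np.max(np.abs(T)) < gamma)

def scan(A, gamma):
    """Return the grid points (AK, BK, CK) that pass the membership test."""
    pts = []
    for AK in np.linspace(-6, 0, 7):
        for BK in np.linspace(-4, 4, 9):
            for CK in np.linspace(-4, 4, 9):
                if in_feasible_set(AK, BK, CK, A, gamma):
                    pts.append((AK, BK, CK))
    return pts

unstable_plant = scan(A=1.0, gamma=50.0)
stable_plant = scan(A=-1.0, gamma=50.0)
print(len(unstable_plant), len(stable_plant))
```

For $A = 1$, stability forces $B_K C_K < A_K < -1$, so every feasible grid point has $B_K$ and $C_K$ of opposite signs and the points split into two groups exchanged by $T = -1$. For $A = -1$, the non-minimal controller with $B_K = C_K = 0$ is feasible, in line with the first statement of Theorem 3.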

VI. CONCLUSIONS
We have proved that the set of $\mathcal{H}_\infty$-constrained full-order dynamical controllers has at most two path-connected components (cf. Theorem 1) and that they are diffeomorphic under similarity transformations (cf. Theorem 2). We have also discussed various implications for direct policy search of robust dynamical controllers and for the strict sublevel sets of LQG and $\mathcal{H}_\infty$ control (cf. Theorem 4). An important future direction is to develop provably convergent policy search methods for $\mathcal{H}_\infty$-constrained robust control problems.

APPENDIX

A. Auxiliary proofs
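Proof of Theorem 2: The proof is similar to [16, Theorem 3.2]. We provide a proof sketch below. It suffices to show that, for any $T \in \mathbb{R}^{n_x \times n_x}$ with $\det T < 0$, the mapping $\mathscr{T}_T$ restricted to $\mathcal{K}_\gamma^{+}$ gives a diffeomorphism from $\mathcal{K}_\gamma^{+}$ to $\mathcal{K}_\gamma^{-}$. Since $\mathscr{T}_T$ is smooth with smooth inverse $\mathscr{T}_{T^{-1}}$, we only need to show that
$$\mathscr{T}_T(\mathcal{K}_\gamma^{+}) \subseteq \mathcal{K}_\gamma^{-}, \qquad \mathscr{T}_T(\mathcal{K}_\gamma^{-}) \subseteq \mathcal{K}_\gamma^{+}.$$
Let $K \in \mathcal{K}_\gamma^{+}$ and let $Z = (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \Pi, \Xi) \in \mathcal{G}_\gamma^{+}$ be such that $\Phi(Z) = K$. Define $\hat{Z} := (X, Y, \hat{A}, \hat{B}, \hat{C}, \hat{D}, \hat{\Pi}, \hat{\Xi})$ with $\hat{\Pi} := T\Pi$ and $\hat{\Xi} := \Xi T^{-1}$, so that $\hat{\Xi}\hat{\Pi} = \Xi\Pi = I - YX$. Note that $\det \hat{\Pi} = \det T \cdot \det \Pi < 0$, leading to $\hat{Z} \in \mathcal{G}_\gamma^{-}$. We can further verify that $\Phi(\hat{Z}) = \mathscr{T}_T(K)$, and hence $\mathscr{T}_T(K) \in \mathcal{K}_\gamma^{-}$. The proof of $\mathscr{T}_T(\mathcal{K}_\gamma^{-}) \subseteq \mathcal{K}_\gamma^{+}$ is similar. This completes the proof.

Proof of Theorem 3: If $\mathcal{K}_\gamma$ has a non-minimal dynamical controller, then there exists a reduced-order stabilizing controller with an internal state dimension $(n_x - 1)$, satisfying the $\mathcal{H}_\infty$ constraint. Denote the state/input/output matrices of this controller as $(\tilde{A}_K, \tilde{B}_K, \tilde{C}_K, \tilde{D}_K)$. Then, this controller can be augmented to be a full-order controller in $\mathcal{K}_\gamma$ as
$$K = \begin{bmatrix} \tilde{D}_K & \tilde{C}_K & 0 \\ \tilde{B}_K & \tilde{A}_K & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$
Define a similarity transformation matrix
$$T = \begin{bmatrix} I_{n_x - 1} & 0 \\ 0 & -1 \end{bmatrix}.$$
By the proof of Theorem 2, we can see that $K \in \mathcal{K}_\gamma^{\pm}$ implies $\mathscr{T}_T(K) \in \mathcal{K}_\gamma^{\mp}$. On the other hand, we can directly check that $\mathscr{T}_T(K) = K$. Therefore, we have $K \in \mathcal{K}_\gamma^{+} \cap \mathcal{K}_\gamma^{-}$, indicating that $\mathcal{K}_\gamma^{+} \cap \mathcal{K}_\gamma^{-}$ is nonempty. Consequently, $\mathcal{K}_\gamma$ is path-connected.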
The proof for the second statement is identical to the proof of [16,Theorem 3.3], and hence is omitted here.
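The fixed-point argument in the proof of Theorem 3 can be checked directly with a small sketch. It assumes that the reduced-order controller is augmented with a decoupled stable mode (placed at $-1$ here, a hypothetical choice) and that the transformation flips the sign of that extra state; the reduced-order controller values and the helper name are also hypothetical.

```python
import numpy as np

def similarity(K, T, nu, ny):
    """Apply diag(I, T) K diag(I, T)^{-1} to K = [[D_K, C_K], [B_K, A_K]]."""
    nx = T.shape[0]
    left = np.block([[np.eye(nu), np.zeros((nu, nx))], [np.zeros((nx, nu)), T]])
    right = np.block([[np.eye(ny), np.zeros((ny, nx))],
                      [np.zeros((nx, ny)), np.linalg.inv(T)]])
    return left @ K @ right

# Hypothetical first-order controller (A~, B~, C~, D~) = (-5, 3, -3, 0), augmented
# with an uncontrollable and unobservable stable mode so that it has two states.
K_aug = np.array([[0.0, -3.0, 0.0],
                  [3.0, -5.0, 0.0],
                  [0.0, 0.0, -1.0]])
T = np.diag([1.0, -1.0])                     # flips the sign of the extra state
K_image = similarity(K_aug, T, nu=1, ny=1)

print(np.linalg.det(T) < 0, np.allclose(K_image, K_aug))
```

Because the transformation has negative determinant yet leaves this controller unchanged, the controller must lie in both candidate components, which forces them to coincide.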

B. Connectivity of strict sublevel sets for LQG
In this section, we briefly discuss the path-connectivity of the strict sublevel sets for LQG and present the proof for Theorem 4. Consider the LTI system (1). We can exactly recover the LQG setup in [16] by choosing state, input, and output matrices as where W 0, Q 0, R 0, and V 0. Then the LQG problem in [16] can be equivalently formulated as (24). Since strictly proper controllers are used, we always have D cl = 0.
We now present the proof for Theorem 4.
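Proof of Theorem 4: It is well known that $K \in \mathcal{L}_\gamma$ if and only if there exist $P$ and $\Gamma$ such that the matrix inequalities (26) hold. Condition (26) is not convex in $K$ and $P$. However, we can use the same change of variables as (18) in the main text. A controller $K \in \mathcal{L}_\gamma$ can be constructed if there exist $(X, Y, \hat{A}, \hat{B}, \hat{C}, \Gamma)$ such that the LMI (27) holds³. Similarly to $\mathcal{G}_\gamma$, we can define a set $\mathcal{G}_\gamma^{(\mathrm{LQG})}$ and show that: 1) for any $K \in \mathcal{L}_\gamma$, there exists $Z \in \mathcal{G}_\gamma^{(\mathrm{LQG})}$ such that $\Phi(Z) = K$; 2) for all $Z \in \mathcal{G}_\gamma^{(\mathrm{LQG})}$, we have $\Phi(Z) \in \mathcal{L}_\gamma$. The second statement reduces to the standard controller reconstruction for LMI-based $\mathcal{H}_2$ synthesis, and hence is known to be true. To prove the first statement, let $K = \begin{bmatrix} 0 & C_K \\ B_K & A_K \end{bmatrix} \in \mathcal{L}_\gamma$ be arbitrary. Then there exists $P \succ 0$ such that (26) is feasible. We partition $P$ as in (19), and define $(X, \Pi, T)$ via (20). Then we still have $YX + \Xi\Pi = I$. Based on (26), we have
$$\begin{bmatrix} T & 0 \\ 0 & I \end{bmatrix}^{\mathsf T} \begin{bmatrix} A_{cl}^{\mathsf T} P + P A_{cl} & P B_{cl} \\ B_{cl}^{\mathsf T} P & -I \end{bmatrix} \begin{bmatrix} T & 0 \\ 0 & I \end{bmatrix} \prec 0,$$
which exactly reduces to (27) if we choose $(\hat{A}, \hat{B}, \hat{C})$ as defined in (22) with $D_K = 0$. Thus, we have $Z = (X, Y, \hat{A}, \hat{B}, \hat{C}, \Gamma, \Pi, \Xi) \in \mathcal{G}_\gamma^{(\mathrm{LQG})}$ by the definition of $\mathcal{G}_\gamma^{(\mathrm{LQG})}$. Now the first statement holds as desired. Therefore, the number of path-connected components of $\mathcal{L}_\gamma$ cannot be larger than the number of path-connected components of $\mathcal{G}_\gamma^{(\mathrm{LQG})}$. Finally, we can slightly modify the proof of Theorem 2 to show that the two path-connected components are diffeomorphic under similarity transformations. This completes the proof.

³ Since strictly proper controllers are considered, we always have $\hat{D} = 0$. We thus drop the matrix $\hat{D}$, and only $(\hat{A}, \hat{B}, \hat{C})$ show up in the LMI.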