Polynomial Controller Synthesis of Nonlinear Systems With Continuous State Feedback Using Trust Regions

We present a novel, correct-by-construction control approach for disturbed, nonlinear systems with continuous state feedback under state and input constraints. For the first time, we jointly synthesize a feedforward and feedback controller by solving a single non-convex, continuously differentiable approximation of the original synthesis problem, which we combine with a trust-region approach in an iterative manner to obtain non-conservative results. We ensure the formal correctness of our algorithm through reachability analysis and show that its computational complexity is polynomial in the state dimension for each trust-region iteration. In contrast to previous work, we also avoid the introduction of several algorithm parameters that require expert knowledge to tune, making the proposed synthesis approach easier to use for non-experts while guaranteeing state and input constraint satisfaction. Numerical benchmarks demonstrate the applicability of our novel synthesis approach.


I. INTRODUCTION
Many newly developed autonomous systems are safety-critical, such as automated vehicles or robots acting in human environments. As these autonomous systems become more capable, their increased complexity makes it virtually impossible to manually design controllers that always ensure their safety.
A wide range of safety-critical tasks for these autonomous systems can be classified as reach-avoid problems: An agent tries to steer a system to a given target while ensuring constraints on both the input and the state of the system. Human-robot collaboration for pick-and-place tasks or package delivery using drones can be interpreted as such reach-avoid problems. While these tasks have been successfully solved, their operational safety often cannot be guaranteed, which is especially problematic for safety-critical applications.

A. RELATED WORK
To solve these challenging problems while guaranteeing safety, many different approaches, surveyed subsequently, exist.

a) Hamilton-Jacobi equations: Hamilton-Jacobi equations provide a flexible way for computing reachable sets of systems with input and state constraints [1]. A reachable set contains all system states that are reachable within a given time frame for a given initial set and input set. Although approaches using the Hamilton-Jacobi equations generally scale exponentially with the number of states, they are rather popular since they naturally compute optimal controllers for differential games [2]; this is rather difficult to realize with other methods that only solve optimal control problems without any opponent. While these approaches synthesize controllers for systems described by ordinary differential equations, the method from [3] can synthesize robust and optimal feedback controllers for systems given by nonlinear partial differential equations. In summary, all approaches using Hamilton-Jacobi equations suffer from the curse of dimensionality, induced by the inherent complexity required to solve them.

b) Model abstractions: Another popular branch of research for controller synthesis uses model abstractions. Especially popular among these are symbolic model abstractions, which abstract hybrid, i.e., mixed discrete and continuous, dynamics to a purely discrete system [4], [5], [6], [7], [8], [9], [10], [11], [12]. This makes it possible to directly use the synthesis approaches developed by the computer science community for purely discrete systems. While earlier work was often only applicable to certain system classes [9], [13], [14], quantized inputs or states [15], [16], or required certain stability assumptions [17], [18], more recent approaches apply to a broader range of system classes and even consider complex task specifications formalized by temporal logic [19], [20], [21], [22], [23], [24]. The abstraction to discrete systems leads to an exponential worst-case complexity in the number of state variables and consequently often requires a prohibitively large amount of memory for higher-dimensional systems. Thus, there has recently been an increased focus on abstraction techniques that do not require state space discretization. However, these techniques either use mixed-integer linear programming (MILP), whose worst-case complexity is exponential in the number of integer variables [25], mainly focus on linear systems [26], or use specific system properties to more efficiently compute abstractions [27].

c) Control Lyapunov and control barrier functions: A possible way to circumvent the exponential complexity of abstraction-based approaches is to synthesize controllers based on Lyapunov-like functions. However, this only translates the controller synthesis problem into a synthesis problem for the required Lyapunov-like functions, which is often not much easier to solve. Mainly two techniques have been proposed so far: control Lyapunov functions (CLFs) [28] and control barrier functions (CBFs) [29]. Recent methods combine both techniques [30], [31], [32], [33], where CBFs encode constraints to ensure safety and CLFs guarantee the stability of the system. Since, by construction, both can be included in the optimization problem computing the optimal controller, they allow one to synthesize stable controllers adhering to both input and state constraints. To derive these barrier functions, however, all mentioned works restrict the class of nonlinear dynamics, in most cases to control-affine systems.

In contrast to formulating CBFs and CLFs manually, the works in [34], [35] solve reach-avoid problems for nonlinear systems by computing linear-quadratic regulators (LQRs) along candidate trajectories, where the region of attraction along these trajectories is estimated using automatically computed Lyapunov functions. While these approaches apply to general nonlinear dynamics, they only provide probabilistic guarantees. The work in [36] uses a similar approach as in [34] for piecewise affine (PWA) systems instead of nonlinear systems. However, the authors use mixed-integer programming, which has exponential worst-case complexity in the system dimension.

d) Model predictive control: Model predictive control (MPC) has been used extensively in industry for the past decades [37], [38] due to its ability to easily handle input and state constraints. Since early variants of MPC were not able to consider disturbances, tube-based MPC was introduced, which essentially keeps the system within a tube around a reference trajectory [39], [40], [41], [42], [43]. In contrast to, e.g., abstraction-based methods, which compute their controllers offline and then simply choose an appropriate controller online, implicit MPC computes the control inputs online. To combat potentially long online computation times due to repeated optimization, explicit MPC aims to solve the optimization problem offline directly [44], [45]; however, its computation often becomes exponential in the number of continuous state variables due to the necessity of state space partitioning when explicitly solving more complex optimization problems. That said, recent advances in convex optimization theory enable real-time applications of robust MPC for selected classes of systems, such as linear systems [46], [47].

e) Reachability analysis: Reachability analysis has gained attention as an efficient tool for verification tasks [48]. In [49], the authors combine reachability analysis and nonlinear optimization to directly synthesize a feedback controller for linear time-invariant systems. Since this approach requires the recomputation of the reachable set in each optimization iteration, the work in [50] computes an approximation of the reachable set, which is directly parameterized in the control parameters to be computed. Thus, it is possible to quickly and efficiently synthesize a linear, piecewise constant controller for general, input-constrained nonlinear systems by solving a single linear program. Due to the nature of piecewise constant controllers, however, performance degrades with decreasing sampling frequency or when disturbances dominate. To remedy this shortcoming, the authors of [51] combine the feedforward control computation from [50] and the feedback optimization from [49], enabling controller synthesis that is provably correct while respecting input and state constraints, at the cost of computing the reachable set in each optimization iteration. In [52], the authors propose a generalization of this combined controller synthesis to realize polynomial feedforward controllers.

B. CONTRIBUTIONS
To circumvent the recomputation of reachable sets during each optimization iteration, we propose an algorithm which, for the first time, combines the synthesis of a piecewise constant, polynomial, state-dependent feedforward controller and a continuous state feedback controller. This combined synthesis avoids the introduction of additional algorithm parameters as in previous works, which need to be tuned by experts. To that end, we combine approximations for the undisturbed feedforward reachable set and the disturbance tube, which enlarges the undisturbed feedforward reachable set, to formulate a single non-convex, continuously differentiable optimization problem for the combined synthesis. To ensure the accuracy of this approximated optimization problem, we restrict its domain using trust regions, i.e., areas of the control parameters for which we trust the accuracy of the aforementioned approximations. The trust regions are iteratively updated, and we employ overapproximative reachability analysis in each iteration to guarantee formal correctness. Because these overapproximative reachable sets are available after each iteration, we only approximate the relative difference to these known, tight sets. As a result, we are robust against approximation errors because they do not accumulate over the iterations. In summary, our approach is fully automated and only requires the user to provide the number of controllers and the polynomial order of the feedforward controllers in the initial state (e.g., linear or quadratic).

II. PRELIMINARIES
We first introduce necessary notation and define the class of systems considered in this article as well as the concept of reachable sets. We then briefly describe all required set representations.

A. NOTATION
We denote with R, R_≥0, R_+, N, N_+, S, and S_++ the sets of real numbers, non-negative real numbers, positive real numbers, natural numbers, positive natural numbers, symmetric matrices, and the cone of positive-definite matrices, respectively. The identity matrix of dimension n is denoted by I_n, and we define 1_n and 1_{n×m} as the n-dimensional all-ones vector and the n-by-m-dimensional all-ones matrix, respectively. For two vectors w ∈ R^n and v ∈ R^n, we introduce the multi-index notation w^v = ∏_{i=1}^n w_i^{v_i}. We use diag(v) ∈ R^{n×n} to construct a matrix with v ∈ R^n as its diagonal elements. Given a matrix M ∈ R^{m_1×m_2}, M_(:) ∈ R^{m_1 m_2} is the vector that results from stacking all columns of M. Further, for a function x : R_≥0 → R^n and time t ∈ R_≥0, we define ẋ(t) = dx(t)/dt. We denote sets by upper-case calligraphic letters. The Minkowski sum of two sets A ⊆ R^n and B ⊆ R^n is defined as A ⊕ B = {a + b | a ∈ A, b ∈ B}. We use O(·) for the Landau notation of the asymptotic computational complexity.
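The multi-index power and the Minkowski sum can be illustrated numerically; the following minimal sketch (function names are ours, not from the article) applies the Minkowski sum to finite point sets only, since general sets require the representations introduced below.

```python
import numpy as np

def multi_index_power(w, v):
    """Evaluate w^v = prod_i w_i**v_i for equal-length vectors w and v."""
    return float(np.prod(np.asarray(w, dtype=float) ** np.asarray(v, dtype=float)))

def minkowski_sum_points(A, B):
    """Minkowski sum A + B = {a + b | a in A, b in B} of two finite point sets."""
    return {tuple(np.asarray(a, dtype=float) + np.asarray(b, dtype=float))
            for a in A for b in B}

# w^v with w = (2, 3), v = (1, 2) gives 2^1 * 3^2 = 18.
p = multi_index_power([2.0, 3.0], [1, 2])
```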

B. DEFINITIONS
In this article, we synthesize controllers of a given system for a set of initial states instead of a single initial state. Hence, we first introduce the considered system class and then use it to introduce the notion of reachable sets.
Definition 1 (System): We consider a system with dynamics ẋ(t) = f(x(t), u(t), w(t)) ∈ R^{n_x}, where f is a twice continuously differentiable function, x(t) ∈ R^{n_x} denotes the state of the system at time t ∈ R_≥0, u(t) ∈ U denotes the controllable input from the input set U ⊂ R^{n_u}, and w(t) ∈ W is the uncontrollable disturbance from the disturbance set W ⊂ R^{n_w}. Further, X^(0) ⊂ R^{n_x} denotes the set of initial states, i.e., x(0) ∈ X^(0).

Definition 2 (Reachable Set): For a system as given in Definition 1, we define the reachable set at time t as the set of all states the system can reach at time t from some initial state x(0) ∈ X^(0) under some admissible input trajectory with u(t) ∈ U and some admissible disturbance trajectory with w(t) ∈ W.

Since, in general, computing the exact reachable set R_x^(e) is not possible [53], we use overapproximative reachable sets R_x to ensure formal correctness and approximative reachable sets R̂_x for the iterative optimization of the controller. For easier readability, we do not explicitly state the dependence of R_x^(e), R_x, or R̂_x on the initial set X^(0), input set U, and disturbance set W.
We require different set representations. For convenience, we first describe the concept of a generating function to generate a set. Then, we introduce zonotopes as a popular set representation for reachability analysis and polynomial zonotopes [54], [55] to represent non-convex sets. Subsequently, we define ellipsoids, which are helpful due to their compact representation size, and H-polytopes to intuitively represent constraints. Lastly, we introduce support functions, which enable us to easily extend a set in a given direction.
Definition 3 (Set Generation): We define S = {s(Γ) | Γ ∈ [−1, 1]^{p×m}} =: {s(Γ)}_Γ, where s : [−1, 1]^{p×m} → R^n is the generating function of S ⊂ R^n and Γ ∈ [−1, 1]^{p×m} are the dependent factors of S. We say that S is generated by s(Γ) over Γ.

Definition 4 (Zonotope): A zonotope Z = ⟨c, G⟩_Z with center c ∈ R^n and generator matrix G ∈ R^{n×m} is given by Z = {c + Gν | ν ∈ [−1, 1]^m}, where c + Gν is its generating function with dependent factors ν ∈ [−1, 1]^m.

Definition 5 (Polynomial Zonotope): Let c ∈ R^n be the starting point, G = [g^(1), ..., g^(m)] ∈ R^{n×m} the generator matrix with generators g^(i) ∈ R^n for i ∈ {1, ..., m}, and E = [e^(1), ..., e^(m)] ∈ N^{d×m} the exponent matrix of a polynomial zonotope. Its generating function is defined as h(ν) = c + Σ_{i=1}^m g^(i) ν^{e^(i)} with dependent factors ν ∈ [−1, 1]^d, where we use the multi-index notation defined in Section II-A, so that we can construct a polynomial zonotope as PZ = {h(ν)}_ν.
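The two generating functions above can be evaluated directly; the following sketch (our own helper names, assuming the definitions as reconstructed) computes a point of a zonotope and of a polynomial zonotope for given dependent factors.

```python
import numpy as np

def zonotope_point(c, G, nu):
    """Point c + G @ nu of the zonotope <c, G>_Z for |nu_i| <= 1."""
    return np.asarray(c, dtype=float) + np.asarray(G, dtype=float) @ np.asarray(nu, dtype=float)

def poly_zonotope_point(c, G, E, nu):
    """Point h(nu) = c + sum_i g^(i) * prod_k nu_k**E[k, i] of a polynomial zonotope."""
    nu = np.asarray(nu, dtype=float)
    # One monomial nu^{e^(i)} per generator, via the multi-index notation.
    monomials = np.prod(nu[:, None] ** np.asarray(E, dtype=float), axis=0)
    return np.asarray(c, dtype=float) + np.asarray(G, dtype=float) @ monomials
```

For a single dependent factor with exponents E = [[1, 2]], the generators are weighted by ν and ν², which is exactly what makes the represented set non-convex in general.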
Definition 6 (Ellipsoid): An ellipsoid E = ⟨c, Q⟩_E with center c ∈ R^n and shape matrix Q ∈ S^{n×n}_++ is defined by E = {x ∈ R^n | (x − c)^T Q^{−1} (x − c) ≤ 1}. We only consider non-degenerate ellipsoids, i.e., Q^{−1} exists, in this article. For a given matrix M ∈ R^{m×n}, the linear map of an ellipsoid is [56] M⟨c, Q⟩_E = ⟨Mc, MQM^T⟩_E.

Definition 7 (H-Polytope): An n-dimensional H-polytope ⟨C, d⟩_H with matrix C ∈ R^{m×n} and offset d ∈ R^m is given by ⟨C, d⟩_H = {x ∈ R^n | Cx ≤ d}.

The support function of a set S ⊆ R^n in a direction l ∈ R^n is defined as ρ_S(l) = sup_{x∈S} l^T x; for multiple directions, we stack them in L^T = [l^(1), ..., l^(o)] ∈ R^{n×o}. In this article, sets will often be parameterized, e.g., M(z) ⊆ R^n for z ∈ R^m. The corresponding support function will append these arguments, i.e., the support function for M(z) is then given by ρ_M(l, z). We are now ready to formulate the problem statement and propose our solution concept.
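Before moving on, the closed-form support functions used later can be sketched. For a zonotope ⟨c, G⟩_Z one has ρ(l) = l^T c + ‖G^T l‖_1, and for an ellipsoid ⟨c, Q⟩_E one has ρ(l) = l^T c + √(l^T Q l); the helper names below are ours.

```python
import numpy as np

def support_zonotope(l, c, G):
    """Closed-form support function of a zonotope <c, G>_Z in direction l."""
    return float(l @ c + np.abs(G.T @ l).sum())

def support_ellipsoid(l, c, Q):
    """Closed-form support function of an ellipsoid <c, Q>_E in direction l."""
    return float(l @ c + np.sqrt(l @ Q @ l))

# Unit box (zonotope with G = I) in direction e_1 extends to 1;
# unit ball (ellipsoid with Q = I) in direction (3, 4) extends to 5.
rz = support_zonotope(np.array([1.0, 0.0]), np.zeros(2), np.eye(2))
re = support_ellipsoid(np.array([3.0, 4.0]), np.zeros(2), np.eye(2))
```

Both evaluations are cheap and differentiable almost everywhere, which is why support functions are attractive for the optimization formulated later.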

III. PROBLEM STATEMENT AND SOLUTION CONCEPT
For the remainder of this article, we assume that the initial set X^(0) and the disturbance set W are zonotopes. We make no assumptions about the statistical nature of W but take W to be centered at 0; the system dynamics can always absorb any non-zero center.
We want to synthesize a controller u(t, x(t)), dependent on time t ∈ R_≥0 and state x(t) ∈ R^{n_x}, that steers a set of initial states X^(0) ⊂ R^{n_x} as close as possible to a target state x_f ∈ R^{n_x} within time t_f ∈ R_+, while bounded input constraints U ⊂ R^{n_u} and (possibly unbounded) state constraints X ⊆ R^{n_x}, both given as H-polytopes, are to be respected, i.e.,
To obtain a tractable optimization problem, we parameterize the controller for each time interval t ∈ τ^(i) as u(t, x(t)) = u_ff(x(0), P^(i)) + K(t)Δx(t, P, K), (2) where m ∈ N_+ is the number of piecewise constant feedforward and feedback controller pairs, P^(i) ∈ [−1, 1]^{n_u×a} are the feedforward control parameters (a ∈ N_+ is the number of feedforward control parameters per input dimension), P = [P^(0)T, ..., P^(m−1)T]^T collects all m feedforward parameter matrices, K(t) = K^(i) ∈ R^{n_u×n_x} for t ∈ τ^(i) are the feedback gain matrices, and x_ff(t, P) is defined as the solution of ẋ_ff(t, P) = f(x_ff(t, P), u_ff(x(0), P^(i)), 0), (3) for t ∈ τ^(i) and x(0) = x_ff(0, P) ∈ X^(0). We often omit indices and time dependencies where clear from context for readability, e.g., we write K instead of K^(i). The feedforward state x_ff can be interpreted as the state of the undisturbed system, and the proposed controller in (2) aims to follow the feedforward state as closely as possible by bringing the deviation vector Δx(t, P, K), implicitly defined by x(t, P, K) = x_ff(t, P) + Δx(t, P, K), (4) as close to zero as possible, where x(t, P, K) ∈ R_x(t, P, K), with R_x(t, P, K) being an overapproximative, closed-loop reachable set of the disturbed flow using the controller in (2). We denote by R_x_ff(t, P) the feedforward reachable set of the flow in (3) containing all x_ff(t, P) and construct S_Δx(t, P, K) to contain all state deviations Δx(t, P, K) as defined in (4). When interpreting (4) in a set-based manner, we accept some conservatism and replace the exact sum by the Minkowski sum, neglecting dependencies between sets, which yields R_x(t, P, K) ⊆ R_x_ff(t, P) ⊕ S_Δx(t, P, K). As shown later in Section VII, the added conservatism is negligible because both sets are only loosely coupled by the initial state. Let Δu(t, P, K) = K(t)Δx(t, P, K) ∈ S_Δu(t, P, K), where S_Δu(t, P, K) denotes the set of input deviations. We can then similarly write S_u(t, P, K) ⊆ S_u_ff(t, P) ⊕ S_Δu(t, P, K), where S_u(t, P, K) and S_u_ff(t, P) contain all combined inputs u(t, x, P, K) and feedforward inputs u_ff(x(0), P^(i)) for t ∈ τ^(i), respectively. With these simplifications, (1) is relaxed to (5). Even though (5) is now easier to solve than (1) due to the aforementioned simplifications, it still requires reachable sets to guarantee formal correctness. Since this is computationally expensive, we instead solve an approximation of (5) by finding approximations to all required sets, followed by overapproximative reachability analysis to ensure constraint satisfaction. We illustrate our solution concept with Example 1, visualized in Fig. 1, where we omit the feedback controller and limit ourselves to a single feedforward controller, i.e., m = 1, subject to input constraints for illustrative purposes.
Example 1: Consider the zonotope overapproximation [55, Prop. 5] of the final, closed-loop reachable set R_x(t_f, P) of a 1-dimensional system, where ū_ff(β, P) = p_1 + p_2 β is the controller template (see Section IV-A for details) and β ∈ [−1, 1] denotes the dependent factor of the initial set X^(0). The controller cost (see Section VI-B for details) is denoted by J(P); Fig. 1 shows the contour lines of J(P).
For the first iteration, we start at P̄ = 0 and initially set the trust-region radius γ ∈ (0, 1], which constrains the available set of feedforward parameters (see Section IV-B for details), to γ = 1/4, i.e., P_γ(P̄) = [−1/2, 1/2] (see (10)). Fig. 1(a) shows the contour lines of our approximation Ĵ(P) of J(P) in the first iteration. Clearly, the approximation Ĵ(P) is not very accurate; nevertheless, the step decreases the cost, so we accept it but shrink the trust-region radius γ for the next iteration. After another two accepted iterations (see Fig. 1(b) and (c)), we eventually arrive at a local minimum of J(P) in the fourth iteration (see Fig. 1(d)) and thus terminate the algorithm.
Since we can only accurately approximate the controller cost locally, we use the trust-region radius to restrict the domain of our approximation to a trust region, i.e., a bounded region of our control parameters for which we trust the approximation to be accurate.
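The accept/shrink logic sketched in the example can be written down generically. The following is a minimal sketch, not the article's algorithm: `J` stands in for the expensive, exact cost and `model_step` for solving a local subproblem within radius `gamma`; both interfaces are our assumptions.

```python
import numpy as np

def trust_region_minimize(J, model_step, p0, gamma=0.25, max_iter=50, tol=1e-8):
    """Generic trust-region loop: accept a candidate step only if the exact
    cost decreases, otherwise shrink the trust-region radius gamma."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        cand = model_step(p, gamma)   # candidate from the local model
        if J(cand) < J(p):
            p = cand                  # accept the step
        else:
            gamma *= 0.5              # reject and shrink the trust region
        if gamma < tol:
            break                     # radius collapsed: treat p as a local minimum
    return p

# Toy usage: minimize J(p) = (p - 3)^2 with a step clipped to the radius.
J = lambda p: float((p - 3.0) ** 2)
model_step = lambda p, g: p + np.clip(3.0 - p, -g, g)
p_star = trust_region_minimize(J, model_step, 0.0, gamma=1.0)
```

The essential property mirrored here is that rejected steps never move the iterate, so approximation errors in the local model cannot accumulate over iterations.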
The remainder of this article is structured as follows: We derive the set of feedforward inputs S_u_ff(t, P) and the approximation for the feedforward reachable set R̂_x_ff(t, P) in Section IV. Then, we compute approximations to the state deviation set S_Δx(t, P, K) and the input deviation set S_Δu(t, P, K) in Section V. In Section VI, we formulate an approximation to (5) using findings from Sections IV and V, show how we can control the accuracy of this approximation using trust regions, and derive the computational complexity of the proposed algorithm. Finally, we demonstrate the applicability of our novel approach using numerical benchmarks in Section VII.

IV. FEEDFORWARD REACHABLE SET
To compute an approximation of the feedforward reachable set, we first introduce the parameterization of the feedforward controller u_ff(x(0), P^(i)) in Section IV-A. We then discuss the feedforward reachable set computation using the parameterized controller in Section IV-B.

A. CONTROLLER TEMPLATE
As defined in Section III, the feedforward controller is state-dependent, i.e., we compute a feedforward control law for each initial state x(0) ∈ X^(0) = {c + Gβ}_β with center c ∈ R^{n_x}, generator matrix G ∈ R^{n_x×l}, and dependent factors β ∈ [−1, 1]^l. For numerical purposes, it will be beneficial to parameterize the initial set in its dependent factors β. Let us define E = [e^(1), ..., e^(a)] ∈ N^{l×a} as a matrix of exponents and P^(i) = [p^(i,1), ..., p^(i,a)] ∈ R^{n_u×a} as the matrix of feedforward parameters for 0 ≤ i ≤ m − 1. In [52], the parameterization ū_ff(β, P^(i)) is proposed, where Ū = {c_Ū + G_Ū α}_α is the input set overapproximated by a parallelotope to achieve uniform scaling: By comparing (6) with ū(α) = c_Ū + G_Ū α, we can parameterize α instead of the input directly. Further, we have U ⊆ Ū, and the parameterization ensures uniform scaling of P^(i), i.e., the absolute magnitude of each element is less than or equal to one. When U is a zonotope, the parallelotope enclosure can be achieved through order reduction [57]. A halfspace representation of U, as assumed in this article, can be enclosed as follows: Proof: We shift M by c so that 0 ∈ M ⊕ {−c} and use the shape matrix Q to transform M into roughly a hypercube. Since computing the maximum-volume ellipsoid of U can be posed as a semi-definite programming (SDP) problem [58, Sec. 8.4.2], Proposition 1 can be efficiently solved. The computed overapproximation Ū is only required for the proper scaling of P^(i); the original input set U is used to verify input constraint satisfaction.
If X^(0) is given as a non-degenerate parallelotope, i.e., G^{−1} exists, the controller template for the state x(0) = c + Gβ ∈ X^(0) is given by u_ff(x(0), P^(i)) = ū_ff(G^{−1}(x(0) − c), P^(i)). If X^(0) is a zonotope, the dependent factors β for a given x(0) can be obtained by solving x(0) = c + Gβ subject to ‖β‖_∞ ≤ 1 using linear programming. The exact set of possible feedforward inputs is then given by the set of all u_ff(x(0), P^(i)) over x(0) ∈ X^(0).
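For the parallelotope case, recovering the dependent factors is a single linear solve; the sketch below (our helper name, assuming a non-degenerate G) checks on top that the recovered factors indeed satisfy ‖β‖_∞ ≤ 1.

```python
import numpy as np

def dependent_factors(x0, c, G):
    """Recover b with x0 = c + G b for a non-degenerate parallelotope <c, G>_Z."""
    b = np.linalg.solve(np.asarray(G, dtype=float),
                        np.asarray(x0, dtype=float) - np.asarray(c, dtype=float))
    # Membership check: x0 lies in the parallelotope iff max |b_i| <= 1.
    assert np.max(np.abs(b)) <= 1.0 + 1e-9, "x0 lies outside the parallelotope"
    return b
```

For a general zonotope, G is not square and the linear program mentioned above takes the place of this solve.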

B. REACHABLE SET COMPUTATION
To efficiently solve (5), we approximate the feedforward reachable set so that it is obtained by the evaluation of a polynomial map without the need to recompute the reachable set for every given P.
We can construct such a polynomial map using polynomial zonotopes [55] and an extended system state as follows: Given the extended state x_ext = [x_ff^T, u_ff^T]^T, which is necessary to retain the dependency of both the initial state and the input on β, the extended initial set is constructed from x^(0)(β) = c + Gβ, the generating function of X^(0). Applying reachability analysis as described in [55] eventually yields the parameterized approximation R̂_x_ff(t, P) in (11) for t ∈ [0, t_f] and P ∈ P = [−1, 1]^{mn_u×a}, where D(t, P) is the part of R̂_x_ff(t, P) that retains its dependence on P and ⟨c_err, G_err⟩_Z is the zonotope bounding abstraction and reduction errors. However, the size of the error ⟨c_err, G_err⟩_Z depends on the size of P, as demonstrated next.
Example 2: Let us consider a controlled Van der Pol oscillator with undisturbed dynamics f(x, u) and E = I_2. We compute R̂_x_ff(t_f, P) as described above. Fig. 2 visualizes the reachable sets R̂_x_ff(t_f, P)|_{P=P̄} for P̄ = 0.1 · 1_{mn_u×a}, computed for both P ∈ 0.1 · P and P ∈ P, and compares them to the overapproximative reachable set R_x_ff(t_f, P), which needs to be recomputed for each P. Evidently, choosing P from a larger set affects the accuracy of the parameterized reachable set. For both set sizes, however, we notice that D(t_f, P) approximates R_x_ff(t_f, P) reasonably well, even if ⟨c_err, G_err⟩_Z is much larger for P ∈ P.
Thus, we avoid a large input set by computing R̂_x_ff(t, P) only for P ∈ P_γ(P̄) as defined in (10), where the trust-region radius γ ∈ (0, 1] is reduced until the approximation is accurate enough. We include the factor 2 in (10) so that, for γ = 1, the trust region P_γ(P̄) covers the complete parameter set P regardless of P̄ (cf. (9)).

V. DISTURBANCE SET COMPUTATION
To solve (5), we also require approximations to the state deviation set S_Δx(t, P, K) and the input deviation set S_Δu(t, P, K), which we derive in Section V-A. In Section V-B, we then briefly describe a parameterization of the gain matrices K^(i) for 0 ≤ i ≤ m − 1 using linear-quadratic regulator (LQR) control from [51, Sec. IV.B] to reduce the number of optimization variables.

A. REACHABLE SET COMPUTATION
For now, we assume that all gain matrices K(t, z) = K^(i)(z) for t ∈ τ^(i) with 0 ≤ i ≤ m − 1 are parameterized in two matrices Q and R as well as the feedforward parameters P, collected in the vector z = [P^T_(:), Q^T_(:), R^T_(:)]^T, where Q and R as well as the parameterization itself will be introduced in Section V-B.
To compute an approximation of the disturbance tube, we find an approximate flow equation for the state deviation Δx by a first-order Taylor expansion of the dynamics around x = x̄(t, P), u = ū(t, P), and w = 0, linearized anew on each time interval of length δ = t_f/(mq) for 0 ≤ k ≤ mq − 1, so that we obtain mq linear time-invariant (LTI) systems for q ∈ N_+. Here, we use the centers of the zonotope overapproximations [55, Prop. 5] Z_u_ff(t, P) (16) and Z_x_ff(t, P) = ⟨c_x_ff(t, P), G_x_ff(t, P)⟩_Z (17) of S_u_ff(t, P) and R̂_x_ff(t, P) from (8) and (11) as an approximation of their geometric centers, since the starting point of a polynomial zonotope (see Definition 5) is not a good approximation of its geometric center due to its polynomial dependent factors. Furthermore, we require both zonotope overapproximations in Section VI-A since zonotopes make an efficient evaluation of their support function possible [59].
We remark here that we do not assume the same linearized dynamics for the complete time horizon [0, t_f] but rather compute a new linearization for each time interval t ∈ [kδ, (k + 1)δ].
A similar approach, which uses (12) to approximate the deviation vector Δx for the robustification of an optimal control problem, is described in [60]. In general, approximation errors caused by the first-order Taylor series and the linearization remain small due to the corrective actions of the controller, so that the deviation from the feedforward solution stays small; that said, the linearized flow can become inaccurate for large disturbance sets W. However, since we do not depend on an exact approximation, approximation errors do not affect the soundness of our approach. The reachable set Z_Δx(t, z) of the approximated flow (12) with Z_Δx(0, z) = {0} can be efficiently computed using zonotopes, as only Minkowski sums and linear maps are required, under which zonotopes are closed. Since the representation size of Z_Δx(t, z) can become quite large (due to computing many Minkowski sums), we avoid its explicit computation by replacing all set constraints in (5) with support functions evaluated in the normal directions given by the H-representations of the polytopic input and state constraints (see Section VI-A for details; also [61]). As we use overapproximative reachability analysis for each trust-region iteration (see Section VI-B and Algorithm 1), the extended reachable set at z = z̄ is available (see Section VI-B for details) and hence the unknown support function ρ_Δx(l, t, z) of the zonotope overapproximation Z_Δx(t, z) ⊇ S_Δx(t, z) in a direction l ∈ R^{n_x} can be efficiently computed at z = z̄. We first define a rough approximation ρ̂^(E)_Δx(l, t, z) of this unknown support function, scaled such that ρ̂^(E)_Δx(l, t, z̄) = ρ_Δx(l, t, z̄). Further, E_Δx(t, z) = ⟨0, Q_Δx(t, z)⟩_E is obtained by computing the reachable set of the linearized flow (12) using an ellipsoidal enclosure of the disturbance set
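The zonotope propagation alluded to above uses only the two closure properties named in the text. A minimal discrete-time sketch (not the article's implementation; names are ours) for an update Δx⁺ = Ā Δx + w with w from a zonotope ⟨0, G_w⟩_Z:

```python
import numpy as np

def propagate_zonotope(c, G, A, Gw):
    """One step of x+ = A x + w with x in <c, G>_Z and w in <0, Gw>_Z.
    Linear map: <A c, A G>_Z; Minkowski sum: concatenate generator matrices."""
    c_next = A @ c
    G_next = np.hstack([A @ G, Gw])  # generator count grows with each Minkowski sum
    return c_next, G_next
```

The growing generator matrix is exactly the representation-size issue mentioned above, which motivates working with support functions instead of the explicit set.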

(see, e.g., [62]). For t + δ ≤ t_f, we thus have the ellipsoidal propagation (18) [63] with Ā(t, z) = A(t, P) + B(t, P)K(t, z) and where E_Δx(0, z) = {0}. We approximate the Minkowski sum of two ellipsoids required in (18) by the sum of their shape matrices. The approximation to the unknown support function ρ_Δx(l, t, z) can then be defined as in (19), so that we only approximate the unknown difference ρ_Δx(l, t, z) − ρ_Δx(l, t, z̄) by ρ̂^(E)_Δx(l, t, z) − ρ̂^(E)_Δx(l, t, z̄), since ρ̂_Δx(l, t, z̄) = ρ_Δx(l, t, z̄) holds for z = z̄, which follows by substituting z̄ into (19). Furthermore, since u(t, x, z) = u_ff(x(0), P^(i)) + K(t, z)Δx(t, z) for t ∈ τ^(i) with 0 ≤ i ≤ m − 1, the approximated support function ρ̂_Δu(l, t, z) can be derived analogously to ρ̂_Δx(l, t, z). Next, we introduce the feedback matrix parameterization.
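The shape-matrix sum used for the ellipsoidal Minkowski sum is an approximation, not an enclosure: since √(a + b) ≤ √a + √b, the ellipsoid ⟨0, Q₁ + Q₂⟩_E is contained in the true sum and its support function can be strictly smaller. A quick numerical check (our helper names) makes this visible:

```python
import numpy as np

def ellipsoid_sum_shape(Q1, Q2):
    """Shape-matrix approximation of the Minkowski sum of <0, Q1>_E and <0, Q2>_E."""
    return Q1 + Q2

def support_ellipsoid(l, Q):
    """Support function of the centered ellipsoid <0, Q>_E in direction l."""
    return float(np.sqrt(l @ Q @ l))

l = np.array([1.0, 0.0])
Q1, Q2 = np.diag([1.0, 0.25]), np.diag([0.25, 1.0])
# Support of the exact Minkowski sum is the sum of supports: 1.0 + 0.5 = 1.5.
exact = support_ellipsoid(l, Q1) + support_ellipsoid(l, Q2)
# The shape-matrix sum gives sqrt(1.25), an underestimate in this direction.
approx = support_ellipsoid(l, ellipsoid_sum_shape(Q1, Q2))
```

This is unproblematic here because, as stated above, the approximation only enters the trust-region model; soundness is ensured separately by overapproximative reachability analysis.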

B. FEEDBACK MATRIX PARAMETERIZATION
Directly optimizing over all gain matrices K^(i) for 0 ≤ i ≤ m − 1 as in (5) requires mn_u n_x optimization variables, since K^(i) ∈ R^{n_u×n_x}. Therefore, we introduce a parameterization of the feedback matrices using LQR control [51, Sec. IV.B] to reduce the number of optimization variables.
With m feedback matrices to be computed, it makes sense to only consider those K^(i) that asymptotically stabilize the m LTI systems with system matrices A(t, P) + B(t, P)K^(i) and input matrices V(t, P) as defined in (13) to (15), but which are now assumed to be constant in time for t ∈ τ^(i). For any such system and matrices Q ∈ S^{n_x×n_x}_++ and R ∈ S^{n_u×n_u}_++, LQR control only generates feedback matrices that result in an asymptotically stable system by design, assuming that all m LTI systems are controllable for all P. This motivates the parameterization K(t, z) = K^(i)(z) for t ∈ τ^(i) with z = [P^T_(:), Q^T_(:), R^T_(:)]^T, so that the optimization over K in (5) can be replaced by an optimization over z. We refer the reader to the Appendix for a proof of the differentiability of K with respect to z. If controllability cannot be assumed, one can always directly optimize over K. Next, we introduce the novel trust-region synthesis approach.
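The parameterization K = K(Q, R) can be illustrated with a small sketch. The article uses continuous-time LQR from [51]; the following is a hedged, discrete-time stand-in (our own helper, computing the gain by fixed-point iteration of the Riccati recursion on an illustrative double-integrator model) that shows the key property: for any Q ≻ 0, R ≻ 0, the resulting closed loop is asymptotically stable.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """LQR gain via fixed-point iteration of the discrete-time Riccati recursion:
    P <- Q + A^T P (A - B K) with K = (R + B^T P B)^{-1} B^T P A."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative double integrator, dt = 0.1
B = np.array([[0.0], [0.1]])
K = dlqr_gain(A, B, np.eye(2), np.eye(1))
rho = max(abs(np.linalg.eigvals(A - B @ K)))  # closed-loop spectral radius
```

Optimizing over z = (P, Q, R) instead of K directly then searches only within this stabilizing family, which is what reduces the number of optimization variables.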

VI. COMBINED SYNTHESIS
We now propose our novel trust-region approach for the combined synthesis problem. In Section VI-A, we construct a continuously differentiable optimization problem as an approximation to (5) using findings from Section IV and Section V, and then describe the proposed iterative algorithm in Section VI-B. In Section VI-C, we discuss the approximation accuracy of the trust-region subproblem, followed by a short description of the computational complexity of the algorithm in Section VI-D.

A. TRUST-REGION SUBPROBLEM
Given the current control parameters P̄, Q̄, and R̄, where ρ_u_ff is the support function of Z_u_ff from (16) and ρ̂_Δx, ρ̂_Δu are the approximated support functions as described in Section V-A, a critical point (ẑ, ŝ) of (ẑ, ŝ) = arg min_{z,s} Ĵ_TR(z, s) (20a) is an approximation of (5), as explained subsequently:

• (20a) and (21): We replace R_x_ff(t, P) with Z_x_ff(t, P) = ⟨c_x_ff(t, P), G_x_ff(t, P)⟩_Z from (17) and construct (21) as follows: We penalize deviations from x_f (first term) and the size of its generators (second term). Further, we penalize the size of the disturbance tube by penalizing its support function values in the unit directions (third term) and introduce σ ∈ R_+, which is chosen large enough (see, e.g., [64] for a discussion on the choice of σ) such that achieving feasibility (s = 0) is always prioritized over minimizing the objective value (fourth term).

• (20b): The input constraints follow from [65, Prop. 2.3]. Further, we relax the resulting input constraint ĝ_U(z) ≤ 0 by adding s_U ≥ 0 to the right-hand side, which ensures that a feasible solution always exists, even if there is no feasible solution for the original constraint with s_U = 0. Finally, the maximization of the support functions over t in ĝ_U(z) can be approximated by evaluating (22) over smaller time intervals and forming a discrete maximum over all these smaller time interval solutions: The input feedforward support function ρ_u_ff for each time interval τ^(i) with 0 ≤ i ≤ m − 1 can be directly computed (see (16) and (8)). For ρ̂_Δu, we can replace the approximated support function over a short time interval with the approximated support function at a finite number of time points for a large enough q.
• (20c): Analogous to the input constraints (see (20b)).
• (20d): We enforce P ∈ P_γ(P̄) ∩ P using the trust-region radius γ ∈ (0, 1] as defined in Section IV-B to construct a trust region for the feedforward parameters, ensuring that R̃_{x_ff}(t, P) is accurate and P ∈ P = [−1, 1]^{m n_u × a}.
• (20e) and (20f): Let Q ⊂ S_++^{n_x × n_x} and R ⊂ S_++^{n_u × n_u} be two bounded sets. We define two generating functions h_Q(M) ∈ S^{n_x × n_x}, where h_Q(0) = 0 with M being a dependent factor matrix, and h_R(N) ∈ S^{n_u × n_u}, where h_R(0) = 0 with N being a dependent factor matrix. Similarly to the trust-region radius γ, which limits the domain of the feedforward parameters P, we introduce the trust-region radius η ∈ (0, 1] to confine Q ∈ Q and R ∈ R to smaller sets around Q̄ and R̄. Naturally, Q and h_Q as well as R and h_R need to be chosen such that (20e) and (20f) can be reformulated as smooth constraints in Q and R only.
To avoid the semi-definite constraints (20e) and (20f), Q and R can be chosen as diagonal matrices with positive entries, as done in this article, or as strictly diagonally dominant matrices with positive diagonal entries, which is a sufficient (but not necessary) condition for positive definiteness (this follows from the Gershgorin circle theorem [66, Th. 0]). Note that the evaluation of (20b) and (20c) requires the implicit maximization over smaller time intervals, as discussed above, to approximate the continuous maximization over t ∈ [0, t_f] in (22) and (23). For easier readability, we choose not to reformulate these maximization expressions and the absolute values in (20a) with additional auxiliary variables. With these reformulations, (20) is a non-convex, continuously differentiable optimization problem.
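The role of the slack variables and the penalty weight σ can be illustrated on a toy problem: for a sufficiently large σ, minimizing the exact-penalty objective drives the slack to zero whenever the constraint is satisfiable, so feasibility is prioritized. The following Python sketch is purely illustrative; the cost f, the constraint g, and all names are stand-ins, not the actual functions from (20).

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D stand-in for the subproblem in (20): minimize a cost f(z)
# subject to a constraint g(z) <= 0, relaxed with a slack s >= 0 that is
# penalized by a large weight sigma so that feasibility is prioritized.
sigma = 1000.0

def f(z):        # stand-in objective: deviation from a target value z = 2
    return (z - 2.0) ** 2

def g(z):        # stand-in constraint g(z) <= 0, i.e. z <= 1
    return z - 1.0

def J_TR(x):     # x = [z, s]; exact-penalty objective
    z, s = x
    return f(z) + sigma * s

res = minimize(
    J_TR, x0=[0.0, 0.0],
    constraints=[{"type": "ineq", "fun": lambda x: x[1] - g(x[0])},  # g(z) <= s
                 {"type": "ineq", "fun": lambda x: x[1]}],           # s >= 0
)
z_hat, s_hat = res.x
# For sigma large enough, the slack vanishes and the constraint is active:
# the relaxed problem is always feasible, yet its solution satisfies g(z) <= 0.
```

At the computed critical point, the slack indeed equals max(0, g(ẑ)) = 0, mirroring the property used in Section VI-B.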

B. TRUST-REGION ALGORITHM
We now propose our novel trust-region algorithm: In each iteration, we solve the subproblem in (20), evaluate the newly computed controller on a subsequently defined cost function using overapproximative reachability analysis, and accept the step only if the cost decreases. At the end of each iteration, we tune γ and η so that future solutions to (20) also decrease the cost. We describe the solution procedure in Algorithm 1, where each step is explained in detail subsequently.
a) Initialization (l. 1-2): We use the approach from [50], [51] to generate an initial guess P̄, initially set Q̄ = I_{n_x} and R̄ = I_{n_u} (or any other user-defined initial guess) and start with the complete range of P, Q, and R by setting γ = η = 1 (or any other user-defined values).
b) Initial evaluation of solution (l. 3): At a critical point (ẑ, ŝ) of (20), it can be shown (see proof of Theorem 1) that ‖ŝ‖_1 = ‖max(0, g(ẑ))‖_1, where g(z) = [g̃_U(z)^T, g̃_X(z)^T]^T collects (22) and (23) in a vector. Therefore, let J̃(ẑ) = J_TR(ẑ, max(0, g(ẑ))).
However, since J̃(ẑ) is only an approximated cost, we need to evaluate the controller using overapproximative reachability analysis to measure its actual cost. To preserve dependencies between the combined state, feedforward state, and feedforward input for a given initial state (similar to the reachability computation in Section IV-B), we extend the system state and compute the extended reachable set R_ext(t, z) based on the extended flow equation, whose zonotope overapproximation is given in (24). With (24), we can define the actual control cost J(z), which is given analogously to J̃(z) but where sets (and corresponding support functions) are replaced by their formally correct versions contained in (24).
c) Feedforward reachable set computation (l. 5): If the maximum number of iterations k_max ∈ N_+ is not yet reached, we compute the parameterized, undisturbed feedforward reachable set for P ∈ P_γ(P̄), as described in Section IV, at the start of each iteration.

d) Computation & evaluation of new candidate controller (l. 6-7):
As indicated in Sections IV and V, R_ext(t, z) and its zonotope overapproximation Z_ext(t, z) are required in (20) to construct the feedforward approximation in (11) and the support function approximation in (19), respectively. With ẑ being a solution to (20), we then compute its cost J(ẑ), which is used subsequently to determine if the current step is accepted. If the extended, overapproximative reachable set for the computation of J(ẑ) cannot be computed (see, e.g., Fig. 4), we shrink both γ and η and restart the iteration (not shown in Algorithm 1).
e) Parameter tuning (l. 9): Theorem 1 discusses the tuning of the trust-region radii γ and η such that at a critical point (ẑ, ŝ) of (20), the objective value J_TR(ẑ, ŝ) approaches J(ẑ).
f) Step acceptance/rejection (l. 10-16): If J(ẑ) < J(z) for the newly computed control parameters ẑ, the step is accepted. Furthermore, we terminate the algorithm if either the absolute or the relative difference in the controller cost between two accepted steps is small enough.
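The accept/reject logic described above follows the classical trust-region pattern: solve the subproblem within the current radius, accept the candidate only if the true cost decreases, and shrink the radius otherwise. A minimal Python sketch on a 1-D toy cost illustrates this loop; the cost J, the quadratic model, and all names are stand-ins for (20) and the reachability-based cost, not the actual algorithm.

```python
import numpy as np

# Toy "true" cost (stand-in for the reachability-based controller cost J).
def J(z):
    return np.cos(z) + 0.1 * z ** 2

# Quadratic model of J around z0 (stand-in for the subproblem in (20)).
def model(z, z0):
    dJ = -np.sin(z0) + 0.2 * z0
    d2J = -np.cos(z0) + 0.2
    return J(z0) + dJ * (z - z0) + 0.5 * max(d2J, 1.0) * (z - z0) ** 2

# Minimize the model within the trust region |z - z0| <= gamma (grid search).
def solve_subproblem(z0, gamma):
    zs = np.linspace(z0 - gamma, z0 + gamma, 201)
    return zs[np.argmin([model(zi, z0) for zi in zs])]

z, gamma, k_max = 2.5, 1.0, 50
for _ in range(k_max):
    z_hat = solve_subproblem(z, gamma)
    if J(z_hat) < J(z):                      # accept only if the true cost decreases
        z, gamma = z_hat, min(2 * gamma, 1.0)   # accept, cautiously widen the radius
    else:
        gamma *= 0.5                         # reject, shrink the trust region
    if gamma < 1e-8:                         # radius exhausted: terminate
        break
# z now approximates a local minimizer of J
```

The actual algorithm differs in that the radii γ and η are tuned via Theorem 1 rather than by simple doubling/halving, and the true cost requires a reachable set computation per evaluation.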

C. ACCURACY OF TRUST-REGION APPROXIMATION
Ideally, we want to minimize the controller cost J(z). However, to avoid recomputation of the overapproximative reachable set as much as possible, we instead solve the approximated problem in (20). Thus, it is crucial that the approximated objective value J_TR(ẑ, ŝ) at a critical point (ẑ, ŝ) of the trust-region problem (20) approximates J(ẑ) well enough. In this section, we show that the trust-region radii γ and η can indeed be tuned such that J_TR(ẑ, ŝ) approximates J(ẑ) arbitrarily closely; otherwise, the approximated trust-region problem (20) would potentially not model the real cost of the controller, and optimizing over it would become meaningless.
For the derivation of this result, we define J_x(z), which is given analogously to J(z) but where we replace the approximated feedforward reachable set R̃_{x_ff}(t, P) and its approximated support function ρ̃_{x_ff}(l, t, P) with their overapproximative versions from (24). Intuitively, we remove any error caused by approximating the feedforward reachable set, and thus J_x(z) is only inaccurate due to the approximated support functions ρ̃_x and ρ̃_u. We are now ready to state the main result of this section.

D. COMPUTATIONAL COMPLEXITY
Since the maximum number of iterations in Algorithm 1 is fixed, we only consider the computational complexity in the state dimension for one iteration of Algorithm 1 in this section. As Section VII (see Tables 1 and 2) indicates, the proposed algorithm typically terminates within a small number of iterations; thus, limiting the number of iterations to a reasonably small number often does not impede performance in practice. In this section, we show that the complexity of one iteration in Algorithm 1 is at most

O(n_x^5 + υ(ε) m ((n^6 + n^4) n_z^2 + q l ξ n^ω n_z^2)),

where n = max(n_x, n_u) and O(n^ω) with ω ≥ 2 is the complexity of multiplying two n × n matrices.
a) Reachable set computations: The computation of reachable sets (l. 5 for the approximated feedforward reachable set; l. 3 and 7 for the extended overapproximative reachable set) has complexity O(n_x^5) [67, Sec. 4.1.4].
b) Trust-region subproblem: The total number of function evaluations υ(ε) with υ : R_+ → N_+ required to solve (20) is polynomially dependent on the inverse of the requested solution accuracy ε > 0 if second-order methods are used [68]. Thus, we continue to derive the computational complexity for one objective and constraint function evaluation of (20), which is dominated by the computation of the Hessian matrix for each element of the approximated disturbance tube shape matrix Q_x(t, z) using (18). For its evaluation, we compute the Hessian matrix of each element of K^(i) for 0 ≤ i ≤ m − 1 in terms of z ∈ R^{n_z}, where the complexity of one evaluation is O((n^6 + n^4) n_z^2); we omit the proof to keep the presentation compact. Further, we need to evaluate (18) mq times, where one evaluation is dominated by the computation of the Hessian matrix of each element of the integral, within which the Hessian matrix computation for each matrix element of e^{Ā(t,z)δ} dominates with complexity O(ξ n^ω n_z^2). Here, ξ ∈ N_+ denotes the finite number of terms from the infinite series of e^{Ā(t,z)δ} required to compute the matrix exponential accurately enough, and O(n^ω n_z^2) is the complexity for each of the ξ terms, which follows from the complexity of multiplying two matrices but where each element multiplication is the outer product of two n_z-dimensional vectors. We compute the integral by solving the corresponding ordinary differential equation (ODE) with complexity O(l ξ n^ω n_z^2), where l ∈ N_+ collects the fixed number of function evaluations per ODE step (e.g., four evaluations for the classical Runge-Kutta method) and the total number of ODE steps required, which can be assumed to be fixed if a solver with a fixed step size is used. Thus, evaluating (18) mq times over υ(ε) optimization iterations yields the complexity O(υ(ε) m ((n^6 + n^4) n_z^2 + q l ξ n^ω n_z^2)).
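The ξ-term truncation of the matrix exponential series counted above can be sketched as follows, where each additional term costs one matrix product, i.e., O(n^ω) work; the function name and parameter values are illustrative.

```python
import numpy as np

# Truncated power series for the matrix exponential e^{A*delta}: the first
# xi terms of sum_k (A*delta)^k / k!. Each iteration performs one n x n
# matrix product, so the total cost is O(xi * n^omega).
def expm_truncated(A, delta, xi):
    n = A.shape[0]
    term = np.eye(n)          # k = 0 term: identity
    total = np.eye(n)
    for k in range(1, xi):
        term = term @ (A * delta) / k   # (A*delta)^k / k!, one product per term
        total = total + term
    return total

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) * 0.1   # small norm => fast series convergence
approx = expm_truncated(A, 0.5, 15)
```

In the complexity analysis, each of these series terms additionally carries a Hessian computation with respect to z, which multiplies the per-term cost by n_z^2.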

VII. EXPERIMENTS
We demonstrate the applicability of the novel iterative polynomial reachset optimal control (iPROC) approach by comparing its performance against the reachset optimal control (ROC) algorithm from [51] (both implemented in MATLAB) and the robustly complete control synthesis (ROCS) C++ toolbox [69], [70]. We first present a water tank benchmark [71] in more detail since it is easily extendable to an arbitrary number of states to compare scalability. In Section VII-B, we then shortly introduce other benchmarks from previous work [51] and from the applied verification for continuous and hybrid systems (ARCH) competition. We run all experiments on an Intel(R) Core(TM) i7-8650U processor with 24 GB of RAM. We note that ROC uses parallel computing to numerically approximate gradients and Hessian matrices, while iPROC computes them analytically and thus does not use parallel computing. We use the interior point optimizer (IPOPT) [72], specifically the MATLAB interface included in the OPTI toolbox, for iPROC and MATLAB's fmincon for ROC. Throughout this section, we set σ = 1000, use a linear feedforward controller with E = [0, I_{n_x}], and set μ = 1 × 10^−2. Overapproximative reachable sets are computed using the continuous reachability analyzer (CORA) toolbox [73]. CORA does not consider floating point errors for the reachable set computation; to account for these, one could integrate interval arithmetic as done in [74]. Our proposed algorithm will be implemented in the next release of the automated reachset optimal control (AROC) toolbox [75].

A. WATER TANKS
A system of n water tanks is given by (see [71] for details) with 2 ≤ i ≤ n, where x_k = h_k ∈ R_{≥0} (in m) for 1 ≤ k ≤ n is the water level of the k-th tank, u = l ∈ R_{≥0} (in m^3/s) is the inflow into the first tank, w = ν ∈ R (in m^3/s) is the uncontrollable inflow into the first tank, and k = 0.015 and g = 9.81.
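A minimal Python sketch of such cascaded tank dynamics is given below, assuming Torricelli outflow with unit tank cross-sections; the exact benchmark model is specified in [71], so this is only an illustration of the structure.

```python
import numpy as np

# Sketch of n cascaded water tanks: tank 1 receives the controlled inflow u
# and the disturbance inflow w; each tank drains into the next with a
# Torricelli-type outflow k * sqrt(2 * g * level) (unit cross-sections assumed).
k, g = 0.015, 9.81

def tank_ode(x, u, w):
    n = len(x)
    out = k * np.sqrt(2 * g * np.maximum(x, 0.0))  # outflow of each tank
    dx = np.empty(n)
    dx[0] = u + w - out[0]        # first tank: controlled + disturbance inflow
    dx[1:] = out[:-1] - out[1:]   # tank i is fed by the outflow of tank i-1
    return dx

# one explicit Euler step as a usage example
x = np.array([2.0, 1.0])
x = x + 0.1 * tank_ode(x, u=0.1, w=0.0)
```

By construction, the internal flows cancel, so the total water volume changes only through u, w, and the outflow of the last tank.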
We set m = 2, n = 2, and t_f = 120 s, and the final state constraint is X_f = X(0). Fig. 3 compares the performance of iPROC against ROC and ROCS, where all approaches find feasible solutions for the initial set (also see Table 2). Since the ROCS approach, to the best of our knowledge, does not directly realize reachable set minimization at a given final time, we manually tried to find the smallest goal region for which ROCS finds a controller for all initial states, here G_ROCS = [10, 10], for up to 8 tanks. We omit final state constraints to avoid infeasibility for higher dimensions, so that only input constraints are active. Our novel approach scales better with an increasing number of dimensions compared to both ROC and ROCS while mostly producing smaller final reachable sets. Further, Table 1 also displays the number of trust-region iterations of Algorithm 1 of the novel iPROC approach for the tank example, which indicates that the number of iterations barely grows with the number of state variables. Additionally, Fig. 4 shows the solver progress of iPROC, which demonstrates the accurate approximation of J(z) by J̃(z) as derived in Theorem 1.

B. BENCHMARKS FROM PREVIOUS WORK AND THE ARCH COMPETITION
In this section, we first look at the car benchmark from [51], where a left turn maneuver using a kinematic model is to be performed under input and final state constraints. We also synthesize controllers for a spacecraft approach from [76, Sec. 3.6], where a spacecraft attempts docking with another spacecraft under input, state, and final state constraints. To keep the presentation compact, we only show computation times and the sizes of the final, closed-loop reachable sets in Table 2 for each benchmark. We were not able to find a controller steering all states from the initial set into a reasonably small target set for any of the following benchmarks using ROCS in less than 3 h, and thus ROCS is omitted.
a) Kinematic car: We compute a left turn for the kinematic car benchmark from [51], which includes input and final state constraints. While both iPROC and ROC were able to find a feasible solution, iPROC was able to find a smaller final reachable set without manual tuning (see Table 2).
b) Space rendezvous: In this benchmark, a controller for the rendezvous attempt of two spacecraft as described in [76, Sec. 3.6] is synthesized, where the first and last two states describe the position and velocity of the controlled spacecraft in a common orbital plane, respectively. This benchmark includes input, state, and final state constraints, where the state constraints require the approaching spacecraft to stay within a cone defining the line of sight and to respect velocity constraints. We choose m = 5, t_f = 200 s, an initial set X(0) = ⟨x^(0), diag([2, 2, 0.1, 0.1])⟩_Z with x^(0) = [−95, −30, 0, 0]^T, target state x_f = [−1, 0, 0, 0]^T, X_f = X(0) ⊕ {−x^(0) + x_f}, and W = 0.05 U, where U and X are given as in [76, Sec. 3.6], but we replace the velocity constraints in X with their parallelotope underapproximation for computational reasons. While both iPROC and ROC found feasible solutions, iPROC again achieves a smaller final reachable set without the need for manual tuning (see Table 2).

VIII. CONCLUSION
We introduce a novel, formally verified, polynomial control synthesis approach for disturbed nonlinear systems that simultaneously synthesizes a piecewise constant feedforward controller and a continuous-time state feedback controller.
In contrast to existing work, we avoid the introduction of algorithm parameters that require expert knowledge to tune. We achieve this for the first time by combining the synthesis of the feedforward and feedback controller into a single optimization problem and using a trust-region approach to iteratively ensure the accuracy of this optimization problem. Additionally, we show that this optimization problem can approximate the formally correct controller cost arbitrarily closely, and we prove the polynomial complexity of our novel synthesis approach in the state dimension for each trust-region iteration. Numerical examples indicate that our novel approach achieves similar performance to previous work, or even outperforms it, while not requiring manual tuning of algorithm parameters, making it more easily applicable by non-experts.

APPENDIX
For matrices A ∈ R^{o_1 × o_2} and B ∈ R^{c_1 × c_2}, the Kronecker product is denoted by A ⊗ B ∈ R^{o_1 c_1 × o_2 c_2}. Further, the Kronecker sum of two square matrices C ∈ R^{c×c} and D ∈ R^{d×d} is defined as C ⊕ D = C ⊗ I_d + I_c ⊗ D.
For a controllable LTI system with system matrix A(P) ∈ R^{n_x × n_x} and input matrix B(P) ∈ R^{n_x × n_u}, both parameterized in the feedforward parameters P, the optimal gain matrix is

K(z) = −R^{−1} B(P)^T X(z), (35)

where X(z) ∈ S_++^{n_x × n_x} is the positive definite solution to the Riccati equation

F(z) = A(P)^T X(z) + X(z) A(P) − X(z) B(P) R^{−1} B(P)^T X(z) + Q = 0, (36)

and where z = [P_(:)^T, Q_(:)^T, R_(:)^T]^T. We are now ready to state the main result of the appendix.
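For a fixed parameterization, the computation in (35) and (36) can be sketched with SciPy's continuous-time algebraic Riccati solver; the double-integrator system below is an arbitrary toy example, not taken from the article.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Sketch of (35)-(36): for a controllable LTI pair (A, B) and weights Q, R,
# solve the Riccati equation for X and form the gain K = -R^{-1} B^T X.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # double integrator (toy example)
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)

X = solve_continuous_are(A, B, Q, R)   # positive definite solution of (36)
K = -np.linalg.solve(R, B.T @ X)       # optimal gain as in (35)

A_cl = A + B @ K                       # closed loop, Hurwitz by LQR theory
```

The closed-loop matrix A_cl = A + BK being Hurwitz is precisely the property used in the invertibility argument of the proof of Theorem 2.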
Theorem 2 (Continuous differentiability): Let an n_x-dimensional, controllable LTI system with system matrix A(P) and input matrix B(P), both k-times differentiable in P ∈ [−1, 1]^{m n_u × a} for each matrix element, be given. Further, denote with K and X the corresponding solution to (35) and (36) using Q ∈ S_++^{n_x × n_x} and R ∈ S_++^{n_u × n_u}, respectively. The gain matrix K is then k-times differentiable with respect to z = [P_(:)^T, Q_(:)^T, R_(:)^T]^T.
Proof: For readability, we omit the arguments of functions where convenient.
We first prove that X_(:)(z) exists, is unique, and is k-times differentiable with respect to z, which follows from the implicit function theorem if dF_(:)(z)/dX_(:)(z) is invertible for all z. The k-times differentiability of K then follows directly from its definition in (35) since X is k-times differentiable and R ∈ S_++^{n_u × n_u}.

FIGURE 1. Visualization of our approach for Example 1 over four iterations. Shown are input constraints (outer black diamond), the contour lines of the controller cost J(P) (solid) and the approximated cost J̃(P) (dashed) for the current trust region (dashed black box), the current initial guess of the control parameters P̄ (red circle), and the optimizer P̂ of the approximated optimization problem (red x).

… ‖b − Ac‖_H, and where ⟨c, Q⟩_E with center c ∈ R^n and shape matrix Q ∈ S_++^{n×n} is the maximum-volume ellipsoid inscribed into M [58, Sec. 8.4.2].
… due to the definition of the support function, where c and r are given as above. Applying the inverse transform and shifting the result by c yields Z = Q^{1/2} Z' ⊕ {c}, which concludes the proof.

FIGURE 4. Visualization of the solver progress and the approximation quality of the trust-region subproblem in (20) for the tank benchmark. Values between J(z^(i−1)) and J̃(ẑ^(i)), 1 ≤ i ≤ 10, where z^(i−1) and ẑ^(i) are the initial and critical points of the subproblem in (20), are linearly interpolated (ẑ^(0) = z^(0)). For steps i = 2 and i = 7, there were negative water levels and thus no controller cost could be computed. The opaque value at i = 9 denotes a rejected iteration.
It remains to show that dF_(:)(z)/dX_(:)(z) is invertible for all z. The first differential of F from (36) with respect to X is dF = dX (A + BK) + (A + BK)^T dX, which, after vectorization, yields dF_(:)/dX_(:) = A_cl^T ⊕ A_cl^T [77, Th. 18.1], where A_cl = A + BK. Since A_cl only has eigenvalues with negative real part by design, all n_x^2 eigenvalues of A_cl^T ⊕ A_cl^T also have negative real parts [78, Th. 13.16], and thus (A_cl^T ⊕ A_cl^T)^{−1} exists for all P.
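This Kronecker-sum argument can be checked numerically: the eigenvalues of C ⊕ D are exactly the pairwise sums of the eigenvalues of C and D, so a Hurwitz A_cl yields an invertible A_cl^T ⊕ A_cl^T. The concrete matrix below is an arbitrary Hurwitz example chosen for illustration.

```python
import numpy as np

# Kronecker sum C ⊕ D = C ⊗ I + I ⊗ D; its eigenvalues are all pairwise sums
# lambda_i(C) + lambda_j(D). For a Hurwitz A_cl, every eigenvalue of
# A_cl^T ⊕ A_cl^T therefore has negative real part, so the matrix is invertible.
A_cl = np.array([[-1.0, 2.0],
                 [ 0.0, -3.0]])     # Hurwitz by construction (eigenvalues -1, -3)

def kron_sum(C, D):
    return np.kron(C, np.eye(D.shape[0])) + np.kron(np.eye(C.shape[0]), D)

M = kron_sum(A_cl.T, A_cl.T)
eigs = np.linalg.eigvals(M)
# every eigenvalue of M has negative real part, hence M is invertible
```

Here the eigenvalues of M are the pairwise sums of {-1, -3} with itself, i.e., {-2, -4, -4, -6}, all strictly negative.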
Because we compute an overapproximative reachable set after each trust-region iteration to formally verify constraint satisfaction, we have R_{x_ff}(t, P̄) at the previously computed feedforward control parameters P̄ available (see Section VI-B for details). Therefore, we approximate the effect of the deviation from P̄ by defining the exact sum ⊕_e of polynomial zonotopes [55, Prop. 10], preserving dependencies and only considering deviations from R_{x_ff}(t, P̄), where we approximate the relative change using R̃_{x_ff}(t, P). That is, we define an approximation to the parameterized reachable set by

R̂_{x_ff}(t, P) = R_{x_ff}(t, P̄) ⊕_e R̃_{x_ff}(t, P) ⊕_e (−R̃_{x_ff}(t, P̄)) = R_{x_ff}(t, P̄) ⊕_e D(t, P) ⊕_e (−D(t, P̄)), (11)

for P ∈ P_γ(P̄), where the last equality follows from R̃_{x_ff}(t, P) ⊕_e (−R̃_{x_ff}(t, P̄)) = D(t, P) ⊕_e (−D(t, P̄)), and where P_γ is chosen such that ∀P̄ ∈ P : P ⊆ P_1(P̄).

FIGURE 3. Comparison of our novel iPROC approach with ROC and ROCS for the tank benchmark. The best achievable goal region for ROCS is denoted by G_ROCS.
TABLE 1. Scalability comparison of iPROC, ROC, and ROCS for the tank example using 2 to 8 tanks with m = 2 and no final state constraints.
…^T, diag([0.03, 0.03])⟩_Z, exists. Both ROC and ROCS fail to find a feasible solution if algorithm parameters are set incorrectly. Furthermore, Table 1 displays the computation times and sizes of the final reachable set R_x(t_f) = ⟨c(t_f), G(t_f)⟩_Z.

TABLE 2. Collection of benchmarks.