Convexifying State-Constrained Optimal Control Problem

This article presents a method that convexifies state-constrained optimal control problems (OCPs) in the control-input space. The proposed method enables convex programming to find the globally optimal solution even if the costs and control constraints are nonconvex in the control input and convex in the state, the dynamics are nonaffine in the control input and convex in the state, and the state constraints are convex in the state. Under these conditions, generic methods are not guaranteed to find optimal solutions, but the proposed method is. The proposed approach is demonstrated on a 16-D navigation example.

The computational complexity of HJ-based methods [4] grows exponentially in the state dimension; thus, they are typically used for low-dimensional systems. Koopman-based methods [5] lift nonlinear systems to higher dimensional linear systems, but the state-of-the-art Koopman-based methods still have not considered general state constraints. Branch-and-Lift algorithms [6] also provide an ε-close approximation to the optimal solution, but they require the exact computation of reachable sets, for which one may need to employ computationally intensive methods based on HJ analysis.
This article aims to relax the optimality conditions in a way similar to the direct methods and PMP, while still guaranteeing that numerical solvers find optimal solutions. We draw our motivation from the Hopf-Lax theory [7], which provides the unique viscosity solution to certain HJ PDEs. Notably, the Hopf-Lax formula [7] for a particular class of HJ PDEs is a convex optimization problem if the terminal function is convex in the state. We observe that some OCPs have nonconvex costs in the control-input space, but the Hopf-Lax formulae [7] for the HJ PDEs relevant to those OCPs are always convex in the control-input space. This implies that direct methods are guaranteed to find the solution of the Hopf-Lax formula, but not necessarily of the OCP. This article extends the convexification of the Hopf-Lax formula to general state-constrained OCPs, which is a novel development given that Hopf-Lax theories have not yet dealt with HJ PDEs relevant to state-constrained problems.
In prior related works [8], [9], convexification methods have been developed under different assumptions. For example, the control-set convexification method [8] convexifies nonconvex control sets and assumes a cost convex in the state and the control, linear dynamics, and convex state constraints. On the other hand, the successive convexification method [9] provides a locally optimal solution for nonlinear dynamics, convex cost, and convex state constraints. The work in [8] is listed in Table I as a method for global optimality. Fig. 1 illustrates our framework to convexify the OCP in the control-input space. Given the OCP, we propose a convexified OCP (COCP) without approximation; thus, the optimal costs of the OCP and the COCP are the same, denoted ϑ. The COCP applies the control signal transformation technique in [7], which converts the control signal variable α to a new variable β. Notably, the COCP is always convex in the new control signal space. Thus, if the OCP's functions and constraints are convex only in the state but not necessarily in the control input, direct methods are guaranteed to find the COCP's optimal cost (ϑ), state trajectory (x*), and control signal (β*). In particular, we will prove that the COCP's optimal state trajectory is also optimal for the OCP. Additionally, we propose algorithms to find an optimal control signal α* for the OCP from the COCP solution (x*, β*).
The contributions of this article are as follows. 1) We convexify general state-constrained OCPs in the control input. Our method provides more general optimality conditions than other methods that are scalable to high-dimensional systems, as in Table I. 2) We propose a numerical algorithm that is guaranteed to find the OCP's optimal solution; this is not guaranteed by numerical solvers that implement PMP.

The rest of this article is organized as follows. Section II presents the problem description. Section III proposes the COCP and numerical algorithms to compute an optimal solution to the OCP. Section IV presents convexity conditions for the COCP. Section V provides a numerical example, and finally, Section VI concludes this article.

Fig. 1. The goal is to find the OCP's optimal value ϑ, optimal state trajectory x*, and optimal control signal α*. Our COCP formulation converts the control signal space α to β and then convexifies the OCP in the control-input space without approximation. If the OCP's functions and constraints are convex only in the state space, the COCP's functions and constraints become convex in both the state and control-input spaces. Thus, direct methods provide the COCP's optimal cost ϑ, trajectory x*, and optimal control signal β* in the converted space. Finally, our algorithm generates the optimal control signal α* from β*.
In terms of notation in this article, the subscript * denotes optimality, and the superscript * denotes the Legendre-Fenchel transformation. For example, α* is an optimal control signal, and H* is the Legendre-Fenchel transformation of a function H. The double superscript ** denotes the biconjugate, i.e., the double Legendre-Fenchel transformation: H** = (H*)*.

II. STATE-CONSTRAINED OCP
Consider a state trajectory x : [0, T] → R^n solving the ODE

ẋ(t) = f(t, x(t), α(t)), x(0) = x₀ (1)

where f : [0, T] × R^n × A → R^n is the system dynamics, x₀ is the initial state, and α ∈ 𝒜 is a measurable control signal, where 𝒜 is the set of admissible control signals

𝒜 = {α : [0, T] → A | α is measurable} (2)

and A is a compact subset of R^m. We would like to solve a finite-horizon, state-constrained OCP for the above dynamical system with respect to a cost ϑ, which we describe next.

State-constrained OCP: Given the dynamical system (1)-(2), solve

ϑ = lim_{ε→0⁺} ϑ_ε (3)

where

ϑ_ε = inf_{α∈𝒜} ∫₀ᵀ L(t, x(t), α(t)) dt + g(x(T)) + I(max_{t∈[0,T]} c(t, x(t)) ≤ ε)

subject to x solving (1). Here, L : [0, T] × R^n × A → R is the stage cost, g : R^n → R is the terminal cost, c : [0, T] × R^n → R is the state constraint function, and I is the indicator

I("condition") = 0 if "condition" is satisfied, ∞ otherwise.
In this extended-value setting, the limit defining ϑ in (3) exists. This is because ϑ_{ε₁} ≤ ϑ_{ε₂} for any ε₁ > ε₂ > 0. Thus, as ε converges to zero, the limit of ϑ_ε exists in R ∪ {∞}. Also note that ϑ and ϑ₀ are different in general because the set of state trajectories solving (1) is not compact. Consider a case in which max_{t∈[0,T]} c(t, x(t)) > 0 for all α ∈ 𝒜, but there exists a sequence α_k ∈ 𝒜 such that lim_{k→∞} max_{t∈[0,T]} c(t, x_k(t)) = 0, where x_k solves (1) for α_k. In this case, ϑ₀ is ∞, but ϑ is finite. On the other hand, if A is convex, the set of state trajectories solving (1) becomes compact [4]. In that case, ϑ₀ = ϑ.
In this article, we assume the following.
Assumption 1 (Continuity and compactness):
A1. the control constraint set A is a compact subset of R^m;
A2. the dynamics f = f(t, x, a) is locally Lipschitz in (t, x) for each a ∈ A, and uniformly continuous in a for each (t, x);
A3. the stage cost L = L(t, x, a) is continuous in (t, x, a) and, for each a, is locally Lipschitz in (t, x);
A4. the terminal cost g = g(x) : R^n → R is continuous and locally Lipschitz in x;
A5. the state constraint c = c(t, x) : [0, T] × R^n → R is continuous in (t, x) and, for each t, is locally Lipschitz in x.
Assumptions A1 and A2 guarantee a unique state trajectory for the dynamics (1). Since we consider a bounded time interval, all state trajectories are also bounded. Thus, Assumptions A3 and A4 are enough to ensure that the integral cost in ϑ is bounded.

III. COCPS
Section III-A presents the theorem for the COCP, and Section III-B builds up mathematical background to prove the theorem, which is done in Section III-C.

A. COCP
Consider the multivalued map

B(t, x) = {f(t, x, a) | a ∈ A} (8)

for (t, x) ∈ [0, T] × R^n, where f and A are the dynamics and the control constraint set of the system, respectively. B(t, x) is compact for all (t, x) since A is compact and f is uniformly continuous in a, as in Assumption 1. Consider a state trajectory solving the differential inclusion

ẋ(t) = β(t) ∈ co(B(t, x(t))), x(0) = x₀ (9)

where "co" represents the convex-hull operation. We assume that β is a measurable control signal. With respect to this differential inclusion, we propose our convexifying method in the following theorem, which will be proved in Section III-C.

Theorem 1 (COCP): Suppose Assumption 1 holds. Define

ϑ̄ = inf_β ∫₀ᵀ H*(t, x(t), β(t)) dt + g(x(T)) (10)

subject to

c(t, x(t)) ≤ 0 for all t ∈ [0, T] (11)

where x and β satisfy (9), H : [0, T] × R^n × R^n → R is defined as

H(t, x, p) = max_{a∈A} {p · f(t, x, a) − L(t, x, a)} (12)

and H* is the Legendre-Fenchel transformation (convex conjugate) of H with respect to the costate p:

H*(t, x, b) = sup_{p∈R^n} {p · b − H(t, x, p)}. (13)

Then, ϑ in (3) and ϑ̄ are the same: ϑ = ϑ̄.
Both H in (12) and H* in (13) adopt extended-value settings. However, for each (t, x), H is finite for all p since A is compact, as in Assumption A1; on the other hand, H* is ∞ for some b ∈ R^n. The Legendre-Fenchel transformation always produces a closed proper convex function; thus, the new stage cost H* is convex in the new control b.
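To make this concrete, the following minimal numerical sketch evaluates (12) and (13) on grids for a scalar example with a nonconvex stage cost. The dynamics f(a) = a³ and cost L(a) = (a² − 1)² are our illustrative assumptions, not taken from the article; the point is that the resulting H* is convex in b even though L is nonconvex in a.

```python
# Minimal numerical illustration of (12)-(13) for a scalar system; the model
# below (f(a) = a^3, L(a) = (a^2 - 1)^2 on A = [-1, 1]) is a hypothetical example.
import numpy as np

A_grid = np.linspace(-1.0, 1.0, 201)      # compact control set A
f_vals = A_grid**3                        # dynamics values f(t, x, a)
L_vals = (A_grid**2 - 1.0)**2             # nonconvex stage cost L(t, x, a)

p_grid = np.linspace(-10.0, 10.0, 401)    # costate grid
# H(t, x, p) = max_{a in A} { p * f(a) - L(a) }   -- Eq. (12)
H = np.max(p_grid[:, None] * f_vals[None, :] - L_vals[None, :], axis=1)

b_grid = np.linspace(-1.0, 1.0, 201)      # new control b in co(B(t, x))
# H*(t, x, b) = sup_p { p * b - H(p) }            -- Eq. (13)
H_star = np.max(b_grid[:, None] * p_grid[None, :] - H[None, :], axis=1)

# H_star is a pointwise max of affine functions of b, hence convex in b,
# even though L is nonconvex in a; check discrete convexity numerically.
assert np.all(np.diff(H_star, 2) >= -1e-8)
```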

B. Mathematical Formulation and Optimal Control Analysis
In order to establish the relationship between the OCP and the COCP proposed in Theorem 1, we will first build the relationship between two control inputs a in the OCP and b in the COCP.
For b ∈ R^n, define

L_b(t, x, b) = inf {L(t, x, a) | a ∈ A, f(t, x, a) = b} (15)

where the infimum over the empty set is ∞.

Lemma 1: Suppose Assumption 1 holds. Then, L_b and H defined in (15) and (12), respectively, have the following properties:

H = (L_b)* (16)

H* = (L_b)**. (17)

Here, (L_b)* and H* are the Legendre-Fenchel transformations (convex conjugates) of L_b and H, respectively, for each (t, x).
Fig. 2. Illustration of L_b(t, x, ·) and H*(t, x, ·): curve 3 represents L_b(t, x, ·), while curves 3 and 4 together represent H*(t, x, ·), which is the convex hull of curve 1.

Proof: (i) Proof of (16) and (17). Under the measurability assumptions on α and β, the state-unconstrained problem for the dynamics (1) with stage cost L and the state-unconstrained problem for the inclusion (9) with stage cost L_b have the same value, where B is defined in (8) and x(t) = x. For a rigorous proof of this equivalence in the sense of differential inclusions, the reader is referred to [11, Ch. 4.1, Filippov's theorem]. Then, the corresponding HJ PDEs [12] for V₁ and V₂ are the same; thus, the Hamiltonians in the two HJ PDEs are the same, which yields (16), and (17) follows by conjugation.

(ii) If b ∉ co(B(t, x)), then for the two closed convex sets {b} and co(B(t, x)), the separating hyperplane theorem [10, Ch. 2.5] implies that there exists a separating hyperplane, along which p · b − H(t, x, p) grows unboundedly; hence H*(t, x, b) = ∞. On the other hand, if b ∈ co(B(t, x)), there are a finite number (I) of weights γ_i ∈ [0, 1] (Σ_{i=1}^I γ_i = 1) and controls a_i ∈ A such that

b = Σ_{i=1}^I γ_i f(t, x, a_i) and H*(t, x, b) = Σ_{i=1}^I γ_i L(t, x, a_i) (21)

where, in general, I is less than or equal to n + 1 by Carathéodory's theorem.

Using the above analysis for the two pointwise control inputs a and b, we now relate the OCP's and the COCP's control signals (α and β, respectively). Consider a measurable control signal β and a state trajectory x solving the dynamics (9). Assume that β is Riemann integrable on (0, T). For some δ > 0, consider a temporal discretization {t₀ = 0, t₁, . . ., t_K = T} with t_{k+1} − t_k ≤ δ. Note that the step size does not need to be fixed in this temporal discretization. We define a control signal α ∈ 𝒜 that, on each interval [t_k, t_{k+1}), switches among the decomposed controls with dwell times proportional to the weights:

α(t) = a_i^k for t ∈ [t_{k,i}, t_{k,i+1}), where t_{k,i+1} − t_{k,i} = γ_i^k (t_{k+1} − t_k) (22)

and a_i^k and γ_i^k are, respectively, the ith control and weight in (21) for b = β(t_k) and x = x(t_k). Then, we will prove that the trajectory x′ solving (1) for α is close to x, and that the OCP's stage and terminal costs induced by α are close to the COCP counterparts induced by β.
Theorem 2: Suppose Assumption 1 holds. Consider any measurable control signal β and state trajectory x solving (9), and assume that β is Riemann integrable on [0, T]. Then, for any ε > 0, there exists δ > 0 such that, for any discretization {t₀ = 0, . . ., t_K = T} with max_k (t_{k+1} − t_k) ≤ δ,

‖x − x′‖_{L∞(0,T)} < ε (23)

|∫₀ᵀ H*(t, x(t), β(t)) dt + g(x(T)) − ∫₀ᵀ L(t, x′(t), α(t)) dt − g(x′(T))| < ε (24)

where x′ solves (1) for α in (22).

Proof: We will first prove that x and x′ are close within an ε-bound in the L∞-norm. We will define two more state trajectories x⁰ and x¹, and then show that x is close to x⁰, x⁰ is close to x¹, and x¹ is close to x′.
Here, x⁰ solves (9) for the piecewise constant control β⁰ generated from β on {t₀, . . ., t_K}: β⁰(t) = β(t_k) for t ∈ [t_k, t_{k+1}). By the control decomposition in (21), each β(t_k) yields I_k pairs (a_i^k, γ_i^k) with b_i^k = f(t_k, x⁰(t_k), a_i^k). Subdivide each [t_k, t_{k+1}] into subintervals [t_{k,i}, t_{k,i+1}] whose lengths are proportional to γ_i^k, as in (22), where t_{k,0} = t_k and t_{k,I_k} = t_{k+1}, and design a new piecewise constant control signal β¹ with β¹(t) = b_i^k for t ∈ [t_{k,i}, t_{k,i+1}). We define x¹ as the solution of (9) corresponding to this input β¹.
Note that the set of states that can be reached is bounded since Assumption A2 holds. Thus, the relevant dynamics terms are bounded by a constant, and since each integrand is less than 2c₁, the right-hand side of (29) is less than or equal to 2c₁δ, where c₁ is defined in (31). Hence, we obtain a bound on ‖x − x⁰‖_{L∞}; the second inequality follows from Jensen's inequality for the logarithm. Thus, (34) and (35) imply that ‖x⁰ − x¹‖_{L∞} vanishes as δ → 0. For t ∈ (t_{k,i}, t_{k,i+1}), we can similarly show that x¹ and x′ remain close. To sum up (i)-(iii), we conclude that, for any small ε > 0, we can find δ such that ‖x − x′‖_{L∞(0,T)} < ε.
(iv) Proof of (24). Define a discrete sum of H*:

S = Σ_{k=0}^{K−1} H*(t_k, x(t_k), β(t_k)) (t_{k+1} − t_k) + g(x(T)).

Since H* is proper and β is Riemann integrable, H*(t, x(t), β(t)) is also Riemann integrable. Thus, for any ε₁ > 0, there exists δ > 0 such that |S − ∫₀ᵀ H*(t, x(t), β(t)) dt − g(x(T))| < ε₁. We can then show that S is close to ∫₀ᵀ L(t, x′(t), α(t)) dt + g(x′(T)) by techniques similar to those used to prove ‖x − x′‖_{L∞(0,T)} < ε. In this step, it is also necessary to use the fact that L is locally Lipschitz in (t, x) ∈ [0, T] × cl(X) and g is locally Lipschitz in x ∈ cl(X), where X is defined in (30). ∎
Consider the case where the COCP is feasible with optimal control signal β* and state trajectory x*. Then, Theorem 2 provides a corresponding pair (x′*, α*) whose OCP cost without state constraints is ε-close to the COCP's optimal cost, and whose trajectory x′* is ε-close to the COCP's optimal state trajectory in the L∞-norm. As the time-step parameter δ converges to 0, we can force ε to converge to 0. This implies that x′* converges to the COCP's optimal state trajectory x*. Thus, the OCP's state constraint is satisfied in the limit as ε → 0: lim_{ε→0} max_{t∈[0,T]} c(t, x′*(t)) ≤ 0, although the limit point of α* might not exist. However, we need a more careful analysis of whether the OCP's optimal value is finite. On the other hand, when the COCP is infeasible, Theorem 2 does not help us analyze whether the OCP's optimal value is finite or not. In the following section, we resolve these mathematical issues and complete the proof of Theorem 1, which states that the OCP and the COCP are equivalent.

C. Proof for the COCP
This section proves Theorem 1 using mathematical background from Section III-B.
Proof for Theorem 1: We will prove that the epigraphs of ϑ and ϑ̄ are the same. We first define a function J = J(z, α):

J(z, α) = max {∫₀ᵀ L(t, x(t), α(t)) dt + g(x(T)) − z, max_{t∈[0,T]} c(t, x(t))}

where x solves (1) for α, and z represents the value-axis variable. The authors in [4] proved that the zero sublevel set of inf_α J(z, α) is the epigraph of ϑ₀ if A is convex. In contrast, we do not assume that A is convex. However, the proof in [4, Th. 3.1] can be easily modified to show that the zero sublevel set of inf_α J(z, α) is the epigraph of ϑ:

[ϑ, ∞) = {z ∈ R | inf_{α∈𝒜} J(z, α) ≤ 0}.

Similarly, we define a function J̄ = J̄(z, β):

J̄(z, β) = max {∫₀ᵀ H*(t, x(t), β(t)) dt + g(x(T)) − z, max_{t∈[0,T]} c(t, x(t))}

where x solves (9) for β, so that the zero sublevel set of inf_β J̄(z, β) represents the epigraph of ϑ̄:

[ϑ̄, ∞) = {z ∈ R | inf_β J̄(z, β) ≤ 0}

where β(t) ∈ co(B(t, x(t))) for all t ∈ [0, T]. By Theorem 2, inf_α J(z, α) ≤ 0 if and only if inf_β J̄(z, β) ≤ 0 for every z; hence the two epigraphs coincide and ϑ = ϑ̄. ∎
Remark 1: 1) The optimal costs of the OCP and the COCP are the same. Moreover, the COCP is feasible if and only if the OCP for ϑ_ε is feasible for all ε > 0. 2) For a feasible COCP whose optimal control signal is β* and state trajectory is x*, we can construct α* as in (22) with a temporal discretization {t₀ = 0, . . ., t_K = T}. Then, Theorems 1 and 2 imply convergence of the following three quantities as ε converges to 0: x′* converges uniformly to x*; ∫₀ᵀ L(t, x′*(t), α*(t)) dt + g(x′*(T)) converges to ϑ; and max_{t∈[0,T]} c(t, x′*(t)) converges to a value that is at most 0, where x′* solves (1) for α*, and c(t, x*(t)) ≤ 0 for all t ∈ [0, T].
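The construction of α* in (22) amounts to switching among the decomposed controls a_i^k with dwell times proportional to γ_i^k on each grid interval. Below is a minimal sketch of this construction, assuming the decomposition pairs from (21) are already available; the function name and interface are hypothetical.

```python
# Sketch of the chattering construction (22): alpha is piecewise constant,
# dwelling on each a_i^k for a fraction gamma_i^k of the interval [t_k, t_{k+1}].
import numpy as np

def build_alpha(t_grid, a_list, gamma_list):
    """Return (switch_times, controls) describing a piecewise constant alpha.

    t_grid        : array [t_0, ..., t_K]
    a_list[k]     : controls (a_1^k, ..., a_{I_k}^k) for [t_k, t_{k+1}]
    gamma_list[k] : convex weights (gamma_1^k, ...), summing to 1
    """
    switch_times, controls = [], []
    for k in range(len(t_grid) - 1):
        t, dt = t_grid[k], t_grid[k + 1] - t_grid[k]
        for a, g in zip(a_list[k], gamma_list[k]):
            switch_times.append(t)   # alpha(s) = a for s in [t, t + g*dt)
            controls.append(a)
            t += g * dt              # dwell time proportional to gamma_i^k
    return np.array(switch_times), np.array(controls)

# Example: on [0, 0.1], realize b = 0 = 0.5*f(-1) + 0.5*f(1) for f(a) = a^3
# by switching between a = -1 and a = +1 with equal dwell times.
times, ctrls = build_alpha(np.array([0.0, 0.1]), [[-1.0, 1.0]], [[0.5, 0.5]])
```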

D. Numerical Algorithm
Algorithm 1 presents a numerical algorithm to compute an optimal state trajectory (x) and a control signal (α) for the OCP using the COCP in Theorem 1.
Algorithm 1: Optimal solution via the COCP.
1: Choose a temporal discretization {t₀ = 0, . . ., t_K = T}.
2: Solve (10) subject to (11), and get x*, β* by multiple shooting or collocation methods.
3: For each t_k, find pairs (a_i^k, b_i^k, γ_i^k) satisfying (21) for t = t_k, x = x*(t_k), and b = β*(t_k).
4: Subdivide each interval [t_k, t_{k+1}] in proportion to γ_i^k and design α* as in (22).
5: Compute x* by solving (1) for α*.

Algorithm 2 (for stage costs independent of the control input) keeps lines 1 and 2 of Algorithm 1 and replaces lines 3-5 with the following step.
3: Solve the following problem, whose optimal cost is 0:

min_{α∈𝒜} ‖x̂ − x*‖_{L∞(0,T)} (46)

where x̂ solves (1) for α.

We first numerically compute an optimal state trajectory (x*) and a control signal (β*) for the COCP in Theorem 1. The choice of method to solve ϑ̄(= ϑ) on line 2 of Algorithm 1 is open to the user; for example, direct methods with various optimizers [14] could be used. The rest of the algorithm is designed to find the numerically optimal control signal α* and state trajectory x* for the OCP. On line 3 of Algorithm 1, we find pairs (a_i^k, b_i^k, γ_i^k) satisfying (21) for t = t_k, x = x*(t_k), and b = β*(t_k). As explained in Section III-B and Fig. 2, these pairs can be found analytically using the biconjugate's geometrical property: β*(t_k) lies in the convex hull of finitely many points f(t_k, x*(t_k), a_i^k). At last, on lines 4 and 5, we design α* and compute x* by solving (1). Note that the additional temporal discretization on line 4 could result in frequent control-switching behavior unless β*(t) ∈ B(t, x*(t)) for all t.
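Line 3 can also be carried out numerically. Under the assumption that the control set is finely sampled (the sampling, the scalar model, and all names below are our illustrative assumptions, not the article's code), the decomposition (21) becomes a small linear program whose optimal value approximates H*(t_k, x*(t_k), β*(t_k)) by the biconjugate's geometry.

```python
# Sketch of line 3 of Algorithm 1 via linear programming over sampled controls:
# min  sum_i gamma_i L(a_i)
# s.t. sum_i gamma_i f(a_i) = b,  sum_i gamma_i = 1,  gamma_i >= 0.
import numpy as np
from scipy.optimize import linprog

a_samples = np.linspace(-1.0, 1.0, 101)   # finite sample of A (assumption)
f_vals = a_samples**3                     # f(t_k, x*(t_k), a_i), scalar example
L_vals = (a_samples**2 - 1.0)**2          # L(t_k, x*(t_k), a_i)
b = 0.3                                   # target value beta*(t_k)

res = linprog(c=L_vals,
              A_eq=np.vstack([f_vals, np.ones_like(f_vals)]),
              b_eq=np.array([b, 1.0]),
              bounds=[(0.0, None)] * len(a_samples))
support = np.flatnonzero(res.x > 1e-9)    # active controls (at most 2 here,
print(a_samples[support], res.x[support]) # matching I <= n + 1); use in (22)
```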
To resolve this chattering issue, we propose another algorithm, Algorithm 2, for the case in which the stage cost does not depend on the control input. As in Remark 1, the OCP's state trajectory solving (1) for the designed control signal α* in (22) converges uniformly to the COCP's optimal state trajectory. Algorithm 2 utilizes this fact to avoid the control-switching behavior by removing the additional temporal discretization. Instead, Algorithm 2 has one additional optimal-control solving step, as in line 3. For the OCP (46), direct methods result in nonconvex problems, but we know the optimal cost is always 0. Thus, if an iterative algorithm for (46) converges to a local minimum with nonzero cost, we can reinitialize the solution candidate and iterate until the cost is close to 0, as sketched below. This is a significant difference from plain direct methods, for which it is hard to tell how close a locally optimal solution is to the globally optimal one.
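A minimal sketch of this reinitialization loop follows, assuming a hypothetical local solver solve_tracking_ocp for (46) and a sampler of initial guesses; both interfaces are our assumptions.

```python
# Reinitialize-until-zero-cost loop for the nonconvex problem (46): since its
# optimal cost is known to be 0, a local minimum with nonzero cost can be
# rejected with certainty, and a near-zero cost certifies (near-)optimality.
def solve_with_restarts(solve_tracking_ocp, sample_initial_guess,
                        tol=1e-6, max_restarts=50):
    for _ in range(max_restarts):
        guess = sample_initial_guess()            # random solution candidate
        alpha, cost = solve_tracking_ocp(guess)   # local solver on (46)
        if cost <= tol:                           # optimal cost of (46) is 0
            return alpha
    raise RuntimeError("no candidate reached (near-)zero cost")
```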

IV. CONVEXITY ANALYSIS FOR THE COCP
This section investigates conditions under which the COCP, solved by direct methods (including multiple shooting and collocation methods), enjoys more general convexity conditions than the OCP. The direct methods discretize the OCP on a temporal grid {t₀ = 0, . . ., t_K = T} and implicitly contain numerical methods, such as Euler or Runge-Kutta methods, to solve the ordinary differential equations or integral equations.
Denote x[k] = x(t_k) and α[k] = α(t_k). x denotes both the state trajectory x(·) in continuous time and the sequence x[·] in discrete time, and the same notation rule is applied to α as well.
The direct methods result in finite-dimensional convex optimization problems in (x[0], α[0], . . ., x[K], α[K]) ∈ R^{(n+m)(K+1)} if the following convexity conditions hold: for each t, the stage cost L is convex in (x, a) ∈ R^n × R^m, the terminal cost g is convex in x ∈ R^n, the control constraint set A is convex in R^m, the dynamics f is affine in (x, a) ∈ R^n × R^m, and the state constraint function c is convex in x ∈ R^n. Then, numerical methods for convex programming guarantee optimality. The above argument holds for various numerical integration methods, including the midpoint, trapezoidal, and Simpson's rules, and for numerical ODE methods, including linear multistep and Runge-Kutta methods.
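For concreteness, the following sketch instantiates such a discretized convex OCP with forward-Euler dynamics. The problem data (double-integrator dynamics, quadratic cost, box control set, halfspace state constraint) are illustrative assumptions, not taken from the article.

```python
# A direct-method sketch under the convexity conditions of this section:
# affine dynamics, convex costs, convex control set, convex state constraint.
import cvxpy as cp
import numpy as np

n, m, K, dt = 2, 1, 50, 0.1
A_mat = np.array([[1.0, dt], [0.0, 1.0]])   # forward-Euler double integrator
B_mat = np.array([[0.0], [dt]])
x = cp.Variable((K + 1, n))                 # state sequence x[0..K]
a = cp.Variable((K + 1, m))                 # control sequence a[0..K]

cost = cp.sum_squares(x) * dt + cp.sum_squares(x[K])   # convex stage/terminal
constraints = [x[0] == np.array([1.0, 0.0])]           # initial state
for k in range(K):
    constraints += [x[k + 1] == A_mat @ x[k] + B_mat @ a[k]]  # affine dynamics
constraints += [cp.abs(a) <= 1.0,           # convex control set A
                x[:, 0] >= -0.5]            # convex state constraint c <= 0

cp.Problem(cp.Minimize(cost), constraints).solve()  # global optimum guaranteed
```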
In this section, we analyze convexity conditions for the COCP's cost, constraint functions and dynamics, and verify the benefit of solving the COCP reformulation by direct methods over solving the original OCP formulation.
Suppose the stage cost L can be decomposed into state- and control-dependent parts:

L(t, x, a) = L_x(t, x) + L_a(t, a). (47)

We define the control constraint set of the COCP in (x, b):

{(x, b) | b ∈ co(B(t, x))}

for t ∈ [0, T], where B(t, x) is defined in (8).
Suppose b_j ∈ co(B(t, x_j)) for j = 1, 2. Then there exist a finite number of controls a_{jl} ∈ A and weights γ_{jl} ∈ [0, 1] (Σ_l γ_{jl} = 1) such that b_j = Σ_l γ_{jl} f(t, x_j, a_{jl}). Using this, for any d ∈ [0, 1], d b₁ + (1 − d) b₂ is a convex combination of terms of the form f(t, ·, a). Since co({−φ(t, a) | a ∈ A}) is a convex set, both the left- and right-hand sides of (55) belong to co({−φ(t, a) | a ∈ A}). Thus, d b₁ + (1 − d) b₂ belongs to co(B(t, d x₁ + (1 − d) x₂)). Table II summarizes the convexity conditions for the OCP and the COCP. If all the convexity conditions for the OCP are satisfied, then those for the COCP are also satisfied; the converse, however, is not true.
Remark 2 (Benefits of the COCP): The COCP converts the OCP's nonconvexity in the control space into convexity. In particular, the convexity conditions for the COCP do not require that 1) the control constraint set A be convex; 2) the control-dependent stage cost L_a(t, a) in (47) be convex in a; or 3) the dynamics f be affine in the control a.

V. EXAMPLES AND DEMONSTRATIONS
We now introduce a 16-D example that illustrates the benefits of our COCP approach. For numerical computation, a computer with a 2.8-GHz Quad-Core i7 CPU and 16-GB RAM was used.
We define an OCP in which four agents attempt to track their own reference trajectories while maintaining formation. The stage cost is Σ_{l=1}^4 dist(x^l(t), x_r^l(t)), where dist(x^l(t), x_r^l(t)) is the distance of agent l's position from its reference trajectory in the l₂-norm, the four reference trajectories x_r^l form a square at each time, and x_0^l is the initial state of each agent l, as shown in Fig. 3. The state constraint function c includes obstacle and agent-collision avoidance constraints (c_obs and c_col, respectively): c(x(t)) = max{c_obs(x(t)), c_col(x(t))}, where c_obs(x(t)) = max_{l=1,2,3,4} max{x^l(t, 1), −x^l(t, 3)}, c_col(x(t)) = max_{i<j} c_col^{i,j}, and c_col^{i,j} encodes an affine constraint that prevents collision between agents i and j, for example, c_col^{1,2} = −x¹(t, 3) + x²(t, 3) + 0.2. The COCP formulation from Theorem 1 is (60), where β(t) = [β¹(t); β²(t); β³(t); β⁴(t)] ∈ R^16 and β^l(t) = [β^l(t, 1); β^l(t, 2); β^l(t, 3); β^l(t, 4)] ∈ R⁴, l = 1, . . ., 4. The COCP's stage cost (H*) is the same as the OCP's (L) by Corollary 1. The second and third lines in (60) are explicit expressions for co(B(t, x(t))). We apply the Hermite-Simpson collocation method to the COCP with Algorithms 1 and 2, using the interior-point method [10, Ch. 11].

Table III compares the performance of the methods in terms of cost, computation time, the L₂-norm of the numerical derivative of the control signal α̇ (which measures the degree of control-switching behavior), and the success rate of convergence from a random initial guess. We do not compare with the HJ method [4], Branch-and-Lift [6], or the control-set convexification method [8]: the HJ method and Branch-and-Lift are intractable for this 16-D system, and the control-set convexification method [8] assumes convex cost, linear dynamics, and convex state constraints. The three previous methods in Table III are directly applied to the OCP. For all five methods in Table III, we utilize interior-point methods [10, Ch. 11]. For each method, we run the numerical solvers from ten different solution candidates and report averages and standard deviations (in parentheses) for each performance measure, where applicable.
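To illustrate what explicit co(B(t, x(t))) constraints such as those in (60) can look like in a direct method, here is a hedged sketch for a single agent. The planar double-integrator model with a unit-magnitude (hence nonconvex) control set is our assumption for illustration only; the article's agent dynamics are given in (60), and all names below are hypothetical.

```python
# One agent's convexified control constraints, assuming state
# x^l = (p1, v1, p2, v2) and nonconvex control set {a : |a| = 1}:
# B = {(v1, u1, v2, u2) : |(u1, u2)| = 1}, so co(B) relaxes |(u1, u2)| <= 1.
import cvxpy as cp

K, dt = 50, 0.1
xl = cp.Variable((K + 1, 4))    # one agent's state trajectory
bl = cp.Variable((K + 1, 4))    # convexified control beta^l, Eq. (9)

constraints = []
for k in range(K):
    constraints += [xl[k + 1] == xl[k] + dt * bl[k]]   # x_dot = beta (Euler)
for k in range(K + 1):
    constraints += [bl[k, 0] == xl[k, 1],              # p1_dot = v1
                    bl[k, 2] == xl[k, 3],              # p2_dot = v2
                    cp.norm(cp.hstack([bl[k, 1], bl[k, 3]])) <= 1.0]  # co(B)
```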
Algorithm 1 can be seen to outperform the other methods in terms of optimal cost, with a much smaller standard deviation. This is because Algorithm 1 is guaranteed to find an optimal solution from any initial solution candidate, whereas the other methods are not. Algorithm 1 also uses less computation time than the other methods, with the exception of PMP. This is because the dimension of the decision variables for PMP is significantly lower than for the other four methods: unlike the rest, the decision variables for PMP do not depend on the number of temporal discretization points but only on the costate dimension. However, the faster computation time of PMP comes at the expense of solution success rate: the interior-point method fails to find solutions to the PMP optimality conditions for 70% of the initial solution candidates. By contrast, the interior-point solver converges to the globally optimal solution for Algorithm 1, and to a locally optimal solution for the direct methods, from any initial solution candidate.
From a practical standpoint, we further improve our method by addressing the chattering behavior caused by the subdiscretization on line 4 of Algorithm 1. As shown in Fig. 3(b) and Table III, Algorithm 2 improves on this aspect. In addition to producing smoother controls than Algorithm 1, the COCP with Algorithm 2 retains a better average and standard deviation in cost than the remaining three methods, and uses less computation time than multiple shooting and collocation.
For robustness of the controls and real-time computation, our approach can be combined with model predictive control (MPC) [15]. We present preliminary results in Table IV, which reports the computation time of Algorithm 1 for each receding horizon and the accumulated cost over the full time horizon [0, 10]. We run the numerical solvers from ten different solution candidates, and Table IV provides averages (without parentheses) and standard deviations (in parentheses) for each measure. A smaller receding horizon results in a shorter computation time per horizon but a worse cost; longer receding horizons generally help satisfy the constraints and achieve better cost.
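A minimal sketch of the receding-horizon loop follows, assuming a hypothetical helper solve_cocp that runs Algorithm 1 over one horizon; the interface is ours, not the article's.

```python
# Receding-horizon (MPC) combination: replan with the COCP at every step and
# apply only the first control, accumulating the realized cost.
def run_mpc(solve_cocp, step_dynamics, x0, horizon=1.0, n_steps=100):
    x, total_cost = x0, 0.0
    for _ in range(n_steps):
        alpha, stage_cost = solve_cocp(x, horizon)  # plan over [t, t + horizon]
        x = step_dynamics(x, alpha)                 # apply first control only
        total_cost += stage_cost                    # accumulate realized cost
    return x, total_cost
```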

VI. CONCLUSION
This article proposes a convexification formulation (COCP) for state-constrained OCPs and presents two algorithms to compute optimal solutions. We proved that the COCP provides the OCP's optimal cost and state trajectory without approximation. Our first algorithm is guaranteed to find optimal solutions if the OCP's functions and constraints are convex in the state, but not necessarily in the control; this condition is more general than the optimality conditions of multiple shooting, collocation, and PMP. Our simulations support these results. The frequent control-switching behavior induced by the first algorithm is mitigated by our second algorithm, which adds one optimal-control-solving step. Although this additional OCP is nonconvex, its optimal cost is known to be zero; using this fact, we can run optimizers from multiple solution candidates until an optimal solution is found. Our simulations show that the second algorithm exhibits a level of control-switching behavior similar to collocation. Lastly, we demonstrated receding-horizon techniques for the COCP for real-time computation.