Risk-Aware Linear Quadratic Control Using Conditional Value-at-Risk

Stochastic linear quadratic control problems are considered from the viewpoint of risk. In particular, a worst-case conditional value-at-risk (CVaR) of a quadratic objective function is minimized subject to additive disturbances whose first two moments are known. The study focuses on three problems of finding the optimal feedback gain that minimizes the quadratic cost at the stationary distribution, over one step, and over an infinite time horizon. For the stationary distribution problem, it is proved that the optimal control gain minimizing the worst-case CVaR of the quadratic cost is equal to that of the standard (stochastic) linear quadratic regulator. For the one-step problem, an approach to an optimal solution as well as analytical suboptimal solutions are presented. For the infinite time horizon problem, two suboptimal solutions that bound the optimal solution, together with an approach to an optimal solution for a special case, are discussed. The presented theorems are illustrated with numerical examples.


I. INTRODUCTION
Risk has been extensively studied in the financial industry [1]-[4], but finance is not the only field that requires consideration of risk. As many engineering systems are subject to uncertainties, accounting for risk in decision-making processes is important to guarantee safety and reliability. Thus, risk has also been considered in optimal control [5]-[7], model predictive control [8]-[10], parameter estimation [11], and differential games [12], [13]. More recently, risk has been considered in reinforcement learning [14], [15] and inverse reinforcement learning [16].
One measure of risk is the conditional value-at-risk (CVaR), defined as the conditional expectation of losses exceeding a threshold. Since CVaR not only provides a convex conservative approximation for a joint chance constraint [17], but also enjoys the desirable mathematical properties of a coherent risk measure [18]-[20], it has been used in portfolio optimization [1], [2].
However, computing the value of CVaR requires knowledge of the probability distribution of the uncertainties, which is not always available. Instead, in many situations only partial information on the underlying probability distribution is known, because uncertainties arise in many different ways and are difficult to model. This issue may be overcome by distributionally robust approaches [8], [17], [21]-[23], which constrain the uncertainty distribution to lie within a set and are thus robust to uncertainty in the disturbance distribution.
Since the solutions to the distributionally robust CVaR problem can be expressed using the worst-case CVaR, this article employs the worst-case CVaR, which remains a coherent risk measure [2]. We derive the optimal feedback gain for linear quadratic optimal control that minimizes a worst-case CVaR of a quadratic objective function in three cases: at the stationary distribution, for the one-step cost, and over an infinite time horizon. Such an objective function minimizes the expected value of the tail of the distribution of the quadratic cost. An interesting observation is that, for the stationary distribution problem, the optimal control gain that minimizes the worst-case CVaR of the quadratic cost is equal to that of the standard (stochastic) linear quadratic regulator. The rest of this article is organized as follows. After introducing the basic notation and definitions in Section II, Section III provides the system description along with some preliminary observations. The three problems of minimizing the stationary distribution cost, the one-step cost, and the infinite time horizon cost are discussed in Sections IV, V, and VI, respectively. Following numerical examples in Section VII, Section VIII concludes this article.

A. Notation
The sets of real numbers, real vectors of length n, and real matrices of size n × m are denoted by R, R^n, and R^{n×m}, respectively. The set of natural numbers is denoted by N. For M ∈ R^{n×n}, M ≻ 0, M ⪰ 0, M ≺ 0, and M ⪯ 0 indicate that M is positive definite, positive semidefinite, negative definite, and negative semidefinite, respectively. The sets of symmetric, positive-semidefinite symmetric, and positive-definite symmetric matrices in R^{n×n} are denoted by S^n, S^n_+, and S^n_{++}, respectively. M^T denotes the transpose of a real matrix M, and Tr(M) denotes the trace of M. I_n denotes the identity matrix of size n, and the subscript may be dropped when the size is clear. The Kronecker product of two matrices X and Y is denoted by X ⊗ Y. For x ∈ R, x^+ = max{x, 0}.

B. Conditional Value-at-Risk
Let ξ ∈ R^n be a random vector whose distribution P has mean μ ∈ R^n and covariance matrix Σ ∈ R^{n×n} (we assume that P has a finite second-order moment). Let 𝒫 denote the set of all probability distributions on R^n that have the same first- and second-order moments as P. Define the second-order moment matrix of ξ by Ω = E_P[ξ ξ^T] = Σ + μ μ^T.

Definition 2.1 (Conditional value-at-risk [1]): For a given measurable loss function L : R^n → R, probability distribution P on R^n, and tolerance ε ∈ (0, 1), the CVaR at level ε with respect to P is defined as

    CVaR_ε^P(L(ξ)) = inf_{β∈R} { β + (1/ε) E_P[(L(ξ) − β)^+] }

where E_P[·] denotes expectation with respect to P.

Definition 2.2 (Worst-case CVaR [24]): The worst-case CVaR over 𝒫 is defined as

    sup_{P∈𝒫} CVaR_ε^P(L(ξ)) = inf_{β∈R} { β + (1/ε) sup_{P∈𝒫} E_P[(L(ξ) − β)^+] }.

Here, the exchange of the maximization and minimization is justified by a stochastic saddle point theorem [25].
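The representation in Definition 2.1 can be checked empirically: for a finite sample of losses, the minimizing β is the empirical VaR (the (1 − ε)-quantile), and the resulting CVaR equals the average of the worst 100ε% of the sampled losses. A minimal NumPy sketch (the function name and quantile convention are illustrative choices, not from the article):

```python
import numpy as np

def empirical_cvar(losses, eps):
    """Empirical VaR and CVaR at level eps.

    CVaR is evaluated through the Rockafellar-Uryasev objective
        beta + (1/eps) * mean((losses - beta)^+)
    at beta equal to the empirical VaR, where the objective attains its
    minimum; this equals the average of the worst 100*eps percent of
    the sampled losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    k = int(np.ceil((1.0 - eps) * n))       # index of the (1-eps)-quantile
    var = losses[k - 1]                     # empirical VaR
    cvar = var + np.mean(np.maximum(losses - var, 0.0)) / eps
    return var, cvar
```

For example, for losses 1, …, 8 and ε = 0.25, the empirical VaR is 6 and the CVaR is 7.5, the mean of the two worst losses.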
The following result is based on Lemma 2.3 and provides easy-to-compute lower and upper bounds on the worst-case CVaR for the special case of a zero-mean distribution.
Lemma 2.4: Suppose μ = 0 and L(ξ) = ξ^T P ξ + 2 q^T P^{1/2} ξ + q^T q + r with some P ∈ S^n_+, q ∈ R^n, and r ∈ R. Then an upper bound on the worst-case CVaR is obtained, and by relaxing this set of conditions to a necessary condition, a lower bound is obtained. These bounds not only allow us to design suboptimal controllers at small computational cost, but also provide insight into how the suboptimal controllers compare with the standard linear quadratic regulator, which we discuss in Sections V and VI. If the function L(ξ) does not have a linear term (q = 0), those bounds coincide (Corollary 2.5):

    sup_{P∈𝒫} CVaR_ε^P(L(ξ)) = (1/ε) Tr(P Ω) + r.

III. SYSTEM DESCRIPTION
Consider the discrete-time linear time-invariant stochastic system

    x_{t+1} = A x_t + B u_t + w_t,  x_0 given    (1)

where x_t ∈ R^{n_x} is the state, u_t ∈ R^{n_u} is the control input, and w_t ∈ R^{n_x} is the process noise or disturbance at discrete time instant t ∈ N. A ∈ R^{n_x×n_x} and B ∈ R^{n_x×n_u} are time-invariant system matrices. It is assumed that the pair (A, B) is stabilizable and that the w_t are independent and identically distributed random variables. Each disturbance w_t follows a distribution having finite mean μ_w and finite covariance Σ_w ≻ 0 for all t. The true underlying probability measure P is not exactly known, but it is known that P ∈ 𝒫, where

    𝒫 = { P : E_P[w_i] = μ_w, E_P[(w_i − μ_w)(w_j − μ_w)^T] = δ_{ij} Σ_w for all i, j ∈ N }.

Here, δ_{ij} is the Kronecker delta.
In this article, we assume μ_w = 0 unless otherwise stated. Thus, the second-order moment matrix of w_t is Ω_w = E_P[w_t w_t^T] = Σ_w.

Lemma 3.2: Consider the system (1). Suppose that A + BK is Schur stable. Then, with the state feedback u_t = K x_t, the state x_t converges to a random variable x_∞ with mean 0 and covariance S, where S ⪰ 0 is the solution to the Lyapunov equation

    S = (A + BK) S (A + BK)^T + Σ_w.    (8)

Namely, for the system (1), if there exists K such that A + BK is Schur stable, then the system trajectory x_t converges to a stationary distribution that does not depend on x_0. Existing results such as those in [26] have the same form but are restricted to the case of Gaussian disturbances. This result implies that computing the worst-case CVaR at the stationary distribution is very simple.
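The stationary covariance of Lemma 3.2 can be computed by solving the Lyapunov equation (8) through the vec/Kronecker identity. A minimal NumPy sketch (the function name and the naive dense solve are illustrative; a dedicated solver such as `scipy.linalg.solve_discrete_lyapunov` would normally be used):

```python
import numpy as np

def stationary_covariance(A, B, K, Sigma_w):
    """Stationary state covariance of x_{t+1} = (A + BK) x_t + w_t,
    i.e., the solution S of the discrete Lyapunov equation
        S = (A + BK) S (A + BK)^T + Sigma_w,
    obtained from the vec identity vec(F S F^T) = (F kron F) vec(S)."""
    F = A + B @ K
    assert np.max(np.abs(np.linalg.eigvals(F))) < 1.0, "A + BK must be Schur stable"
    n = F.shape[0]
    vec_S = np.linalg.solve(np.eye(n * n) - np.kron(F, F), Sigma_w.reshape(-1))
    return vec_S.reshape(n, n)
```

For a scalar system with A + BK = 0.5 and Σ_w = 1, this returns S = 1/(1 − 0.25) = 4/3, as the equation S = 0.25 S + 1 requires.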
where S ⪰ 0 is the solution to the Lyapunov equation (8).

IV. MINIMIZING THE COST AT STATIONARY DISTRIBUTION
This section considers the problem of finding the optimal state-feedback controller that minimizes the stage cost at the stationary distribution. This problem is motivated by the fact that, in many cases, the control operation is performed near a target stationary distribution with the objective of stabilizing the system to that distribution. The problem is set up as the following optimization problem.

Problem 4.1: For the system (1), find the controller gain K that strictly stabilizes A + BK and such that the control input ū = K x̄ minimizes the objective function

    sup_{P∈𝒫} CVaR_ε^P( x̄^T Q x̄ + ū^T R ū )

where Q ∈ S^{n_x}_+, R ∈ S^{n_u}_{++}, x̄ is the stationary state, and ū = K x̄ is the corresponding state-feedback control input.
An answer to the problem is summarized in the following theorem.

Theorem 4.2: The optimal controller gain K for Problem 4.1 is equal to that of the classical (stochastic) linear quadratic regulator. Namely,

    K̃ = −(R + B^T Λ̃ B)^{−1} B^T Λ̃ A

where Λ̃ = Λ̃^T ⪰ 0 is the solution to the discrete-time algebraic Riccati equation

    Λ = Q + A^T Λ A − A^T Λ B (R + B^T Λ B)^{−1} B^T Λ A.    (12)

Proof: Since Σ_w ≻ 0, the Lyapunov theorem gives S ≻ 0 in (8). Notice that, by defining the matrices in (14), the problem (13) can be written as (15). This is exactly the same form as the H_2-norm minimization problem for the standard (stochastic) linear regulator [27]¹. The solution to the problem (15) is known to be (16). Thus, the optimal gain is K̃ above, where Λ̃ is the solution to (12); this can be checked by substituting (14) into (17). ∎

This shows that the well-known linear quadratic regulator also minimizes the worst-case CVaR of the quadratic cost at the point of convergence. Moreover, this controller is independent of the CVaR level ε.
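Per Theorem 4.2, the risk-aware stationary-distribution controller is obtained from the standard DARE. A minimal NumPy sketch using fixed-point iteration (the function name and iteration count are illustrative; production code would use a dedicated Riccati solver such as `scipy.linalg.solve_discrete_are`):

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Standard discrete-time LQR via fixed-point iteration of the DARE
        L = Q + A^T L A - A^T L B (R + B^T L B)^{-1} B^T L A,
    returning the gain K = -(R + B^T L B)^{-1} B^T L A, which by
    Theorem 4.2 also minimizes the worst-case CVaR of the stage cost
    at the stationary distribution."""
    L = Q.astype(float).copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ L @ B, B.T @ L @ A)
        L = Q + A.T @ L @ (A - B @ G)       # Riccati recursion step
    K = -np.linalg.solve(R + B.T @ L @ B, B.T @ L @ A)
    return K, L
```

For the scalar case A = B = Q = R = 1, the DARE reduces to L² − L − 1 = 0, so L is the golden ratio (1 + √5)/2 and K = −(√5 − 1)/2, giving a Schur-stable closed loop.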

V. MINIMIZING THE ONE-STEP COST
This section considers the problem of finding the optimal state-feedback controller that minimizes the one-step cost. Such a problem appears in one-step model predictive control [28]. Since the choice of the control input u_t affects both the instantaneous cost and the next state x_{t+1}, the problem is set up as the following optimization problem.
Problem 5.1: Given the state x_t, find the control input u_t that minimizes the objective function

    sup_{P∈𝒫} CVaR_ε^P( x_{t+1}^T Q x_{t+1} + u_t^T R u_t )    (18)

subject to the system (1), where Q ∈ S^{n_x}_+ and R ∈ S^{n_u}_{++}.

This problem can be solved by convex optimization as follows.

Theorem 5.2: The solution to Problem 5.1 can be found by solving the convex optimization problem obtained from Lemma 2.3.

Proof: Apply Lemma 2.3 along with the translation-invariance property of CVaR to the loss in (18). ∎

Since Theorem 5.2 does not provide much insight into the relation between the state x_t and the control input u_t (it is not even clear whether there exists a constant optimal gain K such that u_t = K x_t), we consider the following.

Theorem 5.3: A suboptimal solution u_t for Problem 5.1 is given by u_t = K̂_os x_t, where

    K̂_os = −(εR + B^T Q B)^{−1} B^T Q A.    (21)

Proof: Using Lemma 2.4, an upper bound on (18) can be found as f(u_t) plus a constant, where f(u_t) = (1/ε)(A x_t + B u_t)^T Q (A x_t + B u_t) + u_t^T R u_t. We minimize this upper bound. Taking the gradient with respect to u_t and setting it to zero, (2/ε) B^T Q (A x_t + B u_t) + 2 R u_t = 0, we obtain û_t = −(εR + B^T Q B)^{−1} B^T Q A x_t. Since the second derivative (2/ε) B^T Q B + 2R is positive definite, û_t minimizes f(u_t) and is therefore a suboptimal controller. ∎

¹In [27], the constraint S̃ ⪰ I is used instead of S̃ ∈ S^{n_x}_{++}. It can easily be proved that the two are equivalent when combined with (Ã + B̃K) S̃ (Ã + B̃K)^T − S̃ + I = 0.
We see that this suboptimal solution has a form similar to that of a standard quadratic regulator, parameterized by ε. In fact, if ε = 1, the controller is equal to that of the standard quadratic regulator. This makes sense because, when ε = 1, the value of the worst-case CVaR of L(ξ) is the expected value of L(ξ). We can also observe that the upper and lower bounds of the worst-case CVaR in Lemma 2.4 agree in this case; thus, the suboptimal controller is in fact optimal if ε = 1.
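The gain of Theorem 5.3, in the form −(εR + B^T QB)^{−1} B^T QA reconstructed from the proof above, is a one-line computation. A NumPy sketch (the function name is an illustrative choice):

```python
import numpy as np

def one_step_gain(A, B, Q, R, eps):
    """One-step suboptimal gain of Theorem 5.3 (as reconstructed):
    the minimizer of the upper bound
        (1/eps)(A x + B u)^T Q (A x + B u) + u^T R u
    is u = K x with K = -(eps R + B^T Q B)^{-1} B^T Q A.
    For eps = 1 this is the one-step expected-cost regulator."""
    return -np.linalg.solve(eps * R + B.T @ Q @ B, B.T @ Q @ A)
```

For scalar A = B = Q = R = 1, the gain is −1/(1 + ε): −0.5 at ε = 1 (the expected-cost regulator) and −2/3 at ε = 0.5, illustrating how smaller ε penalizes the state term more heavily.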
Remark 5.4: In general (i.e., ε ≠ 1), the suboptimal controller is close to the optimal controller under a condition stated in terms of the weighted vector norm ‖v‖_M = (v^T M v)^{1/2}.

Remark 5.5:
With the suboptimal controller K̂_os in (21), we obtain lower and upper bounds using Lemma 2.4, where Y = ((2ε − 1)R + B^T Q B)(εR + B^T Q B)^{−1}.

Remark 5.6: The right-hand side of (28) is convex with respect to x_t. To show this, note that εR + B^T Q B ≻ 0 and

    [ Q          QB              ]
    [ B^T Q      εR + B^T Q B    ]  ⪰ 0.

Thus, from the Schur complement lemma, Q − QB(εR + B^T Q B)^{−1} B^T Q ⪰ 0.

Remark 5.7: Solving the optimization problem in Theorem 5.3 is much faster than solving that of Theorem 5.2.
Remark 5.8: The system is not necessarily stabilized by these greedy controllers.

VI. MINIMIZING THE INFINITE TIME HORIZON COST
This section considers the problem of finding the optimal state-feedback controller u_t = K x_t that minimizes the quadratic cost over the infinite time horizon. This problem is similar to the classical infinite-horizon linear quadratic control problem, but replaces the expected value by the worst-case CVaR. The problem is set up as follows.

Problem 6.1: For the system (1), find the controller gain K for the control input u_t = K x_t that minimizes the objective function (30), where Q ∈ S^{n_x}_+, R ∈ S^{n_u}_{++}, and α ∈ (0, 1) is the discount factor.²

²Strictly speaking, if we replace the expected value in the classical problem by the worst-case CVaR, the objective function is the worst-case CVaR of the discounted sum Σ_{t=0}^∞ α^t (x_t^T Q x_t + u_t^T R u_t). For computational reasons, we relax this objective function and consider (30).

Because it is difficult to find an optimal controller in the general case, we first seek suboptimal solutions and then consider a special case.

Theorem 6.2: A suboptimal gain for Problem 6.1 is given by

    K̂_inf = −(εR + α B^T P̃ B)^{−1} α B^T P̃ A

where P̃ = P̃^T ⪰ 0 is the solution to the algebraic Riccati equation

    P = Q + (α/ε) A^T P A − (α²/ε) A^T P B (εR + α B^T P B)^{−1} B^T P A.    (32)

This gain K̂_inf provides an upper bound on the optimal value of Problem 6.1 in terms of X = x_0 x_0^T.

Proof: Define the value function V_t(z) as the optimal cost-to-go of (30) subject to (1) starting at x_t = z. Then, it satisfies the recursion (35). We will show that there exists P ⪰ 0 such that (36) holds. Supposing (36) is true, using Lemma 2.4 it follows that (37), and substituting this into (35) yields (38). Thus, a suboptimal control input is found to be v = −(εR + α B^T P B)^{−1} α B^T P A z. Choosing P ⪰ 0 such that (41) holds satisfies (36) and (38). By substituting K̂_inf, (41) is equivalent to (32). On the other hand, the objective function is V_0(x_0), thus providing an upper bound on (30). ∎

Clearly, if α = ε, then this suboptimal controller is the same as the standard infinite-horizon linear quadratic regulator with decay rate 1.
Remark 6.3: The system is not necessarily stabilized, depending on the value of α.

Remark 6.4: If the initial state x_0 is a random vector, simply replace X by (1/ε)(μ_{x_0} μ_{x_0}^T + Σ_{x_0}), where μ_{x_0} and Σ_{x_0} are the mean and covariance of x_0.

Remark 6.5: If the system is described by x_{t+1} = A x_t + B u_t + E w_t, then Σ_w is replaced by E Σ_w E^T.

Corollary 6.6: Another suboptimal gain for Problem 6.1 is given by the analogous expression in which P̃ = P̃^T is the solution to the algebraic Riccati equation (45). With this gain K̂_inf, a lower bound on Problem 6.1 is obtained.

Proof: Similar to that of Theorem 6.2, using the lower bound of Lemma 2.4. ∎

We now discuss how to approach the optimal solution for the special case x_0 = 0.

Theorem 6.7: If x_0 = 0, Problem 6.1 is equivalent to the following optimization: minimize Tr(P(Q + K^T R K)) over stabilizing gains K.

Proof: When x_0 = 0, because of (5), (30) is simplified as in (48). Using Corollary 2.5, it is further simplified to a quantity proportional to Tr(P(Q + K^T R K)).
Here, P = Σ_{t=0}^∞ α^t (A + BK)^t Σ_w ((A + BK)^t)^T ⪰ 0 is the solution to the discounted Lyapunov equation P = α(A + BK) P (A + BK)^T + Σ_w for √α (A + BK) Schur stable, which is necessary for the convergence of (30). ∎

Remark 6.8: This is similar to the case of Problem 4.1 except for the factor α, and can be solved in a similar manner.
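For a fixed stabilizing gain, the Theorem 6.7 objective can be evaluated directly by solving the discounted Lyapunov equation. A NumPy sketch (the function name is illustrative, and the 1/ε scaling follows the zero-mean quadratic case of Corollary 2.5 — an assumption here, which in any case does not affect the minimizing K):

```python
import numpy as np

def discounted_cost(A, B, K, Sigma_w, Q, R, alpha, eps):
    """Evaluate the x_0 = 0 infinite-horizon objective of Theorem 6.7
    for a fixed gain K: with F = A + BK and sqrt(alpha) F Schur stable,
    solve the discounted Lyapunov equation
        P = alpha F P F^T + Sigma_w
    via the vec/Kronecker identity and return
        (1/eps) Tr(P (Q + K^T R K))."""
    F = A + B @ K
    assert np.sqrt(alpha) * np.max(np.abs(np.linalg.eigvals(F))) < 1.0
    n = F.shape[0]
    vec_P = np.linalg.solve(np.eye(n * n) - alpha * np.kron(F, F), Sigma_w.reshape(-1))
    P = vec_P.reshape(n, n)
    return np.trace(P @ (Q + K.T @ R @ K)) / eps
```

For a scalar system with A + BK = 0.5, α = 0.5, and Σ_w = 1, the Lyapunov equation gives P = 1/(1 − 0.125) = 8/7. Minimizing this cost over K then recovers the Problem 4.1 structure up to the factor α, as Remark 6.8 notes.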

VII. NUMERICAL EXAMPLES
This section provides numerical illustrations using two systems: vehicle steering and an inverted pendulum on a cart. We examine the performance of each of the controllers, which have different objectives, as well as the effect of the disturbance distribution.

A. Vehicle Steering
We first consider a discretized, normalized linear model for vehicle steering from [29]. This two-dimensional example allows us to visualize the performance of the controllers in a phase plot (see Fig. 2).
The dynamics describing the lateral deviation are expressed by the system matrices given in [29], and the computed gains K̂_ss, K̂_os, and K̂_inf are all stabilizing. In the following simulations, the initial condition is set to x_0 = [1, 1]^T. Fig. 1(a) and (c) shows the histograms with the controllers K̂_ss and K̂_inf subject to three different disturbance distributions: Gaussian (symmetric), uniform (symmetric), and lognormal (asymmetric), each with zero mean and covariance Σ_w = [0.010 0.003; 0.003 0.020].
Here, the lognormal distribution was obtained by setting the variance of the corresponding Gaussian to ln(1 + √5)/2 and then shifting and scaling the entire distribution to have the desired mean and covariance.
The x-axes show the cost L_i, computed as follows.
1) Theorem 4.2: L_i = x̄_i^T Q x̄_i + ū_i^T R ū_i for N samples of the steady state x̄_i and steady-state input ū_i, where steady state is assumed to be reached at time 100.
2) Theorem 6.2: L_i is the sampled discounted cost, where N = 1000 is the number of data points.
The y-axes are normalized so that the area under each histogram integrates to 1. The empirical VaR was computed by taking the 100(1 − ε)-th percentile of the data. The empirical CVaR was computed by solving the linear program of [1]. The values of the worst-case CVaR were computed using Lemma 2.3 with the optimal controllers K̂_ss and K̂_inf. For the case of Theorem 6.2, it was computed on the sum of x_t^T (Q + K̂_inf^T R K̂_inf) x_t as in the footnote of Problem 6.1 (but with horizon length N). Note that these values are functions of ε.
The empirical values of VaR and CVaR and the worst-case CVaR are summarized in Table I.
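The Monte Carlo procedure behind the histograms can be sketched as follows (the function name, the Gaussian disturbance, and the terminal-cost convention are illustrative assumptions, not the article's exact setup):

```python
import numpy as np

def sample_stage_costs(A, B, K, Sigma_w, Q, R, x0, T, N, rng):
    """Monte Carlo stage costs in the style of Fig. 1: simulate
    x_{t+1} = (A + BK) x_t + w_t for T steps, N times, and record the
    terminal stage cost x^T Q x + u^T R u with u = K x, treating time T
    as an approximation of steady state."""
    F = A + B @ K
    costs = np.empty(N)
    for i in range(N):
        x = np.asarray(x0, dtype=float)
        for _ in range(T):
            w = rng.multivariate_normal(np.zeros(F.shape[0]), Sigma_w)
            x = F @ x + w
        u = K @ x
        costs[i] = x @ Q @ x + u @ R @ u
    return costs
```

The empirical VaR and CVaR are then read off the sorted sample of `costs`, as in Section II.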
First, note that because Theorem 4.2 deals with the problem at the stationary distribution, the cost is distributed around zero, while Theorem 6.2 deals with the infinite-horizon problem, so the cost is distributed around some nonzero value that depends on the initial condition. We also see that although the kind of disturbance distribution does not greatly affect the overall shape of the cost distribution, the lognormal disturbance results in a longer tail of the cost distribution than the others, probably because the lognormal distribution itself has a long tail. This yields a larger empirical CVaR than the other two. Fig. 2 shows a scatter plot for the suboptimal controller K̂_os under the three disturbance distributions. It also includes the empirical values of g_t(x_t) = x_{t+1}^T Q x_{t+1} + u_t^T R u_t with u_t = K̂_os x_t, its worst-case CVaR with the optimal controller (by Theorem 5.2), and an upper bound on the worst-case CVaR with the suboptimal controller u_t = K̂_os x_t (by Theorem 5.3). Here, g_t(x_t) is a function of x_t = [x_{t,1}, x_{t,2}]^T and of time (through the disturbance w_t), because x_{t+1} = A x_t + B u_t + w_t and the control input is determined by x_t. The data points were obtained by running simulations starting at [1, 1]^T for 100 steps, ten times.
Although this is a greedy controller, it is stabilizing, and we see that the states move toward the origin for all disturbances, as the data have a peak around the origin. Nevertheless, some diagonal plots have a small peak at the initial states in addition to the origin. This is because there may be more data around the initial states at time 100, as the system has not yet been at steady state for long enough. Since g_t(x_t) is convex, we also expect most of the purple dots to lie below the purple curve, because the purple curve is the upper bound on the worst-case CVaR for those dots. We observe that this is indeed the case in the figures. During the simulations, the computation time for the suboptimal controller of Theorem 5.3 was much shorter than that of Theorem 5.2. The disadvantage, however, is its conservativeness, which we observe as the gap between the solid and dotted lines.

B. Inverted Pendulum on Cart
Next, we consider the model of an inverted pendulum on a cart from [30] to show that the proposed method can be applied to a nontrivial system. This is a four-dimensional example; thus, we omit the plot corresponding to Fig. 2.
With x_1 and x_2 being the displacement and speed of the cart, in m and m/s, respectively, and x_3 and x_4 being the angle and angular velocity of the pendulum, the linearized system matrices are taken from [30]. The decay rate, weights, and level of CVaR are set to α = 0.9, Q = I, R = 1, and ε = 0.4.
In this example, K̂_ss and K̂_inf are both stabilizing gains, whereas the gain K̂_os of the greedy controller is observed to be nonstabilizing. In the following simulations, a fixed initial condition is used. Similarly to the vehicle steering problem, Fig. 3 shows the histograms of the cost corresponding to Theorems 4.2 and 6.2 with Gaussian, uniform, and lognormal disturbances, respectively, each with zero mean and a common covariance. The cost, the empirical VaR, the empirical CVaR, and the worst-case CVaR were computed as before, and the empirical values of VaR and CVaR and the worst-case CVaR are summarized in Table II.
Most observations are the same as the previous example. However, we observe that the values of the worst-case CVaR are smaller in all plots, which is due to the larger value of ε.

VIII. CONCLUSION
With a focus on CVaR as a risk measure, this article investigated linear quadratic regulator problems with objective functions that take risk into account. First, we obtained the optimal controller that minimizes the risk associated with the quadratic cost at the stationary distribution. Here, we proved that the optimal controller is equal to that of the standard (stochastic) linear quadratic regulator. Then, we explored two other problems concerned with finding risk-aware controllers that minimize the one-step cost and the infinite time horizon discounted cost. For these two problems, we derived suboptimal static state-feedback controllers by using new bounds on the worst-case CVaR associated with quadratic loss functions.

APPENDIX PROOF OF LEMMA 3.2
In this proof, for a vector x, ‖x‖ denotes the Euclidean norm of x, and for a matrix M, ‖M‖ denotes the spectral norm of M.
By substituting (3), (5) can be written as x_t = (A + BK)^t x_0 + Σ_{j=0}^{t−1} (A + BK)^{t−1−j} w_j. Let y_t = Σ_{j=0}^{t−1} (A + BK)^j w_j, which has the same distribution as the above sum because the w_j are independent and identically distributed. Since A + BK is Schur stable, there exist c ≥ 0 and γ ∈ (0, 1) such that ‖(A + BK)^j‖ ≤ c γ^j. Therefore, E_P[‖y_t‖] ≤ c Σ_{j=0}^{t−1} γ^j E_P[‖w_j‖] < ∞. Let y_{t,i} be the ith entry of the vector y_t. We have |y_{t,i}| ≤ ‖y_t‖, and therefore, sup_t E_P[|y_{t,i}|] < ∞ (62). Moreover, we have y_t = y_{t−1} + (A + BK)^{t−1} w_{t−1}; hence, y_{t,i} = y_{t−1,i} + w̃_{t−1,i}, where w̃_{t−1,i} is the ith entry of the vector w̃_{t−1} = (A + BK)^{t−1} w_{t−1}. Since E_P[w_{t−1}] = 0, we have E_P[w̃_{t−1,i}] = 0. Moreover, since w_0, . . . , w_{t−1} are independent and identically distributed random variables, w̃_{t−1,i} is independent of y_{t−1,i}. It follows that E_P[y_{t,i} | y_{t−1,i}, y_{t−2,i}, . . . , y_{1,i}] = y_{t−1,i}. This shows that y_{t,i} is a martingale satisfying (62). It follows from Doob's martingale convergence theorem [31] that, as t → ∞, y_{t,i} converges almost surely to a random variable y_{∞,i} satisfying E_P[|y_{∞,i}|] < ∞. We use y_∞ to denote the vector with entries y_{∞,i}. Since lim_{t→∞} (A + BK)^t = 0, x_t converges in distribution to y_∞ =: x_∞. By Lemma 3.1, we have E_P[x_t] = μ_t and E_P[(x_t − μ_t)(x_t − μ_t)^T] = Σ_t, and convergence in distribution gives E_P[x_∞] = lim_{t→∞} μ_t and the corresponding limit for the covariance. Finally, by the Schur stability of A + BK, we obtain lim_{t→∞} μ_t = 0 and lim_{t→∞} Σ_t = S, which completes the proof.