Stochastic Optimal Linear Control for Generalized Cost Functions With Time-Invariant Stochastic Parameters

This study presents the design of feedback controllers for generalized cost functions to deal with stochastic optimal control problems. Target linear systems contain time-invariant stochastic parameters that describe system uncertainty. The cost functions involve nonlinear mappings and polynomial forms of the system states and inputs to express various performance metrics. Unfortunately, these properties cause difficulties in solving the problems. Conventional methods, such as the principle of optimality, cannot be employed to solve such problems owing to the time-invariant parameters. Unlike the well-known quadratic cost functions, handling the generalized cost functions is a complicated task. This study overcomes these challenges by deriving an explicit relation between the cost function and the linear feedback gain of a controller. The derived relation enables the feedback gain to be optimized via a gradient method. A theoretical analysis ensures the convergence of the proposed gradient method. A suboptimal feedback controller is obtained to solve the problem, even for the generalized cost. Furthermore, the controller guarantees robust stability of the feedback system even with the stochastic parameters. It is demonstrated that the proposed cost function can express an expectation of a quadratic cost, risk-sensitive cost, polynomial cost, and input-to-state gain. A numerical simulation shows the effectiveness of the proposed method.


Yuji Ito, Member, IEEE, Kenji Fujimoto, Member, IEEE, and Yukihiro Tadokoro, Senior Member, IEEE
Index Terms-Generalized cost functions, linear stochastic systems, stochastic optimal control.

I. INTRODUCTION
Uncertain dynamical systems are involved in the field of control engineering, such as semi-autonomous vehicles [3] and nanoscale receivers [4]. The uncertainty may be caused by individual differences among human drivers and/or manufacturing variations. Other examples include mobile robots with vision sensors whose physical parameters are unknown or uncalibrated [5]. Such systems should be controlled such that the desired control performance is achieved even with uncertainty in the system parameters. Thus, the focus of this study is to design optimal controllers that maximize the performance of systems under uncertainty. Here, we consider linear systems that are regarded as (linearized) tractable models of spring-mass-damper systems [6], pendulum systems [7], and shock absorbers [8].
Time-invariant stochastic parameters are efficient for describing the uncertainty existing in systems, such as manufacturing variations [4]. The parameters are static in time but have random values, which can represent various system dynamics. It is desirable to design a single optimal controller independent of the parameter values because estimating the parameters and designing a controller for each realization is difficult. Unfortunately, solving optimal control problems with such stochastic parameters is challenging because conventional solution approaches break down in the presence of the time-invariant stochastic properties. Traditional methods, such as Riccati equations, Hamilton-Jacobi-Bellman equations, and the principle of optimality [9], can be applied to systems containing time-varying stochastic parameters that change randomly in time. Such parameters can describe the state- and input-dependent noise in system dynamics. Control methods have been developed for the time-varying stochastic parameters to minimize various performance metrics, such as an expectation of a quadratic cost [10], the expectation involving a state variance [11], [12], and a risk-sensitive cost [13], [14]. However, these controllers cannot treat the time-invariant stochastic parameters. Markov jump systems [15], [16], [17] are other elegant expressions if the system parameters are assumed to take values in a finite state space.
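The distinction between time-invariant and time-varying stochastic parameters can be made concrete with a short simulation sketch; the 2-D plant A(θ) below is a hypothetical stand-in, not one of the systems cited above.

```python
import numpy as np

def A_of(theta):
    # Hypothetical stable 2-D plant whose dynamics depend on a scalar theta.
    return np.array([[0.9, 0.1 * theta],
                     [0.0, 0.8 + 0.1 * theta]])

def simulate(time_invariant, T=50, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 1.0])
    theta = rng.uniform(-1.0, 1.0)          # drawn once, before the run
    for _ in range(T):
        if not time_invariant:
            theta = rng.uniform(-1.0, 1.0)  # redrawn at every step instead
        x = A_of(theta) @ x
    return x

x_inv = simulate(time_invariant=True)   # one realization of an uncertain plant
x_var = simulate(time_invariant=False)  # behaves like time-varying noise
```

In the time-invariant case every trajectory is governed by a single fixed θ, so the controller cannot average the uncertainty out over time, which is exactly what invalidates the dynamic-programming machinery discussed above.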
Although various control methods have been developed for time-invariant stochastic parameters involved in linear systems, the existing methods have focused only on the stabilization of the systems and/or optimality in limited performance metrics. Systems with stochastic/uncertain parameters can be handled using robust control approaches based on linear matrix inequalities (LMIs) [18], [19], [20], [21], [22] and probabilistic design [23], [24]. These approaches ensure the stability of systems but lack optimality in performance metrics. Ensuring only robustness often degrades the control performance. Approaches [25], [26], [27], [28], [29] based on polynomial chaos expansion [30] have been developed to realize optimality. Our previous method also offered suboptimal controllers [1], which may be regarded as an extension of gradient flow approaches [31], [32]. Unfortunately, these methods [1], [25], [26], [27], [28], [29], [31], [32] adopted only simple quadratic cost functions as performance metrics, the variety of which is limited. Innovative approaches with Kronecker products [33], [34] and the sum of squares [35], [36], [37], [38] can treat polynomial cost functions, which are less restrictive than quadratic functions. However, these techniques have not focused on systems containing time-invariant stochastic parameters. Other strategies, such as parametric optimization [39], feedforward control [40], and model predictive control (MPC) [37], [41], can handle various types of cost functions. Unfortunately, the computational complexity of these methods grows with the finite time horizon of the cost functions. Explicit MPC [42] offers controllers offline to avoid the online computational burden, although the stability of the controllers is not guaranteed.
To overcome the above limitations, this study presents a design method of suboptimal controllers for linear systems with time-invariant stochastic parameters. The control objectives are to minimize generalized cost functions over an infinite time horizon and to stabilize the systems robustly. Our design scheme is developed based on a gradient method, the implementation of which yields suboptimal linear feedback controllers for the cost functions offline. The main contributions of this study are summarized as follows.
1) In Theorem 1, we express a generalized cost function as a simple explicit function of the linear feedback gain of a controller. The cost function consists of nonlinear mappings and polynomial functions of the system states and inputs. This expression is helpful in deriving the gradient of the cost function. Moreover, this expression has the potential to be applied to other interesting problems, such as input matrix design [43], optimal consensus control [44], constrained trajectory optimization [45], and mean-field control [46].

2) Theorem 2 provides the gradient of the generalized cost function in an explicit form regarding the feedback gain. Obtaining such a gradient is challenging because the cost function involves an infinite sum (over the infinite time horizon) of the polynomials, the nonlinear mappings, and the stochastic parameters. The combination of the obtained gradient with the proposed gradient method enables us to design suboptimal feedback gains for various types of performance metrics.
3) Theorem 3 guarantees that the proposed gradient method converges and that the system is robustly stabilized. In the theorem, we analyze desirable properties such as smoothness and boundedness of the cost function. It is expected that our analysis can help employ other gradient methods, such as conjugate gradient and quasi-Newton methods.

4) We demonstrate in Theorem 4 that the proposed cost function can represent an expectation of a quadratic cost, risk-sensitive cost, polynomial cost, and input-to-state gain. The effectiveness of the proposed method is demonstrated through a numerical simulation, using which the control performance is evaluated.

Table I compares this study with related work. As shown in Table I, systems with time-invariant stochastic parameters have attracted considerable attention in the existing work. Quadratic cost functions on (stochastic) optimal control problems have been widely addressed because the quadratic forms make the problems simple. Treating both the time-invariant stochastic parameters and nonquadratic cost functions is not a straightforward task. This study overcomes such a difficulty, realizing optimal control with time-invariant stochastic parameters for various nonlinear cost functions.
This study is a substantially extended version of our previous conference papers [1], [2]. The main extensions are summarized as follows. The proposed method can employ the generalized cost functions as introduced above, whereas the work in the conference papers treated only quadratic and polynomial functions. This study guarantees the convergence of the proposed gradient method, whereas the previous work did not. A new theoretical analysis and numerical simulation are provided to demonstrate the effectiveness of the proposed method. Technical points of the proposed method have been improved to enhance the technical soundness and readability.
The remainder of this article is organized as follows. Section II describes the notations. In Section III, the problem setting and main problem in this study are presented. In Section IV, we propose a method to overcome the problem. The proposed method is demonstrated through a theoretical analysis and numerical simulation in Section V. Finally, we conclude this article in Section VI.

II. NOTATIONS

1) I_n: The n-by-n identity matrix, where the subscript n indicating the dimension is omitted if it is obvious.

[Z]_{i,j}: The element in the ith row and jth column of Z ∈ R^{n×m}.

5) [Z]_{-1}: The matrix excluding the first row and first column from a matrix Z.

III. PROBLEM SETTING
This study focuses on minimizing a cost function while stabilizing a linear stochastic system robustly. Descriptions of the system, robust stability, and cost function are given in Sections III-A-III-C, respectively. Finally, the main problem to be addressed in this study is stated in Section III-D.

A. Target System
We consider the linear stochastic system

x_{t+1}(θ) = A(θ) x_t(θ) + B(θ) u_t    (1)

with state x_t(θ) ∈ R^n and input u_t ∈ R^m, where A(θ) ∈ R^{n×n} and B(θ) ∈ R^{n×m} depend on the time-invariant stochastic parameter θ.

B. Controller and Robust Stability
A linear feedback controller

u_t = K x_t(θ)    (2)

is employed in the system (1), where K ∈ R^{m×n} is the feedback gain. Note that K should be independent of θ because θ is unknown under Assumption 1 (ii). This study focuses on the robust stability indicating that for every θ and every x_0, the state x_t(θ) converges to zero asymptotically. Namely, the system (1) with the controller (2) applied is said to be robustly stable if the following holds:

lim_{t→∞} x_t(θ) = 0 for every θ ∈ S_θ and every x_0    (3)

where S_K denotes the set of robustly stabilizing feedback gains

S_K := {K ∈ R^{m×n} | (3) holds}.    (4)

Throughout this article, we assume that the target system is robustly stabilizable, that is, S_K is not empty.
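A sampled-θ check of the robust stability notion above can be sketched as follows; the plant, gain, and parameter range are hypothetical examples, and such sampling is only a necessary screening, not the certificate that Section IV-B provides.

```python
import numpy as np

# Hypothetical uncertain plant A(theta), B(theta) with theta in (0, 1).
def A_of(theta):
    return np.array([[1.0, 0.05],
                     [-0.05, 1.0 - 0.1 * theta]])

def B_of(theta):
    return np.array([[0.0],
                     [0.05 * (1.0 + theta)]])

def looks_robustly_stable(K, n_samples=1000, seed=1):
    """Monte Carlo screening of K in S_K: the spectral radius of
    A(theta) + B(theta) K must stay below one for every sampled theta."""
    rng = np.random.default_rng(seed)
    for theta in rng.uniform(0.0, 1.0, n_samples):
        A_cl = A_of(theta) + B_of(theta) @ K
        if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
            return False
    return True
```

For this toy plant, the gain K = [0, -4] passes the screening while the open loop (K = 0) fails for parameter values near zero.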

C. Generalized Cost Function
For the system (1) with the controller (2), we introduce the infinite time horizon^1 cost function J : S_K → R in (5) and (6), where the functional h acts on the first-, ..., and Mth-order monomials of (x_t(θ), u_t). We define the monomials φ(x_t(θ), u_t) in (7) and (8) using the Kronecker product ⊗ and the operator [·]_{-1}. The definitions (5)-(8) are expressive enough to generalize the cost function J(K) for stochastic optimal control problems.

^1 In this article, Σ_{t=0}^∞ denotes lim_{T→∞} Σ_{t=0}^T.

Section V-A demonstrates that the proposed cost function J(K) can represent well-known cost functions, such as an expectation of a quadratic cost, risk-sensitive cost, polynomial cost,
and input-to-state L_2 gain. One example is presented herein with the definition (9), where Q ∈ R^{n×n}, S ∈ R^{n×m}, and R ∈ R^{m×m} are given.

Proposition 1 (Linear Quadratic Regulators): For any feedback gain K ∈ S_K, the cost function J(K) reduces to the expectation of the quadratic cost function (10) if the maximum order of the monomials φ(x_t(θ), u_t) in (7) is set to M = 2 and the functional h in (5) is chosen accordingly.

Proof: The proof is described in Appendix B.

Proposition 1 focuses on the well-known quadratic cost function with the cross term to demonstrate the generality of the proposed cost function in (5) rather than the advantages of the cost function in (10). Section V-A shows that the proposed cost function can represent various cost functions.
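For intuition, the expectation of a quadratic cost over a time-invariant θ can be estimated by Monte Carlo; the scalar plant x_{t+1} = θ x_t + u_t and the weights below are hypothetical, and the finite horizon stands in for the infinite sum.

```python
import numpy as np

Q, S, R = 1.0, 0.0, 0.1  # hypothetical quadratic weights

def expected_quadratic_cost(K, n_theta=2000, T=300, seed=2):
    """Monte Carlo estimate of E_theta[sum_t (Q x^2 + 2 S x u + R u^2)]
    under u_t = K x_t for the toy plant x_{t+1} = theta x_t + u_t."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.5, 1.0, n_theta)  # time-invariant parameter samples
    x = np.ones(n_theta)
    cost = np.zeros(n_theta)
    for _ in range(T):
        u = K * x
        cost += Q * x * x + 2.0 * S * x * u + R * u * u
        x = theta * x + u
    return float(cost.mean())
```

A more strongly stabilizing gain yields a smaller expected cost here, e.g. expected_quadratic_cost(-0.7) is below expected_quadratic_cost(-0.1).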

D. Main Problem
The main problem is associated with the system (1), robust stability (3), and cost function (5).
Main Problem: Design a linear feedback controller to minimize the cost function while guaranteeing the robust stability:

min_{K ∈ S_K} J(K)    (12)

where this problem permits local optimal solutions to (12).

Remark 1 (Challenges Owing to Time-Invariant θ): Difficulties in solving the main problem arise because of the stochastic parameter θ. Because θ is time invariant and unknown, the principle of optimality and dynamic programming [9] are violated: x_t(θ) does not obey the Markov property; that is, predicting the future behavior requires conditional distributions of θ given the history x_0, ..., x_t(θ), which are difficult to compute as t increases.
Remark 2 (Why Linear Controllers?): Linear controllers are preferable owing to their safety, reliability, and ease of synthesis with other systems because the resulting feedback system retains linearity. The concept of the method proposed in the next section has been extended to the design of nonlinear controllers for nonlinear stochastic systems in our relevant work [47]. The proposed method has the following three advantages in comparison with [47]. Cost functions are generalized to represent various costs (to be shown in Section V-A). The stability of the feedback system is guaranteed. The convergence of the proposed gradient method is guaranteed.

IV. PROPOSED METHOD
This section presents a solution to the main problem. An overview of the proposed method is provided in the following. To minimize the cost function J(K), we employ a gradient method that updates K as follows:

K^{(ℓ+1)} = K^{(ℓ)} - α^{(ℓ)} (∂J(K^{(ℓ)})/∂K + W(K^{(ℓ)}))    (13)

where α^{(ℓ)} ≥ 0 is the step size and the superscript (ℓ) denotes the number of iterations. The function W(K^{(ℓ)}), which is explained later, is a barrier term to aid in satisfying the robust stability condition K^{(ℓ)} ∈ S_K. The iteration of (13) results in the convergence of K^{(ℓ)} to suboptimal solutions minimizing J(K).
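A one-dimensional toy version of the update (13) illustrates how the barrier term keeps the iterate feasible; the cost, the feasible set {k : k < 1}, and the coefficient values are assumptions for illustration only.

```python
# Toy sketch of K^(l+1) = K^(l) - alpha * (dJ/dK + W(K^(l))).
def grad_J(k):
    return 2.0 * (k + 0.5)        # gradient of J_toy(k) = (k + 0.5)^2

def grad_barrier(k, beta=1e-3):
    # Gradient of -beta * log(1 - k); blows up as k approaches the boundary.
    return beta / (1.0 - k)

k, alpha = 0.0, 0.1
for _ in range(200):
    k = k - alpha * (grad_J(k) + grad_barrier(k))
# k settles near the unconstrained minimizer -0.5 while staying inside k < 1
```

Because the barrier gradient is tiny far from the boundary, the converged point is only slightly perturbed from the minimizer of the original cost, mirroring the suboptimality discussed above.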
To realize this approach, we describe several technical points in the subsequent sections. In Section IV-A, we provide the derivation of efficient expressions of the cost function and its gradient, which are important results of the proposed method. Section IV-B transforms the robust stability condition K ∈ S_K into tractable matrix inequalities. In Section IV-C, we describe the implementation of the proposed gradient method using the derived gradient and transformed stability conditions. The robust stability and convergence of the gradient method are guaranteed in the implementation.

A. Main Results: Explicit Cost Function and Gradient
To employ the gradient method, it is suitable to express the cost J(K) as a simple function of K without using the infinite sum in (6). In the following, Theorems 1 and 2 express the cost function J(K) and its gradient ∂J(K)/∂K, respectively, in explicit forms.
Theorem 1 (Explicit Cost Function): For any feedback gain K ∈ S_K, the cost function is represented by the explicit form (14) and its associated definitions.

Proof: The proof is described in Appendix C.

Remark 3 (Contribution of Theorem 1):
The cost function J(K) is regarded as a function of the feedback gain K for given x_0(θ), h, M, A(·), B(·), and the PDF p(·), where θ is marginalized by h. An important advantage of (14) is that the infinite time horizon cost function J(K) is represented by an explicit function of K without using the infinite sum Σ_{t=0}^∞, whereas the original form (5) and (6) contains the infinite sum. Using (14) enables us to evaluate and differentiate J(K) without suffering from the problems of calculating the infinite sum (to be presented in Theorem 2), thereby enabling us to employ gradient-based optimization of J(K) with respect to K. Although promising models such as neural networks and fuzzy rules can express various functions, such models are not suitable for representing infinite time horizon cost functions because they cannot remove the infinite sum without approximations.

Remark 4 (Ideas Underlying Theorem 1): The exclusion of the infinite sum is realized by virtue of the definition (7) of φ(x_t(θ), u_t) and the linearity of the controller (2). Using these two points, the monomials of x_t(θ) can be handled by the linear system with the expanded state (18) and matrix (16). Although the existing methods [33], [34] employed techniques similar to this expansion, those studies focused on continuous-time systems and a cost function limited to the quadratic form of the input. Meanwhile, Theorem 1 can be applied to polynomials in both the state and input.
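The Kronecker-power identity underlying this expansion can be verified numerically; the matrix and the orders below are arbitrary choices.

```python
import numpy as np

# Check (Z^{⊗M})^t = (Z^t)^{⊗M} for M = 2, t = 3, which lets powers of the
# expanded matrix track monomials of the closed-loop state.
rng = np.random.default_rng(3)
Z = 0.5 * rng.standard_normal((2, 2))

lhs = np.linalg.matrix_power(np.kron(Z, Z), 3)
rhs = np.kron(np.linalg.matrix_power(Z, 3), np.linalg.matrix_power(Z, 3))
```

The identity follows from the mixed-product property (Z_1 ⊗ Z_2)(Z_3 ⊗ Z_4) = Z_1 Z_3 ⊗ Z_2 Z_4 of the Kronecker product.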
Based on Theorem 1, the gradient of the cost function is derived in Theorem 2 under Assumption 2.
Assumption 2 (Functional h): The functional h is composed of smooth mappings f and g; the symbol e denotes the expectation regarding θ.

Remark 5 (Supplement for Assumption 2): Assumption 2 is reasonable because J(K) can represent various cost functions, such as an expectation of a quadratic cost, risk-sensitive cost, polynomial cost, and input-to-state gain, even under Assumption 2, which is demonstrated in Theorem 4 in Section V-A. Moreover, the smoothness assumptions of f and g are not required if we employ other methods to minimize J(K) (approximately), such as gradient approximation methods [48] and other nonsmooth optimization methods [49, Ch. 3]. Although other methods based on the HJB equations, for example, [50], may use cost functions not as smooth as f and g, recall that such methods are not applicable to optimal control problems associated with time-invariant stochastic parameters.
Theorem 2 (Explicit Gradient): Suppose that Assumption 2 holds. For any feedback gain K ∈ Int(S_K), the gradient of the cost function is expressed in the explicit form (21). Here, the partial derivatives for each θ are polynomials of K (see Remark 7).
Proof: Appendix D gives the proof.

Remark 6 (Contribution of Theorem 2): The gradient of the cost function J(K) is explicitly represented without suffering from the infinite sum of the monomials φ(x_t(θ), u_t) and the nonlinear mapping h. Using this gradient, the gradient method (13) can be performed under the robust stability condition K^{(ℓ)} ∈ S_K. In Sections IV-B and IV-C, we explain the implementation of the gradient method with the stability condition.
Remark 7 (Partial Derivatives): Note that for each θ, Â_cl(θ, K) in (16) and K̂(K) in (17) consist of polynomials of K. Thus, ∂K̂(K)/∂[K]_{i,j} and ∂Â_cl(θ, K)/∂[K]_{i,j} can easily be obtained as polynomials of K. Note that the gradient in (21) does not depend on each value of θ because the expectation e(·) regarding θ is taken.

B. Robust Stability Condition
This section clarifies the robust stability condition K ∈ S_K for implementing the proposed gradient method while guaranteeing the stability. The set S_K is not explicitly provided, being defined as in (4), whereas the gradient of J(K) has been derived in the previous section. Based on the well-known robust stability theory associated with polytopic system representations, the stability condition is determined by matrix inequalities, as described below.
Let P ∈ R^{d_{1P}×d_{2P}} be a matrix and v := [vec(K)', vec(P)']' be the vector consisting of K and P. The set S_v is defined by the N_F symmetric matrix inequalities F_j(v) ≻ 0. The inequalities F_j(v) are provided using the following polytopic representation of the system (1) with vertices A_j ∈ R^{n×n} and B_j ∈ R^{n×m}:
where N_ω is the number of vertices. Note that such a polytopic set exists because A(θ) and B(θ) are bounded, as discussed in Section III-A. A simple setting of the parameters N_ω, A_j, and B_j can be found in [51] if [A(θ), B(θ)] depends affinely on a matrix-valued function of θ that is contained in a matrix polytope for every θ ∈ S_θ. Several types of matrix inequalities can be adopted for F_j(v) ≻ 0. Two examples of such inequalities are presented below based on the existing robust stability theory [18].
(i) Quadratic Stability: The matrix inequalities are given by (27), constructed from P and the closed-loop vertex matrices A_j + B_j K, where P ∈ R^{n×n} and N_F = N_ω hold.

(ii) Stability Based on the Results in [18, Th. 2]: The matrix inequalities are given by (28), where N_F = N_ω holds.

Proof: The proof is described in Appendix E.

Proposition 2 indicates that the implicit condition K ∈ S_K is relaxed to the simple condition v ∈ S_v defined by the matrix inequalities.
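A minimal numerical version of the vertex test behind quadratic stability can be sketched as follows; it checks a given candidate P rather than searching for one, and the equivalent form A_j' P A_j - P ≺ 0 is used instead of the block inequality (27).

```python
import numpy as np

def quadratically_stable(A_vertices, P, tol=1e-9):
    """Vertex test: a single P > 0 with A_j' P A_j - P < 0 at every polytope
    vertex certifies stability over the whole polytope (sufficient only)."""
    if np.min(np.linalg.eigvalsh(P)) <= tol:
        return False
    return all(
        np.max(np.linalg.eigvalsh(A.T @ P @ A - P)) < -tol
        for A in A_vertices
    )

# Two hypothetical closed-loop vertices A_j + B_j K sharing P = I.
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.4, 0.0], [0.1, 0.5]])
```

Because A ↦ A'PA is matrix-convex for P ≻ 0, satisfying the inequality at the vertices is enough for every convex combination inside the polytope.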

C. Implementation of the Gradient Method
Using the gradient and the stability condition v ∈ S_v derived in Sections IV-A and IV-B, respectively, the gradient method is implemented provided that S_v is not empty. The constrained minimization problem (12) of K is modified as an unconstrained problem of v based on a barrier method [52, Exercise 11.6]. The addition of the barrier function b : S_v → R yields the unconstrained problem

min_v J_b(v)    (30)

where the decision variable K is replaced with v. The modified objective function J_b(v) is defined in (31), where β > 0 is a coefficient and J_const ∈ R is a constant. This barrier function helps satisfy the stability condition v ∈ S_v because b(v) increases infinitely as v reaches the boundary of S_v. The term ||v||^2 regularizes the size of v. Thereafter, the gradient method in (13) is modified as follows:

v^{(ℓ+1)} = v^{(ℓ)} - α^{(ℓ)} ∂J_b(v^{(ℓ)})/∂v    (33)

where v^{(ℓ)} := [vec(K^{(ℓ)})', vec(P^{(ℓ)})']'. Note that β∂b(v^{(ℓ)})/∂K corresponds to W(K^{(ℓ)}) in (13). The gradient ∂J_b/∂v consists of ∂J/∂v in Theorem 2 and ∂b/∂v, which is calculated in an analytical manner [53, Sec. 2]. We guarantee the convergence of the proposed gradient method.
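One common concrete choice for a barrier over an LMI constraint F(v) ≻ 0 is a log-det barrier; whether the paper uses exactly this form is not shown here, so treat it as an assumed sketch.

```python
import numpy as np

def logdet_barrier(F):
    """-log det(F) for a symmetric F > 0; +inf outside the feasible set.
    The value grows without bound as F approaches singularity, i.e., as v
    approaches the boundary of the feasible set."""
    w = np.linalg.eigvalsh(F)
    if np.min(w) <= 0.0:
        return np.inf
    return float(-np.sum(np.log(w)))
```

For example, logdet_barrier(np.eye(2)) is 0, while a nearly singular argument such as np.diag([1e-6, 1.0]) produces a large value, which is the blow-up behavior the modified objective J_b relies on.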
Theorem 3 (Convergence of the Gradient Method): Let us define J_const := J_b(v^{(0)}) + 1 in (31). Suppose that Assumption 2 and the following conditions hold.
(i) The iteration of v^{(ℓ)} obeys the gradient method in (33).
(ii) The initial guess satisfies the stability condition; that is, v^{(0)} ∈ S_v holds.

(iii) The function J(K) is bounded below on S_K.

(iv) For all ℓ ≥ 0, the step size α^{(ℓ)} > 0 satisfies the Wolfe conditions [54], where γ_a and γ_b are free parameters that satisfy 0 < γ_a < γ_b < 1.

Then, the gradient method converges in the sense of

lim_{ℓ→∞} ∂J_b(v^{(ℓ)})/∂v = 0.

Furthermore, the feedback gain satisfies the stability condition; that is, K^{(ℓ)} ∈ S_K holds for all ℓ ≥ 0.

Proof: The proof is described in Appendix F.
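Step sizes satisfying the Wolfe conditions in (iv) can be produced with a standard line search; in the sketch below, SciPy's c1 and c2 play the roles of γ_a and γ_b, and the quadratic objective is a stand-in for J_b.

```python
import numpy as np
from scipy.optimize import line_search

f = lambda x: float(np.sum((x - 1.0) ** 2))   # stand-in for J_b(v)
fprime = lambda x: 2.0 * (x - 1.0)            # its gradient

xk = np.array([3.0, -2.0])
pk = -fprime(xk)                              # steepest-descent direction
# c1 < c2 in (0, 1) mirror 0 < gamma_a < gamma_b < 1 in Theorem 3 (iv)
alpha = line_search(f, fprime, xk, pk, c1=1e-4, c2=0.9)[0]
```

The returned step both decreases the objective sufficiently and flattens the directional derivative, which is exactly what the convergence argument of [54, Th. 3.2] needs.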

Remark 8 (Contribution of Theorem 3):
The convergence of the proposed design method is guaranteed based on [54, Th. 3.2]. In practice, the gradient method will be terminated after a finite number of iterations. The robust stability is guaranteed even for the finite iteration because K^{(ℓ)} ∈ S_K holds for all ℓ ≥ 0.
Remark 9 (Initial Condition): Condition (ii) in Theorem 3 requires the initial value to satisfy v^{(0)} ∈ S_v. This can be satisfied as follows, provided that such a value exists. In the case of Proposition 2 (ii), v^{(0)} can be obtained by solving the LMIs in the symmetric matrices P_j^{(0)} ∈ R^{n×n} for j ∈ {1, ..., N_ω}, the matrix K ∈ R^{m×n}, and the matrix P_inv ∈ R^{n×n}. The initial value is v^{(0)} := [vec(K^{(0)})', vec(P^{(0)})']', where P^{(0)} collects P_1^{(0)}, ..., P_{N_ω}^{(0)} and κ > 0 is a positive free parameter to normalize P^{(0)}. In the case of Proposition 2 (i), the initial value is obtained analogously using a symmetric matrix.
The proposed method is summarized in Algorithm 1. In practice, after iterating the update of v^{(ℓ)} until ||∂J_b(v^{(ℓ)})/∂v|| is smaller than a threshold ε_g > 0, let K* := K^{(ℓ)} indicate a suboptimal gain for the minimization (30). The main problem has been solved in this sense. Note that Algorithm 1 focuses on local optimality or convergence to a stationary point rather than global optimality because the objective function J_b(v) is nonconvex, in a manner similar to other gradient-based methods. Recent approaches based on stochastic gradient descent [55, Sec. III] are also promising for complex optimization problems involving the expectation E_θ[...].
Remark 11 (Computational Complexity): Algorithm 1 provides a gradient method based on [54, Th. 3.2]. For any predefined accuracy ε_g > 0, the number ℓ* of iterations to satisfy the termination condition ||∂J_b(v^{(ℓ*)})/∂v|| < ε_g has the order ℓ* = O(ε_g^{-2}) under all the assumptions in Theorem 3. We obtain this result by simply applying the result [56, p. 29] to our method based on [54, Th. 3.2 and eq. (3.15)].

V. DEMONSTRATION WITH NUMERICAL SIMULATION
The proposed method is demonstrated in this section. Section V-A reveals that the proposed cost function can express various types of performance metrics for stochastic optimal control problems. Sections V-B and V-C describe the settings and results, respectively, of the simulation for evaluating the effectiveness of the proposed method.

A. Representation of Various Cost Functions
The following theorem demonstrates that the proposed cost function represents various performance metrics by extending Proposition 1 with the definition (9).

Theorem 4 (Representation of Various Cost Functions):

(i) Expected Quadratic Cost: The cost function J(K) reduces to the expected quadratic cost if g is given by g(c(θ, K)) = w(Q, S, R)'c(θ, K). (43)

(ii) Risk-Sensitive Cost: The cost function J(K) reduces to a risk-sensitive cost for any nonzero scalar ρ ∈ R\{0} if M = 2 holds and f and g are given by (44) and (45).

(iii) Polynomial Cost: Let N_μ be the number of terms of a polynomial cost. For j ∈ {1, 2, ..., N_μ}, let μ_j and q(j, i) (i = 0, 1, ..., n + m) be a scalar and a non-negative integer, respectively, satisfying 1 ≤ Σ_{i=1}^{n+m} q(j, i) ≤ M. The cost function J(K) reduces to a polynomial cost if f is given by (42) and g is given by (48) and (49), where q(j, 0) := M − Σ_{i=1}^{n+m} q(j, i) and δ_i is the (n + m + 1)-dimensional vector in which the ith component is 1 and the others are zero.

(iv) Input-to-State L_2 Gain (Discrete-Time and Expected Version): The cost function J(K) reduces to an input-to-state L_2 gain if M = 2 holds and f and g are given by (50) and (51).

Proof: The proof is described in Appendix G.
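As a small numerical illustration of item (ii), the risk-sensitive functional (2/ρ) log E[exp((ρ/2)c)] can be estimated from samples of an individual cost c; these constants follow the standard risk-sensitive form, which is an assumption here since (44) and (45) are not reproduced.

```python
import numpy as np

def risk_sensitive(costs, rho):
    # (2/rho) * log E[exp((rho/2) * c)], estimated over cost samples.
    return (2.0 / rho) * np.log(np.mean(np.exp(0.5 * rho * costs)))

costs = np.array([1.0, 1.0, 1.0, 5.0])
near_mean = risk_sensitive(costs, 1e-6)  # rho -> 0 recovers the expectation
averse = risk_sensitive(costs, 1.0)      # rho > 0 penalizes cost variability
```

The positive-ρ value exceeds the plain expectation because the exponential weights the rare expensive outcome more heavily, which is the risk-averse behavior the cost is meant to encode.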

B. Simulation Settings
Let us consider the spring-mass-negative-damper system discretized via the Euler method with the sampling interval of Δt = 0.05,
where the 2-D state x_t(θ) consists of the position and velocity of the mass point, and we set x_0 = [1, 1]'. The spring constant is set to k_sim = 1. Suppose that the other physical parameters vary uncertainly and are represented by the stochastic parameter θ. Namely, the mass and damping coefficient are set to m_sim(θ) = 1/(5 + 5[θ]_1) and c_sim(θ) = 3(1 − [θ]_2), respectively. The 2-D stochastic parameter θ exists on the set S_θ = (0, 1) × (0, 1). The PDF p(θ) is constructed using beta distributions, where p(θ) = 0 on R^2\S_θ. This simulation uses a polynomial cost whose individual cost for each θ is given by (55), developed using the settings of M = 4 and N_μ = 6. To realize the robust stability, the four vertices A_j (j = 1, ..., 4) in (26) are chosen accordingly.
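The simulation plant can be assembled directly from these settings; the destabilizing sign on the damping term (a "negative damper") and the Beta(2, 2) shape parameters are assumptions, since the exact equation and beta parameters are not reproduced above.

```python
import numpy as np

DT, K_SIM = 0.05, 1.0  # sampling interval and spring constant

def system_matrices(theta):
    """Euler-discretized spring-mass-negative-damper matrices A(theta), B(theta)
    with m(theta) = 1/(5 + 5 theta_1) and c(theta) = 3 (1 - theta_2)."""
    m = 1.0 / (5.0 + 5.0 * theta[0])
    c = 3.0 * (1.0 - theta[1])
    A = np.array([[1.0, DT],
                  [-DT * K_SIM / m, 1.0 + DT * c / m]])  # assumed +c: unstable
    B = np.array([[0.0],
                  [DT / m]])
    return A, B

rng = np.random.default_rng(4)
theta = rng.beta(2.0, 2.0, size=2)  # theta on (0, 1)^2
A, B = system_matrices(theta)
```

With the destabilizing damping sign, det A(θ) exceeds one for every θ ∈ (0, 1)², so at least one open-loop eigenvalue lies outside the unit circle and feedback is genuinely needed.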

C. Simulation Results
The proposed controller (stochastic optimal controller) is compared with its deterministic optimal version and the initial controller using K^{(0)}. The deterministic optimal version focuses on the nominal system by replacing θ, A_j, and B_j with the expectations E_θ[θ], A(E_θ[θ]), and B(E_θ[θ]), respectively. For each controller, the state trajectories are computed according to Algorithm 2, in which θ_i is randomly sampled from the PDF p(θ).
The individual costs g(c(θ, K)) and their expectations J(K) were then evaluated. The expectations with respect to θ were computed using the Monte Carlo approximation with 50 000 samples. The infinite horizon of the costs was approximated by the finite horizon of 500 steps.
Fig. 2. Trajectories of the state x_t(θ) and input u_t with the proposed controller.
which is the expectation of the individual costs. The results were consistent with the objective, that is, minimizing the expected cost J(K). Indeed, J(K) for the stochastic optimal version was smaller than those for the others, whereas the individual costs for certain parameters were degraded. Costs for the initial controller were not sufficiently minimized because this controller focuses only on the robust stability. Several costs for the deterministic version diverged because the stability was ensured for the nominal case E_θ[θ] but not for every θ. Fig. 2 demonstrates the trajectories of the state x_t(θ) and input u_t with the proposed controller applied. We see that the state was stabilized for the various θ represented by the multiple trajectories. Hence, it is confirmed that the proposed method can be applied successfully to optimal control involving time-invariant stochastic parameters and generalized cost functions.
The proposed method is evaluated in a particular case of a deterministic system with a linear quadratic regulator. Namely, θ, A_j, and B_j are replaced with E_θ[θ], A(E_θ[θ]), and B(E_θ[θ]), respectively, and only quadratic terms remain in the cost g(c(θ, K)) in (55). The proposed method provided K* = [0.3035, −2.588], which is almost equivalent to the optimal gain K = [0.3034, −2.590] given by solving a Riccati equation; the small discrepancy is attributed to numerical residuals introduced by adding the barrier function.

VI. CONCLUSION
This study solved optimal control problems of generalized cost functions for linear systems with time-invariant stochastic parameters. The proposed cost function can represent various cost functions, such as an expectation of a quadratic cost, risk-sensitive cost, polynomial cost, and input-to-state L_2 gain, as demonstrated by Theorem 4. In the proposed method, a suboptimal linear feedback controller is designed offline to minimize the cost function. The controller guarantees robust stability of the feedback system with the stochastic parameters.
The practical applications of this study include systems with parameter variations in mass production, where the manufacturing variations of physical parameters are represented by time-invariant stochastic parameters. Nanoscale receivers are such examples, where the lengths of carbon nanotube cantilevers vary through their fabrication processes [4].

APPENDIX A PRELIMINARIES FOR PROOF
We derive Lemmas 1 and 2, which are used to prove the main theorems.

Lemma 1: Let d_1 and d_2 be natural numbers. For any natural number M and any matrix in R^{d_1×d_2}, there exists a unique matrix such that the relations (58) and (59) hold.

Proof: For each M ≥ 1, if (58) and (59) hold for M, then they imply (58) and (59), respectively, for M + 1. Moreover, the base case clearly holds, which leads to (58) and (59) for M = 1. Therefore, for every M ∈ {1, 2, ...}, the statement is satisfied based on mathematical induction.
Lemma 2: Let d_1 and d_2 be natural numbers. For any matrix in R^{d_1×d_2}, any z ∈ R^{d_1}, and any y ∈ R^{d_2} that satisfy the stated linear relation, relations (62) and (63) hold for all M ∈ {1, 2, ...}.
Proof: We use the property that, for given matrices Z_1, Z_2, Z_3, and Z_4 of compatible dimensions, (Z_1 ⊗ Z_2)(Z_3 ⊗ Z_4) = (Z_1 Z_3) ⊗ (Z_2 Z_4) holds. If (63) holds for some M, then it also holds for M + 1. Moreover, using the relation between z and y yields (63) for M = 1. Therefore, by mathematical induction, (63) holds for every M ∈ {1, 2, ...}. As a result of the structure described in (58) in Lemma 1, applying the operator [...]^{-1} to (63) yields (62). This completes the proof.
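The two Kronecker-product identities underlying Lemmas 1 and 2 and the proof of Theorem 1 can be checked numerically. This sketch assumes nothing beyond the standard mixed-product property; the dimensions and the choices M = 3, t = 4 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
Z1, Z2, Z3, Z4 = (rng.standard_normal((2, 2)) for _ in range(4))

# Mixed-product property of the Kronecker product used in the proofs
# ([58, Sec. 3.2.9]): (Z1 (x) Z2)(Z3 (x) Z4) = (Z1 Z3) (x) (Z2 Z4).
lhs = np.kron(Z1, Z2) @ np.kron(Z3, Z4)
rhs = np.kron(Z1 @ Z3, Z2 @ Z4)
ok_mixed = np.allclose(lhs, rhs)

# Iterating it yields (Z^{(x)M})^t = (Z^t)^{(x)M}, used in Appendix C:
Z = rng.standard_normal((2, 2))
Zt = np.linalg.matrix_power(Z, 4)       # Z^t with t = 4
Zk3 = np.kron(np.kron(Z, Z), Z)          # Z^{(x)3}
ok_power = np.allclose(np.linalg.matrix_power(Zk3, 4),
                       np.kron(np.kron(Zt, Zt), Zt))
```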

APPENDIX C PROOF OF THEOREM 1
Let A_cl(θ, K)^0 denote I. The proof proceeds by establishing the relations (66)-(68).
Finally, (68) is proven using the definition A_cl(θ, K) := A(θ) + B(θ)K. As a result of the structure described in (58) in Lemma 1, A_cl(θ, K)^t is transformed as in (70). Here, for given matrices Z_1, Z_2, Z_3, and Z_4 of compatible dimensions, the property (Z_1 ⊗ Z_2)(Z_3 ⊗ Z_4) = (Z_1 Z_3) ⊗ (Z_2 Z_4) holds [58, Sec. 3.2.9]. The use of this property yields (71); iterating the corresponding replacement of Z_2 finally results in (Z_1^{⊗M})^t = (Z_1^t)^{⊗M}. Applying this property to (70) yields (72). Because of (72), we can use Lemma 1 with the matrix in the lemma replaced by A_cl(θ, K)^t. For K ∈ S_K, A_cl(θ, K) is stable; that is, lim_{t→∞} A_cl(θ, K)^t = 0 holds, which is equivalent to max_i |λ_i(A_cl(θ, K))| < 1. Furthermore, we show the nonsingularity of (I − A_cl(θ, K)). Supposing that it is singular, we have det(I − A_cl(θ, K)) = 0, which implies λ_i(A_cl(θ, K)) = 1 for some i. This contradicts max_i |λ_i(A_cl(θ, K))| < 1, and thus (I − A_cl(θ, K)) is nonsingular. Owing to this nonsingularity and lim_{t→∞} A_cl(θ, K)^t = 0, the relation (68) holds according to [58, Sec. 3.9.5].
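The final step of the proof rests on the Neumann-series fact that stability of a matrix makes I minus that matrix invertible, with the geometric sum converging to the inverse. A numerical sketch, with an arbitrary Schur-stable matrix standing in for A_cl(θ, K):

```python
import numpy as np

# A Schur-stable matrix (all |eigenvalues| < 1), a stand-in for A_cl(theta, K):
A = np.array([[0.5, 0.2],
              [0.0, 0.4]])
rho = max(abs(np.linalg.eigvals(A)))

# Stability makes I - A nonsingular and the Neumann series sum_t A^t
# converge to its inverse, the relation (68) invoked above:
S, P = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S, P = S + P, P @ A   # S accumulates I + A + A^2 + ...
ok = np.allclose(S, np.linalg.inv(np.eye(2) - A))
```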
We have thus derived the relations (66)-(68) that lead to this theorem, as described in (69). The proof is now complete.

APPENDIX D PROOF OF THEOREM 2
First, the partial derivative ∂c(θ, K)/∂[K]_{i,j} in (22) is obtained using the stated relations for ∂c(θ, K). Subsequently, we demonstrate (21). Because each component of g is totally differentiable at c(θ, K) according to Assumption 2 (iv) and c(θ, K) is partially differentiable at K ∈ Int(S_K), the chain rule yields (73), where the matrix in R^{m×n} appearing there has its (i, j)th entry equal to one and all other entries equal to zero. Let us consider the case in which S_θ is infinite. For each K, let B(K) ⊂ Int(S_K) be a neighborhood of K. The partial derivative in (73) is bounded on B(K) × S_θ because the derivative is continuous and S_θ is bounded according to Assumption 1 (iii). Using this boundedness, Assumptions 1 (iv) and 2 (iii), and the Lebesgue dominated convergence theorem, we obtain the partial derivative (74); the same result is obtained even if S_θ is finite. Substituting (74) into (75) yields (21) and completes the proof.
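The mechanism of this proof, differentiating under the expectation via the chain rule, can be illustrated on a toy instance and checked against a finite difference. The scalar c, the choice g = exp, and the finite parameter set below are hypothetical stand-ins for the paper's definitions; with a finite S_θ the interchange of expectation and derivative is immediate.

```python
import math

# Toy instance: c(theta, K) = (1 + theta*K)^2, g(c) = exp(c),
# equiprobable finite parameter set S_theta (all hypothetical choices).
thetas = [-0.2, 0.0, 0.3]

def J(K):
    # J(K) = E_theta[g(c(theta, K))]
    return sum(math.exp((1 + th * K) ** 2) for th in thetas) / len(thetas)

def dJ(K):
    # Chain rule inside the expectation, as in (73)-(75):
    # dJ/dK = E_theta[g'(c) * dc/dK]
    total = 0.0
    for th in thetas:
        c = (1 + th * K) ** 2
        dc = 2 * (1 + th * K) * th   # partial derivative of c w.r.t. K
        total += math.exp(c) * dc    # g'(c) * dc/dK since g = exp
    return total / len(thetas)

K0, h = 0.5, 1e-6
fd = (J(K0 + h) - J(K0 - h)) / (2 * h)   # central finite difference
```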

APPENDIX E PROOF OF PROPOSITION 2
The term (P + P^T) can be replaced with a symmetric matrix. Subsequently, the conditions F_j(v) ≻ 0 for all j with (27) reduce to the inequalities for quadratic stability. Moreover, F_j(v) ≻ 0 with (28) reduces to the inequalities used in [18, Th. 2]. These imply (24), and thus the proof is complete.

APPENDIX F PROOF OF THEOREM 3
We first show that the following conditions (v)-(ix) hold, where S_v ≠ ∅ according to condition (ii).
We demonstrate condition (v). For a given ε > 0, let M_ε be a symmetric matrix such that |[M_ε]_{i',j'}| < ε holds for any i' and j'. Weyl's inequality [59] can be applied to the symmetric matrices F_j(v) and M_ε, where λ_min(M_ε) and λ_max(M_ε) denote the minimum and maximum eigenvalues of the symmetric M_ε, respectively. For any v ∈ S_v, there exists ε > 0 such that F_j(v) + M_ε ⪰ 0 holds for any j and any M_ε, owing to min_j λ_min(F_j(v)) > 0. Because [F_j(v)]_{i',j'} is continuous on R^{d_v} and the number N_F of inequalities is finite, for any v ∈ S_v and ε > 0, there exists δ > 0 such that condition (v) holds. Subsequently, we demonstrate condition (vi). For any v ∈ S_v, its neighborhood is also contained in the open set S_v; thus, a neighborhood of the corresponding K is included in S_K. This fact modifies the relation (24) accordingly, and thus S_{v,2} is open. As J_b(v) is C^2 continuous on S_v according to condition (vi), condition (viii) holds.
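Weyl's inequality, as applied above, bounds every eigenvalue of a perturbed symmetric matrix by the eigenvalues of the original matrix and of the perturbation. A numerical sketch with arbitrary symmetric matrices standing in for F_j(v) and M_ε:

```python
import numpy as np

rng = np.random.default_rng(1)
sym = lambda X: (X + X.T) / 2.0

# Symmetric stand-in for F_j(v) and a symmetric perturbation M_eps
# whose entries lie in (-eps, eps):
F = sym(rng.standard_normal((3, 3)))
eps = 0.05
M = sym(eps * 0.9 * (2.0 * rng.random((3, 3)) - 1.0))

lam = np.linalg.eigvalsh  # eigenvalues in ascending order
# Weyl's inequality [59]:
#   lam_min(F) + lam_min(M) <= lam_min(F + M),
#   lam_max(F + M) <= lam_max(F) + lam_max(M).
lower_ok = lam(F + M)[0] >= lam(F)[0] + lam(M)[0] - 1e-12
upper_ok = lam(F + M)[-1] <= lam(F)[-1] + lam(M)[-1] + 1e-12
```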
We prove that condition (ix) holds. The boundedness of S_{v,2} is proven first. Let r_v > 0 be a scalar satisfying βr_v^2 + inf_{K∈S_K} J(K) > J_b(v^(0)) + 1/2, which exists because J(K) is a scalar function that is bounded below. As the first term in b(v) is non-negative on S_v, for any v ∈ {v ∈ R^{d_v} | ‖v‖ > r_v} ∩ S_v (if this set is not empty), we have J_b(v) > J_b(v^(0)) + 1/2. This implies that S_{v,2} ⊆ {v ∈ R^{d_v} | ‖v‖ ≤ r_v} ∩ S_v holds, and thus S_{v,2} and its closure are bounded. We next demonstrate that the closure of S_{v,2} is contained in S_v, as follows. Let ε̄ > 0 be a positive scalar. For any v ∉ S_v and any λ > 0, there exists ε ∈ (0, ε̄) such that λ_min(F_j(v) + M_ε) < λ holds for any M_ε and some j, because λ_min(F_j(v)) ≤ 0 for some j and Weyl's inequality (76) holds. Moreover, λ_i(F_j(v) + M_ε) is bounded for any i and j owing to ε < ε̄ and Weyl's inequality (76). Using these properties, for any v ∉ S_v and any λ > 0, there exists δ > 0 such that λ_min(F_j(v')) < λ holds for some j and all λ_i(F_j(v')) are bounded for any v' ∈ {v' | ‖v' − v‖ < δ}. Thus, for any v ∉ S_v, there exist λ > 0 and δ > 0 such that 0 < λ_min(F_j(v')) < λ and J_b(v') ≥ β det(F_j(v'))^{-1} + inf_{K∈S_K} J(K) > J_b(v^(0)) + 1 hold for some j and any v' ∈ {v' | ‖v' − v‖ < δ} ∩ S_v (if this set is not empty), because λ_min(F_j(v')) > 0 holds for v' ∈ S_v and det(F_j(v'))^{-1} can be increased through λ_min(F_j(v'))^{-1} > λ^{-1} > 0 with the other eigenvalues bounded. This implies that any v ∉ S_v is not a closure point of S_{v,2} because such δ > 0 satisfies {v' | ‖v' − v‖ < δ} ∩ S_{v,2} = ∅. Thus, the closure of S_{v,2} is contained in S_v. Consequently, the closure of S_{v,2} is a bounded closed set contained in S_v. The second-order partial derivative of J_b(v) is bounded on this bounded closure of S_{v,2} because it is continuous on S_v, which contains the bounded closure. This leads to condition (ix).
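The sublevel-set mechanism established by conditions (v)-(ix) can be illustrated with a one-dimensional sketch: a backtracking (Armijo-type sufficient decrease) gradient method on a barrier objective never leaves the initial sublevel set, and hence never leaves the barrier's domain. The scalar objective below, with F(v) = 1 − v playing the role of a single F_j(v) and β = 0.1, is a hypothetical stand-in for J_b.

```python
# Toy barrier objective: J_b(v) = J(v) + beta * det(F(v))^{-1}
# with J(v) = v^2, F(v) = 1 - v (scalar), beta = 0.1, domain v < 1.
def Jb(v):
    return v * v + 0.1 / (1.0 - v)

def dJb(v):
    return 2 * v + 0.1 / (1.0 - v) ** 2

v = 0.9
J0 = Jb(v)
history = [v]
for _ in range(50):
    g = dJb(v)
    step = 1.0
    # Backtrack until the sufficient-decrease condition holds and the
    # trial point stays in the barrier's domain v < 1.
    while v - step * g >= 1.0 or Jb(v - step * g) > Jb(v) - 1e-4 * step * g * g:
        step *= 0.5
    v = v - step * g
    history.append(v)
```

Every accepted step decreases J_b, so all iterates remain in the sublevel set {v | J_b(v) ≤ J_b(v^(0))}, which in turn lies inside the domain where the barrier is finite; this mirrors how the proof concludes that the iterates stay in S_v.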
Subsequently, we prove that K^(ℓ) ∈ S_K holds for all ℓ ≥ 0. The Wolfe condition (35) yields the property v^(ℓ) ∈ {v | J_b(v) ≤ J_b(v^(0))} ⊆ S_v for ℓ ≥ 0. This property and (24) imply K^(ℓ) ∈ S_K for all ℓ ≥ 0. The proof is now complete.

APPENDIX G PROOF OF THEOREM 4

First, we prove (i). Combining the functions (42)-(43) yields the function h used in (11) in Proposition 1. This indicates that the expectation of the quadratic cost (10) is represented.
Subsequently, we demonstrate (ii), which can be proved in a manner similar to that of (i). Using the transformation in (65) with S = 0, combining (45) and (46) yields (44).

Theorem 4 (Various Cost Functions): Suppose that K ∈ S_K and Assumption 2 (i) and (iii) hold; that is, h = f ∘ e ∘ g and e(g(...)) = E_θ[g(...)]. The following types (i)-(iv) of cost functions J(K) can be represented. (i) Quadratic Cost: The cost function J(K) reduces to the expectation of the quadratic cost defined in (10) if M = 2 holds and f and g are given by f(e) = e (42)

Fig. 1 presents the control performance of the initial controller, the deterministic optimal version, and the stochastic optimal version. The bars indicate the individual costs g(c(θ, K)) in (55) for various randomly sampled stochastic parameters θ. The horizontal lines indicate the cost function J(K).

Algorithm 2: Calculation of the State Transitions
Input: x_0, K, p(θ)
Output: the state trajectories x_t(θ_i) for various θ_i
1: for i ∈ {1, 2, 3, ...} do

Fig. 1. Performance results for the initial controller, deterministic optimal version, and stochastic optimal version, corresponding to green, blue, and red, respectively. The bars and horizontal lines indicate the individual costs g(c(θ, K)) and their expectations J(K), respectively. The expectations with respect to θ were computed using a Monte Carlo approximation with 50 000 samples. The infinite horizon of the costs was approximated by a finite horizon of 500 steps.

TABLE I. COMPARISON OF THIS STUDY WITH THE RELATED WORK