Integral Reinforcement Learning Control for a Class of High-Order Multivariable Nonlinear Dynamics With Unknown Control Coefficients



I. INTRODUCTION
Adaptive control of multivariable systems has attracted significant attention over the last several decades. Designing control laws for such systems is much more challenging than for single-input systems because of the dynamic couplings introduced by the high-frequency gain matrix [1], [2]. In general, these works assume that the high-frequency gain matrix is known in advance when designing controllers. However, in some practical systems the high-frequency gain matrix is not known a priori; see, for example, [3]-[5]. Hence, one of the fundamental problems for multivariable systems is how to deal with an unknown high-frequency gain matrix, referred to as unknown control coefficients (UCCs), which makes the extension from single-input systems to multivariable systems far from trivial.
To address the problem of UCCs, one effective method is to employ Nussbaum-type functions, which were first proposed in [6] to deal with single-input systems with UCCs and then extended to various single-input dynamics [7]-[10]. However, when multiple control inputs with UCCs are considered, i.e., in multivariable systems, one critical challenge is how to handle multiple Nussbaum-type functions, whose effects may counteract each other. To cope with this problem, [11] proposed an adaptive control scheme for strict-feedback systems, in which a newly designed Nussbaum-type function allows multiple Nussbaum-type functions to appear in a single inequality. However, the method of [11] still cannot be directly extended to multivariable systems. Furthermore, the authors of [12] suggested constructing a partial Lyapunov function for each control input, so that only one Nussbaum-type function appears in each. Inspired by this idea, some extended results were applied to multi-agent systems with UCCs [13]-[15]. Although the whole multi-agent system can be regarded as a multivariable system, further effort is needed to extend these results to general multivariable systems. To completely solve this issue, [16] designed a novel adaptive control algorithm for uncertain multivariable systems with UCCs by using the backstepping control technique. In addition to Nussbaum-type functions, there are other methods to deal with UCCs, such as nonlinear PI functions in [17]-[21], switching and monitoring functions in [22], and the logic-based switching mechanism in [23], [24].
The associate editor coordinating the review of this manuscript and approving it for publication was Leo Chen.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, with the development of the integral reinforcement learning (IRL) technique, long-term performance indices have been widely adopted to evaluate the performance of control systems; see, for example, [25]-[28]. Using the IRL technique, adaptive controllers were designed in [29] for tracking control of second-order, square multivariable dynamics with UCCs, where a long-term performance index was adopted and a Nussbaum-type function was used to handle the UCCs. Nevertheless, the results of [29] are restricted to the tracking control problem of second-order, square multi-input multi-output (MIMO) systems. Extending them to high-order, nonsquare multivariable systems subject to UCCs by means of the IRL technique remains an unsolved and challenging problem.
Based on the above analysis of the literature, the aim of this work is to present an IRL controller for a class of high-order multivariable nonlinear systems with UCCs. The long-term performance index is presented first, and then the critic and action neural networks (NNs) are designed to estimate the unobtainable long-term performance index and the unknown drift of the system, respectively. Combined with Nussbaum-type functions, these NNs allow us to design IRL controllers for high-order, nonsquare multivariable systems that cope with the problem of UCCs. It is shown that all signals of the corresponding closed-loop system are semiglobally uniformly ultimately bounded (UUB). The contributions of this paper are as follows.
1) This study proposes a new long-term performance index. Unlike the existing long-term performance indices in [29], [30], which require a small threshold to be designed, the index in this paper does not need such a threshold.
2) Compared with the existing result [29], where the IRL controller is designed for tracking control of second-order, square MIMO dynamics, the results in this paper are generalized to high-order, nonsquare multivariable systems with UCCs.
The rest of this study is organized as follows. Preliminaries and the problem formulation are presented in Section II, while Section III presents the IRL control laws for high-order, nonsquare multivariable systems with UCCs. One example is provided in Section IV to show the effectiveness of the proposed results, and Section V gives the conclusion.

II. PRELIMINARIES AND PROBLEM FORMULATION
Notation: $I$ denotes an identity matrix with compatible dimensions. $L_\infty$ denotes the set of bounded signals. For a vector $a$, $\|a\|$ is its Euclidean norm. $\|A\|_F$ denotes the Frobenius norm of a matrix $A$. $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues of a matrix, respectively.

A. PRELIMINARIES
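Definition 1 [6]: A function $N(\cdot)$ is called a Nussbaum-type function if it satisfies
$$\limsup_{s \to \infty} \frac{1}{s} \int_0^s N(\chi)\, d\chi = +\infty, \qquad \liminf_{s \to \infty} \frac{1}{s} \int_0^s N(\chi)\, d\chi = -\infty. \tag{1}$$
Lemma 1 [31]: Suppose $V(\cdot)$ and $k(\cdot)$ are smooth functions over $[0, t_f)$ satisfying $V(t) \ge 0$, $\forall t \in [0, t_f)$, $N(\cdot)$ is the Nussbaum-type function, and $\phi$ is some nonzero constant. If we have [...]
We now present some basic knowledge on the radial basis function neural network (RBFNN), by which we can approximate any continuous nonlinear function. In view of [32], we can use the RBFNN $W^T \Phi(x)$ (3) to approximate a continuous function, where $\Phi(x) = [\Phi_1(x), \Phi_2(x), \ldots, \Phi_l(x)]^T$ with $\Phi_i(x)$, $i = 1, 2, \ldots, l$ being the Gaussian function as [...], where $\mu_i = [\mu_{i1}, \mu_{i2}, \ldots, \mu_{il}]^T$ is the center of the receptive field and $\eta_i$ is the width of the Gaussian function. It is known [33] that if $l$ is sufficiently large, (3) can approximate any continuous nonlinear function on $\Omega_x \subset \mathbb{R}^q$ to any accuracy as $W^{*T} \Phi(x) + \varepsilon$, where the ideal weight is $W^*$ and $\varepsilon$ is the approximation error. Furthermore, there exist ideal unknown, constant weights $W^*$ such that $|\varepsilon| \le \varepsilon^*$ with constant $\varepsilon^* > 0$ for all $x \in \Omega_x$. Let $\hat{W}$ be the estimate of $W^*$; then the weight estimation error is defined as $\tilde{W} = \hat{W} - W^*$.
In order to derive the main results of this study, the following lemmas are needed.
Lemma 2 [29]: For any two matrices $A = [a_{ij}]_{m \times n} \in$ [...] For any $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^n$, one has $\|a^T b\|_F \le \|a\| \|b\|$ and $\|A a\|_F \le \|A\|_F \|a\|$.
Lemma 3 [34]: Let $S \in \mathbb{R}^{m \times m}$ be a matrix and $x \in \mathbb{R}^m$ a nonzero vector. If we define $\kappa = \frac{x^T S x}{x^T x}$, then at least one eigenvalue of the matrix $S$ is in $(-\infty, \kappa]$ and at least one is in $[\kappa, +\infty)$.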
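The RBFNN construction described above can be illustrated with a minimal sketch. The Gaussian form $\exp(-\|x - \mu_i\|^2 / \eta_i^2)$ used here is an assumption (the exact expression is not shown in the text), and the function names, node count, centers, widths and weights are hypothetical values chosen only for illustration. The sketch also shows why the RBF vector is bounded: every Gaussian entry lies in $(0, 1]$, so its Euclidean norm never exceeds $\sqrt{l}$.

```python
import math

def rbf_vector(x, centers, widths):
    """Gaussian basis vector with one entry per node (assumed form)."""
    phi = []
    for mu, eta in zip(centers, widths):
        dist_sq = sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
        phi.append(math.exp(-dist_sq / eta ** 2))
    return phi

def rbfnn_output(weights, phi):
    """Scalar network output W^T Phi(x)."""
    return sum(w * p for w, p in zip(weights, phi))

# Hypothetical setup: 5 nodes with centers spread over [-4, 4]^2 and width 4.
centers = [[c, c] for c in (-4.0, -2.0, 0.0, 2.0, 4.0)]
widths = [4.0] * len(centers)
weights = [0.5, -1.0, 2.0, -1.0, 0.5]

phi = rbf_vector([1.0, -0.5], centers, widths)
y = rbfnn_output(weights, phi)
phi_norm = math.sqrt(sum(p * p for p in phi))
print(y, phi_norm)
```

In an adaptive scheme the weights would be replaced by an online estimate, while the basis vector is computed exactly as above.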

B. PROBLEM FORMULATION
Consider a continuous-time nonlinear system of order $n$ ($n \ge 1$) as
$$x^{(n)}(t) = f(x(t)) + g(x(t)) u(t) + d(t), \tag{6}$$
where $x(t) \in \mathbb{R}^k$ is the system state, with $x^{(m)}(t)$, $m = 1, 2, \ldots, n-1$ being the $m$th derivative of the system state. The smooth function $f(x(t)) \in \mathbb{R}^k$ represents an unknown nonlinear drift, $g(x(t)) \in \mathbb{R}^{k \times l}$ ($k \le l$) is the high-frequency gain, $d(t) \in \mathbb{R}^k$ is the unknown, bounded disturbance, and $u(t) \in \mathbb{R}^l$ represents the control input.
The aim is to design IRL controllers for the high-order nonlinear system (6) such that stability is guaranteed, i.e., the state $x(t)$ and its derivatives $x^{(m)}(t)$ converge to zero (7) for $m = 1, 2, \ldots, n-1$. Furthermore, all the signals of the closed-loop system are semiglobally UUB.

III. MAIN RESULTS
In this section, we present IRL controllers for the high-order nonlinear system with UCCs. To facilitate the design procedure, we first introduce the following state variables, [...] (8).
In what follows, the following assumption is needed.
Assumption 1: The matrix $g(x(t))$ is nonsquare and partially unknown with $g(x(t)) = g_0(x(t)) g_u(x(t))$, where the matrix $g_0(x(t)) \in \mathbb{R}^{k \times l}$ is known, bounded with full row rank, and the matrix $g_u(x(t)) \in \mathbb{R}^{l \times l}$ is unknown. [...] is either positive or negative definite.
According to Lemma 3 with Assumption 1, we have [...] (11) where $\lambda_{\min}(t) \le \rho(t) \le \lambda_{\max}(t)$, $\lambda_{\min}(t)$ and $\lambda_{\max}(t)$ are respectively the minimum and maximum eigenvalues of the matrix [...], and the state variable $\varsigma(t)$ is defined in (8). From Assumption 1 and definition (11), we know that $\rho(t)$ is nonzero with a constant but unknown sign.

1) CRITIC NN DESIGN
A long-term performance index is proposed as [...] (12) where $p(\varsigma(\tau)) = [p(\varsigma_1(\tau)), p(\varsigma_2(\tau)), \ldots, p(\varsigma_k(\tau))]^T$ with $p(\varsigma_i(\tau)) = \tanh(\varsigma_i^2(\tau))$, and $\varsigma(\tau)$ is the state variable defined in (8). To construct the Bellman error for system (6), we define [...] with [...] where $p_{sc}$ represents the value cost on $[t - T, t)$. It is known that $p_{sc} = [p_{sc_1}, p_{sc_2}, \ldots, p_{sc_k}]^T$ [...], which implies $\|p_{sc}\| \le d_{p_{sc}}$ with a positive constant $d_{p_{sc}}$. Furthermore, it is seen from (12) that $J_s(t)$ contains future dynamical information. Then, the RBFNN is adopted to approximate it, i.e.,
$$J_s(t) = W_{sc}^{*T} \Phi_{sc}(x_{sc}(t)) + \varepsilon_{sc}(x_{sc}(t)), \quad \forall x_{sc} \in \Omega_{x_{sc}}, \tag{15}$$
where the bounded ideal weight $W_{sc}^*$ satisfies $\|W_{sc}^*\|_F \le d_{W_{sc}}$, the RBF vector $\Phi_{sc}(x_{sc}(t))$ is bounded with $\|\Phi_{sc}(x_{sc}(t))\| \le d_{\Phi_{sc}}$, and the approximation error $\varepsilon_{sc}(x_{sc}(t))$ is bounded with $\|\varepsilon_{sc}(x_{sc}(t))\| < d_{\varepsilon_{sc}}$. In general, the weight $W_{sc}^*$ is unknown. Therefore, we have to estimate $J_s(t)$ by
$$\hat{J}_s(t) = \hat{W}_{sc}^T \Phi_{sc}(x_{sc}(t)), \tag{16}$$
and $J_s(t - T)$ is estimated by [...] where [...]
In what follows, we design the update law of the weight $\hat{W}_{sc}$. A difference error of the long-term performance index is defined as [...] where [...] Then, $\hat{W}_{sc}$ is updated by [...] (19) where $\Gamma_{sc} > 0$ and $\delta_{sc} > 0$ are to be prescribed. The first term of (19) minimizes the critic error $e_{sc}$, and the second term is a modification term that makes (19) robust to disturbances [35].
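To make the bounded utility concrete, the following sketch evaluates $p(s_i) = \tanh(s_i^2)$ elementwise and accumulates it over a window of length $T$, which is one plausible reading of the value cost on $[t - T, t)$. The undiscounted trapezoidal integral, the function names and the sample trajectory are assumptions made for illustration, since the exact expressions for the index and for $p_{sc}$ are not shown in the text.

```python
import math

def utility(state):
    """Elementwise utility p(s_i) = tanh(s_i^2), which lies in [0, 1)."""
    return [math.tanh(s * s) for s in state]

def window_cost(trajectory, dt):
    """Trapezoidal estimate of the integral of p over the sampled window."""
    costs = [utility(s) for s in trajectory]
    k = len(costs[0])
    total = [0.0] * k
    for prev, cur in zip(costs, costs[1:]):
        for i in range(k):
            total[i] += 0.5 * (prev[i] + cur[i]) * dt
    return total

# Hypothetical decaying two-dimensional trajectory sampled on a window of length T.
T, dt = 0.5, 0.01
steps = int(round(T / dt))
trajectory = [[2.0 * math.exp(-j * dt), -1.0 * math.exp(-2.0 * j * dt)]
              for j in range(steps + 1)]
p_sc = window_cost(trajectory, dt)
print(p_sc)
```

Because each utility entry is below 1, each component of the window cost stays below $T$ under this assumed form, which is consistent with the stated boundedness of $p_{sc}$.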

2) ACTION NN DESIGN
To design the action NN, the RBFNN is first established to approximate the unknown function $f(x(t))$, i.e.,
$$f(x(t)) = W_{sa}^{*T} \Phi_{sa}(x_{sa}(t)) + \varepsilon_{sa}(t), \quad \forall x_{sa} \in \Omega_{x_{sa}}, \tag{20}$$
where the ideal weight $W_{sa}^*$ satisfies $\|W_{sa}^*\|_F \le d_{W_{sa}}$, the RBF vector $\Phi_{sa}(x_{sa}(t))$ satisfies $\|\Phi_{sa}(x_{sa}(t))\| \le d_{\Phi_{sa}}$, and $\varepsilon_{sa}(t)$ with $\|\varepsilon_{sa}(x_{sa}(t))\| < d_{\varepsilon_{sa}}$ is the approximation error. Since the weight $W_{sa}^*$ is unknown, we must estimate [...] (21) where $x_{sa}(t) = \bar{x}(t)$.
Next, we design the update law of the weight $\hat{W}_{sa}$. The aim of designing the action NN is to let the state variable $\varsigma(t)$ defined in (8) and $J_s(t)$ approach zero, which implies that the state $x(t)$ and its derivatives $x^{(m)}(t)$, $m = 1, 2, \ldots, n-1$ converge to zero. To this end, we define the action error as [...] Then, $\hat{W}_{sa}$ is updated by [...] (23) where $\Gamma_{sa} > 0$ and $\delta_{sa} > 0$ are to be designed. Define $\tilde{W}_{sa} = \hat{W}_{sa} - W_{sa}^*$. Similarly, the first term of (23) minimizes the action error $e_{sa}$, and the second term is a modification term that makes (23) robust to disturbances [35].
Based on the above critic and action NNs, the IRL controller for nonsquare multivariable systems with UCCs can be proposed as [...] (24) with [...] (25) and [...] (26) where $\lambda_t = 1 + t^2$, and $\tanh(\cdot)$ is the hyperbolic tangent function. Then, the stability results for the controller (24) are summarized as follows.
Theorem 1: Consider a continuous-time nonsquare multivariable system with UCCs given in (6) satisfying Assumption 1. The IRL controller (24), (25) and (26) with the critic NN (16) and (19), and the action NN (21) and (23), can achieve the objective (7) if the design parameters are properly chosen, i.e., [...] (27). Furthermore, all signals of the closed-loop system are semiglobally UUB.
Proof: Consider the following positive definite function [...] and $D > 0$ is the unknown upper bound of the two-norm of $d(t)$. Furthermore, the elements of the vector $\varsigma(t)$ defined in (8) are denoted by $\varsigma_i(t)$ for $i = 1, 2, \ldots, k$. [...]
By employing (24), (25) and (26) with the critic NN (16) and (19), and the action NN (21) and (23), we have [...] where we have used $|y| - y \tanh(y \lambda_t) \le 0.2785 / \lambda_t$. Furthermore, we have from [...] where $d_{V_2} = d_{\Phi_{sa}} d_{\Phi_{sc}} d_{W_{sc}} - \delta_{sa} d_{W_{sa}}$. In addition, we have from $V_3(t) = \frac{1}{2} \operatorname{tr}\left( \tilde{W}_{sc}^T(t) \Gamma_{sc}^{-1} \tilde{W}_{sc}(t) \right)$ that [...] where [...] where we have used $\delta_{sa} \ge d_{\Phi_{sa}}^2$ and $\delta_{sc} \ge d_{\Phi_{sc}}^2$ given in (27). Since $V_s(t)$ and $\chi(t)$ are smooth functions on $[0, t_f)$ with $V_s(t) \ge 0$, using Lemma 1 we have [...] $[\rho(t) N(\chi(\tau)) + 1] \dot{\chi}(\tau)\, d\tau$ (33) are semiglobally UUB on $[0, t_f)$. If the design parameters of the closed-loop system are properly chosen, i.e., inequality (27) holds, there exists a compact set that is the domain of attraction. Note that this compact set can be made to include any initial conditions by an appropriate choice of the design parameters. Therefore, the signals of the closed-loop system are semiglobally UUB.
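The bound $|y| - y \tanh(y \lambda_t) \le 0.2785 / \lambda_t$ used in the proof, with $\lambda_t = 1 + t^2$, can be checked numerically. The sketch below is only a sanity check on a finite grid, not a proof; the grid ranges and the function names are arbitrary choices made for illustration.

```python
import math

def lam(t):
    """Time-varying gain lambda_t = 1 + t^2 from the controller definition."""
    return 1.0 + t * t

def gap(y, t):
    """Left-hand side |y| - y * tanh(y * lambda_t) of the inequality."""
    return abs(y) - y * math.tanh(y * lam(t))

# Scan a grid of y values at several times and record the largest
# value of gap * lambda_t, which the inequality bounds by 0.2785.
worst_ratio = 0.0
for t in (0.0, 0.5, 1.0, 3.0, 10.0):
    for j in range(-4000, 4001):
        y = j / 1000.0
        worst_ratio = max(worst_ratio, gap(y, t) * lam(t))

print(worst_ratio)
```

Since $\lambda_t$ grows with time, the residual term $0.2785 / \lambda_t$ shrinks, so the smooth $\tanh$ term approximates the absolute value increasingly well as $t$ increases.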
Remark 1: Compared with the existing results in [29], where the IRL controller is designed for tracking control of second-order, square multivariable dynamics, the controllers proposed in this paper are developed for high-order, nonsquare multivariable systems.
In what follows, we consider system (6) in a special case, namely, $g(x(t))$ is assumed to be a square matrix. In this case, the following assumption is needed.
Assumption 2: The matrix $g(x(t)) + g^T(x(t))$ is either positive or negative definite.
According to Lemma 3 with Assumption 2, we define [...] (36) where $\lambda_{\min}(t) \le \rho(t) \le \lambda_{\max}(t)$, $\lambda_{\min}(t)$ and $\lambda_{\max}(t)$ are respectively the minimum and maximum eigenvalues of the matrix $\frac{1}{2} \left( g^T(x(t)) + g(x(t)) \right)$, and the state variable $\varsigma(t)$ is defined in (8). From Assumption 2 and definition (36), we know that $\rho(t)$ is nonzero with a constant but unknown sign.
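The role of Lemma 3 under Assumption 2 can be illustrated numerically: for any nonzero vector $x$, the quotient $x^T g x / x^T x$ depends only on the symmetric part $\frac{1}{2}(g + g^T)$ and lies between its minimum and maximum eigenvalues, so it keeps a constant sign when that symmetric part is definite. The 2x2 matrix below is a hypothetical gain chosen for illustration, and the quotient is an assumed stand-in for the definition of $\rho(t)$, which is not shown in the text.

```python
import math

def symmetric_part(g):
    """Return (g + g^T) / 2 for a 2x2 matrix given as nested lists."""
    return [[0.5 * (g[i][j] + g[j][i]) for j in range(2)] for i in range(2)]

def eig_sym_2x2(s):
    """Closed-form eigenvalues (min, max) of a symmetric 2x2 matrix."""
    mean = 0.5 * (s[0][0] + s[1][1])
    radius = math.hypot(0.5 * (s[0][0] - s[1][1]), s[0][1])
    return mean - radius, mean + radius

def rayleigh(g, x):
    """Quotient x^T g x / x^T x for a nonzero vector x."""
    num = sum(x[i] * g[i][j] * x[j] for i in range(2) for j in range(2))
    return num / sum(xi * xi for xi in x)

g = [[2.0, 1.0], [-3.0, 4.0]]  # hypothetical square gain, not from the paper
lam_min, lam_max = eig_sym_2x2(symmetric_part(g))

# Sweep unit vectors around the circle and evaluate the quotient.
rhos = [rayleigh(g, [math.cos(0.1 * j), math.sin(0.1 * j)]) for j in range(63)]
print(lam_min, lam_max, min(rhos), max(rhos))
```

Here the symmetric part is positive definite, so every sampled quotient is positive even though $g$ itself is not symmetric.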
It is easy to see that Assumption 2 can be regarded as a special case of Assumption 1. Therefore, following a design procedure similar to that for nonsquare multivariable systems, the IRL controller for the square multivariable system with UCCs can be proposed as [...] (37) with (25) and (26). Then, the stability results are summarized as follows.
Corollary 1: Consider a continuous-time square multivariable system with UCCs given in (6) satisfying Assumption 2. The IRL controller (37), (25) and (26) with the critic NN (16) and (19), and the action NN (21) and (23), can achieve the objective (7) if the design parameters are properly chosen, i.e., the conditions (27) are satisfied. Furthermore, all signals of the closed-loop system are semiglobally UUB.
Proof: Since Assumption 2 can be regarded as a special case of Assumption 1, the result of this corollary is a direct consequence of Theorem 1.

IV. SIMULATION EXAMPLES
In this section, one example is adopted to illustrate the effectiveness of the proposed controllers. We consider a system with second-order dynamics ($k = 2$) described by (6), where $g(x(t)) = g_0(x(t)) g_u(x(t))$, in which
$$g_0(x(t)) = \begin{bmatrix} 1 & 2 & 0 \\ -2 & 0 & 1 \end{bmatrix}$$
is the known, bounded matrix with full row rank, and [...] is the unknown matrix. It is seen that $g(x(t)) \in \mathbb{R}^{2 \times 3}$ is a nonsquare and partially unknown matrix. Let $d(t) = [\sin(t), \sin(2t) + 1]^T$. Therefore, this is a nonsquare multivariable system satisfying all the conditions imposed in Section III. The initial condition [...] 4, 4], and widths $\eta_l = 4$, $l = 1, 2, \ldots, 21$. Furthermore, to satisfy the conditions in Theorem 1, let the parameters $\delta_{sa} = \delta_{sc} = 2$. The Nussbaum-type function is $N(\chi(t)) = \chi(t)^2 \cos(\chi(t))$, the initial states of $\chi(t)$ and $\hat{W}_{sa}(t)$ are zero, the initial state $\hat{W}_{sc}(0) = I_{21 \times 2}$, and the parameters $\Gamma_{sa} = \Gamma_{sc} = 0.3 I_{21}$. The simulation results are given in Figs. 1-4, which show that the objective (7) can be achieved. Moreover, the signals of the closed-loop system are all bounded.
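The choice $N(\chi) = \chi^2 \cos(\chi)$ can be examined with a short sketch that evaluates the running average $\frac{1}{s} \int_0^s N(\chi)\, d\chi$, whose unbounded oscillation in both directions is the defining Nussbaum property. The closed-form antiderivative $\chi^2 \sin\chi + 2\chi \cos\chi - 2\sin\chi$ follows from integration by parts; the function names and sample points are illustrative choices, and the midpoint rule is included only to cross-check the closed form.

```python
import math

def nussbaum(chi):
    """Nussbaum-type function N(chi) = chi^2 * cos(chi) from the example."""
    return chi * chi * math.cos(chi)

def integral_closed_form(s):
    """Integral of N from 0 to s, obtained by integration by parts."""
    return s * s * math.sin(s) + 2.0 * s * math.cos(s) - 2.0 * math.sin(s)

def integral_midpoint(s, n=20000):
    """Midpoint-rule approximation used to cross-check the closed form."""
    h = s / n
    return sum(nussbaum((j + 0.5) * h) for j in range(n)) * h

def running_average(s):
    """Running average (1/s) * integral of N over [0, s]."""
    return integral_closed_form(s) / s

# Sample where sin(s) = +1 (peaks) and sin(s) = -1 (troughs).
peaks = [running_average((2 * m + 0.5) * math.pi) for m in range(1, 6)]
troughs = [running_average((2 * m + 1.5) * math.pi) for m in range(1, 6)]
print(peaks)
print(troughs)
```

The sampled peaks grow and the sampled troughs fall roughly linearly in $s$, so the average sweeps through arbitrarily large positive and negative values. This is the behavior that lets the controller cope with a gain of unknown sign.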

V. CONCLUSION
In this paper, we have developed an IRL controller for a class of high-order multivariable nonlinear systems with UCCs. A new long-term performance index is first proposed to evaluate the control performance, and a critic NN is designed to estimate it, since the long-term performance index contains unknown future states. Then, the action NN is designed to approximate the unknown drift of the system. By combining the critic and action NNs with Nussbaum-type functions, the IRL controllers for high-order, nonsquare multivariable systems can solve the problem of UCCs. It is proven rigorously that the signals of the closed-loop system are semiglobally UUB. Finally, one example has been employed to illustrate the effectiveness of the proposed IRL controllers. Future work may focus on input saturation, sampled-data control and time delays for dynamics subject to UCCs.