Integral Reinforcement Learning for Tracking in a Class of Partially Unknown Linear Systems with Output Constraints and External Disturbances

In this paper, the H∞ tracking control problem for partially unknown linear systems with output constraints and external disturbances is studied via reinforcement learning (RL). First, an augmented system is constructed from the reference trajectory dynamics and the target system dynamics, and a discounted cost function is defined to achieve asymptotic tracking. In addition, a barrier function (BF) is used to transform the augmented system, so that satisfaction of the output constraints is achieved by minimizing the quadratic cost of the transformed system. Using only measured data and part of the system dynamics, the optimal control policy and the worst-case disturbance policy are obtained by integral reinforcement learning (IRL). A rigorous stability analysis shows that the proposed method makes the state trajectories of the transformed system converge, and that the resulting control policy renders the tracking error asymptotically stable. Finally, a simulation example verifies the effectiveness of the proposed algorithm.


I. INTRODUCTION
Due to physical limitations in practical applications, output constraints exist widely in controlled systems, such as the rotation angle of a robot arm [1], [2] and the driving speed of an autonomous vehicle [3]-[5]. When designing controllers for such systems, output constraints can be a great hindrance. On the other hand, unknown system dynamics and external disturbances must also be considered in the controller design. Modern control methods such as H∞ control and integral reinforcement learning (IRL) have received considerable attention for handling unknown system dynamics and external disturbances [6]-[10]. However, these methods cannot enforce output constraints while solving the above problems, so designing controllers for partially unknown linear systems with output constraints and external disturbances remains a challenging problem. In this paper, a new adaptive control method is proposed to solve the H∞ tracking control problem of partially unknown linear systems while satisfying the output constraints.
The optimal control problem usually depends on solving a complex Hamilton-Jacobi-Bellman (HJB) equation, which is very difficult to solve with traditional mathematical tools. Over the past few decades, reinforcement learning (RL) [11]-[14], also known as adaptive dynamic programming (ADP) or approximate dynamic programming, has been developed for this purpose. The advantage of ADP is that a neural network (NN) can approximate the optimal cost function in the optimal regulation problem, so it is widely used to solve optimal control problems [15]-[17]. The concept of adaptive dynamic programming was first proposed by Werbos in 1977 [18]. Murray et al. developed an ADP algorithm for optimal control of continuous-time affine nonlinear systems [19] and gave a complete proof of its main theorem in [20]. Lewis et al. [21] proposed a synchronous policy iteration algorithm based on an actor-critic network to solve the optimal control problem for nonlinear systems with known dynamics, together with a proof of convergence. These methods require completely known system dynamics and do not take external disturbances into account. On the basis of [21], an online adaptive control algorithm based on policy iteration (PI) was proposed in [22], [23] to solve the continuous-time two-player zero-sum game with infinite-horizon cost for nonlinear systems with external disturbances. In [24], an off-policy reinforcement learning method was used to solve the H∞ tracking control problem for completely unknown continuous-time systems. An integral reinforcement learning method based on value iteration (VI) was proposed to design H∞ controllers for continuous-time nonlinear systems [25].
An online model-free integral reinforcement learning algorithm based on neural networks was proposed to solve the finite-horizon H∞ optimal tracking control problem for completely unknown nonlinear continuous-time systems, in which both the disturbance and the constrained control input [26] were considered [27]. Adaptive output feedback neural tracking control for a class of uncertain switched multiple-input multiple-output nonlinear systems in non-strict feedback form with delays was studied in [28]. However, in the presence of output constraints, the existing methods for handling external disturbances and unknown system dynamics often fail to achieve the desired results.
To handle the output constraint problem, Tee et al. [29] proposed a barrier Lyapunov function (BLF) by combining Lyapunov analysis with a barrier function. Building on this result, Ren et al. [30] used the boundedness of the BLF for adaptive neural control of a class of output feedback nonlinear systems with unknown dynamics. In [31], the output-constrained adaptive control problem for nonlinear stochastic systems was considered, and the influence of output constraints on control performance was overcome. In [32], the barrier Lyapunov function design was extended to pure feedback systems with full-state constraints. For a class of nonlinear state-constrained time-varying delay systems with unknown control coefficients, Li et al. [33] proposed an adaptive tracking control method. The adaptive control problem for a class of stochastic nonlinear systems with unknown control gain and full-state constraints was studied in [34]. Yang et al. [35], [36] solved zero-sum and non-zero-sum game problems based on the barrier function, transforming the penalty for violating state constraints into a change of the system state.
In this paper, a novel integral reinforcement learning method is proposed to solve the H∞ tracking control problem of partially unknown continuous-time linear systems with output constraints and external disturbances. The main contributions of this paper are as follows:
• An H∞ tracking controller satisfying the output constraints is designed under unknown drift dynamics and external disturbances. Through the barrier function transformation, stability of the transformed system implies satisfaction of the output constraints of the original system.
• A new integral reinforcement learning method is designed to obtain the solution of the H∞ tracking control problem online. The proposed algorithm uses only measured data and part of the system information, so the system dynamics can be partially unknown.
• It is proved that the proposed method makes the original system satisfy the output constraints whenever the transformed system is stable, and that the resulting control policy renders the tracking error asymptotically stable.
The rest of this paper is organized as follows. The linear tracking control problem with state constraints is formulated in Section II. In Section III, the barrier transformation and the traditional policy iteration algorithm are considered. In Section IV, an integral RL method is proposed to obtain the optimal solution. In Section V, a numerical example is presented to show the effectiveness of the proposed method. Finally, conclusions are given.

II. LINEAR TRACKING CONTROL PROBLEM WITH STATE CONSTRAINTS
Consider the following linear continuous-time system
ẋ = f x + g u + k d, y = C x, (1)
where x ∈ R^n is the system state, u ∈ R^m ⊂ U is the control input, d ∈ R^m is the external disturbance, f ∈ R^{n×n} gives the drift dynamics of the system, g ∈ R^{n×m} and k ∈ R^{n×m} are the input and disturbance matrices, C ∈ R^{p×n} is the output matrix, and y ∈ R^p is the system output. U denotes the set of all admissible inputs. Every element of C is assumed to be nonnegative, and the system (1) is assumed to be stabilizable.
Assumption 1: The linear continuous-time system satisfies the state constraints
x_i ∈ (a_i, A_i), i = 1, · · · , n, (2)
where a_i < 0 and A_i > 0 are the lower and upper boundaries of the system states, a_x = [a_1; · · · ; a_n] and A_x = [A_1; · · · ; A_n].
Based on Assumption 1, we define the output constraint vectors as a_y = C a_x = [a_y1; · · · ; a_yp] and A_y = C A_x = [A_y1; · · · ; A_yp]. The output constraints can then be expressed as
y_j ∈ (a_yj, A_yj), j = 1, · · · , p. (3)
Assumption 2: The reference output trajectory is generated by ẏ_d = F y_d; it does not approach zero as time goes to infinity (e.g., unit step or sinusoidal waveforms), and y_d satisfies the output constraints (3).
In order to realize tracking control, we first establish an augmented system from the system (1) and the reference output trajectory y_d. The augmented system state is defined as
ζ = [x; y_d]. (4)
Based on equation (1) and the reference dynamics ẏ_d = F y_d, we can define
ζ̇ = T ζ + G u + K d, T = [f 0; 0 F], G = [g; 0], K = [k; 0]. (5)
Based on the state constraints (2) and the output constraints (3), the state constraints of the augmented system (5) can be defined as
a_ζ = [a_1; · · · ; a_n; a_y1; · · · ; a_yp] = [a_ζ1; · · · ; a_ζq], (6)
A_ζ = [A_1; · · · ; A_n; A_y1; · · · ; A_yp] = [A_ζ1; · · · ; A_ζq], (7)
where q = n + p. Note that the desired reference output trajectory y_d does not converge to zero as time goes to infinity. When the desired reference trajectory is unstable and does not converge to zero, feedback control will make an undiscounted infinite-horizon cost function approach infinity [37].
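As an illustration of the augmented construction above, the sketch below assembles the block matrices of the augmented dynamics from the original system matrices and the reference generator. All numerical values are invented for the example and are not taken from the paper:

```python
import numpy as np

def augment(f, g, k, F):
    """Build the augmented dynamics (5) for zeta = [x; y_d]:
    zeta' = T zeta + G u + K d, with the reference generated by y_d' = F y_d."""
    n, p, m, w = f.shape[0], F.shape[0], g.shape[1], k.shape[1]
    T = np.block([[f, np.zeros((n, p))],
                  [np.zeros((p, n)), F]])   # reference evolves autonomously
    G = np.vstack([g, np.zeros((p, m))])    # control acts on x only
    K = np.vstack([k, np.zeros((p, w))])    # disturbance acts on x only
    return T, G, K

# Hypothetical example data (illustrative, not from the paper)
f = np.array([[0.0, 1.0], [-1.0, -2.0]])
g = np.array([[0.0], [1.0]])
k = np.array([[0.0], [0.5]])
F = np.zeros((1, 1))                        # constant reference: y_d' = 0

T, G, K = augment(f, g, k, F)
```

With a p-dimensional reference, the augmented state has dimension q = n + p, which is why the constraint vectors (6)-(7) stack the state bounds and the output bounds.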
According to Bellman's principle of optimality, the cost function must be finite before an optimal feedback control policy can be found to minimize it.
To relax the requirement that the reference output trajectory converge to zero, a discounted cost function is introduced as follows:
J(ζ(t)) = ∫_t^∞ e^{−β(τ−t)} ( r(ζ(τ), u(τ)) − γ² d^T(τ) d(τ) ) dτ, (8)
where r(ζ, u) = ζ^T C_1^T Q C_1 ζ + u^T R u, Q > 0 and R > 0 are symmetric matrices, C_1 = [C −I], β > 0 is the discount factor, and γ > 0 represents a bound on the L2 gain required from the disturbance d to the cost, that is,
∫_t^∞ e^{−β(τ−t)} r(ζ, u) dτ ≤ γ² ∫_t^∞ e^{−β(τ−t)} d^T d dτ. (9)
Based on Assumptions 1-2, the goal of the H∞ tracking control problem with output constraints is to find an optimal control policy u* such that the system (5) has L2 gain less than or equal to γ, the output satisfies the constraints (3), and the tracking error is asymptotically stable. It can be described mathematically as
u* = arg min_{u∈U} J(ζ(t)), s.t. y_j ∈ (a_yj, A_yj), j = 1, · · · , p, and y → y_d as t increases. (10)
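To see why the discount is needed, consider a small sketch of our own (all numbers invented): for a marginally stable flow ẋ = Ax whose state never decays, the undiscounted quadratic cost diverges, while the discounted cost equals x₀ᵀPx₀, where P solves the β-shifted Lyapunov equation (A − (β/2)I)ᵀP + P(A − (β/2)I) = −Q:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])    # pure rotation: |x(t)| stays at 1
Q = np.eye(2)
beta = 0.5                                 # discount factor

# Solve (A - beta/2 I)' P + P (A - beta/2 I) = -Q via Kronecker vectorization
As = A - 0.5 * beta * np.eye(2)
M = np.kron(As.T, np.eye(2)) + np.kron(np.eye(2), As.T)
P = np.linalg.solve(M, -Q.flatten()).reshape(2, 2)

x0 = np.array([1.0, 0.0])
J_closed = x0 @ P @ x0                     # discounted cost in closed form

# Cross-check by quadrature along the explicit solution x(t) = (cos t, -sin t)
dt, J_num, t = 1e-3, 0.0, 0.0
while t < 60.0:
    x = np.array([np.cos(t), -np.sin(t)])
    J_num += np.exp(-beta * t) * (x @ Q @ x) * dt
    t += dt
```

Here xᵀQx = 1 along the whole trajectory, so the discounted cost is exactly 1/β = 2; it is finite only because β > 0, which is precisely the role the discount plays for non-decaying references.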
Unlike previous studies, the output constraints (3) make it very difficult to solve for the optimal control policy, because the proposed discounted cost function is affected only by the state of the system (1) and the output reference trajectory, and does not by itself penalize constraint violations. In the next section, we propose a barrier transformation approach to satisfy the output constraints in (3).
Remark 1: Each element of the output matrix C is predefined so that the output constraints can be derived from the state constraints. At the same time, the constraints (3) can be satisfied by constraining the system states.

III. PROBLEM TRANSFORMATION AND TRADITIONAL POLICY ITERATION ALGORITHM
In this section, the barrier function is used to transform the system (5) with output constraints into a transformed system without output constraints; that is, the H∞ tracking control problem with output constraints is transformed into an H∞ tracking control problem without output constraints. Before moving on, the following definition of the barrier function is introduced.
A. SYSTEM TRANSFORMATION BASED ON THE BARRIER FUNCTION
Definition 1: The barrier function defined on the interval (a, A) is
b(ζ) = log( A(a − ζ) / (a(A − ζ)) ), ζ ∈ (a, A), (11)
where a and A are two constants satisfying a < 0 < A. Moreover, the inverse of the barrier function is
b^{−1}(s) = aA(e^s − 1) / (a e^s − A), (12)
with the derivative
d b^{−1}(s)/ds = aA(a − A) e^s / (a e^s − A)². (13)
Remark 2: To satisfy the output constraints, the barrier function in Definition 1 has the following characteristics: 1) the barrier function b(·) takes finite values inside the constraint interval (a, A); 2) as the state approaches a boundary of (a, A), b(·) approaches infinity, i.e., lim_{ζ→a⁺} b(ζ) = −∞ and lim_{ζ→A⁻} b(ζ) = +∞; 3) the barrier function b(·) converges as the state converges.
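The properties listed in Remark 2 can be checked numerically. The sketch below uses a common log-ratio barrier, b(ζ) = log(A(a − ζ)/(a(A − ζ))), together with its inverse; with the constraint interval (−0.5, 0.8) and the reference value 0.3 quoted in the simulation section, this choice reproduces the transformed reference value s₃ ≈ 0.940 reported there, which suggests it matches the paper's b(·):

```python
import numpy as np

def b(z, a, A):
    """Barrier transform of z in (a, A), with a < 0 < A; note b(0) = 0."""
    return np.log((A * (a - z)) / (a * (A - z)))

def b_inv(s, a, A):
    """Inverse of the barrier transform: maps all of R back into (a, A)."""
    e = np.exp(s)
    return a * A * (e - 1.0) / (a * e - A)

a, A = -0.5, 0.8          # output constraint interval from Section V
s3 = b(0.3, a, A)         # transformed reference for y_d(0) = 0.3

# Boundary behaviour: b blows up near A; b_inv always lands inside (a, A)
near_A = b(A - 1e-9, a, A)
back = b_inv(10.0, a, A)
```

The round trip b⁻¹(b(z)) = z holds on the whole interval, so stabilizing the transformed state is equivalent to keeping the original state strictly inside (a, A).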
The augmented states are transformed by the barrier function as
s_i = b_i(ζ_i), i = 1, · · · , q, (14)
where b_i(·) denotes the barrier function on (a_ζi, A_ζi). The transformed system dynamics then take the form
ṡ = T̄(s) + Ḡ(s)u + K̄(s)d, (18)
where T̄(s), Ḡ(s), and K̄(s) follow from (14) and the chain rule applied to (5). The discounted cost function of the transformed system is defined as
J(s(t)) = ∫_t^∞ e^{−β(τ−t)} ( s^T C_1^T Q C_1 s + u^T R u − γ² d^T d ) dτ. (19)
Based on the transformations (14)-(16), the H∞ tracking control problem with output constraints has been transformed into an H∞ tracking control problem without output constraints. In other words, the goal becomes to find an optimal control law u* such that the system (18) has L2 gain less than or equal to γ, i.e.,
∫_t^∞ e^{−β(τ−t)} ( s^T C_1^T Q C_1 s + u^T R u ) dτ ≤ γ² ∫_t^∞ e^{−β(τ−t)} d^T d dτ. (20)
Remark 3: Because the barrier function approaches infinity at the constraint boundaries, the reference output trajectory y_d must strictly satisfy the output constraints (3); otherwise the transformed system states will tend to infinity while tracking the reference trajectory.

B. POLICY ITERATION ALGORITHM BASED ON SYSTEM DYNAMICS
Define the Hamiltonian for the discounted cost function (19) as
H(s, u, d) = s^T C_1^T Q C_1 s + u^T R u − γ² d^T d − β J(s) + J_s^T(s)( T̄(s) + Ḡ(s)u + K̄(s)d ), (21)
where J_s(s) is the partial derivative of J(s) with respect to s.
The HJB equation associated with the Hamiltonian (21) is
min_u max_d H(s, u, d) = 0. (22)
Then, from the stationarity conditions ∂H/∂u = 0 and ∂H/∂d = 0, we can get
u* = −(1/2) R^{−1} Ḡ^T(s) J*_s(s), (24)
d* = (1/(2γ²)) K̄^T(s) J*_s(s), (25)
where J*(s) is the optimal cost function solving (22).
Lemma 1: Under Assumptions 1, 2 and 3, if the optimal control policy (24) and the worst-case disturbance (25) solve the H∞ tracking control problem of the transformed system (18), then:
(1) A suitable choice of the discount factor β ensures that the tracking error is asymptotically stable.
(2) The states of the system (1) satisfy the constraints (2) provided that the initial state x_0 of the system (1) satisfies the constraints (2).
(3) The L2 gain condition (20) can be guaranteed if the performance output is designed as s^T C_1^T Q C_1 s + u^T R u.
Proof: (1) Differentiating the cost function (19) along the trajectories of the transformed system, we can get (26). In order to make the tracking error asymptotically stable, we require the discount factor β to satisfy an upper bound obtained from (26); then (27) shows that the derivative of the cost along the tracking error is negative. Therefore, the tracking error is locally asymptotically stable.
(2) As long as the initial state x_0 satisfies the constraints (2) and the reference output satisfies (3), the initial cost J(s(0)) is finite, and hence the cost J(s(t)) remains finite for all t. Therefore, according to Remark 2, the transformed states remain finite, which implies that each augmented state ζ_i stays inside (a_ζi, A_ζi). Therefore, the constraints (2) can be satisfied.
(3) Considering the system transformation (14) and the constraints (6), (7), each element of the transformed system state s = [b_1(ζ_1); · · · ; b_q(ζ_q)] is finite. Note that the optimal control input, the worst-case disturbance, and the optimal cost function satisfy the HJB equation (22). Then, as long as the performance output is designed as s^T C_1^T Q C_1 s + u^T R u, the L2 gain condition (20) holds. This completes the proof.
Algorithm 1 Policy iteration based on system dynamics.
Initialization: Start with an admissible control policy u_0.
Procedure:
1. Given u_i and d_i, solve for the cost function J_i(s) from the Bellman equation H(s, u_i, d_i) = 0 with J = J_i.
2. Update the disturbance using d_{i+1} = (1/(2γ²)) K̄^T(s) J_{i,s}(s).
3. Update the control policy using u_{i+1} = −(1/2) R^{−1} Ḡ^T(s) J_{i,s}(s).
End Procedure
In the traditional model-based policy iteration Algorithm 1, all of the system dynamics, i.e., the matrices T̄(s), Ḡ(s), and K̄(s), are required. In practice, uncertainty in the system dynamics renders the traditional policy iteration method ineffective. To relax these strict requirements on system information, integral RL is applied to the tracking control design, so that the tracking control policy can be obtained when part of the system dynamics is unknown.
Remark 4: The state of the transformed system (18) is defined by the system (1) and the barrier function. Since the barrier function is already defined by equation (11), a partially unknown linear system (1) means that part of the transformed system is unknown.
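Algorithm 1 can be sketched for the special case in which the transformed dynamics happen to be linear, ṡ = T̄s + Ḡu + K̄d, so that the cost is quadratic, J_i(s) = sᵀP_i s, and each policy-evaluation step reduces to a β-shifted Lyapunov equation. All matrices below are invented for illustration; under these assumptions the fixed point of the iteration satisfies the discounted game algebraic Riccati equation:

```python
import numpy as np

def lyap(Ac, Qc):
    """Solve Ac' P + P Ac = -Qc by Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(M, -Qc.flatten()).reshape(n, n)
    return 0.5 * (P + P.T)                     # symmetrize

# Hypothetical linear transformed dynamics (illustrative values only)
T_ = np.array([[0.0, 1.0], [-1.0, -2.0]])      # T_ is Hurwitz, so u_0 = 0 is admissible
G_ = np.array([[0.0], [1.0]])
K_ = np.array([[0.0], [0.5]])
Q, R = np.eye(2), np.eye(1)
gamma, beta = 2.0, 0.1
I2 = np.eye(2)

# Step 0: evaluate the initial admissible policies u_0 = 0, d_0 = 0
P = lyap(T_ - 0.5 * beta * I2, Q)
for _ in range(60):
    Lu = np.linalg.solve(R, G_.T @ P)          # control update:     u = -Lu s
    Ld = (1.0 / gamma**2) * (K_.T @ P)         # disturbance update: d = +Ld s
    Ac = T_ - G_ @ Lu + K_ @ Ld - 0.5 * beta * I2
    Qc = Q + Lu.T @ R @ Lu - gamma**2 * Ld.T @ Ld
    P = lyap(Ac, Qc)                           # policy evaluation

# Residual of the discounted game algebraic Riccati equation at the fixed point
res = (T_.T @ P + P @ T_ - beta * P + Q
       - P @ G_ @ np.linalg.solve(R, G_.T) @ P
       + (1.0 / gamma**2) * P @ K_ @ K_.T @ P)
```

Note that every iteration reads T̄, Ḡ, and K̄, which is exactly the full-model requirement that the integral RL method of the next section removes.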

IV. INTEGRAL RL FOR TRANSFORMED SYSTEM AND STABILITY ANALYSIS
Based on the transformed system dynamics and the traditional model-based policy iteration algorithm, an integral RL tracking control algorithm is designed for systems with partially unknown dynamics, and the tracking error is guaranteed to be asymptotically stable under the output constraints.

A. INTEGRAL RL FOR PARTIALLY UNKNOWN DYNAMICS
Based on optimal control theory, the discounted cost function (19) can be written as a positive definite quadratic function,
J(s) = s^T P s, (34)
where P is a positive definite matrix. For a time interval △t > 0, the cost function (19) satisfies
J(s(t)) = ∫_t^{t+△t} e^{−β(τ−t)} ( s^T C_1^T Q C_1 s + u^T R u − γ² d^T d ) dτ + e^{−β△t} J(s(t + △t)). (35)
Substituting equation (34) into (35), we can get
s^T(t) P s(t) = ∫_t^{t+△t} e^{−β(τ−t)} ( s^T C_1^T Q C_1 s + u^T R u − γ² d^T d ) dτ + e^{−β△t} s^T(t + △t) P s(t + △t). (36)
Based on equations (34) and (35), we use the integral reinforcement learning method to solve the H∞ tracking control problem with output constraints.
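The data-based policy evaluation implied by (34)-(35) can be sketched as a least-squares problem: along measured trajectory windows, the unknown entries of P appear linearly in the integral Bellman relation, so no knowledge of the drift is needed. In the sketch below all dynamics are invented, the policy is held fixed and absorbed into a closed-loop matrix A (visible only to the simulator), and the identified P is compared against the model-based shifted-Lyapunov solution:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # closed-loop drift, unknown to the learner
Q = np.eye(2)                               # running cost weight: r = s'Qs
beta = 0.1                                  # discount factor
dt, sub = 0.05, 50                          # IRL window and fine substeps
h = dt / sub

def rk4_step(x):
    """One fine integration step of s' = A s (plays the role of the plant)."""
    k1 = A @ x
    k2 = A @ (x + 0.5 * h * k1)
    k3 = A @ (x + 0.5 * h * k2)
    k4 = A @ (x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def phi(x):
    """Quadratic basis: phi(x) @ p == x' P x for p = [P11, P12, P22]."""
    return np.array([x[0] ** 2, 2 * x[0] * x[1], x[1] ** 2])

rows, rhs = [], []
rng = np.random.default_rng(0)
for _ in range(40):                         # 40 measured windows
    x = rng.uniform(-1.0, 1.0, size=2)
    x0 = x.copy()
    reward, f_prev = 0.0, x @ Q @ x
    for j in range(sub):                    # trapezoidal discounted reward
        x = rk4_step(x)
        f_new = np.exp(-beta * (j + 1) * h) * (x @ Q @ x)
        reward += 0.5 * h * (f_prev + f_new)
        f_prev = f_new
    rows.append(phi(x0) - np.exp(-beta * dt) * phi(x))   # LHS of (34)-(35)
    rhs.append(reward)

p = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
P_irl = np.array([[p[0], p[1]], [p[1], p[2]]])

# Model-based reference: (A - beta/2 I)' P + P (A - beta/2 I) = -Q
As = A - 0.5 * beta * np.eye(2)
M = np.kron(As.T, np.eye(2)) + np.kron(np.eye(2), As.T)
P_ref = np.linalg.solve(M, -Q.flatten()).reshape(2, 2)
```

The learner touches only trajectory samples and accumulated rewards; A appears solely inside the simulator, which is the sense in which the drift dynamics may remain unknown.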
Algorithm 2 Integral RL based policy iteration for the tracking problem with output constraints.
Initialization: Start with an admissible control policy u_0.
Procedure:
1. Given u_i and d_i, solve for P_{i+1} from the integral Bellman equation
s^T(t) P_{i+1} s(t) = ∫_t^{t+△t} e^{−β(τ−t)} ( s^T C_1^T Q C_1 s + u_i^T R u_i − γ² d_i^T d_i ) dτ + e^{−β△t} s^T(t + △t) P_{i+1} s(t + △t). (37)
2. Update the disturbance using
d_{i+1} = (1/γ²) K̄^T(s) P_{i+1} s. (38)
3. Update the control policy using
u_{i+1} = −R^{−1} Ḡ^T(s) P_{i+1} s. (39)
If ∥P_{i+1} − P_i∥ ≤ ϵ, where ϵ is a preselected small positive number, the learning is finished and the iteration stops; else set i = i + 1 and go to Step 1.
End Procedure
Theorem 1: Consider the transformed system (18), the Hamiltonian (21), the control input (39), and the disturbance input (38), and assume that Assumptions 1, 2 and 3 hold. The iterative control policy (39) obtained from (21) minimizes the right-hand side of (37), and the iterative disturbance policy (38) obtained from (22) maximizes the right-hand side of (37).
Proof: If the time interval △t is small enough, the higher-order infinitesimal term o(△t) can be ignored, and we can get
e^{−β△t} J_i(s(t + △t)) = J_i(s(t)) + ( J_{i,s}^T(s(t)) ( T̄(s) + Ḡ(s)u* + K̄(s)d* ) − β J_i(s(t)) ) △t. (42)
The iterative control policy u_i satisfies (44). Since the iterative cost function J_i(s) is not affected by the choice of control policy, we can get (45). Based on (34), (42), and (43), it yields (46). In the same way, we can get (47). The proof is completed.
Based on the integral RL method, an integral RL controller is proposed for the H∞ tracking control system with output constraints. In Algorithm 2, only part of the system information is used in the iterative process. According to Lemma 1 and Theorem 1, the integral RL control algorithm proposed in Algorithm 2 makes the tracking error locally asymptotically stable under the condition that the trajectories of the system converge.
Remark 5: Compared with existing standard solutions for optimal tracking control, the proposed method provides some advantages for partially unknown linear systems, which are reflected in the following aspects.
(1) Existing policy iteration algorithms repeatedly use and transform all of the system information in the solving process. The proposed algorithm uses only measured data and part of the system information, so the system can be partially unknown.
(2) By combining the tracking control problem with the barrier function, the transformed system and the tracking error are locally asymptotically stable under the output constraints.
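Putting the pieces together, Algorithm 2 can be sketched end-to-end for an illustrative linear case: each iteration simulates the current policies on a plant whose drift is hidden from the learner, identifies P_{i+1} by least squares from the integral Bellman equation, and updates the policies per (38)-(39) using only the known input matrices. All numerical values are invented; the window △t and the stopping rule mirror the algorithm text:

```python
import numpy as np

# Illustrative linear transformed dynamics; the learner never reads T_
T_ = np.array([[0.0, 1.0], [-1.0, -2.0]])
G_ = np.array([[0.0], [1.0]])
K_ = np.array([[0.0], [0.5]])
Q, R = np.eye(2), np.eye(1)
gamma, beta = 2.0, 0.1
dt, sub = 0.05, 80
h = dt / sub

def phi(x):
    """Quadratic basis so that phi(x) @ p == x' P x, p = [P11, P12, P22]."""
    return np.array([x[0] ** 2, 2.0 * x[0] * x[1], x[1] ** 2])

def evaluate(Lu, Ld, n_windows=60, seed=0):
    """Identify P for the fixed policies u = -Lu s, d = Ld s from data only."""
    Ac = T_ - G_ @ Lu + K_ @ Ld                # used only inside the simulator
    Qc = Q + Lu.T @ R @ Lu - gamma**2 * Ld.T @ Ld
    rng = np.random.default_rng(seed)
    rows, rhs = [], []
    for _ in range(n_windows):
        x = rng.uniform(-1.0, 1.0, size=2)
        x0, reward, f_prev = x.copy(), 0.0, x @ Qc @ x
        for j in range(sub):                   # RK4 plant step + trapezoid reward
            k1 = Ac @ x
            k2 = Ac @ (x + 0.5 * h * k1)
            k3 = Ac @ (x + 0.5 * h * k2)
            k4 = Ac @ (x + h * k3)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
            f_new = np.exp(-beta * (j + 1) * h) * (x @ Qc @ x)
            reward += 0.5 * h * (f_prev + f_new)
            f_prev = f_new
        rows.append(phi(x0) - np.exp(-beta * dt) * phi(x))
        rhs.append(reward)
    p = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    return np.array([[p[0], p[1]], [p[1], p[2]]])

# Algorithm 2: start from the admissible policies u_0 = 0, d_0 = 0
Lu, Ld = np.zeros((1, 2)), np.zeros((1, 2))
P_prev = np.zeros((2, 2))
for i in range(60):
    P = evaluate(Lu, Ld, seed=i)
    Lu = np.linalg.solve(R, G_.T @ P)          # (39): u_{i+1} = -R^{-1} G' P s
    Ld = (1.0 / gamma**2) * (K_.T @ P)         # (38): d_{i+1} = gamma^{-2} K' P s
    if np.abs(P - P_prev).max() <= 1e-9:       # stopping rule with epsilon
        break
    P_prev = P

# At convergence, P approximately solves the discounted game Riccati equation
res = (T_.T @ P + P @ T_ - beta * P + Q
       - P @ G_ @ np.linalg.solve(R, G_.T) @ P
       + (1.0 / gamma**2) * P @ K_ @ K_.T @ P)
```

In this sketch only Ḡ and K̄ enter the policy updates, matching the "partially unknown" setting: the drift is exercised by the data, never by the learner's equations.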

V. SIMULATION RESULTS
In this section, a linear example is presented to verify the validity of the proposed algorithm.
Consider a linear system of the form (1). Assume that the desired output trajectory is generated by the command generator system ẏ_d = 0 with the initial value y_d(0) = 0.3. Define (19) as the performance function. One selects Q = 3, R = 1, and the L2 gain bound γ = 1.5. The output constraint is defined as y ∈ (−0.5, 0.8).
Assuming that the system drift dynamics in Algorithm 2 are unknown, data collected from the transformed system every 0.05 seconds are used for the simulation. At the same time, a comparison is made with the method in [14], which shows that the proposed method is effective for the output-constrained tracking control problem. Figure 1 shows the trajectory of the system output following the reference output under output constraints. Figure 2 depicts the tracking trajectory without output constraints based on the integral reinforcement learning algorithm. Comparing the two figures, it can be clearly seen that the proposed method completes the tracking while guaranteeing the output constraints. Figure 3 shows the tracking trajectory of the transformed system, where the transformed reference output s_3 = 0.940 is obtained from equation (14); this also verifies the second part of Theorem 1. Figure 4 shows the evolution of the entries of the matrix P during the iteration. The trajectories of the control and disturbance policies are shown in Figure 5. Figure 6 shows the tracking error, which clearly converges to zero.

VI. CONCLUSION
In this paper, we studied the H∞ tracking control problem for partially unknown linear systems with output constraints and disturbances. Asymptotic tracking and satisfaction of the output constraints were achieved by building an augmented system and applying a suitable barrier transformation. Integral reinforcement learning was used to obtain the optimal control policy and the worst-case disturbance policy online. It was proved that the proposed method minimizes the cost of the transformed system under output constraints and disturbances. A numerical simulation example also demonstrated the effectiveness of the method.