Adaptive Event-Triggered Near-Optimal Tracking Control for Unknown Continuous-Time Nonlinear Systems

This paper studies the event-triggered optimal tracking control (ETOTC) problem of continuous-time (CT) unknown nonlinear systems. In order to solve the ETOTC problem, an augmented system composed of the error system dynamics and the reference dynamics is used to introduce a new discounted performance index function (DPIF). A novel event-triggered (ET) adaptive dynamic programming (ADP) method is developed to solve the ET Hamilton-Jacobi-Bellman equation (HJBE). The presented method is implemented via an identifier-critic architecture, which consists of two neural networks (NNs): an identifier NN is applied to estimate the unknown system dynamics, and a critic NN is constructed to obtain the approximate solution of the ET HJBE. The augmented closed-loop system and the critic estimation error are proved to be uniformly ultimately bounded (UUB) by the Lyapunov direct method. Finally, two simulations illustrate the effectiveness of the developed method.


I. INTRODUCTION
Adaptive dynamic programming (ADP), a branch of reinforcement learning, has attracted wide attention for solving optimal control problems in recent years [1]. Built on neural network (NN) technology, reinforcement learning, and dynamic programming (DP), ADP is an effective method for solving the Hamilton-Jacobi-Bellman equation (HJBE) of complicated nonlinear systems, and it can also overcome the "curse of dimensionality" of traditional DP [2]. Its essence is to use online or offline data to estimate the system performance index function (PIF) and obtain the approximate optimal control law according to Bellman's principle of optimality [3].
In the past decade, researchers have proposed different ADP-based control methods to solve the problems of optimal control, trajectory tracking and robust stabilization in discrete-time (DT) or continuous-time (CT) systems [4]-[12]. As a framework that mimics the human learning and decision-making process, ADP possesses several synonyms, such as approximate dynamic programming [13], adaptive critic designs [14], neurodynamic programming [15], relaxed dynamic programming [16], and reinforcement learning [17]. Based on different structures, ADP techniques can be classified into several categories, such as heuristic dynamic programming (HDP), dual HDP (DHP), globalized DHP (GDHP) and their action-dependent forms [18]-[22]. Based on different iteration methods, there are two major categories of ADP techniques, namely value iteration (VI) and policy iteration (PI) algorithms [23]-[29].
Most existing control methods are based on periodic sampling or time-triggered execution. However, from the perspective of resource allocation, the traditional methods have some disadvantages. For example, when the system works in a stable state, periodic sampling causes an unnecessary waste of resources. For networked control systems, periodic sampling can increase the computational cost as well as the communication burden [30]-[33]. In order to reduce the burden of computing and communication, researchers have proposed event-triggered control (ETC), in which the system signal sampling and the controller operation are driven by a specific event instead of the time instant. There are many types of events, such as a system variable exceeding a limit value or a packet arriving at a node in networked control. Overall, the sampling period of the ETC system is time-varying, and the signals are sampled and transmitted according to the needs of the system [34]. Therefore, ETC is an effective control method for reducing computational costs and has a wide range of applications in areas such as network system control and complex system tracking [35]-[37]. ETC research mainly discusses the introduction of the event-triggered (ET) mechanism, the stability analysis and the design of the ET condition, etc. Eqtami et al. [38] proposed an ETC method for DT systems, which uses the input-to-state stability technique to extend the ET conditions of CT systems to DT systems with guaranteed system stability. Wang and Lemmon [39] developed an ET mechanism for CT nonlinear systems and gave the minimum time interval between two adjacent triggered events. Li and Xu [40] designed a trigger condition that can guarantee the asymptotic stability of the CT nonlinear system according to Lyapunov's stability theorem.
Nowadays, more attention is paid to ETC methods based on ADP with the development of ADP theory. Zhong and He [41] presented an ET ADP control algorithm with a state observer based on input and output data. For nonlinear DT systems, Dong et al. [42] designed a new ADP-based ET condition and analyzed the stability of the system. Using a single-NN approximation structure, Wang et al. [43] dealt with the nonlinear optimal regulation problem in the framework of the adaptive critic NN based on the ET mechanism, where the optimal control law can be derived by training the critic NN. A decentralized ETC strategy was designed in [44] for a class of nonlinear systems with uncertain interconnected terms; it is shown there that the decentralized ETC policy for the whole system can be represented by the optimal ETC policies of auxiliary subsystems. Luo et al. [45] studied the problem of event-triggered optimal control (ETOC) for CT systems and provided formal performance guarantees by proving a predetermined upper bound. In [46], a critic NN was used to obtain an approximate solution to the ET HJBE, and the weights of the critic NN were updated using gradient descent and experience replay techniques.
In this paper, a novel ETOTC ADP method is developed for unknown CT affine nonlinear systems. First, an augmented system composed of the error system dynamics and the reference dynamics is used to introduce a new discounted performance index function (DPIF) for ETOTC. Second, a NN-based identifier is designed to obtain the unknown system dynamics. Third, for the identified augmented system, a particular ET adaptive implementation method is developed without an initial stable control law, and the closed-loop identified augmented system and the critic weight estimation error are proved to be UUB. Different from the existing methods, the contributions are summarized as follows.
1) We extend the ADP-based ETC method to address the optimal tracking control problem. It makes the actual trajectory of the unknown system track its desired one.
2) We establish a new ET condition for the error system, which ensures that the augmented closed-loop system and the critic weight estimation error are uniformly ultimately bounded (UUB).
3) We propose a novel weight update law for the critic NN that does not require an initial stable control law.
4) We implement the ET ADP-based control method, which yields the near-optimal control law and substantially reduces the computation cost.
In this paper, Section II formulates the ETOTC problem of CT nonlinear systems without the steady-state control law. Section III develops the NN-based identifier and critic, and proposes the ET learning rule of the critic weights together with the stability analyses. Section IV shows the detailed simulations and analyses. Section V gives some conclusions.
Notations: We use R to represent the set of real numbers. R^+ is the set of nonnegative real numbers. R^m is the Euclidean space of dimension m. R^{n×m} is the set of n × m real matrices. N is the set of nonnegative integers. 1_n is the column vector of n ones. ‖·‖ is the vector norm on the Euclidean space. λ_max(·) and λ_min(·) are the maximum and minimum eigenvalues of a matrix, respectively. Ω_a = {x ∈ R^n : ‖x‖ ≤ a, a ∈ R^+} is a compact subset of R^n. A(Ω_a) is the admissible control set on Ω_a. C^1(Ω_a : R^+) is the set of functions once differentiable with respect to their argument on Ω_a. diag(·) is the diagonal matrix operator.

II. PROBLEM FORMULATION
Let us take the following CT time-invariant nonlinear system into account:

ẋ(t) = f(x(t)) + g(x(t))u(t)   (1)

where x(t) ∈ R^n and u(t) ∈ R^m are the n-dimensional state and the m-dimensional control input. f(x(t)) ∈ R^n and g(x(t)) ∈ R^{n×m} are locally Lipschitz continuous and differentiable on the compact set Ω_a with f(0) = 0. The desired reference trajectory x_d(t) satisfies

ẋ_d(t) = f_d(x_d(t))   (2)

where f_d(x_d(t)) is locally Lipschitz continuous and differentiable on the compact set Ω_a with f_d(0) = 0. Then, the tracking error can be defined as

e_d(t) = x(t) − x_d(t)   (3)

and according to (1), the tracking error system is

ė_d(t) = f(e_d(t) + x_d(t)) − f_d(x_d(t)) + g(e_d(t) + x_d(t))u(t)   (4)

Inspired by [51]-[53], we define the augmented system

ẋ_a(t) = f_a(x_a(t)) + g_a(x_a(t))u(t)   (5)

where x_a(t) = [e_d^T(t), x_d^T(t)]^T is the augmented state, f_a(x_a) = [f(e_d(t) + x_d(t)) − f_d(x_d(t)), f_d(x_d(t))]^T is the augmented internal dynamics, and g_a(x_a) = [g(e_d(t) + x_d(t)), 0]^T is the augmented input dynamics.
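To make the augmentation concrete, the following sketch stacks a scalar error system with a scalar reference generator into the augmented state x_a = [e_d, x_d]^T of (5). The dynamics f, g and f_d below are illustrative stand-ins chosen for this sketch, not the paper's plant.

```python
import numpy as np

def f(x):            # plant internal dynamics, f(0) = 0 (illustrative)
    return -x + 0.25 * np.sin(x)

def g(x):            # plant input dynamics (illustrative)
    return 1.0 + 0.1 * np.cos(x)

def f_d(x_d):        # reference dynamics, f_d(0) = 0 (illustrative)
    return -0.5 * x_d

def f_a(x_a):
    """Augmented drift [f(e_d + x_d) - f_d(x_d), f_d(x_d)]^T of Eq. (5)."""
    e_d, x_d = x_a
    return np.array([f(e_d + x_d) - f_d(x_d), f_d(x_d)])

def g_a(x_a):
    """Augmented input map [g(e_d + x_d), 0]^T: the input acts only on e_d."""
    e_d, x_d = x_a
    return np.array([g(e_d + x_d), 0.0])

def augmented_rhs(x_a, u):
    """Right-hand side of the augmented system x_a_dot = f_a + g_a * u."""
    return f_a(x_a) + g_a(x_a) * u

print(augmented_rhs(np.array([0.3, -0.2]), 0.1))
```

Note how the second component of g_a is zero: the control input cannot act on the reference part of the augmented state, which is exactly why the input map of the augmented system is rank deficient.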
Then, we define the infinite-horizon DPIF of the augmented system as

V(x_a(t)) = ∫_t^∞ e^{−γ(τ−t)} U(x_a(τ), u(τ)) dτ   (6)

with the utility function

U(x_a, u) = x_a^T Q_a x_a + u^T R u   (7)

where Q_a = diag(Q, 0_{n×n}) with Q ∈ R^{n×n} > 0, R ∈ R^{m×m} > 0, and γ > 0 is the discount factor.
Remark 1. For the standard solution to the optimal tracking control problem, the desired reference satisfies

ẋ_d(t) = f(x_d(t)) + g(x_d(t))u_d(t)   (8)

The premise of (8) is that the inverse of the input dynamic function g(x_d(t)) exists; the steady-state control law is then acquired as

u_d(t) = g^{−1}(x_d(t))(ẋ_d(t) − f(x_d(t)))   (9)

However, g^{−1}(x_d(t)) cannot be computed exactly when the input dynamic function g(x_d(t)) is rank deficient. Thus, the standard solution to the optimal tracking control problem is invalid.
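The obstruction in Remark 1 can be illustrated numerically. The sketch below (with illustrative matrices, not from the paper) computes the steady-state law u_d = g^{-1}(x_d)(ẋ_d − f(x_d)) when g(x_d) is invertible, and shows that a rank-deficient g(x_d) admits no exact inverse.

```python
import numpy as np

def steady_state_control(g_xd, xd_dot, f_xd):
    """Steady-state feedforward law of Eq. (9); valid only for invertible g."""
    if g_xd.shape[0] != g_xd.shape[1] or np.linalg.matrix_rank(g_xd) < g_xd.shape[0]:
        raise ValueError("g(x_d) is rank deficient: standard solution invalid")
    return np.linalg.solve(g_xd, xd_dot - f_xd)

# Invertible case: the law is well defined.
g_ok = np.array([[2.0, 0.0], [0.0, 1.0]])
u_d = steady_state_control(g_ok, np.array([1.0, 1.0]), np.array([0.5, 0.0]))
print("u_d =", u_d)

# Rank-deficient case (e.g. fewer actuators than states): no exact inverse.
g_bad = np.array([[1.0, 0.0], [2.0, 0.0]])
try:
    steady_state_control(g_bad, np.array([1.0, 1.0]), np.zeros(2))
except ValueError as e:
    print(e)
```

This is precisely why the paper works with the augmented system and a discounted cost instead of a feedforward steady-state term.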
With regard to any integral time T > 0, the DPIF satisfies

V(x_a(t)) = ∫_t^{t+T} e^{−γ(τ−t)} U(x_a(τ), u(τ)) dτ + e^{−γT} V(x_a(t + T))   (10)

For all x_a(t) = x_a, if V(x_a) ∈ C^1(Ω_a : R^+), we let T → 0 in (10) and obtain the time-triggered nonlinear Lyapunov equation

0 = U(x_a, u(x_a)) − γV(x_a) + ∇^T V(x_a)(f_a(x_a) + g_a(x_a)u(x_a))   (11)

The Hamiltonian of the augmented system (5) is defined as

H(x_a, u(x_a), ∇V(x_a)) = U(x_a, u(x_a)) − γV(x_a) + ∇^T V(x_a)(f_a(x_a) + g_a(x_a)u(x_a))   (12)

According to the well-known Bellman's principle of optimality, it follows from (6) that the optimal DPIF

V*(x_a) = min_{u ∈ A(Ω_a)} ∫_t^∞ e^{−γ(τ−t)} U(x_a(τ), u(τ)) dτ   (13)

satisfies the HJBE

H(x_a, u*(x_a), ∇V*(x_a)) = min_{u ∈ A(Ω_a)} H(x_a, u, ∇V*(x_a)) = 0   (14)

and the time-triggered optimal tracking control law is

u*(x_a) = −(1/2) R^{−1} g_a^T(x_a) ∇V*(x_a)   (15)

On the basis of the event-triggered mechanism, the triggering instants form a monotonically increasing sequence {t_j}_{j=0}^∞ determined by the event-triggering condition, where t_0 = 0 and t_j < t_{j+1}, j ∈ N, are the sampling instants. Then, the sampled augmented system state is

x_aj = x_a(t_j), t ∈ [t_j, t_{j+1})   (16)

where j ∈ N, and t_{j+1} − t_j is called the execution interval.
The corresponding ETOTC is derived as

μ(x_aj) = −(1/2) R^{−1} g_a^T(x_aj) ∇V*(x_aj)   (17)

where ∇V*(x_aj) = ∂V*(x_a)/∂x_a |_{x_a = x_aj}, and by introducing the zero-order hold (ZOH), μ(x_aj) becomes a piecewise-constant function of time determined by x_aj. Notice that the ETOTC (17) depends on the solution of the HJBE (14). Next, we define the gap between the sampled augmented state x_aj and the real state x_a(t) as

ê_d(t) = x_aj − x_a(t), t ∈ [t_j, t_{j+1})   (18)

According to (5), (15) and (18), the closed-loop augmented system is

ẋ_a(t) = f_a(x_a(t)) + g_a(x_a(t))μ(x_aj)   (19)

Remark 2. With regard to the time-triggered optimal tracking control u*(x_a), the augmented system (5) is sampled continuously. Different from u*(x_a), the ETOTC (17) remains unchanged within the execution interval, i.e., the state x_a(t_j) sampled at instant t_j is held until the new sampled augmented state x_a(t_{j+1}) is transmitted to the controller at instant t_{j+1}, which significantly reduces the computational and communication burden.
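The sample-and-hold mechanism of (16)-(18) and Remark 2 can be sketched as follows. The control is recomputed only when the gap ‖x_aj − x_a(t)‖ exceeds a threshold; between events a ZOH keeps the last value. The linear plant, gain K, and constant threshold below are illustrative stand-ins, not the paper's design or its state-dependent trigger (42).

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # toy stable plant x_dot = A x + B u
B = np.array([0.0, 1.0])

def simulate_et(x0, K, threshold, dt=1e-3, steps=5000):
    x = np.array(x0, dtype=float)
    x_j = x.copy()                  # last sampled state, held by the ZOH
    u = -K @ x_j                    # control computed only at trigger instants
    triggers = 1
    for _ in range(steps):
        if np.linalg.norm(x_j - x) > threshold:   # gap e_hat exceeds bound
            x_j = x.copy()                        # sample: x_aj <- x_a(t_j)
            u = -K @ x_j                          # update the held control
            triggers += 1
        x = x + dt * (A @ x + B * u)              # forward-Euler integration
    return x, triggers

x_final, n_triggers = simulate_et([1.0, 0.0], K=np.array([2.0, 1.0]),
                                  threshold=0.05)
print(n_triggers, "controller updates instead of 5000 periodic ones")
```

Even with this crude constant threshold, the controller updates far less often than a periodically sampled one while still driving the state toward the origin, which is the resource-saving effect Remark 2 describes.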

III. ADAPTIVE-IDENTIFIER-CRITIC DESIGN BASED EVENT-TRIGGERED OPTIMAL TRACKING CONTROL
In this section, a NN-based identifier is first designed, and the identification error and the identifier weight estimation error are proved to be UUB in Theorem 1. Then, for the identified augmented system, a particular ET adaptive implementation method is developed without an initial stable control law, and the closed-loop identified augmented system and the critic weight estimation error are proved to be UUB in Theorem 2.

A. NN-BASED IDENTIFIER DESIGN
The system dynamics (1) can be rewritten as

ẋ(t) = A_h x(t) + W_f^T σ_f(x(t)) + W_g^T σ_g(x(t))u(t) + ε(x(t))   (20)

where A_h is a designed Hurwitz matrix, W_f and W_g are the ideal identifier NN weights, σ_f(x) and σ_g(x) are the activation functions, and ε(x) is the bounded NN reconstruction error. On the compact set Ω_x̃, the activation functions σ_f(x) and σ_g(x) need to satisfy the Lipschitz conditions

‖σ_f(x_1) − σ_f(x_2)‖ ≤ ‖A_f (x_1 − x_2)‖,  ‖σ_g(x_1) − σ_g(x_2)‖ ≤ ‖A_g (x_1 − x_2)‖   (21)-(22)

where A_f and A_g are a priori known positive definite matrices. Then, the reconstructed system dynamics are

ẋ̂(t) = A_h x̂(t) + Ŵ_f^T σ_f(x̂(t)) + Ŵ_g^T σ_g(x̂(t))u(t)   (23)

where x̂ is the identifier state and Ŵ_f, Ŵ_g are the estimated identifier weights. Defining the identification error x̃(t) = x(t) − x̂(t) and the weight estimation errors W̃_f = W_f − Ŵ_f, W̃_g = W_g − Ŵ_g, the identifier error dynamics are

ẋ̃(t) = A_h x̃(t) + W̃_f^T σ_f(x̂) + W̃_g^T σ_g(x̂)u + ξ(t)   (24)

where ξ(t) collects the activation mismatch and reconstruction error terms.

Theorem 1. For the affine unknown nonlinear system (1), let the weights of the NN-based identifier (23) be tuned by the update rules

Ŵ̇_f = P_f σ_f(x̂) x̃^T P_0,  Ŵ̇_g = P_g σ_g(x̂) u x̃^T P_0   (25)-(26)

where P_0, P_f and P_g are predefined positive definite matrices. Then the identification error x̃(t) converges to the compact set

Ω_x̃ = {x̃ : ‖x̃‖ ≤ K_0}   (27)

where K_0 > 0 is a constant parameter determined in the following proof. Besides, the weight estimation errors W̃_f and W̃_g of the NN-based identifier are assured to be UUB.
Proof. Choose the Lyapunov function candidate

L(t) = x̃^T P_0 x̃ + tr(W̃_f^T P_f^{−1} W̃_f) + tr(W̃_g^T P_g^{−1} W̃_g)   (28)

Since A_h is a Hurwitz matrix, there exists a positive-definite symmetric matrix P_0 satisfying the Lyapunov equation

A_h^T P_0 + P_0 A_h = −Q_0   (29)

for a positive definite matrix Q_0. The time derivative of (28) along (24)-(26) can be expanded as in (30). Observing that tr(A_1 A_2) = tr(A_2 A_1) = A_2 A_1 for A_1 ∈ R^{n×1} and A_2 ∈ R^{1×n}, and applying the Cauchy-Schwarz inequality, (30) can be upper bounded as in (31). In order to guarantee that the leading quadratic term of (31) is negative, the parameter matrices B and Q_0 need to be selected properly. Then L̇(t) < 0 as long as the identification error x̃ satisfies ‖x̃‖ > K_0. According to Lyapunov's direct method, the identification error x̃ converges to Ω_x̃ and the weight estimation errors W̃_f and W̃_g are UUB. This completes the proof.
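The identifier idea can be sketched on a scalar example: a series-parallel observer with a Hurwitz design constant and a gradient-style weight update driven by the identification error x̃ = x − x̂. The plant, basis function, and gains below are illustrative, and the update is a simplified single-weight stand-in for the paper's matrix rules (25)-(26).

```python
import numpy as np

def sigma_f(x):                 # single activation function (illustrative)
    return np.tanh(x)

a_true = -1.5                   # unknown drift weight: x_dot = a_true*tanh(x) + u
A_h = -5.0                      # Hurwitz design scalar
P_f = 10.0                      # identifier learning rate

def run_identifier(dt=1e-3, steps=20000):
    x, x_hat, w_f = 0.5, 0.0, 0.0
    for k in range(steps):
        u = 2.0 * np.sin(3.0 * dt * k)          # persistently exciting input
        x_tilde = x - x_hat
        x_dot = a_true * sigma_f(x) + u          # true (unknown) plant
        # series-parallel identifier: error feedback plus learned drift
        x_hat_dot = A_h * (x_hat - x) + w_f * sigma_f(x) + u
        w_f += dt * P_f * sigma_f(x) * x_tilde   # gradient-style weight update
        x += dt * x_dot
        x_hat += dt * x_hat_dot
    return w_f, abs(x - x_hat)

w_f_hat, id_err = run_identifier()
print("estimated weight:", w_f_hat, "identification error:", id_err)
```

With the excitation signal keeping σ_f(x) active, the weight estimate converges toward the true drift coefficient and the identification error shrinks, mirroring the UUB conclusion of Theorem 1.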

B. ADAPTIVE-CRITIC DESIGN BASED EVENT-TRIGGERED OPTIMAL TRACKING CONTROL
In this part, we present an online implementation method based on NNs. Note that the unknown nonlinear system (1) can be approximated by the identifier NN within a sufficiently small compact set Ω_x̃. According to (23), the identified augmented system can be rewritten as

ẋ_a(t) = F_a(x_a(t)) + G_a(x_a(t))u(t)   (33)

where F_a(x_a) and G_a(x_a) are the augmented internal and input dynamics reconstructed from the identifier NN. Then, we assume that the optimal value function V*(x_a) ∈ C^1(Ω_a ; R^+) exists. By the universal approximation property of feed-forward NNs [47], a critic NN is used to construct V*(x_a) on the compact set Ω_a as

V*(x_a) = W_c^{*T} σ_c(x_a) + ε_c(x_a)   (36)

where W_c^* is the ideal critic weight vector, ε_c(x_a) is the critic reconstruction error, σ_c(x_a) = [σ_c1(x_a), ..., σ_cl(x_a)]^T is the activation function vector satisfying σ_ci ∈ C^1(Ω_a ; R) and σ_ci(0) = 0, and l is the number of activation functions with lim_{l→∞} ε_c(x_a) = 0. Generally, the activation functions are chosen to be linearly independent. However, the ideal critic NN weight vector W_c^*, which offers the best approximation of the tracking HJBE, is always unknown, so the optimal value function is approximated by

V̂(x_a) = Ŵ_c^T σ_c(x_a)   (37)

According to the rewritten augmented system (33), (36) and (37), the optimal event-triggered control law (38) and the approximate event-triggered control law

μ̂(x_aj) = −(1/2) R^{−1} G_a^T(x_aj) ∇σ_c^T(x_aj) Ŵ_c   (39)

are derived separately. According to the approximate event-triggered control law (39), the identified closed-loop system is

ẋ_a(t) = F_a(x_a(t)) + G_a(x_a(t))μ̂(x_aj)   (41)

and the triggering condition of the identified nonlinear system is given in (42). Before implementing the classical adaptive-critic design approach, a special NN weight vector would need to be chosen in order to create an initial stable control law, after which the critic NN weights could be trained by the classical approach. However, the special NN weight vector is difficult to choose, which may lead to instability of the closed-loop system. Inspired by [48]-[50], we adopt an auxiliary Lyapunov function to improve the learning criterion and employ it to tune the critic NN weights. Moreover, the following assumption is introduced.
Assumption 1. Consider that L_a(x_a) ∈ C^1(Ω_a ; R^+) is an auxiliary differentiable Lyapunov function candidate, and L_a(x_a) satisfies the following inequality. In addition, let P_a ∈ R^{n×n} be a positive definite matrix such that the following condition holds on the compact set Ω_a.
Remark 3. During the realization process, we can choose a polynomial, such as x_a^T P_a x_a, to determine the auxiliary Lyapunov function L_a(x_a), which guarantees the stability of the augmented state vector x_a and the critic NN weights Ŵ_c.
Since it is difficult to obtain the ideal critic NN weight vector W_c^*, we design an online learning approach to approximate W_c^* by employing real-time data. Furthermore, V*(x_a) and μ(x_aj) satisfy the event-triggered HJBE H(x_a, μ(x_aj), ∇V*(x_a)) = 0. Thus, we define the approximation error between H(x_a, μ(x_aj), ∇V*(x_a)) and H(x_a, μ̂(x_aj), ∇V̂(x_a)), that is

e_c = H(x_a, μ̂(x_aj), ∇V̂(x_a)) − H(x_a, μ(x_aj), ∇V*(x_a)) = H(x_a, μ̂(x_aj), ∇V̂(x_a))   (45)

The critic NN weight vector Ŵ_c can be tuned by minimizing the squared approximation error E_c(x_a, Ŵ_c) = (1/2)e_c^T e_c. Furthermore, in order to avoid L̇_a^μ(x_a) = ∇^T L_a(x_a)(F_a(x_a) + G_a(x_a)μ̂(x_aj)) > 0, an auxiliary term is introduced into the tuning law for Ŵ_c, which can be derived as follows.
where η_1 > 0 is a learning rate parameter, the term (1 + δ^T(x_a)δ(x_a))^2 normalizes the tuning process, and η_2 is a constant parameter designed to avoid instability of the closed-loop system.
Moreover, according to the approximate event-triggered control law (39), the weight tuning law (46) becomes (48), whose main term is

−η_1 δ_μ(x_a) / (1 + δ_μ^T(x_a) δ_μ(x_a))^2 · e_c(x_a, Ŵ_c)

together with the auxiliary stabilizing term weighted by η_2. The critic NN weight tuning law (48) combines decreasing the HJBE approximation error with stabilizing the closed-loop system. Thus we can select the initial critic NN weights Ŵ_c(0) arbitrarily, without considering an initial stabilizing control law.
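The normalized-gradient part of the tuning law can be illustrated on a scalar LQR problem, where the true quadratic value weight is known in closed form from the algebraic Riccati equation. All numbers below are illustrative; the discount is set to zero and the auxiliary stabilizing term of (48) is omitted, since this toy problem needs no stabilization.

```python
import numpy as np

# scalar plant x_dot = a x + b u, cost integrand q x^2 + r u^2
a, b, q, r = -1.0, 1.0, 1.0, 1.0
w_star = r * (a + np.sqrt(a * a + q * b * b / r)) / (b * b)  # Riccati root

rng = np.random.default_rng(0)
w, eta = 0.0, 0.5                       # critic weight of V(x) = w x^2
for _ in range(2000):
    x = rng.uniform(-2.0, 2.0)          # random exploration acts as PE
    u = -(b / r) * w * x                # control from the current critic
    e_c = q * x**2 + r * u**2 + 2 * w * x * (a * x + b * u)  # HJB residual
    delta = 2 * x * (a * x + b * u)     # d e_c / d w, with u held fixed
    w -= eta * delta / (1 + delta * delta)**2 * e_c          # normalized step
print("learned weight:", w, "Riccati solution:", w_star)
```

The (1 + δ^2)^2 factor plays the same role as the normalization in (48): it bounds the step size for large regressors so that a fixed learning rate η works over the whole exploration region.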

C. THEORETICAL ANALYSIS
In this part, we show the theoretical analysis of the adaptive-critic design (ACD) based ETOTC method. The critic weight error vector between W_c^* and Ŵ_c is defined as W̃_c = Ŵ_c − W_c^*, and we further have W̃̇_c = Ŵ̇_c. By introducing δ_1 = δ_μ(x_a)/(1 + δ_μ^T(x_a)δ_μ(x_a)) and δ_2 = 1 + δ_μ^T(x_a)δ_μ(x_a), the critic error vector dynamics are given in (49), where e_cH = ∇^T ε_c(x_a)(F_a(x_a) + G_a(x_a)μ̂(x_aj)) is the residual error of the reconstructed HJBE, which is norm bounded, i.e., ‖e_cH‖ ≤ e_M [43], [54]. During the training process, the persistence of excitation (PE) condition ensures λ_min(δ_1 δ_1^T) > 0, which is necessary in the stability analysis [55], [56].
Before proceeding to the stability analysis, some essential assumptions are employed to support it.

Theorem 2. For the identified augmented system (33) with the approximation error (45), let Assumptions 2-3 hold. The event-triggered approximate optimal control law (39), the critic weight tuning rule (48) and the event-triggered condition (42) guarantee that the closed-loop system state and the weight estimation error are UUB.
Proof. Choose the Lyapunov function candidate L_2(t) given in (50).

Case 1: When t ∈ [t_j, t_{j+1}) for all j, no event is triggered. Taking the time derivative of (50) along (19) and (49), we obtain (51). For the first term of the right-hand side of (50), according to Assumptions 2-3, we obtain (52). On the basis of the definition of x_aj and (40), x_aj is unchanged on t ∈ [t_j, t_{j+1}); thus, the second term of the right-hand side of (50) gives (53). For the third term of the right-hand side of (50), the derivative of L_a(x_a) is expressed as in (54). According to (48), (49) and Young's inequality, the fourth term of the right-hand side of (50) is bounded as in (55). Combining (52)-(55) and taking the optimal ET control law (38) into consideration, the overall time derivative of L_2(t) is given by (56). When the ET condition (42) is satisfied, (56) becomes (57), where M_3 is a positive constant. According to (57), L̇_2(t) < 0 provided that ê_d lies outside the set Ω_êd or W̃_c lies outside the set Ω_W̃c, with the condition M_2 < 0.
Case 2: If the state jumps at the ET instant t = t_{j+1}, the time difference of the Lyapunov function candidate L_2(t) is given in (59), where x_a(t_{j+1}^−) = lim_{ε→0^−} x_a(t_{j+1} + ε). According to the fact that x_a and V*(x_a) are continuous on the time interval [t_j, t_{j+1}), and L̇_2(t) < 0 when ê_d lies outside the set Ω_êd and W̃_c lies outside the set Ω_W̃c, we can obtain (60), where κ(·) is a class-κ function. Therefore, ∆L_2(t) < 0 at every ET instant t_{j+1}.

On the basis of the above two cases, L̇_2(t) < 0 in the time interval [t_j, t_{j+1}), and ∆L_2(t) < 0 at each ET instant t_{j+1} for all j ∈ N, as long as ê_d lies outside the compact set Ω_êd and W̃_c lies outside the set Ω_W̃c. Therefore, the triggering condition (42) and the inequalities (59) and (60) ensure that the identified closed-loop system (41) and the critic weight error dynamics are UUB. This completes the proof.

Remark 4. The execution interval t_{j+1} − t_j may be zero, which would lead to the notorious Zeno behaviour. On the basis of the results in [43] and [57], the minimal inter-sampling time ∆t_min of the ETOTC has a nonzero positive lower bound. Thus, the Zeno behaviour is avoided.

IV. SIMULATION STUDY
In this section, we provide two numerical examples to show the effectiveness of the adaptive-identifier-critic design based ETOTC method.

A. EXAMPLE 1
Consider a CT linear mass-spring-damper system [52]:

ẋ_1 = x_2,  ẋ_2 = −(κ/m) x_1 − (l/m) x_2 + (1/m) u   (64)

where x_1 and x_2 are the position and velocity, and κ, l and m are the stiffness of the spring, the damping coefficient and the mass. The initial value of [x_1, x_2]^T is set to [−1, 1]^T, and u is the one-dimensional control input. The true parameters are set as κ = 5 N/m, l = 0.5 N·s/m, and m = 1 kg. Then, the desired reference trajectories are sinusoidal signals given by the following command generator dynamics:

ẋ_d1 = x_d2,  ẋ_d2 = −x_d1   (67)

According to (64) and (67), we construct the augmented system (68), and the augmented system state vector [e_d1, e_d2, x_d1, x_d2]^T is denoted as x_a.
Due to the unavailable knowledge of the system, the identifier NNs are employed with properly chosen identifier gains, and suitable activation functions σ_f(x) and σ_g(x) are selected. The learning rate matrices of the identifier NNs are

P_f = [20 20; 20 20],  P_g = [10 10; 10 10].
During the identification process, an excitation signal u = 70 sin(10t) cos(10t) is selected to guarantee that the identifier NN weights converge to the true values. For the ETOTC problem, the parameters in the DPIF are chosen as γ = 0.1, Q = diag(10, 10), and R = 1, and the DPIF is evaluated from t = 0. The initial augmented system state vector is set as x_a(0) = [−1, 1, −1, 1]^T, and a local critic NN V̂(x_a) = Ŵ_c^T σ_c(x_a) is constructed to approximate the optimal value function. To relax the requirement of an initially stabilizing controller, the auxiliary Lyapunov function is set as L_a(x_a) = e_d1^2 + e_d2^2. Moreover, to obtain satisfactory simulation results, we set η_1 = 12, η_2 = 3, ξ_2 = 0.5, and G_aM^2 L_σcM^2 + d_σcM^2 L_GaM^2 = 0.8. During the training process, 0.3 exp(−0.1t) sin(10t) cos(10t) is selected as the exploration noise to satisfy the PE condition.
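The setup above can be reproduced in a few lines: the plant matrices of (64) with κ = 5 N/m, l = 0.5 N·s/m, m = 1 kg, a sinusoidal reference from the command generator, and a numerical evaluation of the discounted cost with γ = 0.1, Q = diag(10, 10), R = 1. The feedback gain K below is a hypothetical stand-in for the trained ET controller, used only to make the cost evaluation concrete.

```python
import numpy as np

kappa, ell, m = 5.0, 0.5, 1.0
A = np.array([[0.0, 1.0], [-kappa / m, -ell / m]])   # plant of Eq. (64)
B = np.array([0.0, 1.0 / m])
Ad = np.array([[0.0, 1.0], [-1.0, 0.0]])             # command generator

gamma, Q, R = 0.1, np.diag([10.0, 10.0]), 1.0        # DPIF parameters
K = np.array([5.0, 4.0])                             # hypothetical tracking gain

dt, T = 1e-3, 20.0
x = np.array([-1.0, 1.0])                            # plant initial state
xd = np.array([0.0, 1.0])                            # reference initial state
cost = 0.0
for k in range(int(T / dt)):
    e = x - xd                                       # tracking error e_d
    u = -K @ e                                       # error feedback control
    cost += dt * np.exp(-gamma * k * dt) * (e @ Q @ e + R * u * u)
    x = x + dt * (A @ x + B * u)                     # forward-Euler step
    xd = xd + dt * (Ad @ xd)
print("discounted cost:", cost, "final error norm:", np.linalg.norm(x - xd))
```

A plain error-feedback gain leaves a bounded residual tracking error (there is no feedforward term, cf. Remark 1), which is exactly the gap the learned near-optimal ET controller is meant to close.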
The simulation results are presented in Figs. 1-6. The system identification errors converge to a small region of the equilibrium point, as shown in Fig. 1. From Fig. 2, the values of the initial critic weights are all set to zeros, which means that an initial stable control law is unnecessary. Fig. 3 displays the sampling interval under the ET mechanism, which indicates that the developed algorithm avoids the Zeno behaviour.
Using the trained critic NN, we can obtain the approximate ET control law and the closed-loop tracking responses, which are shown in Figs. 4 and 5, respectively. On the basis of the ET mechanism, a stairstep graph of the controller is shown in Fig. 4. In Fig. 5, the desired trajectories are tracked well, and the triangles mark the ET instants. Fig. 6 presents the evolution of the number of samples. The ET controller uses only 675 samples of the state while the time-triggered controller needs 9047 samples, which greatly saves bandwidth and computational resources (a reduction of 92.54%).
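The reported saving can be checked directly from the sample counts:

```python
# Sanity check of the sampling reduction reported for Example 1.
et_samples, tt_samples = 675, 9047
saving = 1.0 - et_samples / tt_samples
print(f"reduction: {saving:.2%}")   # about 92.54%
```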

B. EXAMPLE 2
Consider a CT mass-spring-damper system with the nonlinearity k(x) = −x^3 in [52], with the initial state [−1, 1]^T; the corresponding augmented system is constructed in the same way as in Example 1. On account of the unavailable knowledge of the system, the identifier NNs are employed with properly chosen identifier gains, activation functions σ_f(x) and σ_g(x), and learning rate matrices. During the identification process, an excitation signal u = 60 sin(10t) cos(10t) is selected to guarantee that the identifier NN weights converge to the true values.
The simulation results are presented in Figs. 7-12. The system identification errors converge to a small region of the equilibrium point, as shown in Fig. 8. The sampling interval under the ET mechanism is displayed in Fig. 9, which indicates that the developed algorithm avoids the Zeno behaviour.
Using the trained critic NN, we can obtain the approximate ET control law and the closed-loop tracking responses, which are shown in Figs. 10 and 11, respectively. On the basis of the ET mechanism, a stairstep graph of the controller is shown in Fig. 10. In Fig. 11, the desired trajectories are tracked well, and the triangles mark the ET instants. Fig. 12 presents the evolution of the number of samples. The ET controller uses only 412 samples of the state while the time-triggered controller needs 9679 samples, which greatly saves bandwidth and computational resources (a reduction of 95.74%).

V. CONCLUSION
In this paper, an ET ADP-based method is developed for the tracking control of unknown continuous-time systems. To handle the unknown dynamics, identifier NNs are employed to learn the system dynamics. Then, the tracking control problem is transformed into the regulation problem of the identified augmented system. Only a critic NN is used to reconstruct the DPIF, and the control law is updated according to the ET condition. A novel online weight tuning law that does not require an initial admissible control law is designed. Finally, the simulation results of the mass-spring-damper systems show the effectiveness of the developed method. The deficiencies of this method and future works are discussed as follows.
(1) The unknown systems studied in this paper are of wide concern. In this work, the identification process and the critic design are independent of each other. In our future work, we will develop a new online learning method in which the identification and the control are carried out simultaneously.
(2) Actuator saturation is a common phenomenon in control systems, which degrades the system dynamic performance and can induce instability. Research on constrained systems will be one of our future works.
(3) The critic NN structure used in this paper is simple and has limited expressive ability. In our future work, deep learning methods will be employed, which can decrease the fitting error effectively.