Event-Driven-Modular Adaptive Backstepping Optimal Control for Strict-Feedback Systems Through Zero-Sum Differential Games

This paper addresses the event-driven-modular optimal tracking control problem for nonlinear strict-feedback systems with external disturbances. Through backstepping feedforward control, the optimal tracking problem is transformed into an equivalent optimal regulation problem for the affine tracking error system. Subsequently, the adaptive dynamic programming technique is introduced to generate the optimal feedback controller and to solve the optimization problem of a two-player zero-sum differential game. A single critic neural network is constructed to evaluate the associated cost function online, where a novel weight updating law is derived based on the gradient-descent technique. The resulting event-triggered closed-loop system, modeled as an impulsive system, is proved to be asymptotically stable by Lyapunov theory. Finally, the reliability and effectiveness of the theoretical results are validated by numerical simulation examples.


I. INTRODUCTION
During the past decades, research on nonlinear strict-feedback systems has drawn considerable attention in applications such as hypersonic flight vehicles [1], inverted pendulum systems [2], quadrotors [3], helicopters [4], ship autopilots [5], and robot manipulators [6], since a general nonlinear system satisfying certain geometric conditions can be transformed into strict-feedback form by diffeomorphism theory. The celebrated backstepping recursive technique has been studied to accomplish tracking control for strict-feedback systems [7]-[12]. Considering high-order nonlinear multiagent systems in semi-strict-feedback form, a neural network state observer is constructed for each follower and adaptive consensus tracking control is studied via backstepping techniques [7]. A state-feedback control is constructed by backstepping, rational-exponent Lyapunov functions, and the Bernoulli inequality for nonlinear strict-feedback systems with reduced design complexity [8]. Adaptive backstepping control is investigated for uncertain systems subjected to input delay and disturbances; to compensate for the input delay, the Padé approximation method is introduced to construct an auxiliary system [9]. For magnetic levitation systems, nonlinear integral backstepping controllers are designed to generate the magnetic flux that levitates a ferromagnetic object in the air [10]. For a special class of nonlinear strict-feedback systems performing the same operation repeatedly, the echo state network and backstepping are combined into an iterative learning control scheme [11]. A backstepping control technique is presented for a rotary inverted pendulum with only the system structure and state measurements available [12]. Despite these efforts, system optimality is not covered in the aforementioned research.

The associate editor coordinating the review of this manuscript and approving it for publication was Rajeeb Dey.
Optimal control can greatly promote social development and national economic construction; hence, system optimality takes a higher priority than mere stability in practical engineering, and some important results have been achieved recently. In traditional optimal control, solving the Hamilton-Jacobi-Isaacs (HJI) equation is an intractable issue, as a general closed-form analytical method has not yet been developed [13], [14]. To solve the HJI equation effectively, an adaptive optimal reinforcement learning control is investigated for discrete-time systems based on backstepping and the minimal learning parameter technique [15]. ADP-based H∞ tracking control is studied for discrete time-delay systems, deriving a data-type Bellman equation [16]. Model-free state-feedback tracking control is accomplished via approximate dynamic programming with the value iteration algorithm [17]. A data-driven trajectory tracking control is addressed for nonlinear discrete-time systems in the presence of unknown dynamics [18]. For strict-feedback systems, adaptive dynamic programming (ADP) is proposed to design the optimal control and minimize a predefined performance cost function. An optimal tracking control scheme, composed of a backstepping-generated adaptive feedforward control and a dynamic programming-based optimal feedback control, is proposed for continuous-time strict-feedback systems [19]. An adaptive fuzzy decentralized optimal control is investigated for nonlinear large-scale systems subjected to unknown nonlinear functions and unmeasured states [20]. For multi-agent systems in strict-feedback form with a fixed directed graph, command-filtered backstepping and the adaptive dynamic programming technique are introduced to investigate distributed fuzzy optimal tracking control [21].

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, the ADP control schemes [19]-[21] mentioned above rely on the assumption that the system is not affected by external disturbances.
As a main branch of game theory, differential game theory introduces differential equations to address dynamic conflict, competition, or cooperation among multiple objects [22], [23]. In the zero-sum (ZS) differential game, the control policy is constructed to minimize a predefined cost function under the worst-case disturbance, yielding the saddle-point equilibrium solution. Indeed, the problem of attenuating the effect of external disturbances on nonlinear strict-feedback systems is a two-player zero-sum differential game, where the control input is designed to minimize the performance cost function while the disturbance policy is generated to maximize it. This idea has succeeded in intercepting high-speed maneuvering targets with interceptor missiles under output and input constraints [24], and it motivates the choice of an optimal ADP zero-sum control strategy for the tracking error dynamics of nonlinear strict-feedback systems.
To reduce the computational burden and/or the interactive information in plant-controller communication networks, developing an event-triggered control (ETC) scheme is crucial, as it decreases the power consumption of actuator batteries and slows down actuator wear simultaneously [25]-[27]. In event-sampled control systems, the control input is updated only at a sequence of time instants determined by a significant event-triggered condition. In general, the aperiodic updates or transmissions depend on the current state of the plant, which is more efficient than periodic time-triggered execution. Building on the backstepping method for nonlinear strict-feedback systems, adaptive event-triggered control is addressed by exploiting an event-sampled neural network together with backstepping [28]. An adaptive backstepping control is proposed for parametric strict-feedback systems, where the parameter estimator and controller are updated at the event-triggered instants [29]. Adaptive fuzzy control based on an event-triggered mechanism is investigated for nonlinear strict-feedback systems in the presence of virtual control coefficients and actuator failures [30]. For event-triggered optimal control, the optimization problem of the zero-sum differential game is solved in the framework of event triggering and the adaptive dynamic programming algorithm [31]. The zero-sum game problem is addressed for systems with partially unknown drift, and an event-triggered adaptive dynamic programming method is developed [32]. To the best of our knowledge, there are few reports on event-triggered adaptive backstepping optimal control. For strict-feedback systems, the topic of optimization performance and the energy-saving property of the control system is interesting and challenging, which is the motivation of this paper.
Inspired by the existing research, an adaptive event-sampling algorithm is devised and a two-player ZS differential game is formulated for strict-feedback systems in the presence of external disturbances. The whole controller under the ET mechanism consists of a feedforward tracking controller designed by backstepping and an optimal feedback controller obtained by the ADP algorithm. The main novelty of the paper can be summarized as follows: (I) A strict-feedback system in the presence of external disturbances is converted into an equivalent two-player zero-sum differential game by backstepping. For the resulting affine tracking error system, the optimal feedback control is fulfilled by the ADP technique. Compared to the research in [28]-[30], the proposed adaptive control ensures the optimality as well as the stability of the closed-loop system. (II) An event-triggered condition is specified to drive signal sampling and controller execution in the two-player ZS game, which combines an event-triggered control with a time-triggered disturbance. In contrast to the work in [24], event-triggered control effectively reduces the communication resources and/or computational burden.
The problem formulation is presented in Section II. The feedforward tracking controller is proposed in Section III. Section IV formulates the optimal feedback controller, including the implementation of the single adaptive critic neural network and the event-triggered generator. Rigorous mathematical stability analysis of the closed-loop system is accomplished in Section V. Two simulation examples are provided to illustrate the effectiveness of the proposed control scheme in Section VI. Finally, the conclusion in Section VII summarizes the paper.

II. PROBLEM STATEMENT
The nonlinear strict-feedback system dynamics in the presence of external disturbances are described as:

ẋ_i = f_i(x̄_i) + g_i x_{i+1} + d_i, 1 ≤ i ≤ n − 1
ẋ_n = f_n(x̄_n) + g_n u + d_n
y = x_1 (1)

where x(t) = [x_1(t), . . . , x_i(t), . . . , x_n(t)]^T ∈ R^n, u(t) ∈ R, and y(t) ∈ R denote the system state vector, the control input, and the available system output, respectively; x̄_i(t) = [x_1(t), . . . , x_i(t)]^T ∈ R^i is the partial state vector; f_i(x̄_i(t)) ∈ R is a continuous function with f_i(0) = 0; g_i is a known constant with the common assumption that g_i > 0; and d_i(t) ∈ R is an unknown but bounded disturbance. The aforementioned variables and vectors depend on time t; for brevity, t will be omitted in the sequel.
Assumption 1: The reference signal x_1d(t) ∈ R and its first-order derivative ẋ_1d(t) ∈ R are available, continuous, and bounded.
The control objective is to construct an optimal control under an event-triggered mechanism for the strict-feedback system (1), such that the system output is driven to track the desired reference signal x_1d(t) in an optimal manner and all the signals in the closed-loop system are uniformly ultimately bounded (UUB).

III. FEEDFORWARD TRACKING CONTROL
The celebrated backstepping technique is introduced to design a feedforward controller and to transform the tracking problem for the nonlinear strict-feedback system (1) into an optimal regulation problem for the system state tracking errors. The feedforward controller is designed based on the following coordinate transformation:

z_1 = x_1 − x_1d, z_i = x_i − x_id, i = 2, . . . , n (2)

where z_i, i = 1, . . . , n, are the state tracking errors and x_id, i = 2, . . . , n, are the stabilizing virtual controls. The control law x_id consists of the feedforward virtual control u^b_{i−1} designed by the backstepping technique and the optimal feedback control u^a_{i−1} obtained by ADP:

x_id = u^b_{i−1} + u^a_{i−1} (3)

To implement the event-triggered control, a monotonically increasing time subsequence {t_s}_{s=0}^∞ is defined as the event-triggered sampling instants satisfying t_{s+1} > t_s, t_s ∈ R_0^+, ∀s ∈ Z^+, where t_0 = 0 is the initial sampling instant. At the instants t = t_s, the event generator (EG) is triggered and the last held states are updated with the current system states. Accordingly, the control input is updated at the trigger instants t = t_s and held by a zero-order holder (ZOH) until the next trigger instant comes. Therefore, the piecewise continuity of the control input can be maintained through the ''EG-ZOH'' mechanism. The last held state is denoted as:

x̂_i(t) = x_i(t_s), t ∈ [t_s, t_{s+1}) (4)

The event-triggered error between the current state and the last held state is:

e_i(t) = x_i(t) − x̂_i(t) (5)

The event-referred tracking error is defined as:

ẑ_i = z_i − e_i (6)

The event-triggered feedforward control laws, predicated on the sampled state x̂_i(t) instead of the real state x_i(t), are constructed as:

u^b_1 = g_1^{−1} (−k_1 ẑ_1 − f_1(x̂_1) + ẋ_1d)
u^b_i = g_i^{−1} (−k_i ẑ_i − f_i(x̂_1, . . . , x̂_i)), i = 2, . . . , n (7)

where k_i is the controller gain.
Step 1: The dynamics of the tracking error z_1 are:

ż_1 = f_1(x_1) + g_1 x_2 + d_1 − ẋ_1d (8)

Considering the feedforward virtual control (7), we can deduce that:

ż_1 = −k_1 z_1 + k_1 e_1 + H_1(z_1) + g_1 z_2 + g_1 u^a_1 + d_1 (9)

Step i, 2 ≤ i < n: The dynamics of z_i are:

ż_i = f_i(x̄_i) + g_i x_{i+1} + d_i − ẋ_id (10)

Substituting the feedforward virtual control (7) into Eqn. (10) yields:

ż_i = −k_i z_i + k_i e_i + H_i(z_i) + g_i z_{i+1} + g_i u^a_i + d_i − ẋ_id (11)

Step n: The dynamics of z_n are:

ż_n = f_n(x̄_n) + g_n u + d_n − ẋ_nd (12)

Considering the feedforward control (7) yields:

ż_n = f_n(x̄_n) + g_n u^b_n + g_n u^a_n + d_n − ẋ_nd
    = f_n(x̄_n) − k_n ẑ_n − f_n(x̂_1, . . . , x̂_n) + g_n u^a_n + d_n − ẋ_nd
    = −k_n z_n + k_n e_n + H_n(z_n) + g_n u^a_n + d_n − ẋ_nd (13)

where H_i(z_i) = f_i(x̄_i) − f_i(x̂_1, . . . , x̂_i) denotes the sampling-induced mismatch of the nonlinearity. During the flow interval t ∈ [t_s, t_{s+1}), the controls x_id are held by the ZOH, so ẋ_id = 0, and the dynamics of the tracking errors become:

ż_i = −k_i z_i + k_i e_i + H_i(z_i) + g_i z_{i+1} + g_i u^a_i + d_i, 1 ≤ i < n
ż_n = −k_n z_n + k_n e_n + H_n(z_n) + g_n u^a_n + d_n (14)

The Lyapunov function candidate V_z is defined as:

V_z = (1/2) Σ_{i=1}^{n} z_i^2 (15)

The first-order time derivative of V_z along the trajectories (14) can be expressed as:

V̇_z = Σ_{i=1}^{n} z_i (−k_i z_i + k_i e_i + H_i(z_i) + g_i z_{i+1} + g_i u^a_i + d_i) (16)

where z_{n+1} = 0. By Young's inequality, we can obtain:

g_i z_i z_{i+1} ≤ (g_i/2) z_i^2 + (g_i/2) z_{i+1}^2 (17)

Substituting Eqn. (17) into Eqn. (16), we have:

V̇_z ≤ Σ_{i=1}^{n} (−(k_i − g_i/2 − g_{i−1}/2) z_i^2 + k_i z_i e_i + z_i H_i(z_i) + g_i z_i u^a_i + z_i d_i) (18)

where g_0 := 0 and the term g_i/2 does not appear for i = n. The feedback control vector U^a = [u^a_1, . . . , u^a_n]^T and the disturbance vector D = [d_1, . . . , d_n]^T will be specified via differential game theory later. The tracking control problem for the nonlinear strict-feedback system with external disturbances (1) has thus been transformed into the equivalent regulation problem for the tracking error affine system (19). The ADP-based differential game scheme will then be introduced to stabilize the affine system (19) and achieve optimal control performance.
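As a check on the cross-term bound (17) and on the gain condition behind k_min used later in the triggering analysis, the standard Young's-inequality redistribution can be sketched as follows (the summation convention g_0 := 0 is our reading of the text):

```latex
% Young's inequality: for any real a, b,  ab <= a^2/2 + b^2/2.
% Applied to each backstepping cross term g_i z_i z_{i+1}:
\begin{align*}
g_i z_i z_{i+1} &\le \frac{g_i}{2} z_i^2 + \frac{g_i}{2} z_{i+1}^2,
\qquad i = 1, \dots, n-1, \\
\sum_{i=1}^{n-1} g_i z_i z_{i+1}
 &\le \sum_{i=1}^{n} \Big( \frac{g_i}{2} + \frac{g_{i-1}}{2} \Big) z_i^2,
 \qquad g_0 := 0,\ \text{no } \tfrac{g_n}{2} \text{ term for } i = n,
\end{align*}
```

so each z_i^2 in (18) is damped with effective gain k_i − g_i/2 − g_{i−1}/2, which is exactly the quantity whose minimum defines k_min in the event-triggering analysis.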

IV. OPTIMAL FEEDBACK CONTROL
The affine system dynamics of the tracking error Z = [z_1, . . . , z_n]^T are described as:

Ż = F(Z) + G_u U^a + G_d D (19)

where F(Z) stacks the terms −k_i z_i + k_i e_i + H_i(z_i) + g_i z_{i+1} in (14), G_u = diag{g_1, . . . , g_n}, G_d = I_n, and the optimal feedback control vector U^a and the disturbance input vector D are regarded as two control inputs. The non-quadratic performance cost function is defined as:

J(Z(t)) = ∫_t^∞ r(Z(τ), U^a(τ), D(τ)) dτ (20)

Therefore, the control problem can be viewed as a two-player zero-sum optimal control problem: the feedback control policy U^a is regarded as player 1, and the optimal control input is sought to minimize the performance cost function (20); the disturbance policy D is regarded as player 2, and the worst-case disturbance input is sought to maximize the performance cost function (20). Herein, the utility function r(Z(τ), U^a(τ), D(τ)) is defined as:

r(Z, U^a, D) = Z^T Q Z + (U^a)^T R U^a − η^2 D^T D (21)

where Q, R are positive definite symmetric matrices and η > 0 is the disturbance attenuation constant.

Definition 1: A pair of control policies {U^{a*}, D^*} is the saddle-point equilibrium of the two-player zero-sum game if the following inequalities hold:

J(U^{a*}, D) ≤ J(U^{a*}, D^*) ≤ J(U^a, D^*) (22)

The corresponding Hamiltonian function is defined as:

H(Z, U^a, D, ∇J) = r(Z, U^a, D) + (∇J)^T (F(Z) + G_u U^a + G_d D) (23)

According to Bellman's principle of optimality, the optimal performance cost function J^*(Z) satisfies the Hamilton-Jacobi-Isaacs (HJI) equation:

min_{U^a} max_{D} H(Z, U^a, D, ∇J^*) = 0 (24)

and the stationary conditions yield the optimal-control/worst-case-disturbance pair:

U^{a*} = −(1/2) R^{−1} G_u^T ∇J^*(Z), D^* = (1/(2η^2)) G_d^T ∇J^*(Z) (25)

Then the HJI equation (24) under the input pair (25) can be rewritten as:

Z^T Q Z + (∇J^*)^T F(Z) − (1/4)(∇J^*)^T G_u R^{−1} G_u^T ∇J^* + (1/(4η^2))(∇J^*)^T G_d G_d^T ∇J^* = 0 (26)

Lemma 1: Consider the tracking error system (19), the associated performance cost function (20), and the input pair (25). Assume that there exists a continuously differentiable Lyapunov function J_s(Z), whose gradient with respect to Z is denoted ∇J_s. Moreover, let Λ(Z) be a positive definite function, i.e., Λ(Z) > 0 for all Z ≠ 0 and Λ(Z) = 0 ⇔ Z = 0, satisfying lim_{‖Z‖→∞} Λ(Z) = ∞. Then, along the optimal closed-loop trajectories, the following inequality holds:

(∇J_s)^T (F(Z) + G_u U^{a*} + G_d D^*) < 0

Assumption 2: Assume that the optimal closed-loop dynamics are bounded by a function of the system states, i.e., ‖F(Z) + G_u U^{a*} + G_d D^*‖ ≤ κ‖Z‖ for some constant κ > 0.

The event-triggered optimal feedback control input is proposed as:

U^a(t) = −(1/2) R^{−1} G_u^T ∇J^*(Ẑ(t)), t ∈ [t_s, t_{s+1})

where e = [e_1, e_2, . . .
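For intuition, in the scalar linear special case ẋ = ax + bu + gd with utility qx² + ru² − η²d², the HJI equation (26) reduces to a game algebraic Riccati equation that can be verified numerically. The sketch below uses illustrative coefficient values of our own choosing (not the paper's example) and checks the saddle-point policies of (25):

```python
import math

# Scalar zero-sum game: dx/dt = a*x + b*u + g*d,
# utility q*x^2 + r*u^2 - eta^2 * d^2, value J*(x) = p*x^2.
# HJI (26) reduces to: 2*a*p + q - (b^2/r - g^2/eta^2) * p^2 = 0.
a, b, g = -1.0, 1.0, 1.0          # illustrative plant coefficients
q, r, eta = 1.0, 1.0, 2.0         # illustrative weights and attenuation level

s = b**2 / r - g**2 / eta**2      # net "control minus disturbance" gain
# Positive root of s*p^2 - 2*a*p - q = 0 via the quadratic formula:
p = (2*a + math.sqrt(4*a**2 + 4*q*s)) / (2*s)

residual = 2*a*p + q - s * p**2   # HJI residual, ~0 at the solution
# Closed-loop drift under u* = -(b*p/r)*x and d* = (g*p/eta^2)*x from (25):
a_cl = a - (b**2 / r) * p + (g**2 / eta**2) * p

print(p)         # value-function coefficient, > 0
print(residual)  # approximately zero
print(a_cl)      # negative: the saddle-point closed loop is stable
```

The positivity of s (control authority dominating the worst-case disturbance for the chosen η) is what makes the game value finite; shrinking η toward g·√r/b drives p to infinity, mirroring the H∞ attenuation limit.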
, e_n]^T is the event-triggered error vector, and Ẑ is the last held tracking error vector defined as:

Ẑ(t) = Z(t_s), t ∈ [t_s, t_{s+1})

Accordingly, the piecewise continuity of the control input is maintained through the ''EG-ZOH'' mechanism. As an external signal to be suppressed, the time-driven disturbance law, related to the current state Z(t), is defined as in (25), evaluated continuously rather than at the trigger instants. Subsequently, the event-triggered system dynamics of the tracking error are described as:

Ż = F(Z) + G_u U^{a*}(Ẑ) + G_d D^*(Z)

A. ADAPTIVE CRITIC NEURAL NETWORK FRAMEWORK
The solution of the HJI equation (26) is the key to implementing the feedback-control/disturbance pair {(U^a)^*, D^*}, which provides the saddle point of the zero-sum differential game. However, obtaining the analytic solution of the HJI equation (26) is generally difficult or even impossible, due to the nonlinear features of the tracking error dynamics (19). Hence, the reinforcement learning mechanism is utilized to construct a critic network and solve the HJI equation (26) approximately in the zero-sum optimal control problem. A three-layer feedforward neural network is introduced to reconstruct the performance cost function J^*(Z) on a compact set Ω:

J^*(Z) = W_c^T φ_c(Z) + ε_c(Z) (32)

where W_c ∈ R^{N_c} is the ideal bounded weight vector, φ_c(Z): R^m → R^{N_c} is the activation function vector, N_c is the number of neurons in the hidden layer, and ε_c(Z) ∈ R is the finite approximation error.
The gradient of the performance cost function with respect to Z is:

∇J^*(Z) = ∇φ_c^T(Z) W_c + ∇ε_c(Z) (33)

The event-triggered optimal control/time-triggered worst-case disturbance pair can be deduced as:

U^{a*} = −(1/2) R^{−1} G_u^T [∇φ_c^T(Ẑ) W_c + ∇ε_c(Ẑ)], D^* = (1/(2η^2)) G_d^T [∇φ_c^T(Z) W_c + ∇ε_c(Z)] (34)

The HJI equation (26) can then be written as:

Z^T Q Z + W_c^T ∇φ_c F(Z) − (1/4) W_c^T ∇φ_c Φ_u ∇φ_c^T W_c + (1/4) W_c^T ∇φ_c Φ_d ∇φ_c^T W_c + ε_H = 0 (35)

where Φ_u = G_u R^{−1} G_u^T, Φ_d = G_d G_d^T / η^2, and the residual error ε_H due to the function approximation error is bounded on the compact set. As the ideal weight vector is unknown, the actual critic network output is built as:

Ĵ(Z) = Ŵ_c^T φ_c(Z) (36)

where Ŵ_c denotes the estimated weight vector. Subsequently, the gradients at the current and the sampled states can be expressed as:

∇Ĵ(Z) = ∇φ_c^T(Z) Ŵ_c (37)
∇Ĵ(Ẑ) = ∇φ_c^T(Ẑ) Ŵ_c (38)

The approximated optimal control and worst-case disturbance inputs are implemented as:

Û^a = −(1/2) R^{−1} G_u^T ∇φ_c^T(Ẑ) Ŵ_c, D̂ = (1/(2η^2)) G_d^T ∇φ_c^T(Z) Ŵ_c (39)

Define the estimation error of the weight vector in the critic network as:

W̃_c = W_c − Ŵ_c (40)

Then the approximated Hamilton-Jacobi-Isaacs function yields the residual:

δ ≜ Ĥ = Z^T Q Z + (Û^a)^T R Û^a − η^2 D̂^T D̂ + Ŵ_c^T ∇φ_c(Z) [F(Z) + G_u Û^a + G_d D̂] (41)

which yields:

δ = Z^T Q Z + Ŵ_c^T ∇φ_c F(Z) + (1/4) Ŵ_c^T ∇φ̂_c Φ_u ∇φ̂_c^T Ŵ_c − (1/2) Ŵ_c^T ∇φ_c Φ_u ∇φ̂_c^T Ŵ_c + (1/4) Ŵ_c^T ∇φ_c Φ_d ∇φ_c^T Ŵ_c (42)

where ∇φ_c = ∇φ_c(Z) and ∇φ̂_c = ∇φ_c(Ẑ). Considering (36), the approximated HJI function satisfies:

δ = −W̃_c^T υ + ε̃_H, υ = ∇φ_c(Z) [F(Z) + G_u Û^a + G_d D̂] (43)

with ε̃_H a bounded residual. Note that the ideal value of the approximated Hamiltonian (42) is 0 when Ŵ_c → W_c. The critic weight vector is trained to minimize the squared residual error E_c:

E_c = (1/2) δ^2 (44)

By a modified normalized gradient-descent algorithm, the update law of the critic weight vector is tuned as:

Ŵ̇_c = −l_c υ_1 δ + Θ_s (45)

where l_c is the learning rate, υ_1 = υ/υ_2^2 with υ_2 = 1 + υ^T υ and υ_2^2 = (1 + υ^T υ)^2 the normalization terms, and Θ_s collects additional stabilizing terms weighted by the tuning gains λ_1, λ_2 and involving the function J_s(Z) defined in Lemma 1.
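The normalized gradient-descent idea behind the critic update (45) can be checked on a toy problem with a known value function. The sketch below (our own construction, without the event trigger, disturbance, or stabilizing terms) uses a scalar linear-quadratic plant ẋ = ax + bu with utility x² + u², whose true quadratic value coefficient is w* = (a + √(a² + b²))/b², and trains a one-feature critic Ĵ = w·x² by descending on the squared Hamiltonian residual:

```python
import math

# Toy normalized gradient-descent critic, mirroring the form of (44)-(45):
# minimize E = delta^2 / 2, with delta the Hamiltonian residual and the
# step normalized by (1 + upsilon^2)^2. Plant and weights are illustrative.
a, b = -1.0, 1.0
w_true = (a + math.sqrt(a**2 + b**2)) / b**2   # known optimal coefficient

w, lc = 0.0, 1.0                               # initial weight, learning rate
for _ in range(5000):
    x = 1.0                                    # evaluation state sample
    # Critic: J_hat = w*x^2, gradient 2*w*x; greedy policy u = -(1/2)*b*(2*w*x).
    u = -b * w * x
    delta = x**2 + u**2 + 2*w*x * (a*x + b*u)  # Hamiltonian residual
    upsilon = x**2 * (2*a - 2*b**2 * w)        # d(delta)/dw, the regressor
    w -= lc * upsilon * delta / (1 + upsilon**2)**2   # normalized descent step

print(w)  # converges toward w_true (~0.4142 for a=-1, b=1)
```

The (1 + υ²)² normalization keeps the step size bounded regardless of the regressor magnitude, which is the same role it plays in (45).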
Noting that Ẇ̃_c = −Ŵ̇_c, the weight error dynamics of the critic NN follow from the update law (45) as given in (46). For the critic NN (32), the following assumption is provided.
Assumption 3: The ideal weight vector W_c and the estimated weight vector at the trigger instant, Ŵ_c(t_s), are upper bounded.

B. EVENT GENERATOR DESIGN
The traditional time-triggered control receives state-feedback information continuously and updates the control actions periodically, resulting in unnecessary resource consumption and communication costs. In this section, an event generator is implemented to reduce the heavy state sampling without compromising system stability. The state-feedback signals are transmitted, and the optimal controller is updated, only when the event-triggered condition (47) is satisfied, where k_min = min_i {k_i − (1/2)g_i − (1/2)g_{i−1}}, k_max = max_i {k_i}, and σ, τ, λ are design parameters satisfying 0 < σ < 1, τ > 0, λ > 1, and k_min − 1/(2τ) − (λ/2) k_max > 0. The constant ρ > 0 is designed to ensure that ‖e‖ ≥ √ρ > 0 at the trigger instants, i.e., a minimum positive triggering interval exists and the event generator is not triggered infinitely fast. Thus, Zeno behavior is avoided.
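Since the explicit form of condition (47) is not reproduced above, the sketch below illustrates the EG-ZOH mechanism with a generic relative-plus-absolute threshold of the same flavor, ‖e‖² ≥ σ‖ẑ‖² + ρ, on a stable scalar error system; the gain k and the parameters σ, ρ are illustrative values of our own, not the paper's design. The absolute offset ρ > 0 plays the Zeno-exclusion role described in the text:

```python
# "EG-ZOH" sketch: flow z_dot = -k * z_hat, with z_hat the last sampled
# state held by the ZOH; sample again only when the event-triggered error
# e = z - z_hat violates a generic threshold (illustrative, not eq. (47)).
k, sigma, rho = 2.0, 0.5, 1e-4
dt, T = 1e-3, 5.0

z, z_hat = 1.0, 1.0
trigger_times = [0.0]
n_steps = int(T / dt)
for step in range(1, n_steps + 1):
    t = step * dt
    z += dt * (-k * z_hat)                 # flow under the held control
    e = z - z_hat                          # event-triggered error
    if e**2 >= sigma * z_hat**2 + rho:     # generic trigger rule
        z_hat = z                          # jump: resample the state
        trigger_times.append(t)

intervals = [b - a for a, b in zip(trigger_times, trigger_times[1:])]
print(len(trigger_times), n_steps)         # far fewer events than time steps
print(min(intervals))                      # strictly positive inter-event time
```

Because a trigger requires ‖e‖ ≥ √ρ, the state must drift a fixed minimum distance between samples, so the inter-event times stay bounded away from zero even as z approaches the origin.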

V. STABILITY ANALYSIS OF CLOSED-LOOP SYSTEM
The system states are transmitted to the controller in a packet-based manner. Under the event-triggered mechanism, transmissions can only occur at the discrete instants t_s, s ∈ Z^+, satisfying 0 ≤ t_1 < t_2 < . . .. Hence, the closed-loop system is modeled as an impulsive system with a flow dynamic during the flow interval t ∈ [t_s, t_{s+1}) and a jump dynamic at the instant t = t_s. The augmented state vector is defined as ζ = [Z^T, Ẑ^T, W̃_c^T]^T, and the flow dynamics can be written accordingly, where the expression of Ẇ̃_c is given in Eqn. (46) and C is the flow set defined by the event-triggered condition (47). Similarly, the reset dynamics at t = t_s can be deduced.

Theorem 1: For the nonlinear strict-feedback system (1), provided that Assumptions 1-3 are satisfied, the feedforward controller is designed as (7), the feedback control-disturbance pair is renewed by the policy (39), the performance cost function is approximated by the critic NN with the weight tuning law (45), and the system state and controller are updated according to the triggering condition (47), the resulting nonlinear impulsive closed-loop system is asymptotically stable and all the signals are uniformly ultimately bounded (UUB).
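The flow/jump decomposition can be written in standard impulsive (hybrid) system notation. The sketch below uses generic flow and reset maps F_c and G_r as placeholder names, since the paper's explicit expressions are only referenced via (46)-(47):

```latex
% Impulsive model of the event-triggered closed loop; F_c, G_r, C, D
% are placeholder names for the flow map, reset map, flow set, jump set.
\begin{align*}
\dot{\zeta}(t) &= F_c(\zeta(t)), & \zeta &\in \mathcal{C}, \quad t \in [t_s,\, t_{s+1}), \\
\zeta(t_s^{+}) &= G_r(\zeta(t_s^{-})), & \zeta &\in \mathcal{D},
\end{align*}
% C: states for which condition (47) has not yet fired; D: its complement.
% The reset updates \hat{Z}(t_s^{+}) = Z(t_s), while Z and \tilde{W}_c
% remain continuous across the jump.
```

This is the structure the two-case Lyapunov proof below follows: Case 1 bounds the derivative of V along the flow in C, and Case 2 bounds the first difference of V across the reset G_r.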
Proof: Case 1 (during the flow period t_s ≤ t < t_{s+1}, i.e., events are not triggered): Select the Lyapunov function candidate V_ζ(t) as in (50), composed of the three terms V_z(t), V_Ẑ(t), and V_W̃c(t). The first-order time derivative of the first term V_z(t) is evaluated along (14), and considering Assumption 2 yields the corresponding bound. The first-order time derivative of the second term V_Ẑ(t) vanishes during the flow, since Ẑ is held constant by the ZOH. The third term V_W̃c(t) is differentiated along the weight error dynamics (46); noting that Ŵ_c = W_c − W̃_c, considering (34) and (39), and according to Lemma 1, we can deduce (56), as shown at the bottom of the next page, where (57), as shown at the bottom of the next page.
The parameters λ_1, λ_2 are tuned to ensure that the matrix K is positive definite, so that Eqn. (56) reduces to (58). It can be concluded that the term T is upper bounded, T ≤ T̄, supported by Assumption 3. Then the inequality (59) holds. Recalling the event-triggering condition (47), Eqn. (59) satisfies (60) as long as the conditions (61) hold. Therefore, all the signals in the closed-loop system are UUB according to the Lyapunov theorem during the flow t_s ≤ t < t_{s+1}.
Case 2 (at the jump instants t = t_s, i.e., events are triggered): Choose the same Lyapunov function candidate (50), and evaluate the first difference of the first and second terms at the jump. From Case 1 we have V̇_ζ(t) < 0 for t ∈ [t_s, t_{s+1}), which indicates that V_ζ(t) is monotonically decreasing; the first difference of the third term is then computed. Finally, the first difference of the Lyapunov function (50) satisfies ΔV_ζ < 0 as long as ‖Ẑ‖ > D_{Z1}. Therefore, the system tracking errors Z, the event-referred tracking errors Ẑ, and the critic NN weight errors W̃_c(t) are bounded at the jump instants.
The analysis of the two cases shows that the closed-loop nonlinear impulsive system state ζ is semiglobally uniformly ultimately bounded under the event-triggered condition (47).

VI. NUMERICAL SIMULATIONS
Example 1: A second-order strict-feedback system is considered as a numerical example.
The reference signal is chosen as x_1d(t) = sin(t) + 0.5 sin(2t). The system state vector is initialized as x(0) = [1, 0]^T. For the numerical example (65), the feedforward controller and the feedback control-disturbance pairs are designed accordingly. The gains of the feedforward tracking controller are selected as k_1 = 2, k_2 = 3. For the critic network in the optimal feedback controller, the activation function is experimentally selected as φ_c(Z) = [z_1, z_2, z_1 tan^{−1}(z_1), z_2 tan^{−1}(z_2), z_1^2, z_2^2, z_1^3, z_2^3]^T. The system output trajectory in Fig. 1 shows that the output can track the reference signal within an acceptable tracking error. The evolution of the other system state and the control input during the learning phase is shown in Fig. 2 and Fig. 3. The event-sampling instants in Fig. 4 are varying and non-periodic. The trajectories of the estimated critic weights in Fig. 5 indicate that the critic network weights converge to the optimal values gradually and the optimal controller is successfully approximated.

Example 2: A single-link robot manipulator with a motor rotor is considered as a practical example (67), where x_1 is the motor angular position, x_2 is the motor angular rate, x_3 is the motor armature current, u is the control voltage input, and d_2, d_3 are disturbances. The physical parameters are: link mass m = 0.506, link length L_0 = 0.305, gravity coefficient g = 9.81, load mass M_0 = 0.434, electromechanical conversion coefficient K_τ = 0.9, load radius R_0 = 0.023, viscous friction coefficient B_0 = 0.01625, rotor inertia J = 0.001625, back-EMF coefficient K_B = 0.9, and armature inductance and resistance L = 15, R = 5. The reference signal is x_1d(t) = sin(t), and the system state is initialized accordingly. For the practical example (67), the feedforward controller and the feedback controller are designed analogously, with feedforward controller gains k_1 = 1, k_2 = 3, k_3 = 3.
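Since the concrete dynamics (65) are not reproduced above, the sketch below uses a hypothetical second-order strict-feedback plant (f_1(x_1) = −0.5x_1, f_2 = −x_1x_2, g_1 = g_2 = 1, no disturbance — all our assumptions, not the paper's system) with the stated gains k_1 = 2, k_2 = 3, initial state x(0) = [1, 0]^T, and reference x_1d = sin t + 0.5 sin 2t, to illustrate the time-triggered backstepping feedforward part of the design; the ADP feedback and the event generator are omitted:

```python
import math

# Hypothetical second-order strict-feedback plant standing in for (65):
#   x1_dot = f1(x1) + x2,   x2_dot = f2(x1, x2) + u,
# with f1 = -0.5*x1, f2 = -x1*x2 (assumed here), tracked by the
# backstepping feedforward law with gains k1 = 2, k2 = 3.
k1, k2 = 2.0, 3.0
dt, T = 1e-3, 10.0
x1, x2 = 1.0, 0.0                       # x(0) = [1, 0]^T as in the text

for step in range(int(T / dt)):
    t = step * dt
    x1d    = math.sin(t) + 0.5 * math.sin(2*t)   # reference
    x1d_d  = math.cos(t) + math.cos(2*t)         # its first derivative
    x1d_dd = -math.sin(t) - 2.0 * math.sin(2*t)  # its second derivative

    f1, df1 = -0.5 * x1, -0.5
    f2 = -x1 * x2

    z1  = x1 - x1d
    x2d = -k1 * z1 - f1 + x1d_d          # virtual control (step 1)
    z2  = x2 - x2d
    x1_dot = f1 + x2
    x2d_d = -k1 * (x1_dot - x1d_d) - df1 * x1_dot + x1d_dd
    u = -k2 * z2 - f2 + x2d_d - z1       # actual control (step 2)

    x1 += dt * (f1 + x2)                 # forward-Euler integration
    x2 += dt * (f2 + u)

print(abs(x1 - (math.sin(T) + 0.5 * math.sin(2*T))))  # small residual error
```

Under this law the error dynamics are exactly ż_1 = −k_1z_1 + z_2, ż_2 = −k_2z_2 − z_1, so the tracking error decays exponentially and the remaining residual is dominated by the Euler discretization.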
In the optimal feedback controller, the activation function of the critic network is chosen as φ_c(Z) = [z_1, z_2, z_3, z_1 tan^{−1}(z_1), . . .]^T. The evolution of the angular position, angular rate, and armature current is depicted in Fig. 6-Fig. 8. The simulation figures show that the angular position can track the reference signal despite the transient behavior during the first 5 s, when the persistence-of-excitation signal is turned on. The event-sampling instants in Fig. 9 show that fewer sampling instants are required than under time-triggered control; the proposed event-driven control can save communication resources and control costs greatly.

Note that the trajectories of the estimated critic weights plotted in Fig. 5 and Fig. 10 converge near zero, because the backstepping-based feedforward control works as the main controller while the ADP-based optimal feedback control works as an auxiliary controller. The optimal controller plays an important role during the transient process in Fig. 5 and Fig. 10 (more obviously in Fig. 10). From a theoretical perspective, the optimal controller is constructed to stabilize the tracking errors of the backstepping control, which converge to zero; hence, the critic weights converge near zero.
The simulation results for Example 1 and Example 2 show that the proposed optimal control can guarantee the stability of the closed-loop system and provide excellent control performance in an optimal manner, while limited samples and fewer transmissions are required in the event-driven environment.

VII. CONCLUSION
This paper introduced an adaptive optimal control for nonlinear strict-feedback systems with a generalized ET-learning scheme. To reduce computation and network communication, an event generator is devised. A suitable feedforward control is proposed by the backstepping method, while the optimal feedback control is designed by the ADP technique to stabilize the tracking error dynamics. For the optimal feedback control, a critic network is constructed to approximate the saddle-point equilibrium of the two-player zero-sum differential game. Due to the event-triggering mechanism, the resulting closed-loop system falls into the impulsive-system form. The proposed optimal control strategy not only ensures that all signals in the impulsive closed-loop system are bounded, but also guarantees that the predefined cost function is minimized. Finally, two numerical examples demonstrate the performance of the proposed control algorithm.
In the future, the adaptive optimal control problems for large-scale systems or multi-agent systems in strict-feedback form are worthy of further investigation.
YUEHUI JI received the B.S. and Ph.D. degrees from the School of Electrical Engineering and Automation, Tianjin University, China, in 2009 and 2012, respectively. She is currently working as a Lecturer at the School of Electrical and Electronic Engineering, Tianjin University of Technology, China. Her research interests include nonlinear adaptive control and decentralized control for interconnected systems.