Robust H∞ Observer-based Reference Tracking Control Design of Nonlinear Stochastic Systems: HJIE-embedded Deep Learning Approach

The robust H∞ observer-based reference tracking control design of nonlinear stochastic systems with external disturbance and measurement noise is a complicated and difficult problem in the control field. The design procedure requires solving a very difficult control-observer-coupled Hamilton Jacobi Isaacs equation (HJIE) for the nonlinear observer and controller. At present, there exists no analytic or numerical method for solving this control-observer-coupled HJIE. A novel HJIE-embedded deep learning approach is proposed as a co-design of a deep learning algorithm and an H∞ observer-based tracking control scheme to directly solve the nonlinear partial differential control-observer-coupled HJIE of the H∞ observer-based reference tracking control design problem of nonlinear stochastic systems. In the off-line training phase, the state estimation error and tracking error are input to the HJIE-embedded deep neural network (DNN), which is expected to output the solution of HJIE. If the output does not yet solve HJIE, the learning error of HJIE is fed back to train the DNN to solve HJIE for the H∞ tracking control law, the observer gain, and the worst-case external disturbance and measurement noise, which are sent back to the nonlinear stochastic system model to replace the external disturbance, measurement noise and estimation error signal for the next training step. The proposed DNN-embedded H∞ observer-based reference tracking scheme can achieve the theoretical H∞ observer-based reference tracking control strategy as the deep learning algorithm converges. If the system is free of external disturbance and measurement noise, the proposed DNN-based H∞ observer-based reference tracking control scheme can achieve stochastically asymptotic state estimation and reference tracking simultaneously.
Finally, a design example of H∞ observer-based reference tracking control for a quadrotor UAV system with external disturbance and output measurement noise is provided to illustrate the design procedure and to validate both the state estimation and the reference tracking performance of the proposed HJIE-embedded DNN-based H∞ observer-based reference tracking control scheme of nonlinear stochastic systems.

Deep neural network (DNN) was inspired by biological neural systems as an information processing model [1]. Through learning from big data, DNN enables us to perform tasks for a large variety of applications. Recently, many powerful big data-driven methods have been developed based on DNN and successfully applied to speech recognition [2], [3], translation of languages [3], [6], image classification [4], [5], etc. Tasks like speech recognition, language translation and image classification are usually simple for the human brain but are still difficult for man-made machines. For these applications, the DNN must be trained as well as possible to fit the given input/output data pairs [7], [8]. Therefore, the big data-driven method has been employed as the current training methodology of deep learning with DNN [9]. Once it has been trained, DNN can respond to never-observed data and make a recognition, translation or classification according to its previously trained knowledge [8], [9].
In the last few years, owing to the development of hardware able to process very large amounts of data, DNN-based learning methods have been significantly improved and widely applied, although they require a very large amount of empirical data to train for a specific optimal performance [7], [9]. At present, the traditional DNN-based deep learning methodologies employ big data-driven approaches since they need a very large amount of empirical data to train the DNN on the system knowledge and behavior [1], [9]. The shortcoming of these traditional big data-driven deep learning methodologies is that they neglect traditional system modeling and a large body of well-developed theoretical results. For example, unlike speech recognition and image classification, stochastic nonlinear system models with external disturbance and measurement noise have been constructed, and the corresponding theoretical robust H∞ state estimation and observer-based output feedback control results of nonlinear stochastic systems have been well developed. However, despite decades of intense research in the field of system control, an analytic or efficient method to solve the resulting design equations is still lacking [10]-[14]. Therefore, it is appealing to apply the deep learning scheme to solve these complicated nonlinear H∞ stochastic observer-based control design problems, rather than only the traditional big data-driven problems of speech recognition and image classification. The theoretical results of robust H∞ state estimation and observer-based output feedback control design of nonlinear stochastic systems under external disturbance and measurement noise could be considered as expert knowledge in the deep learning approach, which can save much training data and time in the DNN-based robust H∞ observer-based control design of nonlinear stochastic systems with external disturbance and output measurement noise.
In the last decades, robust H∞ control strategies have been developed and widely applied to efficiently attenuate the effect of external disturbance on the stabilization performance of nonlinear dynamic systems with uncertain external disturbances [11], [12], [14], [15]. However, the robust H∞ control design of a nonlinear dynamic system with external disturbance needs to solve a very complicated nonlinear partial differential Hamilton Jacobi Isaacs Equation (HJIE), which at present cannot be efficiently solved either analytically or numerically [11], [14], [15]. Therefore, several approximation methods have been proposed that interpolate some local linearized systems, by the fuzzy interpolation method [16], [17], the gain scheduling method [21], [22] and global linearization [18]-[20], to approximate a nonlinear dynamic system so that the HJIE could be approximated by a set of local Riccati-like equations, which can be transformed to a set of linear matrix inequalities (LMIs) [20]. With these interpolation methods, the robust H∞ control design problem of a nonlinear dynamic system with external disturbance is transformed to the problem of solving a set of LMIs with the help of the LMI toolbox in Matlab. The shortcomings of these interpolation methods are as follows: (i) in the process of transforming HJIE to a set of LMIs, inequality bounds are applied several times, leading to a very conservative result; (ii) a specific quadratic Lyapunov function $V(x(t)) = x^T(t) P x(t)$ for some $P = P^T > 0$ is selected as the solution of HJIE, which limits the domain of the solution to HJIE and leads to a conservative result; and (iii) the state feedback control law is the interpolation of $N$ local control laws, so more computation is needed for the control law, especially for complex nonlinear systems.
Further, if the system state $x(t)$ is unavailable and has to be estimated from the output measurement, then a complex interpolatory observer-based output feedback control law must be computed at every time instant. Obviously, the computational load of these H∞ observer-based output feedback control laws is heavy and will prevent their practical application, especially for highly nonlinear systems like quadrotor UAV systems [26].
Recently, a DNN-based H∞ control scheme has been proposed for the stabilization control design problem of nonlinear time-varying dynamic systems with external disturbance [27]. The HJIE of the robust H∞ state feedback control design problem can be directly solved by the DNN-based learning algorithm so that the nonlinear H∞ state feedback controller of a nonlinear time-varying system can be obtained without the interpolation of local linear controllers required by the conventional interpolation methods. However, in practical applications, the state variables of a nonlinear dynamic system are not always all available, and reference tracking control design is more appealing than stabilization. These state variables can only be estimated from the output measurement, which is corrupted by measurement noise. In this situation, H∞ observer-based output feedback control is more suitable for the robust reference tracking control of nonlinear dynamical systems with external disturbance and measurement noise. Therefore, we need to solve a control-observer-coupled HJIE for the robust H∞ observer-based output feedback reference tracking control design problem of nonlinear stochastic systems under external disturbance and measurement noise. To avoid solving the very complicated control-observer-coupled HJIE for the H∞ observer-based output feedback control design, T-S fuzzy observer-based state feedback control was proposed to interpolate a set of $N^2$ local linear observer-based controllers to achieve the robust H∞ estimation and control performance by solving a set of $N^2$ control-observer-coupled LMIs through a so-called two-step design procedure [16]. These local interpolation methods need much effort to solve the robust H∞ observer-based output feedback control problem. Further, much computational time is needed to calculate $N^2$ local observer-based output feedback control laws at every time instant for a nonlinear dynamic system with external disturbance and output measurement noise.
Since output feedback tracking control is more useful for achieving a desired reference tracking design in practical applications, it is appealing to develop a robust H∞ Luenberger observer-based reference tracking control design for nonlinear stochastic systems with output measurement under external disturbance and measurement noise. Therefore, to avoid the design complexity and save the computational load of the above interpolation methods, in this study an HJIE-embedded DNN learning method is proposed as a co-design of an H∞ observer-based output feedback reference tracking control scheme and a DNN learning algorithm for this design problem.
In this work, based on the augmented stochastic estimation error dynamic system and time-varying reference tracking error system, the minmax stochastic H∞ Nash game strategy [15] is employed to minimize the worst-case effect of external disturbance and measurement noise on the state estimation error and reference tracking error. The robust H∞ observer-based output feedback reference tracking control design problem of nonlinear stochastic systems needs to solve an HJIE whose solution $V(\tilde{x}(t), e(t), t)$ is the Lyapunov function of the estimation error $\tilde{x}(t)$ of the observer, the tracking error $e(t)$ of the controller, and time $t$, for the design of the control law $u^*(t)$ and observer gain $L^*(\hat{x}(t))$. The difficulties in the design procedure of H∞ observer-based output feedback reference tracking control of nonlinear stochastic systems comprise: (i) the separation principle of observer and controller does not hold for nonlinear stochastic systems with external disturbance and output measurement noise, so that the HJIE of the H∞ observer-based output feedback reference tracking control design is control-observer-coupled; (ii) the HJIE is also a function of the state $x(t)$, which is unavailable in a nonlinear stochastic system with output measurement. To overcome this difficulty, both the observer dynamic system model and the estimation error dynamic system model are needed to generate $\hat{x}(t)$ and $\tilde{x}(t)$, respectively, to provide $x(t) = \hat{x}(t) + \tilde{x}(t)$ for solving HJIE as shown in Fig. 1; (iii) in the off-line training phase, the external disturbance $v(t)$ and measurement noise $n(t)$ are unavailable to generate the output measurement $y(t)$ by the nonlinear stochastic system model for the observer to estimate the system state. In this study, the worst-case external disturbance and measurement noise $v^*(t)$ and $n^*(t)$ of the H∞ observer-based reference tracking control strategy are employed to replace $v(t)$ and $n(t)$, respectively, and are input to the nonlinear stochastic system model with the H∞ control law $u^*(t)$ to produce the measurement signal $y(t)$ for the off-line training as shown in Fig. 1; (iv) the most difficult task is to solve $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ from HJIE directly for the H∞ control law $u^*(t)$ and observer gain $L^*(\hat{x}(t))$.
In this study, an HJIE-embedded DNN is employed to efficiently approach $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ via the Adam learning algorithm [23]. We can prove that as the learning error of the embedded HJIE under the Adam learning algorithm approaches zero, the DNN with inputs $\tilde{x}(t)$ and $e(t)$ will output $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$, i.e., the HJIE-embedded DNN observer-based reference tracking control scheme in Fig. 1 can approach the H∞ observer-based output feedback reference tracking control design of the nonlinear stochastic system with external disturbance and measurement noise. We have also proved that if the external disturbance and measurement noise are of finite energy, the mean-square asymptotic estimation of the true state (i.e., $\hat{x}(t) \to x(t)$) and the mean-square asymptotic tracking of the desired target $r(t)$ (i.e., $x(t) \to r(t)$) are both guaranteed by the proposed HJIE-embedded DNN observer-based reference tracking control scheme.
In this study, the theoretical control-observer-coupled HJIE of the H∞ observer-based output feedback reference tracking control strategy is employed as expert knowledge of the DNN-based H∞ observer-based reference tracking control scheme. By using the system models and the embedded control-observer-coupled HJIE to train the DNN to approach the H∞ observer-based reference tracking control strategy, we can not only apply the DNN learning scheme to the traditionally complicated H∞ nonlinear estimator-based output feedback reference tracking control design but also significantly reduce the amount of training data and training time compared with conventional deep learning approaches via big data training.
The major contributions of this work are described as follows:
1) A novel DNN-based robust H∞ observer-based reference tracking control scheme is proposed for nonlinear stochastic systems with external disturbance and output measurement noise. The off-line training process of the DNN can be accomplished by a deep learning algorithm that approaches $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ to solve the control-observer-coupled HJIE for the H∞ control law, the H∞ observer gain and the worst-case external disturbance and measurement noise simultaneously, thereby achieving the H∞ observer-based output feedback tracking control design of nonlinear stochastic systems with external disturbance and output measurement noise.
2) We show that the proposed DNN-based output feedback tracking control scheme can approach the theoretical H∞ observer-based output feedback reference tracking control of the nonlinear stochastic system by the Adam learning algorithm. We also prove that if the stochastic nonlinear system is subject to finite-energy external disturbance and measurement noise, the proposed DNN-based robust H∞ observer-based output feedback tracking scheme will achieve mean-square asymptotic state estimation, asymptotic reference tracking and asymptotic stability of the closed-loop nonlinear stochastic observer-based control system simultaneously.
3) With the proposed HJIE-embedded DNN H∞ observer-based output feedback reference tracking control scheme, the traditional nonlinear stochastic system models and theoretical results of robust H∞ observer-based output feedback tracking control can be employed to complement the traditional big data-driven deep learning approaches, with wider applications to nonlinear system control designs. Further, since the system model, observer model and tracking error model are employed to help train the DNN-based H∞ observer-based output feedback reference tracking control scheme, much training data and time can be saved compared with the conventional big data-driven deep learning approaches.
The remainder of this study is organized as follows: In Section II, we discuss the robust H∞ observer-based output feedback tracking control design problem of nonlinear stochastic systems with external disturbance and measurement noise. In Section III, a novel control-observer-coupled HJIE-embedded H∞ DNN observer-based reference tracking scheme is introduced to deal with the output feedback reference tracking control design problem of a nonlinear stochastic system under external disturbance and measurement noise. In Section IV, a numerical simulation of UAV reference tracking control design through output measurement is given to illustrate the design procedure and validate the reference tracking performance. Finally, a conclusion is given in Section V.
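Notation: $A^T$ denotes the transpose of a vector or matrix $A$; $P = P^T \geq 0$ denotes a positive semidefinite matrix; $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ denotes the stacked gradient $\left[ \left(\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial \tilde{x}(t)}\right)^T\ \left(\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial e(t)}\right)^T\ \frac{\partial V(\tilde{x}(t), e(t), t)}{\partial t} \right]^T$; $\mathbb{R}^n$ denotes the set of $n$-tuple real vectors; $L_2[0, \infty)$ denotes the set of real functions $x(t) \in \mathbb{R}^n$ with finite energy.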

II. PROBLEM DESCRIPTION
For decades, H∞ robust control designs have been important research topics with a broad spectrum of application areas and impacts [10]-[15]. In practical applications, external disturbances and measurement noises are unavoidable in real control systems; for example, loadings and environmental interferences are considered as external disturbance, and output measurement noise occurs whenever state variables are unavailable and must be estimated from the output measurement for state feedback control. During past decades, robust H∞ observer-based output feedback control strategies have been developed, from the minmax Nash game point of view, for efficient attenuation of the effect of uncertain external disturbance and measurement noise on the quadratic stabilization and state estimation of nonlinear dynamic systems [16], [17]. In this section, we review the robust H∞ observer-based reference tracking control design of nonlinear stochastic systems with external disturbance and output measurement noise. Consider the following nonlinear stochastic system with external disturbance and output measurement
$$\dot{x}(t) = F(x(t)) + G(x(t))u(t) + D(x(t))v(t), \quad x(0) = x_0$$
$$y(t) = C(x(t)) + n(t) \tag{1}$$
where $x(t) \in \mathbb{R}^n$ is the state vector, $x_0 \in \mathbb{R}^n$ denotes the initial condition, $u(t) \in \mathbb{R}^m$ is the control input, $y(t) \in \mathbb{R}^l$ is the measurement output, and $F(x(t)) \in \mathbb{R}^n$, $G(x(t)) \in \mathbb{R}^{n \times m}$, $C(x(t)) \in \mathbb{R}^l$ and $D(x(t)) \in \mathbb{R}^{n \times k}$ are system functions. These system functions are assumed to be Lipschitz continuous. $v(t) \in \mathbb{R}^k$ and $n(t) \in \mathbb{R}^l$ denote the random external disturbance and measurement noise, respectively. In the nonlinear stochastic system in (1), only the output measurement $y(t)$ is available.
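To make the roles of the system functions concrete, the following minimal Python sketch simulates a scalar system of the form described around (1) with a forward Euler scheme. It assumes the structure $\dot{x} = F(x) + G(x)u + D(x)v$, $y = C(x) + n$; the function `simulate`, its parameters and the example functions are hypothetical, and the Gaussian samples are only stand-ins for $v(t)$ and $n(t)$, not a calibrated noise model.

```python
import math
import random

def simulate(F, G, D, C, u, x0, dt=1e-3, steps=1000, v_std=0.0, n_std=0.0, seed=0):
    """Forward Euler simulation of a scalar system
    x' = F(x) + G(x) u + D(x) v,  y = C(x) + n  (assumed form)."""
    rng = random.Random(seed)
    x, xs, ys = x0, [], []
    for k in range(steps):
        v = rng.gauss(0.0, v_std)   # stand-in for the external disturbance v(t)
        n = rng.gauss(0.0, n_std)   # stand-in for the measurement noise n(t)
        xs.append(x)
        ys.append(C(x) + n)         # only y(t) is available to the designer
        x = x + dt * (F(x) + G(x) * u(k * dt) + D(x) * v)
    return xs, ys, x

# Hypothetical example: F(x) = -x, G = D = 1, C(x) = x, zero input, no noise.
xs, ys, x_final = simulate(lambda x: -x, lambda x: 1.0, lambda x: 1.0,
                           lambda x: x, lambda t: 0.0, x0=1.0)
```

With zero disturbance and noise the example decays like $e^{-t}$, which gives a quick sanity check of the integration step before any controller or observer is attached.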
In this study, we want to control the state $x(t)$ of the nonlinear stochastic system in (1) from the output $y(t)$ to track a desired reference target $r(t)$. Let us denote the reference tracking error as $e(t) = x(t) - r(t)$. Then the reference tracking error dynamic can be described as follows:
$$\dot{e}(t) = \dot{x}(t) - \dot{r}(t) = F(e(t) + r(t)) + G(e(t) + r(t))u(t) + D(e(t) + r(t))v(t) - \dot{r}(t)$$
Then we get the reference tracking error time-varying dynamic equation as
$$\dot{e}(t) = F_e(e(t), t) + G_e(e(t), t)u(t) + D_e(e(t), t)v(t) \tag{4}$$
where $F_e(e(t), t) = F(e(t) + r(t)) - \dot{r}(t)$, $G_e(e(t), t) = G(e(t) + r(t))$ and $D_e(e(t), t) = D(e(t) + r(t))$. The design purpose of this study is to specify a robust state feedback reference tracking control for the nonlinear stochastic system in (1). However, the state $x(t)$ in (1) is unavailable. Therefore, the following Luenberger observer is employed to estimate the state vector for the observer-based output feedback reference tracking control of the nonlinear stochastic system in (1)
$$\dot{\hat{x}}(t) = F(\hat{x}(t)) + G(\hat{x}(t))u(t) + L(\hat{x}(t))\big(y(t) - C(\hat{x}(t))\big)$$
$$u(t) = K(\hat{x}(t), e(t)) \tag{5}$$
where $L(\hat{x}(t)) \in \mathbb{R}^{n \times l}$ denotes the observer gain and $K(\hat{x}(t), e(t))$ is the control gain based on the state estimate $\hat{x}(t)$ and the tracking error $e(t)$.

FIGURE 1: The flow chart of the HJIE-embedded DNN-based H∞ observer-based output feedback reference tracking control scheme of the nonlinear stochastic system in (1) with external disturbance and measurement noise. Since the real random external disturbance $v(t)$ and measurement noise $n(t)$ are unavailable in the off-line training phase, the worst-case external disturbance $v^*(t)$ and measurement noise $n^*(t)$, which are generated from the output $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)]^T}$ of the DNN based on (13), are used to replace the real $v(t)$ and $n(t)$ to generate $y(t)$ by the stochastic system model with control input $u^*(t)$, which is also generated from the output $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)]^T}$ of the DNN based on (14). Then $y(t)$ is input to the Luenberger observer in (5) to generate $\hat{x}(t)$, with the observer gain $L^*(\hat{x}(t))$ generated from the output $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)]^T}$ of the DNN based on (15). Further, $u^*(t)$, $L^*(\hat{x}(t))$ and $[v^{*T}(t)\ n^{*T}(t)]^T$, which are also generated from the DNN output, are input to the estimation error system in (7) to generate the estimation error $\tilde{x}(t)$. Then we can obtain the state $x(t) = \hat{x}(t) + \tilde{x}(t)$ and hence the tracking error $e(t) = x(t) - r(t)$. Finally, the state estimation error $\tilde{x}(t)$ and reference tracking error $e(t)$ are both input to the DNN, which is expected to output $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ after the proposed HJIE-embedded deep learning training by the Adam learning algorithm in the off-line training phase. In the on-line operation phase, the flow chart of the HJIE-embedded DNN-based H∞ observer-based output feedback reference control scheme is similar to that of the off-line training phase, except that the Adam learning algorithm is stopped and $v^*(t)$ and $n^*(t)$ are replaced by the real $v(t)$ and $n(t)$, respectively.
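The behaviour of the Luenberger observer in (5) can be illustrated on a hypothetical scalar plant. The sketch below assumes $F(x) = -x$, $G = 1$, $C(x) = x$, no disturbance or noise, and a constant observer gain `L_gain` instead of the state-dependent gain $L(\hat{x}(t))$ of the paper; all names are illustrative only.

```python
def run_observer(L_gain, x0=1.0, xhat0=0.0, dt=1e-3, steps=1000):
    """Euler simulation of a scalar plant and its Luenberger observer.
    Plant:    x'    = -x + u,              y = x   (noise-free, assumed)
    Observer: xhat' = -xhat + u + L (y - xhat)"""
    x, xhat = x0, xhat0
    for _ in range(steps):
        u = 0.0                      # open loop; the controller is not the point here
        y = x                        # measurement y = C(x)
        dx = -x + u
        dxhat = -xhat + u + L_gain * (y - xhat)   # output-injection correction term
        x, xhat = x + dt * dx, xhat + dt * dxhat
    return x, xhat

x_end, xhat_end = run_observer(L_gain=4.0)
err_with_gain = x_end - xhat_end           # estimation error after 1 s with L = 4

x_ol, xhat_ol = run_observer(L_gain=0.0)
err_without_gain = x_ol - xhat_ol          # estimation error after 1 s with L = 0
```

In this toy case the estimation error obeys $\dot{\tilde{x}} = -(1 + L)\tilde{x}$, so a larger gain shrinks the error faster; with measurement noise present, a larger gain would also inject more noise, which is the trade-off the H∞ design is meant to balance.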
Based on the above analysis, the H∞ observer-based output feedback reference tracking control design problem for the nonlinear stochastic system in (1) is how to specify the control gain $K(\hat{x}(t), e(t))$ and observer gain $L(\hat{x}(t))$ in (5) such that the worst-case effect of the external disturbance $v(t)$ and measurement noise $n(t)$ on the estimation error and reference tracking error is minimized and kept below a prescribed attenuation level $\rho$, as stated in the H∞ performance criterion (6) [11], [15]. In (6), $E\{\cdot\}$ denotes the expectation of $\{\cdot\}$, $t_f$ denotes the terminal time, and the weighting matrices $Q_1 \in \mathbb{R}^{n \times n}$, $Q_2 \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are specified by the control designer as the tradeoff among the estimation error, tracking error and control effort. The physical meaning of (6) is that the worst-case effect of external disturbance and measurement noise on the state estimation error, reference tracking error and control effort should be minimized by the observer-based control $u(t) = K(\hat{x}(t), e(t))$ and observer gain $L(\hat{x}(t))$ and must simultaneously be less than a prescribed level $\rho$.
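As a rough numerical illustration of this kind of criterion, the sketch below computes an empirical energy ratio from sampled scalar trajectories. It assumes the payoff is the weighted energy of $\tilde{x}(t)$, $e(t)$ and $u(t)$ divided by the energy of $v(t)$ and $n(t)$, as described in the text; whether the bound is written with $\rho$ or $\rho^2$ is not settled here, and the function name and arguments are hypothetical.

```python
def attenuation_ratio(x_tilde, e, u, v, n, q1, q2, r, dt):
    """Empirical energy ratio for scalar sampled signals (rectangle rule).
    numerator:   sum of q1*x_tilde^2 + q2*e^2 + r*u^2
    denominator: sum of v^2 + n^2"""
    num = sum(q1 * a * a + q2 * b * b + r * c * c
              for a, b, c in zip(x_tilde, e, u)) * dt
    den = sum(p * p + w * w for p, w in zip(v, n)) * dt
    if den == 0.0:
        raise ValueError("disturbance and noise energy is zero; ratio undefined")
    return num / den

# Tiny hand-checkable example with two samples.
ratio = attenuation_ratio(x_tilde=[1.0, 1.0], e=[0.0, 0.0], u=[0.0, 0.0],
                          v=[1.0, 1.0], n=[1.0, 1.0],
                          q1=2.0, q2=1.0, r=1.0, dt=0.5)
```

A single trajectory only gives a lower bound on the worst-case ratio, since the criterion takes the worst case over all admissible disturbances and noises and an expectation over realizations.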
From (1) and (5), we could obtain the state estimation error equation of $\tilde{x}(t) = x(t) - \hat{x}(t)$ as follows:
$$\dot{\tilde{x}}(t) = F(x(t)) - F(\hat{x}(t)) + \big[G(x(t)) - G(\hat{x}(t))\big]u(t) + D(x(t))v(t) - L(\hat{x}(t))\big[C(x(t)) - C(\hat{x}(t)) + n(t)\big] \tag{7}$$
VOLUME 4, 2016. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
Then the H∞ observer-based output feedback reference tracking control strategy in (6) could be reformulated as (8). To simplify the design procedure, the robust H∞ observer-based output feedback reference tracking control strategy in (8) is treated as the robust H∞ stabilization strategy of the augmented nonlinear stochastic time-varying error system in (9), which combines the estimation error equation in (7) and the tracking error equation in (4). In the robust H∞ stabilization strategy in (8), we assume the initial conditions $\tilde{x}(0)$ and $e(0)$ are both 0. If $\tilde{x}(0) \neq 0$ and $e(0) \neq 0$, the effect of the initial energy $V(\tilde{x}(0), e(0), 0)$ due to the initial condition should be extracted as in (10), where $V(\tilde{x}(t), e(t), t)$ denotes the Lyapunov energy function of the nonlinear augmented stochastic time-varying error system in (9). Based on the above analysis, the complex H∞ observer-based reference tracking control design problem in (6), which involves the nonlinear stochastic system in (1), the tracking error time-varying dynamic in (4) and the observer-based output feedback in (5), becomes how to solve the minmax H∞ stabilization design problem in (10) for the nonlinear augmented stochastic time-varying error system in (9). This reformulation significantly simplifies the control design procedure in the sequel.
In (10), one player, the combined external disturbance and measurement noise $[v^T(t)\ n^T(t)]^T$, wants to maximize the payoff function while the other players $K(\hat{x}(t), e(t))$ and $L(\hat{x}(t))$ try to minimize it. However, it is not easy to directly solve the minmax game problem of the fractional payoff function in (10) for the nonlinear stochastic augmented error system in (9). Therefore, the following indirect two-step method [11] is employed to solve the minmax Nash game problem in (10). Since the disturbance player in the denominator of the payoff function in (10) is independent of $K(\hat{x}(t), e(t))$ and $L(\hat{x}(t))$, the minmax game problem in (10) is equivalent to the constrained minmax Nash quadratic game problem in (11) [11], [15]. The two-step method then solves (11) as follows: (i) in the first step, we solve the minmax Nash quadratic game problem for the payoff $J$; (ii) in the second step, we guarantee $J \leq E\{V(\tilde{x}(0), e(0), 0)\}$. Based on the above two-step indirect method, the robust H∞ observer-based output feedback reference tracking control strategy in (10) for the augmented time-varying stochastic error system in (9) is solved by the following theorem.

Theorem 1. (a) For the nonlinear stochastic system in (1) with external disturbance and output measurement noise, suppose the Luenberger observer-based control law in (5) is employed to achieve the robust H∞ observer-based output feedback reference tracking control strategy in (6), (10) or (11). Then the worst-case external disturbance $v^*(t)$ and measurement noise $n^*(t)$, the H∞ control gain $K^*(\hat{x}(t), e(t))$ and the observer gain $L^*(\hat{x}(t))$ are given by (13), (14) and (15), respectively, where the Lyapunov function $V(\tilde{x}(t), e(t), t) > 0$ with $V(0, 0, t) = 0$ is the solution of the time-varying HJIE in (16). (b) In the nonlinear stochastic system in (1), if the external disturbance $v(t) \in L_2[0, \infty)$ and measurement noise $n(t) \in L_2[0, \infty)$, then asymptotic estimation and tracking will be achieved by the proposed robust H∞ observer-based output feedback control in (5) with the control gain $K^*(\hat{x}(t), e(t))$ and observer gain $L^*(\hat{x}(t))$ in (a), i.e., $\tilde{x}(t) \to 0$ and $e(t) \to 0$ as $t_f \to \infty$. Further, if $r(t) \in L_2[0, \infty)$, then the mean-square asymptotic stability of the closed-loop system is also guaranteed.
Proof. See Appendix A.
From (16), all terms of HJIE contain the estimation error $\tilde{x}(t)$, system state $x(t)$ and tracking error $e(t) = x(t) - r(t)$, i.e., HJIE in (16) is a control-observer-coupled HJIE. It is very difficult to solve HJIE in (16) analytically or numerically, especially when $x(t)$ is unavailable.
Remark 1. In general, for the H∞ observer-based output feedback tracking control design, there are two coupled HJIEs to be solved, one for the observer and another for the controller. However, when the state estimation error and reference tracking error are simultaneously considered in the H∞ observer-based output feedback reference tracking performance in (6) or (8), and the state estimation error equation and reference tracking error equation are combined as the augmented nonlinear error system in (9), we obtain a single control-observer-coupled HJIE in (16) for the H∞ observer-based output feedback tracking control strategy. This simplifies the design procedure of the DNN-based H∞ observer-based reference tracking scheme.
In general, it is very difficult to solve the time-varying nonlinear partial differential equation HJIE in (16) for the H∞ observer-based reference tracking control design of $K^*(\hat{x}(t), e(t))$ in (14) and $L^*(\hat{x}(t))$ in (15) for the nonlinear stochastic system in (1) with external disturbance and measurement noise. In the last decades, several approximation methods, such as the global linearization method [18], [19], the T-S fuzzy interpolation method [16] and the gain scheduling method [21], [22], have been used to interpolate $N$ local stochastic linearized systems to approximate the nonlinear system with output measurement in (1). Then a set of $N^2$ local linear observer-based state feedback controllers is interpolated to approximate the nonlinear observer-based output feedback controller in (5). Based on these interpolation approximation methods and with the choice of a quadratic Lyapunov function, HJIE is approximated by a set of coupled Riccati-like equations, which could be transformed to $N^2$ control gain-observer gain-coupled bilinear matrix inequalities (BMIs) [16]. Finally, a two-step method is used to solve these complex control gain-observer gain-coupled BMIs. The shortcomings of these interpolation methods are as follows. In some practical nonlinear systems, a large number of local models is needed to reduce the approximation error; for example, in the quadrotor UAV H∞ observer-based tracking control design [17], the local control gains $\{K_i\}_{i=1}^{125}$ and observer gains $\{L_j\}_{j=1}^{125}$ must be solved from 15625 $K_i$- and $L_j$-coupled BMIs. Solving these gains from 15625 coupled BMIs by the so-called two-step design procedure makes the robust H∞ fuzzy observer-based quadrotor UAV tracking design a very complex procedure. Further, the T-S fuzzy observer-based state feedback control $u(t)$ of [17] must be computed at every time instant by interpolating the local control laws through the complicated fuzzy interpolation bases $h_i(x(t))$, $i = 1, \ldots, 125$.
Moreover, the selection of a quadratic Lyapunov solution $V(\tilde{x}(t), e(t), t) = [\tilde{x}^T(t)\ e^T(t)]\, P\, [\tilde{x}^T(t)\ e^T(t)]^T$ for HJIE in (16) is very conservative, since it restricts the class of nonlinear solutions $V(\tilde{x}(t), e(t), t)$ admitted by (16).
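The computational burden described above can be sketched in a few lines. The following Python example is a generic T-S fuzzy blend for a scalar state, not the exact control law of [17]: it assumes normalized Gaussian interpolation bases and local gains $K_i$ chosen arbitrarily for illustration, and it counts the $N^2$ coupled conditions for $N = 125$. All function names, centers and gain values are hypothetical.

```python
import math

def fuzzy_bases(x_hat, centers, width=1.0):
    """Normalized Gaussian interpolation bases h_i(x_hat); they sum to 1."""
    raw = [math.exp(-((x_hat - c) / width) ** 2) for c in centers]
    total = sum(raw)
    return [w / total for w in raw]

def fuzzy_control(x_hat, centers, gains):
    """Interpolated control u = sum_i h_i(x_hat) * K_i * x_hat (scalar case)."""
    h = fuzzy_bases(x_hat, centers)
    return sum(h_i * k_i * x_hat for h_i, k_i in zip(h, gains))

N = 125                                   # number of local models, as in the text
coupled_conditions = N * N                # K_i- and L_j-coupled conditions to solve
centers = [-1.0 + 2.0 * i / (N - 1) for i in range(N)]   # illustrative rule centers
gains = [-1.0 - 0.01 * i for i in range(N)]              # illustrative local gains
u = fuzzy_control(0.3, centers, gains)
```

Even in this scalar toy, every control evaluation touches all $N$ bases and gains; with vector states and an interpolated observer gain the per-step cost grows accordingly, which is the on-line burden the DNN-based scheme aims to avoid.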
In the above minmax H∞ observer-based output feedback reference tracking control design problem of the nonlinear stochastic system in (1), the main difficulty lies in how to solve the two partial derivatives $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial \tilde{x}(t)}$ and $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial e(t)}$ from HJIE in (16) for $K^*(\hat{x}(t), e(t))$ in (14) and $L^*(\hat{x}(t))$ in (15). In this study, an HJIE-embedded deep learning approach in Fig. 1 and Fig. 2 will be proposed to directly solve the control-observer-coupled HJIE in (16) for the H∞ observer-based output feedback reference tracking control design problem of the nonlinear stochastic system with external disturbance and output measurement noise in (1).

FIGURE 2: The architecture of the HJIE-embedded DNN, which is trained by the Adam learning algorithm to solve HJIE = 0 in (19) in the off-line training phase. After the HJIE error under the Adam learning algorithm approaches 0, the HJIE-embedded DNN with inputs $\tilde{x}(t)$ and $e(t)$ outputs $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$, from which the H∞ control law $u^*(t)$ in (14), the H∞ observer gain $L^*(\hat{x}(t))$ in (15) and the worst-case $[v^{*T}(t)\ n^{*T}(t)]^T$ in (13) are obtained simultaneously. Then the Luenberger observer in (5) can generate $\hat{x}(t)$ and the estimation error equation in (7) can generate $\tilde{x}(t)$, so that $x(t) = \hat{x}(t) + \tilde{x}(t)$ can be obtained for the tracking error $e(t) = x(t) - r(t)$. Finally, $\tilde{x}(t)$ and $e(t)$ are fed back as inputs to the DNN to begin another cycle of the Adam learning process based on the error $\varepsilon(\theta(t))$ of $\mathrm{HJIE}_\varepsilon$ in (20).

III. HJIE-EMBEDDED DNN-BASED H∞ OBSERVER-BASED OUTPUT FEEDBACK REFERENCE TRACKING CONTROL DESIGN OF NONLINEAR STOCHASTIC SYSTEMS
For the nonlinear stochastic system with random external disturbance and output measurement noise in (1), the minmax H∞ observer-based output feedback reference tracking control strategy in (6) is employed in this study through the observer-based output feedback control in (5) to minimize the worst-case effect of the external disturbance $v^*(t)$ and measurement noise $n^*(t)$ on the state estimation error $\tilde{x}(t)$ and reference tracking error $e(t)$ as well as the control effort $u(t)$ from the energy perspective. From Theorem 1, we need to solve the very complicated time-varying partial differential HJIE in (16) for $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ to obtain $K^*(\hat{x}(t), e(t))$ in (14) and $L^*(\hat{x}(t))$ in (15) for the nonlinear observer-based output feedback control law in (5), and the worst-case external disturbance and measurement noise in (13). In this study, in order to avoid the above complex approximation methods of interpolation through local linearization schemes, we employ an HJIE-embedded DNN to approach $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ directly, instead of solving for $V(\tilde{x}(t), e(t), t)$ as in the conventional methods [14]. The reason is that even if $V(\tilde{x}(t), e(t), t)$ is solved [14], it is still very difficult to calculate $\frac{\partial V(\tilde{x}(t), e(t), t)}{\partial [\tilde{x}^T(t)\ e^T(t)\ t]^T}$ for (13), (14) and (15) in the real-time H∞ observer-based output feedback tracking process.
For the convenience of design, let us denote Then HJIE in (16) can be rewritten as Recently neural network is considered as an universal approximator to any complex function after an efficient deep learning approach [23], [24], [29]. Therefore, in this study, DNN is employed with deep learning approach to solve (14), the worst-case external disturbance and measurement noise v * (t) n * (t) in (13) and (15) as shown in Fig. 1.
In the off-line training phase, since the external disturbance v(t) and measurement noise n(t) are unavailable for the nonlinear stochastic system in (1), the worst-case external disturbance v*(t) and measurement noise n*(t) generated by (13) are employed to replace v(t) and n(t) when generating y(t) from the nonlinear system model with the H∞ control law u*(t) in (14). This replacement does not affect the performance of the H∞ observer-based reference tracking control strategy in (6), because the strategy is designed for the worst-case v*(t) and n*(t) in (6). Since the state variable x(t) of the HJIE in (19) is unavailable too, we need the nonlinear Luenberger observer in (5) to generate x̂(t) and the state estimation error equation in (7) to generate the estimation error x̃(t), so that the system state can be obtained by x(t) = x̂(t) + x̃(t) for F(x(t)), G(x(t)), C(x(t)) and D(x(t)) in (19), and the tracking error can be obtained by e(t) = x(t) − r(t) for F_e(e(t), t), G_e(e(t), t) and D_e(e(t), t) in (19). Then x̃(t) and e(t) are input to the DNN, which is expected to generate ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ after training by the HJIE-embedded Adam learning algorithm. The output of the DNN is sent to the embedded HJIE to check the value of the HJIE in (19). If the DNN output approaches the expected ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ after training, then the HJIE in (19) approaches 0 too. If not, the DNN outputs (∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ)_ε, which leads to the HJIE error ε(θ(t)) in (20). Then ε(θ(t)) is fed back to train the weighting parameters of the neurons in the DNN, as shown in Fig. 2, until the output of the DNN approaches ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ, as shown in the flow chart of Fig. 1. The DNN output is then used to obtain u*(t) by (14) and L*(x̂(t)) by (15).
The H∞ control u*(t), the H∞ observer gain L*(x̂(t)) and the worst-case v*(t) and n*(t) are then sent to the nonlinear system model to generate y(t), to the Luenberger observer in (5) to generate x̂(t), and to the estimation error equation in (7) to generate x̃(t), from which x(t) = x̂(t) + x̃(t) and e(t) = x(t) − r(t) are obtained. Finally, x̃(t) and e(t) are input to the DNN for training, and x̃(t), e(t) and x(t) are also sent to the HJIE to calculate its value ε(θ(t)) for deep learning training, as shown in the flow chart of Fig. 1 for the off-line training phase of the HJIE-embedded DNN-based H∞ observer-based output feedback tracking control scheme of the nonlinear stochastic system with external disturbance and measurement noise in (1).
After the off-line training phase ends with ε(θ(t)) = 0, the proposed HJIE-embedded DNN-based H∞ observer-based reference tracking control scheme is shifted to the on-line operation phase, as shown in Fig. 1. In the on-line operation phase, the output y(t) of the nonlinear stochastic system is generated by the robust H∞ observer-based state feedback control u*(t) in (14) and the real v(t) and n(t). Therefore, we do not need the DNN output ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t)]ᵀ to generate the worst-case v*(t) and n*(t) by (13) in the on-line operation phase. The remaining procedure is similar to the off-line training phase. In the on-line operation phase, we generally do not need to train the DNN again. However, if ε(θ(t)) > δ for a specified small value δ, we can restart the Adam learning algorithm without affecting the operation of the DNN-based H∞ observer-based reference tracking control scheme in the on-line operation phase.
The architecture of the HJIE-embedded DNN in the proposed robust H∞ observer-based output feedback reference tracking control scheme consists of an input layer, multiple hidden layers, an HJIE layer and an output layer, as shown in Fig. 2. The estimation error x̃(t) and reference tracking error e(t) are input to the DNN. The input layer consists of 2n nodes: n nodes for x̃(t) and another n nodes for e(t). The output layer has 2n + 1 nodes: n nodes for ∂V(x̃(t), e(t), t)/∂x̃(t), another n nodes for ∂V(x̃(t), e(t), t)/∂e(t), and one node for ∂V(x̃(t), e(t), t)/∂t.
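The layer dimensions described above can be illustrated with a minimal Python sketch. The helper names `build_dnn_input` and `split_dnn_output` are hypothetical, and the ordering of the output nodes (first ∂V/∂x̃, then ∂V/∂e, then ∂V/∂t) is an assumption made only for illustration; the value n = 12 corresponds to the twelve-state quadrotor UAV of the design example.

```python
def build_dnn_input(x_tilde, e):
    """Stack the estimation error and tracking error into one 2n-vector."""
    if len(x_tilde) != len(e):
        raise ValueError("x_tilde and e must both have length n")
    return list(x_tilde) + list(e)


def split_dnn_output(output, n):
    """Split a (2n+1)-vector into (dV/dx_tilde, dV/de, dV/dt).

    The node ordering is an assumption made for this sketch.
    """
    if len(output) != 2 * n + 1:
        raise ValueError("expected 2n + 1 output nodes")
    dV_dx_tilde = list(output[:n])      # first n nodes
    dV_de = list(output[n:2 * n])       # next n nodes
    dV_dt = output[2 * n]               # last node
    return dV_dx_tilde, dV_de, dV_dt


n = 12  # state dimension of the quadrotor UAV example
dnn_input = build_dnn_input([0.0] * n, [0.0] * n)
dV_dx_tilde, dV_de, dV_dt = split_dnn_output(list(range(2 * n + 1)), n)
```

With n = 12 the input vector has 24 entries and the output vector has 25 entries, which matches the 2n and 2n + 1 node counts stated above.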
The neurons in the hidden layers of the DNN use LeakyReLU σ(x) as the activation function of x. LeakyReLU has a constant nonzero gradient when the input is negative and is the same as ReLU when the input is positive [23]. This keeps the advantage of ReLU and avoids the dead-ReLU problem when the input is negative, i.e., the operation of LeakyReLU is given as follows:
where a_1 and a_2 are constants with a_1, a_2 ∈ (0, 1). The error ε(θ(t)) of the HJIE in (20) is fed back to the DNN to train the weighting parameters and biases of its neurons via the Adam learning algorithm in (21)-(23), which minimizes the objective function ε²(θ_s(t)) [23], [24]. Here θ_s(t) is the weighting parameter vector at the sth training iteration at time t, which is trained so that the DNN outputs (∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ)_ε at time t; l is the learning rate or step size; S is the number of training steps at each time t; and ζ is a small number that prevents the denominator from being 0. m̃_s(t) and ṽ_s(t) are the bias-corrected estimators in (22) [23], and g_s(t) = ∂/∂θ_s(t) (1/B) Σ_B ε²(θ_s(t)) is the gradient vector of the batch mean-square error, i.e., the partial derivative of the objective function at training step s at time t, where B denotes the batch size. µ_1, µ_2 ∈ [0, 1] in (23) determine how strongly the previous steps influence the current direction and are specified by the designer; they play the role of momentum, which helps to avoid being trapped in a local minimum and speeds up the learning process [23], [24]. Based on the bias-corrected estimators in (22) and (23), if the direction of the current gradient g_s(t) is the same as that of the accumulated gradient, the gradient is strengthened; otherwise, it is weakened. µ_1^s and µ_2^s denote the sth powers of µ_1 and µ_2, respectively. m_s(t) and v_s(t) in (23) are the moving averages of the gradient g_s(t) and of its square at time t, respectively.
With ṽ_s(t) in (21), we can take advantage of an adaptive learning rate, i.e., the step should be larger at the beginning and smaller near the minimum. The Adam learning algorithm in (21)-(23) combines the advantages of the above momentum and RMSProp [23], [24] and is found to be an efficient parameter-specific adaptive learning method. Due to its easy implementation and good performance, the Adam learning algorithm is one of the most popular optimizers in recent use, and it is employed here to train the HJIE-embedded DNN to output ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ for solving HJIE = 0 in (19) and generating the H∞ observer gain L*(x̂(t)), the H∞ control u*(t) and the worst-case v*(t) and n*(t) in (15), (14) and (13), respectively, for the HJIE-embedded DNN-based H∞ observer-based output feedback reference tracking control scheme in Fig. 1.
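The update described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Adam rule for a single scalar parameter, not the authors' implementation: the function `adam_step`, the residual `toy_residual` (a hypothetical stand-in for the HJIE error, which vanishes at θ = 3), and the values l = 0.1, µ_1 = 0.9, µ_2 = 0.999 and ζ = 1e-8 are all assumptions chosen for the example.

```python
import math


def adam_step(theta, m, v, grad, s, lr=0.1, mu1=0.9, mu2=0.999, zeta=1e-8):
    """One standard Adam update for a scalar parameter at step s (s >= 1)."""
    m = mu1 * m + (1.0 - mu1) * grad          # moving average of the gradient
    v = mu2 * v + (1.0 - mu2) * grad * grad   # moving average of the squared gradient
    m_hat = m / (1.0 - mu1 ** s)              # bias-corrected first moment
    v_hat = v / (1.0 - mu2 ** s)              # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + zeta)
    return theta, m, v


def toy_residual(theta):
    """Hypothetical stand-in for the HJIE error: zero at theta = 3."""
    return theta - 3.0


theta, m, v = 0.0, 0.0, 0.0
for s in range(1, 1001):
    eps = toy_residual(theta)
    grad = 2.0 * eps                          # d(eps^2)/d(theta) for the toy residual
    theta, m, v = adam_step(theta, m, v, grad, s)
```

In this toy problem the squared residual is driven toward zero, which mirrors the role of ε²(θ_s(t)) as the objective function; in the actual scheme the gradient would be taken with respect to all DNN weights and averaged over a batch.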

Remark 2.
The convergence of the weighting parameter vector θ_s(t) under the Adam learning algorithm in (21)-(23) has been proven in [23]. If the number of hidden neurons and the number of training steps S are large enough, the weighting parameter vector θ_s(t) of the DNN updated by the Adam learning algorithm can converge to a globally optimal θ*_s(t) at a linear convergence rate as s → ∞ in (21)-(23).
During the off-line training process of the HJIE-embedded DNN through the above Adam learning algorithm in (21)-(23) in Fig. 1, the output (∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ)_ε of the DNN is sent to HJIE_ε in (20) to calculate its error ε(θ_s(t)) at the sth training step of the Adam learning algorithm at time t in (21)-(23) as follows:
The error ε(θ_s(t)) of the HJIE is fed back to train the DNN iteratively by the Adam learning algorithm so that it outputs the precise ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ and the worst-case external disturbance and measurement noise. In the following theorem, we prove that as the error ε(θ_s(t)) in (24) approaches 0 under the Adam learning algorithm in (21)-(23), the output (∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ)_ε of the DNN approaches the solution of the HJIE in (16).

Theorem 2. If ε(θ_s(t)) in (24) approaches 0 under the Adam learning algorithm in (21)-(23), then the output of the HJIE-embedded DNN approaches the solution of the HJIE in (16). In this situation, (a) the HJIE-embedded DNN-based robust observer-based output feedback reference tracking scheme in Fig. 1 approaches the theoretical H∞ observer-based output feedback reference tracking control design (13)-(15) in Theorem 1. (b) If v(t), n(t) and r(t) ∈ L2[0, ∞), the mean-square asymptotic estimation, tracking and stability of the closed-loop system are also guaranteed, i.e., x̃(t) → 0, e(t) → 0, u(t) → 0 and x(t) → 0 as t_f → ∞ in the mean-square sense.
Proof. See Appendix B.
From Theorem 2, it can be seen that the proposed HJIE-embedded DNN-based robust observer-based reference tracking scheme trained by the Adam learning algorithm in Fig. 1 approaches the theoretical H∞ observer-based output feedback reference tracking control design in Theorem 1(a) when ε(θ_s(t)) of the HJIE approaches zero after the deep learning process of the Adam learning algorithm in (21)-(23). By Theorem 1(b), if the nonlinear stochastic system in (1) is free of v(t) and n(t), i.e., v(t) = 0 and n(t) = 0, then the minmax H∞ observer-based output feedback reference tracking control achieves asymptotic estimation and reference tracking, i.e., x̃(t) → 0 and e(t) → 0 as t → ∞. Therefore, as ε(θ_s(t)) → 0, the proposed HJIE-embedded DNN-based robust observer-based output feedback control scheme in Fig. 1 can also achieve asymptotic estimation and reference tracking for the nonlinear stochastic system in (1) with v(t) = 0 and n(t) = 0.

Remark 3.
In the off-line training phase, we input the estimation error x̃(t) and the reference tracking error e(t) into the HJIE-embedded DNN, as shown in Fig. 1. According to Theorem 2, we can train the HJIE-embedded DNN to output ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ for the H∞ control u*(t) and the H∞ observer gain L*(x̂(t)) if ε(θ_s(t)) calculated by HJIE_ε approaches 0. In practical applications, however, we stop the off-line training phase in Fig. 1 and shift to the on-line operation phase as soon as |ε(θ_s(t))| ≤ δ for a prescribed small value δ, or when the number of training steps reaches the specified number S in (21). In this study, S = 30 training steps are used in the following design example.
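The stopping rule in this remark (stop when |ε| ≤ δ or after S training steps) can be sketched as follows. The function `train_until_converged`, the scalar residual `toy_residual` and the update `toy_update` are hypothetical placeholders for the HJIE error and one Adam step; only S = 30 is taken from the text, and δ = 0.01 is an assumed value.

```python
def train_until_converged(theta, residual, update, delta, max_steps):
    """Iterate update() until |residual| <= delta or max_steps is reached.

    Returns (theta, steps_used, converged).
    """
    for s in range(max_steps):
        if abs(residual(theta)) <= delta:
            return theta, s, True
        theta = update(theta)
    return theta, max_steps, abs(residual(theta)) <= delta


def toy_residual(theta):
    """Hypothetical stand-in for the HJIE error: zero at theta = 3."""
    return theta - 3.0


def toy_update(theta):
    """Gradient step on eps^2 with step size 0.25, which halves the residual."""
    return theta - 0.25 * 2.0 * toy_residual(theta)


# S = 30 as in the design example; delta = 0.01 is an assumed tolerance.
theta, steps, converged = train_until_converged(
    0.0, toy_residual, toy_update, delta=0.01, max_steps=30
)
# A tighter step budget shows the second exit condition.
_, steps_short, converged_short = train_until_converged(
    0.0, toy_residual, toy_update, delta=0.01, max_steps=5
)
```

The same check can be reused in the on-line operation phase: training is resumed only when the monitored residual exceeds δ.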

Remark 4.
In the on-line operation phase as shown in Fig. 1, based on the weighting parameters θ_s(t) of the DNN trained in the off-line training phase, y(t) is generated by the real physical system of (1) with the real external disturbance v(t) and measurement noise n(t) through the H∞ observer-based output feedback control u*(t). We input x̃(t) and e(t) into the DNN to generate ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ, which produces u*(t) and L*(x̂(t)) for the robust H∞ observer-based reference tracking control of the nonlinear stochastic system in (1). However, if |ε(θ_s(t))| > δ during the on-line operation phase, the weighting parameters θ_s(t) of the DNN can still be updated by the Adam learning algorithm in (21)-(23) without influencing the DNN-based control.

FIGURE 3: The flow chart of the HJIE-embedded DNN-based H∞ observer-based reference tracking control of nonlinear stochastic sample-data system with external disturbance and measurement noise in (25). In the on-line operation phase, the output y(t) is generated by the physical system with the worst-case external disturbance v*(t) and measurement noise n*(t) through the H∞ observer-based control input u*(t). Since x(t), which is needed in calculating the HJIE in (19), is unavailable, we need the Luenberger observer in (26) to generate x̂(t) and the state estimation error equation in (28) to generate x̃(t), so as to obtain x(t) = x̂(t) + x̃(t) for the HJIE in (19) and e(t) = x(t) − r(t) as input to the trained DNN, which outputs ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ to generate L*(x̂(t)) in (15) for the observer and estimation error dynamics and u*(t) = K*(x(t), e(t)) in (14) for the nonlinear stochastic system and estimation error equation.

VOLUME 4, 2016
for nonlinear stochastic continuous system in (1)-(5) is modified as Fig. 3 for nonlinear stochastic sample-data system in (25)-(28). Based on the above analyses of the nonlinear stochastic sample-data system in (25)-(28), the flow chart of the HJIE-embedded DNN-based H∞ observer-based output feedback reference tracking control scheme is modified as in Fig. 3. In the following simulation example of H∞ observer-based reference tracking control of a quadrotor UAV, the nonlinear stochastic sample-data systems in (25)-(28) are employed for the convenience of design in practical applications.
Remark 5. In this study, unlike the conventional big data-driven DNN schemes, the training data x̃(t) and e(t) of the DNN are generated by the system model, the observer model and the estimation error model. Furthermore, the theoretical result of the HJIE for the H∞ observer-based reference tracking control strategy is also employed to train the DNN to achieve robust H∞ state estimation and H∞ reference tracking performance simultaneously, which is not easily achieved by the conventional big data-driven DNN approaches. Therefore, a large amount of training time and training data can be saved for the HJIE-embedded DNN. Moreover, with a DNN well trained over a large number of initial conditions in the off-line training phase, we can also avoid instability of the H∞ observer-based output feedback tracking control of the nonlinear stochastic system in (1) at the beginning of the on-line operation phase.

Remark 6.
The minmax H∞ observer-based output feedback reference tracking control design problem in Theorem 1 reduces to the optimal H2 observer-based output feedback reference tracking control design problem if v(t) = 0 and n(t) = 0 in (1) and ρ² = ∞ in (6), i.e., the robust H∞ observer-based reference tracking control strategy becomes the optimal H2 quadratic observer-based reference tracking control strategy in (29) for the nonlinear system in (1). Then the H2 optimal control gain and observer gain of the observer-based reference tracking control law in (5), which achieve the minimization in (29), are given as follows:
L * (x(t)) = 1 2 ( ∂V (x(t),e(t),t) ∂V (x(t),e(t),t) ∂x(t) 2C T (x(t))
where V(x̃(t), e(t), t) > 0 with the initial condition V(0, 0, t) = 0 is the solution of the following HJIE [13]. Therefore, with a small modification, the proposed HJIE-embedded DNN-based H∞ observer-based control scheme in Fig. 1 and Fig. 3 can also be used to treat the optimal H2 quadratic observer-based output feedback tracking control of the nonlinear stochastic system in (1) without consideration of external disturbance and measurement noise.

IV. SIMULATION EXAMPLE
After the introduction of HJIE-embedded DNN-based robust H ∞ observer-based output feedback reference tracking control design for nonlinear stochastic system with external disturbance and output measurement noise in (1), an H ∞ observer-based output feedback reference tracking design example of quadrotor unmanned aerial vehicle (UAV) is given in this section to validate the reference tracking performance of the proposed scheme in comparison with robust H ∞ T-S fuzzy observer-based output feedback reference tracking control method [16]. The dynamic system of a quadrotor UAV is described by two reference frames including the inertial earth-fixed frame (x e , y e , z e ) and the body-fixed frame (x b , y b , z b ) as shown in Fig. 4.
With the above definitions and notation, the dynamic equation of the quadrotor UAV in Fig. 4 can be written in the form of two subsystems corresponding to translational motion (referring to the position (x(t), y(t), z(t)) of the mass center of the UAV) and angular motion (referring to the attitude (ϕ(t), θ(t), ψ(t)) of the UAV). Based on the Newton-Euler method, the stochastic quadrotor UAV system with external disturbance and output measurement noise can be described as follows [28]:
where x_1(t), y_1(t), z_1(t) ∈ R¹ are the positions of the UAV along the x-, y- and z-axes in the Cartesian coordinate space with respect to the inertial frame; x_2(t), y_2(t), z_2(t) ∈ R¹ are the velocities of the UAV along the x-, y- and z-axes with respect to the inertial frame; ϕ_1(t), θ_1(t), ψ_1(t) ∈ R¹ are the attitudes of the UAV in Euler angles with respect to the body frame; and ϕ_2(t), θ_2(t), ψ_2(t) ∈ R¹ are the angular velocities of the UAV in Euler angles with respect to the body frame. v_x(t), v_y(t), v_z(t) are the external disturbances of the UAV in the three translational dynamics, and v_ϕ(t), v_θ(t), v_ψ(t) are the external disturbances caused by unexpected rotational forces in the roll, pitch and yaw dynamics, respectively. m ∈ R⁺ is the total mass of the UAV; J_x, J_y, J_z ∈ R⁺ are the moments of inertia of the UAV; g ∈ R⁺ is the acceleration of gravity; and K_x, K_y, K_z, K_ϕ, K_θ, K_ψ ∈ R⁺ denote the aerodynamic damping coefficients of the UAV. F(t) ∈ R¹ represents the total thrust, and τ_ϕ(t), τ_θ(t), τ_ψ(t) ∈ R¹ represent the rotational torques caused by the four rotors of the UAV. l ∈ R⁺ denotes the distance between the center of the UAV and the rotor axis, and c ∈ R⁺ denotes the constant force-to-moment factor.
In this observer-based reference tracking design example, let us denote the state vector X(t) of the quadrotor UAV and the desired reference target r(t) to be tracked by the quadrotor UAV as follows:
Then the tracking error between the state X(t) of the UAV and the desired reference target r(t) is defined as
Thus, based on (33), the stochastic quadrotor UAV system with external disturbance and measurement noise can be described as the following nonlinear stochastic system
where X(t) is defined in (34) and
According to Theorem 1, the following H∞ Luenberger observer-based output feedback control is designed to achieve the H∞ observer-based reference tracking strategy in (6) or (10):
dX̂(t)/dt = F(X̂(t)) + G(X̂(t))u(t) + L*(X̂(t))(y(t) − C(X̂(t)))
u(t) = K*(X(t), e(t))
with the H∞ control gain in (14) and the H∞ observer gain L*(X̂(t)) in (15), where the estimation error X̃(t) and reference tracking error e(t) of the quadrotor UAV are generated, respectively, by
dX̃(t)/dt = F̃(X̃(t)) + G̃(X̃(t))u(t) − L*(X̂(t))C̃(X̃(t)) + [D(X(t))  −L*(X̂(t))] v̄(t)
and
ė(t) = F_e(e(t), t) + G_e(e(t), t)u*(t) + D_e(e(t), t)v(t) (39)
where F_e(e(t), t) = F(e(t) + r(t)) − ṙ(t)
In this design example, the parameters of the quadrotor UAV in (33) are given as follows: K_x = K_y = K_z = 0.01 N·s²/rad, J_x = J_y = J_z = 0.1 N·s²/rad, m = 2 kg, l = 1.2 m, K_ϕ = K_θ = K_ψ = 0.01 N·s²/rad, c = 1 and g = 9.8 m/s². The weighting matrices Q_1 = 0.01 diag{2I_6, 5I_6}, Q_2 = 0.01 I_12 and R = 10⁻⁴ I_4 are given in the H∞ observer-based reference tracking control strategy in (6). The sampling time is h = 0.01 s and the terminal time is t_f = 30 s. The random external disturbances are assumed to be Gaussian noises with the probability distribution functions as follows: v N(0, 0.01) and v_ψ(t) ≑ N(0, 0.01), where N(0, σ) denotes the Gaussian distribution with zero mean and standard deviation σ. The measurement noise is distributed as n(t) ≑ N(0, 0.1 I_12). The desired reference position and yaw angle trajectories are given as x_d(t) = 10 sin(0.5t), y_d(t) = 10 cos(0.5t), z_d(t) = t and ψ_d(t) = 0, respectively, and ϕ_d(t), θ(t) are given in (40). This reference is a circle with a radius of 10 meters in the x-y plane that rises at a constant velocity along the z-axis.
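As a small illustration of the reference described above, the following Python sketch samples the desired position and yaw trajectories at the stated sampling time h = 0.01 s up to t_f = 30 s. The function name `reference` and the list `trajectory` are hypothetical, and the roll and pitch references from (40) are not included because (40) is not reproduced here.

```python
import math


def reference(t):
    """Desired position and yaw angle at time t (seconds)."""
    x_d = 10.0 * math.sin(0.5 * t)
    y_d = 10.0 * math.cos(0.5 * t)
    z_d = t
    psi_d = 0.0
    return x_d, y_d, z_d, psi_d


h = 0.01      # sampling time in seconds
t_f = 30.0    # terminal time in seconds
num_steps = round(t_f / h)

# One sample per sampling instant, including t = 0 and t = t_f.
trajectory = [reference(k * h) for k in range(num_steps + 1)]
```

Every sample satisfies x_d² + y_d² = 100, so the horizontal projection stays on the circle of radius 10 m while z_d grows linearly with time.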
According to Theorem 1, the minmax H∞ observer-based reference tracking control design needs to solve the following HJIE for the H∞ control gain K*(X(t), e(t)) in (14) and the H∞ observer gain L*(X̂(t)) in (15) for the observer-based output feedback control in (37) to achieve the robust H∞ observer-based reference tracking control strategy in (6) with ρ² = 2.  [16].
Since it is very difficult to solve for ∂V(X̃(t), e(t), t)/∂[X̃ᵀ(t) eᵀ(t) t]ᵀ in Theorem 1 for the H∞ observer-based output feedback controller in (5) to achieve the H∞ reference tracking of the quadrotor UAV, the proposed HJIE-embedded DNN-based H∞ observer-based reference tracking scheme based on the stochastic sample-data systems in (25)
The desired references and their corresponding attitude trajectories by the proposed H∞ HJIE-embedded DNN-based observer-based tracking control scheme and the H∞ observer-based T-S fuzzy tracking control scheme [16].
an HJIE layer with input ∂V(X̃(t), e(t), t)/∂[X̃ᵀ(t) eᵀ(t) t]ᵀ and output ε(θ(t)) as shown in Fig. 2 , v_θ(t) ≑ 0.01 sin(0.5t) and v_ψ(t) ≑ 0.01 sin(0.5t) is also shown in Fig. 13. In the transient response, the UAV fluctuates under the proposed method due to the initial values of the control inputs.
In a real application of the DNN method, at the beginning of the off-line training phase, we need to randomly select the initial training data near the state estimation error x̃(t) and reference tracking error e(t). This selection significantly affects the training performance. If the domain of initial conditions is limited by the random selection in the off-line training phase, the state x(t) = x̂(t) + x̃(t) and the tracking error e(t) = x(t) − r(t) may be far from the training data during the on-line operation phase. This limits the domain of ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ of the HJIE and causes the fluctuation in the transient response. In this situation, the error ε(θ_k(t)) becomes larger than the small prescribed δ (i.e., |ε(θ_k(t))| > δ), so that we need to restart the Adam learning algorithm to update the weighting parameters of the DNN. The computational complexity of the H∞ HJIE-embedded DNN-based observer-based reference tracking control scheme can be approximated as O(Ln), where L is the number of layers in the DNN and n is the dimension of the state estimation error x̃(t) and reference tracking error e(t). The real H∞ observer-based output feedback reference tracking control performance in (10) is also calculated as follows:
For comparison with the proposed method, the H∞ observer-based T-S fuzzy reference tracking control design in [16] is carried out, and the results are shown in Figs. 5-7 and 11-12 alongside those of the proposed H∞ HJIE-embedded DNN-based observer-based reference tracking control scheme. In the transient response, the simulation results of the H∞ observer-based T-S fuzzy reference tracking control show large fluctuations in the position and attitude except for x(t), y(t) and z(t). Due to the limitation on the number of fuzzy rules, the H∞ observer-based T-S fuzzy reference tracking control design can only be employed to track a reference trajectory of small size.
The way to improve the tracking performance is to increase the number of fuzzy rules. However, the observer-based T-S fuzzy control law dx̂(t)/dt = Σ_{i=1}^{125} Σ_{j=1}^{125} h_i(z(t))h_j(z(t))(A_i x̂(t) + B_i u(t) + L_i(y − C_j x̂(t))) and u(t) = Σ_{i=1}^{125} h_i(z(t))K_i x̂(t) with 125 fuzzy rules is already used in this simulation, and we need to solve 125² BMIs for the fuzzy controller and fuzzy observer in the H∞ observer-based T-S fuzzy control design. If we increase the number of fuzzy rules to improve the performance, more computation is required. In addition, the robust H∞ observer-based T-S fuzzy controller needs to compute the above observer and control law at every time instant. The real tracking control performance of the H∞ observer-based T-S fuzzy reference tracking control design is calculated as ρ* ≈ 7.61, which is poorer than that of the proposed method because of the worse tracking results in the transient response. The computational complexity of the H∞ observer-based T-S fuzzy reference tracking scheme is calculated as O(L′²n), where L′ denotes the number of local systems and n is the dimension of the state estimation error x̃(t) and reference tracking error e(t).

Remark 8.
In the field of observer-based tracking control design, no existing study uses the conventional big data-driven DNN method to deal with this nonlinear observer-based control design problem, because empirical data for training the DNN are unavailable. In general, the conventional DNN relies on a very large number of empirical input/output data pairs to train the weighting parameters of the hidden layers by an optimizer from the big data perspective. However, in the nonlinear H∞ observer-based reference tracking control design problem, x(t) is unavailable and needs to be obtained from x(t) = x̂(t) + x̃(t). Therefore, we cannot compare with the conventional big data-driven DNN method in nonlinear H∞ observer-based reference tracking control design. Instead, we use the H∞ observer-based T-S fuzzy reference tracking control design to compare the tracking control performance with the proposed H∞ HJIE-embedded DNN-based observer-based reference tracking control scheme.

V. CONCLUSION
In this study, a novel HJIE-embedded DNN-based H∞ observer-based control scheme is proposed to directly solve for ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ of the nonlinear partial differential HJIE in the robust H∞ observer-based reference tracking control design problem of nonlinear stochastic systems under external disturbance and measurement noise. To overcome the unavailability of the external disturbance v(t) and measurement noise n(t) for the system model to generate training data, and the unavailability of x(t) and e(t) = x(t) − r(t) in the HJIE, we used v*(t) and n*(t) to replace v(t) and n(t) without influencing the H∞ estimation and control, and we employed the estimation dynamics to generate x̂(t) and the estimation error model to generate x̃(t). In this way we could obtain x(t) = x̂(t) + x̃(t) and the tracking error e(t) = x(t) − r(t), and then calculate the error HJIE_ε = ε(θ(t)) in (20) for the Adam learning algorithm to train the DNN to solve for ∂V(x̃(t), e(t), t)/∂[x̃ᵀ(t) eᵀ(t) t]ᵀ of the HJIE for the H∞ observer-based reference tracking control design. We have proven that if the approximation error of the HJIE approaches 0 by the DNN through the Adam learning algorithm, the proposed DNN-based estimator-based reference tracking control scheme can achieve the H∞ estimator-based reference tracking control strategy, unlike the conventional big data-driven DNN, which is used only for classification and recognition. Based on system modeling and the theoretical result of the HJIE for nonlinear H∞ estimator-based reference tracking control of nonlinear stochastic systems, the proposed DNN-based scheme can efficiently achieve nonlinear H∞ estimation and reference tracking control design simultaneously with a large saving of training data and time.
Finally, a design example of H∞ observer-based reference tracking control for a quadrotor UAV system is given to illustrate the design procedure and validate the effectiveness of the proposed method in comparison with the conventional robust H∞ T-S fuzzy observer-based tracking control of nonlinear dynamic systems.

Appendix A: Proof of Theorem 1
By the indirect method, the minmax H∞ observer-based reference tracking strategy in (10) is transformed into the equivalent constrained minmax Nash quadratic game strategy in (11). Based on the two-step method, in the first step we need to solve the minmax Nash quadratic game problem in (12), and in the second step we need to guarantee J ≤ E{V(x̃(0), e(0), 0)}. Therefore, Theorem 1 is proven as follows:
(a)(i) Step 1: From (12), we get J = min , e(t), t))dt} (A1)
By the fact that V(x̃(t), e(t), t) is the Lyapunov function of the augmented time-varying nonlinear stochastic error system in
Substituting Ḡ(x(t), e(t), t) = G (x(t)) G e (e(t), t) and (A2) into (A1), we get
From (A4), it is seen that the worst-case v*(t) and n*(t) in (13) and the optimal u*(t) = K(x(t), e(t)) in (14) make the involved quadratic terms zero and achieve min u(t)=K(x(t),e(t)) ) T F (x(t)) − L(x(t))C(x(t)) +( ∂V (x(t),e(t),t)