DOBC Based Fully Probability Design for Stochastic System With the Multiplicative Noise

This paper proposes a fully probabilistic control framework for stochastic systems with multiplicative noise and external disturbance. The proposed framework consists of two main components: a disturbance-observer-based compensator that rejects the modelled disturbance, and a fully probabilistic design (FPD) controller that achieves the regulation objective. The disturbance observer is developed within a probabilistic framework based on Bayes' theorem. Compared with conventional FPD, the new framework is extended to deal with multiplicative noise while also improving the performance of the control system by rejecting external disturbances. Convergence analysis of the estimation and control processes is also provided. Finally, a numerical example is given to illustrate the effectiveness of the proposed control method.


I. INTRODUCTION
Many real-world systems are inherently stochastic, affected by external disturbances and noise, and operate under high levels of uncertainty [1]-[4]. Stochastic control methods have therefore attracted much attention in the last few decades, mostly focusing on designing robust controllers that take knowledge about disturbances and uncertainty into consideration. To deal with the effect of disturbances, disturbance-observer-based control (DOBC) strategies have been considered since the 1980s [5]. They have also been successfully applied to various practical systems such as mechatronic systems [6]-[8] and aerospace systems [9]-[11]. The control task in DOBC is usually divided into two subtasks. In the first subtask, a disturbance observer is designed to estimate the disturbance; the estimate is later used to cancel the disturbance's effect on the system dynamics. The second subtask is the design of a controller that achieves the main control objective, such as making the system state follow a predefined desired trajectory or regulating the system state around a fixed point. Under this framework, a large body of literature and research ideas has been investigated. For instance, in [12], a novel control method has been proposed for Markovian jump systems with multiple disturbances by combining DOBC with H∞ control. A Lyapunov-based nonlinear disturbance observer for an unknown two-link manipulator has been studied in [13]. The work in [14] considered a robust variance-constrained composite control problem for linear uncertain discrete-time stochastic systems. Despite these remarkable developments and the intensive research on DOBC, most DOBC-based approaches are presented for continuous-time systems and, more importantly, are developed in a deterministic way. They thus lack consideration of noise effects, which has limited their application to real-world stochastic systems.

(The associate editor coordinating the review of this manuscript and approving it for publication was Feiqi Deng.)
On the other hand, developing more effective control algorithms for stochastic systems with random noise is another critical topic in stochastic systems control. Consequently, a considerable amount of literature has been published on minimising the effect of system noises. Examples include the linear quadratic Gaussian (LQG) method [15], minimum entropy control [1], [2], and H2/H∞ control [16].
To address the above limitations of currently developed DOBC methods, this paper proposes a comprehensive approach for the development of an active control algorithm that takes multiplicative noise into consideration in the derivation of the optimal control law and, at the same time, rejects the effect of any external disturbance acting on the system. The proposed approach follows a fully probabilistic framework where a randomised controller is derived to deal with system noises and uncertainties, and a probabilistic model is used to estimate and reject the effect of the disturbance. The design of the randomised controller is based on the probabilistic description of the closed-loop system: the randomised controller is the minimiser of the Kullback-Leibler divergence (KLD) between the closed-loop description of the system and the desired one. The idea of FPD is not new. It was initially proposed by Karny [17] and then further developed and applied in numerous works. For example, a novel distributed FPD approach is presented in [18] for large, complex, noisy and highly connected systems. In [19], a generalised fully probabilistic controller design was studied for stochastic linear Gaussian systems where the uncertainty introduced by the model discrepancy is estimated as a function of the system inputs. Moreover, [20] and [21] investigated a probabilistic dual heuristic programming (DHP) adaptive critic method to reduce the computational load of FPD caused by the evaluation of the randomised optimal controller, which involves multivariate integration steps. However, current and previous developments of FPD have not considered the class of stochastic systems with multiplicative noise, despite its relevance to various physical systems such as biological movement systems [4].

(IEEE Access, Volume 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/.)
For these systems, more robust controllers need to be developed to minimise the effects of the multiplicative noise [22]. An additional contribution of this paper is therefore to further develop the FPD control method such that it takes the multiplicative noise into consideration in the derivation of the randomised optimal controller. To summarise, this paper considers the regulation problem for a class of stochastic systems with multiplicative noise and external disturbance. For this purpose, a fully probabilistic framework is proposed in which a randomised controller designed following the FPD approach and a probabilistic DOBC are combined. The architecture of the proposed control framework is shown in Fig. 1. As can be seen from this figure, the proposed framework consists of two main components: the disturbance observer compensator, which eliminates the effect of the disturbances on the system dynamics, and the FPD controller, which brings all the system states back to zero while ensuring that they track the desired distribution. Unlike most existing literature on DOBC, the disturbance observer proposed in this paper is developed following a probabilistic approach based on Bayes' theorem, which is more appropriate for stochastic systems. In addition, the FPD procedure is extended in this paper to take multiplicative noise into consideration when deriving the optimal randomised controller.
The remainder of this paper is organised as follows. Section II formulates the problem statement. Section III investigates the disturbance observer design while Section IV describes the FPD controller design. Section V summarises the algorithm of the overall control framework and Section VI provides the convergence analysis. Finally, the proposed algorithm is applied to a numerical example in Section VII to demonstrate its effectiveness, and the conclusion is given in Section VIII.

II. PROBLEM STATEMENT
This paper considers the following class of stochastic linear discrete-time systems,

x_k = A x_{k-1} + B(u_{k-1} + d_{k-1}) + F x_{k-1} v_{k-1},    (1)

where x_k ∈ R^n is the system state, u_k ∈ R^m is the control input, and A, B, and F are parameter matrices with appropriate dimensions. Also, v_k ∈ R is a Gaussian noise with zero mean and covariance Q, and d_k ∈ R^m is an external disturbance which is assumed to be described by the following state-space model,

τ_k = W τ_{k-1} + H δ_k,    (2)
d_k = V τ_k,    (3)

where τ_{k-1} ∈ R^l represents the disturbance state, W, H and V are assumed to be known constant matrices with appropriate dimensions, and δ_k ∼ N(0, R) is a Gaussian random noise. In addition, δ_k and v_k are mutually independent. For stochastic systems where the disturbance term d_{k-1} in Equation (1) is absent, it is usually sufficient to design a single controller that can be optimised to achieve the required performance of the system. However, for stochastic systems affected by external disturbances as in Equation (1), although a single controller might be able to drive the system state in a prespecified required manner, it might not be robust to sudden effects resulting from the external disturbance. Under these conditions, researchers have considered the design of an additional controller whose primary objective is to cancel the disturbance effect. Here, we follow the same approach of designing an additional controller; in addition, since the system described in Equation (1) is not only affected by disturbance but also inherently stochastic, the main controller will be designed following a fully probabilistic approach. To cancel the effect of the disturbance, the secondary controller will be based on an observer that predicts the disturbance acting on the system and then cancels its effect. Because the system is stochastic, we again adhere to the probabilistic framework and design a probabilistic observer based on Bayes' theorem. The design procedure of the controller will be discussed in the following sections, but first we discuss the design process of the probabilistic disturbance observer.
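As a concrete illustration, one step of the system and disturbance models described above can be simulated as follows. All matrix and noise values here are placeholders chosen for the sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# All matrices below are illustrative placeholders, not the paper's values.
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state matrix
B = np.array([[0.0], [1.0]])             # input matrix
F = 0.1 * np.eye(2)                      # multiplicative-noise matrix
W = np.array([[0.9, 0.1], [0.0, 0.85]])  # disturbance-state transition
H = np.array([[0.1], [0.1]])             # disturbance noise channel
V = np.array([[1.0, 0.5]])               # disturbance output map
Q, R = 0.14, 0.23                        # variances of v_k and delta_k

def system_step(x, u, tau):
    """Propagate the disturbance model and the state equation one step."""
    tau_next = W @ tau + H.ravel() * rng.normal(0.0, np.sqrt(R))  # disturbance state
    d = (V @ tau_next).item()                                     # disturbance output
    v = rng.normal(0.0, np.sqrt(Q))                               # multiplicative noise
    x_next = A @ x + B.ravel() * (u + d) + (F @ x) * v
    return x_next, tau_next

x, tau = system_step(np.array([0.5, 0.6]), 0.0, np.array([4.0, 2.0]))
```

Note how the noise enters multiplicatively through F x v: its effect shrinks as the state is regulated towards zero, which is what the extended FPD later exploits.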

III. DISTURBANCE OBSERVER DESIGN
The objective of this section is to discuss the design procedure of the probabilistic observer required to estimate the disturbance d_k defined in Equation (3). As discussed earlier, once the estimate of the disturbance becomes available, its negative is taken as the second control input of the DOBC, thus eliminating the effect of the disturbance on the system dynamics. However, as can be seen from Equation (3), the disturbance d_k can only be observed through its state τ_k. Therefore, an observer of τ_k is developed here instead of d_k. Consequently, the state of the disturbance, τ_k, can be estimated by observing its effect on the system state, x_k, which is assumed to be measurable in this paper. For this purpose we design a fully probabilistic observer based on Bayes' theorem as follows,

P(τ_{k-1} | x_k, x_{k-1}) = P(x_k | τ_{k-1}, x_{k-1}) P(τ_{k-1} | x_{k-1}) / P(x_k | x_{k-1}),    (4)

where P(τ_{k-1} | x_k, x_{k-1}) represents the posterior distribution. Equation (4) implies that, in order to evaluate the posterior distribution of τ_{k-1}, both its prior and the likelihood distributions need to be evaluated. The prior can be obtained by noting that, at time k-1, the probability distribution of τ_{k-1} can be represented as,

P(τ_{k-1} | x_{k-1}) = N(τ̂_{k-1}, P_{k-1}),    (5)

where τ̂_{k-1} and P_{k-1} are the expectation and variance of (τ_{k-1} | x_{k-1}), respectively. They can be easily evaluated from Equation (2) to give,

τ̂_{k-1} = W τ̂_{k-2},  P_{k-1} = W P_{k-2} W^T + H R H^T.    (6)

The likelihood function, on the other hand, can be evaluated by first defining e_k as the error in predicting x_k from τ̂_{k-2} as follows,

e_k = x_k − x̂_k = x_k − A x_{k-1} − B u_{k-1} − B V W τ̂_{k-2},    (7)

where x̂_k is the estimate of x_k obtained using the estimated disturbance state. Given that A, B, x_{k-1}, u_{k-1}, V, W, and τ̂_{k-2} are all known or have been estimated, observing x_k is equivalent to observing e_k. Therefore, the approach followed here to calculate the posterior of τ_{k-1} is to use P(e_k | τ_{k-1}, x_{k-1}) as the likelihood function instead of P(x_k | τ_{k-1}, x_{k-1}). Consequently, Equation (4) can be rewritten in the following form by replacing x_k with e_k,

P(τ_{k-1} | e_k, x_{k-1}) = P(e_k | τ_{k-1}, x_{k-1}) P(τ_{k-1} | x_{k-1}) / P(e_k | x_{k-1}).    (8)

Using Equation (3) in Equation (1) and then substituting the result into Equation (7) yields the following expression for the error e_k,

e_k = B V (τ_{k-1} − W τ̂_{k-2}) + F x_{k-1} v_{k-1}.    (9)

From Equation (9), the likelihood can be represented as,

P(e_k | τ_{k-1}, x_{k-1}) = N(B V (τ_{k-1} − W τ̂_{k-2}), Q F x_{k-1} x_{k-1}^T F^T).    (10)

Having evaluated the prior and the likelihood functions, the posterior P(τ_{k-1} | e_k, x_{k-1}) can then be calculated following Bayes' theorem as follows,

P(τ_{k-1} | e_k, x_{k-1}) ∝ P(e_k | τ_{k-1}, x_{k-1}) P(τ_{k-1} | x_{k-1}).    (11)

However, it is worth noting that the direct evaluation of the posterior from Equation (11) requires heavy effort and high computational cost. Nonetheless, because τ_{k-1} and e_k are jointly Gaussian random variables, the conditional distribution of τ_{k-1} conditioned on e_k and x_{k-1} can be shown to be given by the following proposition.

Proposition 1: The conditional distribution of τ_{k-1} conditioned on e_k and x_{k-1} is Gaussian,

P(τ_{k-1} | e_k, x_{k-1}) = N(τ̄_{k-1}, P̄_{k-1}),    (12)

where,

τ̄_{k-1} = W τ̂_{k-2} + L_{k-1} e_k,    (13)
P̄_{k-1} = P_{k-1} − L_{k-1} B V P_{k-1},    (14)

and where,

L_{k-1} = P_{k-1} V^T B^T (B V P_{k-1} V^T B^T + Q F x_{k-1} x_{k-1}^T F^T)^{-1}.

Proof: The proof of the above proposition is given in the Appendix.
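Proposition 1 is an instance of standard Gaussian conditioning: the posterior mean is the prior mean plus a gain times the innovation, and the posterior covariance is reduced. A small numeric sketch with made-up numbers illustrates the structure:

```python
import numpy as np

# Conditioning a joint Gaussian [Y1; Y2] ~ N([m1; m2], [[S11, S12], [S21, S22]])
# on Y2 = y2. All numbers are illustrative.
m1 = np.array([1.0])
m2 = np.array([0.0, 0.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.5, 0.3]])
S22 = np.array([[1.0, 0.2], [0.2, 1.5]])

y2 = np.array([0.4, -0.1])
gain = S12 @ np.linalg.inv(S22)        # plays the role of the observer gain L
mean_cond = m1 + gain @ (y2 - m2)      # posterior mean: prior + gain * innovation
cov_cond = S11 - gain @ S12.T          # posterior covariance: reduced uncertainty
```

The conditional covariance is always no larger than the prior one, which is why each Bayes update in the observer tightens the estimate of the disturbance state.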

A. ALGORITHM OF THE PROPOSED PROBABILISTIC DISTURBANCE OBSERVER
To summarise the detailed implementation procedure of the proposed probabilistic disturbance observer discussed in Section III, we introduce the following definitions: τ̂⁻_k and P⁻_k for the prior estimate and prior covariance matrix, respectively, and τ̂⁺_{k-1} and P⁺_{k-1} for the posterior estimate and posterior covariance matrix, respectively. Using these definitions, Equations (13) and (14) can be rewritten as follows,

τ̂⁺_{k-1} = τ̂⁻_{k-1} + L_{k-1} e_k,    (15)
P⁺_{k-1} = P⁻_{k-1} − L_{k-1} B V P⁻_{k-1},    (16)

with the prior quantities given by,

τ̂⁻_{k-1} = W τ̂⁺_{k-2},    (17)
P⁻_{k-1} = W P⁺_{k-2} W^T + H R H^T,    (18)

and the observer gain by,

L_{k-1} = P⁻_{k-1} V^T B^T (B V P⁻_{k-1} V^T B^T + Q F x_{k-1} x_{k-1}^T F^T)^{-1}.    (19)

Then the following algorithm can be readily applied:
1) Initialize x_0, u_0, τ̂⁻_0 and P⁻_0;
2) Calculate the prior estimate τ̂⁻_{k-1} using Equation (17);
3) Calculate the prior covariance matrix P⁻_{k-1} from Equation (18);
4) Calculate e_k from the newly obtained x_k following Equation (7);
5) Calculate the observer gain L_{k-1} following Equation (19);
6) Update the posterior estimate τ̂⁺_{k-1} from Equation (15);
7) Update the posterior covariance matrix P⁺_{k-1} according to Equation (16);
8) Move to the next sampling instant, k = k + 1, and return to step 2.

Remark 1: Note that the delay between the measured variable x_k and the latent disturbance state τ_{k-2} is 2, as can be seen from Equations (1), (2) and (3). Therefore, to allow the exploitation of the Kalman filter approach in developing the required disturbance observer, the prior distribution is used to predict the τ values applied to the system, instead of the posterior distribution as in the conventional Kalman filter.
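The eight steps above can be sketched as a short filter loop. The matrices and noise levels below are illustrative, the control input is held at zero, the likelihood covariance follows the state-dependent multiplicative-noise form, and a small jitter term is added for numerical safety.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative matrices (not the paper's). The update mirrors steps 2-8 above.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
F = 0.1 * np.eye(2)
W = np.array([[0.9, 0.1], [0.0, 0.85]])
H = np.array([[0.1], [0.1]])
V = np.array([[1.0, 0.5]])
Q, R = 0.01, 0.01

x = np.array([0.5, 0.6])
tau = np.array([1.0, -1.0])     # true disturbance state
tau_hat = np.zeros(2)           # posterior estimate
P = np.eye(2)                   # posterior covariance

for k in range(200):
    # true system step (disturbance state, disturbance, system state)
    tau = W @ tau + H.ravel() * rng.normal(0.0, np.sqrt(R))
    d = (V @ tau).item()
    x_new = A @ x + B.ravel() * d + (F @ x) * rng.normal(0.0, np.sqrt(Q))
    # steps 2-3: prior prediction
    tau_prior = W @ tau_hat
    P_prior = W @ P @ W.T + R * (H @ H.T)
    # step 4: innovation e_k (control input is zero in this sketch)
    e = x_new - A @ x - B.ravel() * (V @ tau_prior).item()
    # step 5: observer gain; jitter keeps the innovation covariance invertible
    C = B @ V
    S = C @ P_prior @ C.T + Q * np.outer(F @ x, F @ x) + 1e-9 * np.eye(2)
    L = P_prior @ C.T @ np.linalg.inv(S)
    # steps 6-7: posterior update
    tau_hat = tau_prior + L @ e
    P = (np.eye(2) - L @ C) @ P_prior
    x = x_new
```

After a few hundred steps the estimate tracks the (noise-driven) disturbance state closely, mirroring the behaviour reported for the proposed observer.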

IV. CONTROLLER DESIGN
As discussed previously, the control applied to the system needs to be designed such that it cancels the effect of the disturbance on the system dynamics and, at the same time, achieves the control objective, defined in this paper as the regulation of the system state to zero. Therefore, the control input is designed to consist of two parts,

u_{k-1} = u^1_{k-1} + u^2_{k-1},    (21)

where u^2_{k-1} is designed to cancel the disturbance and u^1_{k-1} is designed to achieve the control objective, as discussed in the next sections.

A. DISTURBANCE OBSERVER BASED CONTROL
Once the disturbance observer has estimated the disturbance that affects the system dynamics, the estimate can be used to design a control input that cancels the effect of this disturbance as follows,

u^2_{k-1} = −d̂_{k-1} = −V τ̂⁻_{k-1}.    (22)

Using Equation (21) and Equation (22), the system state given by Equation (1) can be rewritten as follows,

x_k = A x_{k-1} + B u^1_{k-1} + B V ε_{k-1} + F x_{k-1} v_{k-1},    (23)

where,

ε_{k-1} = τ_{k-1} − τ̂⁻_{k-1}    (24)

is the disturbance estimation error. Following the cancellation of the system disturbance using the control input u^2_{k-1} as stated in Equation (22), the other control input, which achieves the control objective, can be designed as discussed in the next section.

B. PROPOSED GENERALISED FULLY PROBABILISTIC CONTROL DESIGN
The control input defined in Equation (22) is designed to cancel the effect of the disturbance on the system dynamics. However, it cannot by itself control the system and make it perform in a prespecified desired manner. This section therefore explains the design procedure of the main controller that achieves this objective. Because of the stochasticity of the system, the main controller u^1_{k-1} is designed here following the fully probabilistic design approach, as discussed earlier.
Using Equation (22) in Equation (1), the system state based on the estimated disturbance from the disturbance observer can be seen to be given by,

x_k = A x_{k-1} + B u^1_{k-1} + B V ε_{k-1} + F x_{k-1} v_{k-1},    (25)

where ε_{k-1}, as defined in Equation (24), is the disturbance estimation error. Since the prior disturbance estimate τ̂⁻_{k-1} is applied to the system in Equation (25), the covariance of ε_{k-1} is the prior covariance P⁻_{k-1} as defined in Equation (18). This means that ε_{k-1} follows the distribution,

ε_{k-1} ∼ N(0, P⁻_{k-1}).    (26)

Consequently, the conditional distribution of the system dynamics of Equation (25) can be described by a Gaussian distribution,

s(x_k | u^1_{k-1}, x_{k-1}) = N(μ_k, Σ_k),    (27)

with mean μ_k and covariance Σ_k given by,

μ_k = A x_{k-1} + B u^1_{k-1},    (28)
Σ_k = B V P⁻_{k-1} V^T B^T + Q F x_{k-1} x_{k-1}^T F^T,    (29)

where P⁻_{k-1} can be evaluated using Equation (18). As can be seen from Equation (25), the system state at time k is affected by random noises, so its value can only be specified completely using pdfs. One efficient method for designing a robust controller under these conditions is the FPD method [17]. However, in its original form, the FPD method was not developed to deal with stochastic systems affected by multiplicative noise such as those considered in this paper. Therefore, the following sections discuss how to extend the conventional FPD method such that it takes the multiplicative noise into consideration in the derivation of the optimal control law. Similarly to the conventional FPD method, we start by defining the Kullback-Leibler divergence (KLD) between the actual joint pdf f(D) of the observed data D = (x(H), u(H)) and the ideal joint pdf f^I(D) as the performance index to be optimised,

D(f || f^I) = ∫ f(D) ln( f(D) / f^I(D) ) dD,    (30)

with H being the control horizon. According to the chain rule for pdfs [23], the joint distribution of the probabilistic closed-loop description of the system dynamics can be evaluated as follows:

f(D) = ∏_{k=1}^{H} s(x_k | u^1_{k-1}, x_{k-1}) c(u^1_{k-1} | x_{k-1}),    (31)

where c(u^1_{k-1} | x_{k-1}) is the actual conditional pdf of the system controller u^1_{k-1}. Similarly, the ideal probabilistic closed-loop pdf can be expressed in the same form as Equation (31) with ideal system model pdf s^I(x_k | u^1_{k-1}, x_{k-1}) and ideal controller pdf c^I(u^1_{k-1} | x_{k-1}),

f^I(D) = ∏_{k=1}^{H} s^I(x_k | u^1_{k-1}, x_{k-1}) c^I(u^1_{k-1} | x_{k-1}).    (32)

Using the Kullback-Leibler divergence (30), the closed-loop joint pdf (31) and the desired closed-loop joint pdf (32), the performance index can be formulated recursively as,

− ln γ(x_{k-1}) = min_{c(u^1_{k-1}|x_{k-1})} ∫ c(u^1_{k-1}|x_{k-1}) s(x_k|u^1_{k-1},x_{k-1}) [ ln( s(x_k|u^1_{k-1},x_{k-1}) c(u^1_{k-1}|x_{k-1}) / ( s^I(x_k|u^1_{k-1},x_{k-1}) c^I(u^1_{k-1}|x_{k-1}) ) ) − ln γ(x_k) ] du^1_{k-1} dx_k,    (33)

where the first term in the brackets of Equation (33) stands for the partial cost while the second term is the expected minimum cost-to-go function. The recursive formulation of the performance index (33) is similar to dynamic programming; the full derivation of Equation (33) can be found in [20]. The minimisation of the performance index (33) yields the following closed-form solution for the required randomised controller u^1_{k-1},

c*(u^1_{k-1} | x_{k-1}) = c^I(u^1_{k-1} | x_{k-1}) exp{−β_1(u^1_{k-1}, x_{k-1}) − β_2(u^1_{k-1}, x_{k-1})} / γ(x_{k-1}),    (34)

where,

γ(x_{k-1}) = ∫ c^I(u^1_{k-1} | x_{k-1}) exp{−β_1(u^1_{k-1}, x_{k-1}) − β_2(u^1_{k-1}, x_{k-1})} du^1_{k-1},
β_1(u^1_{k-1}, x_{k-1}) = ∫ s(x_k | u^1_{k-1}, x_{k-1}) ln( s(x_k | u^1_{k-1}, x_{k-1}) / s^I(x_k | u^1_{k-1}, x_{k-1}) ) dx_k,
β_2(u^1_{k-1}, x_{k-1}) = − ∫ s(x_k | u^1_{k-1}, x_{k-1}) ln γ(x_k) dx_k.    (35)

This is the general solution of the fully probabilistic control design, irrespective of the type of distribution describing the system dynamics or whether the system is linear or nonlinear. The specific FPD solution for linear Gaussian systems with multiplicative noise is derived in the next section, where the multiplicative noise is taken into account in the derivation of the randomised controller.
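For Gaussian closed-loop and ideal pdfs, each stage term of the KLD performance index has a well-known closed form, which is what makes the linear Gaussian FPD tractable. A small sketch of the standard KLD between multivariate Gaussians (all numbers illustrative):

```python
import numpy as np

def kld_gauss(mu0, S0, mu1, S1):
    """KL divergence D(N(mu0, S0) || N(mu1, S1)) for multivariate Gaussians."""
    n = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - n
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# A closed loop whose state pdf is closer to the ideal N(0, Sigma2) scores lower.
Sigma2 = 0.01 * np.eye(2)
far = kld_gauss(np.array([1.0, 1.0]), 0.05 * np.eye(2), np.zeros(2), Sigma2)
near = kld_gauss(np.array([0.1, 0.1]), 0.02 * np.eye(2), np.zeros(2), Sigma2)
```

The divergence vanishes exactly when the two distributions coincide and grows with both mean offset and covariance mismatch, so minimising it drives the closed loop towards the ideal regulation distribution.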

C. GENERALISED PROBABILISTIC CONTROL FOR LINEAR SYSTEMS WITH MULTIPLICATIVE NOISE
Based on the FPD solution described by Equations (34) and (35), the generalised fully probabilistic control solution of the regulation problem for the stochastic linear system defined in Equation (25) with multiplicative noise is derived in this section. In the regulation problem considered here, the objective of the controller is to return the system states to zero from their initial values. Therefore, the ideal distribution of the system described by Equation (25) is specified as,

s^I(x_k | u^1_{k-1}, x_{k-1}) = N(0, Σ_2),    (36)

where Σ_2 is a given covariance matrix. The ideal distribution of the controller is also defined as,

c^I(u^1_{k-1} | x_{k-1}) = N(0, Γ),    (37)

where Γ is the ideal covariance of the control input. Note that this covariance indicates the allowed range of the optimal control input.

Remark 2: Note that Σ_k − Σ_2 is assumed to be a positive definite matrix, reflecting the objective of decreasing the variance of the system and reducing the system randomness. Both Σ_2 and Γ are chosen based on the requirements of the system. Usually, Σ_2 is chosen as small as the required system randomness, and Γ is chosen based on the constraints associated with the cost of the control input.
Before we design the controller, some assumptions and lemmas are given as follows.
Assumption 1: For the considered regulation problem, it is expected that, at steady state, the covariance of the system dynamics Σ_k will become close to the covariance of the specified ideal distribution Σ_2, so that the following inequality holds,

ln(|Σ_k| |Σ_2|^{-1}) ≤ tr(Σ_2^{-1} Σ_k) − n.    (39)

Because of the linearity of the system defined in Equation (25) and the Gaussian form of its probabilistic description, the performance index (33) can be assumed to have a quadratic form, which is described by the following theorem.
Theorem 1: Under Assumption 1, substituting the ideal distribution of the system dynamics (36), the ideal distribution of the controller (37) and the actual distribution of the system dynamics (27) into Equation (35), the performance index (33) can be shown to take the quadratic form given in Equation (40), where S_{k-1} satisfies the Riccati-type recursion in Equation (41), w_{k-1} is given by Equation (42), and the additional term M_2 is defined in Equation (43).

Remark 3: Compared with the conventional FPD, the Riccati equation (41) derived in this work contains an additional term, M_2. This term arises from the consideration of the multiplicative noise in the optimisation of the randomised controller. The derived control solution therefore takes the covariance of the multiplicative noise into consideration and, at the same time, works on making this covariance smaller, since the noise is state dependent and can thus be reduced.

The derivation details of the results given in Equations (40) to (43) are discussed below.
Proof: Recalling Equation (35), we have,

γ(x_{k-1}) = ∫ c^I(u^1_{k-1} | x_{k-1}) exp{−β_1(u^1_{k-1}, x_{k-1}) − β_2(u^1_{k-1}, x_{k-1})} du^1_{k-1}.    (44)

As can be seen from Equation (44), the evaluation of β_1(u^1_{k-1}, x_{k-1}) and β_2(u^1_{k-1}, x_{k-1}) is essential for the calculation of γ(x_{k-1}). Using the second equation in (35), β_1(u^1_{k-1}, x_{k-1}) can be evaluated as in Equation (45); being the KLD between the two Gaussian densities (27) and (36), it contains the term ln(|Σ_k| |Σ_2|^{-1}). By using Lemma 1, this term can be further expressed as in Equation (46), and, based on Lemma 2.6 in [25] and Assumption 1, the following bound holds,

ln(|Σ_k| |Σ_2|^{-1}) ≤ tr(Σ_2^{-1} Σ_k) − n,    (47)

where n is the dimension of x. Then β_1 can be obtained by substituting Equation (47) into Equation (45) and evaluating the integral.

Next, β_2 can be calculated using the third equation in (35) to give Equation (48), where we used the Gaussian expectation identity,

∫ s(x_k | u^1_{k-1}, x_{k-1}) x_k^T S_k x_k dx_k = μ_k^T S_k μ_k + tr(S_k Σ_k),    (49)

and where M_2 is defined in Equation (43).

Therefore, substituting Equation (48) and Equation (49) into Equation (44) and using Equation (37), we obtain Equation (51). By completing the square with respect to u^1_{k-1}, the integral in Equation (51) can be evaluated to give Equation (52). Substituting Equation (52) back into Equation (51), γ(x_{k-1}) is finally obtained as in Equation (53). Equating the quadratic terms in x_{k-1} on the right- and left-hand sides of Equation (53) yields the recursion for S_{k-1} defined in Equation (41), while the constant terms in Equation (53) give w_{k-1} defined in Equation (42). This completes the proof.

Based on the quadratic form of the performance index (53) and the ideal distribution of the controller (37), the optimal controller can be evaluated by substituting Equation (48) and Equation (49) into Equation (34), which yields the following theorem.

Theorem 2: The distribution of the optimal controller for the system (25) that minimises the performance index (53) is the Gaussian given in Equation (54), with mean −K_{k-1} x_{k-1} and covariance (Γ^{-1} + B^T M_k B)^{-1}, where the gain is,

K_{k-1} = (Γ^{-1} + B^T M_k B)^{-1} B^T M_k A,    (55)

and M_k = Σ_2^{-1} + S_k. Substituting S_{k-1} and w_{k-1}, as described in Equation (41) and Equation (42), into Equation (56), the distribution can be expressed as in Equation (57). It can be seen that the distribution given in Equation (57) is the optimal control distribution as specified by Equation (55). End of proof.
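The backward recursion for S_{k-1} can be sketched numerically. Since the explicit displays (41)-(43) are not reproduced here, the sketch below assumes the standard FPD/LQR-type form with M_k = Σ_2^{-1} + S_k and a multiplicative-noise term of the assumed form M_2 = Q F^T M_k F; all matrices and covariances are illustrative placeholders.

```python
import numpy as np

# Backward Riccati-type recursion; the M2 term is the assumed multiplicative-noise
# contribution Q * F^T M F. All matrices are illustrative placeholders.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
F = 0.1 * np.eye(2)
Q = 0.14                                       # multiplicative-noise variance
Sigma2_inv = np.linalg.inv(1e-3 * np.eye(2))   # ideal state precision (Sigma_2^-1)
Gamma_inv = np.array([[1.0 / 0.2]])            # ideal control precision (Gamma^-1)

S = np.zeros((2, 2))                      # terminal condition
for _ in range(300):                      # iterate backwards to (near) convergence
    M = Sigma2_inv + S                    # M_k = Sigma_2^-1 + S_k
    G = np.linalg.inv(Gamma_inv + B.T @ M @ B)
    M2 = Q * F.T @ M @ F                  # assumed form of the extra noise term
    S = A.T @ M @ A - A.T @ M @ B @ G @ B.T @ M @ A + M2

M = Sigma2_inv + S                        # converged M
K = np.linalg.inv(Gamma_inv + B.T @ M @ B) @ B.T @ M @ A   # mean control gain
```

Under these assumptions K plays the role of the gain in the optimal mean control u^1 = −K x, and the extra M2 term slightly inflates S relative to the noise-free FPD, penalising states that amplify the multiplicative noise.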

V. ALGORITHM OF THE PROPOSED FPD CONTROL FRAMEWORK
Following the derivation of the main controller given in Equation (57) and the secondary controller given in Equation (22), the implementation of the proposed fully probabilistic control framework for stochastic systems with multiplicative noise and external disturbance can be summarised in the following algorithm:
1) Initialize the system states, the observer parameters and the FPD Riccati matrix S_0;
2) Run the disturbance observer following Equations (15)-(19) and obtain u^2_k;
3) Evaluate the Riccati matrix S_k using Equation (41);
4) Calculate the FPD controller gain K_k from Equation (54) and obtain u^1_k;
5) Form the control signal u_k using Equation (21);
6) Move to the next sampling instant, k = k + 1, and return to step 2.

A flow chart is given in Figure 2 to help explain the implementation steps of the proposed probabilistic control framework.
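The algorithm above can be sketched end to end in a short closed-loop simulation. Everything numeric here is illustrative: the matrices are placeholders and the fixed gain K stands in for the FPD gain of steps 3-4.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative closed loop: u_k = u1 + u2 with u1 = -K x (FPD mean control)
# and u2 = -V tau_prior (DOBC compensation). K is a fixed assumed gain.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
F = 0.1 * np.eye(2)
W = np.array([[0.9, 0.1], [0.0, 0.85]])
H = np.array([[0.1], [0.1]])
V = np.array([[1.0, 0.5]])
Q, R = 0.01, 0.01
K = np.array([[0.9, 0.85]])     # assumed stabilising gain, stands in for step 4

x = np.array([0.5, 0.6])
tau = np.array([4.0, 2.0])      # large initial disturbance state
tau_hat, P = np.zeros(2), np.eye(2)

for k in range(100):
    # steps 2-5: observer prediction, then combined control input
    tau_prior = W @ tau_hat
    P_prior = W @ P @ W.T + R * (H @ H.T)
    u = (-K @ x - V @ tau_prior).item()            # u1 + u2
    # true system step
    tau = W @ tau + H.ravel() * rng.normal(0.0, np.sqrt(R))
    d = (V @ tau).item()
    x_new = A @ x + B.ravel() * (u + d) + (F @ x) * rng.normal(0.0, np.sqrt(Q))
    # observer correction with the innovation e_k
    e = x_new - A @ x - B.ravel() * (u + (V @ tau_prior).item())
    C = B @ V
    S = C @ P_prior @ C.T + Q * np.outer(F @ x, F @ x) + 1e-9 * np.eye(2)
    L = P_prior @ C.T @ np.linalg.inv(S)
    tau_hat = tau_prior + L @ e
    P = (np.eye(2) - L @ C) @ P_prior
    x = x_new
```

Even with the large initial disturbance, the observer locks onto the disturbance state within a few steps and the regulated state settles near zero, mirroring the behaviour the algorithm is designed to produce.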

VI. CONVERGENCE ANALYSIS
In this section, the convergence of the developed disturbance observer is analysed. For this purpose, the following theorem is introduced.

Theorem 3: If there exists a positive definite symmetric matrix P̃ that makes the inequality in Equation (58) hold, the output of the proposed disturbance observer τ̂⁻_k in Equation (17) will converge to the real disturbance state τ_k.

Proof: The residual error, given by Equation (24), between the output of the proposed disturbance observer τ̂⁻_k and the real disturbance state τ_k can be further expressed as in Equation (62). Define a Lyapunov function Ṽ_k as,

Ṽ_k = ε_k^T P̃ ε_k.    (63)

The increment of this Lyapunov function along the error dynamics is then given by Equation (64). To satisfy the condition in Equation (58), there should exist a small positive number 0 < σ < λ_max(P̃) that makes the inequality in Equation (65) hold, where λ_max(P̃) is the maximum eigenvalue of the matrix P̃. Taking expectations, with E[·] denoting the mathematical expectation, then yields Equation (66). Also, based on Equation (63), we have the inequality,

Ṽ_k ≤ λ_max(P̃) ε_k^T ε_k.    (67)

Combining Equation (66) and Equation (67) gives Equation (68), which yields Equation (69) with θ = 1 − σ/λ_max(P̃) and 0 < θ < 1. The inequality in Equation (70) then follows directly, from which it can be concluded that the estimation error converges in expectation, which completes the proof.

VII. SIMULATION RESULTS
This section demonstrates the effectiveness of the proposed generalised probabilistic framework in controlling, and rejecting the disturbance effect on, a stochastic system with multiplicative noise. The system is described by the stochastic discrete-time dynamical model of Equations (1)-(3), where x_k ∈ R^2 and u_k ∈ R stand for the measurable system state and the controlled input, respectively, d_k ∈ R is the external disturbance, and τ_k ∈ R^2 is the disturbance state. v_k ∈ R and δ_k ∈ R represent independent Gaussian noises with zero mean and variances Q = 0.14 and R = 0.23, respectively. The initial system state is taken to be x_0 = [0.5, 0.6]^T while the initial disturbance state is τ_0 = [4, 2]^T. Moreover, the ideal variance of the controller is set to Γ = 0.2 for a faster convergence speed. The ideal covariance of the state, Σ_2, should be chosen small; the value chosen here is Σ_2 = diag(0.00001, 0.00004).

The simulation results are shown in Fig. 3-Fig. 7. The system states (blue solid lines) are presented in Fig. 3 and Fig. 4. To show the advantage of the DOBC framework, the system states generated without the compensating controller u^2 are also given (red dashed lines) in the same figures. From Fig. 3 and Fig. 4, it can be seen that after k = 20 the values of the system states stay around zero, meaning that the FPD algorithm successfully brings all the states from their initial values back to zero. In addition, compared with the states obtained without u^2, the DOBC-based system states show much less randomness, which indicates that the disturbance-observer-based controller attenuates the effect of the disturbance and that the DOBC framework has achieved the desired performance. Fig. 5 and Fig. 6 show the real disturbance states (blue dashed lines) and the estimated disturbance states (red solid lines). The estimated disturbance states become practically identical to the real disturbance states after the first few steps, which means that the proposed disturbance observer works well. Moreover, Fig. 7 shows the DOBC-based FPD optimal gain, which converges to a steady-state value, indicating the convergence of the whole control process. Hence, these results confirm that the DOBC has been successfully combined with the FPD and that all the desired results have been obtained.

VIII. CONCLUSION
A novel approach has been provided in this paper by combining DOBC and FPD for a class of stochastic systems with multiplicative noise. The control framework is composed of two parts: an anti-disturbance observer that cancels the modelled disturbance in the input channel, and the FPD controller designed to bring all the states back to zero. Both the DOBC and the FPD have been designed based on Bayesian theory. Moreover, the FPD has been extended to deal with systems affected by multiplicative noise through the modification of the Riccati equation. The procedures of the disturbance observer and of the whole control framework have been provided in detail, and the convergence analysis has been given. To verify the proposed control algorithm, simulation results have been produced for a numerical example and the expected results have been obtained. Future work will consider the application of the proposed methodology to real-world systems.

APPENDIX PROOF OF PROPOSITION 1
Proof: To obtain the conditional distribution of Proposition 1, the following lemma [26] is applied.

Lemma 2: Let Y_1 and Y_2 have a joint normal distribution with means μ_1 and μ_2, respectively, and covariance matrix,

Σ = [ Σ_11  Σ_12 ; Σ_21  Σ_22 ].    (75)

Then the conditional distribution of Y_1 given Y_2 = y_2 is Gaussian,

P(Y_1 | Y_2 = y_2) = N(μ_{1|2}, Σ_{1|2}),    (76)

where,

μ_{1|2} = μ_1 + Σ_12 Σ_22^{-1} (y_2 − μ_2),
Σ_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21.    (77)

To apply Lemma 2 to our case, we let Y_1 correspond to e_k and Y_2 correspond to τ_{k-1}. Based on Equation (6), τ_{k-1} conditioned on x_{k-1} has mean W τ̂_{k-2} and covariance W P_{k-2} W^T + H R H^T. Replacing Y_1, Y_2, μ_2 and Σ_22 by e_k, τ_{k-1}, W τ̂_{k-2} and W P_{k-2} W^T + H R H^T, respectively, and using Equation (10), the cross-covariance blocks become,

Σ_21 = Σ_12^T = (W P_{k-2} W^T + H R H^T) V^T B^T,

since Σ is a covariance matrix. Then, by switching the positions of Y_1 and Y_2 in Equation (76), the joint distribution of τ_{k-1} and e_k conditioned on x_{k-1} yields the conditional mean and covariance of τ_{k-1} given e_k stated in Proposition 1.