Collaborative Estimation of State and Guidance Parameter for Interceptor Based on Variational Bayesian Technique

In this paper, we study the state estimation problem with an unknown guidance parameter of the interceptor, where the guidance parameter is modeled by normal-gamma distribution. To solve the problem, we propose a variational Bayesian (VB) based collaborative estimation algorithm for state and parameters, where the joint posterior distribution (JPD) of state and guidance parameter is approximated by a free-form distribution. The proposed algorithm can be divided into two-stage iterative steps: in variational Bayesian expectation (VB-E) step, with guidance parameter fixed, the cubature Kalman filter (CKF) is employed to realize state estimation. In variational Bayesian maximum (VB-M) step, the statistical characteristics of the guidance parameter are then deduced with state fixed. The state and the guidance parameter can be effectively estimated by performing VB-E and VB-M steps recursively. Finally, we illustrate the effectiveness of the proposed algorithm by a collaborative estimation problem in the two-dimensional aerial engagement scenario.


I. INTRODUCTION
Along with the development of guidance technology, the interceptor brings significant challenges to the survival of aircraft [1]- [3]. Correspondingly, a considerable effort has been made to study target-evasion guidance laws in an attempt to increase the survivability of aircraft. But unfortunately, the prior information about interceptor's guidance laws, guidance parameters (e.g., navigation constants), and states, which are required for the application of these target-evasion guidance laws, are seldom available in practice [4]- [6]. Therefore, during the aerial engagement, it is necessary for our aircraft to obtain guidance laws, guidance parameters, and states of the enemy interceptor by measurements.
For the sake of brevity and generality, in the following content, the enemy interceptor is called the pursuer; our aircraft is called the evader.
The associate editor coordinating the review of this manuscript and approving it for publication was Liang Hu . Considering the uncertainty caused by system models and parameters, the multiple-model adaptive estimator (MMAE) approach is usually employed to solve the above problems [7]. Generally, MMAE approach is based on the assumption that the dynamic model of pursuer belongs to a closed set of models, each of which represents a kind of guidance law with known guidance parameter. For example, the model set in [2] was composed of proportional navigation (PN) guidance law with different navigation constants. In [8], the model set was composed of PN guidance law, augmented PN (APN) guidance law, and optimal guidance law (OGL). In [9], the model set was composed of pure proportional navigation (PPN) and impact angle control (IAC) guidance law. However, once the dynamic model of pursuer is more complex than the preselected model set, the state estimation error will be too large to meet the requirements of practical application [7]. In addition, as the scale of the model set increases, the computational consumption will be too heavy to satisfy the requirements of online estimation [7].
To overcome the drawback of MMAE approach, some researchers make an assumption that the pursuer adopts VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ PN guidance law with unknown guidance parameter, for PN guidance law has been widely used in active air defense weapon system [10], and moreover, other guidance laws such as APN, ideal PN (IPN) and OGL can be approximated by PN guidance law with different guidance parameter [8], [11]. In this way, the identification problem of guidance laws and guidance parameters is transformed into the problem of collaborative estimation of state and guidance parameter. Currently, a strategy to add the parameter as a part of the state variable is widely used to estimate the state and parameter simultaneously. For example, in [12], [13], the state-augmentation (SA) approaches were proposed to estimate the pursuer's velocity, position and guidance parameter simultaneously, in which the state vectors were augmented to include the guidance parameter. The drawback of SA approach is that the changes of state dimension may change the observability of the original system and may lead to filter divergence [14]. Furthermore, once the system becomes unobservable, it is necessary to obtain additional measurement or prior information of the interceptor, such as accurate launch point, noise-free position and speed [12], [13], [15]. However, it is difficult to achieve in practice.
In recent years, Bayesian technology has been widely used in machine learning, state estimation and prediction applications due to its advantage of providing probability density function (PDF) [16]- [19]. Such PDFs can be obtained via statistical inference, Bayesian analysis, or can be the result of some recursive Bayesian methods with better robustness [20], [21]. In addition, variational Bayesian (VB) based method provides an effective method from the perspective of probability density function approximation, that is, to transform the cooperative estimation problem into the JPD approximation problem [22]- [26]. Different from the general description of the collaborative estimation problem [27], [28], the VB based method supplies an information-theoretic perspective and provides an analytical approximation to the posterior probability of the unknown parameters [25], which has benefits of low computational cost and analytic tractable [26]. For example, VB filters were employed to approximate the JPD of the state and unknown noise parameters, where the noises were modeled by Inverse-Gamma distribution [23], Gaussian mixture distribution [24] and Wishart distribution [25], respectively. However, unlike the independence of noise and state, the guidance parameter and state are tightly coupled, which makes the above method ineffective for our problem.
The objective of this paper is to design a filter algorithm for collaborative estimation of state and guidance parameter. The main idea behind the proposed algorithm is that the JPD of state and guidance parameter can be approximated by VB technique. We show that the state and guidance parameter can be estimated by performing VB-E and VB-M steps recursively. The main contribution of this paper can be summarized as follows: 1) We proposed a framework for collaborative estimation of state and guidance parameter by VB method. Different with MMAE [8], [9] and SA [12], [13] approaches, the proposed method does not change the observability of the original system, and can effectively reduce the computational complexity. 2) By modeling the unknown guidance parameter as normal-gamma distribution with unknown parameters, we propose a VB based adaptive filter to approximate the JPD by a free-form distribution. 3) Considering the strong nonlinearity of interceptor and the tight coupling between the state and guidance parameter, a variational Bayesian based cubature Kalman filter (VB-CKF) is proposed, and the detail steps of VB-E and VB-M are deduced. The rest of this paper is organized as follows. The problem formulation is given in section II. VB-based cubature Kalman filter is investigated in section III. Simulation results are shown in section IV. Finally, the concluding remarks can be found in section V.
Notations: Throughout the paper, R n represents the n-dimensional Euclidean space and R n×m represents the set of all n × m real matrices. diag{·} represents the diagonalization of block or scalar elements. exp{·} represents the exponential operation. N(x; µ, P) denotes the Gaussian distribution with mean µ and covariance P. NG(µ N ,k , τ N ,k ; Θ N ,k ) denotes the normal-gamma distribution with mean µ N ,k , precision τ N ,k , and parameter set Θ N ,k = {µ N ,k , λ N ,k , α N ,k , β N ,k }. E{·} denotes the expectation operation, and E p [f (·)] denotes an expectation of a function f (·) with respect to the p distribution.

II. PROBLEM FORMULATION
In this section, we first consider a two-dimensional aerial engagement scenario, and give the dynamic model of pursuer, then formulate a joint posterior distribution estimation model of state and guidance parameter based on Bayesian framework, finally formulate the problem under VB structure.

A. DYNAMIC MODEL OF THE PURSUER
Consider a two-dimensional aerial engagement geometry in Fig. 1. Here we have an inertial coordinate system fixed to the surface of a flat-Earth model (i.e., the x-axis and y-axis are the downrange and the altitude, respectively). For simplicity, the pursuer and the evader are assumed to be material point, and gravitational and drag effects have been neglected. To describe the problem better, the relevant state variable are defined primarily, as shown in Tab. 1.
The dynamic model of the pursuer can be expressed as where the acceleration command n c is determined by the guidance law and guidance parameter. Different guidance  laws represent different coupling relations between parameter and state. Similar to [8], [10], [11], we here assume that the pursuer intercepts the evader with PN guidance law, i.e., where N is the unknown guidance parameter and V c is the IM-BM closing velocity. Denote ∈ R 6 , and discrete (1), the dynamic model of the pursuer can be expressed as where N k ∈ R refers to guidance parameter at k, and w k ∈ R 6 is Gaussian noises with zero mean and variance Q k [7]. It should be noticed that we do not have prior knowledge about guidance parameter N k . Consequently, we consider it an unknown parameter in the dynamic system, and estimate the state and guidance parameter simultaneously.

B. JOINT POSTERIOR DISTRIBUTION ESTIMATION MODEL
In the interception problem, the evader needs to detect and track the pursuer through sensors. At present, there are many ways to detect the pursuer, such as radar ranging, infrared angular measurement, Doppler velocity measurement, and so on. Here we assume that the evader can obtain the relative elevation angle α between the pursuer and itself, which can be determined by where R PEx = R Px −R Ex and R PEy = R Py −R Ey . The discrete measurement equation can be written as where y k ∈ R 1 and v k is Gaussian measurement noises with zero mean and variance R k [7]. The relationship between state x k and measurement y k can be described by likelihood function p(y k |x k ). If guidance parameter N k can be obtained in advance, traditional Bayesian filter can compute the posterior distribution p(x k |y k ) through the following two steps: However, in practice, the guidance parameter N k is difficult to get exactly. Therefore, our aim is to develop a estimate algorithm to obtain the JPD p(x k , N k |y k ) with unknown guidance parameter N k .
Remark 2.1: Different from the existing joint estimation methods, we use the JPD p(x k , N k |y k ) to formulate the collaborative estimate problem of state x k and parameter N k , which has the following advantages: 1) Compared with the state-augmentation methods proposed in [12], [13], the JPD estimation model does not change the state dimension of the original system. Therefore, it is not necessary to analyze the observability of the system again, nor to add additional measurements. 2) In [2], [9], [29], the identification result was one element of the initial guidance parameter set, which was composed of discrete prior guidance parameters. By contrast, the JPD estimation model can directly estimate parameters in continuous domain without accurate prior information. 3) In [2], [9], [29], the computation complexity of multi-model methods is greatly increasing due to the interaction and mixing steps. By contrast, JPD estimation model only requires one model to characterize the pursuer motion, which effectively reduces the computation. Due to VB is a statistical machine learning method to approximate posterior distribution, which has benefits of low computational cost and tractable analytic [26], [30]. In what follows, we will formulate how to approximate the JPD p(x k , N k |y k ) in our problem with VB method.

C. VB APPROXIMATION
Since the normal-gamma distribution is the conjugate prior of a normal distribution with unknown mean and precision, we model the unknown guidance parameter as Our aim is to design an adaptive filter with unknown guidance parameter N k to estimate the JPD p(x k , N k |y k ). Since state x k and parameter N k are conditional independent for given measurement y k , the posterior distribution of x k−1 and N k−1 conditional on y k−1 can be written as The posterior distribution at time k will remain the same if the prediction distribution has the same form with (8). According to Chapman-Kolmogorov equation, the prediction distribution of x k and N k can be given as The predictionx k andP k can be approximated by traditional Bayesian filter (6). Since N is a constant, we arrive at the following dynamic model, Based on VB, we approximate the JPD of x k and N k by a free-form variational distribution q(x k , N k ) as follows, where q(x k ) and q(N k ) follow the Gaussian and normal-gamma distribution, respectively. The VB approximation can be achieved by minimizing the Kullback-Leibler (KL) divergence between p(x k , N k |y k ) and q(x k , N k ), wherec 0 is the constant term which is independent of the variational distribution q(x k , N k ), and is the evidence of lower bound (ELBO). Then our problem can be converted to As is shown in Fig. 2, the VB method solves the problem (15) by alternating between a VB Expectation (VB-E) step and a VB Maximization (VB-M) step. In VB-M step, the state is set to maximize the lower bound L(q(x k , N k )) concerning q(N k ) given the statistics gathered from the VB-E step. In VB-E step, the hidden variable variational posterior is set to minimize KL(q(x k , N k )||p(x k , N k |y k )), subject to q(N k ) lying in the family of constrained distributions. Mathematically, update equations can be written as:

III. VB BASED CUBATURE KALMAN FILTER
In this section, we first discuss the solutions of problem (16) and (17), then propose a VB based cubature Kalman filter algorithm (VB-CKF). In order to simplify the description, q 1 , q 2 , and q are used to denote q(x k ), q(N k ), and q(x k , N k ), respectively.

A. VB-M
In VB-M step, denote q * 1 is one solution to problem (16), i.e., q * 1 = arg max Since the ELBO can be written as Fixing q 2 , we have According to Bayesian rule L(q) can be expressed as In this paper, the exponential family distributions [31] are used to help us solve the problem. Specifically, the exponential family forms of p(x k |y k , N k ) and q(x k ) can be written as where η 1 , η 2 are natural parameters, and T (x k ) represents sufficient statistics, and A g (η 1 ), A g (η 2 ) are log-partition functions. For a Gaussian distribution N(x; µ, ), the natural parameter of N(x; µ, ) is −1 µ − 1 2 −1 , and corresponding sufficient statistics is x xx T [31]. The following Theorem shows the solution of VB-M (16).
Theorem 3.1: Denote q(x k ) is one solution to problem (16) with fixed q(N k ), and η 2 is the natural parameter of q(x k ). Then the optimal natural parameter of q(x k ) is given by η * 2 = η 1 , where η 1 is the natural parameter of p(x k |y k , N k ). Proof: Substituting (24) and (25) into (21), we have Recalling the property of exponential family distribution E[T (x k )] = Ag(η) [32], by taking derivative of L(q) with respect to η 2 , we have The optimal solution η * 2 is achieved by zeroing out η 2 L(q), i.e. by taking η * 2 = η 1 via (27).
is the natural parameter of p(x k |y k , N k ), which can be obtained by traditional Bayesian filter with fixed N k . In this paper, we provide a cubature rule based approximation to evaluate η * 2 . The details are shown in A.
Remark 3.1: Statex k and covariance P k can be approximated by Bayesian filter [33]. Different non-linear filter algorithms [33]- [35] are obtained by different choices for the Gaussian integral approximations, for example, Gauss-Hermite Kalman filter (GHKF) [33], the unscented Kalman filter (UKF) [34], and cubature Kalman filter (CKF) [35]. In this paper,x k and P k are obtained by CKF, because it has the advantages of convenient parameter selection and good convergence effect [35].

B. VB-E
Next we need to discuss the solution of problem (17). Similarly, denote q * 2 is one solution to problem (17), i.e., q * 2 = arg max where According to Bayesian rule Substituting (30) into (29), we have Similarly, we still use exponential distribution families to solve this optimization problem. Specifically, the exponential family forms of p(x k |N k , y k−1 ), p(N k |y k−1 ) and q(N k ) can be written as Theorem 3.2: Denote q(N k ) is one solution to problem (17) with fixed q(x k ), and η 5 is the natural parameter of q(N k ). Then the optimal natural parameter of q(N k ) is given by η * 5 = E q 1 [η 3 ] + η 4 , where η 3 and η 4 are the natural parameters of p(x k |N k , y k−1 ) and p(N k |y k−1 ), respectively. VOLUME 8, 2020 Proof: Substituting (32), (33), and (34) into (31), we have Recalling the property E[T (x k )] = Ag(η), by taking derivative of L(q) with respect to η 5 , we have The optimal solution η * 5 is achieved by letting η 5 L(q) = 0, that is η * 5 = E q 1 [η 3 ] + η 4 . Notice that E q 1 [η 3 ] can be obtained by the dynamic model and η 4 is the natural parameter of NG(µ N ,k , τ N ,k ; Θ N ,k ). The details are shown in B.
Theorems 3.1 and 3.2 provided an overall framework for provided an overall framework for collaborative estimation of state and guidance parameter based on variational Bayesian technique, which is shown in Fig. 3. We summarize VB based cubature Kalman filter (VB-CKF) at time k as shown in Algorithm 1, with a prior informationx k−1|k−1 , P k−1|k−1 and Θ N ,k−1 , and new measurement y k . Also, to improve the readability, we give the flow chart of the algorithm at time k, as shown in Fig. 4.
Clearly, in VB-E step,the traditional CKF is employed to get the statex  respectively [36], [37]. On the other hand, the operations in VB-M step are composed of a few simple addition and multiplication, and the computational complexity is far less than O(n 3 ) + O(l 3 ). Therefore, the computational complexity of our algorithm only relies on the number of VB iterations S. In practical application, S is usually between 1 and 5, which can ensure the fast convergence of the algorithm but not cause too much computational consumption [22], [24].

IV. SIMULATION ANALYSIS
In this section, we illustrate the effectiveness of the proposed VB-CKF for collaborative estimation of state and guidance parameter. Unless noted otherwise all the numerical results shown below are generated using a computer with Intel Core i7 − 8700 @3.20 GHz CPU, 8.0 GB memory and Windows operation system. The simulation environment is MATALB R2015a.

A. SIMULATION SCENARIO
Here we consider a two-dimensional aerial engagement scenario as shown in Fig. 1. It is assumed that the pursuer can accurately obtain the state of the evader and it adopts PN guidance law. The dynamic model of the pursuer is shown as (1). In addition, the dynamic model of the evader is given as where V Ex , V Ey are the components of evader velocity V E , and a Ex , a Ey are the components of evader acceleration n E , i.e., Similar to [38], we assume that the pursuer launches from the origin of the coordinate system at t 0 = 0s, i.e.,
Figs. 6 and 7 are estimated trajectory and guidance parameter estimation of the pursuer, respectively, which intuitively show that our proposed algorithm can solve the collaborative estimation problem. The performances of the VB-CKF are compared with following methods: VOLUME 8, 2020 • SAKF: State-augmentation Kalman filter in [12]. Simulation results are obtained over Nment = 500 Monte Carlo trials.
Consider the mean square estimation errors (MSE) shown as follows, where MSE is widely used for estimating performances of filters. In order to compare the estimation errors of guidance parameter, we define the total estimation errors of N k as In Figs. 8, 9, and 10, the MSE and TEEN among VB-CKF, MMKF and SAKF are compared respectively. In Table 2, the MSE, TEEN, and average CPU times of all tested methods   are listed. As the estimation error of SCKF is better than that of other methods, the error curves of VB-CKF, MMKF1 and MMKF2 in Figs. 8, 9, and 10 are too concentrated to distinguish. To clearly show the error of different methods, we add a local enlarged image in Figs. 8, 9, and 10. Obviously, the MSE of SAKF can not converge. The reason for this might be the fact that SAKF increases the dimension of state, which make the original system become locally weakly observable [12]. In contrast, our proposed algorithm does not change the observability of the original system, which means that no additional measurements are necessary to ensure the accuracy of estimation. As expected, the performance of MMKF2 is significantly better than that of MMKF1, due to the fact that the prior parameter set of MMKF2 contains true guidance parameter. However, It is worth to be pointed out that the state estimation accuracy of VB-CKF is the highest (see Figs. 8 and 9) even if the initial value of guidance parameter is set randomly. Furthermore, because a set of Kalman filters is run in parallel in the MMAE approach [7], our proposed algorithm has less computing cost (see Table 2), which implies better real-time performance.
In Fig. 10, TEEN of MMKF1 and MMKF2 converge to 0.25 and 0 respectively in the form of segmented step curve, because MMAE approach directly outputs an element from the initial guidance parameter set as the estimation result, which also leads to the high dependence of MMAE approach on the initial parameter set [2], [8]. In contrast, the proposed VB-CKF still achieve high estimation accuracy with random initial guidance parameter, which is higher than that of MMKF1 and slightly lower than that of MMKF2. In addition, our proposed algorithm does estimate the guidance parameter in continuous domains, which means that our algorithm is suitable for other guidance laws, such as APN, IPN and OGL, which can be approximated by PN guidance laws with different guidance parameter.

V. CONCLUSION
In this paper, we have proposed the online collaborative estimation algorithm for interceptor state and unknown guidance parameter, where the guidance parameter is modeled by the normal-gamma distribution. Variational Bayesian technique has been used to approximate the JPD of state and guidance parameter. Then VB-CKF has been proposed, which is divided into two-stage iterative steps: in VB-E step, the CKF is employed to realize the state estimation with guidance parameter fixed; In VB-M step, the statistical characteristics of guidance parameter are deduced with state fixed. Finally, the effectiveness of the proposed algorithm has been verified by simulations. Compared with MMKF and SAKF, VB-CKF has lower dependence on prior information, higher tracking accuracy and less calculation time.

APPENDIX A EVALUATE η * 2 VIA CKF
With fixed N k , the problem studied in this paper degenerates to a state estimation problem for general nonlinear system, i.e., and the p(x k |y k , N k ) degenerates to Therefore, using the traditional CKF, we directly estimate P k andx k to further obtain η 1 = P −1 2) Time Update • A set of cubature points are generated by where ξ i represents the i-th (i = 1, 2, . . . , 2n x ) column of the sampling matrix (47) • The propagated cubature points can then be given as where f (•) represents the dynamic process in system (43).
• Evaluate the predicted state and the predicted error covariance as follows,

3) Measurement Update
• Evaluate the cubature points where S k|k−1 S T k|k−1 = P k|k−1 . • The measurement propagation of cubature points can then be given as where h(•) represents the measurement process in system (43).
• Estimate the predicted measurement • Evaluate the Kalman gain where P xz,k|k−1 and P zz,k|k−1 represent cross-covariance and innovation covariance respectively. And the specific calculation processes of them are as follows: • Evaluate the updated state and the corresponding error covariancex where y k is the new measurement at time k. Obviously, we can easily get η 2 as follows APPENDIX B EVALUATE THE NATURAL PARAMETER η *

5
Notice that η 3 is the natural parameter of the Gaussian distribution p(x k |N k , y k−1 ), whereas η 4 and η 5 are the natural parameters of the normal-gamma distribution p(N k |y k ) and q(N k ), respectively. Obviously, the form of η 3 is different from η 4 and η 5 , which greatly increases the difficulty of directly calculating E q 1 [η 3 ]. Therefor, we simplify this problem appropriately, and the specific steps at time k are as follows:

B. CALCULATE η 4
We denote Θ i = {µ i , λ i , α i , β i }(i = 3, 4, 5) are the parameter sets of p(x k |N k , y k−1 ), p(N k |y k−1 ) and q(N k ), respectively. Since N is a constant, the prediction of N can be given as whereΘ N ,k = Θ N ,k−1 . The exponential family form of the above distribution has been written in (33), so Θ 4 = Θ N ,k−1 , i.e., (61) Based on the fact that a k is a part of x k , and only a k is related to N k . Therefore, we can use p(a k |N k , y k−1 ) instead of p(x k |N k , y k−1 ) for simplicity, i.e., In order to calculate the natural parameter η 3 in the above distribution, we use the relationship between a k and N k in dynamic model (1), i.e., where φ k = −V cλ sin λ V cλ cos λ is a constant vector in that x k is fixed. Since N k ∼ NG(µ N ,k , τ N ,k ; Θ N ,k ), the mean and variance of a k can be expressed as Then, we get Taking logarithm operation to both side of (65), we have ln p(a k |N k , y k−1 ) = ln After comparing the forms of (62) and (66), we have D. CALCULATE η *
XIBIN AN received the B.S. and M.S. degrees in control science and engineering from the Rocket Force University of Engineering (RFUE), China, in 2014 and 2017, respectively, where he is currently pursuing the Ph.D. degree in control science and engineering.
His research interests include machine learning, trajectory planning, and distributed systems.