A Novel Outlier-Robust Kalman Filtering Framework Based on Statistical Similarity Measure

In this article, a statistical similarity measure is introduced to quantify the similarity between two random vectors. The measure is then employed to develop a novel outlier-robust Kalman filtering framework. The approximation errors and the stability of the proposed filter are analyzed and discussed. To implement the filter, a fixed-point iterative algorithm and a separate iterative algorithm are given; their local convergence conditions are provided, and the two algorithms are compared. In addition, the selection of the similarity function is considered, and four exemplary similarity functions are established, from which the relations between the new method and existing outlier-robust Kalman filters are revealed. Simulation examples illustrate the effectiveness and potential of the new filtering scheme.

non-Gaussian heavy-tailed noises, which are often induced by state and measurement outliers from external interference or unreliable sensors [2].
For such non-Gaussian filtering problems, the particle filter (PF) can achieve an approximate state estimate by modeling the noises as non-Gaussian heavy-tailed distributed and approximating the posterior probability density function (pdf) as a set of weighted random samples using the sequential Monte Carlo sampling technique [3]-[5]. The Gaussian sum filter (GSF) can also be used to address the non-Gaussian filtering problem by running a group of Kalman filters, in which a finite sum of Gaussian distributions is employed to model the non-Gaussian noises, and the posterior pdf is updated as a weighted sum of Gaussian pdfs [6]. Moreover, the interacting multiple model (IMM) filter is a promising approach to address model uncertainty, in which several subfilters run in parallel based on a preselected model set and the corresponding model transition probability matrix, and the subfilters interact with each other by fusing the state estimates and corresponding estimation error covariance matrices based on the recursive estimates of the mode probabilities [7]. Normally, the performances of the PF, GSF, and IMM filter rely heavily on the preselected distributions used to model the non-Gaussian state and measurement noises. Unfortunately, it is very difficult to select accurate distributions to model unknown and time-varying non-Gaussian noises, which are often induced by outliers, so the estimation accuracy of the PF, GSF, and IMM filter degrades significantly when inaccurate or even wrong noise distributions are used. The contribution of this article is, therefore, to provide a unified theoretical framework for solving the non-Gaussian filtering problem for a linear state-space model with unknown non-Gaussian heavy-tailed noises.
A large number of outlier-robust Kalman filters have been proposed to achieve a tradeoff between estimation accuracy and computational complexity. As a classic robust regression technique, the M-estimator is robust to measurement outliers and has been successfully extended to the Kalman filter setting [8]. By employing the influence function approach on the prediction error and residual error, many outlier-robust Kalman filters have been proposed based on the M-estimate method [9]. The Huber Kalman filter (HKF) is the most famous extension of the M-estimator to the Kalman filter setting; it utilizes a combined l1 and l2 norm as a robust cost function, and a generalized robust maximum likelihood estimate is achieved by minimizing the Huber cost function [10]. The maximum correntropy Kalman filter (MCKF) is an alternative method to handle state and measurement outliers. The sum of the Gaussian kernel functions of the prediction error and residual error is selected as the robust cost function, and many MCKFs have been proposed by maximizing such robust cost functions [11]-[13]. Motivated by the fact that outlier-contaminated state and measurement noises often have non-Gaussian heavy-tailed distributions, many outlier-robust filters have been proposed by modeling the state and measurement noises as Student's t distributed [14]-[23]. These robust filters can be divided into two categories: the Student's t filter and the robust Student's t based Kalman filter (RSTKF). In the Student's t filter, the posterior pdf is approximated as a Student's t pdf with a fixed degrees of freedom (DOF) parameter based on the Bayesian rule [14]-[16], whereas in the RSTKF, the posterior pdf is approximated as a Gaussian pdf based on the variational Bayesian (VB) approach [22], [23].
For a linear system with moderately heavy-tailed state and measurement noises, the RSTKF can achieve better estimation accuracy than the Student's t filter but at the cost of higher computational complexity. Also, the adaptive Kalman filter based on the VB approach can to some extent address the heavy-tailed state and measurement noises induced by outliers based on the adaptive modifications of the one-step prediction error covariance matrix and measurement noise covariance matrix [24].
Although the HKF, MCKF, and RSTKF can all achieve better estimation accuracy than the standard Kalman filter for a linear system with outlier-corrupted state and measurement noises, their interrelationships have not yet been revealed. Moreover, the design method of the existing RSTKF cannot be generalized beyond Gaussian scale mixture distributions to a general non-Gaussian heavy-tailed distribution, which limits its further improvement. In addition, it is worth studying whether the estimation accuracy of the existing HKF, MCKF, and RSTKF can be further improved. An advanced outlier-robust Kalman filtering framework is, therefore, required to reveal the interrelationships between the existing HKF, MCKF, and RSTKF and to further improve their estimation accuracy.
In this article, a statistical similarity measure (SSM) is introduced to quantify the similarity between two random vectors. The measure is then used to develop a novel outlier-robust Kalman filtering framework, in which lower bounds of the SSM between the state vector and the predicted state vector and of the SSM between the measurement vector and the predicted measurement vector are maximized, and the posterior pdf is approximated as Gaussian. To illustrate the effectiveness of the proposed framework, the effects of the approximation errors on the proposed framework are analyzed in detail, and the numerical and filtering stabilities of the proposed framework are also discussed. To implement the proposed framework, a fixed-point iterative algorithm and a separate iterative algorithm are proposed, and their local convergence conditions are provided and compared. Furthermore, the selection of the similarity functions is discussed, and four exemplary similarity functions are provided, from which the relations between the proposed method and the existing outlier-robust Kalman filters are revealed. Simulation results of a manoeuvring target tracking example illustrate that, with appropriately selected similarity functions, the proposed filters achieve improved estimation accuracy over the existing state-of-the-art filters at the cost of higher computational complexity.
The remainder of this article is organized as follows. In Section II, a novel SSM is proposed to quantify the similarity between two random vectors. In Section III, a novel outlier-robust Kalman filtering framework based on the SSM is proposed, and its error analyses and stability discussions are provided. In Section IV, two novel iterative algorithms for the proposed outlier-robust Kalman filtering framework are proposed, and their local convergence conditions are provided. In Section V, we present the selection of the similarity functions. In Section VI, simulation results of a manoeuvring target tracking example and comparisons with existing filters are given. Conclusions are drawn in Section VII.

II. PROPOSED SSM
In this article, a novel SSM is proposed to quantify the similarity between two random vectors. The proposed SSM s(x, y) for random vectors x and y is defined as

s(x, y) = E[f(‖x − y‖^2)] = ∫∫ f(‖x − y‖^2) p(x, y) dx dy    (1)

where E[·] denotes the expectation operation, ‖·‖ denotes the Euclidean norm, and p(x, y) denotes the joint pdf of random vectors x and y. In this article, the scalar function f(·) is named the similarity function, and it must satisfy the following three conditions.
1) Condition 1: f(·) is a continuous function defined on [0, +∞).
2) Condition 2: f(·) is a strictly monotonically decreasing function: ḟ(t) < 0 for t ∈ [0, +∞).

3) Condition 3: The second derivative of f(·) is nonnegative: f̈(t) ≥ 0 for t ∈ [0, +∞).
The monotonically decreasing property of the similarity function f(·) guarantees that the proposed SSM s(x, y) increases as the distance between x and y decreases. As a result, the proposed SSM conforms to the usual definition of a similarity measure, which is in some sense the inverse of a distance metric. The higher the similarity between random vectors x and y, the larger the SSM becomes. The proposed SSM has the following basic properties.
provided that the Taylor expansion of the similarity function f(t) exists for t ≥ 0. Property 1 can be easily verified using the definition of the proposed SSM in (1), and Property 3 can be directly derived by exploiting the Taylor series expansion of the similarity function f(t) at t = 0. Next, we prove Property 2 in the following Theorem 1.
Proof: See Appendix A. Property 2 guarantees that the proposed SSM has a maximum point, i.e., x = y. It is noteworthy that the proposed SSM s(x, y) is an upper bound of f(E[‖x − y‖^2]), and the two share the same maximum point, i.e., x = y.
Remark 1: The proposed SSM is a generalized similarity measure between two random vectors and encompasses existing similarity measures. For example, the proposed SSM s(x, y) is the negative mean squared error (MSE) between random vectors x and y when the similarity function is chosen as f(t) = −t. The proposed SSM s(x, y) is the correntropy between random vectors x and y when the similarity function is selected as the Gaussian kernel f(t) = exp(−t/(2σ^2)) [11], [13]. More interestingly, the generalized correntropy [25] between random vectors x and y is also a special case of the proposed SSM s(x, y) when the similarity function is chosen as f(t) = (α/(2βΓ(1/α))) exp(−t^{α/2} β^{−α}) and the shape parameter satisfies the constraint 0 < α ≤ 2.
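These special cases can be illustrated with a minimal Monte Carlo sketch. The helper name `ssm` and the sampling setup are illustrative choices, not from the article:

```python
import numpy as np

def ssm(f, x_samples, y_samples):
    """Estimate s(x, y) = E[f(||x - y||^2)] from joint samples of (x, y)."""
    sq_dist = np.sum((x_samples - y_samples) ** 2, axis=1)
    return np.mean(f(sq_dist))

rng = np.random.default_rng(0)
x = rng.normal(size=(100_000, 2))
y = x + rng.normal(size=x.shape)

# f(t) = -t recovers the negative MSE between x and y.
neg_mse = ssm(lambda t: -t, x, y)

# The Gaussian kernel f(t) = exp(-t / (2 sigma^2)) recovers the correntropy.
sigma = 1.0
corr = ssm(lambda t: np.exp(-t / (2 * sigma**2)), x, y)

# A decreasing f makes the SSM grow as the vectors become more similar.
y_close = x + 0.1 * rng.normal(size=x.shape)
assert ssm(lambda t: -t, x, y_close) > neg_mse
```

With f(t) = −t the estimate matches the sample negative MSE exactly, consistent with Remark 1.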
Remark 2: For the proposed SSM, in addition to the first two conditions, the similarity function f(·) has to satisfy the third condition. The third condition not only facilitates the design of an outlier-robust Kalman filter but also guarantees robustness to outliers and local convergence of the fixed-point iterations, as shown in Sections III, IV, and V.
The proposed SSM can be used in Bayesian inference. Different SSMs are achieved when different similarity functions f (·) are selected, based on which different state estimates can be obtained by maximizing the corresponding SSMs. Next, we will propose an outlier-robust Kalman filtering framework based on the proposed SSM.

III. PROPOSED OUTLIER-ROBUST KALMAN FILTERING FRAMEWORK BASED ON SSM
Consider a linear dynamical system described by the linear discrete-time state-space model

x_k = F_k x_{k−1} + w_k,  z_k = H_k x_k + v_k    (2)

where k is the discrete time index, x_k ∈ R^n is the state vector, z_k ∈ R^m is the measurement vector, F_k ∈ R^{n×n} and H_k ∈ R^{m×n} are, respectively, the known state transition matrix and measurement matrix, and w_k ∈ R^n and v_k ∈ R^m are, respectively, the state and measurement noise vectors. In this article, the state and measurement noises are assumed to have non-Gaussian distributions that are, respectively, induced by state and measurement outliers.
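A model of this form with outlier-contaminated noises can be simulated as follows; the Gaussian-mixture contamination mechanism and all numerical values are illustrative assumptions, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, T = 2, 1, 200
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity transition matrix
H = np.array([[1.0, 0.0]])               # position-only measurement matrix
Q_nom = 0.01 * np.eye(n)                 # nominal state noise covariance
R_nom = 0.1 * np.eye(m)                  # nominal measurement noise covariance

def mixture_noise(cov, p_outlier=0.05, boost=100.0):
    """Heavy-tailed noise: nominal Gaussian contaminated by rare large outliers."""
    c = boost * cov if rng.random() < p_outlier else cov
    return rng.multivariate_normal(np.zeros(len(cov)), c)

x = np.zeros(n)
xs, zs = [], []
for k in range(T):
    x = F @ x + mixture_noise(Q_nom)     # x_k = F_k x_{k-1} + w_k
    zs.append(H @ x + mixture_noise(R_nom))  # z_k = H_k x_k + v_k
    xs.append(x)
```

The occasional covariance "boost" produces the heavy tails that degrade the standard Kalman filter and motivate the robust framework below.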

A. Design of Outlier-Robust Kalman Filtering Framework
Similar to the standard Kalman filter, the proposed outlier-robust Kalman filtering framework is composed of a time update and a measurement update. In the time update, the one-step predicted state vector x̂_{k|k−1} and the corresponding nominal prediction error covariance matrix P_{k|k−1} are calculated as

x̂_{k|k−1} = F_k x̂_{k−1|k−1},  P_{k|k−1} = F_k P_{k−1|k−1} F_k^T + Q_k    (3)

where x̂_{k−1|k−1} and P_{k−1|k−1} are, respectively, the state estimate and corresponding estimation error covariance matrix at time k − 1, and Q_k denotes the nominal state noise covariance matrix. Note that P_{k|k−1} is called the nominal prediction error covariance matrix since the nominal state noise covariance matrix Q_k is inaccurate in the presence of state outliers.
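The time update is the standard Kalman prediction step; a direct transcription (variable names are mine):

```python
import numpy as np

def time_update(x_post, P_post, F, Q_nom):
    """Kalman prediction: propagate the posterior moments one step ahead."""
    x_pred = F @ x_post                   # one-step predicted state
    P_pred = F @ P_post @ F.T + Q_nom     # nominal prediction error covariance
    return x_pred, P_pred
```

For example, with a constant-velocity transition matrix and identity posterior covariance, the predicted covariance is F F^T plus the nominal state noise covariance.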
In the measurement update, we aim to achieve an approximate posterior pdf q*(x_k) ≈ p(x_k|z_{1:k}) by maximizing the sum of the SSM between S_{k|k−1}^{−1} x_k and S_{k|k−1}^{−1} x̂_{k|k−1} and the SSM between S_{R_k}^{−1} z_k and S_{R_k}^{−1} H_k x_k, i.e.,

q*(x_k) = arg max_q { s(S_{k|k−1}^{−1} x_k, S_{k|k−1}^{−1} x̂_{k|k−1}) + s(S_{R_k}^{−1} z_k, S_{R_k}^{−1} H_k x_k) }    (4)

where S_{k|k−1} and S_{R_k} are, respectively, the square roots of the predicted error covariance matrix P_{k|k−1} and the nominal measurement noise covariance matrix R_k, i.e.,

P_{k|k−1} = S_{k|k−1} S_{k|k−1}^T,  R_k = S_{R_k} S_{R_k}^T    (5)

Note that the one-step SSM-based cost function in (4) is sufficient for designing an outlier-robust Kalman filtering framework in this article, in which the one-step predicted state vector x̂_{k|k−1} and the measurement vector z_k at the current time are used to construct the cost function. The idea of this article can be extended to derive an outlier-robust Kalman smoothing framework based on a multistep SSM cost function by resorting to standard techniques for designing a maximum a posteriori estimator in a Bayesian framework [26].
Considering that the one-step predicted state vector x̂_{k|k−1} and the measurement vector z_k are known and deterministic quantities in the measurement update of the Kalman filter, the maximization problem in (4) can be reformulated as

q*(x_k) = arg max_q { ∫ f_x(‖S_{k|k−1}^{−1}(x_k − x̂_{k|k−1})‖^2) q(x_k) dx_k + ∫ f_z(‖S_{R_k}^{−1}(z_k − H_k x_k)‖^2) q(x_k) dx_k }    (6)

where f_x(·) and f_z(·) denote the similarity functions of the state and measurement models, respectively. It is very difficult to achieve an optimal solution of the maximization problem in (6) since neither the explicit form of the posterior pdf q(x_k) nor closed-form solutions for the integrals are available. To solve this problem, we propose to achieve an approximate q*(x_k) by approximating the posterior pdf q(x_k) as Gaussian and maximizing a lower bound of the cost function.
First, we propose to approximate the posterior pdf q(x_k) as a Gaussian pdf, i.e.,

q(x_k) ≈ N(x_k; μ_k, Σ_k)    (7)

where μ_k and Σ_k are, respectively, the mean vector and covariance matrix of the posterior pdf q(x_k). Substituting (7) in (6), the maximization problem with respect to the posterior pdf q(x_k) in (6) is approximately transformed into a maximization problem with respect to the posterior mean vector and covariance matrix, i.e.,

(μ*_k, Σ*_k) = arg max_{(μ_k, Σ_k)} J(μ_k, Σ_k)    (8)

where J(μ_k, Σ_k) denotes the cost function in (6) evaluated at the Gaussian pdf in (7), and μ*_k and Σ*_k denote the optimal posterior mean vector and covariance matrix. Next, the cost function in (8) will be approximated by its lower bound, from which an approximate solution can be obtained.
Theorem 2: If the similarity functions satisfy Condition 3, i.e., f̈_x(t) ≥ 0 and f̈_z(t) ≥ 0 for t ∈ [0, +∞), the maximization problem in (8) can be transformed as follows:

(μ*_k, Σ*_k) ≈ arg max_{(μ_k, Σ_k)} J_1(μ_k, Σ_k)    (9)

where J_1(μ_k, Σ_k) is the lower bound of the cost function in (8) and is formulated as

J_1(μ_k, Σ_k) = f_x(tr(A_k P_{k|k−1}^{−1})) + f_z(tr(B_k R_k^{−1}))    (10)

where tr(·) denotes the trace of a matrix, and A_k and B_k are, respectively, given by

A_k = Σ_k + (μ_k − x̂_{k|k−1})(μ_k − x̂_{k|k−1})^T    (11)
B_k = H_k Σ_k H_k^T + (z_k − H_k μ_k)(z_k − H_k μ_k)^T    (12)

Proof: See Appendix B.
Define four auxiliary variables ξ_k, λ_k, ξ̃_k, and λ̃_k as

ξ_k = −2 ḟ_x(tr(A_k P_{k|k−1}^{−1})),  λ_k = −2 ḟ_z(tr(B_k R_k^{−1}))    (13)

with ξ̃_k and λ̃_k the corresponding quantities formed from the second derivatives f̈_x and f̈_z, and define three auxiliary matrices Δ_{μ_k}(μ_k, Σ_k), Δ_{Σ_k}(μ_k, Σ_k), and Θ_{μ_k}(μ_k, Σ_k), where Δ_{μ_k}(μ_k, Σ_k) and Δ_{Σ_k}(μ_k, Σ_k) denote the Jacobian matrices of the approximate cost function J_1(μ_k, Σ_k) with respect to the posterior mean vector and covariance matrix, respectively, and Θ_{μ_k}(μ_k, Σ_k) denotes the Hessian matrix of the approximate cost function J_1(μ_k, Σ_k) with respect to the posterior mean vector.

Theorem 3:
The optimal solution μ*_k of the approximate cost function J_1(μ_k, Σ_k) is formulated as

μ*_k = x̂_{k|k−1} + K̃*_k (z_k − H_k x̂_{k|k−1})    (15)
K̃*_k = P̃*_{k|k−1} H_k^T (H_k P̃*_{k|k−1} H_k^T + R̃*_k)^{−1}    (16)

where P̃*_{k|k−1} and R̃*_k are, respectively, the modified one-step prediction error covariance matrix and measurement noise covariance matrix given by

P̃*_{k|k−1} = P_{k|k−1} / ξ*_k,  R̃*_k = R_k / λ*_k    (17)

the auxiliary parameters ξ*_k and λ*_k are given by

ξ*_k = −2 ḟ_x(tr(A*_k P_{k|k−1}^{−1})),  λ*_k = −2 ḟ_z(tr(B*_k R_k^{−1}))    (18)

and the auxiliary matrices A*_k and B*_k are given by

A*_k = Σ*_k + (μ*_k − x̂_{k|k−1})(μ*_k − x̂_{k|k−1})^T    (19)
B*_k = H_k Σ*_k H_k^T + (z_k − H_k μ*_k)(z_k − H_k μ*_k)^T    (20)

Proof: See Appendix C.
Next, we will further confirm whether the extreme point μ*_k in (15) is a maximum point or a minimum point, and present the monotonicity of the approximate cost function J_1(μ_k, Σ_k) with respect to the posterior covariance matrix Σ_k.
Theorem 4: If the similarity functions satisfy Condition 2 and Condition 3 and the inequalities in (21) hold, then both the Hessian matrix Θ_{μ_k}(μ*_k, Σ*_k) and the Jacobian matrix Δ_{Σ_k}(μ_k, Σ_k) are negative definite, i.e.,

Θ_{μ_k}(μ*_k, Σ*_k) < 0    (22)
Δ_{Σ_k}(μ_k, Σ_k) < 0    (23)

Proof: See Appendix D. Theorem 4 implies that the extreme point μ*_k in (15) is a maximum point of the approximate cost function J_1(μ_k, Σ_k). It can also be observed from Theorem 4 that the approximate cost function J_1(μ_k, Σ_k) is monotonically decreasing with respect to the posterior covariance matrix Σ_k, so the approximate cost function J_1(μ_k, Σ_k) achieves a unique optimal solution Σ*_k at the lower bound of the posterior covariance matrix. Next, we determine the optimal posterior covariance matrix Σ*_k. To obtain the maximum point Σ*_k, we need to find a reasonable constraint to apply to Σ_k. Motivated by the fact that the covariance matrix of the posterior pdf is the negative inverse of the Hessian matrix of the least-squares cost function in the traditional maximum a posteriori estimation framework, we propose the heuristic assumption that Σ_k is not less than the negative inverse of the Hessian matrix. It is worth noting that the well-known M-estimate deals with the posterior covariance matrix in a similar way, setting it as the negative inverse of the Hessian matrix of the robust cost function [8]-[10]. Since the cost function J(μ_k, Σ_k) is monotonically decreasing with respect to Σ_k, the optimal covariance matrix should be the negative inverse of the Hessian matrix. Unfortunately, the Hessian matrix can easily lose its negative definiteness during the iterative computation, and the filter was then often found to halt its operation due to this numerical problem. To preserve the negative definiteness, a positive semidefinite matrix is subtracted from the Hessian matrix. That is to say, the lower bound of the posterior covariance matrix is further reduced, as formalized in (24). Exploiting (24) and (60), the maximum point Σ*_k can be formulated as

Σ*_k = (P̃*_{k|k−1}^{−1} + H_k^T R̃*_k^{−1} H_k)^{−1}    (25)

Employing the matrix inversion lemma [1, pp. 11-12] and (16), (17), and (25) gives

Σ*_k = (I_n − K̃*_k H_k) P̃*_{k|k−1}    (26)

B. Error Analyses of the Proposed Framework
In Section III-A, three approximations are employed to derive an analytical solution for the original maximization problem in (6), which are listed as follows.
1) Approximation 1: The posterior pdf q(x k ) is approximated as a Gaussian pdf in (7). 2) Approximation 2: The original cost function is approximated as its lower bound by Theorem 2.

3) Approximation 3: The original Hessian matrix is approximated by subtracting a positive semidefinite matrix from it, which further reduces the resulting posterior covariance matrix.
First, we discuss the reasonability of Approximation 1. The outlier-contaminated state and measurement noises often have non-Gaussian distributions, and then the true posterior pdf p(x_k|z_{1:k}) also has a non-Gaussian distribution [14], [16]. Unfortunately, there is often no tractable mathematical formulation for a general non-Gaussian distribution. As a result, it is not possible to achieve an optimal solution of the maximization problem in (6) for a general non-Gaussian linear system. Motivated by the fact that the Gaussian approximation to the posterior pdf has been widely accepted in designing cost-effective filters for linear systems with non-Gaussian noises, the posterior pdf is also approximated as a Gaussian pdf in this article, as shown in (7), based on which an approximate analytical solution can be obtained. Although this approximation may impose an error on the posterior pdf, it often exhibits good estimation accuracy with reasonable computational complexity. Thus, the Gaussian approximation to the posterior pdf provides a tradeoff between estimation accuracy and computational complexity.
Second, we analyze the effects of Approximation 2 on the optimal solution. To this end, the relation between the maximization problem in (8) and the maximization problem in (9)-(12) is further revealed. Define four auxiliary variables

Y_{1k} = ‖S_{k|k−1}^{−1}(x_k − x̂_{k|k−1})‖^2,  Y_{2k} = ‖S_{R_k}^{−1}(z_k − H_k x_k)‖^2

and Y*_{1k} and Y*_{2k}, which are, respectively, the expectations of Y_{1k} and Y_{2k} with respect to the approximate posterior pdf q*(x_k) = N(x_k; μ*_k, Σ*_k). Taking the first-order Taylor series expansions of the similarity functions f_x(t) and f_z(t) at t = Y*_{1k} and t = Y*_{2k}, respectively, we obtain (27)-(29), where o(Y_{1k} − Y*_{1k}) and o(Y_{2k} − Y*_{2k}) denote the higher order remainder terms.
Proposition 1: The maximization problem in Theorem 2 and the maximization problem in (8) with the first-order Taylor approximations (27) and (29) are identical.
The higher order approximation errors o(Y_{1k} − Y*_{1k}) and o(Y_{2k} − Y*_{2k}) depend heavily on the variances of the random variables Y_{1k} and Y_{2k}. That is to say, the higher the variances of the random variables Y_{1k} and Y_{2k}, the larger the fluctuation ranges of the differences Y_{1k} − Y*_{1k} and Y_{2k} − Y*_{2k}, and hence the larger the higher order approximation errors, and vice versa. Thus, the approximation accuracy of Theorem 2 is mainly dominated by the variances of the random variables Y_{1k} and Y_{2k}, which we therefore study next.
Proposition 2: The variances of the random variables Y_{1k} and Y_{2k} satisfy the upper bound constraints in (30), where Var[·] denotes the variance operator.
Proof: See Appendix F. It is observed from Proposition 2 that the variances of the random variables Y_{1k} and Y_{2k} depend on the state dimension n and the posterior covariance matrix Σ_k. It can be observed from (30) that the higher the state dimension, the larger the variances of the random variables Y_{1k} and Y_{2k}. This result is consistent with the intuition that the higher the state dimension, the larger the errors incurred when the original cost function is approximated by its lower bound in Theorem 2. We can also observe from (30) that the larger the posterior covariance matrix Σ_k, the larger the variances of the random variables Y_{1k} and Y_{2k}. This result is also consistent with the fact that the approximation errors are mainly dominated by the randomness of the state vector x_k (i.e., the covariance matrix) when the original cost function is approximated by its lower bound based on Jensen's inequality. In conclusion, the effects of Approximation 2 on the optimal solution rely mainly on the state dimension and the posterior covariance matrix of the state vector. Fortunately, the posterior uncertainty is gradually reduced as the filter converges, which can to some extent mitigate the effects of Approximation 2 on the optimal solution. More importantly, both the errors Y_{1k} − Y*_{1k} and Y_{2k} − Y*_{2k} can be deemed small terms as the filter converges, and the higher order derivatives of the similarity functions f_x(t) and f_z(t) are significantly smaller than the first-order derivatives for the exemplary similarity functions in Table I. As a result, the higher order terms o(Y_{1k} − Y*_{1k}) and o(Y_{2k} − Y*_{2k}) are significantly smaller than the first-order terms for the exemplary similarity functions, which contributes to the effectiveness and reasonability of Approximation 2.
Finally, we discuss the effects of Approximation 3 on the optimal solution. The original Hessian matrix is forcibly reduced by subtracting a positive semidefinite matrix, and the resultant posterior covariance matrix is less than the optimal value. Although such approximation imposes an error on the posterior covariance matrix, it is beneficial to guarantee the positive definiteness of the posterior covariance matrix and then improve the numerical and filtering stabilities, as shown in Sections III-C and IV-A. Moreover, the reduction of the posterior covariance matrix is also beneficial to mitigate the effects of Approximation 2 on the optimal solution.

C. Stability Discussions of Proposed Framework
In order to employ the proposed algorithm in engineering applications, it is necessary to guarantee the numerical and filtering stabilities of the proposed method. Next, these will be discussed.
In this article, filtering stability means that the state estimation error x̃_{k|k} is bounded in the mean-square sense, i.e., E{‖x̃_{k|k}‖^2} < +∞ [27]. According to the theoretical analysis of filtering stability in [27], if the modified one-step prediction error and measurement noise covariance matrices and the posterior covariance matrix satisfy the constraints

P̃*_{k|k−1} ≤ q_max I_n,  R̃*_k ≥ r_min I_m,  p_min I_n ≤ Σ*_k ≤ p_max I_n    (33)

where q_max, r_min, p_min, and p_max are all positive real numbers, then the state estimation error x̃_{k|k} of the proposed filter is bounded in the mean-square sense, i.e., E{‖x̃_{k|k}‖^2} < +∞. Note that the other conditions for guaranteeing filtering stability in [27] hold naturally for linear systems.
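Boundedness constraints of this kind can be checked numerically via eigenvalues; a small helper sketch (the function name and values are illustrative assumptions):

```python
import numpy as np

def within_bounds(M, lo, hi):
    """Check lo * I <= M <= hi * I for a symmetric matrix M via its spectrum."""
    eig = np.linalg.eigvalsh(M)
    return lo <= eig.min() and eig.max() <= hi

# Example: a posterior covariance matrix with eigenvalues inside [p_min, p_max].
P = np.array([[2.0, 0.3], [0.3, 1.5]])
assert within_bounds(P, 1.0, 3.0)
```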
Proposition 3: If there exist positive real numbers ξ_min, ξ_max, λ_min, and λ_max such that

ξ_min ≤ ξ*_k ≤ ξ_max,  λ_min ≤ λ*_k ≤ λ_max

then the inequalities in (33) hold.
Proof: See Appendix G. Proposition 3 demonstrates that if the auxiliary parameters ξ*_k and λ*_k both have lower and upper bounds, then the proposed outlier-robust Kalman filtering framework is stable, which imposes additional constraints on the similarity functions f_x(·) and f_z(·).
Remark 3: The proposed outlier-robust Kalman filtering framework is related to some existing advanced Kalman filtering algorithms. The standard Kalman filter and the existing RSTKF [22] are special cases of the proposed outlier-robust Kalman filtering framework. The proposed outlier-robust Kalman filter becomes the standard Kalman filter when the similarity functions are, respectively, chosen as f_x(t) = −0.5t and f_z(t) = −0.5t, and it becomes the existing RSTKF when the similarity functions are selected as the Student's t based similarity functions listed in Table I.

IV. PROPOSED ITERATIVE ALGORITHMS

A. Fixed-Point Iterative Algorithm
In general, it is very difficult to find analytical solutions for μ*_k and Σ*_k through solving (15)-(20) and (26), since the optimal auxiliary parameters depend on the optimal posterior moments and vice versa. A fixed-point iterative algorithm is therefore summarized in Algorithm 1, which takes x̂_{k−1|k−1}, P_{k−1|k−1}, and z_k as inputs, checks the convergence of the iteration at every step, and outputs x̂_{k|k} and P_{k|k}; δ denotes the lower bound of the auxiliary parameters ξ_k and λ_k, which is beneficial to guarantee the filtering stability, as discussed in Section V.
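A hedged sketch of the fixed-point measurement update in Algorithm 1: the covariance scalings follow (17), while the weight refreshes from the current posterior moments are a plausible reconstruction of (18)-(20) rather than a verbatim transcription of the algorithm, and all names are mine:

```python
import numpy as np

def fixed_point_update(x_pred, P_pred, z, H, R, df_x, df_z,
                       n_iter=50, tol=1e-8, delta=1e-8):
    """SSM measurement update via fixed-point iteration (sketch)."""
    mu = x_pred.copy()
    xi = lam = 1.0
    for _ in range(n_iter):
        P_t = P_pred / xi                      # modified prediction covariance
        R_t = R / lam                          # modified measurement covariance
        K = P_t @ H.T @ np.linalg.inv(H @ P_t @ H.T + R_t)
        mu_new = x_pred + K @ (z - H @ x_pred)
        Sigma = (np.eye(len(x_pred)) - K @ H) @ P_t
        # refresh the auxiliary weights from the new posterior moments
        e = mu_new - x_pred
        t_x = np.trace(np.linalg.solve(P_pred, Sigma + np.outer(e, e)))
        r = z - H @ mu_new
        t_z = np.trace(np.linalg.solve(R, H @ Sigma @ H.T + np.outer(r, r)))
        xi = max(-2.0 * df_x(t_x), delta)      # delta bounds the weights away from 0
        lam = max(-2.0 * df_z(t_z), delta)
        converged = np.linalg.norm(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, Sigma
```

With f_x(t) = f_z(t) = −0.5t (so both derivatives are −1/2) the weights stay at one and the update reduces to the standard Kalman measurement update, consistent with Remark 3.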
A prerequisite for applying the proposed outlier-robust filtering algorithm in practical engineering is the convergence of the fixed-point iterations. To this end, the relation between the fixed-point iterative algorithm and an existing nonlinear optimization algorithm is first revealed.
Proposition 4: The fixed-point iterative algorithm is identical to Newton's method with a modified Hessian matrix, in which a positive semidefinite matrix is subtracted from the original Hessian matrix so that the negative definiteness is preserved. As a result, the fixed-point iterative algorithm has better numerical stability than the standard Newton's method. Normally, to guarantee the local convergence of Newton's method, the Hessian matrix needs to satisfy a Lipschitz condition, and the initial value must be sufficiently close to the optimal value. Next, we provide the convergence conditions of the fixed-point iterative algorithm.
Theorem 5: If the initial mean vector μ_k^{(0)} is sufficiently close to the optimal mean vector μ*_k and there exist positive and bounded real numbers α_1 and α_2 such that the following inequalities hold, then the fixed-point iterative algorithm is locally convergent.
Proof: See Appendix I. Generally, the initial mean vector μ_k^{(0)} is selected as the one-step predicted state vector x̂_{k|k−1}, i.e., μ_k^{(0)} = x̂_{k|k−1}, since x̂_{k|k−1} is the only available information for the mean vector before the iterative measurement update. It is seen from Theorem 5 that the initial value μ_k^{(0)}, i.e., x̂_{k|k−1}, needs to be sufficiently close to the optimal value μ*_k to guarantee the local convergence of the fixed-point iterative algorithm. Unfortunately, the one-step predicted state vector x̂_{k|k−1} may be far away from the optimal value μ*_k when the linear system suffers from large process uncertainty or a state outlier. As a result, the fixed-point iterative algorithm may not converge to a local optimum in such cases, which can degrade the filtering accuracy of the proposed outlier-robust filter dramatically.

B. Novel Separate Iterative Algorithm
In order to motivate the proposed separate iterative algorithm, we first present the problems that exist in the fixed-point iterative algorithm. It can be seen from Algorithm 1 that the Kalman gain K̃_k^{(i+1)} plays an important role in the iterative measurement update because it adaptively adjusts the weights of the one-step predicted state vector x̂_{k|k−1} and the measurement innovation z_k − H_k x̂_{k|k−1}. Next, we discuss the behavior of the Kalman gain K̃_k^{(i+1)} during the iterative measurement update. Using the 4th and 5th equations in Algorithm 1, the Kalman gain K̃_k^{(i+1)} can be rewritten as (37). It is observed from (37) that the behavior of the Kalman gain K̃_k^{(i+1)} depends on the joint evolution of the auxiliary parameters, and thus on the iterative posterior mean vector. In this article, we propose the heuristic idea of separating the iterations of the auxiliary parameters to guarantee a definite behavior of the Kalman gain. That is to say, the auxiliary parameter λ_k^{(i)(0)} is first iterated with the auxiliary parameter fixed at ξ_k^{(0)(0)} = 1 until convergence, i.e., lim_{i→+∞} λ_k^{(i)(0)} = λ̄_k, and then the auxiliary parameter ξ_k^{(0)(j)} is iterated with the auxiliary parameter fixed at λ̄_k until convergence, i.e., lim_{j→+∞} ξ_k^{(0)(j)} = ξ̄_k, where ξ̄_k and λ̄_k denote the local optima of the auxiliary parameters. The detailed implementation of the proposed outlier-robust Kalman filtering framework based on the proposed separate iterative algorithm is listed in Algorithm 2, where ε_1 and ε_2 denote the iteration thresholds of ξ_k and λ_k, respectively, N_1 and N_2 denote the maximum numbers of iterations of ξ_k and λ_k, respectively, and i*_1 and j*_2 denote the values of the cyclic variables at the end of the first and second loops, respectively.
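The two-loop structure can be sketched as follows; as with the fixed-point sketch, the update expressions are a plausible reconstruction under the P̃ = P/ξ, R̃ = R/λ parameterization of (17), and all names are mine:

```python
import numpy as np

def separate_update(x_pred, P_pred, z, H, R, df_x, df_z,
                    n1=50, n2=50, tol=1e-8, delta=1e-8):
    """SSM measurement update via the separate iteration (sketch)."""
    def moments(xi, lam):
        P_t, R_t = P_pred / xi, R / lam
        K = P_t @ H.T @ np.linalg.inv(H @ P_t @ H.T + R_t)
        mu = x_pred + K @ (z - H @ x_pred)
        Sigma = (np.eye(len(x_pred)) - K @ H) @ P_t
        return mu, Sigma

    xi = lam = 1.0
    for _ in range(n2):                    # first loop: iterate lam with xi = 1
        mu, Sigma = moments(1.0, lam)
        r = z - H @ mu
        t_z = np.trace(np.linalg.solve(R, H @ Sigma @ H.T + np.outer(r, r)))
        lam_new = max(-2.0 * df_z(t_z), delta)
        done = abs(lam_new - lam) < tol
        lam = lam_new
        if done:
            break
    for _ in range(n1):                    # second loop: iterate xi with lam frozen
        mu, Sigma = moments(xi, lam)
        e = mu - x_pred
        t_x = np.trace(np.linalg.solve(P_pred, Sigma + np.outer(e, e)))
        xi_new = max(-2.0 * df_x(t_x), delta)
        done = abs(xi_new - xi) < tol
        xi = xi_new
        if done:
            break
    return moments(xi, lam)
```

Freezing one weight per loop gives the Kalman gain a monotone, predictable adjustment within each loop, which is the motivation stated above.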
We can see from Theorem 6 that if the similarity functions f_x(·) and f_z(·) satisfy Condition 3 and ḟ_z(0) and ḟ_x(0) have lower bounds, the iterative auxiliary parameters λ_k^{(i)(0)} and ξ_k^{(0)(j)} converge to local optima, and then the corresponding posterior mean vector and covariance matrix also converge to local optima, which guarantees the local convergence of the proposed separate iterative algorithm. Compared with the fixed-point iterative algorithm, the proposed algorithm does not require the initial mean vector to be sufficiently close to the optimal mean vector. Thus, the local convergence conditions of the fixed-point iterative algorithm are harsher than those of the proposed separate iterative algorithm.

V. SELECTIONS OF THE SIMILARITY FUNCTIONS
It is seen from Algorithms 1 and 2 that the similarity functions f_x(·) and f_z(·) are necessary to implement the proposed outlier-robust framework. Next, we discuss how to select similarity functions that guarantee that the proposed framework is identical to the standard Kalman filter in the case of Gaussian noises and is robust to state and measurement outliers.
First, the relationship between the approximate one-step prediction error covariance matrix A*_k and the true one-step prediction error covariance matrix P_{k|k−1}, and that between the approximate measurement noise covariance matrix B*_k and the true measurement noise covariance matrix R_k, are studied for a linear state-space model with Gaussian state and measurement noises.
Proposition 5: For a linear state-space model with Gaussian state and measurement noises, A*_k and B*_k can be, respectively, approximated as the one-step prediction error covariance matrix and the measurement noise covariance matrix, i.e.,

A*_k ≈ P_{k|k−1},  B*_k ≈ R_k    (39)

Proof: See Appendix K. Second, we study the conditions on the similarity functions f_x(·) and f_z(·) that guarantee that the proposed framework is identical to the standard Kalman filter in the case of Gaussian noises. Substituting (39) in (18), we have

ξ*_k = −2 ḟ_x(n),  λ*_k = −2 ḟ_z(m)    (40)

where n and m are, respectively, the dimensions of the state vector and the measurement vector. It is seen from Algorithms 1 and 2 that the proposed robust filter is identical to the standard Kalman filter when the modified parameters are unity, i.e., ξ*_k = λ*_k = 1. Therefore, in order to guarantee that the proposed filter is identical to the standard Kalman filter when there are no state and measurement outliers, the similarity functions f_x(·) and f_z(·) need to satisfy

ḟ_x(n) = −1/2,  ḟ_z(m) = −1/2    (41)

Finally, we discuss the conditions on the similarity functions f_x(·) and f_z(·) that guarantee that the proposed framework is robust to outliers. Define two auxiliary matrices

Ψ_{k1} = A*_k − P_{k|k−1},  Ψ_{k2} = B*_k − R_k    (42)

In general, if there are state and measurement outliers, the approximate one-step prediction error covariance matrix A*_k is not less than the nominal one-step prediction error covariance matrix P_{k|k−1}, and the approximate measurement noise covariance matrix B*_k is not less than the nominal measurement noise covariance matrix R_k, i.e.,

A*_k ≥ P_{k|k−1},  B*_k ≥ R_k    (43)

Furthermore, the auxiliary matrices Ψ_{k1} and Ψ_{k2} depend on the magnitudes of the state and measurement outliers, respectively: the larger the state and measurement outliers, the larger the auxiliary matrices Ψ_{k1} and Ψ_{k2}.
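The consistency condition at the end of the Kalman-filter discussion (the derivative of the similarity function equals −1/2 at the relevant dimension) can be checked numerically. The Student's-t style function below is a plausible form consistent with the cited RSTKF and is used only as an assumed example:

```python
import numpy as np

def num_deriv(f, t, h=1e-6):
    """Central-difference estimate of df/dt."""
    return (f(t + h) - f(t - h)) / (2 * h)

p, nu = 3, 5.0                                              # dimension and DOF
f_kf = lambda t: -0.5 * t                                   # standard KF case
f_st = lambda t: -0.5 * (nu + p) * np.log(1.0 + t / nu)     # Student's-t style

# Both functions satisfy df/dt = -1/2 at t = p, so the weights reduce to 1
# under Gaussian noises and the filter collapses to the standard KF.
assert np.isclose(num_deriv(f_kf, float(p)), -0.5, atol=1e-5)
assert np.isclose(num_deriv(f_st, float(p)), -0.5, atol=1e-5)
```

For the Student's-t style function the derivative is −0.5(ν + p)/(ν + t), which is exactly −1/2 at t = p for any ν, explaining why such functions can be calibrated to the nominal Gaussian case.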
Proposition 6: For a linear state-space model with outlier-contaminated state and measurement noises, if the similarity functions f_x(·) and f_z(·) satisfy the following conditions, then the modified auxiliary parameters satisfy ξ*_k ≤ 1 and λ*_k ≤ 1; moreover, the larger the state and measurement outliers, the smaller the modified auxiliary parameters ξ*_k and λ*_k. Proof: See Appendix L.
Employing (45) in (17) results in (46). It is observed from (46) that the modified prediction error covariance matrix and the modified measurement noise covariance matrix are, respectively, not less than the nominal prediction error covariance matrix and the nominal measurement noise covariance matrix when there are state and measurement outliers. Moreover, according to Proposition 6 and (17), the larger the state and measurement outliers, the smaller the modified auxiliary parameters ξ*_k and λ*_k, and the larger the modified prediction error covariance matrix and the modified measurement noise covariance matrix.
Using (16) and (17), the Kalman gain K̃*_k can be reformulated as (47). According to Proposition 6 and (47), the Kalman gain K̃*_k depends heavily on the relative magnitudes of the state and measurement outliers. Specifically, the Kalman gain is increased if the state outlier has a larger magnitude than the measurement outlier, and vice versa. As a result, the negative effects of outliers on the proposed Kalman filtering framework can be resisted by adjusting the Kalman gain adaptively.
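This adaptive-gain mechanism can be sketched numerically. The following snippet is an illustration, not the article's implementation (the function and variable names are ours): it forms the gain from the modified covariances P_{k|k−1}/ξ*_k and R_k/λ*_k and shows how the gain moves when each parameter shrinks.

```python
import numpy as np

def modified_gain(P_pred, H, R, xi, lam):
    """Kalman gain built from the modified covariances P_pred/xi and R/lam.

    A smaller xi (state outlier) inflates the prediction covariance and raises
    the gain; a smaller lam (measurement outlier) inflates the measurement
    noise covariance and lowers the gain.
    """
    P_mod = P_pred / xi
    R_mod = R / lam
    S = H @ P_mod @ H.T + R_mod            # innovation covariance
    return P_mod @ H.T @ np.linalg.inv(S)

# Scalar example: one state component measured directly.
P, H, R = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])
k_nominal = modified_gain(P, H, R, 1.0, 1.0)[0, 0]       # standard KF gain: 0.5
k_meas_outlier = modified_gain(P, H, R, 1.0, 0.1)[0, 0]  # gain drops
k_state_outlier = modified_gain(P, H, R, 0.1, 1.0)[0, 0] # gain rises
```

The scalar example makes the direction of the adjustment explicit: a down-weighted measurement (small λ*_k) pulls the gain toward zero, and a down-weighted prediction (small ξ*_k) pushes it toward one.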
As an example, some exemplary similarity functions f(·) are listed in Table I, where p denotes the dimension of the state vector for f(·) = f_x(·) and the dimension of the measurement vector for f(·) = f_z(·); σ and ν are, respectively, named the kernel size and the DOF parameter, to be consistent with the existing MCKF [12] and RSTKF [22], and ω is also named a DOF parameter.
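Only the exponential similarity function is written out explicitly in this excerpt (Remark 6). As a hedged illustration of how such a function down-weights outliers, the snippet below evaluates f(t) = σ² exp((p − t)/(2σ²)) and its derivative; it assumes the modified parameter takes the form −2ḟ(t), which is consistent with the unity condition ḟ = −1/2 under Gaussian noises, but the exact mapping through (39) and (18) is not reproduced here.

```python
import math

def f_exp(t, p, sigma):
    """Exponential similarity function from Remark 6: sigma^2 * exp((p - t) / (2 sigma^2))."""
    return sigma**2 * math.exp((p - t) / (2 * sigma**2))

def f_exp_dot(t, p, sigma):
    """Its derivative: -0.5 * exp((p - t) / (2 sigma^2))."""
    return -0.5 * math.exp((p - t) / (2 * sigma**2))

p, sigma = 2, 10.0
# At t = p the derivative equals -1/2, so the assumed modified parameter
# -2 * f'(t) is unity and the filter behaves like the standard Kalman filter.
w_nominal = -2 * f_exp_dot(p, p, sigma)
# A large squared residual t >> p yields a parameter below unity,
# i.e., the corresponding covariance is inflated and the sample down-weighted.
w_outlier = -2 * f_exp_dot(p + 50.0, p, sigma)
```

Note that as σ → +∞ the derivative tends to −1/2 for every t, matching the reduction to the standard Kalman filter discussed in Remark 4.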
It is easy to verify that the exemplary similarity functions listed in Table I satisfy the conditions of Theorems 1 and 2 and Proposition 6. Theorem 3 and Propositions 1, 2, 4, and 5 do not require the exemplary similarity functions to satisfy additional conditions. Next, we further confirm whether the exemplary similarity functions satisfy the conditions of Theorems 4-6 and Proposition 3.
Corollary 1: If the kernel size σ and the DOF parameter ν satisfy the following constraints, then the exemplary similarity functions in Table I satisfy (21) in Theorem 4. It is worth noting that no constraint on the DOF parameter ω is needed to guarantee that Theorem 4 holds. It is seen from Corollary 1 that the constraint on the kernel size σ depends on the auxiliary parameters Y*_{1k} and Y*_{2k}. As a result, the constraint on the kernel size σ may change for different application scenarios. To address this problem, a reasonable scheme is to choose a sufficiently large kernel size σ so that the constraint always holds.
It is seen from (45) that ξ*_k and λ*_k have positive upper bounds ξ_max = λ_max = 1. Since the second derivatives of the similarity functions are nonnegative, the minimum values of the negative derivatives −ḟ_x(t) and −ḟ_z(t) are achieved as t → +∞. It can be seen from Table I that the negative derivatives of the exemplary similarity functions approach 0 as t tends to +∞. As a result, the modified auxiliary parameters ξ*_k and λ*_k do not have positive lower bounds, so Proposition 3 does not hold, which may lead to filtering instability. To address this problem, we can impose a very small lower bound δ on the modified auxiliary parameters ξ*_k and λ*_k to guarantee filtering stability, as shown in the 13th and 14th equations of Algorithm 1 and the 10th and 18th equations of Algorithm 2.
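The lower-bounding step can be sketched as follows; the value of δ here is our placeholder, since the article's choice is not shown in this excerpt.

```python
DELTA = 1e-8  # assumed small lower bound; the article leaves delta unspecified here

def clamp_parameters(xi, lam, delta=DELTA):
    """Impose the lower bound delta on the modified auxiliary parameters so the
    modified covariances P/xi and R/lam stay bounded (filtering stability)."""
    return max(xi, delta), max(lam, delta)

# A near-zero xi (caused by a huge outlier) is lifted to delta; lam is untouched.
xi, lam = clamp_parameters(3e-12, 0.7)
```

Without this clamp, ξ*_k → 0 would send the modified prediction covariance P_{k|k−1}/ξ*_k to infinity, which is exactly the instability the bound prevents.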
Corollary 2: For the exemplary similarity functions, if the kernel size σ > 0 and the DOF parameters ν > 0 and ω > 0, then there exist positive and bounded real numbers α_1 and α_2 making (36) in Theorem 5 hold.
Finally, we discuss the conditions of Theorem 6. It is easy to verify that if the kernel size σ > 0 and the DOF parameters ν > 0 and ω > 0, then ḟ(0) has a lower bound, which guarantees the local convergence of the proposed separate iterative algorithm. Also, according to Theorem 5 and Corollary 2, if the kernel size σ > 0 and the DOF parameters ν > 0 and ω > 0, then the fixed-point iterative algorithm is locally convergent when the initial mean vector μ^(0)_k is sufficiently close to the optimal mean vector μ*_k. Thus, for the exemplary similarity functions, the convergence conditions of the proposed separate iterative algorithm are easier to satisfy than those of the fixed-point iterative algorithm.
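Both iterative implementations share a stopping rule of the following shape; the tolerance and iteration cap below are placeholders, since the article's values are not reproduced in this excerpt, and the contraction step in the usage line is purely illustrative.

```python
import numpy as np

def iterate_until_converged(step, mu0, tol=1e-8, max_iter=50):
    """Generic fixed-point loop: repeat mu <- step(mu) until the relative change
    of the iterate falls below tol or max_iter iterations are reached."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iter):
        mu_new = step(mu)
        if np.linalg.norm(mu_new - mu) <= tol * max(np.linalg.norm(mu), 1e-12):
            return mu_new
        mu = mu_new
    return mu

# Illustrative contraction mu -> 0.5*mu + 1, whose fixed point is mu* = 2.
mu_star = iterate_until_converged(lambda m: 0.5 * m + 1.0, np.array([10.0]))
```

Local convergence of such a loop requires the step map to be contractive near μ*_k, which is precisely what Theorems 5 and 6 establish for the two algorithms under the stated parameter conditions.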
Remark 4: It is observed from Table I that the derivatives of the exemplary similarity functions all equal −0.5 when the kernel size σ and the DOF parameters ν and ω tend to infinity, i.e., σ → +∞, ν → +∞, and ω → +∞. As a result, the proposed outlier-robust Kalman filters based on the exemplary similarity functions all reduce to the standard Kalman filter as the parameters σ, ν, and ω tend to infinity.
Remark 5: The state vector is deemed a random quantity in the proposed outlier-robust Kalman filter, whereas it is treated as a deterministic quantity in the existing M-estimator. It is observed from Algorithms 1 and 2 that the posterior covariance matrix is employed to calculate the modified parameters during the iterative measurement update in the proposed filter, whereas the posterior covariance matrix is independent of the minimization of the robust cost function in the existing M-estimator. Compared with the existing M-estimator, the proposed outlier-robust Kalman filter considers the randomness inherent in the state vector by exploiting the posterior covariance matrix during the iterative measurement update, so that the state estimation accuracy can be further improved, as will be shown in the simulation study.
Remark 6: Different from the existing MCKF, in the proposed outlier-robust Kalman filter the correntropy is approximated by its lower bound using Jensen's inequality when the similarity functions are, respectively, set as f_x(t) = σ² exp((p − t)/(2σ²)) and f_z(t) = σ² exp((p − t)/(2σ²)), and the robust state estimate is obtained by maximizing this lower bound of the correntropy. Thus, the proposed outlier-robust Kalman filter is an improved version of the existing MCKF for this choice of similarity functions, and improved state estimation accuracy can be achieved, as will be illustrated in the simulation study.

A. Simulation Setup and Description
We consider the problem of tracking an agile target whose positions are measured in clutter; the horizontal positions and the corresponding velocities are chosen as the elements of the state vector. The state transition matrix and measurement matrix are, respectively, given, and the initial state estimate is randomly selected from a Gaussian distribution, i.e., x_{0|0} ∼ N(x_0, P_0).
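As a sketch of this setup, a standard constant-velocity model with position-only measurements fits the description above; the sampling period T and the specific F and H below are our assumptions, since the article's matrices are not reproduced in this excerpt.

```python
import numpy as np

T = 1.0  # assumed sampling period in seconds

# Constant-velocity model with state x = [px, py, vx, vy]^T and
# position-only measurements (a common choice for this tracking scenario;
# the article's exact matrices are not shown in this excerpt).
F = np.block([[np.eye(2), T * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])

# Initial state estimate drawn from N(x0, P0); x0 and P0 are placeholders.
rng = np.random.default_rng(0)
x0 = np.zeros(4)
P0 = np.eye(4)
x_init = rng.multivariate_normal(x0, P0)
```

With these shapes, the prediction x_{k|k−1} = F x_{k−1|k−1} propagates position by velocity, and z_k = H x_k extracts the two position components.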
As an example, the similarity functions f_x(·) and f_z(·) are selected as the exponential, logarithmic, and square-root functions in Table I, and the separate iterative algorithm is employed to implement the proposed outlier-robust Kalman filtering framework. Three outlier-robust Kalman filters are thus obtained: SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S, where SSMKF-exp-S denotes the Kalman filter based on the exponential similarity function and the separate iterative algorithm; the other two abbreviations are interpreted similarly.
The proposed outlier-robust Kalman filters are compared with the standard Kalman filter with true noise covariance matrices (KFTNCM), the HKF [10], the MCKF [12], the RSTKF [22], the IMM filter [7], and the PF [3], where the true noise covariance matrices are used to obtain filtering estimates in the KFTNCM. The tuning parameter of the existing HKF is set to the common value γ = 1.345 [10]; the kernel size of the proposed SSMKF-exp-S and the existing MCKF is selected as σ = 10 to achieve a tradeoff between estimation accuracy and stability [12]; the DOF parameter of the proposed SSMKF-log-S and the existing RSTKF is set to ν = 10; and the DOF parameter of the proposed SSMKF-sqrt-S is set to ω = 5. To guarantee the convergence of the iterations, an iteration threshold and a maximum number of iterations are imposed. In the first PF (PF-1) and the third PF (PF-3), the true Gaussian mixture pdfs of the state and measurement noises given in (49) are used, and the particle numbers are, respectively, 1000 and 500. In the second PF (PF-2), inaccurate Gaussian mixture pdfs of the state and measurement noises are employed, where the state and measurement noise pdfs are, respectively, p(w_k) = 0.98N(w_k; 0, Q) + 0.02N(w_k; 0, 1000Q) and p(v_k) = 0.98N(v_k; 0, R) + 0.02N(v_k; 0, 100R). Note that the IMM-1 and PF-1 are only used as filtering references, since the true instantaneous values of the state and measurement noise covariance matrices and the true state and measurement noise pdfs are all unavailable in the presence of random and unknown state and measurement outliers. All filtering algorithms are coded in MATLAB and executed on a computer with an Intel Core i7-6900K CPU @ 3.20 GHz. The MATLAB codes of this article will be open access and can be freely downloaded from https://www.researchgate.net/profile/Yulong_Huang3.
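The Gaussian mixture noise models assumed by PF-2 can be sampled as follows; Q and R are identity placeholders here, since the article's covariance values are not shown in this excerpt.

```python
import numpy as np

def sample_mixture(rng, cov, prob_outlier, scale):
    """Draw one sample from the two-component Gaussian mixture
    (1 - p) * N(0, cov) + p * N(0, scale * cov)."""
    c = scale * cov if rng.random() < prob_outlier else cov
    return rng.multivariate_normal(np.zeros(cov.shape[0]), c)

rng = np.random.default_rng(1)
Q = np.eye(4)  # placeholder state noise covariance
R = np.eye(2)  # placeholder measurement noise covariance

w = sample_mixture(rng, Q, 0.02, 1000.0)  # state noise per PF-2's model
v = sample_mixture(rng, R, 0.02, 100.0)   # measurement noise per PF-2's model
```

The rare, heavily scaled component is what makes the resulting noise heavy-tailed: roughly 2% of samples come from a covariance inflated by three (state) or two (measurement) orders of magnitude.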
In this simulation, the simulation time is set to 200 s, and the total number of Monte Carlo runs is 1000. The root-mean-square errors (RMSEs) and averaged RMSEs (ARMSEs) of position and velocity are chosen as performance metrics to compare the estimation accuracy; their definitions are given in [22]. To better exhibit the RMSEs of position and velocity of all filters in Figs. 1 and 2, the RMSEs are smoothed using a moving average method with a span of 10 s.
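The metric and the smoothing step can be sketched as follows. The exact RMSE definitions follow [22]; the centered moving average below is one common reading of the 10 s span smoothing, not necessarily the article's exact implementation.

```python
import numpy as np

def rmse_position(err):
    """err: (runs, time, 2) array of position errors; returns the per-time-step
    RMSE over the Monte Carlo runs."""
    return np.sqrt(np.mean(np.sum(err**2, axis=2), axis=0))

def moving_average(x, span=10):
    """Centered moving average with the given span, used to smooth RMSE curves."""
    kernel = np.ones(span) / span
    return np.convolve(x, kernel, mode="same")

# Placeholder errors: 1000 Monte Carlo runs, 200 time steps, 2 position axes.
err = np.zeros((1000, 200, 2))
rmse = rmse_position(err)
smoothed = moving_average(rmse, span=10)
```

Note that `mode="same"` shortens the effective window near the two ends of the curve, which slightly biases the first and last few smoothed values.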

B. Simulation Results and Comparisons
The RMSEs and ARMSEs (40-200 s) of position and velocity and the single-step run times of the proposed SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S and the existing outlier-robust Kalman filters are, respectively, illustrated in Fig. 1 and Table II. It can be seen from Fig. 1 and Table II that the RMSEs and ARMSEs of position and velocity of the proposed SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S are all smaller than those of the existing KFTNCM, HKF, MCKF, and RSTKF. We can also see from Table II that the proposed SSMKFs all require more run time than the existing outlier-robust Kalman filters. As compared with the best ARMSE_pos from the existing RSTKF and the best ARMSE_vel from the existing MCKF, the ARMSEs of position and velocity of the proposed SSMKF-log-S improve by 20.91% and 5.18%, respectively. Thus, the proposed SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S all have better estimation accuracy but higher computational complexity than the existing KFTNCM, HKF, MCKF, and RSTKF.
Fig. 2 and Table II, respectively, show the RMSEs and ARMSEs (40-200 s) of position and velocity and the single-step run times of the proposed SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S and the existing IMM filters and PFs. It is observed from Fig. 2 and Table II that the proposed SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S all have smaller RMSEs and ARMSEs of position and velocity than the existing IMM-2 (inaccurate noise models), PF-2 (inaccurate noise pdfs and 1000 particles), and PF-3 (accurate noise pdfs and 500 particles). As compared with the best ARMSE_pos from the PF-2 and the best ARMSE_vel from the PF-3, the ARMSEs of position and velocity of the proposed SSMKF-log-S improve by 23.42% and 1.27%, respectively. It can also be observed from Fig. 2 and Table II that the RMSEs and ARMSEs of position of the proposed SSMKF-log-S are close to those of the IMM-1 (filtering reference), the RMSEs and ARMSEs of velocity of the proposed SSMKF-log-S are close to those of the PF-1 (filtering reference), and the proposed filters all have smaller RMSEs and ARMSEs of position than the PF-1. The reason that the PF-1 exhibits poor position estimation accuracy may be that the heavy-tailed features of the posterior pdfs are easily lost during the particle filtering process when a limited number of particles is used. Furthermore, we can observe from Table II that the proposed SSMKFs have slightly greater run times than the IMM filters but significantly less run time than the PFs. Although the computational time of the PF can be significantly reduced if it is implemented in a parallel fashion, the PF still requires accurate knowledge of the probability distributions of the state and measurement noises. Thus, the proposed SSMKF-exp-S, SSMKF-log-S, and SSMKF-sqrt-S all achieve better estimation accuracy than the IMM-2, PF-2, and PF-3, with slightly higher computational complexity than the IMM filters but significantly lower computational complexity than the standard PFs.

VII. CONCLUSION
In this article, an SSM was proposed to quantify the similarity between two random vectors. The measure was then employed to develop a novel outlier-robust Kalman filtering framework. Theoretical analyses and discussions of the approximation errors and the numerical and filtering stabilities were provided to illustrate the effectiveness of the proposed framework. The fixed-point iterative algorithm and the separate iterative algorithm were proposed to implement the framework, and their local convergence conditions were also provided and compared. In addition, the selection of the similarity functions was discussed, and four exemplary similarity functions were provided, from which the relations between the proposed method and the existing outlier-robust Kalman filters were revealed. Simulation results illustrated that, by selecting appropriate similarity functions, the proposed filters can achieve improved estimation accuracy at the cost of higher computational complexity than the existing outlier-robust Kalman filters. Also, as compared with the existing IMM filter and PF, the proposed filters are more suitable for addressing the filtering problem of a linear system with outlier-contaminated state and measurement noises.
A. Proof of Theorem 1
Considering that inequality (51) holds for arbitrary random vectors x and y, we have max s(x, y) = f(0).
It is evident that the SSM s(x, y) is identical to f (0) when x = y, and then x = y is a maximum point of the SSM s(x, y).

B. Proof of Theorem 2
Since f̈_x(t) ≥ 0 and f̈_z(t) ≥ 0 for t ∈ [0, +∞), f_x(·) and f_z(·) are convex functions. Using Jensen's inequality, we obtain the stated bounds, where the equalities hold if and only if the similarity functions f_x(·) and f_z(·) are linear or the covariance matrix of q(x_k) is zero.

F. Proof of Proposition 2
Using (27) yields (71). Since the posterior mean vector and covariance matrix of the Gaussian distributed random vector x_k are, respectively, μ_k and Σ_k, the random vector τ_x has a standard Gaussian distribution, i.e., τ_x ∼ N(0, I_n). Employing (71) and τ_x ∼ N(0, I_n), the variance of Y_{1k} is calculated, where the cross-variance term involving S^{-1}_{k|k−1} Σ_k τ_x is zero, since the odd origin moments of τ_x are all zeros. According to the compatibility of matrix and vector norms, we obtain the corresponding bound. Considering that the random vector τ_x has a standard Gaussian distribution, the random variable ‖τ_x‖² is a sum of the squares of n independent Gaussian random variables and has a chi-square distribution with DOF parameter n, i.e., ‖τ_x‖² ∼ χ²(n). According to the properties of the chi-square distribution, the second-order origin moment of ‖τ_x‖² is E[‖τ_x‖⁴] = n(n + 2).
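The chi-square moments invoked in this proof can be checked numerically. This Monte Carlo sketch verifies E[‖τ_x‖²] = n and E[‖τ_x‖⁴] = n(n + 2) for τ_x ∼ N(0, I_n); the dimension and sample count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 4, 200_000

# tau ~ N(0, I_n); its squared norm is chi-square with n degrees of freedom.
tau = rng.standard_normal((samples, n))
t2 = np.sum(tau**2, axis=1)

m1 = t2.mean()        # estimates E[||tau||^2] = n
m2 = (t2**2).mean()   # estimates E[||tau||^4] = n * (n + 2)
```

For n = 4 the targets are 4 and 24; the sample means land close to these values, confirming the moment identities used to bound the variance of Y_{1k}.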

G. Proof of Proposition 3
Using (34) in (17) and (25) yields the stated bound. Choosing q_max and p_max as the maximum eigenvalues of P_{k|k−1}/ξ_min and (ξ_min P^{-1}_{k|k−1} + λ_min H^T_k R^{-1}_k H_k)^{-1}, and selecting r_min and p_min as the minimum eigenvalues of R_k/λ_max and (ξ_max P^{-1}_{k|k−1} + λ_max H^T_k R^{-1}_k H_k)^{-1}, we can obtain (33).
According to the Lagrange mean value theorem, there exists a variable θ ∈ [0, 1] such that the following is fulfilled: