Adaptive Fading Bayesian Unscented Kalman Filter and Smoother for State Estimation of Unmanned Aircraft Systems

This paper proposes an adaptive fading Bayesian unscented Kalman filter (AF-BUKF) and explores its application for state estimation of unmanned aircraft systems (UASs). In the AF-BUKF, the state and noise densities are approximated as finite Gaussian mixtures, in which the mean and covariance for each component are recursively estimated using the UKF. To avoid the prohibitive computational complexity caused by the exponential growth of mixture components, a Gaussian mixture simplification algorithm is employed. Moreover, the AF-BUKF algorithm employs a novel adaptive fading strategy to recursively update the Gaussian components, so that the adverse effect of inexact knowledge of the state and measurement noise covariance can be mitigated. An AF-BUK Smoother (AF-BUKS) is also proposed by extending the AF-BUKF algorithm using the concept of optimal Bayesian smoothing and the Rauch-Tung-Striebel Smoother to improve estimation accuracy. Experimental results on simulated and real UAS data show that the proposed AF-BUKF/S algorithms can achieve better performance compared with the conventional methods. Thus, they can serve as attractive alternative approaches for nonlinear state estimation of UASs and other problems.


I. INTRODUCTION
Small unmanned aircraft systems (UASs) [1]- [4] have gained growing popularity owing to their convenience for mobile missions such as search and rescue [5], monitoring [6], path planning [7], [8], and target tracking [9]- [11]. The control of UASs requires the prior knowledge of the corresponding state parameters which may include the positions, velocity, orientation and orientation rate of the aircraft. Though some of the parameters can be measured using specialized sensors such as Global Positioning System (GPS) receivers and Inertial Measurement Units (IMUs) [12], the signals obtained from such sensors may be contaminated by noise/interference from ambient environments and the bias inherited in the sensors themselves. Therefore, it is important to develop appropriate onboard algorithms for estimating the states of UASs from noisy measurements.
The associate editor coordinating the review of this manuscript and approving it for publication was Prakasam Periasamy .
Kalman filter (KF) and its various extensions have been widely utilized in state estimation since the dynamic model of the system states can be incorporated through the state space model for tracking together with their measurements. Consequently, the estimation performance can be improved by taking advantage of the additional prior information provided by the state dynamic. Moreover, uncertainties in the system model and measurements can be conveniently incorporated via the additive noises in the model.
In the traditional KF [13], the system model, which includes the state equation and measurement equation, is assumed to be linear. Moreover, the modeling error or noise in the state equation and measurement noise in the measurement equation are assumed to be zero mean Gaussian distributed with known covariance matrices. Under these conditions, the KF is the optimal estimator with minimum mean squares errors. However, for more complicated systems, both the state and measurement equations are nonlinear functions. Furthermore, the uncertainties encapsulated in form of modeling and measurement noises may be non-Gaussian. Thus, in this case, there are generally no analytic expressions for the a posteriori densities, and various extensions of the KF have been developed towards this end. For general nonlinear systems with non-Gaussian densities, the state and noise densities can be approximated by various methods such as histogram or Gaussian mixtures (GMs). The nonlinearity in the state and measurement equations are usually tackled through linearization as in the extended Kalman filter (EKF) [11], [14], [15] or by approximating the state density after the transformation using Gaussian distribution through unscented transformation as in the unscented KF (UKF) [16]- [18]. Other approximations include the cubature KF [19] and Gauss-Hermite KF [20].
The Monte Carlo (MC) statistical approach represents the density as a set of particles and it gives rise to the particle filter (PF) technique [21], [22], which approximates the non-Gaussian densities with a set of weighted particles. However, the computational complexity of the MC statistical method will grow exponentially with the dimension of the states and particles. To address this problem, many GM-based PFs [22]- [24] have been proposed but the number of components in the GM will grow rapidly with the number of iterations. To avoid this growing complexity, re-sampling and clustering techniques have been used to simplify the resultant density at each iteration. These developments paved the way for filtering algorithms to solve more challenging problems such as the simultaneous localization and mapping problem [16]. Furthermore, various algorithms have been proposed for KF filtering with heavy-tailed non-Gaussian noises, such as Student's t UKF (STUKF) [25], Student's t cubature KF (STCKF) [26], robust KF [27] and other techniques described therein.
In our previous work [28], a Bayesian Kalman filter (BKF) for linear systems with non-Gaussian noise was proposed. It approximates the non-Gaussian density as a Gaussian mixture and uses a novel Gaussian mixture simplification algorithm to maintain the number of components in the mixture at a reasonable level. This helps to keep the complexity of the resultant BKF at a reasonably low level. The effectiveness of the BKF algorithm has been demonstrated in video object tracking and other applications [29], [30]. The usefulness of the non-Gaussian state and measurement noises is their ability to model sudden change in system dynamics and possible measurement outliers.
In applications such as small unmanned aircraft systems (UASs) [1]- [4], the system models can be highly nonlinear [3], [14] and may subject to various disturbances. Therefore, it would be desirable if the above linear BKF with simplified Gaussian mixtures can be extended to these nonlinear settings. Another major practical issue in KFs is the selection of the covariance matrices of the modeling and measurements noises. This is also true for the Gaussian mixtures in the BKF where the covariance matrices of the GMs need to be determined a priori.
To address these important challenges, this paper aims to extend the BKF concept in the following perspectives: 1) State estimation of nonlinear systems with non-Gaussian distributed noises: as mentioned earlier, the BKF algorithm offers promising performance for state estimation of linear non-Gaussian systems. It is desirable to extend these nice properties to solve nonlinear system identification problems.
2) Adaptive estimation of GM parameters under modeling and time-varying model errors: the BKF algorithm offers a framework for state estimation using a simplified GM model. It works well when the state and measurement noises are perfectly known a priori. However, in practical situations, such quantities may not be known exactly or even be time-varying. Moreover, it is usually tedious to select such parameters by trial and error. In time-varying scenarios, the performance of the BKF with these preset values may also be compromised. Therefore, it is necessary to devise an automatic method for updating adaptively the parameters of the GMs so that such modeling or time-varying errors can be mitigated.
To this end, we propose a novel adaptive fading Bayesian unscented Kalman filter/smoother with simplified Gaussian mixtures (AF-BUKF/S) for state estimation in nonlinear and non-Gaussian systems. Among different nonlinear filtering methods, we have chosen the unscented transform 1 as our starting point due to its outstanding performance in approximating the mean and covariance of the state using a minimal set of sample points, rather than linearizing the nonlinear function as in the EKF. While the nonlinearity is captured using the unscented transform, the state density in the proposed approach is modeled using an efficient GM simplification procedure, which directly simplifies the GMs by minimizing an upper bound of the approximation error between the original and the simplified models. The resultant filter is similar to the Gaussian sum unscented Kalman filter (GSUKF) [31], [32], except that the noise covariances are made adaptive by utilizing the adaptive fading approach and the BKF-SGM method in [28]- [30] is used for mixture reduction. 2 The adaptive fading approach 3 estimates a correction factor for the nominal noise covariance matrices so as to account for possible mismatch in these parameters, as they may not be exactly known or even be time varying. Furthermore, the correction parameters are tracked using BKF-SGM using a random walk model to further reduce its variance and improve its tracking speed. This greatly extends the applicability of the BUKF-SGM and yields improved estimation performance. Finally, the AF-BUKF method is further extended to an AF-BUKS algorithm by employing the optimal Bayesian smoothing formula and the 1 The proposed approach is also applicable to the cubature KF [19] and Gauss-Hermite KF [20]. Here, we shall present the details on UKF. 2 The usefulness of this concept was briefly introduced using simulation in our preliminary work [33]. Here we further improve its adaptability by using the adaptive fading approach and propose its smoother counterpart. Its application to the state estimation of UASs is demonstrated by detailed simulations. 3 Due to page limitations, previous works on adaptive fading estimation are reviewed in the supplementary materials where we show that the proposed adaptive fading approach can offer better performance than the other stateof-the-art algorithms in [34]- [37].
Rauch-Tung-Striebel smoother concept to improve estimation accuracy by using future measurements. The AF-BUKS algorithm is useful when finite delays can be tolerated by utilizing certain amount of future data for improving the current estimation. Experimental results and comparisons on simulated UASs datasets and real flight data show that the proposed AF-BUKF/S algorithm can offer significantly better performance for state estimation of UASs than the conventional approaches.
In summary, our contributions are: • We extended the BKF-SGM to state estimation in nonlinear and non-Gaussian systems using the UKF and the Gaussian mixture representation of the excitation and measurement noises. It utilizes the Gaussian mixture simplification algorithm to avoid the exponential growth in complexity over time and the resultant BUKF-SGM algorithm gives better estimation accuracy than the particle filter (PF) and other GM simplification methods including the pruning and Kullback-Leibler-based merging (KLM) approaches.
• A novel adaptive fading method to update the scale parameter of the Gaussian component, which is proposed for the BUKF-SGM framework to give an AF-BUKF algorithm for better estimation of the GM parameters to account for modeling and time-varying model errors.
• To further improve the estimation accuracy, the AF-BUKF algorithm is extended by means of an optimal Bayesian smoothing concept, yielding an AF-BUKS algorithm which can be used in offline applications or online applications where certain delay can be tolerated. Moreover, it yields better performance than the AF-BUKF. Finally, the application of the proposed algorithms to UASs is studied through extensive simulations on real and synthetic data under both non-Gaussian excitation and measurement noises. The rest of the paper is organized as follows: we briefly review the background and the conventional UKF method in Section II. The proposed AF-BUKF/S algorithms are described in Section III. In Section IV, the performance of the proposed methods for state estimation of UASs are evaluated and compared with the conventional algorithms including EKF and UKF. Section V concludes the paper.

II. BACKGROUND AND UKF
For simplicity, we consider the following autonomous discrete-time nonlinear state-space model: where x k ∈ R L and z k ∈ R m denote respectively the state and measurement vectors at time instant k, f k (·) and h k (·) are respectively known nonlinear state and measurement functions, and ω k and υ k represent respectively the state and measurement noises, which are independent of each other. We first introduce in this section the conventional UKF where ω k and υ k are Gaussian distributed, whereas in Section III, we shall describe the proposed Bayesian UKF with Gaussian Mixture (BUKF-GM) by allowing the noises to be non-Gaussian distributed. The adaptive fading BUKF (AF-BUKF) for adaptively updating the covariance matrices of the GM will also be described. The UKF algorithm belongs to a general class of nonlinear KF filter called Gaussian filtering [36], which assumes that the probability density functions (pdfs) of the state noise ω k and measurement noise υ k to be Gaussian distributed, i.e., where N (u k ;ū k , C k ) denotes a Gaussian distribution with meanū k and covariance C k . Usually, the noises are of zero mean and henceω k = 0 andῡ k = 0. Consequently, using (1a), one gets the predicted state mean and covariance as: Similarly, according to (1b), the predicted measurement mean, covariance as well as the cross-covariance of the state and measurement can be obtained as follows: Finally, one can update the state mean and error covariance as: In the conventional EKF, f k (x k ) and h k (x k ) are locally linearized at x k to obtain a locally linear state space system. Its limitation is that the error resulting from the linearization will usually increase with the step taken. Hence, one would expect that when a large step is taken at region with strong nonlinearity, the error in tracking the states will be large. Rather than linearizing the nonlinear functions as in the EKF, UKF utilizes an unscented transform to approximate the non-Gaussian state density arising from the nonlinearity by a Gaussian distribution through sampling at a set of sigma points. Other methods to approximate the integrals in (3b) and (4b) include cubature KF [19], particle filters [21] and Gauss-Hermite KF [20]. It was shown in [38] that the performances of the cubature KF, UKF and Gauss-Hermite KF are similar if the covariance is estimated by the variational Bayes approach. Therefore, we shall focus on the UKF in this paper.
Given an L-dimensional state vector x k with mean µ k and covariance P k , the UKF algorithm [16], [18] generates two samples, so-called sigma points, along each dimension and evaluates the outputs of the corresponding nonlinearity. By doing so, one can estimate the Gaussian approximation of the state density by estimating empirically the corresponding mean and covariance from the samples at the sigma points. For instance, given the Gaussian approximated state density at time k − 1, p (x k−1 |Z k−1 ), where Z k−1 = {z 1 , . . . , z k−1 } denotes the measurements up to the (k − 1)-th time instant, one can update the predicted density p (x k |x k−1 ) = p f ,w k (x k−1 |Z k−1 ) from the dynamic equation through sampling, where p f ,w k (x k−1 |Z k−1 ) denotes the density after the nonlinear transformation of p (x k−1 |Z k−1 ) with f (·) followed by the addition of ω k ∝ N (ω k ; ω k , Q k ). To this end, samples of p f ,w k (x k−1 |Z k−1 ) at the sigma points are computed and thus p (x k−1 |Z k−1 ) is approximated by a Gaussian distribution empirically. This gives the time update in Algorithm 1. The sampling technique can also be used to update the state density p (x k |Z k ) from p (x k |x k−1 ) via the measurement equation in (1b), where Z k = {z 1 , . . . , z k } denotes the measurements up to time instant k. This yields the measurement update in Algorithm 1.
Next, we shall propose a new Bayesian UKF with GM (BKF-GM) algorithm, which allows the noises to be GMs.

III. THE PROPOSED BAYESIAN UKF/S
We first briefly outline the notation of the BUKF in Section A below, which is also similar to the GSUKF [31], [32]. As mentioned in the previous section, our development is also applicable to the cubature KF [19] and Gauss-Hermite KF [20]. The proposed mixture simplification algorithm will be introduced in Section III-B. The adaptive fading  6: where κ is the scaling parameter, ( √ A) l denotes the l-th column of the matrix square root of A. 7: Propagate each sigma point: 8: χ l,k|k−1 = f k χ l,k−1 , l = 0, · · · , 2L. 9: Approximate the mean and covariance for p (x k |x k−1 ): 10:  18: where l = 1, · · · , L. 19: Estimate the measurement mean: 20: l=0 w l z l,k|k−1 . 22: Estimate the measurement error covariance and crosscovariance: 23: 28: Estimate the Gaussian approximation of p (x k |Z k ) 29: : BUKF will be introduced in Section III-C. The Adaptive Fading BUK Smoother (AF-BUKS) will be discussed in Section III-D.

A. BAYESIAN UKF (BUKF)
As mentioned earlier, the pdfs of the state noise ω k and measurement noise υ k in the proposed BUKF are characterized by a finite set of GMs as: where I and J are respectively the numbers of components in the GMs of ω k and υ k with α i,k and β j,k as the probabilities of the corresponding components in the GM, which satisfy model sudden change of states in the dynamical system such as directions in UASs. On the other hand, (6b) is useful to model measurement noise corrupted by impulsive noise and outliers. Since ω k and υ k are now GMs, the pdf of the state is generally not Gaussian distributed. Hence, it is assumed that the a posteriori pdf of the state at the previous time instant is approximated by the following finite GM with G components: where γ g,k−1 , g = 1, · · · , G, denotes the weight of the g-th component at the k − 1 instant and G g=1 γ g,k−1 = 1. Consequently, the predictive priori density can be written as GI , is used to index all such combinations. The subscript k|k − 1 denotes the quantities in the predicted density of x k given measurements up to time instant k − 1, Z k−1 . Consequently, for zero mean modeling noise, i.e., ω g ,k = 0, (8) can be approximately simplified to where G = GI and γ g ,k|k−1 = γ g,k−1 α i,k . The quantities µ g ,k|k−1 and P g ,k|k−1 , for each g -th component, can be predicted using the time update of the UKF in Algorithm 1.
Similarly, when the current observation at time instant k is available, the a posteriori density at time instant k can be updated as where , and G = G I = GIJ is the number of components after the update. µ g ,k and P g ,k are respectively the mean and covariance of the corresponding Gaussian component obtained with the prior component density N x k ; µ g ,k|k−1 , P g ,k|k−1 and measurement z k and component noise covariance R j,k . This can be obtained with the measurement update of the UKF in Algorithm 1. It can be seen that the number of mixture components will grow from G to G in the prediction step and further from G to G in the subsequent measurement update step. Over time, the number of mixture components and hence the complexity will grow exponentially, which makes the computation prohibitive. Fortunately, this issue can be overcome by using the order reduction approach which we have proposed in [28] for the BKF. More precisely, the GM model in (10) is simplified to the form of (7) so that the number of mixture components is maintained after each pair of time and measurement update steps. For completeness, we shall briefly describe this order reduction process for GM simplification in the next subsection.

B. BUKF WITH SIMPLIFIED GM (BUKF-SGM)
Consider the GM model in (10) with G components: where φ g (x) = N x; µ g , P g is the g -th component and For notational convenience, the subscript k and the measurement matrix Z in (10) are dropped. Our goal is to approximate p(x) as a simplified mixture with fewer components in the form of between the functions φ g (x) and φ g (x), the error of approx- Traditionally, the simplification is done by resampling followed by clustering using the K-means or EM algorithm. However, the complexity depends exponentially on the dimension of the state and hence it will soon become infeasible. As an alternative, simpler algorithms such as pruning and merging can be used for mixture reduction. In the pruning method, GM components with relatively lower weights are discarded directly and then the remaining components are renormalized for the estimation. Rather than simple pruning, the merging method performs mixture reduction by calculating the expected parameters for each merged component across a set of components to be fused. It has been proved that pruning is inferior to the merging method [39]. Moreover, for better determination of the components to be merged, a Kullback-Leibler-based approach is proposed in [40] where mixture reduction is achieved by minimizing an upper bound on the increase of the Kullback-Leibler divergence between the original and reduced mixtures. In this paper, instead of using the pruning and merging methods, we follow our preliminary work in [28] to adopt a two-step algorithm in [41] for model order reduction, which offers more accurate estimates than the Kullback-Leibler approach [40] at the cost of a small increase of computational complexity for mixture reduction.
Due to space limitations, interested readers are referred to Part 1 in the supplementary material for a detailed comparison of the two-step algorithm against the other approaches. The two-step algorithm goes as follows: At the t-th iteration, the component mixture is partitioned g that minimizes the local quantization error of the group is solved using coordinate descent, C , and then updates S The above process is repeated until either the change in total distortion or is less than a certain threshold, or a maximum number of iterations is reached. The main steps of the GM simplification algorithm are summarized in Algorithm 2, where k(r) and k (r) are Gaussian kernels, r p,q gg K is a normalized constant, ε 1 and ε 2 are stopping thresholds for the update of µ g and P g , respectively.
It is seen that by using this GM simplification algorithm, the mixture model in (10) can be approximated by the mixture model in (7) with fewer components, so that the number of components after each iteration can be maintained at a constant level. If the dimension of the state is d, the complexity of the above-mentioned two-step algorithm is O T (L + G) G d 3 , where T and L are respectively the numbers of iterations for the outermost loop and the loop from line 8 to 21 in Algorithm 2. Their values are typically small 4 [41].

C. ADAPTIVE FADING BUKF (AF-BUKF)
As mentioned earlier, the BUKF-SGM algorithm works well when the system dynamics and measurement relationships are known a priori. However, its performance may be degraded when the parameters of the state and measurement noises are not known exactly or time-varying. In order to achieve better state estimation, we propose in this subsection an adaptive fading BUKF algorithm to adapt the noise parameters. We first introduce a novel adaptive fading UKF (AF-UKF) that can adaptively adjust the state and measurement noise covariance in the conventional UKF to better explain the observations. Then, it is extended to the BUKF framework.
Given the residual in the measurement update of UKF (line 29 of Algorithm 1), the measurement residual covariance can be calculated as 4 Preliminary results in [33] has demonstrated the potential of the proposed BUKF approach over other methods in a simulated example. T is around 15, L is around 6 as observed in our experiments.

4:
Mean update 5: for g = 1, 2, . . . , G do 6: µ 0 g = g ∈Sg γ g µ g g ∈Sg γ g ; 7: repeat 9: p = 0, q = 0; 10: repeat 11: repeat 15:  for g = 1, 2, . . . , G do 26: for g = 1, 2, . . . , G do 27: Assign φ g to S g : g = arg min k D φ g , C k ; 31: e (t) = e (t) + D φ g , C k ; 32: end for 33: until e (t) − e (t−1) ≤ 0.001 e (t) or t == t max 34: Output: where R k is the measurement error covariance resulted from the true state error covariance P x,k|k−1 , and R k is the true measurement noise covariance. P zz,k|k−1 , P x,k|k−1 and R k are different from their nominal values denoted by P zz,k|k−1 , P x,k|k−1 and R k , respectively. We aim to find positive scalar correction or fading factors ρ s,k and ρ m,k for P x,k|k−1 and R k , respectively, such that In other words, we hope to compensate for the mis-specified covariance by adaptively scaling the covariance matrices through the fading factors, while the nominal shape of the covariance is still fixed. Since P zz,k|k−1 and P zz,k|k−1 can be approximated by the unscented transformation of P x,k|k−1 and P x,k|k−1 , the ratio of the latter quantities, i.e., ρ s,k in (13a), is approximately equal to that of the former. Therefore, from (13a), we have Substituting (13b) and (14) into (12) gives Instead of matching the whole matrix, we take the trace of the left and right hand sides of (15), yielding the following equation at each time instant k: tr P e k = ρ s,k tr P zz,k|k−1 + ρ m,k tr (R k ) .
Note that the measurement residual covariance P e k can be estimated empirically as: where M specifies the window size for estimating the empirical covariance. Thus, the estimation of the fading factors reduces to a recursive linear regression problem over time as tr P e k ≈ ρ s,k tr P zz,k|k−1 + ρ m,k tr (R k ) , where k = 1, 2 · · · , ρ s,U ≥ ρ s,k ≥ ρ s,L ≥ 0, ρ m,U ≥ ρ m,k ≥ ρ m,L ≥ 0, and ρ s,U and ρ s,L are respectively the upper and lower bounds of ρ s,k (likewise for ρ m,k ). To solve the above linear regression problem recursively, one can use KF with the following state-space model: where x c k = ρ s,k , ρ m,k T stands for the state vector containing the fading factors, y c k = tr( P e k ), F c k = I, H c k = tr P zz,k|k−1 , tr (R k ) , and ξ k and η k are the corresponding state and measurement Gaussian distributed noises, respectively. This allows the fading factors to be estimated recursively using Kalman filtering approaches. To be more specific, if ξ k and η k are modeled by single Gaussian components as ξ k ∼ N (0, Q c k ) and η k ∼ N (0, R c k ), the estimation of fading factors can be achieved by using the conventional Algorithm 3 The Proposed AF-BUKF/S Algorithm 1: AF-BUKF Tracker 2: Initialization: p(x 0 ) = G g=1 γ g,0 N x 0 ; µ g,0 , P g,0 ; 3: for k = 1, 2, · · · do 4:

18:
for j = 1 : J do 19: for g = 1 : GI do 20: if g == g f then 22: Estimate µ g ,k|k−1 and P g ,k|k−1 with µ g ,k−1 , P g ,k−1 and ρ m,k R j,k (UKF Measurement Update); 23: else 24: Estimate µ g ,k|k−1 and P g ,k|k−1 with µ g ,k−1 , P g ,k−1 and R j,k (UKF Measurement Update); 25: end if 26: e k = z k − h k (µ g ,k|k−1 ) − υ; 27: µ g ,k = µ g ,k|k−1 + K g ,k e k ; 28: P g ,k = I − K g ,k H k P g ,k|k−1 ; 29: γ * g ,k = γ g ,k|k−1 β j,k p j z k |µ g ,k ; 30: end for g 31: end for j 32: for g = 1 : G do 39: Get component indices g * and i * ; 40: Compute µ g * ,k+1|k , P g * ,k+1|k and C g * ,k+1 ; 41: K g,(k|N ) = C g * ,k+1 P −1 g * ,(k+1|k) ; 42: µ g,(k|N ) = µ g * ,k +K g,(k|N ) (µ g,(k+1|N ) −µ g * ,(k+1|k) ); 43: P g,(k|N ) = P g * ,k + K g,(k|N ) P g,(k+1|N ) −P g * ,(k+1|k) K T g,(k|N ) ; 44: end for g 45: x (k|N ) = G g=1 γ g,(k|N ) µ g,(k|N ) ; 46: end for k KF. Particularly, the parameters for Q c k and R c k should be set according to the characteristics of systems to be estimated. More precisely, let Q c k = diag(a, b) where diag(a, b) represents the diagonal matrix with diagonal elements a and b. The setting of values for a and b should be as follows: for systems with inaccurate state equations, a relatively larger value shall be assigned to a so that more compensation for the state can be achieved than that for measurement, while b shall be larger than a for systems with inaccurate measurement equations. On the other hand, we also examine the case where ξ k is modeled as GMs as in (6a). This allows faster tracking of the fading factors and the BKF algorithm [28]- [30] we developed previously can be utilized. The advantages of GM modeling for fading factors over the single Gaussian representation are demonstrated with simulated data and UASs examples in Part 2 of the supplementary material. Moreover, in each recursion, the estimated parameters will be projected to the bound constraints above. This is in contrast to other adaptive KFs [34]- [37], [42]- [45]. Due to page limitations, a comparison of the proposed AF-UKF and the state-of-theart approaches in [34]- [37] is given in the supplementary materials. It is shown that the proposed AF-UKF offers a better performance than the conventional methods. It should be noted that the measurement covariance matrix can also be estimated using variational Bayesian approach in [38]. However, its extension to uncertainty in the state noise is not yet available. On the other hand, the proposed AF approach has fewer parameters to be estimated and is applicable to both state and measurement noises. Therefore, we shall focus on the AF approach in this paper.
To incorporate the proposed AF-UKF into the BUKF framework, one can notice that the basic UKF is used to compute each of the G components in (10) before they are simplified to a GM with G components in (7) using Algorithm 1. Ideally, one can simply replace the UKF in the AF-BUKF with the AF-UKF algorithm. However, such adaptation may result in all mixture components of the state and measurement noise converging to a single solution. In case of sudden changes of state or measurement covariance, these changes can no longer be modeled as a GM using the prior distribution. Therefore, we propose to select the most likely component from the GMs and utilize the AF-UKF instead of the conventional UKF to update its distribution. The AF component together with other components will then be approximated as a GM with G components. This yields the AF-BUKF tracker in Algorithm 3. Finally, it is worth noting that our AF and SGM approaches are also applicable to other methods for approximating the integrals in (3) and (4) such as cubature KF, EKF and Gauss-Hermite KF. Due to page limitation, the details are omitted.

D. ADAPTIVE FADING BUK SMOOTHER (AF-BUKS)
In filtering, the objective is to recursively estimate the current state given the observation up to the current time k. Smoothing can yield significantly better estimates than filtering if the estimation of the current state can be delayed so that more data can be utilized in the estimation process [46]. Therefore, it aims to improve the estimation accuracy by utilizing a certain number of future samples in exchange for a delay in obtaining the required smoothed estimates. This is desirable in offline applications or applications which certain delay can be tolerated. Here, we extend the AF-BUKF algorithm by means of the backward smoother, i.e. Rauch-Tung-Striebel Smoother [47] for further improving the estimation accuracy. More specifically, it can be shown in [48] that the desired pdf at the current time k, p(x k |Z N ), is given by the following general Bayesian smoothing formula: where p(x k+1 |Z N ) is the posterior pdf obtained from backward smoothing at time k + 1. Since it has used data up to time N , we have used Z N in the condition of p(x k+1 |Z N ). It can be seen that the pdf obtained in the Bayesian filtering step, p(x k |Z k ), is updated through the integral term in the smoothing step. In the Gaussian case, the integral and the entire computation can be further simplified to the RTS smoother. In preparation for the approximated Bayesian smoothing, we save the GIJ components before the mixture simplification at time k + 1: The number of components in p(x k+1 |Z N ) is chosen to be G, which is the same as that of the simplified density p(x k |Z k ). Therefore, the backward smoothed pdf of p(x k+1 |Z N ) will take the form p x k+1 |Z N = G g=1 γ g,k+1|N N x k+1 ; µ g,k+1|N , P g,k+1|N . To simplify the evaluation of the integral in (20), we assume that there is a correspondence between the GMs in p(x k+1 |Z N ) and p(x k |Z k ). Consequently, the update in (20) can be reduced to the update of individual components of the mixture. Moreover, the RTS smoothing can be applied to each of the components to give p (x k |Z N ) = G g=1 γ g,(k|N ) N x k ; µ g,k|N , P g,k|N , where µ g,k|N and P g,k|N are the mean and covariance of the GMM, respectively. To find the correspondence between the GMM, we propose to find the closest neighbor of N x k+1 ; µ g,(k+1|N ) , P g,k+1|N in p (x k+1 |Z k+1 ), i.e. N x k+1 ; µ g ,k+1 , P g ,k+1 for some g , since it is possible to trace back the component in p (x k |Z k ) which leads to the identified component in p (x k+1 |Z k+1 ). Consequently, RTS algorithm can be applied to these components to update the corresponding component in p (x k |Z k ). Thus, the GM of the smoothed pdf p (x k |Z N ) can be obtained from the updated components by appropriate normalization. Particularly, we propose to use the Bhattacharyya distance [49]: VOLUME 8, 2020 to measure the distance between the g-th component N x k+1 ; µ g,k+1|N , P g,k+1|N and the g -th component in {N x k+1 ; µ g ,k+1 , P g ,k+1 } g =1,...,GIJ , where P = P g,(k+1|N ) +P g ,k+1 2 , and det(·) denotes the operation of matrix determinant. From the identified component in {N x k+1 ; µ g ,k+1 , P g ,k+1 } g =1,...,GIJ , we can identify the corresponding component g * in p(x k |Z k ) = G g=1 N x k ; µ g,k , P g,k , and the corresponding indices i * , and j * which lead to the g -th component above. Then, we can update all the components in p(x k |Z k ) by performing the unscented RTS smoothing steps for each of the components in p(x k |Z N ) as follows: Step 1: Generate the sigma points {χ l,k−1 }, l = 1, · · · , L, with positive weights {w l } from the Gaussian approximation of p(x k−1 ) with mean µ g * ,k and covariance P g * ,k as follows where κ is the scaling parameter, and P g * ,k l denotes the l-th column of the matrix square root of P g * ,k .
Step 2: Propagate each sigma point through the nonlinear system state function to obtain samples of the prediction density p (x k+1 |x k ): Step 3: Approximate the predicted mean µ g * ,k+1|k , the predicted covariance P g * ,k+1|k and the cross-covariance C g * ,k+1|k as follows: Step 4: Compute the smoother gain K g,k|N , the smoothed mean µ g,k|N and covariance P g,k|N : µ g,k|N = µ g * ,k + K g,k|N µ g,k+1|N − µ g * ,k+1|k , (25b) P g,k|N = P g * ,k + K g,k|N P g * ,k+1|N − P g * ,k+1|k K T g,k|N . (25c) As such, the smoothed density p(x k |Z N ) can be obtained by normalizing the weight of the updated components so that they sum to one. The smoothed state (point) estimates is thus the mean of the mixture: The key ideas and steps of the proposed AF-BUKS algorithm are illustrated by the block diagram in Fig. 1. Finally, the proposed AF-BUKS is summarized in Algorithm 3.

IV. APPLICATIONS OF AF-BUKF/S IN UASs AND EXPERIMENTAL RESULTS
In this section, we focus on the application of the proposed AF-BUKF/S algorithms to state estimation in UASs. We shall first describe the state-space model of the UASs used for simulation data generation. Then, the performance of the AF-BUKF/S algorithms are compared with conventional approaches including EKF and UKF on the simulated UAS data. Finally, we present experimental results on real flight data to show the efficiency of the proposed algorithm for state estimation of UASs. Due to space limitation, comparisons with previous works on adaptive fading are presented in the supplementary materials where we have shown that the proposed AF approach outperforms the state-of-the-arts algorithm in [34]- [37]. We have also compared the proposed algorithm with the STUKF/STCKF algorithms using a maneuvering bearingonly tracking example in [26] and the proposed approach is able to offer better accuracy for state estimation of the moving target. The results are also provided in the supplementary materials.

A. STATE-SPACE MODEL OF UASs
The system dynamics of small UASs can be mathematically represented by a continuous-time state-space model aṡ where ω(t) and υ(t) take into account the non-deterministic influences, e.g., state propagation errors and sensor measurement noise that affect the system state dynamics and measurement, respectively. The IMU, GPS and other available measurements can be included into a complete inertial navigation system (INS) for the state estimation. To convert the continuous-time model in (28) to a discrete-time version in (1), the following equation can be used to propagate x k−1 forward t seconds for an estimation of x k : where t is the sampling time.

B. SIMULATED DATA CASE
The UASs simulation model designed in [3] is used for performance comparison between the proposed method and the conventional approaches. There are totally 6 simulated UAS datasets in [3] including Letter E (D1), Letter E with Gusting (D2), Loiter (D3), Loiter with Gusting (D4), Square Spiral (D5) and Square Spiral with Gusting (D6). The gusting data (D2, D4 and D6) take the practical wind flickering problem into account, as small UAS flying at relatively slow speed may be heavily influenced by the movement of the air mass around it. The time duration for the simulation of each data is 400s and the sampling rate is 10Hz.

1) STATE-SPACE MODELS FOR SIMULATION
The nonlinear state and measurement functions are given in (26), as shown at the bottom of the page [3], where the time index is omitted for notation simplicity and the parameters involved are described in Table 1. The continuous time state-space model described above can be linearized and discretized so that an EKF algorithm can be employed for the state estimation. Due to space limitation, interested readers are referred to [3] for more details of the system model and its EKF implementation. In contrast to the EKF approach, the UKF-based algorithms rely on the unscented transformation and avoid the linearization of the state-space model. Hence, the UKF-based methods can mitigate the error caused by linearization and offer better tracking results as we shall show later in Section IV-B.3. Since the number of states is rather large, the complexity of particle filter-based methods is expecteḋ to be extremely large. Therefore, they are not included in the comparison.

2) SIMULATION AND PARAMETER SETTINGS
We first describe the settings for simulation data generation. Specifically, the uncertainties of the simulated system were mimicked by the following two scenarios: C1) Abrupt change of the measurement noise covariance to illustrate sudden changes in environmental condition 5 : the measurement noise was chosen as R k in the first 100s and abruptly increased to 50R k in the subsequent time, where R k = diag(0.05 2 , 2 2 , 2 2 , 2 2 , 1, 1, 1) with diag(a 1 , · · · , a n ) being a diagonal matrix with diagonal elements a 1 , · · · , a n ; C2) Measurement with impulsive noise components: impulsive noises with occurrence probability of 0.1 were incorporated into the measurement data, and the amplitude is randomly generated with zero mean and variance 50R k .
We now describe the settings for various approaches including EKF, UKF, STUKF, STCKF, BUKG-SGM and AF-BUKF/S. For a fair comparison, the state and measurement covariance matrices of the EKF, UKF, STUKF and STCKF algorithms in our experiments were set identical to those in [3], which are denoted with Q k and R k , respectively. For BUKF-SGM, the system state noise Here, we used a component with relatively larger state covariance to model possible uncertainty stemmed from system state noise, which was manually set by trial and error. The system measurement noise υ k in (6b) is p(υ k ) = 2 j=1 β j,k N υ k ; υ j,k , R j,k with β 1,k = 0.9, β 2,k = 0.1, υ 1,k = υ 2,k = 0, R 1,k = R k and R 2,k = 10R k . The component with larger measurement covariance could be used for impulsive noise suppression. Again the value was manually chosen via trial and error. The initial a posteriori pdf of the state was a GM with four components having the same weight.
In order to better show the effectiveness of the covariance compensation that benefits from the incorporated adaptive fading strategy in the proposed AF-BUKF approach, we let one of the GM components, both in state and measurement covariance matrices, be randomly generated with specific variances. Specifically, its settings were the same as those in BUKF-SGM, except that: 1) Q 2,k = r s Q k , where the value of r s was randomly generated with variance 2; 2) R 2,k = r m R k , where the value of r m was randomly generated with variance 10; We will see later in Section IV-B3 that the overall performance of the AF-BUKF approach is better than those of EKF and UKF with carefully chosen noises.
3) Additional parameters for the BKF during the estimation of fading factors in the AF-BUKF/S were as follows: the window size was M = 20; the initial a posteriori pdf of the state vector composed of fading factors was a GM with two components having the same weight; the pdf for the state noise was modeled by a GM with two components, i.e. (10 −4 , 10 4 ) and Q c 2,k = diag(10 −2 , 10 2 ); the pdf for measurement noise was modeled by single zero mean Gaussian as p(η k ) = N η k ; 0, R c k with R c k = 10 −2 ; the upper and lower bounds of the fading factors are set as 10 ≥ f s,k ≥ 1, 100 ≥ f m,k ≥ 1. 4) the delay for smoothing estimation of the AF-BUKS algorithm was set to be 10s (100 samples) which is quite less than the duration for the whole simulation, 400s. This is designed to mimic the case of online tracking when certain delay can be tolerated by the system. Settings for various algorithms were identical in both of the above-mentioned cases.

3) SIMULATION RESULTS
We now compare the performance of the proposed AF-BUKF/S algorithms with the EKF, UKF, STUKF, STCKF and BUKF-SGM approaches in turns through visual and quantitative assessments.
First, the estimated state variables in case C1 are presented on the left hand side of Fig. 2 for visual assessment. The plotted position (East and North locations) and orientation (Roll, Pitch and Yaw angles) were obtained by averaging the results over 100 Monte Carlo trials of various approaches. Due to page limitations, only the results for D4 are presented here for visual comparison. We refer interested readers to Part 4 of the supplementary materials for more results. In the first 100s duration for each data, it can be seen that all algorithms can track the state parameters accurately without any obvious visual artifacts, while the AF-BUKS is the best and the rest are with similar performances. However, for the subsequent tracking after the abrupt change in the measurement covariance, the BUKF-based methods perform significantly better than the other approaches, especially for the estimation of roll and pitch attitude. The improvement is mainly attributed to their non-Gaussian dynamic state modeling of UASs. This allows them to better track the state variables during rapid system changes by modeling the state and measurement noise as GMs. In contrast, the state or measurement variations in the conventional EKF or UKF are modeled by a single Gaussian component, whose noise covariance may not be exactly known a priori especially in time varying and other practical applications. On the other hand, the performances of the STUKF and STCKF algorithms are worse than the EKF and UKF methods. This suggests that the performance of the Student's t-based filters will be degraded substantially when the system noises are not heavy tailed, while the proposed AF-BUKF algorithm does not suffer from such restrictions. Moreover, from the calculated absolute errors (AEs) shown on the right hand side of Fig. 2, it can be seen that the proposed AF-BUKF/S algorithms yield better estimation accuracy than its non-adaptive BUKF-SGM counterpart. This demonstrates the benefits of using the adaptive fading strategy for state estimation with system uncertainties. Furthermore, the AF-BUKS algorithm outperforms the AF-BUKF method on the estimation of various state variables, especially for the position parameter as shown on the top right plot in Fig. 2. We attribute such improvements to the fusion of Bayesian smoothing and adaptive fading strategies. Overall, the proposed AF-BUKF/S methods are particularly useful for UASs state estimation under various uncertainties which may stem from unknown ambient noises such as wind force. Now, the results of case C2 are demonstrated in Fig. 3. Again, due to space limitations, only the results of D6 are presented here for illustration and more results can be found in the supplementary materials. To better visualize the different behaviors of various approaches, the locations of impulsive noises are randomly generated and then fixed at some specific time instants and the results with 100 realizations are shown in Fig. 3 for visual assessment. It can be seen that the BUKF-SGM and AF-BUKF/S algorithms are less sensitive to impulsive noises than the other compared approaches. From the AEs presented in the right column in Fig. 3, we can see that the student's t-based algorithms, STUKF and STCKF, are slightly better than the EKF/UKF approaches, while the BUKF-based methods can offer significantly better performance than the EKF/UKF algorithms. Moreover, it can also be seen that the overall performance of the AF-BUKF algorithm is significantly better than the BUKF-SGM approach. This is due to the fact that, in our simulation, the amplitude of the impulsive noise was randomly generated with variance 50R k which is larger than the largest component in the GMs of the measurement noise. In other words, there is a substantial model mismatch in the measurement noise for the BUKF-SGM method. In contrast, the AF-BUKF algorithm is able to automatically compensate for the mismatch through the adaptive fading strategy. Consequently, the adverse effect of the impulsive noise component can be effectively mitigated yielding better performance. Furthermore, the overall performance provided by the AF-BUKS algorithm is better than that of the AF-BUKF. This shows the significance for extending the AF-BUKF to its smoother version via the proposed unscented RTS smoothing concept.
Next, we compare the algorithms quantitatively in terms of the following time-averaged root mean squared error (RMSE), which is defined as where x l k (m) and x l k (m) denote respectively the l-th element of the true and filtered state x k and x at the m-th Monte Carlo run (m = 100 both in cases C1 and C2 for quantitative analysis). It shall be noted that, unlike in the visual analysis, the locations of impulsive noises were randomly generated in the quantitative assessment. The averaged RMSEs for various algorithms on each dataset are compared in Fig. 4. It is seen that the overall performance of the UKF algorithm is slightly better than that of EKF. It seems that the improvement of the unscented transform is somewhat marginal in this case, except for the estimation of the position in case C1 (see left hand sides in Fig. 4(a)). This is mainly due to the mild nonlinearity of the simulated UASs. It can also be noticed that VOLUME 8, 2020 FIGURE 5. Experimental results of AF-BUKF/S and EKF algorithms: left column shows the estimated attitude parameters including roll, pitch and yaw; right column presents the corresponding errors. The data length is from 7 to 13.5 minutes, which is the same as in [50].
the BUKF-SGM algorithm yields better performance than the EKF, UKF, STUKF, and STCKF approaches. This further confirms the usefulness of non-Gaussian dynamic state modeling of UASs. Moreover, as seen in Fig. 4, the proposed AF-BUKF/S algorithms outperform the BUKF-SGM and they provide much better performance than the other compared methods. This demonstrates the effectiveness of the proposed AF-BUKF/S algorithms for adaptive estimation of nonlinear states with system uncertainties.

C. REAL DATA CASE
To further demonstrate the performance of the proposed AF-BUKF/S algorithms, a benchmark test on real flight data was conducted. We use the real small UAS flight data in [50] to evaluate the performance of the AF-BUKF/S algorithms with comparison against the EKF, UKF, STUKF, STCKF and BUKF-SGM methods. The dataset is obtained from a low-cost IMU/GPS system made by Crossbow. It consists of GPS data from Micro Nav. and IMU data obtained from gyro and accelerometer readings. The ground truth is obtained by MIDG II, which is an integrated IMU/GPS system equipped on the aircraft that can provide higher quality state variables. The specifications of the MIDG II system are 0.4 • (1σ ) in pitch and roll, and 2 • (1σ ) in heading. The dataset was collected from an unmanned aircraft flying in circular patterns over 13.5 minutes. For detailed description of the nonlinear system models of the flight, we refer interested readers to [50] for more information.

1) PARAMETER SETTINGS
For the EKF algorithm, we keep the original settings of the model parameters i.e., the initial state and state error covariance, state noise covariance Q k and measurement noise covariance R k in [50] unchanged as they are carefully assigned. Same quantities are used in the UKF, STUKF, STCKF algorithms for fair comparison. For the BUKF-SGM, the settings are as follows: The system state noise is p(ω k ) = 3 i=1 α i,k N ω k ; ω i,k , Q i,k with α 1,k = 0.5, α 2,k = 0.25, α 3,k = 0.25, ω 1,k = ω 2,k = ω 3,k = 0, Q 1,k = Q k , Q 2,k = 2Q k and Q 2,k = 5Q k . The system measurement noise is p(υ k ) = 2 j=1 β j,k N υ k ; υ j,k , R j,k with β 1,k = 0.5, β 2,k = 0.5, υ 1,k = υ 2,k = 0, R 1,k = R k and R 2,k = 2R k . The initial a posteriori pdf of the state is a GM with four components having the same weight. These parameters were manually selected by trial and error for fair comparison. For the AF-BUKF/S, the settings of the GMs were identical to those in BUKF-SGM. Additional parameters for the BKF during the estimation of fading factors in the AF-BUKF/S are the same as those in Section IV-B.2.

2) EXPERIMENTAL RESULTS
We first compare the performance of the proposed AF-BUKF/S algorithms with that of the EKF algorithm in [50]. Specifically, the attitude parameters of roll, pitch and yaw estimated by these two algorithms are presented in the left hand size of Fig. 5. It can be seen that the overall performance of the proposed AF-BUKF algorithm is better than that of the EKF. We attribute this improvement to the non-Gaussian modeling of the state noise of the UASs in the AF-BUKF. This yields more accurate estimation results than the conventional EKF approach with a single component. Next, the error statistics for the AF-BUKF/S and EKF algorithms are plotted in the right hand size of Fig. 5 for comparison. The Euclidean distance between the estimated state x k and the ground truth x k is used as the error measure It can be seen that the AF-BUKS algorithm outperforms the AF-BUKF method, whole performance is much better than the EKF approach. As mentioned earlier, the AF-BUKS algorithm is particularly useful for offline applications or online applications where certain delay can be tolerated. This provides more freedom to users according to the system requirements.
Finally, in Table 2, the proposed AF-BUKF/S algorithms are compared with other approaches based on the mean Euclidean distance criterion, i.e., the errors of (31) averaged over time: The results indicate that the AF-BUKF/S algorithms are able to obtain better estimation accuracy than other approaches.

V. CONCLUSION
A novel AF-BUKF algorithm has been presented for state estimation of nonlinear and non-Gaussian processes. In contrast to the UKF approach, the AF-BUKF algorithm models the state and noise densities with finite Gaussian mixtures so that the problems caused by system noise uncertainties can be alleviated. Moreover, the AF-BUKF approach is capable of adapting the Gaussian components to further mitigate the adverse effect of inexact knowledge of the state and measurement noise covariance, and thus avoids tedious tuning and offers better adaptation to time-varying environment. In addition, the proposed AF-BUKF algorithm has been further extended to an AF-BUK smoother (AF-BUKS) using the concept of Rauch-Tung-Striebel smoothing for more accurate state estimation. Experimental results on both simulated and real flight data show that the proposed AF-BUKF/S algorithms can better adapt to the uncertainties of system noises and lead to more accurate and reliable estimation of the state of UASs than conventional approaches.