Analytic Minimum Mean-Square Error Bounds in Linear Dynamic Systems With Gaussian Mixture Noise Statistics

For linear dynamic systems with Gaussian noise, the Kalman filter provides the Minimum Mean-Square Error (MMSE) state estimate by tracking the posterior. Similarly, for systems with Gaussian Mixture (GM) noise distributions, a bank of Kalman filters, known as the Gaussian Sum Filter (GSF), can provide the MMSE state estimate. However, the MMSE itself is not analytically tractable. Moreover, the general analytic bounds proposed in the literature are not tractable for GM noise statistics. Hence, in this work, we evaluate the MMSE of linear dynamic systems with GM noise statistics and propose analytic lower and upper bounds for it. We provide two analytic upper bounds, which are the Mean-Square Errors (MSE) of implementable filters, and we show that the tighter upper bound can be selected based on the shape of the GM noise distributions. We also show that for highly multimodal GM noise distributions, the bounds and the MMSE converge. Simulation results support the validity of the proposed bounds and their behavior in the limit.


I. INTRODUCTION
Sequential Bayesian estimation techniques provide probabilistic approximations of the state in dynamic systems. Specifically, these techniques track the posterior and hence provide the MMSE state estimate, as it is the expected value of the posterior [1], [2]. For the special case of linear systems with Gaussian process and measurement noise, the Kalman filter can optimally track the posterior [3], [4], and it provides the MMSE state estimate [5]. However, in many applications the noise distributions are non-Gaussian. Examples include event-based state estimation [6], [7], systems with multipath fading or non-line-of-sight (NLOS) conditions [8], systems using multiple measurement sensors [6], and systems with maneuvering targets [2]. In such cases, the Kalman filter cannot provide an optimal MMSE solution [9], as higher-order statistics are needed to represent the posterior [10]. Hence, approximations must be made to find suboptimal MMSE solutions [4].
For linear dynamic systems with GM noise statistics, a bank of Kalman filters, or the Gaussian Sum Filter (GSF), can be used to track the posterior and provide the MMSE state estimate [2], [27]. However, the MMSE itself is not analytically tractable. Moreover, the general analytic bounds on the MMSE, including the Posterior Cramér-Rao [28], Bobrovsky-Zakai [29], and Weiss-Weinstein [30], [31] bounds, become intractable for GM noise statistics. Thus, in this work we evaluate the MMSE and provide analytic lower and upper bounds that, unlike [32], [33], do not require complex numerical approximations. This is particularly important because the MMSE is the best achievable MSE of any estimator, and low-complexity methods to evaluate its bounds make it possible to design suitable filtering schemes for specific applications. An important application of this work is multiple-model systems, where each model corresponds to a mode in the posterior and can be tracked by a filter. This applies to cases with model uncertainty [34], [35] or to modality selection [36]. This work can be considered an extension of [37], in which the authors propose analytic bounds for static systems with GM noise distributions. In contrast, in this work we consider dynamic systems, in which the state can change over time. Additionally, we propose two implementable filters whose MSEs act as upper bounds for the MMSE. The choice between these two upper bounds depends on the shape of the GM noise parameters. Moreover, since they are the MSEs of implementable filters, they can be used instead of the MMSE filter in applications where the upper bound for the MMSE provides the desired precision. The system and noise models used in this work are the same models that we used in [8], [38]. However, the purpose of this work is to provide analytic bounds on the MSE of the MMSE state estimator, rather than to propose new filtering or reduction schemes.
The rest of this paper is organized as follows: In Section II we describe the system model and the notation used in this paper, as well as the details of the GSF, which is the MMSE filter. In Section III we evaluate the MMSE; we provide its analytic lower bound in Section III-A, and in Sections III-B and III-C we propose two upper bounds corresponding to the MSEs of two implementable filters. Depending on the shape of the GM noise models in the system, the tighter upper bound can be chosen, as shown through simulations in Section IV. Finally, the paper is concluded in Section V.

II. SYSTEM MODEL
Consider an observable, discrete-time linear dynamic system described by the following dynamics and measurement equations:

$$x_k = F_k x_{k-1} + v_k, \qquad (1)$$
$$z_k = H_k x_k + w_k, \qquad (2)$$

where $\{x_k, k \in \mathbb{N}\}$ and $\{z_k, k \in \mathbb{N}\}$ are the state and measurement sequences, respectively. The process noise vectors $\{v_k, k \in \mathbb{N}\}$ are independent random vectors with known pdfs $\{p(v_k), k \in \mathbb{N}\}$. The measurement noise vectors $\{w_k, k \in \mathbb{N}\}$ are also independent random vectors with known pdfs $\{p(w_k), k \in \mathbb{N}\}$, and they are independent of the process noise. The matrices $F_k$ and $H_k$ are known; they define the linear relationships between the current and previous state vectors, and between the current state and measurement vectors, respectively. If we denote the size of the state vector by $n_x$ and the size of the measurement vector by $n_z$, the process noise vector is of size $n_x$ and the measurement noise vector is of size $n_z$. The matrices $F_k$ and $H_k$ are of size $n_x \times n_x$ and $n_z \times n_x$, respectively.

In many signal processing applications, it is required to estimate the current unobservable state of the system, $x_k$, from the available noisy observations, $z_{1:k}$. Since the MMSE estimator of the state is the expected value of the posterior distribution [1], [2], Bayesian filtering techniques estimate the state probabilistically by approximating the posterior distribution, $p(x_k | z_{1:k})$. For the special case of Gaussian process and measurement noise distributions, the Kalman filter optimally tracks the mean and covariance matrix of the Gaussian posterior [3], hence providing the MMSE state estimate. However, for non-Gaussian noise distributions, other approximations must be used.
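To make the model concrete, the following minimal Python sketch simulates (1)-(2) for time-invariant $F$ and $H$. The noise samplers `sample_v` and `sample_w` are hypothetical callables (e.g., the GM samplers of the next paragraph), and all names are illustrative rather than part of the paper's formulation.

```python
import numpy as np

def simulate(F, H, sample_v, sample_w, x0, n_steps, rng):
    """Simulate x_k = F x_{k-1} + v_k, z_k = H x_k + w_k for n_steps steps."""
    xs, zs = [], []
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = F @ x + sample_v(rng)        # dynamics equation (1)
        z = H @ x + sample_w(rng)        # measurement equation (2)
        xs.append(x.copy())
        zs.append(z.copy())
    return np.array(xs), np.array(zs)
```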
Gaussian Mixtures (GM) are an attractive tool for approximating non-Gaussian noise distributions, as they provide asymptotically unbiased estimates [11] with any desired level of accuracy [5]. Moreover, by using the conditionally Gaussian representation of GMs, the MMSE state estimator can be evaluated by a bank of Kalman filters, the Gaussian Sum Filter (GSF) [2], [27] (see Section II-A). Therefore, in this work we assume that the process noise, the measurement noise, and the initial state pdf are represented by GM distributions. Assuming the GM process noise pdf has $C_{v_k}$ components, it can be written as

$$p(v_k) = \sum_{i=1}^{C_{v_k}} \omega_k^i \, \mathcal{N}(v_k; u_k^i, Q_k^i), \qquad (3)$$

where $\omega_k^i$, $u_k^i$, and $Q_k^i$ are the coefficient, mean, and covariance matrix of component $i$, $\sum_i \omega_k^i = 1$, and $\mathcal{N}(x; \mu, \Sigma)$ represents a Gaussian distribution with argument $x$, mean $\mu$, and covariance matrix $\Sigma$. Similarly, the measurement noise distribution can be written as

$$p(w_k) = \sum_{j=1}^{C_{w_k}} \rho_k^j \, \mathcal{N}(w_k; b_k^j, R_k^j), \qquad (4)$$

where $C_{w_k}$ is the number of components of the GM distribution, with coefficients $\rho_k^j$, $1 \le j \le C_{w_k}$, and $\sum_j \rho_k^j = 1$. The mean and covariance matrix of component $j$, $1 \le j \le C_{w_k}$, are $b_k^j$ and $R_k^j$, respectively.
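As a small illustration, a GM density of the form (3)-(4) can be sampled and evaluated as follows. This is a generic sketch (the function names are ours, not the paper's), reused by the filter sketches later in this section.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gm_sample(rng, weights, means, covs):
    """Draw one sample from a Gaussian mixture."""
    i = rng.choice(len(weights), p=weights)   # pick a component
    return rng.multivariate_normal(means[i], covs[i])

def gm_pdf(x, weights, means, covs):
    """Evaluate a Gaussian-mixture pdf at x."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))
```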

A. MMSE FILTER
Using the noise distributions in (3)-(4) for the system defined in (1)-(2), the posterior can be viewed as having multiple models, $M_k^{ij}$, $1 \le i \le C_{v_k}$, $1 \le j \le C_{w_k}$, each corresponding to a mode in the posterior, i.e., a unique combination of the components in (3) and (4). Hence, we can write

$$p(x_k | z_{1:k}) = \sum_{i=1}^{C_{v_k}} \sum_{j=1}^{C_{w_k}} \mu_k^{ij}(z_k) \, p(x_k | z_{1:k}, M_k^{ij}), \qquad (5)$$

which denotes a GM distribution with $C_{v_k} \times C_{w_k}$ components, with coefficients

$$\mu_k^{ij}(z_k) = P(M_k^{ij} | z_{1:k}). \qquad (6)$$

The mode-conditioned posterior, $p(x_k | z_{1:k}, M_k^{ij})$, is a Gaussian distribution with pdf $\mathcal{N}(x_k; \hat{x}_{k|k}^{ij}, P_{k|k}^{ij})$, and its parameters $\hat{x}_{k|k}^{ij}$ and $P_{k|k}^{ij}$ can be evaluated using the mode-matched Kalman filter [2], [3], as follows:

$$\hat{x}_{k|k-1}^{ij} = F_k \hat{x}_{k-1|k-1} + u_k^i,$$
$$P_{k|k-1}^{ij} = F_k P_{k-1|k-1} F_k^T + Q_k^i,$$
$$\hat{z}_{k|k-1}^{ij} = H_k \hat{x}_{k|k-1}^{ij} + b_k^j,$$
$$S_k^{ij} = H_k P_{k|k-1}^{ij} H_k^T + R_k^j,$$
$$W_k^{ij} = P_{k|k-1}^{ij} H_k^T (S_k^{ij})^{-1},$$
$$\hat{x}_{k|k}^{ij} = \hat{x}_{k|k-1}^{ij} + W_k^{ij} (z_k - \hat{z}_{k|k-1}^{ij}), \qquad (13)$$
$$P_{k|k}^{ij} = P_{k|k-1}^{ij} - W_k^{ij} S_k^{ij} (W_k^{ij})^T, \qquad (14)$$

where $\hat{x}_{k-1|k-1}$ and $P_{k-1|k-1}$ are the estimated state and the estimation error covariance matrix at $k-1$, respectively, and $(\cdot)^T$ indicates the transpose of its argument. The matrices $S_k^{ij}$ and $W_k^{ij}$ represent the measurement prediction covariance matrix and the filter gain, respectively. Consequently, the posterior in (5) can be tracked by a bank of $C_{v_k} \times C_{w_k}$ mode-matched Kalman filters [2], [5]. Using Bayes' rule, the coefficients $\mu_k^{ij}(z_k)$ are evaluated as

$$\mu_k^{ij}(z_k) = \frac{\omega_k^i \rho_k^j \, \mathcal{N}(z_k; \hat{z}_{k|k-1}^{ij}, S_k^{ij})}{\sum_{l=1}^{C_{v_k}} \sum_{m=1}^{C_{w_k}} \omega_k^l \rho_k^m \, \mathcal{N}(z_k; \hat{z}_{k|k-1}^{lm}, S_k^{lm})}. \qquad (16)$$

Notice the difference in the conditioning on the measurement history between (6) and (16): the Gaussian terms in (16) are conditioned on $z_{1:k-1}$, whereas (6) is conditioned on $z_{1:k}$.
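The following Python sketch implements one step of this bank of mode-matched Kalman filters, including the weight computation of (16). It assumes time-invariant matrices; the combination into a single estimate follows below, and all names are our own.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gsf_step(x_prev, P_prev, z, F, H, w_v, u, Q, w_w, b, R):
    """One GSF step: a mode-matched Kalman filter for every pair (i, j) of
    process/measurement noise components, plus the mode weights of (16)."""
    weights, states, covs = [], [], []
    for i in range(len(w_v)):
        # time update under process-noise component i
        x_pred = F @ x_prev + u[i]
        P_pred = F @ P_prev @ F.T + Q[i]
        for j in range(len(w_w)):
            # measurement update under measurement-noise component j
            z_pred = H @ x_pred + b[j]
            S = H @ P_pred @ H.T + R[j]
            W = P_pred @ H.T @ np.linalg.inv(S)
            states.append(x_pred + W @ (z - z_pred))
            covs.append(P_pred - W @ S @ W.T)
            weights.append(w_v[i] * w_w[j]
                           * multivariate_normal.pdf(z, z_pred, S))
    mu = np.array(weights)
    mu /= mu.sum()              # normalized mode weights mu_k^{ij}(z_k)
    return mu, states, covs
```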
Since the posterior density in (5) is a Gaussian mixture, given the mode-conditioned state estimates and covariance matrices, the MMSE state estimate is

$$\hat{x}_{k|k} = E_{x_k}\{x_k | z_{1:k}\} = \sum_{i=1}^{C_{v_k}} \sum_{j=1}^{C_{w_k}} \mu_k^{ij}(z_k) \, \hat{x}_{k|k}^{ij}, \qquad (20)$$

and the covariance matrix of the combined estimate is

$$P_{k|k} = \sum_{i=1}^{C_{v_k}} \sum_{j=1}^{C_{w_k}} \mu_k^{ij}(z_k) \left[ P_{k|k}^{ij} + (\hat{x}_{k|k}^{ij} - \hat{x}_{k|k})(\hat{x}_{k|k}^{ij} - \hat{x}_{k|k})^T \right], \qquad (21)$$

where $E_x\{g(x)\}$ is the expected value of the function $g(x)$ with respect to the random vector $x$.
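A sketch of this combination step, continuing the `gsf_step` example above:

```python
import numpy as np

def combine(mu, states, covs):
    """Collapse the GM posterior to its mean (20) and covariance (21)."""
    x_hat = sum(m * x for m, x in zip(mu, states))
    P = sum(m * (P_ij + np.outer(x - x_hat, x - x_hat))
            for m, x, P_ij in zip(mu, states, covs))
    return x_hat, P
```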

III. UNCONDITIONAL MSE EVALUATION AND BOUNDS
Using an approach similar to [37], in this section we evaluate the unconditional MSE of the MMSE filter, denoted by $\sigma_k^2$.

Lemma 1: The unconditional MSE of a sequential state estimator with state estimation error covariance matrix $P_{k|k}$ is equal to

$$\sigma_k^2 = \int \mathrm{tr}(P_{k|k}) \, p(z_{1:k}) \, dz_{1:k}, \qquad (22)$$

where $\mathrm{tr}(\cdot)$ denotes the trace of its argument.

Proof: See Appendix A.

Using Lemma 1 and (21), we can write

$$\sigma_k^2 = \mathrm{tr}(M_k), \qquad (24)$$

where

$$M_k = E_{z_{1:k}}\{P_{k|k}\} = \int P_{k|k} \, p(z_{1:k}) \, dz_{1:k}. \qquad (25)$$

To evaluate the MMSE, we find the matrix $M_k$. Using (21) in (25), we can write $M_k = M_k^1 + M_k^2 - M_k^3$, where

$$M_k^1 = \sum_{ij} E_{z_{1:k}}\{\mu_k^{ij}(z_k)\} \, P_{k|k}^{ij}, \qquad (27)$$
$$M_k^2 = \sum_{ij} E_{z_{1:k}}\{\mu_k^{ij}(z_k) \, \hat{x}_{k|k}^{ij} (\hat{x}_{k|k}^{ij})^T\}, \qquad (28)$$
$$M_k^3 = E_{z_{1:k}}\{\hat{x}_{k|k} \hat{x}_{k|k}^T\}. \qquad (29)$$

Additionally, from (19), we have

$$E_{z_{1:k}}\{\mu_k^{ij}(z_k)\} = P(M_k^{ij}) = \omega_k^i \rho_k^j. \qquad (30)$$

Hence, $M_k^1$ is evaluated as

$$M_k^1 = \sum_{ij} \omega_k^i \rho_k^j \, P_{k|k}^{ij},$$

since $P_{k|k}^{ij}$ is not a function of the measurements (see (14)). Similarly, using (30), (13), and (14), we can write $M_k^2$ in closed form, because $\hat{x}_{k|k}^{ij}$ in (13) is a linear function of $z_k$, whose mode-conditioned distribution is Gaussian. To evaluate $M_k^3$, we use (20) and write

$$M_k^3 = \int \left[ \sum_{ij} \mu_k^{ij}(z_k) \, \hat{x}_{k|k}^{ij} \right] \left[ \sum_{lm} \mu_k^{lm}(z_k) \, \hat{x}_{k|k}^{lm} \right]^T p(z_{1:k}) \, dz_{1:k},$$

which cannot be evaluated in closed form due to the GM pdf in the denominator (the normalizing term of (16) survives in the product of the two weights). Additionally, since it cannot be written as an expected value, numerical methods used for evaluating expectations cannot be applied to this problem. Hence, in this work we propose bounds for the MMSE.
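Lemma 1 also suggests a brute-force check: the unconditional MSE can be approximated by Monte Carlo over measurement histories. A sketch, where `simulate_once` and `run_filter` are hypothetical callables (e.g., built from the `simulate` and `gsf_step`/`combine` sketches above):

```python
import numpy as np

def unconditional_mse(run_filter, simulate_once, n_runs=1000):
    """Approximate sigma_k^2 = E_{z_{1:k}}{tr(P_k|k)} by averaging the trace
    of the final error covariance over independent measurement histories."""
    traces = []
    for _ in range(n_runs):
        z_hist = simulate_once()          # draw one measurement history z_{1:k}
        _, P_kk = run_filter(z_hist)      # filter it; keep the final covariance
        traces.append(np.trace(P_kk))
    return np.mean(traces)
```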

A. LOWER BOUND FOR THE MMSE
The MMSE filter evaluates and tracks the mode-conditioned posteriors and combines them using (20) to provide the state estimate. However, at each iteration only one of the models is active, corresponding to the process and measurement noise components in effect at that iteration. Consequently, if prior information about the active model is available, including only the posterior component corresponding to this model in the estimation of the current state minimizes the total MSE. This is because the Kalman filter is the optimal state estimator when the model parameters are selected correctly [3], [4]. Assuming that components $i$ and $j$ are the active components of the process and measurement noise, respectively, we denote the current active model by $M_k^{ij*}$. The Matched filter state estimate and covariance matrix can then be defined as

$$\hat{x}_{k|k}^* = \hat{x}_{k|k}^{ij*}, \qquad P_{k|k}^* = P_{k|k}^{ij*}.$$

Now, if $\sigma_k^{*2}$ denotes the MSE of the Matched filter, using Lemma 1 we can write $\sigma_k^{*2} = \int \mathrm{tr}(P_{k|k}^*) \, p(z_{1:k}) \, dz_{1:k}$, which, since $P_{k|k}^{ij}$ does not depend on the measurements and model $M_k^{ij}$ is active with probability $\omega_k^i \rho_k^j$, evaluates to

$$\sigma_k^{*2} = \sum_{ij} \omega_k^i \rho_k^j \, \mathrm{tr}(P_{k|k}^{ij}). \qquad (50)$$
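For concreteness, one step of the Matched filter is simply a single Kalman filter matched to the (genie-given) active noise components. A sketch with names of our choosing:

```python
import numpy as np

def matched_filter_step(x_prev, P_prev, z, F, H, u_i, Q_i, b_j, R_j):
    """One Matched-filter step: a mode-matched Kalman filter whose noise
    parameters are those of the active components (i*, j*), assumed known."""
    x_pred = F @ x_prev + u_i                 # time update, active component i*
    P_pred = F @ P_prev @ F.T + Q_i
    S = H @ P_pred @ H.T + R_j                # innovation covariance, active j*
    W = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + W @ (z - H @ x_pred - b_j)
    P_new = P_pred - W @ S @ W.T
    return x_new, P_new
```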
Corollary 1: The unconditional MSE of the Matched filter lower-bounds the MMSE:

$$\sigma_k^{*2} \le \sigma_k^2. \qquad (51)$$

Corollary 1 can be proved using an approach similar to [37]. Specifically, using the definitions in (27)-(29), we have

$$\sigma_k^2 = \mathrm{tr}(M_k^1) + \mathrm{tr}(M_k^2 - M_k^3).$$

Additionally, $\mathrm{tr}(M_k^2 - M_k^3) \ge 0$, since it is the expected value of the spread of the mode-conditioned means about the combined estimate. Hence, we have

$$\sigma_k^2 \ge \mathrm{tr}(M_k^1) = \sigma_k^{*2},$$

which proves (51).

^2 It is worth noting that there could exist another model, $M_k^{lm} \ne M_k^{ij*}$, such that $\|\hat{x}_{k|k}^{lm} - x_k\|_p < \|\hat{x}_{k|k}^{ij*} - x_k\|_p$, where $\|\cdot\|_p$ indicates the p-norm of its argument. Hence, the MSE of the estimator $\hat{x}_{k|k}^{lm}$ could be lower than $\mathrm{tr}(P_{k|k}^*)$. However, since the active model is $M_k^{ij*}$, this estimator can be biased.
B. UPPER BOUND FOR THE MMSE: GSF-R
GSF-R is defined as the filter that, at each iteration, selects and uses the Kalman filter in the GSF with the maximum weight. We also define the index of the selected mode as

$$lm_k^* = \arg\max_{ij} \, \mu_k^{ij}(z_k). \qquad (60)$$

Proposition 1: The unconditional MSE of GSF-R acts as an upper bound for the MMSE.
Proof: At each iteration, GSF-R selects the mode $lm_k^*$ according to (60) and reports $\hat{x}_{k|k}^R = \hat{x}_{k|k}^{lm^*}$, which is a function of the current observation and the parameters of the GM posterior, i.e., the coefficients, component means, and covariance matrices. Now, if $\sigma_k^{R2}$ denotes the MSE of GSF-R, using Lemma 1 we have

$$\sigma_k^{R2} = \int \mathrm{tr}(P_{k|k}^R) \, p(z_{1:k}) \, dz_{1:k},$$

which is equal to the trace of $M_k^R = E_{z_{1:k}}\{P_{k|k}^R\}$. Since GSF-R reports a point estimate other than the posterior mean,^3 its error covariance matrix can be written as

$$P_{k|k}^R = E\left\{(x_k - \hat{x}_{k|k}^{lm^*})(x_k - \hat{x}_{k|k}^{lm^*})^T \mid z_{1:k}\right\} = P_{k|k} + (\hat{x}_{k|k} - \hat{x}_{k|k}^{lm^*})(\hat{x}_{k|k} - \hat{x}_{k|k}^{lm^*})^T. \qquad (65)$$

The first term in (65) can be evaluated using (21). The second term can be evaluated by conditioning on the active model and on the model selected by GSF-R, using (60) and the mode-conditioned Gaussian densities. Thus, using (24), (25), and (65), we have

$$\sigma_k^{R2} = \sigma_k^2 + E_{z_{1:k}}\left\{\|\hat{x}_{k|k} - \hat{x}_{k|k}^{lm^*}\|_2^2\right\} \ge \sigma_k^2.$$

The details of evaluating $M_k^R$ are provided in Appendix B.

^3 Due to the unavailability of prior information about the active mode.
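A sketch of GSF-R itself, reusing the `gsf_step` sketch from Section II-A:

```python
def gsf_r_step(x_prev, P_prev, z, F, H, w_v, u, Q, w_w, b, R):
    """GSF-R: run the full bank as in the GSF, then keep only the
    mode-matched estimate with the largest posterior weight (60)."""
    mu, states, covs = gsf_step(x_prev, P_prev, z, F, H, w_v, u, Q, w_w, b, R)
    best = int(mu.argmax())               # index lm_k^* of the selected mode
    return states[best], covs[best]
```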

C. ALTERNATIVE UPPER BOUND
The Kalman filter provides the MMSE state estimate for Gaussian noise distributions by tracking the sufficient statistics of the Gaussian posterior. For non-Gaussian posteriors, however, the Kalman filter can only provide the Linear MMSE (LMMSE) state estimate [1]. In this section, using an approach similar to [37], we use the LMMSE to find an upper bound for the MMSE.
To apply the Kalman filter to the system defined in (1)-(2), we use the moment-matched Gaussian approximations of the GM noise pdfs in (3)-(4). Hence, we define:

$$\bar{u}_k = \sum_i \omega_k^i u_k^i, \qquad \bar{Q}_k = \sum_i \omega_k^i \left[ Q_k^i + (u_k^i - \bar{u}_k)(u_k^i - \bar{u}_k)^T \right],$$
$$\bar{b}_k = \sum_j \rho_k^j b_k^j, \qquad \bar{R}_k = \sum_j \rho_k^j \left[ R_k^j + (b_k^j - \bar{b}_k)(b_k^j - \bar{b}_k)^T \right].$$

Thus, the LMMSE can be evaluated by running a Kalman filter with a Gaussian process noise with mean $\bar{u}_k$ and covariance matrix $\bar{Q}_k$, and a Gaussian measurement noise with mean $\bar{b}_k$ and covariance matrix $\bar{R}_k$; we denote the resulting error covariance matrix by $P_{k|k}^K$.

The shape of the GM noise parameters can affect the choice of the upper bound. Specifically, increasing the distance between the means of the components in the GM noise distributions increases the distance between the components of the GM posterior distribution. Consequently, $\mathrm{tr}(P_{k|k}^K)$ will have a larger value. This is supported through simulations in Section IV. However, this trend does not hold for $\mathrm{tr}(P_{k|k}^R)$; hence, GSF-R is a better upper bound for highly multimodal GM posteriors (see Proposition 3).
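The moment-matching step is the standard mean and covariance of a mixture; a sketch:

```python
import numpy as np

def moment_match(weights, means, covs):
    """Mean and covariance of a Gaussian mixture, used to build the
    moment-matched Gaussian that the LMMSE Kalman filter runs on."""
    mean = sum(w * m for w, m in zip(weights, means))
    cov = sum(w * (c + np.outer(m - mean, m - mean))
              for w, m, c in zip(weights, means, covs))
    return mean, cov
```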

Proposition 3: Increasing the Mahalanobis distance between the components of the GM posterior decreases the difference between the bounds in (51), (76) and the MMSE. In the limit, the upper bound in (76) and the MMSE converge to the lower bound in (51).
Proof: A larger Mahalanobis distance among the components of the GM posterior leads to an increased difference between the mode likelihoods in (16). In the limit, only one mixand remains, corresponding to the active model:

$$\mu_k^{ij}(z_k) \to \begin{cases} 1, & ij = ij^*, \\ 0, & \text{otherwise}. \end{cases} \qquad (86)$$

Hence, using (86) in (28), (29), we have $M_k^2 = M_k^3$, which yields $\sigma_k^2 = \mathrm{tr}(M_k^1) = \sigma_k^{*2}$. Additionally, since in the limit there is no overlap between the mixands of the GM posterior, from (61) GSF-R always selects the active model, so $\hat{x}_{k|k}^{lm^*} = \hat{x}_{k|k}^{ij^*} = \hat{x}_{k|k}$. Also, from (73), the spread term in $P_{k|k}^R$ vanishes. Hence, in the limit, $\sigma_k^{R2} = \sigma_k^2 = \sigma_k^{*2}$.
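The limiting behavior of the weights is easy to see numerically; the following small demo (illustrative values, not from the paper) shows the posterior weight of the nearest component approaching 1 as the separation between two unit-variance mixands grows:

```python
import numpy as np
from scipy.stats import norm

z = 0.3                                  # one observation near component 0
for d in [0.5, 2.0, 5.0, 10.0]:          # separation between the two means
    liks = np.array([norm.pdf(z, loc=0.0, scale=1.0),
                     norm.pdf(z, loc=d, scale=1.0)])
    mu = liks / liks.sum()               # posterior weights, equal priors
    print(f"separation {d:4.1f}: mu = {mu.round(4)}")
```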

IV. SIMULATION RESULTS
In this section we use synthetically generated data to approximate the MMSE and compare it with the proposed bounds, as well as with the Posterior Cramér-Rao lower bound [28]. In our simulation scenario, we assume a positioning system whose parameters depend on $t_k$, the time interval between the measurements $z_{k-1}$ and $z_k$. In our simulations, we use $t_k = 0.1080$ s for all iterations $k$.
The process model used in our simulations is a random-walk velocity motion model; the process noise $v_k$ in (1) is therefore driven by a univariate random variable. In our simulations, we use the same type of model to generate the process and measurement noise samples. To ensure that our simulation results are not model dependent, we used three different GM models to generate our data:

Model 1: Symmetric distribution with all mixands having the same weight.
Model 2: Symmetric distribution with mixands having not necessarily equal weights.
Model 3: Asymmetric distribution.

The coefficients of the mixands are chosen such that the noise processes are zero mean. In our simulations, we assumed GM distributions with 5 components, each having a variance of 1. The coefficients and the means of the components are given in Table 1, where they are denoted by w and m, respectively. The parameter c in this table is multiplied by the means to change the distance among the components of the measurement noise distribution, i.e., the separation among the components. For the process noise, we use c = 1 to keep the state vector, and hence the error, bounded.^6

Figs. 1-3 show the MMSE versus the Kullback-Leibler (KL) divergence between the GM measurement noise distribution and its moment-matched Gaussian pdf for the three GM models used.^7 In addition to the proposed bounds, we also provide the numerically approximated Posterior Cramér-Rao lower bound (PCRLB) for the MMSE filter, for comparison purposes. The values in these figures are evaluated at an approximately 95% confidence interval with 1000 Monte Carlo runs.

As shown in Section III-A, the lower bound for the MMSE is the MSE of the Matched filter, for which prior information about the noise components in effect is available. Since the unconditional MSE of the Matched filter is only a function of the coefficients and the covariance matrices of the components (see (50)), it remains constant throughout our simulations, as we only vary the separation between components by changing the parameter c. This is not true for the PCRLB. Specifically, the PCRLB follows the trend of the MMSE more closely than our proposed lower bound does. Consequently, for smaller values of the KL divergence (approximately KL < 1), it provides a tighter lower bound on the MMSE than our proposed lower bound. However, for larger values of the KL divergence (approximately KL > 1), the PCRLB is smaller than the proposed lower bound. Moreover, it is worth noting that, unlike the proposed lower bound, the PCRLB cannot be evaluated analytically for the GSF and requires numerical approximations.

^6 As shown in [2], if the filter is not completely consistent with the underlying system model, the estimation error is a function of the state vector. Therefore, if the state vector is unbounded, the filter will be unstable. To avoid this, we kept the variance of the state vector constant and only varied the measurement noise distribution.
^7 The KL divergence between the process noise distribution and its moment-matched Gaussian pdf for Models 1, 2, and 3 is 2.0352, 1.9980, and 2.6255, respectively.
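For reproducibility, the following sketch sets up a plausible version of this scenario. The exact $F_k$, $H_k$, and Table 1 values are not reproduced in this excerpt, so the position-velocity state, position-only measurement, and the component means below are our assumptions, chosen only to be consistent with the textual description.

```python
import numpy as np

t_k = 0.1080                             # sampling interval [s], from the text
F = np.array([[1.0, t_k],
              [0.0, 1.0]])               # assumed random-walk velocity model
H = np.array([[1.0, 0.0]])               # assumed position-only measurement

c = 2.0                                  # separation scale (measurement noise)
m = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical component means
w = np.full(5, 0.2)                      # Model 1: equal weights
means = c * m
means -= w @ means                       # re-center so the mixture is zero mean
variances = np.ones(5)                   # unit variance per component
print("measurement-noise component means:", means)
```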
Unlike for the proposed lower bound, the performance of the upper bounds depends on the separation between the components. Specifically, for smaller values of the KL divergence, i.e., KL < 1.2, KL < 1.5, and KL < 1.6 for Models 1, 2, and 3, respectively, the Kalman filter provides a tighter upper bound (LMMSE) than GSF-R. However, further increasing the separation between the components, and hence the KL divergence, improves the performance of GSF-R as an upper bound. This is because with more separation between the components, the parameter $\gamma_k^{ij,lm^*}(z_k)$ will have a smaller value for $ij \ne lm^*$. Consequently, the part of $P_{k|k}^R$ that depends on the distance between components will be smaller.
It is also worth noting that the MSE of GSF-R follows the same trend as the MMSE versus the KL divergence between the measurement noise and its moment-matched Gaussian pdf. This is because the performance of both filters depends on the separation among the components, since both rely on the weights $\mu_k^{ij}(z_k)$ to estimate the state. Specifically, when the components are very close (approximately KL < 0.1), neither the MMSE filter nor GSF-R can identify the active model accurately, as the likelihoods of all components are very close. Moreover, since the distribution is very close to a single Gaussian, the Kalman filter performs similarly to the MMSE filter. Further increasing the KL divergence increases the MMSE until it reaches its maximum at 0.5 ≤ KL ≤ 1. This is because at lower separation levels, i.e., 0.1 < KL < 0.5, the MMSE filter cannot differentiate between the active model and the other models using the coefficients. Increasing the KL divergence past the peak of the MMSE (KL > 0.5) results in a lower MMSE, since with higher separation the coefficients are better indicators of the active model. This is why the MSE of GSF-R also decreases at these KL divergence values.
As mentioned earlier, to avoid an unbounded error, in this work we used a constant process noise pdf. However, as shown in [8], at very large KL divergences, e.g., KL > 6 for the process and measurement noise, the MMSE converges to the MSE of the Matched filter, i.e., the lower bound (see Proposition 3). On the other hand, since an incorrect selection of the active model can lead to unbounded error [2], GSF-R becomes highly unstable for large values of the KL divergence, especially for Models 1 and 3, which have a more uniform weight distribution among the components of the noise distributions and rely more on the likelihood for the correct selection of the active model.
Based on the simulation results provided in this section, the proposed analytic bounds for the MMSE show good performance, with the maximum distance between the bounds and the MMSE equal to 20 dB for the lower bound and 10 dB for the upper bounds. Moreover, depending on the GM noise distributions in the system under study, the tighter upper bound can be selected. For instance, if the fitted Gaussians represent the noise distributions better than their maximum-weight components do, the LMMSE provides a tighter bound; this is the case at approximately KL ≤ 1.5 in our simulations. This is particularly important since the proposed upper bounds are the MSEs of actual implementable filters. Hence, when the upper bound for the MMSE provides the desired estimation precision, the suitable upper-bound filter (Kalman or GSF-R, depending on the noise models) can be used instead of the MMSE filter to save computational resources.

V. CONCLUSION
In many signal processing applications, we seek to estimate the inherent state of a dynamic system from the available noisy observations. Bayesian filtering techniques can be applied to the noisy observations to estimate the unknown state. For linear dynamic systems with Gaussian noise, the Kalman filter provides the MMSE state estimate. Thus, by approximating non-Gaussian noise distributions with Gaussian Mixtures (GM), a bank of Kalman filters, or the Gaussian Sum Filter (GSF), can be used to find the MMSE state estimate. However, the MMSE itself is not analytically tractable. Specifically, the analytic bounds proposed in the literature, including the Posterior Cramér-Rao [28], Bobrovsky-Zakai [29], and Weiss-Weinstein [30], [31] bounds, do not have a closed form for GM noise distributions. Hence, in this work we first evaluate the MSE of the GSF, and then propose analytically tractable lower and upper bounds for the MMSE.
The proposed lower bound is the MSE of a filter that uses prior information to select the Kalman filter in the GSF corresponding to the active model. The upper bounds, however, are given by filters that do not require prior information and are therefore implementable: we propose two filters whose MSEs act as upper bounds for the MMSE. The first upper bound is the MSE of the Kalman filter, which is the Linear MMSE (LMMSE). The second upper bound is the MSE of GSF-R, which selects and uses only the Kalman filter with the maximum weight in the GSF. The choice between these two filters depends on the GM noise parameters, as shown through simulations. We also show that in the limit, as the Mahalanobis distances between the components of the posterior approach infinity, the MSE of GSF-R (the upper bound) and the MMSE both converge to the lower bound. The fact that the upper-bound filters are implementable is particularly important, since in applications where the upper bound provides the desired precision, the suitable upper-bound filter can be used instead of the MMSE filter, thereby saving computational resources.

APPENDIXES
APPENDIX A
PROOF OF LEMMA 1
Proof: Assume a sequential state estimator that provides an estimate of the unknown state $x_k$ using the available measurements $z_{1:k}$. If we denote the state estimate by $\hat{x}_k$ and the state estimation error covariance matrix by $P_{k|k}$, the unconditional MSE of this estimator, $\sigma_k^2$, can be evaluated as

$$\sigma_k^2 = \int \int \|x_k - \hat{x}_k\|_2^2 \, p(x_k | z_{1:k}) \, p(z_{1:k}) \, dx_k \, dz_{1:k} \qquad (96)$$
$$= \int \mathrm{tr}(P_{k|k}) \, p(z_{1:k}) \, dz_{1:k}.$$