Quantification of mismatch error in randomly switching linear state-space models

Switching Kalman Filters (SKF) are well known for their ability to solve the piecewise linear dynamic system estimation problem using the standard Kalman Filter (KF). Practical SKFs are heuristic, approximate filters that are not guaranteed to have optimal performance and require more computational resources than a single-mode KF. On the other hand, applying a single-mode mismatched KF to a switching linear dynamic system (SLDS) results in erroneous estimation. This paper aims to quantify the average error an SKF can eliminate compared to a mismatched, single-mode KF in a known SLDS before collecting measurements. Mathematical derivations for the first and second moments of the estimators' errors are provided and compared. One can use these derivations to quantify the average performance of filters beforehand and decide which filter to run in operation to have the best performance in terms of estimation error and computational complexity. We further provide simulation results that verify our mathematical derivations.

Parisa Karimi, Zhizhen Zhao, Mark Butala, and Farzad Kamalabadi
Index Terms: Switching Kalman filter, recursive estimation, detection, switching linear dynamic systems, model mismatch.

I. INTRODUCTION
A pervasive problem in virtually all branches of physics and engineering sciences, such as time-dependent tomography and imaging [1], [2], geophysical data assimilation [3], genetics [4], and economic forecasting [5], is the estimation of multi-dimensional state variables of a dynamical system from a collection of indirect, noisy measurements. Given the initial state distribution and a state-space model, state estimates may be recovered using Bayesian inference algorithms [6]. The Kalman filter [7] provides the optimal solution for the linear state-space model with additive Gaussian noise [8].
A computationally efficient generalization of the linear state-space model is obtained by augmenting the linear model with hidden discrete random variables to account for nonlinearities; the resulting model is referred to as a switching linear dynamic system (SLDS) [9]. In this model, random switches occur in the system's dynamic model, and Bayesian estimation may be used to estimate both the discrete modes and continuous states. Finding the exact posterior and optimal filtering in this scenario is computationally intractable [10] since the belief state grows exponentially with time, and practical SKF formulations produce suboptimal estimates (see, e.g., [10]-[12] and the references therein).
This paper aims to analytically quantify the deviation of the well-known SKF estimators from a single-mode mismatched KF, because the performance of the SKF may be significantly better than or close to that of a single-mode KF depending on the particulars of the switching distributions. This study is essential because an SKF requires additional computation compared to a single-mode KF, and this computational burden can be significant or even intractable if the state dimension is large. The result developed in this work ensures that the SKF is used in practice only if its estimates improve considerably on those of a mismatched KF (the metric by which an improvement is deemed significant is application-based). Our goal is achieved by studying the estimation errors as a function of the switching model and filter parameters, as well as the initial conditions. The performance of mismatched KFs in single-mode linear dynamic systems has been studied previously [13]-[17]. In the case of switching dynamics, [18] explores the conditions under which instantaneous mode detection succeeds or fails based on the statistics of the residuals between the predicted and the collected measurements. Also, Zhang et al. [19]-[22] study the convergence of a mode-based KF and establish the conditions under which the steady-state bias term converges to zero in a switching-mode dynamic system. However, to the best of our knowledge, estimation of the transient evolution of the error in an SLDS prior to running the experiment and collecting measurements has not been investigated. This paper provides a quantitative measure of how effective an SKF is in an SLDS in terms of mean squared error (MSE) compared to a mismatched, single-mode KF before collecting the measurements.
The result informs the decision of whether or not to use an SKF in a particular scenario and how to choose the filter with the best performance in terms of computational considerations and estimation accuracy.
The SKF algorithm's performance in terms of MSE is a function of 1) the detection rate at each time step (and the detection algorithm), and 2) the mismatch bias whenever the algorithm detects the wrong mode. Both of these quantities are functions of the switching distributions and transition probabilities. Due to space restrictions, we assume in this work that an estimate or an upper bound of the detection rate is known and calculate the MSE accordingly. Approximation of the detection rate as a function of the problem specification using approximate metrics [23]-[26] will be studied in a later publication.
The remainder of the paper is organized as follows. The common notation used throughout the paper is given in Section II. The SLDS signal model and KF/SKF formulations are reviewed in Sections III and IV, respectively. Section V derives the estimation error for mismatched KFs and SKFs in an SLDS, and Section VI discusses practical implementations. Simulations verify the derivations in Section VII, and conclusions are presented in Section VIII.

arXiv:2012.04542v1 [eess.SP] 8 Dec 2020

II. NOTATION
• $x \sim \mathcal{N}(m, C)$: the random vector $x$ has a Gaussian distribution with mean $m$ and covariance $C$.
• $E$, $C$, $p$ refer to the expectation, covariance, and probability operators, respectively, and $I$ is the identity matrix.

III. SIGNAL MODEL
The state-space model for an SLDS may be defined as
$$x_n = A_n x_{n-1} + \nu_n, \tag{1}$$
$$y_n = H_n x_n + \omega_n, \tag{2}$$
where the subscript $n$ is the time step, $x_n$ is the hidden state variable to be estimated, and the given model parameters are the measurement vector $y_n$, the $z \times z$ evolution matrix $A_n \in \{A^{(S_n)}\}$, $S_n = 1, \ldots, r$ ($S_n$ is the hidden random variable determining the mode of the system, to be detected), the $m \times z$ measurement matrix $H_n$, and the covariance matrices $Q_n = E[\nu_n \nu_n^T]$, $R_n = E[\omega_n \omega_n^T]$, and $E[\omega_n \nu_n^T] = 0$ ($m$ is the number of measurements, $z$ is the state dimension, and $r$ is the number of modes the system may switch between).
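To make the signal model concrete, the following is a minimal simulation sketch of a two-mode SLDS. It assumes zero-mean Gaussian noise, time-invariant $H$, $Q$, and $R$, and a Markov chain on the modes with a row-stochastic transition matrix. The function name `simulate_slds`, the 0-based mode indices, and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_slds(A_modes, H, Q, R, P_trans, prior, x0, n_steps, rng):
    """Draw one realization of the modes S_n, states x_n and measurements y_n."""
    r = len(A_modes)
    z, m = x0.shape[0], H.shape[0]
    modes = np.empty(n_steps, dtype=int)
    xs = np.empty((n_steps, z))
    ys = np.empty((n_steps, m))
    x = x0
    for n in range(n_steps):
        # Mode: drawn from the prior at the first step, then from the
        # transition-matrix row of the previous mode (Markov assumption).
        p = prior if n == 0 else P_trans[modes[n - 1]]
        modes[n] = rng.choice(r, p=p)
        # State evolution: x_n = A^(S_n) x_{n-1} + nu_n
        x = A_modes[modes[n]] @ x + rng.multivariate_normal(np.zeros(z), Q)
        # Measurement: y_n = H x_n + omega_n
        ys[n] = H @ x + rng.multivariate_normal(np.zeros(m), R)
        xs[n] = x
    return modes, xs, ys

# Hypothetical two-mode, two-state example (values chosen for illustration only).
rng = np.random.default_rng(0)
A_modes = [np.array([[1.0, 0.1], [0.0, 1.0]]),
           np.array([[0.9, 0.0], [0.0, 0.9]])]
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = 0.1 * np.eye(1)
P_trans = np.array([[0.9, 0.1], [0.2, 0.8]])
prior = np.array([0.5, 0.5])
modes, xs, ys = simulate_slds(A_modes, H, Q, R, P_trans, prior,
                              np.zeros(2), 50, rng)
```

Modes are indexed from 0 in the code, whereas the text indexes them from 1 to $r$.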

IV. KALMAN FILTER/SWITCHING KALMAN FILTER
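The KF is an optimal estimator for linear dynamic systems. Let $y_1^n = \{y_1, \ldots, y_n\}$ denote the measurements up to time $n$, $\hat{x}_{n|n} = E[x_n | y_1^n]$, and $P_{n|n} = C(x_n | y_1^n)$. The "Filter" operator is defined as
$$(\hat{x}_{n|n}, P_{n|n}) = \mathrm{Filter}(\hat{x}_{n-1|n-1}, P_{n-1|n-1}, y_n, A_n, Q_n, H_n, R_n), \tag{3}$$
which involves the repeated application of a time update step
$$\hat{x}_{n|n-1} = A_n \hat{x}_{n-1|n-1}, \tag{4}$$
$$P_{n|n-1} = A_n P_{n-1|n-1} A_n^T + Q_n, \tag{5}$$
and a measurement update step
$$K_n = P_{n|n-1} H_n^T (H_n P_{n|n-1} H_n^T + R_n)^{-1}, \tag{6}$$
$$\hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n (y_n - H_n \hat{x}_{n|n-1}), \tag{7}$$
$$P_{n|n} = (I - K_n H_n) P_{n|n-1}. \tag{8}$$
The application of a single-mode KF to the general SLDS of (1)-(2) results in erroneous estimates. The well-known SKF formulation detects the switching mode and its corresponding model parameters $(A_n, Q_n)$ at each time step, and estimates the state variables accordingly. Upon perfect detection of the modes, one could obtain optimal estimates of the state variable $x_n$ in terms of both MAP and MSE metrics. Due to the exponential growth of the posterior in optimal SKFs [10], several approximate SKF algorithms have been proposed (e.g., [10], [12]). Due to space restrictions, we assume that an estimate of the detection rate is known, and the details of the approximate SKF algorithms are not presented here.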
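As an illustration of the time update and measurement update steps, the sketch below implements one iteration of the textbook Kalman filter. The function name `kf_step` and the scalar toy numbers are assumptions made for this example; the code is not the authors' implementation.

```python
import numpy as np

def kf_step(x_prev, P_prev, y, A, Q, H, R):
    """One KF iteration: time update followed by measurement update."""
    # Time update: propagate the estimate and its covariance through the model.
    x_pred = A @ x_prev
    P_pred = A @ P_prev @ A.T + Q
    # Measurement update: gain, corrected estimate, corrected covariance.
    S = H @ P_pred @ H.T + R                  # innovation covariance
    # K = P_pred H^T S^{-1}; valid as written because P_pred and S are symmetric.
    K = np.linalg.solve(S, H @ P_pred).T
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_new, P_new, K

# Scalar toy case (hypothetical numbers): unit prior variance, unit
# measurement variance, no process noise, measurement y = 2.
x_new, P_new, K = kf_step(np.array([0.0]), np.array([[1.0]]), np.array([2.0]),
                          np.array([[1.0]]), np.array([[0.0]]),
                          np.array([[1.0]]), np.array([[1.0]]))
```

In this toy case the gain is 0.5, so the estimate moves halfway from the prediction (0) to the measurement (2) and the posterior variance is halved.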

V. DERIVATION OF THE MEAN SQUARED ERROR
To quantify the performance of each filter, we first study the estimation error imposed by applying a mismatched model to a single-mode linear dynamic model in Section V-A. Then, in Sections V-B and V-C, the effect of applying a single-mode mismatched KF and SKF to an SLDS is quantified using prior and transition probabilities of the SLDS.

A. Mismatched Kalman filter error
Instead of the state-space equations (1)-(2) with the correct dynamic evolution model $(A_n, Q_n) = (A, Q)$ at time $n$, consider a mismatched model using $(A^d_n, Q^d_n) = (A^d, Q^d)$ (the superscript $d$ refers to the mismatched model). It is well known that KF estimates are unbiased and optimal, but this is true only when the correct model is used. To determine how far the estimates are from the correct-model estimates, we study the error term $e_n = x_n - \hat{x}_{n|n}$, where $x_n$ is the ground truth state variable, $\hat{x}_{n|n} = \hat{x}^d_{n|n}$ is the estimate associated with the mismatched model, and $K^d_n = K^d$ is the Kalman gain at time $n$ obtained from equations (4)-(8) using the mismatched dynamic model $(A^d, Q^d)$.
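The true state and the mismatched estimate evolve as
$$x_n = A x_{n-1} + \nu_n,$$
$$\hat{x}^d_{n|n} = B^d_n A^d \hat{x}^d_{n-1|n-1} + K^d y_n,$$
so that the error satisfies
$$e_n = B^d_n A^d e_{n-1} + B^d_n (A - A^d) x_{n-1} + B^d_n \nu_n - K^d \omega_n, \tag{11}$$
where $B^d_n = I - K^d H_n$. This defines a new state-space model where the noise terms are white Gaussian (but note that the measurement noise $\omega_n$ and the state evolution noise $B^d_n \nu_n - K^d \omega_n$ are dependent) and the input is $B^d_n (A - A^d) x_{n-1}$. This error term may be studied in terms of its mean and covariance. The mean is given by
$$E[e_n] = B^d_n A^d E[e_{n-1}] + B^d_n (A - A^d) E[x_{n-1}], \qquad E[x_n] = A\,E[x_{n-1}].$$
To calculate $C(e_n)$, we first need to calculate the following covariance terms:
$$C(x_n) = A\,C(x_{n-1})\,A^T + Q, \qquad C(e_n, x_n) = C(x_n) - C(\hat{x}^d_{n|n}, x_n),$$
where $Q$ and $R$ denote the covariances of $\nu_n$ and $\omega_n$, respectively. We denote $C(\hat{x}^d_{n|n}, x_n)$ by $u_n$ and obtain the following recursion,
$$u_n = B^d_n A^d u_{n-1} A^T + K^d H_n C(x_n),$$
with $u_0 = C(\hat{x}_{0|0}, \hat{x}_{0|0} + \nu_0) = 0$. Therefore, using (11) and calculating the covariance, we have
$$C(e_n) = B^d_n \big[ A^d C(e_{n-1}) (A^d)^T + (A - A^d) C(x_{n-1}) (A - A^d)^T + A^d (C(x_{n-1}) - u_{n-1}) (A - A^d)^T + (A - A^d) (C(x_{n-1}) - u_{n-1})^T (A^d)^T + Q \big] (B^d_n)^T + K^d R (K^d)^T.$$
All the above variables can be calculated recursively.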
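The recursion for the error moments can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the covariance formulas are derived here from the error recursion (11) by standard covariance algebra, the model is time-invariant, and the filter is initialized at the true prior mean with the true prior covariance `P0`, so that $E[e_0] = 0$, $C(e_0) = C(x_0) = P_0$, and $u_0 = 0$. The function name `mismatched_kf_error_moments` and all numbers are hypothetical.

```python
import numpy as np

def mismatched_kf_error_moments(A, A_d, Q, Q_d, H, R, x0_mean, P0, n_steps):
    """Return [(E[e_n], C(e_n))] for a KF that uses (A_d, Q_d) while the truth uses (A, Q)."""
    z = A.shape[0]
    I = np.eye(z)
    mx, Cx = x0_mean.copy(), P0.copy()   # E[x_0], C(x_0)
    me, Ce = np.zeros(z), P0.copy()      # E[e_0], C(e_0)
    u = np.zeros((z, z))                 # u_0 = C(xhat_0|0, x_0) = 0
    P = P0.copy()                        # covariance tracked by the mismatched filter
    out = []
    for _ in range(n_steps):
        # Kalman gain computed from the mismatched model.
        P_pred = A_d @ P @ A_d.T + Q_d
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        B = I - K @ H
        P = B @ P_pred
        F = B @ A_d          # multiplies e_{n-1} in the error recursion
        G = B @ (A - A_d)    # multiplies x_{n-1} (the mismatch input)
        Cex = Cx - u         # C(e_{n-1}, x_{n-1})
        me_new = F @ me + G @ mx
        Ce_new = (F @ Ce @ F.T + G @ Cx @ G.T
                  + F @ Cex @ G.T + G @ Cex.T @ F.T
                  + B @ Q @ B.T + K @ R @ K.T)
        Cx_new = A @ Cx @ A.T + Q
        u = F @ u @ A.T + K @ H @ Cx_new     # u_n recursion
        mx = A @ mx
        me, Ce, Cx = me_new, Ce_new, Cx_new
        out.append((me.copy(), Ce.copy()))
    return out

# Hypothetical example: the filter assumes a decaying model, the truth drifts.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
A_d = np.array([[0.9, 0.0], [0.0, 0.9]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = 0.1 * np.eye(1)
x0_mean, P0 = np.array([1.0, 1.0]), np.eye(2)
moments = mismatched_kf_error_moments(A, A_d, Q, Q, H, R, x0_mean, P0, 20)
bias, cov = moments[-1]
mse = float(bias @ bias + np.trace(cov))
```

A useful sanity check on such a sketch is the matched case: with `A_d = A` and `Q_d = Q`, the bias stays at zero and $C(e_n)$ coincides with the filter covariance $P_{n|n}$.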

B. Single-mode Kalman filter error in an SLDS
In this section, an arbitrary single-mode KF is applied to an SLDS, and the MSE is calculated. Let $l_n$ refer to a trajectory from the set of all $r^n$ possible trajectories of discrete modes, where $r$ is the number of possible modes at each time step and $n$ is the time step. Also, let $e_n^{l_n}$ be the conditional error of the KF given trajectory $l_n$, so that its mean and covariance can be calculated recursively based on Section V-A given the trajectory. Assuming $l_n = [l_{n-1}, i]$ s.t. $i \in \{1, \ldots, r\}$ and $n > 1$, the error at each time is given by
$$e_n = \sum_{l_n} \delta_{l_n} e_n^{l_n},$$
$$e_n^{l_n} = B_n A^d e_{n-1}^{l_{n-1}} + B_n (A^{(i)} - A^d) x_{n-1}^{l_{n-1}} + B_n \nu_n - K_n \omega_n,$$
where $x_{n-1}^{l_{n-1}}$ and $e_{n-1}^{l_{n-1}}$ refer to the ground truth state variable and error for trajectory $l_{n-1}$, $A^d$ is the evolution matrix used by the single-mode KF, $K_n$ is the Kalman gain for the single-mode KF at time $n$, $B_n = I - K_n H_n$, and $\delta_{l_n}$ equals one when trajectory $l_n$ occurs and is zero otherwise. The expectation of the error at time $n$ over all possible trajectories is
$$E[e_n] = \sum_{l_n} \pi_{l_n} E[e_n^{l_n}],$$
where $\pi_{l_n}$ is the probability of trajectory $l_n$ and can be calculated based on the given SLDS transition probabilities and priors. Similarly, we compute $E[e_n e_n^T] = \sum_{l_n} \pi_{l_n} E[e_n^{l_n} (e_n^{l_n})^T]$ and the covariance $C(e_n) = E[e_n e_n^T] - E[e_n] E[e_n]^T$. This formulation can be used to calculate the performance of an arbitrary single-mode KF in an SLDS.
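The averaging over trajectories can be sketched as follows. The first function enumerates all $r^n$ mode trajectories and their probabilities from a prior and a row-stochastic transition matrix; the second combines per-trajectory conditional means and covariances into the overall mean and covariance, using $E[e e^T] = C + m m^T$ for each trajectory. The function names and the toy numbers are assumptions made for illustration, and the brute-force enumeration is only practical for small $n$.

```python
import itertools
import numpy as np

def trajectory_probs(prior, P_trans, n):
    """Probability of every length-n mode trajectory (0-based mode indices)."""
    r = len(prior)
    probs = {}
    for traj in itertools.product(range(r), repeat=n):
        p = prior[traj[0]]
        for a, b in zip(traj[:-1], traj[1:]):
            p *= P_trans[a, b]          # Markov transition a -> b
        probs[traj] = p
    return probs

def mixture_moments(weights, means, covs):
    """Mean and covariance of a mixture from per-component moments."""
    mean = sum(w * m for w, m in zip(weights, means))
    # Second moment of each component is C + m m^T.
    second = sum(w * (C + np.outer(m, m)) for w, m, C in zip(weights, means, covs))
    return mean, second - np.outer(mean, mean)

# Hypothetical bi-modal chain over three time steps.
probs = trajectory_probs(np.array([0.5, 0.5]),
                         np.array([[0.9, 0.1], [0.2, 0.8]]), 3)

# Toy conditional moments: two equally likely trajectories with opposite bias.
mean, cov = mixture_moments([0.5, 0.5],
                            [np.array([1.0]), np.array([-1.0])],
                            [np.array([[1.0]]), np.array([[1.0]])])
```

In the toy mixture the biases cancel in the mean but reappear in the covariance, which is why the second moment, not the per-trajectory covariance alone, has to be averaged.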

C. Switching Kalman filter error in SLDS
We now calculate the MSE when an SKF algorithm is applied to a known SLDS. Let $l_n$ and $q_n$ refer to the trajectory that occurs (the true trajectory) and the trajectory that is detected using the SKF algorithm in an SLDS, respectively, each taking values in the set of $r^n$ possible trajectories. Let $e_n^{l_n;q_n}$ be the conditional error based on these trajectories, which can be calculated recursively based on results from Section V-A given the trajectories $l_n$ and $q_n$:
$$e_n = \sum_{l_n} \sum_{q_n} \delta_{l_n;q_n} e_n^{l_n;q_n},$$
where $\delta_{l_n;q_n}$ equals one when $l_n$ occurs and $q_n$ is detected, and zero otherwise. The mean of this random process is calculated as
$$E[e_n] = \sum_{l_n} \sum_{q_n} \pi_{l_n,q_n} E[e_n^{l_n;q_n}],$$
where $\pi_{l_n,q_n}$ is the probability that trajectory $l_n$ occurs and trajectory $q_n$ is detected, which may also be calculated recursively. Similarly, we compute
$$E[e_n e_n^T] = \sum_{l_n} \sum_{q_n} \pi_{l_n,q_n} E[e_n^{l_n;q_n} (e_n^{l_n;q_n})^T].$$

VI. DISCUSSION
Some challenges in calculating the derived statistics are discussed below.
1) Applying the formulation to multi-modal cases enables making the optimal decision on which modes to keep in an SKF framework, but at a huge computational cost due to having $r^n$ trajectories at time $n$ ($r$ is the number of modes). A suboptimal, feasible solution calculates the marginal transition probability between each pair of modes and applies the formulation to each pair. In this framework, the collection of switching dynamic systems is represented using a graph network where each node is a dynamic system mode, as shown in Fig. 1. For each pair of nodes: if, according to the proposed formulation, using an SKF for the pair does not provide a significant improvement over a single-mode KF, merge the two nodes; otherwise, keep both. As a result, a multi-modal problem is divided into multiple bi-modal problems.
2) The recursive calculation of (14) and (17) for all possible trajectories is practically infeasible as $n$ becomes large. We propose a solution to this problem under two scenarios:
(i) If the transition probabilities are equal, the bias at time $n$ based on (18) for a bi-modal system can be derived recursively using the same notation as in Section V-C. In this recursion, $\delta_{TD}[n]$ is the indicator (Kronecker delta) that equals one when the correct mode is detected at time $n$, and $\delta_{FD}[n] = 1 - \delta_{TD}[n]$ (in this study, the true detection rate is approximated by a constant rate for both modes at each time step in order to make the computations practically feasible); $\delta_{[l_{n-1},i];q_{n-1}} = \delta_{i|l_{n-1}} \delta_{l_{n-1};q_{n-1}}$ due to the Markov property of the transitions; $\delta_{i|l_{n-1}}$ is the indicator that equals one when mode $i$ occurs at time $n$ given that trajectory $l_{n-1}$ occurred at time $n-1$; and $L_{TD,i}$ and $L_{FD,i}$ refer to the linear functions based on (11) when true detection and false detection occur, respectively. Three facts allow us to conclude that knowledge of the mean and covariance of $x_{n-1}$ and $e_{n-1}$, together with the noise statistics, is sufficient to recursively calculate the MSE: the modes $i = 1, 2$ are independent (so the covariance between the terms of the summation over the current mode $i$ is zero); $\delta_{i|l_{n-1}} = \delta$ for all $i$ because the transitions are equi-probable; and the covariance of a linear combination of variables follows from the standard covariance properties. In this case, propagating all trajectories is not required to calculate the statistics at each time step. The same reasoning also applies to the formulation of a single-mode KF in an SLDS, as a special case of this general scenario.
(ii) When the transition probabilities are not equal, the probability of each trajectory is a deterministic function of the transition matrix and the initial probabilities. Therefore, by keeping the $K$ trajectories with the largest probabilities, which sum to $P_c$, and ignoring the rest, the calculated MSE is ensured to be in an $\epsilon$-neighborhood of the correct MSE ($\epsilon$ is a function of $P_c$ and the dynamic system's parameters). The number of trajectories that must be kept increases as time increases and as the transition probabilities approach equality (0.5 in a bi-modal system).
Mode detection in an SKF has its largest error when all transition probabilities are equal (0.5 in a bi-modal system), since the uncertainty is maximized in this case. Therefore, studying the performance of an SKF using equal transition probabilities between modes is a computationally feasible and robust metric for comparing the performance of an SKF with that of a single-mode Kalman filter.
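The trajectory pruning described in scenario (ii) can be sketched as follows: trajectories are ranked by probability and kept until their cumulative probability reaches $P_c$. The function name `top_trajectories` and the transition matrices are hypothetical, and the sketch enumerates all $r^n$ trajectories by brute force for clarity, which a practical implementation would need to avoid for large $n$.

```python
import itertools
import numpy as np

def top_trajectories(prior, P_trans, n, P_c):
    """Keep the most probable length-n trajectories until their mass reaches P_c."""
    r = len(prior)
    scored = []
    for traj in itertools.product(range(r), repeat=n):
        steps = [P_trans[a, b] for a, b in zip(traj[:-1], traj[1:])]
        scored.append((float(prior[traj[0]] * np.prod(steps)), traj))
    scored.sort(reverse=True)            # largest probability first
    kept, total = [], 0.0
    for p, traj in scored:
        kept.append((traj, p))
        total += p
        if total >= P_c:
            break
    return kept, total

prior = np.array([0.5, 0.5])
sticky = np.array([[0.95, 0.05], [0.05, 0.95]])   # modes rarely switch
equal = np.array([[0.5, 0.5], [0.5, 0.5]])        # equi-probable transitions

kept_sticky, mass_sticky = top_trajectories(prior, sticky, 8, 0.9)
kept_equal, mass_equal = top_trajectories(prior, equal, 8, 0.9)
```

With equi-probable transitions every one of the 256 trajectories has the same probability, so 231 of them are needed to reach a mass of 0.9, whereas the sticky chain reaches the same mass with far fewer. This mirrors the observation that the number of retained trajectories grows as the transition probabilities approach equality.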
VII. SIMULATIONS
A bi-modal 4D state-space model is simulated via Monte Carlo (MC) simulations, and the error statistics calculated using analytic derivations and MC simulations are compared to verify the proposed formulation. An approximation of the detection rate is assumed to be given in these simulations, and the SKF gain at time $n$ when mode $i$ occurs is approximated by the gain of the KF with mode $i$ at time $n$. Let the state-space equations be as presented in (1). The single-mode KFs using the models of modes 1 and 2, as well as the "average KF" with $A_n = \pi_1 A^{(1)} + \pi_2 A^{(2)}$, where $\pi_i$ is the probability of mode $i$ at each time $n$ (calculated based on the prior and transition probabilities of the discrete mode), are used for estimation. Intuitively, if the switching distributions are close to each other (e.g., in the KL divergence sense), the average filter's estimates are close to the optimal solution. Alternatively, if the distributions are far from each other (in the KL divergence sense), the average KF's estimates are poor.
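The evolution matrix of the "average KF" can be sketched as below, assuming the marginal mode probabilities are propagated through a row-stochastic transition matrix starting from the prior. The function name `average_evolution_matrices` and the example matrices are hypothetical and are not the simulation parameters of the paper.

```python
import numpy as np

def average_evolution_matrices(A_modes, prior, P_trans, n_steps):
    """Probability-weighted evolution matrix at each of n_steps time steps."""
    pi = np.asarray(prior, dtype=float)   # marginal mode probabilities at the current step
    out = []
    for _ in range(n_steps):
        out.append(sum(p * A for p, A in zip(pi, A_modes)))
        pi = pi @ P_trans                 # marginal probabilities at the next step
    return out

A1 = np.array([[1.0, 0.1], [0.0, 1.0]])
A2 = np.array([[0.9, 0.0], [0.0, 0.9]])
A_bar = average_evolution_matrices([A1, A2], [1.0, 0.0],
                                   np.array([[0.5, 0.5], [0.5, 0.5]]), 2)
```

Here the system starts in the first mode with certainty, so the first averaged matrix equals `A1`, and after one equi-probable transition it becomes the plain average of `A1` and `A2`.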
$E[e_n]$ and $C(e_n)$ are calculated analytically using the proposed formulation and estimated empirically from 20k MC simulations, and the MSE is computed as $\mathrm{MSE} = E[e_n]^T E[e_n] + \mathrm{Tr}[C(e_n)]$ ($\mathrm{Tr}$ refers to the trace of the matrix) for the SKF and the mismatched single-mode KFs. Fig. 2 shows the consistency between the analytic and MC-calculated MSEs. In this case, there is a visually significant difference between the performance of the SKF and that of the single-mode KFs (the level of significance is application dependent). Simulations for higher state dimensions also verified the MSE derivations.
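The decomposition of the MSE into squared bias plus the trace of the covariance can be checked numerically. The sketch below uses synthetic Gaussian error samples with hypothetical moments as a stand-in for filter errors; it does not reproduce the paper's simulation or its results.

```python
import numpy as np

def mse_from_moments(bias, cov):
    """MSE = bias^T bias + trace(covariance)."""
    return float(bias @ bias + np.trace(cov))

rng = np.random.default_rng(1)
# Synthetic error samples (hypothetical mean and covariance).
errors = rng.multivariate_normal([0.5, -0.2],
                                 [[0.3, 0.1], [0.1, 0.2]], size=20000)

bias_hat = errors.mean(axis=0)
# bias=True normalizes by the sample count, which makes the identity exact.
cov_hat = np.cov(errors, rowvar=False, bias=True)
mse_direct = float(np.mean(np.sum(errors**2, axis=1)))
mse_moments = mse_from_moments(bias_hat, cov_hat)
```

The two quantities `mse_direct` and `mse_moments` agree up to floating-point error, since the identity holds exactly for sample moments normalized by the sample count.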
VIII. CONCLUSION AND FUTURE WORK
The MSE performance of the SKF was compared analytically to that of an arbitrary single-mode mismatched KF using a recursive formulation. This formulation may be used to decide which filter to run operationally for a specific SLDS. This work is a step towards automating the filter decision process for a specific SLDS scenario by evaluating the accuracy versus computation trade-off.