Autonomous Tracking and State Estimation With Generalized Group Lasso

We address the problem of autonomous tracking and state estimation for marine vessels, autonomous vehicles, and other dynamic signals under a (structured) sparsity assumption. The aim is to improve the tracking and estimation accuracy with respect to the classical Bayesian filters and smoothers. We formulate the estimation problem as a dynamic generalized group Lasso problem and develop a class of smoothing-and-splitting methods to solve it. The Levenberg–Marquardt iterated extended Kalman smoother-based multiblock alternating direction method of multipliers (LM-IEKS-mADMM) algorithms are based on the alternating direction method of multipliers (ADMM) framework. This leads to minimization subproblems with an inherent structure to which three new augmented recursive smoothers are applied. Our methods can deal with large-scale problems without preprocessing for dimensionality reduction. Moreover, the methods allow one to solve nonsmooth nonconvex optimization problems. We then prove that under mild conditions, the proposed methods converge to a stationary point of the optimization problem. By simulated and real-data experiments, including multisensor range measurement problems, marine vessel tracking, autonomous vehicle tracking, and audio signal restoration, we show the practical effectiveness of the proposed methods.


I. INTRODUCTION
Autonomous tracking and state estimation problems are active research topics with many real-world applications, including intelligent maritime navigation, autonomous vehicle tracking, and audio signal estimation [1]-[5]. The aim is to autonomously estimate and track the state (e.g., position, velocity, or direction) of the dynamic system using imperfect measurements [6]. A frequently used approach for autonomous tracking and estimation problems is based on Bayesian filtering and smoothing. When the target dynamics and observation models are linear and Gaussian, the Kalman smoother (KS) [1], [7] provides the optimal Bayesian solution, which coincides with the optimal minimum mean square error estimator in that case. In the case of nonlinear dynamic systems, the iterated extended Kalman smoother (IEKS) [8]-[10] makes use of local affine approximations, by means of Taylor series, of the nonlinear functions, and then iteratively carries out KS. Sigma-point-based smoothing methods [11], [12] employ sigma-points to approximate the probability density of the states, which can preserve higher-order accuracy than IEKS. Random-sampling-based filters such as particle filters [1], [13]-[15] can be used to deal with nonlinear tracking situations involving potentially arbitrary nonlinearities, noise, and constraints. Although these trackers and estimators are capable of utilising the measurement information to obtain the estimates, they ignore sparsity dictated by physical attributes of dynamic systems.

R. Gao and S. Särkkä are with the Department of Electrical Engineering and Automation, Aalto University, Espoo, 02150 Finland (e-mail: rui.gao@aalto.fi; simo.sarkka@aalto.fi). R. Claveria-Vega and S. Godsill are with the Department of Engineering, University of Cambridge, CB2 1PZ, UK (e-mail: rmc83@cam.ac.uk; sjg@eng.cam.ac.uk). This work was supported by the Academy of Finland.
The motivation for our work comes from the following real-world applications. One significant application is marine vessel tracking [5], [6]. Vessels are frequently pitching and rolling on the surface of the ocean, which can be modelled as sparsity in the process noise. Our methodology is also applicable to autonomous vehicle tracking, which enables a vehicle to autonomously avoid obstacles and maintain safe distances to other vehicles. In the presence of many sudden stops (i.e., velocities are zero), the tracking accuracy can be improved by employing sparsity [16]. Other examples of tracked targets include robots [9] and unmanned aerial vehicles [17]. Another practical application is audio signal restoration, where typically only a few time-frequency elements are expected to be present, and thus sparsity is an advisable assumption [18]. For example, the Gabor synthesis representation with sparsity constraints has been proved to be suitable for audio restoration [19]. Similar problems can also be found in electrocardiogram (ECG) signal analysis [20] and automatic music transcription [21]. Hence, computationally effective sparsity modelling methods are in demand.
Since sparsity may improve the tracking and estimation performance, there is a growing literature that proposes sparse regularisers such as Lasso (i.e., least absolute shrinkage and selection operator, or L1-regularisation) [22], [23] or total variation (TV) [24], [25] for these applications. Existing methods for sparse tracking and estimation can be split into two broad categories: robust smoothing approaches and optimisation-based approaches. The former approaches merge filtering and smoothing with L1-regularisation. For instance, modified recursive filter-based compressive sensing methods were developed in [26]-[29]. An L1-Laplace robust Kalman smoother was presented in [30]. Using both sparsity and dynamics information, a sparse Bayesian learning framework was proposed in [31]. The latter approaches formulate the whole tracking and state estimation problem as an L1-penalised minimisation problem, and then apply iterative algorithms to solve the minimisation problem [16], [32]-[36]. While these L1-penalised estimators offer several benefits, they penalise individual elements of the state vector or process noise instead of groups of elements in them.

arXiv:2007.11573v3 [stat.ME] 30 May 2021
We formulate the estimation problem as a generalised L2-minimisation problem. In particular, we present a broad class of regulariser configurations parametrised by sets of matrices and vectors. We introduce the batch tracking and estimation methods in Section III and present three augmented recursive smoothing methods in Section IV. In Section V, we establish the convergence. In Section VI, we report numerical results on simulated and real-life datasets. Section VII draws the concluding remarks.
The notation is as follows. Matrices X and vectors x are indicated in boldface. (·)^T represents transposition, and (·)^{-1} represents matrix inversion. The R-weighted norm of x is defined as ||x||_R^2 = x^T R x. [X]_{g,t} is the (g,t):th element of matrix X, and x^{(k)} denotes the value of x at the k:th iteration. vec(·) represents a vectorisation operator, diag(·) represents a block-diagonal matrix operator with the elements of its argument on the diagonal, and x_{1:T} = vec(x_1, ..., x_T). ∂φ(x) denotes a sub-gradient of φ. J_φ is the Jacobian of φ(x). δ_+(A) denotes the smallest eigenvalue of A. p(x) denotes the probability density function (pdf) of x, and N(x | m, P) denotes a Gaussian probability density function with mean m and covariance P evaluated at x.

II. PROBLEM STATEMENT
Let y_t ∈ R^{N_y} be a measurement of a dynamic system and x_t ∈ R^{N_x} be an unknown state (sometimes called the source or signal). The state and measurement are related according to a dynamic state-space model of the form

  x_t = a_t(x_{t-1}) + q_t,
  y_t = h_t(x_t) + r_t,    (1)

where h_t : R^{N_x} → R^{N_y} and a_t : R^{N_x} → R^{N_x} are the measurement and state transition functions, respectively, and t = 1, ..., T is the time step number. The process and measurement noises q_t ∼ N(0, Q_t) and r_t ∼ N(0, R_t) are assumed to be zero-mean Gaussian with covariances Q_t and R_t, respectively. The initial condition at t = 1 is given by x_1 ∼ N(m_1, P_1). A particular special case of (1) is the affine Gaussian model given by

  x_t = A_t x_{t-1} + b_t + q_t,
  y_t = H_t x_t + e_t + r_t,    (2)

where A_t ∈ R^{N_x×N_x}, H_t ∈ R^{N_y×N_x} are the transition and measurement matrices, and b_t, e_t are bias terms.
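As a concrete illustration, the affine special case (2) can be simulated directly. The following sketch is our own illustration (the matrices passed in are arbitrary examples, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_affine(A, b, H, e, Q, R, m1, P1, T):
    """Draw x_1 ~ N(m1, P1), then iterate the affine model (2):
    x_t = A x_{t-1} + b + q_t,  y_t = H x_t + e + r_t."""
    Nx, Ny = A.shape[0], H.shape[0]
    xs, ys = np.zeros((T, Nx)), np.zeros((T, Ny))
    x = rng.multivariate_normal(m1, P1)
    for t in range(T):
        if t > 0:  # propagate through the dynamics with process noise
            x = A @ x + b + rng.multivariate_normal(np.zeros(Nx), Q)
        xs[t] = x
        ys[t] = H @ x + e + rng.multivariate_normal(np.zeros(Ny), R)
    return xs, ys
```

Such a simulator is useful for generating synthetic trajectories on which the smoothers below can be tested.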
The goal here is to obtain the "best estimate" of x_{1:T} from the imperfect measurements y_{1:T}. For computing x_{1:T} with sparsity-inducing priors, we define a set of matrices {G_{g,t} ∈ R^{P_g×N_x} | g = 1, ..., N_g}, matrices B_t, and vectors d_t, for t = 1, ..., T, and impose sparsity on the groups of elements of the state or the process noise. Mathematically, the problem of computing the state estimate x_{1:T} is formulated as

  min_{x_{1:T}}  (1/2) ||x_1 - m_1||^2_{P_1^{-1}} + Σ_{t=2}^T (1/2) ||x_t - a_t(x_{t-1})||^2_{Q_t^{-1}} + Σ_{t=1}^T (1/2) ||y_t - h_t(x_t)||^2_{R_t^{-1}} + µ Σ_{t=1}^T Σ_{g=1}^{N_g} ||G_{g,t} (x_t - B_t x_{t-1} - d_t)||_2,    (3)

where µ > 0 is a penalty parameter. A merit of our formulation is its flexibility, because the selections of G_{g,t}, B_t, and d_t can be adjusted to represent different regularisers. With the matrix G_{g,t}, the formulation (3) accommodates a large class of sparsity-promoting regularisers (e.g., Lasso, isotropic TV, anisotropic TV, fused Lasso, group Lasso, and sparse group Lasso). A list of such regularisers is reported in Table I. Meanwhile, the formulation (3) also allows for putting sparsity assumptions on the state or the process noise by different selections of B_t and d_t (see Table II).

TABLE I: Examples of regulariser configurations.

Regularisation / description of G_{g,t}:
- L2-regularisation: G_{g,t} is an identity matrix.
- Lasso: N_g = N_x, P_g = 1 for all g; G_{g,t} has 1 at the g:th column and zeros otherwise.
- Anisotropic TV: G_{g,t} encodes the g:th row of a finite-difference operator.
- Group Lasso: G_{g,t} has 1 at the entries corresponding to the selected elements of x_t in the group, and zeros otherwise.
- Sparse group Lasso: for g = 1, ..., N_x, P_g = 1 and G_{g,t} has 1 at the g:th column and zeros otherwise; for g = N_x + 1, ..., N_g, G_{g,t} has the same setting as in group Lasso.

A simple, yet illustrative, example can be found in autonomous vehicle tracking. When there are stop-and-go points (e.g., vehicle stops) in the data, the zero-velocity and zero-angle values at those time points can be grouped together via the L2-norm and G_{g,t}. That means that three elements can be forced to be equal to zero at the same time. Another application is in audio restoration, where the matrices G_{g,t} are defined so that only two elements of the state x_t, corresponding to the real and imaginary parts of a synthesis coefficient, are extracted at a time step. Thus, these pairs, which are associated with the same time-frequency basis functions, tend to be non-zero or zero together.
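To make the grouping concrete, the following sketch (our own illustration, not code from the paper) builds a selector matrix G_{g,t} for a hypothetical 5-dimensional state [p1, p2, v1, v2, ω] and applies block soft-thresholding to the grouped elements; the group L2-norm is what makes the three grouped components vanish jointly:

```python
import numpy as np

# Hypothetical 5-dim state [p1, p2, v1, v2, omega]; one group selects (v1, v2, omega).
G = np.zeros((3, 5))
G[0, 2] = G[1, 3] = G[2, 4] = 1.0   # selector rows picking out the grouped elements

def group_shrink(z, tau):
    """Block soft-thresholding: shrinks the whole group toward zero jointly;
    the group is exactly zero when its L2-norm is below tau."""
    n = np.linalg.norm(z)
    return np.zeros_like(z) if n <= tau else (1.0 - tau / n) * z

x = np.array([1.0, 2.0, 0.05, -0.03, 0.02])  # small velocities and turn rate
g = group_shrink(G @ x, tau=0.1)             # all three components hit zero together
```

Here the positions are untouched while the (v1, v2, ω) block is zeroed as one unit, which is the stop-and-go behaviour described above.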
The problem (3) is more difficult to solve than the common L2-minimisation problem (which corresponds to G_{g,t} = I, where I is an identity matrix) or the squared L2-minimisation problem (the problem with ||G_{g,t}(·)||^2_2), since the penalty term ||G_{g,t}(·)||_2 is nonsmooth. Furthermore, G_{g,t} is possibly a rank-deficient matrix. In this paper, we first derive batch tracking and estimation methods, which are based on the batch computation of the state sequence. To speed up the batch methods, we then propose augmented recursive smoother methods for the primal variable update.

III. BATCH TRACKING AND ESTIMATION METHODS
In this section, we introduce the multi-block ADMM framework. Based on this framework, we derive batch algorithms for solving the regularised tracking and state estimation problem.

A. The General Multi-block ADMM (mADMM) Framework
The methods that we develop are based on the multi-block ADMM [47]. The multi-block ADMM provides an algorithmic framework which is applicable to problems of the form (3), and it can be instantiated by defining the auxiliary variables and their update steps. We introduce auxiliary variables v_t and w_{g,t}, g = 1, ..., N_g, t = 1, ..., T, and then build the following constraints:

  v_t = x_t - B_t x_{t-1} - d_t,
  w_{g,t} = G_{g,t} v_t,  g = 1, ..., N_g.    (4)

Note that in (4) we could alternatively introduce auxiliary variables w_{g,t} = G_{g,t}(x_t - B_t x_{t-1} - d_t), but this replacement would require G_{g,t} to be invertible when using the augmented recursive smoothers later on. To avoid such restrictions, we employ the variables v_t and w_{g,t} to build the more general constraints in this paper.
For simplicity of notation, we denote w_t = vec(w_{1,t}, ..., w_{N_g,t}) and G_t = vec(G_{1,t}, ..., G_{N_g,t}), and then solve (3) via the equivalent constrained optimisation problem

  min_{x_{1:T}, w_{1:T}, v_{1:T}}  (1/2) ||x_1 - m_1||^2_{P_1^{-1}} + Σ_{t=2}^T (1/2) ||x_t - a_t(x_{t-1})||^2_{Q_t^{-1}} + Σ_{t=1}^T (1/2) ||y_t - h_t(x_t)||^2_{R_t^{-1}} + µ Σ_{t,g} ||w_{g,t}||_2,  subject to (4).    (5)

The variables x_{1:T}, w_{1:T}, and v_{1:T} can be handled by defining the augmented Lagrangian function L_γ(x_{1:T}, w_{1:T}, v_{1:T}; η_{1:T}), which augments the objective of (5) with the terms

  (γ/2) ||x_t - B_t x_{t-1} - d_t - v_t + η_t/γ||^2_2 + (γ/2) Σ_g ||G_{g,t} v_t - w_{g,t} + η_{g,t}/γ||^2_2    (6)

for each t, where η_t and η_{g,t} are the (scaled) Lagrange multipliers, and γ > 0 is a penalty parameter. The multi-block ADMM (mADMM) framework minimises the function L_γ by alternating the x_{1:T}-minimisation step, the w_{1:T}-minimisation step, the v_{1:T}-minimisation step, and the dual variable η_{1:T} update step. Given (x^{(k)}_{1:T}, w^{(k)}_{1:T}, v^{(k)}_{1:T}; η^{(k)}_{1:T}), the iteration of mADMM has the following steps:

  x^{(k+1)}_{1:T} = arg min_{x_{1:T}} L_γ(x_{1:T}, w^{(k)}_{1:T}, v^{(k)}_{1:T}; η^{(k)}_{1:T}),    (7a)
  w^{(k+1)}_{1:T} = arg min_{w_{1:T}} L_γ(x^{(k+1)}_{1:T}, w_{1:T}, v^{(k)}_{1:T}; η^{(k)}_{1:T}),    (7b)
  v^{(k+1)}_{1:T} = arg min_{v_{1:T}} L_γ(x^{(k+1)}_{1:T}, w^{(k+1)}_{1:T}, v_{1:T}; η^{(k)}_{1:T}),    (7c)
  η^{(k+1)}_t = η^{(k)}_t + γ × (constraint residuals of (4) at iteration k+1),    (7d)

where η̄_t = vec(η_t, η_{1,t}, ..., η_{N_g,t}). We solve the w_t, v_t, and η_t subproblems for each t, respectively. The w_t- and v_t-subproblems have closed-form solutions: the w_{g,t}-update is the group soft-thresholding

  w^{(k+1)}_{g,t} = S_{µ/γ}(G_{g,t} v^{(k)}_t + η^{(k)}_{g,t}/γ),    (8a)

while the v_t-update amounts to solving a small linear system in v_t,    (8b)

where S_{µ/γ}(·) is the shrinkage operator [49].
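A minimal numerical sketch of the splitting idea, on a toy static analogue of the problem (our own simplified instance, with B_t, d_t, and the dynamics dropped and w_g = G_g x): the x-step is a quadratic solve, the w-step is the shrinkage operator, and the dual variables are updated by ascent.

```python
import numpy as np

def shrink(z, tau):
    """Group shrinkage (block soft-thresholding) operator S_tau."""
    n = np.linalg.norm(z)
    return np.zeros_like(z) if n <= tau else (1.0 - tau / n) * z

def admm_group_lasso(y, Gs, mu, gamma=1.0, iters=200):
    """Toy static analogue of (3): min_x 0.5||y - x||^2 + mu * sum_g ||G_g x||_2,
    split as w_g = G_g x and solved by ADMM (a simplified instance of mADMM)."""
    Nx = y.size
    x = y.copy()
    w = [G @ x for G in Gs]
    eta = [np.zeros(G.shape[0]) for G in Gs]
    M = np.eye(Nx) + gamma * sum(G.T @ G for G in Gs)   # x-step normal matrix
    for _ in range(iters):
        rhs = y + gamma * sum(G.T @ (w[g] - eta[g] / gamma) for g, G in enumerate(Gs))
        x = np.linalg.solve(M, rhs)                      # x-minimisation (quadratic)
        for g, G in enumerate(Gs):
            w[g] = shrink(G @ x + eta[g] / gamma, mu / gamma)  # shrinkage step
            eta[g] = eta[g] + gamma * (G @ x - w[g])           # dual ascent
    return x
```

In the full method the x-step is not a single linear solve over a static vector but the x_{1:T}-subproblem, which is exactly what the augmented smoothers of Section IV accelerate.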
Given the mADMM framework, the solutions in (8a), (8b), and (7d) are the basic steps of our methods. In a single iteration, the w_t-update can be computed in O(N_g) operations, and each v_t-update takes O(N_x^3). However, when the x_{1:T}-subproblem is solved by the batch estimation methods, it typically takes O(N_x^3 T^3) operations. Thus the main computational demand is in updating x_{1:T}. Our main goal here is to derive efficient methods for the x_{1:T}-minimisation step. Before that, we first develop batch methods to solve the x_{1:T}-subproblem.

B. Batch Solution for Affine Systems
The first batch method we explore is for affine Gaussian systems. We first stack all the state variables into single variables, and then rewrite the x_{1:T}-subproblem (7a) as a quadratic minimisation problem in the stacked state x = x_{1:T}; the stacked variables y, e, m, d, v, η, H, R, Q, and A are defined analogously to Equation (17) in [16]. By setting the derivative to zero, the solution is obtained from the normal equations. In other words, computing the x-minimisation amounts to solving a linear system whose coefficient matrix combines the stacked measurement, dynamic, and penalty terms. When the matrix inverse exists, the x-subproblem has a unique solution. Additionally, with a sparsity assumption on the states x_t, Φ is an identity matrix and d is a zero vector. When the noise q_t is sparse, we can instead set B_t and d_t according to Table II. The disadvantage of the batch solution is that it requires an extensive amount of computation when T is large. For this reason, in Section IV-A, we propose to use an augmented recursive smoother, which is mathematically equivalent to the batch method, to improve the computational performance.

C. Gauss-Newton (GN) for Nonlinear Systems
When the system is nonlinear, we use a similar batch notation as in the affine case, and additionally define the stacked nonlinear functions a(x) and h(x). The primal x_{1:T}-subproblem then has the form

  x^{(k+1)}_{1:T} = arg min_x θ(x),    (15)

where θ(x) collects the measurement, dynamic, and penalty terms of (7a) with the stacked nonlinear functions a(x) and h(x). The function θ(x) can now be minimised by the Gauss-Newton (GN) method [46]. In GN, we first linearise the nonlinear functions a(x) and h(x) around the current iterate, and then replace them in θ(x) by the linear (or actually affine) approximations. The GN iteration then computes the next iterate x^{(k,i+1)} as the minimiser of the resulting quadratic approximation. The above computations are carried out iteratively until a maximum number of iterations I_max is reached. We take the solution x^{(k,I_max)} as the next iterate x^{(k+1)}. While GN avoids the trouble of computing the Hessians of the model functions, it has problems when the Jacobians are rank-deficient. The Levenberg-Marquardt method is introduced next to address this problem.

D. Levenberg-Marquardt (LM) Method
The Levenberg-Marquardt (LM) method [50], also called the regularised or damped GN method, improves the performance of GN by using an additional regularisation term. With damping factors λ^{(i)} > 0 and a sequence of positive-definite regularisation matrices S^{(i)}, the function θ(x) is approximated by its GN linearisation augmented with the proximal term (λ^{(i)}/2) ||x - x^{(i)}||^2_{S^{(i)}}. Using the minimum of this approximate cost function at each step i as the next iterate, we get the iteration

  x^{(i+1)} = arg min_x [ θ_lin(x; x^{(i)}) + (λ^{(i)}/2) ||x - x^{(i)}||^2_{S^{(i)}} ],    (18)

where θ_lin(·; x^{(i)}) denotes the GN approximation of θ at x^{(i)}. This is the LM method, when augmented with an adaptation scheme for the regularisation parameters λ^{(i)} > 0. The regularisation parameter here helps to overcome some problematic cases, for example, the case when J_θ^T(x) J_θ(x) is rank-deficient, by ensuring the existence of a unique minimum of the approximate cost function.
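A minimal numerical sketch of a damped GN step with multiplicative λ adaptation (our own illustration with S^{(i)} = I; not the paper's Algorithm 3):

```python
import numpy as np

def lm_minimise(r, jac, x0, lam=1e-2, iters=100):
    """Minimise theta(x) = 0.5 * ||r(x)||^2 by damped Gauss-Newton (LM) steps.
    Each step solves the regularised normal equations
    (J^T J + lam * I) step = -J^T r(x), with a simple lambda adaptation."""
    x = x0.astype(float).copy()
    theta = lambda z: 0.5 * np.dot(r(z), r(z))
    for _ in range(iters):
        J = jac(x)
        A = J.T @ J + lam * np.eye(x.size)   # invertible for lam > 0, even if J^T J is singular
        step = np.linalg.solve(A, -J.T @ r(x))
        if theta(x + step) < theta(x):
            x, lam = x + step, lam / 2       # accept step, decrease damping
        else:
            lam *= 10                        # reject step, increase damping
    return x
```

With λ = 0 and an invertible J^T J this reduces to the plain GN step, mirroring the relation between LM-IEKS and GN-IEKS discussed later.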
At each mADMM iteration, the computations in the x_{1:T}-subproblem, such as (11), (16), and (18), have a high cost when T is large (e.g., T = 10^8). As discussed above, the main computational demand is indeed in the update of x_{1:T}. Therefore, we utilise the equivalence between batch solutions and recursive smoothers, and develop efficient augmented recursive smoother methods for solving the x_{1:T}-subproblem.

IV. AUGMENTED RECURSIVE SMOOTHERS
In this section, we present the augmented KS, GN-IEKS, and LM-IEKS methods for solving the x_{1:T}-subproblem.
A. Augmented Kalman Smoother (KS) for Affine Systems

Solving the x_{1:T}-subproblem involves the minimisation of a quadratic optimisation problem, which can be efficiently performed by the Kalman smoother (KS); see [51] for details. We rewrite the batch minimisation problem (9) in the per-time-step form

  min_{x_{1:T}}  (1/2) ||x_1 - m_1||^2_{P_1^{-1}} + (γ/2) ||x_1 - d_1 - v_1 + η_1/γ||^2_2 + Σ_{t=2}^T [ (1/2) ||x_t - A_t x_{t-1} - b_t||^2_{Q_t^{-1}} + (γ/2) ||x_t - B_t x_{t-1} - d_t - v_t + η_t/γ||^2_2 ] + Σ_{t=1}^T (1/2) ||y_t - H_t x_t - e_t||^2_{R_t^{-1}}.    (19)

It is worth noting that when B_t = 0 and d_t = 0, the cost function corresponds to the function minimised by KS, which leads to a similar method as was presented in [16]. For notational convenience, we leave out the iteration number k of mADMM in the following.
Here we consider the general case where B_t and d_t are non-zero. Such a case is more complicated, as we cannot have two dynamic models in a state-space model. For building a dynamic state-space model, we need to fuse the terms in the pairs (1/2) ||x_t - A_t x_{t-1} - b_t||^2_{Q_t^{-1}} and (γ/2) ||x_t - B_t x_{t-1} - d_t - v_t + η_t/γ||^2_2 into single terms. We combine the matrices A_t and B_t into an artificial transition matrix Ã_t, fuse b_t and (d_t + v_t - η_t/γ) into an artificial bias b̃_t, and introduce an artificial covariance Q̃_t, which yields

  Q̃_t = (Q_t^{-1} + γI)^{-1},
  Ã_t = Q̃_t (Q_t^{-1} A_t + γ B_t),
  b̃_t = Q̃_t (Q_t^{-1} b_t + γ (d_t + v_t - η_t/γ)).    (20)

Now, the new artificial dynamic model (20) allows us to use KS to solve the minimisation problem. The problem (19) becomes

  min_{x_{1:T}}  (1/2) ||x_1 - m̃_1||^2_{P̃_1^{-1}} + Σ_{t=2}^T (1/2) ||x_t - Ã_t x_{t-1} - b̃_t||^2_{Q̃_t^{-1}} + Σ_{t=1}^T (1/2) ||y_t - H_t x_t - e_t||^2_{R_t^{-1}},    (21)

up to an additive constant, which corresponds to a state-space model where additionally the initial state has mean m̃_1 = (P_1^{-1} + γI)^{-1} (P_1^{-1} m_1 + γ(d_1 + v_1 - η_1/γ)) and covariance P̃_1 = (P_1^{-1} + γI)^{-1}. The solution of (21) can then be computed by running KS on the augmented state-space model

  x_t = Ã_t x_{t-1} + b̃_t + q̃_t,  q̃_t ∼ N(0, Q̃_t),
  y_t = H_t x_t + e_t + r_t,  r_t ∼ N(0, R_t).    (22)

The augmented KS requires only O(N_x^3 T) operations, which is much less than the corresponding batch solution in (11). The augmented KS method is summarised in Algorithm 1.
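A minimal sketch of this construction (our own illustration: the fusion follows the derivation above, and the smoother is a textbook Kalman filter plus Rauch-Tung-Striebel pass, which is where the O(N_x^3 T) cost comes from):

```python
import numpy as np

def fuse_dynamics(A, b, Q, B, c, gamma):
    """Fuse the model term N(x_t; A x_{t-1} + b, Q) with the ADMM penalty
    (gamma/2)||x_t - B x_{t-1} - c||^2 into one artificial dynamic model;
    here c stands for d_t + v_t - eta_t/gamma."""
    Qi = np.linalg.inv(Q)
    Qt = np.linalg.inv(Qi + gamma * np.eye(Q.shape[0]))
    At = Qt @ (Qi @ A + gamma * B)
    bt = Qt @ (Qi @ b + gamma * c)
    return At, bt, Qt

def rts_smoother(ys, A, b, Q, H, e, R, m1, P1):
    """Standard Kalman filter + Rauch-Tung-Striebel smoother, O(Nx^3 T)."""
    T = len(ys)
    ms, Ps, mps, Pps = [], [], [], []
    m, P = m1, P1
    for t in range(T):
        if t > 0:  # predict through the (artificial) dynamics
            m, P = A @ m + b, A @ P @ A.T + Q
        mps.append(m); Pps.append(P)
        S = H @ P @ H.T + R                  # update with the measurement
        K = P @ H.T @ np.linalg.inv(S)
        m = m + K @ (ys[t] - H @ m - e)
        P = P - K @ S @ K.T
        ms.append(m); Ps.append(P)
    xs = [None] * T                          # backward (RTS) pass
    xs[-1] = ms[-1]
    for t in range(T - 2, -1, -1):
        G = Ps[t] @ A.T @ np.linalg.inv(Pps[t + 1])
        xs[t] = ms[t] + G @ (xs[t + 1] - mps[t + 1])
    return np.array(xs)
```

Running `rts_smoother` on the fused model (Ã_t, b̃_t, Q̃_t) is the inner x_{1:T}-update of KS-mADMM; note that as γ → 0 the fusion returns the original model, which the test below checks.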

B. Gauss-Newton IEKS (GN-IEKS) for Nonlinear Systems
The solution of (15) has similar computational scaling challenges as the affine case discussed in the previous section. However, we can use the equivalence of IEKS and GN [8] to construct an efficient iterative solution for the optimisation problem in the primal space. In the GN-IEKS method, we first approximate the nonlinear model by linearisation, and then use KS on the linearised model. The x_{1:T}-subproblem now takes the form of (7a). In IEKS, at the i:th iteration, we form affine approximations of a_t(x_{t-1}) and h_t(x_t) as follows:

  a_t(x_{t-1}) ≈ a_t(x^{(i)}_{t-1}) + J_{a_t}(x^{(i)}_{t-1}) (x_{t-1} - x^{(i)}_{t-1}),
  h_t(x_t) ≈ h_t(x^{(i)}_t) + J_{h_t}(x^{(i)}_t) (x_t - x^{(i)}_t).    (23)

We replace the nonlinear functions in the cost function with the above approximations, and compute the next iterate as the solution to the resulting minimisation problem (24), which is equivalent to (19) with the linearised model matrices in place of A_t, b_t, H_t, and e_t. The precise expressions of B_t and d_t depend on our choice of sparsity. When q_t is sparse, B_t and d_t are set from the linearisation of a_t analogously to Table II, which needs the same computations as in (20). Thus we can solve the minimisation problem in (7a) by iteratively linearising the nonlinearities and then applying KS. This turns out to be mathematically equivalent to applying GN to the batch problem, as we did in Section III-C. The steps of the GN-IEKS method are summarised in Algorithm 2.

C. Levenberg-Marquardt IEKS (LM-IEKS)
There also exists a connection between Levenberg-Marquardt (LM) and a modified version of IEKS. The LM-IEKS method [10] is based on replacing the minimisation of the approximate cost function in (24) by a regularised minimisation of the form

  x^{(i+1)}_{1:T} = arg min_{x_{1:T}} [ (approximate cost in (24)) + (λ^{(i)}/2) Σ_t ||x_t - x^{(i)}_t||^2_2 ],

where we have assumed that S^{(i)} = I. Similarly to GN-IEKS, when B_t and d_t are non-zero, we need to build a new state-space model in order to have only one dynamic model. Following [10], the regularisation can be implemented by defining an additional pseudo-measurement x^{(i)}_t of x_t with noise covariance I/λ^{(i)}. Using (20) and this pseudo-measurement, we have the augmented state-space model which provides the minimum of the cost function as the KS solution. By combining this with λ^{(i)} adaptation and iterating, we can implement the LM algorithm for the x_{1:T}-subproblem using the recursive smoother (cf. [10]). See Algorithm 3 for more details.

D. Discussion
All the methods discussed above, namely augmented KS, GN-IEKS, and LM-IEKS, provide efficient ways to solve the x_{1:T}-subproblem. When we leverage the Markov structure of the x_{1:T}-subproblem arising in the mADMM iteration, we can significantly reduce the computational burden. In particular, when the functions a_t(x_{t-1}) and h_t(x_t) are affine, the augmented KS method can be used in the x_{1:T}-subproblem (see (19)). Both GN-IEKS and LM-IEKS are based on linearisation of the functions a_t(x_{t-1}) and h_t(x_t), and they work well for most nonlinear minimisation problems. However, when the Jacobians in (23) (e.g., J_{a_t}) are rank-deficient, the GN-IEKS method cannot be used. As a robust extension of GN-IEKS, LM-IEKS significantly improves the performance of GN-IEKS. It should be noted that when the regularisation term is not used in LM-IEKS (i.e., when λ^{(i)} = 0), LM-IEKS reduces to GN-IEKS [10].

V. CONVERGENCE ANALYSIS
In this section, we prove that under mild assumptions and a proper choice of the penalty parameter, our KS-mADMM, GN-IEKS-mADMM, and LM-IEKS-mADMM methods converge to a stationary point of the original problem. Although convergence of the multi-block ADMM has already been proven, the existing results strongly depend on convexity assumptions or Lipschitz continuity conditions (see, e.g., [52]-[54]). In our analysis, we require neither convexity of the objective function nor Lipschitz continuity conditions. Instead, we use a milder amenability condition. This allows us to establish the convergence of the three methods.
For the case when the functions a_t(x_{t-1}) and h_t(x_t) are affine (see (2)), we have the following lemma.

Lemma 1. Let {x^{(k)}_{1:T}, w^{(k)}_{1:T}, v^{(k)}_{1:T}, η^{(k)}_{1:T}} be the iterates generated by (7). Then the iterates satisfy a descent inequality whose terms are defined in Appendix A.

Proof. See Appendix A.

We will then establish the convergence rate of the proposed method in terms of the iteration number.
Theorem 1 (Convergence of KS-mADMM). Let Q_t and P_1 be positive semi-definite matrices. Then the sequence {x_{1:T}} generated by KS-mADMM converges to a stationary point (x*_{1:T}, w*_{1:T}, v*_{1:T}, η*_{1:T}) with the rate o(1/k).

Proof. The proof is based on the convexity of the objective function. Because of the equivalence between mADMM and KS-mADMM, we start by establishing the convergence of mADMM. When Q_t and P_1 are positive semi-definite, the function in (5) is convex. Owing to the structure of the constraint matrices, we can write x and w into a function Ξ(ζ) in the batch form [52]. For simplicity of notation, we define s = vec(v, η). Using Lemma 1, we obtain the inequality (30), where ξ and F(ξ) are defined in Appendix A (see (37)). We sum the inequality (30) from 0 to k, and divide each term by k + 1. Since ||s - s^{(k+1)}||^2_Ω ≥ 0, we obtain the bound (31) on the averaged iterates. Because of the convexity of Ξ, we can further write (31) as a bound that establishes the convergence rate o(1/k) of mADMM. As batch mADMM is equivalent to KS-mADMM, the sequence {x^{(k)}, w^{(k)}, v^{(k)}, η^{(k)}} and the corresponding sequence {x^{(k)}_{1:T}} are identical. This concludes the proof.
When the functions a_t(x_{t-1}) and h_t(x_t) are nonlinear, the data and dynamic terms define a (possibly nonconvex) composite function, for which we use the following notion.

Definition 1. The function s(x) = g(F(x)), with g convex and F smooth, is strongly amenable [55] at x when the condition J_F(x)^T z = 0, for z in the normal cone to the domain of g at F(x), is satisfied only when z is zero.

Let s(x) be strongly amenable. Then s(x) is prox-regular [56]. We are now ready to introduce the following lemma.

Proof. See Appendix B.
Next, we present the main theoretical result.
Theorem 2 (Convergence of GN-IEKS-mADMM). Let the assumptions in Lemma 2 be satisfied. Then there exists γ > 0 such that the sequence {x_{1:T}} generated by GN-IEKS-mADMM locally converges to a local minimum.
Proof. Similarly to Theorem 1, we use Lemma 2 to establish that the sequence L_γ(x^{(k)}, w^{(k)}, v^{(k)}; η^{(k)}) is bounded and nonincreasing. Due to convexity, the w and v subproblems have a local minimum. By Lemma 3, the sequence x^{(i)} generated by LM converges to x*. Then the sequence {x^{(k)}, w^{(k)}, v^{(k)}, η^{(k)}} locally converges to a minimum (x*, w*, v*, η*), and so does the sequence {x^{(k)}_{1:T}} generated by LM-IEKS [8], [10].

VI. NUMERICAL EXPERIMENTS
In this section, we experimentally evaluate the proposed methods in a selection of different applications, including linear target tracking problems, multi-sensor range measurement problems, ship trajectory tracking, audio restoration, and autonomous vehicle tracking. As for the convergence criteria, we can easily verify that the assumptions for convergence are satisfied for the linear/affine examples in Sections VI-A, VI-C, and VI-E. Additionally, the nonlinear coordinated turn model in Section VI-D also satisfies the assumptions for convergence. However, for the distance measurements in Section VI-B, it is hard to establish strong amenability, although empirically convergence occurs.

A. Linear Target Tracking Problems
In the first experiment, we consider simulated tracking of a moving target (such as a car) with the Wiener velocity model [6] as the dynamic model and with noisy location measurements. In the simulation, the process noise q_t was set to zero with probability 0.8 at every step t. The state x_t contains the location (x_{t,1}, x_{t,2}) and the velocities (x_{t,3}, x_{t,4}). The measurement model matrix and the measurement noise covariance are

  H_t = [1 0 0 0; 0 1 0 0],  R_t = σ^2 I_2,

and the transition matrix and the process noise covariance are the standard discretised Wiener velocity model matrices

  A_t = [1 0 Δt 0; 0 1 0 Δt; 0 0 1 0; 0 0 0 1],
  Q_t = q_c [Δt^3/3 0 Δt^2/2 0; 0 Δt^3/3 0 Δt^2/2; Δt^2/2 0 Δt 0; 0 Δt^2/2 0 Δt].
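The model matrices above can be constructed as follows (our own helper; the closed forms are the standard discretisation of the white-noise-acceleration model):

```python
import numpy as np

def wiener_velocity_model(dt, qc, sigma):
    """Discretised Wiener velocity model for state [p1, p2, v1, v2]:
    returns transition A, process covariance Q, measurement H, and noise R."""
    A = np.eye(4)
    A[0, 2] = A[1, 3] = dt                      # positions integrate velocities
    Q = qc * np.array([[dt**3/3, 0, dt**2/2, 0],
                       [0, dt**3/3, 0, dt**2/2],
                       [dt**2/2, 0, dt, 0],
                       [0, dt**2/2, 0, dt]])    # exact discretisation of white-noise acceleration
    H = np.zeros((2, 4))
    H[0, 0] = H[1, 1] = 1.0                     # positions are observed
    R = sigma**2 * np.eye(2)
    return A, Q, H, R
```

The same helper covers Section VI-C, which reuses these matrices with Δt = 1, q_c = 1, and σ = 0.3.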
We define the estimation error x_err with respect to the ground truth x^true_t. The estimation results are plotted in Fig. 1, where the circles denote the noisy measurements and the blue dashed line denotes the true state. As can be seen, the KS-mADMM estimate (black line) is much closer to the ground truth than the KS estimate (red dashed line), which is also reflected by a lower error.
Recall that the difference in the batch and recursive ADMM running times is dominated by the x_{1:T}-subproblem. Fig. 2 demonstrates how the running time (in seconds) grows as T increases. Despite being mathematically equivalent, mADMM and KS-mADMM have very different running times. The running times of mADMM and prox-ADMM have a similar growth rate, whereas KS-mADMM has a growth rate that resembles a plain Kalman smoother. Due to limited memory, we cannot report the results of the batch estimation methods (prox-ADMM and mADMM) when T > 10^4. At T = 10^4, the running times of KS, KS-mADMM, prox-ADMM, and mADMM were 0.34 s, 1.92 s, 6284 s, and 9646 s, respectively. The proposed method is computationally inexpensive, which makes it suitable for real-world applications, such as the marine vessel tracking in Section VI-C.

B. Multi-Sensor Range Measurement Problems
In this experiment, we consider a multi-sensor range measurement problem where we have short periods of movement with regular stops. This problem frequently appears in many surveillance systems [3], [6]. The state x_t contains the position (x_{t,1}, x_{t,2}) and the velocities (x_{t,3}, x_{t,4}). The measurement model for sensor n ∈ {1, 2, 3} is the range measurement

  y_{t,n} = sqrt( (x_{t,1} - s^n_x)^2 + (x_{t,2} - s^n_y)^2 ) + r_{t,n},

where (s^n_x, s^n_y) is the position of sensor n. The transition function a_t(x_{t-1}) is the constant-velocity motion model with sampling period Δt. The covariances are R_t = diag(0.2^2, 0.2^2) and Q_t = diag(0.01, 0.01, 0.1, 0.1). We set Δt = 0.1, T = 60, (s^1_x, s^1_y) = (0, -0.5), (s^2_x, s^2_y) = (0.5, 0.6), (s^3_x, s^3_y) = (0.5, 0.6), m_1 = vec(0, 0, 0, 0), and P_1 = I/10. We assume the target has many stops, which means the velocities x_{t,3}, x_{t,4} are sparse. We also set G_{g,t} = [0 I], and use the parameters γ = 1, µ = 1, K_max = 50, and I_max = 5. We plot the velocity variable x_{t,3} against the time step t in Fig. 3, which indicates that our method (black line) generates much sparser results than the IEKS estimate (red dashed line). Fig. 4 shows the relative error x_err as a function of the iteration number. The values of x_err generated by the regularisation methods are below those generated by the iterated extended Kalman smoother (IEKS) [1]. It also shows that GN-mADMM, GN-IEKS-mADMM, LM-mADMM, and LM-IEKS-mADMM can find the optimal values in around 50 iterations. IEKS is the fastest method, but its relative error is the highest due to the lack of the sparsity prior (i.e., µ = 0). GN-mADMM and GN-IEKS-mADMM have the same convergence results (as they are equivalent), while the latter requires less running time. Similarly, LM-mADMM and LM-IEKS-mADMM have the same convergence results, but LM-IEKS-mADMM needs less time to obtain the result than LM-mADMM. When the number of time steps T is moderate, all the running times are acceptable, but when T is extremely large, the proposed methods provide a massive advantage. Fig. 5 demonstrates how the running time (in seconds) grows as T increases. The proposed methods are compared with the state-of-the-art methods, including the proximal ADMM (prox-ADMM) [46], mADMM [47], and IEKS [1]. Despite being mathematically equivalent, GN-mADMM and GN-IEKS-mADMM, as well as LM-mADMM and LM-IEKS-mADMM, have very different running times. GN-IEKS-mADMM and LM-IEKS-mADMM are more efficient than the batch methods. Due to limited memory, we cannot report the results of the batch methods when T > 10^4. It is reasonable to conclude that, in general, the proposed methods are competitive for extremely large-scale tracking and estimation problems. The proposed approaches are computationally inexpensive, which makes them suitable for real-world applications, such as the ship trajectory tracking in the next section.
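A sketch of this range measurement model and its Jacobian, as consumed by an IEKS-style linearisation (our own illustration; the sensor list is a hypothetical example):

```python
import numpy as np

def range_meas(x, sensors):
    """h_t(x): distances from the position (x[0], x[1]) to each sensor."""
    return np.array([np.hypot(x[0] - sx, x[1] - sy) for sx, sy in sensors])

def range_jac(x, sensors):
    """Jacobian of h_t w.r.t. the 4-dim state [p1, p2, v1, v2];
    the velocity columns are zero since ranges depend only on position."""
    J = np.zeros((len(sensors), 4))
    for n, (sx, sy) in enumerate(sensors):
        d = np.hypot(x[0] - sx, x[1] - sy)
        J[n, 0] = (x[0] - sx) / d
        J[n, 1] = (x[1] - sy) / d
    return J
```

Note that the Jacobian is undefined when the target coincides with a sensor position, which is one reason the damped LM-IEKS variant is the more robust choice here.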

C. Marine Vessel Tracking
In this experiment, we utilise the Wiener velocity model [6] with a sparse noise assumption to track a marine vessel trajectory. The latitude, longitude, speed, and course of the vessel have been captured by automatic identification system (AIS) equipment, collected by the Danish Maritime Authority. Similar applications can be found in [5], [39]. The state of the ship is measured at time intervals of 1 minute. The matrices H_t, A_t, Q_t, and R_t are the same as the settings in Section VI-A with Δt = 1, q_c = 1, σ = 0.3, T = 100, m_1 = vec(0.1, 0.1, 0, 0), and P_1 = 100 I. We assume the process noise q_t is sparse, set G_{g,t} to an identity matrix, and use the parameters γ = 1, µ = 1, and K_max = 100. The measurement data consist of 100 time points of the vessel locations.
Our method obtains the position (latitude and longitude) estimates shown in Fig. 6. Fig. 7 shows that our method yields sparser process noise than that estimated by the Kalman smoother (KS) [1]. We then highlight the computational advantage of our method. The difference in running time is dominated by the x_{1:T}-subproblem. The running times of KS, prox-ADMM, mADMM, and KS-mADMM were 0.34 s, 174 s, 172 s, and 5.63 s, respectively. The running times of mADMM and prox-ADMM are similar, whereas KS-mADMM has a smaller running time that resembles the plain Kalman smoother.

D. Autonomous Vehicle Tracking
To further show how our methods can speed up larger-scale real-world problems, we apply GN-IEKS-mADMM to a vehicle tracking problem using real-world data. Global positioning system (GPS) data were collected on urban streets and roads around Helsinki, Tuusula, and Vantaa, Finland [58].
The urban environment contained many stops at traffic lights, crossings, turns, and various other situations. We ran the experiment using a coordinated turn model [1], where the state at time step t has the positions (x_{t,1}, x_{t,2}), the velocities (x_{t,3}, x_{t,4}), and the angular velocity x_{t,5}. The number of time points T was 6865. We use the parameters γ = 0.1, µ = 1, K_max = 300, I_max = 5, m_1 = vec(4.5, 13.5, 0, 0, 0), and P_1 = diag(50, 50, 50, 50, 0.01). We utilised a selector matrix G_{g,t} to enforce the sparsity of the velocities and the angular velocity.
The plot in Fig. 8 demonstrates the path (black line) generated by our method. The running times of IEKS [32], GN-mADMM, and GN-IEKS-mADMM were 22 s, 13520 s, and 2704 s, respectively. As expected, although IEKS is the fastest, the L2-penalised regularisation methods push more of the velocities and the angle to zero, as shown in Fig. 9. The IEKS estimate has many large peaks that appear as a result of large residuals, whereas GN-IEKS-mADMM produces sparser results.

E. Audio Signal Restoration
The proposed technique can be readily applied to the problem of noise reduction in audio signals. We adopt a Gabor regression model [19] in which signals are represented as a weighted sum of Gabor atoms g_{m,n}(τ) = w_n(τ) exp(2πi (m/M) τ). The terms w_n(τ) correspond to a window function with bounded support centred at time instants τ_n (the windows are placed so that the time axis is tiled evenly). Sparsity is promoted through the L_{1,2} pair-wise grouping pattern described in Section II: Σ_{m,n} µ_{m,n} ||c_{m,n}||_2. The real representation of the complex coefficients c_{m,n} used in [19] is adopted. This batch problem is restated in terms of a state-space model: the signal y is separated into P chunks y_t of length L, and state vectors x_t = [c_{2(t-1)}; c_{2t-1}; c_{2t}] are defined, c_t being the subvector associated with each frame. Let H_0 be a matrix containing the non-zero values of the Gabor basis functions g_{0,0}, ..., g_{M/2,0} as columns. The atoms in subsequent frames are time-shifted replicas of this basic set, so the batch residual ||y - Dc||_2 (with D a dictionary matrix containing all atoms) can be replaced by per-chunk residuals of the form ||y_t - H_* x_t||_2, with H_* = [H_u H_0 H_l] and A_t = [0 0 0; 0 0 0; I 0 0]. The terms H_u and H_l are truncated versions of H_0 corresponding to the contributions of the adjacent overlapping frames.
The algorithm is tested on a ∼3-second-long glockenspiel excerpt sampled at 22050 Hz and contaminated with artificial background noise at a signal-to-noise ratio (SNR) of 5 dB. Experiments are carried out on an Intel Core i7 @ 2.50 GHz with 16 GB RAM, with parameters γ = 5, μ = 2.6, K_max = 500, and a window length L = 512. The Kalman gain matrices are precomputed. Reflecting the power spectrum of typical audio signals, which decays with frequency, the penalisation is made frequency-dependent by setting μ_{m,n} = μ/f(m), with f(m) a decreasing modulating function (e.g., a Butterworth filter gain), in a similar fashion to [18]. The coefficients are initialised at zero. The average output SNR is 12.4 dB with an average running time of 64.6 s over 20 realisations. Fig. 10 shows the visual reconstruction results.
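The frequency-dependent penalisation μ_{m,n} = μ/f(m) can be sketched as follows. The Butterworth-style magnitude used for f(m), and the cutoff and order parameters, are illustrative assumptions; the experiment above only requires that f(m) be a decreasing modulating function.

```python
import numpy as np

def penalty_weights(mu, M, fc=0.25, order=4):
    """Frequency-dependent group penalties mu_{m,n} = mu / f(m).

    f(m) is a decreasing modulating function; here an illustrative
    Butterworth-style low-pass magnitude
        f(m) = 1 / sqrt(1 + (m / (fc * M))**(2 * order))
    is used, so high-frequency coefficient groups are penalised more
    strongly, reflecting the decaying power spectrum of audio signals.
    """
    m = np.arange(M // 2 + 1)             # frequency bins 0 .. M/2
    f = 1.0 / np.sqrt(1.0 + (m / (fc * M)) ** (2 * order))
    return mu / f

# mu = 2.6 and a window length of 512 as in the experiment above.
w = penalty_weights(mu=2.6, M=512)
```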
In comparison, Gibbs sampling schemes for such models (e.g., [19]) yield noisier restorations with comparable computing times. We analysed the same example using a Gibbs sampler with 500 iterations and a burn-in period of 250. The hyperparameters and initial values are chosen to ensure a fair comparison with the KS-mADMM method (an unfavourable initialisation may induce longer convergence times). With a runtime of ∼180 seconds, the Gibbs algorithm yields an output SNR of ∼15 dB. The Perceptual Evaluation of Audio Quality (PEAQ) [59], a measure that incorporates psycho-acoustic principles to assess audio signals, is adopted. The Objective Difference Grade (ODG) indicator derived from PEAQ is used to compare the reconstructions with respect to the clean reference signal, obtaining ODG = −3.910 for the clean signal against the noisy input, ODG = −3.846 for the clean signal against the Gibbs reconstruction, and ODG = −3.637 for the clean signal against the KS-mADMM reconstruction (the closer to 0, the better). Despite the lower SNR (12.4 dB), the KS-mADMM reconstruction sounds cleaner (i.e., has fewer audio artefacts) than its Gibbs counterpart, which is consistent with the ODG values obtained. Devising appropriate temporal evolution models for the audio synthesis coefficients over time and investigating self-adaptive schemes for the estimation of μ (here tuned empirically) are topics of future research.
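The output SNR figures quoted above can be computed with the usual definition, assuming SNR is measured against the clean reference; the function name is illustrative.

```python
import numpy as np

def output_snr_db(clean, estimate):
    """Output SNR in dB: 10 * log10(||clean||^2 / ||clean - estimate||^2),
    i.e., the ratio of clean-signal energy to residual-error energy."""
    err = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

# Example: an estimate that recovers 90% of the clean amplitude.
clean = np.ones(100)
snr = output_snr_db(clean, 0.9 * clean)
```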

VII. CONCLUSION AND DISCUSSION
In this paper, we have presented efficient smoothing-and-splitting methods for solving regularised autonomous tracking and state estimation problems. We formulated the problem as a generalised L_2-penalised dynamic group Lasso minimisation problem. The problem can be solved using batch methods when the number of time steps is moderate. For the case with a large number of time steps, the new KS-mADMM, GN-IEKS-mADMM, and LM-IEKS-mADMM methods were developed. We also proved the convergence of the proposed methods. We applied the developed methods to simulated tracking, real-world tracking, and audio signal restoration problems, where the methods resulted in improved localisation and estimation performance and a significantly reduced computational load.
A disadvantage of the smoothing-and-splitting methods is that, although they significantly improve the tracking and estimation performance, their reliability depends on user-defined penalty parameters (e.g., the parameter γ in (6)). See [46], [47] for further details on choosing appropriate values of these parameters. The use of adaptive penalty parameters may improve the performance in dynamic systems, although stronger convergence conditions then need to be guaranteed [60]. It would be interesting to develop fully automated solvers with adaptive parameters. The convergence and the convergence rate of our methods are based on Bayesian smoothers and ADMM, and we have established the convergence rate for the convex case. Possible future work includes analysing the convergence rate of the nonconvex variants.
Although we only consider autonomous tracking and state estimation problems in this paper, it is possible to apply our framework to a wide class of control problems. For example, in linear optimal control problems, we could introduce splitting variables to decompose the nonsmooth terms and then use the Riccati equations to solve the subproblems arising in the optimal control problems [61]. In cooperative control of multiple target systems [62]-[64], we can consider a reformulation of the dynamic models of group targets into classes with different characteristics. Based on this framework, we can address the subproblems arising in optimisation-based methods such as receding horizon methods [63]. The proposed framework can also be extended to other variable splitting methods [48] as well as to other recursive smoothers [1]. Future work also includes developing other variants, for example, sigma-point based variable splitting methods.

APPENDIX A PROOF OF LEMMA 1
To prove Lemma 1, we define ζ = [x; w] and write the variables x and w into the function Ξ(ζ), then use the optimality conditions of the subproblems in (7). For simplicity of notation, we group all the variables ζ, v, and η into a single vector ξ, and rewrite (37) accordingly. Since the mapping F(ξ) is affine with a skew-symmetric matrix, it is monotone [52]; this gives the inequality (40). Meanwhile, using (7d), the inequality can be written in the form (41). Combining (40) and (41), we can derive (39). Letting Ω = [γI + G^T G, 0; 0, 0], we then conclude the result.
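The monotonicity step above can be illustrated numerically: for an affine mapping built from a skew-symmetric matrix, the inner product defining monotonicity is identically zero, hence trivially nonnegative. This is a minimal sketch; the matrix M and vector q below are arbitrary illustrative data, not the actual quantities appearing in the proof.

```python
import numpy as np

# If F(xi) = M @ xi + q with M skew-symmetric (M.T == -M), then
# (F(a) - F(b)).T @ (a - b) = (a - b).T @ M @ (a - b) = 0 >= 0,
# since x.T @ M @ x = 0 for any x when M is skew-symmetric.
# Hence the affine mapping F is monotone.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A - A.T                          # skew-symmetric part of A
q = rng.standard_normal(5)
F = lambda xi: M @ xi + q

a, b = rng.standard_normal(5), rng.standard_normal(5)
inner = (F(a) - F(b)) @ (a - b)      # zero up to floating-point error
```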

... and γ. Output: x*_{1:T}.
1: set i ← 0 and start from a suitable initial guess x^{(0)}_{1:T};
2: while not converged or i < I_max do
3:   linearise a_t and h_t according to (23);
4:   compute Ã_t, Q̃_t, b̃_t by (20);
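The linearisation step in line 3 of the fragment above can be sketched as follows. The paper forms the affine approximation analytically via equations (23) and (20); here the Jacobian is obtained by forward differences purely for illustration, and all names are hypothetical.

```python
import numpy as np

def linearise(a, x_hat, eps=1e-6):
    """Affine approximation a(x) ~= A_tilde @ x + b_tilde around x_hat.

    A_tilde is the Jacobian of a at x_hat (formed here by forward
    differences for illustration); b_tilde is the offset chosen so that
    A_tilde @ x_hat + b_tilde = a(x_hat), as in an IEKS-style iteration.
    """
    n = len(x_hat)
    f0 = a(x_hat)
    A = np.empty((len(f0), n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        A[:, j] = (a(x_hat + dx) - f0) / eps
    b = f0 - A @ x_hat
    return A, b

# Sanity check on a linear dynamic function a(x) = 2x, where the
# approximation is exact: A_tilde = 2I and b_tilde = 0.
A, b = linearise(lambda x: 2.0 * x, np.array([1.0, -2.0]))
```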

Fig. 1. Signals, measurements, and the estimates in the linear tracking problem. The values of x_err are 0.103 and 0.072 for KS and KS-mADMM, respectively.

Fig. 2. Comparison of the running times in the linear car tracking example as a function of the number of time steps.

Fig. 3. The estimated trajectory in the nonlinear system. The relative errors generated by IEKS and LM-IEKS-mADMM are 0.53 and 0.46, respectively.

Fig. 5. Comparison of the running times in the range measurement example as a function of the number of time steps (from 10^2 to 10^8).

Fig. 6. The positions (black markers) estimated by KS-mADMM. The starting coordinate is denoted by a blue marker, and the ending coordinate by a red marker. Contains data from the Danish Maritime Authority, used in accordance with the conditions for the use of Danish public data.

Fig. 8. The path tracking (black line) generated by GN-IEKS-mADMM. The starting position is a blue point, and the ending position is a red cross.
Fig. 9.

TABLE II
FLEXIBLE SPARSITY ASSUMPTIONS BY SELECTING B_t AND d_t.