Generalised Hyperbolic State-space Models for Inference in Dynamic Systems

In this work we study linear vector stochastic differential equation (SDE) models driven by generalised hyperbolic (GH) Lévy processes for inference in continuous-time non-Gaussian filtering problems. The GH family of stochastic processes offers a flexible framework for modelling non-Gaussian, heavy-tailed characteristics and includes the normal inverse-Gaussian, variance-gamma and Student-t processes as special cases. We present continuous-time simulation methods for the solution of vector SDE models driven by GH processes, together with novel inference methodologies based on a variant of sequential Markov chain Monte Carlo (MCMC). As an example, a particular formulation of Langevin dynamics is studied within this framework. The model is applied to both a synthetically generated data set and a real-world financial time series to demonstrate its capabilities.


I. INTRODUCTION
In the fields of statistical signal processing and machine learning, the problem of modelling dynamically evolving phenomena arises in various application domains including financial markets [1], speech recognition [2], health monitoring [3] and, more recently, generative models for images [4]. Stochastic differential equation (SDE) models of real-world dynamic phenomena offer a powerful tool for capturing the inherent uncertainty in the estimated behaviour of such systems and provide a probabilistic framework that allows for the quantification and propagation of uncertainty. Furthermore, SDE models offer a physics-based, intuitive representation of the dynamics and the associated uncertainty, which can be particularly beneficial in building interpretable machine learning systems.
Typically the driving (or forcing) function, which characterises the random fluctuations about the estimated deterministic evolution, is chosen as a Brownian motion. However, the Gaussian assumption implicit in the use of Brownian motion is often not justified in real-world applications, as many systems exhibit non-Gaussian features such as asymmetric or heavy-tailed distributions. Even with Brownian-driven paths, discretisation may require complex integrals and introduce further complexity. Instead, inference algorithms that work directly in continuous time offer an effective alternative to discrete-time algorithms for large, irregularly sampled sequential data sets by reducing the sampling costs associated with grid-based strategies ([5], Section 4.6; [6]).
Recent work on SDE models extends their definition to incorporate non-Gaussian Lévy processes, allowing for a wider range of behaviours including the α-stable and Poisson processes ([7], [8]). This extension enhances the flexibility and applicability of SDE models, enabling them to capture more complex dynamics and uncertainties in real-world systems.
In this work, we study linear SDE models driven by a generalised hyperbolic (GH) Lévy process, which specifies a broad class of stochastic processes with varying levels of non-Gaussian, heavy-tailed and asymmetric characteristics, and includes the normal inverse-Gaussian, variance-gamma and Student-t processes as special subclasses ([9], [10]). A point process simulation algorithm for the GH process has recently been presented in [11]. Building on this algorithm, we present a simulation method for SDEs driven by GH processes and a novel continuous-time filtering algorithm which enables inference in such systems. Specifically, the linear vector SDE model is defined as

dx(t) = A x(t) dt + L dW(t),    (1)

where x(t) is the D-dimensional state vector at time t, A and L are system matrices, and W(t) is a background-driving Lévy process. The observations are assumed to be related to the state vector through

y(t) = H x(t) + ε(t),    (2)

where H is the observation matrix and ε(t) is assumed to be a continuous-time white Gaussian noise with covariance σ_ε². The system of equations of the form presented in Eqs. (1) and (2) defines a Lévy state-space model (SSM) [7]. Various well-known models can be treated as special cases of this general structure by choosing particular forms of A, L and H. Some examples are the standard linear tracking models [7], the continuous-time autoregressive (CAR) and CAR moving-average (CARMA) models ([6], [12]) and the Ornstein-Uhlenbeck (OU) process [13].
As a particular case, we study a Lévy SSM formulation of Langevin dynamics, defined as

dẋ(t) = θ ẋ(t) dt + dW(t),

where θ < 0, such that the evolution of the process reverts to zero in proportion to the magnitude of ẋ(t). Langevin dynamics find application in physics and the biological sciences. The linear SDE formulation of Langevin dynamics can be expressed as

d[x(t), ẋ(t)]^T = [[0, 1], [0, θ]] [x(t), ẋ(t)]^T dt + [0, 1]^T dW(t),    (3)

where by definition dx(t) = ẋ(t) dt. Hence, a Lévy SSM allows both the state evolution ẋ(t) and the integrated state x(t) to be inferred within the same framework. Previous works on linear vector SDE models use various particle filtering (PF) strategies for the inference of the states in Eq. (1) ([7], [14]). Here, we develop a novel framework using a variant of sequential Markov chain Monte Carlo (SMCMC) ([15], [16]) for the inference of the states in a general Lévy SSM and demonstrate its applicability using Langevin dynamics driven by GH processes.
The paper is organised as follows. In Section II we review and summarise generalised shot-noise methods for the simulation of GH processes. In Section III we review fundamental results on the simulation of vector SDEs driven by GH processes. In Section IV we present our novel inference algorithm for the GH SSM, and in Section V we demonstrate its applicability on both a synthetically generated data set and a British pound - Japanese yen foreign exchange rate data set.

II. SIMULATION OF GH PROCESSES
In this section, simulation algorithms for the generalised hyperbolic process are outlined. These algorithms form the basis of both the simulation of vector SDE models and the inference algorithms considered in this work. The general framework adopted in this work for the simulation of GH processes is the generalised shot-noise representation of a Lévy process W(t) reviewed in [17], of the form

W(t) = Σ_{i=1}^∞ W_i 1{V_i ≤ t},    (4)

where {V_i ∈ [0, T]}_{i=1}^∞ are independent and identically distributed (i.i.d.) uniform random variables representing the arrival times of jumps, and {W_i}_{i=1}^∞ are jump sizes independent of the jump times. Based on this general methodology, detailed studies on the simulation of GH processes are presented in ([11], [18]). We review the main results here for completeness.
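As a minimal illustration of the representation in (4), the sketch below evaluates a truncated shot-noise path on a time grid. The jump sizes here are placeholders (standard normal draws) rather than GH jumps, and the function name is our own; the GH-specific samplers are the subject of the algorithms that follow.

```python
import numpy as np

def shot_noise_path(jump_times, jump_sizes, t_grid):
    """Evaluate W(t) = sum_i W_i * 1{V_i <= t} on a grid of time points."""
    V = np.asarray(jump_times, dtype=float)
    W = np.asarray(jump_sizes, dtype=float)
    t = np.asarray(t_grid, dtype=float)
    # occurred[k, i] is 1.0 when jump i has arrived by time t[k].
    occurred = (V[None, :] <= t[:, None]).astype(float)
    return occurred @ W

# Toy example: 5 jumps on [0, T] with stand-in jump sizes.
rng = np.random.default_rng(0)
T = 1.0
V = rng.uniform(0.0, T, size=5)   # jump times V_i ~ Unif(0, T)
W = rng.normal(size=5)            # placeholder jump sizes W_i
path = shot_noise_path(V, W, np.linspace(0.0, T, 101))
```

The resulting path is piecewise constant, jumping by W_i at each V_i, which is exactly the structure exploited by the truncated series methods below.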
An advantageous aspect of the GH family of distributions in practical scenarios is that its members can be expressed as sums of i.i.d. random variables. Hence the GH distribution is infinitely divisible and can be the distribution of a Lévy process at time t = 1 [9]. The GH distribution has a five-parameter probability density function (pdf), defined for random variables on the real line as ([19], [20])

f_GH(x) = ((γ/δ)^λ / (√(2π) K_λ(δγ))) e^{β(x−μ)} K_{λ−1/2}(α√(δ² + (x−μ)²)) / (√(δ² + (x−μ)²)/α)^{1/2−λ},

where K_ν is the modified Bessel function of the second kind with index ν. The parameter λ ∈ R characterises the tail behaviour, α > 0 determines the shape, 0 ≤ |β| < α controls the skewness, μ ∈ R is a location parameter and δ > 0 is a scale parameter. An essential characteristic of the GH distribution, which will play a pivotal role in this study, is its connection to the generalised inverse Gaussian (GIG) distribution. Using the parameterisation γ = √(α² − β²), the three-parameter probability density function f_GIG(λ, δ, γ) of the GIG distribution is the mixing distribution in a variance-mean mixture of Gaussians representation of the GH distribution, such that [19]

f_GH(x) = ∫_0^∞ N(x; μ + βu, u) f_GIG(u; λ, δ, γ) du,

where u is a GIG-distributed random variable.
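The variance-mean mixture above gives a direct recipe for drawing GH variates when only i.i.d. samples (rather than process paths) are needed: draw u from the GIG mixing density, then x | u ~ N(μ + βu, u). A sketch using SciPy's `geninvgauss`, whose (p, b, scale) parameterisation we map onto (λ, δ, γ) as noted in the comments (this mapping is our own working assumption about the correspondence, not taken from the paper):

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gh(n, lam, alpha, beta, delta, mu, rng):
    """Draw GH variates via the variance-mean mixture:
    u ~ GIG(lam, delta, gamma), then x = mu + beta*u + sqrt(u)*N(0, 1)."""
    gamma = np.sqrt(alpha**2 - beta**2)
    # scipy's geninvgauss(p, b) with a scale matches GIG(lam, delta, gamma)
    # via p = lam, b = delta*gamma, scale = delta/gamma.
    u = geninvgauss.rvs(lam, delta * gamma, scale=delta / gamma,
                        size=n, random_state=rng)
    return mu + beta * u + np.sqrt(u) * rng.standard_normal(n)

rng = np.random.default_rng(1)
x = sample_gh(200_000, lam=1.0, alpha=2.0, beta=0.5, delta=1.0, mu=0.0, rng=rng)
```

Note that this produces i.i.d. GH draws only; simulating GH sample paths requires the point process constructions reviewed next.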
Let W(t) be a GH process on some time interval of interest t ∈ [0, T]; the characteristic function (CF) is given by ([21], Corollary 13.8) as

E[exp(iuW(t))] = exp( t ∫_{R\{0}} (e^{iuw} − 1) Q(dw) ),

where Q is a Lévy measure satisfying certain integrability constraints and Q(dw) is its Lévy density [11]. Here the CF is used to define the Lévy process, since the pdf may not have an analytical form for all parameter settings.
The GIG process is an instance of a more restricted class of non-negative, non-decreasing Lévy processes G(t), called subordinator processes, whose CF is given by

E[exp(iuG(t))] = exp( t ∫_0^∞ (e^{iux} − 1) Q_GIG(dx) ),

where Q_GIG(dx) characterises the density of jumps of G(t), such that the expected number of jumps per unit time with size in [a, b] is μ_[a,b] = ∫_a^b Q_GIG(dx), and the number of such jumps is a Poisson random variable with mean μ_[a,b]. In its simplest form, the GIG Lévy density may be expressed as ([20], Eq. 74)

Q_GIG(x) = x^{−1} e^{−xγ²/2} [ (2/π²) ∫_0^∞ (e^{−xz²/(2δ²)} / (z |H_{|λ|}(z)|²)) dz + max(0, λ) ],

where H_ν(z) is the Bessel function of the third kind, also known as the Hankel function of the first kind. Analogous to the normal mixture representation of the GH distribution, the GH Lévy measure can be expressed as

Q_GH(dw) = dw ∫_0^∞ N(w; μ_W z, σ_W² z) Q_GIG(dz).

Both the GH and GIG processes are examples of infinite-activity processes, for which ∫_0^∞ Q(dw) = ∞, so that there are almost surely an infinite number of jumps in any time interval, which renders exact simulation of these processes intractable. Furthermore, the presence of an integral involving Bessel functions in Q_GIG(x) is another source of intractability for the GH process.
Generalised shot-noise methods are particularly suitable for infinite-activity processes since they allow the simulation of jumps in non-increasing order, i.e. x_i ≥ x_{i+1}. The simulation methods in [11] produce approximate sample paths by truncating the infinite number of terms in (4) to an adaptively selected finite number of terms, bounding the residual error using concentration inequalities. In previous works on vector SDE models driven by Lévy processes the number of terms was tuned by hand, and hence the adaptive methodology adopted in this work greatly increases the usability of such models [14]. In addition, the residual error committed by truncation is further approximated by adding an appropriately scaled Brownian motion term with drift.
Algorithm 1 Generation of the jumps of a tempered stable process with Lévy density Q_TS(x) = C x^{−1−α} e^{−βx} (x ≥ 0), where 0 < α < 1 is the tail parameter and β ≥ 0 is the tempering parameter.
1) Assign N_TS = ∅.
2) Generate the epochs {Γ_i} of a unit-rate Poisson process.
3) For each Γ_i, compute the candidate jump x_i = (αΓ_i/C)^{−1/α}.
4) With probability e^{−βx_i}, accept x_i and assign N_TS = N_TS ∪ {x_i}.

In order to address the problem of intractable Lévy densities, the methods in ([11], [18]) rely on simulation from a tractable dominating process with Lévy measure Q_0 such that dQ_0(x)/dQ_GIG(x) ≥ 1 for all x ∈ (0, ∞), with subsequent rejection sampling steps with rate dQ_GIG(x)/dQ_0(x), as in ([17], [22]), to obtain the desired jump magnitudes {x_i} of the subordinator process. The associated jump magnitudes {w_i} of the GH process may then be obtained through the normal mixture representation as

w_i = μ_W x_i + σ_W √(x_i) u_i,  u_i ~ N(0, 1).

The specific dominating processes for the simulation of GIG jumps studied in ([11], [18]) can be expressed as either tempered stable or gamma processes, which have simple associated simulation procedures as outlined in Algorithms 1 and 2.
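A sketch of Algorithm 1 in code form is given below. We assume the standard inverse-epoch mapping for the stable candidates and scale the epochs by the interval length T; the function and argument names are ours.

```python
import numpy as np

def tempered_stable_jumps(C, alpha, beta, T, n_epochs, rng):
    """Jumps of a tempered stable process with Levy density
    C x^{-1-alpha} e^{-beta x} on [0, T], truncated to n_epochs series terms."""
    # Epochs of a unit-rate Poisson process (increasing sums of Exp(1) gaps).
    gammas = np.cumsum(rng.exponential(size=n_epochs))
    # Candidate (stable) jumps, generated in non-increasing order.
    x = (alpha * gammas / (C * T)) ** (-1.0 / alpha)
    # Thinning: accept each candidate with probability exp(-beta * x_i).
    keep = rng.uniform(size=n_epochs) < np.exp(-beta * x)
    sizes = x[keep]
    times = rng.uniform(0.0, T, size=sizes.size)  # jump times V_i ~ Unif(0, T)
    return times, sizes

rng = np.random.default_rng(2)
times, sizes = tempered_stable_jumps(C=1.0, alpha=0.5, beta=1.0,
                                     T=1.0, n_epochs=2000, rng=rng)
```

Because the candidates are monotone in the Poisson epochs, the accepted jumps come out in non-increasing order, which is the property the adaptive truncation schemes rely on.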
Algorithm 2 Generation of the jumps of a gamma process with Lévy density Q_Ga(x) = C x^{−1} e^{−βx} (x ≥ 0), where C > 0 is the shape parameter and β > 0 is the rate parameter.
1) Assign N_Ga = ∅.
2) Generate the epochs {Γ_i} of a unit-rate Poisson process.
3) For each Γ_i, compute the candidate jump x_i = 1/(β(e^{Γ_i/C} − 1)).
4) With probability (1 + βx_i) e^{−βx_i}, accept x_i and assign N_Ga = N_Ga ∪ {x_i}.

The derivation of the specific forms of these dominating processes differs based on the parameter values of the subordinator GIG process. Simulation algorithms for all parameter settings, including edge cases such as the Student-t process, and their derivations are presented in [11]. Here, we review the simulation algorithms for the most general parameter setting, λ ≤ −0.5 and δ, γ > 0, as an example. The derivations and algorithmic definitions of the adaptive truncation scheme and the residual approximation procedure from [11] are omitted here; however, these algorithms are utilised in Section V.
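A corresponding sketch of the gamma process generator, under the same assumptions as the tempered stable sketch (inverse-epoch mapping, epochs scaled by T, our own naming), is:

```python
import numpy as np

def gamma_process_jumps(C, beta, T, n_epochs, rng):
    """Jumps of a gamma process with Levy density C x^{-1} e^{-beta x} on
    [0, T], truncated to n_epochs series terms, via a dominating process
    thinned with probability (1 + beta*x) * exp(-beta*x) <= 1."""
    gammas = np.cumsum(rng.exponential(size=n_epochs))
    # Dominating jumps, non-increasing in the Poisson epochs.
    x = 1.0 / (beta * (np.exp(gammas / (C * T)) - 1.0))
    keep = rng.uniform(size=n_epochs) < (1.0 + beta * x) * np.exp(-beta * x)
    sizes = x[keep]
    times = rng.uniform(0.0, T, size=sizes.size)
    return times, sizes

# Sanity check: the total jump mass over [0, 1] should be approximately
# Gamma(C, beta) distributed, with mean C / beta.
rng = np.random.default_rng(3)
totals = [gamma_process_jumps(2.0, 1.0, 1.0, 1000, rng)[1].sum()
          for _ in range(500)]
```

The sanity check exploits the fact that a gamma process at time t = 1 has a Gamma(C, β) marginal, so the truncated series sum should concentrate around C/β.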
In order to avoid direct calculation of the integral in Q_GIG(x), the approach proposed in ([11], [18]) is to consider a bivariate point process with intensity Q_GIG(x, z) on (0, ∞) × (0, ∞) which has the GIG Lévy density as its marginal, i.e.

∫_0^∞ Q_GIG(x, z) dz = Q_GIG(x).
Hence, the goal is to produce joint samples {x_i, z_i} from the point process with intensity function Q_GIG(x, z) and retain the samples {x_i} as samples from Q_GIG(x). To generate these joint samples, tractable bivariate dominating processes with intensity function Q⁰_GIG(x, z) are designed such that thinning with probability Q_GIG(x, z)/Q⁰_GIG(x, z) yields samples from the desired process Q_GIG. Both the bivariate target process Q_GIG(x, z) and the dominating processes can naturally be factorised into a marginal and a conditional point process, Q_GIG(x, z) = Q_GIG(x) p(z|x), where p(z|x) is a probability density that may be interpreted as a marking variable, so that the points (x, z) ∈ (0, ∞) × (0, ∞) form a bivariate Poisson process [23].
The bounds used in deriving the dominating processes in [11] involve a hyperparameter z₁ that splits the range of z values in two and controls the tightness of the bound. In fact, the dominating process for the current parameter setting may be considered as a marked point process split into three independent point processes N¹_Ga, N²_Ga and N₂. While the point processes N¹_Ga and N²_Ga can be considered together as the union of two independent gamma processes with intensity function Q_N1(x), the point process N₂ can be interpreted as a modified tempered stable process with intensity function Q_N2(x).

Algorithm 3 Generation of N₁ with marginal intensity function Q_N1(x).
Finally, the set of points N = N₁ ∪ N₂ is a realisation of the jump magnitudes of a GIG process with intensity function Q_GIG(x). The associated GH process jumps may be obtained using the normal mixture representation as discussed above.
Algorithm 4 Generation of N₂ with marginal intensity function Q_N2(x).
3) For each point x_i ∈ N_TS, accept with probability Γ(0.5, ·); otherwise reject and delete x_i from N_TS.
4) For each x_i, simulate a z_i from a truncated square-root gamma density.
5) With probability ...

III. SIMULATION OF GH SSMS
In this section, we review methods for the simulation of vector SDEs driven by Lévy processes. In particular, for the GH process, a conditionally Gaussian formulation of the simulation algorithm is introduced, which will form the basis of the inference algorithm discussed in the following sections.
The solution to vector SDEs of the form shown in Eq. (1) can be expressed as [24]

x(t) = e^{At} x(0) + ∫_0^t e^{A(t−u)} L dW(u),    (9)

where e^{At} is the matrix exponential of At and x(0) is an initial condition. The second term on the right-hand side is a stochastic integral with respect to the Lévy process dW. When studying the solution for specific characterisations of A, L and dW(t), it is usually convenient to define the stochastic term as

I(f_t) = ∫_0^T f_t(u) dW(u),    (10)

where f_t(u) = e^{A(t−u)} L 1{u ≤ t} and T > t is some arbitrary boundary point. The integral I(f_t) can only be solved analytically when dW is Gaussian. Hence, in general, I(f_t) has to be approximated, e.g. using direct Euler-type discretisations as studied in [6]. In this work, we adopt a point process representation of I(f_t), enabled by the series representation of a Lévy process in terms of its jump times and sizes {(V_i, W_i)} as in (4). When dW(t) is a normal variance-mean (NVM) process, such as the GH process, the integral admits the shot-noise series representation

I(f_t) = Σ_{i=1}^∞ f_t(V_i) W_i,

where the jump sizes W_i can be further decomposed as

W_i = μ_W Z_i + σ_W √(Z_i) U_i,    (11)

where μ_W ∈ R is a skewness parameter, σ_W ∈ (0, ∞) is a scale parameter, V_i ~ Unif(0, T) are the jump times, Z_i are the jumps of a subordinator process and U_i ~ N(0, 1) [25]. The sequence of partial sums of this series converges uniformly on (0, T] to the original integral as the number of terms n → ∞. Thus the continuous-time process I(f_t) can be represented as a countably infinite discrete sum, without the additional discretisation schemes required in the Gaussian case. The proof of this convergence can be found in Theorem 4 of [25].
To generate random sample paths of the state vector x(t), the solution shown in Eq. (9) can be incrementally simulated on each interval (s, t] as

x(t) = e^{A(t−s)} x(s) + Σ_{i: V_i ∈ (s,t]} f_t(V_i) W_i,    (12)

given an initial state vector x(s) and the jumps {(V_i, W_i) : V_i ∈ (s, t]}. Since there are an infinite number of jumps in any interval for the GH process, the summation in Eq. (12) must be truncated to exclude small jumps of dW(t), as discussed in Section II. To improve the approximation, the residual approximation schemes for the series representation of GH processes studied in [11] can be incorporated into the SDE simulation framework. The effect of the residual small jumps, defined as R_t^c = I(f_t) − I_c(f_t) where c is the truncation level, can be included as a standard stochastic integral driven by a Brownian motion. The moments of the residual process for the GH case are derived in [11], and the validity of this residual approximation is studied in [25].
Notice that for NVM processes, given the set of subordinator jump sizes {Z_i}_{i=1}^∞, I(f_t) is conditionally Gaussian, which allows efficient simulation and inference procedures to be designed. Hence, it is useful to express the integral in Eq. (12) directly as a function of the subordinator jump times and sizes {(V_i, Z_i) : V_i ∈ (s, t]} by substituting Eq. (11) into (12), giving the conditional mean and covariance

m = Σ_{i: V_i ∈ (s,t]} μ_W Z_i f_t(V_i),    (13)

S = Σ_{i: V_i ∈ (s,t]} σ_W² Z_i f_t(V_i) f_t(V_i)^T.    (14)

The conditionally Gaussian formulation of the integral I(f_t) leads to a fairly straightforward conditional form for the incremental simulation of sample paths from Eq. (12),

x_t | x_s, {(V_i, Z_i)} ~ N(e^{A(t−s)} x_s + m, S),    (15)

where the notation of the state is simplified to x_t := x(t) for brevity.
For the Langevin model shown in (3), the exact forms are

e^{A(t−s)} = [[1, (e^{θ(t−s)} − 1)/θ], [0, e^{θ(t−s)}]],

f_t(V_i) = [(e^{θ(t−V_i)} − 1)/θ, e^{θ(t−V_i)}]^T,

writing f_t(x) = e^{θ(t−x)} for convenience. These terms are subsequently substituted into (13), (14) and (15) to simulate incremental sample paths from the Langevin model. This procedure is summarised in Alg. 5.
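The Langevin-specific terms and the conditional moments of Eqs. (13)-(14) can be sketched as follows. The closed forms for e^{A dt} and f_t(v) follow from A = [[0, 1], [0, θ]] and L = [0, 1]^T; the helper names are ours.

```python
import numpy as np

def langevin_expA(theta, dt):
    """Closed-form matrix exponential e^{A dt} for A = [[0, 1], [0, theta]]."""
    e = np.exp(theta * dt)
    return np.array([[1.0, (e - 1.0) / theta],
                     [0.0, e]])

def langevin_ft(theta, t, v):
    """f_t(v) = e^{A(t - v)} L for L = [0, 1]^T."""
    e = np.exp(theta * (t - v))
    return np.array([(e - 1.0) / theta, e])

def increment_moments(theta, s, t, V, Z, mu_w, sigma_w):
    """Conditional mean shift m and covariance S of the state increment over
    (s, t], given the subordinator jumps {(V_i, Z_i)}, per Eqs. (13)-(14)."""
    m = np.zeros(2)
    S = np.zeros((2, 2))
    for v, z in zip(V, Z):
        f = langevin_ft(theta, t, v)
        m += mu_w * z * f
        S += sigma_w**2 * z * np.outer(f, f)
    return m, S
```

Given x(s) and the jumps, a sample of x(t) then follows from the conditional Gaussian in Eq. (15) with mean `langevin_expA(theta, t - s) @ x_s + m` and covariance S.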

IV. INFERENCE FOR GH SSMS
In this section we describe a novel sequential inference algorithm for the states of a Lévy SSM driven by a normal variance-mean process, such as the generalised hyperbolic process. We start by briefly reviewing the recursive state estimation problem in the context of Lévy SSMs and subsequently introduce our SMCMC algorithm for this task.
The problem of filtering in a dynamic system can be described as the computation of the posterior distribution of a vector of state variables x_t conditional on a batch of observations y_{0:t} := {y(s_i) : 0 ≤ s_i ≤ t} [26], where {s_i} is a set of discrete observation times for the process. The sequential nature of dynamic systems allows the filtering problem to be solved incrementally for all t.
Given an initial filtering estimate at time s, denoted as p(x s |y 0:s ), the filtering estimate at time t > s can be obtained using a two step recursive algorithm involving a prediction and a correction step.As discussed later in this section, we will devise a sequential algorithm based on an approximate collapsing of p(x s |y 0:s ) onto a single Gaussian, see Section III.
Since the associated densities of a Lévy SSM involve a set of jump times and magnitudes, define {(V_i, Z_i)}_(s,t] := {(V_i, Z_i) : V_i ∈ (s, t]}. Then the predictive density p(x_t | y_{0:s}, {(V_i, Z_i)}_(s,t]) can be expressed as

p(x_t | y_{0:s}, {(V_i, Z_i)}_(s,t]) = ∫ f(x_t | x_s) p(x_s | y_{0:s}) dx_s,    (16)

where the state transition density f(x_t | x_s) can be derived from the forward simulation of the SDE, as specified in Eq. (15). Note that the generic notation for the state transition density f(x_t | x_s) omits the dependence on the jumps {(V_i, Z_i)}_(s,t] for notational convenience.
After an observation at time t is measured, the predictive density can be updated as

p(x_t | y_{0:t}, {(V_i, Z_i)}_(s,t]) ∝ p(y_t | x_t) p(x_t | y_{0:s}, {(V_i, Z_i)}_(s,t]).    (17)

Since the jump times and sizes {(V_i, Z_i)}_(s,t] are latent variables in a Lévy SSM, in practice we want to marginalise over them to obtain the filtering estimate at time t,

p(x_t | y_{0:t}) = ∫ p(x_t | y_{0:t}, {(V_i, Z_i)}_(s,t]) p({(V_i, Z_i)}_(s,t] | y_{0:t}) d{(V_i, Z_i)}_(s,t],    (18)

where d{(V_i, Z_i)}_(s,t] is used to indicate a marginalisation over all possible jump sequences in (s, t].
The Gaussian noise assumption for the observation density p(y_t | x_t) and the conditionally Gaussian formulation of the transition density allow both Eqs. (16) and (17) to be solved analytically, and the resulting densities are still conditionally Gaussian. Hence, given a set of jumps {(V_i, Z_i)}_(s,t], the filtering problem can be solved by standard Kalman filtering recursions. Assuming that the previous filtering estimate is approximated as a single Gaussian, p(x_s | y_{0:s}) = N(x_s; μ_s, C_s), and given {(V_i, Z_i)}_(s,t], the prediction equations from time s to t result in a Gaussian density N(x_t; μ_{t|s}, C_{t|s}) such that ([7], [14])

μ_{t|s} = F μ_s + m,    (19)
C_{t|s} = F C_s F^T + S,    (20)

where F = e^{A(t−s)} and m, S are calculated using Eqs. (13) and (14). Then, given the observation matrix H and noise covariance σ_ε² in Eq. (2), the Kalman correction step results in another Gaussian density N(x_t; μ_{t|t}, C_{t|t}) such that

μ_{t|t} = μ_{t|s} + K_t (y_t − H μ_{t|s}),    (21)
C_{t|t} = (I − K_t H) C_{t|s},    (22)
K_t = C_{t|s} H^T (H C_{t|s} H^T + σ_ε²)^{−1},    (23)

where K_t is the Kalman gain. Additionally, the marginal-conditional likelihood p(y_t | y_{0:s}, {(V_i, Z_i)}_(s,t], σ_ε) can be readily evaluated during the recursions as

p(y_t | y_{0:s}, {(V_i, Z_i)}_(s,t], σ_ε) = N(y_t; H μ_{t|s}, H C_{t|s} H^T + σ_ε²),    (24)

which will be used in later stages of our inference algorithm.
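The prediction/correction pair above is a standard Kalman step conditioned on the jumps; a minimal sketch for a scalar observation (names and the log-likelihood output are our own conveniences) is:

```python
import numpy as np

def kalman_step(mu_s, C_s, F, m, S, H, y, var_eps):
    """One conditionally Gaussian predict/correct step given the jumps,
    returning the filtered moments and the log marginal likelihood of y."""
    # Prediction (Eqs. (19)-(20)): N(mu_pred, C_pred).
    mu_pred = F @ mu_s + m
    C_pred = F @ C_s @ F.T + S
    # Correction (Eqs. (21)-(23)) for a scalar observation y = H x + eps.
    innov_var = (H @ C_pred @ H.T).item() + var_eps
    K = (C_pred @ H.T).ravel() / innov_var        # Kalman gain
    innov = y - (H @ mu_pred).item()
    mu_f = mu_pred + K * innov
    C_f = C_pred - np.outer(K, (H @ C_pred).ravel())
    # Marginal likelihood (Eq. (24)): N(y; H mu_pred, innov_var).
    loglik = -0.5 * (np.log(2.0 * np.pi * innov_var) + innov**2 / innov_var)
    return mu_f, C_f, loglik

mu_f, C_f, ll = kalman_step(
    mu_s=np.zeros(2), C_s=np.eye(2),
    F=np.eye(2), m=np.zeros(2), S=0.1 * np.eye(2),
    H=np.array([[1.0, 0.0]]), y=1.0, var_eps=0.1)
```

Returning the log marginal likelihood alongside the filtered moments is convenient because it is exactly the quantity needed in the MH acceptance ratio introduced below in this section.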
While it is straightforward to solve the filtering problem given a set of jumps {(V_i, Z_i)}_(s,t] as described above, sampling from the posterior density over the latent jumps in (18) is intractable, and hence the estimation of the filtering density also becomes intractable. In previous works, this problem was overcome using particle filtering strategies in which each simulated set of jumps {(V_i, Z_i)}^(j)_(s,t] is associated with an importance weight w_j, so that (18) can be evaluated as a weighted sum over conditionally Gaussian densities p(x_t | y_{0:t}, {(V_i, Z_i)}^(j)_(s,t]). In this work, we propose a new inference framework based on Monte Carlo sampling strategies in which samples from the posterior density over the latent jumps are obtained using an MCMC algorithm at each iteration.
The posterior density over the latent jumps in (s, t] can be expressed through Bayes' theorem as

p({(V_i, Z_i)}_(s,t] | y_{0:t}) ∝ p(y_t | y_{0:s}, {(V_i, Z_i)}_(s,t]) p({(V_i, Z_i)}_(s,t]),    (25)

where the likelihood term is evaluated as part of the Kalman filtering recursions, as shown in (24). The prior density p({(V_i, Z_i)}_(s,t]) for the jump sequence is intractable to compute, especially for infinite-activity processes such as these. Nevertheless, under the truncated model with level c in Eq. (4), it would be possible to characterise this prior fully and hence use it to perform MCMC by proposing changes to individual jumps, for example. Here we leave such a possibility for future investigations and rely only on forward simulation of the jumps from their prior in the MCMC.
A simple and effective Metropolis-Hastings (MH) algorithm targeting the posterior density over the latent jumps may be constructed by proposing the jumps from their GIG prior p({(V_i, Z_i)}_(s,t]) using Algs. 3 and 4. In this case the MH acceptance probabilities are obtained solely in terms of the marginal-conditional likelihood term p(y_t | y_{0:s}, {(V_i, Z_i)}_(s,t]). For each proposed change {(V_i, Z_i)}^(′)_(s,t] ~ p({(V_i, Z_i)}_(s,t]) to the current state of the jump sequence at iteration j, {(V_i, Z_i)}^(j)_(s,t], the Kalman filtering recursions (19)-(24) are carried out and the new jump sequence {(V_i, Z_i)}^(′)_(s,t] is accepted with probability

α = min( 1, p(y_t | y_{0:s}, {(V_i, Z_i)}^(′)_(s,t]) / p(y_t | y_{0:s}, {(V_i, Z_i)}^(j)_(s,t]) );    (26)

otherwise the move is rejected and the jump sequence remains unchanged. Such an MH algorithm is shown to be effective for producing jump samples from the posterior density in Section V.
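Because the proposal is the prior, the accept/reject step reduces to a likelihood ratio; in practice it is numerically safer to work with log-likelihoods, as in the sketch below (our own phrasing of the step, not code from the paper):

```python
import numpy as np

def mh_accept(loglik_prop, loglik_curr, rng):
    """Accept/reject for an independence MH move that proposes from the jump
    prior: the acceptance probability reduces to the likelihood ratio,
    computed here in log space."""
    log_alpha = min(0.0, loglik_prop - loglik_curr)
    return rng.uniform() < np.exp(log_alpha)
```

A proposal with higher marginal likelihood is always accepted; otherwise acceptance happens with probability equal to the likelihood ratio.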
The chain is run to convergence, and the samples of the chain and their associated filtering estimates N(x_t; μ^(j)_{t|t}, C^(j)_{t|t}) are combined into a Gaussian mixture model approximation,

p(x_t | y_{0:t}) ≈ (1/N) Σ_{j=1}^N N(x_t; μ^(j)_{t|t}, C^(j)_{t|t}),    (27)

where N is the total number of iterations of the chain. This approximation is then collapsed onto a single Gaussian, in a spirit similar to [27] for the PF, as p(x_t | y_{0:t}) ≈ N(x_t; μ_t, C_t) by moment matching,

μ_t = (1/N) Σ_j μ^(j)_{t|t},  C_t = (1/N) Σ_j ( C^(j)_{t|t} + μ^(j)_{t|t} (μ^(j)_{t|t})^T ) − μ_t μ_t^T.

Hence Eq. (27) forms an approximation of the filtering density in Eq. (18). Furthermore, the posterior density over the GIG jumps may also be estimated by storing the underlying samples {(V_i, Z_i)}^(j)_(s,t] from the MCMC algorithm. Note that the conditionally Gaussian formulation of the stochastic integral I(f_t) greatly simplifies the design of the filtering algorithm by enabling the use of standard Kalman filtering recursions, and also reduces the variance of our estimates compared to algorithms involving direct sampling of the GH process. Furthermore, the Gaussian approximation of the mixture filtering density at t′ > t in (27) allows the estimation procedure for (t, t′] to be independent of the previous jumps {(V_i, Z_i)}_(0,t], a simplification compared to more complex SMCMC and PF approaches to this problem. Note that our algorithm, involving the approximation of the filtering density by a single Gaussian, is distinct from the more familiar SMCMC approaches in the literature ([15], [16]), and future work will compare the performance of these algorithms with each other and with regular particle filters [14].
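The moment-matching collapse of an equally weighted Gaussian mixture can be sketched as follows (the function name is ours; the covariance formula is the law of total covariance, equivalent to the moment-matching expression above):

```python
import numpy as np

def collapse_mixture(mus, covs):
    """Moment-match an equally weighted Gaussian mixture {N(mu_j, C_j)}
    onto a single Gaussian N(mu, C)."""
    mus = np.asarray(mus)    # component means, shape (N, D)
    covs = np.asarray(covs)  # component covariances, shape (N, D, D)
    mu = mus.mean(axis=0)
    # Law of total covariance: average covariance + covariance of the means.
    d = mus - mu
    C = covs.mean(axis=0) + np.einsum('ni,nj->ij', d, d) / mus.shape[0]
    return mu, C

mu, C = collapse_mixture([[-1.0], [1.0]], [[[1.0]], [[1.0]]])
```

For the two-component example above the collapsed variance is 2: the average within-component variance (1) plus the spread of the means (1).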

V. APPLICATIONS
In this section we demonstrate the applicability of the proposed inference algorithm for Lévy SSMs driven by a GH process. Firstly, we present results on a synthetically generated Langevin dynamics data set with known and fixed parameters. Here, we display the performance of the proposed sequential MCMC methodology of Section IV on the estimation of the state vector x(t). Furthermore, we apply our formulation of the Langevin dynamics to a British pound (GBP) to Japanese yen (JPY) foreign exchange rate data set sampled on 07/01/2013. For the synthetically generated data set, the parameters of the driving GH process are chosen as λ = −0.8, γ = 0.01 and δ = 1.0, which are defined in terms of the underlying GIG subordinator process. Furthermore, β = μ_W = 0, which implies that the process is symmetric, and the additional scale parameter σ_W in Eq. (11) is chosen to be 1.0. An independent GIG process, and its extension to a GH process, can be simulated using Algs. 3 and 4 for this parameter setting. The damping parameter θ governing the evolution of the velocity of a particle moving under Langevin dynamics is set to −0.5, and the observation noise variance σ_ε² is set to 0.1. The associated SDE is initialised as x(0) = 0 and the process is simulated on (0, 100], with 200 linearly spaced observations drawn according to Eq. (2) with H = [1 0].
The latent states x(t), ẋ(t) and the observed data y(t) are visualised in Fig. 1 together with the results of our sequential MCMC filter. Notice that all of the observations are shown in the first plot as blue dots, while the second plot, showing the velocity of the particle, is completely unobserved. Both plots include the sequentially computed latent state estimates, in addition to 3-standard-deviation ranges in grey showing the uncertainty in our estimates. At each time interval, 100 samples are generated from the jump proposal and the associated conditional densities are accepted with the probability given in (26).
The results suggest that our proposed algorithm is able to infer the true latent states accurately. Specifically, the jump causing a large change in velocity around t = 78 is successfully estimated, and the model is able to track the state without any lag. The effect of this jump on x(t) can also be seen as a rapid upward change in position in the first plot. Furthermore, due to the damping effect in Langevin dynamics, the velocity ẋ(t) gradually reverts to its zero-mean level. For the foreign exchange data set, the original data was downsampled to include every 500th irregularly arriving order price over a period of 113 minutes. The observed data used for the experiment includes 200 points in total. The results are shown in Fig. 2, where the x-axis is in seconds. The parameters of the driving GH process, as well as the Lévy SSM parameters, are tuned by hand using grid-based strategies that maximise the average log marginal likelihood of the data.
The inferred velocity of the rates (lower plot) includes rapid shifts, most notably around t = 4200, which suggests that a Brownian-driven model would be inappropriate for such a heavy-tailed process. The state in the top plot is able to follow the price accurately without losing track. Hence, the ability to sequentially estimate large deviations in the state will likely be of significant assistance in momentum-based trading strategies.

VI. CONCLUSIONS
In this work, we introduced a Lévy SSM driven by a GH process, which provides a flexible representation of continuous-time stochastic linear systems with non-Gaussian properties. We introduced a novel sequential inference algorithm for this model that may be readily generalised to Lévy SSMs driven by normal variance-mean processes. A specific formulation of Langevin dynamics is used to demonstrate the applicability of Lévy SSMs and our sequential MCMC algorithm in Section V. However, it is worth noting that the class of Lévy SSMs is able to represent a wider range of models, such as the standard object tracking models including the random walk, constant velocity and constant acceleration models. Together with the broad class of processes represented by the GH process, our work enables the modelling of a wide range of real-world phenomena with applications in a variety of fields.
Future research directions on these models include improved inference algorithms that incorporate proposal distributions for individual jumps, estimation of parameters by extending the state to include time-dependent unknown parameter values and non-parametric estimation of the underlying jump process.