Shadow Magnetic Hamiltonian Monte Carlo

Incorporating partial momentum refreshment into Magnetic Hamiltonian Monte Carlo (MHMC) to create Magnetic Hamiltonian Monte Carlo with partial momentum refreshment (PMHMC) has been shown to improve the sampling performance of MHMC significantly. At the same time, sampling from an integrator-dependent shadow or modified target density has been utilised to boost the acceptance rates of Hamiltonian Monte Carlo (HMC), which leads to more efficient sampling as the shadow Hamiltonian is better conserved by the integrator than the true Hamiltonian. Sampling from the shadow Hamiltonian associated with the numerical integrator used in MHMC is yet to be explored in the literature. This work aims to address this gap in the literature by combining the benefits of the non-canonical Hamiltonian dynamics of MHMC with those achieved by targeting the modified Hamiltonian. We first determine the modified Hamiltonian associated with the MHMC integrator and use this to construct a novel method, which we refer to as Shadow Magnetic Hamiltonian Monte Carlo (SMHMC), that leads to better sampling behaviour than MHMC while leaving the target distribution invariant. The new SMHMC method is compared to MHMC and PMHMC across various target posterior distributions, including datasets modeled using Bayesian Neural Networks and Bayesian Logistic Regression models.


I. INTRODUCTION
Markov Chain Monte Carlo (MCMC) methods are a vital inference tool for probabilistic machine learning models [1,2,3,4,5,6]. MCMC algorithms are preferable to variational approaches [7,8] as they are guaranteed to converge to the correct target distribution given a sufficiently large number of samples [9,10]. These methods are premised on constructing a Markov chain of samples that, as the number of generated samples tends to infinity, converges to the desired equilibrium distribution. By construction, the samples generated by MCMC algorithms are auto-correlated, which means that they have higher variance than classical Monte Carlo techniques. This branch of inference techniques was initially developed by physicists, with famous examples being the Metropolis-Hastings algorithm [11] and the now popular and go-to Hamiltonian Monte Carlo (HMC) algorithm of Duane et al. [12]. These MCMC methods later entered the field of computational statistics, where they are used today to sample from various complex probabilistic models [13].
HMC has been enhanced over the last decade with some examples of the improved algorithms being Riemannian Hamiltonian Monte Carlo [2] which considers the local geometry of the target posterior to better explore the density, the No-U-Turn Sampler (NUTS) [3] which automatically tunes the trajectory length and step size parameters of HMC, Quantum-Inspired Hamiltonian Monte Carlo [14] which uses a random mass matrix as inspired by the behaviour of quantum particles, continuously-tempered Hamiltonian Monte Carlo [15,16] which is suited for sampling from multimodal distributions and also produces the Bayesian evidence metric which can be utilised for model comparison, Magnetic Hamiltonian Monte Carlo [4,17] which employs noncanonical Hamiltonian dynamics to better explore the target posterior, as well as methods that use shadow Hamiltonians, such as Separable Shadow Hamiltonian Hybrid Monte Carlo [18] and Shadow Manifold Hamiltonian Monte Carlo [5], to sample from high dimensional target densities while maintaining high sample acceptance rates [19,20,18,21,5,22,23].
Magnetic Hamiltonian Monte Carlo (MHMC) utilises noncanonical dynamics to skillfully probe the target posterior distribution [4,6,20]. MHMC introduces a magnetic field to HMC which leads to reduced auto-correlations and efficient convergence [24,4]. MHMC has been extended to manifolds by Brofos and Lederman [25] and shows good improvement over MHMC. Mongwe et al. [26] present a method for the automatic tuning of the step size and trajectory length parameters in MHMC, which shows improvement over MHMC. This method is based on the NUTS methodology of Hoffman and Gelman [3]. MHMC has the disadvantage that the magnetic component has to be specified by the user. In the existing literature on MHMC, there are no automated means of tuning the magnetic component [6,4,20].
It has previously been confirmed that the performance of HMC suffers from a deterioration in acceptance rates due to numerical integration errors as the system size increases [19,5]. As MHMC is an extension of HMC, and becomes HMC when the magnetic component is absent, one would expect that MHMC also suffers from this pathology of deteriorating acceptance rates as the system size increases. This deterioration in acceptance rates results in large autocorrelations between the generated samples, thus requiring large sample sizes. The decline in acceptance rates can be reduced by using more accurate higher-order integrators, by using smaller step sizes, or by employing shadow Hamiltonians [23]. The first two approaches tend to be more computationally expensive than the latter [5,23]. In this work, we explore the method of utilising shadow Hamiltonians.
Shadow Hamiltonian-based samplers have been successfully employed to manage the deterioration of sample acceptance rates as the system size and step size increase, and lead to more efficient sampling of the target posterior [22,18,19]. Shadow Hamiltonians are constructed by performing backward error analysis of the integrator and, as a result, are better preserved than the true Hamiltonian [27]. Numerous strategies have been proposed for sampling from the shadow Hamiltonians of diverse numerical integrators [22,5,18,19,28].
Mongwe et al. [29] introduce the Quantum-Inspired Magnetic Hamiltonian Monte Carlo algorithm. Their work explored the utility of employing a random mass matrix for the auxiliary momentum variable in MHMC, which is consistent with the behaviour of quantum particles. The results showed a significant improvement in sampling performance when compared to MHMC. Magnetic Hamiltonian Monte Carlo with partial momentum refreshment (PMHMC) was introduced by Mongwe et al. [17] and shows that retaining some of the chain's past dynamics can improve the sampling performance of MHMC. The disadvantage of these two approaches is the need to manually tune the volatility-of-volatility and momentum refreshment parameters that the authors introduce. Although these works addressed important gaps in the literature, they did not consider the potential benefits of sampling from the shadow Hamiltonian instead of the true Hamiltonian in MHMC. Mongwe et al. [28] explored the benefits of combining shadow Hamiltonians and partial momentum refreshment for the Separable Shadow Hamiltonian Monte Carlo method with good results. Sampling from the shadow Hamiltonian associated with the leapfrog-like integrator used in MHMC is yet to be explored in the literature.
In this work, we address this gap in the literature by deriving the fourth-order shadow Hamiltonian corresponding to the leapfrog-like integrator in MHMC. From this, we construct the novel Shadow Magnetic Hamiltonian Monte Carlo (SMHMC) algorithm, which leaves the target density invariant. We compare the performance of the proposed method to MHMC and the PMHMC algorithm in [17]. The empirical results on a ten-dimensional multivariate Gaussian distribution and real-world benchmark datasets modeled using Bayesian Logistic Regression and Bayesian Neural Networks show that SMHMC achieves higher effective sample sizes than the other MCMC algorithms considered. The proposed method does, however, consume more computational resources than the other MCMC methods due to the requirement to compute the shadow Hamiltonian, and thus performs poorly on a time-normalised basis. This is a crucial area of improvement of the proposed method, which we intend to address in the future.
Contributions: Our contributions in this work can be summarised as follows:
• A shadow Hamiltonian that is preserved up to fourth order by the numerical integrator used in MHMC is derived.
• We combine the benefits of non-canonical Hamiltonian dynamics with the properties of shadow Hamiltonians to form the Shadow Magnetic Hamiltonian Monte Carlo (SMHMC) algorithm, which is guaranteed to leave the target density invariant.
• Numerical experiments across diverse targets show that the new algorithm outperforms the other considered MCMC techniques on an effective sample size basis.
The remainder of this paper is structured as follows: Sections II and III discuss the material that forms the basis of the new method, Section IV introduces the new method, Section V outlines the experiments conducted and Section VI discusses the results of the experiments. We then provide the conclusion in Section VII.

II. MAGNETIC HAMILTONIAN MONTE CARLO
The Hamiltonian Monte Carlo (HMC) algorithm is composed of two steps: 1) the molecular dynamics step and 2) the Monte Carlo step. The molecular dynamics step involves integrating Hamiltonian dynamics, while the Monte Carlo step employs the Metropolis-Hastings (MH) algorithm to account for any errors introduced by the numerical integrator used in the molecular dynamics step [12,19,22,30,11,31,32]. Note that if we could exactly solve the molecular dynamics step, we would not need the Monte Carlo step.
HMC improves upon the MH [11] algorithm by utilising first-order gradient information of the unnormalised target posterior. This gradient information is used to guide HMC's exploration of the parameter space [12,32]. This necessitates that the target posterior is differentiable and has support almost everywhere on R^D, which is the case for the majority of machine learning models of interest. In HMC, the position vector w is augmented with an auxiliary momentum variable p, which is typically chosen to be independent of w. The Hamiltonian H(w, p), which represents the total energy of this system, is written as follows:

H(w, p) = U(w) + K(p), (1)

where U(w) is the potential energy, or the negative log-likelihood of the target posterior distribution, and K(p) is the kinetic energy defined by the kernel of a Gaussian with a covariance matrix M [1]:

K(p) = (1/2) p^T M^{-1} p. (2)

Within this framework, the evolution of the physical system is governed by Hamiltonian dynamics [33,1]. As the particle moves through time using Hamiltonian dynamics, the total energy is conserved over the entire trajectory of the particle, with kinetic energy being exchanged for potential energy and vice versa [33,1]. As an illustration, consider a person skating from position A to C as displayed in Figure 1. At position A, the person has only potential energy and no kinetic energy, and they have only kinetic energy at point B. At position C, they have both kinetic and potential energy. Throughout the movement from A to C, the total energy is conserved if the individual traverses the space using Hamiltonian dynamics. This allows the individual to traverse long distances. This energy conservation property of Hamiltonian dynamics is key to the efficiency of HMC in exploring the target posterior.
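As a concrete illustration of the decomposition in equations (1) and (2), the following is a minimal sketch (not the authors' implementation) for a standard Gaussian target with identity mass matrix; the specific target and M = I are assumptions made for illustration:

```python
import numpy as np

def U(w):
    """Potential energy: negative log-likelihood of a standard Gaussian target."""
    return 0.5 * w @ w

def K(p):
    """Kinetic energy: Gaussian kernel with covariance M = I."""
    return 0.5 * p @ p

def H(w, p):
    """Total energy of the augmented (w, p) system."""
    return U(w) + K(p)
```

For example, with w = (1, 0) and p = (0, 2), the total energy is H = 0.5 + 2.0 = 2.5.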
The equations governing the Hamiltonian dynamics are defined by Hamilton's equations in a fictitious time t as follows [31]:

dw/dt = ∂H/∂p = M^{-1} p,  dp/dt = -∂H/∂w = -∇_w U(w), (3)

which can also be re-expressed as:

d/dt (w, p)^T = [[0, I], [-I, 0]] ∇H(w, p). (4)

The Hamiltonian dynamics satisfy the following important properties, which make them ideal for efficiently generating distant proposals [34,30]:
1) Conservation of the Hamiltonian: the total energy H(w, p) is constant along the exact trajectory of the dynamics.
2) Reversibility: the dynamics can be moved forward in time by a certain amount and backwards in time by the same amount to get back to the original position. Mathematically: let Φ_{t,H}(w_0, p_0) be the unique solution at time t of equation (3) with initial condition (w_0, p_0). As the Hamiltonian in equation (1) is time-homogeneous, we have that Φ_{-t,H} ∘ Φ_{t,H} is the identity map.
3) Volume preservation: this property serves to simplify the MH step in HMC so that it does not require a Jacobian term, as volume preservation means that the Jacobian term is equal to one [35,1]. There have also been extensions of HMC that do not preserve volume [33].
These three properties are significant in that conservation of energy allows one to determine if the approximated trajectory is diverging from the expected dynamics, reversibility of the Hamiltonian dynamics ensures reversibility of the sampler, and volume preservation simplifies the MH acceptance step [5,1].
The differential equations in equations (3) and (4) cannot be solved analytically in most instances. This necessitates the use of a numerical integration scheme. As the Hamiltonian in equation (1) is separable, we can employ the leapfrog integrator to traverse the space [12,31]. The position and momentum update equations for the leapfrog integration scheme are:

p_{t+ε/2} = p_t - (ε/2) ∇_w U(w_t), (5)
w_{t+ε} = w_t + ε M^{-1} p_{t+ε/2}, (6)
p_{t+ε} = p_{t+ε/2} - (ε/2) ∇_w U(w_{t+ε}). (7)

Due to the discretisation errors arising from the numerical integration, the Monte Carlo step in HMC utilises the MH algorithm, in which the parameters w* proposed by the molecular dynamics step are accepted with probability:

α = min{1, exp(H(w, p) - H(w*, p*))}. (8)

Algorithm 1 shows the pseudo-code for HMC, where ε is the discretisation step size and L is the trajectory length. The overall HMC sampling process follows a Gibbs sampling scheme, where we fully sample the momentum (see line 3 in Algorithm 1) and then sample a new set of parameters given the drawn momentum.
VOLUME 4, 2021
Magnetic Hamiltonian Monte Carlo (MHMC) is a special case of non-canonical HMC using a symplectic structure corresponding to the motion of a particle in a magnetic field [4,6]. MHMC extends HMC by endowing it with a magnetic field, which results in non-canonical Hamiltonian dynamics [4]. This magnetic field offers a significant amount of flexibility over HMC and encourages more efficient exploration of the posterior, which results in faster convergence and lower autocorrelations in the generated samples [4,24,20]. MHMC uses the same Hamiltonian as HMC, but exploits non-canonical Hamiltonian dynamics in which the canonical matrix now has a non-zero lower-right diagonal block. The MHMC dynamics are given as:

d/dt (w, p)^T = [[0, I], [-I, G]] ∇H(w, p), (9)

where G is a skew-symmetric (or antisymmetric) matrix and is the term that represents the magnetic field. This also shows that MHMC differs from the HMC dynamics in equation (4) only in G being non-zero. When G = 0, MHMC and HMC have the same dynamics. Tripuraneni et al.
[4] prove that in three dimensions, these dynamics are the Newtonian mechanics of a charged particle in a magnetic field. How this magnetic field relates to the force field (e.g. whether they are orthogonal) determines the extent of the sampling efficiency gains of MHMC over HMC [24]. As with HMC, these non-canonical dynamics cannot be integrated exactly, and we resort to a numerical integration scheme with an MH acceptance step to ensure detailed balance. The update equations for the leapfrog-like integration scheme for MHMC, for the case where M = I, are given as [4]:

p_{t+ε/2} = p_t - (ε/2) ∇_w U(w_t),
w_{t+ε} = w_t + G^{-1}(exp(εG) - I) p_{t+ε/2}, (10)
p_{t+ε} = exp(εG) p_{t+ε/2} - (ε/2) ∇_w U(w_{t+ε}).

The above equations show that we can retrieve the update equations of traditional HMC by first performing a Taylor expansion of the matrix exponential and then substituting G = 0. The pseudo-code for the MHMC algorithm is shown in Algorithm 2. It is important to note that we need to flip the sign of G (see lines 8-15 in Algorithm 2), as we do the sign of p in HMC, so as to render the MHMC algorithm reversible. In this sense, we treat G as an auxiliary variable in the same fashion as p [4]. In this setup, p is Gaussian while G has a binary distribution [4], taking on only the values ±G_0, with G_0 specified by the user. Exploring more complex distributions for G is still an open area of research.
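The leapfrog-like update in equation (10) can be sketched as follows for the invertible-G case with M = I. This is a hedged, minimal illustration rather than the authors' implementation; the truncated-series matrix exponential and the quadratic test target are assumptions made to keep the sketch self-contained:

```python
import numpy as np

def mat_exp(A, terms=30):
    """Truncated power series for the matrix exponential.
    Adequate here because eps * G has very small norm."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def mhmc_leapfrog_step(w, p, grad_U, eps, G):
    """One leapfrog-like MHMC step (equation (10), M = I, invertible G):
    half momentum kick, exact drift under the magnetic field, half kick.
    As G -> 0, G^{-1}(e^{eps G} - I) -> eps * I, recovering plain leapfrog."""
    d = len(w)
    p = p - 0.5 * eps * grad_U(w)
    eG = mat_exp(eps * G)
    w = w + np.linalg.inv(G) @ (eG - np.eye(d)) @ p   # exact magnetic drift
    p = eG @ p                                        # momentum rotated by the field
    p = p - 0.5 * eps * grad_U(w)
    return w, p
```

A useful check of the sign-flip discussion above: stepping forward, then negating both p and G and stepping again, returns the chain to its starting point, which is exactly the reversibility that flipping G in Algorithm 2 provides.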
Although MHMC requires matrix exponentiation and inversion as shown in equation (10), this only needs to be computed once upfront and stored [4]. Following this approach results in computation time that is comparable to HMC, which becomes more important in models that have many parameters such as neural networks.
As G only needs to be antisymmetric, there is no guarantee that it will be invertible. In this case, we first need to diagonalise G and separate its invertible and singular components [4]. As G is strictly antisymmetric, we can express it as iH where H is a Hermitian matrix, and G can thus be diagonalised over the space of complex numbers C as [4]:

G = (W_Λ W_0) diag(Λ, 0) (W_Λ W_0)^†, (11)

where Λ is a diagonal submatrix consisting of the non-zero eigenvalues of G, and the columns of W_Λ and W_0 are the eigenvectors of G corresponding to its non-zero and zero eigenvalues, respectively. This leads to the following update for w in equation (10) [4]:

w_{t+ε} = w_t + W_Λ Λ^{-1}(exp(εΛ) - I) W_Λ^† p_{t+ε/2} + ε W_0 W_0^† p_{t+ε/2}. (12)

It is worth noting that when G = 0 in equation (12), the flow map reduces to the Euler translation of traditional HMC [4].

III. SHADOW HAMILTONIAN FOR MAGNETIC DYNAMICS
The leapfrog-like integrator for MHMC only preserves the Hamiltonian, that is, the total energy of the system, up to second order, O(ε²) [19,18]. This leads to a larger than expected value of δH in line 5 of Algorithm 2 for long trajectories, which results in more rejections in the MH step in line 8 of Algorithm 2. To increase the accuracy of the preservation of the total energy to higher orders, and consequently maintain high acceptance rates, one could: 1) decrease the step size and thus only consider short trajectories, 2) utilise numerical integration schemes which preserve the Hamiltonian to a higher order, or 3) a combination of 1) and 2). These approaches typically lead to a high computational burden, which is not ideal [5,23].
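The second-order behaviour described above is easy to verify numerically: halving the step size should reduce the energy error by roughly a factor of four. The following is a hedged sketch using a one-dimensional harmonic (standard Gaussian) potential, which is an assumption made purely for illustration:

```python
import numpy as np

def max_energy_error(eps, n_steps):
    """Integrate dw/dt = p, dp/dt = -w with leapfrog and record the largest
    deviation of H = 0.5*(w^2 + p^2) from its initial value."""
    w, p = 1.0, 0.0
    H0 = 0.5 * (w**2 + p**2)
    err = 0.0
    for _ in range(n_steps):
        p -= 0.5 * eps * w          # half momentum step
        w += eps * p                # full position step
        p -= 0.5 * eps * w          # half momentum step
        err = max(err, abs(0.5 * (w**2 + p**2) - H0))
    return err

coarse = max_energy_error(0.2, 100)
fine = max_energy_error(0.1, 200)   # same total time, half the step size
```

The ratio coarse/fine comes out close to four, consistent with O(ε²) conservation of the true Hamiltonian by the integrator.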
An alternative strategy is to perform backward error analysis of the leapfrog-like integration scheme in equation (10) [27,21] to derive a modified Hamiltonian whose energy is preserved to a higher order by the integration scheme than the true Hamiltonian [21]. This modified Hamiltonian is also referred to as the shadow Hamiltonian. We then sample from the shadow density and correct for the induced bias via importance sampling, as is done in [5,18,19,23], among others.
Shadow Hamiltonians are perturbations of the Hamiltonian that are by design exactly conserved by the numerical integrator [23,19,20,21]. In this manuscript, we focus on a fourth-order truncation of the shadow Hamiltonian under the leapfrog-like numerical integrator in equation (10). Since the MHMC numerical integrator is second-order accurate, O(ε²) [4], the fourth-order truncation is conserved with higher accuracy, O(ε⁴), by the integrator than the true Hamiltonian. In Theorem III.1, we derive the fourth-order shadow Hamiltonian under this numerical integrator.
Theorem III.1. Let H : R^d × R^d → R be a smooth Hamiltonian function. The fourth-order shadow Hamiltonian Ĥ : R^d × R^d → R corresponding to the numerical integrator used in MHMC is given by:

Ĥ(w, p) = H(w, p) + (ε²/12)(K_p^T U_ww K_p - A) - (ε²/24) U_w^T K_pp U_w, (13)

where U_w and U_ww denote the gradient and Hessian of U, K_p and K_pp the gradient and Hessian of K, and A = K_p^T G K_pp U_w is the factor induced by the presence of the magnetic field G.

Proof. As outlined in Tripuraneni et al. [4], the Hamiltonian vector field will generate the exact flow corresponding to exactly simulating the MHMC dynamics [4]. We obtain the shadow Hamiltonian via the separability of the true Hamiltonian [4]. The numerical integration scheme in equation (10) splits the Hamiltonian as H(w, p) = H_1(w) + H_2(p) + H_1(w), with H_1 = U/2 and H_2 = K, and exactly integrates each sub-Hamiltonian [4]. Applying the Baker-Campbell-Hausdorff [36] formula to this symmetric splitting, we obtain:

Ĥ = H + (ε²/12){K, {K, U}} - (ε²/24){U, {U, K}} + O(ε⁴), (14)

where the non-canonical Poisson brackets [37,6] are defined as:

{f, g} = ∇_w f^T ∇_p g - ∇_p f^T ∇_w g + ∇_p f^T G ∇_p g, (15)

and collapse to the canonical Poisson brackets when G = 0 [37,6]. The corresponding derivatives from the non-canonical Poisson brackets are presented in Appendix A. The shadow Hamiltonian for the leapfrog-like integrator is then:

Ĥ(w, p) = H(w, p) + (ε²/12)(K_p^T U_ww K_p - A) - (ε²/24) U_w^T K_pp U_w, (16)

where A = K_p^T G K_pp U_w is the factor induced by the presence of the magnetic field G. When G = 0, the shadow in (16) becomes the shadow for canonical Hamiltonian dynamics [19,18]. Note that the modified Hamiltonian in (16) is preserved to fourth order [5,18,23], and is thus more accurately conserved by the numerical integrator in MHMC than the true Hamiltonian [38].
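A direct implementation of the truncated shadow Hamiltonian in equation (16) can be sketched as follows, assuming M = I so that K_p = p and K_pp = I. This is an illustrative sketch under those assumptions, not the authors' code, and the quadratic target used in the test is likewise an assumption:

```python
import numpy as np

def shadow_hamiltonian(w, p, U, grad_U, hess_U, G, eps):
    """Fourth-order shadow Hamiltonian for the MHMC integrator with M = I:
    Hhat = H + eps^2/12 (p^T U_ww p - A) - eps^2/24 U_w^T U_w,
    with magnetic factor A = p^T G U_w (i.e. K_p^T G K_pp U_w)."""
    Uw = grad_U(w)                       # U_w
    Uww = hess_U(w)                      # U_ww, Hessian of the potential
    H = U(w) + 0.5 * p @ p               # true Hamiltonian
    A = p @ G @ Uw                       # magnetic correction factor
    return (H
            + eps**2 / 12.0 * (p @ Uww @ p - A)
            - eps**2 / 24.0 * (Uw @ Uw))
```

With G = 0 this reduces to the familiar fourth-order shadow of canonical leapfrog dynamics, as noted after equation (16).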

IV. SHADOW MAGNETIC HAMILTONIAN MONTE CARLO
We now present the Shadow Magnetic Hamiltonian Monte Carlo (SMHMC) algorithm, which combines the non-canonical Hamiltonian dynamics of MHMC with the high conservation property of shadow Hamiltonians. The benefits of employing non-canonical Hamiltonian dynamics in MHMC have already been established in [4,24,17], while the advantages of shadow Hamiltonians in general are presented in [18,5,21,20,22]. We combine these two concepts to create a new sampler that outperforms MHMC across various performance metrics. An analysis of the shadow Hamiltonian corresponding to MHMC in equation (13) shows that the conditional density for the momenta π_Ĥ(p|w) is not Gaussian. This suggests that if we fully re-sample the momenta from a normal distribution, as is done in [33], we will attain a sampler that does not satisfy detailed balance [5,22]. This necessitates computationally intensive momentum generation [19] or partial momentum refreshment [23,22] with an MH step. In this manuscript, we utilise the partial momentum refreshment procedure outlined in [22,5], in which a Gaussian noise vector u ∼ N(0, M) is drawn. The momentum proposal is then produced via the mapping:

R(p, u) = (ρp + √(1 - ρ²) u, √(1 - ρ²) p - ρu). (17)

The new parameter, which we refer to as the momentum refreshment parameter, ρ = ρ(w, p, u), takes values between zero and one and controls the extent of the momentum retention [17,5,23]. When ρ is equal to one, the momentum is never updated, and when ρ is equal to zero, the momentum is always fully refreshed [17]. The momentum proposals are then accepted according to the modified non-separable shadow Hamiltonian given as H̃(w, p, u) = Ĥ(w, p) + (1/2) u^T M^{-1} u. The updated momentum is then taken to be ρp + √(1 - ρ²) u with probability:

ω := min{1, exp(H̃(w, p, u) - H̃(w, R(p, u)))}. (18)

The incomplete refreshment of the momentum produces a chain which retains some of the dynamics between neighbouring samples [17,39,5,22,23]. In Section V-D, we assess the sensitivity of the sampling results to the user-specified value of ρ.
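The partial refreshment step above can be sketched as follows for M = I. The involutive rotation R and the Gaussian shadow used in the test are illustrative assumptions consistent with equations (17) and (18), not the authors' exact implementation:

```python
import numpy as np

def refresh_momentum(w, p, rho, shadow_H, rng):
    """Partial momentum refreshment with an MH correction on the extended
    shadow Hamiltonian Htilde(w, p, u) = Hhat(w, p) + 0.5 u^T u (M = I)."""
    u = rng.standard_normal(len(p))
    s = np.sqrt(1.0 - rho**2)
    p_new, u_new = rho * p + s * u, s * p - rho * u   # involutive rotation R
    ext_old = shadow_H(w, p) + 0.5 * u @ u
    ext_new = shadow_H(w, p_new) + 0.5 * u_new @ u_new
    if rng.uniform() < min(1.0, np.exp(ext_old - ext_new)):
        return p_new                                  # proposal accepted
    return p                                          # proposal rejected
```

Because R is an orthogonal involution, a Gaussian shadow leaves the extended energy unchanged and every refreshment proposal is accepted; the MH step only bites when the shadow correction terms make π_Ĥ(p|w) non-Gaussian.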
An algorithmic description of the SMHMC sampler is provided in Algorithm 3. It is worth noting from Algorithm 3 that the SMHMC sampler uses two reversible MH steps, which implies that the resulting Markov chain is no longer reversible [5,22,23]. By breaking the detailed balance condition, it is no longer immediately clear that the target density is stationary, and so this must be demonstrated [5,23].
Theorem IV.1. The SMHMC algorithm leaves the importance target distribution invariant.
Proof. The proof of Theorem IV.1 is given in Appendix A of the paper by Radivojevic and Akhmatskay [23]. The proof involves showing that the addition of step 4 in Algorithm 3 leaves the target invariant. The result follows from [23] by making use of the fact that the explicit form of the shadow Hamiltonian, which in our case has the additional magnetic component A = K_p^T G K_pp U_w in equation (16), is not required for the proof [23,5].

V. EXPERIMENTS
In this section, we outline the settings used for the experiments and the performance metrics, describe the approach to algorithm tuning, and present the sensitivity analysis for the partial momentum refreshment parameter ρ used in SMHMC and PMHMC.

A. EXPERIMENT SETTINGS
In our analysis, we compare the performance of SMHMC against MHMC and PMHMC across a multivariate Gaussian distribution described in [17] with D = 10, the Protein dataset [20] modeled using a Bayesian Neural Network (BNN), and the Heart [20] and Pima [20] datasets modeled using Bayesian Logistic Regression (BLR). The details of the real-world datasets are shown in Table 1. Note that the BNN architecture used is an MLP with one hidden layer and five hidden units. For all the target posteriors used in this paper, the momentum refreshment parameter ρ is set to 0.7. This setting worked well on all the targets. Further experiments on the sensitivity to ρ are presented in Section V-D.

B. PERFORMANCE METRICS
We now present the metrics used to measure the performance of the algorithms considered in this manuscript. The performance metrics used are the acceptance rate, the multivariate effective sample size (ESS), and the multivariate ESS normalised by the execution time. We also assess the convergence of the proposed SMHMC method using the potential scale reduction factor metric. The acceptance rate metric measures the proportion of generated samples that are accepted in the MH acceptance step of the algorithm [29,23]. The higher the number of accepted samples for the same step size, the more preferable the method. We discuss the remaining metrics in more detail in the following sections.

1) Effective sample size
The ESS metric is a commonly used metric for assessing the sampling efficiency of an MCMC algorithm. It indicates the number of effectively uncorrelated samples out of the total number of generated samples [23,29]. The larger the ESS, the better the performance of the MCMC method. The ESS normalised by execution time metric takes into account the computational resources required to generate the samples and penalises MCMC methods that require more computational resources to generate the same number of uncorrelated samples. The larger this metric, the better the efficiency of the algorithm. This paper employs the multivariate ESS metric developed by Vats et al. [40] instead of the minimum univariate ESS metric typically used in analysing MCMC results. The minimum univariate ESS measure is not able to capture the correlations between the different parameter dimensions, while the multivariate ESS metric can incorporate this information [20,40,2]. The minimum univariate ESS calculation results in the estimate of the ESS being dominated by the parameter dimensions that mix the slowest, ignoring all other dimensions [40,20]. The multivariate ESS is calculated as:

mESS = N (|Λ| / |Σ|)^{1/D}, (19)

where N is the number of generated samples, D is the number of parameters, |Λ| is the determinant of the sample covariance matrix and |Σ| is the determinant of the estimate of the Markov chain asymptotic covariance. When D = 1, mESS is equivalent to the univariate ESS measure [40]. Note that when there are no correlations in the chain, we have that |Λ| = |Σ| and mESS = N. We now address the ESS calculation for Markov chains that have been re-weighted via importance sampling, as is the case for the SMHMC algorithm proposed in this paper [23,5,20,21]. For N samples re-weighted by importance sampling with weights w_i, the common approach is to use the approximation by Kish [41,5] given by:

ESS_Kish = (Σ_i w_i)² / Σ_i w_i². (20)

This accounts for the possible non-uniformity in the importance sampling weights.
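Kish's approximation is straightforward to implement; the following is a minimal sketch:

```python
import numpy as np

def kish_ess(weights):
    """Kish's effective sample size for importance-sampling weights:
    (sum of weights)^2 / sum of squared weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()
```

Uniform weights give ESS_Kish = N, while a single dominant weight drives the estimate towards one, reflecting the loss of effective samples under a skewed importance distribution.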
In order to account for both the effects of sample auto-correlation and re-weighting via importance sampling, we approximate the ESS under importance sampling by taking direction from Heide et al. [5] and scaling the multivariate ESS by the relative Kish effective sample size:

ESS ≈ (mESS / N) × ESS_Kish. (21)

2) Convergence analysis
The R̂ diagnostic of Gelman and Rubin [42] is a favoured technique for verifying the convergence of MCMC chains [43]. This diagnostic depends on running numerous chains {X_{i0}, X_{i1}, ..., X_{i(N-1)}} for i ∈ {1, 2, 3, ..., m}, starting at diverse initial conditions, with m being the number of chains and N being the sample size. Using these parallel chains, two estimators of the variance can be constructed: the between-chain variance estimate B and the within-chain variance estimate W. When the chain has converged, the ratio of these two estimators should be one. The R̂ metric, which is formally known as the potential scale reduction factor, is defined as:

R̂ = sqrt(V̂ / W), (22)

where W = (1/m) Σ_i s_i² is the within-chain variance estimate, with s_i² the sample variance of chain i, B = (N/(m-1)) Σ_i (X̄_i· - X̄_··)² is the between-chain variance estimate, and V̂ = ((N-1)/N) W + B/N is the pooled variance estimate which incorporates the between-chain and within-chain variance estimates, with X̄_i· and X̄_·· being the i-th chain mean and overall mean respectively for i ∈ {1, 2, 3, ..., m}. Values larger than the convergence threshold of 1.05 for the R̂ metric indicate divergence of the chain [6,42]. In this paper, we assess the convergence of the chains by computing the maximum R̂ metric over the parameter dimensions for the given target.
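For a single parameter dimension, the potential scale reduction factor can be sketched as follows; the synthetic chains in the test are assumptions used only to exercise the diagnostic:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m chains of length N,
    one parameter dimension. chains: array of shape (m, N)."""
    x = np.asarray(chains, dtype=float)
    m, n = x.shape
    chain_means = x.mean(axis=1)
    grand_mean = chain_means.mean()
    B = n / (m - 1) * ((chain_means - grand_mean) ** 2).sum()  # between-chain
    W = x.var(axis=1, ddof=1).mean()                           # within-chain
    V = (n - 1) / n * W + B / n                                # pooled estimate
    return np.sqrt(V / W)
```

Chains drawn from the same distribution give R̂ ≈ 1, while chains stuck around different modes inflate the between-chain variance and push R̂ well above the 1.05 threshold.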

C. ALGORITHM PARAMETER TUNING
As mentioned in Section II, the matrix G in the MHMC method provides an extra degree of freedom which typically results in better sampling behaviour than HMC [20,4,25]. It is not immediately clear how this matrix should be set; this is still an open area of research [20,4,6]. In this paper, we take direction from the inventors of the method [4] and select only a few dimensions to be influenced by the magnetic field. In particular, G was set such that G_1i = g, G_i1 = -g and zero elsewhere, where g = 0.2 for the BLR datasets and g = 0.1 for all the other targets. These settings mean that the selection of G is not necessarily the optimal choice for all the target distributions considered, but was adequate for our objectives, as this basic setting still leads to satisfactory performance of the SMHMC algorithm that we present in this manuscript. Tuning G for each target posterior should result in improved performance compared to the results given in this manuscript. An alternative approach to the selection of G would have been to follow [6] and select G to be a random antisymmetric matrix. It is not immediately clear whether the approach of [6] is necessarily optimal, and we plan to explore it in future work. The step size is tuned to target an acceptance rate of 80% using the primal-dual averaging methodology of [3]. The trajectory lengths L used vary across the different targets, with the final step sizes and trajectory lengths used for the diverse problems presented in Table 2.
TABLE 2. Step size and trajectory length parameters used for the MCMC methods in this manuscript. Five thousand samples were used to tune the step size for the given trajectory length using primal-dual averaging. The target acceptance rate was set to 80%.
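The particular choice of G described above can be constructed as follows; this is a sketch, and the dimension and value of g in the test are illustrative:

```python
import numpy as np

def build_G(d, g):
    """Antisymmetric magnetic matrix with the first dimension coupled to all
    others: G[0, i] = g and G[i, 0] = -g for i > 0, zero elsewhere."""
    G = np.zeros((d, d))
    G[0, 1:] = g
    G[1:, 0] = -g
    return G
```

By construction G = -G^T, as required for the non-canonical dynamics in equation (9).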
Ten independent chains were run for each approach on each target distribution. Three thousand samples were generated for each target, with the first one-thousand samples discarded as burn-in. These settings were adequate for all the methods to converge on all the target posteriors. All the experiments in this manuscript were conducted on a machine with a 64bit CPU using PyTorch.

D. SENSITIVITY TO MOMENTUM REFRESHMENT PARAMETER
We investigate the effects of varying the momentum refreshment parameter ρ on the sampling performance of the proposed shadow Hamiltonian method. Ten chains of the PMHMC and SMHMC algorithms, starting from different positions, were run on the Australian credit dataset for ρ ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. Note that we exclude ρ = 1.0, as the chain would likely no longer be ergodic since the momentum would never be refreshed [5]. Figure 2 shows the results for the Australian credit dataset. The results show that PMHMC and SMHMC have stable acceptance rates across the different values of ρ on this dataset.

VI. RESULTS AND DISCUSSION
Figure 3 shows the diagnostic trace-plots of the negative log-likelihood across the various target posteriors. The results show that all three methods have converged on the four target densities analysed in this paper. The performance of the algorithms across the different metrics is shown in Figure 4 and Tables 3 to 6. In Figure 4, for each dataset, the plots on the first row show the effective sample size and the plots on the second row show the effective sample size normalised by execution time. The results are for the ten runs of each algorithm. The execution time t in Figure 4 and Tables 3 to 6 is in seconds. The results in Tables 3 to 6 are the mean results over the ten runs for each algorithm, and we use these mean values to form our conclusions about the performance of the algorithms. Tables 3 to 6 show that SMHMC produces the highest acceptance rate across all the targets. Furthermore, the SMHMC algorithm produces the largest ESS on all the targets. In particular, it outperforms PMHMC, which shows that the method is doing more than just incorporating partial momentum refreshment into MHMC. However, the source of the outperformance of SMHMC appears to be driven mostly by the incorporation of the partial momentum refreshment, highlighting the benefits of utilising partial momentum refreshment in Hamiltonian dynamics-based samplers in general.

The results also show that MHMC and PMHMC produce the lowest execution times across all the targets, with SMHMC having the largest execution time, sometimes as much as two times that of the MHMC method. The large execution time of SMHMC can be attributed to the multiple evaluations of the shadow Hamiltonian, as well as the extra MH step for the momenta generation. The slow execution time is the key drawback of SMHMC, and hinders the performance of the method on a time-normalised ESS basis. We find that PMHMC outperforms all the methods on a time-normalised ESS basis. SMHMC outperforms MHMC on the BLR datasets on a time-normalised ESS basis, with MHMC outperforming SMHMC on the other targets on the same basis. Furthermore, the R̂ metric shows that all the methods have converged, with PMHMC and SMHMC producing marginally better convergence behaviour compared to MHMC.

VII. CONCLUSION
We introduce the novel SMHMC algorithm, which combines the non-canonical dynamics of MHMC with the benefits of sampling from a shadow Hamiltonian. This combination results in improved exploration of the posterior when compared to MHMC. The empirical results show that the new algorithm provides a significant improvement on the MHMC algorithm in terms of higher acceptance rates and larger effective sample sizes, even as the dimensionality of the problem increases.
A primary limitation of the proposed algorithm is the computational time associated with the method, mainly since it involves the computation of the Hessian matrix of the target distribution. This leads to poor performance on a time-normalised ESS basis. A straightforward approach to circumvent the computational burden is to use closed-form expressions for the first-order derivatives and the Hessian matrix. This approach, however, restricts the possible targets that can be considered. We aim to address this issue in the future by using a surrogate model to approximate the shadow Hamiltonian during the burn-in period of the method as an active learning task.
Another limitation of the method is the need to tune the momentum refreshment parameter. Although typically higher values of the parameter improve the effective sample sizes, a more robust approach to selecting the parameter is still required. In future work, we plan on improving the proposed method by establishing an automated process to tune the momentum refreshment parameter.