Bayesian Detection of a Sinusoidal Signal with Randomly Varying Frequency

The problem of detecting a sinusoidal signal with randomly varying frequency has a long history. It is one of the core problems in signal processing, arising in many applications including, for example, underwater acoustic frequency line tracking, demodulation of FM radio communications, laser phase drift in optical communications and, recently, continuous gravitational wave astronomy. In this paper we describe a Markov chain Monte Carlo (MCMC) based procedure to compute a specific detection posterior density. We demonstrate via simulation that our approach achieves a detection rate up to 25 percent higher than Hidden Markov Model based solutions, which are generally considered to be the leading techniques for these problems.


I. INTRODUCTION
The problem of detecting a sinusoidal signal with randomly varying frequency, measured in additive noise, is encountered in numerous applications. Our interest derives from attempts to detect the presence of as yet undiscovered gravitational waves hypothesized to emanate from rotating astronomical objects such as neutron stars [1], [2], whose frequency wanders slowly and randomly [3]. Attempts to develop optimal and good sub-optimal solutions have occupied many signal processing researchers for at least 50 years, including more recent work in [4]-[7]. In essence, the problem can be categorized as detection of a non-Gaussian random process in Gaussian noise, and the forms of the optimal detector are well known [8], [9]. However, these require the conditional-mean estimate of the signal which, apart from a small number of cases, is extremely difficult to characterize and compute.
Many approximate solutions have been developed, including use of an extended Kalman filter (EKF) to track the random frequency followed by a coherent detector [10], [11]. This approach is known to be far from optimal because of the simple linearization used in the EKF. Another commonly used approximation is to assume a quadratic detector structure [12] and optimize a relevant performance cost, typically the deflection ratio. This approach leads to the use of the covariance of the random signal in the quadratic detector, but this is not optimal for a non-Gaussian random signal. One class of approximate methods relies on Hidden Markov Models (HMMs) and the Viterbi algorithm [13]-[15] to rapidly compute the maximum a posteriori (MAP) estimate using the short-time discrete Fourier transform (DFT). The detector is then formed by substituting in the MAP estimate. These methods rely on a Markov assumption for the wandering frequency dynamics between time blocks. An extension of this method enforces phase continuity between time blocks [16], resulting in a further improvement. Attempts to replace the short-time DFT with more sophisticated time-frequency analysis methods, such as the Wigner-Ville distribution, appear to offer no performance advantage over a standard Viterbi approach in terms of frequency tracking accuracy or detection performance [17].
In this paper we form the detection statistic by directly computing a specified posterior density using Markov chain Monte Carlo (MCMC) [18] methods. The essential idea behind MCMC is to construct a Markov chain whose invariant distribution is the desired posterior distribution. When the Markov chain converges to its equilibrium, the samples generated by the chain are essentially samples from the posterior distribution of interest [18]. To construct the detector, we introduce a time-invariant binary random variable $k$ that indicates whether a signal is present in the data or not. Thus, under the null hypothesis $H_0$ ($k = 0$), the signal of interest is absent and under the alternative hypothesis $H_1$ ($k = 1$), the signal is present, with unknown amplitude and wandering frequency. The detection statistic is then set to be the posterior distribution of $k$, given observations $y$, denoted by $\Pr(k|y)$. This posterior involves two distinct terms, $\Pr(k = 0|y)$ and $\int_{\Theta} \Pr(k = 1, \theta|y)\,d\theta$, with $\theta$ being a parameter in the space $\Theta$ of unknown amplitude, wandering frequency and phase of the signal under $k = 1$. We show that, in order to evaluate $\Pr(k|y)$, we have to estimate $\theta$ as well. A closely related idea is used in [19], where the signal is modelled as a superposition of several single frequency sinusoids; in that case the number of sinusoids as well as their corresponding (constant) frequencies are estimated. We differ from previous work in that we focus on detecting one wandering frequency line, modelled as a high dimensional unknown parameter vector. The generalization to multiple signals is straightforward, albeit at the price of increased computational complexity.
In this work we follow an important extension to the basic MCMC method, called reversible jump MCMC (RJMCMC) [20], which allows samples to jump between multiple spaces with different dimensions while maintaining overall equilibrium. We first derive the posterior distribution $\Pr(k|y)$ and proceed to build it with RJMCMC. We then introduce a new method for efficiently proposing a candidate frequency path while maintaining a reasonable acceptance ratio. We develop a parametrized model of frequency dynamics with varying parameter dimension, where we track the frequency at coarsely spaced time samples ("knots") while interpolating between knots with quadratic polynomials. The time between two adjacent knots is referred to here as a "block". The number of knots in this scheme is equal to the number of blocks. We show significant savings of computational resources with a reduced number of knots, sufficient to capture the dynamics of the underlying frequency. This characteristic is valuable since real life applications usually deal with very large amounts of data (e.g. observation data of gravitational waves typically involves a scalar amplitude channel sampled $\sim 10^{11}$ times over an observation period lasting one year [2]). We also illustrate how to choose the number of blocks for the HMM and MCMC respectively. Finally, we present numerical simulations that demonstrate the higher estimation accuracy and detection probability of MCMC, compared with HMM based methods.
The remainder of this paper is organized as follows. In Section II, a parameterized signal model is presented. In Section III, the HMM-based method is briefly explained. In Section IV, the posterior distribution for detection is formally derived. The complete RJMCMC procedure is developed and described in Section V, followed by novel methods for generating a new sample path and producing a proposal sample path for a single MCMC birth and update step, detailed in Algorithms 1-7. Numerical results are described in Section VI, where the detection performance of the algorithm is quantified by receiver operating characteristic (ROC) curves. The extra information provided by estimating frequency paths, obtained as part of the detection algorithm, is also presented, with the root mean square error (RMSE) recorded. The MCMC and the HMM methods are compared in terms of these performance measures.

II. PROBLEM STATEMENT
Without loss of generality, we assume the observed real signals with real additive noise are first converted into complex signals via the Hilbert transform. Throughout this paper, all derivations and simulations are based on complex data. Let $y = \{y(t_n)\}_{n=1,\dots,N}$ be the complex-valued data sequence, observed at $N$ equally spaced instants, $t_1 \le \dots \le t_N$, with $t_1 = 0$ and $T = t_n - t_{n-1}$ for all $n$. Let $k \in \{0,1\}$ be a statistic, constant during the whole observation period, taking values 0 or 1 and denoting whether the data is composed of pure noise ($k = 0$) or signal plus noise ($k = 1$). Observations $y$, which may have been generated either under hypothesis $H_0$ or hypothesis $H_1$, are given by
$$H_0:\quad y(t_n) = z(t_n), \qquad (1a)$$
$$H_1:\quad y(t_n) = a \exp\{j[2\pi\phi(t_n) + \psi_0]\} + z(t_n) \qquad (1b)$$
$$\phantom{H_1:\quad y(t_n)} = \tilde{a} \exp[j2\pi\phi(t_n)] + z(t_n), \qquad (1c)$$
for $n = 1, 2, \dots, N$, where $\psi_0$ in (1b) is the unknown initial phase and, for simplicity, we incorporate it into $\tilde{a}$ in (1c), where $\tilde{a} = a \exp(j\psi_0)$ indicates the unknown complex-valued random amplitude. This allows us to assume that the phase path starts from $\phi(t_1) = 0$. In our context, $\tilde{a}$ and $\psi_0$ are treated as nuisance parameters. The noise $z(t_n) \sim \mathcal{CN}(0, \sigma^2)$ is distributed as a complex Gaussian with variance $\sigma^2$. The unknown signal $y_c = \{y_c(t_n)\}_{n=1,\dots,N}$ is modelled by
$$y_c(t_n) = \tilde{a} \exp[j2\pi\phi(t_n)]. \qquad (2)$$
Assuming that the continuous time-varying frequency is a Wiener process with zero drift and diffusion constant $\gamma$, we have
$$f(t) = f(0) + \gamma B(t), \qquad (3)$$
where $B(t)$ is the standard Wiener process with $\mathrm{E}[B(s)B(t)] = \min(s, t)$. Then the discretized counterpart with sampling interval $T$ is
$$\phi(t_{n+1}) = \phi(t_n) + f(t_n) T + w_1(t_n), \qquad (4a)$$
$$f(t_{n+1}) = f(t_n) + w_2(t_n), \qquad (4b)$$
for $n = 1, 2, \dots, N-1$, with $w_1(t_n)$ and $w_2(t_n)$ representing the instantaneous phase and frequency noise, which are both zero-mean Gaussian random variables. Introduction of the state variable $x(t_n) = [\phi(t_n), f(t_n)]^T$ allows us to write (4) in matrix form as
$$x(t_{n+1}) = F x(t_n) + w(t_n), \qquad F = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}. \qquad (5)$$
The covariance matrix of $w(t_n) = [w_1(t_n)\ w_2(t_n)]^T$ is assumed to be time invariant and can be represented as [16]
$$C = \gamma^2 \begin{bmatrix} T^3/3 & T^2/2 \\ T^2/2 & T \end{bmatrix}. \qquad (6)$$
The derivation of (6) is given in Appendix A.
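As a concrete illustration, the state-space model (4)-(6) can be simulated as follows. This is a minimal sketch; all parameter values are illustrative and not those used in the paper's experiments.

```python
import numpy as np

# Minimal sketch of the discretized model (4)-(6): the state
# x(t_n) = [phi(t_n), f(t_n)]^T evolves as x(t_{n+1}) = F x(t_n) + w(t_n),
# with process noise covariance C = gamma^2 [[T^3/3, T^2/2], [T^2/2, T]],
# and the data are y(t_n) = a_tilde * exp(j 2 pi phi(t_n)) + z(t_n).
# All parameter values below are illustrative.
def simulate(N=1000, T=1.0, gamma=1e-4, f0=0.1, a_tilde=1.0, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    F = np.array([[1.0, T], [0.0, 1.0]])
    C = gamma**2 * np.array([[T**3 / 3, T**2 / 2], [T**2 / 2, T]])
    L = np.linalg.cholesky(C)                    # C = L L^T
    x = np.zeros((N, 2))
    x[0] = [0.0, f0]                             # phi(t_1) = 0 by convention
    for n in range(N - 1):
        x[n + 1] = F @ x[n] + L @ rng.standard_normal(2)
    # complex Gaussian noise with variance sigma^2
    z = sigma / np.sqrt(2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    y = a_tilde * np.exp(2j * np.pi * x[:, 0]) + z
    return y, x
```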
For the rest of the paper the state path is denoted by x = {x(t n )} n=1,...,N , the frequency path by f = {f (t n )} n=1,...,N , and the phase path by φ = {φ(t n )} n=1,...,N . Given observations y, our decision of either H 0 or H 1 is based on the posterior distribution Pr(k|y).

III. HIDDEN MARKOV MODEL (HMM)
Before deriving $\Pr(k|y)$, we give a brief review of the widely accepted HMM-based Viterbi algorithm. In this method, the hidden state variable is the frequency, discretized into frequency bins, while the observations are divided into time blocks. The state dynamics capture the frequency wandering between the blocks through the transition probability matrix. A good choice for the transition probability in this context is
$$\Pr(F_k \mid F_{k-1}) = \frac{1}{3} \quad \text{for } |F_k - F_{k-1}| \le \Delta f, \qquad (7)$$
and zero elsewhere. Here $F_k$ and $F_{k-1}$ denote the bin-discretized frequencies at neighbouring time blocks and $\Delta f$ is the frequency bin width. The emission probability matrix is constructed by computing the absolute value of the DFT for each time block. The method relies on the assumption that the frequency is contained in one frequency bin within each block and that jumps occur only between blocks. Hence, the size of the block is determined by the dynamics of the underlying wandering frequency, as explained in Section VI-D1. The hidden states are then estimated using the Viterbi algorithm and the detection statistic is determined by the "Viterbi score". A detailed analysis can be found in [16]. Unlike the HMM-based technique, where detection follows the estimation of the hidden frequency path, in our approach we form the detection statistic directly by computing $\Pr(k|y)$ using Bayes' formula. This analysis is done in the next section.
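A minimal sketch of this HMM tracker follows, assuming the simple stay-or-move-one-bin transition rule and |DFT| emissions described above; normalisation and the exact emission model of [16] are deliberately simplified for illustration.

```python
import numpy as np

# Sketch of the HMM/Viterbi frequency tracker: states are DFT frequency
# bins per block; the transition rule lets the frequency stay or move one
# bin between blocks (probability 1/3 each, per (7)).
def viterbi_track(y, n_blocks):
    N = len(y)
    M = N // n_blocks
    # log-emission: log |DFT| of each block (rows = blocks, cols = bins)
    logB = np.log(np.abs(np.fft.fft(y[:n_blocks * M].reshape(n_blocks, M),
                                    axis=1)) + 1e-12)
    n_bins = M
    logA = np.full((n_bins, n_bins), -np.inf)
    for i in range(n_bins):
        for j in (i - 1, i, i + 1):        # stay, or move one bin
            if 0 <= j < n_bins:
                logA[i, j] = np.log(1.0 / 3.0)
    delta = logB[0].copy()
    back = np.zeros((n_blocks, n_bins), dtype=int)
    for b in range(1, n_blocks):
        scores = delta[:, None] + logA     # scores[i, j]: from bin i to j
        back[b] = np.argmax(scores, axis=0)
        delta = scores[back[b], np.arange(n_bins)] + logB[b]
    path = np.zeros(n_blocks, dtype=int)
    path[-1] = int(np.argmax(delta))
    for b in range(n_blocks - 1, 0, -1):   # backtrack the best path
        path[b - 1] = back[b, path[b]]
    return path, float(np.max(delta))      # bin path and "Viterbi score"
```

For a pure tone sitting in one DFT bin, the recovered path stays in that bin for every block.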

IV. DERIVATION OF THE POSTERIOR DISTRIBUTION
In this section, we derive the expression for the posterior distribution $\Pr(k|y)$ that serves as the detection statistic. In order to evaluate $\Pr(k|y)$, the term $\Pr(\tilde{a}, x, k = 1|y)$ has to be computed, which also provides additional information about the parameters other than $k$. In other words, the two distinct objectives, estimation and detection, normally carried out sequentially (as in the HMM described in Section III), are integrated naturally into one single term through this posterior distribution.
A. Structure of the detection statistic Pr(k|y)

Using the law of total probability and Bayes' rule, we write
$$\Pr(k = 0|y) \propto \Pr(y|k = 0) \Pr(k = 0), \qquad (8a)$$
$$\Pr(k = 1|y) = \int_{\mathbb{C}} \int_{D} \Pr(\tilde{a}, x, k = 1|y)\, dx\, d\tilde{a}, \qquad (8b)$$
where $\mathbb{C}$ denotes the complex numbers and $D$ denotes the domain for $x$. Marginalizing out the parameters in (8b) is non-trivial since $x$ is high dimensional. Hence, we have to evaluate the integrand $\Pr(\tilde{a}, x, k = 1|y)$ by computing the posterior estimate for the parameters in the $H_1$ space.

B. Prior distributions
For later use, we specify the prior distributions for all parameters used in the algorithm. The prior for $k$ is assumed to be Bernoulli distributed with a tunable parameter $1 - \alpha$; that of $\tilde{a}$ under $k = 1$, i.e., when the signal exists, is chosen to be a complex Gaussian distribution with mean 0 and variance $\Delta$, i.e., $\tilde{a} \sim \mathcal{CN}(0, \Delta)$. Usually we set $\Delta$ to be a large number compared to $\sigma^2$ to reflect our initial uncertainty. As stated earlier, $\phi(t_1) = 0$, and $f(t_1)$ is chosen to be uniformly distributed on the frequency interval $(0, U)$, i.e., $f(t_1) \sim \mathcal{U}(0, U)$ with $0 < U \le 1/T$, where $U$ denotes the bandwidth of $y_c$ and $1/T$ is the sampling rate. The bandwidth is either known or, as here, assumed to be equal to the Nyquist frequency, so that $U = 1/T$, although this choice is not very critical. 1

C. Main result
The main result of this work is the following theorem.
Theorem IV.1. The posterior for $k$ is given in (9), where the superscript $H$ denotes conjugate transpose. As we show later, our algorithm does not require numerical computation of $W_0$, because it cancels out. The likelihood term $\Pr(y|\tilde{a}, x, k = 1)$ is rewritten accordingly, and the other factor is expanded as
$$\Pr(\tilde{a}, x, k = 1) = \Pr(\tilde{a}, x|k = 1) \Pr(k = 1),$$
where $\Pr(\tilde{a}, x|k = 1)$ denotes the prior distribution for $\tilde{a}$ and $x$ under the model $k = 1$. As for the prior distribution of $x$, given an initial state of the path $x(t_1)$, the statistical representation of the whole state path $x$ is determined by the model according to (5). Based on the above analysis and the prior distributions, we rewrite (15) accordingly. To assist MCMC sampling, we now expand the quadratic form in (19) and, after some algebraic manipulations, rewrite it as in (20). Notice that $\bar{a}$ is simply the least-squares solution for $\tilde{a}$ in (1c) for a given $D_f$. We now define the normal distribution of $\tilde{a}$ given a specific draw $D_f$, with mean $\bar{a}$ and variance $\sigma^2 q$; this distribution, (12), reflects our uncertainty of $\tilde{a}$ relative to $\bar{a}$ as a consequence of the observation noise. Combining the remaining terms of (19) and (20), we obtain (23); specifically, $\eta(x)$ can be interpreted as the signal-to-noise ratio (SNR) evaluated along a sampled state path $x$. Combining (19), (22) and (23), we obtain the formulae for $\Pr(k = 0|y)$ and $\Pr(k = 1|y)$ as given in (9b) and (9c).

1 In gravitational wave applications, U is usually much smaller than the sampling rate. A typical continuous wave search is conducted over sub-bands of ∼ 1 Hz, compared to the sampling frequency of ∼ 1 kHz, to facilitate handling the large volume of data involved. Continuous wave signals from neutron stars are expected to be quasi-monochromatic, with intrinsic frequency bin width ∼ 10^{-6} Hz [2].
Corollary IV.1.1. The posterior distribution for $k$ is approximated by the expression derived below.

Proof. To further evaluate the integral term defined in (9b), we introduce the decomposition in (26a). The first integral in (26a) is equal to unity. Evaluation of the second integral uses the Laplace approximation, based on the assumption that the integrand is strongly and singly peaked. The maximum in (26b) is easy to evaluate for constant-frequency signals through the Fourier transform $\mathcal{F}(y)$; however, to compute (26a) for a wandering path, we resort to the MCMC algorithm described in Section V.

V. MCMC ALGORITHM

A. Basic principle
Formally, the posterior probability of a parameter $\mu$, given data $D$, is given by
$$\Pr(\mu|D) = \frac{\Pr(D|\mu) \Pr(\mu)}{\int_S \Pr(D|\mu) \Pr(\mu)\, d\mu}, \qquad (28)$$
where $S$ is the domain of the parameters and $\Pr(\mu)$ and $\Pr(D|\mu)$ are the prior and the likelihood, respectively. Equation (28) is often hard to compute analytically because of the potentially high-dimensional integration appearing in the denominator. MCMC provides a way to compute (28) without evaluating the denominator, by constructing a Markov chain whose equilibrium distribution is (28). After the MCMC converges, samples drawn from the Markov chain can be treated as random samples drawn from the true posterior distribution. The maximum a posteriori (MAP) estimate or other statistical quantities can then be approximated using these ensemble samples.
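The principle can be illustrated with a minimal random-walk Metropolis sampler; the standard-normal target here is only a stand-in for the actual posterior in (28), and the step size is illustrative.

```python
import numpy as np

# Minimal random-walk Metropolis sketch: draw samples from a posterior
# known only up to the normalising constant in (28). The acceptance test
# min(1, ratio) is applied in log space.
def metropolis(log_post, mu0, n_iter=20000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    mu, lp = mu0, log_post(mu0)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = mu + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # accept w.p. min(1, ratio)
            mu, lp = prop, lp_prop
        samples[i] = mu
    return samples

# stand-in target: standard normal, log density up to a constant
samples = metropolis(lambda m: -0.5 * m**2, mu0=3.0)
```

After discarding an initial burn-in, the empirical mean and standard deviation of the chain approximate those of the target.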
Specific algorithms designed for our scenario are described in detail in the following sections.

B. Sampling rules
In this section, based on (9), we construct the specific MCMC algorithm for computing $\Pr(k|y)$. We use $(\cdot)^i$ to denote the value at the $i$th iteration and $(\cdot)'$ to denote the proposed value. Firstly, we specify a recipe for proposals to either switch between different hypotheses ("birth"/"death") or explore the parameter space ("update") under hypothesis $H_1$ (or $k = 1$). That is, we build a finite state machine (Fig. (1)) whose output returns the instruction for the next move. Specifically, we introduce a new random Boolean variable $s^i$. It evolves as a Markov chain with a given transition matrix $\Gamma$, as defined in Table (IV). The value of $s^i$ combined with the previous value $k^{i-1}$ gives the instruction to either jump between $H_0$ and $H_1$ or search within $H_1$. Algorithms 2-7 in Appendix B describe the implementation of the MCMC as well as "birth", "death" and "update" in more detail.

[Fig. 1. Diagram of the finite state machine used to generate MCMC proposals.]

From (9), the acceptance ratios for the "birth", "death" and "update" proposals at the $i$th iteration are, respectively,
$$A_{\text{birth}} = \min(1, r_{\text{birth}}), \qquad (30a)$$
$$A_{\text{death}} = \min(1, r_{\text{death}}), \qquad (30b)$$
$$A_{\text{update}} = \min(1, r_{\text{update}}), \qquad (30c)$$
where $W_{f'}$ and $W_{\tilde{a}'}$ are evaluated, according to (11), at the proposed (perhaps rejected) sample values $x'$ and $\tilde{a}'$ at the current iteration.
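The proposal state machine can be sketched as follows. This is illustrative only: the transition matrix Γ is a placeholder, not the values of Table (IV), and acceptance or rejection of the proposed move is not modelled.

```python
import numpy as np

# Sketch of the proposal finite state machine: a Boolean chain s_i with
# transition matrix Gamma decides, together with the previous k, whether
# the next move is a "birth" (H0 -> H1), "death" (H1 -> H0) or "update"
# (stay in H1 and perturb parameters).
def next_move(k_prev, s_prev, Gamma, rng):
    s = int(rng.random() < Gamma[s_prev, 1])   # advance the s-chain
    if k_prev == 0 and s == 1:
        return "birth", s
    if k_prev == 1 and s == 0:
        return "death", s
    if k_prev == 1 and s == 1:
        return "update", s
    return "stay_H0", s                        # k = 0 and s = 0

rng = np.random.default_rng(1)
Gamma = np.array([[0.5, 0.5], [0.1, 0.9]])     # rows: s_{i-1}; cols: s_i
move, s = next_move(0, 0, Gamma, rng)
```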

C. Knot-interpolation Scheme to Reduce Parameter Dimension
High dimensionality of the parameter space may cause convergence problems for MCMC [21]. To alleviate this problem, for both the HMM and MCMC approaches, the time series $y$ (of length $N$ and sampled at time intervals of length $T$) is partitioned into consecutive blocks of equal time duration $T_b$, though the chosen lengths of these blocks differ between the two approaches. The endpoints of these blocks, at intervals of time length $T_b$, are called the knots. The number of blocks is $N_b$, so that $N T = N_b T_b$. As described in Section III, the HMM-based method requires calculation of the DFT of each block to produce the emission probabilities, whereas in the MCMC approach a quadratic interpolation between the knots is used to approximate the time series and reduce dimensionality, as we only sample the $x$'s at the knots. The interpolation between the knots is performed according to (31), where $M = N/N_b$, $\ell = 0, \dots, M-1$ denotes the time epochs within the $m$th block, and $T_b = M T$ is the time duration of one block, as mentioned above. Notice that the resulting interpolated path $\{\hat{\phi}(t_n), \hat{f}(t_n)\}_{n=1,\dots,N}$ is of length $N$. The continuity of the interpolated path is ensured by solving (31) for $b_{m1}$ and $b_{m2}$ using the values at the knots. Denoted by the function "Interp", this procedure is described in Appendix B, Algorithm 1. The dynamics between the knots are identical to the dynamics in (5).

From now on we focus on generating the sequence of values, $x_{N_b}$, of the path at the knots, in the "birth" and "update" scenarios. For simplicity of notation, we denote the elements of $x_{N_b}$ by $x_{N_b}(j)$, $j = 1, \dots, N_b$. For the MCMC "birth" procedure, we generate a random path $x_{N_b}$ of length $N_b$ with individual elements given by (32). The noise term $w(j)$ is calculated in the same way as in (6), with $T$ replaced by $T_b$.
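The quadratic interpolation within one block can be sketched as follows. This is one plausible reading of the scheme in (31), assuming the coefficients are solved so that phase and frequency are continuous at the right-hand knot; the exact constraints used in Algorithm 1 may differ.

```python
import numpy as np

# Sketch of quadratic knot interpolation: within a block the phase is
# phi_hat(tau) = phi0 + b1*tau + b2*tau**2 and the frequency is its
# derivative, f_hat(tau) = b1 + 2*b2*tau. Coefficients b1, b2 are chosen
# so that phi_hat(Tb) = phi1 and f_hat(Tb) = f1 (continuity at the next
# knot) -- an assumption for illustration.
def interp_block(phi0, phi1, f1, Tb, M):
    b2 = (f1 * Tb - (phi1 - phi0)) / Tb**2
    b1 = f1 - 2 * b2 * Tb
    tau = np.arange(M) * (Tb / M)        # M epochs inside the block
    phi = phi0 + b1 * tau + b2 * tau**2
    f = b1 + 2 * b2 * tau
    return phi, f
```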
It is important in the MCMC algorithm to formulate a good update proposal, which specifies the probability of moving to a new point in parameter space, namely a stochastic path $x'_{N_b}$, given the previous location $x^{i-1}_{N_b}$. In this work, we developed a unique approach to this problem, described here. The desired new path $x'_{N_b}$ should possess the following properties:
(i) it should obey the state dynamics model in (32);
(ii) it should be "close" to the previous path, to avoid a large chance of rejection; this is especially critical when the samples are near the peak of the posterior probability density function;
(iii) the distance between paths should be controllable, to facilitate a flexible sampling scheme, for example to increase the convergence rate or to escape from local extrema.
Consequently, we want to control the Euclidean distance between the previous and the proposed paths. To achieve this, we expand (32) in the matrix form of (34),
with $w(j) \sim \mathcal{N}(0, C)$ and $F = \begin{bmatrix} 1 & T_b \\ 0 & 1 \end{bmatrix}$. To generate a new path, each $w(j)$ is replaced by a new random vector $Lq(j)$, where $LL^T = C$ is the Cholesky decomposition of $C$ and $q(j) \sim \mathcal{N}(0, I_2)$ is a bivariate normal vector with unit covariance matrix. Now (34) for the new path becomes (35), written in terms of the matrices $M_1$ and $M_2$; the idea is to keep $M_1$ and $M_2$ fixed and only perturb the random noise sequence $q$. Under requirement (i) above, the mean and variance of the perturbed noise sequence need to be retained. Specifically, the steps for perturbing the noise sequence at the $i$th iteration are as follows: given a previous path $x^{i-1}_{N_b}$, first extract the random part $q^{i-1}$, with $q(0) = [0, 0]^T$ and random vectors $q(j)$ with zero mean and unit covariance $\mathrm{cov}(q(j)) = I_2$ for $j = 1, \dots, N_b - 1$. The perturbation sequence $q'$ is independent of $q^{i-1}$, that is, $\mathrm{E}[q' (q^{i-1})^T] = 0$. We introduce a parameter $\beta$ and compute the new noise sequence as
$$q^i = q^{i-1} \cos\beta + q' \sin\beta.$$
This perturbation scheme ensures that the new noise sequence has the required mean and variance, because of the identity $\cos^2\beta + \sin^2\beta = 1$. It is also apparent that $\cos\beta$ is the correlation coefficient between each old $q^{i-1}(j)$ and new $q^i(j)$ for $j = 1, \dots, N_b - 1$; thus the tunable parameter $\beta$ helps control the "closeness" between the old and the new sequence, i.e., for smaller $\beta$ the correlation is greater, hence the perturbation is smaller.
This scheme has one problem, related to the lower-triangular shape of the matrix $M_2$. The noise in the new path $M_2 q^i$ tends to accumulate along the path, meaning that $\|\mathrm{cov}(q^i(n))\| > \|\mathrm{cov}(q^i(m))\|$ for $n > m$ as the iteration number $i$ increases. Our ad hoc solution to this problem is the following. Instead of setting $x'_{N_b}(1) = x^{i-1}_{N_b}(1)$, we start from a random position $l \in \{1, \dots, N_b\}$ and let $x'_{N_b}(l) = x^{i-1}_{N_b}(l)$; the sequence is then split into two, with one part propagating backward all the way to $x'_{N_b}(1)$ and the other part propagating forward until $x'_{N_b}(N_b)$. This is achieved by replacing the matrices $M_1$ and $M_2$ by $M'_1$ and $M'_2$, as given in (36). Notice that when $l = 1$ we recover $M'_1 = M_1$ and $M'_2 = M_2$. This still causes noise accumulation in $x'_{N_b}$ for elements away from $l$ in both directions, but the random choice of $l$ at each iteration mitigates the effect in the long run.
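The core of the update proposal, the β-rotation of the driving noise, can be sketched as follows; the split-matrix bookkeeping for the random start index l and the reconstruction of the path via M'_1 and M'_2 are omitted here.

```python
import numpy as np

# Sketch of the beta-perturbation: the driving noise is rotated as
# q_i = q_prev*cos(beta) + q_new*sin(beta), which preserves the N(0, I)
# distribution (cos^2 + sin^2 = 1) while keeping the proposed path
# correlated with the previous one (correlation cos(beta)).
def perturb_noise(q_prev, beta, rng):
    q_new = rng.standard_normal(q_prev.shape)  # independent of q_prev
    return q_prev * np.cos(beta) + q_new * np.sin(beta)

rng = np.random.default_rng(0)
q = rng.standard_normal((100000, 2))
q2 = perturb_noise(q, beta=0.1, rng=rng)
# empirical per-element correlation, close to cos(0.1)
corr = np.mean(q * q2) / (q.std() * q2.std())
```

Small β gives a proposal almost identical to the previous path; β near π/2 gives an almost independent draw.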
The correlation between the previous and proposed paths is
$$\mathrm{E}[q^i (q^{i-1})^T] = \cos\beta \, \mathrm{E}[q^{i-1} (q^{i-1})^T], \qquad (37)$$
since $q'$ is independent of $q^{i-1}$. Equation (37) indicates how the correlation of the previous and proposed paths can be tuned by $\beta$.
For completeness, the distance between neighbouring paths in the $L_2$ norm is bounded as in (38), where $\sigma_{M_2}$ is the largest singular value of $M_2$. The pseudocode of the method is provided in Appendix B, Algorithm 7.

VI. NUMERICAL RESULTS

B. Description of MCMC Parameters
Prior distributions of unknown parameters and specific values are given in Table (II) and Table (III) respectively. The number of blocks, N b , is chosen from the set {5, 20, 200, 500, 1000} to investigate how it affects the runtime and detection performance.
The parameters used in the implementation of the algorithms are given in Table (IV). The factor β = 0.1 is chosen experimentally to ensure a reasonable MCMC acceptance rate.

In Sections IV and V, we showed that, to compute the posterior Pr(k|y), we have to sample from Pr(k = 1, ã, x|y) as well as Pr(k = 0|y). Hence, as a part of the detection algorithm, we also approximate the MCMC-posterior for the state path x, which is achieved by simply collecting all of the sampled state paths under k = 1. The MCMC-MAP estimate is obtained by calculating the mode of these paths. Since the HMM algorithm also estimates x, it is of interest to compare the two algorithms in terms of the estimated paths.
The MCMC-posterior for k: Pr(k|y) is approximated by counting the number of occurrences of k = 1 and k = 0 respectively. A Neyman-Pearson type detector is constructed by comparing Pr(k|y) with a pre-defined threshold to determine detection.
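The counting estimator and the threshold test can be sketched as follows; the variable names and the burn-in length are illustrative.

```python
import numpy as np

# Sketch: approximate Pr(k = 1 | y) by the fraction of post-burn-in MCMC
# samples with k = 1, then compare it to a pre-defined threshold
# (Neyman-Pearson style detection).
def detect(k_samples, burn_in, threshold):
    ks = np.asarray(k_samples)[burn_in:]
    p_signal = ks.mean()              # estimate of Pr(k = 1 | y)
    return p_signal, bool(p_signal > threshold)

p, decision = detect([0] * 500 + [1] * 1500, burn_in=500, threshold=0.5)
```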
In the following sections, we compute the MCMC-posterior distributions for x and k, respectively. Related performance criteria like estimation error and ROC curves are also presented.

D. Rationale for choosing the number of blocks (knots)
In this section we discuss the reasoning behind the differences in the selection of the number of blocks for the HMM and MCMC methods.

1) For HMM
In the HMM, within one block we perform an $M$-point DFT, resulting in frequency bins of width $\Delta f = U/M$, where $U$ and $M$, as in the MCMC, denote the bandwidth of the signal and the block size, respectively. The number of blocks $N_b$, or equivalently $M$, is chosen such that (39) holds, where $T_b = T M$ is the time duration of one block and $\kappa$ is restricted to be a small number. With $f(t)$ undergoing the dynamics in (3), the integral in (39) evaluates to (40), where $B(t)$ denotes the Wiener process at time $t$. We set the frequency bin width to be twice the standard deviation in (40), i.e., $\Delta f = 2\gamma\sqrt{T_b}$. Combining this with the relation $\Delta f = U/M$ and setting $\gamma = 1 \times 10^{-4}$ (Table (I)), we finally choose $N_b = N/M = 5$ for the HMM in the following experiments.
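The block-size rule admits a short worked calculation: setting Δf = U/M equal to 2γ√(T M) gives M = (U/(2γ√T))^(2/3). The sketch below uses illustrative parameter values, not necessarily those of Table (I).

```python
import numpy as np

# Worked example of the HMM block-size rule: equate the DFT bin width
# U/M to twice the frequency's standard deviation over one block,
# 2*gamma*sqrt(Tb) with Tb = M*T, and solve for the block size M:
#     U/M = 2*gamma*sqrt(T*M)  =>  M = (U / (2*gamma*sqrt(T)))**(2/3)
def hmm_block_size(U, gamma, T):
    return (U / (2.0 * gamma * np.sqrt(T))) ** (2.0 / 3.0)

M = hmm_block_size(U=1.0, gamma=1e-4, T=1.0)   # samples per block
# the number of blocks is then N_b = N / M for a record of length N
```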
2) For MCMC

To implement our MCMC algorithm, $N_b$ needs to be chosen beforehand. The optimum $N_b$ could be computed by maximizing the likelihood ratio or the deflection ratio, as in [22]. In this section we describe an alternative, intuitive reasoning behind our choice of $N_b$ for the MCMC.
Consider the two matrices defined in (35): one is the full $2N \times 2N$ matrix $M_2$ and the other is the reduced $2N_b \times 2N_b$ matrix $M_2^b$. By design, $M_2^b$ is constructed from $M_2$ by keeping the rows and columns at the knots and removing the rest. The difference in "information" between these two matrices is captured by the difference between the information-theoretic Shannon entropies $H(N)$ and $H(N_b)$ of the singular values of $M_2$ and $M_2^b$, respectively. For any $M_2^b$ we write
$$H(N_b) = -\sum_{i=1}^{2N_b} p_i \log p_i, \qquad (41)$$
where
$$p_i = \frac{\sigma_i}{\sum_{j=1}^{2N_b} \sigma_j},$$
with $\sigma_i$ the singular values of $M_2^b$. The entropy $H(N)$ is computed analogously for the full matrix $M_2$. This entropy is effectively the same as the von Neumann entropy [23] in the context of symmetric matrices. Accordingly, for a chosen set of parameters, we compute the entropy for a given $N_b$ relative to $H(N)$ of the full matrix. The result is shown in the left-hand panel of Fig. (2).
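The entropy in (41), the Shannon entropy of the normalised singular values of a matrix, can be computed as follows (a minimal sketch):

```python
import numpy as np

# Shannon entropy of the normalised singular values of a matrix, as in
# (41): p_i = sigma_i / sum_j sigma_j, H = -sum_i p_i log p_i.
def singular_value_entropy(A):
    s = np.linalg.svd(A, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                       # 0 * log(0) treated as 0
    return float(-(p * np.log(p)).sum())
```

For example, a matrix with equal singular values attains the maximum entropy log(n), while a rank-1 matrix has entropy near zero.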
To strengthen the entropy claim, we also directly evaluate $\exp[\eta(x_{\mathrm{syn}})]$ ((13d)). This is the dominant term contributing to the Bayesian evidence ((9)-(11)). If the MCMC converges, there will be a high density of MCMC samples near the peak of the posterior, i.e., in regions with larger $\exp[\eta(x_{\mathrm{syn}})]$. By probing how $\exp[\eta(x_{\mathrm{syn}})]$ varies as $N_b$ changes, we obtain a better understanding of the effect of the choice of $N_b$ on the accuracy of the posterior distribution estimation. In particular, while decreasing $N_b$, we measure the difference introduced by using $\exp[\eta(\hat{x})]$ compared to $\exp[\eta(x_{\mathrm{syn}})]$, where $\hat{x}$ again denotes the interpolated path from $N_b$ knots extracted from $x_{\mathrm{syn}}$. We first calculate the true value using noisy synthetic data $y$ and $D_{\mathrm{syn}} = \exp(j2\pi\phi_{\mathrm{syn}})$, where $\phi_{\mathrm{syn}}$ stands for the synthetic phase path. This value provides an upper bound. Then we compute the same quantity with $\hat{D} = \exp(j2\pi\hat{\phi})$, where $\hat{\phi}$ is the phase path interpolated from the $N_b$ knots of $\phi_{\mathrm{syn}}$. The result is shown in the right-hand panel of Fig. (2). Observe that, for example, for $N_b = 20$ (red circle), almost 90% of the information is retained in the reduced $M_2^b$, with a difference of less than $10^{-3}$. We believe that computing the entropy in (41) provides an alternative way to select $N_b$; however, further investigation is required to justify this claim. In the following sections, we show in simulations that a choice of, for example, $N_b = 20$ maintains the MCMC performance in both estimation and detection, while saving computational resources significantly. This is reflected in Table (V), where MCMC runtimes averaged over $10^3$ experiments are reported for different $N_b$'s and different numbers of iterations, specifically $N_{\mathrm{iteration}} = 5 \times 10^3, 10^4, 5 \times 10^4, 10^5$. The runtimes were computed on a 2.4 GHz central processing unit (CPU).

E. Estimation performance
Throughout this and the next section, we fix N b = 5 for the HMM (explained in Section VI-D1), and vary N b for the MCMC.
1) MCMC-posterior for the state path Pr(x|y): In Fig. (3), a cross-section of the MCMC-posterior Pr(x|y) at time instant t_5, for SNR = 0.15 and N_b = 20, is shown. The performance at other epochs is similar.
Trace plots and histograms for f(t_5) and φ(t_5), respectively, are shown. By definition, trace plots show the sampled values of a parameter over the iterations. They reflect whether and how fast the MCMC converges in distribution. Starting from a random initial point, the MCMC converges after about 10^3 iterations. This so-called "burn-in" period is seen in the top and third panels of Fig. (3), compressed into the left edge of the plots. After the burn-in period, the samples drawn from the MCMC have values centered around the true value, with bias less than 0.0005 Hz (0.05 percent of the bandwidth) and 0.02 rad, and standard deviation less than 0.002 Hz and 0.5 rad, for f(t_5) and φ(t_5), respectively. This conclusion can also be drawn from the histograms in the second and fourth panels of Fig. (3), the shapes of which approximate the true posterior distributions Pr(f(t_5)|y) and Pr(φ(t_5)|y).

2) MCMC-MAP estimator
A typical realization of the MCMC-MAP estimates of the frequency path f*_MCMC for N_b = 5, 20, 200 and 1000, compared with the HMM-estimated frequency path, is displayed in Fig. (4). Here we can see that the dynamics of the wandering frequency are captured even with N_b = 5 knots.

2) Receiver operating characteristic
Receiver operating characteristic (ROC) curves for an omniscient detector, 3 the MCMC detector and the HMM detector are shown in Figs. (7), (8) and (9), computed over 10^5 simulation runs at SNR = 0.15, 0.1 and 0.2, respectively, for synthetic signals with frequencies wandering according to (5) and (6), with γ_syn = 10^{-4} Hz sec^{-1/2}.

3 The omniscient detector is based on the assumption that the true path x_syn is known. It provides an upper bound for the probability of detection.
The upper panels are for the MCMC algorithm with parameters γ = 10 −4 Hz sec −1/2 , and the lower panels are for γ = 10 −5 Hz sec −1/2 . The mismatch in γ and γ syn appears to cause degradation in the MCMC detector performance. This sensitivity to γ is an unwanted effect and requires further investigation.
At the relatively high SNR = 0.2, the plots show that the MCMC detector outperforms the HMM detector across the whole Pf range. For SNR = 0.1, the detection rate of the MCMC detector, although higher than that of the HMM detector, is quite low, i.e., around 0.17 at Pf = 10^{-2}. At SNR = 0.15, both the MCMC detector and the HMM detector perform better than at SNR = 0.1, with the MCMC outperforming the HMM. In particular, for a false alarm probability Pf = 10^{-2}, the detection probability Pd of the MCMC detector is around 0.8 with matched γ's, while dropping below 0.7 with mismatched γ's. In Fig. (7), at SNR = 0.15, the MCMC detector outperforms the HMM detector for Pf greater than 10^{-2} for all choices of N_b and γ. The HMM performs "detection after estimation", i.e., it calculates the most likely frequency path first and then compares the statistics of this path with the statistics of the noise, whereas detection is directly embedded in the design of the MCMC detector. As a result, the HMM's detection performance is heavily dependent on the accuracy of estimation, as opposed to the MCMC detector, where estimation becomes a consequence of detection. The degradation of performance at low SNR, known as the "threshold effect", is a common problem in nonlinear estimation. Even though we are not able to derive it mathematically, we infer from the plots that the threshold effect for the MCMC detector occurs between SNR = 0.15 and SNR = 0.1. Figs. (7), (8) and (9) also show that N_b has little effect on the overall detection performance of the MCMC detector: the red, yellow and purple curves overlap each other, especially when Pf < 10^{-1}.
In Fig. (10), we fix the false alarm probability Pf = 10^{-2} and plot Pd versus SNR, varying from 0.1 to 0.25, for the MCMC detector with N_b = 20 and for the HMM detector. Controlling the false alarm probability to be no more than 10^{-2} is typical in gravitational wave astrophysics applications [24]. As before, the upper and lower panels show the MCMC detector's performance without and with mismatch in γ, respectively. In both plots the MCMC detector has a higher detection probability than the HMM detector, even with γ mismatched. For example, at SNR = 0.15, the MCMC detector outperforms the HMM detector with a 25% higher detection probability.

VII. CONCLUSION
In this work a Bayesian posterior density for detecting sinusoidal signals with wandering frequency in noise is derived and computed. The method is based on MCMC techniques. As part of the algorithm, our method also computes the posterior density of the signal parameters. For efficient computation of this density we propose a knot-interpolation technique, where we sample the signal parameters at coarsely spaced time knots, while the rest of the signal is recovered by interpolation between the knots. A procedure for selecting a reasonable number N_b of knots, given the signal dynamics, is presented and justified. This procedure relies on the computation of the (von Neumann) entropy of the dynamics matrices. Although we cannot claim its optimality, we illustrate by experiments how the procedure balances runtime against detection and estimation accuracy.
In addition, we have developed an algorithm within MCMC for proposing new state paths that are arbitrarily close to the previous path. This method ensures dense selection of MCMC samples for highly structured multi-dimensional vectors. The full description of the algorithm is provided.
The performance of the MCMC is evaluated in terms of mean estimation errors and ROC curves and compared with the performance of the HMM-based Viterbi algorithm. We demonstrate that our algorithm presents both higher detection rates and greater estimation accuracy in all of the experiments conducted. In particular, the simulation results show that our method outperforms the HMM in estimation accuracy by around 5% and improves detection rate by up to 25%.