Bayesian Inference for Stochastic Multipath Radio Channel Models

Stochastic radio channel models based on underlying point processes of multipath components (MPCs) have been studied intensively since the seminal papers of Turin and Saleh–Valenzuela (SV). Despite this, inference regarding parameters of these models has remained a major challenge. Current methods typically have a somewhat ad hoc flavor involving a multitude of steps requiring user specification of tuning parameters. In this article, we propose to instead adopt the principled framework of Bayesian inference to conduct inference for the SV model. The posterior distribution is not analytically tractable and we therefore compute approximations of the posterior using Markov chain Monte Carlo (MCMC) methods specific to point processes. To demonstrate the flexibility of our approach, we additionally propose a new multipath model and apply our inference method to it. The resulting inference methodology is computationally demanding and our successful implementation relies critically on our novel MPC updates within the MCMC sampler. We demonstrate the usefulness of our approach on simulated and real radio channel data.


Bayesian Inference for Stochastic Multipath Radio Channel Models Christian Hirsch , Ayush Bharti , Troels Pedersen , and Rasmus Waagepetersen
Abstract-Stochastic radio channel models based on underlying point processes of multipath components (MPCs) have been studied intensively since the seminal papers of Turin and Saleh-Valenzuela (SV). Despite this, inference regarding parameters of these models has remained a major challenge. Current methods typically have a somewhat ad hoc flavor involving a multitude of steps requiring user specification of tuning parameters. In this article, we propose to instead adopt the principled framework of Bayesian inference to conduct inference for the SV model. The posterior distribution is not analytically tractable and we therefore compute approximations of the posterior using Markov chain Monte Carlo (MCMC) methods specific to point processes. To demonstrate the flexibility of our approach, we additionally propose a new multipath model and apply our inference method to it. The resulting inference methodology is computationally demanding and our successful implementation relies critically on our novel MPC updates within the MCMC sampler. We demonstrate the usefulness of our approach on simulated and real radio channel data.

I. INTRODUCTION
S TOCHASTIC multipath models, such as the one proposed by Turin et al. [1] and Saleh and Valenzuela [2], are widely used for simulating radio channels due to their simple formulation and low computational cost. In such a model, the collection of delays and gains of the multipath components (MPCs) is assumed to be a realization of a marked point process [3], [4], with the received signal being a superposition of the MPCs and additive random noise. The parameters of this underlying point process constitute the parameters of the stochastic multipath model. By adjusting these parameters, the multipath model can mimic different propagation scenarios. Usually, suitable values of the model parameters are estimated from channel measurements collected from a particular Manuscript  environment. Despite the simple formulation of stochastic multipath models, this is a nontrivial task since the likelihood function of the multipath model is analytically intractable. The computational problems related to the likelihood arise because the MPCs are unobservable due to the finite measurement bandwidth and additive noise. Currently, parameters of stochastic multipath models are usually estimated using a multistep procedure as in [1], [2], [5], [6], [7], [8], and [9]. First, the MPCs are extracted from channel measurements using high-resolution algorithms such as multiple signal classification (MUSIC), RiMAX, and space alternating generalized expectation-maximization (SAGE), among others; see [10,Ch. 5] for an overview. In the case of a cluster-based model like the Saleh-Valenzuela (SV) model, the next step involves clustering the extracted MPCs. This is achieved either via manual clustering or through the use of automated clustering algorithms such as [11], [12], and [13]. As noted recently [14], [15], the multistep approach leads to a number of concerns. Despite being widespread, this approach is not trivial as it involves implementing sophisticated high-resolution and clustering algorithms. These algorithms involve certain heuristic choices and assumptions. The performance of these algorithms is sensitive to these choices, and their assumptions may conflict. A major drawback of this approach is that the performance of the overall estimator is difficult to assess. As a result, error bounds on the parameter estimates are hardly reported in the literature, and the uncertainty of the parameter settings used in, e.g., standardized channel models remains unknown.
To overcome these challenges, a host of estimation methods have been proposed recently that circumvent the need for extracting MPCs and clustering [14], [15], [16], [17], [18], [19], [20], [21]. In [16] and [18], a method of moments estimator is proposed for the SV model [2] and the Turin model [1], respectively. In [19], a multilayer perceptron is used to estimate parameters in the SV model. The methods in [14], [20], and [21] are based on approximate Bayesian computation (ABC) [22], which is a likelihood-free inference method that relies on the summary statistics of the channel measurements to approximate the likelihood function. It is possible to use automatically generated summary statistics, as shown in [21], although these did not perform as well as handcrafted summaries. In [15], a kernel-based distance metric is used in the ABC algorithm, which alleviates the need for summary statistics at the cost of choosing a kernel. Although some good choices for summaries and kernels have been proposed, these choices are nontrivial and affect the accuracy of the method. Moreover, the accuracy of the ABC methods depends on two approximations: the Monte Carlo approximation pertaining to all the sampling methods and the posterior approximation depending on the choice of summaries or kernel functions. The magnitude of the latter effect can be assessed in simulation studies, but is in general unknown. The ABC methods are easily applicable to a wide range of stochastic models since they do not depend on the particular mathematical structure of the model, which is clearly an appealing feature to the practitioners. However, a common feature of the methods in [14], [15], [16], [17], [18], [19], [20], and [21] is that the parameters of the model are estimated without obtaining the MPCs, e.g., the delays and gains, for the particular set of measurement data. While multipath extraction is in some cases not interesting, it does provide insight into the particular set of measurement data, which is useful for some applications.
In this article, we propose to carry out approximate likelihood-based inference for the model parameters. As in [23], we apply Markov chain Monte Carlo (MCMC) techniques to handle the computational problems arising from the analytically intractable likelihood function. In the context of the Turin model, [1], [23] suggested to obtain estimates by maximizing a Monte Carlo estimate of the likelihood function, integrating out the unobserved MPCs.
In the current article, in the context of the more complex SV model [2], we instead adopt a Bayesian approach and use MCMC to approximate the analytically intractable posterior distribution. Unlike [14], [15], [16], [17], [18], [19], [20], [21], our method is general in the sense that it is not tied to a specific multipath channel model. Moreover, the proposed method is not dependent on the choice of summaries and kernels as the ABC method of [15], thus alleviating the posterior approximation arising due to them. Recasting the SV model in the framework of Bayesian hierarchical modeling further enables us to circumvent intractable likelihood computations by augmenting the posterior distribution with the MPCs that comprise an intermediate level of the model. This means that unlike the ABC method, we are also able to infer number and locations of MPCs.
In Bayesian analysis, inference is notoriously difficult when mixtures of distributions are involved and the clustered structure of the SV model imposes similar challenges. A crucial technical contribution is therefore that we introduce novel MCMC so-called split-merge updates that significantly improve the efficiency of the Monte Carlo computations relative to the MCMC scheme in [23].
Our new methodology enables us to assess and criticize the existing models. In this article, we demonstrate the shortcomings of the SV model and propose a new model that may be a promising candidate for providing a better fit to real indoor radio channel data.
The rest of the article is organized as follows. Section II presents the signal model of radio channel measurements, along with the description of the SV model. The proposed modified version of the SV model is presented in Section II-C. The proposed Bayesian inference method and the MCMC sampling procedure are described in Sections III and IV, respectively. In Section V, we conduct a simulation study to assess the performance of the proposed inference method. The method is applied to measured data in Section VI, and the concluding remarks are given in Section VII.

II. MODEL FOR RADIO CHANNEL DATA A. Signal Model
As in [23], we consider frequency-domain measurements of a single-input, single-output linear, time-invariant radio channel in the band [−B/2, B/2] obtained by a vector network analyzer. In each measurement run, the transfer function is sampled at K equispaced frequencies f k = (k − 1) f , k = 1, . . . , K , where f = B/(K − 1) is the frequency spacing between two measurement points, giving the period τ max = 1/ f of the time-domain signal. The measurement data are modeled as a random vector Y = (Y 1 , . . . , Y K ) with entries where H k is the transfer function sampled at the kth frequency f k , and N k denotes the measurement noise. The noise variables N 1 , . . . , N K are assumed to be independent and identically distributed circular symmetric zero-mean Gaussian each with variance σ 2 . We denote a realization of the measurement vector Y by y = (y 1 , . . . , y K ). Repeating the measurements M times yields the sequence of independent realizations y (1) , . . . , y (M) . Taking the discrete-frequency, continuous-time inverse Fourier transform of the measurement vector gives the time-domain measurement (with a misuse of notation) where i denotes the imaginary unit. The channel transfer function of a multipath model is of the form where Z m = {(τ, α τ )} τ ∈Z constitutes a marked point process on R + × C. The point process Z contains the propagation time delays τ . The mark associated with a delay τ is the complex-valued gain α τ . The support of the point process Z is the interval I = [τ 0 , τ max ], where τ 0 is the delay of the line-of-sight (LOS) path.
For the Turin model [1], Z is assumed to be an inhomogeneous Poisson process. That is, delays τ essentially appear independently of each other. This may not be realistic as clusters of delays may arise where each cluster is due to reflections from a particular environmental element. Saleh and Valenzuela [2] instead suggested a more elaborate model for Z , which according to [3] and [4] can be viewed as a clustered point process as detailed in the following.

B. SV Model
We let C = {t 0 , t 1 , . . .} denote the arrival times of clusters of MPCs. For the SV model, t 0 = τ 0 and the remaining arrival times {t 1 , t 2 , . . .} form a homogeneous Poisson process of intensity 1 λ 0 = exp(κ 0 ) on [τ 0 , ∞). To each t i , i ⩾ 0, a cluster X t i = {τ i,1 , τ i,2 , . . .} is associated, where conditional on C, X t i , i ⩾ 0, are independent and each X t i is a Poisson process on [t i , ∞) of intensity λ 1 = exp(κ 1 ) > 0. Let X = ∪ i⩾0 X t i . Then Z = C ∪ X and Z m = C m ∪ X m where C m and X m are the marked point processes obtained by attaching gains α t i or α τ i,k as marks to each point in C or X . Following terminology in spatial statistics, we will often refer to points in C and X as "parents" and "offspring," respectively.
Conditional on the delays Z , the gains are assumed to be independent zero-mean complex Gaussian random variables with a phase that is uniform on [0, 2π ). Thus, the magnitudes of the gains are Rayleigh distributed. Following Saleh and Valenzuela [2], we assume that their second moments satisfy: for some parameters γ 0 , γ 1 , γ 2 > 0. As noted in [3], the cluster process X is an example of a Cox point process [24]. This is because conditional on C, X is a Poisson process with intensity function where k t (τ ) = 1{τ ⩾ t}. Here, the indicator function 1{A} is equal to 1 if event A holds. Hence, evaluation of the conditional distribution of X given C does not require information regarding the parent-offspring relationships. This simplifies greatly the construction in Section IV of an MCMC sampler for simulation of C and X given the radio channel data Y. In particular, we can modify the MCMC sampler from [23] to sample X given Y and C. However, with γ 1 ̸ = γ 2 , the formulation of the gain distribution in (4) presents an obstacle for exploiting the aforementioned simplification. Evidently, unless γ 1 = γ 2 , the evaluation of the right expression in (4) requires knowledge of the parent t i for the offspring τ i,k . In the interest of computational feasibility, we have chosen to restrict attention to the case γ 1 = γ 2 . In the data example in Section VI, we in fact obtain a very good fit assuming γ 1 = γ 2 indicating that this is not a severe restriction for the considered data.

C. Generalizations and Modifications of the SV Model
The physical interpretation of the SV model is not obvious and is still being discussed in the literature [25], [26]. In particular, the parent point process C is not necessarily Poisson and may be generalized in numerous ways. For example, C could be replaced by a regular point process to ensure a desired degree of separation between the parent times in C. In this article, we consider the case where C is the so-called Strauss process and coin the resulting model the Strauss-SV model.
For a specific parent configuration c, the Strauss point process density is of the form (see [24]) 1 The term "intensity" is widely used in the theory of spatial point processes. In propagation studies, the term "arrival rate" is often used instead.
where n(c) is the number of parents in c, s R (c) is the number of pairs of parents within distance R > 0 from each other, 0 ⩽ ψ ⩽ 1 are the parameters, and Z (κ 0 , ψ, R) is a normalizing constant. Choosing a small ψ means that there is a low probability of seeing parents close to each other. With ψ = 0, a so-called hard core process is obtained where parents can never be closer than R. A Poisson process of intensity exp(κ 0 ) is obtained with ψ = 1.
A complicating feature of the Strauss model is that the normalizing constant Z (κ 0 , ψ, R) is an analytically intractable function of κ 0 , ψ, and R. For any fixed (κ 0 , ψ, R), it can be evaluated using the MCMC methods; see [24]. However, this is computationally demanding, and in our Bayesian setting we would need to recompute Z (κ 0 , ψ, R) each time an update is proposed for κ 0 , ψ, or R. This presents a major computational problem; see also the discussion in Section VII. However, for any fixed (κ 0 , ψ, R), the simulations of the Strauss process can easily be obtained by a simple birth-death MCMC sampler [24] that does not require knowledge of Z (κ 0 , ψ, R). Hence, simulation of Strauss-SV does not present a problem.
Furthermore, in (5), the kernel function k t (·) can be replaced by an arbitrary positive function instead of the indicator function used for the SV model. For the SV model, X is an infinite point process. However, due to fast decay of the gains, gains are negligible when their delays are far from the LOS By similar considerations, we also restrict C to the finite interval [τ 0 , τ max ]. Then X becomes a finite point process on I w = [τ 0 , τ max + w] so that it can be stored on a computer and is amenable to MCMC computations. Moreover, for single-point MCMC updating of the parents C, using a k t of bounded support simplifies computations because removing or inserting a point τ in C only affects a finite number of MPCs in X .
According to our simulation studies, once the kernel width w is chosen sufficiently large (relative to the decay of the gains), the results obtained are representative for the original SV model.

III. BAYESIAN INFERENCE FOR THE SV MODEL
To calibrate the SV model parameters θ = (γ 0 , γ 1 , κ 0 , κ 1 , σ 2 ) to data, we use Bayesian inference assigning a prior density p(θ) to θ . Alternatively, one could use maximum likelihood estimation; see [23]. However, there are decisive benefits of Bayesian inference in the context of the SV model as follows.
1) Prior Information: The most obvious advantage of Bayesian inference is of course that it allows use of prior information e.g., obtained from physical considerations or previous experience with radio channel models in similar settings. Prior information is especially useful in the context of complex stochastic models such as the SV model.
including the parents c m and offspring x m in Z m is analytically tractable; see Section III-A. We can therefore explore the joint

A. Augmented Likelihood
In the following, we derive p( y, c m , The first factor on the right-hand side is the Gaussian probability density for the channel response data given the MPCs, the second factor is the conditional density of the offspring MPCs x m given c m and the parameters κ 0 and γ = (γ 0 , γ 1 ), and the third factor is the density of c m given κ 0 and γ .
More precisely, the complex Gaussian density is given by where we recall that σ 2 > 0 is the noise variance, and H k (z m ) is the complex-valued channel transfer function (3) evaluated at the frequency f k . Second, technically, the densities p(x m | c m , κ) and p(c m | γ ) are to be understood as densities of the offspring and parent MPCs with respect to unit-intensity Poisson processes on I [24, Sec. 6.1]. Since the parents C m form an independently marked homogeneous Poisson point process with intensity exp(κ 0 ), we deduce from [24,Sec. 3.3] that its density is given by where f (α τ | γ ) is the complex Gaussian density for the channel gains, #c is the cardinality of the set c, and |I | is the length of the interval I .
Concerning the density of the offspring MPC, we use that conditional on C m , X is a Poisson point process of intensity given in (5). Hence, the density of X m becomes with κ 1 entering through (·); see (5).

B. Strauss-SV Model
For the Strauss-Valenzuela model, complications arise from the intractable normalizing constant of the Strauss point process density. The problem of conducting Bayesian inference in cases where the prior or likelihood involves an intractable normalizing constant has been studied in several papers, e.g., [27], [28], [29], [30]; see also the review [31]. Unfortunately, the methods proposed tend to add considerable additional computational complexity to an already computationally challenging problem. However, recent results [32] in the context of mixture models suggest that this approach may be worth exploring in future work on channel data inference.
For simplicity, we do not attempt efficient sampling of the Strauss process at this stage. Instead, leaving such refinements to future work, we assume that the user is able to specify fixed sensible values of κ 0 , ψ, and R reflecting prior beliefs of the parent MPCs. For the remaining parameters, we proceed as for the SV model.

IV. MCMC SAMPLING OF THE POSTERIOR
The posterior p(θ , c m , x m | y) is a high-dimensional and complex density and only known up to a constant of proportionality. We therefore apply the MCMC methods to sample from the posterior. Briefly, an MCMC algorithm generates an ergodic Markov chain whose equilibrium distribution is the target distribution of interest-in our case the posterior density p(θ , c m , x m | y). After a suitable burn-in, the states of the Markov chain represent a correlated sample from the target distribution. We say that the Markov chain is mixing fast if the correlation between consecutive states is small. The Markov chain is constructed using the Metropolis-Hastings scheme where in each step of the Markov chain, updates are suggested for one or more components of the posterior. These updates are then accepted or rejected according to the Metropolis-Hastings ratio.
A complicating feature of our posterior is that the point process components c m and x m are not of fixed dimension. Therefore, specialized Metropolis-Hastings updates are needed, as outlined in [24] and implemented for radio channel data in [23]. More specifically, [23] used a birth-deathmove algorithm to sample the conditional distribution of MPCs given radio channel data. We refine this algorithm by adding split-merge steps and also extend the algorithm by adding updates for θ . Below we give a brief summary of the different types of updates used. We refer to [23] for a detailed description of MCMC for sampling MPCs, where several MCMC chains were run using a parallel tempering scheme to improve mixing. However, the introduction of split-merge steps improves mixing considerably so that the further computational task of using parallel tempering can be avoided.

A. Updates of MPCs
Given a current state (θ , c m , x m ), we propose a new state for the MPC components using a birth/death proposal, a move, or a split/merge proposal. These three proposals are selected each with probability 1/3. These probabilities were selected after a few pilot runs. We believe the performance of the MCMC algorithm is fairly robust to the choice of these probabilities as long as extreme values close to 0% or 100% are avoided. Henceforth, we describe in detail how to proceed for updating the MPC components x m . For the parent MPCs, we proceed similarly, except that we do not perform a split/merge step.
1) Birth/Death: In case of a birth/death step, with probability a half, we propose a birth, i.e., to add a new MPC to x m . The delay τ of the proposed new MPC is sampled on I from a density proportional to the intensity (5). Next the associated gain α τ is sampled from the symmetric Gaussian density f (α τ | γ ). Otherwise, we propose a death, where a current MPC is selected at random for removal.
2) Move: Relying exclusively on birth/death steps leads to a very slowly mixing MCMC sampler. To move from an MPC configuration to another which differs only by a minor modification of a single MPC, this MPC would have to be first removed and then reinserted slightly changed in a subsequent birth step. If the configuration between these two steps is highly unlikely a posteriori, the MCMC sampler will rarely explore this option. We therefore introduce a move step that simply proposes to update the delay or path gain of an MPC without changing the dimensionality.
3) Split/Merge: An MCMC sampler relying solely on birth/death and move steps may get stuck in configurations where two MPCs (τ, α τ ) and (τ ′ , α τ ′ ), with τ close to τ ′ , represent the contribution of a single underlying MPC in the true unobserved MPC configuration. For any k = 1, 2, . . . , If the right-hand side represents well the contribution of a true MPC, then removal of any of (τ, α τ ) or (τ ′ , α τ ′ ) will be highly unlikely. Hence, the MCMC algorithm will not be able to explore the configuration where a single MPC replaces the two current MPCs.
To deal with this issue, we introduce a split/merge step. The split-merge steps have previously been considered in the context of Bayesian inference for mixture models or cluster point processes [33], [34], [35] where groups of observations are split or merged according to the underlying mixture component memberships. Our split-merge step is quite different since it pertains splitting one MPC into two MPCs or merging two close MPCs into one. More precisely, we allow an MPC (τ, α τ ) to merge with its nearest neighbor (τ ′ , α τ ′ ) so as to form a new MPC with magnitude given by |α τ | + |α τ ′ | and located at (τ + τ ′ )/2. We found it most effective to sample the phase of the new component at random. Note that from a conceptual point of view, it would have also been possible to define a Metropolis-Hasting step by updating the magnitude as |α τ + α τ ′ |. However, in the implementation the chosen approach made it easier to treat the magnitude and the phase separately.
To ensure reversibility of the MCMC sampler, the merge step is complemented by a split step, where a single MPC can split into two, where the magnitudes of the resulting path gains add up to the magnitude of the path gain of the original MPC and the delays of the two new MPCs are generated in a neighborhood of the original MPC. In case of a split-merge proposal, a split step or a merge step is chosen at random each with probability of half. The details of the split-merge Metropolis-Hastings ratios are given in the Supplementary, along with a block diagram of the structure of the MPC MCMC updating scheme.

B. Remaining Details of the MCMC Sampler
We use simple random walk Metropolis updates for the components of θ . With the model specification (4), the parameters γ 0 and γ 1 tend to be strongly negatively correlated which hampers the mixing of the MCMC sampler. To mitigate this, we found it helpful to reparameterize Replacing the original parameter γ 0 bȳ where t m is the center of the observation interval. This reparameterization is akin to centering the covariate of a linear regression.
Each step of the MCMC sampler comprises an update of the offspring MPC (using birth-death, move, or split-merge). The Metropolis-Hastings steps updating the parent MPCs are expensive since they entail recomputing the likelihood of both the process of parent MPCs and the process of offspring MPCs. Therefore, we update c m only each 10th step in the sampler. The same goes for θ since it does not seem useful to update θ after each minor change in the point process state x m ∪ c m due to an MPC update. Hence, θ is only updated each 25th step (updating all the components of θ ).
The MCMC algorithm for the Strauss-SV model follows the specification above, except, following Section III-B, which we fix k 0 , ψ, and R at prescribed values.

C. Setting Prior Distributions
We now elaborate on the elicitation of priors. For each of the parameters κ 0 , κ 1 ,γ 0 , γ 1 , and σ 2 , we use a uniform prior on a bounded interval with endpoints informed by the measurement settings in [36]. The time window for the data is of length 75. We expect to see on average at least 0.5 and at most 80 parents for the given time window. Similarly, the range for arrival rate of offspring is set based on the time resolution in the data, i.e., we can only observe as many offspring points as the number of time points in the window. The ranges ofγ 0 and γ 1 are set to reflect reasonable behaviors of the decay of the gains. Forγ 0 , the prior range corresponds to a dynamic range of [−108, −32 dB] for the intercept of the power-delay spectrum, encompassing that of the measurement data shown in Fig. 1. The range of γ 1 corresponds to a drop of around 65 dB (data in Fig. 1 has a drop of around 40 dB) to a drop of 0 dB over the measurement interval. Finally, the range for σ is set by inspecting the noise level in the power-delay profile of the data.

V. SIMULATION STUDY
From a strictly Bayesian point of view, the posterior distribution obtained given the data at hand and an appropriately elicited prior is the (self-contained) solution to the statistical inference problem. Following this line of thought, frequentist properties of the inference procedure are irrelevant. Nevertheless, from a more pragmatic point of view, users may not be so sure about the priors chosen and often posterior means are used as computationally convenient point estimates of unknown parameters. We have therefore conducted a simulation study to explore in our setting the performance of MCMC approximate posterior means as point estimates.
To that end, we consider the SV model on the time interval I = [25, 100 ns]. We generate 100 mock datasets with on average ten cluster parents each giving rise to on average four offspring MPCs distributed according to a rectangular kernel with bandwidth 15. Furthermore, we choose the Rayleigh parametersγ 0 = −22, γ 1 = −0.1, and a thermal noise with log-variance ln(σ 2 ) = −20. We set M = 1 and the number of frequency points to K = 750 as per [36], and then perform Bayesian inference with 40 000 000 iterations of the MCMC sampler from Section III. To avoid the burn-in of the chain, we discard the first 20 000 000 iterations. Uniform prior distributions are set for each parameter as per Section IV-C, and the resulting prior ranges are shown in Table I.
The left panel of Fig. 2 shows the histograms for the distributions of the 100 approximate posterior means for each of the parameters κ 0 , κ 1 ,γ 0 , γ 1 . For the parameters κ 1 and γ 1 , the true parameter value is within two standard errors from the overall estimated posterior mean. Although the true values of κ 0 and γ 0 fall outside the intervals, they are still fairly close to the corresponding overall means. We hypothesize that the deviations for κ 0 andγ 0 are caused by the simulated datasets where there is little difference between labeling an MPC as parent or offspring, thereby making it harder to statistically identify the parameters. The general impression is that the bias of the posterior means is relatively small compared with the sampling variability of the posterior means. Moreover, the sampling distributions of the posterior means are unimodal with moderate deviations from normality.
As mentioned in Section III, it is natural to expect cross-correlations for different pairs of parameter estimates. To quantify these effects, Fig. 2 (right) shows pair plots of the posterior means for the 100 mock datasets. For instance, we can observe a substantial negative correlation between the intensity parameters κ 0 and κ 1 . This indicates that it is often difficult to discriminate between parents and offspring. On the other hand, upon reparametrization the posterior means of γ 0 and γ 1 are essentially uncorrelated. Since substantial correlations often translate to higher rejection rates in the MCMC sampler, this decorrelation helps accelerate the procedure. The Bayesian inference for one dataset takes roughly 10 h on an Intel Xeon 2.5 GHz with 1 GB allocated RAM running on a CentOS Linux distribution.
Studying further frequentist properties of our procedure, we considered coverage of 90% credibility intervals by considering the percentages of simulations where the true parameters fall outside the credibility intervals. For the parameters κ 0 , κ 1 ,γ 0 , γ 1 , and ln(σ 2 ), this happened in 17%, 36%, 25%, 11%, and 14% of the simulations, respectively, showing that the intervals in general have less coverage than the nominal 90%. Bayesian credibility intervals are in general not guaranteed to attain nominal frequentist coverage so these results are not indicative of a failure of our method. According to the Bernstein-von Mises theorem, nominal frequentist coverage may be obtained asymptotically by including more data in the inference. However, it is not sufficient to increase the number of observations by just decreasing the sampling frequency f since we still observe just one single signal realization. Instead, one should aim for increasing the number M of independent signal realizations. Alternatively, to obtain better finite sample frequentist coverage, one may explore more sophisticated (although challenging to implement) noninformative priors as discussed in [37].
We end this section by illustrating the effectiveness of the split/merge step. Considering first the Turin model, we generate 150 randomly scattered MPCs on the time interval I = [25, 100 ns]. We again take K = 750 together with Gaussian noise of variance σ 2 = exp(−25). Then, we run the MCMC sampler for 500 000 steps with a random initial MPC structure to sample the conditional distribution of the MPCs given the simulated radio channel data. Fig. 3 (top) shows MCMC samples for exploring the posterior distribution of the number of MPCs under the Turin model where the samples are generated with and without split-merge steps. It appears that the posterior mean of the number of MPCs is higher than the true number of MPCs for the particular simulated dataset considered. The important point, however, is that the convergence to the equilibrium distribution around the posterior mean is faster when split-merge steps are used.
For the SV model, the conclusion is similar. We keep the basic simulation setting from the Turin model, but now with MPCs organized in ten clusters each consisting of 15 offspring. MCMC samples of the number of MPCs under the SV model are shown in Fig. 3 (bottom). The improvement in terms of convergence speed when using split-merge steps is even more pronounced than for the Turin model.

VI. APPLICATION TO MEASUREMENT DATA
We now analyze real indoor channel data from [36]. The data consist of the channel response for K = 750 equally   spaced frequencies in the range from 2 to 4 GHz. To fit the model to data, we rely on Bayesian inference as elucidated in Section V. After transforming the data into the time domain, Fig. 1 illustrates that we may restrict the attention to MPCs located in the time interval I = [25, 100 ns]. Beyond that, the channel impulse response becomes negligible.
We fit three different models to the data, namely, the SV model, the Strauss-SV model, and the Turin model (i.e., the SV model with λ 1 = 0). In each of the cases, the fits are based on 8 × 10 7 iterations of the MCMC sampler with the priors from Section IV. For the SV and Strauss-SV model, we choose a kernel width of 15 ns. For the Strauss-SV model, we fix ψ := 0.5 as the interaction parameter and R := 15 ns as the interaction radius. That is, the interaction radius is of the same magnitude as the kernel width. We also fix the intensity parameter as exp(κ 0 ) := 2.5 whereby the Strauss process has on average 25 points in the sampling window I .

A. Posterior Results
The posterior distributions of the parameters obtained from the MCMC samples are illustrated in Figs. 4, 5, and Table II. The model parameters seem well-determined by the data since the posterior distributions are close to normal and concentrated relative to the prior ranges. It is remarkable that the posterior results for the decay parametersγ 0 and γ 1 as well as the log noise variance ln(σ 2 ) are very similar for each of the three models, indicating that these parameters have a physical interpretation not depending on the specific model choice. Moreover, the posterior distribution for κ 1 , the intensity of offspring, is clearly bounded away from the lower end of the prior support. This indicates that the SV model fits the data better than the Turin model which is the limiting case of SV when κ 1 tends to −∞.
Although considering the log intensities κ i , i = 0, 1 are convenient for computation, they are less convenient for model interpretation. We therefore also report the posterior means of the intensities λ i . For the Turin model, the posterior mean yields an MPC intensity of λ 0 = 2.31. For the SV model, we obtain a cluster center intensity of λ 0 = 0.201 and an intensity of λ 1 = 16.57 offspring per center. In particular, not distinguishing between centers and offspring, the posterior mean of the combined intensity λ 0 (1 + λ 1 ) is 3.68 which exceeds substantially the value in the Turin model. Finally, for the Strauss-SV model we arrive at λ 1 = 10.77. Fig. 6 shows the posterior intensities of the locations of cluster centers (blue) and total intensity (green), i.e., the centers and offspring combined. To highlight the connection to the data, we overlay the intensities with the power-delay profile. The intensity of all the MPCs is quite similar for the three models. The blue parts of the middle and right panels illustrate how including Strauss interactions in the model leads to a clear separation of the cluster centers in the posterior distribution. The pair plots in Fig. 7 illustrate

B. Model Assessment
To compare the fits of the three models, we use posterior predictive model checks [38], [39]. The checking method proceeds by first drawing a parameter sample from the posterior distribution, which is then used in the model to simulate synthetic data. Repeating this process a large number of times provides a set of data which can be compared with the measurement data in terms of summary statistics. Here, we compare in terms of the posterior 95% envelopes of the simulated summaries, given as the interval between a lower quantile (2.5%) and an upper (97.5%) quantile. If the data are well-represented by the model, the data should lie within the posterior envelope. Further details of the method of posterior predictive model checking can be found in [38] and [39]. The similarity between the Turin and SV models regarding the posterior predictive distributions for the channel data magnitudes suggests that the posterior distributions of MPCs are more determined by the observed data than the underlying models. However, this does not mean that any configuration of MPCs is equally likely under the two models. The fact that the posterior distribution of κ 1 under the SV model is clearly bounded away from small values is indicative that the posterior distribution of delays is concentrated on clustered configurations which have a higher likelihood for large κ 1 for the SV model than for the Turin model (κ 1 = −∞).
To further show that there is a substantial difference in terms of model fit between Turin and SV, we consider the power autocorrelation, i.e., the autocorrelation function of the squared magnitudes of the transfer function. The power autocorrelation function for the SV model was previously studied in [40]. As discussed in relation to Fig. 8, the posterior distribution of the MPCs is to a high degree controlled by the observed data. Hence, if both the model parameters and MPCs are drawn from their posterior distribution, it becomes hard to assess the implications of the different distributional (clustered or not) assumptions for the MPCs. We therefore use  the following variant of posterior predictive model checking. As before, we first draw the model parameters from their posterior distributions. Next, given the model parameters, both MPCs and the final channel data are generated from (7). For example, in case of the Turin model, the delays are generated from a homogeneous Poisson process. In this way, for each model, we generate 200 posterior predictive simulations of channel data and compute the autocorrelation function for each simulation. Fig. 9 illustrates the posterior predictive means and 95% envelopes for the power autocorrelation function. In particular, we observe that both the SV model and the Strauss-SV model lead to autocorrelations that are substantially higher than those in the Turin model. However, neither the SV model nor the  Strauss-SV model fits the data perfectly. Hence, using our new methodology we are able to disclose shortcomings of the widely used existing models. Such shortcomings seem to have been overlooked in the existing literature using the multistep approach. It is quite possible that the chosen parameters for the Strauss parameters are not optimal. Based on the Strauss parameters (ψ, β, R) = (0.75, 20, 0.15) obtained with trial and error, Fig. 10 illustrates that there is scope for improved fit of the Strauss-SV model by changing the values of the In case of the SV model, the total intensity is exp(κ 0 )(1 + exp(κ 1 )).
Strauss parameters, and we think it could be a topic for future work.
Finally, we analyze the rms-delay spread defined as where m j is the jth temporal moment of a channel impulse response measurement, y(t), and is computed as The temporal moments and thus the rms-delay spread are calculated per measurement realization and are considered as random variables. As shown in [41], the variance is related to the power autocorrelation. From the posterior predictive empirical cdfs (ecdfs) of the rms-delay spread in Fig. 11, it appears that the rms-delay spread for the SV model and the Strauss-SV model is much closer to the value found in the data than that of the Turin model. This confirms the conclusion from the power autocorrelation analysis. It should be noted, however, that since the Turin model is more parsimonious in terms of parameters, we expect it to be less flexible than the two other models. Nevertheless, all the three models overestimate the rms-delay spread found in the data. This suggests that there is still scope for achieving a better fit of the data by resorting to a more refined model.

C. Diagnostics for the MCMC Sampler and Posterior Multimodality
We now analyze the performance of the MCMC sampler in the context of the data example. In addition to considering estimated autocorrelations and trace plots for the MCMC samples, we also assessed the chosen burn-in using Geweke's diagnostic. This did not reveal problems in case of the Turin and Strauss-SV models, but a critical value of Geweke's diagnostic was obtained for κ 0 in case of the SV model; see the Supplementary for results. Fig. 12 shows comparison of the posterior results obtained for replicated MCMC runs. In case of the SV model, the posterior results for κ 0 and κ 1 differ over MCMC runs while the posterior results for the total MPC intensity exp(κ 0 )(1 + exp(κ 1 )) are very similar. This shows a problem of identifiability where the pattern of MPCs is explained well by several modes corresponding to either few large clusters (small κ 0 , large κ 1 ) or many small clusters (large κ 0 and small κ 1 ). The posterior results reported in this section cover the few large cluster mode.
The problem of identifiability is not uncommon when fitting a complex model to a dataset with insufficient information for resolving the nature of the underlying latent components. In our simulation study, we did not experience this problem. However, in contrast to the simulation study, our posterior predictive model checking indicates that the SV model is misspecified for the data considered. This may further amplify the problem of identifiability.
Discovering both the modes within one MCMC run with the current MCMC sampler requires excessively long MCMC runs and designing an MCMC sampler that could move quickly between the modes is a very difficult task. From a practical point of view, the best remedy of the identifiability problem might be to include stronger prior information on either κ 0 or κ 1 . Moreover, more research is needed to obtain improved models for channel data as considered in this section.

VII. CONCLUSION
We proposed a Bayesian inference method for the Turin and SV models implemented using the MCMC methods. The proposed method is a statistically sound inference technique as opposed to the currently used multistep approaches involving multipath extraction and clustering algorithms. Furthermore, the method does not rely on side information of hard-toestimate quantities such as the number of MPCs or the number of clusters. Being a Bayesian approach, not only point estimates but also the whole posterior distribution, and thus information on estimation accuracy, are provided as the output.
In addition to the Turin and SV models, we considered a new model, coined the Strauss-SV model, where it is possible to model regularity of cluster parent configurations. This can lead to more parsimonious configurations of cluster parents in cases where clusters of MPCs are believed to be well-separated in time.
Our Bayesian methodology is able to fit the considered models to both the simulated and measured data. To validate the fit models, we performed posterior predictive model checking. This check revealed that all the three considered models are able to fit the transfer function data, but have shortcomings in fitting the power autocorrelation in the frequency domain and the rms-delay spread.
For the Strauss-SV model, we fixed the Strauss parameters in an ad hoc manner. Preliminary experiments indicated that a better fit might be obtained if the additional flexibility of the Strauss-SV model is used by including the Strauss parameters in the Bayesian inference. A thorough investigation of the new Strauss SV model could be done in future work by adapting advanced inference methods for doubly intractable posterior distributions available in the statistics literature.
Our main methodological focus regarding the MCMC sampler was the updating of the MPCs where we in particular introduced a novel split-merge step. However, there is certainly also scope for replacing our rather naive random walk updates with more efficient updates of the model parameters using, e.g., normal approximations of the posterior [42], [43] or adaptive MCMC [44].