Maximum Likelihood Calibration of Stochastic Multipath Radio Channel Models

We propose Monte Carlo maximum likelihood estimation as a novel approach in the context of calibration and selection of stochastic channel models. First, considering a Turin channel model with inhomogeneous arrival rate as a prototypical example, we explain how the general statistical methodology is adapted and refined for the specific requirements and challenges of stochastic multipath channel models. Then, we illustrate the advantages and pitfalls of the method based on simulated data. Finally, we apply our calibration method to wideband signal data from indoor channels.


I. INTRODUCTION
Stochastic multipath models are indispensable for simulating and analyzing radio systems for communication and localization. In a stochastic multipath model, the received signal is modeled as a superposition of attenuated and delayed signal components, each corresponding to one propagation path [1]. Such a model can be described by a marked point process where a marked point represents a delay and its associated path gain. Provided that the model is calibrated, i.e., its parameters have been estimated from measurement data, realizations of the channel can then be simulated from the model and used in system design or performance analysis, thus alleviating the need for further measurements. Calibration of stochastic multipath models is a nontrivial task for several reasons. In particular, due to the finite measurement bandwidth and the presence of additive noise, the marked point process is not observed directly but should be considered as a hidden variable. Not least in the context of point processes, estimating parameters in models with hidden variables is often a highly involved endeavor [2]. The calibration approach most widely used in the literature is a two-step procedure dating back to Turin et al. [3], outlined in Fig. 1. First, the measurement data are reduced to a set of multipath components, i.e., delays and path gains. Then, the parameters of the underlying point process are estimated from the obtained multipath components. Although this data reduction step was employed chiefly due to technical limitations of the measurement equipment and data processing used by Turin at that time, many works have since adopted and expanded upon this calibration method [4]–[10]. The estimation of the multipath components from measurement data involves high-resolution multipath extraction methods, such as CLEAN [11], SAGE [12], and RiMAX [13].
Depending on whether the stochastic model is cluster-based or not, an additional step of clustering the multipath components may also be employed. Clustering is either done manually, e.g., in [4], [5], [14], and [15], leading to subjective and nonreproducible results, or using automated algorithms, such as [16]–[18], which further increase the complexity of the calibration process. Implementation of these multipath extraction and clustering algorithms is typically nontrivial and requires a number of arbitrary choices to be made. Moreover, various ad hoc methods are utilized for obtaining the model parameters after multipath extraction. Another potential weakness of such two-step procedures is that the resulting parameter estimates are highly sensitive to the estimation accuracy of the particular set of extracted multipath components. Calibration techniques that do not require multipath extraction but rely on summarizing the data into a set of statistics have been introduced recently in the literature [19]–[23]. However, these methods call for the definition of appropriate summary statistics that are informative regarding the model parameters. Moreover, the approximation arising due to summarizing the data may be difficult to quantify.
In this article, we propose to use the principled and recognized statistical methodology of maximum likelihood estimation (MLE) to calibrate stochastic channel models with inhomogeneous intensity function. Thus, our parameter estimates are the parameter values maximizing the probability density of the received signals given the transmitted signals. However, we face a missing data problem where it is not possible to evaluate the likelihood function by analytical marginalization with respect to the hidden quantities. We, therefore, use importance sampling to compute an approximation of the likelihood function using a large Markov chain Monte Carlo (MCMC) sample from the conditional distribution of the multipath components given the observed data [24]. Thus, in contrast to the previously mentioned methods, our method does not rely on the validity of just one particular set of multipath components. We believe that this will reduce bias and variance of the resulting parameter estimates. Moreover, approximated likelihoods for different models can be used in a natural way for model selection. Considering a parametric Turin model for simulated and real data, we demonstrate the feasibility of Markov chain Monte Carlo MLE (MCMC MLE) for model calibration. The MCMC method can also be adapted in a straightforward manner to sample from the posterior distribution of the parameters in case informative priors are available for the parameters.
The rest of this article is organized as follows. Section II introduces the inhomogeneous Turin model as the stochastic multipath model studied in this article. Next, in Section III, we describe our proposed procedure for approximate MLE using MCMC. We also explain what properties of the stochastic multipath model require us to develop problem-specific adaptations to standard MCMC and optimization methods. Section IV illustrates our calibration procedure using simulated data sets. While this already provides an intuition on the strengths and peculiarities of the calibration procedure, these become even more apparent in Section V, where we analyze a real data set of indoor channel measurements originally considered in [25]. Finally, Section VI concludes this article with a discussion and indications of avenues of future research.

A. Signal Model
Consider frequency-domain measurements of a single-input-single-output linear, time-invariant radio channel in the band [−B/2, B/2] obtained by a vector network analyzer [26]. In each measurement run, the transfer function is sampled at K equispaced frequencies. The measurement data are modeled as a random vector Y = (Y_1, . . . , Y_K) with entries

Y_k = H_k + N_k,   k = 1, . . . , K,

where H_k is the transfer function sampled at the kth frequency and N_k denotes the measurement noise. The noise samples (N_1, . . . , N_K) are assumed to be independent and identically distributed circular symmetric Gaussian random variables, each with variance σ². We denote a realization of the measurement vector Y by y = (y_1, . . . , y_K). Repeating the measurements M times yields the sequence of independent realizations y^(1), . . . , y^(M). Taking the discrete-frequency, continuous-time inverse Fourier transform of the measurement vector gives the time-domain measurement (with a misuse of notation)

Y(t) = (1/K) Σ_{k=1}^{K} Y_k exp(i2π f_k t),

where f_k denotes the kth sampled frequency and Δf = B/(K − 1) is the frequency spacing between two measurement points, giving the period of the time-domain signal as τ_max = 1/Δf. We denote the imaginary unit by i. The power-delay spectrum P_h(t) of Y(t) is defined with respect to the transmitted signal s(t) in the time domain. The power-delay spectrum P_h(t) may be informally interpreted as P_h(t) = E[|H(t)|²], with |H(t)|² being the instantaneous power-delay profile of the channel (see [27] or [28]).
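The signal model above can be sketched numerically. The following is a minimal illustration, with purely hypothetical parameter values (B, K, σ², and the single-path channel are not taken from the article): the frequency-domain measurement Y_k = H_k + N_k is synthesized and transformed to the time domain, where the instantaneous power-delay profile peaks near the true path delay.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurement setup (values illustrative only).
B = 1e9                      # bandwidth in Hz
K = 751                      # number of frequency points
df = B / (K - 1)             # frequency spacing Delta-f
sigma2 = 1e-9                # noise variance sigma^2

# Toy transfer function: a single path with delay tau and gain alpha.
tau, alpha = 50e-9, 1e-4
freqs = -B / 2 + np.arange(K) * df
H = alpha * np.exp(-2j * np.pi * freqs * tau)

# Measurement model: Y_k = H_k + N_k, circular symmetric Gaussian noise.
N = np.sqrt(sigma2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
Y = H + N

# Discrete-frequency, continuous-time inverse Fourier transform,
# evaluated on a delay grid; the period is tau_max = 1 / df.
tau_max = 1.0 / df
t = np.linspace(0.0, tau_max, 2000, endpoint=False)
Y_t = (Y[None, :] * np.exp(2j * np.pi * freqs[None, :] * t[:, None])).sum(axis=1) / K

# The instantaneous power-delay profile |Y(t)|^2 peaks near the true delay.
t_peak = t[np.argmax(np.abs(Y_t))]
print(abs(t_peak - tau) < 2e-9)
```

The peak location recovers the delay only up to the grid resolution and the Fourier resolution 1/B, which is precisely why the point process cannot be read off the data directly.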

B. Stochastic Multipath Model
The channel transfer function of a multipath model is of the form

H_k = Σ_{τ ∈ Z} α_τ exp(−i2π f_k τ),

where Z is a point process on the positive real line R_+ containing the propagation time delays τ. A complex-valued gain α_τ is associated with each delay τ ∈ Z. Thereby, the process Z_m = {(τ, α_τ)}_{τ ∈ Z} constitutes a marked point process on R_+ × C. Hence, we refer to a pair (τ, α_τ) as a marked point. The support of the point process Z is the interval I = [τ_0, τ_0 + |I|], where τ_0 is the delay of the line-of-sight (LOS) path and |I| denotes the length of the observation window. Particular stochastic multipath models are obtained upon specifying the marked point process Z_m. A multitude of such models have been proposed in the literature. Here, we follow the approach by Turin et al. [3], and let Z_m be an independently marked Poisson process. This model is completely specified by its intensity function (arrival rate) and its mark density.
The power-delay spectrum is connected to the arrival rate and mark density. Assuming the complex gains to be uncorrelated given the delay variables (corresponding to the familiar uncorrelated scattering assumption), the power-delay spectrum factorizes as [28]

P_h(t) = λ(t) E[|α_τ|² | τ = t],   (5)

where λ(·) denotes the intensity function (or arrival rate) for the delays Z. The power-delay spectrum is well studied as it is easy to measure and model. For in-room environments, the power-delay spectrum is well modeled by an exponentially decaying function

P_h(t) = G_0 exp(−t/T),   (6)

where T is the reverberation time and the gain factor G_0 is a positive constant (see [28], [29], and references therein). We first define the arrival rate and thereafter specify the mark density so that its second moment fulfills (5) and (6). While in his original work, Turin determined the arrival rate empirically in a nonparametric manner, a number of parametric models have since appeared in the literature [28].
We consider here the flexible two-parameter model for the arrival rate proposed in [28],

λ(t) = c t^{κ1},   (7)

with c > 0 and κ1 ∈ R. This model class includes both the constant rate model λ(t) = c, t ≥ 0, which is widely used in the literature due to its simplicity [4], [6], and the quadratic rate model λ(t) = ct² obtained by mirror source analysis of an empty rectangular room. The quadratic rate model is able to represent the experimentally observed specular-to-diffuse transition [28], [30]–[32]. In the present study, we use the model (7). For computational convenience, we reparameterize the model as

λ(t) = exp[κ0 + κ1 log(t)],   (8)

where κ0 = log(c). Given Z, the path gains α_τ for τ ∈ Z are modeled as independent zero-mean complex Gaussian random variables. Therefore, conditioned on the delays, the magnitude |α_τ| is Rayleigh distributed, and the corresponding phase is uniformly distributed on [0, 2π). To satisfy (5), (6), and (8), we set the second conditional moment of the magnitude as

E[|α_τ|² | τ = t] = exp[γ0 + γ1 t − κ1 log(t)],

where we have introduced the reparameterization γ0 = log(G_0) − κ0 and γ1 = −1/T. Note that G_0, T, κ1, κ0 can be recovered uniquely from γ0, γ1, κ0, κ1 and vice versa. In other words, conditional on τ, the real and imaginary parts of α_τ are independent zero-mean normal, each with variance exp[γ0 + γ1 t − κ1 log(t)]/2. The reparameterization using κ0, γ0, and γ1 is not of critical importance but leads to somewhat nicer expressions for derivatives when using Newton-Raphson updates later on (see Section III-C2).
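A realization of this inhomogeneous Turin model can be simulated by thinning a homogeneous Poisson process and then drawing the conditional Gaussian gains. The sketch below uses illustrative parameter values (not the article's estimates) and assumes the monotone rate attains its maximum at an endpoint of the window.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (not the paper's estimates).
kappa0, kappa1 = -0.75, 0.0        # arrival rate lambda(t) = exp(kappa0 + kappa1*log t)
gamma0, gamma1 = -20.0, -0.029     # conditional second-moment parameters
tau0, window = 50.0, 150.0         # LOS delay and window length (in ns)

def rate(t):
    return np.exp(kappa0 + kappa1 * np.log(t))

# Sample the Poisson delays on [tau0, tau0 + window] by thinning:
# propose from a homogeneous process at the maximal rate and keep each
# proposal with probability rate(t) / rate_max.
a, b = tau0, tau0 + window
rate_max = max(rate(a), rate(b))   # monotone rate: maximum at an endpoint
n_prop = rng.poisson(rate_max * (b - a))
props = rng.uniform(a, b, n_prop)
delays = props[rng.uniform(0, 1, n_prop) < rate(props) / rate_max]

# Given the delays, draw independent zero-mean complex Gaussian gains with
# E[|alpha|^2 | tau = t] = exp(gamma0 + gamma1*t - kappa1*log t).
second_moment = np.exp(gamma0 + gamma1 * delays - kappa1 * np.log(delays))
std = np.sqrt(second_moment / 2)
gains = std * (rng.standard_normal(delays.size) + 1j * rng.standard_normal(delays.size))

print(delays.size > 0, np.all(second_moment > 0))
```

The cancellation exp(κ0 + κ1 log t) · exp(γ0 + γ1 t − κ1 log t) = exp(κ0 + γ0 + γ1 t) makes the simulated power-delay spectrum exponentially decaying, matching (5) and (6).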

C. Estimation Problem and Likelihood Function
To calibrate the channel model, the parameter vector θ = [κ, γ, σ²] with the shorthand notations κ = (κ0, κ1) and γ = (γ0, γ1) should be estimated from the data y^(1), . . . , y^(M). Following the maximum likelihood principle, the estimate is obtained as

θ̂ = argmax_θ Π_{m=1}^{M} L(θ; y^(m)),   (9)

where L(θ; y^(m)) = p(y^(m); θ) is the likelihood based on one realization y^(m). Denote by Z_m^(m) the point process associated with the measurement vector Y^(m). Suppose for a moment that in addition to the measurement data y^(m), also the corresponding point process realization z_m^(m) is observed. Then, the likelihood function based on (y^(m), z_m^(m)) is

L(θ; y^(m), z_m^(m)) = p(y^(m) | z_m^(m); σ²) f(z_m^(m); κ, γ),   (10)

where p(y^(m) | z_m^(m); σ²) is the Gaussian density of the measurement vector given the marked points and f(z_m^(m); κ, γ) is the density of the marked point process. The notion of a point density is nonstandard as the number of points varies from realization to realization. Technically speaking, the density of the delays is a density with respect to a unit-intensity Poisson process distribution [2, Sec. 6.1]. Specifically, the point process density of Z_m^(m) can be written

f(z_m^(m); κ, γ) = [Π_{τ ∈ z^(m)} f(α_τ; γ)] [Π_{τ ∈ z^(m)} λ(τ)] exp(|I| − ∫_I λ(t) dt).

The first factor is the product of the complex Gaussian densities f(α_τ; γ) for the marks α_τ, and the product of the last two factors is the Poisson point density for the delays τ ∈ z^(m). In practice, z_m^(m) is not available and the likelihood is obtained by marginalizing with respect to Z_m^(m). More precisely, according to the law of total probability,

L(θ; y^(m)) = E_{Z_m^(m); θ}[p(y^(m) | Z_m^(m); σ²)].   (11)

The likelihood function (11) is unfortunately not available in closed form because it is an expectation of a conditional probability density that depends on Z_m^(m) in a complicated way and Z_m^(m), moreover, does not have a fixed dimension. Consequently, the MLE cannot be obtained in a straightforward manner.
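The full-data log-likelihood, i.e., the logarithm of (10), can be written out directly. The following sketch assembles its three terms (Gaussian observation density, Gaussian mark density, Poisson point density with respect to a unit-rate process); all numeric values in the toy check are hypothetical, and the rate integral is approximated numerically.

```python
import numpy as np

def full_data_loglik(y, freqs, delays, gains, kappa, gamma, sigma2, window):
    """Sketch of log L(theta; y, z_m) for the Turin model.

    Gaussian observation density times the marked-Poisson density
    (the latter taken w.r.t. a unit-rate Poisson process on the window)."""
    kappa0, kappa1 = kappa
    gamma0, gamma1 = gamma
    a, b = window

    # Observation term: y_k = H_k + n_k, H_k = sum_tau alpha exp(-i 2 pi f_k tau).
    H = (gains[None, :] * np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])).sum(axis=1)
    K = y.size
    log_obs = -K * np.log(np.pi * sigma2) - np.sum(np.abs(y - H) ** 2) / sigma2

    # Mark term: complex Gaussian with conditional second moment
    # exp(gamma0 + gamma1*t - kappa1*log t).
    s2 = np.exp(gamma0 + gamma1 * delays - kappa1 * np.log(delays))
    log_marks = np.sum(-np.log(np.pi * s2) - np.abs(gains) ** 2 / s2)

    # Point term: prod_tau lambda(tau) * exp(|I| - int_I lambda(t) dt),
    # with the integral approximated by a midpoint rule.
    dt = (b - a) / 10000
    t_grid = a + (np.arange(10000) + 0.5) * dt
    integral = np.exp(kappa0 + kappa1 * np.log(t_grid)).sum() * dt
    log_points = np.sum(kappa0 + kappa1 * np.log(delays)) + (b - a) - integral

    return log_obs + log_marks + log_points

# Toy check with two marked points (frequencies in GHz, delays in ns).
freqs = np.linspace(2.0, 3.0, 101)
delays = np.array([60.0, 75.0])
gains = np.array([1e-4 + 0j, 5e-5 - 5e-5j])
y = (gains[None, :] * np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])).sum(axis=1)
ll = full_data_loglik(y, freqs, delays, gains, (-0.75, 0.0), (-20.0, -0.029), 1e-9, (50.0, 200.0))
print(np.isfinite(ll))
```

Evaluating this function at two parameter values for the same configuration z_m is exactly what the importance weights of Section III require.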
The estimation problem is complicated due to the missing data; the maximization of the likelihood would be straightforward if only the point process Z m could be observed. Thus, it is tempting to resort to a two-step procedure by first estimating Z m using well-known high-resolution path extraction techniques and thereafter to estimate the model parameters. However, such two-step procedures are problematic. Commonly, such high-resolution estimators work under the assumption that the number of points in Z m is known. This number is particularly challenging to estimate when the arrival rate is high compared to the inverse of the measurement signal bandwidth. In light of the arrival rate model (7), this situation is very relevant in our study.

III. MCMC MLE
We propose an MCMC approach to obtain the maximum likelihood estimate of θ. Our approach is inspired by the method proposed in [2]. Thus, we rely on the observation that maximization of the likelihood function is equivalent to maximization of the likelihood ratio L(θ; y)/L(θ0; y) for a fixed reference parameter value θ0. This ratio can, as discussed in Section III-A, be evaluated using conditional samples of Z_m given y. Furthermore, these samples can be generated using an MCMC algorithm detailed in Section III-B. Finally, the approximated likelihood ratio is maximized with respect to θ, as discussed in Section III-C. We comment in Section III-D on approaches for model selection.
For ease of exposition, we focus in the following on approximation of the likelihood ratio in case of one realization, i.e., M = 1 and denote by y the observed measurement data. The derived methodology is straightforwardly extendable to the case M > 1 since, by (9), the likelihood ratio for multiple realizations is simply obtained by multiplying the likelihood ratios for each separate realization.
The proposed MCMC maximum likelihood approach bears some resemblance to Monte Carlo EM in that it uses samples from the conditional distribution of the missing data given the observed data. However, directly maximizing the Monte Carlo approximation of the likelihood is more efficient than using EM steps for maximization (see the discussion in [33]).

A. Monte Carlo Approximations of Likelihood
Using a result from [2, Sec. 8.6.1], the likelihood ratio can be expressed as

L(θ; y)/L(θ0; y) = E_{Z_m|y; θ0}[L(θ; y, Z_m)/L(θ0; y, Z_m)],   (12)

where E_{Z_m|y; θ0} denotes conditional expectation with respect to the hidden multipath components Z_m given the data y under the parameter θ0, and the full data likelihoods on the right-hand side are given by (10). The right-hand side of (12) is an importance sampling formula allowing us to use the conditional distribution of Z_m given y to integrate out Z_m from the full data likelihood ratio L(θ; y, Z_m)/L(θ0; y, Z_m). Thus, the right-hand side of (12) can be approximated by an empirical average based on samples Z_{m,1}, . . . , Z_{m,N} from the conditional distribution of Z_m given y,

L(θ; y)/L(θ0; y) ≈ (1/N) Σ_{n=1}^{N} W_n,   (13)

with the notation

W_n = L(θ; y, Z_{m,n})/L(θ0; y, Z_{m,n}).   (14)

A delicate issue of the estimator (13) is its Monte Carlo variance. If θ differs substantially from θ0, then only a small number of terms contribute significantly to the Monte Carlo estimator in (13), which in turn leads to a very high variance of the estimator. The magnitude of the degeneracy is quantified by the effective sample size [34]

ESS = (Σ_{n=1}^{N} W_n)² / Σ_{n=1}^{N} W_n².

The effective sample size equals N in the extreme case where W_1 = W_2 = ··· = W_N. In the other extreme where one term dominates, the effective sample size approaches unity. When applied to dependent samples, such as those obtained by MCMC, the effective sample size can be somewhat optimistic as it does not account for the correlation between samples. However, we still find it useful for gauging the quality of the Monte Carlo estimator. Alternatively, the variance of the Monte Carlo estimator could be estimated using time series methods [35]. In case of multiple measurements, M > 1, several MCMC samplers, one for each measurement vector, would be run in parallel. In that case, it would be natural to consider the minimal ESS over the M samplers.
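The estimator (13) and the effective sample size can be computed stably on the log scale. The sketch below uses stand-in log full-data likelihood values in place of actual MCMC output; in the real procedure these would be log L(θ; y, Z_{m,n}) and log L(θ0; y, Z_{m,n}) evaluated on the sampled configurations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in log full-data likelihoods for N MCMC samples under theta_0.
N = 1500
loglik_theta0 = rng.normal(-100.0, 3.0, N)                # log L(theta0; y, Z_{m,n})
loglik_theta = loglik_theta0 + rng.normal(0.5, 1.0, N)    # log L(theta;  y, Z_{m,n})

log_W = loglik_theta - loglik_theta0                      # log of the weights W_n

# Likelihood-ratio estimate (13), computed via log-sum-exp for stability.
log_ratio_hat = np.logaddexp.reduce(log_W) - np.log(N)

# Effective sample size: (sum W_n)^2 / sum W_n^2, again on the log scale.
ess = np.exp(2 * np.logaddexp.reduce(log_W) - np.logaddexp.reduce(2 * log_W))

print(np.isfinite(log_ratio_hat), 1.0 <= ess <= N)
```

By the Cauchy-Schwarz inequality, the ESS always lies between 1 and N, with values near 1 flagging the weight degeneracy discussed above.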

B. Birth/Death MCMC Sampling With Parallel Tempering

1) Birth/Death MCMC:
The challenging task of sampling from Z_m | y, θ can be tackled using specialized MCMC samplers for point processes [2]. Here, we rely on Algorithm 1, which is a variant of the birth-death MCMC algorithm from [2, Ch. 7]. A diagram of one iteration of the birth-death algorithm is shown in Fig. 2.
The MCMC updates of Algorithm 1 are births, deaths, and moves of marked points (τ, α_τ). We first elucidate the mechanisms behind the birth and death steps. A birth proposal attempts to add a marked point (τ, α_τ), where τ is drawn uniformly at random in the sampling window I and α_τ is drawn from the circular symmetric Gaussian distribution described in Section II. A death proposal attempts to remove a marked point selected from the uniform distribution on the current marked points. The proposals are accepted or rejected according to Metropolis-Hastings ratios appropriate for the setup of a varying number of points (see [2, Ch. 7]).
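To make the birth/death mechanism concrete, the following sketch runs unmarked birth-death updates with the standard Hastings ratios for a varying number of points. The function data_loglik is a hypothetical stand-in for log p(y | z; σ²), and all numeric values are illustrative; the real algorithm additionally carries the complex marks.

```python
import numpy as np

rng = np.random.default_rng(4)

kappa0, kappa1 = -0.75, 0.0        # rate lambda(t) = exp(kappa0 + kappa1*log t)
a, b = 50.0, 200.0                 # sampling window I

def data_loglik(z):
    # Hypothetical stand-in for log p(y | z; sigma^2); mildly penalizes
    # large configurations so the target is proper.
    return -0.05 * len(z)

def log_rate(t):
    return kappa0 + kappa1 * np.log(t)

def birth_death_step(z):
    z = list(z)
    if rng.uniform() < 0.5:                  # birth proposal
        t_new = rng.uniform(a, b)            # uniform location on I
        z_new = z + [t_new]
        # Hastings log-ratio: likelihood ratio + log[lambda(t)*|I| / (n+1)].
        log_mhr = (data_loglik(z_new) - data_loglik(z)
                   + log_rate(t_new) + np.log(b - a) - np.log(len(z_new)))
    else:                                    # death proposal
        if not z:
            return z
        idx = rng.integers(len(z))
        z_new = z[:idx] + z[idx + 1:]
        # Inverse move: log-ratio = likelihood ratio + log[n / (lambda(t)*|I|)].
        log_mhr = (data_loglik(z_new) - data_loglik(z)
                   - log_rate(z[idx]) - np.log(b - a) + np.log(len(z)))
    return z_new if np.log(rng.uniform()) < log_mhr else z

z = [60.0, 120.0]
for _ in range(2000):
    z = birth_death_step(z)
print(all(a <= t <= b for t in z))
```

With the stand-in likelihood, the chain targets a Poisson process with a slightly damped rate; substituting the true observation density recovers the sampler of Algorithm 1 up to the mark updates.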
The birth-death MCMC algorithm, unfortunately, suffers from the problem of slow mixing. The reason for this is that if a marked point (τ, α_τ) is born close to the true location of a delay, this may increase the likelihood of the data under the model substantially, even if the mark α_τ is not entirely correct.
Since such a point dies with small probability, it likely remains in the MCMC algorithm for a long time, thus leading to slow mixing of the Markov chain.
To improve the mixing, we introduce updates that only change the mark α τ for a uniformly selected point τ . For instance, if a large mark is changed to a smaller, this may increase the chance that a death of the associated marked point becomes accepted later on. In addition, if the originally proposed mark was too small, the mark change allows for correcting this by proposal of a larger mark.
To further improve the mixing, we use a parallel tempering scheme that combines several birth-death Markov chains. Parallel tempering is a versatile technique to reduce autocorrelation in slowly mixing Markov chain samplers by running in parallel several variants of the chain that mix substantially faster [24]. Occasionally, the faster chains swap states with the slower ones, thereby reducing the mixing time of the slower chains. Here, we can construct faster chains by increasing the noise level, whereby the conditional distribution of the point process given the data becomes more dispersed so that the chain does not get stuck as easily. The swaps between different chains are controlled by a Metropolis-Hastings criterion as follows. A chain in state z_m with parameters θ swaps states with a chain in state z_m′ with parameters θ′ with probability

min{1, [L(θ; y, z_m′) L(θ′; y, z_m)] / [L(θ; y, z_m) L(θ′; y, z_m′)]}.

In the swap phase, we order the parameters according to their noise level σ and then sequentially attempt a swap move for every pair of successive parameters; that is, if parallel tempering considers K noise levels σ_1 < ··· < σ_K, then we attempt swap moves between σ_i and σ_{i+1} for i < K. Large spans of noise variances and a large number of parallel chains generally reduce the mixing time at the cost of parallelization overhead. We found that working with only six temperature levels reduces the autocorrelation substantially while maintaining a reasonable complexity. The distances between the six noise levels are chosen such that we achieve the recommended acceptance rates between 20% and 50% [36].
2) Initialization: In principle, the MCMC sampler converges for any choice of initial configuration. The number of iterations required to reach the target equilibrium, called the burn-in, can be reduced by careful initialization. We proceed in two steps. First, we run the MCMC sampler for a number of steps starting in a random initial configuration. This generally leads to a configuration with too many points and a long burn-in would be needed to eliminate the excessive points. Therefore, we remove points that are too close together to obtain a better initial configuration. More precisely, we achieved good results by removing delays that are less than 1 ns apart.
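The pruning used in the initialization step amounts to a simple greedy pass over the sorted delays. A minimal sketch (the 1 ns threshold is the one stated above; delays in ns):

```python
import numpy as np

def prune_delays(delays, min_sep=1.0):
    """Keep a delay only if it is at least min_sep away from the last kept one."""
    kept = []
    for t in np.sort(delays):
        if not kept or t - kept[-1] >= min_sep:
            kept.append(t)
    return np.array(kept)

d = prune_delays(np.array([50.2, 50.9, 53.0, 53.4, 60.0]))
print(d.tolist())  # -> [50.2, 53.0, 60.0]
```

Removing such near-duplicate points shortens the burn-in because the death moves no longer have to eliminate them one at a time.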
3) Thinning and Swapping: For each of the parallel chains, we apply 400 000 basic birth-death MCMC steps. After every 200 steps, pairwise swaps of chains are proposed. The first 100 000 samples are discarded as a burn-in. Subsequently, to reduce autocorrelation and save storage, we only retain every 200th state of the MCMC sampler. Moreover, the states of the chains with increased thermal noise are discarded. This yields in total a sample of 1500 realizations of the conditional distribution. For the simulation study presented in Section IV, the autocorrelation plots in Fig. 3 for the sequence of logarithms of the full data likelihood log L(θ; y, Z_{m,n}) illustrate that after thinning and parallel tempering, the autocorrelation remains under control.

C. Optimization Methods
To maximize the Monte Carlo approximation of the likelihood (13), we use the cross-entropy method (CEM) [37], which is a gradient-free method that works robustly in settings where the objective function is subject to Monte Carlo errors. To ensure that the MCMC approximations of the likelihood within the CEM remain valid, we define a trust region (see Section III-C1). After convergence of the CEM/trust region procedure, we fine-polish the estimate by applying a few Newton-Raphson updates. The computation of the gradient and Hessian matrix required for this is discussed in Section III-C2. Algorithm 2 summarizes the resulting procedure, and Fig. 4 shows the procedure in terms of a block diagram.
The optimization procedure is also applicable in case of multiple measurements, M > 1. In this case, given θ 0 , we would run M MCMC samplers in parallel, one for each measurement vector y (m) , m = 1, . . . , M, and approximate the likelihood for each measurement vector using (13). Finally, these approximations are multiplied to get the approximation of the full likelihood to be maximized.

1) CEM With Trust Region:
In the CEM, a Gaussian proposal distribution is iteratively adapted so as to concentrate it on a small neighborhood of the maximum. More precisely, we first draw a number of parameter vectors independently from the current proposal distribution and evaluate the corresponding likelihoods. Then, a new proposal distribution is fitted to the elite sample, that is, the parameter vectors with the highest likelihoods. This process is repeated until the increase in the highest evaluated likelihood over the elite sample is below some user-specified threshold. The CEM requires many evaluations of the likelihood function; however, the cost of these evaluations is minor. If θ and θ0 are too distant, the approximation (13) of the likelihood ratio (12) becomes unreliable. This situation is indicated by a small ESS value. Therefore, we restrict the CEM maximization to a trust region [38] around the current value θ0, determined so that the ESS is above a certain threshold for all θ in the trust region. If the CEM maximization terminates at a value θ_max well inside the interior of the trust region, this value is used as an initial value for some final Newton-Raphson updates to fine-polish the estimate. By well inside, we mean that the ESS at θ_max is bigger than a second threshold exceeding the first threshold used to define the trust region. Otherwise, we set θ0 = θ_max, draw a new MCMC sample, and run the CEM maximization procedure once again over a trust region centered around the new value of θ0 with the original initial proposal standard deviation.
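The core CEM iteration (draw, rank, refit to the elite) is easy to sketch. The example below maximizes a simple stand-in objective in place of the Monte Carlo log-likelihood-ratio and omits the ESS-based trust region; sample sizes and elite fraction are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

def objective(theta):
    # Stand-in for the approximate log-likelihood; maximum at (1, -2).
    return -np.sum((theta - np.array([1.0, -2.0])) ** 2)

mu = np.zeros(2)              # proposal mean
sd = np.ones(2) * 2.0         # proposal standard deviations (diagonal)
n_draw, n_elite = 200, 20

for _ in range(40):
    thetas = mu + sd * rng.standard_normal((n_draw, 2))
    vals = np.array([objective(t) for t in thetas])
    elite = thetas[np.argsort(vals)[-n_elite:]]          # highest objectives
    mu, sd = elite.mean(axis=0), elite.std(axis=0) + 1e-8  # refit proposal

print(np.allclose(mu, [1.0, -2.0], atol=0.1))
```

Because the proposal covariance here is diagonal, the sketch also exhibits the weakness noted in Section IV-E: strongly correlated parameters are explored poorly unless a preconditioning transformation is applied first.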

2) Gradient and Hessian for Newton-Raphson Updates: Let

u(θ; y, z_m) = ∇_θ log L(θ; y, z_m)

denote the gradient of the log joint density of (y, z_m). Following [2, Sec. 8.6.2], the score function and observed information are

∇_θ log L(θ; y) = E[u(θ; y, Z_m) | y; θ],
−∇²_θ log L(θ; y) = E[−∇_θ u(θ; y, Z_m) | y; θ] − Var[u(θ; y, Z_m) | y; θ].

The conditional expectations and variances can, in general, not be evaluated in closed form. However, it is feasible to approximate these quantities by importance sampling. For instance, following [2, eq. (8.43)],

E[u(θ; y, Z_m) | y; θ] ≈ Σ_{n=1}^{N} W̃_n u(θ; y, Z_{m,n}),

where W̃_n = W_n / Σ_{n′≤N} W_{n′} and W_n is defined in (14).
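The self-normalized importance-sampling approximations of the conditional mean and variance can be sketched as follows. The log-weights and per-sample gradients below are stand-ins; in the actual procedure they come from the MCMC sample of the multipath components.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in quantities for N MCMC samples and a 4-dimensional parameter.
N = 1500
log_W = rng.normal(0.0, 0.7, N)            # log W_n, as in (14)
g = rng.normal(1.0, 0.5, (N, 4))           # stand-in gradients u(theta; y, Z_{m,n})

# Self-normalized weights, computed stably; they sum to one.
w = np.exp(log_W - np.logaddexp.reduce(log_W))

# Weighted mean approximates the score E[u | y; theta].
score_hat = (w[:, None] * g).sum(axis=0)

# Weighted covariance enters the observed-information approximation.
centered = g - score_hat
cov_hat = (w[:, None, None] * centered[:, :, None] * centered[:, None, :]).sum(axis=0)

print(np.isclose(w.sum(), 1.0))
```

Since the weights are nonnegative and sum to one, the weighted covariance is positive semidefinite, so the resulting curvature estimate is well behaved for the Newton-Raphson updates.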

D. Model Selection Based on Likelihood Ratios and Bridge Sampling
In addition to model calibration, a second application of the likelihood-ratio computation concerns model selection. Considering two models A and B with parameter vectors θ A and θ B , respectively, we wish to select the model that yields the highest likelihood value. In terms of the likelihood ratio, we select model B if L(θ B ; y)/L(θ A ; y) > 1. An appealing property of this criterion is that the ratio on the left-hand side is precisely of the form appearing in (13) and therefore amenable to computation via importance sampling. We have here described the likelihood-ratio approach in the case where the compared models belong to the same model class. It is, however, also possible to use the MCMC approach to compute ratios of likelihoods corresponding to different model classes, possibly with a different number of parameters.
If θ_A and θ_B are far apart, then the estimate (13) is unreliable since the weights are almost degenerate. This problem is tackled via bridge sampling. Let, for ease of notation, θ_0 = θ_A and θ_M = θ_B for some M ≥ 1. Then, the likelihood ratio is expanded as

L(θ_M; y)/L(θ_0; y) = [L(θ_1; y)/L(θ_0; y)] [L(θ_2; y)/L(θ_1; y)] ··· [L(θ_M; y)/L(θ_{M−1}; y)]

for intermediate parameters θ_1, . . . , θ_{M−1} figuratively bridging the large difference between θ_0 and θ_M. Subsequently, we apply Monte Carlo estimation to each of the ratios on the right-hand side.
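On the log scale, the telescoping product becomes a sum of per-step estimates, each obtained by importance sampling from a chain run at the left endpoint of the step. The sketch below uses stand-in log-weights; the number of bridging steps and all distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_log_ratio(n=2000):
    """One bridge step: estimate log L(theta_{i+1}; y) / L(theta_i; y)
    from stand-in log-weights log W_n under theta_i (log-sum-exp form of (13))."""
    log_W = rng.normal(-1.0, 0.5, n)
    return np.logaddexp.reduce(log_W) - np.log(n)

# Five bridging steps; the overall log-ratio is the sum of the step estimates.
step_estimates = [estimate_log_ratio() for _ in range(5)]
total_log_ratio = sum(step_estimates)
print(np.isfinite(total_log_ratio))
```

Because each step compares nearby parameters, every per-step ESS stays high, which is exactly what makes the bridged estimate reliable where the direct estimate (13) degenerates.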

IV. SIMULATION STUDY
In this section, we analyze how well MCMC MLE performs on simulated data. We consider two parameter configurations θ_const and θ_quad. For both, we use the same parameters driving the distribution of the thermal noise and the path gains, log(σ) = −10.5, γ0 = −20, and γ1 = −0.029. Here, the parameters are chosen to resemble the characteristics of the measurement data discussed in Section V. In the parameter set θ_const, the arrival rate of the Turin model is constant, i.e., κ0 = −0.75 and κ1 = 0, whereas in θ_quad, the arrival rate is quadratic, i.e., κ0 = −10.5 and κ1 = 2.
Here, we choose κ 0 such that the expected number of multipath components agrees approximately in both models. Moreover,  within the simulation study, we fix the size of the observation window |I | = 150 and the delay τ 0 = 50 associated with the LOS path. Fig. 5 shows the power-delay profiles simulated from the two models. The settings of the CEM algorithm are given in Table I.

A. Bridge Sampling
First, we illustrate how to justify selecting either the constant or the quadratic rate model via bridge sampling. We draw a realization y from the constant rate model θ const and then compare the likelihoods of y under θ const and θ quad . For this purpose, we interpolate linearly between κ 1 = 0 and κ 1 = 2 with a step size of 0.4. The corresponding values of κ 0 are fixed as (−0.75, −2.7, −4.7, −6.6, −8.6, −10.5).
In particular, the expected total number of points does not fluctuate substantially among consecutive values. The remaining parameters log(σ) = −10.5, γ0 = −20, and γ1 = −0.029 agree in θ_const and θ_quad and are therefore kept fixed. Setting θ_0 = θ_const and θ_5 = θ_quad, Table II shows the estimated log-likelihood ratios log L(θ_{i+1}; y)/L(θ_i; y) together with the effective sample sizes based on 2000 nominal samples. The resulting log-likelihood ratio log L(θ_quad; y)/L(θ_const; y) becomes −13.02, identifying θ_const as the correct model.
In order to assess how robust the model selection method is with respect to taking a different sample, we took 100 samples from the constant rate model and compared the likelihood with that under the quadratic rate model. In all of the 100 samples, bridge sampling indeed indicates a higher likelihood for the constant rate model.

B. MLE With Known Multipath Components
The first test case for MLE is the setting of known multipath components. Although measurements in the field do not reveal this kind of information directly, such test cases help to build intuition on how well MLE can work in an idealized setting.
As an illustration, we provide profile plots of the log likelihood, that is, we fix all but one of the parameters at their true values and then trace how the log likelihood changes when varying the considered parameter.
In general, Fig. 6 suggests that the log likelihood is maximized close to the true parameters. Still, even in this idealized setting, we do see deviations of the parameter estimates from the true values, and they become more pronounced as we now consider the case of unknown multipath components.

C. Unknown Multipath Components-Fixed κ 1
After these initial findings, we now rely on Algorithm 2 to estimate the model parameters from simulated data sets where, for each of the parameter configurations θ const and θ quad , we conduct 50 simulations. In this section, we regard κ 1 to be known and fix it at its true value. For the remaining parameters, we initialize the optimization at parameter values obtained by perturbing the true values of the parameters. Then, we maximize the log likelihood via Algorithm 2.
In the present setting, we found CEM to perform robustly. In the first steps, the likelihood improvements are large, but effective sample sizes at the new parameter values are small. In other words, although the new parameters indicate a substantially higher likelihood, the Monte Carlo approximation of the likelihood could be quite imprecise. However, as the optimization proceeds, the improvements become smaller, while the effective sample sizes increase. In particular, the final decision of identifying the maximum is based on a high effective sample size. Fig. 7 shows boxplots of the parameter estimates under both true parameter configurations. The boxplots illustrate that the medians of the estimates are almost identical to the true values, both in the case κ1 = 0 and κ1 = 2. Overall, the MCMC MLE seems to work well.

D. Unknown Multipath Components-Variable κ 1
Next, we optimize with respect to the full parameter vector by including also κ 1 in the optimization process. When estimating the parameters from 50 realizations, Fig. 8 shows that while the medians are still close to the true values, the estimates of both κ 0 and κ 1 now fluctuate substantially. In particular, for κ 0 , it is evident that the estimation variance is much smaller when κ 1 is fixed compared to when κ 1 is included in the estimation. This is caused by a strong entanglement of the parameters κ 0 and κ 1 that we now explore in further detail.

E. Issues With Parameter Identifiability
Due to the complex interplay between the effects of the parameters κ 0 and κ 1 , it is difficult to optimize the likelihood jointly with respect to these parameters. For instance, both κ 0 and κ 1 influence the total intensity of points (similarly, both γ 0 and γ 1 affect the general magnitude of the path gains). The contour plot of the log-likelihood ratio for a simulated data set under θ const in Fig. 9 illustrates this issue. Indeed, this plot shows two local maxima, both exhibiting a ridge of (κ 0 , κ 1 )-combinations where the log-likelihood ratio is close to the local optima.
The parameters κ0 and κ1 are thus not jointly well identified by the likelihood. This entails a strong correlation between the estimates of κ0 and κ1 as well as a high variance of each estimate. This is further illustrated by the scatterplots of estimates in Fig. 10 for each pair of parameters. The estimates shown are those obtained from the 50 simulations under θconst. Supporting the previous findings, we observe a strong negative linear relation between the estimates of the intensity parameters κ0 and κ1. Similarly, the estimate of γ0 is well approximated by an affine function of the estimates of κ0 and κ1.

The CEM optimization procedure relies on a Gaussian proposal distribution with diagonal covariance structure. For tightly entangled parameters, the optimization therefore explores the parameter space poorly. Hence, to account for the correlations, we transform the parameters linearly with a preconditioning matrix prior to applying CEM. In the setting of the simulation study, we obtain an appropriate linear transformation from preliminary parameter estimates for different simulations. It may not be easy to determine such a transformation in a given application.
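One standard way to obtain such a preconditioning matrix from preliminary estimates is to whiten their empirical covariance. The sketch below illustrates this idea under our own assumptions (Cholesky-based whitening; the paper does not specify the construction), so that a diagonal Gaussian proposal in the transformed coordinates matches the correlation structure of the original parameters.

```python
import numpy as np

def preconditioner(prelim_estimates):
    """Whitening transform from preliminary parameter estimates.

    prelim_estimates: (m, d) array of preliminary estimates of the
    d-dimensional parameter vector, e.g., one row per simulation.
    Returns a matrix A such that the transformed parameters A @ theta
    have (empirically) identity covariance, so a Gaussian proposal with
    diagonal covariance explores the transformed space well.
    """
    cov = np.cov(np.asarray(prelim_estimates, dtype=float), rowvar=False)
    L = np.linalg.cholesky(cov)     # cov = L @ L.T
    return np.linalg.inv(L)          # A @ cov @ A.T = identity

# Usage sketch: optimize over phi = A @ theta with a diagonal proposal,
# then map the optimum back via theta = inv(A) @ phi.
```

For entangled pairs such as (κ0, κ1), the transform rotates and rescales the ridge in the likelihood surface into roughly isotropic coordinates.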

V. APPLICATION TO MEASUREMENT DATA
Having analyzed simulated data, we now turn to indoor channel data originally considered in [25]. The data contain the channel response for 750 equally spaced measurements in the range [2 GHz, 3 GHz]. In particular, the impulse response lies in the interval [0 ns, 750 ns] and decays rapidly after a strong peak close to τ0 = 50 ns. Therefore, we henceforth work with a window of size |I| = 150.

A. MCMC MLE and Bridge Sampling
As we saw in Section IV, the optimization is prone to become unstable when estimating κ0 and κ1 jointly. Hence, we consider two fixed values of κ1 of particular interest: κ1 = 0 (constant rate model) and κ1 = 2 (quadratically increasing intensity).
In our MCMC setup, we found that when starting from reasonably chosen initial parameters, the optimization converges for both the constant and the quadratic rate model. Since random fluctuations were stronger than for the synthetic data, we stabilized the optimization by increasing the number of MCMC samples from 1500 to 2500. Fig. 11 shows the power-delay profiles of realizations of the fitted constant and quadratic rate models. Although these plots already give an indication of the different structures of the arrival rates, we stress that the power-delay profile can vary substantially from one realization to another. To see clearly how the models fit the data, we also plot the measured and estimated power-delay profiles in the same figure (see Fig. 12).
Let θconst and θquad denote the parameter estimates under the constant and quadratic intensity models, respectively. To compare the two fitted models, we estimate the likelihood ratio L(θquad, y)/L(θconst, y) via bridge sampling (see Section II). For this purpose, we bridge linearly between κ1 = 0 and κ1 = 2 in steps of size 0.125; the other parameters are likewise interpolated linearly. Fig. 13 shows the estimated log-likelihood ratios log(L(θi+1, y)/L(θi, y)) for the intermediate parameter vectors together with the associated effective sample sizes. Aggregating the estimates yields a negative value for the estimated log-likelihood ratio of the quadratic rate model relative to the constant rate model. Hence, for the considered data set, the constant rate model seems more appropriate than the quadratic rate model.
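The construction of the bridge and the aggregation of the stepwise estimates can be sketched as follows. This is an illustrative skeleton, not the paper's code: the function names are hypothetical, and the stepwise log-ratio estimates themselves are assumed to come from the bridge sampler of Section II.

```python
import numpy as np

def bridge_path(theta_const, theta_quad, n_steps):
    """Linearly interpolated parameter vectors between the two fits.

    With kappa1 going from 0 to 2 in steps of 0.125, this corresponds
    to n_steps = 16 intervals, i.e., 17 vectors including both fits.
    All parameter components are interpolated linearly.
    """
    a = np.asarray(theta_const, dtype=float)
    b = np.asarray(theta_quad, dtype=float)
    return [(1 - t) * a + t * b for t in np.linspace(0.0, 1.0, n_steps + 1)]

def aggregate_log_lr(step_log_ratios):
    """Telescoping sum of the stepwise estimates.

    Summing the estimated log(L(theta_{i+1}, y)/L(theta_i, y)) over the
    bridge yields the overall log(L(theta_quad, y)/L(theta_const, y)).
    """
    return float(np.sum(step_log_ratios))
```

A negative aggregate, as obtained for the measured data, then favors the constant rate model over the quadratic one.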

VI. CONCLUSION
The developed calibration method for stochastic multipath radio channel models is based on the well-established method of MLE. Thus, we have shown that it is possible to approach the calibration problem in a statistically sound manner without resorting to heuristic techniques. In particular, our approach breaks with the widespread procedure originating from the seminal works of Turin in the 1970s, in which the calibration problem is broken down into (arbitrarily defined) subproblems that are tackled by separately developed estimators. Although the developed method has been used to calibrate stochastic channel models using indoor radio channel measurements, it is applicable to measurements from other scenarios as well. Another potential use of the proposed method is for calibrating stochastic channel models based on the output of a ray tracer instead of measurement data.
We find that, despite the intractability of the likelihood function, MLE is possible by estimating the likelihood function using a birth/death MCMC sampler and then optimizing it using our CEM algorithm. Being a Monte Carlo approach, it necessitates repeated sampling from a Markov chain, which entails significant computational cost. It was not the objective of the present work to optimize the estimator for computational complexity, and we envision that more efficient samplers can be developed, in particular considering the availability of more measurement data. Nevertheless, we find that the proposed method is indeed viable given the necessary computational power.
We observed for the considered model that only a linear combination of the parameters κ0 and κ1 is well identified by the data, but not κ0 and κ1 separately. This is apparent from plots of the likelihood function. In contrast, we suspect that such a lack of identifiability may be hidden by the current stepwise methods based on initial identification of delays and gains. The optimization difficulties due to the poor identifiability can be somewhat mitigated by a reparameterization. However, we resolve the issue by a more robust discrete approach where optimization is first performed over a discrete subset of the parameter space, after which the estimate is refined by the method of bridge sampling.
With a view toward avenues of future research, we note that although parametric Turin models can represent inhomogeneities in the arrival distribution, they are still based on the basic assumption of a Poisson point process. Therefore, they do not allow for interactions between the different arrivals. However, a commonly held belief is that arrivals tend to appear in clusters, which would call for a more flexible point-process model. By extending the model selection methods from this article, MCMC-based MLE makes it possible to analyze this belief on the grounds of a statistically well-established methodology.