Simultaneous Model and Parameter Estimation for Joint Communication and Positioning

Joint communication and positioning based on a unified signal structure yields synergy and can complement and assist system designs, enabling higher coverage and quality of service for both communication and positioning. For time of arrival (TOA) based joint communication and positioning, the TOA estimation accuracy is crucial, since it translates directly into position estimation accuracy. In the presence of multipath propagation, the estimation accuracy of the signal arrival times in turn strongly depends on the actual as well as the estimated number of physical path parameters, the model order. In this work, we assess the performance and the mutual impact of simultaneous model order and parameter estimation for channel-estimation-based joint communication and positioning. Besides introducing a terrestrial channel-estimation-based unified joint communication and positioning system framework, we discuss and numerically compare different methods to estimate the parameters and the model order sequentially or jointly. We show that a model order estimation that minimizes the TOA error is preferable to estimating the correct model order. Furthermore, we compare the performance with a proposed focused order-related lower bound. This bound determines the optimal model order for a chosen estimator; it depends on both the actual and the hypothetical model order, and it replaces the Cramer-Rao lower bound, which is unsuitable here. The comparison also shows that employing the parameter- and model-order-dependent inverse Fisher information matrix yields a close-to-optimal approach. We numerically show that the method remains accurate for a realistic channel scenario with many multipath parameters.

(GNSS)-based system. The long term evolution (LTE) signal design has included positioning reference signals from the outset. In this context, the authors of [5] investigated the positioning accuracy of such approaches. Furthermore, the authors of [7] investigate a potential receiver design for LTE signals, whereas the authors of [8] discuss potential mm-wave radar communication system approaches.
For positioning, the multipath delays and the number of multipath components (the number of propagating ray clusters) have to be estimated. The latter determines the model order. We assess a simultaneous model selection and parameter estimation approach and provide numerical results for a joint positioning and communication system. To illustrate the relationship between related contributions and novel contributions, we categorize our overview into open challenges, related contributions, and novel contributions.

A. OPEN CHALLENGES
Although the range of contributions in the area of model selection [9]- [22] as well as in the area of parameter estimation [23]- [30] indicates that both research areas are rather mature, there are still open questions and challenges:
• How to efficiently find the position in a realistic multipath propagation scenario remains a challenging problem. Such a scenario usually leads to a severe degradation of the parameter estimation performance [1] compared to a single-path scenario. Fading multipath components and non-line-of-sight conditions are difficult to extract from the additive noise [2].
• Furthermore, an increasing number of parameters entails an increasing estimation error due to the higher problem complexity (overestimation). If, on the other hand, the assumed number of parameters is too small (underestimation), a systematic modeling error has to be taken into account. Positioning applications target an especially high estimation accuracy of the positioning-relevant parameters. Therefore, methods that simultaneously estimate the parameters and the model order (number of multipath clusters) with the goal of optimal parameter estimation performance for positioning can still be improved.
• Especially for estimating multipath fading channel parameters, automatically excluding paths whose power lies below the noise power from the model is beneficial [28]; in these cases, underestimating the model order is advantageous. For positioning or delay estimation, excluding paths potentially yields a lower delay estimation error [6]. Following this line of thought, the probability of correct detection is not a sufficient performance measure to assess model order selection. Furthermore, the Cramer-Rao lower bound (CRLB) [31] requires the correct model order; if the correct model order is too high for the number of observations, the CRLB fails to assess the performance. A more practical, order-related lower bound is therefore required, and numerical results for simultaneous model selection and parameter estimation should be compared to this bound.
This contribution addresses all these challenges. We assess a simultaneous model selection and parameter estimation approach for a channel-estimation-based joint communication and positioning framework targeting optimal delay accuracy. Moreover, we propose a novel, practical model-order-related lower bound. Similar to a problem stated in radar [32], we assume here that the correct model order does not necessarily yield the optimal performance. Our approach employs the Fisher information matrix of the delays in the model selection as a natural means to exclude undesirable paths during model order detection. This approach outperforms classical information-theoretic approaches.
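The role of the Fisher information matrix in model selection can be illustrated with the generic information complexity criterion (ICOMP) from the statistics literature, assuming one common form, ICOMP = -2 log L + 2 C1(F^{-1}), where C1 is Bozdogan's maximal information complexity of the inverse Fisher information matrix. The Python sketch below is an illustration under that assumption, not the focused, delay-specific criterion of this work; the function names and the candidate numbers are hypothetical.

```python
import numpy as np


def c1_complexity(cov):
    """Bozdogan's C1 complexity: (s/2) log(tr(cov)/s) - (1/2) log det(cov).

    Zero when all eigenvalues of cov are equal, growing with their spread.
    """
    cov = np.asarray(cov, dtype=float)
    s = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet


def icomp(neg_log_likelihood, fisher_info):
    """ICOMP(IFIM) = 2 * NLL + 2 * C1(F^{-1}) for one hypothetical model order."""
    return 2.0 * neg_log_likelihood + 2.0 * c1_complexity(np.linalg.inv(fisher_info))


def select_order(candidates):
    """Return the hypothetical order whose (NLL, FIM) pair minimizes ICOMP."""
    return min(candidates, key=lambda order: icomp(*candidates[order]))


# Hypothetical fits: order 3 adds a weak path carrying very little Fisher information.
candidates = {
    1: (50.0, np.diag([4.0])),
    2: (40.0, np.diag([4.0, 4.0])),
    3: (39.5, np.diag([4.0, 4.0, 0.01])),
}
chosen = select_order(candidates)  # order 2: the weak third path is penalized
```

Because C1 grows with the spread of the eigenvalues of the inverse Fisher information matrix, a hypothesis that adds a path with little information (a weak path) is penalized even though it lowers the fitting cost slightly.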

B. RELATED CONTRIBUTIONS
In the following, we provide an overview of approaches and contributions related to this work.

1) CONTRIBUTIONS USING REDUCED COMPLEXITY MODELS
Instead of performing model selection and parameter estimation together, [6], [33] propose model simplification to achieve an acceptable tradeoff between ranging accuracy and computational cost in multipath scenarios. The drawback of such approaches is that the resulting positioning accuracy is limited by a modeling error, which can be large depending on the scenario.

2) CONTRIBUTIONS ASSUMING THE CORRECT MODEL ORDER IS OPTIMAL
Many researchers assume that the model order can be estimated correctly and that this is the optimal choice [23], [24], [27], [34], [35]. These authors focused on the parameter estimation results, employed the correct model order, and neglected the possibility that this assumption is not always optimal. Often these contributions focus on proposing a particular estimation or detection algorithm. Hence, the simulation setups are designed to demonstrate that the algorithm works reliably in a minimalistic setup, such as a two-path scenario. The impact of model selection and realistic channel modeling lies beyond the scope of these contributions.

3) CONTRIBUTIONS USING THE MODEL ORDER DETECTION PROBABILITY OF CORRECT DETECTION AS A PERFORMANCE MEASURE
In a range of contributions, researchers investigate and compare different model selection methods by choosing the probability of correct detection as a performance measure for model selection [10], [11], [14], [16]- [19], [22], [36], [37]. These contributions present various strategies to estimate the model order. Nevertheless, they do not consider the impact on the parameter estimation performance or small sample sizes. In [38], [39] the parameter estimation performance is considered. The authors use higher-order arrays combined with the parallel factor (PARAFAC) methods. These methods have the advantage of performing robustly.

4) FREQUENCY-DOMAIN-BASED JOINT MODEL ORDER SELECTION AND PARAMETER ESTIMATION CONTRIBUTIONS
Recent approaches [40], [41] to simultaneously estimate the model order and the parameters rely on a signal model described in the frequency domain. These models come with a systematic error. The parameter estimation results are compared to the CRLB rather than to an order-specific bound that would demonstrate the suitability for positioning. These contributions also do not address application-specific requirements; consequently, the authors do not consider channel conditions under which an underestimated model yields superior results. In [17], the authors propose a promising shift-invariance-based order selection technique for exponential data modeling. This technique was then further employed in a proposal for joint parameter estimation and model order selection in [40], where the authors numerically showed the impact of model order selection on the parameter estimation. As we will clarify when introducing the system framework used here, formulating the signal model in the frequency domain as in [17], [21], [40]- [44] is impossible for the joint communication and positioning framework investigated here without undesired signal design impairments such as oversampling, and without accepting a limited estimation accuracy. Furthermore, the contributions in [17], [40]- [43] do not investigate a lower performance bound tailored to simultaneous model selection and parameter estimation for positioning.
VOLUME 9, 2021

5) CONTRIBUTIONS EMPLOYING SEQUENTIAL PROCESSING AND BAYESIAN ESTIMATION
The authors of [45] base their work on a sequential signal model and propose and assess a particle algorithm for sequential Bayesian parameter estimation and model selection. Sequential processing requires initial guesses for the parameter estimation problem. Similar to [45], the authors of [46] employ a Bayesian strategy to jointly estimate the model order and the parameters by population Monte Carlo simulation. Bayesian approaches exploit the a priori distributions of the parameters. In a practical scenario, however, the a priori distributions of the physical channel parameters will not be known at the receiver. Compressive sensing constitutes another valid approach for joint model selection and parameter estimation. In [47], the authors investigated the relation between sparse reconstruction and parameter estimation with model order selection, proposing to use sparsity parameters via compressive sensing; their approach mimics classic order selection criteria. On the other hand, compressive sensing requires defining an inexact "on-grid" delay model.

C. NOVEL CONTRIBUTIONS
We summarize the novel contributions of this article as follows:
• Overcoming the limitations of reduced complexity models by performing model selection and parameter estimation simultaneously or jointly. We propose to determine the delay estimates for different hypothetical model orders and then to choose the estimator and the model order yielding the best delay estimation accuracy.
• Assessing a soft-information-based simultaneous model order selection and parameter estimation designed to yield accurate delay estimates for a joint communication and positioning system. We demonstrate that the following technique is particularly well suited to such systems: we use a soft information complexity criterion (ICOMP), which utilizes the Fisher information matrix as a complexity measure. This can be interpreted as a tuning that excludes low-energy paths from multipath parameter estimation and model selection.
• Focusing the model order detection on the positioning-relevant parameters, the delays. The results in [6] (pp. 74 and 123) indicate that simply assuming the correct model order is not the optimal choice for positioning. We formulate an automatic model order estimation that is beneficial for the TOA error performance. Our approach takes into account the instantaneous channel conditions.
• Providing a focused order-related lower bound (FORLB), which is more practical and appropriate than the CRLB, to quantify a theoretical lower bound on the joint model selection and parameter estimation performance in general.
• Applying the simultaneous model selection and parameter estimation to the proposed generalized joint communication and positioning signal model. We show numerical results for a continuous delay model and time-domain processing to circumvent the additional systematic error sources described in Sections I-B4 and I-B5. Furthermore, we exploit the invariance of the channel parameters and apply block-wise estimation.
Using global optimization, we avoid the need for initial guesses. To target a realistic receiver design, we assume that the a priori parameter probability distribution is unknown.
• Investigating the impact of a realistic fading channel and of a small number of available measurements on simultaneous model selection and parameter estimation. We show that our approach yields accurate parameter estimates in realistic channel scenarios.
We organize this article as follows. First, we present a generalized joint communication and positioning system framework based on channel estimation. Then, different model order estimation strategies and parameter estimation strategies are discussed. Afterward, we propose the model-order-related bound. Finally, we numerically assess the mutual impact of model order estimation and parameter estimation for a typical wireless outdoor channel model.

D. NOTATION
$\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and the imaginary component of a complex value. We denote vectors by bold lower-case letters and matrices by bold upper-case letters. Further, $(\cdot)^H$ denotes the Hermitian transpose of a matrix. Hypotheses are denoted by $\tilde{(\cdot)}$ and estimates by $\hat{(\cdot)}$. The operator $\mathrm{vec}(\cdot)$ reshapes a matrix into a vector by stacking its columns. The pseudo-inverse of a matrix is denoted by $(\cdot)^\dagger$.

II. THE SYSTEM FRAMEWORK
A. THE SIGNAL MODEL
To calculate the position of a mobile user via the TOA or the angle of arrival (AOA) without ambiguity, several reference objects with coordinates known to the system are required. Among other network nodes, neighbouring base stations or access points can serve as reference objects. Either the network determines the position of a device moving in the area spanned by the reference objects, or the device determines its own position using signals from multiple base stations or access points. In both cases, the positioning-relevant parameters are estimated for each link separately, so a high positioning accuracy requires a high single-link parameter estimation accuracy. To obtain accurate position estimates, we therefore target high-resolution TOA and AOA estimates for each link. For the sake of clarity, we investigate a single link, keeping in mind that an actual joint communication and positioning setup requires at least three links. At the transmitter side, a data matrix X and a known pilot matrix P, which we assume here to be chosen optimally, are constructed and combined according to the rules of the actual multiplexing scheme. Together they form the virtual training matrix V. The term virtual training indicates that, in addition to the regular pilots, the communication data itself can serve as pilot data by iteratively employing the detected data at the receiver side for channel estimation [4]. Different multiplexing schemes yield different compositions of V. Independent of the multiplexing scheme, the information in V is fully used jointly for communication and positioning in the following manner: at the receiver side, we inherently use the pilot symbols in P, as part of V, for channel estimation for communication.
In principle, we know that the underlying channel model depends on positioning-relevant parameters such as the multipath delays and that we can use the channel estimates to estimate these parameters. Hence, we propose to exploit the channel estimates twice: for communication, in data detection, and for positioning, in parameter estimation. Note that purely training-based estimators already suffice for the communication detection performance, whereas this is not the case for positioning, since positioning algorithms rely on accurate channel estimates. Consequently, we use the total energy of the transmitted signal in V by employing an iterative semi-blind channel estimation strategy [4]. By iteratively including the detected communication data symbols as additional "virtual" pilots, we can asymptotically approach an optimal channel estimation performance for the positioning side. Fig. 1 illustrates the joint signal design of our framework. Physically, a pulse shaping filter $g_{\mathrm{Tx}}(\tau)$ is applied before transmission over the physical channel, and a matched receiver filter $g_{\mathrm{Rx}}(\tau)$ is applied at the receiver side.
Let C denote the number of multipath components, here the model order. The number of receive antennas is denoted by $N_r$. Further, let the complex path weights of the multipath components be denoted by $\gamma_{c,v}$, $c \in \{1, \dots, C\}$, $v \in \{0, \dots, N_r - 1\}$, and the cluster delays by $\boldsymbol{\tau} = [\tau_1, \dots, \tau_C]$. The physical channel $c_v(t, \tau)$ is modeled as the superposition of weighted Diracs delayed by the multipath cluster delays:
$$c_v(t,\tau) = \sum_{c=1}^{C} \gamma_{c,v}(t)\,\delta(\tau - \tau_c).$$
Thereby, $\tau_1$ is the positioning-relevant TOA. For a moderate mobile velocity, the complex path weight $\gamma_{c,v}$ is quasi-static in blocks of a certain length. Consequently, we assume block fading over K times the symbol duration T. That means we assume a quasi-static channel over the time duration KT and omit the time dependency within a block. From one block to another, however, the complex-valued path amplitudes are assumed to be time-varying. Note that we can assume the delays in $\boldsymbol{\tau}$ to be quasi-static over a certain number I of consecutive blocks. We will use this property to enlarge the number of measurements for the parameter estimation problem formulated later by observing KTI measurements. Further, for convenience, to formulate an equivalent discrete-time channel model (EDTCM), the physical channel is convolved with the convolution of the pulse shaping filter $g_{\mathrm{Tx}}(\tau)$ and the receiver filter $g_{\mathrm{Rx}}(\tau)$, specified by
$$g(\tau) = g_{\mathrm{Tx}}(\tau) * g_{\mathrm{Rx}}(\tau).$$
An overall channel impulse response $h_v(t,\tau)$ with $v \in \{0, \dots, N_r - 1\}$ can be expressed by the convolution of $c_v(t,\tau)$ and $g(\tau)$:
$$h_v(t,\tau) = c_v(t,\tau) * g(\tau) = \sum_{c=1}^{C} \gamma_{c,v}(t)\, g(\tau - \tau_c).$$
Let the channel memory length be denoted by L+1. Sampling at the receiver side with the symbol period T yields the discrete-time channel coefficients of block i,
$$h_v(i, l) = \sum_{c=1}^{C} \gamma_{c,v}(i)\, g(lT - \tau_c), \quad l \in \{0, \dots, L\}.$$
To obtain a more convenient matrix-vector notation, let us define the channel matrix $\mathbf{H}(i) \in \mathbb{C}^{(L+1)\times N_r}$, the so-called spatial signature matrix $\boldsymbol{\Gamma}(i) \in \mathbb{C}^{C\times N_r}$, and the delay-dependent pulse matrix $\mathbf{G}(\boldsymbol{\tau})$ of size $(L+1)\times C$ entrywise, so that for the single-input multiple-output (SIMO) time-series case we have, for $i \in \{0, \dots, I-1\}$,
$$[\mathbf{H}(i)]_{l,v} = h_v(i,l), \quad [\boldsymbol{\Gamma}(i)]_{c,v} = \gamma_{c,v}(i), \quad [\mathbf{G}(\boldsymbol{\tau})]_{l,c} = g(lT - \tau_c).$$
Further, a horizontal matrix concatenation over all time indices i yields
$$\mathbf{H} = [\mathbf{H}(0), \dots, \mathbf{H}(I-1)], \quad \boldsymbol{\Gamma} = [\boldsymbol{\Gamma}(0), \dots, \boldsymbol{\Gamma}(I-1)].$$
Then, the matrices $\boldsymbol{\Gamma}$, $\mathbf{G}(\boldsymbol{\tau})$, and $\mathbf{H}$ are related by
$$\mathbf{H} = \mathbf{G}(\boldsymbol{\tau})\,\boldsymbol{\Gamma}.$$
Let $\mathbf{N}(i)$ be a complex Gaussian distributed noise matrix and $\mathbf{Y}(i)$ the matrix of received signals. The number of rows of both matrices has to match that of $\mathbf{V}(i)$, which depends on the actual multiplexing scheme. Then the system equation can be written as
$$\mathbf{Y}(i) = \mathbf{V}(i)\,\mathbf{H}(i) + \mathbf{N}(i).$$
As pointed out earlier, the virtual training matrix V includes a multiplexing-dependent combination of the data in X and the pilots in P. Without loss of generality, V = X + P (for TDM and CDM) and $\mathbf{Y}(i), \mathbf{N}(i) \in \mathbb{C}^{(K-L)\times N_r}$.
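The block signal model can be sketched numerically. This minimal Python/NumPy illustration assumes an ideal band-limited overall pulse g(t) = sinc(t/T) with T = 1, QPSK symbols, and a Toeplitz (convolution) training matrix V(i); the helper names `pulse_matrix`, `training_matrix`, and `simulate_block` and all numeric values are hypothetical.

```python
import numpy as np


def pulse_matrix(tau, L, T=1.0):
    """G(tau) with entries g(l*T - tau_c), l = 0..L, assuming g(t) = sinc(t/T)."""
    l = np.arange(L + 1)[:, None]                 # tap indices as a column
    tau = np.asarray(tau, dtype=float)[None, :]   # path delays as a row
    return np.sinc((l * T - tau) / T)             # shape (L+1, C)


def training_matrix(symbols, L):
    """(K-L) x (L+1) Toeplitz matrix so that V @ h is the valid part of s * h."""
    K = len(symbols)
    return np.array([[symbols[k + L - l] for l in range(L + 1)] for k in range(K - L)])


def simulate_block(V, tau, Gamma, noise_std, rng):
    """One block of the SIMO system equation Y = V G(tau) Gamma + N."""
    L = V.shape[1] - 1
    H = pulse_matrix(tau, L) @ Gamma              # channel matrix, (L+1) x N_r
    shape = (V.shape[0], Gamma.shape[1])
    N = noise_std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return V @ H + N, H


rng = np.random.default_rng(0)
K, L, Nr = 64, 7, 2
tau = np.array([1.3, 2.9])                        # hypothetical delays in units of T
Gamma = (rng.standard_normal((2, Nr)) + 1j * rng.standard_normal((2, Nr))) / np.sqrt(2)
s = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2)
V = training_matrix(s, L)
Y, H = simulate_block(V, tau, Gamma, noise_std=0.0, rng=rng)
```

With the noise switched off, Y equals V G(τ) Γ exactly, which makes the sketch convenient for checking estimators against known delays.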

B. CHANNEL ESTIMATION
Here, channel estimation is exploited for both communication and positioning in the receiver, as visualized in Fig. 1.
We use a semi-blind channel estimation approach that employs the total signal energy as training. The reason for this choice is as follows. For data detection, a purely training-based channel estimation approach yields an acceptable bit error rate performance; however, it fails to provide the accuracy required for positioning. In a training-based approach, the signal model includes a large part carrying the communication data and a minor part carrying the training symbols, and the chosen signal design determines exactly how the signal resource block is divided into training and data parts [34], [59]. Training-based channel estimation utilizes the received values of the complete signal resource block without iteratively exploiting the detected data symbols. Since the training part is only a small percentage of this resource block, the mean squared error of training-based channel estimation is known to approach an error floor for increasing signal-to-noise ratio [4], [59]. As can be seen from Fig. 1, the channel estimates are fed to the parameter estimation algorithm, which calculates estimates of the physical path parameters such as the delays $\hat{\boldsymbol{\tau}}$. The accuracy of these estimates is proportional to the accuracy of the channel estimates, as becomes clear by assessing the CRLB for the physical path parameters [4]. The parameter estimation accuracy in turn impacts the position estimation accuracy. Hence, unlike pure communication systems, joint communication and positioning requires a channel estimation accuracy exceeding the performance attainable by training-based estimators. Different approaches can improve the channel estimation performance compared to training-based channel estimation. The straightforward approach targets an optimal training design [60], [61].
Recently, some approaches to improve the pilot-design consider massive MIMO scenarios and the possibility to use parameter estimates gained at the receiver as feedback to the transmitter [62], [63].
Let us assume that the pilot design is optimal. Approaches like successive interference-cancellation as assessed in [59] or signal stripping [57], [58] aim at removing the undesired signal parts from the received signal. They improve the channel estimation performance compared to the training-based estimation by removing the systematic error.
Comparing the mean squared error performance and the CRLBs of three channel estimation approaches, namely training-based estimation, training-based estimation with interference cancellation, and semi-blind estimation, shows that semi-blind channel estimation outperforms the two competitors [59]. Semi-blind channel estimation optimally utilizes the energy of the complete signal-plus-pilot resource block and shows a superior error performance.
Besides the training symbols, additional data symbols iteratively serve as virtual training [4], [59]. In this contribution, we focus on simultaneous parameter estimation and model selection for joint communication and positioning (JCAP). Consequently, we assume here that an optimal virtual training matrix is available that can be employed iteratively via semi-blind channel estimation. Since semi-blind channel estimation employs nearly the whole matrix V(i) as training, and for the sake of simplicity, we assume for now that V(i) consists entirely of training symbols. Then the channel estimate $\hat{\mathbf{H}}(i)$ is usually obtained by least squares as
$$\hat{\mathbf{H}}(i) = \mathbf{V}^\dagger(i)\,\mathbf{Y}(i).$$
These channel estimates serve a twofold purpose: first, we use them for data detection, and second, we use them for positioning in the following manner. We interpret the discrete channel estimates in each column of $\hat{\mathbf{H}}$ as snapshot measurements, which we use to estimate the physical path parameters by fitting the a priori known model function in (3) to these measurements. Using all columns of $\hat{\mathbf{H}}$ instead of only a single column (the SISO, single-snapshot case) increases the number of observations as well as the number of unknowns.
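Assuming, as above, that V(i) consists entirely of known (virtual) training symbols, the least-squares channel estimate reduces to a single pseudo-inverse. A minimal Python/NumPy sketch with a hypothetical random QPSK Toeplitz training matrix and a random channel:

```python
import numpy as np


def ls_channel_estimate(V, Y):
    """Least-squares channel estimate H_hat = V^dagger Y (pseudo-inverse of V)."""
    return np.linalg.pinv(V) @ Y


rng = np.random.default_rng(1)
K, L, Nr = 64, 7, 2
s = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2)  # QPSK
V = np.array([[s[k + L - l] for l in range(L + 1)] for k in range(K - L)])       # Toeplitz
H = (rng.standard_normal((L + 1, Nr)) + 1j * rng.standard_normal((L + 1, Nr))) / np.sqrt(2)
shape = (K - L, Nr)
N = 0.01 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
Y = V @ H + N

H_hat = ls_channel_estimate(V, Y)
# Each column of H_hat is one snapshot for the subsequent delay estimation.
```

Because nearly all K - L rows act as training, the estimation error here is small; with only a few pilot rows, the same estimator would be far noisier, which is the error floor argument made above.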

C. PARAMETER ESTIMATION
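Given any reasonable hypothetical parametrization $\tilde{\boldsymbol{\theta}}$ that contains the delay vector $\boldsymbol{\tau}$ as a subvector, and given any cost function $\Lambda(\tilde{\boldsymbol{\theta}})$ related to this parametrization, the parameter estimate $\hat{\boldsymbol{\theta}}$ in general reads
$$\hat{\boldsymbol{\theta}} = \arg\min_{\tilde{\boldsymbol{\theta}}} \Lambda(\tilde{\boldsymbol{\theta}}).$$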

1) DETERMINISTIC MAXIMUM LIKELIHOOD ESTIMATION
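Let the stacked parameter vector for a deterministic parametrization, with $\boldsymbol{\Gamma}$ denoting the spatial signature matrix of complex path weights, be
$$\boldsymbol{\theta}_{\mathrm{dml}} = \big[\boldsymbol{\tau},\ \mathrm{Re}(\mathrm{vec}(\boldsymbol{\Gamma}))^{T},\ \mathrm{Im}(\mathrm{vec}(\boldsymbol{\Gamma}))^{T}\big]^{T}. \quad (16)$$
We have various possibilities to formulate the delay estimator.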
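For notational convenience, let us rearrange the entries of the received matrices into a long vector, $\mathbf{y} = \mathrm{vec}([\mathbf{Y}(0), \dots, \mathbf{Y}(I-1)])$. A common way is then to formulate the parameter estimation as a function of the received values, $\hat{\boldsymbol{\theta}}_{\mathrm{dml}\text{-}y}$, with hypothesized path weights $\tilde{\boldsymbol{\Gamma}}(i)$:
$$\hat{\boldsymbol{\theta}}_{\mathrm{dml}\text{-}y} = \arg\min_{\tilde{\boldsymbol{\theta}}} \big\| \mathbf{y} - \mathrm{vec}\big([\mathbf{V}(0)\mathbf{G}(\tilde{\boldsymbol{\tau}})\tilde{\boldsymbol{\Gamma}}(0), \dots, \mathbf{V}(I-1)\mathbf{G}(\tilde{\boldsymbol{\tau}})\tilde{\boldsymbol{\Gamma}}(I-1)]\big) \big\|^2. \quad (19)$$
Assuming that semi-blind channel estimation is inherently performed in a first step, another equivalent and less complex cost function is obtained by formulating the delay estimator $\hat{\boldsymbol{\theta}}_{\mathrm{dml}\text{-}h}$ as a function of these channel estimates, $\hat{\mathbf{H}} = [\hat{\mathbf{H}}(0), \dots, \hat{\mathbf{H}}(I-1)]$:
$$\hat{\boldsymbol{\theta}}_{\mathrm{dml}\text{-}h} = \arg\min_{\tilde{\boldsymbol{\theta}}} \big\| \hat{\mathbf{H}} - \mathbf{G}(\tilde{\boldsymbol{\tau}})\tilde{\boldsymbol{\Gamma}} \big\|_F^2. \quad (20)$$
Since the estimation error on the channel estimates and the noise on the received values can be described by a complex Gaussian distribution, these nonlinear least-squares fits in (20) and (19) are known to be equivalent to the maximum-likelihood (ML) estimator. Note that if the delays in $\boldsymbol{\tau}$ were known, the linear parameters could be estimated in closed form. Hence, $\tilde{\boldsymbol{\Gamma}}$ in (20) and (19) can be substituted beforehand by its estimate as a function of $\tilde{\boldsymbol{\tau}}$, e.g., for (20),
$$\hat{\boldsymbol{\Gamma}}(\tilde{\boldsymbol{\tau}}) = \mathbf{G}^\dagger(\tilde{\boldsymbol{\tau}})\,\hat{\mathbf{H}}.$$
This separability into nonlinear and linear components, and the possibility of resubstituting the closed-form expression, applies to both (20) and (19), which would leave us with four slightly different options for delay estimation.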
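In [48] (p. 16), we proved that the cost functions (20) and (19) are equivalent and that the delay estimates coincide, $\hat{\boldsymbol{\theta}}_{\mathrm{dml}\text{-}h} = \hat{\boldsymbol{\theta}}_{\mathrm{dml}\text{-}y} = \hat{\boldsymbol{\theta}}_{\mathrm{dml}}$.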
Consequently, we discard the cost function based on the received values in (19) and continue with the cost function based on the channel estimates in (20) only. Following the principles of separable nonlinear least-squares estimation [31] (p. 255), this substitution eliminates the dependence on the linear parameters,
$$\hat{\boldsymbol{\tau}} = \arg\min_{\tilde{\boldsymbol{\tau}}} \big\| \mathbf{P}^{\perp}_{\mathbf{G}}(\tilde{\boldsymbol{\tau}})\,\hat{\mathbf{H}} \big\|_F^2, \quad (23)$$
with the orthogonal projection matrix
$$\mathbf{P}^{\perp}_{\mathbf{G}}(\tilde{\boldsymbol{\tau}}) = \mathbf{I} - \mathbf{G}(\tilde{\boldsymbol{\tau}})\,\mathbf{G}^\dagger(\tilde{\boldsymbol{\tau}}).$$
Hence, the deterministic ML (DML) estimation task is reduced from a $(2N_r IC + C)$-dimensional problem in (20) to a C-dimensional problem in (23). The estimator (23) is called deterministic, since the parameters are treated as deterministic.
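The concentrated DML cost in (23) only needs the projector onto the orthogonal complement of the column space of G. The Python sketch below assumes g(t) = sinc(t) with T = 1 (a hypothetical pulse) and, purely for illustration, replaces the global optimizer by an exhaustive search over delay pairs on a coarse grid; in practice such a grid search would at most provide a starting region for a continuous, off-grid refinement. All names and values are hypothetical.

```python
import itertools

import numpy as np


def pulse_matrix(tau, L):
    """G(tau) with entries g(l - tau_c), assuming g(t) = sinc(t) and T = 1."""
    return np.sinc(np.arange(L + 1)[:, None] - np.asarray(tau, dtype=float)[None, :])


def dml_cost(tau, H_hat):
    """Concentrated DML cost ||P_perp(tau) H_hat||_F^2 with P_perp = I - G G^dagger."""
    L = H_hat.shape[0] - 1
    G = pulse_matrix(tau, L)
    P_perp = np.eye(L + 1) - G @ np.linalg.pinv(G)
    return np.linalg.norm(P_perp @ H_hat, "fro") ** 2


def dml_grid_search(H_hat, C, grid):
    """Exhaustive search over sorted C-tuples of grid delays (stand-in for a global optimizer)."""
    best_cost, best_tau = np.inf, None
    for tau in itertools.combinations(grid, C):
        cost = dml_cost(tau, H_hat)
        if cost < best_cost:
            best_cost, best_tau = cost, np.array(tau)
    return best_tau, best_cost


rng = np.random.default_rng(2)
L, n_snap = 7, 20                                   # N_r * I = 20 snapshots (columns)
tau_true = np.array([1.3, 2.9])
Gamma = (rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))) / np.sqrt(2)
H_hat = pulse_matrix(tau_true, L) @ Gamma           # noise-free channel estimates
tau_hat, cost = dml_grid_search(H_hat, C=2, grid=np.arange(0.0, 5.0, 0.1))
```

Only the C delays are searched; the path weights never appear explicitly, which is exactly the dimensionality reduction from (20) to (23).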

2) STOCHASTIC MAXIMUM LIKELIHOOD ESTIMATION
Exploiting a time series of measurements, that is, $I \gg 1$, allows us to treat the problem either as a deterministic or as a stochastic estimation problem. More specifically, we can treat the complex path amplitudes as stochastic variables if they follow a known distribution such as the circular complex normal distribution. Then, we can replace the deterministic parameter vector by a set of parameters containing the mean and covariance matrix elements. For instance, assume that the complex path amplitudes are independent and identically distributed complex normal random variables, and let $\mathbf{C}_\gamma$ denote the correlation matrix of the complex path weights. Then, a covariance matrix representation of the system is
$$\mathbf{C}_{\hat{h}} = \mathbf{G}(\boldsymbol{\tau})\,\mathbf{C}_\gamma\,\mathbf{G}^H(\boldsymbol{\tau}) + \mathbf{C}_w,$$
with the so-called sample correlation matrix $\mathbf{C}_{\hat{h}}$ and the error correlation matrix $\mathbf{C}_w \,(\sim \sigma_w \mathbf{I})$. Let $\boldsymbol{\eta}$ be an auxiliary vector containing only the lower triangular elements of $\mathbf{C}_\gamma$, taken column-wise. The overall parametrization of (26), including both the delays and the covariance matrix, is then given by
$$\boldsymbol{\theta}_{\mathrm{sml}} = [\boldsymbol{\tau}, \boldsymbol{\eta}^T]^T. \quad (27)$$
Since it is reasonable to assume that $2N_r I > C$, comparing the lengths of $\boldsymbol{\theta}_{\mathrm{dml}}$ and $\boldsymbol{\theta}_{\mathrm{sml}}$ shows that the stochastic formulation yields fewer unknowns overall than the deterministic one. Nonetheless, there are also reasons to stick to the more complex deterministic maximum likelihood formulation.
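To make the comparison of the vector lengths concrete, assume that $\boldsymbol{\eta}$ collects the C real diagonal entries of the Hermitian matrix $\mathbf{C}_\gamma$ together with the real and imaginary parts of its $C(C-1)/2$ strictly lower triangular entries (an assumption on how $\boldsymbol{\eta}$ is stacked), and that no noise parameter is counted. Then

$$
\dim(\boldsymbol{\theta}_{\mathrm{dml}}) = C + 2N_r I C, \qquad
\dim(\boldsymbol{\theta}_{\mathrm{sml}}) = C + \underbrace{C + 2\cdot\tfrac{C(C-1)}{2}}_{\dim(\boldsymbol{\eta}) \,=\, C^2} = C + C^2,
$$
$$
\dim(\boldsymbol{\theta}_{\mathrm{sml}}) < \dim(\boldsymbol{\theta}_{\mathrm{dml}}) \iff C^2 < 2N_r I C \iff C < 2N_r I.
$$

For example, with the hypothetical values $N_r = 2$, $I = 10$, and $C = 5$, the deterministic formulation has $5 + 200 = 205$ real unknowns, whereas the stochastic one has $5 + 25 = 30$.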
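Note that treating the parameters as deterministic means using less a priori information, since no information on any underlying distribution is exploited. This proves advantageous in cases where the underlying distribution varies, cannot be specified, or is too complicated. Although the number of unknowns is large, a deterministic modelling framework at the receiver side is the safe choice in that it covers the estimation of a broader range of models. In this contribution, we are primarily interested in estimating the multipath delays; since we model them as static over I consecutive measurements, the number of parameters C in the desired vector $\hat{\boldsymbol{\tau}}$ is the same for the stochastic and the deterministic model, as can be seen from (16) and (27). For the deterministic model, the cost function in (23) is determined by the deterministic maximum likelihood (DML) estimator, and for the stochastic model it is determined by the concentrated form [49] of the stochastic maximum likelihood (SML) estimator [50]:
$$\hat{\boldsymbol{\tau}}_{\mathrm{sml}} = \arg\min_{\tilde{\boldsymbol{\tau}}} \log\det\Big(\mathbf{P}_{\mathbf{G}}(\tilde{\boldsymbol{\tau}})\,\mathbf{C}_{\hat{h}}\,\mathbf{P}_{\mathbf{G}}(\tilde{\boldsymbol{\tau}}) + \hat{\sigma}_w(\tilde{\boldsymbol{\tau}})\,\mathbf{P}^{\perp}_{\mathbf{G}}(\tilde{\boldsymbol{\tau}})\Big),$$
with $\mathbf{P}_{\mathbf{G}} = \mathbf{G}\mathbf{G}^\dagger$, $\mathbf{P}^{\perp}_{\mathbf{G}} = \mathbf{I} - \mathbf{P}_{\mathbf{G}}$, and $\hat{\sigma}_w(\tilde{\boldsymbol{\tau}}) = \mathrm{tr}\big(\mathbf{P}^{\perp}_{\mathbf{G}}(\tilde{\boldsymbol{\tau}})\,\mathbf{C}_{\hat{h}}\big)/(L+1-C)$.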
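The concentrated SML criterion can be evaluated much like the DML cost. The Python sketch below assumes g(t) = sinc(t) with T = 1, forms the sample correlation matrix from the columns of the channel estimates, and uses the standard concentrated form with the noise level estimated as tr(P_perp C)/(L+1-C); the function names and simulation values are hypothetical, and the sketch only compares the cost at two candidate delays rather than running a full search.

```python
import numpy as np


def pulse_matrix(tau, L):
    """G(tau) with entries g(l - tau_c), assuming g(t) = sinc(t) and T = 1."""
    return np.sinc(np.arange(L + 1)[:, None] - np.asarray(tau, dtype=float)[None, :])


def sml_cost(tau, H_hat):
    """Concentrated SML cost log det(P C P + sigma_hat * P_perp) for hypothetical delays tau."""
    M, n_snap = H_hat.shape                        # M = L + 1 taps, n_snap snapshots
    G = pulse_matrix(tau, M - 1)
    P = G @ np.linalg.pinv(G)                      # projector onto the span of G
    P_perp = np.eye(M) - P
    C_h = H_hat @ H_hat.conj().T / n_snap          # sample correlation matrix
    sigma_hat = np.real(np.trace(P_perp @ C_h)) / (M - len(tau))
    _, logdet = np.linalg.slogdet(P @ C_h @ P + sigma_hat * P_perp)
    return logdet


rng = np.random.default_rng(3)
L, n_snap = 7, 200
tau_true = [2.4]
Gamma = (rng.standard_normal((1, n_snap)) + 1j * rng.standard_normal((1, n_snap))) / np.sqrt(2)
W = 0.03 * (rng.standard_normal((L + 1, n_snap)) + 1j * rng.standard_normal((L + 1, n_snap))) / np.sqrt(2)
H_hat = pulse_matrix(tau_true, L) @ Gamma + W      # channel estimates with estimation error W
cost_true, cost_wrong = sml_cost(tau_true, H_hat), sml_cost([2.9], H_hat)
```

A wrong delay leaks signal energy into the orthogonal complement, inflating the noise estimate and hence the log-determinant, so the cost at the true delay is lower.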

3) DISCUSSION ON A SUBOPTIMAL FREQUENCY-DOMAIN BASED CLOSED-FORM SOLUTION
Alternative approaches can also be applied to the underlying estimation problem; we therefore briefly discuss them here.
The first alternative approach to parameter estimation examines the channel coefficients in the frequency domain, which yields two advantages. First, the frequency-domain channel coefficients can simply be divided by the DFT of the sampled $g(\tau)$ to obtain a deconvolved signal. Second, the deconvolved frequency-domain signal exhibits a rotational invariance among its subspaces. This invariance can be exploited to find a set of linear equations, which in turn yield a closed-form, search-free solution for the delay estimates, called estimation of signal parameters via rotational invariance techniques (ESPRIT) [24], [25]. A further advantage of the algorithm is that no initial guess is required for optimization. Therefore, it is commonly regarded as a practical method for the initial delay acquisition phase before tracking is initiated. Moreover, this method is especially practical for signal models employing orthogonal frequency division multiplexing (OFDM), as in [34], since such signal models are inherently tailored to frequency-domain processing. Hence, the ESPRIT algorithm is, in principle, desirable for delay estimation in a JCAP framework.
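The rotational-invariance idea behind ESPRIT can be sketched in a few lines. The Python example below is a least-squares ESPRIT operating on hypothetical, already deconvolved and alias-free frequency-domain samples $H_k = \sum_c \gamma_c e^{-j2\pi k \Delta f \tau_c}$, with several snapshots as columns; it therefore deliberately ignores the oversampling and aliasing issues discussed next. The function name, the subcarrier spacing, and the delays are assumptions for illustration.

```python
import numpy as np


def esprit_delays(Hf, C, delta_f):
    """Least-squares ESPRIT delay estimates from frequency samples Hf (K x snapshots)."""
    U, _, _ = np.linalg.svd(Hf, full_matrices=False)
    Us = U[:, :C]                                   # estimated signal subspace
    # Shift invariance: Us[1:] = Us[:-1] @ Phi, with eig(Phi) = exp(-j 2 pi delta_f tau_c)
    Phi = np.linalg.pinv(Us[:-1]) @ Us[1:]
    z = np.linalg.eigvals(Phi)
    tau = -np.angle(z) / (2 * np.pi * delta_f)
    return np.sort(np.mod(tau, 1.0 / delta_f))      # map into the unambiguous range [0, 1/delta_f)


rng = np.random.default_rng(4)
K, n_snap = 32, 4
delta_f = 1.0 / K                                   # subcarrier spacing, delays in units of T
tau_true = np.array([1.3, 2.9])
k = np.arange(K)[:, None]
A = np.exp(-2j * np.pi * k * delta_f * tau_true[None, :])   # K x C matrix of exponentials
Gamma = rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))
tau_hat = esprit_delays(A @ Gamma, C=2, delta_f=delta_f)
```

No initial guess or search is involved: the delays follow directly from the eigenvalues of the small C x C matrix Phi, which is why ESPRIT suits initial acquisition.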
However, as explained in detail in [27], applying ESPRIT for delay estimation to the underlying signal model requires oversampling. Oversampling yields a better model approximation and can therefore reduce the systematic delay estimation errors caused by the model mismatch that is present whenever the delays are not integer multiples of 1/J, where J is the oversampling factor (which, naturally, they generally are not). Moreover, the Fourier-transformed data model requires a discrete Fourier transform of the sampled version of $g(\tau)$, which introduces aliasing and leads to a delay estimation bias. In contrast to the optimal estimator in (23), the ESPRIT-based estimator is a suboptimal approach. Although ESPRIT is a practical delay estimation tool for joint communication and positioning and is especially suitable in many scenarios, it is less suitable than the maximum likelihood estimator in (23) for assessing the simultaneous model selection and parameter estimation methods proposed in the following sections. For the proposed signal model, we showed in previous works that, as expected, the maximum likelihood estimator approaches the optimal performance given by the Cramer-Rao lower bound (CRLB), at least when the model order is known. ESPRIT, in contrast, is known to yield suboptimal results and hence cannot tightly approach the CRLB; the resulting error depends on the chosen parameters, in particular on the oversampling factor of the signal model. In the numerical results, we will see that the combination of ML-based model selection and parameter estimation also yields suboptimal results.
Knowingly combining different error sources by employing ESPRIT together with model selection would shift the focus of our contribution from investigating how close to optimal this combination can practically get to quantifying the overall performance degradation incurred by combining model selection with different optimal and suboptimal estimation techniques. The second reason for not further investigating ESPRIT in this contribution is its requirement of oversampling in the signal model. For the proposed joint communication and positioning system, the effect of oversampling was studied in depth in [51], [52], where the authors concluded that the achievable gain for the positioning side of the system is most pronounced when increasing from no oversampling, J = 1, to J = 2. On the other hand, for the communication side of the system, J = 1 already yields sufficient statistics; oversampling therefore entails a large complexity increase without improving the bit error rate performance. Consequently, where possible, we prefer methods that do not require oversampling in the signal model.
Another frequency-domain option for determining the delays is multiple signal classification (MUSIC), originally proposed in [29]. As for ESPRIT, the associated delay estimation problem relies on a similar signal model and hence also on oversampling, as can be seen from the data model provided in [30].

4) DISCUSSION ON COMPRESSIVE SENSING APPROACH
To exploit the benefits of compressive sensing for recovering the physical path parameters, the authors of [57] again formulate the problem in the frequency domain. They assess a joint communication and radar sensing framework that uses compressive sensing to determine the physical path parameters and combine this approach with special signal processing techniques, which they refer to as signal stripping and clutter reduction. With signal stripping, the authors of [57], [58] obtain a problem formulation that avoids additional errors caused by the communication data. Such techniques aim at a high physical path parameter estimation accuracy, which is crucial for all joint communication and positioning designs, since the position estimates degrade when inaccurate physical path parameter estimates feed the positioning algorithms. We explained in Section II-B that semi-blind channel estimation techniques can potentially perform better because they exploit a higher virtual training percentage of the overall transmit signal energy. The authors of [47], [57] use an on-grid delay model to exploit a one-dimensional compressive sensing strategy. In this contribution, we instead target an exact delay model to avoid even slight modelling mismatches and thereby simplify the discussion of the numerical results that we provide for simultaneous parameter estimation and model selection. Simultaneous estimation of the model order and the parameters inherently entails model mismatches whenever the hypothetical model order differs from the correct one; additional model mismatches would unnecessarily complicate the discussion and interpretation.

5) OPEN QUESTIONS FOR COMBINING MODEL SELECTION AND PARAMETER ESTIMATION
Once the delays have been estimated, the complex path weights can be determined in a next step via (21). More importantly, so far we have used C in (3), and hence in (23), as if it were known at the receiver. Unfortunately, this is usually not the case: the model order C has to be estimated.
We therefore investigate flexible methods to determine the optimal model order based on the instantaneous channel conditions and measurements. This raises further questions: Which model order leads to the best parameter estimation performance? Which model order estimation method is best suited for the particular purpose of joint communication and positioning? More specifically, how should the model order be selected to yield optimal estimation performance for the positioning-relevant parameter, the line-of-sight delay?
For parameter estimation, the common optimality benchmark is the well-known CRLB. Unfortunately, as we will show, this bound fails as a practical optimality criterion for joint model selection and parameter estimation. It is therefore helpful to formulate an alternative lower bound that indicates the best achievable performance, shows whether the proposed methods approach it, and reveals which joint model selection and estimation method comes closest to it.
Furthermore, it is of interest whether the model order should be estimated separately from, or jointly with, the parameters.
There is another reason to take a closer look at model order detection for this framework: the most popular and well-known model order selection strategies, which are based on information-theoretic criteria, require the underlying signal to fulfil certain conditions. Hence, their applicability to this framework has to be verified.

III. MODEL SELECTION AND PARAMETER ESTIMATION
Now let us assume that the model order C is unknown and hence has to be estimated. Consequently, we equip (15) with a hypothetical model order $\tilde{C}$, which yields the ML cost function for this hypothetical order. The delay estimate for $\tilde{C}$ is then the vector $\hat{\boldsymbol{\tau}}_{\tilde{C}}$ that minimizes this cost function. Whereas the minimized cost decreases monotonically with increasing hypothetical model order $\tilde{C}$, the delay estimation error increases monotonically with $\tilde{C}$. To find the optimal $\hat{\boldsymbol{\tau}}_{\hat{C}}$, we therefore seek a method that balances minimizing the lack of fit (29) against minimizing the estimation error.
There are threshold-based model selection strategies as well as strategies based on information-theoretic criteria. The main drawback of threshold-based strategies is that a tuning parameter has to be introduced and adjusted. The drawbacks of methods based on information-theoretic principles, on the other hand, are often twofold: firstly, the underlying measurements and the model have to fulfil special requirements, e.g., the measurements should be independent and identically distributed (iid). Secondly, their derivation inherently requires a large number of available samples, which disqualifies them in the single-snapshot measurement scenario. In the following, we first briefly introduce a threshold-based method applicable to this system framework, and afterwards we briefly review the most popular information-theoretic criteria and discuss their suitability for this framework. Finally, we propose and assess a method that uses the Fisher information of the parameter estimates.

A. THRESHOLD AND LEAST-SQUARES ERROR DISTRIBUTION BASED METHOD
The method described in the following is applicable both if multiple channel estimates are available and if only a single (snapshot) channel estimate is available. Let $\chi^2_{\mu}$ denote the so-called $\chi^2$-distribution [53] with $\mu$ degrees of freedom. Since the channel estimation error is Gaussian distributed, the least-squares error $d_{\mathrm{ML}}$, normalized by the channel estimation error variance, follows a $\chi^2_{\mu}$-distribution. The number of degrees of freedom $\mu$ is equal to the number of all (independent and dependent) real-valued measurements minus the number of real-valued unknowns, i.e. the dimension of the parameter estimation problem, which is determined by the number of multipath components. At this point, we are considering the deterministic maximum-likelihood scenario. The degrees of freedom are then calculated as

$$\mu = 2(L+1)N_r I - \left(2N_r I C + C\right).$$

By multiplying with the factor two, we take into account that the $(L+1)N_r I$ measurements as well as the $N_r I C$ amplitudes among the unknowns in the parameter vector in (16) are complex valued. Note that in the single measurement case, $\mu = 2(L+1) - 3C$. The estimator for a hypothetical model order $\tilde{C}$ follows from (23) by replacing $C$ with $\tilde{C}$; the same replacement in $\mu$ yields the degrees of freedom of the corresponding least-squares error $d_{\mathrm{ML}}(\tilde{C})$. The idea is then to choose an order-specific threshold $\gamma_{\tilde{C}}$ based on a tuning parameter, which for this method is a confidence level $\alpha$, typically chosen close to 1, such that $P\big(d_{\mathrm{ML}}(\tilde{C}) \le \gamma_{\tilde{C}}\big) = \alpha$, as in [20]. Let the maximum hypothetical model order be denoted by $C_{\max}$. The thresholds $\gamma_{\tilde{C}}$ for $\tilde{C} \in \{1, \dots, C_{\max}\}$ are then found as the $\alpha$-quantiles of the $\chi^2$-distributions with the corresponding order-dependent degrees of freedom. Note that the maximum number of theoretically identifiable paths can be determined by reasonably demanding that $\mu$ exceeds zero, i.e. $\mu > 0$. Then

$$\tilde{C} < \frac{2N_r I (L+1)}{2N_r I + 1}. \qquad (36)$$

Considering realistic channel models applicable to this problem, like for instance the WINNER models [54], the actual number of clusters typically ranges over $C \in \{8, \dots, 20\}$, yielding a total number of $3C \in \{24, 27, \dots, 60\}$ parameters that have to be estimated from $L+1$ observations. If, for instance, $L = 9$, we would have 10 observations.
According to (36), the maximum number of theoretically identifiable multipath components would then be six. Hence, a receiver-side modelling error would be unavoidable.
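As a minimal sketch of the $\chi^2$-based selection described above, the following Python code computes the degrees of freedom $\mu$, the maximum identifiable order from (36), and the order-specific thresholds as $\alpha$-quantiles. It assumes that the caller supplies, for each hypothetical order $\tilde{C}$, the least-squares error already normalized so that it is $\chi^2_{\mu}$ distributed under the correct model, and it returns the smallest $\tilde{C}$ whose error stays below its threshold. This selection rule, the function names, and the stdlib-only quantile routine (a series expansion plus bisection instead of a statistics library) are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical helper names; illustrative sketch of chi^2 threshold-based order selection.


def dof(L, C_tilde, N_r=1, I=1):
    """mu = 2(L+1) N_r I - (2 N_r I C~ + C~): real measurements minus real unknowns."""
    return 2 * (L + 1) * N_r * I - (2 * N_r * I * C_tilde + C_tilde)


def max_identifiable_order(L, N_r=1, I=1):
    """Largest C~ with mu > 0, i.e. C~ < 2 N_r I (L+1) / (2 N_r I + 1), cf. (36)."""
    c = 0
    while dof(L, c + 1, N_r, I) > 0:
        c += 1
    return c


def chi2_cdf(x, mu):
    """CDF of chi^2_mu via the power series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * mu, 0.5 * x
    term = math.exp(-z + a * math.log(z) - math.lgamma(a + 1.0))  # n = 0 term
    total, n = term, 1
    while term > 1e-16 * total:
        term *= z / (a + n)  # z^(a+n) e^(-z) / Gamma(a+n+1)
        total += term
        n += 1
    return min(total, 1.0)


def chi2_quantile(alpha, mu):
    """Threshold gamma with P(chi^2_mu <= gamma) = alpha, found by bisection."""
    lo, hi = 0.0, mu + 20.0 * math.sqrt(2.0 * mu) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, mu) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def select_order_chi2(normalized_ls_errors, L, alpha=0.99, N_r=1, I=1):
    """Return the smallest C~ whose normalized LS error is below its threshold.

    normalized_ls_errors[c - 1] is assumed to be the LS error of the ML fit for
    hypothetical order c, normalized to be chi^2_mu distributed under the correct model.
    Falls back to the largest admissible order if no order passes the test.
    """
    c_max = min(len(normalized_ls_errors), max_identifiable_order(L, N_r, I))
    for c in range(1, c_max + 1):
        if normalized_ls_errors[c - 1] <= chi2_quantile(alpha, dof(L, c, N_r, I)):
            return c
    return c_max
```

For $L = 9$ and a single snapshot, `max_identifiable_order(9)` evaluates to 6, which matches the statement above; with $N_r I = 100$ snapshots the same function allows up to nine paths.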

B. CLASSICAL INFORMATION-THEORETIC CRITERIA
The following methods require the availability of multiple channel estimates. In general, information-theoretic criteria consist of two additive terms: the negative log-likelihood $-\ln L(\hat{\mathbf{h}}\,|\,\hat{\boldsymbol{\theta}}_{\tilde{C}})$ (or a multiple thereof) and a penalty term $P(\tilde{C})$, which penalizes complexity according to the respective information-theoretic paradigm. Joint parameter estimation and model selection can then be performed as follows: first, the optimal parameter vector $\hat{\boldsymbol{\theta}}_{\tilde{C}}$ is found for every $\tilde{C}$, and then the optimal model order is chosen from the set of optimal parameter vectors $\hat{\boldsymbol{\theta}}_1, \dots, \hat{\boldsymbol{\theta}}_{C_{\max}}$, i.e.

$$\hat{C} = \arg\min_{\tilde{C} \in \{1, \dots, C_{\max}\}} \left\{ -\ln L(\hat{\mathbf{h}}\,|\,\hat{\boldsymbol{\theta}}_{\tilde{C}}) + P(\tilde{C}) \right\}, \qquad (37)$$

where $C_{\max}$ denotes the maximum hypothetical model order. Due to the opposing monotonicity of the negative log-likelihood and the penalty, we can compare the hypothetical orders $\tilde{C}$ sequentially and stop once a minimizing model order has been found. The minimum description length (MDL) criterion [10] is known to be asymptotically consistent. Consequently, overfitting, as it occurs with the less suitable Akaike information criterion (AIC) [9], [55], is not an issue. The MDL criterion originates from coding theory, targets the model providing the shortest description of the measurements, and is known to perform as well as the Bayesian information criterion (BIC) [11]. The MDL penalty depends on the number of free parameters $k$ (the dimensionality of $\hat{\boldsymbol{\theta}}_{\tilde{C}}$) and on the number of independent measurements $N$:

$$P(\tilde{C}) = \frac{k}{2}\ln N.$$

Unfortunately, information-theoretic criteria like the MDL rely on the specific assumption that the number of parameters does not grow linearly with the number of measurements [37]. This rules out a deterministic description of the parameter estimation problem. This restriction, together with the need for multiple measurements in the first place, is a disadvantage compared to the threshold-based $\chi^2$-method introduced earlier.
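The generic criterion with the MDL penalty $(k/2)\ln N$ and the sequential search over $\tilde{C}$ can be sketched as follows. The inputs (negative log-likelihoods of the already optimized fits and a function returning $k$ for each order) and the function name are hypothetical. The early stop mirrors the sequential comparison described above and presumes that the criterion has a single minimum over $\tilde{C}$; the `early_stop=False` option performs the full search over all orders, which is the safer choice if that presumption is in doubt.

```python
import math

# Hypothetical helper name; illustrative sketch of MDL-based order selection.


def mdl_select(neg_log_likelihoods, num_params, N, early_stop=True):
    """Minimize -ln L(h | theta_hat_c) + (k/2) ln N over hypothetical orders c.

    neg_log_likelihoods[c - 1]: negative log-likelihood of the ML fit for order c.
    num_params(c): number of free parameters k for order c.
    N: number of independent measurements.
    Returns (selected order, criterion value).
    """
    best_c, best_val = None, math.inf
    for c, nll in enumerate(neg_log_likelihoods, start=1):
        val = nll + 0.5 * num_params(c) * math.log(N)
        if val < best_val:
            best_c, best_val = c, val
        elif early_stop:
            break  # criterion increased: keep the previous minimizer
    return best_c, best_val
```

With `neg_log_likelihoods = [100, 60, 58, 20]`, `k = 3c` and `N = 100`, the early-stopping search returns order 2, whereas the full search returns order 4, which illustrates why the sequential shortcut depends on a single minimum.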
If we assume that the complex path amplitudes fade over time such that they are iid complex Gaussian random variables, the stochastic system description in (26) and the parametrization in (27) apply. In [12], the authors propose a model selection solution that avoids performing the multi-dimensional global optimization $C_{\max}$ times by choosing a different but equivalent problem formulation and parametrization based on the underlying eigendecomposition. The approach exploits the fact that the maximum-likelihood estimates of this alternative parametrization can be calculated in closed form. Consequently, the model order can first be determined in closed form, and afterwards the parameters can be estimated for the estimated model order. Let $\lambda_0 \ge \lambda_1 \ge \dots \ge \lambda_L$ denote the eigenvalues of the covariance matrix $\mathbf{C}_{\hat{\mathbf{h}}}$, sorted in descending order. Following the derivations in [12], the model order estimate can then be computed in closed form from these eigenvalues. If we use this method although the complex amplitudes are not iid complex Gaussian random variables, we accept a model mismatch whose impact on the overall performance cannot readily be quantified. Another approach, presented in [13], is based on a subspace decomposition and was designed to cover both coherent and incoherent scenarios.
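For the eigenvalue-based approach of [12], a sketch of the classical Wax–Kailath MDL estimator for complex data is given below. We assume that this is the form the derivation in [12] leads to; the mapping of the covariance dimension $p$ to $L+1$ and of $N$ to the number of channel snapshots used for the sample covariance is an assumption as well, and the function name is hypothetical. For each candidate order $k$, the first term compares the geometric and arithmetic means of the $p-k$ smallest eigenvalues (which coincide if these are all noise eigenvalues), and the second term is the MDL penalty.

```python
import math

# Hypothetical helper name; sketch of the classical Wax-Kailath MDL order estimator.


def wax_kailath_mdl(eigenvalues, N):
    """Closed-form MDL order estimate from positive sample-covariance eigenvalues.

    Returns the k in {0, ..., p-1} minimizing
        -N (p-k) ln(geometric mean / arithmetic mean of the p-k smallest eigenvalues)
        + 0.5 k (2p - k) ln N.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    best_k, best_val = 0, math.inf
    for k in range(p):
        tail = lam[k:]  # presumed noise eigenvalues for order k
        m = len(tail)
        log_geo = sum(math.log(x) for x in tail) / m
        log_arith = math.log(sum(tail) / m)
        val = -N * m * (log_geo - log_arith) + 0.5 * k * (2 * p - k) * math.log(N)
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```

For instance, the eigenvalues `[10, 8, 1, 1, 1]` estimated from `N = 100` snapshots yield the order estimate 2.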

C. THE INFORMATION COMPLEXITY CRITERION
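The information complexity criterion (ICOMP) [14] promises to determine the model order more judiciously than the previously discussed criteria, since it uses the inverse of the estimated Fisher information matrix. It is designed to choose the model order yielding the lowest estimation error, not necessarily the true model order. For joint communication and positioning, this behaviour is desirable, since the goal is a low TOA estimation error. The ICOMP can also be constructed via (37) and consists of three additive terms. The first term involves the likelihood and therefore measures the lack of fit, whereas the second and third terms are penalty terms employing the inverse Fisher information matrix $\mathbf{F}^{-1}_{\hat{\boldsymbol{\theta}}_{\tilde{C}}}$, i.e. the estimated optimal parameter covariance matrix, which is approximated using the Jacobian matrix $\mathbf{J}_{\hat{\boldsymbol{\theta}}_{\tilde{C}}}$ of the signal model. The information complexity criterion as applied here then reads

$$\mathrm{ICOMP}(\tilde{C}) = -2\ln L(\hat{\mathbf{h}}\,|\,\hat{\boldsymbol{\theta}}_{\tilde{C}}) + k\ln\!\left(\frac{\operatorname{tr}\big(\mathbf{F}^{-1}_{\hat{\boldsymbol{\theta}}_{\tilde{C}}}\big)}{k}\right) - \ln\det\big(\mathbf{F}^{-1}_{\hat{\boldsymbol{\theta}}_{\tilde{C}}}\big). \qquad (43)$$

With $k = 2N_r I\tilde{C} + \tilde{C}$ and the eigenvalues $\lambda_1, \dots, \lambda_k$ of $\mathbf{F}^{-1}_{\hat{\boldsymbol{\theta}}_{\tilde{C}}}$, (43) can be written compactly as

$$\mathrm{ICOMP}(\tilde{C}) = -2\ln L(\hat{\mathbf{h}}\,|\,\hat{\boldsymbol{\theta}}_{\tilde{C}}) + k\ln\!\left(\frac{\frac{1}{k}\sum_{i=1}^{k}\lambda_i}{\big(\prod_{i=1}^{k}\lambda_i\big)^{1/k}}\right).$$

In (43), the first of the two penalty terms is interpreted as a lack-of-parsimony term and the second as a profusion-of-complexity term, which accounts for the interdependencies of the parameter estimates.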
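The two ICOMP penalty terms can be evaluated without an explicit eigendecomposition, because $k\ln(\operatorname{tr}(\mathbf{F}^{-1})/k) - \ln\det(\mathbf{F}^{-1})$ equals $k$ times the log-ratio of the arithmetic to the geometric mean of the eigenvalues. The following sketch assumes the standard ICOMP(IFIM) form of Bozdogan for (43) and takes the estimated inverse Fisher information matrix as a plain nested list; the log-determinant is computed via a Cholesky factorization. The function names are hypothetical.

```python
import math

# Hypothetical helper names; sketch of the ICOMP criterion for one hypothetical order.


def cholesky_logdet(A):
    """ln det of a symmetric positive-definite matrix via its Cholesky factor."""
    n = len(A)
    Lc = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(Lc[i][m] * Lc[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                Lc[i][i] = math.sqrt(s)
            else:
                Lc[i][j] = s / Lc[j][j]
    return 2.0 * sum(math.log(Lc[i][i]) for i in range(n))


def icomp(neg2_log_likelihood, cov):
    """-2 ln L + k ln(tr(F^-1)/k) - ln det(F^-1), with cov = F^-1 (k x k)."""
    k = len(cov)
    trace = sum(cov[i][i] for i in range(k))
    lack_of_parsimony = k * math.log(trace / k)
    profusion_of_complexity = -cholesky_logdet(cov)
    return neg2_log_likelihood + lack_of_parsimony + profusion_of_complexity
```

In a selection loop, the criterion would be evaluated for every $\tilde{C} \in \{1, \dots, C_{\max}\}$ with the corresponding fit and inverse Fisher information matrix, and the minimizing order would be chosen. For `cov = [[2, 1], [1, 2]]` (eigenvalues 3 and 1), the penalty equals $2\ln(2/\sqrt{3})$, as the eigenvalue form predicts.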

IV. LOWER PERFORMANCE BOUNDS
When a maximum-likelihood estimator is involved, the well-known Cramer-Rao lower bound (CRLB) is usually determined as a lower bound for the MSEs. The CRLB is, however, an asymptotic benchmark: the ML estimator is only guaranteed to attain it if three conditions are fulfilled. Firstly, the underlying model has to be specified correctly. Secondly, the SNR has to be sufficiently high. Thirdly, the number of observations has to be sufficiently large. Here, we cannot guarantee that even one of these conditions is fulfilled. Hence, the CRLB alone does not tell us what we are actually interested in, namely which model-order-dependent TOA MSE can optimally be attained when employing trustworthy estimates $\hat{\tau}_{1,\tilde{C}}$ for all $\tilde{C} \in \{1, \dots, C_{\max}\}$. Before we formulate the optimal attainable model order and the related squared error (SE) of the TOA, we briefly introduce the actual CRLB, since we will additionally use it to explain the numerical results. The CRLB matrix for an arbitrary parameter vector $\boldsymbol{\theta}$ is the inverse Fisher information matrix; hence, the CRLB depends on the underlying parametrization. Note that the channel estimation error variance $\sigma^2_w$ can be substituted either by the single-entry channel estimation CRLB or by a noise estimate that extends the parameter vector by another element. For the DML parametrization, the CRLB matrix can asymptotically be formulated as a block matrix. In the single measurement case, the asymptotic result in (47) has to be modified accordingly. Due to space limitations, we refer to [56] for the stochastic maximum-likelihood CRLB.
Experience shows that, if the correct model order is known, the MSE attains the CRLB at least for low orders and under asymptotic conditions (high SNR and many samples). Note that if we assumed that choosing the correct model order leads to the best TOA MSE performance, we would mistakenly regard the CRLB for the correct model order as the best achievable TOA MSE, at least at sufficiently high SNR. Moreover, the CRLB calculation requires the correct model and hence the correct model order.
To predict the best achievable performance, assuming that the estimates in $\hat{\boldsymbol{\tau}}_{\tilde{C}}$ are reliable and close to optimal, we formulate a focused order-related lower bound (FORLB), which depends on the focus parameter of interest $\tau_{1,C}$ and on its order-dependent estimate $\hat{\tau}_{1,\tilde{C}}$:

$$\hat{C}_{\mathrm{opt}} = \arg\min_{\tilde{C} \in \{1, \dots, C_{\max}\}} \left(\hat{\tau}_{1,\tilde{C}} - \tau_{1,C}\right)^2,$$

and the FORLB is the corresponding squared TOA error $\big(\hat{\tau}_{1,\hat{C}_{\mathrm{opt}}} - \tau_{1,C}\big)^2$. In our application, estimating a reduced model order occasionally yields improved parameter estimates. In such cases, the FORLB indicates the optimal achievable performance. The FORLB requires parameter estimates that we trust to be optimal, or at least close to optimal, for the different hypothetical model orders. Therefore, both bounds should be seen as aids to optimize estimation and detection algorithms based on specific models and available information, not as the absolute lowest achievable bounds, since changing the estimator, the modelling, or the a priori knowledge always changes the bounds as well. Any a priori knowledge could, for instance, introduce a beneficial estimation bias that lowers the MSE below the CRLB. Here, we assume no a priori knowledge apart from bounds on the hypothetical delays, which are obtained from a coarse estimation step carried out beforehand with an accuracy within one sampling period T.
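Per channel realization, the FORLB picks the hypothetical order whose LOS delay estimate is closest to the true LOS delay, which is only possible in simulations where $\tau_{1,C}$ is known. A minimal sketch, with a hypothetical function name and a dictionary of per-order estimates as input:

```python
# Hypothetical helper name; sketch of the FORLB for a single channel realization.


def forlb(tau_hat_by_order, tau_true):
    """Return (C_opt, squared TOA error) for one realization.

    tau_hat_by_order: {C~: LOS delay estimate tau_hat_{1,C~}} for all hypothetical orders.
    tau_true: actual LOS delay tau_{1,C} (known in simulation only).
    """
    c_opt = min(tau_hat_by_order, key=lambda c: (tau_hat_by_order[c] - tau_true) ** 2)
    return c_opt, (tau_hat_by_order[c_opt] - tau_true) ** 2
```

Averaging the returned squared errors over many realizations would give an MSE-type FORLB curve, and the collected values of `c_opt` would presumably correspond to the FORLB distribution of the selected model order.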

V. NUMERICAL RESULTS
The final goal is to assess the performance in realistic multipath scenarios in the presence of a LOS component. Therefore, in this contribution, we use a clustered delay line model based on the slightly modified WINNER B1-LOS (outdoor) scenario provided in [54]. The Rician factor is R = 3.3. The main values for the excess delays and cluster powers are listed in Table 1; they are only modified by merging the three parts of the second cluster into a single delay. We know that by using semi-blind channel estimation with a small percentage of pilots and a large percentage of data, we can, at least asymptotically, attain a channel estimation MSE that converges to the performance that would be reached by transmitting only pilots and no data. Hence, in this contribution, we ignore the performance degradation due to semi-blind channel estimation and assume a virtual training matrix that consists of training data only. Further, we assume that a time series of I = 100 blocks, each consisting of K = 1000 random BPSK pilot symbols, is transmitted. Each symbol has a duration of T = 100 ns. To build the EDTCM channel taps, we employed T-spaced samples of a raised-cosine function g(τ) with a roll-off factor of 0.3, windowed with a half window of size 3T. Unless stated otherwise, we assume $N_r = 1$. We model the channel as a clustered delay line model and follow [54] to construct the complex path weights. The carrier frequency is set to 2 GHz and the receiver velocity to 50 km/h. For the $\chi^2$-based and the ICOMP method, we use the DML parametrization, and for the MDL results, we use the SML eigen-parametrization. To evaluate the performance of joint model selection and parameter estimation, we compare the $\chi^2$-based method, the MDL-based method, and the ICOMP-based method.

FIGURE 2. Counterintuitively, for the optimal condition, i.e. the FORLB, the probability of estimating the model order incorrectly, $P(\hat{C} \neq C)$, does not decrease faster than for the estimators for a multipath channel with C = 4 in the high signal-to-noise ratio region.

The comparison with $\hat{C} = C$, i.e. with using the correct model order, emphasizes that estimating the model order is beneficial. As lower performance bounds, we employ both the order-specific CRLBs and the FORLB. Fig. 2 shows the probability of false detection for a channel with C = 4, and Fig. 3 shows the TOA MSEs of the different algorithms, for $\hat{C} = C$, and the FORLB. Note that we calculated all TOA MSEs using only the errors below the 98th percentile in order to exclude severe outliers. For the probability of correct detection, two different choices of α close to 1 yield similar performance. The probability of false detection decreases later, i.e. at higher SNR, for the ICOMP-based method than for the $\chi^2$-based method. Comparing the MSEs, however, the algorithms that show the worst correct-detection behaviour nevertheless perform better than those that estimate the correct model order at lower SNR. This behaviour can be understood by studying Fig. 4, which depicts the probability of estimating a hypothetical model order $\tilde{C}$: the methods that achieve a lower TOA MSE in Fig. 3 tend to estimate a lower model order. Note that the FORLB indicates that the distribution of the hypothetical model order should generally be wider. Further, if the probability of estimating a specific hypothetical model order is very high at a specific SNR, the TOA MSE approaches the CRLB of $\tau_1$ for that model order. The gap between the FORLB of $\tau_1$ and the MSEs can be explained by the fact that the FORLB is constructed by focusing on the TOA, whereas the algorithms are designed to yield a compromise for all components.
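The T-spaced raised-cosine samples used to build the channel taps in the simulation setup can be sketched as follows, with roll-off 0.3, T = 100 ns, and truncation to |τ| ≤ 3T. The tap model $h_l = \sum_c a_c\, g(lT - \tau_c)$ for $l = 0, \dots, L$ is our assumption of the EDTCM structure (its exact definition appears earlier in the paper), and the function names are hypothetical.

```python
import math

# Hypothetical helper names; sketch of T-spaced, windowed raised-cosine channel taps.


def raised_cosine(t, T=100e-9, beta=0.3, half_window=3):
    """Raised-cosine pulse g(t), truncated to |t| <= half_window * T."""
    x = t / T
    if abs(x) > half_window:
        return 0.0
    if x == 0.0:
        return 1.0
    if abs(abs(2.0 * beta * x) - 1.0) < 1e-12:
        # Removable singularity at |t| = T / (2 beta): limit (pi/4) sinc(1 / (2 beta))
        u = math.pi / (2.0 * beta)
        return (math.pi / 4.0) * math.sin(u) / u
    sinc = math.sin(math.pi * x) / (math.pi * x)
    return sinc * math.cos(math.pi * beta * x) / (1.0 - (2.0 * beta * x) ** 2)


def channel_taps(delays, weights, L, T=100e-9, beta=0.3, half_window=3):
    """Assumed tap model h_l = sum_c a_c g(l T - tau_c) for l = 0, ..., L."""
    return [
        sum(a * raised_cosine(l * T - tau, T, beta, half_window)
            for tau, a in zip(delays, weights))
        for l in range(L + 1)
    ]
```

A single path with zero delay produces one nonzero tap because g(τ) vanishes at nonzero integer multiples of T; delays between sampling instants spread the path energy over several taps, which is what makes the delays identifiable from the taps. Complex path weights can be passed directly as `weights`.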
An optimal solution would minimize the squared TOA bias plus the TOA variance. Unfortunately, the TOA bias is unknown, since it depends on the actual $\tau_1$.

VOLUME 9, 2021

FIGURE 5. The probability distributions of $\hat{C}$ for the different model order selection methods and the FORLB show that the optimal distribution for each model order is wider than the distributions obtained via the algorithms.
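The TOA MSE decomposes into the squared bias plus the variance, and the reported MSEs exclude the largest errors. The sketch below computes these quantities on the retained TOA errors of a simulation in which the true delay is known. The exact trimming rule (keeping the ⌊0.98 n⌋ smallest squared errors) and the function name are assumptions for illustration.

```python
import math

# Hypothetical helper name; sketch of a trimmed TOA MSE and its bias/variance split.


def toa_error_stats(tau_hat, tau_true, keep_fraction=0.98):
    """Return (mse, bias, variance) over the keep_fraction smallest squared errors.

    The identity mse = bias**2 + variance holds for the retained errors.
    """
    errors = sorted((t - tau_true for t in tau_hat), key=lambda e: e * e)
    n_keep = max(1, math.floor(keep_fraction * len(errors) + 1e-9))
    kept = errors[:n_keep]
    mse = sum(e * e for e in kept) / n_keep
    bias = sum(kept) / n_keep
    variance = sum((e - bias) ** 2 for e in kept) / n_keep
    return mse, bias, variance
```

As the text notes, in practice only the combination of bias and variance is observable through such simulations, since the bias itself requires knowledge of the actual $\tau_1$.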
In principle, the same behaviour as in Fig. 4 can also be seen for the realistic scenario with C = 8 in Fig. 5. We depict this realistic scenario and the scenario with C = 4 via squared error distributions instead of MSEs to provide more detailed information. For the realistic scenario, we further see that the TOA SE performance degrades in the very high SNR region (Figs. 6 to 11). In the high SNR region, choosing high model orders close to $C_{\max}$ becomes more probable, and hence the result more often becomes unreliable. Fig. 3 shows the MSEs versus SNR: the MSEs approach different CRLBs in different SNR regions, which is an atypical trend. Figs. 6-8 and Figs. 9-11 provide deeper insight into the squared error distributions of the selection and estimation strategies compared to the squared error distribution of the best solution we can choose, the FORLB. The figures show split violin plots.¹ A violin plot shows the actual distribution for each point, which is reasonable if the data are not normally distributed, as in this case. Hence, we can see that for both C = 4 and C = 8, all three compared algorithms provide close-to-optimal results for the majority of the estimates. The MSE degradation can be explained by a few untrustworthy estimates that constitute a small minority of the overall data, and it is lowest for the Fisher-information-based ICOMP detection method.

¹ A violin plot is similar to the better-known boxplot, but it is more intuitive and informative since it depicts the data distribution. A boxplot, on the other hand, provides insight into only five values that are significant for normally distributed data: the median, the upper and lower quartiles, and the user-specified whiskers, sometimes complemented by outliers.
Especially in the very high SNR region, the probability that a higher model order is the optimal choice increases. We pay for this with lower estimation reliability. Different mechanisms can be used either to detect unreliable behaviour or to enforce a lower MSE by defining additional constraints, for instance, by constraining $\hat{\sigma}^2_{\tau_1}$ to be smaller than a user-defined upper SE limit.
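A minimal sketch of such a constraint is shown below. It assumes that an estimate of $\sigma^2_{\tau_1}$ is available for each hypothetical order (for instance the $\tau_1$ entry of the inverse Fisher information matrix, which is an assumption here) and that the order is otherwise chosen by minimizing a selection criterion such as ICOMP. The function name and the dictionary inputs are hypothetical.

```python
# Hypothetical helper name; sketch of order selection under a TOA variance limit.


def select_with_variance_limit(criterion, toa_variance, var_limit):
    """Pick the order minimizing `criterion` among orders whose estimated
    TOA variance does not exceed var_limit.

    criterion, toa_variance: dicts keyed by hypothetical order C~.
    Returns None if no order satisfies the limit (flagging an unreliable case).
    """
    admissible = [c for c in criterion if toa_variance[c] <= var_limit]
    if not admissible:
        return None
    return min(admissible, key=criterion.get)
```

Returning `None` when no order is admissible is one simple way to detect unreliable behaviour instead of silently accepting a high-order fit.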

VI. CONCLUSION
Combining model selection and parameter estimation in a joint communication and positioning system outperforms parameter estimation approaches that assume either the correct, a very low, or a very high model order. Firstly, the comparison with an assisting focused order-related lower MSE bound, which depends on the correct and the hypothetical model order, shows which model order we should optimally choose for a particular estimator and SNR. Secondly, the comparison shows that employing the parameter- and model-order-dependent inverse Fisher information matrix yields a close-to-optimal approach. The approach even works in inherently problematic and ill-conditioned estimation problems by balancing lack of fit and parsimony in favour of optimal estimation accuracy. Further investigations should find solutions that are even closer to the optimal solution and ideally do not require a successive evaluation of all hypothetical solutions.