Localizing Basestations From End-User Timing Advance Measurements

Although mobile communication has become ubiquitous in our modern society, operators typically treat the underlying network infrastructure in a secretive manner. However, detailed topology information is a key enabler for operator benchmarking and can serve as ground-truth data for system-level simulations or user equipment localization schemes. Still, existing approaches for base station localization, which are based either on received signal strength or on the spatial distribution of measurements, are not accurate enough for such use cases. Accordingly, we propose a localization scheme that operates on end-user measurements of the 4G & 5G timing advance parameter, which acts as a quantized distance measure between the user equipment and the base station. By directly incorporating GPS noise, multipath propagation, and quantization into our stochastic system model, we obtain an estimator that offers a reliable measure of confidence and requires only the configuration of two hyperparameters with a dedicated physical interpretation. We evaluate our approach using a set of drive-test measurements covering 190 LTE eNodeBs with ground-truth locations confirmed by an Austrian mobile network operator. Our selective estimator can either operate without prior knowledge, resulting in mean distance errors below 100 m, or in a classification setup, where it correctly identifies up to 95% of eNodeBs from a set of candidate cell tower locations. To allow for reproducibility, we make our dataset and a reference implementation publicly available.

anticipatory networking schemes [15]. Moreover, we see network layout inference as a crucial stepping stone to fully exploit the potential of digital twins for 5G & 6G networks [16], [17]. With detailed network topology data, researchers can assess potential challenges and risks for novel applications before deployment. UE positioning itself also benefits from improved BS localization: in a bootstrap manner, we can use noisy UE position estimates to determine BS locations, which can in turn serve as a basis for positioning UEs whenever Global Navigation Satellite Systems (GNSSs) are unavailable [18]. Overall, we identify two recent developments that render such detailed inference of the network topology feasible. The first is the widespread availability of the 4G & 5G timing advance (TA) parameter on crowdsourcing platforms, which can be interpreted as a quantized distance measure between the UE and the BS [19]. The second is the public accessibility of cell tower candidate locations in many countries [20], [21], which effectively turns the identification of the correct BS into a classification task. In this work, we introduce a novel scheme for base station localization from end-user TA measurements that takes both of these developments into account.
We summarize our contributions as follows: (i) Our estimation scheme relies on a stochastic system model that is grounded in wireless propagation research and requires only the specification of two hyperparameters with a straightforward physical interpretation. It is also flexible enough to identify the most likely location from a set of cell tower candidates or to evaluate the area of interest without any prior knowledge.
(ii) We design our estimator to support predictions with a reject option. Hence, we abstain from prediction whenever the expected confidence is insufficient for reliable inference.
(iii) Further, we evaluate our approach in an LTE network through an extensive drive-test campaign, consisting of a rural and an urban subset to account for varying propagation environments. An Austrian mobile network operator (MNO) validated the ground-truth locations of the 190 considered eNodeBs.
Due to the more widespread adoption of LTE systems at the time of writing, we focus on the localization of eNodeBs in particular. However, the proposed scheme is directly applicable to 5G. In fact, we can expect a higher precision due to the increased spatial resolution of TA in 5G; see Appendix A for details. To allow for reproducibility, we also make a reference implementation and an anonymized version of our dataset publicly available; the link is provided in footnote 1.

II. RELATED WORK
The idea of using time delay measurements for positioning is not particularly new. In fact, LTE provides a dedicated downlink (DL) signal for UE localization, the positioning reference signal (PRS) [22]. Correspondingly, the UE position can be derived from the time difference of arrival of PRS from multiple BSs via multilateration [23], given that the exact eNodeB locations are known. For BS localization, we do not require signals from different eNodeBs. Instead, we consider a set of measurements from UEs connected to the eNodeB of interest, possibly from multiple UEs obtained via crowdsourcing.
¹ https://squid.nt.tuwien.ac.at/gitlab/leller/ieee_access_enodeb_localization
The most commonly found schemes in the literature can be split into two categories: (i) approaches based on received signal strength (RSS) and (ii) geometric approaches operating on the spatial distribution of measurements [24]. While RSS-based approaches have been successfully applied to related problems such as fingerprinting-based small cell discovery [25], full localization schemes require accurate modelling of the expected path loss over distance [26], [27]. This is especially tedious in urban environments with significant deviations due to shadow fading or blockages. While fusing measurements from multiple sectors can improve the performance, the obtained mean positioning error is still in the range of 900 m [26].
Geometric methods, on the other hand, infer the eNodeB location indirectly from the cell assignment by operating on the spatial distribution of the measurements. While the centroid method used in OpenCellID is a very basic variant of such an approach [2], there exist several extensions, most notably the weighted centroid method, which additionally accounts for the reference signal received power (RSRP) reported at the UE location [28]. Nevertheless, these approaches have been found to be highly unreliable when their results are compared to ground-truth eNodeB locations [28]-[30].
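As a point of reference for these geometric baselines, the following sketch computes a plain centroid and an RSRP-weighted centroid from UE positions in local Cartesian coordinates. The weighting used in [28] may differ; here we assume, for illustration only, that each measurement is weighted by its RSRP converted from dBm to linear power. The function names and the toy data are hypothetical.

```python
import numpy as np


def centroid(positions):
    """Plain centroid: mean of all UE positions (shape (M, 2)) reported for one cell."""
    return np.asarray(positions, dtype=float).mean(axis=0)


def weighted_centroid(positions, rsrp_dbm):
    """RSRP-weighted centroid. Assumption: weights are linear RSRP values (dBm -> mW)."""
    positions = np.asarray(positions, dtype=float)
    weights = 10.0 ** (np.asarray(rsrp_dbm, dtype=float) / 10.0)
    return weights @ positions / weights.sum()


# Hypothetical measurements in meters; the strongest sample pulls the estimate toward it.
pos = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
rsrp = np.array([-80.0, -90.0, -90.0])
print(centroid(pos))                 # approx. [33.33 33.33]
print(weighted_centroid(pos, rsrp))  # approx. [8.33 8.33]
```

Both estimators only use where the measurements were taken, not how far they were from the eNodeB, which is why the asymmetric coverage of a sector biases them.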
To resolve these limitations, recent work in the field has used supervised machine learning techniques to automatically select the most promising algorithm for a given set of input data [29]. Similarly, the authors in [30] utilize unsupervised methods to identify predictive data instances, providing them with a measure of confidence. Another promising approach is to natively account for the sectorization of eNodeBs, as the asymmetric distribution of measurements within a sector causes deviations between the centroid and the BS location. In [31], the authors address this issue and propose a purely geometric approach that infers BS positions by minimizing a cost matrix implied by a sectorization model, thus reducing the mean positioning error to around 500 m.
Although these results are promising, we argue that the now widely available TA parameter can significantly improve the positioning accuracy, as it is, in a sense, the direct equivalent of dedicated PRS-based localization. In fact, using TA for BS localization has already been proposed for GSM networks, where the coarse granularity of TA was identified as the main limitation [32]. With its increased resolution, TA has gained renewed interest in LTE networks, especially for UE localization [33]-[35]. Most inference schemes do, however, use a deterministic system model or treat the overall error term as a single Gaussian [32]. We find this to be a drastic simplification, considering that the statistics of individual error sources, such as multipath propagation, are well understood in mobile communications [36]. Meanwhile, the machine learning approach in [37] treats BS localization as a classification task, with candidates obtained from publicly available cell tower databases. While reformulating the problem as image classification allows for an effective fusion of RSS-, geometric-, and TA-based approaches, it requires a vast amount of training data, in contrast to a carefully designed model-based approach that accounts for propagation conditions. The influence of the wireless channel on localization accuracy has also been studied in [22], where the authors state that actively handling multipath effects is expected to improve PRS-based UE localization. Likewise, a stochastic scheme that can identify and discard non-line-of-sight (NLOS) measurements is proposed in [38]. For their TA-based UE localization scheme, the authors in [18] extend this notion by explicitly accounting for three different error sources that distort the reported travelled distance between UE and BS. As such, they introduce generic Gaussian measurement noise and a dedicated quantization term modeled as a Gaussian or uniform distribution. They further suggest modeling the multipath component via a positively biased random variable.
For our BS localization scheme, we further refine these assumptions as follows: (i) We do not model quantization as a noise term; instead, we treat it as a deterministic operation and incorporate it directly into our system model. (ii) We refrain from including a generic measurement error, as this can result in negative distance values. Instead, we model the distance as Rician distributed, which is equivalent to assuming Gaussian noise on top of the reported UE position estimates. (iii) We utilize an exponential distribution for the additional multipath distance, in accordance with state-of-the-art propagation models.

III. DISTANCE FROM TIMING ADVANCE
TA is an integer $k \in \mathcal{K}$, $\mathcal{K} = \{0, \dots, K-1\}$, that a BS communicates to each served UE in order to compensate for different signal travel times in the uplink (UL). In LTE, the maximum TA value is given by $K - 1 = 1282$ [19, Section 4.2.3]. In a nutshell, TA tells the UE how much earlier it has to transmit its UL frames relative to the arrival time of DL frames, such that the UL frames from all UEs arrive at the BS at approximately the same time. Hence, TA characterizes a quantized round-trip time between the BS and the UE. For more details, see [39, Section 8.1].
In our use case, we are interested in a measure of distance between the BS and the UE. According to [34], we can convert TA into a quantized one-way delay and, assuming the speed of light in vacuum, into the quantized distance $s_k \in \mathcal{S}$, $\mathcal{S} = \{s_0, \dots, s_{K-1}\}$:
$$s_k = k \, d_{TA}, \tag{1}$$
where $d_{TA}$ can be interpreted as the spatial resolution of TA for localization purposes. We further denote the non-quantized travelled distance between the BS and the UE as $s$. Note that $s$ can take any nonnegative value $s \in \mathbb{R}_0^+$. Accordingly, the BS obtains $k$ from $s$ via quantization. In our system model, we assume that the BS quantizes the TA by selecting the $k$ that minimizes the difference between the quantized and non-quantized round-trip time (i.e., nearest-neighbor quantization). In accordance with (1), we define the quantization $q: \mathbb{R}_0^+ \to \mathcal{S}$ in terms of the travelled distance $s$ as
$$q(s) = s_k \quad \text{for } e_k \le s < e_{k+1}, \tag{2}$$
with bin edges $e_0, \dots, e_K$:
$$e_k = \begin{cases} 0, & k = 0,\\ \left(k - \tfrac{1}{2}\right) d_{TA}, & k = 1, \dots, K-1,\\ \infty, & k = K. \end{cases} \tag{3}$$
We stress that this particular implementation of $q(\cdot)$ is not the only possible choice. An alternative could be, for instance, truncation, i.e., the floor operation from [32]. Again, we refer the interested reader to Appendix A for a discussion of the TA granularity in 5G systems, where $d_{TA}$ can be as low as 4.883 m.
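The nearest-neighbor quantization described above can be sketched in a few lines. This is a minimal sketch under two assumptions: the LTE resolution follows from one TA step corresponding to 16 $T_s$ of round-trip time with $T_s = 1/(15\,000 \cdot 2048)$ s, as in the LTE specification, and the speed of light is rounded to $3 \cdot 10^8$ m/s. The function names are hypothetical; this is not the reference implementation.

```python
import numpy as np

C = 3.0e8                      # speed of light in m/s (rounded)
T_S = 1.0 / (15e3 * 2048)      # LTE basic time unit T_s, approx. 32.55 ns
D_TA_LTE = 16 * T_S * C / 2    # one TA step = 16 T_s round trip -> approx. 78.125 m one way
K_LTE = 1283                   # TA values k = 0, ..., 1282


def bin_edges(d_ta, K):
    """Edges e_0, ..., e_K for nearest-neighbor quantization onto s_k = k * d_ta."""
    edges = np.empty(K + 1)
    edges[0] = 0.0
    edges[1:K] = (np.arange(1, K) - 0.5) * d_ta   # midpoints between neighboring s_k
    edges[K] = np.inf                             # the last bin absorbs all larger distances
    return edges


def quantize(s, d_ta, K):
    """Map travelled distances s >= 0 to TA indices k and quantized distances s_k."""
    edges = bin_edges(d_ta, K)
    # side="right" makes each bin left-closed: e_k <= s < e_{k+1}
    k = np.searchsorted(edges, np.asarray(s, dtype=float), side="right") - 1
    k = np.clip(k, 0, K - 1)
    return k, k * d_ta


k, s_k = quantize([0.0, 100.0, 300.0], D_TA_LTE, K_LTE)
print(k)  # [0 1 4]
```

A truncation-based variant, as in [32], would instead use $k = \lfloor s / d_{TA} \rfloor$; only `bin_edges` would change.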

Fig. 2 illustrates our system model assumptions.
Our goal is to estimate the exact BS location $\mathbf{x}_{BS} = (x_{BS,0}, x_{BS,1})^T \in \mathbb{R}^2$ via multilateration from UE positions $\mathbf{x}_{UE}$ and their distances $d$ to the BS.² In practice, we receive only noisy estimates $\hat{\mathbf{x}}_{UE}$ of the UE position via dedicated APIs [40], which requires us to incorporate the expected noise variance into the system model. Further, the distance $d$ is not directly available; we only have access to quantized TA measurements $k$. We also need to consider that the direct line of sight between the BS and UE may be blocked (Fig. 2). The non-quantized distance $s$ that the electromagnetic signal has to travel can thus be larger than the Euclidean distance $d$. Accordingly, the reported TA value is only a distorted measure of the (unknown) true distance $d$ required for correct multilateration.
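To make these assumptions concrete, the following sketch draws synthetic measurements from the generative model outlined above: a direct-path distance $d$, an additional multipath distance drawn from an exponential distribution (as introduced in the multipath subsection below), nearest-neighbor quantization onto multiples of $d_{TA}$, and Gaussian noise on the reported UE position. Such a simulator is handy for sanity-checking an estimator. The function name and the parameter values ($\sigma = 13.25$ m, $\lambda = 0.01$, $d_{TA} = 78.125$ m) are illustrative assumptions.

```python
import numpy as np


def simulate_measurements(x_bs, x_ue, sigma, lam, d_ta, K, rng):
    """Draw synthetic (TA index, reported UE position) pairs under the model assumptions.

    x_bs: (2,) BS position, x_ue: (M, 2) true UE positions, both in meters.
    sigma: per-axis GPS noise std, lam: rate of the exponential multipath excess.
    """
    x_ue = np.atleast_2d(np.asarray(x_ue, dtype=float))
    d = np.linalg.norm(x_ue - np.asarray(x_bs, dtype=float), axis=1)   # direct-path distance
    s = d + rng.exponential(scale=1.0 / lam, size=d.shape)             # travelled distance, s >= d
    k = np.clip(np.floor(s / d_ta + 0.5).astype(int), 0, K - 1)        # nearest s_k = k * d_ta
    x_ue_hat = x_ue + rng.normal(scale=sigma, size=x_ue.shape)         # noisy position report
    return k, x_ue_hat


rng = np.random.default_rng(0)
x_bs = np.array([0.0, 0.0])
x_ue = np.column_stack([np.linspace(100.0, 2000.0, 14), np.zeros(14)])
k, x_ue_hat = simulate_measurements(x_bs, x_ue, sigma=13.25, lam=0.01, d_ta=78.125, K=1283, rng=rng)
```

Setting a very large `lam` removes the multipath excess, which is one way to emulate multipath-free TA values for comparison.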
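For a set of $M$ measurements, with the TA values stacked into the vector $\mathbf{k} = (k^{(1)}, \dots, k^{(M)})^T$ and the noisy UE locations into the matrix $\hat{\mathbf{X}}_{UE} = (\hat{\mathbf{x}}_{UE}^{(1)}, \dots, \hat{\mathbf{x}}_{UE}^{(M)})$, we can formulate the above problem in a Bayesian framework. The estimated base station location $\hat{\mathbf{x}}_{BS}$ shall then maximize the posterior distribution given the measurements:
$$\hat{\mathbf{x}}_{BS} = \arg\max_{\mathbf{x}_{BS}} \, p(\mathbf{x}_{BS} \mid \mathbf{k}; \hat{\mathbf{X}}_{UE}). \tag{4}$$
Following Bayes' rule, this requires us to specify the system model via the likelihood. Assuming independent measurements, we obtain
$$p(\mathbf{k} \mid \mathbf{x}_{BS}; \hat{\mathbf{X}}_{UE}) = \prod_{m=1}^{M} p(k^{(m)} \mid \mathbf{x}_{BS}; \hat{\mathbf{x}}_{UE}^{(m)}). \tag{5}$$
In the following subsections, we derive the sample-wise likelihood $p(k \mid \mathbf{x}_{BS}; \hat{\mathbf{x}}_{UE})$ step by step to incorporate the assumptions presented in Fig. 2. A block diagram highlighting all components of the estimator is provided in Fig. 3. Further, we refer the reader to Appendix VIII for a summary of the notation used.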

A. INCORPORATING GPS NOISE
We first consider the likelihood $f(d \mid \mathbf{x}_{BS}; \hat{\mathbf{x}}_{UE})$ of the true distance $d$ between the BS and the (unknown) UE location $\mathbf{x}_{UE}$. Here, we use the fact that $\hat{\mathbf{x}}_{UE}$ and $\mathbf{x}_{BS}$ enter only via $\hat{d}$, which is given by the Euclidean distance
$$\hat{d} = \|\mathbf{x}_{BS} - \hat{\mathbf{x}}_{UE}\|_2, \tag{6}$$
and further model the GPS error as normally distributed with variance $\sigma^2$:
$$\hat{\mathbf{x}}_{UE} = \mathbf{x}_{UE} + \mathbf{n}, \quad \mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}). \tag{7}$$
Note that the above formulation requires us to first convert the geographic coordinates to Cartesian coordinates via an appropriate transformation.³ It then follows from (6) and (7) that the unknown distance $d$ conditioned on $\hat{d}$ follows a Rice distribution [41]:
$$f(d \mid \hat{d}; \sigma) = \frac{d}{\sigma^2} \exp\!\left(-\frac{d^2 + \hat{d}^2}{2\sigma^2}\right) I_0\!\left(\frac{d\,\hat{d}}{\sigma^2}\right), \quad d \ge 0, \tag{8}$$
with $I_0(\cdot)$ the modified Bessel function of the first kind and order zero. The noise variance $\sigma^2$ acts as a hyperparameter and controls the confidence in the reported position estimates $\hat{\mathbf{x}}_{UE}$. For most measurement platforms, the expected accuracy of the location is provided together with the estimate itself. For instance, measurements based on the Android location API [40] report the horizontal accuracy via $\rho$, the radius of 68% confidence. We refer the interested reader to Appendix VIII for the derivation of the noise standard deviation $\sigma$ corresponding to a particular value of $\rho$.
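A numerically stable evaluation of (8), together with one possible mapping from $\rho$ to $\sigma$, can be sketched as follows. The mapping is our assumption, not necessarily the derivation in the appendix: if the 2D error is isotropic Gaussian, its radius is Rayleigh distributed, and a disk of radius $\rho$ holds probability $1 - \exp(-\rho^2 / (2\sigma^2)) = 0.68$. This gives $\sigma \approx 13.25$ m for $\rho = 20$ m, whereas the threshold of 133.33 m quoted in the subsection on timing advance and quantization corresponds to $\sigma \approx 13.33$ m, so the appendix may use a slightly different constant. Function names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.special import i0e


def sigma_from_rho(rho, p=0.68):
    """Per-axis std sigma of an isotropic 2D Gaussian whose disk of radius rho holds probability p.

    Assumption: the radial error is Rayleigh distributed, P(R <= rho) = 1 - exp(-rho^2 / (2 sigma^2)).
    """
    return rho / np.sqrt(-2.0 * np.log(1.0 - p))


def rice_pdf(d, d_hat, sigma):
    """Rice density f(d | d_hat; sigma) from (8).

    i0e(x) = exp(-x) * I0(x) avoids overflow of I0 for large d * d_hat / sigma^2.
    """
    d = np.asarray(d, dtype=float)
    return d / sigma**2 * np.exp(-((d - d_hat) ** 2) / (2 * sigma**2)) * i0e(d * d_hat / sigma**2)


sigma = sigma_from_rho(20.0)            # approx. 13.25 m for rho = 20 m
d = np.linspace(0.0, 600.0, 6001)
f = rice_pdf(d, 300.0, sigma)
# Cross-check against SciPy's parametrization: shape b = d_hat / sigma, scale = sigma
f_ref = stats.rice.pdf(d, 300.0 / sigma, scale=sigma)
```

Folding $\exp(-d\hat{d}/\sigma^2)$ into the Bessel term turns the exponent into $-(d - \hat{d})^2 / (2\sigma^2)$, which stays finite even for distances of several kilometers.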

B. MULTIPATH PROPAGATION
Due to multipath propagation, the travelled distance $s$ is typically larger than the length $d$ of the direct path between $\mathbf{x}_{UE}$ and $\mathbf{x}_{BS}$. Hence, we model the distribution of $s$ as an exponential distribution shifted by $d$. Accordingly, we have
$$f(s \mid d; \lambda) = \lambda e^{-\lambda (s - d)} u(s - d), \tag{9}$$
where $u(\cdot)$ denotes the unit step function. Again, $\lambda$ has a straightforward interpretation: its inverse, $1/\lambda$, specifies the mean additional distance (in meters) caused by multipath propagation. For example, $\lambda = 0.01$ corresponds to a mean excess distance of 100 m.
The choice of modelling $s$ with (9) is motivated as follows: (i) The exponential distribution maximizes entropy [42], given the first moment and the support $[0, \infty)$. We possess no additional information that would justify a reduction in entropy, i.e., a more restrictive distribution, or the specification of further moments. (ii) The choice is also in accordance with the 3GPP 3D channel model [36], which employs the exponential distribution for the time-of-arrival delay of multipath components.
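To eliminate $d$ from (9), we first note that $s \perp \hat{d} \mid d$, i.e., $s$ and $\hat{d}$ are conditionally independent given $d$: if $d$ is known, then $\hat{d}$ and $\sigma$ convey no additional information about $s$. Accordingly, we have
$$f(s \mid d, \hat{d}; \sigma, \lambda) = f(s \mid d; \lambda), \tag{10}$$
and we can express the joint conditional distribution as a product of (8) and (9):
$$f(s, d \mid \hat{d}; \sigma, \lambda) = f(s \mid d; \lambda)\, f(d \mid \hat{d}; \sigma). \tag{11}$$
By marginalizing over $d$, we obtain the distribution of $s$ conditioned on $\hat{d}$:
$$f(s \mid \hat{d}; \sigma, \lambda) = \int_0^\infty f(s \mid d; \lambda)\, f(d \mid \hat{d}; \sigma)\, \mathrm{d}d \tag{12}$$
$$= \int_0^s \lambda e^{-\lambda (s - d)} f(d \mid \hat{d}; \sigma)\, \mathrm{d}d, \tag{13}$$
where we use that $u(s - d) = 0$ unless $s \ge d$.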
Equation (13) can now be directly evaluated for fixed values of $\hat{\mathbf{x}}_{UE}$ and $\mathbf{x}_{BS}$ via (6).
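Equation (13) is a one-dimensional integral that standard adaptive quadrature handles well. The sketch below evaluates it with `scipy.integrate.quad`; passing $\hat{d}$ as a breakpoint is our choice to help the integrator resolve the narrow Rice peak, whose width is on the order of $\sigma$. The helper names and the test values ($\hat{d} = 500$ m, $\sigma = 13.25$ m, $\lambda = 0.01$) are illustrative.

```python
import numpy as np
from scipy import integrate
from scipy.special import i0e


def rice_pdf(d, d_hat, sigma):
    # Rice density (8); i0e(x) = exp(-x) I0(x) keeps the Bessel term finite
    return d / sigma**2 * np.exp(-((d - d_hat) ** 2) / (2 * sigma**2)) * i0e(d * d_hat / sigma**2)


def travelled_distance_pdf(s, d_hat, sigma, lam):
    """f(s | d_hat; sigma, lam) from (13): exponential excess convolved with the Rice density."""
    if s <= 0.0:
        return 0.0

    def integrand(d):
        return lam * np.exp(-lam * (s - d)) * rice_pdf(d, d_hat, sigma)

    # The Rice peak is narrow compared with [0, s]; flag it as a breakpoint when inside
    points = [d_hat] if 0.0 < d_hat < s else None
    value, _ = integrate.quad(integrand, 0.0, s, points=points, limit=200)
    return value


s_grid = np.arange(0.0, 3000.0, 5.0)
f_s = np.array([travelled_distance_pdf(s, 500.0, 13.25, 0.01) for s in s_grid])
```

As a sanity check, the density should integrate to one, and its mean should be close to $\hat{d} + 1/\lambda$, slightly larger because the Rice mean exceeds $\hat{d}$ by roughly $\sigma^2 / (2\hat{d})$.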

C. TIMING ADVANCE AND QUANTIZATION
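From $s$, we obtain the quantized one-way distance $s_k$ through the quantization step defined in (2) and (3). Note that the reported TA values can then be derived from (1). Correspondingly, we obtain the conditional probability mass function (PMF) of $s_k$ given $\hat{d}$ by integrating (13) over the corresponding quantization bin:
$$p(s_k \mid \hat{d}; \sigma, \lambda) = \int_{e_k}^{e_{k+1}} f(s \mid \hat{d}; \sigma, \lambda)\, \mathrm{d}s. \tag{14}$$
By inserting (9), (8), and (13) into (14), we get
$$p(s_k \mid \hat{d}; \sigma, \lambda) = \int_{e_k}^{e_{k+1}} \int_0^s \lambda e^{-\lambda (s - d)} \frac{d}{\sigma^2} \exp\!\left(-\frac{d^2 + \hat{d}^2}{2\sigma^2}\right) I_0\!\left(\frac{d\,\hat{d}}{\sigma^2}\right) \mathrm{d}d\, \mathrm{d}s. \tag{15}$$
In practice, we rely on numerical integration methods to evaluate (15). For $\hat{d}/\sigma \gg 1$, which is often the case, we can further approximate the Rice distribution via a Gaussian,
$$f(d \mid \hat{d}; \sigma) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(d - \hat{d})^2}{2\sigma^2}\right),$$
which in turn allows us to express the integral in $p(s_k \mid \hat{d}; \sigma, \lambda)$ via the complementary error function. We refer the interested reader to Appendix VIII for a detailed derivation of the approximation. In our implementation, we select the approximation for $\hat{d}/\sigma \ge 10$. For $\rho = 20$ m, this is equivalent to the distance threshold $\hat{d} > 133.33$ m.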
With the expression for the PMF of $s_k$, we can now evaluate the likelihood of the TA measurements given the noisy distance $\hat{d}$ between $\hat{\mathbf{x}}_{UE}$ and $\mathbf{x}_{BS}$.
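Under the Gaussian approximation, $s$ is the sum of a Gaussian and an independent exponential random variable, i.e., it follows an exponentially modified Gaussian distribution. Its CDF has a closed form in terms of the standard normal CDF $\Phi$, which is equivalent to an erfc expression since $\Phi(z) = \mathrm{erfc}(-z/\sqrt{2})/2$. The sketch below uses this CDF to evaluate the PMF bin by bin. It is our own derivation and may be arranged differently from the complementary-error-function expression referred to above; the function names are hypothetical, and the sketch is only meant for $\hat{d}/\sigma \ge 10$. Below that threshold, (15) has to be integrated numerically.

```python
import numpy as np
from scipy.special import log_ndtr, ndtr


def emg_cdf(s, mu, sigma, lam):
    """CDF of s = d + e with d ~ N(mu, sigma^2) and e ~ Exp(lam), evaluated stably in log space:
    F(s) = Phi(z) - exp(-lam (s - mu) + (lam sigma)^2 / 2) * Phi(z - lam sigma), z = (s - mu) / sigma."""
    s = np.asarray(s, dtype=float)
    z = (s - mu) / sigma
    log_tail = -lam * (s - mu) + 0.5 * (lam * sigma) ** 2 + log_ndtr(z - lam * sigma)
    return ndtr(z) - np.exp(log_tail)


def ta_pmf(d_hat, sigma, lam, d_ta, K):
    """p(s_k | d_hat; sigma, lam) for k = 0, ..., K-1 as CDF differences over the bin edges.
    Uses the Gaussian approximation of the Rice law, so it assumes d_hat / sigma >= 10."""
    edges = np.concatenate(([0.0], (np.arange(1, K) - 0.5) * d_ta, [np.inf]))
    return np.diff(emg_cdf(edges, d_hat, sigma, lam))


p = ta_pmf(d_hat=1000.0, sigma=13.25, lam=0.01, d_ta=78.125, K=1283)
print(int(np.argmax(p)))  # 13, the bin [976.6 m, 1054.7 m) that contains d_hat
```

Because multipath only adds distance, most of the remaining probability mass sits in the bins above $\hat{d}$, decaying at the rate $\lambda$.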

V. EVALUATION OF THE POSTERIOR
With the likelihood of the TA measurements as a function of UE and BS positions derived in (15), we can obtain the posterior following Bayes' rule:
$$p(\mathbf{x}_{BS} \mid \mathbf{k}; \hat{\mathbf{X}}_{UE}) = \frac{p(\mathbf{k} \mid \mathbf{x}_{BS}; \hat{\mathbf{X}}_{UE})\, p(\mathbf{x}_{BS})}{p(\mathbf{k})}. \tag{19}$$
The evaluation of the evidence $p(\mathbf{k})$ requires us to compute the integral
$$p(\mathbf{k}) = \int_{\mathbb{R}^2} p(\mathbf{k} \mid \mathbf{x}_{BS}; \hat{\mathbf{X}}_{UE})\, p(\mathbf{x}_{BS})\, \mathrm{d}\mathbf{x}_{BS}, \tag{20}$$
which is intractable in closed form for the likelihood in (15). Hence, we limit ourselves to a finite set $\mathcal{X}_c$ containing $C$ candidate locations, i.e., we evaluate (19) for a categorical prior $p(\mathbf{x}_{BS}^{(i)})$, which turns the integral (20) into the following summation:
$$p(\mathbf{k}) = \sum_{i=1}^{C} p(\mathbf{k} \mid \mathbf{x}_{BS}^{(i)}; \hat{\mathbf{X}}_{UE})\, p(\mathbf{x}_{BS}^{(i)}). \tag{21}$$
For a uniform categorical prior, the maximum a posteriori (MAP) estimate then collapses into a maximum likelihood (ML) estimate. Alternatively, one could bypass evaluating the evidence in (20) by employing Markov chain Monte Carlo (MCMC) methods, which would also allow us to select continuous priors for $\mathbf{x}_{BS}$. In fact, we can adapt our system model in a straightforward manner to obtain an MCMC variant in the form of a PyMC3 implementation [43]. One way to achieve this would be to model the quantization step via a uniform distribution spanned by the bin edges $e_k$ and $e_{k+1}$. For our categorical approach, we select the prior candidates $\mathcal{X}_c$ from the set of feasible nearby locations $\mathcal{X}_f$, which we define through a distance threshold such that
$$\mathcal{X}_f = \left\{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x} - \hat{\mathbf{x}}_{UE}^{(i)}\|_2 \le d_{lim} \text{ for any } i \in \{1, \dots, M\}\right\}, \tag{22}$$
with the distance threshold $d_{lim}$ as defined in (23). In practice, we approximate $\mathcal{X}_f$ via its smallest enclosing rectangle. We then either (i) sample $\mathcal{X}_f$ uniformly in a grid-search manner to obtain $\mathcal{X}_g$, with the granularity specified by the grid distance $D_s$, or (ii) choose the candidates from publicly available cell tower locations $\mathcal{X}_t$ such that $\mathcal{X}_c = \mathcal{X}_t \cap \mathcal{X}_f$. In the remainder, we denote the first approach as the grid-search prior and the latter as the cell-identification use case.
VOLUME 10, 2022
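For a finite candidate set, the posterior (19) is a normalization over $C$ log-likelihood sums, which is numerically safest in the log domain via log-sum-exp. The sketch below combines the Gaussian-approximation PMF (assumed valid, i.e., $\hat{d}/\sigma \ge 10$) with a uniform categorical prior, so the MAP estimate coincides with the ML estimate. The grid helper reads $\mathcal{X}_f$ as the union of disks of radius $d_{lim}$ around the reported UE positions, whose smallest enclosing rectangle is the bounding box of the reports enlarged by $d_{lim}$; $d_{lim}$ itself is left as an input. All function names and the toy geometry are hypothetical.

```python
import numpy as np
from scipy.special import log_ndtr, logsumexp, ndtr


def emg_cdf(s, mu, sigma, lam):
    # CDF of N(mu, sigma^2) + Exp(lam), i.e. the Gaussian approximation of f(s | d_hat)
    z = (s - mu) / sigma
    return ndtr(z) - np.exp(-lam * (s - mu) + 0.5 * (lam * sigma) ** 2 + log_ndtr(z - lam * sigma))


def grid_candidates(x_ue_hat, d_lim, D_s):
    """Grid X_g with spacing D_s over the bounding box of the UE reports, enlarged by d_lim."""
    lo = x_ue_hat.min(axis=0) - d_lim
    hi = x_ue_hat.max(axis=0) + d_lim
    gx = np.arange(lo[0], hi[0] + D_s / 2, D_s)
    gy = np.arange(lo[1], hi[1] + D_s / 2, D_s)
    return np.stack(np.meshgrid(gx, gy), axis=-1).reshape(-1, 2)


def log_posterior(candidates, x_ue_hat, k, sigma, lam, d_ta, K):
    """Normalized log posterior over C candidates under a uniform categorical prior."""
    d_hat = np.linalg.norm(candidates[:, None, :] - x_ue_hat[None, :, :], axis=2)  # (C, M)
    lo = np.where(k == 0, 0.0, (k - 0.5) * d_ta)            # lower bin edge e_k
    hi = np.where(k >= K - 1, np.inf, (k + 0.5) * d_ta)     # upper bin edge e_{k+1}
    p = emg_cdf(hi, d_hat, sigma, lam) - emg_cdf(lo, d_hat, sigma, lam)
    loglik = np.log(np.maximum(p, 1e-300)).sum(axis=1)      # log of the product over measurements
    return loglik - logsumexp(loglik)                       # uniform prior cancels; divide by evidence


# Eight hypothetical UE reports at 600 m around a BS at the origin, all with k = 8
angles = np.deg2rad(np.arange(0, 360, 45))
x_ue_hat = 600.0 * np.column_stack([np.cos(angles), np.sin(angles)])
k = np.full(8, 8)
cands = grid_candidates(x_ue_hat, d_lim=0.0, D_s=10.0)
logp = log_posterior(cands, x_ue_hat, k, sigma=13.25, lam=0.01, d_ta=78.125, K=1283)
x_map = cands[np.argmax(logp)]          # MAP (= ML) estimate
confidence = np.exp(logp.max())         # maximum posterior probability
```

Replacing `grid_candidates` by the tower positions in $\mathcal{X}_t \cap \mathcal{X}_f$ gives the cell-identification variant; `confidence` is the quantity that a reject option would threshold.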

VI. DRIVE-TEST DATASET AND RESULTS
We evaluate our approach on a set of measurements from a live LTE network, collected in a drive-test campaign in the surroundings of Vienna, Austria, during one day in August 2021. We used a Keysight Nemo measurement phone, which allows us to obtain TA measurements and their corresponding UE locations at a sufficiently short temporal interval [44]. Apart from this, the measurement phones are typical consumer-grade Samsung Galaxy Note 4 devices. During the measurements, we fixed the phones in the car trunk with a running download script and an active band lock at 800 MHz; an equivalent measurement setup is described in more detail in [45, Sec. 3B]. Further, the ground-truth eNodeB locations for all of these measurements have been confirmed by an Austrian MNO. We want to stress that the drive-test was in no way optimized for eNodeB localization, i.e., the route was not adapted to increase the number or the spatial coverage of measurements per eNodeB. Instead, we followed the direct route to the selected destination approximately 70 km east of Vienna's city center, as we intend to evaluate our approach under a realistic end-user data collection scheme. Fig. 4 highlights the characteristics of our dataset,⁴ with the measurements split into Rural and Urban subsets to assess the influence of the environment on the performance of the proposed estimator. It is apparent from Fig. 4 that the cell deployment is significantly denser in the Urban subset. The Rural subset has larger cell radii; thus, it has a median of 35 measurements, compared to 6 for its Urban counterpart. At the same time, the area of feasible candidates $\mathcal{X}_f$ and the median number of cell tower candidates from $\mathcal{X}_t \cap \mathcal{X}_f$ within it are substantially higher. Accordingly, the spatial range of the measurements per cell is also lower in the Urban subset due to frequent handovers.
⁴ Bars correspond to quantiles according to [46].
For illustration, Fig. 5 highlights selected measurements from the drive-test campaign. The values of $s_k$, which are derived from the reported TA integers via (1), are shown over the covered drive-test distance; we further mark the cell identifiers in color to highlight handovers. Interestingly, we observe a triangular pattern for $s_k$ as we enter and leave the coverage area of individual eNodeBs. In fact, we can even derive the cell radius, which we again identify as substantially smaller in the urban area than in the rural surroundings of Vienna.

A. VISUAL ANALYSIS OF POSTERIOR
Before evaluating the performance on the complete drive-test dataset, we first conduct a detailed visual analysis of a single prediction to obtain an intuitive understanding of the inner workings of our estimator. Fig. 6 showcases the prediction result for an exemplary eNodeB from the Rural dataset, generated with $\lambda = 0.01$ and $\rho = 20$ m. Here, the uniform prior was evaluated in a grid-search manner such that $\mathcal{X}_c = \mathcal{X}_g$. For reference, we provide the spatial distribution of the measurements together with the radius of the quantized distance measure $s_k$. We also plot the cell tower candidates described in Sec. V and show a satellite image of the surroundings to highlight the propagation conditions.⁵ In the single-measurement case in Figs. 6a and 6b, we obtain the circularly symmetric posterior shown in Fig. 6c, where locations inside the circle are more likely: since multipath propagation can only increase the travelled distance, the true distance $d$ tends to be smaller than $s_k$. Accordingly, the estimator correctly captures the uncertainty in the dataset and communicates it via the posterior distribution. When we increase the number of measurements to a total of $M = 14$, we obtain the posterior shown in Fig. 6f, with the measurement distribution indicated in Figs. 6d and 6e. Clearly, the uncertainty decreases significantly, with the posterior now concentrated near the ground-truth eNodeB position. By directly reporting the MAP estimate, we obtain an error of 47 m for a grid sampling distance of $D_s = 10$ m. Alternatively, we can treat the estimation problem in the cell-identification framework. When we consider the posterior in Fig. 6c, we note that even a single measurement allows us to effectively rule out a significant portion of the possible cell tower candidates. For $M = 14$, we can already identify the correct eNodeB in Fig. 6f with high confidence. Clearly, the performance of such a scheme depends highly on the density of the cell tower deployment in the area of interest. Hence, we can expect the classification approach to be significantly more challenging in urban areas.

B. CELL TOWER IDENTIFICATION WITH REJECT OPTION
We now evaluate the cell-identification approach over the complete drive-test dataset. As such, we want to identify the serving eNodeB from the set of candidates $\mathcal{X}_c = \mathcal{X}_t \cap \mathcal{X}_f$. Motivated by the results of the visual analysis, where our estimator reliably communicated the uncertainty in the prediction, we target a selective classification approach.
As a measure of confidence, we directly utilize the maximum posterior probability among all candidate locations:
$$\max_{i \in \{1, \dots, C\}} p(\mathbf{x}_{BS}^{(i)} \mid \mathbf{k}; \hat{\mathbf{X}}_{UE}).$$
We further define the selection function $g(\cdot)$ [47], which rejects the prediction whenever the confidence threshold $t_p$ is not met:
$$g(\cdot) = \begin{cases} 1, & \max_{i} p(\mathbf{x}_{BS}^{(i)} \mid \mathbf{k}; \hat{\mathbf{X}}_{UE}) \ge t_p,\\ 0, & \text{otherwise}. \end{cases}$$
Hence, the estimator abstains from uncertain predictions. Accordingly, we also make use of the notion of coverage $\phi$, which denotes the ratio of predictions accepted by $g(\cdot)$:
$$\phi = \frac{\text{number of predictions accepted by } g(\cdot)}{\text{total number of predictions}}.$$
The performance of our selective classifier can then be visualized by computing the misclassification rate (1 − accuracy) over the coverage for all feasible values of $t_p$. Fig. 7 shows such a risk-coverage curve for all eNodeBs in the drive-test dataset. Clearly, our estimator allows us to trade off accuracy for coverage. For the Rural dataset with $\lambda = 0.01$, we achieve a high baseline accuracy of over 90%. We can further increase this to 95% by abstaining from 10% of the predictions. Although the baseline accuracy for the Urban dataset is significantly lower, this is again reliably communicated through the confidence measure. At $\lambda = 0.001$, we observe a consistent increase in accuracy for lower coverage rates. We thus conclude that the estimator works reliably on the given dataset.
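A risk-coverage curve such as the one in Fig. 7 can be traced by sorting predictions by confidence and accepting them one by one, which is equivalent to sweeping $t_p$ from high to low. The following minimal sketch uses hypothetical names and toy data, and it assumes that $g(\cdot)$ accepts a prediction when the confidence is at least $t_p$.

```python
import numpy as np


def select(confidence, t_p):
    """Selection function g: 1 accepts the prediction, 0 abstains (assumes acceptance at >= t_p)."""
    return (np.asarray(confidence, dtype=float) >= t_p).astype(int)


def risk_coverage(confidence, correct):
    """Coverage and misclassification rate among accepted predictions for every threshold t_p
    that equals one of the observed confidences (ties are broken by input order)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidence, kind="stable")     # most confident first
    n_accepted = np.arange(1, len(order) + 1)
    errors = np.cumsum(~correct[order])                # misclassifications among accepted ones
    return n_accepted / len(order), errors / n_accepted


# Hypothetical maximum posterior probabilities and correctness of five eNodeB predictions
conf = np.array([0.99, 0.95, 0.70, 0.60, 0.40])
correct = np.array([True, True, True, False, False])
coverage, risk = risk_coverage(conf, correct)
print(coverage)
print(risk)
print(select(conf, 0.8).mean())
```

Here, the coverage runs from 0.2 to 1.0, and the risk stays at zero until the fourth, incorrect prediction is accepted; a threshold of $t_p = 0.8$ yields a coverage of 0.4.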
For reference, Fig. 7 also includes the risk-coverage curves of two baseline estimators. The Baseline No-Multipath scheme denotes a variant of our estimator that does not incorporate multipath propagation into the system model. Hence, it only accounts for GPS noise and operates directly on the Rician likelihood defined in (8), which considerably reduces performance. To further evaluate the effect of multipath propagation, we also include the Multipath Oracle estimator. In a sense, this scheme acts as an upper bound on the performance of our estimator: it operates on TA values that are, by construction, free of multipath effects. To obtain these values, we use the ground-truth eNodeB locations to compute discrete TA values $k$ for the direct path between the UE and the BS in accordance with the quantization step defined in (2). The performance of the No-Multipath estimator on this new dataset can then be interpreted as the remaining error induced by quantization and GPS noise in the absence of multipath. In summary, we conclude that our estimator can compensate for a significant part of the deviations caused by the propagation environment.
When comparing the different hyperparameter configurations in Fig. 7, we note that a higher value of $\lambda = 0.01$ is beneficial when operating on the Rural subset, compared to $\lambda = 0.001$ for the Urban dataset. This is expected: a lower $\lambda$ corresponds to a larger mean additional distance $1/\lambda$ (1000 m instead of 100 m), and multipath effects due to NLOS conditions are typically more common in densely populated areas. We see this as a major strength of our approach: it requires only two dedicated hyperparameters with a straightforward physical interpretation. Accordingly, we can select suitable values based on a priori knowledge of the propagation conditions.
Tab. 1 summarizes the results on both datasets, including the mean distance errors and the coverage values corresponding to the selected threshold $t_p$. Note that the error is calculated as the distance between the ground-truth location and the selected candidate. With $t_p = 0.8$, both datasets achieve a comparable accuracy, albeit at the cost of discarding a significantly higher number of predictions for Urban. Correspondingly, the mean distance error is 5 m for Urban and 15 m for Rural. In summary, we find these results promising, considering that even for the lower threshold of $t_p = 0.5$, the distance error amounts to 66 m and 14 m for Urban and Rural, respectively.

C. RESULTS FOR GRID-SEARCH PRIOR
Analogous to the cell tower identification use case, we can also conduct selective prediction for the uniform prior evaluated in a grid-search manner. Here, we directly utilize the spatial layout of the prior candidates to obtain a measure of uncertainty via the posterior standard deviation.
We thereby treat the standard deviation component-wise in each direction, such that $\boldsymbol{\sigma}_{post} = (\sigma_{post,0}, \sigma_{post,1})^T$. The first component $\sigma_{post,0}$ can then be computed by evaluating the expression
$$\sigma_{post,0} = \sqrt{\sum_{i=1}^{C} p_i \left(x_{BS,0}^{(i)} - \mu_{BS,0}\right)^2},$$
with $x_{BS,0}^{(i)}$ taken from the $i$th BS candidate location on the grid. Here, we obtain the component-wise mean $\mu_{BS,0}$ via
$$\mu_{BS,0} = \sum_{i=1}^{C} p_i \, x_{BS,0}^{(i)}.$$
In both cases, we use $p_i$ as a shorthand for the posterior, i.e.,
$$p_i = p(\mathbf{x}_{BS}^{(i)} \mid \mathbf{k}; \hat{\mathbf{X}}_{UE}).$$
The procedure is equivalent for the second component $\sigma_{post,1}$. We further adapt the selection function such that it rejects predictions based on the standard deviation threshold $t_\sigma$:
$$g(\cdot) = \begin{cases} 1, & \max(\sigma_{post,0}, \sigma_{post,1}) \le t_\sigma,\\ 0, & \text{otherwise}. \end{cases}$$
Hence, the measure of uncertainty is given by the maximum over the posterior standard deviation vector $\boldsymbol{\sigma}_{post}$. Fig. 8 showcases the performance of the selective regressor, with grid granularity $D_s = 10$ m, in the form of a risk-coverage curve. Interestingly, the performance for Rural and Urban is similar over a wide range of the coverage, even though the results of the cell-identification use case suggest otherwise. Hence, we can conclude that the significant difference between the performances for Rural and Urban in Fig. 7 is mainly due to the denser deployment in urban areas, where even a small distance error can result in incorrect identification of eNodeBs. For larger values of the coverage, we do note a drastic increase in the mean distance error for Rural. This is expected, considering the result shown in Fig. 6c: in a rural environment, we obtain significantly larger offsets for estimates with high uncertainty. We can, however, identify such cases reliably and subsequently abstain from prediction.
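The component-wise posterior mean and standard deviation described above reduce to probability-weighted moments of the candidate coordinates. A minimal sketch, with hypothetical names and a toy posterior, follows; it assumes the posterior has already been normalized over the grid.

```python
import numpy as np


def posterior_mean_std(candidates, posterior):
    """Component-wise posterior mean mu_BS and standard deviation sigma_post over grid candidates."""
    candidates = np.asarray(candidates, dtype=float)   # (C, 2) candidate positions
    p = np.asarray(posterior, dtype=float)             # (C,) posterior probabilities p_i
    mu = p @ candidates                                # (2,) posterior mean per axis
    var = p @ (candidates - mu) ** 2                   # (2,) posterior variance per axis
    return mu, np.sqrt(var)


def select_by_spread(sigma_post, t_sigma):
    """Accept (1) if the larger of the two standard deviations is at most t_sigma, else abstain (0)."""
    return int(np.max(sigma_post) <= t_sigma)


# Hypothetical posterior spread evenly over four grid points 10 m apart
cands = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
mu, sigma_post = posterior_mean_std(cands, np.full(4, 0.25))
```

Taking the maximum over both axes makes the selection conservative: an estimate that is sharp in one direction but smeared along the other is still rejected.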
Again, the results show that our estimator works well on the real-world dataset. Clearly, we can trade off prediction error for coverage; for low thresholds $t_\sigma$, we reach mean distance errors close to the TA granularity $d_{TA}$. This allows us to conduct reliable eNodeB location inference, even without access to prior cell tower candidates. Meanwhile, the performance of the Multipath Oracle again shows that multipath propagation is the most significant error source for TA-based localization. Our estimator, which directly incorporates the multipath statistics, again performs better than the No-Multipath baseline. The improvement is especially large for low values of the coverage, where the performance gap between our estimator and the Multipath Oracle also becomes smaller.
For comparison, Tab. 2 shows the mean distance errors of commonly used baseline approaches, in particular the centroid, weighted centroid, and max RSRP estimators discussed in [28]. The comparison with Fig. 8 shows that the proposed estimator achieves a notably smaller positioning error than these geometric approaches, especially on the Rural subset with its significantly larger cell radii. Unsurprisingly, these geometric approaches require crowdsourced data with numerous measurements and fail when applied to sparse drive-test data such as ours. Interestingly, our TA-based approach also outperforms estimates from major localization platforms, even though these utilize a large set of crowdsourced measurements. Tab. 2 provides the positioning errors obtained by querying OpenCellID and Google Geolocation for the eNodeBs in our drive-test dataset [2], [48]. In comparison, our estimator appears to be well suited even for a small number of measurements, considering that our estimates are based on a single, non-optimized drive-test campaign. In such cases, the uncertainty is reliably captured via the measure of confidence, while additional measurements further increase the achieved coverage. Correspondingly, we see the proposed scheme as an appropriate choice for both drive-test and crowdsourced data.

VII. DISCUSSION
When applying our approach to a set of LTE drive-test measurements in the surroundings of Vienna, Austria, we identify significant differences between the Rural and Urban subsets of our drive-test campaign. Due to the denser deployment in urban areas, identifying the cell towers from candidate locations is significantly more challenging in Urban than in Rural. On the other hand, the close proximity of neighboring eNodeBs in Urban results in a lower mean distance error compared to Rural. In both cases, the notion of uncertainty provided by our estimator proved to be a critical feature: it allows us to abstain from prediction whenever the confidence of a given estimate is insufficient. Accordingly, we achieve an accuracy of up to 95% for eNodeB identification and a mean distance error well below 100 m for position inference with the grid-search prior. Thus, our proposed estimator outperforms all considered baseline approaches as well as positioning estimates from major crowdsourcing platforms. Notably, this is also a considerable improvement over the positioning errors reported for comparable datasets in the literature, with around 500 m for the best-performing approach [31]. We also identify the correct modeling of multipath propagation as a crucial requirement for TA-based localization, considering that the evaluated Baseline No-Multipath estimator suffers from a substantially higher error. Similarly, the reference Multipath Oracle estimator, operating on a dataset compensated for multipath effects, reveals that the additional travelled distance is the dominating error component. In contrast, the errors due to quantization and GPS noise are significantly lower and thus not a limiting factor for TA-based inference of the network topology, especially considering the increased TA resolution of 5G networks, which provide a spatial granularity as fine as 4.883 m.

VIII. FINAL CONCLUSION
In this work, we introduced a stochastic scheme for inferring BS locations from end-user TA measurements. The proposed approach directly incorporates GPS noise, multipath propagation, and quantization into the system model, resulting in an estimator that requires only the configuration of two hyperparameters with a dedicated physical interpretation. That is, we purposely avoid introducing generic error terms and instead explicitly account for the individual components of our system model. Our estimator can be deployed in two variants: (i) in a cell-identification setup, to identify the serving BS from a set of cell tower candidates, or (ii) without any prior knowledge, where we sample the area of interest in a grid-search manner. In both cases, the proposed estimator provides a reliable measure of confidence and significantly outperforms the baseline approaches as well as existing crowdsourced localization platforms, achieving an identification accuracy of up to 95% and a mean positioning error well below 100 m. Overall, these results are promising, considering that our drive-test campaign was purposefully not optimized for the particular use case of BS localization, in order to reflect realistic end-user behaviour. Correspondingly, one can apply the proposed algorithm to existing drive-test or crowdsourcing data and use the provided measure of uncertainty to assess whether additional dedicated measurements are needed. We see this as a key enabler for deploying the proposed BS localization scheme for large-scale, possibly country-wide inference of the network topology.

APPENDIX A TIMING ADVANCE RESOLUTION IN 5G
Analogous to LTE, TA for 5G is reported as an integer $k \in \mathcal{K}$, with $\mathcal{K} = \{0, \ldots, K-1\}$, denoted as $T_A$ in the 3GPP standard, with a maximum value of $K - 1 = 3846$ [49, Section 4.2].
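A particular value of $k$ corresponds to a time duration expressed in the 5G-NR basic time unit $T_c$ [49, Section 4.2], which can be computed as

$$t_{\mathrm{TA}}(k) = k \cdot \frac{16 \cdot 64}{2^{\mu}} \cdot T_c, \quad \text{with} \quad T_c = \frac{1}{\Delta f_{\max} \cdot N_f}, \tag{31}$$

where $\Delta f_{\max} = 480 \cdot 10^3$ Hz and $N_f = 4096$ [50, Section 4.1].
In (31), $\mu$ denotes the subcarrier spacing configuration, which is described in detail in [50, Section 4.3.2]. With the speed of light in vacuum $c \approx 3 \cdot 10^8$ m/s and $k = 1$, the TA granularity for localization purposes, $d_{\mathrm{TA}}$, can then be computed as

$$d_{\mathrm{TA}} = \frac{c \cdot t_{\mathrm{TA}}(1)}{2} = \frac{c}{2} \cdot \frac{16 \cdot 64}{2^{\mu}} \cdot T_c,$$

where the factor $1/2$ accounts for TA compensating the round-trip delay. In Tab. 3, the granularity $d_{\mathrm{TA}}$ for the possible configurations of $\mu$ is presented in meters. Note that the LTE granularity is obtained as the baseline case with $\mu = 0$ (see [34] for comparison).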
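To make the values behind Tab. 3 easy to reproduce, the following sketch evaluates the TA granularity for $\mu = 0, \ldots, 4$ (15 to 240 kHz subcarrier spacing). It assumes the 3GPP relation in which one TA step corresponds to $16 \cdot 64 / 2^{\mu}$ basic time units $T_c = 1/(\Delta f_{\max} \cdot N_f)$ [49], [50], halves the resulting round-trip time to obtain a one-way distance, and uses $c = 3 \cdot 10^8$ m/s; the function and constant names are hypothetical.

```python
# Sketch of the TA granularity computation in Appendix A.
# Assumes one TA step = 16 * 64 / 2**mu basic time units T_c (3GPP NR)
# and c rounded to 3e8 m/s as in the text.
SPEED_OF_LIGHT = 3e8             # m/s
DELTA_F_MAX = 480e3              # Hz
N_F = 4096
T_C = 1.0 / (DELTA_F_MAX * N_F)  # 5G-NR basic time unit in seconds
K_MAX = 3846                     # largest reported TA index (K - 1)


def ta_duration(k: int, mu: int) -> float:
    """Time duration in seconds represented by TA index k for numerology mu."""
    return k * 16 * 64 / 2**mu * T_C


def ta_granularity(mu: int) -> float:
    """Distance per TA step in meters; TA covers the round trip, hence the /2."""
    return SPEED_OF_LIGHT * ta_duration(1, mu) / 2


if __name__ == "__main__":
    for mu in range(5):  # subcarrier spacings 15, 30, 60, 120, 240 kHz
        d = ta_granularity(mu)
        print(f"mu={mu}  SCS={15 * 2**mu:>3} kHz  d_TA={d:7.3f} m  "
              f"largest representable distance={K_MAX * d / 1000:6.1f} km")
```

For $\mu = 0$ the sketch reproduces the familiar LTE step of 78.125 m, and for $\mu = 4$ it yields the 4.883 m quoted in Section VII.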

APPENDIX B ANDROID LOCATION ACCURACY
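For the case of Android, we can derive a suitable value for $\sigma$ from the reported accuracy measure. The API [40] defines the horizontal accuracy $\rho$ as the radius of 68% confidence. That is, with probability $P = 0.68$, the true location $x_{\mathrm{UE}}$ lies inside the circle with center $\hat{x}_{\mathrm{UE}}$ and radius $\rho$. Integrating a two-dimensional isotropic Gaussian with per-axis standard deviation $\sigma$ in polar coordinates yields

$$P = \int_{0}^{2\pi}\int_{0}^{\rho} \frac{1}{2\pi\sigma^2}\, e^{-r^2/(2\sigma^2)}\, r \,\mathrm{d}r\,\mathrm{d}\varphi = 1 - e^{-\rho^2/(2\sigma^2)}.$$

Hence, the standard deviation $\sigma$ can be calculated from the accuracy $\rho$ as

$$\sigma = \frac{\rho}{\sqrt{-2\ln(1-P)}}.$$

With $P = 0.68$, we obtain $\sigma \approx \rho/1.5$.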
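The conversion from $\rho$ to $\sigma$ can be checked numerically. The sketch below inverts $P = 1 - e^{-\rho^2/(2\sigma^2)}$ and verifies the result by Monte Carlo sampling. It assumes an isotropic Gaussian with equal per-axis standard deviation; $\rho$ would be, for example, the value returned by Android's `Location.getAccuracy()`, and the function names are hypothetical.

```python
import math
import random


def sigma_from_accuracy(rho: float, p: float = 0.68) -> float:
    """Per-axis std of an isotropic 2-D Gaussian whose circle of radius rho holds mass p.

    Solves p = 1 - exp(-rho**2 / (2 * sigma**2)) for sigma.
    """
    return rho / math.sqrt(-2.0 * math.log(1.0 - p))


def empirical_coverage(rho: float, sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo fraction of 2-D Gaussian samples that fall within radius rho."""
    rng = random.Random(seed)
    inside = sum(
        math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) <= rho
        for _ in range(n)
    )
    return inside / n


if __name__ == "__main__":
    rho = 15.0  # example accuracy value in meters
    sigma = sigma_from_accuracy(rho)
    print(f"rho/sigma = {rho / sigma:.4f}")
    print(f"sigma = {sigma:.2f} m, empirical P = {empirical_coverage(rho, sigma):.3f}")
```

For instance, a reported accuracy of $\rho = 15$ m maps to $\sigma$ of roughly 10 m, and the empirical fraction of samples within $\rho$ should come out close to 0.68.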

APPENDIX C GAUSSIAN APPROXIMATION
Thus, we avoid the numerical computation of the integral.