Efficient Mobile Location Tracking and Data Reduction for Proximity Detection Applications

This paper considers mobile location tracking and trajectory data reduction techniques for mobile contact tracing and proximity detection in wireless cellular networks. Unscented Kalman filtering with non-line-of-sight bias mitigation is first applied for robust mobile trajectory estimation. An approach for modeling and analyzing pair-wise proximity and multi-mobile clustering scenarios is then introduced within a hypothesis testing framework, and a thorough performance evaluation is presented to assess the achievable detection and false alarm probabilities as a function of proximity distance and timespan, ranging accuracy, and bias statistics. For scenarios of practical interest, results show that correct proximity detection rates above the 70-to-80% range can be achieved while maintaining very low false alarm rates. Data reduction using the discrete Haar transform is subsequently applied for efficient storage of trajectory data. An analysis of the tradeoffs between reduction level and proximity detection reliability demonstrates the viability of the proposed approach, with its low complexity and good performance at moderate reduction levels. Additional comparative analysis assesses the impact of specific distance measures and wavelet types: the Chebyshev distance is found to improve detection accuracy compared to the Euclidean and Manhattan measures, while changing the wavelet type, when retaining short support, did not have a significant impact.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Stefano Scanzio.
Mobile user localization and tracking techniques are finding increased application in a wide array of location-based services (LBSs). Examples include user tracking for security, healthcare or marketing purposes, vehicular traffic and sensor monitoring in transportation, logistics, Internet of Things (IoT), smart cities, and various other fields [1], [2], [3], [40]. One particular application of interest pertains to mobile users' localization for contact tracing in the context of infectious disease monitoring and control, as with the coronavirus (COVID-19) pandemic [4], [5], [6], [8], [41]. For such conditions, the use of accurate location tracking algorithms is important for achieving reliable detection of mobiles' close physical proximity (within a few meters). Recent works have presented results with different decentralized short-range protocols such as Bluetooth, Radio Frequency Identification (RFID) and ZigBee [7], [9], [42], [43]. In this paper, emphasis is placed on mobile trajectory tracking in challenging radio propagation environments, and our approach is based on centralized, network-based robust localization combined with data reduction for efficient processing and reliable proximity detection in scenarios applicable to cellular networking environments covering urban areas with dense pedestrian mobile users. In general, localization methods may use mobile-based connectivity with satellite navigation systems, as well as network-based techniques exploiting wireless cellular networks or other short-range local-area Wi-Fi, wireless sensor networks and ultra-wideband systems [10], [11]. With the wide prevalence of mobile cellular communication infrastructure, network-based localization solutions offer many advantages and are finding increased adoption and active research [12], [13]. Different algorithms, e.g.
based on maximum likelihood estimation, weighted least squares (WLS) minimization, Kalman filtering or particle filters, have been developed for the statistical processing of the mobile station (MS) noisy signals received at a number of networked base stations (BS) using time-of-arrival (TOA), time-difference-of-arrival (TDOA), angle-of-arrival (AOA) or received signal strength (RSS) measurements (see, e.g. [14], [15], [16], and references therein). In these studies, the focus is on single mobile positioning, while in this work emphasis is placed on pair-wise localization and detection of multiple mobiles coming within close proximity of each other. Also, in applications related to contact tracing, the achievable detection accuracy was not previously quantified vis-à-vis different scenarios of interest that depend on factors such as inter-mobile distance separation and exposure time extent, ranging accuracy and biased measurements. In particular, the use of robust mobile trajectory tracking in the presence of various impairments is an important prerequisite to achieve reliable pairwise proximity and group clustering detection capability. The selected localization approach in this work uses time-based TOA measurements that can be readily accessible in cellular mobile networks, with less complexity requirements compared to AOA-based or RSS fingerprinting methods. Within the given dynamic user mobility context, an algorithm will be presented herein to achieve optimum recursive minimum mean squared motion-state estimation and robust trajectory tracking by means of the unscented Kalman filter (UKF) [17], found suitable for the given pedestrian-motion state-space model and nonlinear, TOA-based biased ranging measurements. The algorithm relative merits for the application at hand will be emphasized in the sequel. 
It is also noted that clear line-of-sight (LOS) radio links between mobiles and base stations may often be lacking; obstructed non-line-of-sight (NLOS) links present a major impairment that introduces biased measurements and hinders localization accuracy. Different techniques were presented for LOS/NLOS identification based on various layouts [19], [20], [21]. Other methods were applied for MS-BS link classification for the purpose of handling NLOS-biased links and properly weighting different LOS/NLOS measurements [22], [23], [24]. Since in practice a sufficient number of direct links may not be available, it becomes necessary to process biased TOA measurements, and this is addressed with an approach for NLOS mitigation generalized from [18] and [23] and integrated with the UKF algorithm. In a first part, mobile localization is applied in the context of the specific applications of interest in this work, and validated with new comparative results showing its merits for providing accurate trajectory tracking. Subsequently, we present a novel approach for modeling and analyzing different scenarios of mobiles' pair-wise proximity as well as multi-mobile clustering cases, and we introduce a hypothesis testing framework for performance analysis with metrics based on correct detection and false alarm probabilities. We also conduct a thorough sensitivity analysis to quantify the proximity detection variability with respect to different impacting factors, such as inter-mobile distance separation, cluster group size and radius, proximity timespan, as well as TOA ranging error and NLOS bias statistics. As a result, it is then possible to identify the system operating requirements for achieving high correct detection rates (e.g., in excess of 80%) while maintaining acceptable false alarm rates.
In another aspect addressed in this work, it is noted that handling multi-user location information over extended periods of time generates large volumes of spatio-temporal data, which highlights the need for compact data representations [25], [26], [27], [28]. Processing techniques applied to proximity and community detection [29], [30], [31], and trajectory similarities and cluster formations [32], [33], [34] have also been of interest recently. For our purpose, emphasis is placed on applying low-complexity reduction to achieve a balanced tradeoff between data compression savings and retained accuracy in mobiles' proximity detection. To this end, the Discrete Haar Transform (DHT), as a special type of Discrete Wavelet Transform (DWT), is found to be an efficient means for processing such types of data [35], [36], [38]. Indeed, the DHT is characterized by simple arithmetic operations (consisting of repeated averages and differences) with very low computational requirements and distance-preservation properties, which are important features for the applications at hand. Extensive comparative results are presented, with different wavelet types, to illustrate the achievable performance tradeoffs. In addition, we also note the importance of using specific distance measures with the compressed trajectory data representations. To further explore this point, we present new results using Euclidean, Manhattan and Chebyshev distance measures with the DHT-reduced trajectory data, and further quantify the achievable reliability of proximity detection as a function of data reduction levels.
The rest of the paper is organized as follows. UKF mobile tracking is presented in Section II, followed by the proximity detection hypothesis formulation in Section III, with performance evaluation results given in Section IV. Next, Section V presents data reduction approaches and tradeoffs with respect to detection accuracy and reduction level. Final conclusions are given in Section VI. A block diagram illustrating the proposed network-based proximity detection framework is shown in Figure 1 below.

II. UKF-BASED MOBILE TRACKING WITH NLOS BIAS
In the first part of this work, we address the problem of accurate mobile user tracking in dynamically changing environments hindered by obstructed non-line-of-sight propagation. In this context, we focus on network-based TOA localization techniques, which are commonly used for this purpose. A dynamic, discrete-time system model with state space representation is used in conjunction with a recursive estimation framework based on the UKF algorithm, found to be efficient for state estimation of dynamic systems with nonlinear models (as in the case at hand). NLOS bias mitigation will be integrated with the UKF processing using constrained least squares optimization to mitigate the bias errors corrupting TOA measurements collected by the base stations involved in mobile tracking, as discussed next.
VOLUME 10, 2022
A system model is adopted with mobile stations (MSs) moving within a cellular network coverage area served by a number of fixed base stations (BSs). The communication protocols are assumed to be synchronized to a common network clock reference, so transmission and reception epochs can be processed to obtain accurate MS-to-BS TOA measurements and their corresponding range estimates (proportional to TOA through the speed of light). A minimum of three BS TOA measurements is required to obtain MS location estimates in the 2D plane. The TOA-based range measurements are corrupted by additive measurement noise and may be subject to a positive bias error due to signal reflection and refraction under NLOS propagation conditions. Assuming a discrete-time sampled model, the measured range between a given MS and the $i$th BS at sampling time $k\Delta t$ (with sampling period $\Delta t$) may be expressed as:
$$r_{i,k} = d_{i,k} + b_{i,k} + n_{i,k} \tag{1}$$
$$d_{i,k} = h_i(\theta_k) = \sqrt{(x_k - x_i)^2 + (y_k - y_i)^2} \tag{2}$$
where $d_{i,k}$ and $r_{i,k}$ are the true and measured distances between the MS and the $i$th BS, respectively; $x_k$ and $y_k$ represent the MS coordinates at sampling instant $k$, and $\theta_k = [x_k\ y_k]^T$ is the corresponding position vector; while $[x_i\ y_i]^T$ denotes the coordinates of the $i$th BS. The term $n_{i,k}$ represents measurement noise, and $b_{i,k}$ captures the additional bias term associated with the $i$th MS-BS link. The function $h_i(\theta_k)$ in (2) is a nonlinear mapping giving the range in terms of the MS and BS coordinates. The complete measurement model, with a total number $M$ of MS-BS links at a given sampling instant $k$, may be written in compact vector representation as follows:
$$r_k = h(\theta_k) + b_k + n_k \tag{3}$$
with $r_k = [r_{1,k} \cdots r_{M,k}]^T$, $h(\theta_k) = [h_1(\theta_k) \cdots h_M(\theta_k)]^T$, $b_k = [b_{1,k} \cdots b_{M,k}]^T$ and $n_k = [n_{1,k} \cdots n_{M,k}]^T$. The measurement noise $n_k$ is typically modeled as a zero-mean white Gaussian process with a diagonal covariance matrix $R$ that depends on the timing resolution of the signaling schemes and data rates used (i.e., better resolution is obtained with higher rates and smaller bit intervals).
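As an illustration of the range model described above (true distance plus bias plus noise), the following minimal Python sketch simulates noisy TOA-derived ranges. It is not the authors' simulator: the function names, the 1000 m square layout, and the choice of a uniform bias interval given by its two endpoints are assumptions made only for this example.

```python
import math
import random

def true_range(ms_xy, bs_xy):
    """Geometric MS-to-BS distance, i.e. the nonlinear mapping from position to range."""
    return math.hypot(ms_xy[0] - bs_xy[0], ms_xy[1] - bs_xy[1])

def measure_ranges(ms_xy, bs_list, sigma_rng, nlos_links, bias_lo, bias_hi, rng):
    """Simulate r_i = d_i + b_i + n_i for every base station.

    nlos_links is a set of BS indices treated as obstructed (illustrative input);
    the bias on those links is drawn uniformly from [bias_lo, bias_hi].
    """
    ranges = []
    for i, bs in enumerate(bs_list):
        noise = rng.gauss(0.0, sigma_rng)  # zero-mean Gaussian ranging noise
        bias = rng.uniform(bias_lo, bias_hi) if i in nlos_links else 0.0
        ranges.append(true_range(ms_xy, bs) + bias + noise)
    return ranges

# Four base stations at the corners of a square zone; the 1000 m side is an assumed value.
BASE_STATIONS = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
rng = random.Random(1)
ranges = measure_ranges((300.0, 400.0), BASE_STATIONS, 0.5, {1, 2}, 40.0, 60.0, rng)
```

In this sketch, links 1 and 2 carry a positive bias of several tens of meters, while links 0 and 3 differ from the true distance only by sub-meter noise, which is the situation the bias mitigation step is meant to handle.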
On the other hand, the NLOS bias term is modeled as a random process with positive values over a given range that depends on the propagation environment. For dynamically evolving systems, the discrete-time model of the MS trajectory evolution in a 2D plane may be obtained in state space representation as:
$$s_{k+1} = F(s_k) + v_k \tag{4}$$
where $s_k = [x_k\ y_k\ \dot{x}_k\ \dot{y}_k]^T$ denotes the MS motion state at instant $k$; $\dot{x}_k$ and $\dot{y}_k$ are the x-axis and y-axis velocities, respectively. The mapping $F$ may also be expressed as $F(s_k) = \Phi s_k$, where the matrix $\Phi$ is given by:
$$\Phi = \begin{bmatrix} I_{2\times 2} & \Delta t\, I_{2\times 2} \\ 0_{2\times 2} & I_{2\times 2} \end{bmatrix} \tag{5}$$
where $I_{2\times 2}$ is the $2 \times 2$ identity matrix. The noise term is expressed in the form $v_k = G\gamma_k$ with $\gamma_k = [\gamma_{x,k}\ \gamma_{y,k}]^T$ representing an acceleration vector modeled as zero-mean white Gaussian with covariance $Q$, and the matrix $G$ is given by:
$$G = \begin{bmatrix} \tfrac{\Delta t^2}{2}\, I_{2\times 2} \\ \Delta t\, I_{2\times 2} \end{bmatrix} \tag{6}$$
For our purpose, we assume piecewise linear MS trajectories with constant speed and small random acceleration fluctuations. With the given dynamic system model, a robust technique for recursive motion vector estimation is adopted based on Kalman filtering. Noting the inherent nonlinearities in the range measurements, we choose to apply the unscented Kalman filter (UKF), which is known to offer better robustness against nonlinearities compared to other algorithms such as the extended Kalman filter (EKF) [23]. For completeness, the UKF main steps are presented herein as in [24], first for bias-free operation. Bias estimation and mitigation will subsequently be integrated with location tracking, as it is a crucial factor for achieving the reliable proximity tracing sought in this work. The UKF algorithm processes a set of sample vectors (sigma points) propagated through the system model and used to estimate the given $N$-dimensional state vector, starting from an initial state $\hat{s}_{0|0}$ and recursively alternating between two phases: prediction and filtering.
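The motion model in this paragraph can be sketched as follows. This is a minimal illustration that assumes the standard constant-velocity form of the transition and noise-input matrices (position advanced by velocity times the sampling period, with a random acceleration entering position through half the squared period and velocity through the period). The helper names `step` and `simulate` and the seed are hypothetical.

```python
import random

DT = 0.5  # sampling period in seconds, as quoted later in the simulation setup

def step(state, accel, dt=DT):
    """One step of the constant-velocity model for state [x, y, vx, vy]."""
    x, y, vx, vy = state
    ax, ay = accel
    return [x + dt * vx + 0.5 * dt * dt * ax,
            y + dt * vy + 0.5 * dt * dt * ay,
            vx + dt * ax,
            vy + dt * ay]

def simulate(state0, n_steps, accel_std, rng):
    """Generate a trajectory driven by zero-mean Gaussian acceleration fluctuations."""
    traj = [list(state0)]
    for _ in range(n_steps):
        gamma = (rng.gauss(0.0, accel_std), rng.gauss(0.0, accel_std))
        traj.append(step(traj[-1], gamma))
    return traj

# Slow pedestrian example: 0.2 m/s per axis with 0.002 m/s^2 acceleration fluctuations.
trajectory = simulate([0.0, 0.0, 0.2, 0.2], 100, 0.002, random.Random(0))
```

With these values the mobile drifts roughly 10 m along each axis over 100 samples (50 s), with only small deviations from a straight line.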
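More specifically, at the $k$th sampling time, a set $\xi_{k|k}$ is formed by a concatenation of $(2N+1)$ sample vectors:
$$\xi_{k|k} = [\xi_{1,k|k}\ \ \xi_{2,k|k}\ \cdots\ \xi_{2N+1,k|k}] \tag{7}$$
with columns generated around the current state estimate $\hat{s}_{k|k}$ as $\hat{s}_{k|k} \pm \zeta_n$, $n = 0, \ldots, N$, where $\zeta_0 = 0$ yields the central point and, for $n \geq 1$, $\zeta_n$ is the $n$th column of the matrix $B = \sqrt{(N+\lambda)P_{s,k|k}}$, with tuning parameters $\lambda = (\alpha^2 - 1)N$, $0 < \alpha < 1$, and $P_{s,k|k}$ denoting the current state covariance matrix. The matrix $B$ can be obtained by a Cholesky factorization satisfying $BB^T = (N+\lambda)P_{s,k|k}$. Applying the state transition function $F$ gives the updated sample vectors $\xi_{j,k+1|k} = F(\xi_{j,k|k})$, $j = 1, \ldots, 2N+1$. The new predicted state estimate is then computed by a weighted sum of the elements $\xi_{j,k+1|k}$ obtained as:
$$\hat{s}_{k+1|k} = \sum_{j=1}^{2N+1} w^m_j\, \xi_{j,k+1|k} \tag{8}$$
where the weights are specified by $w^m_1 = \lambda/(N+\lambda)$ and $w^m_j = 1/(2(N+\lambda))$, $j = 2, \ldots, 2N+1$. Similarly, the predicted measurement output update is also found by taking another weighted sum with elements obtained from the set $\{\psi_{j,k+1|k} = h(\chi_{j,k+1|k})\}$, where $\chi_{j,k+1|k} = \xi_{j,k+1|k}(1:2)$ denotes the first two location-related components of $\xi_{j,k+1|k}$. We then have:
$$\hat{r}_{k+1|k} = \sum_{j=1}^{2N+1} w^m_j\, \psi_{j,k+1|k} \tag{9}$$
On the other hand, the prediction state covariance matrix $P_{s,k+1|k}$ update is computed from:
$$P_{s,k+1|k} = \sum_{j=1}^{2N+1} w^c_j\, (\xi_{j,k+1|k} - \hat{s}_{k+1|k})(\xi_{j,k+1|k} - \hat{s}_{k+1|k})^T + GQG^T \tag{10}$$
with weights specified as $w^c_1 = 3 - \alpha^2 + \lambda/(N+\lambda)$ and $w^c_j = 1/(2(N+\lambda))$, $j = 2, \ldots, 2N+1$. Similarly, the output covariance update is found by:
$$P_{r,k+1|k} = \sum_{j=1}^{2N+1} w^c_j\, (\psi_{j,k+1|k} - \hat{r}_{k+1|k})(\psi_{j,k+1|k} - \hat{r}_{k+1|k})^T + R \tag{11}$$
while the cross-covariance update is obtained from:
$$P_{sr,k+1|k} = \sum_{j=1}^{2N+1} w^c_j\, (\xi_{j,k+1|k} - \hat{s}_{k+1|k})(\psi_{j,k+1|k} - \hat{r}_{k+1|k})^T \tag{12}$$
Next, in the filtering step, the updated filtered state estimate is obtained according to:
$$\hat{s}_{k+1|k+1} = \hat{s}_{k+1|k} + K_{k+1}(r_{k+1} - \hat{r}_{k+1|k}) \tag{13}$$
where the Kalman gain $K_{k+1}$ is given by:
$$K_{k+1} = P_{sr,k+1|k}\, P_{r,k+1|k}^{-1} \tag{14}$$
and the filtered state covariance matrix is updated as:
$$P_{s,k+1|k+1} = P_{s,k+1|k} - K_{k+1} P_{r,k+1|k} K_{k+1}^T \tag{15}$$
Taking NLOS bias correction into account, an efficient approach was applied with the EKF [23] and is extended herein with the UKF filtered state estimation [24].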
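To make the sigma-point construction and the prediction step concrete, here is a minimal pure-Python sketch. It assumes the standard UKF mean weights (central weight $\lambda/(N+\lambda)$, remaining weights $1/(2(N+\lambda))$), uses an illustrative $\alpha = 0.5$, and propagates the points through a bias-free constant-velocity transition. The function names are hypothetical, and the measurement update is omitted.

```python
import math

def cholesky(A):
    """Lower-triangular B with B B^T = A, for a symmetric positive definite A."""
    n = len(A)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(B[i][k] * B[j][k] for k in range(j))
            if i == j:
                B[i][j] = math.sqrt(A[i][i] - s)
            else:
                B[i][j] = (A[i][j] - s) / B[j][j]
    return B

def sigma_points(s_hat, P, alpha=0.5):
    """Return the 2N+1 sigma points with their mean and covariance weights."""
    n = len(s_hat)
    lam = (alpha ** 2 - 1.0) * n
    B = cholesky([[(n + lam) * P[i][j] for j in range(n)] for i in range(n)])
    cols = [[B[i][c] for i in range(n)] for c in range(n)]
    pts = [list(s_hat)]
    pts += [[s + z for s, z in zip(s_hat, col)] for col in cols]
    pts += [[s - z for s, z in zip(s_hat, col)] for col in cols]
    side = 1.0 / (2.0 * (n + lam))
    w_m = [lam / (n + lam)] + [side] * (2 * n)  # assumed standard mean weights
    w_c = [3.0 - alpha ** 2 + lam / (n + lam)] + [side] * (2 * n)
    return pts, w_m, w_c

def transition(s, dt=0.5):
    """Noise-free constant-velocity transition for [x, y, vx, vy]."""
    x, y, vx, vy = s
    return [x + dt * vx, y + dt * vy, vx, vy]

def predict_state(s_hat, P, alpha=0.5):
    """Predicted state as the weighted sum of the propagated sigma points."""
    pts, w_m, _ = sigma_points(s_hat, P, alpha)
    prop = [transition(p) for p in pts]
    return [sum(w * p[i] for w, p in zip(w_m, prop)) for i in range(len(s_hat))]

P0 = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0],
      [0.0, 0.0, 0.01, 0.0], [0.0, 0.0, 0.0, 0.01]]
s_pred = predict_state([10.0, 20.0, 0.2, 0.2], P0)
```

Because the transition used here is linear, the weighted sum reproduces the transition applied to the estimate itself; the benefit of the sigma points appears when they are passed through the nonlinear range mapping in the measurement prediction.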
First, for the target position state $\hat{\theta}_{k|k}$ of interest, a reduced measurement model is adopted in the form:
$$\rho_k = H_k \theta_k + b_k + n_k$$
where $H_k$ is the Jacobian matrix of $d_k$ with respect to $[x, y]$ at time instant $k$, and is given by:
$$H_k = \begin{bmatrix} \dfrac{x_k - x_1}{d_{1,k}} & \dfrac{y_k - y_1}{d_{1,k}} \\ \vdots & \vdots \\ \dfrac{x_k - x_M}{d_{M,k}} & \dfrac{y_k - y_M}{d_{M,k}} \end{bmatrix}$$
For a specific bias vector, the location estimate $\hat{\theta}_{k|k}$ may be found as $\hat{\theta}_{k|k} = \tilde{\theta}_{k|k} - \Xi_k b_k$, where $\tilde{\theta}_{k|k}$ is a bias-free estimate obtained as $\tilde{\theta}_{k|k} = \Xi_k \rho_k$, with the matrix $\Xi_k$ given by
$$\Xi_k = (H_k^T R^{-1} H_k)^{-1} H_k^T R^{-1}$$
The required bias estimation may be computed using an adjusted observation variable $z_k = \rho_k - H_k \tilde{\theta}_{k|k}$. More specifically, this can be formulated as a least squares estimation problem with:
$$z_k = L_k b_k + \varepsilon_k$$
where $L_k = I - H_k \Xi_k$, and $\varepsilon_k$ denotes a noise term with covariance given by $L_k R L_k^T$. Additionally, the bias components have confined ranges as:
$$b^L_{i,k} \leq b_{i,k} \leq b^U_{i,k}, \quad i = 1, \ldots, M$$
where the lower bounds are $b^L_{i,k} = 0$, while the upper bounds can be set as [18]:
$$b^U_{i,k} = \min_j \{ r_{i,k} + r_{j,k} - l_{ij} \}, \quad j = 1, \ldots, M,\ j \neq i$$
with $l_{ij}$ denoting the distance between the $i$th and $j$th BSs. The NLOS bias vector $b_k$ may then be estimated by solving the constrained least squares minimization problem with a quadratic cost function $J(b_k)$ in the residual $z_k - L_k b_k$, subject to the above bounds, which can be readily addressed with standard optimization techniques. The estimated bias vector, denoted by $\hat{b}_k$, can then be incorporated into the UKF processing flow to correct the filtered state estimate in Eq. (13). In the sequel, the bias-mitigated UKF algorithm will be applied to track mobile trajectories as a precursor to the proximity detection analysis. Before proceeding, to demonstrate its viability, we present comparative results for the residual localization error obtained by the proposed method and other commonly applied localization techniques, including RSS and WLS-based approaches. Since RSS data is commonly accessible in mobile devices, it has been recently utilized for localization scenarios as highlighted in [5] and [6].
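The bounded bias estimation step can be illustrated with a small sketch. It is not the optimizer used in the paper: it assumes an unweighted quadratic cost, solves the box-constrained least squares problem by plain projected gradient descent, and takes the upper bounds from the triangle inequality between two measured ranges and the inter-BS distance (noise neglected). All names and the toy numbers are hypothetical.

```python
import math

def upper_bounds(ranges, bs_list):
    """Triangle-inequality bound: bias_i <= min over j != i of (r_i + r_j - l_ij)."""
    m = len(ranges)
    bounds = []
    for i in range(m):
        bounds.append(min(
            ranges[i] + ranges[j]
            - math.hypot(bs_list[i][0] - bs_list[j][0], bs_list[i][1] - bs_list[j][1])
            for j in range(m) if j != i))
    return bounds

def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bounded_least_squares(L, z, lower, upper, iters=500):
    """Projected gradient descent for min ||z - L b||^2 subject to lower <= b <= upper."""
    Lt = [list(col) for col in zip(*L)]
    # Step size from the bound 2*||L||_F^2 on the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(a * a for row in L for a in row))
    b = list(lower)
    for _ in range(iters):
        resid = [zi - yi for zi, yi in zip(z, matvec(L, b))]
        grad = [-2.0 * g for g in matvec(Lt, resid)]
        # Gradient step followed by projection onto the box constraints.
        b = [min(max(bi - step * gi, lo), up)
             for bi, gi, lo, up in zip(b, grad, lower, upper)]
    return b

# Toy problem: z is generated from the bias vector [20, 0] with no noise.
L_toy = [[1.0, -0.5], [-0.5, 1.0]]
b_hat = bounded_least_squares(L_toy, [20.0, -10.0], [0.0, 0.0], [40.0, 40.0])
```

In practice a dedicated quadratic programming or bounded least squares routine would normally replace the hand-written loop; the loop is kept here only so the example has no dependencies.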
In particular, for the proximity applications relevant to this work, several techniques based on RSS data processing were evaluated in [6] for the purpose of joint inter-mobile distance estimation and proximity contact tracing, and the approach found to yield the best performance was based on a "spring model" (denoted here by RSS-SM). This approach solves the localization problem iteratively in a distributed manner by minimizing the stress on each mobile node, where stress is analogous to the forces that would arise if each node-to-node distance were a stretched or compressed spring. The mapping of the RSS measurements to distances is based on an underlying pathloss model that encompasses a distance exponent term and random shadowing. Another standard method included for comparison is based on WLS minimization of a cost function given by the sum of the squared errors between the noisy TOA-based MS-BS range estimates and their respective true distances [11], [14], with larger weighting factors applied to favor unobstructed range measurements and smaller ones to penalize less reliable obstructed ones. Figure 2 shows cumulative distribution functions (CDFs) for the residual localization errors (differences between true and estimated locations) generated with simulation runs for slowly moving mobiles ($v_x = v_y = 0.2$ m/s, and $\gamma_x = \gamma_y = 0.002$ m/s$^2$), where TOA range measurements are affected by a 50 m mean bias error and have a 0.5 m standard deviation (STD). For the RSS model, a distance pathloss exponent of 3 is used with 4 dB log-normal shadowing [6]. The results demonstrate the viability of the UKF-based localization, which has the smallest residual error (reflected by the left-most CDF) as it benefits from efficient Kalman motion state tracking with bias error mitigation integrated in its filtering step. The WLS-based approach, albeit simpler to implement, remains hindered by larger residual errors.
On the other hand, the RSS spring-model (RSS-SM) scheme performs better than WLS, but it is heavily dependent upon the precise mapping of several RSS measurements to their corresponding distances, which is more challenging at a larger scale. It is also noted that the network-based UKF TOA localization is centralized and uses few, BS-centric measurements, while the RSS-SM is distributed and relies on the collection and processing of a larger volume of RSS data. Subsequent results presented for the network-centric proximity detection scenarios will further establish the merits of our proposed approach.

III. MOBILE PROXIMITY AND CLUSTERING MODELS
The trajectory data obtained by the mobile localization and tracking algorithms is applied for the evaluation of proximity detection and clustering situations. For illustration, a network model is assumed with four base stations and several mobiles moving within the coverage zone at slow pedestrian speed. To assess the detectability of mobiles' close proximity over extended time periods, two specific scenarios will be considered as follows: (i) Case 1: specific mobile users are labeled as reference targets (e.g., disease carriers), and the trajectories of the other users are processed to detect physical proximity and extended exposure (to within specific minimum distance and time period) to the aforementioned targets. (ii) Case 2: the focus is on monitoring clustering situations with multi-user groupings of a given size, which is based on detecting multi-mobile proximity within a small area over a specified time period.
As will be seen, the accuracy of proximity detection for these different scenarios is strongly dependent upon various parameters of interest. To proceed with the analysis, the detection problems at hand are formulated in a binary hypothesis testing framework and quantified in terms of correct detection and false alarm probabilities. More specifically, different hypothesis tests are introduced as follows. For Case 1, Hypothesis $H_1$ corresponds to the true "ground truth" (GT) distance between a given mobile (with actual coordinates $\theta^{(a)}$) and the reference mobile (with actual coordinates $\theta^{(r)}$) falling below a specified distance threshold $d_s$ for a minimum time duration defined by $T_{min}$. With the mobiles' coordinates given by $\theta^{(a)} = [x^{(a)}\ y^{(a)}]^T$ and $\theta^{(r)} = [x^{(r)}\ y^{(r)}]^T$, this is expressed as:
$$H_1:\ \sqrt{(x^{(a)}_k - x^{(r)}_k)^2 + (y^{(a)}_k - y^{(r)}_k)^2} < d_s \tag{21}$$
over a given time index span $k = k_0, \ldots, k_0 + k_{min} - 1$, where $k_{min}\Delta t = T_{min}$ is the observation time.
On the other hand, Hypothesis $H_0$ is assumed when the aforementioned conditions are not met. The target mobiles' proximity testing is then deemed to achieve correct detection when the measured distances (based on the UKF-estimated coordinates $\hat{\theta}^{(a)}$ and $\hat{\theta}^{(r)}$) meet the specified criteria given that hypothesis $H_1$ is true. That is:
$$\sqrt{(\hat{x}^{(a)}_k - \hat{x}^{(r)}_k)^2 + (\hat{y}^{(a)}_k - \hat{y}^{(r)}_k)^2} < d_s \ \ \text{given } H_1 \tag{22}$$
over the specified range. On the other hand, a false alarm occurs when we have:
$$\sqrt{(\hat{x}^{(a)}_k - \hat{x}^{(r)}_k)^2 + (\hat{y}^{(a)}_k - \hat{y}^{(r)}_k)^2} < d_s \ \ \text{given } H_0 \tag{23}$$
Likewise, regarding Case 2, we apply a similar formulation with the added modification that the proximity testing is done for all mobiles that may form a cluster of a given size $S_c$ within a coverage area (disk) of radius $R_c$, while also observing the minimum time duration given by $T_{min}$. Here, it is emphasized that the proximity test condition (as in Eq. (21)) for hypotheses $H_1$ and $H_0$ must be met for all pairwise distances among the different candidate mobiles of a given grouping, and likewise for the correct detection and false alarm tests similar to Eq. (22) and Eq. (23). The different performance metrics for the outlined scenarios are now evaluated using the simulation setup described next.
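The pairwise test for Case 1 can be sketched as a search for a run of consecutive samples whose inter-mobile distance stays below the threshold. The code below is a simplified reading of the formulation, assuming one decision per trajectory pair and a strict inequality; the function names and the toy trajectories are hypothetical.

```python
import math

DT = 0.5                      # sampling period (s)
T_MIN = 60.0                  # minimum exposure time (s), e.g. 1 min
K_MIN = round(T_MIN / DT)     # number of consecutive samples required (120 here)

def has_proximity(traj_a, traj_r, d_s, k_min):
    """True if the distance stays below d_s for at least k_min consecutive samples."""
    run = 0
    for (xa, ya), (xr, yr) in zip(traj_a, traj_r):
        if math.hypot(xa - xr, ya - yr) < d_s:
            run += 1
            if run >= k_min:
                return True
        else:
            run = 0
    return False

def classify(true_pair, est_pair, d_s, k_min):
    """Compare the ground-truth hypothesis with the decision from estimated tracks."""
    h1 = has_proximity(*true_pair, d_s, k_min)
    decided = has_proximity(*est_pair, d_s, k_min)
    if h1:
        return "detection" if decided else "miss"
    return "false alarm" if decided else "correct rejection"

# Toy example: a stationary reference and a mobile 2 m away for 200 samples.
ref = [(0.0, 0.0)] * 200
true_a = [(2.0, 0.0)] * 200
est_good = [(2.5, 0.0)] * 200   # estimate still inside the 3 m threshold
est_bad = [(3.5, 0.0)] * 200    # estimation error pushes it outside
outcome_good = classify((true_a, ref), (est_good, ref), 3.0, K_MIN)
outcome_bad = classify((true_a, ref), (est_bad, ref), 3.0, K_MIN)
```

Counting these outcomes over many Monte Carlo realizations would give empirical estimates of the detection and false alarm probabilities.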

IV. PERFORMANCE ANALYSIS
A. SIMULATION SETUP
To investigate the performance of the proposed schemes, a simulation setup is used with a network model consisting of four base stations placed at the corners of a square-mile coverage zone. For proximity applications, low-mobility users are of interest, and a representative case is selected with $v_x = v_y = 0.2$ m/s, and $\gamma_x = \gamma_y = 0.002$ m/s$^2$. Mobile user tracking is based on NLOS-mitigated UKF localization with a sampling period $\Delta t$ of 0.5 s, and multiple trajectory data sets are collected with Monte Carlo simulations using 8192 samples per trajectory. Multiple instances of mobile agents are initiated with different paths and varying levels of proximity through the coverage area. Range measurements between mobiles and base stations are randomly generated with different noise statistics for TOA range error and NLOS bias (as specified in the subsequent examples). In an initial stage, the true (ground truth) data reflecting the actual mobiles' proximity status is obtained based on a priori knowledge of the mobiles' trajectories with known patterns. Subsequently, realistic trajectory estimation is applied under various impairments to reflect practically achievable localization accuracy. The noisy, estimated trajectories are processed to obtain new a posteriori (possibly erroneous) proximity decisions. Comparisons with the available ground truth data then serve to assess the achievable accuracy of the proposed schemes. To illustrate the scenarios used, Figure 3 considers sample trajectories for two mobiles (denoted by M1 and M2) with their resulting separation distance. As seen in Figure 3-(a), the actual and UKF-estimated trajectories are in very close agreement. Figure 3-(b) also shows the ensuing inter-mobile distance fluctuations.
Assuming in this example a threshold distance $d_s$ of 3 m with $T_{min}$ of 1 min, different proximity decision outcomes are illustrated, including correct detection, missed detection and false alarms, as discussed in Section III.
It is also noted that the complete simulation runs include randomization of multiple mobiles trajectories across the entire coverage area, with multiple Monte Carlo statistical realizations for the various relevant parameters in order to obtain the numerical evaluation results discussed next.

B. EVALUATION RESULTS
Evaluation results are presented for the proposed proximity detection schemes to investigate their performance with respect to different factors. First, comparative results are shown in Figure 4 to further illustrate the advantage of the UKF-based scheme compared to the RSS and WLS ones (similar to Figure 2). Results are given in terms of the change in proximity detection probability $P_{de}$ with increasing inter-mobile separation distance $d_s$. Since in many cases of practical interest the mobiles' proximity exposure time is an important factor, different $T_{min}$ values are investigated, ranging from instantaneous (e.g., 1 s) to more prolonged periods (30 s, 1 min and 2 min). As shown, because UKF-based trajectory tracking is more accurate, inter-mobile distance estimation and subsequent proximity detection are achieved with higher probability. It is also seen that detection accuracy increases for larger $d_s$ (as further discussed subsequently). In addition, it is noted that for smaller timespans the detection probability $P_{de}$ is slightly higher, since the proximity test criterion is more stringent for longer times, leading to increased chances of missed detection of true proximity cases.
FIGURE 5. Impact of the variability in ranging error standard deviation on the accuracy of mobiles' proximity detection at various separation distances.
Another important factor to consider is the ranging measurement precision, which has a strong impact on localization accuracy and inter-mobile distance estimation. For illustration, three values for the ranging error STD $\sigma_{rng}$ are selected as 0.5 m, 1 m and 1.5 m, with two BS links assumed to undergo uniform NLOS bias with mean 50 m and 20 m spread. A proximity timespan $T_{min}$ of 2 min is also adopted. The results shown in Figure 5 give the proximity detection probability $P_d$ as a function of the mobiles' separation distance $d_s$. As can be observed, the correct detection rate is noticeably improved with increasing ranging accuracy (i.e., smaller $\sigma_{rng}$), which is attributable to the fact that lower timing synchronization errors yield better TOA estimation and hence improved tracking. Also, as expected, proximity detection is sensitive to the threshold $d_s$. For example, with $d_s = 4$ m, $P_d$ increases above 0.8 (for all $\sigma_{rng}$ values) and exceeds 0.9 with $\sigma_{rng} = 0.5$ m. On the other hand, it is lower for smaller $d_s$ separation (e.g., staying below about 0.5 for $d_s = 1$ m), which is due to the residual error floor affecting the mobiles' location estimates and consequently their relative proximity detection. With regard to false alarm performance, the obtained $P_{fa}$ results gave very small values in a narrow 0.5%-to-1.5% range across the $d_s$ distance span, and there was little variation with respect to the ranging STD. This observation is attributed to the fact that, when two given mobiles are far apart, there is a small likelihood that errors in their estimated locations will be large enough to lead to a false proximity alarm.
In another aspect, Figure 6 illustrates the impact of NLOS bias error with $\mu_{nlos}$ values of 25 m, 50 m and 100 m. The ranging error STD $\sigma_{rng}$ is held fixed at 0.5 m. As seen, the proximity detection rates show small variability, with lower bias yielding better accuracy as expected. This is attributed to the efficiency of the NLOS mitigation integrated with UKF tracking, which largely reduces the impact of the bias. It is also noted that, similar to the previous example, the false alarm rates remained quite low, around 1% on average. Additional results in Figure 7 further illustrate the impact of varying the number of NLOS BS links, where one, two and three NLOS BSs are assumed with $\sigma_{rng}$ and $\mu_{nlos}$ set at 0.5 m and 50 m, respectively. Here, the results, benchmarked against the all-LOS case, show a smooth degradation with an increasing number of NLOS links. With regard to multi-user clustering scenarios, similar experiments were carried out to analyze events where multiple users, with a given cluster size of $S_c$ mobiles, come in close proximity within a specific radius $R_c$. For illustration, operating parameters are chosen with ranging error $\sigma_{rng}$ of 0.5 m, two NLOS links and $\mu_{nlos}$ of 50 m. Results are shown in Figure 8, which gives the variability of cluster detection probability in terms of users' vicinity radius and cluster size. As can be noticed, when the cluster size is relatively small, better detection rates are achieved, and they are consistently higher as the cluster radius is extended. For example, with a cluster radius of 3 m, the detection probability is approximately 0.75, 0.65, 0.5 and 0.35 for cluster sizes of 4, 5, 6, and 7 users, respectively. This is mainly because, for larger groups, the requirement to have all pairwise distances simultaneously within the threshold limit is more difficult to achieve. However, the detection probability improves considerably as the cluster radius increases.
It is thus seen that for practical purposes, with the aforementioned system modeling, groupings of 4 to 5 users within a vicinity of 3 to 4 m radius are detectable with accuracy rates in excess of 70%. Likewise, it was again found that for these clustering scenarios, false alarm probabilities remained very low (in 1% range), as in the previous cases.
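The all-pairs requirement behind these clustering results can be sketched as follows. This is an illustrative reading only: the pairwise threshold `d_max` is left as a free parameter because the exact mapping from the cluster radius to a pairwise distance limit is not spelled out here, and the function names are hypothetical.

```python
import math
from itertools import combinations

def cluster_at(positions, d_max):
    """True when every pairwise distance among the given positions is below d_max."""
    return all(math.hypot(p[0] - q[0], p[1] - q[1]) < d_max
               for p, q in combinations(positions, 2))

def cluster_detected(trajs, d_max, k_min):
    """True if the all-pairs condition holds for k_min consecutive samples."""
    run = 0
    for snapshot in zip(*trajs):   # positions of all mobiles at one sampling instant
        run = run + 1 if cluster_at(snapshot, d_max) else 0
        if run >= k_min:
            return True
    return False

# Four stationary mobiles on the corners of a 1 m square, observed for 10 samples.
square = [[(0.0, 0.0)] * 10, [(1.0, 0.0)] * 10, [(1.0, 1.0)] * 10, [(0.0, 1.0)] * 10]
found = cluster_detected(square, 3.0, 10)
```

A group of $S_c$ mobiles involves $S_c(S_c - 1)/2$ pairwise checks (6 pairs for 4 users, 21 for 7), so a single poorly estimated position can break the condition, which is consistent with the lower detection rates reported for larger groups.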
In summary, as illustrated with these various examples, mobiles' trajectory estimation by means of NLOS-mitigated UKF tracking provides reliable data for detecting pairwise mobile proximity as well as multi-mobile clustering cases with acceptable accuracy. The achievable performance of proximity detection was also investigated with respect to various relevant factors including inter-mobile distance separation, exposure time span, TOA ranging error and NLOS bias conditions. The different examples were based on processing the mobiles' raw trajectory data using the conventional Euclidean measure for distance evaluation. Next, we reconsider the proximity detection problems at hand when more efficient, reduced data representations and different distance measures are applied.

V. PERFORMANCE ANALYSIS WITH DATA REDUCTION
We explore data reduction techniques to deal with the challenge of large data volumes, especially when many mobiles are involved. In particular, we consider the application of the discrete Haar transform (DHT) to reduce the raw trajectory time series into vectors of "reducts" by only keeping their principal components. The DHT is selected for its simplicity and low computational requirements, while providing acceptable performance as shown in the sequel. More specifically, for a given input time series (with a length that is a power of 2), the DHT proceeds at a specific level by taking two consecutive data elements to generate a new pair that includes a scaled average and a scaled difference of these elements. As such, the DHT produces reducts at a given level from the data elements of the level above it. To illustrate, let $\{x_i\}$ denote a time series of real values. The DHT reduct obtained from data elements $x_i$ and $x_{i+1}$ is given by $\{\tfrac{1}{\sqrt{2}}(x_i + x_{i+1}),\ \tfrac{1}{\sqrt{2}}(x_i - x_{i+1})\}$. For our purpose, the principal (i.e., average) components in the DHT output will be kept for mobile trajectory reduction and subsequent proximity processing, thus yielding a 50% reduction in data volume with each additional DHT level.
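The reduction step can be sketched in a few lines. The sketch assumes the orthonormal Haar scaling of $1/\sqrt{2}$ for both the average and the difference terms (the exact scale factor is an assumption here) and keeps only the average components at each level. The function names are hypothetical.

```python
import math

def haar_level(x):
    """One Haar level on an even-length series: scaled averages and scaled differences."""
    s = 1.0 / math.sqrt(2.0)
    averages = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    details = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return averages, details

def reduce_series(x, levels):
    """Keep only the average components after the given number of Haar levels."""
    reduced = list(x)
    for _ in range(levels):
        reduced, _details = haar_level(reduced)
    return reduced

# A trajectory coordinate with 8192 samples shrinks to 4096, 2048 and 1024 reducts.
series = [0.1 * k for k in range(8192)]
sizes = [len(reduce_series(series, level)) for level in (1, 2, 3)]
```

With this scaling, a level-$\ell$ reduct equals the block mean multiplied by $2^{\ell/2}$, so any distance threshold applied to the reducts would need the same rescaling (or the reducts could be divided by that factor first); how this is handled in the reported experiments is not specified here.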
As can be noted, using reduced data allows for faster processing and lower storage requirements, but it is also expected to affect the detection accuracy. It is therefore important to explore this tradeoff, which also depends on the reduction level applied. To this end, we consider experiments on proximity scenarios of interest using DHT-reduced, UKF-estimated trajectory data. For space considerations, results are limited to the pairwise two-user proximity case. Figure 9 outlines the variation of detection accuracy with DHT reduction levels 1, 2 and 3. As shown, some accuracy degradation is clearly noticeable at higher levels, which results from the cumulative loss of data trend as more reduction is applied. In particular, with DHT level 1 and distances of 2-to-3 m, correct detection can still be maintained above the 60-to-70% range, while this drops to 50-to-60% with level 2. Given that a substantial reduction in data volume is achieved (50% at level 1 and 75% at level 2), this offers a good tradeoff between accuracy and storage requirements. Separately, it is also of interest to explore the application of different distance measures in conjunction with the reduced data representations. To this end, we compared three measures, namely the Euclidean, Manhattan and Chebyshev distances. Figure 10 illustrates the variation in proximity detection accuracy with distance measure when level-1 DHT-reduced data is used, and it is noticed in particular that the Chebyshev measure offers some performance improvement in this case.
To further illustrate this aspect, Table 1 gives additional results when deeper reduction levels are applied. The same order of merit is again maintained: the Chebyshev distance achieves the best detection accuracy, followed by the Euclidean and then the Manhattan distance. This is mainly attributed to the fact that, with reduced (i.e., averaged) data, the sharper Chebyshev measure still keeps better track of the discrepancies and data trends in the mobiles' trajectories.
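For reference, the three measures can be sketched as below using their standard definitions. The sketch assumes they are applied sample by sample to 2-D positions and that proximity is declared when the distance does not exceed a threshold. The helper `proximity_fraction` and the sample coordinates are hypothetical and do not reproduce the hypothesis test defined earlier in the paper.

```python
import math


def euclidean(p, q):
    """L2 distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chebyshev(p, q):
    """L-infinity distance: largest absolute coordinate difference."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def proximity_fraction(track_a, track_b, threshold_m, distance=euclidean):
    """Fraction of time samples at which the two tracks lie within the threshold."""
    if len(track_a) != len(track_b) or not track_a:
        raise ValueError("tracks must be non-empty and of equal length")
    hits = sum(1 for p, q in zip(track_a, track_b) if distance(p, q) <= threshold_m)
    return hits / len(track_a)


if __name__ == "__main__":
    # Hypothetical positions in metres; mobile A stays at the origin.
    track_a = [(0.0, 0.0)] * 4
    track_b = [(1.5, 1.5), (1.0, 0.5), (2.5, 0.0), (1.9, 0.5)]
    for measure in (chebyshev, euclidean, manhattan):
        print(measure.__name__, proximity_fraction(track_a, track_b, 2.0, measure))
```

For any pair of points the Chebyshev distance is no larger than the Euclidean distance, which in turn is no larger than the Manhattan distance. Under a fixed threshold, the Chebyshev measure therefore flags proximity at least as often as the other two.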
Additionally, we consider the impact of changing the prototype mother wavelet by including reduction examples based on the Daubechies db2 and Symlet sym2 wavelets in addition to the Haar wavelet [37], [39]. A small support was chosen for simplicity and better resolution in trajectory data analysis. As can be seen from Table 2, the wavelet impact is quite small for all practical purposes, mainly because the principal components retained in the reduced data representations did not change significantly with the wavelet type. The DHT therefore remains attractive due to its simple implementation and acceptable performance. Further corroborative results are also given in Table 3 based on the Chebyshev distance metric, which show an additional improvement in proximity detection rates compared to the Euclidean case.
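To make the wavelet comparison concrete, the following sketch computes one level of principal (low-pass) components with the Haar and db2 scaling filters. It is an illustration under stated assumptions: periodic boundary extension and this particular filter orientation are choices made here, not details taken from the paper, and wavelet libraries may index or pad differently. The names `LOWPASS` and `approximation` are hypothetical, and sym2 is not covered.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Scaling (low-pass) filter coefficients; each set sums to sqrt(2).
LOWPASS = {
    "haar": [1.0 / SQRT2, 1.0 / SQRT2],
    "db2": [
        (1.0 + SQRT3) / (4.0 * SQRT2),
        (3.0 + SQRT3) / (4.0 * SQRT2),
        (3.0 - SQRT3) / (4.0 * SQRT2),
        (1.0 - SQRT3) / (4.0 * SQRT2),
    ],
}


def approximation(series, wavelet):
    """One reduction level: filter with the scaling filter, then keep every second output."""
    taps = LOWPASS[wavelet]
    n = len(series)
    if n == 0 or n % 2 != 0:
        raise ValueError("series length must be even and non-zero")
    # Periodic extension (an assumption) handles the samples past the end.
    return [
        sum(taps[j] * series[(2 * k + j) % n] for j in range(len(taps)))
        for k in range(n // 2)
    ]


if __name__ == "__main__":
    # Hypothetical, slowly varying coordinate track in metres.
    track = [10.0, 10.4, 10.9, 11.3, 11.6, 12.0, 12.5, 12.9]
    haar = approximation(track, "haar")
    db2 = approximation(track, "db2")
    for h, d in zip(haar, db2):
        print(round(h, 3), round(d, 3))
```

Both filters have short support (two and four taps), so each retained component summarizes only a few neighbouring samples of the original track.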

VI. CONCLUSION
The paper introduced trajectory estimation and data reduction techniques for mobile users' proximity tracking in cellular networks. An algorithm employing UKF processing with NLOS bias correction was shown to yield accurate trajectory tracking, which was then used to detect user proximity and clustering scenarios. A performance evaluation demonstrated the accuracy of proximity detection, with a sensitivity analysis covering proximity distance and time span, ranging accuracy and bias statistics. Results showed that correct detection rates exceeding 70% are feasible (with low false alarm rates) for many scenarios of interest. In a second phase, DHT data reduction was applied to lower storage requirements, and additional results were presented to quantify the tradeoffs between reduction level and proximity detection accuracy. Furthermore, different distance measures and wavelet types were compared, and it was found that the Chebyshev distance offers some improvement in detection reliability, while the wavelet type had no major impact. Overall, the results support the proposed DHT-based approach for proximity detection and contact tracing, owing to its low complexity and good performance at moderate reduction levels.