CSI-Based Proximity Estimation: Data-Driven and Model-Based Approaches

Proximity estimation forms the backbone of contact tracing solutions, as quarantining potentially infected individuals is essential for controlling the spread of epidemics. It is also essential for device discovery in Device-to-Device (D2D) and Vehicle-to-Vehicle (V2V) communications, which will be critical in future 6G networks. Despite the widespread coverage of cellular networks, no previous work has evaluated cellular-based proximity estimation in an experimental setting. In this paper, we collect Channel State Information (CSI) measurements from an actual cellular network and utilize them to train and evaluate two proposed solutions. Capitalizing on CSI spatial correlation, we propose a data-driven method to classify devices based on their respective proximity. The proposed neural network has an accuracy of 91.18% when classifying devices as within 5 meters from one another or not, from only 10 seconds of CSI measurements. This method is complemented by a model-based approach that provides a solid theoretical model for estimating proximity. The model-based approach uses Bayesian inference of the conditional distribution of power correlation across devices, assuming spatially-correlated shadowing and stochastic geometry tools. The proposed Bayesian regressor fits our dataset better than the standard exponentially-decaying correlation model, reaching a $2.8 \times $ lower RMSE while requiring fine-tuning of only two more parameters. A Bayesian classifier is also proposed and reached a 91.67% accuracy on the binary case while also outperforming the data-driven model significantly at higher distances.


I. INTRODUCTION
D UE TO the COVID-19 pandemic, there has been a surge of interest in proximity estimation, where the goal is to identify individuals or devices in physical proximity to each other.Contact tracing solutions provide essential information for controlling epidemics; they rely on proximity estimation, as it allows for identifying and keeping track of social encounters between individuals.Furthermore, social networking, Device-to-Device (D2D) communications [1], [2], and Vehicle-to-Vehicle (V2V) communications [3] are among the emerging applications that motivate research on efficient proximity estimation algorithms and solutions.Knowledge of the proximity between cellular devices allows for interference management [4], [5], which is critical for achieving high throughputs in spatially-and spectrally-dense future 6G networks.Other applications of proximity estimation include but are not limited to advertising, content creation [6], and crowd counting.The three main tracks for proximity estimation research are: • Localization-based [3], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]: each device's position is estimated, and then the distance separating the devices is computed.The most widely used localization techniques are multilateration and fingerprinting [7].
The multilateration technique uses the ranges from the target to multiple Base Stations (BSs) to localize a device on two-or three-dimensional space [8], [9].Although this technique works well outdoors and forms the backbone for Global Navigation Satellite Systems (GNSSs), multilateration fails to provide accurate estimates when the target is indoors since there is rarely a line of sight in such cases.Fingerprinting, in contrast, is based on a measurement campaign performed across a predefined area or volume, which takes measurements of the channel response at several points, resulting in a database of channel characteristics that each point experiences [3], [10], [11], [12], [13], [14], [15], [16], [17], [18].Positioning is performed by selecting the position that best fits the observed channel measurements.Fingerprinting solutions require extensive and expensive measurement campaigns and are rendered useless as soon as the mapped environment changes; • Beacon-based [19], [20], [21], [22], [23], [24], [25], [26], [27]: each device broadcasts its beacon signal to notify other devices; devices measure the received signal and then classify the sender as in proximity or not.Such estimation can be decentralized or centralized (i.e., network-assisted, when cellular networks are used).Beacon-based estimation has been widely used for proximity estimation.However, the main issue with beacon-based estimation is the high power needed, a major concern in power-constrained systems, e.g., battery-powered ones [28].Power is not only required during beacon transmission but also on the receiving device that must keep its radio on to record incoming beacons; • Correlation-based [2], [29], [30], [31], [32], [33]: this approach builds upon the spatial correlation of radio propagation components.Components that show high spatial correlation are mainly large-scale fading (or shadowing) and delay spread [2]; measurements of these components and others form up the Channel State Information (CSI).The proximity between devices can be estimated by analyzing how similar CSI components are among them.Since the CSI is collected by default in commercial standards for channel coding and Quality of Service (QoS) maximization, no extra signaling or power is needed.The problem of proximity estimation is tackled via a model-based approach or a data-driven approach.In this work, we demonstrate how Long Term Evolution (LTE) CSI can be explored for proximity estimation with both data-driven and model-based approaches independently.Our solutions capitalize on spatial correlation, an attribute present in all existing wireless channels [34] and even more significant in dense 5G [35] and future 6G networks.Thus, our approaches are expected to outperform current results when deployed in future networks.

A. RELATED WORK
Existing work on proximity detection can be categorized with respect to their methodology (localization-, beacon-, or correlation-based), radio technology (Bluetooth, WiFi, or cellular), and whether their techniques are primarily modelbased or data-driven.
There are numerous beacon-based studies.While most rely on Bluetooth [19], [20], [21], [22], [23], there are works based on WiFi [24], cellular [25], [26], [27], and ZigBee [24].Some works also integrate other smartphone sensors, such as accelerometer and gyroscope [19], [23].In [19], the authors reported up to 92% accuracy for classifying devices as within 2 m or not.They used Machine Learning (ML) techniques on Bluetooth, accelerometer, and gyroscope data.Although the World Health Organization (WHO) recommended social distancing of 2 m at first, recent work showed an infection still is likely to happen up to 10 m away from the infected [36].Despite applying data-driven methods, they design and extract features from data first, i.e., mixing data-driven and model-based methods.This is a common approach [21], but it is known that deep Neural Networks (NNs) can learn abstract features and discover hidden patterns from data without the need for handmade features.In [20], a purely data-driven approach is applied; a 1D convolutional NN was used.On the opposite end of the spectrum, in [25], a purely model-based study was performed using stochastic geometry modeling, but no experimental results were reported.When D2D peer discovery is the motivation, research work often follows the beacon-based approach [1], [28].Recent publications investigated new coordination techniques for managing interference and power issues.
Correlation-based approaches have been investigated using Bluetooth [29], [30], WiFi [31], [32], and cellular networks [2].In [31], the authors follow a purely modelbased approach by applying a linear classifier on WiFi CSI sub-carrier amplitude correlation and covariance, reaching up to 98% accuracy when classifying individuals as within 2 m or not.However, their goal is to detect motion proximity; their study does not cover static individuals.In [30], the authors employ both the beacon-and the correlation-based strategies.In addition to D2D Bluetooth beacons, smartphone microphones were used to build ambient sound amplitude histograms, which are compared between phones using the Kullback-Leibler divergence.The authors reported up to 86% accuracy when classifying devices as within 1.5 m or not.In [2], a model-based approach is employed on cellular networks for D2D device discovery.However, CSI correlation is used only to shortlist potential peers, and then a beacon-based strategy is used.The authors do not report the results on experimental data.In [29], [32], [33], the order of the closest access points is compared between devices, using Bluetooth (iBeacons) [29], WiFi (access points) [32], or cellular networks (BSs) [33], respectively.Although multiple contact tracing solutions have targeted 2 m as the critical distance [19], [20], [21], [22], [23], [24], [25], [26], [27], recent works have shown that infection also occurs at higher ranges [36].Similarly, in D2D, device discovery detection at higher distances is also targeted.
In summary, work in proximity estimation follows either the localization-or beacon-based tracks, which incur larger infrastructure and higher device power requirements, respectively.Correlation-based techniques are desirable since they can be implemented from existing infrastructure and require no additional signaling.However, no empirical evaluations of correlation-based solutions were performed in cellular networks.There was also no purely data-driven solution proposed in this track.

B. CONTRIBUTIONS AND ORGANIZATION
This paper investigates the proximity estimation problem using CSI measurements gathered from an actual 4G LTE network.Since we rely on received power measurements, the proposed techniques can also be applied in future B5G/6G deployments.As our solutions do not require additional signaling or infrastructure, they are in line with the concept of Integrated Sensing and Communication (ISAC), which will be an integral framework in future 6G networks [37], [38], [39].By pursuing a correlationbased approach, the proposed solution does not require continuous data collection since CSI measurements are inherently correlated in real-time.We propose two solutions: a model-based and a data-driven approaches. 1 To the best of our knowledge, this is the first work that evaluated cellular-based proximity estimation with real data using a correlation-based technique.The main contributions of the paper are summarized as follows: Section II introduces the data-driven model, the dataset, the model architecture, and its performance under empirical measurements.Section III presents a Bayesian inferencebased model as an alternative to the data-driven model and evaluates both classifier and regressor under simulations and empirical data.In Section IV, we compare the proposed models, discuss their efficiency and limitations, and suggest potential future work.Lastly, we conclude the paper in Section V.

II. PROXIMITY ESTIMATION MACHINE LEARNING MODEL
In this section, we explore a data-driven approach to solve the proximity estimation problem through the use of ML.The goal is to use a ML model to estimate the distance separating a pair of devices, denoted by d, given each device's LTE CSI.We begin by describing our data collection setup, data augmentation strategy, and the resulting dataset.We then present the preprocessing steps, the proposed ML model, and its variations.Finally, we show the performance of the proposed model applied to the collected data and discuss the observed results.

A. EXPERIMENT SETUP
We start by collecting data from practical experiments, which will be used to validate our proposed models.Specifically, downlink Reference Signal Received Power (RSRP) and Global Positioning System (GPS) measurements were captured from multiple phones.We performed two different experiments: Experiment 1 (Ex.#1) used three measuring devices while only two were used in Experiment 2 (Ex.#2).The used devices were Samsung TM Galaxy Note20 smartphones, equipped with Nemo Handy software, which allows us to access the CSI logs collected by each device.
Ex. #1 was focused on longer distances; devices moved such that their D2D distances and speeds were kept approximately constant.The experiment consisted of three devices driving around the campus neighborhood along the path highlighted in Fig. 1(a).Two devices were driven in a car, while another followed behind on a bike, such that a roughly fixed distance was kept between them.A picture taken during the experiment is shown in Fig. 1(b).The same path was traversed twice: first at a medium separation distance (≈ 40 m) and then at a relatively larger distance (≈ 120 m).Ex. #2 consisted of two devices moving together and aimed to cover a wider range of propagation scenarios and movement patterns and augment our dataset.Ex. #2 was repeated across residential, highway, and suburban areas, both by car and walking.Ex. #1 lasted 42 minutes, while Ex.#2 lasted 58 minutes.Three individuals were actively involved in (and consented to) the data collection.

B. DATASET
In the abovementioned experiments, we collected long sequences of RSRP measurements and global coordinates (latitude and longitude) in real scenarios.The objective is to obtain a model that can estimate the D2D distance from short RSRP sequences; thus, we split long sequences into smaller subsequences.We preprocessed the collected samples and assembled a dataset (training and test sets) by pairing subsequences and labeling them with their average distance at the time of recording.The average distance is computed from the global coordinates recorded along with the RSRP measurements.
As MIMO-enabled systems become widespread, multiple RSRP sequences are available from each phone; specifically, one for each transmitter-receiver antenna pair.All combinations of two subsequences are paired in the dataset, provided they are from different antennas.Pairs can be categorized into two types defined as follows.
Definition 1 (P2P and P2S): Phone-to-Phone (P2P) pairs are made of subsequences measured from antennas in different phones.Phone-to-Self (P2S) pairs are made of subsequences measured from antennas from the same phone.A pair's labeled distance is the average distance between the phones at the times of recording, e.g., zero in case of P2S pairs recorded at the same time.
P2P and P2S pairs are depicted in Fig. 2. Since we aim to estimate the distance between different phones, we consider only P2P samples in the test set, while P2S samples are used for augmenting the training set.P2S data is an alternative that works indoors since it does not rely on GPS positioning.In P2S pairs, the distance between the phone and itself is either zero (static device) or non-zero (moving device), in which case it is measurable through the device's accelerometers.Indoor data collection campaigns must consider P2S samples.
Although we are interested in real-time fingerprinting, we cannot enforce that the subsequences are collected simultaneously due to hardware limitations.Since each device has its own variation in clock frequency (although they are assumed to be relatively close) and might be offline sporadically (not collecting CSI), the pairs of subsequences do not have to share identical timestamps.Due to the fast variations in the wireless channel (i.e., short coherence time), the temporal correlation between measurements taken at the same location fades with time.To overcome the problem of time-varying radio channels, we impose a maximum time difference t max between the subsequences' times of recording, a limit below which the propagation channel is still relatively correlated.All subsequence pairs with an average time difference above the threshold are discarded.The enforcement of a maximum time difference t max implies that P2S samples are strongly biased towards lower distances.However, lower distances are especially important since the applications that motivate our work, e.g., contact tracing, D2D communications, and advertising, are more concerned with whether devices are close to one another or not, instead of accurately estimating their distance in meters.
In our dataset, the RSRP sequences are sliced in approximately 10-second-long subsequences with no overlap, i.e., each subsequence, given a sampling frequency of 3 Hz, contains 30 samples.When pairing subsequences, we only consider subsequences recorded at a time difference of at most t max = 30 seconds.A high t max yields a larger dataset, as more pairs can be made; however, pairs with sequences recorded at instants that have time difference above the channel coherence time, even if in the same environment, might experience different radio propagation and thus render misleading samples.We found t max = 30 seconds to be the value that best suits this trade-off.In the end, Exs.#1 and #2 yield, respectively, 7189 and 2920 subsequence pairs.Since each device has 4 antennas, the total number of subsequence pairs (both P2P and P2S) becomes much larger.
We choose not to include same-antenna P2S samples in the dataset as they follow a different distribution and thus do not contribute to the diversity of the dataset.This characteristic was observed when performing dimensionality reduction, using both PCA and Uniform Manifold Approximation and Projection (UMAP) [41].We trained the dimensionality reduction algorithms on P2P data and found that the reduced representations of P2P and P2S data differ significantly when same-antenna P2S samples are considered.This distribution shift is a consequence of small-scale fading correlation.The subsequence pairs are then fed to the ML model, as shown in Fig. 3.
The cumulative distributions of the separation distances observed from both experiments are shown in Fig. 5.We noted that the separation distances are biased toward low values, especially for P2S data.This is caused by the maximum time difference t max = 30 seconds.In Ex. #1, distances of around 40 m and 120 m are also common due to the way the experiment was executed.In Ex. #2, although more than half of the distance samples fall under 10 m, some samples range up to 200 m.

C. ML MODEL DEFINITION AND NEURAL NETWORK ARCHITECTURE
After gathering and preprocessing the dataset, we train an ML model to approximate a mapping from two subsequences of RSRP measurements to the physical distance between the source devices (that collected the subsequences) at the time of recording.A diagram of the used model is shown in Fig. 3.The collected RSRP measurements are first processed  before being fed as input to a neural network that outputs the separation distance.By applying a nonlinear function approximator (NN), the model can learn complex features in the data, e.g., correlation, which correlates with the spatial correlation between devices and is not affected by constant attenuations that one or both devices face (e.g., vehicle or building walls).
Most contact tracing applications do not require highaccuracy estimation: detecting when devices are within 5 meters of one another or not is usually enough.In light of this, the ranges of distances are split into classes, and the goal of the ML model is to identify which class of distances D i best fits the distance separating the two devices given two subsequences of RSRP measurements as input.The number of distance classes and their respective thresholds is a design choice; more classes can be considered if higher accuracy is required.By taking this simplified approach, model training is expected to require less data to achieve good performance.
To build the distance classifier, an architecture containing a Long Short-Term Memory (LSTM) [42] model followed by a feedforward NN, also referred to as Multilayer Perceptrons (MLPs), is used instead of traditional alternatives (e.g., linear regression) due to the NN's proven ability to approximate nonlinear functions.The proposed NN architecture is shown in Fig. 4. The network consists of a recurrent layer (in our case, an LSTM), which is ideal for encoding time-series data, followed by a MLP with two hidden layers.Each layer's size is a hyperparameter explored in a hyperparameter search when running experiments.Each linear layer is followed by a Leaky Rectified Linear Unit (ReLU) activation function [43] and a batch normalization layer [44] (with the latter varying between active/inactive states in the search).A normalization layer is also added before the first linear layer, and the state of each of them (on or off) is to be optimized in the grid search.The loss metric is the cross-entropy loss function, and the optimizer is ADAptive Moment estimation (Adam) [40].

D. RESULTS
We run a hyperparameter search with various choices of sizes for each layer and whether each layer is followed by batch normalization or not.Table 2 shows the complete set of parameters in the grid search.While the test set is made of P2P samples only (P2S samples were introduced simply as means of data augmentation during training), the training set source is arbitrary, yet likely the most impactful decision.The set of distance classes and their thresholds also significantly impacts the model's performance.Along with the hyperparameter search, we investigate how the model's performance is affected by the choice of distance classes and the source of the dataset.In total, 8656 NNs were trained in our search, each for 100 epochs, for various hyperparameter combinations and random number generator seeds.Table 1 shows statistics of the test set accuracy for various training subsets (named (a)-(f)) and distance thresholds.By investigating the obtained results, the main conclusions are as follows.
• Better accuracy with fewer classes: For all cases, splitting the distances into fewer classes leads to significantly higher accuracy.This is expected since learning to classify between fewer classes is easier, given a model of the same size, and data unbalance (some classes poorly represented in the dataset, e.g., 5 ≤ d ≤ 42) leads to biased models.other experiments.This indicates that such data does not follow the P2P distribution.
• Training only on P2S (from the same experiment) yields good performance: Case (d) shows that training on P2S data only incurs a slight decrease in performance (on average) when fewer classes are considered, while a higher decrease in accuracy (maximum over models) is observed for more classes.Cases (e) and (f) show that performance substantially decreases when considering out-of-distribution P2S data.In summary, we notice how NNs can be trained on P2P data and reach up to 91.18% accuracy when classifying proximity distance d among d ≤ 5 m and d > 5 m.Although augmenting the dataset with P2S data from the same environment conditions does not improve accuracy, in the absence of P2P data NNs can be trained on only P2S data and still achieve good performance: our best-performing model achieved 86.39% accuracy in this case.However, a data collection campaign for training such a model is expensive, and there are no guarantees that the collected data covers the desired range of scenarios.

III. PROXIMITY ESTIMATION BAYESIAN MODEL
We develop a model-based solution to use CSI measurements for proximity estimation as an alternative to data-driven approaches, which require data collection and computational resources for training.Deterministic models of RSRP measurements are impractical to derive, as they are deviceand environment-specific (assuming noise is filtered out by averaging).We thus resort to stochastic models and consider the fading coefficients and BS positions as Random Variables (RVs) with known distributions.Specifically, we build upon tools from the stochastic geometry framework to model the network and derive the distributions of the required metrics.The distribution of the proximity distance d can be inferred using the Bayesian inference framework, from which an estimate d can be sampled either through regression i=1 are convex, mutually exclusive, and exhaustive subsets of R + ).

A. BACKGROUND ON BAYESIAN INFERENCE
Bayesian inference is an important technique for developing robust estimators that can tell how likely a hypothesis is, given some evidence (or data), while incorporating prior knowledge.The general Bayesian inference framework is defined as: where

P[H] is our prior belief of H, P[D | H] and P[D]
are likelihoods of the evidence being observed in a general scenario and in a scenario where the hypothesis H is true, respectively, and P[H | D] is the posterior probability of H given D. In the estimation problem, the hypotheses and evidence are usually continuous variables and can thus take an infinite range of values.In this case, the likelihoods and prior are modeled as Probability Density Functions (PDFs).

B. PROBLEM DEFINITION
We aim to estimate d (a continuous set of hypotheses) given a radio measurement (evidence).In particular, we choose the RSRP correlation measured from transmissions of the BS closest to the receiving device ρ 1 over time as our measurement, as ρ 1 can be used to estimate the spatial correlation and is highly dependent on the target variable d.
The subscript 1 in ρ 1 denotes the index of the transmitting BS when ordered in increasing distance to the device, i.e., ρ i denotes the RSRP correlation measured from transmissions of the i-th closest BS to the device.We choose ρ 1 as it is always available in LTE CSI measurements, regardless of the network architecture.When i grows, so does the expected distance between BS and the receiving device (as shown in [45] and detailed in Section III-C), and thus ρ i becomes less dependent on d, rendering it only marginally useful for D2D estimation.As we assume that small-scale fading is uncorrelated for different links, the RSRP correlation is the same as the shadowing correlation, which makes the analysis simpler.
Since the correlation measurement ρ and the distance d are continuous variables, the probabilities in (1) become probability density functions, and the Bayesian inference framework for proximity estimation can be expressed as: This framework requires the definition of a prior distribution of d, along with the conditional and unconditional likelihoods of the correlation.In the rest of this section, we derive expressions for these distributions, define regression and classification estimators, and evaluate their performance using simulated data.

C. SYSTEM MODEL 1) NETWORK MODEL
The setup we consider for this work consists of two devices (e.g., smartphones) with 2D positions p 1 , p 2 ∈ R 2 (unknown) and a set of BSs with positions modeled as a homogeneous Poisson Point Process (PPP) = {x i } with density λ, where x i ∈ R 2 is the position of the i-th BS.We assume that all BSs transmit with equal power and that devices are associated with their nearest BS.The devices periodically measure the power received from their closest BSs.Our goal is to estimate the distance d between the two devices, given received power measurements from the closest BS to each device, which we assume to be the same BS, i.e., devices are within the same Voronoi cell, as illustrated in Fig. 6.
Assuming that devices are within the same Voronoi cell is reasonable both for theoretical and practical reasons.In medium-density, urban LTE deployments, BS range is usually in the range of hundreds of meters, and it decreases for high-density deployments.This means that devices associated with different BSs (i.e., different Voronoi cells) are often many hundreds of meters apart, and such distances are out of the scope of our target applications.From a technical standpoint, LTE measures RSRP from the associated BS more frequently than from other ones.RSRP measurements from secondary cells are usually only available in the case of multi-BS cooperation.Since our work requires measuring RSRP from a single BS at multiple devices simultaneously, we can only perform experiments when users are associated with the same BS, i.e., within the same Voronoi cell.We leave the case where devices are close to each other but associated with different BSs when close to cell borders as a subject for future work.
We consider device #1 as the origin of the coordinate system and that the BSs are indexed in ascending order of distance to the origin (BS #1 is the closest to device #1, and so forth): x 1 2 ≤ x 2 2 ≤ . . . .Similarly, we consider device #2 to be on the x-axis of the coordinate system.Both assumptions incur no loss of generality since the PPP is translation-and rotation-invariant.As the BSs positions follow a PPP with intensity λ, the distance R n between the n-th BS and the origin follows a generalized Gamma distribution [45], with PDF defined as: where (n) = (n − 1)! denotes the Gamma function.

2) PROPAGATION MODEL
Due to the nature of cellular communications, the wireless channel is modeled as a product of a distance-dependent deterministic path-loss with path-loss exponent α, a largescale spatially correlated shadowing that follows the lognormal distribution with zero mean and standard deviation σ S , and a small-scale Rayleigh fading with exponential distribution and unit mean.We define P ( j) i (t) as the power received from the i-th BS at the j-th device, at the time instant t, as which can also be expressed in the logarithmic domain as where i (t)) and l(r), h ( j) i (t), and S ( j) i (t) are, respectively, the deterministic path-loss, small-scale, and large-scale fading.
Large-scale and small-scale fading are affected by the physical channel path and the objects surrounding the device.Also, similar communication links (path and surroundings) can be represented with a similar model.Mathematically, this means that channels are spatially correlated.However, the variance of large-scale fading is inversely proportional to the sizes of the obstructing objects (tens of meters outdoors; less when indoors).On the other hand, the variance of small-scale fading across space is much higher since it models constructive and destructive addition of multipath signal components.Large-scale fading is spatially correlated in the order of meters, while small-scale fading is spatially correlated to very short distances, in the order of the signal wavelength [46].Spatial correlation is at the core of this work.Since small-scale fading varies significantly with minor distances, we assume it is not spatially correlated; thus, we only explore the correlation of large-scale fading.Since the received power is a function of the shadowing coefficients, it is also spatially correlated.Thus, we can use received power from different devices to estimate their separation distance.
To model the shadowing autocorrelation, we refer to the link correlation model proposed in [47], which accounts for the relative lengths and the angle between two paths.This model provides a better alternative than the most common model in the literature [48], which provides the correlation as an exponentially decaying covariance function with the distance.Although the model in [48] accurately fits the shadowing correlation perceived from one common BS to two close devices, it fails to model the shadowing correlation between the links from two BSs to a common device (site-to-site correlation), since the propagation paths are less correlated in the former case.This correlation cannot be ignored; thus, a more appropriate model should be considered.The link correlation between two paths is modeled as in the following definition [47].
Definition 2 (Link Correlation): In a 2D cellular network, the correlation coefficient between two links r 1 , r 2 ∈ R 2 is given by where ), d cor is the decorrelation distance (defined as the distance at which shadowing autocorrelation equals 1/e), γ T = 0.3 is a constant accounting for the surrounding terrain and building clutter and indicates how rapidly the correlation decays as the angle increases, θ T = 2 sin −1 ( d cor 2d 1 ), and θ = cos −1 ( is the angle between the two links.6) for different path lengths, given one of them is fixed (in red).For d 1 ≥ d cor 2e (Fig. 7(a)), the correlation decreases as the angle and the path length move further from the reference link.The decay is a function of d cor (decays slower for a larger decorrelation value) and γ T .If the angle is small, the paths will be affected by the same objects; thus, their correlation is high.Furthermore, the higher the path length disparity, the longer the path will be affected by more objects and thus have a lower correlation.Fig. 7(b) shows the link correlation for d 1 < d cor 2e .We can see that the correlation does not depend on the angle, only on the length ratio.This is mainly because, as the shortest link has a length much smaller than d cor , the shadowing for any other link with the same length will be highly correlated to it since it is likely that both transmitter-receiver pairs are in the same close environment.
By accounting for the link correlation as in Definition 2, the correlated shadowing L ( j) i (t) = log(S ( j) i (t)) has the following correlation properties: where σ 2 L represents the shadowing variance and ρ(r i ) are spatial cross-correlation coefficients between links, as given in Definition 2.

D. PRIOR DISTRIBUTION AND LIKELIHOOD
In this section, we derive the prior distribution of d along with the conditional and unconditional likelihoods of the correlations.We also define regression and classification estimators and examine their performance using simulated data.

1) PRIOR DISTRIBUTION
Although we are interested in the distribution of the distance d between a pair of points within the same 2D Voronoi cell, there is no clear way of deriving an analytical expression for it.Hence, we resort to numerical evaluations.Specifically, we empirically fit the distribution of d for various values of λ to the Nakagami distribution with parameters μ and ω as Thus, the PDF of d is provided as Further details are presented in the Appendix.

2) LIKELIHOOD
To obtain the likelihood function, we define ρ i as a RV denoting the correlation coefficient between the RSRP measurements transmitted by the i-th BS and received at devices 1 and 2, respectively.Thus, ρ i is given as where R (1) i (t) are the RSRP measurements from the i-th BS at devices 1 and 2 in the logarithmic domain and are given in (5).The correlation ρ i is a deterministic function of the BSs spatial distribution and the position of the devices and is given as where d (1) i and d (2) i are the path lengths between the i-th BS and devices 1 and 2, respectively.ρ i does not depend on the position of all BSs, only on the position of the i-th BS.We can translate and rotate the coordinate system, without loss of generality, such that p 1 = [0 0] T and p 2 = [d 0] T .We are interested in the distribution of ρ i |d to use it as likelihood in the proposed Bayesian estimator.Since ρ i is a deterministic function of the position of the i-th BS, its PDF can be expressed as where δ(•) is the unit impulse function, R i is the distance of the i-th closest BS to the origin, f R i (•) is its PDF, given by (3), and 2 is the angle of the i-th closest BS.We can marginalize over R i and θ i to get the PDF of ρ i |d as We evaluated the distribution of ρ 1 |d for multiple combinations of parameters and found that it is unlike any 2. U (S) denotes the continuous uniform distribution over a set S.
other known parametric distribution.Therefore, we resort to numerical evaluations of the likelihood of ρ 1 |d from now on.

E. BAYESIAN-BASED ESTIMATORS
Given that we have a method to compute the likelihood and an approximation of the prior distribution of distances within a cell, we propose a Bayesian proximity estimator.We explore two different estimators based on the Bayesian inference framework: a regression estimator to estimate d precisely and a classifier that estimates the most likely range of distances d is in, given a predefined set of ranges.Although our derivations consider the correlation ρ i of measurements from the i-th closest BS, we consider only ρ 1 in our estimators.This choice makes evident the minimal infrastructure required for a correlation-based technique.However, considering information from other BSs will likely increase performance.The regression estimator consists of a maximum likelihood estimate from the posterior and is given as Given a set of predefined classes and a maximum likelihood estimate d from the Bayesian regressor, our Bayesian classifier digitizes d, i.e., identifies which class d belongs to.It can be formally defined as follows.Given a set of convex, mutually exclusive classes {D i } N i=1 , such that they are an exhaustive, mutually exclusive partition of R + , the class estimate Ĉ ∈ {1, . . ., N} is defined as the class that contains the maximum likelihood estimate d: where 1(•) is the indicator function.We evaluate the regression estimator both in simulation and on our collected data.The classifier is evaluated on the collected data and compared against the data-driven approach in the remainder of this section.An alternative Bayesian classifier consists of computing the probability that the distance is within each class by integrating posterior over each range of distances.The binary version of this classifier is formulated as follows: In other terms, it classifies the distance as below a threshold T if the conditional probability of such a hypothesis is higher than the others.We investigate the performance of this classifier in simulations in the remainder of this section.

F. RESULTS
To validate the performance of the proposed Bayesian-based proximity estimators, we conducted extensive Monte Carlo simulations of BS positions and device positions within the same Voronoi cell and then averaged the estimation error across all scenarios.Furthermore, we compared the results obtained from our estimator with actual data generated from the conducted experiments.

1) SIMULATION RESULTS
The performance of the regression estimator and binary classifier was tested in simulations.In Fig. 8 We are also interested in evaluating the d cor (binary) classifier defined as 2 ).However, d cor is not necessarily tied to λ, and if d cor is much lower or much higher than common values of d (according to the prior distribution), the classification becomes trivial.Accuracy does not properly evaluate how good the classifier is because it does not consider how unbalanced the dataset is since, in this case, a naïve classifier that only outputs the most likely class would have a high score.A metric that solves this problem is the F 1 -score, defined as √ λ d cor , is induced.This ratio is chosen because changes in λ will not affect the final estimator performance if it is kept constant.The F-score increases as the d cor ratio decreases, meaning the estimator performs best for low values of d cor .

2) VALIDATION WITH EXPERIMENTAL DATA
To validate the accuracy of the proposed regression estimator and general classifier, we use the measurements collected in experiments 1 and 2 detailed in Section II-A.Specifically, we compute the empirical correlation across time given some distance d, denoted by ρ1 | d, from the measurements collected from both experiments.Since the distance between devices varies throughout the experiments, we separate all samples into distance ranges at the time of recording and compute the correlation for each of these subsets of samples.Nine different BS identifiers were found during the experiments.In order to estimate the BS density λ of the cellular network, we computed the area of the traversed region and divided it by the number of detected BSs, yielding λ = 5.7971 BSs/km 2 .The decorrelation distance was chosen as a value between denser semi-urban and less dense suburban areas, i.e., d cor = 15 m.These parameter choices are appropriate given that the devices go through both residential and semi-urban areas, as well as indoors.
We treat the computed correlation values as ground truth and compare the experimental results with our Bayesian maximum likelihood estimator and the exponentiallydecaying correlation function proposed in [48].Fig. 9 shows the comparison for a range of distances, where both estimators have the same parameters.Notably, using realistic parameters, our estimator fits well to most of the observed data, achieving a 2.8x lower RMSE than the well-established exponentially decaying model.
We also evaluate the general classifier's performance on the experimental data we collected.Table 3 shows the accuracy of the general classifier and compares it with the best data-driven model.The Bayesian classifier outperforms the best NN models in all 3 threshold partitions.In the binary classification case, their performances are equivalent.However, the Bayesian classifier shows significantly superior performance for higher distances, making it suitable for   scenarios where accurately identifying higher distances is desirable and longer sequences are available (minutes per sample).

IV. DISCUSSIONS AND FUTURE DIRECTIONS
In this work, we present a NN-based classifier (data-driven) and Bayesian regressor and classifiers (model-based) for proximity estimation using real CSI measurements.These models differ substantially; Table 4 compares their strengths and weaknesses.Although the Bayesian estimator requires less training data (just enough to calibrate d cor , λ, and γ ) and can reach high accuracy and precision, it requires many samples at inference time (i.e., for each distance estimation step).On the opposite end, the NN-based classifier initially requires many training samples and can return precise and fast estimates with very few samples (10 seconds from each device) at the expense of accuracy since it can only return the predicted range of values, instead of pinpointing a single one.
Although we have seen that classifier performance decreases as a higher distance resolution is sought, adding more higher-distance samples to the dataset should improve the accuracy for such cases.Compared with the data-driven model, our Bayesian estimator requires fewer parameters to tune each model and can reach high distance resolution.Since the Bayesian model relies on accurate estimates of power correlation, it requires many samples to achieve significance (around 30 times more than our data-driven models).In light of this, we believe that a NN that takes as many RSRP samples as the Bayesian model would be able to reach a distance resolution as high or higher than the data-driven model we presented.However, fast estimation (using few measurements and computations) is not a core requirement for some use cases, e.g., for contact tracing (the risk of infection is higher when people are Too Close for Too Long (TCTL)) and user profiling for advertising (quick visits to stores do not characterize significantly a user's preference).
Although we explore these approaches separately, there is much to be gained by mixing the data-driven and modelbased approaches.In practice, models are seldom purely data-driven or model-based.There is a wide variety of ways of joining these techniques.For instance, models can be mainly data-driven-the likelihood and prior could be estimated using variational auto-encoders and then applied in a Bayesian estimator-or mainly model-based-hard-coded features could be extracted from power measurements and only then fed to the NN, ensuring the model only receives relevant data.
In addition, potential future work should consider a larger dataset containing samples from a more diverse set of environments, e.g., from various cities, mobility patterns, and network densities.The ultimate goal is to arrive at a model that can perform well in unseen environments (zero-shot performance) or a general model that can be finetuned to specific environments (few-shot performance or domain adaptation).It is also necessary to investigate the impact of distribution shifts with time, i.e., whether a trained model can keep its performance indefinitely or needs to be retrained.One assumption that we considered throughout our work is that devices that are close to one another are associated with a common BS (i.e., in the same Voronoi cell).However, that is not necessarily true when devices are close to cell boundaries.Exploring this scenario is challenging, especially since RSRP measurements of secondary BSs are rarely available in LTE networks.However, 5G and future network protocols will likely contain richer CSI, making such measurements easier to collect.Hence, we believe this is a potentially promising line of work to explore in future work.

V. CONCLUSION
In summary, we investigate and solve the seldom-explored problem of cellular-based proximity estimation through the lenses of data-driven and model-based approaches.
The data-driven approach uses neural networks that estimate the proximity distance using only 10 seconds of radio power measurements from each of the subject devices, without geolocation data or any feature extraction.These NNs are classifiers that estimate the most likely distance range and are trained in a supervised manner on a dataset of power and GPS measurements from multiple phones.When classifying devices as within 5 meters from one another or not, our model can reach up to 91.18% accuracy.We also show how NNs can be trained on data from individual MIMO-enabled phones (P2S data) and still reach high accuracy without requiring predefined data collection experiments.Our lightweight model can be executed on virtually any device, increasing the practical coverage of relevant applications, such as contact tracing.As an alternative that can be applied when fewer measurements from the environment are available, we propose a model for the distribution of power correlation given distance.We then implement Bayesian estimators from this distribution.We first present their performance under simulations and show that the binary Bayesian classifier achieves high F-scores for various BS densities λ and decorrelated distances d cor , even when the classes are unbalanced.We then evaluate the Bayesian regressor and general classifier on our dataset.By comparing the regression model with the well-established exponentially-decaying spatial correlation model, we see that it fits better the empirical data using the same d cor value, reaching a 2.8x lower RMSE.The general classifier outperforms the best NN models in all cases (reaching up to 91.67% accuracy), especially in higher distances, making it suitable for scenarios where accurately identifying higher distances is desirable.However, the model-based solutions achieve such results at the expense of speed since they require multiple minutes of measurements in order to deliver reliable estimates.

APPENDIX PRIOR DISTRIBUTION OF D2D DISTANCE
There is no closed-form solution for the PDF of the distance d between a pair of points in a Voronoi cell when the cell is generated by a PPP with intensity λ.This appendix describes how to estimate the distribution of d using Monte Carlo simulations.
1) Simulate a PPP with intensity λ within an area; 2) Select valid BSs (those whose cells do not hit the simulation boundaries); 3) Generate uniformly distributed points and assign them to their closest BS; 4) For each valid BS, compute the distance between every possible pair of points within its cell; 5) Compute the histogram of all distances between points.In order to avoid the need for a lookup table with precomputed values for the PDF of d or estimation of the PDF through expensive simulations, we can approximate the distribution of d using a parametric distribution.After experimenting with several distributions, we noticed how the Nakagami distribution fits the histogram from simulations well for various values of λ.We took note of the parameters of the best-fit Nakagami distributions and noticed how μ is mostly constant and ω is inversely proportional to λ.We then identified expressions for the Nakagami parameters μ and ω as functions of λ and arrived at the distribution shown in (10), with PDF given by (11).
Table 5 shows the Kullback-Leibler (KL) divergence between empirical distributions observed from simulations and both the proposed approximation and the best-fit Nakagami distribution (obtained by maximum-likelihood estimation from the empirical data) for various values for the BS intensity λ.Although the best-fit Nakagami consistently outperforms our approximation, both distributions fit the simulated data reasonably well.The parameters of our proposed approximation are μ = 0.9, ω = 0.6/λ.The bestfit parameters are close to the proposed ones, as shown in Table 5.

FIGURE 1 .
FIGURE 1.(a) Path traversed in Ex. #1, highlighted in blue.(b) Phone #3 equipped on a bike following a car, where phones #1 and #2 are, across the campus neighborhood.

FIGURE 2 .
FIGURE 2. Diagrams of Phone-to-Phone (P2P) and Phone-to-Self (P2S) pairing.Subsequence pairs are made of slices of RSRP measurements at different antennas.Subsequences from antennas from distinct phones can be paired (P2P), as well as from antennas within the same phone (P2S).

FIGURE 3 .
FIGURE 3. Diagram of the proposed ML model.The dataset used for training the ML model consists of pairs of subsequences of RSRP and global coordinates (latitude and longitude) taken from either the same (P2S) or different phones (P2P).The time difference t between the subsequences is at most tmax , which ensures the channels are time-coherent.The NN classifier (detailed architecture shown in Fig. 4) compares both subsequences and outputs an estimated D2D distance d, which is digitized into one of the distance classes.The cross-entropy loss measures the difference between the predicted and target classes and is used by the Adam optimizer [40] to tune the network parameters, denoted by θ.

FIGURE 4 .
FIGURE 4. Architecture of the NN classifier in Fig. 3.It contains five layers.However, each of the layers' size, as well as batch normalization layers, have multiple possible configurations that are explored in a grid search (see Table2).Modules with varying hyperparameters are shown in dashed lines.

FIGURE 5 .
FIGURE 5. Cumulative distribution of D2D distance observed from experiments, y-values represent the fraction of total samples with distance lower than the x-value.

FIGURE 6 .
FIGURE 6.Typical setup represented as a Voronoi tessellation.BSs are represented as red dots.Blue lines represent cell boundaries, at which a device can be associated with multiple BSs.

FIGURE 7 .
FIGURE 7. Heatmap of the correlation between a fixed link to the origin (shown in red) and a varying link.dcor = 1, γT = 0.3.

FIGURE 8 .
FIGURE 8. Performance of the proposed Bayesian estimators in Monte Carlo simulations.
(a), we plot the RMSE of the maximum likelihood estimator of d for multiple values of λ, along with the expected distance (E[d|λ]).We note how both are exponential functions of 1/λ.They show a linear relation: RMSE ≈ 0.4 • E[d|λ].

F 1 = 2
Fig. 8(b) shows how the d cor estimator F-score changes with λ and a relation called d cor ratio, defined as d cor ratio := √ λ d cor , is induced.This ratio is chosen because changes in λ will not affect the final estimator performance if it is kept constant.The F-score increases as the d cor ratio decreases, meaning the estimator performs best for low values of d cor .

FIGURE 9 .
FIGURE 9. Comparison between exponential and Bayesian regression estimators and the correlation values from experiments.