Online Learning of Any-to-Any Path Loss Maps

Learning any-to-any (A2A) path loss maps may be a key enabler for many applications that rely on device-to-device (D2D) communication, such as vehicle-to-vehicle (V2V) communications. Current approaches for learning A2A maps suffer from a number of important limitations, including i) a complexity that increases rapidly with the number of samples, quickly rendering the problems intractable, and ii) the inability to cope with a time-varying environment, among others. In this letter, we propose a novel approach that reconstructs A2A path loss maps in an online fashion. To that end, we leverage the framework of stochastic learning to deal with the sequential arrival of samples, and propose an online algorithm based on the forward-backward splitting method. Preliminary simulation results show a significant decrease in complexity, while the performance is comparable to that of a batch approach.


I. INTRODUCTION
Many applications in wireless networks can benefit from information about the spatial distribution of path loss. Among them, applications involving device-to-device (D2D) communication are the most challenging, because the number of communication links grows rapidly with the number of nodes. Such applications include machine-type communications (MTC) and vehicle-to-vehicle (V2V) communications.
As an example, consider a platoon of vehicles that must constantly exchange information about their position, acceleration, and so on. If the path loss between any two vehicles along the route were known in advance, the vehicles would have enough time to adapt their inter-vehicle distance accordingly and save a considerable amount of fuel [1]. Other benefits include more reliable communications and improved safety. However, in realistic networks, acquiring all path loss values for D2D communication is infeasible, mainly because we would need to measure the path loss between every location of a map and every other location. In addition, measurements can become outdated due to changes in, e.g., the propagation environment, the network density or the weather conditions, to name a few. In general, any-to-any (A2A) maps are well suited for those applications, but the challenge is to cope with the rapid increase of complexity with the size of the map while keeping a high prediction accuracy. The learning of radio maps has been a major topic of interest in both academia and industry for years [2], [3]. In recent years, the framework of the tomographic projection technique (TPT) has gained a great deal of attention as a model that characterizes the long-term shadowing of links caused by objects such as buildings or trees [4]; this shadowing is in turn used as a proxy to characterize the path loss.

Manuscript received December 9, 2020; accepted January 1, 2021. Date of publication January 8, 2021; date of current version May 6, 2021. This work was supported by the Federal Ministry of Education and Research (BMBF) of the Federal Republic of Germany as part of the AI4Mobile project (16KIS1170K). The associate editor coordinating the review of this letter and approving it for publication was Z. Zhao.
In TPT, a spatial loss field (SLF) captures the absorption generated by objects in a field, while a window function models the influence of each location on the attenuation that a link experiences. The shadowing is then modeled as the weighted integral of the SLF across the field. A line of research [4] has exploited the high correlation of the SLF at nearby locations, using Kriging interpolation to estimate radio maps. A different approach exploits the concept of the Fresnel zone [5], [6]. However, the main limitation of these approaches is that they rely on a fixed model, and models cannot track reality under all circumstances.
More recently, the authors in [7] proposed an algorithm that learns both the SLF and the window function in a blind manner, i.e., without assuming any model. In [8], this approach is further improved by capitalizing on the assumption that both the SLF and the model are block-sparse. A problem with elastic net regularization and multi-kernels is formulated, and an algorithm based on the ADMM is used to obtain a solution.
Both algorithms in [7], [8] have a key limitation since they are batch algorithms, i.e., upon arrival of new measurements, the algorithms are run again on all the historical data, which increases the complexity dramatically. This poses a tremendous hurdle for real-world applications because i) in both approaches the problem complexity and the number of variables that have to be stored in memory increase cubically with the number of pixels in the map, and ii) they are not able to accurately track an environment that changes over time due to, e.g., different traffic profiles or changes in the underlying map. As a motivational example, in both approaches [7] and [8], a map of 10 × 10 pixels already requires the storage of floating-point vectors with up to 495K entries, and such vectors grow to 31.92M entries for 20 × 20 maps and to 364.095M for 30 × 30 maps. The reason for this is that TPTs require the storage of structures that track the relationship between every link and every pixel: if an A2A map is divided into P pixels, we have P(P − 1)/2 possible links assuming reciprocity, and we need to track P²(P − 1)/2 variables. Such a rapid increase in complexity renders batch methods inadequate for real-world applications.
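As a sanity check on these figures, the following short Python sketch (the helper name is ours) reproduces the variable counts for square maps of side 10, 20 and 30 pixels:

```python
# Sketch: variable counts for the batch TPT formulation, assuming a square
# map of P pixels, T = P*(P-1)/2 links (reciprocity), and one window weight
# per (link, pixel) pair, i.e. T*P = P**2 * (P-1) / 2 stored variables.
def tpt_variable_count(side: int) -> int:
    """Number of window-weight variables for a side x side pixel map."""
    P = side * side          # total number of pixels
    T = P * (P - 1) // 2     # number of links between pixel pairs
    return T * P             # one weight per link-pixel pair

for side in (10, 20, 30):
    print(f"{side}x{side} map: {tpt_variable_count(side):,} variables")
```

Running this yields exactly the 495K, 31.92M and 364.095M figures quoted above.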
Because the approach proposed in this letter is based on an online algorithm, it can overcome these limitations. More precisely, we extend the work in [8] and pose an optimization problem that, upon arrival of new measurements, aims at obtaining new estimates of both the SLF and the window matrix. We do this by minimizing the least squared error regularized by elastic nets [9]. We leverage the framework of stochastic learning to address the sequential arrival of samples. The original problem is highly ill-posed, so we impose additional structure on the problem by considering a non-linear kernel method. We propose an algorithm based on a descent version of alternating minimization, in which we take one step of the forward-backward splitting method to update the SLF, followed by another step to update the model.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. SYSTEM MODEL
Consider a two-dimensional area A ⊂ R². We model the long-term average path loss between any two points x_i, x_j ∈ A as

p(x_i, x_j) = δ log₁₀ ||x_i − x_j|| + s(x_i, x_j) + ε,    (1)

where δ > 0 is the path loss exponential decay, s : R² × R² → R represents the shadowing between the two points, and ε > 0 is a scalar that accounts for the error in the measurements. As in [5], [7], we model the shadow fading based on the TPT:

s(x_i, x_j) = Σ_{p=1}^{P} w(φ₁(x_i, x_j), φ₂(x_i, x_j, x_p)) f(x_p),    (2)

where f : R² → R is the SLF, w : R² → R is the window function, and φ₁(x_i, x_j) := ||x_i − x_j|| and φ₂(x_i, x_j, x_p) := ||x_i − x_p|| + ||x_p − x_j|| are functions that measure the length of the direct link and the length of the path going through an intermediate point, respectively; P_x and P_y are the number of horizontal and vertical pixels of the map, respectively, and x_p is the coordinate of pixel p ∈ {1, . . . , P}, with P = P_x P_y. Intuitively speaking, the shadowing between any two points in a map is potentially influenced by the SLF at any point of the map. This assumption is due to the multi-path nature of radio signal propagation. To capture these effects, the SLF is weighted by a window function that models the influence of each position on a link.
We now proceed to write (2) in matrix form. To this end, assuming channel reciprocity in the path loss between any two points, the total number of links T in a map of P pixels is T := P(P − 1)/2, and the index set M of all links is M := {1, . . . , T}. We define a bijective mapping L : P × P → M : (i, j) ↦ m that maps any pair of pixel indexes i, j onto a link index m ∈ M. By W ∈ R^{T×P} we denote the matrix containing all window weights of the map, with entries w_{m,p} := w(φ₁(x_i, x_j), φ₂(x_i, x_j, x_p)) for m = L(i, j). The shadow fading vector s ∈ R^T is generated by stacking all possible values of s(x_i, x_j), such that:

s = W f.    (3)

III. PROBLEM STATEMENT

Consider that measured path loss values arrive at a central entity at different time instants. Let ŝ_t ∈ R^{M_t} be the (noisy) shadowing measurements acquired at time instant t ∈ N, and let Ω_t be the measurement index set with cardinality M_t. For the sake of simplicity, we assume that M₁ = M₂ = · · · = M for the remainder of this manuscript. The elements in ŝ_t represent a selection of the elements contained in ŝ ∈ R^T, which is in turn a vector containing all possible (noisy) measurements in a map. Analogously, the matrix W_t ∈ R^{M×P} denotes the weight matrix whose rows correspond to the link measurements received at time t.
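For illustration, a minimal Python sketch of such a link indexing is given below. The enumeration order is one possible choice, not necessarily the bijection L used in the letter:

```python
# Sketch of the link indexing used in the matrix form: with P pixels and
# channel reciprocity, every unordered pixel pair (i, j), i < j, gets a
# unique link index m in {0, ..., T-1} (zero-based here for convenience).
def link_index(i: int, j: int, P: int) -> int:
    """Map an unordered pixel pair (i < j, zero-based) to a link index."""
    if not (0 <= i < j < P):
        raise ValueError("expected 0 <= i < j < P")
    # links (0,1), (0,2), ..., (0,P-1), (1,2), ... enumerated row by row
    return i * P - i * (i + 1) // 2 + (j - i - 1)

P = 5
T = P * (P - 1) // 2
indices = sorted(link_index(i, j, P) for i in range(P) for j in range(i + 1, P))
assert indices == list(range(T))   # the mapping is a bijection onto {0,...,T-1}
```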
The proposed approach is based on the assumptions that both the vectorized weight matrix vec(W) and the SLF vector f are block-sparse vectors [10]. The block-sparsity of vec(W) results from the fact that pixels far from a link have little or no impact on the shadowing experienced by that link, while the block-sparsity of f is justified by the fact that most pixels of a map represent free space, whose absorption is negligible compared to that of solid bodies and is therefore assumed to be zero. Further, the non-zero entries of f are those belonging to walls and other physical structures, and are therefore concentrated in groups.
In light of the above assumptions, an intuitive approach to estimate the weight matrix and the SLF is to minimize the least squared error regularized by elastic nets to promote block-sparsity.
Previous studies such as [7], [8] in this particular application domain have shown that attempts at directly optimizing over W fail to give good results because the problems are in general severely ill-posed. To address this limitation, we impose additional structure on W by considering a non-linear kernel approach similar to [8].
With some abuse of notation, we define c_{q(m,p)} ∈ R² as the two-dimensional vector obtained by stacking the scalars c_m and d_{m,p} associated with link m and pixel p, and we assume that the window function can be written in terms of a positive definite kernel κ in the following form:

w_{m,p} = Σ_{(m′,p′)} α_{q(m′,p′)} κ(c_{q(m,p)}, c_{q(m′,p′)}),    (4)

where α_{q(m,p)} ∈ R are appropriate scalars to be determined, m′ runs over the ordered elements of Ω_t, and q(m, p) is an index obtained as q(m, p) := P(m − 1) + p. In particular, in this study we use the radial basis function (RBF) as kernel:

κ(c_q, c_{q′}) := exp(−||c_q − c_{q′}||² / (2σ²)),

where σ is the width of the kernel. Define the kernel matrix K_t ∈ R^{MP×MP} with (q, q′) element given by κ_{q,q′} := κ(c_q, c_{q′}), and also define the vector α_t := [α₁, · · · , α_{MP}]ᵀ.
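An RBF kernel matrix of this kind can be built in a few vectorized lines. The sketch below uses the common exp(−||·||²/(2σ²)) convention; the letter only specifies σ as the kernel width, so the exact normalization is our assumption:

```python
import numpy as np

def rbf_kernel_matrix(C: np.ndarray, sigma: float) -> np.ndarray:
    """C: (Q, 2) array of feature vectors c_q; returns the Q x Q RBF kernel
    matrix with entries exp(-||c_q - c_q'||^2 / (2 sigma^2))."""
    sq_dists = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # three toy feature vectors
K = rbf_kernel_matrix(C, sigma=1.0)
# K is symmetric with unit diagonal, as expected for an RBF kernel
assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
```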
Considering the non-linear kernel approach, we can write an optimization problem over f and α_τ, τ = 1, . . . , t, as follows:

min_{f, α₁,...,α_t}  (1/2t) Σ_{τ=1}^{t} ||ŝ_τ − A_f K_τ α_τ||₂² + (μ₁/2)||f||₂² + λ₁||f||₁ + Σ_{τ=1}^{t} [ (μ₂/2)||α_τ||₂² + λ₂||α_τ||₁ ],    (5)

where μ₁, μ₂, λ₁ and λ₂ are selected regularization parameters [9], and A_f := I_M ⊗ fᵀ ∈ R^{M×MP} is the Kronecker product of the identity matrix I_M ∈ R^{M×M} and fᵀ.

IV. ONLINE SLF LEARNING
Problem (5) is not jointly convex in f, α₁, . . . , α_t, but it is convex in f when α₁, . . . , α_t are fixed, and vice versa. Therefore, we consider an alternating minimization strategy to address (5) in which, at the time t of arrival of new measurements, one set of variables is optimized while the remaining ones are kept constant, until all sets of variables have been updated or a solution is obtained.

A. Addressing the SLF Subproblem
To address the SLF subproblem, we define (∀t ∈ N), (∀α₁, . . . , α_t ∈ R^{MP}) a new function ȟ_t : R^P → R such that:

ȟ_t(f) := (1/2t) Σ_{τ=1}^{t} ||ŝ_τ − A_{α_τ} f||₂² + (μ₁/2)||f||₂² + λ₁||f||₁,    (6)

where A_{α_τ} := Σ_{n=1}^{M} e_n ⊗ (α_τᵀ K_τ)_{(n)} ∈ R^{M×P}, with (·)_{(n)} denoting the nth block of P entries, and e_n ∈ R^M is a unit vector with all zeros but the nth entry equal to one.
The motivation behind this approach lies in the fact that ȟ_t(f) is convex ∀f ∈ R^P, since it is a sum of ℓ₂ and ℓ₁ norms, and a sum of convex functions is also convex [11]. We can rewrite ȟ_t(f) in a form more convenient for our online algorithm:

ȟ_t(f) = C − (1/t) b_tᵀ f + (1/2t) fᵀ Ā_t f + (μ₁/2)||f||₂² + λ₁||f||₁,    (7)

where Tr(·) is the trace of a matrix, C := (1/2t) Σ_{τ=1}^{t} Tr(ŝ_τ ŝ_τᵀ), Ā_t := Σ_{τ=1}^{t} A_{α_τ}ᵀ A_{α_τ} ∈ R^{P×P}, and b_t := Σ_{τ=1}^{t} A_{α_τ}ᵀ ŝ_τ ∈ R^P. Note that the structures Ā_t and b_t do not change size with increasing t. This suggests an algorithm in which, at time t, we keep track of and update Ā_t and b_t, and a new estimate

f̂_t := arg min_{f} ȟ_t(f)    (8)

is found by minimizing the function ȟ_t(f) in (7) w.r.t. f. Note that the function ȟ_t is coercive, proper and strongly convex, therefore f̂_t exists and is unique ∀α₁, . . . , α_t ∈ R^{MP} [12]. The function ȟ_t can be expressed as the sum of two functions g₁ and g₂, where g₁ is convex and differentiable, while g₂ is convex but non-smooth. More precisely, define g₁(f) := C − (1/t) b_tᵀ f + (1/2t) fᵀ Ā_t f + (μ₁/2)||f||₂² and g₂(f) := λ₁||f||₁. The problem min_f ȟ_t(f) can then be expressed as:

minimize_{f}  g₁(f) + g₂(f).    (9)

This kind of problem is well understood, and there is a plethora of algorithms to solve it. We propose using the forward-backward splitting method [13]. It can be shown [13] that Problem (9) admits one solution and that, for a certain γ ∈ ]0, 2/L_g[, where L_g > 0 is the Lipschitz constant of ∇g₁, its solution is characterized by the fixed point equation f = prox_{γg₂}(f − γ∇g₁(f)), where prox_{γg₂} is the proximal operator of g₂ with attraction factor γ. An iterative solution to (9) is then given by

f_t^{(n+1)} = soft_{γλ₁}( f_t^{(n)} − γ∇g₁(f_t^{(n)}) ),    (10)

where soft_{γλ₁}(·) is the soft-thresholding function with threshold γλ₁, and n ∈ N is the iteration index.
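The forward-backward step above can be sketched in a few lines of Python. The toy data (Ā_t = I, b_t = 1) is synthetic and chosen only so that the minimizer is known in closed form:

```python
import numpy as np

def soft(x: np.ndarray, thr: float) -> np.ndarray:
    """Soft-thresholding operator: the proximal map of thr * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def fbs_step(f, Abar, b, t, mu1, lam1, gamma):
    """One forward-backward step for min_f g1(f) + lam1*||f||_1 with
    g1(f) = f'Abar f/(2t) - b'f/t + (mu1/2)||f||^2; requires
    gamma < 2/L_g, with L_g the Lipschitz constant of grad g1."""
    grad = (Abar @ f - b) / t + mu1 * f            # forward (gradient) step
    return soft(f - gamma * grad, gamma * lam1)    # backward (proximal) step

# Toy example: Abar = I, b = 1, t = 1, mu1 = 0, so g1(f) = ||f||^2/2 - 1'f,
# and the minimizer of g1 + 0.5*||f||_1 is soft(1, 0.5) = 0.5 per entry.
f = np.zeros(3)
for _ in range(200):
    f = fbs_step(f, np.eye(3), np.ones(3), t=1, mu1=0.0, lam1=0.5, gamma=0.5)
assert np.allclose(f, 0.5)
```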

B. Addressing the Subproblem to Learn the Model
Unlike the minimization over f in Problem (5), the minimization over α₁, . . . , α_t is not coupled across the summands, meaning that we can separate Problem (5) with respect to α₁, . . . , α_t while keeping f fixed.
Similarly to the previous section, we express the subproblem in α_t, for any fixed f ∈ R^P, as the minimization of the sum of two convex functions k₁ and k₂, where k₁ is continuous and differentiable, and k₂ is continuous but non-smooth. The problem of estimating α_t then becomes:

minimize_{α_t}  k₁(α_t) + k₂(α_t).    (11)
Using again the forward-backward splitting method, we obtain the following iterations:

α_t^{(n+1)} = prox_{βk₂}( α_t^{(n)} − β∇k₁(α_t^{(n)}) ),    (12)

where β is the attraction factor of the operator prox_{βk₂}.

C. Algorithmic Solution
The missing piece of the online SLF learning method is the combination, in a single algorithm, of the stochastic estimates f̂_t of (8) with the iterative solutions f_t^{(n)} and α_t^{(n)} in (10) and (12), respectively.
One important caveat of the online algorithm is that we implement a "descent" version of the outlined alternating minimization process. This means that, instead of running the iterations in (10) until a stopping criterion is met and then proceeding with the iterations in (12) until the improvement is small enough, we take only one step of the iterations in (10) and one step of the iterations in (12), alternately, until a combined stopping criterion is met. The rationale behind this strategy is that the improvement obtained by iterating beyond f^{(1)} might not be relevant enough to justify finding the optimal solutions of the two convex sub-problems in α_t and f alternately. Indeed, simulations for this particular application have consistently shown better performance and shorter execution times with the "descent" strategy.
Assume that i) the samples ŝ₁, ŝ₂, . . . are i.i.d. samples drawn from a common distribution p(ŝ) with compact support; ii) the iterates (f_t)_{t=1}^∞ are contained in a compact set; iii) γ < 2L_g^{−1} for any iteration in (10) and any t; and iv) β < 2L_k^{−1} for any iteration in (12) and any t, where L_g and L_k are the Lipschitz constants of ∇g₁ and ∇k₁, respectively. Finally, since ȟ_t is expected to be close to ȟ_{t−1} for large values of t, so are, under suitable assumptions, f̂_t and f̂_{t−1}, which makes it efficient to use f̂_{t−1} as a "warm" initialization for computing f̂_t. Our procedure is summarized in Alg. 1.

Algorithm 1 Online Algorithm
The objective value is monotonically non-increasing along the iterations of Alg. 1, since the updates (12) and (10) monotonically decrease the objective function. Further, h_t(f, α₁, . . . , α_t) is bounded below since it is a sum of non-negative functions. Therefore, we can state that Alg. 1 converges in the objective.
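Since the listing of Alg. 1 is not reproduced here, the following Python sketch illustrates our reading of one time step of the descent-style alternation: a single forward-backward step (12) in α_t, followed by a single step (10) in f, with the running sums Ā_t and b_t kept across time. The regularizer placement (μ₂ on ||α||₂², λ₂ on ||α||₁) and all variable names are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def soft(x, thr):
    """Soft-thresholding: proximal operator of thr * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def online_slf_step(f, alpha, K, s_hat, Abar, b, t,
                    mu1, lam1, mu2, lam2, gamma, beta, M, P):
    """One time step t of the descent-style alternation (hypothetical sketch):
    a single FBS step in alpha_t, then a single FBS step in f."""
    # alpha update: gradient of 0.5*||s_hat - A_f K alpha||^2 + (mu2/2)||alpha||^2
    A_f = np.kron(np.eye(M), f.reshape(1, -1))     # A_f = I_M (x) f^T
    resid = A_f @ (K @ alpha) - s_hat
    grad_a = K.T @ (A_f.T @ resid) + mu2 * alpha
    alpha = soft(alpha - beta * grad_a, beta * lam2)
    # f update: refresh the running sums Abar_t, b_t, then one FBS step
    Ka = (K @ alpha).reshape(M, P)                 # rows of A_{alpha_t}
    Abar = Abar + Ka.T @ Ka
    b = b + Ka.T @ s_hat
    grad_f = (Abar @ f - b) / t + mu1 * f
    f = soft(f - gamma * grad_f, gamma * lam1)
    return f, alpha, Abar, b
```

At each time t the previous f serves as warm start, mirroring the warm-initialization argument above.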

D. Complexity
The complexity of Alg. 1 is dominated by the two matrix multiplications (one to compute α_t^{(n+1)}, the other to compute f_t^{(n+1)}) required in each iteration n, which we assume to cost O(n³) for n × n matrices. We also assume that the stopping criterion is, in both cases, a maximum number of iterations N. The complexity is then given by O(t_max N(P³M³ + P²M)).

V. NUMERICAL EVALUATION
To assess the performance of the proposed algorithm, we simulate a vehicular network whose users communicate within a synthetic map based on the Madrid scenario [14]. The original scenario has a size of 140 × 97 meters, and we discretize it into a 13 × 9 map, with each pixel being 10.8 × 10.8 meters in size. The map has four 3 × 2 buildings, one 3 × 2 park, and another four 3 × 1 buildings. The rest of the scenario represents roads connecting the different parts of the map. The normalized SLF at each location, i.e. the attenuation that a link experiences while crossing that location, is set to 1 for buildings, 0.1 for the park, and 0 for road pixels, since the SLF of the air is negligible. Vehicles are only allowed to be at road locations, which means that no measurements inside the buildings and the park are acquired. After 100 iterations, we change the map to evaluate how the online algorithm adapts to changes in the underlying structures. Figures 1a and 1b show the SLFs of the two scenarios.

To generate a synthetic window function, we use the normalized elliptical model from [5]:

w(φ₁, φ₂) = 1/√φ₁  if φ₂ < φ₁ + η/2, and 0 otherwise,

where η is the signal wavelength. We set the wavelength to η = 0.1499 m in our simulations. The maximum number of vehicles, which coincides with the total number of road locations, is P_tx = 72. The total number of links in the map is given by T_tx = P_tx(P_tx − 1)/2 = 2556.

As evaluation metric, we use the expected normalized mean squared error (NMSE) of the reconstructed vectors, given by

NMSE := E[ ||v − v̂||₂² / ||v||₂² ],

where v is any vector and v̂ its reconstructed version.
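The evaluation metric is straightforward to compute; a minimal sketch (with the expectation replaced by a single realization) follows:

```python
import numpy as np

def nmse(v: np.ndarray, v_hat: np.ndarray) -> float:
    """Normalized mean squared error ||v - v_hat||^2 / ||v||^2 for one
    realization; averaging over realizations approximates the expectation."""
    return float(np.linalg.norm(v - v_hat) ** 2 / np.linalg.norm(v) ** 2)

v = np.array([1.0, 2.0, 2.0])         # ||v||^2 = 9
v_hat = np.array([1.0, 2.0, 0.5])     # ||v - v_hat||^2 = 2.25
assert np.isclose(nmse(v, v_hat), 0.25)
```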
We compare the proposed algorithm with the batch approach presented in [8], for which all 2556 possible samples are used. Figure 3a shows the NMSE of the three reconstructed structures for the online algorithm, namely the normalized SLF vector f; the vectorized window matrix w := vec(W), where each entry is obtained from the vectors α as in (4); and the shadowing vector ŝ calculated from f and W as in (3). Note that, although we focus on the reconstruction of ŝ, f conveys important information that can be exploited in other applications, since it resembles a physical map.
First, Figures 2a and 2b show the reconstructed SLF of the proposed algorithm after 100 and 200 iterations, respectively. The resemblance to the original maps in Figs. 1a and 1b is clear. In more detail, the dashed lines in Fig. 3a represent the performance of the batch algorithm for each of the three structures in question. There are two distinct regions with respect to the number of iterations displayed in Fig. 3a: from iteration 1 to 100, and from 101 to 200. The region on the left represents the results of the algorithms with the original map in Fig. 1a. At iteration 101, we modify the map and turn the park into road locations (i.e., changing the SLF of the park pixels from the original 0.1 to 0, as shown in Fig. 1b) in order to analyze how the online algorithm adapts to changes.
Note the non-monotonicity of the online reconstructed structures even when the map does not change; this is due to the nature of stochastic optimization. For iterations 1 to 100, the batch algorithm performs better than the online version for the three structures. In any case, the error of the reconstructed vector f seems to converge to the error of the batch approach. At iteration 101, one can clearly see the effect of the map change. The online algorithm shows an instantaneous decrease in performance but, within a few iterations, it is able to track the new scenario; the NMSEs decrease to values similar to those before the change. On the contrary, the performance of the batch algorithm degrades considerably, showing a higher error for ŝ, f and w compared to the online algorithm. This is due to the nature of the batch algorithm: once it obtains a solution, such a solution cannot be updated without running the algorithm again on all the data. Therefore, a change in the environment means that the obtained solution diverges from the ground truth, with the corresponding loss in precision.
In order to compare the complexity of the online approach with that of the batch method in [8], note first that the complexity of the online algorithm for these numerical evaluations is O(t_max N(P³M³ + P²M)), while that of the batch approach is O(N(P³T_tx³ + P²T_tx)). Since in general T_tx is much larger than both M and t_max (here T_tx = 2556, M = 67 and t_max = 200), the computational complexity of the online algorithm is several orders of magnitude smaller than that of the batch one. To analyze the reduction of computational complexity in more detail, we show in Fig. 3b a comparison between the baseline and the online approach for different map sizes. Concretely, we run simulations for maps of size 5 × 5, 10 × 10, 15 × 15 and 20 × 20 pixels, with T_tx = 100 in the case of the batch algorithm, and M = 4 and t_max = 25 for the online one. Simulations were run on a 64-bit Intel(R) Core(TM) i7-6600U CPU with 2 cores (4 virtual cores) at 2.60 GHz and 16 GB of RAM. One can clearly see in Fig. 3b the exponential increase of computation time for the batch algorithm, while for the online approach the increase is almost linear.
VI. CONCLUSION

In this letter, we have addressed the online learning of path loss maps. More concretely, the original problem is defined as the minimization of the least squared error between the measurements and the TPT-based model, regularized by the elastic net, i.e., a linear combination of the ℓ₁ and ℓ₂ norms.
The problem is highly ill-posed, so we add structure by considering a non-linear kernel approach. We propose an online algorithm based on stochastic optimization and alternating minimization to tackle the challenge posed by the sequential arrival of samples. Simulation results comparing the proposed algorithm with a batch method from the literature show that a performance comparable to the baseline can be achieved while greatly reducing the complexity.