Bayesian Active Learning for Received Signal Strength-Based Visible Light Positioning

Visible Light Positioning (VLP) is a promising indoor localization technology for providing highly accurate positioning. In this work, a VLP implementation is employed to estimate the position of a vehicle in a room using the Received Signal Strength (RSS) and fixed LED-based light transmitters. Classical VLP approaches use lateration or angulation based on a wireless propagation model to obtain location estimates. However, previous work has shown that Machine Learning (ML) models such as Gaussian processes (GP) achieve better performance and are more robust in general, particularly in the presence of non-ideal environmental conditions. As a downside, ML models require a large collection of RSS samples, which can be time-consuming to acquire. In this work, a sampling scheme based on active learning (AL) is proposed to automate the vehicle motion and to accelerate the data collection. The scheme is tested on experimental data from an RSS-based VLP setup and compared, under different settings, with simple random sampling.


I. INTRODUCTION
Indoor localization technologies have become indispensable in the modern world [1]. Many applications in an indoor setting benefit from or even require location awareness. The need for indoor localization technologies arises from the fact that Global Navigation Satellite Systems (GNSS) [1], which are the prevalent technology in outdoor environments, cannot provide accurate localization indoors due to signal attenuation and multipath effects. Examples of use cases for indoor positioning systems can be found in consumer electronics, where indoor navigation assistance and location-based advertising are of interest. Further use cases arise in industry, where accurate indoor localization is of interest for, amongst others, asset tracking in manufacturing, accident prevention and indoor fleet monitoring. Because indoor environments are typically more complex than outdoor settings, there is currently no single prevalent technology for indoor positioning. As a consequence, several competing technologies exist, each typically tailored to a specific set of use cases. The most common ones discussed in the literature [2], [3] are based on Radio Frequency (RF) signals [4], [5], [6], acoustic signals [7], [8], optical wireless signals [9] and vision-based systems [10]. The typical signal features used for positioning are the Received Signal Strength (RSS), Time of Arrival (ToA) and Angle of Arrival (AoA). Based on such features, the most common techniques for providing a location estimate are circular lateration, angulation, fingerprinting and Machine Learning-based models. Machine Learning (ML) methods have become an important tool for improving indoor localization technologies. A major benefit of ML is that it is data-driven and does not require meticulous quantification of every parameter in the propagation model to achieve accurate and robust positioning.
ML has been used in the context of indoor positioning using both supervised and unsupervised methodologies [11], [12] for, amongst others, NLOS-detection [13], feature extraction [14] and for directly providing location estimates [15].
Due to the ubiquitous deployment of LEDs in indoor environments and their favourable properties for providing wireless communication [16], optical wireless-based positioning technologies have gained significant research interest. In the context of optical wireless-based positioning, the solutions using infrared light [17] and visible light [18] are commonly reported in the literature.
The receiver hardware typically used for data acquisition is a single photodiode (PD), an array of PDs [19], a quadrant PD [20] equipped with an aperture to provide angular diversity, or a camera/image sensor [21]. In this work, a localization approach using visible light is employed, which is more generally known in the literature as Visible Light Positioning (VLP). It uses LEDs as transmitters and a single photodiode as receiver sensor, and it leverages the RSS as a feature for data-driven positioning based on Gaussian process regression [22]. In recent years, interest in applying ML to VLP has risen significantly. Previous work using ML models for positioning has shown that data-driven approaches can drastically outperform, in terms of both accuracy and robustness, classical multilateration-based methods, which require a full description of the propagation model [23]. Furthermore, in non-ideal light propagation scenarios, classical VLP techniques struggle to provide accurate localization, while ML approaches can easily cope with these environments. The feasibility of VLP has also been demonstrated for 3-D position estimation, using either classical multilateration methods [24] or ML models [25].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the literature on VLP, ML techniques such as Artificial Neural Networks (ANN) [26], Gaussian Processes [27], Support Vector Machines (SVM) [28] and K-Nearest Neighbours (KNN) [29] have already been employed to improve localization accuracy and robustness. In previous related work, the focus has predominantly been on model selection and data efficiency in order to achieve the best possible localization performance. Other papers have focused on possible optimizations of data-driven VLP approaches, for example using simulation-based data or interpolation to augment the available dataset, and describe methods to scale the ML approaches to large setups [30], [31]. However, these approaches only aim at reducing the number of samples, without directly optimizing the trajectory of the receiver. Moreover, a very important aspect that has not yet been sufficiently investigated is how to efficiently acquire a training dataset in the offline phase of ML-based approaches, and how to guarantee that the acquired dataset is representative of the setup where localization is to be deployed. This work addresses this problem and offers the following original contributions. First, a new Bayesian active learning [32] scheme is proposed, using two stochastic ML models: one model drives the vehicle towards the positions where the RSS values are most uncertain, while the other model provides the position estimate using the RSS samples collected along the way. Second, two different position estimation techniques are analysed and compared. Third, a line-based sampling strategy is introduced to enhance the efficiency of the Bayesian scheme: multiple samples are collected along the linear motion of the vehicle, between two consecutive positions of interest. This strategy yields a greater estimation accuracy for the same distance travelled by the vehicle.
The rest of the paper is organized as follows. Section II formally states the problem of defining a sampling strategy in an RSS-based VLP setting and provides an overview of typical sampling strategies. In Section III, the new Bayesian AL methodology is described in detail and its main functional blocks are analysed. Next, Section IV reports the results of the new strategy when tested on RSS data collected in a suitable experimental setup. The strategy is also compared, under different settings, with simple random sampling. Finally, conclusions are drawn in Section V.

II. PROBLEM STATEMENT
In the typical RSS-based problem setting, the vector of received signal strengths $\mathbf{i}$, hereafter referred to as the intensity values, depends on the position $\mathbf{x}$ of the receiver. This dependency can be represented by a direct mapping $f$:
$$f : U \subset \mathbb{R}^{N_p} \to \mathbb{R}^{N_i}, \qquad \mathbf{i} = f(\mathbf{x}),$$
where $N_i$ is the number of light transmitters and $N_p = 2$ in a 2-D reference system. The goal of RSS-based VLP is to estimate the position $\mathbf{x}$ given the intensity values $\mathbf{i}$. In other words, the goal is to identify an inverse mapping $g$:
$$g : \mathbb{R}^{N_i} \to U, \qquad \mathbf{x} = g(\mathbf{i}).$$
Note that the inverse mapping $g$ exists only if the direct mapping $f$ is bijective in the domain of possible positions $U$.
In the classical RSS multilateration scheme, the inverse mapping is analytically obtained as the solution of a non-linear system of equations. Then, the system can be solved with a least squares optimization [33], for each possible $\mathbf{i}$. Instead, ML techniques can provide an approximate inverse model $\hat{g} \approx g$ that estimates the position for any vector of intensities. Position estimation via an ML model is convenient when, due to the high complexity of the VLP environment, the analytical relations between $\mathbf{i}$ and $\mathbf{x}$ cannot feasibly be quantified exactly and require heuristics.
The main drawback of ML models is the need for a sufficiently large set of measured samples $S = \{(\mathbf{i}_s, \mathbf{x}_s)\}_{s=1}^{N}$, which can be time-consuming to collect. In fact, in the VLP setting, the receiver must physically move through space in order to measure intensity samples at different positions. Furthermore, collecting samples without an appropriate sampling strategy may lead to insufficient performance in locations where the setup does not provide reliable RSS coverage. Fortunately, a sampling strategy can be designed such that the accuracy of the inverse model $\hat{g}$ is maximized, while limiting the time spent moving the receiver.

A. Sampling Strategies
The main task of the sampling strategy, known as exploration, is to select positions $\mathbf{x}$ that are sufficiently spread out, in order to maximize the modelling accuracy over the whole domain. Exploration can be easily performed with simple one-shot strategies such as random search, grid search (or boustrophedon [34]), or Latin hypercube design [35]. One-shot strategies select a fixed number of samples that are spread over the domain, based on geometrical criteria. They allow the sampling positions to be identified rapidly, before any measurement is executed. However, they require a good prior estimate of the number of samples to draw: too few samples would produce inaccurate models (undersampling), while too many samples would need long measurement times and provide little additional information (oversampling). Consequently, one-shot strategies typically need to be adapted to the problem setting through several trials.
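The three one-shot designs named above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: the function names, the sample counts and the 4 m × 3.5 m box are assumptions chosen for the example.

```python
import numpy as np

def random_design(n, lo, hi, rng):
    """n positions drawn uniformly at random in the box [lo, hi]."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    return lo + (hi - lo) * rng.random((n, lo.size))

def grid_design(nx, ny, lo, hi):
    """nx-by-ny grid visited in boustrophedon (back-and-forth) order."""
    xs = np.linspace(lo[0], hi[0], nx)
    ys = np.linspace(lo[1], hi[1], ny)
    rows = []
    for r, y in enumerate(ys):
        row_x = xs if r % 2 == 0 else xs[::-1]   # reverse every other row
        rows.append(np.column_stack([row_x, np.full(nx, y)]))
    return np.vstack(rows)

def latin_hypercube_design(n, lo, hi, rng):
    """Latin hypercube: each axis is cut into n bins and each bin is used once."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    dim = lo.size
    # One random point inside each of the n bins, for every axis.
    unit = (np.arange(n)[:, None] + rng.random((n, dim))) / n
    for d in range(dim):
        unit[:, d] = rng.permutation(unit[:, d])  # shuffle bins independently per axis
    return lo + (hi - lo) * unit

rng = np.random.default_rng(0)
lo, hi = (0.0, 0.0), (4.0, 3.5)    # illustrative rectangular patch, in metres
pts_random = random_design(20, lo, hi, rng)
pts_grid = grid_design(5, 4, lo, hi)
pts_lhs = latin_hypercube_design(20, lo, hi, rng)
```

In all three cases the number of samples must be fixed before any measurement is taken, which is the limitation discussed above.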
A more efficient way to select sampling positions is via active learning (AL), also known as adaptive sampling [32]. Using AL, a stochastic ML model can be trained to predict the intensity values and their confidence interval for any position that has not yet been sampled. Then, by interrogating the stochastic model, it is possible to sequentially choose the new positions $\mathbf{x}$ that most reduce the model uncertainty about the intensity $\mathbf{i}$. In the considered VLP setup, such an AL task can be performed by a stochastic extension of the direct mapping $f$. In fact, the direct mapping already provides the intensity values for any position of the receiver.
The next Section presents a new strategy to obtain an accurate inverse model $\hat{g}$ by collecting samples using AL on a model of the direct mapping $\hat{f} \approx f$. This strategy is based on the following assumption: samples that reduce the uncertainty of $\hat{f}$ are also beneficial for improving the inverse model $\hat{g}$.

III. METHODOLOGY
The previous Section introduced two models of the relation between positions and intensities in an RSS-based VLP setup. One is the inverse model $\hat{g}$, which performs the position estimation task given the intensities measured by the receiver. The other is the direct model $\hat{f}$, which makes it possible to define an AL sampling strategy in the space of positions. In this work, the Gaussian process (GP) [22] is proposed for both the direct and the inverse model.

A. Gaussian Processes
The Gaussian process provides a stochastic representation of an observed function, given a set of data samples. In fact, the GP assumes that each function sample is a realization of a random variable obeying a prior Gaussian distribution. In addition, the correlation among the input samples is represented by a user-defined covariance function, also known as a kernel. Subsequently, the GP makes it possible to compute a posterior probability distribution for the function value at any input sample that has not yet been observed. The posterior is also Gaussian and can be obtained analytically via Bayesian inference [22]. In other words, for each new input sample, the GP returns an expected value and a confidence interval, which are given by the mean and the variance of the posterior, respectively.
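For reference, the posterior described above takes the following standard form for GP regression [22]. This is a sketch under common assumptions that the text does not state explicitly: a zero-mean prior, a kernel $k$, training inputs $\mathbf{x}_1, \dots, \mathbf{x}_N$ with observed values $\mathbf{y}$, and an observation noise variance $\sigma_n^2$. All symbols in this block are introduced for illustration only.

```latex
\begin{aligned}
\mu(\mathbf{x}_*) &= \mathbf{k}_*^{\top} \left( \mathbf{K} + \sigma_n^2 \mathbf{I} \right)^{-1} \mathbf{y}, \\
\sigma^2(\mathbf{x}_*) &= k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^{\top} \left( \mathbf{K} + \sigma_n^2 \mathbf{I} \right)^{-1} \mathbf{k}_*,
\end{aligned}
```

Here $\mathbf{K}_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ and $\mathbf{k}_* = [k(\mathbf{x}_1, \mathbf{x}_*), \dots, k(\mathbf{x}_N, \mathbf{x}_*)]^{\top}$. For fixed kernel hyperparameters, the posterior variance depends only on where samples were taken, not on the observed values.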
The main advantage of the GP is that it is non-parametric: unlike ANN and other popular ML models, it does not require trainable parameters to represent the observed function; instead, the GP model stores the correlation between any pair of input samples, in the form of a covariance matrix. Consequently, the GP is highly accurate and computationally cheap when only a small number of data samples is available. These properties allow the GP to efficiently perform active learning tasks when the modelled function is defined over low-dimensional spaces.
In RSS-based VLP, the GP has demonstrated high accuracy in inverse modeling and high robustness to the degradation of the receiver's sensor [15]. Thus, in this contribution, the GP is also applied to model the direct mapping $\hat{f}$ for the selection of sampling positions.
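To make the mean and variance returned by a GP concrete, the following minimal NumPy sketch computes the posterior of a zero-mean GP with a squared exponential kernel. The length scale, the jitter term `noise`, the function names and the toy intensity values are assumptions made for illustration; they are not the settings used in this work.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0):
    """Squared exponential kernel (unit prior variance) between rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_query, length=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the query positions."""
    k_tt = sq_exp_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    k_tq = sq_exp_kernel(x_train, x_query, length)
    chol = np.linalg.cholesky(k_tt)               # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_tq.T @ alpha
    v = np.linalg.solve(chol, k_tq)
    var = 1.0 - (v**2).sum(axis=0)                # prior variance is 1 for this kernel
    return mean, np.maximum(var, 0.0)             # clip tiny negative round-off

# Toy data: made-up intensities of one emitter at four receiver positions.
x_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0.2, 0.5, 0.4, 0.9])
x_query = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0]])
mean, var = gp_posterior(x_train, y_train, x_query)
```

The variance is close to zero at an already sampled position, moderate between samples, and close to the prior variance far away from all samples. This is the quantity an uncertainty-driven sampling strategy can exploit.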

B. Active Learning
Based on the Gaussian process model of $\hat{f}$, the active learning scheme in Fig. 1 is considered. First, a few initial samples are collected by moving the receiver and measuring the emitters' intensities until the position $\mathbf{x}_{N_0}$ is reached. Second, a direct GP model is trained to predict the intensity values for any position. At this point, the algorithm ends if a stop condition is met. Otherwise, an acquisition function $\alpha(\mathbf{x})$ assigns an importance score to any position, based on the prediction of the direct GP model. Then, the receiver is commanded to move towards the position $\mathbf{x}_{N_0+n_s}$ that maximizes the acquisition function:
$$\mathbf{x}_{N_0+n_s} = \arg\max_{\mathbf{x} \in U} \alpha(\mathbf{x}).$$
Along the receiver trajectory, new intensity samples are collected at the positions $[\mathbf{x}_{N_0+1}, \dots, \mathbf{x}_{N_0+n_s}]$ and added to the initial set. Then, the direct GP model is updated with the new data. This iteration is repeated until the stop condition is met.
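The loop described above can be sketched as follows, taking the GP posterior variance as acquisition function (the maximum variance approach) and one new sample per iteration. Several elements are assumptions made for illustration: the synthetic `measure_intensity` function standing in for the receiver, the fixed kernel length scale, the random candidate set used to maximize the acquisition function, and a fixed iteration budget as stop condition.

```python
import numpy as np

def sq_exp_kernel(a, b, length):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def posterior_variance(x_train, x_query, length=1.0, noise=1e-6):
    """GP posterior variance at x_query (unit prior variance, fixed kernel)."""
    k_tt = sq_exp_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    k_tq = sq_exp_kernel(x_train, x_query, length)
    v = np.linalg.solve(np.linalg.cholesky(k_tt), k_tq)
    return np.maximum(1.0 - (v**2).sum(axis=0), 0.0)

def measure_intensity(x):
    """Hypothetical stand-in for the receiver: synthetic RSS of two emitters."""
    emitters = np.array([[1.0, 1.0], [3.0, 2.5]])
    d2 = ((x[None, :] - emitters) ** 2).sum(axis=1)
    return 1.0 / (1.0 + d2)

def active_learning(n_iter, lo, hi, rng, n_candidates=500):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    positions = [lo + (hi - lo) * rng.random(2)]       # random initial sample
    intensities = [measure_intensity(positions[0])]
    travelled = 0.0
    for _ in range(n_iter):                            # stop condition: fixed budget
        candidates = lo + (hi - lo) * rng.random((n_candidates, 2))
        alpha = posterior_variance(np.array(positions), candidates)
        x_next = candidates[np.argmax(alpha)]          # maximize the acquisition
        travelled += np.linalg.norm(x_next - positions[-1])
        positions.append(x_next)                       # move the receiver there
        intensities.append(measure_intensity(x_next))  # and measure
    return np.array(positions), np.array(intensities), travelled

positions, intensities, travelled = active_learning(
    15, (0.0, 0.0), (4.0, 3.5), np.random.default_rng(0))
```

With a fixed kernel, the variance depends only on the visited positions; the stored intensities are what a direct or inverse model would subsequently be trained on.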

C. Inverse Modeling Methods
At each iteration of the described AL scheme, the inverse model $\hat{g}$ can be built to predict any new position, based on the samples collected so far. A second GP is proposed for this task: $\hat{g} = \mathbb{E}[\mathrm{GP}(\mathbf{i})]$. In fact, as shown in [15], the GP has been successfully used as an inverse model in RSS-based VLP.
A simpler method to obtain inverse modeling predictions is to perform a Monte Carlo sampling over the direct model. This method is referred to as direct random search (DRS) in the following. The first step of DRS is to compute the $\hat{f}$ prediction for many random query positions $U_T$; second, the predicted intensities are compared to the receiver's current measurement. Then, the estimated position is approximately the query position for which the predicted intensities are closest to the measured values. In other words, the inverse modelling task can be executed as:
$$\hat{g}(\mathbf{i}) = \arg\min_{\mathbf{x} \in U_T} \left| \hat{f}(\mathbf{x}) - \mathbf{i} \right|,$$
where $| \cdot |$ is the Euclidean distance. Note that a GP is used as the direct model $\hat{f}$. This inverse prediction approach has already been employed in [36] to define an acquisition function in an inverse problem setting. However, in [36] the active learning strategy aims at finding the best inverse solution to only one observation, rather than training an inverse model for the whole observable domain.
The advantage of the DRS method is that it does not require the training of an additional model. However, the direct model has to be interrogated for a large number of positions, which need to be carefully chosen to avoid excessive computational cost. Moreover, in contrast with the inverse GP model, it does not provide a confidence interval for the estimated positions.
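The DRS procedure can be sketched as follows. The direct model is passed in as a plain callable; `toy_direct_model`, with four emitters at the corners of a 4 m × 3.5 m patch, is a hypothetical stand-in for a trained direct GP and is not the model used in this work.

```python
import numpy as np

def direct_random_search(direct_model, i_measured, lo, hi, rng, n_query=2048):
    """Return the random query position whose predicted intensities are
    closest (in Euclidean distance) to the measured intensities."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    queries = lo + (hi - lo) * rng.random((n_query, lo.size))  # random query positions
    predicted = direct_model(queries)                          # shape (n_query, N_i)
    mismatch = np.linalg.norm(predicted - i_measured, axis=1)
    return queries[np.argmin(mismatch)]

def toy_direct_model(x):
    """Hypothetical direct model: intensity decays with distance to each emitter."""
    emitters = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.5], [4.0, 3.5]])
    d2 = ((x[:, None, :] - emitters[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(1)
x_true = np.array([[1.3, 2.1]])
i_measured = toy_direct_model(x_true)[0]       # noise-free "measurement"
x_est = direct_random_search(toy_direct_model, i_measured,
                             (0.0, 0.0), (4.0, 3.5), rng)
```

The resolution of the estimate is bounded by the density of the query positions, which illustrates the trade-off between accuracy and computational cost mentioned above.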
The line-based sampling strategy increases the number of collected samples while keeping the same total travel distance.
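One way to generate the intermediate sampling positions on the straight line between the current position and the next position of interest is sketched below. The function name and the convention of always ending exactly at the target are assumptions made for illustration, since the text only states that multiple samples are collected along the linear motion with a given step size.

```python
import numpy as np

def line_samples(x_start, x_target, step):
    """Positions visited every `step` metres from x_start towards x_target.
    The start point is excluded and the last position is always x_target."""
    x_start = np.asarray(x_start, float)
    x_target = np.asarray(x_target, float)
    length = np.linalg.norm(x_target - x_start)
    if length == 0.0:
        return x_target[None, :]
    direction = (x_target - x_start) / length
    n_steps = max(int(np.ceil(length / step - 1e-9)), 1)
    # Intermediate stops at multiples of `step`, then the final stop at the target.
    offsets = np.append(step * np.arange(1, n_steps), length)
    return x_start + offsets[:, None] * direction

# A 5 m move sampled every 0.5 m yields 10 samples instead of 1.
path = line_samples((0.0, 0.0), (3.0, 4.0), step=0.5)
```

The travelled distance is the same as for a direct move to the target, but every intermediate stop contributes an additional training sample.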

A. Experimental Setup
The proposed active learning strategy is applied in an experimental VLP constellation. The setup was built in an industrial logistics test environment, pictured in Fig. 2. It spans 6 m × 4 m in total, and four LED-based transmitters are installed on the ceiling at a height of 5.71 m above the plane in which the receiver moves. The transmitters are installed in a rectangular pattern and provide VLP coverage for the entire area. However, for the evaluation of the sampling strategies in this work, only a subset of the area spanned by the four LEDs in the center of the setup was considered. The setup is described in detail in [37].
In order to test the different sampling strategies, a dataset of positions and RSS values is collected by manually moving the receiver across the room (Fig. 3). Along the receiver's trajectory, 31511 RSS-position pairs are measured. The ground truth position of the receiver is measured by a high-accuracy LIDAR localization system, while RSS values are measured as the absolute intensity of the light on the receiver's sensor. The LIDAR localization system, which has also been employed in [37], guarantees a positioning precision of ±0.02 m.
Subsequently, the simulation of any sampling strategy can be performed by extracting intensity values at the desired positions via Delaunay interpolation over the experimental dataset. For simplicity, the simulation is confined to a rectangular patch of 4 m × 3.5 m in the room, represented by the green patch in Fig. 3. In addition, a validation set $\{(\mathbf{i}_t, \mathbf{x}_t)\}_{t=1}^{N_t}$ is obtained by interpolating the collected samples over a grid of $N_t = 50 \times 50$ sampling positions in the rectangular patch. Fig. 4 represents the validation set intensity values across the positions, for each light emitter.
Using the same collected data, the strategies can be consistently evaluated, without being affected by time-dependent variations of the environment. The sampling strategies are also examined using RSS in the form of relative intensity $i_{\mathrm{rel}}$, computed from the measured intensity $i$ for each emitter index $n$. Note that the $i_{\mathrm{rel}}$ values are scaled to the range [0,1]. Subsequently, the active learning scheme in Fig. 1 is tested. First, an initial sample is drawn for a random position of the receiver inside the room. Next, for 30 iterations, the receiver is … The proposed scheme is executed using the maximum variance strategy and the line-based strategy, as described in Section III-D. For reference, a random sampling strategy is also tested: the direct GP and the inverse modelling methods are updated progressively, drawing intensity samples at random positions. These evaluations are performed using a squared exponential kernel [38] for the GP models and 2048 random query positions for the DRS method. Furthermore, a default step size $d = 0.5$ m is chosen for the line-based strategy.

B. Results and Discussion
A first verification of the basic assumption of the proposed AL scheme is performed: the best samples for the direct model also improve the accuracy of the inverse model. For this purpose, the mean relative error (MRE) and the maximum absolute error (MAE) are computed at each iteration between the models' predictions and the validation samples, for different sampling strategies. In these metrics, $\mathbf{x}_t$, $\mathbf{i}_t$ are the test data samples, $| \cdot |$ is the 2-D Euclidean norm and $\hat{g}_k$ is the inverse model prediction at the $k$-th iteration. Then, the MRE values are averaged across 10 runs, for different starting positions, and reported in Fig. 5. Here, it is evident that when the absolute intensities are used (Fig. 5(a)), both the direct GP model and the DRS method gradually improve, while the inverse GP model is highly inaccurate for many iterations. On the contrary, by using relative intensities (Fig. 5(b)-(d)), the direct GP and the inverse modeling methods improve simultaneously, for any sampling strategy. A possible explanation for this result is the following: using the absolute intensity, samples may be too close in the input space of the inverse GP model; this can cause numerical errors in the GP, due to the kernel matrix inversion. Next, the cumulative distribution function (CDF) of the position error is evaluated on the validation set after 40 m of receiver travel, for all the sampling strategies and for both inverse modeling methods. The obtained CDFs are shown in Fig. 6. The figure indicates that the best positioning accuracy is reached using the line-based sampling and the GP as inverse model. Therefore, the line-based sampling appears to be the most efficient method with respect to the total travel distance. Table I reports the p50 and p95 values of the CDF, together with the defined MRE and MAE metrics, for the examined sampling strategies and inverse models. Interestingly, the positioning error using the line-based sampling is lower than 0.10 m for 95% of the test samples. Moreover, the DRS method appears significantly less accurate than the inverse GP model, for any strategy.
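The percentile-based quantities discussed above (p50, p95 and the maximum of the position error, plus the empirical CDF) can be computed from predicted and ground-truth positions as sketched below. The function names and the five-sample toy data are assumptions for illustration; the MRE is not included because its exact normalization is not reproduced here.

```python
import numpy as np

def position_error_summary(x_true, x_pred):
    """p50, p95 and maximum of the per-sample 2-D Euclidean position error."""
    err = np.linalg.norm(np.asarray(x_pred) - np.asarray(x_true), axis=1)
    return {
        "p50": float(np.percentile(err, 50)),
        "p95": float(np.percentile(err, 95)),
        "max": float(err.max()),
    }

def empirical_cdf(err):
    """Sorted errors and the fraction of samples at or below each of them."""
    err_sorted = np.sort(err)
    frac = np.arange(1, err_sorted.size + 1) / err_sorted.size
    return err_sorted, frac

# Toy validation set: true positions at the origin, made-up predictions.
x_true = np.zeros((5, 2))
x_pred = np.array([[0.03, 0.04], [0.0, 0.1], [0.06, 0.08], [0.0, 0.0], [0.3, 0.4]])
summary = position_error_summary(x_true, x_pred)
err_sorted, frac = empirical_cdf(np.linalg.norm(x_pred - x_true, axis=1))
```

Reading the CDF at a fraction of 0.95 gives the error bound that holds for 95% of the validation positions, which is how a statement such as "lower than 0.10 m for 95% of the test samples" is obtained.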
In addition, Fig. 7 analyses the position error in the line-based strategy, for different step sizes. It is apparent that the inverse GP model improves gradually as the step size decreases, but only marginally for step sizes smaller than 1 m. Note that small steps correspond to a high number of new samples at each iteration; since the GP model complexity grows as $O(n^3)$ with the number of training samples $n$, smaller step sizes may require different modeling techniques in order to avoid excessive computational costs. The effect of the step size can also be evaluated in a typical position scan over an orthogonal grid, also known as boustrophedon [34]. For this purpose, the inverse GP model has been trained on a grid of points with different step sizes. Then, the p95 error is plotted in Fig. 8 against the step size, for both the grid points and the line-based samples. Note that, for a fair comparison, the line-based sampling is stopped when it reaches the same travel distance as a complete grid with the same step size. Fig. 8 shows that the line-based sampling consistently produces more accurate inverse models than grid sampling. For illustration purposes, an example of the receiver trajectory for different sampling strategies is provided in Fig. 9. This figure also shows the intensity of the first emitter predicted by the direct model, after collecting 30 samples (red dots). It can be observed that the maximum variance sampling produces a more uniform coverage of the position space. However, the line-based sampling produces a more regular trajectory using the same number of samples.
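As a worked example of the travel budget used in the comparison above, the sketch below computes the number of points and the path length of a boustrophedon scan over a rectangular patch. It assumes that rows are swept along the longer side and that the patch dimensions are integer multiples of the step size; the function name is hypothetical.

```python
def boustrophedon_budget(width, height, step):
    """Number of grid points and path length (same unit as the inputs)
    of a row-by-row back-and-forth scan of a width x height rectangle."""
    nx = round(width / step) + 1        # points per row
    ny = round(height / step) + 1       # number of rows
    path_length = ny * width + height   # ny full sweeps plus the moves between rows
    return nx * ny, path_length

# 4 m x 3.5 m patch with a 0.5 m step: 72 points over a 35.5 m path.
n_points, path_length = boustrophedon_budget(4.0, 3.5, 0.5)

# Halving the step roughly quadruples the points, so the O(n^3) GP cost
# grows much faster than the number of samples.
n_fine, _ = boustrophedon_budget(4.0, 3.5, 0.25)
relative_cost = (n_fine / n_points) ** 3
```

Under these assumptions, a line-based run compared against this grid would be stopped once its accumulated travel distance reaches the same path length.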
Finally, in order to corroborate the choice of Gaussian processes in the proposed line-based sampling strategy, other machine learning techniques are tested as inverse models. This analysis is performed by simply swapping the inverse GP model in the AL scheme (Fig. 1) with other models, namely ANN [26], SVM [28], KNN [29] and Gradient-boosted Trees (GBT) [39]. The results are reported in Table II, from which the GP emerges as the most accurate inverse modeling technique, confirming the findings in [15].

V. CONCLUSION
In this work, a new active learning strategy has been applied in an RSS-based Visible Light Positioning setting. The AL strategy employs a direct GP model that selects the most interesting positions to be measured, based on the maximum variance approach. Subsequently, an inverse GP model is trained to predict the position of the moving receiver given the measured intensity values, using the previously collected samples. In this method, expressing the collected samples in the form of relative intensity values prevents numerical instabilities in the inverse GP model. In addition, the efficiency of the sampling strategy is enhanced by collecting multiple samples on a straight line between the indicated positions. The resulting line-based approach outperforms the random sampling strategy and the basic maximum variance sampling. After 30 iterations, the inverse GP model is able to estimate the position with an error lower than 0.1 m in 95% of the space, while the maximum recorded error is smaller than 0.2 m. Further research is needed to assess the robustness of the technique to different environmental conditions, such as light noise, sensor degradation and obstacles between the receiver and the transmitters.