Assessing Wireless Sensing Potential with Large Intelligent Surfaces

Sensing capability is one of the most highlighted new features of future 6G wireless networks. This paper addresses the sensing potential of Large Intelligent Surfaces (LIS) in an exemplary Industry 4.0 scenario. Beyond the attention that LIS have received in terms of communication aspects, they can also offer a high-resolution rendering of the propagation environment. This is because, in an indoor setting, an LIS can be placed in proximity to the sensed phenomena, while the high resolution is offered by densely spaced tiny antennas deployed over a large area. By treating the received signal power across an LIS as a radio image of the environment, we develop techniques to sense the environment by leveraging the tools of image processing and machine learning. Once a holographic image is obtained, a Denoising Autoencoder (DAE) network can be used to construct a super-resolution image, leading to sensing advantages not available in traditional sensing systems. Also, we derive a statistical test based on the Generalized Likelihood Ratio Test (GLRT) as a benchmark for the machine learning solution. We test these methods in a scenario where we need to detect whether an industrial robot deviates from a predefined route. The results show that LIS-based sensing offers high precision and has high application potential in indoor industrial environments.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) is one of the essential technologies in the 5th generation of wireless networks (5G) [1]. Compared with traditional multiuser MIMO systems, the base station of a massive MIMO system is equipped with a large number of antennas, aiming to further increase spectral efficiency [2]. Looking towards post-5G, researchers are defining a new generation of base stations equipped with an even larger number of antennas, giving rise to the concept of large intelligent surface (LIS). Formally, an LIS designates a large continuous electromagnetic surface able to transmit and receive radio waves [3], which can be easily integrated into the propagation environment, e.g., placed on walls.
Recently, the performance of LIS in communications problems has received considerable attention, and several works analyze the use of these surfaces for, e.g., data transmission [3][4][5][6][7][8], physical layer security [9][10][11] and positioning [12,13]. However, the potential of LIS could go beyond communications applications. Indeed, such large surfaces contain many antennas that can be used as sensors of the environment based on the channel state information (CSI).
Sensing strategies based on electromagnetic signals have been thoroughly addressed in the literature in different ways, and applied to a wide range of applications. For instance, in [14], a real-time fall detection system is proposed through the analysis of the communication signals produced by active users, whilst the authors in [15] use Doppler shifts for gesture recognition. Radar-like sensing solutions are also available for user tracking [16] and real-time breath monitoring [17], as well as sensing methods based on radio tomographic images [18,19]. Interestingly, whilst some of these techniques rely solely on the amplitude (equivalently, power) of the received signals [16,19], in the cases where sensing small-scale variations is needed, the full CSI (i.e., amplitude and phase of the impinging signals) is required [17,18].

This work has been submitted to IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 813999.

K. Kansanen is with Norwegian University of Science and Technology, Trondheim, Norway (email: kimmo.kansanen@ntnu.no). At the time of initiation of this work he was with Aalborg University.
On a related note, machine learning (ML) based approaches are gaining popularity in the context of massive MIMO, mainly due to the inherent complexity of this type of system and its sensitivity to hardware impairments and channel estimation errors. Hence, deep learning techniques arise as a promising solution to deal with massive MIMO, and several works have shown the advantages of ML solutions in channel estimation and precoding [20][21][22][23][24][25][26]. Due to the even larger dimensions of the system in extra-large arrays, deep learning may play a key role in exploiting complex patterns of information dependency between the transmitted signals. Also, the popularization of LIS as a natural next step from massive MIMO gives rise to larger arrays and more degrees of freedom, providing huge amounts of data which can feed ML algorithms.
Despite all the available works dealing with beyond massive MIMO and sensing, both topics have been addressed rather separately. This has motivated the present work, where the objective is to assess the potential of the combined use of deep learning algorithms and large surfaces for the purpose of sensing the propagation environment. To that end, the received signal along the LIS is treated as a radio image of the propagation environment, giving rise to the use of image processing techniques to improve the performance of sensing systems beyond purely radio-based approaches. Also, we analyze the pros and cons of this image sensing proposal, comparing it to alternative solutions based on classical post-processing of the received radio signal. Specifically, the contributions of this work are summarized as follows:
• We propose a sensing framework based on the received signal power at each antenna element of an LIS. These power samples are processed to generate a high-resolution image of the propagation environment that can be used to feed ML algorithms to sense large-scale events.
• An ML algorithm, based on transfer learning and the local outlier factor (LOF), is defined to process the radio images generated by the LIS in order to detect anomalies over a predefined robot route.
• We show the advantage of representing the radio propagation environment as an image, allowing us to use a denoising autoencoder (DAE) network for augmenting image resolution and significantly increasing the performance of the system.
• We derive a statistical test, based on the classical generalized likelihood ratio test (GLRT), to carry out the same sensing task, and perform a comparison with the ML solution in terms of generality, performance and further potential applications.
The performance of both solutions is tested in a simple indoor industrial scenario with the goal of determining if a robot has deviated from its predefined route, where the impact of the array aperture, sampling period and the inter-antenna distance is thoroughly evaluated.
The remainder of the paper is organized as follows. Section II presents the problem of robot deviation detection in an industrial setup. The classical solution based on hypothesis testing is derived and characterized in Section III. Section IV introduces the concept of sensing based on radio images, and the proposed ML algorithm is detailed in Section V. With the ML solution presented, the model validation is carried out in Section VI, whilst simulated results are discussed in Section VII. Finally, some conclusions are drawn in Section VIII.

II. SYSTEM MODEL AND PROBLEM FORMULATION
We consider an industrial scenario where a robot is supposed to follow a fixed route. Assume that, due to arbitrary reasons, it might temporarily deviate from the predefined route and follow an alternative (undesired) trajectory. The goal is to be able to detect whether the robot is following the correct route or not, based on the sensing signal transmitted by the target device.
In order to perform the detection, we assume that an LIS, consisting of $M$ antenna elements, is placed along a wall. The sensing problem reduces to determining, from the received signal at each of the $M$ LIS elements, whether the transmission has been made from a point on the desired route or from an anomalous one. Formally, if we define the correct trajectory as the set of points in space $\mathbf{P}_c \in \mathbb{R}^{N_p \times 3} = (\mathbf{p}_1, \ldots, \mathbf{p}_{N_p})$, and the received complex signal from an arbitrary point $\mathbf{p}_k \in \mathbb{R}^{1 \times 3}$ as $\mathbf{y}_k \in \mathbb{C}^{M \times 1}$, then the problem reduces to estimating whether $\mathbf{p}_k \in \mathbf{P}_c$ based on $\mathbf{y}_k$. Note that this formulation can be generalized to any radio-wave-based anomaly detection in an arbitrary scenario.
The complex baseband signal received at the LIS from point $\mathbf{p}$ (the subindex is dropped for the sake of clarity in the notation) is given by
$$ \mathbf{y} = \mathbf{h}x + \mathbf{n}, \qquad (1) $$
with $x$ the transmitted (sensing) symbol, $\mathbf{h} \in \mathbb{C}^{M \times 1}$ the channel vector from point $\mathbf{p}$ to each antenna element, and $\mathbf{n} \sim \mathcal{CN}_M(\mathbf{0}, \sigma^2 \mathbf{I}_M)$ the noise vector. Also, we consider a static scenario where the channel $\mathbf{h}$ only depends on the user position, neglecting the impact of time variations. In order to reduce deployment costs, and because we are interested in sensing large-scale variations, we consider only the received signal amplitude (equivalently, power). This assumption may lead to simpler system implementations, avoiding the necessity of performing coherent detection.

III. STATISTICAL APPROACH: LIKELIHOOD RATIO TEST
Let us consider that the system is able to obtain several samples from each point p k belonging to the correct route P c during a training phase. Then, once the system is trained, the problem can be tackled from a statistical viewpoint by performing a generalized hypothesis test, as shown throughout this section.
To start with, let us assume that the value of $\sigma^2$ in (1) is perfectly known and, without loss of generality, that $x = 1$.
Since we are considering only received powers, the signal at the output of the $i$-th antenna detector is given by
$$ w_i = \|y_i\|^2 = \|h_i + n_i\|^2, \qquad (2) $$
where $y_i$, $h_i$ and $n_i$ for $i = 1, \ldots, M$ are the elements of $\mathbf{y}$, $\mathbf{h}$ and $\mathbf{n}$, respectively. When conditioned on $h_i$, $w_i$ follows a Rician distribution (in power) with probability density function (PDF) given by [27]
$$ f_{w_i}(w \,|\, h_i) = \frac{1}{\sigma^2} \exp\!\left(-\frac{w + g_i}{\sigma^2}\right) I_0\!\left(\frac{2\sqrt{w\, g_i}}{\sigma^2}\right), \qquad (3) $$
where $g_i = \|h_i\|^2$ and $I_\nu(\cdot)$ denotes the modified Bessel function of the first kind [28, eq. (8.431 1)]. Due to the statistical independence of the noise samples, the joint PDF of the vector of received powers, i.e., $\mathbf{w} = (w_1, \ldots, w_M)^T$, is given by
$$ f_{\mathbf{w}}(\mathbf{w} \,|\, \mathbf{h}) = \prod_{i=1}^{M} f_{w_i}(w_i \,|\, h_i). \qquad (4) $$
In this context, we formulate in this section a statistical solution to determine whether the robot has deviated from the correct route or not based on the received power signal $\mathbf{w}$ at the LIS.
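As a quick numerical sanity check of the PDF in (3), the expression can be evaluated directly with NumPy (`np.i0` is the zeroth-order modified Bessel function of the first kind); the parameter values below are purely illustrative.

```python
import numpy as np

def rician_power_pdf(w, g, sigma2):
    """PDF (3) of the received power at one antenna element,
    conditioned on the channel, with g = |h_i|^2."""
    return (1.0 / sigma2) * np.exp(-(w + g) / sigma2) \
           * np.i0(2.0 * np.sqrt(w * g) / sigma2)

# Sanity check: the density should integrate to ~1 over w >= 0.
w = np.linspace(0.0, 60.0, 200_000)
pdf = rician_power_pdf(w, g=2.0, sigma2=1.0)
area = np.sum(pdf) * (w[1] - w[0])   # simple Riemann sum
```

The mean of this distribution is $g_i + \sigma^2$, which is precisely the property that the moment-based estimator of Section III-B exploits.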
A. Test formulation

During the training phase, the LIS obtains $N_t$ power sample vectors $\mathbf{w}_{0,j}$ for $j = 1, \ldots, N_t$ from each point $\mathbf{p}_0$ of the correct route. Once trained (evaluation phase), the LIS receives another set of samples $\mathbf{w}_k$ for $k = 1, \ldots, N_v$ from an arbitrary point $\mathbf{p}$. Therefore, the objective is, based on $\mathbf{w}_{0,j}$ and $\mathbf{w}_k$, to determine whether $\mathbf{p} = \mathbf{p}_0$ or not. To that end, we formulate a binary hypothesis test as
$$ \mathcal{H}_0: \mathbf{g} = \hat{\mathbf{g}}_0, \qquad \mathcal{H}_1: \mathbf{g} \ne \hat{\mathbf{g}}_0, \qquad (5) $$
where $\hat{\mathbf{g}} = (\hat{g}_1, \ldots, \hat{g}_M)^T$ is the channel vector estimated from $\mathbf{w}_k$. The test is hence formulated based on the GLRT, but replacing the knowledge of the null hypothesis by its estimated counterpart, i.e.,
$$ \Lambda = \frac{\prod_{k=1}^{N_v} f_{\mathbf{w}}(\mathbf{w}_k \,|\, \hat{\mathbf{g}}_0)}{\prod_{k=1}^{N_v} f_{\mathbf{w}}(\mathbf{w}_k \,|\, \hat{\mathbf{g}})} \underset{\mathcal{H}_1}{\overset{\mathcal{H}_0}{\gtrless}} \eta. \qquad (6) $$
Taking logarithms in (6) and introducing (3) and (4), the test is reformulated as
$$ \log \Lambda = \sum_{k=1}^{N_v} \sum_{i=1}^{M} \left[ \frac{\hat{g}_i - \hat{g}_{0,i}}{\sigma^2} + \log \frac{I_0\big(2\sqrt{w_{i,k}\, \hat{g}_{0,i}}/\sigma^2\big)}{I_0\big(2\sqrt{w_{i,k}\, \hat{g}_i}/\sigma^2\big)} \right] \underset{\mathcal{H}_1}{\overset{\mathcal{H}_0}{\gtrless}} \log \eta, \qquad (7) $$
where $w_{i,k}$ denotes the $i$-th entry of $\mathbf{w}_k$. Replacing the true value of $\mathbf{g}_0$ by its estimation introduces a non-negligible error in the test that has to be considered in the threshold design, as we will see in the following subsections. To completely define the test in (7), the next step is the characterization of the estimators and the threshold value.

B. Estimator for g
In conventional likelihood ratio tests, the estimation of the involved parameters is carried out through maximum likelihood. However, since in our problem the distribution of the received power signal $\mathbf{w}_k\ \forall k$ is a multivariate Rician, the maximum likelihood estimation implies solving a system of $M$ non-linear equations [29]. This may lead to a considerable computational effort, taking into account the large number of antennas ($M$) that characterizes the LIS. To circumvent this issue, we propose a suboptimal, albeit unbiased, estimator based on moments.
Since $\mathbb{E}[\mathbf{n}\mathbf{n}^H] = \sigma^2 \mathbf{I}_M$, the estimation of the channel at each antenna element can be solved separately. Then, we can estimate $g_i$ in both the training and evaluation phases as
$$ \hat{g}_{0,i} = \frac{1}{N_t} \sum_{j=1}^{N_t} w_{0,i,j} - \sigma^2, \qquad (8) $$
$$ \hat{g}_{i} = \frac{1}{N_v} \sum_{k=1}^{N_v} w_{i,k} - \sigma^2, \qquad (9) $$
where $w_{0,i,j}$ are the elements of $\mathbf{w}_{0,j}$. It is easily proved that the estimators in (8) and (9) are unbiased, with normally distributed error for a relatively large number of samples. To that end, let us rewrite (9) (the proof is equivalent for (8)) as
$$ \hat{g}_i = g_i + \underbrace{\frac{1}{N_v} \sum_{k=1}^{N_v} \left( \|n_{i,k}\|^2 - \sigma^2 \right)}_{A} + \underbrace{\frac{1}{N_v} \sum_{k=1}^{N_v} 2\,\mathrm{Re}\{h_i^* n_{i,k}\}}_{B}. \qquad (10) $$
The first error term, $A$, follows a scaled and shifted central $\chi^2$ distribution with $2N_v$ degrees of freedom. When $N_v$ is large (in practice, $N_v > 20$ samples is enough), the central limit theorem holds and we can approximate its distribution by $\mathcal{N}(0, \sigma^4/N_v)$. The second error term, $B$, is simply a sum of $N_v$ Gaussian variables, and thus its distribution is $\mathcal{N}(0, 2\sigma^2 g_i / N_v)$. Hence, the channel estimators are expressed in terms of their errors as
$$ \hat{g}_i = g_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}\!\left(0, \frac{\sigma^4 + 2\sigma^2 g_i}{N_v}\right), \qquad (11) $$
$$ \hat{g}_{0,i} = g_{0,i} + \epsilon_{0,i}, \qquad \epsilon_{0,i} \sim \mathcal{N}\!\left(0, \frac{\sigma^4 + 2\sigma^2 g_{0,i}}{N_t}\right), \qquad (12) $$
where it can be observed that $\mathbb{E}[\hat{g}_i] = g_i$ and $\mathbb{E}[\hat{g}_{0,i}] = g_{0,i}$. Also, the larger the number of samples, the lower the variance of both estimators.
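The unbiasedness of the moment-based estimator (9) is easy to verify by Monte Carlo simulation; the channel coefficient and noise variance below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_v, sigma2 = 10_000, 0.5        # number of samples and (known) noise variance
h = 1.2 - 0.7j                   # channel coefficient at one antenna element
g_true = abs(h) ** 2

# Received powers w_k = |h + n_k|^2, with n_k ~ CN(0, sigma2)
n = rng.normal(0.0, np.sqrt(sigma2 / 2), (N_v, 2)).view(complex).ravel()
w = np.abs(h + n) ** 2

# Moment-based estimator (9): subtract the known noise power from the mean
g_hat = w.mean() - sigma2
```

For large $N_v$, the estimation error behaves approximately as $\mathcal{N}(0, (\sigma^4 + 2\sigma^2 g_i)/N_v)$, in agreement with (11).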

C. Threshold design
The last task to complete the characterization of the test in (7) is defining the threshold. The classical approach is fixing a false alarm probability (i.e., the probability of deciding $\mathcal{H}_1$ when $\mathcal{H}_0$ holds), and determining $\eta$ from the distribution of $\log \Lambda$ under the null hypothesis. The asymptotic properties of $\log \Lambda$ have been well studied in the literature (see, e.g., [30]). However, these general results are not valid in our case because i) we are replacing the true value of $\mathbf{g}_0$ by its estimation, and ii) we are using moment-based estimators instead of the optimal one.
A more general result, which is the starting point of our derivation, is that the limiting distribution of $-2\log\Lambda$ under the null hypothesis is given by [31]
$$ -2\log\Lambda \xrightarrow{p} (\hat{\mathbf{g}} - \hat{\mathbf{g}}_0)^T \mathbf{J}\, (\hat{\mathbf{g}} - \hat{\mathbf{g}}_0), \qquad (13) $$
where we have replaced $\mathbf{g}_0$ by $\hat{\mathbf{g}}_0$. In (13), $\xrightarrow{p}$ stands for convergence in probability and $\mathbf{J} \in \mathbb{R}^{M \times M}$ is the Fisher information matrix of $\mathbf{w}_k$ with respect to $\mathbf{g}_0$. Due to the statistical independence of the noise samples in (1) (and due to assuming $\sigma^2$ known), $\mathbf{J}$ is a diagonal matrix whose $i$-th diagonal element is given by [32]
$$ [\mathbf{J}]_{ii} = -\mathbb{E}\left[ \frac{\partial^2 \log f_{w_i}(w \,|\, g_{0,i})}{\partial g_{0,i}^2} \right]. \qquad (14) $$
Introducing (3) and (4), and using [28, eqs. (6.643 2) and (9.220 2)], the elements of $\mathbf{J}$ reduce to an integral expression, (15). Unfortunately, (15) does not admit a closed-form expression, and it has to be solved numerically or by quadrature methods.
On the other hand, considering the expression of the estimators in terms of their errors in (11)-(12), the limiting distribution of the test is rewritten as
$$ -2\log\Lambda \xrightarrow{p} (\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_0)^T \mathbf{J}\, (\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_0), \qquad (16) $$
where $\boldsymbol{\epsilon}_0 = (\epsilon_{0,1}, \ldots, \epsilon_{0,M})^T$ and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_M)^T$. As proved before, both error vectors are Gaussian distributed, but they vary at very different time scales. During the training phase, the system obtains the estimate of the true channel $\mathbf{g}_0$ from the correct points in the trajectory and stores these values for the evaluation phase. This implies that, although the estimator introduces a random error $\boldsymbol{\epsilon}_0$, this realization of the error remains constant during the whole evaluation phase until the system is retrained. In turn, each time the system evaluates a point, $\boldsymbol{\epsilon}$ takes a different (random) realization. This is of key importance when designing the threshold value of the test, since we have to consider that underestimating the training error will have a negative impact on the performance during many evaluations. With that in mind, we propose choosing $\eta$ based on a worst-case design, i.e., we consider an estimation error $\epsilon'_{0,i}$ that overestimates the true error $(1-\alpha_0) \times 100$ percent of the time. That is,
$$ \epsilon'_{0,i} = F_{\epsilon_{0,i}}^{-1}(1 - \alpha_0), \qquad (17) $$
where $F_{\epsilon_{0,i}}$ stands for the cumulative distribution function (CDF) of a Gaussian variable with zero mean and variance $(\sigma^4 + 2\sigma^2 \hat{g}_{0,i})/N_t$. Note that we have replaced the true channel value by its estimation in the calculation of the aforementioned percentile.
Finally, conditioned on $\epsilon'_{0,i}$, the distribution of the test for a large number of samples is given by
$$ -2\log\Lambda \,\big|\, \boldsymbol{\epsilon}'_0 \xrightarrow{d} D = (\boldsymbol{\epsilon} - \boldsymbol{\epsilon}'_0)^T \mathbf{J}\, (\boldsymbol{\epsilon} - \boldsymbol{\epsilon}'_0), \qquad (18) $$
which corresponds to a non-central Gaussian quadratic form. Therefore, given a predefined false alarm probability $\alpha$, the proposed threshold value is
$$ -2\eta = F_D^{-1}(1 - \alpha), \qquad (19) $$
where $F_D$ is the CDF of $D$, which can be obtained by Monte Carlo simulations or by using some of the approximations given in the literature for Gaussian quadratic forms (see, e.g., [33][34][35]), and the test reads as
$$ -2\log\Lambda \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} -2\eta. \qquad (20) $$
Note that, in (19), we have again replaced the true channel values by their estimations. In our tests, this seems to have a negligible impact on the threshold distribution unless the number of samples is very low (in which case the asymptotic analysis presented here no longer holds). A summary of the proposed statistical test is provided in Algorithm 1, where the pointwise comparison presented here is performed along the whole route $\mathbf{P}_c$.

Algorithm 1: Statistical test for sensing
Training phase:
for each $\mathbf{p} \in \mathbf{P}_c$ do
  I. Estimate $\hat{\mathbf{g}}_0$ using (8)
  II. Compute $\epsilon'_{0,i}$ for $i = 1, \ldots, M$ from (17) for a confidence value $\alpha_0$
  III. Compute $J(\hat{g}_{0,i})$ for $i = 1, \ldots, M$ from (15)
  IV. Compute $-2\eta$ using (19) for a confidence value $\alpha$
end

Evaluation phase:
Received $\mathbf{w}_j$ for $j = 1, \ldots, N_v$, do
for each $\mathbf{p} \in \mathbf{P}_c$ do
  I. Estimate $\hat{\mathbf{g}}$ using (9)
  II. Compute $-2\log\Lambda$ using (7)
  III. Decide $\mathcal{H}_1$ (anomaly) if $-2\log\Lambda > -2\eta$; otherwise decide $\mathcal{H}_0$
end
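A compact sketch of the whole test, under the simplifying assumptions of this section (known $\sigma^2$, moment-based estimators), can be written as follows; the channel vectors are drawn at random here purely to illustrate the score separation between a correct and a deviated point.

```python
import numpy as np

def log_glrt(w_eval, g0_hat, sigma2):
    """log-likelihood ratio (7); w_eval: (N_v, M) powers, g0_hat: (M,) trained powers."""
    g_hat = np.maximum(w_eval.mean(axis=0) - sigma2, 1e-12)  # estimator (9)
    g0 = np.maximum(g0_hat, 1e-12)
    term = (g_hat - g0) / sigma2 \
         + np.log(np.i0(2 * np.sqrt(w_eval * g0) / sigma2)) \
         - np.log(np.i0(2 * np.sqrt(w_eval * g_hat) / sigma2))
    return term.sum()

rng = np.random.default_rng(1)
M, N_t, N_v, sigma2 = 64, 100, 50, 0.5

def powers(h, n_samp):
    """n_samp noisy power snapshots w = |h + n|^2 across the M elements."""
    n = rng.normal(0.0, np.sqrt(sigma2 / 2), (n_samp, h.size, 2)).view(complex)[..., 0]
    return np.abs(h + n) ** 2

h0 = rng.normal(0.0, 1.0, (M, 2)).view(complex).ravel()     # channel at a correct point
h_dev = rng.normal(0.0, 1.0, (M, 2)).view(complex).ravel()  # channel after a deviation

g0_hat = powers(h0, N_t).mean(axis=0) - sigma2              # training, estimator (8)
score_ok = -2 * log_glrt(powers(h0, N_v), g0_hat, sigma2)
score_dev = -2 * log_glrt(powers(h_dev, N_v), g0_hat, sigma2)
```

A deviation is declared when $-2\log\Lambda$ exceeds the threshold $-2\eta$; in this toy run, `score_dev` comes out much larger than `score_ok`.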

IV. HOLOGRAPHIC SENSING
In the previous section, we presented a statistical method to sense the environment based on the received power signal at the different antenna elements of the LIS, and hence detect route deviations from a predefined correct trajectory. This approach exploits the large number of antennas in the LIS in the same way as massive MIMO systems do. However, the high spatial density of antennas and the large array aperture of the LIS can be exploited in an alternative way. The basis of this novel technique is using the received signal across the surface as a radio image of the environment, giving rise to what we deem appropriate to name holographic sensing.
In a wireless context, an LIS can be described as a structure which uses the electromagnetic signals impinging on it, after their interaction with the scatterers in the environment, to obtain a profile of that environment. That is, we can use the signal power received at each of the multiple elements of the LIS to obtain a high-resolution image of the propagation environment. Using this approach, the complexity of multipath propagation is reduced to information represented as an image. This provides a twofold benefit: i) the massive number of elements that compose the LIS leads to accurate environment sensing (i.e., a high-resolution image), and ii) it allows the use of computer vision algorithms and ML techniques to deal with the resulting images.
As an illustrative example, Fig. 1 shows the holographic images obtained from different propagation environments, with Fig. 1a depicting the first such case.

V. MACHINE LEARNING SOLUTION

A. Model description
We introduce an ML model to perform the anomalous route classification task based on the holographic images obtained at the LIS. The main advantage of this proposal, as we will see, is that it is independent of the data distribution, and no assumptions are needed for its implementation. This is in contrast with Section III, where we considered the noise to be Gaussian distributed with known variance for the sake of simplicity in the analytical derivations. In reality, these assumptions may not hold.
In our considered problem, the training data is obtained by sampling the received power at certain temporal instants while the target device moves along the correct route. In order to reduce both training time and scanning periods, which may be heavy tasks for large trajectories, we resort to transfer learning [36]. Given our constraint of short scanning periods, the risk of overfitting is significant. Transfer learning takes advantage of knowledge previously acquired in a different task: the network parameters have already been optimized for a specific problem using a very large dataset, and this serves as a starting point that transfers important feature information to the new task, diminishing the risk of overfitting. For instance, the authors in [37] address the problem of a small dataset for emotion recognition by using a pre-trained architecture through transfer learning. Similarly, this allows us to use a small dataset, thereby improving the flexibility of the system in real-world deployments. Among the available strategies for this matter, we use feature representation transfer.
One of the main requirements for transfer learning is the availability of models that perform well on already defined tasks. These models are usually shared in the form of the large set of parameters/weights the model attained once trained to a stable state [38]. The deep learning Python library Keras [39] provides an easy way to reuse some of these popular models; in our case, we choose the VGG19 architecture [40]. Due to our specific use case, and the training data constraints, we propose the use of an unsupervised ML algorithm, known as LOF, which identifies the outliers present in a dataset (i.e., the anomalous positions of the target robot).
The proposed model is detailed in Fig. 2. In order to perform the feature extraction, we remove the last fully connected (FC) layer, which performs the original VGG19 classification, and adapt the output to our specific classification task (anomaly/no anomaly in the robot's route). We note that the architecture has been frozen in our case, i.e., the weights and biases of VGG19 are fixed and reused to generate the features that feed the LOF classifier.

B. Local Outlier Factor
LOF is an unsupervised ML algorithm that relies on the density of data points in the distribution as a key factor to detect outliers (i.e., anomalous events). In the context of anomaly detection, LOF was proposed in [41] as a solution to find anomalous data points by measuring the local deviation of a given point with respect to its neighbors.
LOF is based on the concept of local density, where the region that constitutes a locality is determined by the K nearest neighbors. By comparing the local density of a point to the local densities of its neighbors, one can identify regions of similar density, as well as points that have a substantially lower density than their neighbors; the latter are considered to be outliers. This approach can be naturally applied to anomalous trajectory deviation detection: the points belonging to the correct route are used to learn the correct clusters. The strength of the LOF algorithm is that it takes both local and global properties of datasets into consideration, i.e., it can perform well even in datasets where anomalous samples have different underlying densities, because the comparison is carried out with respect to the surrounding neighborhood. For the reader's convenience, a brief description of the LOF theory is provided in the following.
The algorithm is based on two metrics, namely the K-distance of a point $A$, denoted by $D_K(A)$, and its K-neighbors, which is the set $N_K(A)$ composed by those points that are in or on the circle of radius $D_K(A)$ with respect to the point $A$. Note that $K$ is a hyperparameter to be chosen and fixed for computing the clusters. Also note that this implies $|N_K(A)| \ge K$, where $|N_K(A)|$ is the number of points in $N_K(A)$. With these two quantities, the reachability distance between two arbitrary points $A$ and $B$ is defined as
$$ RD_K(A, B) = \max\{D_K(A),\, d(A, B)\}, \qquad (21) $$
where $d(A, B)$ is the distance between points $A$ and $B$. Figure 3 illustrates the $RD_K$ concept: if a point $B$ lies within the K-neighbors of $A$, the reachability distance will be $D_K(A) = 3$ (the radius of the circle containing points $C$, $D$ and $E$); otherwise, the reachability distance will be the distance between $A$ and $B$. In the example, $RD_K(A, B) = 6$. Note that the distance measure is problem-specific, being in our case the Euclidean distance between the different features extracted by the VGG19 network. From (21), the local reachability density (LRD) of a point $A$ is defined as the inverse of the average reachability distance of $A$ from its neighbors, i.e.,
$$ LRD_K(A) = \left( \frac{1}{|N_K(A)|} \sum_{B \in N_K(A)} RD_K(A, B) \right)^{-1}. \qquad (22) $$
According to the LRD, if the neighbors are far from the point (i.e., the average reachability distance is large), the density of points around it is low. This is the distance at which the point $A$ can be reached from its neighbors, so it measures how far a point is from the nearest cluster of points, taking low LRD values when the closest cluster is far from the point. This finally gives rise to the concept of LOF, which is given by
$$ LOF_K(A) = \frac{1}{|N_K(A)|} \sum_{B \in N_K(A)} \frac{LRD_K(B)}{LRD_K(A)}. \qquad (23) $$
Observe that, if a point is an inlier, the ratio is approximately equal to one because the densities of the point and its neighbors are practically equal. In turn, if the point is an outlier, its LRD will be lower than the average LRD of its neighbors, and hence the LOF takes large values.
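The three quantities (21)-(23) can be computed directly in a few lines of NumPy. The sketch below follows the text's definition of the reachability distance, which uses $D_K$ of the query point (the original LOF formulation uses $D_K$ of the neighbor instead); the toy dataset is illustrative.

```python
import numpy as np

def lof_scores(X, K=3):
    """LOF values for each row of X, following (21)-(23)."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    DK = np.sort(d, axis=1)[:, K - 1]            # K-distance D_K(A)
    NK = [np.where(d[a] <= DK[a])[0] for a in range(n)]  # K-neighborhood N_K(A)
    lrd = np.empty(n)
    for a in range(n):
        rd = np.maximum(DK[a], d[a, NK[a]])      # reachability distance (21)
        lrd[a] = 1.0 / rd.mean()                 # local reachability density (22)
    return np.array([lrd[NK[a]].mean() / lrd[a] for a in range(n)])  # LOF (23)

# A tight cluster plus one clearly separated point
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5]], dtype=float)
scores = lof_scores(X, K=3)   # the last point gets a LOF well above 1
```

Inliers obtain scores close to one, while the isolated point stands out with a much larger value, which is the behavior exploited to flag route deviations.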
In our specific problem, we propose using the LOF values to determine whether a point belongs to the correct trajectory or corresponds to a robot deviation.

C. Dataset format
With the algorithm and the model introduced, the remaining component to fully characterize the proposed ML solution is the dataset. In our considered problem, the dataset is obtained by sampling the received signal power at each element of the LIS while the robot moves along the trajectories. Formally, we can define the possible trajectories (including those composed by both correct and anomalous points) as the set of points in space $\mathbf{P}_t \in \mathbb{R}^{N_p \times 3}$, being $N_p$ the total number of points in the route. Let us assume the system is able to obtain $N_s$ samples at each channel coherence interval $\forall\, \mathbf{p}_j \in \mathbf{P}_t$, being $\mathbf{p}_j$ for $j = 1, \ldots, N_p$ an arbitrary point of the route. Hence, the dataset consists of $T = N_p \times N_s$ samples (monochromatic holographic image snapshots of received power). Each sample is a gray-scale image obtained by mapping the received power into the range $[0, 255]$. To that end, we apply min-max feature scaling, in which the value of each pixel $m_{i,j}$ for $i = 1, \ldots, M$ and $j = 1, \ldots, N_p$ is obtained as
$$ m_{i,j} = \frac{w_{i,j} - w_{\mathrm{MIN},j}}{w_{\mathrm{MAX},j} - w_{\mathrm{MIN},j}} \left( m_{\mathrm{MAX}} - m_{\mathrm{MIN}} \right) + m_{\mathrm{MIN}}, \qquad (24) $$
where $w_{i,j}$ are the elements of $\mathbf{w}_j$, i.e., $w_{i,j} = \|h_{i,j} + n_{i,j}\|^2$, $m_{\mathrm{MAX}} = 255$, $m_{\mathrm{MIN}} = 0$, and
$$ w_{\mathrm{MAX},j} = \max_{i=1,\ldots,M} w_{i,j}, \qquad w_{\mathrm{MIN},j} = \min_{i=1,\ldots,M} w_{i,j} \qquad (25) $$
are the maximum and minimum received power values from a point $\mathbf{p}_j$ along the surface. The input structure supported by VGG19 is an RGB image of $n_c = 3$ channels. Due to our monochromatic measurements, our original gray-scale input structure is a one-channel image. To solve this problem, we expand the values by copying them into an $n_c = 3$ channel input structure.
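The mapping (24) and the channel expansion can be sketched as follows; the reshaping assumes a square LIS, which is an illustrative choice, and the 4×4 toy input is arbitrary.

```python
import numpy as np

def power_to_image(w_j, m_min=0.0, m_max=255.0):
    """Min-max scaling (24) of the M received powers from point p_j into a
    gray-scale image, then copied into n_c = 3 channels for the VGG19 input."""
    w_lo, w_hi = w_j.min(), w_j.max()
    gray = (w_j - w_lo) / (w_hi - w_lo) * (m_max - m_min) + m_min
    side = int(np.sqrt(w_j.size))            # square LIS assumed here
    gray = gray.reshape(side, side)
    return np.repeat(gray[..., None], 3, axis=-1)

w_j = np.arange(16.0)                        # toy powers from a 4x4 LIS
img = power_to_image(w_j)                    # shape (4, 4, 3), values in [0, 255]
```

Note that the scaling is performed per snapshot, so each image always spans the full gray-scale range regardless of the absolute received power.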
Once the feature extraction is performed, the output is $n_c = 512$ channels of size $n_w = 7$ by $n_h = 7$ pixels. Since LOF works with vectors, the data is reshaped into an input feature vector of $7 \times 7 \times 512 = 25088$ dimensions, meaning our dataset is $\{\mathbf{x}^{(i)}\}_{i=1}^{T}$, where $\mathbf{x}^{(i)}$ is the $i$-th $n$-dimensional training input feature vector (being $n = 25088$) and $x^{(i)}_j$ is the value of its $j$-th feature.

VI. MODEL VALIDATION
In order to validate both sensing solutions, namely the statistical hypothesis testing and the holographic sensing algorithm, we carried out an extensive set of simulations to analyze the performance of the systems in a simple, yet interesting, industrial scenario. To properly obtain the received power values, we use ray-tracing software, thereby capturing the effects of multipath propagation in a reliable way. Specifically, we consider ALTAIR FEKO WINPROP [42].

A. Simulated scenario
The baseline set-up is described in Fig. 4a: a small industrial scenario of size 484 m². We address the detection of the deviation of the target robot (highlighted in red) in two cases: i) it follows a fixed route parallel to the bottom wall (Fig. 4b), and ii) the correct route is normal to the bottom wall, on which the LIS is deployed (Fig. 4c). To evaluate the performance in the detection of anomalies, we consider that the robot may deviate from the correct route at any point, and we test the ability of both systems to detect potential deviations at a distance of, at least, $\Delta d = 50$ cm and $\Delta d = 10$ cm. These two distances correspond to the cases $\Delta d \gg \lambda$ and $\Delta d \approx \lambda$, respectively, denoting $\lambda$ the wavelength.
For the aforementioned cases, we simulate $N_p$ points in the ray-tracing software, which correspond to different positions of the robot in both the correct and anomalous routes. Then, $N_s$ holographic image snapshots of the measurements are taken at every $\mathbf{p}_j$, $j = 1, \ldots, N_p$. The most relevant parameters used for simulation are summarized in Table I. In our simulations, we set $N_p = 367$ and $N_s = 10$; thus, the dataset is composed of $T = N_p \times N_s = 3670$ radio propagation snapshots containing images of both anomalous and non-anomalous situations, as described in Section V-C. Out of the $N_p = 367$ points, $N_c = 185$ correspond to the correct route, meaning we have $T_c = N_c \times N_s = 1850$ correct data samples, while the remaining points are anomalous. To train the algorithm with the correct points, we split the correct dataset into an 80% training set, a 10% validation set, and the remaining 10% as the test set. During the training phase, the optimum value of $K = 3$ (the LOF parameter) is obtained by maximizing the accuracy score on the correct validation set.

B. Received power and noise modeling
The complex electric field arriving at the $i$-th antenna element at sample time $t$, $\tilde{E}_i(t)$, can be regarded as the superposition of the contributions of each path, i.e.,
$$ \tilde{E}_i(t) = \sum_{n=1}^{N_r} E_{i,n}(t)\, e^{j \phi_{i,n}(t)}, \qquad (26) $$
where $N_r$ is the number of paths and $E_{i,n}(t)\, e^{j \phi_{i,n}(t)}$ is the complex electric field at the $i$-th antenna from the $n$-th path, with amplitude $E_{i,n}(t)$ and phase $\phi_{i,n}(t)$.³ From (26), and assuming isotropic antennas, the complex signal at the output of the $i$-th element is therefore given by
$$ y_i(t) = \sqrt{\frac{\lambda^2}{4\pi Z_0 Z_i}}\, \tilde{E}_i(t) + n_i(t), \qquad (27) $$
with $\lambda$ the wavelength, $Z_0 = 120\pi$ the free-space impedance, $Z_i$ the antenna impedance, and $n_i(t)$ complex Gaussian noise with zero mean and variance $\sigma^2$. Note that (27) is exactly the same model as (1); the only difference is that we explicitly denote the dependence on the sampling instant $t$. For simplicity, we consider $Z_i = 1\ \forall i$. Thus, the power $w_i(t) = \|y_i(t)\|^2$ is used at each temporal instant $t$ both to perform the hypothesis testing in (7) and to generate the holographic image, as pointed out before. Finally, in order to test the system performance under distinct noise conditions, the average signal-to-noise ratio (SNR) over the whole route, $\gamma$, is defined as
$$ \gamma = \frac{1}{N_p \sigma^2} \sum_{j=1}^{N_p} \frac{1}{M} \sum_{i=1}^{M} \|h_{i,j}\|^2, \qquad (28) $$
where $M$ denotes the number of antenna elements in the LIS.

³ Note that the electric field also depends on the point $\mathbf{p}_j$. However, for the sake of clarity, we drop the subindex $j$ throughout the following subsections.

C. Noise averaging strategy
The statistical solution presented in Section III has been derived taking into account the presence of noise, and consequently it has implicit mechanisms to reduce its impact on the performance. However, the presence of noise may be more critical in holographic sensing, since it considerably impacts the image classification performance [43].
Referring to (2) and (27), since we are considering only received powers, the signal at the output of the $i$-th antenna detector is given by
$$ w_i = \|y_i\|^2 = \|h_i + n_i\|^2, \qquad (29) $$
where we have dropped the dependence on $t$. Also, let us assume the system is able to obtain $S$ extra samples at each channel coherence interval $\forall\, \mathbf{p}_j \in \mathbf{P}_t$. That is, at each point $\mathbf{p}_j$, the system is able to get $N'_s = N_s \times S$ samples. Since the algorithm only expects $N_s$ samples from each point, we can use the extra samples to reduce the noise variance at each pixel. To that end, the value of each pixel $m_{i,j}$ is not computed using directly $w_{i,j}$ as in (24), but instead using
$$ \bar{w}_{i,j} = \frac{1}{S} \sum_{s=1}^{S} w_{i,j,s}, \qquad (30) $$
where $w_{i,j,s}$ denotes the received signal power at each extra sample $s = 1, \ldots, S$. Note that, if $S \to \infty$, then
$$ \bar{w}_{i,j} \to \|h_{i,j}\|^2 + \sigma^2, \qquad (31) $$
meaning that the noise variance at the resulting image has vanished, i.e., the received power at each antenna (conditioned on the channel) is no longer a random variable. Observe that the image preserves the pattern with the only addition of an additive constant factor $\sigma^2$. This effect is only possible if the system is able to obtain a very large number $S$ of samples within each channel coherence interval.
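The variance reduction achieved by the averaging in (30) can be verified numerically; the channel and noise values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, g, S = 1.0, 4.0, 100     # noise variance, channel power |h|^2, extra samples
h = np.sqrt(g)                   # real-valued channel for simplicity

# Many realizations of S received powers w_s = |h + n_s|^2 for a single pixel
n = rng.normal(0.0, np.sqrt(sigma2 / 2), (10_000, S, 2)).view(complex)[..., 0]
w = np.abs(h + n) ** 2
w_bar = w.mean(axis=1)           # noise-averaged pixel values, as in (30)
```

As $S$ grows, `w_bar` concentrates around $g + \sigma^2$, in agreement with (31); its variance is $S$ times smaller than that of a single power sample.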

D. Stacked Denoising Autoencoder for image Super-Resolution
An autoencoder is a type of neural network that tries to learn a representation (known as an encoding) of the data given as input, usually for dimensionality reduction purposes. Along with the encoding side, a reconstruction side is learned, where the autoencoder tries to reconstruct, from the reduced encoded data, a representation as close as possible to its original input. There are several variations of the basic model that enforce the learned representation to fulfill certain properties [44]. Among them, we are interested in DAEs [45].
The goal of the denoising autoencoder is to reconstruct a "clean" target version from a corrupted input. In our context, let us assume that we can obtain a target image t ∈ ℕ ∩ [0, 255]^N and a corrupted input c ∈ ℕ ∩ [0, 255]^M, the result of the received power mapping explained in (24). Also, consider N ≥ M and that t was obtained from a less noisy environment, i.e., its average SNR γ_t is greater than that of c, denoted γ_c, so that γ_t ≥ γ_c. Then, we can perform a resizing of both images, R : [0, 255]^d → ℝ ∩ [0, 1]^R, meaning we resize both images to the same dimension R and normalize the values by dividing by 255, with t_r = R(t) and c_r = R(c) being the target and corrupted input used to train our DAE. Note that, although the two images (t_r and c_r) are identical in dimension (R pixels) after the resizing procedure, the resolution of the target one is higher, because it is obtained in a more favorable scenario (larger SNR) and from an initially higher number of pixels N.
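The resizing-and-normalization map R can be sketched in NumPy as follows; the nearest-neighbour sampling, the function name, and the image sizes are assumptions for illustration, since the paper does not specify the interpolation method here.

```python
import numpy as np

def resize_normalize(img, R=(8, 8)):
    """Sketch of the mapping R: resize an image to a common dimension R via
    nearest-neighbour sampling (an assumed interpolation) and normalize pixel
    values to [0, 1] by dividing by 255."""
    h, w = img.shape
    rows = np.arange(R[0]) * h // R[0]      # source row indices for each output row
    cols = np.arange(R[1]) * w // R[1]      # source column indices for each output column
    return img[np.ix_(rows, cols)] / 255.0

t = np.random.default_rng(3).integers(0, 256, (16, 16))   # target image, N = 16x16 pixels
c = np.random.default_rng(4).integers(0, 256, (4, 4))     # corrupted input, M = 4x4 pixels

# Both images end up with the same dimension R and values in [0, 1].
t_r, c_r = resize_normalize(t), resize_normalize(c)
print(t_r.shape, c_r.shape)
```

After this step both t_r and c_r are directly comparable pixel-by-pixel, which is what makes the MSE training objective of the DAE well defined.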
To illustrate the approach, a one-hidden-layer explanation is given for simplicity. The denoising autoencoder can be split into two components, the encoder and the decoder, defined as transformations Φ and Ψ such that Φ : ℝ ∩ [0, 1]^R → F and Ψ : F → ℝ ∩ [0, 1]^R. The encoder stage of the DAE takes the input c_r and maps it into e ∈ ℝ^l = F as

e = ρ(W c_r + b),

with l the dimension of the compressed representation of the data, known as the latent space, ρ the element-wise activation function, W the weight matrix, and b the bias vector. These weights and biases are randomly initialized and updated iteratively through backpropagation. After this process, the decoder stage maps e to the reconstruction ĉ_r of the same shape as c_r,

ĉ_r = ρ(W′ e + b′),     (33)

with ρ the activation function, and W′ and b′ the parameters used in the decoder part of the network. In our specific case, the reconstruction error, also known as the loss, is given by the mean squared error (MSE) between the pixel values of the target image and the reconstructed image (R pixels).

For the sake of reproducibility, a detailed summary of the proposed architecture is provided in Figure 5. We have used the Keras [39] library, so the description of the layers follows its notation. The ADAM optimizer, with learning rate α = 0.001, exponential decay rates for the 1st and 2nd moment estimates β_1 = 0.9 and β_2 = 0.999, and ε = 10⁻⁷, has been used to update the gradients, minimizing the MSE loss function. For the encoder part, Conv2D layers with 64 and 32 filters have been used, with kernel sizes of 3×3 and stride = 2. The LeakyReLU activation function has been used with a slope coefficient β = 0.2. Then, a Batch Normalization layer has been used to keep the mean of the output close to 0 and its standard deviation close to 1.
The Flatten layer is used to reshape the output into a vector to feed the Dense layer, whose number of neurons l = 16 corresponds to the dimension of the latent space. The dimension l was determined by analyzing the learning curves during the training procedure.
In the decoder part, a Dense layer is used again to recover the previous size of the feature vector, while a Reshape recovers the initial 2D input structure. Then, Conv2DTranspose layers have been used to reconstruct the input structure, with a configuration identical to the encoder side but with the order of the filters reversed (32 and 64). The LeakyReLU activations and the Batch Normalization are identical. The last layer is composed of 1 filter, a kernel size of 3×3, and stride = 1, and fully recovers the size of the input structure. Furthermore, the DAE network trains itself to augment the resolution of the input image, since it learns from a high-resolution target to remove the artifacts resulting from the lower resolution. The reconstructed image is then used by our anomaly detection algorithm. This is advantageous for our problem, yielding a strategy for improving the performance of the system.
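The one-hidden-layer encoder/decoder pair described above, e = ρ(W c_r + b) and (33), can be sketched in NumPy as follows; the dimensions R and l are illustrative, and the parameters are randomly initialized rather than trained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)

R, l = 16, 4            # illustrative number of pixels R and latent dimension l

def rho(x, beta=0.2):
    """Element-wise LeakyReLU activation with slope coefficient beta."""
    return np.where(x > 0, x, beta * x)

# Randomly initialized parameters (in practice updated iteratively via backpropagation).
W,  b  = rng.normal(0, 0.1, (l, R)), np.zeros(l)    # encoder Phi
W1, b1 = rng.normal(0, 0.1, (R, l)), np.zeros(R)    # decoder Psi

c_r = rng.uniform(0, 1, R)          # normalized corrupted input in [0, 1]^R
t_r = rng.uniform(0, 1, R)          # normalized target image

e     = rho(W @ c_r + b)            # encoder: latent representation e in F = R^l
c_hat = rho(W1 @ e + b1)            # decoder: reconstruction of the same shape as c_r

# MSE loss between the target image and the reconstruction, over the R pixels.
mse = np.mean((t_r - c_hat) ** 2)
print(e.shape, c_hat.shape, mse)
```

Training would repeatedly compute this forward pass, backpropagate the MSE gradient, and update W, b, W′, b′; the Keras architecture in Figure 5 replaces the single dense pair with the convolutional stack described above.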

E. Performance metrics
To evaluate the prediction effectiveness of our proposed method, we resort to common performance metrics, namely the Positive F1-Score (P_F1) and the Negative F1-Score (N_F1), each defined as the harmonic mean of precision and recall, F1 = 2 · (precision · recall)/(precision + recall), computed on the positive and negative class, respectively. Note that, although the training procedure is fully unsupervised, for our specific evaluation we know the labels of the data samples, meaning we can calculate these metrics, which are well known in the supervised learning literature.
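A minimal sketch of these two scores is given below; the function name and the convention of labeling the positive (anomalous) class as 1 are illustrative assumptions.

```python
import numpy as np

def f1_scores(y_true, y_pred):
    """Positive and negative F1 scores: the harmonic mean of precision and
    recall, computed treating class 1 and class 0 as positive, respectively."""
    def f1(pos):
        tp = np.sum((y_pred == pos) & (y_true == pos))
        fp = np.sum((y_pred == pos) & (y_true != pos))
        fn = np.sum((y_pred != pos) & (y_true == pos))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1(1), f1(0)   # (P_F1, N_F1)

# Illustrative labels: 1 = anomalous route, 0 = correct route.
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1])
p_f1, n_f1 = f1_scores(y_true, y_pred)
print(p_f1, n_f1)   # both 2/3 for this toy example
```

Reporting both scores guards against a classifier that trivially favors the majority class, which matters here because anomalous routes are expected to be rare.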

VII. NUMERICAL RESULTS AND DISCUSSION
We here present numerical results to analyze the performance of both proposals (the statistical test and holographic sensing) in the evaluation setup described in Section VI. Generally, in the considered industrial setup, it is more desirable to avoid undetected anomalies (which may indicate some error in the robot or some external issue in the predefined trajectory) than to avoid false positives. Hence, all the figures in this section show the algorithm performance in terms of the P_F1 metric.

Fig. 6. P_F1 score for holographic sensing with M = 32×32 antennas, inter-element distance of Δs = λ/2, correct route parallel to the LIS, anomalous points placed at Δd = 10 cm, and different numbers of samples.
Also, we mainly focus our results on the holographic sensing algorithm, since it is the proposal with the larger number of tunable parameters, whilst the statistical hypothesis test is used as a benchmark for the ML-based solution.
A. Impact of sampling and noise averaging

First, we evaluate the impact of both the available number of samples and the noise averaging technique on the holographic sensing algorithm. To that end, we consider a LIS composed of M = 32×32 antennas with a spacing Δs = λ/2, for a Δd = 10 cm parallel deviation.
We evaluate two approaches: i) using the S extra samples directly as input to the algorithm, with T_c = N_c × N_s × S, and ii) using the S extra samples for averaging. For this particular case, N'_s ∈ {1000, 100}. Then, ∀ p_j, we use S = N'_s / N_s samples to obtain N_s S-averaged samples for training the algorithm, keeping T_c = N_c × N_s = 1850. Note that the number of samples N'_s depends on the sampling frequency and the second-order characterization of the channel, i.e., the channel coherence time and its autocorrelation function. Figure 6 shows the performance of the system when using the S extra samples directly and the S-averaged ones, respectively. As highlighted previously, the noise contribution is critical to image classification performance, so simply increasing the number of samples presented to the algorithm does not achieve a valuable improvement. However, when performing the averaging, the results improve significantly due to the noise variance reduction. As expected, when the noise level is higher, more samples are needed to preserve the pattern by averaging, with N'_s = 1000 yielding the best performance. For the following discussions, this sampling strategy will be used, meaning we use S = 100 extra samples.
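The averaging strategy ii) amounts to a simple reshape-and-mean over the raw samples collected at each point; the sketch below uses an exponential power distribution purely as an illustrative stand-in for the received powers.

```python
import numpy as np

rng = np.random.default_rng(2)

N_s, S = 10, 100        # samples the algorithm expects per point, and samples per averaging group
N_s_prime = N_s * S     # N'_s raw samples actually collected at each point p_j

# Illustrative per-sample received powers (the true distribution follows the channel model).
raw = rng.exponential(1.0, N_s_prime)

# Strategy ii): group the N'_s raw samples into N_s blocks of S and average each block,
# so the algorithm still receives N_s samples, but with the noise variance reduced by ~S.
averaged = raw.reshape(N_s, S).mean(axis=1)

print(averaged.shape, raw.var(), averaged.var())
```

This preserves the interface of the algorithm (N_s samples per point) while trading the extra samples for a lower per-sample variance, which is what drives the improvement seen in Fig. 6.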

B. Impact of antenna spacing
The next step is to evaluate the impact of the inter-antenna distance on the ML sensing solution. We fix the aperture to 5.44×5.44 m and use S = 100 averaged samples. Then, we assess the performance for both Δd = 50 cm and Δd = 10 cm parallel deviations, and we analyze different spacings with respect to the wavelength (λ/2, λ, and 2λ).
The performance results for the distinct configurations are depicted in Fig. 7. As observed, the spacing of 2λ, which is far from the concept of LIS, yields very inaccurate results, showing that the spatial resolution is insufficient. We can conclude that the quick variations along the surface provide important information for the classifier performance. Moreover, this information becomes more important the smaller the distance between the routes, especially in the range γ ∈ [10, 4] dB for λ/2, where the detection is almost identical despite the extra precision needed to detect the deviation when the routes are closer. Furthermore, the effect of antenna densification for a given aperture is highlighted: the smallest spacing leads to the best results.

Fig. 8. P_F1 score for holographic sensing with variable aperture, inter-antenna distance of λ/2, correct route parallel to the LIS, anomalous points placed at Δd = 10 cm, and S = 100 samples.

C. LIS aperture comparisons
In this case, LISs with different apertures have been evaluated. The spacing is fixed to λ/2 and S = 100 averaged samples are used, while the deviation is Δd = 10 cm parallel.
Looking at Fig. 8, the aperture plays a vital role in the sensing performance. Increasing the number of antennas leads to a higher-resolution image, which captures the large-scale events occurring in the environment more accurately. Note that the use of incoherent detectors yields good performance when the aperture is large enough. The key factor behind this phenomenon is the spatial consistency of the LIS pattern, i.e., its ability to represent the environment as a continuous measurement image.

D. DAE for image Super-resolution evaluation
In this case, the impact of the DAE on performance is evaluated and compared to the hypothesis test in (7). We fix the aperture to M = 32×32 antennas, for a parallel deviation of Δd = 10 cm and an antenna spacing of λ/2.
For this evaluation, the performance is analyzed in 4 cases: i) no pre-processing of the images, ii) the S = 100 averaging strategy, iii) image resolution augmentation using the DAE, and iv) the hypothesis test proposed in Algorithm 1. For the DAE, we assume access to a target reference image t ∈ ℕ ∩ [0, 255]^N, with N = 128×128 and γ = 10 dB, while our corrupted input is c ∈ ℕ ∩ [0, 255]^M, with M = 32×32 and γ ∈ [10, −10] dB. Then, we resize both images with R : [0, 255]^d → ℝ ∩ [0, 1]^R, where R = 224×224, obtaining images t_r, c_r ∈ ℝ ∩ [0, 1]^{224×224}.

Fig. 9. Comparison between holographic sensing and the statistical solution in (7) for M = 32×32 antennas, correct route parallel to the LIS, spacing of λ/2, Δd = 10 cm, and different processing techniques.
Regarding Fig. 9, one can see that the raw data (blue line) yields very poor performance. This is expected, since noise can significantly perturb the local density of the clusters, leading to wrong results. The noise averaging strategy is good enough when the noise contribution is negligible, meaning that improving the results in lower-SNR scenarios would require a higher S, which would be impractical. Finally, the use of the DAE for image super-resolution outperforms both methods, improving the system performance and even allowing operation in quite unfavourable SNR scenarios. In turn, the hypothesis test derived in Section III provides, in general, the best performance.
However, we must take into account that the statistical test is built on some key a priori knowledge, namely Gaussian noise with known variance. In the context of estimation, the holographic sensing solution can be seen as a non-parametric approach, valid for any baseline distribution and requiring no further assumptions. Nevertheless, the performance of the ML solution (when the DAE is employed) presents almost no difference with respect to the ad-hoc test down to 2 dB of average SNR. This is a promising result, since the application of more refined image processing techniques may lead to further performance gains. Also, note that we consider here a rather simple scenario, where the scatterers do not move. In a more realistic environment, with rapid changes in the channel and temporal dependencies due to the relative positions of moving users and scatterers, ML-based sensing is a promising solution to learn those dependencies in scenarios where classical solutions become impractical.

E. Route deviations evaluation
Finally, we evaluate the impact of the separation between deviations and of different types of routes on both holographic sensing and the statistical test. To that end, we fix the aperture to M = 32×32 antennas and an antenna spacing of λ/2. We use all the improvements in the preprocessing of the images to leverage the performance of the ML system.
In Figure 10, we can see the performance of the system under different deviations and SNR conditions. The system works better the closer the deviated route is to the correct one. This is an advantage of our proposed approach: the closer the routes are, the more accurate the reconstruction of the DAE is, since the corrupted image c_r is more similar to the target image t_r, allowing a better augmentation of the image resolution, so the correct clusters can be learned more accurately. In this way, the proposed ML algorithm works better precisely in the cases where a standard wireless sensing system would be more prone to failure. Also, parallel deviations are easier to detect than normal deviations: the path loss of the points in a parallel route remains almost identical regardless of the specific point, making it easier to detect. It is important to highlight that the SNR definition presented in (28) can influence the pattern acquisition in the normal deviation cases, since points far from the LIS have a significantly lower instantaneous SNR, leading to a more difficult detection.
Note that the abrupt decrease in performance for the hypothesis test is due to the fact that we are using a pointwise test to perform a detection over a whole route (a collection of points), as shown in Algorithm 1. Whilst the ML algorithm performs anomaly detection over the whole correct route, the proposed statistical test is a pointwise comparison, i.e., it checks the validity of the null hypothesis for each point in the correct route separately. This implies that, in order to declare a point anomalous, the test has to reject H_0 against all the correct points. Consequently, failing at a single point is equivalent to failing at all of them.
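The brittleness of this all-points requirement can be illustrated with a small sketch; the predicate name and the list-based interface are hypothetical conveniences, not the actual interface of Algorithm 1.

```python
def point_is_anomalous(rejects_h0):
    """A sampled point is declared anomalous only if the pointwise test rejects
    H0 against every point of the correct route. `rejects_h0[k]` is True when
    H0 is rejected against correct point k (hypothetical interface)."""
    return all(rejects_h0)

# Rejecting H0 against all correct points flags the sampled point as anomalous,
# but a single failure to reject anywhere declares it normal.
print(point_is_anomalous([True, True, True]),
      point_is_anomalous([True, False, True]))
```

This is why a single false acceptance among the correct points suppresses the detection entirely, producing the abrupt performance drop, whereas the ML algorithm degrades gracefully by judging the route as a whole.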

VIII. CONCLUSIONS
We have made the first step towards the use of LIS for sensing the propagation environment, exploring and proposing two different solutions: i) a statistical hypothesis test based on a generalization of the likelihood ratio, and ii) an ML-based algorithm, which exploits the high density of antennas in the LIS to obtain radio images of the scenario. We provide a complete characterization of the statistical solution, and also pave the way for the use of ML techniques to improve the performance in the second case. As an example, we have shown that the use of denoising autoencoders considerably boosts the performance of the ML algorithm. Both proposals are tested in an exemplary industrial scenario, showing that, down to relatively low values of SNR, the performance of the two presented techniques is rather similar. The ML solution implies a larger computational effort than the statistical test, but in turn does not require any a priori knowledge, as is the case for the test, in which the noise variance is assumed known in order to reduce analytical complexity.