Expert-knowledge-based data-driven approach for distributed localisation in cell-free Massive MIMO networks

Massive Multiple-Input Multiple-Output (MaMIMO) communication networks are currently being investigated for their high potential for localisation services. This is enabled by the high-dimensional channel state information (CSI) captured by the many antennas in the system. Previous work has shown that these systems can achieve a very high localisation accuracy. However, many challenges remain, of which we identify two. First, the recent trend towards cell-free MaMIMO with many highly distributed Access Points (APs) raises the question of how this impacts localisation methods. Current localisation methods process the signals in a central processing unit (CPU), resulting in a high fronthaul requirement when deploying these algorithms in a distributed network, which limits deployment and scalability. Second, there exists a trade-off between using model-driven and data-driven localisation methods. In this work, we propose two new localisation methods which employ a distributed processing strategy and compare them against two centralised localisation methods. In addition, the four analysed methods explore the trade-off between being model- and data-driven. Moreover, the proposed ML-MUSIC method blurs the lines between the two by combining Machine Learning and traditional signal processing. In addition to comparing the localisation accuracy, we evaluate the performance in a dynamic setting, the scalability and the fronthaul requirement of the methods. The proposed Machine Learning-enhanced Multiple Signals Classification method, ML-MUSIC, reaches a median error of 34.2 mm on the test set while only using 500 training samples. Due to ML-MUSIC's distributed design, the fronthaul throughput requirement is reduced 1200-fold in comparison to the centralised methods. Furthermore, ML-MUSIC has the lowest computational complexity of all analysed methods, making it an ideal method to localise users in upcoming distributed cell-free MaMIMO networks.


I. INTRODUCTION
With the introduction of 5G, massive Multiple Input Multiple Output (MaMIMO) enabled communication systems will be deployed all around the world. Such MaMIMO systems are characterised by employing a large number of antennas at the base station (BS) to beamform the signal power towards the intended user [1]. In this way, multiple users can be served using the same time and frequency resource as they are multiplexed in the spatial domain. In order to effectively beamform towards the users, the channel state information (CSI) for each user has to be measured. This CSI contains all information about the wireless channel between the user and the BS and can be used to localise users. The challenge is now to estimate the position in the most effective way.
In [3], the authors envision that future distributed massive MIMO networks will enable six-dimensional localisation. In this vision, the cellular network will estimate and provide not only the 3D location but also the orientation of the user. This opens interesting opportunities for unmanned aerial vehicles (UAVs), ground robots in industrial settings and autonomous cars, as well as other applications in the world of augmented and virtual reality. The literature has seen a recent surge of research interest in localisation based on MaMIMO systems, resulting in highly accurate localisation methods. The main advantage of using MaMIMO systems for localisation is the high number of antenna elements at the base station. In [12], the authors show using a statistical channel model that the localisation performance increases with a rising number of antennas at the base station. In [10], we show the same trend, i.e. an increased localisation performance due to an increased number of base station antennas, while using a measured dataset.
However, next-gen MaMIMO communication systems are evolving towards distributed and cell-free configurations [2]. A cell-free system is divided in space by distributing the BS into multiple access points (APs) and spreading the APs over the targeted coverage area. It has been shown [4] [5] that distributing the antennas increases the spatial diversity, which decreases the channel correlation between users. As a result, inter-user interference decreases and the spectral efficiency of the system increases. These benefits hold for both outdoor and indoor scenarios. However, when adopting a different antenna configuration, the impact on the localisation performance has to be analysed.
Savic and Larsson [6] showed, based on simulations, that increased spatial diversity is beneficial for localisation purposes. Later, we showed that a practical distributed antenna system is indeed able to reliably locate users based on measured data [10]. Moreover, [7] and [11] show that for an indoor scenario, a distributed antenna deployment increases the localisation accuracy in comparison to co-located systems.
In addition, we have shown before [9] that when using a co-located antenna array, the localisation performance is strongly influenced by moving objects in the vicinity of the user. More specifically, when the line-of-sight (LoS) link between the user and the BS is blocked, the localisation accuracy degrades strongly. By distributing APs over an area, we increase the probability that at least some of the APs will have a LoS connection with the user. In general, when external factors influence the channel between the user and the BS antennas, a distributed system has the advantage that on average fewer links between the user and BS antennas are affected, resulting in a more reliable localisation system.
Previous methods processed the CSI centrally, which requires that the CSI is transmitted from each AP over a fronthaul link to a central processing unit (CPU). However, since the CSI is updated very frequently and a distributed MaMIMO system consists of many APs, the transmission of the CSI requires a high throughput of the fronthaul link. This may lead to increased latency of the position updates, resulting in potentially dangerous situations when the localisation service is used by autonomous drones or cars. A trade-off between latency and accuracy has to be made here. The problem can be solved by installing a higher-capacity fronthaul link and increasing the processing power at the CPU. However, this induces an extra cost for the providers. Therefore, in order to lower the required fronthaul throughput and the computational requirements of the CPU, we propose to distribute the localisation task over the APs.

B. MODEL-DRIVEN VERSUS DATA-DRIVEN LOCALISATION
When designing a localisation method based on a wireless communication system, multiple options can be considered. The literature classifies localisation algorithms into two categories [14]:
1) Model-driven: The localisation method computes the location of the user from the expected behaviour of the channel, as described by channel models.
2) Data-driven: The localisation method locates a user based on a channel dataset. The channels in the dataset need a positional label to link channels to locations. A data-driven localisation method links the channel of the user with channels in its dataset to estimate the user's position.
Between the two classes, there exists a trade-off, depending on the given system and targeted deployment. We will give a short overview of the two categories and discuss the advantages and disadvantages.

1) Model-Driven Localisation
Model-driven localisation methods use the geometric properties of the communication system and the theoretical knowledge of the user's signal to triangulate the position of the user [18]. To do so, Time-of-Arrival (ToA), Time-Difference-of-Arrival (TDoA) or Angle-of-Arrival (AoA) can be used at the APs. ToA and TDoA estimate the distance between the user and the APs. AoA methods estimate the direction from which the received signal is originating.
In ToA systems, the time is measured between sending the signal at the user and receiving the signal at the receiver's antennas. Since the propagation speed is known (the speed of light), this time of flight can be used for ranging. When the range to different receivers is known, the user can be located using the geometry of the receivers. TDoA systems measure the time difference between receiving the signal at the different receivers. In this way, the source location of the signal can be calculated based on the locations of the receivers and the measured time differences. In ToA- and TDoA-based systems, accurate synchronisation in time is required. In addition, in multipath scenarios, it is difficult to extract a correct ToA [22].
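The ranging and positioning steps described above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not a method evaluated in this work: the receiver layout and user position are hypothetical, the times of flight are noise-free, and the function names `toa_to_range` and `trilaterate` are introduced here for illustration only.

```python
import numpy as np

C = 299_792_458.0  # propagation speed: speed of light in m/s

def toa_to_range(time_of_flight_s):
    """Convert one-way times of flight (s) into ranges (m)."""
    return C * np.asarray(time_of_flight_s, dtype=float)

def trilaterate(receivers, ranges):
    """Least-squares 2D position from ranges to known receiver positions."""
    receivers = np.asarray(receivers, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    p0, r0 = receivers[0], ranges[0]
    # Subtracting the first range equation from the others cancels the
    # quadratic term |x|^2 and leaves a linear system A x = b.
    A = 2.0 * (receivers[1:] - p0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(receivers[1:]**2, axis=1) - np.sum(p0**2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Hypothetical receiver layout and user position (not from the measurements).
receivers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 9.5], [6.0, 9.5]])
user = np.array([2.0, 3.0])
times = np.linalg.norm(receivers - user, axis=1) / C
estimate = trilaterate(receivers, toa_to_range(times))
```

With ideal timing the estimate coincides with the true position. A timing error of 1 ns already corresponds to a ranging error of about 0.3 m, which illustrates why accurate synchronisation is required.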
When using AoA-based positioning methods, the received signal at an antenna array is used to compute the direction from which the signal originated. When multiple AoAs are estimated at different arrays, the exact origin of the signal can be calculated based on the geometry of the communication system. In [12], Garcia et al. localise the users by using the AoA, assisted by the ToA, measured at different distributed APs of a MaMIMO system.
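As an illustration of how several AoA estimates can be combined geometrically, the sketch below intersects the bearing lines in a least-squares sense. The array positions, the user position and the function name `triangulate_aoa` are hypothetical, and the bearings are assumed to be noise-free angles expressed in a common global frame. It is not the method of [12].

```python
import numpy as np

def triangulate_aoa(array_positions, bearings_rad):
    """Least-squares intersection of bearing lines in 2D."""
    positions = np.asarray(array_positions, dtype=float)
    bearings = np.asarray(bearings_rad, dtype=float)
    # Each bearing defines a line through the array position p. A point x on
    # that line satisfies n . (x - p) = 0, with n normal to the bearing.
    normals = np.stack([-np.sin(bearings), np.cos(bearings)], axis=1)
    offsets = np.sum(normals * positions, axis=1)
    position, *_ = np.linalg.lstsq(normals, offsets, rcond=None)
    return position

# Hypothetical array positions and user position.
arrays = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
user = np.array([1.0, 2.0])
delta = user - arrays
bearings = np.arctan2(delta[:, 1], delta[:, 0])  # global-frame angles
estimate = triangulate_aoa(arrays, bearings)
```

With more than two arrays the system is overdetermined, so errors in individual AoA estimates are averaged rather than propagated directly into the position.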
In general, the advantage of model-driven methods is that they are very deterministic and require no training data, given that the antenna arrays are calibrated. However, the required calibration is one of the biggest disadvantages of model-driven methods. Furthermore, in complex environments, e.g., where many reflections are present or users experience a non-line-of-sight connection to the APs, model-driven methods are inaccurate since these effects are hard to model.

2) Data-Driven Localisation
Data-driven localisation methods rely on measured databases to estimate the location of a user. First, during an off-line step, a labelled database is gathered, containing information about the channel of the user. The information can consist, for example, of received signal strength indicators, power delay profiles or CSI. This information serves as the fingerprint of the user; hence, these methods are often referred to as fingerprinting techniques. Afterwards, in an online step, the measured fingerprint of a user is compared to the database to estimate the location. The localisation performance of data-driven methods is fundamentally limited by the size and label accuracy of the recorded database: a large number of samples and very accurate labels are key.
For data-driven localisation methods, the biggest challenge is how to link the measured fingerprint with the fingerprints in the database most effectively. The authors of [20] use weighted K-nearest neighbour (WKNN) to localise a user in a MaMIMO network based on a power delay profile database. However, WKNN requires searching through large parts of the database, making it computationally inefficient. Furthermore, as a larger database is beneficial for accuracy, the practical use of methods that have to compare against the database is limited. In [6], Savic and Larsson proposed to use Machine Learning (ML), more specifically Gaussian Process Regression, to learn an unknown non-linear function that directly maps the fingerprints to a location. Vieira et al. proposed in [13] the use of convolutional neural networks (CNNs) in order to link the fingerprint of the user to the fingerprints in the database. The CNN learns the direct mapping between the fingerprint and the location of the user in an off-line way. Afterwards, during the online step, the CNN can directly estimate the location of the user without accessing the database, resulting in a much more efficient online step.
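To make the database search of WKNN concrete, the following sketch implements a generic weighted K-nearest-neighbour lookup. It is not the implementation of [20]: the toy fingerprints (distances to three hypothetical anchors), the inverse-distance weighting and the function names are assumptions made for illustration.

```python
import numpy as np

def wknn_locate(fingerprint, db_fingerprints, db_positions, k=3, eps=1e-12):
    """Weighted K-nearest-neighbour position estimate."""
    # The distance to every database entry is computed, which is the step
    # that becomes expensive for large databases.
    distances = np.linalg.norm(db_fingerprints - fingerprint, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + eps)  # closer entries weigh more
    return weights @ db_positions[nearest] / weights.sum()

def toy_fingerprint(positions, anchors):
    """Hypothetical fingerprint: distances from each position to the anchors."""
    return np.linalg.norm(positions[:, None, :] - anchors[None, :, :], axis=2)

# Hypothetical labelled database on a 0.1 m grid.
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
axis = np.linspace(0.0, 1.0, 11)
db_positions = np.array([(x, y) for x in axis for y in axis])
db_fingerprints = toy_fingerprint(db_positions, anchors)

true_position = np.array([[0.43, 0.57]])
query = toy_fingerprint(true_position, anchors)[0]
estimate = wknn_locate(query, db_fingerprints, db_positions, k=3)
```

The cost of each query grows linearly with the number of database entries, whereas a trained NN has a fixed inference cost independent of the database size.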
Ever since Vieira et al. proposed the use of NNs for fingerprinting with MaMIMO CSI, many studies have adopted this technique. Arnold et al. showed a fingerprinting method using CNNs on measured data, including data of users in a non-line-of-sight scenario [14]. Ferrand et al. showed the performance in indoor and outdoor environments using a measured dataset and explored the generalisation of the approach and data ageing [15]. In [16], Foliadis et al. combined NNs and careful feature design to reach cm-level accuracy in indoor environments.
The main drawback of data-driven methods is that they have to learn how to locate a user without any prior knowledge. In practice, this translates into the need for a very large dataset. Gathering a large dataset containing both the wireless channel and the exact location of the corresponding user is a very time-consuming and, therefore, expensive process. As a result, the challenge for data-driven methods is to minimise the required size of the dataset, thereby improving the data efficiency.

C. CONTRIBUTIONS AND OUTLINE
In this article, we study the localisation performance of a distributed MaMIMO system. We study the localisation accuracy, the computational complexity and the required fronthaul throughput of several proposed localisation methods. Our main contributions are:
• A new publicly available measured dataset, containing data of a nomadic environment;
• Introducing two new distributed localisation methods;
• A comparison of distributed and centralised localisation methods using multiple performance indicators;
• Exploring the advantages and disadvantages of model-driven and data-driven methods;
• The proposed ML-MUSIC method reduces the computational complexity and fronthaul requirements, while reaching a high localisation accuracy.
The paper is organised as follows: In Section II, the testbed used for the measurements is described as well as the measurement scenario. Section III introduces the different localisation methods used in this work. Afterwards, Section IV analyses the performance of the different methods. Next, in Section V, we delineate some implications on the privacy of the users using the proposed localisation methods. Finally, Section VI concludes this study.

II. MEASURED MAMIMO DATASETS
In this section, the datasets used to study the performance of the different proposed methods are introduced. In order to study the localisation performance of different methods, a dataset of Massive MIMO channels is needed. To gather such a dataset, there are two possible options.
The first option is to simulate the channels. This can be done using various channel models or, preferably, using ray-tracing software. One of the advantages of using simulated data is the total control over the environment, which leads to perfect knowledge of the ground truth. Having accurate positional labels is crucial when developing and evaluating localisation methods. Furthermore, when using simulated data, extra data is generated with ease, which makes a very large dataset possible.
The second option is to gather the data using a testbed and measure real-life channels. Creating a large, accurately labelled dataset is very hard with measured data. Measuring the channel at a large number of locations is very time consuming. Furthermore, acquiring an accurate positional label for such measurements is very hard. However, we argue that simulated data cannot capture all the different non-idealities that we observe in the real world. Therefore, we decided to base this study on measured data. To do so, we have to overcome the challenges that such a measurement introduces.
VOLUME 4, 2016
First, we describe the distributed Massive MIMO testbed which we use for the measurements. Next, we delineate how we extended the testbed in order to overcome the following challenges: (i) obtaining highly accurate labels and (ii) recording a very large dataset. To end this section, the two datasets collected using the testbed for this study are presented.

A. DISTRIBUTED MASSIVE MIMO TESTBED
The KU Leuven Massive MIMO testbed is equipped with 64 patch antennas, which can be combined into an array by employing a flexible mounting system. This allows the antenna deployment to be changed quickly from a centralised uniform rectangular array to a uniform linear array or a distributed array. When utilising the distributed array, the antennas are distributed into eight APs, each being a uniform linear array (ULA) of 8 antennas.
For this study, the APs were placed in an octagonal shape around the region-of-interest (ROI), with the patch antennas facing the middle of the octagon. The users were placed inside the ROI. The placement of the distributed antennas is shown as the green rectangles in Fig. 1. This setup was built inside our lab, which measures 6 m by 9.5 m. Around the ROI, the lab is very cluttered with desks, cabinets, measurement equipment and other miscellaneous objects, resulting in an environment with a lot of scattering.
The testbed uses a time division duplexing frame structure, i.e. it switches between a downlink and an uplink phase. Since the channel is assumed to be reciprocal, we assume that the uplink and downlink CSI are the same. The testbed measures the channel by the use of uplink pilots. During a specific time slot, every 0.5 ms, each user sends a known uplink pilot. The base station receives these uplink pilots and estimates the channel by comparing the known pilot tones with the received pilot tones. The estimated channel is used to compute the combining and precoding vectors to enable MaMIMO communications. As a result, the CSI is already available for communication in the base station and can easily be used for localisation services without introducing extra overhead in the communication system. During the collection of the dataset, the CSI samples are saved to a hard disk at the base station.
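The comparison of known and received pilot tones can be sketched as a per-subcarrier least-squares estimate, in which the received tone is divided by the transmitted one. The estimator used inside the testbed is not specified in the text, so this is an assumption, as are the noise-free reception, the QPSK-like pilot sequence and the function name `estimate_channel`.

```python
import numpy as np

def estimate_channel(received, pilot):
    """Per-subcarrier least-squares estimate H = Y / X.

    received: (antennas, subcarriers) received pilot tones Y
    pilot:    (subcarriers,) known transmitted pilot tones X (non-zero)
    """
    return received / pilot[np.newaxis, :]

rng = np.random.default_rng(0)
n_antennas, n_subcarriers = 64, 100
true_channel = (rng.standard_normal((n_antennas, n_subcarriers))
                + 1j * rng.standard_normal((n_antennas, n_subcarriers)))
# Hypothetical unit-modulus (QPSK-like) pilot sequence.
pilot = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, n_subcarriers))
received = true_channel * pilot[np.newaxis, :]  # noise-free for clarity
estimate = estimate_channel(received, pilot)
```

In the noise-free case the estimate equals the true channel. With noise, the estimation error scales inversely with the pilot amplitude.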
For each antenna, the testbed can measure the channel at 100 subcarriers. Therefore, while performing measurements, the measured channel has the form $H \in \mathbb{C}^{64 \times 100}$. The measured data contains the I/Q sample of the channel for the corresponding antenna and subcarrier. The spacing between subcarriers is 180 kHz, resulting in an effective bandwidth of 18 MHz. As a centre frequency, 2.61 GHz is used (λ = 11.49 cm). The spacing between the centres of two adjacent antennas in the same subarray is 70 mm. The APs are attached to the BS by coax cables with a length of 8 m. This allows us to spread the APs over an indoor location. The antennas of the APs are placed at a height of 1 m above the floor.
Twelve users can be connected at the same time to the testbed. The user equipment is synchronised to the BS by the use of coax cables. This ensures a good time and frequency synchronisation between the devices. All devices are controlled using the NI MIMO application software running in LabVIEW.
Fig. 1. Overview of the indoor measurement campaign. The 64 BS antennas (green boxes) are distributed over eight APs, each configured as a ULA, and were spread around the region-of-interest. Each of the four users scans an orange rectangle by the use of a CNC XY-table. Every 5 mm, the XY-table stops and the channel is recorded for the static user. In this way a highly accurate dense dataset is achieved, spanning a grid with 252,004 measurement points.

1) Accurate positional labels
In order to overcome the challenge of having accurate positional labels for our measured datasets, we focused our efforts on developing a robotically controlled automated measurement set-up. The general idea of the proposed set-up is to use Computer Numerical Control (CNC) XY-tables to move the antennas of the users. These machines have a sub-mm accuracy when placing an object in the XY-plane. In this way, the labels of the users' locations are very accurate. For these experiments, four CNC XY-positioners (OpenBuilds ACRO1515) were placed in the lab, each moving the antenna of one user. The antennas were connected to the corresponding devices using coax cables with a length of 5 m.
The second challenge to overcome when recording the dataset is the number of samples we can record. As moving the antenna and taking a measurement takes some time, doing this manually would take too much time, resulting in a dataset spanning a small area or with a very low density. Since the positioners can be controlled by a computer, an automated measurement set-up was developed. The four positioners are connected to a central control PC, which is able to send commands and move the antennas to an arbitrary location within the reach of the positioners. The LabVIEW MIMO application framework running on the BS was extended to allow automated CSI measurements. This is done by sending TCP packets from the central PC to the BS. When a packet arrives, the BS takes one CSI snapshot for all connected users. The packet contains an identification number which is used to identify the samples of measured data. In this way, the correct positional label can be matched with the corresponding measured CSI sample. As a result, the testbed and XY-positioners can perform automated measurements, recording the CSI of users at defined positions with a high accuracy and without any human intervention.
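The control flow of such an automated survey, and the matching of labels to CSI samples through the identification number, can be sketched as follows. The actual control software is not described in detail, so everything here is hypothetical: `run_survey`, `match_labels` and the two callables are illustrative names, and the TCP packet to the BS is replaced by a plain function call.

```python
def run_survey(grid_points, move_to, trigger_snapshot):
    """Visit every grid point and log which sample ID belongs to which position."""
    position_log = {}
    for sample_id, point in enumerate(grid_points):
        move_to(point)               # positioner moves the user antenna
        trigger_snapshot(sample_id)  # stands in for the TCP packet sent to the BS
        position_log[sample_id] = point
    return position_log

def match_labels(position_log, csi_log):
    """Join positions and CSI snapshots on the shared identification number."""
    return {sample_id: (position_log[sample_id], csi)
            for sample_id, csi in csi_log.items()
            if sample_id in position_log}

# Stand-ins for the positioner and the base station.
csi_log = {}
visited = []

def fake_trigger(sample_id):
    csi_log[sample_id] = f"csi-{sample_id}"  # placeholder for a CSI snapshot

grid = [(0, 0), (5, 0), (10, 0)]  # positions in mm
positions = run_survey(grid, visited.append, fake_trigger)
dataset = match_labels(positions, csi_log)
```

Because the join uses only the identification number, the position log and the CSI recordings can be stored on different machines and merged after the measurement.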

C. DATASETS
For this study, two different datasets are employed. The first one is a large and dense dataset recorded in a static environment where the users have an unobstructed LoS connection to the APs. This dataset is used to train or calibrate the employed methods. Furthermore, it is used to test the accuracy of the different methods with respect to the number of training/calibration samples used. The second dataset is recorded while a person was moving in the environment. This dataset is called the nomadic environment dataset and is used to test the methods in a changing environment. The moving person alters the channel of the users by changing the multipath. Furthermore, the moving person can even cause a loss of LoS with some APs. The two datasets, presented below in more detail, are publicly available at IEEE Dataport¹.

1) Dense Channel Survey
The first dataset is a dense channel survey. By employing the CNC XY-tables, we recorded a very large dataset. We set the XY-tables to move over a grid with a spacing of 5 mm and to stop at each node of the grid for 0.5 s. While the XY-tables were stopped, one CSI sample was recorded by the APs for this position. The total grid spanned an area of 1.25 m by 1.25 m, resulting in 63,001 CSI samples per user. Since we employed four users, this dataset contains the channel for 252,004 different locations in the region-of-interest. The full scenario for this measurement is shown in Fig. 1.
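The sample counts follow directly from the grid geometry, as the short check below shows. It assumes that both edges of the 1.25 m span are included in the grid, which is the reading consistent with the reported numbers.

```python
import numpy as np

step_mm = 5
side_mm = 1250
# Grid nodes along one axis: 0, 5, ..., 1250 mm (both edges included).
axis = np.arange(0, side_mm + step_mm, step_mm)
nodes_per_axis = axis.size
points_per_user = nodes_per_axis ** 2
total_points = 4 * points_per_user  # four users, one XY-table each
```

This gives 251 nodes per axis, 63,001 positions per user and 252,004 positions in total.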
When using ML to develop a localisation model, the accuracy is only as good as the dataset it was trained on. Therefore, when the CSI changes, for example by movement in the area of the users, and this affected CSI is not represented in the dataset, the localisation accuracy will suffer. To check the performance of the proposed methods in such nomadic environments, a second experiment was designed.
During this experiment, the users were put in static positions in the middle of their respective XY-table. Next, we recorded the CSI for two minutes with a half-second interval between the CSI samples. While performing the measurements, a person was moving in the room, following predetermined paths. In total, seven different scenarios were performed. During the first scenario, nothing was moving in the room; this measurement can be used as a reference to see how stable the localisation is over time when the scenario remains the same. During the other six scenarios, a person was walking back and forth along one of the edges of the positioners. The six paths can be seen in Fig. 2, depicted by the large pink arrows.
Next, a person walked back and forth along a given trajectory for two minutes while the channel for the user was captured every 0.5 s. In total, six trajectories were recorded, shown by the red arrows on the figure. Also, a reference measurement was taken when nothing was moving in the environment.
The nomadic measurement was performed right after the recording of the dense channel survey. This was done to ensure that the environment of the measurement campaign stayed exactly the same. The purpose of this dataset is to see if the proposed methods, which will be trained on the static data of the dense channel survey, are still valid when the environment slightly changes.

III. LOCALISATION METHODS
This section outlines the different localisation methods proposed in this study. The methods are split into two categories. The first category, centralised localisation methods, comprises methods which process the full CSI at the CPU. The second category, distributed localisation methods, handles the localisation task by partially distributing the processing of the CSI to the APs.

A. CENTRALISED LOCALISATION METHODS
Centralised localisation methods gather the CSI from every AP at a CPU, which combines the CSI to locate the user. This leads to the full availability of the CSI and combines all the APs into one large array, thereby moving the user to the near-field (NF) of the array [19]. The distance at which the near-field of an array stops, and the far-field begins, is often assumed to be the Fraunhofer distance, $d_F = 2D^2/\lambda$, which depends on the size of the array $D$ and the wavelength $\lambda$ of the used carrier. When using only one AP, the ROI is located partially inside the NF region of that AP. However, when the APs are combined, the full ROI is located in the NF region of the array. Inside the NF region, the wavefront of the signal transmitted by the user has a spherical shape. Analysis of the curvature of this wavefront enables simultaneous ranging and AoA estimation. As a result, the location of the user can be estimated with high precision [17].
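A rough numerical feel for the near-field boundary can be obtained from the standard Fraunhofer expression $2D^2/\lambda$. The sketch below evaluates it for one AP of the testbed, assuming that the aperture $D$ equals the distance between the centres of the outermost antennas of the 8-element ULA. The 3 m aperture used for the combined array is purely hypothetical, since the size of the octagon is not stated in the text.

```python
C = 299_792_458.0  # speed of light in m/s

def fraunhofer_distance(aperture_m, carrier_hz):
    """Fraunhofer distance 2 D^2 / lambda for an array of largest dimension D."""
    wavelength = C / carrier_hz
    return 2.0 * aperture_m**2 / wavelength

carrier_hz = 2.61e9
wavelength_m = C / carrier_hz
# Assumption: the aperture of one AP is the distance between the centres of
# its outermost antennas, i.e. 7 gaps of 70 mm for an 8-element ULA.
single_ap_aperture_m = 7 * 0.070
d_single = fraunhofer_distance(single_ap_aperture_m, carrier_hz)
# Hypothetical 3 m aperture for the combined array (not given in the text).
d_combined = fraunhofer_distance(3.0, carrier_hz)
```

Under these assumptions a single AP has a Fraunhofer distance of roughly 4.2 m. Because the distance grows quadratically with the aperture, the hypothetical 3 m combined array has a Fraunhofer distance of more than 150 m, far beyond the dimensions of the lab.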
Since centralised localisation methods need all CSI at the CPU, they place a high throughput requirement on the fronthaul link. As cell-free MaMIMO networks scale, more APs will be deployed, resulting in more CSI that has to be sent to the CPU. As bigger antenna arrays lead to a higher localisation accuracy [10], being able to process the CSI from all these APs is beneficial. Therefore, the use of a centralised processing method may limit the scalability of the system.
Here we outline two centralised localisation methods. The first method is data-driven, the second is model-driven.

1) End-to-end machine learning
When using end-to-end (E2E) ML, a model learns all the steps between the input and the outputs of the model and all the different parts of the method are trained simultaneously. In recent years, E2E ML was proposed for position estimation based on the CSI of Massive MIMO systems. In a couple of years, the research field has made tremendous strides, going from validation work, based on simulations [13], to cm-level localisation accuracy on measured datasets [14] [10] [16].
The general idea for localisation with data-driven methods is to feed the CSI measured at the APs to a neural network (NN) running at the CPU. The NN estimates the position of the associated user based on the CSI. To learn the mapping from CSI to a location estimate, the neural network is trained on a CSI database, labelled with the corresponding positions. This is done during an off-line training phase. During the training phase, the NN learns the relevant features of the CSI to estimate the position of the user. The NN learns the direct relation between the CSI and the coordinates of the users.
E2E ML models that are currently used in the literature gather the full CSI at a central node and process it in one NN. Therefore, we implemented such a centralised ML model as a benchmark. In order to achieve this, all APs send their CSI $H_i$ over the fronthaul link to the CPU. The CPU constructs the full CSI matrix $H$ by joining the matrices $H_i \in \mathbb{C}^{8 \times 100}$ from each AP.
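The construction of the full CSI matrix at the CPU amounts to stacking the per-AP matrices along the antenna dimension. The sketch below uses random placeholder data instead of measured CSI. Splitting the complex matrix into a real and an imaginary channel for the NN input is an assumption made for illustration and is not necessarily the input format of the implemented model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_aps, n_antennas_per_ap, n_subcarriers = 8, 8, 100

# Placeholder per-AP CSI matrices H_i (random, not measured data).
csi_per_ap = [
    rng.standard_normal((n_antennas_per_ap, n_subcarriers))
    + 1j * rng.standard_normal((n_antennas_per_ap, n_subcarriers))
    for _ in range(n_aps)
]

# The CPU joins the H_i along the antenna axis to obtain the full matrix H.
full_csi = np.concatenate(csi_per_ap, axis=0)

# Assumed real-valued NN input: real and imaginary parts as two channels.
nn_input = np.stack([full_csi.real, full_csi.imag], axis=0)

# Complex values that cross the fronthaul per CSI snapshot and user.
values_on_fronthaul = n_aps * n_antennas_per_ap * n_subcarriers
```

Every snapshot thus moves 6400 complex values per user to the CPU, which illustrates why the centralised approach places a high load on the fronthaul link.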
For this study, an ML model based on the combination of a convolutional neural network (CNN) and a fully connected neural network (FCNN) was designed. The full implementation of this NN can be found in the code repository² accompanying this study. An overview of the architecture of this method can be seen in Fig. 3.

2) Calibrated Near-field MUSIC
Model-driven methods are based on how the channel is expected to behave and estimate the position of the users accordingly. This can be achieved by employing the well-known MUSIC algorithm [23]. MUSIC is used to estimate the angle-of-arrival of a signal. Furthermore, since the ROI is in the near field of the antenna array, near-field (NF) MUSIC [24] can be adopted to estimate the exact location of the users. For NF MUSIC, all information needs to be processed in a centralised way; otherwise, the ROI would not be fully in the near field of the array.
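For readers unfamiliar with MUSIC, the sketch below shows the conventional far-field variant for a single ULA. It is not the calibrated NF MUSIC method discussed in this section: it assumes a perfectly calibrated array, a single plane-wave source, synthetic data, and that the subcarriers can be treated as narrowband snapshots. The function names are illustrative. NF MUSIC follows the same subspace principle but replaces the plane-wave steering vector with a spherical-wave response evaluated over candidate positions.

```python
import numpy as np

def steering_vector(theta, n_antennas, spacing, wavelength):
    """Far-field ULA response for angle theta (rad), measured from broadside."""
    k = np.arange(n_antennas)
    return np.exp(-2j * np.pi * spacing * k * np.sin(theta) / wavelength)

def music_spectrum(snapshots, n_sources, spacing, wavelength, grid):
    """MUSIC pseudo-spectrum for snapshots of shape (antennas, snapshots)."""
    n_antennas, n_snapshots = snapshots.shape
    covariance = snapshots @ snapshots.conj().T / n_snapshots
    _, eigvecs = np.linalg.eigh(covariance)  # eigenvalues in ascending order
    noise_space = eigvecs[:, : n_antennas - n_sources]
    steering = np.stack(
        [steering_vector(t, n_antennas, spacing, wavelength) for t in grid],
        axis=1)
    # Steering vectors of true sources are (nearly) orthogonal to the noise
    # subspace, so the inverse of the projection peaks at the source angles.
    projection = np.sum(np.abs(noise_space.conj().T @ steering) ** 2, axis=0)
    return 1.0 / (projection + 1e-12)

# Synthetic single-source example with the array geometry of one AP.
rng = np.random.default_rng(1)
n_antennas, n_snapshots = 8, 100
spacing, wavelength = 0.070, 0.1149
true_aoa = np.deg2rad(20.0)
source = rng.standard_normal(n_snapshots) + 1j * rng.standard_normal(n_snapshots)
noise = 0.05 * (rng.standard_normal((n_antennas, n_snapshots))
                + 1j * rng.standard_normal((n_antennas, n_snapshots)))
snapshots = np.outer(
    steering_vector(true_aoa, n_antennas, spacing, wavelength), source) + noise

grid = np.deg2rad(np.linspace(-90.0, 90.0, 1801))
spectrum = music_spectrum(snapshots, 1, spacing, wavelength, grid)
estimated_aoa = grid[np.argmax(spectrum)]
```

The estimate relies entirely on the assumed steering vector, so any unmodelled phase or gain offset between antennas distorts the pseudo-spectrum.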
Model-driven methods extract the location information from the theoretical knowledge of the signal and, therefore, expect a signal from a perfectly calibrated antenna array. In this case, the antenna array is not calibrated and the calibration parameters are not known. As a result, without extra processing to calibrate the data, NF MUSIC cannot be used.
Traditionally, to calibrate an antenna array and estimate its calibration parameters, the array is placed in an anechoic chamber to measure the response of the array. However, when the array is moved, the calibration can change due to small mechanical differences induced by the move. As a result, the calibration accuracy achieved is limited in practice. To combat this, [7] proposes a method to calibrate NF MUSIC using data measured during the dense channel survey presented in Section II.
The calibration parameters of an array are usually measured in an anechoic chamber because there is no multipath and the response can be measured perfectly. The main novelty of the method proposed in [7] is to use on-site measured data in which a great deal of multipath is present. When enough measured samples are used and a mean-squared-error criterion is applied, the multipath components cancel each other out and the calibration parameters are correctly estimated. This calibration is performed during an off-line step, before the system is used. Afterwards, NF MUSIC can estimate the position of the user very accurately.

B. DISTRIBUTED LOCALISATION METHODS
Distributed localisation methods process the CSI at the APs before sending the extracted information to the CPU. Which processing is applied and which information is extracted depends on the used method. Next, the CPU estimates the exact location of the users based on the information received from the distributed APs. The goal is to compress the CSI by extracting relevant information that has a smaller dimension than the full CSI. In this way, less data has to be sent over the fronthaul link, lowering the throughput requirement of this link.
We outline two different distributed localisation methods. The first method employs a fully data-driven approach. The second method combines data- and model-driven methods.

1) ML with Distributed AoA learning
Instead of sending the full CSI matrix from the AP to the CPU, first, the AoA is estimated at the AP. In this way, only the AoA for every AP has to be sent to the CPU, lowering the fronthaul requirement. Once the CPU receives all the AoA estimates, the location of the user can be estimated based on this information.
As shown in [8], the AoA of a user can be estimated by ML using the CSI of the user. To accomplish this, an NN is deployed at the AP to estimate the AoA of the user's signal. This NN consists of a CNN and an FCNN. It is trained using $H_i \in \mathbb{C}^{M \times N}$, where $i$ is the index of the AP, $M$ is the number of antennas at the AP and $N$ is the number of subcarriers in the CSI. The AoA $\phi_i$ is used as the label during the off-line training phase. $\phi_i$ is calculated based on the position and orientation of the $i$-th AP and the recorded location of the user.
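The computation of the AoA label from the geometry can be sketched as follows. The angle convention is an assumption: the orientation of an AP is taken as the global direction of its broadside, and the label is the angle of the user relative to that direction, wrapped to [-π, π). The function name `aoa_label` and the example coordinates are hypothetical.

```python
import numpy as np

def aoa_label(ap_position, ap_orientation_rad, user_position):
    """AoA of the user as seen from an AP, relative to the AP broadside."""
    dx, dy = (np.asarray(user_position, dtype=float)
              - np.asarray(ap_position, dtype=float))
    angle = np.arctan2(dy, dx) - ap_orientation_rad
    # Wrap the result to the interval [-pi, pi).
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

# Hypothetical examples: an AP at the origin facing along +x and along +y.
label_facing_x = aoa_label((0.0, 0.0), 0.0, (1.0, 1.0))
label_facing_y = aoa_label((0.0, 0.0), np.pi / 2, (0.0, 2.0))
```

Since the user positions are known from the XY-table labels, these AoA labels can be generated for every training sample without additional measurements.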
Once the APs estimate the AoA $\phi_i$, the estimate is transmitted over the fronthaul link to the CPU. At the CPU, the $\phi_i$ are gathered and combined into the vector $\boldsymbol{\phi}$. Next, a simple FCNN is employed to estimate the location based on $\boldsymbol{\phi}$. The full compute graph for this method is shown in Fig. 4.
Fig. 4. To distribute the localisation task, the APs estimate the AoA of the user's signal in relation to the corresponding AP. Each AP uses its partial CSI $H_i$ to estimate the AoA $\phi_i$ using an NN. Next, all AoA estimates are gathered at the CPU to estimate the exact location of the user using an FCNN.

2) ML-MUSIC
To lower the number of data samples needed to train the NNs, model-based expert knowledge can be included. To do so, we use the MUSIC algorithm. However, this algorithm only works well when the antenna array is well calibrated, and the array used for recording the channels is not calibrated.

a: CSI calibration
To use the MUSIC algorithm, a calibration step can be applied to the CSI. As mentioned before, this calibration step can be performed by analytical algorithms [7] [11]. However, these methods are far from trivial. Therefore, we saw an opportunity to employ neural networks to perform the calibration step. The CSI matrix $H_i \in \mathbb{C}^{M \times N}$, with $M$ the number of antennas at the AP and $N$ the number of subcarriers, is first passed through an autoencoder. An autoencoder is an NN that preserves the shape of the data. The autoencoder is trained to apply the calibration to $H_i$. This results in the calibrated CSI, $\tilde{H}_i$. Afterwards, $\tilde{H}_i$ is passed through the MUSIC algorithm to obtain the AoA $\phi_i$. Once $\phi_i$ is estimated for every AP, the estimates are combined into the vector $\boldsymbol{\phi}$. This vector is passed to another FCNN to estimate the exact location. Fig. 5 shows an overview of the ML-MUSIC compute graph.

Fig. 5. Instead of relying only on data to train an NN and estimate the AoA, we can use some expert knowledge. Therefore, the MUSIC algorithm was implemented inside the NN framework. MUSIC is very effective in extracting the AoA from signals if the array is calibrated. Instead of calculating the calibration matrix by traditional methods, an NN was placed in front of the MUSIC algorithm. This makes sure that a calibration step is performed on the CSI.
To calibrate the CSI, an NN was designed. For this application, an FCNN was used. When building an autoencoder, it is common to start from the simplest network and gradually increase its depth and complexity. However, we noticed that the proposed setup already worked very well with an FCNN that has only one hidden layer. Therefore, for simplicity and to minimise the computational cost, this NN was kept.
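A shape-preserving network with one hidden layer can be sketched as below. This is only an illustration of the forward pass with untrained weights. The hidden width, the tanh activation and the stacking of real and imaginary parts into one real vector are assumptions made for the example, and the names `init_calibration_net` and `calibrate` are hypothetical.

```python
import numpy as np

def init_calibration_net(M, N, hidden, rng):
    """Random (untrained) weights for a one-hidden-layer, shape-preserving FCNN."""
    d = 2 * M * N  # real and imaginary parts stacked into one real vector
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d), (hidden, d)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (d, hidden)),
        "b2": np.zeros(d),
    }

def calibrate(H, params):
    """Map a complex M x N CSI matrix to a complex matrix of the same shape."""
    M, N = H.shape
    x = np.concatenate([H.real.ravel(), H.imag.ravel()])
    h = np.tanh(params["W1"] @ x + params["b1"])  # assumed activation
    y = params["W2"] @ h + params["b2"]
    half = M * N
    return (y[:half] + 1j * y[half:]).reshape(M, N)
```

In the method described here, the weights would be learned by backpropagating the localisation loss through the MUSIC block rather than drawn at random.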

b: Applying MUSIC
After calibration by the NN, MUSIC is applied to $\tilde{H}_i$. For the NN to optimise its weights using backpropagation, the MUSIC algorithm needs to be implemented in the ML framework. In this way, the gradients can propagate through the algorithm and the weights can be optimised using labelled CSI samples. Furthermore, we propose to modify the MUSIC pseudo-spectrum and approximate it using a normalised range in order to map the ML labels to the pseudo-spectrum. Moreover, since ML frameworks are optimised to run on graphical processing units (GPUs) and perform matrix operations, we implemented the algorithm with matrix operations instead of code loops wherever possible.
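We detail the implementation here. First, the covariance matrix of the calibrated CSI is calculated:

$$R_i = \tilde{H}_i \tilde{H}_i^H.$$

Next, the eigendecomposition of $R_i$ is computed:

$$R_i = U \Gamma U^H,$$

where the columns of $U$ are the eigenvectors. The eigenvalues are located on the diagonal of $\Gamma$ in ascending order. The largest eigenvalues are attributed to the signal and the small ones to the noise. To extract the noise subspace, the noise eigenvectors are selected as

$$U_n = [u_1, u_2, \dots, u_{M-K}],$$

where $u_m$ is the $m$-th column of $U$ and $K$ is the number of incident signals. We assume that there is only one dominant signal in $\tilde{H}_i$; therefore, $K$ is kept low. In this study, we assume $K = 2$. The eigenvectors in $U_n$ are normalised.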
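The noise-subspace extraction can be illustrated with NumPy, whose `numpy.linalg.eigh` returns the eigenvalues in ascending order together with orthonormal eigenvectors. The sketch below uses an unscaled covariance, which is an assumption; any positive scaling leaves the eigenvectors unchanged. The function name `noise_subspace` is hypothetical.

```python
import numpy as np

def noise_subspace(H, K):
    """Eigenvectors belonging to the M - K smallest eigenvalues of the covariance of H."""
    M = H.shape[0]
    R = H @ H.conj().T       # M x M covariance (scaling does not affect eigenvectors)
    _, U = np.linalg.eigh(R) # ascending eigenvalues, orthonormal eigenvector columns
    return U[:, : M - K]     # the first M - K columns span the noise subspace
```

With $M = 8$ antennas per AP and $K = 2$, the returned matrix has eight rows and six columns.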
Next, the steering vector $a(\theta)$ is constructed as:

$$a(\theta) = \left[1, e^{j\omega\theta}, e^{j2\omega\theta}, \dots, e^{j(M-1)\omega\theta}\right]^T,$$

where $\omega = \frac{2\pi d}{\lambda}$, $d$ is the spacing between the antennas and $\lambda$ is the wavelength of the carrier wave.
Using the steering vector, the MUSIC pseudo-spectrum can be evaluated for any arbitrary $\theta$. The angle $\theta$ for which the pseudo-spectrum has the highest response is the estimated AoA $\phi_i$. However, in practice, the number of angles along which we can search for the highest response is limited. For this study, $\theta$ ranges from 0 to 180 degrees with a 1 degree resolution, resulting in a vector with a length of 181. The vector containing these discrete angles is written as $b = \left[0, \frac{1}{180}\pi, \frac{2}{180}\pi, \dots, \pi\right]$.
Afterwards, the pseudo-spectrum of MUSIC is calculated. This spectrum is also the output of the ML model at the APs. However, the traditional MUSIC algorithm does not have a pseudo-spectrum that is normalised to a specific range. Furthermore, supervised learning assumes that the exact label, i.e. the value of the pseudo-spectrum, is known in order to teach the NN this value. Since the exact value is not known for the MUSIC algorithm, we propose an alteration to the MUSIC algorithm to normalise the pseudo-spectrum. As a result, at the AoA, the pseudo-spectrum tends towards 1, and at the other angles it tends towards 0. In this way, we can label the training data by setting all angles to 0 except for the known AoA, which is set to 1, i.e., we one-hot encode the labels.
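The one-hot labelling over the 181-point angle grid can be sketched as follows. Rounding the known AoA to the nearest grid angle is an assumption made for this example, and the function name `one_hot_aoa_label` is hypothetical.

```python
import math

def one_hot_aoa_label(aoa_rad, n_angles=181):
    """Target pseudo-spectrum: 1 at the grid angle nearest to the known AoA, 0 elsewhere."""
    step = math.pi / (n_angles - 1)  # 1 degree for 181 grid points
    idx = min(n_angles - 1, max(0, round(aoa_rad / step)))
    label = [0.0] * n_angles
    label[idx] = 1.0
    return label
```

A known AoA of 90 degrees, for instance, yields a vector of 181 zeros with a single 1 at index 90.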
To explain the modification, we start from the unmodified pseudo-spectrum. The traditional MUSIC pseudo-spectrum is given by:

$$f(\theta) = \frac{1}{a^H(\theta)\, U_n U_n^H\, a(\theta)}.$$

When $\theta$ is not the AoA of the signal $\phi_i$, the denominator of $f(\theta)$ is large and can reach a value of at most $M$. When $\theta \approx \phi_i$, the denominator tends towards 0; hence, the pseudo-spectrum reaches a high response with an unknown maximum value. The range becomes well defined when the equation is inverted:

$$0 \le \frac{1}{f(\theta)} = a^H(\theta)\, U_n U_n^H\, a(\theta) \le M.$$

To normalise the range, a division by $M$ is performed. As a last step, the response is flipped by subtracting it from 1. The new pseudo-spectrum $\tilde{f}$ is calculated for every $\theta$ in $b$ as:

$$\tilde{f}(\theta) = 1 - \frac{a^H(\theta)\, U_n U_n^H\, a(\theta)}{M}.$$

As ML frameworks are optimised to run on GPUs, which prefer matrix operations, the full pseudo-spectrum can be calculated for all $\theta$ at once using matrix operations. First, all steering vectors are combined into the matrix $A \in \mathbb{C}^{181 \times M}$, constructed as:

$$A = \left[a(b_1), a(b_2), \dots, a(b_{181})\right]^H,$$

where $b_k$ is the $k$-th element of $b$. The new pseudo-spectrum is then calculated as:

$$\tilde{\mathbf{f}} = \mathbf{1} - \frac{1}{M} \operatorname{diag}\left(A\, U_n U_n^H A^H\right),$$

where $\mathbf{1}$ is the all-ones vector and the diag() function selects the elements on the diagonal of the square input matrix and transforms them into a vector. The resulting pseudo-spectrum contains the response for every angle $\theta$ in the vector $b$. The angle $\theta$ with the highest response is selected as $\phi_i$.
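The normalised pseudo-spectrum, one minus the noise-subspace projection energy divided by $M$, can be sketched in NumPy as below. The steering vector follows the exponent as written in the text; the antenna spacing `d_over_lambda=0.25`, the noise-free synthetic signal and the choice $K = 1$ in the demo are illustrative assumptions, and the function names are hypothetical. Only the diagonal of the 181 by 181 product is needed, so the sketch sums squared magnitudes per row instead of forming the full matrix.

```python
import numpy as np

def steering_matrix(n_antennas, angles, d_over_lambda):
    """Row k holds a(theta_k)^H, with a(theta) = [1, e^{j w theta}, ...]^T as in the text."""
    omega = 2.0 * np.pi * d_over_lambda
    m = np.arange(n_antennas)
    return np.exp(-1j * omega * np.outer(angles, m))

def normalised_music_spectrum(U_n, A):
    """1 - (a^H U_n U_n^H a) / M for every row of A, without forming the full product."""
    M = A.shape[1]
    projected = A @ U_n                              # (n_angles, M - K)
    energy = np.sum(np.abs(projected) ** 2, axis=1)  # diagonal of A U_n U_n^H A^H
    return 1.0 - energy / M

# Synthetic, noise-free demo: a single wave arriving from 60 degrees (so K = 1 here).
M, N = 8, 100
b = np.linspace(0.0, np.pi, 181)               # 0 to 180 degrees in 1 degree steps
A = steering_matrix(M, b, d_over_lambda=0.25)  # illustrative spacing, not from the testbed
rng = np.random.default_rng(0)
symbols = rng.normal(size=N) + 1j * rng.normal(size=N)
H = np.outer(A[60].conj(), symbols)            # a(theta) s^T
_, U = np.linalg.eigh(H @ H.conj().T)          # eigenvalues in ascending order
f = normalised_music_spectrum(U[:, : M - 1], A)
aoa_estimate = b[np.argmax(f)]
```

In this idealised case, the response lies between 0 and 1 and peaks at the grid angle of the simulated wave, which is the behaviour the one-hot labels rely on.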
For the next step, the CPU gathers all AoA estimates $\phi_i$ into the vector $\boldsymbol{\phi}$. Afterwards, another FCNN uses $\boldsymbol{\phi}$ to estimate the position. This FCNN consists of 5 hidden layers. To train this network, a two-step method is proposed. Since the task that the network should learn is very well defined, we can first train the NN on simulated model-based data. The major advantage is that we can simulate an unlimited amount of model-based data, which gives the network a good starting point. This is particularly useful when only a small number of training samples is available.
To create the model-based data, we use a geometry-based model of the environment. We sample random positions from the possible user positions and calculate $\boldsymbol{\phi}$ based on the location and orientation of the APs. Next, we add random errors to $\boldsymbol{\phi}$ to emulate errors and blockages in the system. The number of angles $\phi_i$ that is affected by noise is decided by drawing a value from an exponential distribution with rate parameter $\lambda = 1$ and rounding it to the nearest integer. Which of the AoAs $\phi_i$ are affected is decided randomly, with a uniform chance for each. To the selected angles we add a random angle, drawn from a normal distribution with a mean of 0 rad and a standard deviation of $\frac{\pi}{4}$ rad. In this way, the FCNN learns how to deal with noisy data that can contain erroneous $\phi_i$ estimates.
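The corruption step for one simulated sample can be sketched with the Python standard library. The input is the ideal AoA vector for a sampled position. Capping the number of corrupted angles at the number of APs is an assumption added so that the sampling stays valid, and the function name `corrupt_aoas` is hypothetical.

```python
import math
import random

def corrupt_aoas(phi, rng, rate=1.0, sigma=math.pi / 4):
    """Add Gaussian errors to a randomly chosen subset of the ideal AoAs."""
    # Number of affected angles: exponential draw (rate 1) rounded to the nearest integer.
    n_bad = min(len(phi), round(rng.expovariate(rate)))  # cap is an assumption
    bad = rng.sample(range(len(phi)), n_bad)             # uniform choice of affected APs
    noisy = list(phi)
    for i in bad:
        noisy[i] += rng.gauss(0.0, sigma)                # zero mean, std of pi/4 rad
    return noisy, sorted(bad)
```

Calling `corrupt_aoas(phi, random.Random(0))` repeatedly on freshly sampled positions would generate as many synthetic training pairs as required for the first training step.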
In the second training step, we use the actual measured data, presented in Section II, to train the NN. This will teach the NN about the scenario specific imperfections and deviations from the theoretical deployment, further improving the accuracy of the model.

IV. PERFORMANCE ANALYSIS
In this section, the performance of the different proposed localisation methods is analysed. First, we study the localisation performance in a static environment. Second, we analyse the performance in a nomadic environment, in which a person moves through the environment and interacts with the propagation of the signals sent by the users. Afterwards, we analyse the computational complexity of the different methods and how it scales with a rising number of APs in the communication system. Finally, we calculate the fronthaul throughput required by the different methods.

A. PERFORMANCE IN A STATIC ENVIRONMENT
To analyse the performance of the four proposed methods, they are trained/calibrated on several datasets, each with a different size. This was done to see which method is the most data efficient, as gathering labelled data is very time consuming and, therefore, expensive. In total, five training sets were created with sizes of 100, 500, 1000, 10,000 and 50,000 samples. These samples were randomly selected from the 252,004 labelled samples in the dense CSI dataset. In addition, a test set of 20,000 randomly sampled CSI samples was selected from the same dataset, making sure that there is no overlap between the test set and the training sets. For each of the CSI samples in the test set, the position was estimated by the four different methods after being trained on the different training sets. We see a strong trend: the more data is used, the better the performance becomes. Furthermore, model-driven methods need less data for training or calibration. However, when ample data is provided, the data-driven methods match and even surpass the performance of the model-driven methods.
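A split with these sizes and no overlap between test and training data can be sketched as follows. The text does not state whether the training sets are nested or drawn independently; the sketch draws each one independently from the samples left after removing the test set, which is an assumption. The function name `make_splits` is hypothetical.

```python
import random

def make_splits(n_total, train_sizes, n_test, seed=0):
    """Return ({size: train indices}, test indices) with no train/test overlap."""
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    test = idx[:n_test]   # held-out test indices
    pool = idx[n_test:]   # every training set is drawn from the remainder
    train = {size: rng.sample(pool, size) for size in train_sizes}
    return train, test

train_sets, test_set = make_splits(252_004, [100, 500, 1000, 10_000, 50_000], 20_000)
```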
The median localisation error is shown in Fig. 6. The median localisation error is computed by estimating the location for all samples in the test set and computing the error of these estimations. From these error values, the median was computed and reported. The first trend to notice is that all methods become better when more training samples are available. However, model-based methods require less data to achieve a good localisation performance. In general, when a low amount of samples is available in the training set, the two methods that (partially) rely on MUSIC perform best. This is due to the expert knowledge present in these methods. The algorithm did not have to learn from scratch how it can efficiently position users, as is the case with the methods that solely rely on ML. The pure data-driven methods only reach a similar performance when using 10,000 or more samples in the training set. However, when an even larger amount of data is available, the data-driven methods can perform better than the model-driven ones.
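The reported metric can be computed as in the short sketch below, which takes estimated and recorded positions as coordinate tuples in the same unit (for example millimetres). The function name `median_localisation_error` is hypothetical.

```python
import math
import statistics

def median_localisation_error(estimates, ground_truth):
    """Median Euclidean distance between estimated and true positions."""
    errors = [math.dist(e, g) for e, g in zip(estimates, ground_truth)]
    return statistics.median(errors)
```

The median is less sensitive than the mean to a few large outliers, such as occasional estimates that land far from the true position.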
The figure shows that the calibrated NF MUSIC method performs best and already reaches a very high accuracy when using only 100 samples in the training set. The second-best method is the ML-MUSIC algorithm, which reaches an adequate localisation performance with only 500 training samples. Both of these methods reach a median error below 50 mm while using only 500 training samples, which is 20 times fewer samples than required by the purely data-driven method. Therefore, these two methods have our preference from the viewpoint of sample efficiency. This shows the strength of model-driven methods. When only a small amount of training data is provided, data-driven methods cannot reach the same level of performance as model-driven methods.

B. PERFORMANCE IN A NOMADIC ENVIRONMENT
Next, these trained models were tested on the nomadic dataset presented in Section II. All CSI samples in the nomadic dataset were given to the four different methods, i.e. the full nomadic dataset was used as the test set. The training sets are the same as before, randomly sampled from the dataset of the static environment. The results can be seen in Fig. 7. First of all, the conclusions for the static environment still hold for the nomadic environment: as the number of training samples goes up, the accuracy of the methods increases. Second, the accuracy on this test set is lower than on the test set of the static environment, as would be expected, since the CSI is impacted by the movement and the methods are not trained/calibrated for this.
Surprisingly, the calibrated NF MUSIC method was not impacted by the movements. It was the only method that was still able to estimate the user's position with a very high accuracy, owing to the centralised combining and processing of the data. Moreover, its error never exceeds 50 mm, which is impressive in comparison to the other methods. The second-best method is ML-MUSIC, which is only slightly impacted by the nomadic environment. Again, both of these methods reached a median error below 50 mm while using only 500 training samples. Furthermore, the impact on the purely data-driven methods is higher than the impact on the methods that use model-based information.
These experiments showed that model-based approaches are still a very powerful tool. In addition, they show that centralised and distributed methods can both reach a high localisation performance. Hence, we do not have to send the full CSI to the CPU to reliably localise users in distributed MaMIMO systems.

C. COMPUTATIONAL COMPLEXITY
To compare the different proposed methods in more depth, we assess the computational complexity. For a localisation system to function properly, this metric should be as low as possible; otherwise, it is not possible to estimate the location in a timely manner. This is important for future applications such as autonomous vehicles and UAVs, which require low-latency updates of their location.
Since execution time is the most important aspect of computational complexity when localising users, we focus on the number of operations that have to be performed to localise a user. More specifically, since multiplications are the most time-consuming operation, we count the number of multiplications that have to be performed in order to localise a user.
Furthermore, we also assess the scalability of the methods. The scalability of the system can be assessed in two ways: i) how the computational complexity scales with a growing number of APs and ii) how the complexity scales with a growing number of users to be located. Since all methods estimate the location of the users one by one, the computational complexity of the methods scales linearly with the number of users. Therefore, to compare the scalability of the different methods, the number of APs used for localisation is used as the main variable. We assume the network consists of a varying number of APs, each AP consisting of an array of eight antennas.

1) Central E2E Learning
For the NNs, we can compute the computational complexity in the following way. For an FCNN, for every layer, the size of the input vector is multiplied by the number of hidden nodes in the layer. Afterwards, we sum these values over all the layers. For a CNN, the calculation becomes a bit more complex. For each layer, the computational complexity $C$ can be calculated as

$$C = h_{in} \cdot w_{in} \cdot c_{in} \cdot k_v \cdot k_h \cdot c_{out},$$

with $h_{in}$ the height, $w_{in}$ the width and $c_{in}$ the number of channels of the input of the layer. $k_v$ and $k_h$ are the vertical and horizontal kernel sizes and $c_{out}$ is the number of output channels.
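Both counting rules can be written as small helper functions. The convolutional count below is the product of the six quantities named above, which assumes that the kernel is evaluated once per input position (stride 1 with padding that preserves the input size); this assumption and the function names are ours.

```python
def fcnn_multiplications(input_size, layer_sizes):
    """Sum over layers of (input size of the layer) x (number of nodes in the layer)."""
    total, n_in = 0, input_size
    for n_out in layer_sizes:
        total += n_in * n_out
        n_in = n_out  # the output of this layer feeds the next one
    return total

def conv_layer_multiplications(h_in, w_in, c_in, k_v, k_h, c_out):
    """Multiplications of one convolutional layer, one kernel evaluation per input position."""
    return h_in * w_in * c_in * k_v * k_h * c_out
```

Because `h_in` enters the product linearly, doubling the number of antennas stacked along the input height doubles the count of each convolutional layer, which matches the linear scaling discussed for the learning-based methods.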
For the Central E2E Learning method, we compute the computational complexity by applying the above formulas to both the CNN and the FCNN. In this case, $h_{in}$ depends on the number of antennas used and therefore scales with the number of APs. The FCNN that estimates the position also depends on the number of APs; however, only its first layer increases in size.
As a result, the computational complexity of Central E2E Learning increases linearly with the number of APs.

2) Central NF MUSIC
When computing the complexity of the centralised calibrated NF MUSIC method, we have to look at all the algebraic operations that are performed. First, the covariance of the CSI is calculated. Its complexity corresponds to that of a matrix multiplication, resulting in a complexity of $O(M^2 N)$. Next, the eigenvalues have to be calculated, which has a complexity of $O(M^3)$. All following computations have to be applied to a steering vector. For NF MUSIC, there is a steering vector for each possible user location. We denote the number of steering vectors by $n$. When calibrating the steering vectors, $O(MN)$ multiplications have to be performed. Afterwards, these steering vectors have to be multiplied with the noise space of dimension $M - 1$, computed using the eigenvalue decomposition. This results in a complexity of $O(M(M - 1))$.
As a result, the computational complexity of the centralised calibrated NF MUSIC can be calculated as:

$$O\left(M^2 N + M^3 + n\left(MN + M(M - 1)\right)\right), \quad (1)$$

with $M$ the number of antennas, $N$ the number of subcarriers and $n$ the number of positions that have to be evaluated. In [7], this was set to all possible user locations, i.e. $n = 252{,}004$. The computational complexity is strongly influenced by the number of antennas in the antenna array and the number of locations to be evaluated. When the number of antennas is increased, the computational complexity of this method rises cubically, strongly limiting the scalability of the method.
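Summing the individual terms listed for this method gives an order-of-magnitude count, sketched below. The function name `nf_music_multiplications` is hypothetical, constant factors are ignored, and the result should be read as a scaling indicator rather than an exact operation count.

```python
def nf_music_multiplications(M, N, n):
    """Order-of-magnitude count: covariance + eigendecomposition + n steering-vector evaluations."""
    covariance = M * M * N               # O(M^2 N)
    eigendecomposition = M ** 3          # O(M^3)
    per_position = M * N + M * (M - 1)   # calibration + projection on the noise space
    return covariance + eigendecomposition + n * per_position

# Eight APs with eight antennas each, 100 subcarriers, all 252,004 candidate positions.
count_64_antennas = nf_music_multiplications(M=64, N=100, n=252_004)
```

With the number of evaluated positions held fixed, the term in $n$ dominates for the studied system, while the $M^3$ term determines how the count grows once many more antennas are added.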

3) Distributed AoA Learning
For the distributed AoA learning method, the CNN architecture is the same as the one for the central E2E learning, however, it is distributed per AP. Therefore, the complexity of the CNN at each AP is the same as the complexity of the central E2E for one AP. This complexity is multiplied by the number of APs, therefore, scaling linearly. Also the FCNNs of the distributed AoA learning method and the central E2E learning method are mostly the same. As a result, they have the same complexity and scale in the same way. However, in this case, the computations are distributed over the different APs and the CPU in the network, lowering the per device requirement of computation power.

4) Distributed ML-MUSIC
For the distributed ML-MUSIC method, a formula comparable to equation (1) holds for the complexity of the MUSIC block in the compute graph. Here, however, everything has to be multiplied by $N_{AP}$, the number of APs, while the number of antennas per AP, $M_{AP}$, is lower and fixed. Since $M_{AP}$ is fixed and only $N_{AP}$ rises when increasing the number of APs, the complexity of this method rises linearly. Furthermore, in this case, $n$ is the number of angles we search along the array; in this study, $n = 181$. This is much lower than in NF MUSIC, where $n = 252{,}004$. To complete the calculation, the complexity of the calibration NNs and of the NN that estimates the location has to be added. For the calibration, a one-layer-deep FCNN is used per AP; therefore, its complexity rises linearly with the number of APs. The other FCNN, which combines the multiple AoAs into a 2D position, also depends on the number of APs, but only the size of its input changes, increasing the complexity linearly. As a result, the total complexity of the ML-MUSIC method increases linearly with the number of APs.

5) Comparison of the Computational Complexity
The computational complexity of Central NF MUSIC rises faster than that of the other methods: cubically rather than linearly. As a result, calibrated NF MUSIC becomes an impractical method for future cell-free Massive MIMO networks. Note that the number of evaluated positions, $n$ in equation (1), is kept constant in Fig. 8. When the cell-free network grows and the number of APs rises, the number of possible locations also rises, increasing the complexity even further.
Note that the value shown in Fig. 8 is the total number of computations that have to be performed. For the distributed methods, these computations are further divided over the distributed APs and the CPU, resulting in an even lower number of computations per device. The centralised methods, in contrast, need all the computations to be performed at the CPU, putting a high requirement on a single device.

D. FRONTHAUL REQUIREMENT
For the localisation methods to work, information has to be shared by the different APs with a central node. For the two centralised methods, all CSI has to be transmitted to the CPU. For the distributed methods, only a small amount of information has to be shared. In this part of the study, we calculate the required information throughput from the APs to the CPU for the proposed methods.
We base this analysis on the KU Leuven MaMIMO testbed presented in Section II. The presented MaMIMO system has 64 antennas and measures the channel of each user for 100 subcarriers. Each measured channel coefficient is a complex number, with 16 bits each for the real and the imaginary part. Therefore, one CSI sample for one user takes 64 × 100 × 4 bytes = 25.6 kB. In the testbed, which is based on the LTE framework, the channel is measured 2000 times per second for up to 12 users.
For the distributed methods, only the AoA is sent to the CPU. Since the AoA is discretised at 1 degree per step and we consider a range from 0 to 180 degrees, 181 different values have to be encoded. To do this, we need 8 bits or 1 byte. Therefore, for the studied system, 8 bytes (one per AP) have to be sent to the CPU per user per CSI update. As a result, the required fronthaul throughput for the central methods is 4.915 Gbps, while for the distributed methods it is reduced to 1.536 Mbps. This is a 1200-fold reduction of the required throughput. The reduced fronthaul requirements of distributed localisation methods will enable a practical deployment of localisation services in cell-free MaMIMO systems.
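The two throughput figures follow from the testbed parameters and can be reproduced with a few lines of arithmetic. The sketch assumes eight APs, inferred from 64 antennas at eight antennas per AP, and the helper name `throughput_bps` is hypothetical.

```python
import math

def throughput_bps(bytes_per_update, updates_per_second, n_users):
    """Fronthaul throughput in bits per second."""
    return bytes_per_update * 8 * updates_per_second * n_users

# Centralised: full CSI, 64 antennas x 100 subcarriers x (16-bit real + 16-bit imaginary).
csi_bytes = 64 * 100 * 2 * 16 // 8         # bytes per CSI sample
# Distributed: one AoA per AP, 181 possible values each.
aoa_bits = math.ceil(math.log2(181))       # bits needed to encode one AoA
n_aps = 8                                  # assumption: 64 antennas / 8 antennas per AP
aoa_bytes = n_aps * aoa_bits // 8          # bytes per user per CSI update

central_bps = throughput_bps(csi_bytes, 2000, 12)
distributed_bps = throughput_bps(aoa_bytes, 2000, 12)
```

Dividing `central_bps` by 1e9 and `distributed_bps` by 1e6 gives the values in Gbps and Mbps, respectively.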

V. PRIVACY-CENTRIC LOCALISATION
Concerns about user privacy have been rising in recent times, as more people become aware that a large part of their digital life can be tracked. This has even given rise to multiple conspiracy theories claiming that 5G will be used to monitor and control people. Therefore, it is important to address these concerns and think about how to keep the location of the users safe. To do this, one can employ the principles of "privacy by design" [25].
When the information is processed in one central place, e.g. a server connected to the BS, the mobile internet service provider has full knowledge of the exact location of the user. This data might be vulnerable to malicious attacks by hackers who want to steal it. Furthermore, exact localisation data could be used for surveillance by governments that do not respect civil rights.
As we have shown, distributed localisation methods achieve a high accuracy while reducing the computational complexity and fronthaul requirement. This allows us to move the CPU, which combines the information of the separate APs, to the user equipment itself. In this case, the full information is only available at the device of the user, which keeps its exact location to itself, increasing the privacy of the user. In conclusion, we identify privacy-centric localisation as an interesting topic for future research.

VI. CONCLUSION
In this study, we compared four user localisation methods based on a distributed MaMIMO network. First, we compared methods that process all data centrally with methods that offload part of the computation to the distributed APs. Second, we compared model-driven and data-driven methods. The localisation performance of the different methods was compared using measured data from two scenarios. In addition, the computational complexity and fronthaul requirement were computed.
By employing distributed localisation methods, the fronthaul requirement was reduced 1200-fold in comparison to the centrally processed methods, making the technology scalable for future networks with tens of APs. In addition, the computational complexity of the distributed methods was the lowest of the analysed methods, reducing the computational requirements of the distributed APs and CPU. Moreover, while achieving a localisation error below 50 mm, including model-driven techniques reduced the number of training samples needed by a factor of 20 in comparison to purely data-driven methods.
We found that the proposed distributed ML-MUSIC algorithm, which combines machine learning and MUSIC to localise the users, is the best candidate for the presented task and scenario, as it is able to reach accurate localisation while using a moderate number of training samples. It reaches a median error of 34.2 mm when using only 500 training samples. When using more training samples, the accuracy improves further. At the same time, it keeps the computational complexity and fronthaul requirement to a minimum, making it an ideal candidate for use in large distributed MaMIMO networks.

SOFIE POLLIN is professor at KU Leuven focusing on wireless communication systems. Before that, she worked at imec and UC Berkeley. Her research centers around wireless networks that require networks that are ever more dense, heterogeneous, battery powered and spectrum constrained. She pioneered a 5G testbed for distributed Massive MIMO at KU Leuven, and is now leading the way towards 6G tests in multiple large EU projects.

VOLUME 4, 2016