Using the Power Delay Profile to Accelerate the Training of Neural Network-Based Classifiers for the Identification of LOS and NLOS UWB Propagation Conditions

Ultra-wideband (UWB) technology enables centimeter-level localization systems based on the accurate estimation of the actual distance between transmitter and receiver, by means of the precise estimation of the signal time-of-flight. However, this is only possible when the first path of the incoming signal is correctly detected instead of a bounce or a reflection, which becomes challenging in non line-of-sight (NLOS) situations. There are many different approaches in the literature to mitigate the wrong detection of the first incoming UWB signal path. One of them uses machine learning techniques to design classifiers capable of distinguishing between line-of-sight (LOS) and NLOS propagation from available signal features. However, the performance and complexity of the obtained classifiers depend largely on the size of the input data associated with such features. Features such as the channel impulse response (CIR) produce large amounts of data, yielding very complex classifiers. In this paper, we propose using a downsampled power delay profile (PDP) as an alternative feature whose input data is much smaller than the CIR yet sufficiently representative, hence resulting in a lower computational cost while exhibiting a similar classification performance. Furthermore, this work studies the impact on the classification results of training with a dataset in which the samples of each class are not balanced from the point of view of energy. Finally, this work also studies how the classifiers based on the CIR or the PDP improve their performance when considering additional signal features such as the estimated range value or the energy level of the signal.


I. INTRODUCTION
Indoor positioning has experienced great advances in recent years, driven by an increasing number of commercial technology solutions capable of achieving positioning with centimeter accuracy. One of the radio technologies that achieves the best results in terms of precision is UWB, above others traditionally used [1], [2]. Among the commercial implementations of UWB for localization purposes, the DW1000 integrated circuit (IC) from Decawave [3] stands out for its high availability in many devices. Nowadays, this chip has become almost a de facto standard both at the enterprise level (indeed, Decawave has more than 50 related partners [4]) and at the research level (searching for the term "DW1000" in Google Scholar yields almost 8000 different results).
The associate editor coordinating the review of this manuscript and approving it for publication was Yeliz Karaca.
The DW1000 implements the UWB IEEE 802.15.4-2011 standard, which details the mechanisms and protocols needed to operate with UWB in ranging tasks (distance estimation). In order to obtain signal synchronization and estimate the distance between transmitter and receiver, the DW1000 IC implements a threshold-based energy detection mechanism that accumulates energy at the receiver until a certain predefined threshold value is exceeded. In situations with a clear line-of-sight (LOS) between transmitter and receiver, the energy of the originally emitted signal is detected and the first path of the incoming signal is accurately detected. However, in non line-of-sight (NLOS) situations, the original signal can be attenuated or blocked completely, so that the energy exceeding the detection threshold in the receiver might not correspond to that of the first path, but to that of a bounce or a delayed reflection. This yields an erroneous estimation of the signal time of flight (ToF), which later translates into an error in the range estimation much higher than that in the LOS case. Many different works in the literature try to differentiate the signals received under LOS propagation conditions from those corresponding to NLOS. In general, selectively discarding the estimates coming from NLOS propagation conditions yields a significant improvement of the positioning accuracy.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
There are several approaches to this LOS-NLOS classification problem, including those proposed by the authors in previous works. More specifically, in [5], [6] we consider classical machine learning techniques and shallow neural networks to attempt this classification. For this purpose, several measurement campaigns were carried out with devices equipped with the DW1000 IC to obtain a sufficient set of training samples. Later, the performance of these systems was analyzed when classifying the measurements coming from a different scenario. On that occasion, and in order to implement the classification system in computationally reduced devices, it was decided to use only received signal strength (RSS) moving averages and range estimation as the training features. In addition, to carry out the classification in real time, the idea of using the channel impulse response (CIR) samples to feed the training process was discarded due to the long time required to extract the CIR measurements corresponding to the ranging processes between the transmitter device and all the possible receiver nodes deployed in the environment and needed to generate a position estimate. However, the LOS-NLOS classification directly from the CIR samples is also useful in scenarios where either that real-time restriction is not present, or the hardware device allows for obtaining and processing the CIR faster than the DW1000.
Although in this work we focus on approaches based on machine learning, it is necessary to emphasize that there are several other parallel approaches also based on the CIR that use the samples in a different way, for example with more heuristic and deterministic algorithms [7]-[9] or with statistical analysis [10]. Other related works do not directly employ the CIR but features derived from the signal, such as its energy or the signal detection time instant [11]-[13], or even additional data such as sensor acceleration values [14]. Others approach the problem exclusively from the simulation point of view [15], [16].
In addition, it is also worth mentioning that these techniques can be applied in other radio technologies [17], [18], although this work is focused on UWB.
Regarding the LOS-NLOS classification problem with UWB, there are some authors that have considered the use of the CIR samples together with machine learning or deep learning techniques to find a solution. In these cases, the process typically consists in employing all the samples from the CIR directly as training features [19], yielding classifiers with a high computational cost due to the large number of data samples provided by the CIR. A possible approach to alleviate this problem consists in employing only a subset of the CIR samples [20].
However, as in any other process where automatic learning techniques are used, the initial data is key to obtaining representative results that can be exploited in final applications. Both [19] and [20] used the same dataset [21] for their results. This dataset contains 42000 samples extracted from a DW1000 module in LOS and NLOS situations captured in different indoor locations. These locations include two offices, a small apartment, a kitchen with living room, a bedroom, a small workshop, and a boiler room. The dataset is perfectly balanced, with 50% of samples from each of the classes. In addition, in order to avoid possible biases induced by the different locations, the set of samples is randomized. The dataset does not include absolute position references, but it includes the values provided by the DW1000 after a regular ranging operation, such as the range estimation itself, the CIR samples, or the index within the CIR where the first path of the UWB signal was detected. This same dataset has been used in the present work, so that the results obtained with the features introduced here can be compared with all the other works based on these same measurements.
Similarly, the neural architecture used in this work is very close to that presented in [19], [20]. In this way we guarantee that the differences observed between the different proposals are due exclusively to the set of features selected, since both the training data and the algorithms are the same.
In this article we present three main contributions. First, we perform a comparison in the classification of LOS-NLOS measurements using a downsampled version of the PDP versus using the complete CIR or only a part of it. The idea of this experiment was to check if similar results could be obtained while reducing the input size of the neural network. Section II details the experiment and the results obtained.
The second contribution is to analyze the impact of using the CIR to classify when both classes have a very different level of energy in their samples. Section IV describes the problem and shows the classification results after applying a normalization and noise removal process on the CIR samples.
Finally, the third contribution is to show the performance of a classification system based on the CIR samples when two additional features are added: the range value itself and the energy value of the samples. Section V shows the experiments performed and the results obtained.

II. USING THE PDP AS TRAINING SET
This section introduces the use of the PDP samples instead of those from the CIR to train a multilayer convolutional neural network capable of classifying the UWB measurements captured under LOS or NLOS propagation conditions. The samples employed are those corresponding to the dataset available in [21], originally created by the authors of [20]. This dataset includes 42000 UWB measurements obtained in up to 7 different indoor locations, and it has the same number of measurements for both the LOS and NLOS classes. Notice that, although obtaining the CIR is typically a hardware-dependent task requiring low-level software development, computing the PDP from the CIR is a generic operation that does not depend on the hardware.
Section II-A details the processing of the CIR samples to obtain the PDP, while Section II-B shows the details of the considered convolutional neural network (CNN) and the selection mechanism for the training and test sets.

A. TRAINING FEATURES
The DW1000 stores, for each measurement, 992 samples for the 16 MHz mean pulse repetition frequency (PRF), or 1016 samples for the case of 64 MHz mean PRF [22]. For the dataset considered in [21], a 64 MHz mean PRF was set; hence, for each measurement, we have 1016 valid CIR samples. The DW1000 provides the real and imaginary values for the CIR samples. However, the considered dataset [21] only provides the modulus of the samples, which is defined for a given measurement as
$$|h[n]|, \quad n \in \mathbb{Z},\ 0 \le n < 1016. \tag{1}$$
The selected training features derived from the CIR and considered in this work are defined as described below.

1) ALL THE CIR SAMPLES (AS IN [19])
The idea considered in [19] consists of using all the CIR samples as training features in the same way as Eq. (1).

2) FIRST 152 SAMPLES OF THE CIR AFTER THE FIRST PATH (AS IN [20])
As noted in [20], most of the information about the propagation characteristics is contained in the 152 samples of the CIR starting at the one corresponding to the first path. Thus, if the first path starts at index $n_0$, then the training feature is
$$|h[n]|, \quad n_0 \le n < n_0 + 152, \tag{2}$$
where we assume that $n_0 \ge 0$ and $n_0 + 152 < 1016$. For each measurement, the index of the first path is also stored by the DW1000 in an internal register which can be accessed later. Although this value could be used to extract the most representative set of samples from the CIR, in our case we decided to use a different algorithm. The reason is that the DW1000 uses a threshold algorithm to decide which is the first sample of the first path. In an LOS scenario this method works well, but in NLOS scenarios it may sometimes fail to detect that first path correctly if its energy level is low. Since in our study we do not have the immediacy requirements of the algorithm implemented in the chip, we decided to process the CIR to try to detect that first significant sample more accurately.
To that end, we employed a method similar to the one proposed in [23] to calculate the so-called rise time feature. This value is defined as follows:
$$t_{\mathrm{rise}} = t_H - t_L, \tag{3}$$
where $t_H$ corresponds to the instant when the signal exceeds a high level based on the maximum value of the signal, and $t_L$ corresponds to the instant when the signal exceeds another threshold dependent on the noise level. Mathematically:
$$t_L = \min\{t : |r(t)| \ge \alpha\,\sigma_n\}, \qquad t_H = \min\{t : |r(t)| \ge \beta\, r_{\max}\}, \tag{4}$$
where $r(t)$ is the received signal, $\sigma_n$ is the standard deviation of the thermal noise, $r_{\max} = \max\{|r(t)|\}$, and $\alpha$ and $\beta$ are two constants selected empirically. Thus, similarly to [23], but in discrete time, we obtain $n_H$ as
$$n_H = \min\{n : |h[n]| \ge \beta\, h_{\max}\}, \tag{5}$$
where $h_{\max} = \max\{|h[n]|\}$ is the maximum amplitude of the CIR, and $\beta \in [0, 1]$ is a parameter used to select the threshold to detect the first path based on the maximum value of $|h[n]|$.
The value of β is chosen empirically; in our case we used β = 0.4, manually inspecting the results to check that the first path was accurately detected. It should also be noted that the CIRs have a small rise time $n_r$ from their actual start to the detected index $n_H$. Therefore, in order to include this rise time in the selected CIR samples, $n_0$ is obtained as
$$n_0 = n_H - n_r. \tag{6}$$
In our case, we use a value $n_r = 3$, which was also chosen empirically by inspecting the results. Fig. 1 shows an example of the CIR for one measurement where the two estimates of the beginning of the first path have been marked: the one returned by the DW1000 and the one generated by our algorithm. It can be seen that in this case the estimate included in the measurements does not include a previous sample that is clearly above the noise level.
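The first-path search described above can be sketched in a few lines. This is a minimal illustration, not the authors' published code [25]: the function names `detect_first_path` and `cir_window` are hypothetical, the input is assumed to be the CIR modulus of one measurement as a one-dimensional array, and clamping $n_0$ at zero is an added safeguard rather than something stated in the text.

```python
import numpy as np

def detect_first_path(cir_mag, beta=0.4, n_r=3):
    """Return n_0 = n_H - n_r for one CIR modulus vector.

    n_H is the first index whose magnitude reaches beta * max(|h[n]|).
    beta = 0.4 and n_r = 3 are the empirical values quoted in the text.
    """
    cir_mag = np.asarray(cir_mag, dtype=float)
    threshold = beta * cir_mag.max()
    # argmax on a boolean array returns the index of the first True entry.
    n_h = int(np.argmax(cir_mag >= threshold))
    # Assumption: clamp at 0 so the start index is always valid.
    return max(n_h - n_r, 0)

def cir_window(cir_mag, n_0, length=152):
    """Return the `length` samples of the CIR modulus starting at n_0."""
    cir_mag = np.asarray(cir_mag, dtype=float)
    if n_0 + length > cir_mag.size:
        raise ValueError("window exceeds the available CIR samples")
    return cir_mag[n_0:n_0 + length]
```

With a 1016-sample measurement `cir`, `cir_window(cir, detect_first_path(cir))` would give the 152-sample feature used for the second variant.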

3) DOWNSAMPLED PDP (PROPOSAL)
Although most of the CIR information is contained in 152 samples (see Eq. (2)), we can reduce this number even further while retaining a similar performance, hence reducing the training time of the network. This is feasible because: 1) closely spaced CIR samples tend to be highly correlated, and 2) even though the CIR may exhibit rapid variations, which reduce the correlation of closely spaced samples in some cases, we assume that both the amplitude and the overall shape of the CIR play a more important role in our case.
To do this, we consider the PDP, which gives us the average power of the channel impulse response as a function of the delay. In our case, we can obtain the PDP as the squared modulus of the CIR, i.e., $|h[n]|^2$. Therefore, to reduce the input size of the neural network, we propose as a training feature a downsampled version of the PDP with a given factor $L$. The proposed feature considers the average of the PDP over equispaced intervals, defined as follows:
$$p[m] = \frac{1}{L} \sum_{k=0}^{L-1} \left|h[n_0 + mL + k]\right|^2, \quad m = 0, 1, \ldots, \lfloor 152/L \rfloor - 1.$$
To select the $L$ value with the best result, a comparison was made between different values. Details of this comparison are shown in Section IV, since its results are closely related to the energy normalization process described in that section. Note that the number of samples in this proposal depends on the factor $L$, and this value corresponds to $\lfloor 152/L \rfloor$. Thus, for example, for $L = 5$ the number of samples is 30. Fig. 2 shows, for an example measurement, the PDP, $|h[n]|^2$, and the corresponding downsampled PDP, $p[m]$, for $L = 7$.
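As a rough illustration of this feature, the sketch below averages the squared modulus over consecutive blocks of $L$ samples. The function name `downsampled_pdp` is hypothetical, the input is assumed to be the 152-sample window of the CIR modulus, and discarding the trailing samples that do not fill a complete block is an assumption that matches the sample counts quoted in the text (30 for $L = 5$).

```python
import numpy as np

def downsampled_pdp(cir_window, factor):
    """Average the PDP (squared CIR modulus) over blocks of `factor` samples."""
    pdp = np.abs(np.asarray(cir_window, dtype=float)) ** 2
    n_blocks = pdp.size // factor
    # Assumption: leftover samples that do not fill a whole block are dropped.
    blocks = pdp[:n_blocks * factor].reshape(n_blocks, factor)
    return blocks.mean(axis=1)
```

For a 152-sample window this returns 30 values with `factor=5`, 21 with `factor=7`, and 3 with `factor=40`.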

B. NEURAL NETWORK TOPOLOGY AND PARAMETERS
A neural network architecture similar to the one proposed in [20] was generated to evaluate the impact of using the PDP instead of the CIR in the training. Based on the description given in [20], a CNN was built with the architecture shown in Fig. 3. Note that, although not shown in Fig. 3, batch normalization layers were used between layers to speed up and facilitate the training. Note also that the design of the neural network is not an objective of this paper; instead, we try to replicate the structure used in the other works in order to obtain comparable results. As shown in Fig. 3, the neural network consists of several convolutional layers. The idea of each of these layers is to extract the relevant features from a section of the input signal (in this case the CIR or PDP samples) and provide them to the next layer, where the process is repeated. In our case, the activation function of each neuron in these layers was a rectified linear unit (ReLU). This function is defined as:
$$f(x) = \max(0, x).$$
Another layer type used in the network is the Reshape layer, whose mission is to transform the shape of a matrix into a different shape. It is used only to adapt the output of one layer to the expected input of the next one. A spatial reduction layer is typically placed in a CNN after several convolutional layers. In this work, as in the reference architecture described in [20], a Max Pooling layer was considered, which outputs the maximum input value within a window of the selected size.
Finally, a Dense layer of multiple neurons fully connected is added to the stack. Its mission is to perform the actual classification using the features extracted by the previous convolutional layers.
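To make the two element-wise building blocks mentioned above concrete, here is a small NumPy sketch of ReLU and of one-dimensional max pooling. It is only an illustration: the helper names are hypothetical, and the use of non-overlapping windows (stride equal to the window size) is an assumption, since the text does not state the pooling stride.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x) applied element-wise."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

def max_pool_1d(x, window):
    """Maximum over consecutive, non-overlapping windows of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n_windows = x.size // window
    # Assumption: trailing samples that do not fill a window are ignored.
    return x[:n_windows * window].reshape(n_windows, window).max(axis=1)
```

In the actual network these operations are provided by the deep learning library; the sketch only shows what each one computes on a single feature vector.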
The network implementation was done using TensorFlow version 2.2.0 [24], a free, open source Python library capable of running on different operating systems. A PC running Windows 10 LTSC on an AMD Ryzen 5 3600 processor, 16 GB of RAM, and an NVIDIA GeForce GTX 1660 Ti GPU was used for testing. The implementation was configured to use the GPU, so the times presented in the results section are those obtained with it. In an attempt to minimize the possibility of overfitting, an iterative mechanism was designed consisting of the following steps:
1) Of all the samples, 60% were assigned to the training set, 20% to the validation set, and the remaining 20% to the test set, ensuring that all sets contained the same number of samples from each class (LOS and NLOS).
2) The training phase was performed using the training set, with 20 epochs and a batch size of 64 samples.
3) After the training was completed, the time spent was saved and the network was tested using the test set. The results were also recorded.
This process was repeated N times and, finally, the results were averaged. For this experiment a value of N = 10 was set. The entire software implementation created for these experiments is publicly available [25].
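The repeated split-train-test procedure can be sketched as follows. This is a hedged outline rather than the published implementation [25]: `stratified_split` and `repeated_evaluation` are hypothetical names, and `train_and_score` stands for any user-supplied callable that trains a network on the given index sets and returns a score such as the F1-score.

```python
import numpy as np

def stratified_split(labels, rng, fractions=(0.6, 0.2, 0.2)):
    """Split sample indices into train/validation/test sets, class by class."""
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them 60/20/20.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])
    return tuple(np.concatenate(part) for part in (train, val, test))

def repeated_evaluation(labels, train_and_score, n_repetitions=10, seed=0):
    """Repeat the split and the training N times and average the scores."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repetitions):
        train_idx, val_idx, test_idx = stratified_split(labels, rng)
        scores.append(train_and_score(train_idx, val_idx, test_idx))
    return float(np.mean(scores))
```

Splitting each class separately is one simple way to keep the same number of LOS and NLOS samples in every set when the dataset itself is balanced.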
As an illustration of the training process, and following the approach in [26], Fig. 4 shows the evolution of the loss function with respect to the epoch index for both the training and validation sets. This example corresponds to one of the repetitions using the PDP.

III. RESULTS CONSIDERING PDP
The first experiment consisted of comparing the effect of considering the PDP samples versus employing all the samples from the CIR [19] or only the first 152 samples after the detection of the first path [20]. The CNN was trained using each of these sets separately. Fig. 5 shows the results for the three sets of features seen in Section II-A (i.e., all CIR samples, CIR; first 152 samples, CIR[152]; and downsampled PDP with L = 5) after repeating the training and test processes 10 times for each of the different sets of features. The F1-score is a common metric in the literature that combines precision and recall in a single value, and is defined as
$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},$$
where precision = $t_p/p$ and recall = $t_p/f$, with $t_p$ being the number of true positives, $p$ the total number of true positives and false positives, and $f$ the total number of true positives and false negatives. For this comparison a value of L = 5 was used to generate the PDP. The choice of this value is detailed in Section IV. With regard to the results shown in Fig. 5, it can be seen that, despite obtaining close values in all three cases, using all the CIR samples as training features yields the worst and most variable results. The reason is that most of the CIR samples are just noise, and the actual CIR information is contained only in the samples following the arrival of the signal. This set of samples with significant information is very small in comparison to the total number of CIR samples captured. Thus the remaining samples, which do not provide any useful information, degrade the classifier performance. This can be easily verified by comparing the value obtained with that of the case where only the first 152 samples of the CIR after the first path are used. In the latter case, Fig. 5 shows a higher and more precise (lower variance) F1-score.
Consequently, the training result is more predictable, no matter how the samples are distributed in the training and test sets. When considering only the first 152 CIR samples the results in Fig. 5 are quite similar to those obtained in [20], in which the obtained F1-score value was 87.6 and falls within the range obtained in the present experiment (see Fig. 5). This also indicates that the CNN considered reproduces the one used in [20] obtaining similar results.
Regarding the use of the PDP, the results shown in Fig. 5 are on par with those obtained using only the first 152 CIR samples. However, it is important to remember that in this experiment the PDP only contains 30 samples, which will have a positive influence on the computational cost and therefore on the time needed to train and run the network. Fig. 6 shows the training times averaged over the different executions of the experiment. The time corresponding to the PDP case also includes the time necessary to extract the PDP from the CIR. This time, in the same machine where the simulations were carried out, was 9.807 s. Fig. 6 shows that, in spite of this extra time, the final result is the lowest of the three variants. Obviously, training with all the samples from the CIR is the most expensive approach, whereas doing it with only 152 samples reduces the time considerably. Training using only the 30 samples of the PDP is the fastest approach and, since it does not worsen the classifier performance with respect to the other two approaches, it is the best alternative in this case.
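For reference, the F1-score used throughout this section can be computed from the true and predicted labels as in the sketch below. The function name is hypothetical, and treating label 1 as the positive class is an assumption, since the text does not say which of the two classes (LOS or NLOS) is taken as positive.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-score from counts of true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Averaging this value over the repetitions of the experiment gives one point per feature set, and its spread across repetitions reflects the variability discussed above.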

IV. EFFECT OF THE DIFFERENT ENERGY LEVELS OF EACH CLASS ON THE RESULT
In order to choose a suitable L value when generating the PDP, classification tests were carried out with different values of this parameter. The results (Fig. 7) did not show significant differences, despite the fact that the final number of samples went from 30 (L = 5) to only 3 (L = 40). This counter-intuitive result indicated that some factor other than the relationship between the CIR samples was dominating the classification. Thus, a comparison was made by analyzing the energy level of all the samples for each of the two classes, LOS and NLOS. The results can be seen in Fig. 8. Clearly, the energy distribution is very different in both classes, which leads the classifier to use this underlying information as a main feature when classifying, and hence to overfit this dataset. This distribution is probably a consequence of the way in which the NLOS samples were obtained, perhaps with an obstacle in between (e.g., a wall).
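A quick way to check a dataset for this kind of energy imbalance is to summarize the per-measurement energy for each class, as in the following sketch. The function name `energy_by_class` is hypothetical, and the energy is taken here simply as the sum of the squared CIR modulus of each measurement, without any noise correction, which is a simplifying assumption.

```python
import numpy as np

def energy_by_class(cir_mags, labels):
    """Return {class: (mean energy, std of energy)} for a set of measurements.

    cir_mags has one row per measurement; labels holds the class of each row.
    """
    cir_mags = np.asarray(cir_mags, dtype=float)
    labels = np.asarray(labels)
    # Assumption: energy is the plain sum of squared magnitudes per measurement.
    energy = np.sum(cir_mags ** 2, axis=1)
    return {
        int(cls): (float(energy[labels == cls].mean()),
                   float(energy[labels == cls].std()))
        for cls in np.unique(labels)
    }
```

Clearly separated means with small standard deviations would suggest that a classifier can rely on energy alone, which is the situation described above for this dataset.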
In order to be able to observe the results without this effect, we decided to perform an energy normalization to the CIR. However, note that normalizing the CIR energy has the unwanted effect of changing the noise power, which could be also used by the classifier. To avoid this problem, we decided to also remove the noise samples, leaving only those samples with a value above a certain threshold.
It should be noted that, for this particular dataset, the detection value of the first path stored in the measurements, as well as the one calculated by us, is always above the 700-th sample, so we can consider that the first samples of the CIR always correspond to noise. Thus, we considered the first 500 samples of the CIR to obtain the noise power estimation. Using 500 samples allows us to obtain a good estimation of the noise power and ensures that we are only taking noise samples and not samples where the actual CIR is present. Thus, for each CIR measurement $|h[n]|$ we estimate the noise power as
$$\sigma_n^2 = \frac{1}{500} \sum_{n=0}^{499} |h[n]|^2. \tag{11}$$
Then, for each measurement $|h[n]|$, we calculated the normalized and noise-free CIR (Eq. (12)) using three quantities: $E_h$, the energy of the CIR calculated without the effect of the noise estimated in Eq. (11) (Eq. (13)); $n_0$, the index of the first path as defined in Eq. (6); and $n_L$, the instant when the signal exceeds the threshold of the noise, defined in a similar way to $t_L$ in Eq. (4). In the definition of $n_L$, α is a parameter used to select the threshold to detect the last index $n$ where the CIR power is still significantly higher than the noise power. In our case we empirically set α = 6 by manually inspecting the results to check that the noise samples were successfully detected and removed. Fig. 9 shows the classification results with PDP and different L values after energy normalization. On this occasion we observed that the F1-score decreases as the number of input features decreases. Thus, with the value of L = 5 (which gives 30 samples) the best result is obtained, whereas with L = 40 (only 3 samples) the worst result is obtained.
Regarding the training time, Fig. 10 shows the results per execution for the different values of L. As expected, the time increases as the number of input features increases; however, it can be seen that the differences are small with the dataset used. In view of these data, it is clear that with the energy of the CIR normalized and the noisy samples eliminated, the best results will always be obtained with the smallest value of L, that is, with the highest number of samples. Thus, for the comparison of Section II we decided to select the value of L = 5, since this was the smallest L value among all the analyzed ones.
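The normalization and noise-removal step described in this section could be prototyped along the following lines. Several details are assumptions made only for this sketch and should not be read as the authors' exact definitions: the noise power is the mean squared modulus of the first 500 samples, the last significant index is the last sample at or after $n_0$ whose power is at least α times the noise power, the energy is the sum of the retained powers minus the noise power of each retained sample, and the retained samples are divided by the square root of that energy. The function name is hypothetical.

```python
import numpy as np

def normalize_and_denoise(cir_mag, n_0, alpha=6.0, n_noise=500):
    """Return an energy-normalized CIR modulus with noise-only samples zeroed."""
    cir_mag = np.asarray(cir_mag, dtype=float)
    power = cir_mag ** 2
    # Noise power estimated from the leading, signal-free part of the CIR.
    noise_power = power[:n_noise].mean()
    # Assumption: the threshold is applied to the sample power.
    above = np.flatnonzero(power >= alpha * noise_power)
    above = above[above >= n_0]
    out = np.zeros_like(cir_mag)
    if above.size == 0:
        return out
    n_last = int(above[-1])
    # Assumption: subtract one noise-power term per retained sample.
    energy = power[n_0:n_last + 1].sum() - (n_last - n_0 + 1) * noise_power
    if energy <= 0:
        return out
    out[n_0:n_last + 1] = cir_mag[n_0:n_last + 1] / np.sqrt(energy)
    return out
```

After this step every measurement has comparable energy, so a classifier trained on the result can no longer separate the classes by signal level alone.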

V. RESULTS CONSIDERING ADDITIONAL FEATURES
In the second experiment, the objective was to test the performance of the classifier when a number of extra features are added to the PDP or to the two CIR variants. Based on a previous work [27], where these properties were shown to contain enough information to make a good classification when using shallow neural networks, two of the chosen features were the range value returned by the DW1000 chip and the RSS estimation. However, on this occasion, since we have the samples from the CIR, we decided to replace the RSS estimate with the energy extracted from these samples. Thus, a new experiment was carried out in which the following features were added to those already analyzed:
• Range: The distance estimation obtained by the DW1000.
• Energy: The energy of the CIR, i.e., the zero-th order moment of the PDP, as shown in Eq. (13).
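The two extra features listed above can be assembled into a small per-measurement vector, as in this sketch. The function name `extra_features` is hypothetical, and the energy is computed here as the plain sum of the squared CIR modulus over the supplied samples, a simplification that ignores the noise correction mentioned in the text.

```python
import numpy as np

def extra_features(cir_mags, ranges):
    """Build a (measurements, 2) matrix with columns [range, energy]."""
    cir_mags = np.asarray(cir_mags, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    # Zero-th order moment of the PDP: sum of the squared modulus.
    # Assumption: no noise term is subtracted in this simplified version.
    energy = np.sum(cir_mags ** 2, axis=1)
    return np.column_stack([ranges, energy])
```

Keeping these two values in a separate array from the CIR or PDP samples makes it straightforward to feed them to a different part of the network than the convolutional layers.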

A. NEURAL NETWORK TOPOLOGY AND PARAMETERS
For this second experiment, we decided to create a new, differentiated neural network to handle the new features. Although a CNN is useful for treating spatially or temporally related information (as is the case with the CIR or the PDP samples), it is not the best solution for features that do not have these properties. This is why the new neural network was designed with two different branches: a first one composed of several convolutional layers to treat the CIR/PDP samples (similar to Section II-B), and a second branch composed of a fully connected layer whose input corresponds to the two extra features. The final network design is shown in Fig. 11. To carry out the training and test of this network, each sample of the training set was separated into two parts, each one corresponding to one of the two branches of the network. Obviously, the LOS/NLOS class value was common to both.
[...] does not use the extra features. The variant that uses the 152 CIR samples suffers a smaller increase, with a final difference of 179.7 s. Finally, the slower variant employing all the CIR samples is the one that suffers the largest increment, with 1474.9 s more than the version without the extra features. In view of these data, we can conclude that additional training features, such as range and energy, improve the classifier performance at the expense of an increase in the training time, which depends on the total number of features considered.

VI. CONCLUSION
A series of experiments was carried out using deep learning mechanisms to test two hypotheses related to the classification of UWB measurements into LOS and NLOS according to their propagation conditions. On the one hand, one of the questions to be analyzed was the impact of considering a downsampled version of the PDP as a source of information in the learning process instead of directly using the CIR samples. For this purpose, an experiment was carried out in which the same CNN was trained with three variants of the same set of samples: using all the CIR samples, considering only a subset of these containing most of the information, and employing the PDP samples. The results obtained showed that the use of a downsampled PDP offers classifier performance similar or superior to that of the other alternatives while requiring less training time.
Another point of interest in this work was to analyze the impact that an unbalanced energy distribution in the classes has on the final classification result. We analyzed the problem by showing the values before and after a process of energy normalization and noise removal, and observed substantial variations in the final F1-score results. This shows that, for the particular case of LOS-NLOS classification of UWB measurements using the CIR or its derivatives, it is not sufficient to have a dataset that is balanced in terms of the number of measurements of each class; it is also necessary that the energy level is similar in both classes. Otherwise the energy level itself is sufficient to make a satisfactory classification, which can mask the real contribution of the relationship between the CIR samples.
Another question this work intended to answer is whether the classification performance obtained with the CIR or the PDP samples could be improved by adding to the training set other data derived from the CIR, such as the range or the energy. The results obtained in this case showed an improvement in performance with respect to the first experiment, at the expense of a higher computational cost.
Although the considered dataset was useful to compare the results with other works based on these data, the conclusions obtained cannot be generalized without performing new experiments considering a different set of measurements. As demonstrated in the present work, the measurements of the considered dataset can be classified to some degree only by their differences in energy level, beyond the possible relationship between the different paths observed in the CIR. Thus, it becomes mandatory for future work to obtain a more balanced dataset in which the range and variety of measurements (for example, with long-distance LOS samples) allow for assessing the contribution of each information source to the final classification result.