A Machine Learning Approach to Predict the Average Localization Error With Applications to Wireless Sensor Networks

Node localisation is one of the significant concerns in Wireless Sensor Networks (WSNs). It is a process in which we estimate the coordinates of the unknown nodes using sensors with known coordinates, called anchor nodes. Several bio-inspired algorithms have been proposed for accurate estimation of the positions of the unknown nodes. However, the use of bio-inspired algorithms is a highly time-consuming process. Hence, finding optimal network parameters for node localisation with the desired accuracy, in a short time, during the network set-up process is still a challenging task. In this article, we have proposed an efficient way to evaluate the optimal network parameters that result in a low Average Localisation Error (ALE) using a machine learning approach based on a Support Vector Regression (SVR) model. We have proposed three methods (S-SVR, Z-SVR and R-SVR), based on feature standardisation, for fast and accurate prediction of ALE. We have considered the anchor ratio, transmission range, node density and number of iterations as features for training and prediction of ALE. These feature values are extracted from modified Cuckoo Search (CS) simulations. In doing so, we found that all the methods perform exceptionally well, with method R-SVR outperforming the other two methods with a correlation coefficient of R = 0.82 and a Root Mean Square Error (RMSE) of 0.147 m.


I. INTRODUCTION
A WSN consists of a set of miniature and inexpensive sensors that are spatially distributed over an area to measure physical parameters or monitor habitat conditions. WSNs have many practical areas of implementation, such as target tracking and precision agriculture [1]-[6]. In most applications, these sensors need to estimate their coordinates accurately with minimum resource requirements. A sensor can quickly determine its coordinates using an integrated Global Positioning System (GPS) receiver. However, it is not practically feasible to integrate GPS in all the sensors due to its size and cost. An alternative approach is to use localisation algorithms in which several anchor nodes (with integrated GPS) assist the unknown nodes in determining their coordinates accurately.
The associate editor coordinating the review of this manuscript and approving it for publication was Tie Qiu.
A large number of localisation algorithms have been introduced to solve different localisation problems [7]. These algorithms are expected to be flexible so that they can work well in diverse indoor and outdoor scenarios and topologies. Localisation algorithms are divided into two categories, viz., range-based algorithms and range-free algorithms. In range-based algorithms, the location of the unknown nodes is computed with the help of the distance between the anchor and the unknown sensor nodes. They utilise ranging metrics such as the angle of arrival, the time of arrival and the Received Signal Strength Indication (RSSI) [8]-[10]. In contrast, range-free algorithms, such as the ad-hoc positioning system [11] and centroid [12], make use of simple connectivity-related operations to localise the unknown node. They only require that the beacon signal transmitted by the anchor node be present in the medium. Of the two, the range-based algorithms are widely employed and preferred over the range-free algorithms [13]-[15].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To design a less complicated algorithm, various bio-inspired algorithms have been proposed for the range-based approach [16]. Initially, Gopakumar and Jacob [17] presented a node localisation method based on Particle Swarm Optimisation (PSO) [18], which imitates the behaviour of a fish swarm searching for food. This algorithm showed good initial results, but the implementation tended to get caught in a local optimum, which results in premature convergence. In 2014, Goyal and Patterh [19] implemented CS for node localisation in WSNs. It showed noticeable results in minimising the localisation error. This is mainly because of the tuning parameters in the CS algorithm, which ease the calculation process. Recently, a modified version of CS was proposed by Cheng and Xia [20], which improved the convergence rate of the conventional CS algorithm. They modified the random walk step size and the mutation probability to improve the search process.
The ALE metric assesses the accuracy of these localisation algorithms, and we select the algorithm that has the minimum ALE value. The major problem after selecting a bio-inspired algorithm for node localisation is the computational time. During any network set-up, we need to run the algorithm many times in order to find the optimal network parameters (such as anchor ratio, transmission range, node density, etc.) and to tune the ALE below the threshold for the desired scenario. To deal with this limitation, we have proposed an efficient machine learning approach for accurate and fast prediction of ALE in such a scenario. As far as we know, no other study has been conducted and published to address this issue.
In this article, we have presented three methods based on the SVR model. We have selected and extracted four features, namely anchor ratio, transmission range, node density and number of iterations from the modified CS algorithm. Eventually, we input this data to train the SVR model and obtained the predicted ALE using the trained SVR model for all the three methods.
The remainder of this article is organised into six sections. In Section II, we have discussed the related works. In Section III, we have discussed the system model for the node localisation problem, together with the details of the feature importance, the hyper-parameter tuning and the SVR model. Afterwards, in Section IV, we have discussed the simulation scenarios and parameters for the modified CS and the SVR model. In Section V, we have discussed the results of all the three methods for ALE prediction. Finally, in Sections VI and VII, we have presented the discussion and the conclusion, respectively.

II. RELATED WORKS
In this section, we have discussed several methods for improving the node localisation accuracy. Several studies have been conducted to improve localisation accuracy using machine learning. Morelande et al. [21] introduced a Bayesian algorithm for node localisation in WSNs. The proposed algorithm is a refinement of a previous work referred to as progressive correction [22]. Both these methods are compared in different scenarios, keeping the Cramér-Rao bound (CRB) as the benchmark. The proposed algorithm proved to be more accurate than its predecessor. Further, Ghargan et al. [23] presented an approach in which an Artificial Neural Network (ANN) is hybridised individually with three optimisation algorithms: Particle Swarm Optimisation (PSO), Backtracking Search Algorithm (BSA) and Gravitational Search Algorithm (GSA). The GSA-ANN hybrid outperformed the other methods, with a mean absolute distance estimation error of 0.02 m and 0.2 m for outdoor and indoor scenarios, respectively. In a recent survey, Ahmadi and Bouallegue [24] compiled the different state-of-the-art machine learning techniques utilised in node localisation in WSNs. The survey compared the cumulative localisation error distribution curves of various techniques such as ANN, Support Vector Machine (SVM), Decision Tree (DT) and Naive Bayes (NB). It reported that NB outperformed all the other machine learning techniques based on their cumulative localisation error distributions. Bhatti et al. [25] developed an outlier detection algorithm named ''iF_Ensemble'' for an indoor localisation environment using a combination of different supervised, unsupervised and ensemble machine learning methods. Here, the supervised learning techniques are K-nearest neighbour (KNN), Random Forest (RF) classifiers and SVM, whereas the unsupervised learning technique is isolation Forest (iForest). These techniques are used with stacking, which is an ensemble learning method. The model, including stacking, is compared with the individual performances of the machine learning algorithms involved. The stacking model provides a high localisation accuracy of 97.8% with the proposed outlier detection methods. Recently, Wang et al. [26] introduced a node localisation algorithm named Kernel Extreme Learning Machines based on Hop-count Quantization (KELM-HQ). The trained KELM computes the locations of the unknown nodes. The proposed algorithm improves the localisation error by 34.6% when compared with fast-SVM, 19.2% when compared with the GADV-Hop algorithm, and 11.9% when compared with the DV-Hop-ELM algorithm.
Overall, this study aims to overcome the limitation of localisation accuracy in previous studies by using a regression-based machine learning approach.

III. SYSTEM MODEL
In this section, first, we have discussed the system architecture designed for the node localisation process. Then we have discussed the method to compute the distance between the anchor and unknown nodes. Afterwards, we have discussed the objective function formation and working of the modified CS algorithm for node localisation. Finally, we have discussed the details of the machine learning model used.

A. SYSTEM ARCHITECTURE
The sensor nodes are considered to be deployed randomly inside a region with an area of X × Y square units. The system consists of M anchor nodes. These anchor nodes act as a reference for all N unknown nodes of the network, which need to be localised. All the sensors can transmit and receive data within a transmission range of R distance units. The anchors' positional information is utilised as a reference to evaluate the coordinates of all the localisable unknown nodes. An unknown node is considered localisable only if it has at least three anchor nodes inside its communication range.
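As an illustration, the localisability rule above can be sketched in a few lines of NumPy. The function name `localisable_mask` and the coordinates used are hypothetical and chosen only for the example; they are not taken from the simulations in this article.

```python
import numpy as np

def localisable_mask(unknown_xy, anchor_xy, tx_range, min_anchors=3):
    """Mark the unknown nodes that have enough anchors within range."""
    # Pairwise Euclidean distances, shape (N unknown nodes, M anchors).
    diff = unknown_xy[:, None, :] - anchor_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Count the anchors inside the transmission range of each unknown node.
    anchors_in_range = (dist <= tx_range).sum(axis=1)
    return anchors_in_range >= min_anchors

# Illustrative deployment (assumed values).
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [90.0, 90.0]])
unknowns = np.array([[5.0, 5.0], [80.0, 80.0]])
mask = localisable_mask(unknowns, anchors, tx_range=20.0)
print(mask)
```

In this toy layout the first unknown node has three anchors within 20 units and is localisable, whereas the second one sees only a single anchor.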

B. DISTANCE CALCULATION AND OPTIMISATION PROBLEM FORMATION
The RSSI is used by the unknown nodes to calculate their distances from the anchor nodes. Sensors experience a power loss during the exchange of information because of shadowing and multipath fading. This path loss is modelled as log-normal shadowing [27], which is expressed as shown in Eq. (1):
$$ PL(d) = PL_0 + 10\,\eta \log_{10}\!\left(\frac{d}{d_0}\right) + X_g \tag{1} $$
In Eq. (1), $PL(d)$, $PL_0$ and $d$ represent the total path loss (transmitted power minus received power), the path loss at a reference distance $d_0$, and the distance between the transmitter and the receiver, respectively. Besides, $\eta$ denotes the path loss exponent, showing how the strength of the received signal decreases with the increase in distance between the transmitter and the receiver [28]. The value of $\eta$ relies on various parameters such as signal frequency, antenna height and the propagation environment [27]. Generally, the value of $\eta$ lies in the range of 2-6 [29] and is higher than 4 for an indoor or shadowed environment [30]. Furthermore, $\sigma$ represents the standard deviation of the shadowing effects; its value varies with the signal propagation environment and is generally higher than 4 dB [31]. $X_g$ is a Gaussian random variable representing the attenuation caused by fading.
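A minimal sketch of how a distance could be recovered from a path loss reading is given below. It assumes the standard log-normal shadowing form PL(d) = PL_0 + 10 η log10(d/d_0) + X_g; the function names and the numerical values (PL_0 = 40 dB, η = 3, d_0 = 1 m) are illustrative assumptions only.

```python
import math

def path_loss_db(d, pl0_db, eta, d0=1.0, shadowing_db=0.0):
    """Path loss in dB at distance d under log-normal shadowing."""
    return pl0_db + 10.0 * eta * math.log10(d / d0) + shadowing_db

def distance_from_path_loss(pl_db, pl0_db, eta, d0=1.0):
    """Invert the model (shadowing term ignored) to estimate the distance."""
    return d0 * 10.0 ** ((pl_db - pl0_db) / (10.0 * eta))

# Assumed example values: PL_0 = 40 dB at d_0 = 1 m, eta = 3.
loss = path_loss_db(10.0, pl0_db=40.0, eta=3.0)
print(loss)                                        # 70.0
print(distance_from_path_loss(loss, 40.0, 3.0))    # 10.0
```

Because the shadowing term is unknown at the receiver, the inverted distance carries an error, which is the ranging error discussed next.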
A ranging error is experienced as a result of log-normal shadowing. This ranging error follows a zero-mean Gaussian distribution. Its variance $\sigma^2$ is expressed in Eq. (2):
$$ \sigma^2 = \gamma^2 D_{ij}^2 \tag{2} $$
where $\gamma$ represents the localisation error between the actual and the measured Euclidean distance $D_{ij}$ between the $i$-th node $(x_i, y_i)$ and the $j$-th node $(x_j, y_j)$, and is known as Gaussian noise having mean zero and standard deviation one. We have considered the value of $\gamma$ equal to 0.1, as it is the most appropriate value used in the literature [20], [32]. Eq. (2) shows that the standard deviation of the ranging error varies linearly with the actual distance between two nodes. The real distance $D_{ij}$ can be calculated using Eq. (3):
$$ D_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{3} $$
A circular disk model has been adopted to establish network connectivity: two nodes $i$ and $j$ can communicate with each other only if $D_{ij} \leq R$, where $R$ is the transmission range of both the sensor nodes.
The measured distance is represented by $\hat{D}_{ij}$ and is given by the expression in Eq. (4):
$$ \hat{D}_{ij} = D_{ij} + N_{ij} \tag{4} $$
where $N_{ij}$ is the ranging error between nodes $i$ and $j$.
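The noisy ranging model can be simulated as sketched below. The sketch assumes that the ranging error N_ij is zero-mean Gaussian with a standard deviation of γ times the true distance, which is how Eq. (2) is read here; the helper name `measured_distance` is hypothetical.

```python
import numpy as np

def measured_distance(true_dist, gamma=0.1, rng=None):
    """Add zero-mean Gaussian ranging noise with std = gamma * true distance."""
    rng = np.random.default_rng() if rng is None else rng
    true_dist = np.asarray(true_dist, dtype=float)
    # Assumption: N_ij ~ N(0, (gamma * D_ij)^2), drawn independently per link.
    noise = rng.normal(loc=0.0, scale=gamma * true_dist)
    return true_dist + noise

rng = np.random.default_rng(42)
samples = measured_distance(np.full(200_000, 10.0), gamma=0.1, rng=rng)
# The empirical spread should be close to gamma * D = 1.0 for D = 10.
print(samples.std())
```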
While calculating the position of the unknown nodes, there always exists a ranging error. So, we need to evaluate the position of the unknown nodes as precisely as possible, considering this inevitable ranging error. To achieve this, we formulate an Optimisation Function (OF), which is the mean of the squared error between the distance from the evaluated node coordinates to each neighbouring anchor node and the corresponding measured distance. Let $(x_i, y_i)$ and $(x_j, y_j)$ be the positions of the $i$-th unknown node and the $j$-th anchor node, respectively. The OF is given in Eq. (5):
$$ f(x_i, y_i) = \frac{1}{M} \sum_{j=1}^{M} \left( \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} - \hat{D}_{ij} \right)^2 \tag{5} $$
where $\hat{D}_{ij}$ is the measured distance between the $i$-th unknown node and the $j$-th anchor node, and $M \geq 3$, because an unknown node should have at least three anchor nodes within its transmission range to be considered localisable (trilateration rule). The $(x_i, y_i)$ corresponding to the minimum value of the OF is the evaluated position of the unknown node.
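The OF can be written compactly as a function of a candidate position. The following is a sketch under the assumption that the OF is the mean squared difference between the candidate-to-anchor distances and the measured distances; the function name and the anchor layout are illustrative.

```python
import numpy as np

def objective(candidate_xy, anchor_xy, measured_dist):
    """Mean squared ranging residual of one candidate position."""
    candidate_xy = np.asarray(candidate_xy, dtype=float)
    anchor_xy = np.asarray(anchor_xy, dtype=float)
    measured_dist = np.asarray(measured_dist, dtype=float)
    if len(anchor_xy) < 3:
        raise ValueError("at least three anchors are needed (trilateration rule)")
    # Distance from the candidate position to every neighbouring anchor.
    est = np.linalg.norm(anchor_xy - candidate_xy, axis=1)
    return float(np.mean((est - measured_dist) ** 2))

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_xy = np.array([3.0, 4.0])
measured = np.linalg.norm(anchors - true_xy, axis=1)  # noiseless for clarity
print(objective(true_xy, anchors, measured))      # 0.0 at the true position
print(objective([0.0, 0.0], anchors, measured))   # larger away from it
```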

C. MODIFIED CS ALGORITHM FOR NODE LOCALISATION
Modified CS is a bio-inspired meta-heuristic algorithm [20] used for node localisation in WSNs. It estimates the coordinates of the unknown nodes in the network by initialising a random population of candidate solutions for every unknown node. Afterwards, it calculates the fitness value of each solution using the OF (Eq. (5)). The worst of the candidate solutions are replaced by a new set of randomly allocated candidate solutions. This process continues over a predetermined number of iterations, after which the coordinates corresponding to the global best solution are selected as the coordinates of each unknown node in the network.
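The loop described above (random initial population, fitness evaluation, replacement of the worst candidates, selection of the best candidate after a fixed number of iterations) can be sketched as follows. This is a deliberately simplified population search and not the modified CS of [20]: a Gaussian random-walk step stands in for the step-size rule of that algorithm, and the names `localise`, `p_a` and `step`, as well as the anchor layout, are assumptions made for the illustration.

```python
import numpy as np

def objective(xy, anchors, measured):
    """Mean squared ranging residual for one candidate position."""
    est = np.linalg.norm(anchors - xy, axis=1)
    return float(np.mean((est - measured) ** 2))

def localise(anchors, measured, side=100.0, n_candidates=25,
             n_iter=100, p_a=0.25, step=1.0, seed=0):
    """Simplified population search for one unknown node."""
    rng = np.random.default_rng(seed)
    # Random initial population of candidate positions inside the area.
    nests = rng.uniform(0.0, side, size=(n_candidates, 2))
    fitness = np.array([objective(n, anchors, measured) for n in nests])
    n_abandon = max(1, int(p_a * n_candidates))
    for _ in range(n_iter):
        # Random-walk proposals; a proposal is kept only if it improves the OF.
        trial = np.clip(nests + step * rng.standard_normal(nests.shape), 0.0, side)
        trial_fit = np.array([objective(t, anchors, measured) for t in trial])
        improved = trial_fit < fitness
        nests[improved] = trial[improved]
        fitness[improved] = trial_fit[improved]
        # Replace the worst candidates with fresh random positions.
        worst = np.argsort(fitness)[-n_abandon:]
        nests[worst] = rng.uniform(0.0, side, size=(n_abandon, 2))
        fitness[worst] = [objective(n, anchors, measured) for n in nests[worst]]
    best = int(np.argmin(fitness))
    return nests[best], fitness[best]

anchors = np.array([[35.0, 50.0], [65.0, 50.0], [50.0, 35.0], [50.0, 65.0]])
true_xy = np.array([52.0, 48.0])
measured = np.linalg.norm(anchors - true_xy, axis=1)  # noiseless for clarity
estimate, value = localise(anchors, measured)
print(estimate, value)
```

Since the best candidate is never among those replaced, the best OF value cannot increase from one iteration to the next, which mirrors the role of the global best solution in the description above.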

D. MACHINE LEARNING MODEL
Broadly, learning algorithms are divided into supervised and unsupervised learning. Further, supervised learning is classified into classification and regression learning, whereas unsupervised learning is classified into clustering and dimension reduction techniques [33].
In this article, our objective is to assess the potential of regression-based machine learning algorithms for estimating the node localisation error. The key objective of regression-based machine learning algorithms is to predict the predictand based on a mapping function. This mapping function is modelled by feeding in a set of features and predictand data, known as the training data set. For this purpose, we have selected the SVR algorithm. SVR is used in many applications such as image processing [34], [35], remote sensing [36] and blockchain [37]. It has excellent generalisation capability along with high accuracy. Also, its computational complexity is independent of the input feature data set [38].

1) FEATURE IMPORTANCE
In this article, we have evaluated the feature importance using a regression ensemble approach. First, we trained a regression ensemble model. It contains the results of boosting one hundred regression trees (the number of ensemble learning cycles) using the LSBoost ensemble aggregation approach, the feature data and the predictand data. We have used regression trees as the weak learners, with a unit learning rate. After creating the ensemble, we calculated the estimate of the predictor (feature) importance by summing the estimates over all the weak learners in it. We then plotted the feature importance graph (Fig. 1). We found that, out of the four features, the node density is the most important feature, followed by the number of iterations. In contrast, the anchor ratio and the transmission range have nearly equal importance. Further, we have estimated the partial dependence of the predictand on the features (Fig. 2). In the same plot, we have also plotted the individual conditional expectation of each data point.

2) HYPER-PARAMETER OPTIMISATION
SVR learns from data and shows excellent performance in prediction and pattern recognition. It also benefits from the big data collected from onboard analysis. The hyper-parameters have a significant influence on SVR's predictive efficiency. This efficiency is determined by hyper-parameters such as C and ε, which help in identifying the training error. If the residual is higher than the hyper-parameter ε, then the parameter C penalises the training error. Thus, too small C values lead to model under-fitting, while too large C values lead to over-fitting and a higher computational cost.
In this article, we have used the grid search approach to optimise the hyper-parameters of the SVR model. We optimise the penalty factor, C, in the SVR model while keeping ε constant. We have selected the Mean Square Error (MSE) as the loss or objective function for the optimisation, as given in Eq. (6):
$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6} $$
where $y_i$ and $\hat{y}_i$ denote the observed and the predicted values of the predictand, respectively, and $n$ is the number of samples. We have selected the C value that corresponds to the minimum value of the objective function for all the three methods.

3) SUPPORT VECTOR REGRESSION MODEL
SVR, a supervised learning technique initially proposed by Drucker et al., is based on the concept of Vapnik's support vectors [39], [40]. SVR aims at reducing the error by determining the hyperplane and minimising the range between the predicted and the observed values. Minimising the value of $w$ in Eq. (7) is equivalent to maximising the margin, as shown in Fig. 3.
$$ \min \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \tag{7} $$
where $\sum_{i=1}^{n} \xi_i$ represents the empirical error. Hence, to minimise this error, Eq. (8) is used:
$$ f(x) = \sum_{i=1}^{n} (\alpha_i^{*} - \alpha_i)\, K(x, x_i) + B \tag{8} $$
where $\alpha_i^{*}, \alpha_i \geq 0$ represent the Lagrange multipliers, $K(x, x_i)$ represents the kernel function and $B$ represents the bias term. In this study, we have used the polynomial kernel given by Eq. (9):
$$ K(x, x_i) = \left( x^{T} x_i + \gamma \right)^{d} \tag{9} $$
where $d$ is the polynomial degree and $\gamma$ is the polynomial constant. SVR delivers better predictive performance than other algorithms such as Linear Regression, KNN and Elastic Net, due to its improved optimisation strategies for a broad set of variables. Moreover, it is also flexible in dealing with geometry, transmission, data generalisation and the additional functionality of the kernel [41]. This additional functionality enhances the model capacity for predictions by considering the quality of the features [42].
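To make the role of the kernel expansion concrete, the sketch below evaluates a prediction of the form f(x) = Σ (α*_i − α_i) K(x, x_i) + B. It assumes that the polynomial kernel takes the common form (x·x_i + γ)^d; the support vectors, dual coefficients and bias are made-up numbers rather than a trained model.

```python
import numpy as np

def poly_kernel(x, support_vectors, degree=2, const=1.0):
    """Polynomial kernel between x and each support vector (assumed form)."""
    return (support_vectors @ x + const) ** degree

def svr_predict(x, support_vectors, dual_coef, bias, degree=2, const=1.0):
    """Evaluate the kernel expansion; dual_coef holds (alpha*_i - alpha_i)."""
    k = poly_kernel(x, support_vectors, degree, const)
    return float(dual_coef @ k + bias)

# Hypothetical values, for illustration only.
support_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])
dual_coef = np.array([0.5, -0.25])
prediction = svr_predict(np.array([1.0, 1.0]), support_vectors, dual_coef, bias=0.1)
print(prediction)  # 0.5 * 4 - 0.25 * 4 + 0.1 = 1.1
```

Only the training samples with non-zero dual coefficients (the support vectors) contribute to the sum, which keeps the prediction step inexpensive.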
The training samples influence the SVR model's fitting performance, since the SVR algorithm is sensitive to interference in the training data. Besides, SVR is useful in resolving regression problems with high-dimensional features, and it functions well even if the number of features is larger than the number of samples [43]. In this study, we have extracted four features, namely anchor ratio, transmission range, node density and the number of iterations, from the modified CS algorithm simulation.
Feature scaling is essential for SVR because, when one feature has a greater magnitude than the others, it will dominate the distance measurement. To avoid this, we have used various standardisation approaches, based on which we have proposed three methods, as shown in Fig. 4. Method I is S-SVR (Scaling SVR). In this method, we first standardised the features using Eq. (10):
$$ x_s = \frac{x}{\sigma} \tag{10} $$
where $x$ is the feature vector, $x_s$ is the standardised data, and $\sigma$ is the standard deviation of the feature vector. Method II is Z-SVR (Z-score SVR). In this method, we have standardised the features using Eq. (11):
$$ x_s = \frac{x - \bar{x}}{\sigma} \tag{11} $$
where $\bar{x}$ is the mean of the feature vector. Method III is R-SVR (Range SVR). In this method, we have standardised the features using Eq. (12):
$$ x_s = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{12} $$
Afterwards, we trained and tested the SVR models in a 70:30 ratio, as shown in Fig. 4. In this study, the dimension of each feature vector is 107 × 1. Hence, we have used 75 samples for training and the remaining 32 for testing.
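The three standardisation variants and the 70:30 split can be sketched as follows. Several details are assumptions made for the illustration: the sample standard deviation (`ddof=1`) is used, the range variant is taken to be min-max scaling, the split is random, and the feature matrix is synthetic rather than the simulated data set.

```python
import numpy as np

def scale_by_std(x):
    """Method I style scaling: divide each feature by its standard deviation."""
    x = np.asarray(x, dtype=float)
    return x / x.std(axis=0, ddof=1)

def z_score(x):
    """Method II style scaling: zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

def range_scale(x):
    """Method III style scaling (assumed min-max): map each feature to [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    return (x - x_min) / (x.max(axis=0) - x_min)

def split_70_30(features, target, seed=0):
    """Random 70:30 split (the use of random shuffling is an assumption)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_train = round(0.7 * len(features))
    train, test = order[:n_train], order[n_train:]
    return features[train], features[test], target[train], target[test]

# Synthetic stand-in for the 107 samples of four features.
rng = np.random.default_rng(1)
features = rng.uniform(size=(107, 4)) * [30.0, 25.0, 300.0, 100.0]
target = rng.uniform(size=107)
x_train, x_test, y_train, y_test = split_70_30(range_scale(features), target)
print(x_train.shape, x_test.shape)  # (75, 4) (32, 4)
```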

IV. SIMULATION EXPERIMENT
In this section, we have discussed the simulation environment of the modified CS algorithm and the SVR model.

A. ALE SIMULATION USING MODIFIED CS ALGORITHM
For the calculation of ALE, we set up a simulation environment of 100 × 100 m², and we vary parameters such as the node density, the anchor ratio and the transmission range of each node to calculate the ALE for different network configurations. Modified CS has some tuning parameters, such as the step size α and the mutation probability P_a, which lie in the ranges 0.9 to 1.0 and 0.05 to 0.25, respectively. The number of candidate solutions is fixed at 25. The maximum number of iterations allowed to localise each unknown node is set to 100.

B. SVR SIMULATION FOR ALE PREDICTION
For simulating the SVR model, we performed the hyper-parameter tuning through the grid search algorithm. In doing so, we fixed one of the hyper-parameters (i.e., ε at 0.01) and applied the grid search algorithm to find the value of the other hyper-parameter. We created a 100 × 100 grid for the penalty factor, C. Each grid cell represents a specific value of C. The grid search algorithm finds the optimal grid cell, which corresponds to the minimum value of the MSE. The range of optimal C for all the three methods, along with the other simulation parameter values, is given in Table 2.

V. RESULTS
In this section, we have presented the results of the method I, II and III for ALE prediction in the respective subsections.
We have plotted a linear regression curve between the predicted ALE and the simulated ALE for comparison.
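The comparison metrics used in this section can be computed as sketched below. The function name `evaluate` and the sample values are hypothetical; the numbers are made up to illustrate the calculation and are not results from this study.

```python
import numpy as np

def evaluate(simulated, predicted):
    """Return the correlation coefficient, RMSE and fitted regression line."""
    simulated = np.asarray(simulated, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = float(np.corrcoef(simulated, predicted)[0, 1])
    rmse = float(np.sqrt(np.mean((predicted - simulated) ** 2)))
    # Least-squares straight line: predicted = slope * simulated + intercept.
    slope, intercept = np.polyfit(simulated, predicted, deg=1)
    return r, rmse, float(slope), float(intercept)

# Made-up ALE values in metres, for illustration only.
simulated_ale = np.array([0.2, 0.4, 0.6, 0.8])
predicted_ale = simulated_ale + 0.1
r, rmse, slope, intercept = evaluate(simulated_ale, predicted_ale)
print(r, rmse, slope, intercept)
```

A constant offset between the two series, as in this toy example, leaves R at one while the RMSE equals the offset, which is why both metrics are reported together.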

A. PERFORMANCE OF THE METHOD I
We have compared the predicted ALE results obtained by method I with the simulated results of the modified CS algorithm. We found that the predicted results accorded well with the simulated results and gathered along the straight regression line with mild scattering (Fig. 5). The shaded grey region corresponds to the 95% Confidence Interval (CI).

B. PERFORMANCE OF THE METHOD II
Once we had calculated the predicted ALE through method II, we evaluated its performance against the simulated results of the modified CS algorithm. In doing so, we found a good agreement between the two, with R = 0.81 and RMSE = 0.20 m (Fig. 6). However, some observed values lie outside the CI of the regression line due to the overestimation of the ALE value by the SVR model. The overestimation probably occurs due to a positive bias. This type of error is a systematic error, which is mainly due to the model or the approach used.

C. PERFORMANCE OF THE METHOD III
We have compared the predicted ALE of method III with the simulated ALE obtained through the modified CS algorithm. In this case also, we found a strong correlation between the variables (Fig. 7), with R = 0.82 and RMSE = 0.15 m.

VI. DISCUSSION
In this section, we first discuss the performance of all the three methods in terms of computational efficiency. To do so, we have calculated the computational time required to predict or calculate the ALE. Further, to ensure a fair comparison of the proposed methods with the existing modified CS, we have compared the obtained results with the computational time of the modified CS simulations for three different configurations: the computational times for node densities of 100, 200 and 300 have been plotted, taking a transmission range of 20 m and an anchor node of 20 in a 100 × 100 m² area (Fig. 8). In this figure, the time axis is in log scale. The dotted line shows the computational time required by all the three methods when they are compiled in a single script.
On comparing, we found that the time taken by all the three methods is significantly lower than the time taken by the modified CS algorithm. Further, method III took the least time, followed by method II and method I, respectively.
Various other studies have been carried out to improve the localisation accuracy, based on the Adaptive Neural Fuzzy Inference System (ANFIS) [44], with a Mean Absolute Error (MAE) of 0.283 m, and on a backpropagation-based artificial neural network (BP-ANN) model [45], with a mean localisation error of 0.921 m. Both these studies have reported a high localisation accuracy. In this study, we have reported a minimal RMSE of 0.15 m. However, to ensure a fair evaluation of the proposed methods, we need to compare the results of SVR with another regression-based machine learning model. We have selected Gaussian Process Regression (GPR) for comparison because it is a widely used, robust and accurate model [46], [47]. We have compared the obtained results with the corresponding variants of GPR. The three corresponding GPR variants are Scaling GPR (S-GPR), Z-score GPR (Z-GPR) and Range GPR (R-GPR), as illustrated in Table 3. We have used R, RMSE and computational time for comparing the results of all the methods. In doing so, we found that method III is the most effective among all the methods.
Although the proposed methods perform better than the corresponding variant of the GPR, the SVR based methods are susceptible to under-performance when dealing with noisy data. In such scenarios, GPR is more likely to perform better [48]. Also, the performance of the proposed methods depends on the choice of the kernel and features.

VII. CONCLUSION
In this article, we presented and investigated three SVR-based machine learning models for ALE prediction. These methods are defined based on the standardisation method used. In methods I, II and III, we have used the scaling, Z-score and range standardisation methods, respectively. Afterwards, we trained the SVR model with the polynomial kernel using the standardised data and evaluated its performance using the correlation coefficient and RMSE metrics. In doing so, we found that range standardisation (Eq. (12)) of the features (i.e., method III) results in the lowest RMSE in ALE prediction. Also, the correlation coefficient is highest in method III.
Further, we have also compared the performance of all the three models in terms of the computation time requirement.
Again, method III performs better than the other two methods. It requires less time than the other two methods. Hence, method III can be used for ALE prediction during network set-up process to cut down the time requirements.