AP Optimization for Wi-Fi Indoor Positioning-Based on RSS Feature Fuzzy Mapping and Clustering

In indoor environments, Access Points (APs) are widely deployed throughout buildings, so AP optimization-based Wi-Fi indoor positioning is of great significance for achieving satisfactory indoor Location-based Services (LBSs). However, current Wi-Fi indoor positioning methods rarely consider the diversity of Received Signal Strength (RSS) features for AP optimization, which may result in low positioning accuracy and high positioning overhead. To deal with these issues, this article proposes a new concept of multi-dimensional RSS feature fuzzy mapping and clustering for AP optimization in Wi-Fi indoor positioning. Extensive experiments conducted in an actual indoor environment show that, compared with existing positioning methods, the proposed method not only achieves higher positioning accuracy by using the optimized APs but also reduces the positioning overhead in the online phase.


I. INTRODUCTION
In recent years, with people's longing for a convenient and technologically equipped smart life and the development of communication technology, Location-based Services (LBSs) [1] have taken on a broader meaning than conventional indoor positioning, covering applications such as indoor collaborative navigation guidance for personnel, intelligent warehouse location management, and merchant advertising. In such a massively interconnected smart-device environment [2], [3], how to ensure the reliability and convenience of LBSs with high data rate and spectral efficiency has become the main focus of many researchers at home and abroad. Besides, in the actual indoor environment, due to the obstruction of furniture, walls, and many other obstacles including human body shadowing, indoor wireless signals are severely affected by signal strength attenuation and multi-path fading during propagation, which limits the development of indoor positioning technologies. At present, there are a large number of common indoor positioning technologies, such as Ultra-wide Band (UWB) [4]-[6], Infrared Ray (IR) [7], Radio Frequency Identification (RFID) [8]-[10], Bluetooth [11], Ultrasonic Wave (UW) [12], ZigBee [13], Visible Light (VL) [14], and Wi-Fi [15]. Most of them involve a high deployment cost due to the additional hardware requirement, which makes them difficult to apply widely. In comparison, Wi-Fi indoor positioning is a good candidate in indoor scenarios due to its advantages of easy deployment, low cost, and extensive coverage range [16].
The associate editor coordinating the review of this manuscript and approving it for publication was Kunjie Xu.
In the era of ubiquitous computing [17], technologies such as terminal communication, data collection, and information processing have developed substantially, and Wi-Fi indoor positioning has been highly valued and deeply studied as a hot topic. Without incurring any additional hardware, Wi-Fi indoor positioning methods generally fall into two categories, namely the propagation model-based method and the location fingerprinting-based method. The former is based on the signal propagation model, which utilizes the relationship between RSS features and the corresponding propagation distances. In this method, the target position is generally estimated by using a geometric positioning algorithm, such as the Approximate Point-in-triangulation Test (APIT) [18], with the known locations of Wi-Fi Access Points (APs). However, since Non-line-of-sight (NLOS) scenarios often exist during signal propagation in the indoor environment [19], constructing an appropriate signal propagation model that precisely describes the diversity of Received Signal Strength (RSS) features in the target environment is challenging [20]. By comparison, the latter does not depend on the construction of a signal propagation model and involves two phases, namely the offline phase and the online phase.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the offline phase, we collect the RSS readings at several pre-calibrated points in the target environment to construct the fingerprint database, and these pre-calibrated points are called Reference Points (RPs). Then, in the online phase, a proximity matching algorithm, such as the K-nearest Neighbor (KNN) [21], is used to estimate the target position by matching the newly-collected RSS features against the pre-constructed location fingerprint database.
The rest of this article is organized as follows. An overview of related works on the existing Wi-Fi indoor positioning technologies is given in Section II. Then, the detailed steps of the proposed method in the offline and online phases are illustrated in Sections III and IV respectively. In Section V, the effectiveness of the proposed method in terms of the positioning accuracy and overhead is verified through extensive experiments conducted in an actual indoor environment. Finally, Section VI summarizes this article and provides the future direction.

II. RELATED WORK
In recent years, many researchers at home and abroad have shown great interest in developing Wi-Fi indoor positioning technology to provide more satisfactory LBSs. For example, the authors in [22] use the T-test method to optimize the RSS sample size for Wi-Fi positioning, where the Operating Characteristics (OC) function is applied to depict the relationship between the positioning accuracy and the RSS sample size. The authors in [23] use the nonlinear Partial Least Squares (PLS) method to extract RSS features and eliminate the correlation between different variables by reducing the variable dimensions. These two methods can reduce the overhead of location fingerprint database construction, but they do not take into account the differences in the location resolution of RSS features from different APs or the existence of redundant APs, which decreases the positioning accuracy. In response to this problem, considering the diversity of RSS features and the randomness of the noise in the actual indoor environment, the authors in [24] propose a new collaborative positioning method, namely Maxlifd. In this method, joint maximum likelihood estimation is used to integrate the prior information of location fingerprints and the mutual distances between them, based on which the Semi-definite Program (SDP) algorithm is applied to estimate the target position. This method depends on a single RSS feature for positioning, and the corresponding positioning overhead in the online phase is relatively high, which limits its practical applications.
Different from the approaches above, a novel AP optimization method for Wi-Fi indoor positioning based on multi-dimensional RSS feature fuzzy mapping and clustering is proposed in this article, as shown in Fig. 1. Specifically, in the offline phase, we perform the offline RSS data collection and pre-processing, and then calculate the Maximum Information Coefficient (MIC) between different APs as the corresponding AP correlation, based on which the fuzzy clustering algorithm is used to select the non-redundant APs for positioning. In addition, we also utilize the AP information gain ratio to calculate the RSS feature fuzzy weight, which plays an essential role in calculating the online AP fuzzy membership. Then, in the online phase, by selecting APs with the high location resolution (or called the high online AP fuzzy membership) for positioning, the target position is estimated based on the Bayesian positioning approach.

III. OFFLINE PHASE
A. AP REDUNDANCY REDUCTION
In the offline phase, the first task is to select the non-redundant APs for positioning by eliminating the APs with the high substitutability. To this end, we do the offline RSS data collection and pre-processing, calculate the MIC between different APs as the corresponding AP correlation, and then perform fuzzy clustering to select the non-redundant APs for positioning. The specific process of achieving the AP redundancy reduction is given below.

1) RSS DATA PRE-PROCESSING
By supposing that there are $n$ APs and $m$ RPs in the environment, we set $rss^{s}_{ij}$ ($i = 1, \cdots, n$; $j = 1, \cdots, m$; $s = 1, \cdots, \chi$) as the $s$-th offline RSS feature from the $i$-th AP at the $j$-th RP. The features considered are the RSS mean, RSS variance, RSS maximum, RSS minimum, the difference between the RSS maximum and minimum, RSS median, RSS with the highest probability of occurrence, and the probability of exceeding the RSS mean [25]. Specifically, the RSS mean is the average value of each group of RSS data; the RSS variance is the variance of each group of RSS data; the RSS maximum is the maximum value of each group of RSS data; the RSS minimum is the minimum value of each group of RSS data; the difference between the RSS maximum and minimum is the difference between the maximum and minimum values of each group of RSS data; the RSS median is the middle value of each group of RSS data after the data are arranged in order of size; the RSS with the highest probability of occurrence is the RSS value with the highest occurrence frequency in each group of RSS data; and the probability of exceeding the RSS mean is the proportion of each group of RSS data that is higher than the RSS mean.
Based on them, we can construct the offline RSS feature matrix as $Z^{off} = [z^{off}_{1}, \cdots, z^{off}_{\chi}]^{T}$, where the notation "T" represents the transpose operation, $z^{off}_{s} = [z^{off}_{s1}, \cdots, z^{off}_{sn}]$ ($s = 1, \cdots, \chi$), and $z^{off}_{si}$ ($i = 1, \cdots, n$) represents the $s$-th offline RSS feature from the $i$-th AP. Each $z^{off}_{si}$ is first transformed into an intermediate value (the expression is missing from this copy), and the base-10 logarithm ($\lg$) of that value is then taken to reduce its order of magnitude. After pre-processing the RSS data, we construct the standardized offline RSS feature matrix, in which $\hat{z}_{si}$ represents the $s$-th standardized offline RSS feature from the $i$-th AP.
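The eight features listed above can be illustrated with a short sketch. This is only a minimal example under stated assumptions: the function name `rss_features`, the dictionary keys, the use of the population variance, and the sample readings are all hypothetical and are not taken from the article.

```python
import numpy as np

def rss_features(samples):
    """Return the eight RSS features for one AP at one RP.

    `samples` is a 1-D sequence of RSS readings in dBm (hypothetical input).
    """
    x = np.asarray(samples, dtype=float)
    values, counts = np.unique(x, return_counts=True)
    mean = x.mean()
    return {
        "mean": mean,
        "variance": x.var(),                # population variance (assumption)
        "maximum": x.max(),
        "minimum": x.min(),
        "range": x.max() - x.min(),         # maximum minus minimum
        "median": np.median(x),
        "mode": values[np.argmax(counts)],  # most frequent value; ties -> smallest
        "p_above_mean": np.mean(x > mean),  # share of readings above the mean
    }

# Ten hypothetical readings, matching the 10 s / 1 Hz collection at an RP.
feats = rss_features([-60, -62, -60, -65, -58, -60, -61, -63, -60, -59])
```

Stacking these values over all APs and RPs would give the entries $rss^{s}_{ij}$ from which the offline feature matrix is built.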

2) AP CORRELATION CALCULATION
For any two APs (e.g., the $p$-th and $q$-th APs), we set $RSS_{p} = [\hat{z}_{1p}, \cdots, \hat{z}_{\chi p}]$ and $RSS_{q} = [\hat{z}_{1q}, \cdots, \hat{z}_{\chi q}]$ as their standardized offline RSS features, and then construct the set of scatters corresponding to $RSS_{p}$ and $RSS_{q}$ as $D_{pq} = \{(\hat{z}_{sp}, \hat{z}_{sq}),\ s = 1, \cdots, \chi\}$. In this case, for any given division $G_{r \times c}$ with the scale $r \times c$, the scatter plot composed of the scatters in $D_{pq}$ can be divided into $r \times c$ regions, where $r$ and $c$ stand for the numbers of divided rows and columns respectively. Based on this, we calculate the mutual information for the $p$-th and $q$-th APs under the division $G_{r \times c}$ as $\sum p(\hat{z}_{sp}, \hat{z}_{sq}) \log_{2} \frac{p(\hat{z}_{sp}, \hat{z}_{sq})}{p(\hat{z}_{sp})\, p(\hat{z}_{sq})}$, where $p(\hat{z}_{sp})$ and $p(\hat{z}_{sq})$ stand for the marginal probability densities of $\hat{z}_{sp}$ and $\hat{z}_{sq}$ respectively, and $p(\hat{z}_{sp}, \hat{z}_{sq})$ represents the joint probability density of $\hat{z}_{sp}$ and $\hat{z}_{sq}$. The calculation of $p(\hat{z}_{sp}, \hat{z}_{sq})$ and $p(\hat{z}_{sp})$ is shown as follows.
i) To calculate $p(\hat{z}_{sp}, \hat{z}_{sq})$, we use the concept of the two-dimensional histogram [26]: for the $f$-th region ($f = 1, \cdots, r \times c$) under the division $G_{r \times c}$, the density is estimated from the number of scatters in the region, $\mathrm{card}_{f}$, and the area of the region, $\mathrm{area}_{f}$.
ii) To calculate $p(\hat{z}_{sp})$, we define $\hat{z}^{max}_{p} = \max\{\hat{z}_{1p}, \cdots, \hat{z}_{\chi p}\}$ and $\hat{z}^{min}_{p} = \min\{\hat{z}_{1p}, \cdots, \hat{z}_{\chi p}\}$, and divide the interval $P = [\hat{z}^{min}_{p}, \hat{z}^{max}_{p}]$ into $t = |P| / \tau$ sub-intervals with the same length, where $\tau$ represents the length of each sub-interval. In this case, we set $[\hat{z}^{min}_{p} + (\eta - 1)\tau,\ \hat{z}^{min}_{p} + \eta\tau]$ as the $\eta$-th sub-interval, and use the concept of the one-dimensional histogram [27] to obtain $p(\hat{z}_{sp})$ from $\mathrm{card}_{\eta}$, the number of elements of the set $\{\hat{z}_{1p}, \cdots, \hat{z}_{\chi p}\}$ that fall in the $\eta$-th sub-interval.
For each scale $r \times c < m^{0.6}$ [28], we calculate the corresponding mutual information for the $p$-th and $q$-th APs under different divisions, and then define the maximum of the calculated mutual information as the mutual information with the scale $r \times c$, i.e., $I_{r \times c}(p, q)$. Besides, to compare the mutual information across different scales, we normalize $I_{r \times c}(p, q)$ as $\hat{I}_{r \times c} = I_{r \times c}(p, q) / \log_{2} \min\{r, c\}$, and then obtain the corresponding mutual information feature set as $I(p, q) = \{\hat{I}_{r \times c},\ r \times c < m^{0.6}\}$. Here, the maximum element of $I(p, q)$ is defined as the MIC between the $p$-th and $q$-th APs, given as $m_{pq} = \max I(p, q)$. Thus, a larger $m_{pq}$ indicates a higher correlation between the $p$-th and $q$-th APs, which means a higher mutual substitutability between these two APs for positioning.
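The MIC calculation can be sketched as follows. This is a simplified illustration rather than the procedure of the article: it evaluates only one equal-width division per scale $r \times c$ (the full definition takes the maximum over all divisions of that scale), it works with cell probabilities from `numpy.histogram2d`, and the function names `mutual_information` and `approx_mic` are hypothetical.

```python
import numpy as np

def mutual_information(x, y, r, c):
    """Mutual information (in bits) of x and y on an r-by-c equal-width grid."""
    counts, _, _ = np.histogram2d(x, y, bins=(r, c))
    pxy = counts / counts.sum()              # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (r, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, c)
    nz = pxy > 0                             # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def approx_mic(x, y, exponent=0.6):
    """Largest normalized mutual information over scales with r * c < N**exponent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    limit = len(x) ** exponent               # sample count plays the role of m
    best = 0.0
    for r in range(2, int(limit) + 1):
        for c in range(2, int(limit) + 1):
            if r * c >= limit:
                break
            score = mutual_information(x, y, r, c) / np.log2(min(r, c))
            best = max(best, score)
    return best

x = np.linspace(0.0, 1.0, 100)
noise = np.random.default_rng(0).random(100)
mic_same = approx_mic(x, x)       # identical features: fully substitutable
mic_noise = approx_mic(x, noise)  # unrelated features: low correlation
```

Because only equal-width grids are tried, this sketch can underestimate the true MIC; it is meant to show the normalization by $\log_{2} \min\{r, c\}$ and the scale limit, not to replace a full MIC implementation.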

3) NON-REDUNDANT AP SELECTION
According to the MIC between different APs, we construct the MIC matrix (also called the fuzzy similarity matrix) with respect to the AP correlation as $M = [m_{pq}]_{n \times n}$. Based on the equivalence theory of the fuzzy similarity matrix [29], we start from $M$ and calculate the second power successively, i.e., $M^{2l} = M^{l} \bullet M^{l}$ ($l = 1, \cdots, \log_{2} n$), where the notation "•" represents the Zadeh synthesis (max-min composition) operation, such that the element on the $p$-th row and $q$-th column of $M^{l} \bullet M^{l}$ is $\max_{k} \min(m^{l}_{pk}, m^{l}_{kq})$. When the relation $M^{l} \bullet M^{l} = M^{l}$ first appears, $M^{l}$ is defined as the fuzzy equivalent matrix of $M$, denoted as $M^{*}$. Then, we construct the $\beta$-cut matrix of $M^{*}$, i.e., $M^{*}_{\beta}$, in which the element on the $p$-th row and $q$-th column is 1 if $m^{*}_{pq} \geq \beta$ and 0 otherwise. After obtaining $M^{*}_{\beta}$, we continue to perform fuzzy clustering [30] on the APs to obtain $C_{1}, \cdots, C_{K}$, where $C_{k}$ ($k = 1, \cdots, K$) represents the $k$-th set of correlated APs and $K$ represents the number of clusters. Finally, to conduct the non-redundant AP selection, we select the AP with the highest location resolution (which is illustrated in detail in Section IV-A) in each cluster for positioning.
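The clustering step can be sketched in a few lines. The example below is a minimal illustration, assuming a symmetric similarity matrix with ones on the diagonal; the function names and the 3-AP matrix are hypothetical and not taken from the article.

```python
import numpy as np

def maxmin_compose(a, b):
    """Zadeh (max-min) composition: result[p, q] = max_k min(a[p, k], b[k, q])."""
    return np.max(np.minimum(a[:, :, None], b[None, :, :]), axis=1)

def fuzzy_equivalent(m):
    """Square M repeatedly until M composed with itself equals M."""
    m = np.asarray(m, dtype=float)
    while True:
        squared = maxmin_compose(m, m)
        if np.array_equal(squared, m):
            return m
        m = squared

def beta_cut_clusters(m_star, beta):
    """Group AP indices whose equivalent similarity is at least beta."""
    cut = m_star >= beta                     # the beta-cut matrix as booleans
    clusters, seen = [], set()
    for p in range(len(cut)):
        if p not in seen:
            members = [int(q) for q in np.flatnonzero(cut[p])]
            clusters.append(members)
            seen.update(members)
    return clusters

# Hypothetical MIC matrix for three APs.
M = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
M_star = fuzzy_equivalent(M)
clusters = beta_cut_clusters(M_star, beta=0.5)
```

With this matrix, raising $\beta$ from 0.3 to 0.5 to 0.9 splits the APs into 1, 2, and 3 clusters respectively, which matches the later observation that a larger $\beta$ yields more clusters and therefore more APs used for positioning.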

B. LOCATION RESOLUTION IDENTIFICATION
The second task in the offline phase is to evaluate the location resolution of RSS features, which can help in selecting APs with high location resolution for positioning. To achieve this goal, we calculate the RSS feature fuzzy weight from the AP information gain ratio, and then use it during the online AP fuzzy membership calculation to obtain APs with the high location resolution.

1) AP INFORMATION GAIN RATIO CALCULATION
According to $Z^{off}$, we divide the RPs into $U$ sets, denoted as $C_{1i}, \cdots, C_{Ui}$, where the RPs in each set $C_{ui}$ ($u = 1, \cdots, U$) have the same offline RSS features from the $i$-th AP. Also, we divide the RPs into $V$ sets according to $z^{off}_{s}$ ($s = 1, \cdots, \chi$), denoted as $D^{s}_{1i}, \cdots, D^{s}_{Vi}$, where the RPs in each set $D^{s}_{vi}$ ($v = 1, \cdots, V$) have the same $s$-th offline RSS feature from the $i$-th AP. Then, we calculate the uncertainty of the RPs with respect to the $i$-th AP, $H(P|i)$, where $H(P|i)_{s}$ represents the uncertainty of the RP classification according to the $s$-th ($s = 1, \cdots, \chi$) of the $\chi$ multi-dimensional RSS features of the $i$-th AP.
Finally, we calculate the information gain ratio [30] $\psi^{i}_{s}$ for the $s$-th offline RSS feature from the $i$-th AP from the information gain $H(P) - H(P|i)$, where $H(P) = \log_{2} m$ under the condition that each RP has an equal prior probability. In this case, the AP information gain ratio matrix for the $\chi$ offline RSS features is constructed from the rows $\psi_{s} = [\psi^{1}_{s}, \cdots, \psi^{n}_{s}]$, where $\psi^{i}_{s}$ represents the AP information gain ratio for the $s$-th offline RSS feature from the $i$-th AP. A larger $\psi^{i}_{s}$ indicates a larger decrease in the uncertainty of the RPs when the $s$-th offline RSS feature from the $i$-th AP is considered.
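The information gain that underlies $\psi^{i}_{s}$ can be illustrated with a small sketch. It assumes equal prior probability for each RP and uses the standard conditional entropy of a partition, so that RPs sharing the same feature value remain indistinguishable; the normalizing term that turns the gain into a ratio is not shown in this text and is therefore left out. The function name and the feature values are hypothetical.

```python
import math
from collections import Counter

def information_gain(feature_values):
    """Information gain (bits) of one discretized feature over m RPs.

    feature_values[j] is the feature value observed at RP j; RPs with equal
    values fall into the same set and cannot be told apart by this feature.
    """
    m = len(feature_values)
    set_sizes = Counter(feature_values).values()
    h_prior = math.log2(m)                                   # H(P) = log2(m)
    # Remaining uncertainty: inside a set of size k, k RPs are equally likely.
    h_cond = sum(k / m * math.log2(k) for k in set_sizes)
    return h_prior - h_cond

gain_pairs = information_gain([-60, -60, -70, -70])      # two sets of two RPs
gain_unique = information_gain([-60, -62, -70, -75])     # every RP is distinct
gain_constant = information_gain([-60, -60, -60, -60])   # feature says nothing
```

A feature that takes a different value at every RP removes all uncertainty, while a constant feature removes none, which is the sense in which a larger value marks a higher location resolution.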

2) RSS FEATURE FUZZY WEIGHT CALCULATION
After the AP information gain ratio is obtained, we continue to calculate the RSS feature fuzzy weight. In concrete terms, we first normalize the AP information gain ratio matrix to obtain the offline AP fuzzy membership matrix $B^{off}$. Finally, from (19), we obtain the solution to the $i$-th equation in (18) as $A^{(i)} = a^{i}_{1} \cup \cdots \cup a^{i}_{\chi}$. Then, the solution to (18) is given as $A^{off} = [a^{off}_{1}, \cdots, a^{off}_{\chi}] = A^{(1)} \cap \cdots \cap A^{(n)}$, which is also recognized as the RSS feature fuzzy weight matrix. Here, it is noteworthy that a larger $a^{off}_{s}$ indicates a higher location resolution of the $s$-th offline RSS feature.

IV. ONLINE PHASE
After that, we construct the fuzzy mapping from Z onli to the set of online AP fuzzy membership ℘ = r onli 1 , · · · , r onli From (23)

B. TARGET POSITION ESTIMATION
We suppose that the coordinates of the $j$-th RP and the $\omega$-th TP are $(x_{j}, y_{j})$ and $(x_{\omega}, y_{\omega})$ respectively. By selecting the AP with the highest location resolution in each cluster of correlated APs for positioning, the position of the $\omega$-th TP is estimated based on the Bayesian positioning algorithm, where $rss^{s}_{k\omega}$ represents the $s$-th online RSS feature from the $k$-th selected AP at the $\omega$-th TP, $p(rss^{s}_{k\omega} \mid x_{j}, y_{j})$ represents the conditional probability density of $rss^{s}_{k\omega}$ at $(x_{j}, y_{j})$, and $\mu^{s}_{kj}$ and $\delta^{s}_{kj}$ stand for the mean and standard deviation of the $s$-th offline RSS feature from the $k$-th selected AP at the $j$-th RP respectively.
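A Bayesian estimate of this kind can be sketched as below. The sketch rests on several assumptions that are not confirmed by this text: each conditional density is Gaussian with the stated mean and standard deviation, the features are treated as independent, all RPs have equal prior probability, and the estimate is the coordinates of the most probable RP. The function name `bayes_locate`, the array shapes, and the numbers are hypothetical.

```python
import numpy as np

def bayes_locate(online, mu, delta, rp_xy):
    """Return the coordinates of the RP with the highest posterior probability.

    online : (K, S) online features from K selected APs, S features each
    mu     : (m, K, S) offline feature means at each of the m RPs
    delta  : (m, K, S) offline feature standard deviations (all > 0)
    rp_xy  : (m, 2) RP coordinates
    """
    online = np.asarray(online, dtype=float)
    mu = np.asarray(mu, dtype=float)
    delta = np.asarray(delta, dtype=float)
    rp_xy = np.asarray(rp_xy, dtype=float)
    # Log of the Gaussian density for every RP, AP, and feature.
    log_pdf = (-0.5 * ((online - mu) / delta) ** 2
               - np.log(delta) - 0.5 * np.log(2.0 * np.pi))
    # Independence: sum the logs; equal priors: posterior is proportional to likelihood.
    log_post = log_pdf.sum(axis=(1, 2))
    return rp_xy[np.argmax(log_post)]

# Three hypothetical RPs on a line, two selected APs, one feature (RSS mean).
rp_xy = [[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]]
mu = [[[-50.0], [-70.0]], [[-60.0], [-60.0]], [[-70.0], [-50.0]]]
delta = np.full((3, 2, 1), 2.0)
estimate = bayes_locate([[-61.0], [-59.0]], mu, delta, rp_xy)
```

Working in the log domain avoids numerical underflow when many APs and features are multiplied together; a weighted average of RP coordinates by posterior probability would be an alternative design.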

V. EXPERIMENTAL RESULTS
In an actual indoor environment with dimensions of 56.93 m by 20.08 m, two straight corridors and one lab, namely Areas 1, 2, and 3, are selected as the testing area shown in Fig. 2. A total of 8 APs (denoted AP 1 to AP 8) are deployed in this area, in which 88 RPs and 25 TPs are uniformly calibrated. Besides, at each RP and TP, we collect RSS data for 10 s and 5 s respectively at a sampling rate of 1 Hz. Fig. 3 shows the variation of the average positioning error with the increase of the online AP fuzzy membership under different noise power conditions. From this figure, we can find that as the online AP fuzzy membership increases, the average positioning error shows a downward trend, which indicates that the online AP fuzzy membership is positively correlated with the contribution of the corresponding AP to the positioning accuracy.

A. ANALYSIS OF SYSTEM PARAMETERS
To better illustrate the impact of the online AP fuzzy membership on the positioning accuracy, Fig. 4 shows the Cumulative Distribution Function (CDF) of positioning errors when different numbers of APs (from 3 to 8) with the largest online AP fuzzy membership are used for positioning. As the number of APs increases from 3 to 8, the sum of the corresponding online AP fuzzy membership, i.e., the value λ, increases from 2.79 to 5.36, and the CDF curve of positioning errors shows an upward trend. Taking the CDF of positioning errors in Area 1 as an example, the confidence probability of positioning errors within 4 m is 73.88%, 82.60%, 86.72%, 89.27%, 93.12%, and 94.85% when λ equals 2.79, 3.56, 4.19, 4.73, 5.10, and 5.36 respectively.
Besides, Fig. 5 shows the variation of the average positioning error with the increase of λ, from which we can find that as λ increases, i.e., as the number of APs increases, the average positioning error shows a downward trend and gradually converges. This result further verifies that the proposed method is able to achieve high positioning accuracy and low positioning overhead by selecting APs with high location resolution for positioning. Fig. 6 shows the average positioning error under different values of β. In general, the average positioning error decreases as β increases: a larger β increases the number of clusters produced by fuzzy clustering, and thereby the number of APs used for positioning, which results in a downward trend of the average positioning error. However, once β increases beyond a certain extent (e.g., β ≥ 0.7), any further increase of β cannot significantly affect the average positioning error since redundant APs may be included for positioning. In addition, when β = 0.7 and β = 0.8, the clustering results of the APs are the same, so the APs used for positioning are also the same, which results in the same average positioning error.
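The confidence probability quoted here is the empirical CDF of the positioning errors evaluated at a given distance. A minimal sketch of that calculation follows; the error values are hypothetical and are not the measurements of the article, and treating "within 4 m" as inclusive is an assumption.

```python
import numpy as np

def confidence_probability(errors, threshold):
    """Fraction of positioning errors that are at most `threshold` metres."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(errors <= threshold))

# Hypothetical positioning errors in metres for eight test points.
errors_m = [0.8, 1.5, 2.2, 3.9, 4.0, 5.1, 6.3, 2.7]
prob_within_4m = confidence_probability(errors_m, 4.0)
```

Evaluating the same function over a range of thresholds traces out a CDF curve of the kind plotted in the figure.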

B. ANALYSIS OF POSITIONING OVERHEAD
The variation of the positioning overhead with the increase of the value λ under different noise power conditions is shown in Fig. 7. From this figure, we can find that as λ increases, the number of APs used for positioning increases, and thereby the positioning overhead rises as expected. Taking the increase of λ from 0.93 to 5.13 as an example, the positioning overhead for TPs in Areas 1, 2, and 3 increases by 6.94 s, 6.98 s, and 5.72 s respectively. Besides, in comparison with Areas 2 and 3, Area 1 involves a larger number of TPs, and thereby a higher positioning overhead is required for the AP redundancy reduction and location resolution calculation. Taking the noise power of 2 dBm² in Area 1 as an example, the positioning overhead increases by 8.10 s as β increases from 0.1 to 1.
Furthermore, Fig. 9 shows a comparison of the positioning overhead of the proposed positioning method and four other existing ones, i.e., the T-test [22], maximum likelihood [32], InforGain [33], and max-mean [34] positioning methods, under different noise power conditions. From this figure, it is observed that compared with the T-test positioning method (the one having the highest positioning overhead), the positioning overhead of the proposed one is reduced by a maximum of 5.41 s. By comparison, the positioning overhead of the max-mean positioning method is the lowest since it simply calculates the RSS mean at each RP to conduct AP optimization. However, from the positioning results in the following section, we can find that the positioning accuracy of this method is still low because it only considers the RSS mean for AP optimization while ignoring the diversity of RSS features.

C. ANALYSIS OF POSITIONING ACCURACY
Fig. 10 shows the bitmap of the average positioning error by different positioning methods. As can be seen from this figure, compared with the InforGain, max-mean, all-AP (i.e., using all APs for positioning) [35], strongest-AP (i.e., selecting the AP with the strongest RSS for positioning) [35], and max-variance (i.e., selecting the AP with the maximum RSS variance for positioning) [35] positioning methods, the proposed one achieves the smallest average positioning error. That is because it calculates the AP correlation and the RSS feature fuzzy weight to reduce the AP redundancy and identify the location resolution of the APs used for positioning. Fig. 11 shows the CDF of positioning errors by the T-test, maximum likelihood, InforGain, and max-mean positioning methods as well as the proposed one. As expected, the confidence probability of positioning errors within 4 m by the proposed method, i.e., 90.65%, is higher than those of the other four methods, i.e., 89.89%, 80.43%, 77.01%, and 68.30% respectively. To further demonstrate the effectiveness of AP optimization, Fig. 12 compares the average positioning error of the T-test, maximum likelihood, InforGain, weighted least square [36], median KNN [37], and mean KNN [21] positioning methods with and without AP optimization, which refers to selecting the non-redundant APs with high location resolution. The comparison verifies the promising effect of AP optimization on the positioning accuracy.

VI. CONCLUSION
In complex environments, due to the widely-deployed Wi-Fi APs at various indoor locations, the AP redundancy and the location resolution of different APs used for positioning are often ignored in conventional indoor positioning methods. To deal with such problems, this article proposes a new multi-dimensional RSS feature fuzzy mapping and clustering-based AP optimization for Wi-Fi positioning. Specifically, in the offline phase, after the pre-processing of the offline RSS data, the MIC between different APs is calculated as the corresponding AP correlation, based on which the fuzzy clustering algorithm is used to select non-redundant APs for positioning. After that, the AP information gain ratio is obtained from the offline RSS features to calculate the RSS feature fuzzy weight. Then, in the online phase, the non-redundant APs with high location resolution are selected for target position estimation based on the Bayesian positioning algorithm. Experimental results demonstrate that the proposed method can not only eliminate the redundant APs with high substitutability to reduce the positioning overhead, but also select APs with high location resolution for positioning to guarantee high positioning accuracy. However, how to achieve effective and efficient AP optimization in a more complicated and dynamic indoor environment remains an interesting direction for future work.

QIAOLIN PU received the B.S. and M.S. degrees in information and communication engineering from the Chongqing University of Posts and Telecommunications (CQUPT). She is currently pursuing the Ph.D. degree in communication and information engineering with the Hong Kong Baptist University (HKBU). Her research interests include indoor localization, location privacy, and network optimization.