Fusion-Based Smartphone Positioning Using Unsupervised Calibration of Crowdsourced Wi-Fi FTM

This paper presents a multi-source fusion smartphone localization solution using Wi-Fi Fine Time Measurement (FTM) and Pedestrian Dead Reckoning (PDR), calibrated via multi-source and unsupervised crowdsourcing. In crowdsourcing phase, user movement within the site utilizes PDR to infer their location, and this location is used to calibrate the FTM data. The multi-layer perceptron (MLP) of the ranging model is suitable for non-line-of-sight (NLOS) reception, and the ranging accuracy is improved by more than 24%. In the positioning phase, the 90 percentile error of the ranging model trained using only crowdsourced data is less than 1.37m, which is 32% smaller than the traditional weighted least squares (WLS) localization error.

can be measured. FTM achieves a time resolution of a few 28 The associate editor coordinating the review of this manuscript and approving it for publication was Kegen Yu . nanoseconds, so sub-meter level ranging accuracy is possible. 29 However, Wi-Fi FTM encounters estimation challenges if 30 there is device offset or non-line-of-sight (NLOS) reception 31 [6]. All devices with a FTM infrastructure (presumably in 32 the FTMR mode) must be pre-calibrated and anchored at 33 stationary locations or ground truths, so they are affected by 34 environmental disturbances. A previous study [7] calibrated 35 FTM ranging but the position of the FTMRs' positions must 36 be known in advance. FTMR devices are becoming more 37 common so frequent calibration and coordinate measurement 38 is a major impediment to the deployment of an FTM-based 39 indoor positioning system. 40 Crowdsourcing [8] allows accurate calibration and mea-41 surement because it takes advantage of the wide availability 42 of mobile devices or smartphones as FTMI. Each smartphone 43 user contributes to the collection and analysis of perceived 44 signal information, particularly the FTM readings. Unlike 45 many recent crowdsourcing-based studies of fingerprinting 46 [9] or received signal strength indication (RSSI) ranging [10], 47 the proposed method reverse infers the approximate locations 48 of the stationary FTMR infrastructure using crowdsourced 49 MIMO antenna array to measure the azimuthal bearing angle-104 associated to the Direction-of-Arrival (AoA). Another recent 105 study [7] proposed a calibration-free positioning system for 106 which the ranging characteristic is trained and predicted using 107 a neural network. However, FTMR coordinates are required 108 before the system is deployed. 109 In DeepNar [19] the positioning is estimated from Wi-Fi 110 FTM RTT fingerprint through a fully connected neural 111 network, yielding sub-meter (0.75m) localization precision. 112 In [20] a deep long short term memory (LSTM) neural net-113 work is applied to encode temporal dependencies upon RSSI 114 fingerprint towards positioning, yielding meter-level (1.5m) 115 localization precision. In [21], an autoencoder is applied to 116 extract the representative features of RSSI fingerprints as 117 a sequence of latent code, which is then processed by an 118 LSTM network for positioning. An extensive survey of indoor 119 positioning based on Wi-Fi and machine learning can be 120 found in [22]. 121 In view of most works are under supervised learning 122 scheme, Zou et al. [23] proposed WiGAN to synthesize the 123 Wi-Fi radio map ground truth, which is inevitably needed 124 for fingerprinting based indoor positioning system formu-125 lated under supervised learning scheme. More elaborately, 126 WiGAN synthesizes the entire radio map in a constrained 127 space (e.g. personal offices) from RSSI measurements at 128 several locations, through a combination of Gaussian process 129 regression (GPR) and conditioned least-squares generative 130 adversarial networks (LSGAN). The GPR provides a coarse 131 estimate of the entire radio map from RSSI measurements 132 at several locations, and is learnt with RSSI measurements 133 and initiator location data collected with mobile robots and 134 LiDAR SLAM in free space (e.g. conference rooms). The 135 coarsely-estimated radio map is then adopted as the input 136 of LSGAN generator to synthesize more realistic radio map, 137 with the LSGAN trained with RSSI measurements collected 138 in free space. 139 Given the large number of indoor localization works for-140 mulated under a supervised learning scheme, in this work, 141 FTM data are collected without providing measured coordi-142 nates, where the distance between the smartphone and the 143 Wi-Fi FTMR, the coordinates of the Wi-Fi FTMRs, and 144 NLOS errors are all estimated using a multilayer percep-145 tron (MLP) under an unsupervised learning formulation with 146 FTM measurements as input. 148 The proposed fusion-based localization system uses PDR 149 and FTM as data sources to locate the user's smartphone. 150 The PDR information collected by the smartphone is used 151 to capture the characteristics of the user's local behavior, and 152 the FTM data can be used as the ranging estimation of the 153 system ranging model to provide the user's global location 154 information. Such a fusion positioning system can accurately 155 detect the user's steps, while preventing the accumulated error 156 of the PDR position through the FTM ranging information, 157 thereby maintaining the overall positioning accuracy. As shown in Fig. 1, the system obtains the user's motion 159 information from the smartphone's IMU (including gyro-160 scope, magnetometer, and accelerometer) and Wi-Fi module.

161
Using the PDR algorithm, the user's stride and direction 162 information can be obtained from the data of the IMU sensor 163 through the step detection and heading estimation algorithms.

164
The calculated stride and heading information are used to

176
The localization error of lidar on the robotic system is 1-5 cm, 177 which is an order of magnitude smaller than the proposed 178 method. Therefore, the rest of this article ignores floor plan 179 errors.

180
The proposed fusion and learning based smartphone local- After multiple trainings until the loss function saturates, the 197 ranging model can give more reliable ranging results than raw 198 FTM packets in different environments, even in the presence 199 of NLOS paths. In addition, the ranging model also provides 200 the confidence of the distance information, presented in the 201 form of standard deviation. 202 Finally, in the positioning phase, PDR updates the user's 203 position through the stepping behavior, and the FTM ranging 204 model gives the predicted ranging result and the correspond-205 ing FTMR position, so the PF can update the particle weight 206 of each object according to the probability. PDR can quickly 207 determine the user's location by detecting the steps to obtain 208 short-term local features such as turning or starting to walk. 209 And through the FTM ranging information, users can avoid 210 drift in the long-term positioning process and keep the posi-211 tioning results accurate and stable.

214
The IMU has the advantages of good short-term accuracy, 215 unaffected by the external environment, and good stability. 216 Therefore, the device has been widely used in smartphones 217 for locating and tracking the user's movement or operation 218 behavior. Pedestrian inertial navigation is usually based on 219 the PDR algorithm, which is independent of the integration 220 of acceleration values and can greatly reduce the cumula-221 tive error caused by integration. Using the periodicity of 222 the acceleration waveform and features related to walking 223 speed, we can estimate the step size of pedestrian motion. 224 In addition, due to the randomness of the pedestrian's holding 225 method, the real-time attitude angle of the smartphone is 226 obtained by the integration of gyroscope or the combination 227 of magnetometer and accelerometer.

228
In most mobile phones, the IMU uses a gyroscope for rela-229 tive orientation and a magnetometer for absolute orientation. 230 But in an indoor environment, the presence of metal or other 231 magnetic materials near the mobile phone can interfere with 232 the phone's ability to identify magnetic north, so the magne-233 tometer measurement output is unstable. Therefore, this study 234 only uses quaternions based on gyroscope measurements to 235 estimate heading.

236
The quaternion-based rigid body kinematic equations are: 237 where the quaternion Q = q 0 + q 1 i + q 2 j + q 3 k and ω x , ω y , 239 ω z is the attitude angular velocity from the gyroscope in the 240 sensor frame. The relationship between the attitude rotation 241 matrix and the quaternion means that the rotation matrix can 242 be calculated as: (2) 245 96262 VOLUME 10, 2022 Walking involves many complex processes, such as step-252 ping on the ground and raising legs. A sensor that is attached 253 to the foot allows measurement of the step length using the 254 swing of the foot [24]. For simplicity, this study uses the peak 255 detection method to detect the step event and calculates the 256 step length [25].

257
The original data is normalized and gravity is isolated to 258 obtain the pedestrian acceleration: where a x , a y and a z are the raw outputs from the smartphone's 261 accelerometer and g 0 is the local gravitational acceleration.

262
The built-in sensors of smartphones are usually inex-263 pensive and often generate unwanted noise so false peaks 264 often occur. Normalized data passed through a low-pass fil-265 ter with a moving average filter to remove high frequency 266 perturbations: where a t step is the filtered value and N is the length of the 277 When a step is detected, the corresponding step length is 278 calculated as [26]: where SL is the estimated step length in meters, H is the 281 pedestrian height, and a, b, and c are the model parameters, 282 which are a= 0.371, b= 0.227, and c = 1.

284
A Bayesian approach is used to determine the posterior 285 probability function of the system state. If a sequence of 286 observations are available at timestamp k, then the updated 287 prior probability at current state Z k is calculated recursively 288 as: The Monte Carlo (MC) method gives a sub-optimal estima-293 tion through approximation of (9). The posterior probability 294 is expressed by determining a set of random MC samples in 295 the state space, which is approximated by [14]: where w i k and X i k are the weight and the state of the i-th 298 sample, respectively; and δ (·) is the Dirac delta function.

299
According to the PDR mechanization, the state transition 300 model of the PF is derived as: where (x i k , y i k ) is the updated state vector, (x i k−1 , y i k−1 ) is 303 the previous estimation, SL k and θ k are the step length and 304 detected heading at state k and δ SL and δ θ are the respective 305 uncertainty. In practice, the user's stride and heading change 306 measurements will have errors every time, and if we fully 307 trust the measurement results, we will mistakenly kill the 308 particles that represent the user's likely position. Therefore, 309 we add a random uniform error in the range of ±10% to the 310 stride length and ±20 degrees of the heading as δ SL and δ θ 311 respectively to spread the particles over a wider range, giving 312 the true location a better chance of being included [14].

313
After each update, particles are tested to determine whether 314 they violate any wall constraints as: Finally, after each step the filter normalizes the weight of 317 particles as: The state transition model for PF has been constructed 320 based on PDR rules.  At a specific step k on a path with a total K steps, the 331 position of the user is defined as the weighted average of 332 particles: where X k = (x k , y k ) and the weighted variance of particles 335 is: In general, the positions of the particles in PF converge as where the user is. For at least one step, the variance in the 358 particles must be lower than the threshold s 2 thresh to allow 359 this path to be saved as valid data. In case that the particle 360 degeneracy is detected, a mechanism is setup to reset the PF.

361
If a set of steps K f has the largest number of steps k f in K ,

362
where K is a complete set of all steps of a crowdsourced data 363 provider, such that the variance of the particles satisfies: 365 where k f ∈ K f . Particles in the last step of K f are inherited 366 by the backward process, and the steps not included in K f are 367 discarded. If none of the steps in K satisfies (16), the path 368 record will be discarded and the subsequent process will not 369 be continued.
In the backward process, the PF update is performed again 372 using the converged particles inherited from step K f of the 373 forward process, but the directionθ i k is opposite to the forward 374 process, soθ i k = θ i k + π. The backward process runs from 375 step k f to step 1 in the set of K f , and obtains the smallest 376 step k b satisfying (16)   After the backward process, the PF state of the confidence 381 step within a threshold can infer the user's location. However, 382 since these inferred locations will be used to calibrate the 383 FTMR, we would like to have more precise crowdsourced 384 data provider locations to keep the FTMR calibration process 385 accurate. So the replay process inherits the particle at step k b 386 in the backward process, and then runs the stepwise process 387 again in the forward motion, from k b to k f . This gives position 388 information for two similar paths, from the backward pro-389 cess and from the replay process, with the same step range 390 [k b , k f ] for both. Finally, the two paths are combined using 391 inverse variance weighting [27] to provide accurate location 392 information.

393
The path coordinates that are eventually used as coordinate 394 pseudo-labels are a mix of the backward path and the replay 395 path. The mean and variance for the path coordinates are 396 calculated as:  where R is the MLP model, and x is a batch of FTM packets 469 is the 470 combination of all data from FTM packets of the j-th FTMR 471 that represents distance, standard deviation and RSSI, respec-472 tively; and is the training parameters for the model [11]. 473 If the packet status fails or the request times out, all items of 474 its packet will be set to zero.

475
The input for this model is the time series data of T 476 consecutive FTM samples, which not only eliminates the 477 randomness in the measurement noise but also allows the 478 model to learn the time series relationship between the data. 479 As shown in Fig. 3 whered o is the offset-compensated distance,φ i is the offset 495 variable for the i-th FTMR of a site with J FTMRs, and its 496 multiplication with Kronecker delta δ ij means that onlyφ j 497 is added to d FTM j . The rectified linear unit (ReLU) function 498 ReLU (x) = max(0, x) ensures that the output is always 499 positive because the distance between two nodes is positive. 500

501
When a NLOS condition occurs, from a timing perspective, 502 the signal is only slowed down a bit as it penetrates the wall, 503 which corresponds to a slightly larger RTT distance; but from 504 VOLUME 10, 2022 an energy perspective, the signal is weakened a lot, i.e., the 505 RSSI value dropped a lot. This allows the ratio of RTT range 506 to RSSI to roughly see the presence of NLOS and to correct 507 distance estimates. In [31], a normal probability distribution 508 is used to estimate whether the measured distance is LOS, 509 while in this study MLP is used to estimate the impact of 510 NLOS and predict its correction.

511
On the right side of Fig. 3, the NLOS estimation module is 512 a MLP that predicts the distance at which the offset distance 513 must be corrected and the standard deviation for this sample. distanced j (n) and standard deviationŝ j for the j-th FTMR at 557 z j . 558 We optimize four objectives during the training process of 559 the ranging model: The distance loss is defined as the difference between the true 562 distance z (n) − z j from the received location to the true 563 FTMR coordinates and the predicted distanced j (n) using the 564 ranging model: where the weight w (n) = 1/s 2 k and I (n) = 1 if the true 567 coordinates for j-th FTMR are known and 0 otherwise. The 568 weight w (n) is the reciprocal of the variance computed using 569 (18), since location labels with higher confidence need to 570 be given higher weights. By minimizing this loss term, the 571 ranging model can be updated to more accurately predict the 572 distanced j (n).

574
For FTMR without measured coordinates, a similar loss term 575 called geometric loss is defined as: whereẑ j is the inferred coordinates of the j-th FTMR. Its 578 initial value is set to (0, 0, 0) for all j, and is continuously 579 updated as the loss is minimized. When these two loss terms 580 are minimized simultaneously, the ranging model minimizes 581 the distance loss to obtain a more accurate predicted distance; 582 consequently, when the geometric loss is minimized, the 583 better ranging model is used to allow FTMR for unknown 584 coordinates to converge to more likely coordinates.

586
The ranging model learns how to predict distances more 587 accurately, but the confidence level of such predictions is 588 unknown. Therefore, the corresponding standard deviation is 589 predicted through the distance model at the same time, and 590 the error between the expected distance and the real distance 591 is within the normal distribution range of the standard devi-592 ation. The loss function associated with standard deviation 593 prediction is defined as: where L q 1 and L q 2 represent the quantile losses for (q 1 , q 2 ) = 597 (2.5%, 97.5%). The quantile loss L q is defined as [33]: where q is the quantile level to be predicted. Quantile loss 600 contains an asymmetric feature that compensates for the 601 imbalance of numbers separated by quantile values. There-602 fore, those predictions that deviate from the assumed distance z (n) −ẑ j by more than two standard deviations of predic-604 tionsŝ j (n) will be penalized by a larger loss in terms of the 605 variance loss L var . In this case, with a 95% confidence inter-606 val, the true distance will be within two predicted standard 607 deviations of the predicted distance.  FTM requests to nearby FTMRs. Whenever the smartphone 645 receives an FTM packet, the ranging model will start to pre-646 dict the more likely distance and standard deviation between 647 the user and the FTMR. Note that although the FTM packet 648 window of the MLP model is T , the model can still provide 649 distance estimates when the number of packets is less than 650 T . The ranging model in (21) will predict a distanced and 651 its standard deviationŝ. Assuming that the mobile phone 652 receives J ranging information from FTMRs at the same time 653 step k, the system can update the weight of the i-th particle in 654 the PF according to the Gaussian distribution as [14]:  At the same time, the IMU sensor is also detecting the 663 user's stepping behavior. If a step is detected, the user's 664 location will be updated as in (11). After the particle weights 665 are updated, the user's location can be obtained using weight 666 averaging as in (14). The weight update process is callback-667 driven, so whenever a step or a FTM packet is received, the 668 user's position is refreshed immediately. The proposed method is tested in an indoor office environ-672 ment with dimensions of 62.7 × 24.5 m 2 . There are 7 FTMRs 673 at this test site. The installed FTMR is powered by a Qual-674 comm IPQ8065 chipset, which is configured to support FTM. 675 FTMR can support both 2.4GHz and 5GHz frequency bands, 676 but for simplicity and accuracy, only the 5GHz frequency 677 VOLUME 10, 2022  by particles with heavier weights. As the particles gradually 706 converge to a variance less than the threshold variance, the 707 path remains in the database, and the converged particles are 708 inherited by the backward process.

709
In the backward process, the particles converge in most 710 steps, but at some locations the variance increases. The 711 backward path in Fig. 5(b) shows some trajectories of 712 through-wall paths or strange turns, so the observed path 713 is unlikely to be the user's actual path. While the playback 714 shows a different path than the reverse path, some of the 715 estimated positions are still unrealistic. Finally, two backward 716 and replay paths describing different directional information 717 are combined using (17) to give the mixture path. The results 718 fit the user's trajectory better than the other two paths, thus 719 yielding better location labels to calibrate the FTM ranging 720 model.  Table 1 shows the mean distance error for different ranging 728 models that are trained using three different conditions:  The distance error after training is shown in Fig. 6. The mean 738 distance error in the offset case is the largest, as it is signifi-739 cantly affected by NLOS. The inferred case only uses the true 740 coordinates of FTMR 1, but training the model still provides 741 reliable distance predictions. For the inferred case and the 742 real case, NLOS has less influence on the ranging results, 743 indicating that the NLOS estimation module does suppress 744 the influence of NLOS. The inferred case and the true case 745 reduce the ranging error by 24% and 35%, respectively, which 746 is better than the offset case. 747 FIGURE 6. Box plot of the range error of different FTMRs relative to different ranging models. The overall distance error is also compared in the last. Only the coordinates of FTMR 1 are provided for inferred ranging model. In conclusion, the standard deviation prediction of the NLOS 779 estimation module can make subsequent PF updates more 780 normally distributed.

781
The proposed system not only trains the ranging model, 782 but also obtains the location of the unknown FTMR during 783 optimization. The true and inferred coordinate values are 784 shown in Table 2, and the relative 2D positions are shown 785 in Fig. 8. The coordinates of FTMR 1 are provided for the 786 inferred case, so positioning errors are ignored.

787
Using the labels of the user's measurement location in 788 a 2D plane, the final inferred coordinates of the remaining 789 FTMR cannot deviate from this plane, as the offset provides 790 another degree of freedom that can be optimized. Therefore, 791 the inferred ranging model can only predict the projected 792 coordinates of the FTMR on the measurement plane. The loss 793 of height information is reflected in the offset, so the offset 794 of the inferred ranging model is smaller than that of the true 795 ranging model.    PF-True shows good local and global localization character-832 istics.
833 Table 3 compares the positioning errors. The proposed 834 algorithm uses PF-True and PF-Inferred, while PF-Offset 835 denoted the results by the algorithm in a previous study [14]. 836 The positioning error for the PF for the proposed ranging 837 model is less than 1.4m, and the best PF-True algorithm has 838 a positioning error of 1.15m at the 90 th percentile. PF-True 839 and PF-Inferred give a 43% and 32%, respectively, smaller 840 error than WLS. Compared with the same PF-based algorithm 841 PF-Offset, PF-True and PF-Inferred are 24% and 9.2% more 842 accurate, respectively.

844
This study proposes a fusion-based smartphone localization 845 system using unsupervised calibration of crowdsourced wi-fi 846 FTM data. During the crowdsourcing phase, users collect 847 IMU and FTM data as they walk around the test site. PDR 848 and PF are used for position markers to later provide pseudo 849 markers for FTM ranging model calibration. During the 850 process of optimizing the Wi-Fi FTM ranging model, the 851 unknown FTMR coordinates can also converge to a near-true 852 position at the same time. Finally, the average error of the 853 trained ranging model is less than 1.88m, which is more than 854 24% better than the distance error provided by the original 855 FTM, and the two-dimensional average error of the predicted 856 FTMR coordinates is 1.37m. The ranging model also predicts 857 the error standard deviation of the distance. The model has a 858 probability of more than 85.0% to make the distance error fall 859 within two standard deviations (ideally 95%), indicating that 860 the overall error of the predicted distance is close to a normal 861 distribution.

862
Finally, in the positioning phase, the calibrated FTM rang-863 ing model and PF are used for the multi-source fusion local-864 ization method. Compared with the traditional WLS, the 865 90% localization error of PF-True is reduced by 43%, and 866 the PF-Inferred is reduced by 32%. Compared with general 867 PF-based methods, the localization errors are reduced by 868 24% and 9.2%, respectively. The proposed model gives more 869 accurate results if there is NLOS reception, so it is equally 870 applicable to other ranging based protocols (such as UWB) 871 as it stabilizes the ranging quality for NLOS scenarios