A Survey on Fusion-Based Indoor Positioning

Demands for indoor positioning based services (IPS) in commercial and military fields have spurred many positioning systems and techniques. Complex electromagnetic environments (CEEs) may, however, degrade the accuracy and robustness of some existing single systems and techniques. To overcome this drawback, fusion-based positioning of multiple systems and/or techniques has been proposed to improve the positioning performance in CEEs. In this paper, we survey the fusion-based indoor positioning techniques and systems from seminal works to elicit the state of the art within our proposed unified fusion-based positioning framework, which consists of three fusion characteristics: source, algorithm, and weight spaces. Different from other surveys, this survey summarizes and analyzes the existing fusion-based positioning systems and techniques from these three characteristics. Discussions of lessons, challenges, and countermeasures are also presented. This survey helps researchers acquire a clear concept of indoor fusion-based positioning systems and techniques and gain insights for developing more advanced fusion-based positioning systems and techniques in the future.


Fig. 1. Embedded sensors in smartphones.
positioning system (GPS) offer high accuracy in outdoor scenarios, the poor connectivity between satellites and end devices renders them ineffective indoors, thus triggering further research on indoor positioning [6].
With the rapid growth and ubiquitous nature of sensors, different kinds of sensors, such as inertial sensors and magnetic sensors, are integrated into user equipments (UEs), such as smartphones, as shown in Fig. 1. These sensors can measure different information to yield a better location estimate. Generally, positioning or tracking based on a single measurement tends to degrade the tracking/positioning performance. For example, an inertial navigation system (INS) can achieve high localization accuracy, but is limited by accumulated errors caused by sensor noise [7], [8]. Geomagnetic signals are omnipresent but lack local distinctiveness. Hence, fusion of multiple measurements from different sensors is becoming indispensable in order to improve the positioning performance.
Furthermore, UEs are generally surrounded by other wireless communication networks, such as GPS networks, cellular mobile networks, wireless local area networks (WLANs), LiFi networks, and other broadcast networks, as depicted in Fig. 2. All these networks can offer different levels of location estimates from their own perspectives. It is worth noting that the electromagnetic environment of UEs is more complex due to the changing nature of indoor environments. Other intentional or unintentional interferences also make the environment complex. Hence, the localization and tracking of UEs in complex electromagnetic environments (CEEs) is a challenging task in various civil and military applications. Different measurements, such as time-of-arrival (TOA) [9]-[12], time-difference-of-arrival (TDOA) [13]-[16], angle-of-arrival (AOA) [17]-[20], and received signal strength (RSS) [21]-[25], have been proposed to enhance the robustness and accuracy of indoor positioning. However, the positioning techniques based on single sensor measurements or networks show inherent drawbacks in positioning accuracy. Therefore, fusion-based indoor positioning (FBIP) has become more prominent in recent years [26]-[35]. For example, Chen et al. [36] curbed the fluctuation of WiFi signals by means of continuous tracking provided by INS, while the cumulative tracking errors are adjusted by the WiFi-based system.
It is proven that FBIP techniques can efficiently improve the positioning performance by combining the complementarity among positioning systems and techniques [27], [30], [37], [38]. The existing FBIP system can be divided into three main functional parts: sources, algorithms, and fusion weights, as shown in Fig. 3. The sources refer to the information to be fused and algorithms are to obtain the positioning results. The weights are obtained to efficiently amalgamate all the positioning results to yield a better localization estimate.
The aforementioned TOA, AOA, RSS, and TDOA can be one of the sources. These sources can be the same measurements from different networks/sensors or different measurements from the same or different networks/sensors. Most existing works consider the sources from the same network, such as WLAN [35], [39], [40], GSM [41], etc., which can be referred to as the fusion problem in the standalone network [26]- [34]. In a single network, the increase of measurement types or the fusion of different algorithms based on the same type of measurements improves the robustness and stability of the positioning system [42]. However, the single network based positioning system exhibits its own advantages and disadvantages. The cost of this kind of system such as a WiFi-based positioning system is low, but the performance fluctuates due to the nature of WiFi signals. The fusion of the same measurements from different networks [35], [43], [44] and the different measurements from different networks [36], [45]- [51] have also been proposed to enhance the accuracy and robustness of single network based positioning systems. In addition, collaborative localization [20], [43], [52], which acquires multiple measurements from different UEs for localization, can be regarded as a special case of fusion positioning. For example, Abadi et al. [52] demonstrated that pedestrian dead reckoning (PDR) based indoor positioning accuracy can be improved by fusing magnetometer data from devices carried by different pedestrians all walking in the same direction.
The weights are assigned to combine all the outputs of the aforementioned algorithms to yield a more accurate positioning result. In general, weights can be obtained from the offline training with a supervised learning framework [30], [41], [44], and they can also be acquired by unsupervised learning in the online phase [29], [31]. In the FBIP problem, some approaches try to find the optimal one from the candidate positioning results, and these can be regarded as the special case of weights finding, i.e., only one weight is one, and others are zeros [53], [70]- [72].
Although many FBIP methods were proposed during the past decades, surveys focusing on FBIP are very few [73]- [78]. We compare the existing surveys on FBIP in Table I, where we can conclude that most of these articles do not discuss the FBIP problem within a unified fusion framework. Unlike these surveys, we make a comprehensive comparison of works that have been applied to FBIP from the perspectives of sources, algorithms, and weights. From the analysis within the unified framework, readers can better understand the relationship of the existing FBIP works, and hopefully further develop a more accurate fusion method in CEEs.
In summary, this survey will answer the following two questions in FBIP: what to fuse, and how to fuse?
• What to fuse? The complementarity among fusion sources is the main factor in determining the potential for an improved fusion result. In other words, only fusion sources with good complementarity can yield a meaningful improvement in positioning accuracy and robustness [30]. This article summarizes commonly used fusion sources in different network frameworks. We will analyze the sources in terms of different measurements from the following three positioning systems:
- Homogeneous positioning systems: Homogeneous positioning systems obtain multiple (same or
are collectively and effectively fused is the key in improving the performance of fusion-based systems [80]. We will first review major methods used in fusion-based positioning from the following aspects:
- Conventional methods: The conventional methods include least squares, maximum likelihood, maximum a posteriori, and minimum mean square error. They can be used in both fingerprint-based and parametric positioning systems for indoor positioning [42], [44], [53]-[56].
- Machine learning methods: They are mainly used to solve the fingerprint-based positioning problem, and include k-nearest neighbors, random forests, support vector machines, and neural networks [25], [28], [45], [57]-[64]. Classification and regression are the two main models in machine learning.
- State estimate methods: These methods, including the hidden Markov model, Kalman filter, extended Kalman filter, and particle filter, are generally employed to track the UE [8], [38], [48], [76], [81]-[89].
Based on the outputs of these positioning methods, supervised learning and unsupervised learning are the two main methods to train and obtain the weights for fusion. We analyze them from the following perspectives.
- Supervised learning: It trains and stores the weights in the offline phase based on given training samples and corresponding labels. The obtained weights can be used in the online phase, but they adapt poorly to changing environments [27], [28], [31], [32], [41], [44].
- Unsupervised learning: It obtains the weights in the online phase without prior training. Some existing statistical methods in truth discovery, including expectation maximization and conflict resolution on heterogeneous data, are the main tools for unsupervised weight learning. It shows good adaptivity in complex indoor environments [29], [30], [33].
The rest of this paper is organized as follows. In Section II, we propose a unified positioning framework to clarify the paradigms of most of the FBIP works. Then, based on this framework, we discuss the FBIP works from the source space, algorithm space, and weight space in Section III, Section IV, and Section V, respectively. The lessons, challenges, and countermeasures in FBIP are covered in Section VI. Finally, some conclusions are drawn in Section VII.

II. A UNIFIED FUSION FRAMEWORK
Here, we propose a unified fusion positioning framework for easy comparison of state-of-the-art works in FBIP. As shown in Fig. 3, the framework is composed of three parts: the source space, the algorithm space, and the weight space. Mathematically, we can obtain the final location estimate $\hat{x}$ of the UE as follows [27]:
$$\hat{x} = \sum_{m=1}^{M} \sum_{n=1}^{N} w_{mn} f_n(s_m),$$
where $f_n \in F$ is the n-th algorithm for positioning in the algorithm space $F$, $s_m$ is the m-th source in the source space, and $w_{mn} \in W$ is the weight assigned to the estimate $f_n(s_m)$. The source space consists of original sources and transformed sources, as shown in Fig. 4. Some conventional measurements, such as RSS [47], AOA [17], [19], TOA [9]-[12], TDOA [13]-[16], channel state information (CSI) [90]-[92], and PDR [93], can be the original sources. Other statistics, such as signal strength difference (SSD), power delay Doppler profile (PDDP), hyperbolic location fingerprint (HLF), signal strength differences fingerprints (DIFF), delta signal strength (ΔRSS), signal subspace, etc. [32], [43], [94]-[96], which are transformed from the received data, can be regarded as the transformed sources to yield a better positioning result. Sources derived from the same/different networks may result in different positioning performance. In Section III, we will survey the source space of homogeneous, heterogeneous, and hybrid positioning systems to reveal the relationships among state-of-the-art works.
$F = \{f_1, \ldots, f_N\}$ is the algorithm space consisting of algorithms, which provide location estimates when given some relevant sources $s_m$, with $f_n$ being the n-th positioning algorithm. Existing methods of positioning algorithms can be categorized into probabilistic and deterministic methods [40], [97], [98]. Here, we will categorize the existing positioning algorithms into conventional methods, machine learning methods, and state estimate methods from the FBIP perspective, based on whether the positioning model is static or dynamic. The algorithm space is detailed in Section IV.
The positioning results given by some of the above positioning algorithms can be fused with some weights from the weight space $W = \{w_{11}, \ldots, w_{mn}, \ldots, w_{MN}\}$. In general, the positioning results may be single or multiple based on the process of the positioning algorithm. From the fusion positioning viewpoint, the single positioning result can also be regarded as the final fusion result if we set the weight to be one, i.e., $w = 1$, which is the special case of FBIP. The positioning result obtained by some state estimate method belongs to this case. There are typically two ways to obtain the weights for fusing multiple positioning results offered by multiple sources and multiple algorithms: supervised learning [27], [31], [41], [44] and unsupervised learning [29], [30], [32], which are done in the offline and online phase, respectively. To obtain a better location estimate by using FBIP, the weight learning should obey the principle of ensemble learning [99]-[102]. Hence, we will detail the weight space in Section V. Our proposed unified fusion positioning framework can be applied to most of the existing indoor fusion systems. We will detail them in the following sections.
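As a minimal sketch of how the three spaces interact, the snippet below fuses the estimates produced by every algorithm on every source with a weighted sum. The linear form, the function name `fuse_positions`, the 2-D point type, and the expectation that the weights sum to one are illustrative assumptions, not details taken from the surveyed works.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def fuse_positions(
    sources: Sequence[object],
    algorithms: Sequence[Callable[[object], Point]],
    weights: Sequence[Sequence[float]],
) -> Point:
    """Weighted sum of the estimates f_n(s_m), with weights[m][n] = w_mn.

    Assumption: the weights are non-negative and sum to one, so the
    result is a convex combination of the individual estimates.
    """
    x_hat, y_hat = 0.0, 0.0
    for m, source in enumerate(sources):
        for n, algorithm in enumerate(algorithms):
            x, y = algorithm(source)  # location estimate f_n(s_m)
            x_hat += weights[m][n] * x
            y_hat += weights[m][n] * y
    return x_hat, y_hat
```

Setting a single weight to one and all others to zero reduces the fusion to selecting one candidate result, which corresponds to the special case mentioned in the text.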

III. SOURCE SPACE
Considering that the source space may be made up of different measurement technologies in terms of different networks, we first review all the possible measurement technologies and networks used for indoor positioning in existing literature, and then carefully survey these sources from the perspective of homogeneous, heterogeneous, and hybrid positioning systems.
Fusion-based indoor localization systems are classified into the category of homogeneous positioning systems when positioning is obtained in a standalone network; meanwhile, systems realized using single measurement technology and multiple networks belong to the category of heterogeneous positioning systems; hybrid positioning systems employ multiple networks and multiple measurements. Table II shows the distinction among homogeneous, heterogeneous and hybrid positioning systems.

A. Measurement Technologies Used for Indoor Positioning
As indicated earlier, the source space consists of original sources and transformed sources. The original sources are the basic measurements for different kinds of positioning systems, which can be directly extracted from the received signals of the existing networks. We detail them in the following.
1) RSS: The model-based approaches calculate the distance $d$ based on the well-known log-normal path loss model [110]
$$P(d) = P(d_0) + 10\gamma \log_{10}\left(\frac{d}{d_0}\right) + n_\sigma,$$
in which $P(d)$ is the path loss measured in dB at distance $d$, $\gamma$ is the so-called path loss factor, $P(d_0)$ denotes the average path loss at $d_0$, and $n_\sigma$ is a zero-mean normal random variable reflecting the attenuation in decibels caused by shadowing. Given the estimated $d$, the location of the UE can be estimated by trilateration methods. However, due to the fluctuation of RSS in indoor environments, it is almost impossible to achieve accurate localization using the model-based methods in CEEs [111].
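To illustrate why RSS fluctuation hurts model-based ranging, the following sketch inverts the log-normal path loss model for the distance while ignoring the shadowing term. The function names, the default reference distance of 1 m, and the numeric values in the usage note are hypothetical choices for illustration only.

```python
import math


def path_loss_db(d: float, pl_d0: float, gamma: float, d0: float = 1.0) -> float:
    """Mean path loss in dB at distance d (shadowing term omitted)."""
    return pl_d0 + 10.0 * gamma * math.log10(d / d0)


def distance_from_path_loss(
    pl: float, pl_d0: float, gamma: float, d0: float = 1.0
) -> float:
    """Invert the mean path loss model to estimate the distance d."""
    return d0 * 10.0 ** ((pl - pl_d0) / (10.0 * gamma))
```

Under these assumptions, a shadowing error of 6 dB with a path loss factor of 3 scales the estimated distance by a factor of 10^(6/30), i.e., by roughly 58%, regardless of the true distance.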
In fingerprint-based approaches, a set of RSS measurements are collected at the grid points in the indoor environment to construct the fingerprints database in the offline phase. In the online phase, the UE observes RSS measurements at an unknown location and applies algorithms to associate these measurements to the fingerprints database by matching similar fingerprints to estimate the UE's position [27]. Several key problems need to be solved in fingerprint-based indoor localization, including low complexity fingerprint construction techniques (such as crowdsourcing [84], [112], [113]), fingerprint calibration [49], [94], [95], [114], and localization in changing indoor environments [27], [29]- [31].
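The offline/online matching described above can be sketched with a plain k-nearest-neighbor search in signal space. This is only one possible matching rule, chosen here for brevity; the function name `knn_locate`, the data layout, and the unweighted averaging of the k grid points are assumptions rather than the method of any cited work.

```python
import math
from typing import List, Sequence, Tuple

Fingerprint = Tuple[Tuple[float, float], Sequence[float]]


def knn_locate(
    fingerprints: List[Fingerprint], observation: Sequence[float], k: int = 3
) -> Tuple[float, float]:
    """Estimate a position by averaging the k closest fingerprints.

    fingerprints: offline database of ((x, y), [rss_ap1, rss_ap2, ...]).
    observation:  online RSS vector, one entry per access point, in dBm.
    """
    # Rank grid points by Euclidean distance in RSS space.
    ranked = sorted(fingerprints, key=lambda item: math.dist(item[1], observation))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y
```

With k = 1 the estimate snaps to the best-matching grid point; a larger k trades grid resolution for robustness against RSS fluctuation.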
2) CSI: CSI provides subcarrier-level channel measurements for indoor positioning. It can now be obtained from some commodity WiFi NICs, such as the Intel WiFi link (IWL) 5300 NIC [90], [91]. CSI can reflect the considerable impairments of a signal propagating in an indoor environment due to shadowing, multipath propagation, and distortion [92]. CSI is also used by two distinct methods: parametric and nonparametric methods.
PhasePhi [115] is a typical nonparametric method using the transformed phase information of CSI to construct the offline fingerprints for accurate WiFi positioning. A deep network with three hidden layers is used as the classifier for location prediction. Experiments in two real indoor scenarios showed that PhasePhi outperforms the CSI amplitude- and RSS-based methods in positioning accuracy. Other nonparametric methods can be found in [116], [117].
The parametric methods calculate the distances or angles between the transmitters and receivers using the phase information extracted from CSI. Based on the estimated distances and angles, trilateration or triangulation can be used to determine the UE's location [118], [119]. SpotFi can achieve decimeter-level localization using the AOA information extracted from CSI [19]. In [120], the amplitude information of CSI is extracted to construct fingerprints for narrow-band IoT indoor positioning. Comparatively, CSI is more sensitive to changing environments than RSS, and hence it is more applicable for target detection [121].
3) TOA: Time of arrival (TOA) is the absolute travel time of a signal from a reference node to a UE [9]. Different from time-of-flight (TOF), TOA requires stricter clock synchronization but lower energy cost, and is thus more suitable for real-time positioning [122].
Let $\mathbf{x} = [x, y, z]^T$ be the unknown position of the UE, and $\mathbf{x}_i = [x_i, y_i, z_i]^T$ be the known coordinates of the i-th reference node (access point, base station, beacon, etc.), $i = 1, 2, \ldots, L$, where $L \geq 3$ is the number of reference nodes. The distance $d_i$ between the UE and the i-th reference node can be expressed as
$$d_i = \|\mathbf{x} - \mathbf{x}_i\| = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2}.$$
Without loss of generality, we assume that the UE emits a signal at time 0 and the i-th node receives it at time $t_i$; here, $t_i$ is the TOA. In the noise-free case, the UE is evidently located on the circle centered at the i-th base station with radius $d_i = c \cdot t_i$, where $c$ is the speed of light. Similarly, the UE is located on the circles of the second and third base stations. Hence, the position of the UE can be given by the intersection of three circles in noise-free cases [73], [123], as shown in Fig. 5. Chan and Ho [124] proposed the well-known Chan's algorithm for TOA location, which can yield a closed-form solution; other methods, including least squares (LS) [125]-[127], weighted least squares (WLS) [128], [129], and constrained weighted least squares (CWLS) [130], were proposed to localize the UE in LOS and NLOS environments.
4) TDOA: Through the TDOA measurements, the location of the UE can be obtained by the intersection of two hyperbolic curves [131], as shown in Fig. 6. The distance difference for the reference nodes A and B is defined as $d_{1,2}$:
$$d_{1,2} = d_1 - d_2 = \|\mathbf{x} - \mathbf{x}_1\| - \|\mathbf{x} - \mathbf{x}_2\|.$$
The position of the UE can be determined by the intersection of hyperboloids when the third reference node C and further reference nodes are taken into account [132], [133]. The hyperbolic TDOA equation can be solved through nonlinear regression and an iterative algorithm by virtue of Taylor-series expansion [134]. Apart from the well-known LS method [135] in TDOA localization, semidefinite relaxation (SDR) [136] and semidefinite program (SDP) [137] based methods were also proposed to address the nonconvex problem of TDOA localization.
In [138], a robust TDOA localization by minimizing the worst-case position estimation error was proposed to improve the special cases of the TDOA localization problem.
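As a rough illustration of how the TOA circle intersection can be solved numerically, the sketch below linearizes the range equations by subtracting the first one and solves the resulting 2-D least-squares problem through its normal equations. This is a generic linearized LS solver restricted to two dimensions, not Chan's algorithm or any of the cited WLS/CWLS variants; the function name `toa_locate` and the 2-D restriction are assumptions.

```python
import math
from typing import Sequence, Tuple

C = 299_792_458.0  # speed of light in m/s


def toa_locate(
    anchors: Sequence[Tuple[float, float]], toas: Sequence[float]
) -> Tuple[float, float]:
    """Linearized least-squares TOA positioning in 2-D (at least 3 anchors)."""
    ranges = [C * t for t in toas]  # d_i = c * t_i
    (x1, y1), d1 = anchors[0], ranges[0]
    s_aa = s_ab = s_bb = s_ar = s_br = 0.0
    for (xi, yi), di in zip(anchors[1:], ranges[1:]):
        # Subtracting circle 1 from circle i gives a linear equation a*x + b*y = r.
        a = 2.0 * (xi - x1)
        b = 2.0 * (yi - y1)
        r = d1**2 - di**2 + xi**2 + yi**2 - x1**2 - y1**2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ar += a * r
        s_br += b * r
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is ambiguous")
    # Solve the 2x2 normal equations in closed form.
    x = (s_bb * s_ar - s_ab * s_br) / det
    y = (s_aa * s_br - s_ab * s_ar) / det
    return x, y
```

In the noise-free case this recovers the exact intersection point; with noisy or NLOS-biased ranges it returns the least-squares compromise of the linearized equations, which is where weighting and NLOS mitigation become relevant.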
In TOA/TDOA localization, some methods assume that the signal travels from the UE to the reference node in a line-of-sight (LOS) environment. However, this assumption does not hold in complex indoor environments due to multipath and non-line-of-sight (NLOS) propagation. Hence, some NLOS identification and mitigation methods [9], [71], [125], [126] were proposed to improve the accuracy of TOA/TDOA-based localization.
5) AOA: Angle-of-arrival (AOA) has important applications in array signal processing [139]. Compared with TOA and TDOA estimation techniques, AOA estimation requires antenna arrays. However, only two base nodes (equipped with antenna arrays) are sufficient to fully localize a UE. This gives AOA estimation techniques higher flexibility than TOA or RSSI estimation methods.
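The claim that two array-equipped nodes suffice can be illustrated by intersecting two bearing lines in the plane. The sketch assumes ideal, noise-free bearings expressed as counterclockwise angles from the x-axis of a common global frame; the function name `aoa_locate` and this angle convention are illustrative assumptions.

```python
import math
from typing import Tuple


def aoa_locate(
    p1: Tuple[float, float],
    theta1: float,
    p2: Tuple[float, float],
    theta2: float,
) -> Tuple[float, float]:
    """Intersect two bearing rays measured at nodes p1 and p2 (radians)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    denom = math.sin(theta2 - theta1)
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    # Range from node 1 to the UE along its bearing.
    r1 = (dx * math.sin(theta2) - dy * math.cos(theta2)) / denom
    return p1[0] + r1 * math.cos(theta1), p1[1] + r1 * math.sin(theta1)
```

The parallel-bearing check reflects a geometric limitation: when the UE lies close to the line joining the two nodes, small angle errors translate into large position errors.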
In summary, owing to obstacles and multipath propagation of signals, the conventional outdoor AOA-based localization methods degrade seriously in CEEs. To overcome these drawbacks, some works use OFDM signals to mitigate the influence of multipath propagation [19], [118]. The measurements from CSI or the channel frequency response (CFR) can be used by some classical direction-of-arrival (DOA) estimation algorithms, such as MUSIC, ESPRIT, etc.
6) PDR: Pedestrian dead reckoning (PDR) provides the direction and distance obtained from the inertial sensors of UEs. The current position estimate is calculated from the previously known position estimate. Basically, positioning using PDR needs the step length and walking direction to track the UE [140]. Other parameters, such as the initial point and map, can also affect the positioning accuracy [141]. PDR-based positioning methods work well over a short moving distance. However, their performance degrades because the accumulated errors grow as the walking distance increases. Recent research tends to fuse WiFi-based and PDR techniques to achieve better localization accuracy [39], [142]-[145].
Three main ways to mitigate the accumulated errors are 1) to enhance the PDR algorithm itself, 2) to combine PDR with other sensors or networks, such as Wi-Fi, RFID, etc. [146]-[148], and 3) to integrate map information into the matching algorithm to improve the positioning accuracy and stability [149], [150].
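The recursive nature of PDR, and the reason its errors accumulate, can be seen in a minimal step-and-heading update. The function name `pdr_track`, the heading convention (counterclockwise from the x-axis, in radians), and the numbers in the usage note are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def pdr_track(
    start: Tuple[float, float], steps: Sequence[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Dead-reckon a 2-D track from (step_length, heading) pairs."""
    x, y = start
    track = [(x, y)]
    for step_length, heading in steps:
        # Each new position depends only on the previous one and the current step.
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
        track.append((x, y))
    return track
```

For instance, comparing `pdr_track((0.0, 0.0), [(0.7, 0.0)] * 10)` with the same walk under a constant heading bias of 0.05 rad shows a position gap that keeps growing with every step, which is the drift that the fusion with Wi-Fi, RFID, or map information is meant to bound.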
7) Transformed Sources: In addition to the original sources mentioned above, sources transformed from the received data of one or multiple antennas, such as signal strength difference (SSD) [108], [109], power delay Doppler profile (PDDP) [96], hyperbolic location fingerprint (HLF) [94], and fourth-order cumulant (FoC) [32], can also be used to yield superior positioning even in complex environments. These transformed sources can depict the sources from different perspectives and can overcome device heterogeneity, RSS fluctuation, and NLOS propagation by extracting other statistics from the received signals. Most of them are proven to be efficient in complex indoor positioning environments [32], [94], [109], [151], and are summarized in Table III for comparison purposes.

B. Networks Used for Indoor Positioning
Indoor localization systems can be categorized based on the various transmission signals harnessed for positioning. Positioning systems include but are not limited to WLAN-based, geomagnetism-based, UWB-based, and RFID-based positioning systems. This section provides an overview of different networks used for indoor positioning. The means for localization, the performance, the positioning technique, as well as the advantages and disadvantages of different positioning systems are delineated in Table IV.
1) WLAN-Based Positioning System: Owing to the ubiquity of 802.11 WiFi networks in indoor environments, many solutions utilizing WLAN signals have been proposed to provide indoor positioning. There are two general methods of WLAN localization: fingerprinting [27]-[30], [155]-[160] and trilateration [106], [161]. Trilateration methods cannot achieve results as accurate as fingerprinting methods due to shadowing, multipath, and numerous obstacles in indoor environments [161], and are thus seldom used for WLAN-based indoor positioning.
Fingerprint-based localization is a prominent technique with the following three key advantages: 1) it can mitigate the laborious process of fingerprint collection by using crowdsourcing [112], [113], 2) fingerprint calibration [49], [95] can make full use of the fingerprints collected at different time periods and from different hardware, and 3) positioning algorithms with good performance [32], both in accuracy and robustness in complex indoor environments, can be readily formulated.
2) Geomagnetic-Based Positioning System: The geomagnetic field is ubiquitous and temporally stable [162], and its spatial variation can be used to differentiate locations in complex indoor environments. However, the geomagnetic field is greatly affected by electrical equipment and metallic structures inside the walls of modern buildings, and different hardware may yield different magnetic readings at the same location [26], as shown in Fig. 7. Hence, calibration is an important step in using geomagnetism. The positioning accuracy of some geomagnetic-based systems, such as Indoor Atlas, is between 1 and 2 m [163].
Considering that the geomagnetic field is more strongly affected by H-beam buildings than by reinforced concrete buildings, Song et al. [164] proposed a geomagnetic-based indoor localization approach by leveraging the dependence of RSS on the type of building materials. Experiments demonstrated that their proposed method not only efficiently reduced database generation costs but also was faster.
Jang et al. [162] proposed a novel geomagnetic-based indoor localization system by using artificial neural network models. Their idea is to construct a sequence of geomagnetic fingerprints as the UE moves instead of using the ambiguous geomagnetic values. A recurrent neural network is then trained by the sequence of geomagnetic fingerprints. The experimental results showed that an average positioning error of 1.062 meters can be obtained. Other related works can be found in [165]- [167].
3) UWB-Based Positioning System: Ultra-wideband (UWB) systems transmit data by ultra-narrow pulses on the time scale of nanoseconds. Their accuracy can reach the centimeter level by virtue of the high bandwidth. However, UWB systems incur high power and hardware requirements [168]. Meanwhile, direct and first path identification problems are incurred by fading, especially in dense or NLOS indoor scenarios [169]. Existing UWB-based positioning methods in NLOS environments need to compensate for the range error by NLOS identification and mitigation, but they make some assumptions or require prior knowledge of the positioning scenario. To overcome this drawback, Yu et al. [122] proposed a less scenario-dependent and a priori knowledge-independent NLOS identification and mitigation method for positioning in harsh indoor environments. The RMS of absolute range errors after NLOS mitigation was reduced from the original 1.3 meters to 0.651 meters in their real office environment.
Recently, UWB was integrated with INS for accurate localization that may potentially spur various applications. Fan et al. [170] proposed an INS/UWB positioning method by using Kalman filter and outliers eliminating techniques. Experimental results showed that the mean square error is reduced by 24.25% as compared with the conventional KF methods. Other related UWB/INS positioning proposals can be found in [171].
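To give a feel for how a Kalman filter can combine the two technologies, the sketch below shows one predict/update cycle of a textbook scalar Kalman filter in which an INS displacement drives the prediction and a UWB position fix drives the correction. It is not the method of Fan et al. [170]; the one-dimensional state, the function name `kalman_step`, and the noise variances `q` and `r` are simplifying assumptions.

```python
from typing import Tuple


def kalman_step(
    x: float, p: float, displacement: float, q: float, z: float, r: float
) -> Tuple[float, float]:
    """One scalar Kalman cycle: INS-driven prediction, UWB-driven update.

    x, p:          previous position estimate and its variance
    displacement:  position change reported by the INS since the last step
    q:             variance of the INS displacement error (assumed known)
    z, r:          UWB position measurement and its variance (assumed known)
    """
    # Predict: dead-reckon with the INS, uncertainty grows by q.
    x_pred = x + displacement
    p_pred = p + q
    # Update: blend in the UWB fix according to the relative uncertainties.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new
```

The gain shows the complementarity in miniature: when the INS has drifted (large predicted variance) the UWB fix dominates, and when the UWB measurement is unreliable (large `r`, e.g., under NLOS) the filter leans on the INS prediction.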

4) Inertial Navigation Systems (INS): An inertial navigation system (INS) is an independent system whose core components, inertial measurement units (IMUs), consist of three orthogonal uniaxial accelerometers and three orthogonal gyroscopes that provide position, velocity, and pose measurements [82]. Three orthogonal linear accelerations are continuously measured by the triaxial accelerometer, and three orthogonal angular velocities are monitored by the three gyroscopes in the inertial reference frame [36]. Recent advances in electromechanical technology have enabled the miniaturization of sensors and cost reduction, thus popularizing INS. The most popular application of INS is tracking a UE. At each detected step of a user, the real-time estimate of the displacement obtained from IMU measurements is added to the previously estimated position to determine the current position. An error in the previously estimated position therefore propagates and results in the accumulation of errors [7], [8]. Kalman filtering [172] and particle filtering [86], [87], [173] are the main techniques involved in tracking UEs by utilizing their IMU measurements. Details of these techniques are provided in Section IV-C.
5) RFID-Based Positioning System: Radio frequency identification (RFID) positioning systems usually realize localization by writing, storing, and reading information in electronic tags embedded in positioning targets. RFID systems can be classified into active and passive systems, depending on whether the electronic tag has its own energy source. The propagation range of an active RFID signal can reach 30 m, longer than that of a passive one [174]. The fingerprinting positioning method can be used for active RFID based on the RSSI measurement [146]. Passive RFID positioning systems, which depend on inductive coupling, usually use the proximity detection method to achieve positioning. LANDMARC is the typical representative system based on RFID [175].
Aldin et al. [174] proposed a boundary virtual reference label algorithm to improve the positioning accuracy by inserting many virtual reference tags on the boundary. They built a linear regression model to eliminate the unwanted tag information from the estimated results. Their simulations showed that the positioning accuracy significantly increased. Readers are referred to other RFID-based indoor localization methods [146], [174], [176]- [178] for further details.
6) Cellular Network-Based Positioning System: As a mature communication technology, cellular networks can be used to locate mobile phones [78].
Ye et al. [179] first extracted the channel parameters from long-term-evolution (LTE) downlink signals by using a feature extraction method. Then, the radio channel fingerprints were collected for training a feed-forward neural network. Based on the trained neural network, the location of UEs can be predicted from the online testing signals. The experimental results showed that the proposed method can obtain a median error distance of 6 and 75 meters in indoor and outdoor environments, respectively, by using only one LTE eNodeB.
Fang and Lin [43] proposed a cooperative network positioning framework by combining cellular networks with other networks, such as WLAN, FM, and DVB. Based on the collected RSS measurements from different networks, they proposed two robust fusion positioning methods to overcome the drawbacks of standalone networks. Experimental results conducted at Yuanzhi University proved the efficiency of their proposed methods.
7) ZigBee-Based Positioning System: ZigBee is a low-power and short-range communication protocol based on the IEEE 802.15.4 standard. Positioning methods based on ZigBee perform quite well, with the reported accuracy reaching 2.1 m [180].
Li et al. [181] proposed an enhanced fingerprint based positioning method using ZigBee networks. The proposed method tries to fuse differential time difference of arrival and RSS by using random forests, and was validated by using a software defined radio (SDR) platform. The real data testing results showed that the proposed system achieves a 36.1% improvement in positioning accuracy as compared with some traditional RSS-based methods.
Fang et al. [42] proposed a ZigBee-based ensemble learning localization framework for indoor environments. As compared with the conventional methods, such as gradient-based search [182], multidimensional scaling (MDS) [183], and least squares (LS) [184], the proposed method achieves high accuracy by only using the RSSI of ZigBee networks.
8) Bluetooth-Based Positioning System: Bluetooth positioning systems are characterized by close range, low power, and low cost [185]-[187]. Bluetooth-based indoor positioning can provide an accuracy within 1 m [188], [189].
In the past few years, Bluetooth-based indoor localization has been extensively studied for IoT, mostly based on fingerprint-based techniques. Sikeridis et al. [190] proposed an unsupervised crowd-assisted learning approach enabling a location-aware facility by using the RSS of Bluetooth low energy beacons. They designed a three-layer location-aware infrastructure in which many moving clients can provide data from a sensing layer. The positioning algorithms are implemented in a cloud-based decision system. The proposed system has been verified to be efficient both in mobility tracking and UE localization. Other Bluetooth-based indoor positioning works can be found in [21], [191].
9) Visible Light Positioning System: Visible light positioning (VLP) has attracted much attention because illumination systems using LEDs can be deployed in any building and can be used to build indoor positioning systems without extra cost [10], [192]-[195].
Note that RSS [196], AOA [195], [197], TOA [10], and TDOA [16], [133] can also be used in VLP systems. Luxapose [198] is a well-known indoor positioning system using LED luminaires and camera. This system can determine the location and orientation of a smartphone by detecting the presence of the luminaires in the image captured by the smartphone. A demo was given in this work to show how Luxapose was built for localization.
Machine learning has also been studied for VLP. Guo et al. [33] proposed a visible light positioning method via machine learning and fusion. The RSS fingerprints were first constructed by using the peak values of the power spectral density of the received signals. Several machine learning classifiers were trained on these RSS fingerprints. Two fusion algorithms, namely, grid-independent least square (GI-LS) and grid-dependent least square (GD-LS), were proposed to weight the outputs of these classifiers. Experimental results showed that the probability of obtaining a mean positioning error of less than 5 cm with GD-LS is improved by 93.03% and 93.15% as compared with the RSS ratio and RSS matching methods, respectively.
Works like [192] integrate solar cells into garments at the shoulder level, where radiant energy from indoor building illumination is monitored by solar cells and utilized for localization. To sum up, each positioning system presents its own merits and drawbacks. Undoubtedly, combining the measurements from different networks can improve the accuracy of indoor positioning to some extent. We will survey these works in the next section.

C. Source Space of Homogeneous Positioning Systems
Fusion-based indoor localization systems realized in a standalone network, including situations where single or multiple measurements are applied, are classified as homogeneous positioning systems. In the following, we analyze and summarize the works according to whether a single measurement technology or multiple measurement technologies are used.
1) Single Measurement: Among the surveyed papers, RSS is the single measurement most often used for FBIP in homogeneous positioning systems because RSS is widely available in the different wireless positioning technologies discussed above. Other common measurements used in homogeneous positioning systems include AOA, TOA, TDOA, and CSI [9], [14], [20], [92].
2) Multiple Measurements: In a standalone positioning system, different kinds of measurements can be used to improve the localization accuracy by combining the advantages of different measuring techniques. For example, the combination of RSS and TOA in WSN has been proven to be efficient in meliorating localization performance. In [204], RSS measurements, modeled by Gaussian processes, are used for crude localization while TOA measurements are used for accurate estimation. Similarly, TOA and TDOA have been fused to solve the position estimation problem in GSM [78]. The first level fusion process converts raw TOA measurements into TDOA measurements. Then, weighted least-squares and ML are used to estimate the position. Based on Bayesian inference, position estimates from the TOA and TDOA estimators are combined at the second level fusion.
As shown in Table V, we summarize the source space of fusion-based indoor localization in homogeneous positioning systems from the view of network and related measurement technologies, of each surveyed paper.

D. Source Space of Heterogeneous Positioning Systems
Fusion-based indoor positioning systems realized in multiple networks with a single measurement technology are classified as heterogeneous positioning systems. From the surveyed papers, we ascertain that RSSs are widely utilized in heterogeneous positioning systems, since RSSs can be obtained from several different networks with relative ease. There are disparate ways of fusing data extracted from RSS measurements in different networks to refine localization performance [34], [41], [44].
Utilizing RSS measured in WLAN and Bluetooth networks comprehensively is also a meaningful means to improve position estimates. Aparicio et al. [35] selected a zone where the object of interest is supposed to be using Bluetooth. Then, WiFi signal measurements are used to refine the location estimate within an average error tolerance of 40cm. Rodionov et al. [213] fused the results from different networks including WLAN, Geomagnetism, and RFID based on Gauss-Markov theory. Other combinations of WiFi and magnetic information can be found in [26].
Combining RSS measurements from various networks such as WLAN, WSN, and GPS is another significant strategy for fusion-based indoor localization. For example, Fang and Lin [43] proposed two algorithms, namely, Direct Multi-Radio Fusion (DMRF) and Cooperative Eigen-Radio Positioning (CERP), to fuse the RSS measurements from GSM, DVB, FM, and WLAN for heterogeneous positioning systems. Readers are referred to other related works [214], [215] for further details.
Further, we summarize the aforementioned works in Table VI, detailing the single measurement technology and combination of networks for each work.

E. Source Space of Hybrid Positioning Systems
Here, multiple networks are harnessed for hybrid positioning systems with multiple different kinds of measurement technologies.
Amalgamating WLAN and INS is an effective practice in hybrid positioning systems. Fusing the RSS of WLAN and the PDR of INS is widely discussed in [36], [46]-[51]. Panyov et al. [46] proposed to fuse the positioning results provided by the RSS of WLAN and the PDR of INS via a Kalman filter. Experimental results demonstrated that an accuracy of up to 1.5 m can be achieved. In addition to WiFi and INS, Leppäkoski et al. [217] also incorporated map information for accurate indoor positioning. Other PDR and RSS based localization works in hybrid positioning systems are reported in [38], [218]-[221].
Wang et al. [222] proposed UnLoc, which combines the readings of accelerometers, WiFi measurements, and magnetometers via unsupervised clustering to obtain unique landmarks. Chen et al. [223] proposed a sensor fusion framework to combine WiFi, PDR, and landmarks for smartphone positioning. The integrated landmarks can be easily identified from the specific sensor patterns in their designed environments. They then solved the sensor fusion problem using Kalman filter. They were able to achieve a localization accuracy of 1m. Wu et al. [224] trained the conditional random field (CRF) based on the labeled RSSs of geomagnetism. During the localization phase, the magnetic field, step counter, and direction are combined with the map information for accurate location estimates.  Liu et al. [89] proposed VMag, a hybrid fusion method by combining the RSS of geomagnetic fields and visual images for their complementary nature. Experiments in four different indoor settings including a research laboratory, a garage, a canteen, and an office, demonstrated that 91% of the localization errors are within 0.85m. Chen et al. [88] fused the RSSs from WLAN and vision information from camera by employing particle filter technology. The reported localization accuracy could reach 2m.
Hartmann et al. [82] presented a hybrid indoor localization method by fusing the UWB system with INS by using a strap-down algorithm. Another typical positioning framework of hybrid positioning systems is illustrated in Fig. 8, from which different measurements from different sensors or networks can be fused to yield a better location estimate.
This section summarizes and categorizes the source space in FBIP under three main positioning systems: homogeneous, heterogeneous, and hybrid positioning systems. The sources of the three positioning systems in the surveyed papers are summarized in Table V, Table VI, and Table VII, respectively.
1) Data Extraction for Hybrid Positioning Systems: Hybrid positioning systems require measurements from diverse networks for localization. Ensuring that the data generated by these networks carry the same timestamps before being fused is essential for improving localization accuracy. Fog computing [238] can be employed in hybrid localization systems as a promising technology for such latency-sensitive applications. Best practices involve sending the data collected by IoT devices from different networks, such as RSS from WLAN, RSS from Bluetooth networks, sensor information from wireless sensor networks (WSNs), inertial information from IMUs, and geomagnetic data, to dedicated fog nodes for preprocessing, filtering of valid data, and grouping of data with the same timestamps before engaging cloud servers. This in turn saves bandwidth and reduces latency in hybrid positioning systems [239]. Fog nodes can also be equipped with computing resources to provide seamless localization services, instead of constantly engaging cloud servers.
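The timestamp grouping step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each reading arrives as a `(timestamp_s, source, value)` tuple and that readings falling into the same fixed-width window count as simultaneous; the function names, the source labels, and the 0.5 s window are hypothetical and do not come from the surveyed works.

```python
from collections import defaultdict


def bucket_by_time(records, window_s=0.5):
    """Group (timestamp_s, source, value) records into fixed-width time windows."""
    buckets = defaultdict(dict)
    for t, source, value in records:
        key = int(t // window_s)        # index of the window containing t
        buckets[key][source] = value    # the latest reading per source wins
    return dict(buckets)


def complete_buckets(buckets, required):
    """Keep only the windows in which every required source reported."""
    return {k: v for k, v in sorted(buckets.items()) if required.issubset(v)}


# Hypothetical readings from three sources (values are made up).
records = [
    (0.10, "wlan_rss", -61.0),
    (0.20, "ble_rss", -72.0),
    (0.30, "imu_heading", 1.57),
    (0.70, "wlan_rss", -63.0),
    (0.90, "imu_heading", 1.60),
]
aligned = complete_buckets(
    bucket_by_time(records), {"wlan_rss", "ble_rss", "imu_heading"}
)
print(aligned)
```

Only the first window is kept here, because the second one lacks a Bluetooth reading. A real fog node would more likely interpolate or hold the last value than discard the window.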
Different networks utilize different protocols for communication. To overcome the hurdles of transmitting data from different networks for fusion, software defined networking (SDN) technologies such as protocol oblivious forwarding (POF) and programming protocol-independent packet processors (P4) can be adopted, since they provide the flexibility to transfer data over different networks irrespective of the underlying protocols [240], [241]. Reference [242] proposed a novel protocol known as the indoor localization protocol (ILP), with a simple packet structure for aggregating data and transferring them from one network to another. The protocol is simple and can be modified as needed by each wireless stack to fit its approach to meshing and routing. In situations where all these networking technologies are available on a single device such as a smartphone, JavaScript object notation (JSON) provides the flexibility to transmit the data extracted from the embedded WLAN module, Bluetooth module, GSM module, IMU, etc. to cloud-based servers for processing [243], [244].

IV. ALGORITHM SPACE
In the above section, we described various combinations of positioning sources under homogeneous, heterogeneous, and hybrid positioning systems. Different localization algorithms can be applied under these networks to yield a reliable location estimate. Several reviews have summarized the related positioning algorithms. Seco et al. [245] divided the indoor positioning methods into four categories: geometry-based methods, minimization of cost functions, fingerprinting, and Bayesian techniques. Basri    Here, we summarize the state-of-the-art localization algorithms into three groups: conventional methods, machine learning methods, and state estimate methods, based on whether the positioning model is static or dynamic. The taxonomy of localization algorithms is illustrated in Fig. 9.

A. Conventional Methods
The conventional localization methods can be divided into Bayesian and non-Bayesian methods [76]. The non-Bayesian methods, such as least squares and maximum likelihood, treat the target position $x$ as an unknown deterministic parameter. In contrast, the Bayesian methods, such as minimum mean square error and maximum a posteriori, consider the target position as a realization of a random variable $x$ with a prior distribution $p_x(x)$. Below are four popular estimators.

1) Least Squares (LS):
LS is a standard method for solving an overdetermined system. The fundamental idea is to minimize the sum of the squared residuals of the equations to obtain an approximate solution. When the observations are redundant, LS estimation can be used to obtain a unique answer. Measurements such as AOA, TOA, TDOA, and RSS, or their combinations, can be processed by LS methods [184], [247]-[251]. A conventional LS-based problem can be modeled as [111]

$$\hat{z}_{\mathrm{LS}} = \arg\min_{z} \, \| s - g(z) \|^2,$$

where $z$ is the location to be estimated and $s = [s_1, s_2, \ldots, s_M]^T$ is the measurement vector. The measurement vector can further be written as $s = g(z) + \varepsilon$, where $g(z)$ is a function of the user location $z$, such as a path-loss model [110], and $\varepsilon$ is the measurement error. Unlike the conventional LS methods, which treat each equation equally, Yang and Wang [252] proposed a residual-based weighted least squares method that utilizes two groups of residuals to evaluate the credibility of measurements by considering the different importance of the reference nodes. Other related works, such as weighted LS and two-step weighted LS, are reported in [184], [253].
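As a hedged illustration of LS positioning, the sketch below solves a range-based (e.g., TOA-derived) problem. Instead of minimizing the nonlinear cost directly, it uses the common linearization that subtracts the first range equation from the others, which turns the problem into an overdetermined linear system. The function name, anchor layout, and noise-free ranges are assumptions made for the example, not taken from the cited works.

```python
import numpy as np


def ls_multilateration(anchors, ranges):
    """Linearized LS position from anchor coordinates (M x 2) and ranges (M,)."""
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    ref, d_ref = anchors[0], ranges[0]
    # |z - a_i|^2 = d_i^2; subtracting the reference equation removes |z|^2:
    # 2 (a_i - a_0)^T z = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2
    A = 2.0 * (anchors[1:] - ref)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(ref ** 2)
         - ranges[1:] ** 2 + d_ref ** 2)
    z_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z_hat


# Hypothetical setup: four anchors at the corners of a 10 m x 10 m room.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - true_pos, axis=1)  # noise-free ranges
print(ls_multilateration(anchors, ranges))
```

With noisy ranges the same call returns the least-squares compromise among the redundant equations. A weighted variant would scale the rows of `A` and `b` by the credibility of each reference node.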

2) Maximum Likelihood (ML):
Given an observation, ML is a probabilistic method of estimating the parameters of a statistical model. The ML estimator tries to maximize the following likelihood function [249]:

$$\hat{z}_{\mathrm{ML}} = \arg\max_{z} \, p(s \mid z),$$

where $p(s \mid z)$ can be approximated by parametric distributions including multidimensional Gaussian distributions, Laplacian distributions, and others [40], [70], [254]. The ML estimator requires knowledge of the conditional probability density function of the observation $s$.
Combining the measurements from different sensors can improve the performance of the ML estimator. Ayllón et al. [236] combined the local range with angle estimates by using the ML estimator. Without requiring any reference nodes and any prior synchronization between nodes, the localization error ranges from 13cm to 31cm in their experiments. Chen et al. [36] employed an ML algorithm to combine WiFi with PDR without inputting the user's initial information in advance. Sun et al. [51] proposed MoLoc, a MOtion-assisted indoor LOCalization method using ML positioning estimation, which can explore the potential of leveraging user motion against fingerprint ambiguity. The reported localization error in a large office hall is less than 1m.
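A minimal sketch of an ML estimator over a fingerprint grid follows. It assumes an independent Gaussian model for $p(s \mid z)$ with the same standard deviation for every AP, and a small hypothetical radio map. The function name, the value of `sigma`, and all RSS values are illustrative only.

```python
import numpy as np


def ml_grid_estimate(s, grid_xy, mean_rss, sigma=4.0):
    """Return the grid point maximizing a Gaussian likelihood p(s | z)."""
    s = np.asarray(s, dtype=float)
    resid = np.asarray(mean_rss, dtype=float) - s            # (K, M) residuals
    log_lik = -0.5 * np.sum((resid / sigma) ** 2, axis=1)    # constants dropped
    best = int(np.argmax(log_lik))
    return np.asarray(grid_xy, dtype=float)[best], log_lik


# Hypothetical radio map: 3 grid points, mean RSS (dBm) from 3 APs.
grid_xy = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
mean_rss = [[-50, -70, -80], [-70, -50, -75], [-72, -74, -52]]
z_hat, log_lik = ml_grid_estimate([-68, -53, -77], grid_xy, mean_rss)
print(z_hat, log_lik)
```

Because the Gaussian log-likelihood is a negative scaled squared distance, this particular model picks the same grid point as nearest-neighbor matching. Other parametric choices, such as Laplacian distributions, would change the ranking.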

3) Maximum A Posteriori (MAP):
MAP obtains point estimates of hard-to-observe quantities from empirical data. In ML, the prior probability of the model parameter is implicitly taken to be uniform, i.e., a fixed value, whereas MAP weights the likelihood by a prior. The MAP estimate can therefore be seen as a regularized ML estimate. The MAP estimator finds the value of $z$ with the maximum posterior probability as follows [55]:

$$\hat{z}_{\mathrm{MAP}} = \arg\max_{z} \, p(z \mid s) = \arg\max_{z} \, p(s \mid z)\, p(z).$$

Lu et al. [54] showed that the MAP estimator can reduce the average localization error as compared with the LS approach. In their paper, the target environment is first divided into grids of the same size, and the probability density of the target at each grid point is deduced by Bayesian criteria. Then, each target takes the grid point with the maximum probability density as its location estimate. Kok et al. [55] proposed an MAP-based approach to combine measurements from inertial sensors with TOA measurements from a UWB system for indoor positioning. The UWB measurements are modeled by a tailored heavy-tailed asymmetric distribution to account for measurement outliers. The reported accuracies in position and orientation are 3 cm and 1°, respectively.
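The relation between ML and MAP on a grid can be sketched as follows: the log-posterior at each grid point is the log-likelihood plus the log-prior, so a uniform prior leaves the ML decision unchanged while an informative prior can shift it. The function name and the numerical values are hypothetical.

```python
import numpy as np


def map_grid_estimate(log_lik, prior):
    """Index of the grid point maximizing log p(s|z) + log p(z)."""
    log_lik = np.asarray(log_lik, dtype=float)
    prior = np.asarray(prior, dtype=float)
    with np.errstate(divide="ignore"):      # log(0) = -inf excludes a grid point
        log_post = log_lik + np.log(prior)  # unnormalized log-posterior
    return int(np.argmax(log_post))


log_lik = [-1.0, -1.2, -5.0]                # made-up log-likelihoods for 3 grid points
uniform = [1 / 3, 1 / 3, 1 / 3]
informative = [0.1, 0.8, 0.1]               # e.g., a prior from the previous position

print(map_grid_estimate(log_lik, uniform))      # same decision as ML
print(map_grid_estimate(log_lik, informative))  # the prior shifts the decision
```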

4) Minimum Mean Square Error (MMSE):
The MMSE estimator minimizes the statistical average of the positioning errors as follows [44]:

$$\hat{z}_{\mathrm{MMSE}} = \arg\min_{\hat{z}} \, \mathbb{E}\big[\| z - \hat{z} \|^2\big].$$

MMSE makes use of range estimates derived from measurements between a UE and reference nodes. Gwon et al. [44] adopted the MMSE algorithm for weights training and calibration. In the offline phase, each input branch transmits its own information and is weighted individually. MMSE learns the same weights at all grid points for the position estimate, which is simple and easy to implement but very sensitive to dynamic environments. Zhang et al. [255] proposed an MMSE based hybrid positioning algorithm by combining the measurements from inertial sensors and UWB. The proposed algorithm can be implemented with a single anchor and moderate calibration. The reported improvement in accuracy is 47.2% as compared with a pure inertial solution. Other MMSE based indoor positioning methods are reported in [56], [256]. The characteristics of the above four conventional positioning methods are summarized in Table VIII for comparison.

B. Machine Learning Methods
Position estimation as a machine learning problem is actually based on the measurement samples collected at known locations to model how the positioning information is distributed in different geographical areas. This is also the reason why machine learning methods have been widely used for fingerprint-based positioning. The machine learning methods for indoor positioning can be categorized into two groups: classification and regression.
Whether the problem is posed as classification or regression, the machine learning based indoor positioning methods train a machine learning algorithm as a predictor to yield the location or label prediction, which can be written as

$$\hat{c} = f(s, D), \tag{9}$$

where $D$ is the offline training data for classifier or regression model training and $s$ is the online testing measurement. $f(\cdot)$ is the classification/regression function, which can be KNN, SVM, NN, random forests, or AdaBoost, to name a few. For the classification problem, $\hat{c}$ is the predicted label, which can be mapped to the location estimate $\hat{z}$ based on the prestored map information [27]. For the regression problem, $\hat{c}$ is the estimated location of the UE and does not need to be transformed further [259].
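The two readings of the predictor in (9) can be illustrated with a small KNN sketch on a hypothetical fingerprint database: the classification output is the majority grid label among the $k$ nearest fingerprints, while the regression output is the mean of their coordinates. The function name and the toy data are assumptions for illustration.

```python
import numpy as np


def knn_predict(s, train_rss, train_labels, train_xy, k=3):
    """Return (majority grid label, mean coordinate) of the k nearest fingerprints."""
    dist = np.linalg.norm(
        np.asarray(train_rss, dtype=float) - np.asarray(s, dtype=float), axis=1
    )
    idx = np.argsort(dist)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[idx], return_counts=True)
    c_hat = labels[np.argmax(counts)]                             # classification
    z_hat = np.asarray(train_xy, dtype=float)[idx].mean(axis=0)   # regression
    return c_hat, z_hat


# Hypothetical offline data D: two grid points, two fingerprints each.
train_rss = [[-50, -70], [-52, -69], [-70, -50], [-69, -52]]
train_labels = [0, 0, 1, 1]
train_xy = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0]]

c_hat, z_hat = knn_predict([-51, -70], train_rss, train_labels, train_xy, k=3)
print(c_hat, z_hat)
```

In the classification case the label still has to be mapped to a coordinate through the stored grid map, whereas the regression output is already a location.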
Numerous machine learning based indoor positioning algorithms were proposed in the past few years. Most of them are used as classifiers in indoor positioning, such as deep learning [92], neural networks [58], and others [32], [260]; few works focus on the regression problem [64]. It has been proven that machine learning methods outperform the conventional indoor positioning methods in mitigating the fluctuation of RSS in CEEs [31].
Most of the existing machine learning algorithms yield the predicted label by finding the grid index with the highest probability. This choice is not always reliable because the label with the highest probability may be wrong owing to the fluctuation of received signals in a complex indoor environment [159]. Guo et al. [30] proposed an unsupervised fusion localization method based on an extended candidate location set (UFL-ECLS) to overcome this drawback. UFL-ECLS first collects an extended candidate location set by finding, for each classifier, the locations whose prediction probability is greater than a certain threshold. A joint optimization of the weights and the location of the user is then derived for unsupervised fusion. UFL-ECLS does not need weights training and storage in the offline phase, and can yield high accuracy in changing environments. We have summarized the performance of some existing machine learning methods in indoor positioning in Table IX for comparison.

C. State Estimate Methods
State estimate is used to estimate the state of targets, which is also referred to as a tracking technology. The state estimate is a common phase of fusion-based positioning because it also explores the measurements from different sensors or different networks to yield a position estimate. In this subsection, we will introduce four popular state estimate methods including hidden Markov model (HMM), Kalman filter (KF), extended Kalman filter (EKF), and particle filter (PF).
1) Hidden Markov Model (HMM): HMM can be applied to compute the probability of a given observed sequence in indoor positioning [67], [76]. A combination of WiFi measurements and motion information [50], [68], [69], [144] is also widely adopted in the design of HMM. Liu et al. [69] adopted the Weibull function to model the distribution of the signal strength over time in order to mitigate the signal strength variation and reduce the required number of training samples. Zheng et al. [270] proposed an adaptive transferred HMM (TrHMM) model to reduce the calibration burden of fingerprints in changing indoor environments. The TrHMM model was validated with real data, showing improved localization accuracy and reduced calibration effort.

2) Extended Kalman Filter (EKF):
The EKF has gained popularity because it can cope with nonlinear and non-Gaussian problems [271]. However, EKF usually approximates the observed signal distribution with a Gaussian distribution and does not consider potential variables in the state linearization process [76]. The computations of Jacobians are extremely expensive in EKF.
EKF has been successfully applied for indoor positioning especially for the fusion of hybrid positioning measurements. The most common positioning application is to combine PDR and other positioning systems, such as UWB and PDR [82], WiFi and PDR [48], [83], [84] or the combination of multiple systems [85].
EKF can also be used to integrate different positioning information from stand-alone networks [14], [206]. Zhang et al. [14] introduced a novel indoor positioning method for TDOA-based ultrasound source localization. The proposed method simultaneously uses EKF and robust EKF to restrain the measurement noise in both LOS and NLOS environments. Experiments conducted in a factory building with a size of $10 \times 12$ m$^2$ showed that the proposed method outperforms other commercially available systems.
3) Particle Filter (PF): PF is a recursive implementation of the sequential Monte Carlo method. The basic idea is to replace the integral operation with a set of samples that approximate the posterior probability in order to obtain a final state estimate [87]. As the most widely used filter in FBIP, PF can describe any probability distribution. As long as a sufficient number of particles is used, PF can adapt to non-Gaussian, nonlinear problems and converge to the true posterior probability [173]. However, the larger the number of particles, the higher the computational complexity of PF [86].
Fusing heterogeneous sources from different networks/sensors by PF has been widely studied. The prevalent approach is the fusion of WiFi fingerprinting and inertial localization [86], [87], [272], whose accuracy can be improved by adding map constraints to PF [39], [143], using the local discernibility of magnetic signals [38], combining vision information [88], integrating several of the above positioning measurements [26], and improving the filter operation [173], [216]. As compared to KF and its variants, walking distance and map information can be directly integrated in PF by using a nonlinear prediction function. The map information is effective auxiliary information that can be used to remove impossible particles, for example, by setting the weights of these particles to zero when the target exceeds certain bounds [39], [69].
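The predict, update, map-constraint, and resample cycle described above can be sketched as a bootstrap particle filter. The sketch assumes a PDR displacement as the motion input, a WiFi position fix with Gaussian error as the observation, and a rectangular walkable area as the map. All names, noise levels, and the simulated walk are hypothetical.

```python
import numpy as np


def pf_step(particles, weights, step, wifi_fix, bounds, rng,
            motion_std=0.2, wifi_std=2.0):
    """One predict/update/resample cycle of a bootstrap particle filter."""
    n = len(particles)
    # Predict: move each particle by the PDR displacement plus noise.
    particles = particles + step + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: Gaussian likelihood of the WiFi position fix.
    d2 = np.sum((particles - wifi_fix) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / wifi_std ** 2)
    # Map constraint: particles outside the walkable rectangle get zero weight.
    (xmin, ymin), (xmax, ymax) = bounds
    inside = ((particles[:, 0] >= xmin) & (particles[:, 0] <= xmax) &
              (particles[:, 1] >= ymin) & (particles[:, 1] <= ymax))
    weights = weights * inside
    total = weights.sum()
    weights = weights / total if total > 0 else np.full(n, 1.0 / n)
    estimate = weights @ particles
    # Resample (multinomial) so that the particle set follows the posterior.
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n), estimate


def run_demo(seed=0, n=500, n_steps=10):
    """Simulated straight walk in a 10 m x 10 m room with noise-free WiFi fixes."""
    rng = np.random.default_rng(seed)
    bounds = ((0.0, 0.0), (10.0, 10.0))
    particles = rng.uniform(0.0, 10.0, size=(n, 2))
    weights = np.full(n, 1.0 / n)
    truth = np.array([2.0, 2.0])
    step = np.array([0.5, 0.0])
    estimate = None
    for _ in range(n_steps):
        truth = truth + step
        particles, weights, estimate = pf_step(
            particles, weights, step, truth, bounds, rng)
    return estimate, truth


estimate, truth = run_demo()
print(estimate, truth)
```

The cost of each cycle grows linearly with the number of particles, which reflects the trade-off between accuracy and computation noted above.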

4) Brief Summary of State Estimate Methods:
The state estimation methods are mainly used to solve the sensor fusion problem for indoor positioning [50], [66], [67]. We summarize the algorithms mentioned in the above section in Table X. In comparison, HMM does not use any deterministic model to limit the user's motion, and so it has been widely used to track targets in CEEs [144]. KF shows better performance for linear systems with Gaussian noise. EKF relaxes the assumptions of KF at the cost of a slightly increased computational burden. PF is the most widely used but incurs intensive computation.
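For the linear-Gaussian case in which KF performs well, a minimal sketch of one fusion cycle is given below. It assumes a position-only state, a PDR displacement as the control input (state transition equal to the identity), and a WiFi position fix that observes the state directly. The function name and the covariance values are hypothetical.

```python
import numpy as np


def kf_fuse(x, P, pdr_step, wifi_fix, Q, R):
    """One KF cycle: predict with a PDR displacement, correct with a WiFi fix."""
    # Predict: the position is shifted by the PDR displacement (F = I).
    x_pred = x + pdr_step
    P_pred = P + Q
    # Update: the WiFi fix observes the position directly (H = I).
    K = P_pred @ np.linalg.inv(P_pred + R)
    x_new = x_pred + K @ (wifi_fix - x_pred)
    P_new = (np.eye(len(x)) - K) @ P_pred
    return x_new, P_new


x0 = np.array([0.0, 0.0])
P0 = 0.5 * np.eye(2)      # initial position covariance (m^2)
Q = 0.5 * np.eye(2)       # PDR step noise covariance (assumed)
R = 1.0 * np.eye(2)       # WiFi fix noise covariance (assumed)

x1, P1 = kf_fuse(x0, P0, np.array([1.0, 0.0]), np.array([1.2, 0.1]), Q, R)
print(x1, P1)
```

With these values the predicted and measured positions are equally uncertain, so the gain is 0.5 and the fused position lies halfway between them.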

V. WEIGHTS SPACE
Assume that we have a set of location estimates $\{z_{11}, z_{12}, \ldots, z_{MN}\}$ obtained from multiple sources or multiple algorithms introduced in Sections III and IV, where $z_{mn} = f_n(s_m)$ denotes the location estimate obtained from the $m$-th measurement by using the $n$-th algorithm. These results are then combined/weighted as $x = \sum_{m=1}^{M} \sum_{n=1}^{N} w_{mn} z_{mn}$. Note that the location estimates here can be either continuous-valued (e.g., 2-D coordinates) or discrete-valued (e.g., grid points). The key problem is to achieve good fusion performance by selecting optimal weights.
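The weighted combination above amounts to a double sum over sources and algorithms. A minimal sketch, assuming 2-D coordinate estimates stored in an array of shape `(M, N, 2)` and weights that are normalized to sum to one (a normalization chosen here for illustration), is:

```python
import numpy as np


def fuse_estimates(z, w):
    """Weighted sum of location estimates z[m, n, :] with weights w[m, n]."""
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # normalize so that the weights sum to 1
    return np.einsum("mn,mnd->d", w, z)   # sum over sources m and algorithms n


# Hypothetical estimates from M = 2 sources and N = 2 algorithms.
z = [[[0.0, 0.0], [2.0, 0.0]],
     [[0.0, 2.0], [2.0, 2.0]]]

print(fuse_estimates(z, np.ones((2, 2))))       # equal weights: centroid
print(fuse_estimates(z, [[1, 0], [0, 0]]))      # all weight on one estimate
```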
Each positioning technique possesses certain advantages for different application scenarios, and a combined location estimate can exploit the complementary advantages of the estimates made by the individual positioning technologies. Positioning techniques or technologies can thus be weighted according to their advantages so that they complement each other. Determining the weights is therefore a critical issue in fusion-based positioning, because leveraging different positioning results from multiple sources or multiple algorithms can help in achieving promising localization performance. There are two strategies to acquire weights: supervised learning and unsupervised learning. Supervised learning attempts to learn the weights by using the labeled data in the offline phase. Alternatively, unsupervised learning learns the weights by using online data directly [156].

A. Supervised Weights Learning
In supervised learning, we focus on how to estimate the performance of different algorithms to obtain reasonable weights when training data are available. Most of the existing supervised weights learning methods are based on minimizing positioning errors using available training data in the offline phase [27], [31], [34], [40]-[42], [44], [106], [156], as depicted in Fig. 10. There are two key components in supervised weights learning: weights learning and weights selection. Both of them determine the accuracy of FBIP. Therefore, we detail them as follows:
1) Weights Learning Methods: Desired weights should fully reflect the intrinsic complementarity among fused information. The existing weights calculation is mainly based on minimization of the positioning errors [31]-[33], [41], [42], [44] and maximization of the source efficiency [274]. Fang et al. [41] proposed to search the weight of the $n$-th positioning algorithm for the $k$-th grid point by minimizing the average positioning error over the $L$ training samples and $M$ sources, which can be represented as follows:

$$\hat{\omega}_n^k = \arg\min_{\omega} \sum_{l=1}^{L} \sum_{m=1}^{M} e\big(f_n(s_m(l)) \mid \omega\big), \tag{10}$$

where $e(f_n(s_m(l)) \mid \omega)$ is the localization error of the $n$-th function, the $m$-th source, and the $l$-th sample with the weight $\omega$, and $M$ is the number of sources. After all the weights of the multiple fingerprint functions have been obtained sequentially, they are normalized such that $\sum_{n=1}^{N} \hat{\omega}_n^k = 1$.

Fig. 11. The weights assignment using Eqs. (10) and (12).
Since the weight computing strategy in (10) is optimal only for each individual algorithm, it cannot fully exploit the intrinsic complementarity among fingerprint functions. Hence, Guo et al. [27] proposed to jointly optimize the average positioning error for all algorithms simultaneously, namely, knowledge aided adaptive localization (KAAL), i.e.,

$$w^k = \arg\min_{w} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} e\big(f_n(s_m(l)) \mid w\big) \quad \text{s.t.} \quad \mathbf{1}^T w = 1, \tag{12}$$

where $w^k = [w_1^k, w_2^k, \ldots, w_N^k]^T$ is the vector of weights of all the algorithms at the $k$-th grid point, $\mathbf{1}$ is an $N \times 1$ all-one vector, and $e(f_n(s_m(l)) \mid w)$ is the positioning error given by the $l$-th sample of the $m$-th source using the $n$-th algorithm. This weights computing strategy achieves better accuracy and robustness than DFC [41] because it searches for the weights in all algorithm and source spaces. Fig. 11 depicts the weights assignments by using DFC and our proposed KAAL method, which indicates that the weights assigned by KAAL differ more from one another than those assigned by DFC. Similar works can be found in [31], [156].
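To make the idea of jointly optimized weights concrete, the sketch below solves a sum-to-one constrained least-squares problem in closed form through its KKT system. It is not the KAAL algorithm of [27]: the squared-error cost, the use of training samples from a single grid point, and the absence of a nonnegativity constraint are all assumptions made for this example, and the function name is hypothetical.

```python
import numpy as np


def joint_weights(estimates, truth):
    """Weights w minimizing sum_l ||truth_l - sum_n w_n est_{l,n}||^2 with sum(w) = 1.

    estimates: (L, N, 2) positions per training sample and algorithm.
    truth:     (L, 2) surveyed positions of the training samples.
    """
    est = np.asarray(estimates, dtype=float)
    L, N, D = est.shape
    # Stack samples and coordinates as rows, algorithms as columns.
    E = est.transpose(0, 2, 1).reshape(L * D, N)
    t = np.asarray(truth, dtype=float).reshape(L * D)
    # KKT system of the equality-constrained least-squares problem.
    kkt = np.zeros((N + 1, N + 1))
    kkt[:N, :N] = 2.0 * E.T @ E
    kkt[:N, N] = 1.0
    kkt[N, :N] = 1.0
    rhs = np.concatenate([2.0 * E.T @ t, [1.0]])
    sol = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return sol[:N]                       # the last entry is the Lagrange multiplier


# Hypothetical training data: two algorithms with opposite biases of 1 m.
truth = np.array([[1.0, 1.0], [2.0, 3.0], [4.0, 0.5]])
bias = np.array([1.0, 0.0])
estimates = np.stack([truth + bias, truth - bias], axis=1)   # shape (3, 2, 2)

w = joint_weights(estimates, truth)
print(w)
```

Because the two biases cancel, the joint solution assigns equal weights. A per-algorithm search that looks at each algorithm in isolation cannot detect this kind of complementarity.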
Instead of training different weights for different grid points, Fang et al. [42] proposed to train different weights for different anchors by minimizing average positioning errors in the offline phase. The final weights used for fusion are chosen from the anchor with the maximum signal strength in the online phase. It outperforms other conventional positioning methods, such as LLS and MDS, in accuracy and robustness. Different from the above methods, Gwon et al. [44] trained the same weights for all grid points based on the minimum mean square error (MMSE) criterion that is simple to implement and can avoid the weights selection problem [27], [41]. However, it shows poor performance in changing environments. It is effective in VLC localization because the signal strength is more stable than that in WLAN environments [33].
Another way of calculating the weights is to maximize source efficiency. Taniuchi and Maekawa [274] built a stable positioning model by integrating multiple weak classifiers for WiFi-based indoor positioning. Each weak classifier performs location estimation through a randomly selected access point (AP) set, and sets the corresponding weights according to the validity of the APs. Three indicators are defined to evaluate the importance of an AP: mean of signal strengths, observation frequency, and variance of signal strengths. The first two metrics reflect the efficiency of the AP, and the last metric stands for its robustness. The key idea of this weights assignment strategy is that weak classifiers trained with data from high-quality APs are expected to perform better than those trained with data from poor APs.
The fusion sources used by most of the supervised fusion methods are RSS measurements [27], [31], [41], [44], [274] or RSS related measurements [33], [42], [158]. Some RSS measurements are from the same positioning system [31], [158] and some are from different positioning systems [41], [44]. In summary, the accuracies of the fusion methods using the RSS measurements from the same positioning system are lower than those using the RSS measurements from different positioning systems. The fusion of multiple classifiers and multiple fingerprints can potentially achieve the highest accuracy [158].
2) Weights Selection Methods: Note that not all weighted fusion methods need weights assignment; for example, the MMSE method [44] trains the same weights for all grid points, and so it does not need to assign weights to different grid points. Other fusion methods, such as DFC [41], KAAL [27], and others [27], [31], [33], [42], [43], should consider how to select the proper weights for fusion in the online phase.
As mentioned above, Fang et al. [42] trained different weights for different anchor nodes and selected the weight of the anchor with the maximum signal strength. Comparatively, Li et al. [207] adopted the nearest neighbor rule to find an appropriate weight by directly matching the distance between the online observations and the offline collected signals. However, this positioning strategy is susceptible to the influence of dynamic factors in indoor environments as the data distribution changes, thus resulting in poor positioning accuracy.
Two advanced strategies have been proposed to reasonably select weights for better fusion. One is to select the weights based on the average of the outputs of multiple algorithms [27], and this can improve the accuracy of the weights selection to some extent, but it is often limited by the performance of the worst algorithm. Another alternative technique is based on the output of the best algorithm, i.e., we can select the weights of the grid point predicted by the best algorithm because it is possible to determine the best one given some trained data in the offline phase [27], [31].

B. Unsupervised Weights Learning
In contrast to supervised weights learning, unsupervised weights learning simply exploits the online measurements to calculate the weights, does not need to train and store the weights in the offline phase, and is thus more attractive in actual positioning environments. Furthermore, it is more robust to changing environments as compared with supervised weights learning. There are two typical unsupervised weights learning methods: conventional unsupervised weights learning and truth discovery.
1) Conventional Unsupervised Weights Learning: Given online measurements, the conventional unsupervised weights learning methods learn the weights by some rules, such as Best Linear Unbiased Estimate (BLUE) [34], majority voting [275], etc.
In indoor positioning, simple averaging cannot depict the intrinsic complementarity among different positioning results because it assumes that the errors of different position estimates are uncorrelated. Hu et al. [93] calculated the weighted mean of results to fuse PDR with WiFi via setting a coefficient to combine measurements, where the coefficient is proportional to the absolute distance between the results from PDR and WiFi. Guo and Ansari [32] proposed to estimate the most credible positioning result by evaluating the occurrence of positioning results. However, more testing samples are required to yield a better estimate. Besides, majority voting [32], [275] is a widely used fusion method in combining positioning results without prior knowledge about information sources. In the majority voting method, the final result is the one with the most votes. It performs poorly in changing environments because it neglects the quality of each positioning algorithm [29].
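Majority voting over grid labels can be sketched in a few lines. The example reuses the grid indices quoted later in this section for Fig. 12 (predictions 50, 50, and 46 when the true index is 40) to show why ignoring the quality of each algorithm can be harmful. The function name is hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the grid label predicted by the largest number of algorithms."""
    counts = Counter(labels)
    # On ties, most_common keeps the label that was encountered first.
    return counts.most_common(1)[0][0]


predictions = [50, 50, 46]          # outputs of three classifiers
print(majority_vote(predictions))   # the vote is 50 even though the true index is 40
```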
Wang and Wong [34] proposed a BLUE-based fusion positioning algorithm to efficiently combine all the estimates from different algorithms. The new estimate can differentiate the correlated and uncorrelated positioning results. They concluded that the performance of the proposed fusion positioning algorithm increases as the correlation between each positioning result decreases. It was reported that more than 20 percent reduction in the mean distance error can be achieved by using the proposed fusion method.
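To make the role of correlation concrete, the following sketch computes the textbook BLUE (minimum-variance unbiased linear) combination of several position estimates. It is not the exact algorithm of [34]. It assumes that every estimate is unbiased, that the error covariance matrix across algorithms is known or has been estimated beforehand, and that the same covariance applies to both coordinates; the names `blue_weights` and `blue_fuse` are hypothetical.

```python
import numpy as np


def blue_weights(error_cov):
    """Weights w = C^{-1} 1 / (1^T C^{-1} 1) for error covariance C.

    The weights sum to one, so the fused estimate stays unbiased
    when every individual estimate is unbiased.
    """
    error_cov = np.asarray(error_cov, dtype=float)
    ones = np.ones(error_cov.shape[0])
    c_inv_ones = np.linalg.solve(error_cov, ones)
    return c_inv_ones / c_inv_ones.sum()


def blue_fuse(estimates, error_cov):
    """Fuse K position estimates (shape K x 2) into one position."""
    estimates = np.asarray(estimates, dtype=float)
    return blue_weights(error_cov) @ estimates


# Two algorithms with uncorrelated errors and variances 1 and 4.
fused = blue_fuse([[0.0, 0.0], [10.0, 10.0]], [[1.0, 0.0], [0.0, 4.0]])
```

With a diagonal covariance the weights reduce to inverse-variance weighting; off-diagonal terms shift weight away from estimates whose errors move together.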
2) Truth Discovery Methods: Truth discovery is a data mining method used in text classification and other big data applications. The main idea is to find the most credible positioning result among the multiple candidate positioning results. In this case, the weight for the most credible positioning result can be considered as 1, while the weights of the other positioning results can be regarded as 0 [145].
Guo et al. [29] proposed an expectation-maximization method for accurate indoor positioning. It can estimate the location of the UE and the fingerprint quality simultaneously. The proposed method achieves a mean RMSE of 2.51 m in a real office environment and outperforms other related methods. The key contribution of this work is a knowledge evaluation model, which can be used in transfer learning to alleviate the burden of fingerprint construction.
Note that conventional machine learning methods are not robust in the presence of multipath and changing environments, i.e., the location estimates predicted by conventional machine learning methods may be wrong in changing indoor environments, as shown in Fig. 12. The index of the true location of the UE is 40, while the predictions of SVM, LR, and KNN are 50, 50, and 46, respectively, owing to the fluctuation of RSS measurements. To overcome this drawback, Guo et al. [30] proposed unsupervised fusion localization based on an extended candidate location set (UFL-ECLS) to estimate the weights by empowering the existing machine learning methods. UFL-ECLS iteratively updates the weights and the location of the UE by minimizing the positioning errors. Experimental results showed that UFL-ECLS can reduce the 67th percentile RMSE by 21.6%, 21.4%, and 16.5% as compared with MMSE [44], DFC [41], and KAAL [27], respectively.
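The idea of alternately updating the weights and the location can be illustrated with a generic truth-discovery-style iteration. This sketch is not UFL-ECLS and does not reproduce the objective of [30]; the inverse-squared-error weight update, the iteration count, and the function name `iterative_weighted_fusion` are assumptions chosen only to show the alternating structure.

```python
import numpy as np


def iterative_weighted_fusion(candidates, n_iter=20, eps=1e-6):
    """Alternate between a weighted-mean location and a weight update.

    candidates: K candidate positions (shape K x 2), one per algorithm.
    Returns the fused location and the final normalized weights.
    """
    candidates = np.asarray(candidates, dtype=float)
    k = candidates.shape[0]
    weights = np.full(k, 1.0 / k)  # start from simple averaging
    for _ in range(n_iter):
        # Location step: weighted mean of the candidates.
        location = weights @ candidates
        # Weight step: candidates far from the current location lose weight.
        sq_err = np.sum((candidates - location) ** 2, axis=1)
        weights = 1.0 / (sq_err + eps)
        weights /= weights.sum()
    return weights @ candidates, weights


# Two candidates that agree closely and one outlier (coordinates in meters).
location, weights = iterative_weighted_fusion(
    [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]]
)
```

In this small example the outlier ends up with the smallest weight and the fused location stays near the two agreeing candidates. With this particular update rule the weights tend to concentrate on one candidate, which resembles the winner-take-all view of truth discovery described above.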
The sources used in unsupervised fusion are mainly RSS measurements [29], [30], [34] and other statistics [28], [32]; aided by other measurements, such as PDR, they can achieve more accurate results [49], [276]. Expectation-maximization [29] and convex optimization [49], [276] are the two main strategies for more accurate unsupervised fusion positioning. The former can intelligently estimate the locations of targets because it yields the source quality estimate simultaneously. The latter can work well with a small number of samples, but it imposes a heavy computational burden.
We have summarized the characteristics of the weights learning methods in Table XI for comparison.

VI. LESSONS, CHALLENGES, AND COUNTERMEASURES
Various FBIP systems have been detailed from three perspectives: source space, algorithm space, and weight space. Here, we further deliberate on lessons, challenges, and countermeasures for FBIP systems.

A. Lessons
Designing an indoor fusion-based location system requires consideration of many factors, including the selection of complementary information, efficient localization algorithms, and appropriate weights.
In homogeneous positioning systems, indoor localization is mainly based on the combination of single or different measurement techniques. Redundant data can enhance the robustness and stability of positioning systems, regardless of whether it consists of raw measurements, processed data, or position estimates [42], [98]. Meanwhile, the different types of combinations mentioned above can be used for indoor positioning. For positioning systems belonging to the class of heterogeneous positioning systems, the same type of applied measurement technologies facilitates an easier fusion framework as compared with hybrid positioning systems. The combination of data from different networks can enhance the robustness of positioning systems. The greatest advantage of fusion-based positioning in different networks is complementarity [43]. For example, a geomagnetism-based positioning system can correct the accumulated error of INS, while INS can alleviate the low signal distinctiveness problem of a geomagnetism-based localization system. Hence, the final position estimate is determined by fully utilizing the networks' respective advantages.
An appropriate positioning algorithm is the key to improving the positioning accuracy. Different positioning algorithms have their own advantages and disadvantages. The selection of positioning algorithms should balance accuracy and complexity. For example, machine learning methods can achieve better performance as compared with conventional methods [276], but they need a large amount of training data for model training. ML, MAP, and Bayesian methods show better performance in indoor positioning, but they need to assume some probability distribution for the localization measurements, and such a probability distribution is often not known in reality [254].
For the case with available training data, the weights can be learned based on the offline data. Generally speaking, fixed weight shows poor adaptivity for changing positioning scenarios. A dynamic and area-dependent weight can better exploit the advantages of different results in different subareas. In practice, it is generally impossible to obtain labeled data in advance, and the weights can only be calculated based on the testing samples. Therefore, unsupervised weights learning is more attractive in real indoor positioning.
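A dynamic, area-dependent weight can be sketched as follows. This is one simple possibility, not a method taken from the surveyed works: for each subarea, the weights are set inversely proportional to the mean squared offline error of each algorithm in that subarea. The subarea labels, the function names, and the assumption that the subarea of an online sample is already known (for example from a coarse first estimate) are all hypothetical.

```python
import numpy as np


def learn_area_weights(errors_by_area, eps=1e-9):
    """Learn one weight vector per subarea from offline errors.

    errors_by_area: dict mapping a subarea label to an array of shape
    (n_samples, n_algorithms) holding offline distance errors.
    """
    area_weights = {}
    for area, errors in errors_by_area.items():
        mse = np.mean(np.asarray(errors, dtype=float) ** 2, axis=0)
        inv = 1.0 / (mse + eps)  # lower error gives a higher weight
        area_weights[area] = inv / inv.sum()
    return area_weights


def fuse_in_area(estimates, area, area_weights):
    """Weighted mean of the per-algorithm estimates for a given subarea."""
    return area_weights[area] @ np.asarray(estimates, dtype=float)


# Algorithm 0 is better in the corridor, algorithm 1 in the office.
offline_errors = {
    "corridor": [[1.0, 2.0], [1.0, 2.0]],
    "office": [[2.0, 1.0], [2.0, 1.0]],
}
area_weights = learn_area_weights(offline_errors)
fused = fuse_in_area([[0.0, 0.0], [10.0, 0.0]], "corridor", area_weights)
```

A single fixed weight vector would average over both subareas and lose this distinction, which is the adaptivity problem noted above.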
Note that fingerprint construction is a laborious task in indoor positioning, and it is always difficult to obtain enough labeled fingerprints for positioning. To overcome these bottlenecks, crowdsourcing [84], [112], [113], [190] and other calibration-free [49], [95], [277] techniques are two candidate strategies for easing the burden of data collection in the offline phase. Crowdsourcing relies on other clients automatically sending their positioning data to a server, and thus greatly reduces the site-survey fingerprint construction work.
The calibration-free techniques mainly focus on improving the efficiency of fingerprints in indoor positioning because the constructed fingerprints may no longer be valid as the positioning environments change. Two alternative techniques have been proposed to fully leverage the constructed fingerprints for positioning in new environments. One is transfer learning [270], [278]-[280], which tries to transfer knowledge from an old domain to a new domain for highly accurate positioning. The other is the gain-without-pain method, which makes full use of the existing fingerprints to obtain more robust location features [31], [32], [94], [95], [98], [109], [281]. Both of them require more in-depth research in the future.

B. Challenges
Fusion-based indoor positioning technology has become a hot topic, and a large number of works have been discussed in our review. However, there are still some issues that limit the adoption of fusion-based positioning.
1) Positioning Accuracy of Single Network: In CEEs, the positioning accuracy of a single network/measurement is lower due to multipath propagation, changing environments, or a shortage of measurements. From ensemble learning theory [282], we know that the positioning accuracy of a fusion-based technique is constrained by the positioning accuracy of each single network/measurement. Hence, we should try to obtain more accurate positioning results from each network/measurement before fusion.
2) Positioning Cost: The cost in indoor positioning mainly consists of two aspects: computational cost and fingerprint construction cost. Computational cost is greater in fusion-based indoor localization. The amount of data obtained by different measurement technologies is greater than that obtained from a single location technology, regardless of whether the system is homogeneous, heterogeneous, or hybrid, and so the computational cost is higher. How to design a computationally efficient positioning framework is the key in FBIP [190], [283]. To reduce the cost of fingerprint construction, crowdsourcing is an alternative solution, but how to make full use of the unlabeled data from crowdsourcing is a great challenge in FBIP. Transfer learning can also reduce the cost of fingerprint construction to some extent, but the existing transfer learning methods show limited ability in transferring knowledge from different networks and in evaluating the efficiency of the transferred knowledge; they are also computationally expensive (e.g., most of them need eigendecomposition of a large matrix). Considerable effort should be made to improve the computational efficiency of existing FBIP methods.
3) Fusion Efficiency: Multiple sources, multiple algorithms, or both can be fused to improve the positioning accuracy. The fusion level also presents multiple choices. Effectiveness is the basic premise of the design of a fusion-based positioning system. A good fusion scheme can fully exploit the complementary advantages among various signal sources and compensate for the limitations among different algorithms. However, the fusion level and the efficiency of the combination technique are two critical issues that require further investigation. Over-fusing may lose the discriminative information of the data, while under-fusing leaves the complementary advantages not fully exploited. Ensemble learning [99], [101] is a baseline for fusion localization, but how to select a good combination of algorithms and sources based on localization accuracy and diversity using the ensemble learning principle is an interesting direction that deserves further study.

4) Fusing Data From Diverse Networks: Fusing data from different networks is one of the major hurdles, especially in hybrid positioning systems, because the different protocols used by different networks make communication across networks challenging. There is therefore a need to design specialized protocols to realize cross-network communication so that different measurements can be extracted and harnessed for localization.

C. Countermeasures
To mitigate the above challenges, we suggest some countermeasures below:
1) Improving the Positioning Accuracy of Single Network/Measurement: Indoor positioning methods using different networks or measurements have developed rapidly. A growing number of algorithms have emerged to improve the positioning accuracy of a single network or measurement technology. For example, in a WLAN environment, although the RSS-based positioning methods show low accuracy, other useful measurements from the WiFi interface card, such as CSI, have been proven to improve the positioning accuracy effectively by using high-resolution techniques. The improvement in positioning accuracy of the single network/measurement will greatly improve the positioning accuracy and robustness of fusion-based positioning results.
2) Reducing Positioning Cost: Reducing the positioning cost inevitably requires a trade-off between positioning accuracy and computational cost. The selection of a suitable fusion strategy should be considered in reducing the computational cost. For example, if we can obtain a robust feature from the given sources, it is better to fuse the sources instead of fusing the positioning results [98]. Designing efficient fusion methods that fully leverage the sources, such as SLAC [49], is the key to lessening the computational burden. To reduce the burden of fingerprint construction, crowdsourcing and transfer learning technologies deserve more attention in the future.
3) Fusion Efficiency: Generally speaking, fusion efficiency can be improved by jointly considering the diversity and accuracy of each single network/measurement-based positioning technique. As shown in Fig. 13, a variety of information, such as motion, map, and different measurements from different networks, can be combined to obtain a refined positioning result. Additionally, most of the existing fusion works assume that the positioning results from multiple algorithms and multiple measurements are independent; such an assumption is not realistic in many applications because the adopted measurements and algorithms may be correlated [37]. So, how to design a fusion strategy with good efficiency for correlated sources also requires further investigation.

4) Fusing Data From Diverse Networks: To resolve the difficulty of fusing data from different networks, protocols such as POF, P4, ILP, etc., can be utilized, as they alleviate the burden of cross-network communication by providing flexibility for transferring data over different networks irrespective of their underlying implemented protocols [240]-[242]. That is, different networks such as WLAN networks, Bluetooth networks, UWB networks, RFID networks, etc. can communicate with each other to realize localization.

VII. CONCLUSION
The demand for location-based services has attracted much attention; the pursuit of localization performance has thus become paramount in both academic and industrial communities. Data fusion is an effective way to further improve the accuracy and robustness of indoor localization, even as existing single positioning technologies mature.
In this survey, we have proposed a novel architecture for fusion-based indoor localization, composed of a source space, an algorithm space, and a weight space. In the source space, we summarize the sources of homogeneous, heterogeneous, and hybrid positioning systems, based on the different combinations of networks and measurement technologies. In the algorithm space, the indoor positioning algorithms are grouped into three categories based on whether the positioning model is static or dynamic. In the weight space, we have reviewed related works from supervised and unsupervised learning. Finally, we have delineated the lessons, challenges, and countermeasures for fusion-based indoor localization in this study. Readers will quickly gain a good grasp of state-of-the-art indoor localization approaches from the tables and figures that summarize key concepts from different perspectives. The foundation for fusion-based indoor positioning has also been laid in this survey.