Crowdsourcing-Based Learning Data Collection for Real-Time Sensor Error Correction in Indoor Environments

Sufficient training data and high positioning accuracy are crucial to indoor positioning. However, collecting learning data consumes considerable time and manual effort, which inhibits the widespread adoption of indoor positioning technology. Crowdsourcing-based data collection that does not require user intervention can reduce deployment effort, but at the cost of positioning accuracy. Pedestrian dead reckoning (PDR) partly resolves the problem by using a variety of sensors to provide a relatively accurate position; however, the accumulation of errors has yet to be successfully addressed. In this study, we introduce a highly accurate positioning method that implements error correction based on a crowdsourced database. The proposed method constructs learning data without manual effort or the need for reference points in the target area, and improves positioning accuracy by continuously learning and correcting the error distribution of the inertial sensor. The obtained positioning accuracy was approximately 3.38 m for roughly 83% of the collected fingerprints. Furthermore, the error correction algorithm improved the moving distance and direction accuracy by up to 2.86% and 25.7%, respectively. The proposed method was verified through experiments in an office building, where it successfully constructed a learning dataset and reflected a dynamic environment by deriving accurate tracking results.


I. INTRODUCTION
Location-based services provide a variety of content and information based on geographic data obtained from mobile communication networks or the global positioning system (GPS). Such services are widely used for many purposes, such as location tracking and inquiry, safety and complementary services, and information services for surrounding areas. In particular, indoor positioning technology underpins core services such as indoor navigation. Because GPS uses the satellite location, angular velocity, and signal attenuation, it can provide optimal positional accuracy for outdoor locations. However, indoor GPS coverage is limited, making GPS challenging to apply as an indoor positioning technology.
The associate editor coordinating the review of this manuscript and approving it for publication was Ehsan Asadi.
In contrast, Wi-Fi-based positioning technology has become the most popular indoor positioning method because Wi-Fi environments are widely deployed and are available in most indoor spaces. However, unlike GPS, the Wi-Fi access point (AP) only serves as a relay for channel allocation for data communication and does not support mobile device positioning functions. Therefore, a positioning system using Wi-Fi signals requires a central database that stores the required positioning information and provides it to a mobile device in real time. However, such databases, called Wi-Fi radio map (WRM) databases, are costly to develop and maintain.
To reduce WRM construction efforts, many researchers have tried to utilize fingerprint collection without location information [1]. Implicit and explicit crowdsourcing approaches [2]-[4] have been introduced to make use of contributed received signal strength indicator (RSSI) measurements. Because the true location of the collected data is unknown, this type of data can be regarded as an unlabeled data sample. Therefore, for crowdsourced data, it is necessary to assign the true location at which each sample was collected. Additionally, unlabeled data have been handled using the inertial sensor (INS) built into smartphones [5]-[7], and semi-supervised learning methods have been deployed to utilize both labeled and unlabeled samples [8]-[10]. However, the INS-based WRM construction method raised new issues, such as device heterogeneity and power consumption. Moreover, without a good initial model, finding a globally optimal solution is challenging; therefore, semi-supervised learning methods require some labeled samples for initial model design.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Research on improving positioning accuracy with WRM construction is still ongoing. If the fingerprints of a WRM are collected very densely, relatively accurate positioning results can be obtained. However, as mentioned above, the cost of WRM construction is high, and positioning with a dense WRM increases the amount of computation, making it difficult to provide the positioning service in real time. For this reason, rather than focusing on building WRMs, many studies have attempted to improve the positioning results by integrating data from various sensors mounted on a smartphone. In particular, pedestrian dead reckoning (PDR) is a technology that tracks users using INSs (e.g., the accelerometer, gyroscope, and digital compass of a smartphone) and is widely used in the field of hybrid positioning. Hybrid positioning methods include Kalman filters [11], [12], which use weighted averages of sensor positioning results, and their variations [13]. Particle filters [14]-[16] and hidden Markov models (HMMs) [17]-[19] are widely used in hybrid positioning as sophisticated methods of predicting the user's location based on probability. However, because these methods ignore the user's environment and movement pattern, which change depending on the location, it is difficult to obtain results as reliable as those from sensors that use offline training. To date, sensor fusion studies have set the reliability of inertial sensors empirically or experimentally.
In this paper, we propose a crowdsourcing bio-informatics-based radio map construction (CBRC) method, which constructs a WRM using only the Wi-Fi correlation between fingerprints in crowdsourced traces and the indoor layout. The proposed CBRC uses a bio-informatics sequence alignment method to merge crowdsourced traces, reduce them, and map them to the indoor layout to label fingerprints in the traces. Since this method does not use location-labeled fingerprints or other sensor data, it is possible to significantly reduce the amount of data required and construct a high-accuracy radio map.
In addition to CBRC, we introduce a sensor error correction (SEC) positioning algorithm, which improves the positioning accuracy by continuously learning and correcting errors of the inertial sensor in a WRM/CBRC environment. In the proposed SEC, the majority indicator (MI) results calculated by fusing all sensor tracking information are more accurate than those obtained by using single sensors. Then, the error probability distribution of single sensors is adjusted by referring to the results obtained by the sensor fusion. Using this method while the tracking progresses, the error probability of each sensor can be adjusted.
To validate the proposed CBRC, a fingerprint sequence was collected in a multi-story indoor space to test the labeling accuracy. In the experiment, the location labeling accuracy was confirmed to be 3.38 m for about 83% of the collected fingerprints. For the SEC validation, we combined Wi-Fi with a magnetometer for absolute positioning and inertial sensors, such as accelerometer and gyroscope, for user movement detection and tracking. Comparing the positioning results before and after error correction was applied, the moving distance improved by approximately 2.86%. Similarly, the moving direction improved by 20% and 25.7% for a single sensor and for simultaneous use of the accelerometer and gyroscope, respectively. These results indicate that tracking accuracy can be improved by SEC in dynamically changing environments, such as indoor spaces.
In this study, we successfully developed a hybrid technology by integrating concepts from other fields, such as bioinformatics, into computer science. By collecting data from numerous unspecified implicit participants, we constructed the WRM at a low cost. In addition, we proposed a robust positioning method that captures the characteristics of the service location or user through real-time error correction of the INS data. We thereby demonstrated the feasibility of real-time INS data correction, which may prove useful in future studies concerning error correction.
The rest of this paper is organized as follows: Section 2 discusses the WRM construction process and the error correction approaches. In Section 3, we describe the proposed CBRC method in detail. In Section 4, we introduce the proposed SEC method in a theoretical manner. Section 5 reports the experimental results. The conclusions are presented in Section 6.

II. RELATED WORK
To track users in real time, the learning data must be collected in the offline phase. However, a method is required to solve the sensor error problem that arises when user tracking is performed using an inertial sensor. This section discusses various studies that address the aforementioned issue.

A. REDUCING CALIBRATION EFFORTS
In crowdsourcing-based Wi-Fi fingerprint radio map construction, matching unlabeled fingerprints with indoor layout coordinates is a fundamental approach for indoor positioning. Although various studies on unlabeled fingerprint matching have been proposed, the problem of high-quality WRM construction remains unresolved. Because the positioning estimation is highly dependent on the quality of the WRM, simply building a WRM cannot guarantee satisfactory performance of the positioning system. Therefore, approaches, such as point-by-point manual calibration [20], that require time-consuming and laborious human intervention have been used to construct high-quality WRMs.
To reduce the WRM construction effort, implicit and explicit approaches based on crowdsourcing were studied in the past few years [21]. Radu and Marina [22] combined location tracking and activity recognition using inertial sensors on mobile devices with location-specific weighted assistance from a crowd-sourced Wi-Fi fingerprinting system via a particle filter. In this study, the Wi-Fi scan period was set to 20 s to reduce battery consumption. However, if the application required high location accuracy, the frequency could be increased to provide frequent assistance to the PDR. G-Loc [23] first built a gradient-based map (Gmap) by comparing RSSI values at nearby positions and ran an online extended particle filter to localize the user/device. This study managed to reduce the fingerprint calibration overhead by using the proposed Gmap but, considering the computing time consumed for calibration, it has limited applicability.
Zhang et al. [24] proposed a geomagnetism and crowdsensing-powered indoor navigation system, which encapsulated three functions into one unit, namely, map building, localization, and navigation. In this study, a map was built using sensor data, semantic labels were collected from user contributions, and positioning was realized using geomagnetic fingerprinting techniques. However, because positioning ambiguity is more severe in a geomagnetic map than in Wi-Fi RSSI, there may be a limit to providing stable learning data. A crowdsourcing system utilizing sensor-rich video data from mobile users was introduced for indoor floor plan reconstruction [25]. The proposed method leverages crowdsourced sensory and video data to track user movements, then uses the inferred user motion traces and image context to produce a floor plan. However, for non-rectangular rooms, this visual-based approach is not applicable or provides less accurate results. Moreover, because this method is based on vision, energy consumption is high.
Yuanqing et al. [26] proposed Travi-Navi, which records high-quality images during the course of a guider's walk on the navigation paths, collects a rich set of sensor readings, and packs them into a navigation trace. This method enables self-motivated users to easily deploy indoor navigation services without assuming a comprehensive indoor localization service or even the availability of floor maps. However, because the algorithm is image-based, it is difficult to guarantee its stable operation if image recognition fails. Sungwon et al. [27] proposed a novel calibration-free technique for a crowdsourced indoor localization system and evaluated its performance through laboratory-level experiments. The introduced system extracted fingerprint values from short RSS measurement times, performed calibration-free positioning across different devices, and maintained a single fingerprint for each location in a radio map, irrespective of the number of uploaded data sets for a given location. Nevertheless, because the proposed method was verified at the laboratory level, the system performance could not be guaranteed.

B. SENSOR ERROR CORRECTION
Generally, when a user's movement is continuously tracked, the positioning accuracy gradually decreases owing to the cumulative error of the inertial sensor. This phenomenon is caused by various factors, such as environmental differences during tracking, user movement patterns, and sensor bias or fluctuations. Various studies have therefore sought to reduce the errors in the sensor data used for user tracking.
To correct PDR errors, researchers used a zero-velocity-update algorithm with a double-step unscented Kalman filter and HMM [28]. The proposed algorithm divided the measurement updates of the gravity and magnetic field vectors into two steps to avoid the unwanted correction of the Euler angle error. However, the integral error is implicit in strapdown inertial navigation systems and cannot be completely eliminated by the proposed algorithm. Additionally, because different user behavior patterns are not considered, unexpected positioning results may be obtained in real situations. Li et al. [29] suggested an indoor positioning error correction algorithm for pedestrian multi-motion recognition. The proposed method used the hybrid-orders fraction domain transformation to extract feature vectors. Error correction of the heading angle and positioning was also carried out. However, only seven representative human poses were selected, and wearable device data was deployed for user tracking. Therefore, it is uncertain whether environmental factors or personal characteristics are properly reflected.
Liew and Wallace [30] attempted to deploy the short-term positioning effectiveness of PDR in long-term navigation. In this approach, an RSSI-based corrective scheme was proposed to improve the PDR-estimated headings for long-term usage. By using a particle filter and historical PDR data, heading error correction was performed. However, proper error correction is difficult in areas with poor Wi-Fi environments lacking an indoor layout. Mohd et al. [31] proposed a method for correcting heading errors of foot-mounted navigation based on inertial navigation sensors. A threshold-less turn detection method, known as pelvic rotation-ZUPT turn detector, was applied to correct the heading in the Kalman filter-based foot-mounted navigation. However, performing tracking independently on the inertial sensor cannot ensure the reliable application of the error correction effect, unless periodic absolute positioning is also performed.

III. CROWDSOURCING-BASED RADIO MAP CONSTRUCTION
This section introduces CBRC, which labels the location of crowdsourced fingerprints. CBRC merges single traces collected through crowdsourcing by using a sequence-alignment technique based on the similarity of fingerprints, and reduces the merged traces to a simple logical graph to match the indoor layout.

A. Wi-Fi TRACE ALIGNMENT
Individual raw traces collected from a user's mobile device at a certain location (e.g., corridor) can be merged based on the Wi-Fi signal similarity. The merged trace is more similar to the way-link of the entire indoor space than individual raw traces. In addition, it minimizes the memory and spatial complexity of the accumulated raw trace and further increases the positioning accuracy of the trace. Raw traces collected from individual mobile devices are transferred to the central database, and periodically accumulated new traces are merged with existing traces. Sequence alignment is a technique typically used in bio-informatics. It arranges sequences so as to maximize the degree of similarity between DNA, RNA, and protein sequences. In trace alignment, each individual trace collected from a user is treated as a sequence of measure points.
The conventional sequence alignment technique can be roughly divided into three stages: initialization, scoring, and backtracking. During initialization, a scoring matrix is created, and the two specified sequences are assigned to its rows and columns, respectively. Then, the components of the two sequences that correspond to each cell are compared. Scoring follows a policy that assigns scores to matches, mismatches, insertions, and deletions. Finally, cells are aligned via backtracking according to the corresponding score, from highest to lowest. Deploying the sequence alignment technique to merge Wi-Fi traces in CBRC offers multiple benefits. Both sequence and trace alignment are used to merge two different sequences. In particular, sequence alignment is a proven method widely used by researchers in the field of bio-informatics.
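The three stages above can be sketched for Wi-Fi traces as follows. This is a minimal illustration rather than the CBRC implementation: it assumes a Smith-Waterman-style local alignment, and the `similarity` function, its 50 dB normalization, the `floor` value for unseen APs, the `threshold`, and the `gap` penalty are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Fingerprint = Dict[str, float]  # AP identifier -> RSSI in dBm


def similarity(a: Fingerprint, b: Fingerprint, floor: float = -100.0) -> float:
    """Hypothetical similarity in [0, 1]; 1.0 means identical scans."""
    aps = set(a) | set(b)
    if not aps:
        return 0.0
    # Mean absolute RSSI difference; an AP missing from a scan counts as `floor`.
    diff = sum(abs(a.get(ap, floor) - b.get(ap, floor)) for ap in aps) / len(aps)
    return max(0.0, 1.0 - diff / 50.0)  # 50 dB span is an assumed normalization


def align_traces(t1: List[Fingerprint], t2: List[Fingerprint],
                 gap: float = -0.5, threshold: float = 0.7
                 ) -> Tuple[float, List[Tuple[int, int]]]:
    """Local alignment of two traces; returns (score, aligned index pairs)."""
    n, m = len(t1), len(t2)
    # Initialization: (n+1) x (m+1) scoring matrix filled with zeros.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, (0, 0)
    # Scoring: similarity above the threshold acts as a match, below as a mismatch.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = similarity(t1[i - 1], t2[j - 1]) - threshold
            score[i][j] = max(0.0,
                              score[i - 1][j - 1] + s,   # align the two points
                              score[i - 1][j] + gap,     # skip a point of t1
                              score[i][j - 1] + gap)     # skip a point of t2
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Backtracking: start from the highest-scoring cell and walk back.
    pairs: List[Tuple[int, int]] = []
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0.0:
        s = similarity(t1[i - 1], t2[j - 1]) - threshold
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs
```

For two traces that overlap by two measure points, the returned pairs identify which points of the first trace coincide with which points of the second; merging would then combine the aligned fingerprints.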

B. LOGICAL GRAPH GENERATION
Wi-Fi trace alignment is similar to DNA sequence alignment, but there are also several differences. First, in trace alignment, each trace measure point corresponds to a point in a continuous space, rather than several discrete locations. Second, in sequence alignment there is a clear match or mismatch relationship between components, whereas the relationship between trace measure points, although it may vary depending on the algorithm, is generally expressed as a continuous similarity function ranging from 0 to 1. Finally, the sequences in sequence alignment are directional, but the trace direction cannot be known. In addition, sequences are simple one-dimensional continuous functions of components, whereas traces are three-dimensional sequences of measure points with location information. In this section, the problems caused by this dimensionality difference are described, and potential solutions are introduced by constructing a simplified logical graph.

1) MULTIPLE ALIGNMENT
This part describes the process of inferring logical graphs inside a building based on a continuous Wi-Fi fingerprint, i.e., an individual trace. An individual trace is a continuous Wi-Fi fingerprint that passes through a portion of the physical path inside the building. Multiple alignment combines individual traces to extract a logical graph that represents the path inside the building. The procedure is similar to the overlap layout consensus technique commonly used in bio-informatics for gene sequence assembly. The overlap layout consensus constructs an overlap graph using overlap information of each sequence pair extracted via pair-wise alignment.
The order of all sequences is then determined based on the overlap graph, and finally a single sequence containing all sequences is extracted. Simultaneously, branches of the sequence formed by gene mutation or the like can also be extracted. However, the traces collected via crowdsourcing are not directional. Even if information on the moving direction of the trace collector is available, the collection area, i.e., the path inside the building, can be traversed in both directions. Therefore, the collection direction can be considered meaningless for path inference, and CBRC solely expresses the relationship between two aligned traces by expanding. Fig. 1 shows the relationship between two aligned traces. In CBRC, to prevent an individual trace from passing through three or more corridors, the trace is divided into components. Therefore, in case 4) of Fig. 1 (a), an error is produced owing to misalignment between the ends of a trace or between traces that should not be aligned. Misalignment may occur between traces passing through three or more corridors, or may have the same form as in case 4) at the intersection of four or more paths. However, for a sufficient number of collected traces, alignments as in cases 1), 2), and 3) may exist simultaneously in a section. Therefore, alignment 4), which is prone to error, is ignored in CBRC.
In CBRC, U-turns (i.e., self-alignments) are removed during pre-processing. Since the trace connection used to construct the path of the indoor building does not allow U-turns in the aligned place, there is a unidirectional relationship in each alignment. Therefore, the alignment relationship of Fig. 1 (a) is illustrated in the state diagram of Fig. 1 (b). The state nodes of the diagram correspond to the ends of each trace, and the state transition occurs when the ends are aligned with the middle of the neighboring trace. In the state diagram, paths composed of the corresponding alignment can be expressed in both directions. Therefore, two state diagrams are derived per alignment, and each diagram node has trace-belonging information. To this end, a trace diagram in the form of Fig. 1 (c) is devised and expressed in CBRC. The trace diagram is a multi-graph structure with directed and undirected edges. It includes the state transition information of the state diagram and end information of the trace. In the state transition of the trace diagram, i.e., the path derived from the CBRC as a set of traces, there are rules represented by the repetition of directed and undirected edges.
In this study, both ends of the corridor are considered as a state. Therefore, at the intersection of three corridors, there is a walking rule, depicted in Fig. 2 (a), which can be represented by a trace diagram in Fig. 2 (b) that includes state transition information. This strategy can be used to effectively determine and eliminate trace alignment errors. The walking rule, which assumes that there are paths in all directions at an intersection on the indoor map, means that all paths must be expressed in a trace diagram composed of sufficient traces. Consequently, if a path does not comply with all walking rules, it can be regarded as an error even if it branches from the configured trace diagram.

2) LOGICAL GRAPH REDUCTION
The trace diagram inevitably contains many errors induced by trace alignment errors. This subsection introduces a method for simplifying the diagram so as to eliminate errors and ultimately derive a logical graph of the indoor space from the trace diagram.

a: TRANSITIVE EDGE REDUCTION
Individual traces form alignment relationships with multiple traces. This characteristic is shared with the sequence fragments used in gene sequence assembly. In the overlap layout consensus method, the transitive reduction technique of graph theory is used to remove the transitive links in the overlap graph. General transitive reduction can be applied to a directed acyclic graph (DAG). The overlap graph constructed via the overlap layout consensus technique satisfies this condition, but the trace diagram covered by CBRC cannot be considered a DAG. Therefore, transitive reduction is applied as shown in Fig. 3 only when both paths are transitive. When two or more traces are transitive, they can be integrated as traces in the trace diagram and form a single path.
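For reference, general transitive reduction on a DAG can be sketched as below. This illustrates only the standard graph-theoretic operation mentioned above, not the restricted variant that CBRC applies to its trace diagram; the function name and the edge-list representation are assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def transitive_reduction(edges: Iterable[Edge]) -> Set[Edge]:
    """Drop every edge (u, v) of a DAG for which another u -> v path exists."""
    edge_set = set(edges)
    graph: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edge_set:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())

    def reachable_without(src: Hashable, dst: Hashable, skip: Edge) -> bool:
        # Depth-first search from src to dst that never uses the edge `skip`.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if (node, nxt) == skip or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    # In a DAG an edge is redundant exactly when a longer path joins its ends.
    return {(u, v) for u, v in edge_set
            if not reachable_without(u, v, (u, v))}
```

With edges a→b, b→c, and a→c, the direct edge a→c is implied by the other two and is removed.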

b: PARTIAL BUBBLE REDUCTION
In CBRC, a partial bubble is created among four traces with a well-ordered sequence at one end, and means that two of the four traces are misaligned. The partial bubble error is represented in the trace diagram, as shown in Fig. 4. Since there is no such alignment type path on the actual indoor space path, it can be regarded as a misalignment error of the middle two traces. In the trace diagram, if a path exists between two out of four traces and there is no path to the traces in-between, the order is rearranged by connecting the misalignment of the two traces in the middle of the path.

c: BUBBLE REDUCTION & FALSE PATH TRIMMING
Unlike partial bubbles, bubbles and false paths on the trace diagram occur when the whole trace is misaligned, rather than one end of the trace. Misalignment occurs not only between two traces but also between multiple consecutive traces. A bubble forms when two misaligned paths merge into the same paths at both ends. When only one side is integrated in the same path, a false path occurs. These two errors can be caused by an insufficient number of collected traces even if a path exists in the actual indoor space. However, presuming collection is sufficient, these errors occur only owing to misalignment. In that case, branching of the path is possible only at the intersection, and branching traces must be mutually reachable because there are mutual paths in all directions from the branch.
Different trimming strategies are applied to eliminate these errors. In the case of bubble errors, because both ends of each branch are connected to the same paths, the branches initially form one path but are not aligned. Thus, two branches are merged into a single path. In the case of false path errors, the path to be aligned is initially unknown; thus, it is removed.

d: REDUCTION USING SUBSTITUTION
The path that can proceed in the trace diagram can be expressed as a sequence of directed and undirected edges. Therefore, a trace in which edges continuously alternate from directed to undirected can be substituted with a single directed edge, as shown in Fig. 5 (a). When this substitution process is applied to trace diagram reduction, the path among various traces that are complicatedly intertwined can be simplified, as shown in Fig. 5 (b).

e: SMALL CYCLE REMOVING
When enough data is available, the alignment between two merging traces may be completely matched, or one end of the traces may coincide. This forms a small cycle in the trace diagram and acts as an obstacle to reduction. Small cycles can be removed using the trace-path substitution technique described in Section 2d. When removing a small cycle from the trace diagram, it is essential to process the path that the removed trace can transfer. Small cycles, at which both ends of the trace are aligned, can be easily removed through integration. However, if only one side shows the alignment, the small cycle is eliminated because one trace is absorbed by the other. Because the directed edge between the nodes is connected to the path that the absorbed trace does not control, the logical integrity of the diagram is maintained.

f: SUMMARY
The various techniques for merging location traces described so far can be summarized as follows. The trace diagram generated from trace alignment can be replaced by a minimal trace diagram by applying reduction and trimming techniques. In the final minimal diagram, each intersection can be replaced with a node and each corridor with an edge, which directly yields a logical graph of the interior space.

3) LOGICAL-PHYSICAL GRAPH MATCHING
Graph matching is the next step after merging location traces. Its main purpose is to map the logical graph created by merging to a configurable physical way-link graph based on the map of the indoor space. The mapping between the two graphs aims to find a relationship with the logical graph by reflecting the structural characteristics of the indoor space. This is achieved by mapping the logical graph G1 to the physical way-link graph G2, as shown in Fig. 6. The goal is to create the final Wi-Fi database by pairing each fingerprint in G1 with an indoor location. If mapping is successful, the pairing between the Wi-Fi signals of the trace merge graph and location information of the way-link is also possible. CBRC solves this problem by adopting graph theory.
Basically, graph matching aims to find an edge set within a graph. Therefore, to draw the relationship between the two graphs comprising the matching problem, a conceptual extension of the problem is needed; i.e., graph matching is interpreted as checking whether a graph is bipartite. If so, the vertices are divided into two disjoint sets, and each vertex is connected to a vertex in the other set, thus connecting the graphs themselves. However, in CBRC, graph matching is an inexact problem that allows mapping of subgraphs, and thus should be solved as a quadratic assignment problem, which is NP-hard. So far, several studies have proposed highly precise algorithms; nevertheless, in the real world, where unlimited time is not available, an approximate algorithm is used to find a locally optimal solution.

4) FINGERPRINT ARRANGEMENT
One edge contains traces absorbed by the reduction and fingerprint information of the trace; each fingerprint contains information about the connection with the fingerprint on the other trace matched during trace alignment. Thus, the relative position of the fingerprint on the edge (corresponding to a corridor) can be inferred. Then, the coordinates of the fingerprint can be predicted based on the actual coordinate values at both ends of the intersection or corridor obtained by logical-physical graph matching.
The relative position of a fingerprint on the edge is predicted using the connection information between fingerprints. A fingerprint can be connected in two ways: to the preceding and following fingerprints within the same trace, and to a fingerprint on another trace that was matched during trace alignment. Through these connections, all fingerprints belonging to one edge on the logical graph form a connection graph between fingerprints. Two connected fingerprints are separated by a Euclidean distance in the signal space. To calculate the relative position of the fingerprint, a technique that implements Hooke's law of elasticity can be used to find the equilibrium state of each connection. This technique regards the connection between fingerprints as an arbitrary elastic body. Then, considering the rest length of the elastic body as the Euclidean distance between its ends, the final one-dimensional equilibrium state on the corridor can be calculated. CBRC uses this method to calculate the relative position between fingerprints and the logical length of the edge.
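The spring-based arrangement described above can be illustrated with a simple relaxation loop. This is a sketch under stated assumptions, not the authors' solver: every connection is modeled as a unit-stiffness spring whose rest length is the signal-space distance, one fingerprint is pinned at position 0, and the `step` and `iters` values as well as the function name are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Spring = Tuple[int, int, float]  # (fingerprint i, fingerprint j, rest length)


def relax_positions(n: int, springs: Sequence[Spring], anchor: int = 0,
                    step: float = 0.1, iters: int = 2000,
                    init: Optional[Sequence[float]] = None) -> List[float]:
    """Find 1-D positions where the spring forces (Hooke's law) balance."""
    # Initial guess: fingerprints spaced by their order of appearance.
    x = list(init) if init is not None else [float(i) for i in range(n)]
    for _ in range(iters):
        force = [0.0] * n
        for i, j, rest in springs:
            delta = x[j] - x[i]
            stretch = abs(delta) - rest            # > 0 when over-extended
            direction = 1.0 if delta >= 0 else -1.0
            force[i] += stretch * direction        # pulls i toward j if stretched
            force[j] -= stretch * direction        # and j toward i
        for k in range(n):
            if k != anchor:                        # the anchor stays fixed
                x[k] += step * force[k]
    return x
```

The difference between the largest and smallest returned positions then plays the role of the logical length of the edge; mapping it to metres would rely on the corridor end coordinates obtained from logical-physical graph matching.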

IV. SENSOR ERROR CORRECTION METHOD
Signals that can be extracted as features of fixed coordinates, such as Wi-Fi signals, can be trained in an offline phase to learn the probability distribution for each location. Since this probability distribution reflects environmental factors, relatively reliable positioning results can be derived. However, it is not easy to obtain an error distribution that includes environmental factors from user movement data collected by sensors. This is because such data are used to track the user's movement in real time in the online phase and thus cannot be learned in advance in the offline phase. Of course, a dedicated collector can train inertial sensors offline with much time and effort, but the tremendous cost makes this process unrealistic. Sensor fusion studies so far have empirically or experimentally utilized the reliability of inertial sensors. However, since this method ignores the user's environment and movement pattern, which changes depending on location, it is difficult to obtain a highly reliable result compared to those obtained from sensors using offline training.
In the proposed method, positioning and tracking are evaluated essentially by multiplying probabilities; thus, the sensor results are also derived as probabilities. We use Wi-Fi and geomagnetic fingerprints to compute the absolute position of users and analyze the signal distributions for each location to train the signal probability distribution. That is, based on the training data of the Wi-Fi and geomagnetic sensor, the probability for each location of the target region is derived from the probability model. The user's distance and direction of movement are calculated from the probability distributions of the accelerometer and gyroscope. Simultaneously, the MI is compared only with the positioning result of the inertial sensor, and the difference between the values is regarded as the error. Finally, the error distribution of each sensor is stored. If errors are stored repeatedly in this way, the mean and variance of the error distribution can be derived. In addition, as time goes by, the reliability of the inertial sensor increases, and the calculated error distribution gradually affects the tracking result. The stored error distribution shifts the values read by the inertial sensors to realize the error correction. To apply offline trained data and user movement to probability-based positioning sequentially over time, we use the HMM. The HMM is a suitable model for indoor positioning using sequential data because it can adequately express the temporal features contained in the data and deduce desired information from them. The user movement in a trace has been successfully described by the HMM and its variations [32].
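The idea of storing errors repeatedly and deriving their mean and variance can be sketched with a running estimator. The class below is a hypothetical illustration: it assumes a scalar sensor value (for example, a step length or a heading), treats the fused MI result as the reference, uses Welford's online update, and corrects a reading by subtracting the learned mean error. None of these names or choices come from the paper.

```python
class SensorErrorModel:
    """Running mean/variance of (sensor value - fused reference) errors."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, sensor_value: float, reference_value: float) -> None:
        """Store one more error sample (Welford's online algorithm)."""
        error = sensor_value - reference_value
        self.count += 1
        delta = error - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (error - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of the stored errors (0.0 until two samples exist)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    def correct(self, sensor_value: float) -> float:
        """Shift a raw reading by the learned mean error."""
        return sensor_value - self.mean
```

In use, one such model would be kept per inertial sensor and updated at every tracking step, so that the correction becomes more influential as more error samples accumulate.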

A. WRM MODELING
The radio map constructed in Section 3 is used as the learning data to perform probability-based positioning. Constructing a radio map via crowdsourcing is the first step before applying the error correction method and is realized by matching the coordinates of the indoor floor plan. The CBRC begins by dividing the area of interest into location cells with the help of a floor plan. Formally, a two-dimensional area is modeled as a finite location-state space L, which is a set of physical locations with x and y coordinates: $L = \{l_1 = (x_1, y_1), l_2 = (x_2, y_2), \ldots, l_n = (x_n, y_n)\}$, where each coordinate pair denotes the center of a location cell. The crowdsourced radio map collects geomagnetic field data and Wi-Fi signals, and the norm of the magnetic strength and the inclination value are stored together for each coordinate. Each CBR entry therefore contains the RSSI values of the observed APs, where k is the index of an AP and $rssi_i$ is the RSSI value of the ith AP, together with $m_n$, the reading vector of the magnetic strength norm, and $m_i$, the magnetic inclination value at each point in L.

B. TRACKING
The proposed method searches for the most probable location l given the online measurement o by calculating the posterior probability P(l|o) for all locations. The proposed SEC compensates for sensor data errors by gradually learning the errors in the accumulated sensor data as it searches for optimal traces. During the error correction process, a combined tracking result is used as the reference position of the user. The HMM is used both to interpret the tracking and to identify the optimal trace. SEC aims to calculate the optimal trace so that the result after error correction is as close as possible to the ground truth. The positioning algorithm based on the HMM operates by finding a hidden structure of user traces that fits the inner layout of the building. The HMM can estimate the current state at each time by utilizing observations, parameters, and HMM properties. Although the current state is hidden at each time, it can be estimated from the current observation together with the previous state. An indoor localization system can apply the HMM by treating location cells as the hidden states and online measurements as the observations.
The performance of SEC is closely related to the positioning algorithm that is used to reflect the characteristics of the CBR while attenuating the motion sensor errors. The proposed SEC method uses the Viterbi tracking algorithm in the HMM framework. Because Viterbi tracking is an interpretation framework that efficiently computes position changes in the HMM, it enables the stochastic tracking of historical and dynamic user movement trajectories. Probability-based Viterbi tracking is performed using the absolute positioning and INS data, and the optimal trace is the sequence of positions with the highest probability value. In other words, the optimal trace corresponds to the positions with the highest probability of minimum error given the offline and online phase data.
The HMM describes the probability distribution of an observable state vector $o = \langle cbr_1, cbr_2, \ldots, cbr_k \rangle$. The CBR considers a localization problem in the two-dimensional indoor space of a building. To train the localization model, a physical location l should be paired with a corresponding observation o. In the HMM for fusion-based indoor localization, the hidden states $H = \langle h_1, h_2, \ldots, h_k \rangle$ represent possible location cells in the divided area, which is constrained by the indoor map layout. The Wi-Fi and online magnetometer measurements of a user trace are treated as an observation sequence emitted from the hidden states. The HMM can thus be described using five elements $\lambda = \langle L, O, \pi, A, B \rangle$, which include two state sets (i.e., the location-state space L and the observation state space O) and three probability matrices (i.e., the initial probability distribution $\pi$, the transition probability A, and the set of emission probabilities B). We assume that any state $l_k$ can be the starting point, which describes the initial probability distribution $\pi$. Therefore, $\pi$ was fixed in our model to provide the initial information at each point.
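To make the roles of $\pi$, A, and B concrete, the following is a minimal sketch of Viterbi decoding over a small set of location cells. It is a generic textbook formulation in log space, not the implementation used in this study; the function name `viterbi`, the three-cell corridor, and all probability values are hypothetical.

```python
import math


def viterbi(pi, A, B_seq):
    """Return the most likely hidden state sequence.

    pi[s]       : initial probability of state (cell) s
    A[r][s]     : transition probability from cell r to cell s
    B_seq[t][s] : emission probability of the observation at time t in cell s
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n = len(pi)
    score = [log(pi[s]) + log(B_seq[0][s]) for s in range(n)]
    back = []  # back[t-1][s] = best predecessor of cell s at time t
    for t in range(1, len(B_seq)):
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log(A[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(A[best][s]) + log(B_seq[t][s]))
        back.append(pointers)

    # Backtrack from the most probable final cell.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical corridor of three cells; cell 0 and cell 2 are not adjacent.
pi = [1 / 3, 1 / 3, 1 / 3]
A = [[0.5, 0.5, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 0.5, 0.5]]
B_seq = [[0.8, 0.1, 0.1],
         [0.1, 0.2, 0.7],
         [0.1, 0.1, 0.8]]

print(viterbi(pi, A, B_seq))
```

In this example, choosing the cell with the highest emission probability at each time independently would jump from cell 0 to cell 2, which the transition probabilities forbid; the decoded path passes through cell 1 instead.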
The transition probability A describes the user movement between any two hidden states in the space. A user cannot move through walls and other barriers, and the distance a user can walk in a given time is limited. The implementation of the transition probability A has been discussed in several studies using inertial sensors [33]. Consequently, users can only move from a location to nearby locations within the time interval between two successive observations. By assigning equal probability to all possible movements, the transition probability A can be determined from the prior information given by an indoor map and human mobility settings. The set of emission probabilities B is derived from the likelihood of an observation o ∈ O at a location l ∈ L. The model parameters $\langle \pi, A, B \rangle$ are adjustable through training in the HMM. Finally, user tracking is performed from the emission and transition probabilities. The fundamental purpose of HMM tracking in our work is to determine the optimal hidden state sequence $H^* = \langle h^*_1, h^*_2, \ldots, h^*_k \rangle$ by using the Viterbi algorithm. When the model parameters and the observation vector O are given, the most likely state sequence $H^*$, called the Viterbi path, is the sequence that maximizes the posterior probability: $H^* = \arg\max_{H} P(H \mid O, \lambda)$.
Fig. 7 illustrates the transition probability process under inertial sensor reading errors, assuming user movement tracking has been performed from time $T_0$ to $T_3$. The bold arrows depict the MI as introduced in Section 1, while the dotted arrows represent the combined result of the heading and moving distance probability distributions at each time. At $T_0$, the initial location is determined from the $cbr_t$ calculation. At this time, an error piece (EP) database is generated for recording the inertial sensor errors. The step length and heading errors are recorded in the EP database in units of cm and degrees, respectively.
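The assignment of equal probability to all possible movements can be sketched as follows. This is an illustrative construction under stated assumptions, not the transition model of this study: cells are assumed to lie on an integer grid, only the four adjacent cells and the current cell are considered reachable within one observation interval, and the function name `transition_matrix` and the example layout are hypothetical.

```python
def transition_matrix(cells, walls):
    """Equal-probability transitions between adjacent, unobstructed cells.

    cells : list of (x, y) integer grid coordinates of the location cells
    walls : set of frozenset({cell_a, cell_b}) pairs separated by a barrier
    """
    index = {cell: i for i, cell in enumerate(cells)}
    A = [[0.0] * len(cells) for _ in cells]
    for (x, y), i in index.items():
        reachable = [i]  # the user may also stay in the same cell
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = (x + dx, y + dy)
            if neighbour in index and frozenset({(x, y), neighbour}) not in walls:
                reachable.append(index[neighbour])
        for j in reachable:
            A[i][j] = 1.0 / len(reachable)
    return A


# Hypothetical layout: a wall separates cell (1, 0) from cell (1, 1).
cells = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
walls = {frozenset({(1, 0), (1, 1)})}
A = transition_matrix(cells, walls)

for cell, row in zip(cells, A):
    print(cell, [round(p, 2) for p in row])
```

Each row sums to one, and the entry between the two cells separated by the wall is zero, so a decoded trace cannot cross the barrier in a single step.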
Initially, the EP is uniformly set; i.e., if the inertial sensor is inaccurate, the EP relies heavily on the sensors that show the absolute position. At $T_1$, the probability distribution of the transitions from the location at $T_0$ is depicted by dark blue circles, whose center is indicated by the inertial sensor readings. However, the MI and the inertial sensor reading at time $T_1$ are not the same. This mismatch should be considered an error of the inertial sensors and corrected for the calculation of the next transition probabilities. When the mismatch occurs, the error distributions are stored as red blocks, as shown in Fig. 7. From the MI of the multiple sensors, it is possible to calculate the distance and direction of the movement between states. Both parameters are repeatedly compared with the inertial sensor reading and recorded in the EP. Even if the inertial sensor indicates an incorrect direction or distance (blurred circle pointed at by the dotted line), if the error correction is performed based on the recording of the EP, the transition probability distribution converges upward and the tracking algorithm utilizes the corrected probability distribution (dark blue circle pointed at by the solid line).
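The comparison between the movement implied by two consecutive reference positions and the inertial sensor reading can be sketched as below. This is a minimal illustration under assumptions: positions are given in cm on a planar map, headings are measured in degrees counterclockwise from the +x axis, errors are taken as reference minus inertial reading, and the helper names `movement`, `wrap_degrees`, and `error_piece` are hypothetical.

```python
import math


def movement(prev_xy, curr_xy):
    """Distance (cm) and heading (degrees in [0, 360)) between two positions."""
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def wrap_degrees(angle):
    """Map an angle difference to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def error_piece(prev_xy, curr_xy, ins_step_cm, ins_heading_deg):
    """Step length error (cm) and heading error (degrees) for one transition."""
    ref_step, ref_heading = movement(prev_xy, curr_xy)
    return ref_step - ins_step_cm, wrap_degrees(ref_heading - ins_heading_deg)


# Hypothetical transition: the reference moved 70 cm along +y,
# while the inertial sensors reported 65 cm at a heading of 80 degrees.
step_err, heading_err = error_piece((0.0, 0.0), (0.0, 70.0), 65.0, 80.0)
print(step_err, heading_err)
```

Wrapping the heading difference matters near the 0/360 degree boundary: a reference heading of 350 degrees and a reading of 10 degrees differ by 20 degrees, not 340.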
The conceptual process is as follows. If the inertial sensors contain no errors, the optimal positioning result $IR_{i,n}$ is obtained directly from the true value $TV_{i,n}$, where n represents the time and i represents the inertial sensor type. However, as shown in Fig. 7, the inertial sensor data that are input to the SEC system contain errors, which reduces positioning accuracy.
In this case, the estimated result $ER_n$ is derived from $SR_{i,n}$, the sensor reading containing error. Therefore, the final goal of the proposed SEC is to bring $ER_n$ as close to $IR_n$ as possible. To achieve this, a filter f that outputs a correction value is used to revise the estimate. The correction value of filter f gradually converges to the negative of the bias value, $bias_{i,n}$, because the bias is constant as long as the environmental factors are homogeneous. In the SEC, both the error factor for the sensor input value and the bias value are corrected. Ultimately, only the fluctuation error remains, which is corrected as shown in Fig. 7.
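One way to read the correction process is as an additive error model. The following is a sketch under that assumption and should not be taken as the exact formulation used by SEC: the decomposition of the reading into a constant bias and a zero-mean fluctuation $\epsilon_{i,n}$, the reference value $\widehat{TV}_{i,n}$ (standing for the movement derived from the MI), and the running-mean form of the filter are all assumptions introduced here for illustration.

```latex
\begin{aligned}
SR_{i,n} &= TV_{i,n} + bias_{i,n} + \epsilon_{i,n}, \\
f_{i,n} &= -\frac{1}{n}\sum_{t=1}^{n}\left(SR_{i,t} - \widehat{TV}_{i,t}\right), \\
SR_{i,n} + f_{i,n} &\approx TV_{i,n} + \epsilon_{i,n}
\quad \text{as } f_{i,n} \to -bias_{i,n}.
\end{aligned}
```

Under these assumptions, if the bias stays constant and the reference values are unbiased, the averaged difference tends to the bias, so the corrected reading differs from the true value only by the fluctuation term.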

V. EVALUATION
Experiments to validate the proposed CBRC and SEC were conducted in office building N5 at the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. Test bed N5, which consists of corridors and rooms, is a typical configuration of an office building. The test area subject to CBRC verification included the 1st, 2nd, and 3rd floors of the N5 building, and the total corridor length of the test area was 296 m. Fig. 8 (a) shows the layout of the verification zone. The physical way-link graph for the test zone is shown in Fig. 8 (b).
The derived physical way-link graph has 4 nodes and 4 edges; each node contains location coordinate information and each edge contains length information. There is one corridor corner, which is not considered an intersection and thus cannot be represented as a node in the physical way-link graph. Therefore, the coordinates of each corner and its distance from the node are stored at the edge representing the corridor. The collector acquired continuous fingerprints to be used as verification data by walking through all areas of building N5. A total of 784 fingerprints were collected and divided into groups of 10 fingerprints, forming a total of 78 traces.

A. Wi-Fi PAIR-WISE EVALUATION
Experiments for Wi-Fi pair-wise sequence alignment were conducted on the 2nd floor of building N5, and a Samsung Galaxy S was used as the experimental device. Fig. 9 shows the trace alignment in the optimal environment for two traces moving at the same speed. The overlapping parts of the two traces are marked with the correct collection time through human input. The red points correspond to the measurement points estimated to be the same. Because the users were moving at almost constant speed, most Wi-Fi measurement points are matched 1-by-1. In addition, we observe that the result of backtracking exactly matches the overlapping part of the two traces. Fig. 10 shows the alignment results of two traces moving at different speeds. In conventional sequence alignment, only 1-by-1 matching is possible. However, in the case of traces, when the user speeds are different, one or more points of a slow trace may correspond to one point of a fast trace. Therefore, in the trace alignment, 1:N matching was allowed by partially modifying the scoring and backtracking techniques, so that traces with different speeds could be effectively merged. In addition to the velocity, the Wi-Fi scanning cycle may differ depending on the device type or OS version. In this case, the collected data can be effectively merged using the same method, because they resemble data acquired from users moving at different speeds.
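The idea of 1:N matching can be illustrated with dynamic time warping (DTW), a standard alignment technique in which one point of a sequence may be matched to several consecutive points of the other. DTW is used here as a stand-in and is not the modified scoring and backtracking scheme of this study. The sketch also assumes that both traces cover the same path from start to end, that every scan contains at least one AP, and that an AP missing from a scan is treated as -100 dBm; the function names and RSSI values are hypothetical.

```python
def fingerprint_distance(a, b, missing=-100.0):
    """Mean absolute RSSI difference (dB) over the APs seen in either scan."""
    aps = set(a) | set(b)
    return sum(abs(a.get(ap, missing) - b.get(ap, missing)) for ap in aps) / len(aps)


def align(trace_a, trace_b):
    """Dynamic time warping; returns the matched index pairs (i, j)."""
    n, m = len(trace_a), len(trace_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = fingerprint_distance(trace_a[i - 1], trace_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both traces to recover the matching.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        pairs.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return pairs[::-1]


# Hypothetical traces along the same corridor: the fast user yields 3 scans,
# the slow user yields 5 scans.
fast = [{"ap1": -40.0}, {"ap1": -60.0}, {"ap1": -80.0}]
slow = [{"ap1": -40.0}, {"ap1": -45.0}, {"ap1": -60.0}, {"ap1": -75.0}, {"ap1": -80.0}]

pairs = align(fast, slow)
print(pairs)
```

In this example, the first and last points of the fast trace are each matched to two points of the slow trace, which is the 1:N behavior needed to merge traces collected at different speeds or scanning cycles.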
The direction vector of the traces used for trace alignment is unknown. However, it is possible to know the directional relationship between traces owing to their directionality. In this case, if the trace directionality coincides with the merging direction, traces with unknown directions can be merged successfully. Fig. 11 shows traces that pass through the same path but in different directions. One trace can be reversed and merged even in the opposite direction because the signal characteristics are similar between measurement points at similar locations. However, a big difference in the degree of alignment is observed. Therefore, the effect of alignment can be maximized by merging the traces in the direction with the higher degree of alignment.
Fig. 12 and Fig. 13 show the trace diagram composed of 78 traces and the intermediate steps of its construction. Fig. 12 (a) illustrates the final trace diagram generated directly from trace alignment. Fig. 12 (b) is obtained by applying transitive path reduction, partial bubble reduction, bubble reduction, and small cycle removal techniques to the final trace diagram. Fig. 12 (b) is not a minimal form, as the intersections are not yet in an optimal form and a false path appears. Therefore, false path trimming, transitive edge reduction, and trace-to-path substitution were applied, and a minimal trace diagram of the target area was derived, as shown in Fig. 13 (a). The derived diagram includes four paths and undirected edges showing mutual reach between them. Therefore, a logical graph, shown in Fig. 13 (b), was obtained by expressing the trace diagram as a general graph. The target area is represented by a symmetrical graph, and it is difficult to obtain a unique matching result using a general graph-matching technique. Therefore, the unique matching result was derived by using the edge length information of the physical way-link graph and the logical length information of the logical graph. The distance information used is summarized in Table 1. Since the distance of a logical edge represents the distance in the signal space, it is not always proportional to the actual distance. Therefore, referring to the logical edge is recommended only in the case of a symmetric graph.

C. LOCATION LABELING ACCURACY TEST
The proposed method was able to predict the coordinates for 650 out of the 784 fingerprints used for verification. This number excludes the traces determined as false paths when constructing the trace diagram and the traces for which the path at an intersection is uncertain. The coverage of the prediction technique is therefore 83%. The average error distance is the mean distance between the predicted and actual coordinates of each fingerprint and indicates the accuracy of the prediction. This index was calculated for 539 out of the 650 predicted fingerprints, excluding fingerprints for which the actual location was uncertain because the traces were collected while users were moving on stairs. The average error distance was calculated to be 3.38 m. In the target area, a single inter-floor prediction error occurred, corresponding to a floor prediction accuracy of 99.8%.
Fig. 15 visualizes the prediction results for the 1st, 2nd, and 3rd floors of the target area. The actual location of each fingerprint is indicated by a colored square; the predicted fingerprint location is represented by a hollow square on the physical graph of the target area. To highlight the difference between the actual and predicted locations, the corresponding squares are connected with gray lines. Therefore, the prediction is considered accurate when the gray line is vertical. The location of a fingerprint collected on the stairs can be predicted, but only the predicted location is displayed because the actual location is uncertain.
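The two indices used above can be written out as a short computation. This is a minimal sketch: the fingerprint counts are those reported in the text, whereas the function names and the coordinates passed to `mean_error_distance` are hypothetical and only show how the index is formed.

```python
import math


def coverage(predicted_count, total_count):
    """Fraction of fingerprints for which coordinates could be predicted."""
    return predicted_count / total_count


def mean_error_distance(predicted_xy, actual_xy):
    """Average Euclidean distance between paired predicted and actual coordinates."""
    pairs = list(zip(predicted_xy, actual_xy))
    return sum(math.dist(p, a) for p, a in pairs) / len(pairs)


# Counts reported in the text: 650 predicted out of 784 collected fingerprints.
print(f"coverage: {coverage(650, 784):.1%}")

# Hypothetical coordinates in metres, for illustration only.
predicted = [(0.0, 0.0), (3.0, 4.0)]
actual = [(0.0, 0.0), (0.0, 0.0)]
print(mean_error_distance(predicted, actual))
```

The computed coverage is 82.9%, which rounds to the 83% stated above.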
The area where fingerprints are predicted along a single corridor, represented as a single edge in the logical graph, is marked with a dotted line. As shown in Fig. 15, this area coincides with the actual corridor except for a few fingerprints at the intersection. In the case of the third floor (marked with red dots), the prediction result is skewed to the left. This can be attributed either to the predicted corridor area being skewed to the left or to an error in the fingerprint arrangement. In the case of the second floor, the predicted locations are more centered than the actual locations, and in the central region, the order of the predicted fingerprints is slightly different from the actual one. This can also be regarded as an error in the fingerprint arrangement; therefore, a fingerprint arrangement technique that reflects the space more realistically should be implemented.

D. TRACKING ACCURACY TEST
We evaluated the performance of the proposed SEC in terms of tracking accuracy. Wi-Fi and geomagnetic sensor data were used to calculate the emission probability, while gyroscope and accelerometer data were used to compute the transition probability. For the target area, the CBR constructed in Section 3 was deployed. In testbed N5, a total of 257 APs were detected, but only 32 APs were used for the experiments to confirm the effectiveness of the SEC more accurately.
Approximately 500 values of magnetometer data were collected per cell for a total of 690,000 measurements at N5.
The total length of corridors in N5 is 296 m, and 2,725 test measurements were collected in 10 test traces. To construct the CBR for the magnetometer, the norm of the x-, y-, and z-axis readings was used as the magnetic intensity, and its average was used along with the inclination value. The average values of the Wi-Fi and magnetic field in each HMM cell of the test area were stored as training data along with the location coordinates extracted during the construction of the CBR. Because SEC uses probability-based localization, the signal distribution of the collected training data was analyzed and used in the online phase. Finally, the test point coordinates were registered in the test data, and the accuracy was calculated using the distance between the ground truth and the test point.
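The magnetometer features described above can be sketched as follows. This is an illustration under assumptions rather than the processing used in this study: the readings are assumed to be expressed in a frame whose z axis is vertical, the inclination is taken as the angle between the field vector and the horizontal plane (its sign depends on the axis convention), and the function names and sample readings are hypothetical.

```python
import math
from statistics import mean


def magnetic_features(x, y, z):
    """Norm of the field vector and its inclination in degrees."""
    norm = math.sqrt(x * x + y * y + z * z)
    # Assumed convention: z is the vertical axis of the measurement frame.
    inclination = math.degrees(math.atan2(z, math.hypot(x, y)))
    return norm, inclination


def cell_average(samples):
    """Average norm and inclination over all readings collected in one cell."""
    features = [magnetic_features(*s) for s in samples]
    return mean(f[0] for f in features), mean(f[1] for f in features)


# Hypothetical readings (in microtesla) collected in a single cell.
samples = [(0.0, 30.0, -40.0), (0.0, 30.0, -40.0), (30.0, 0.0, -40.0)]
print(cell_average(samples))
```

The norm is independent of how the device is held, which is one reason to store it; the inclination is not, so in practice the readings would first have to be rotated into a gravity-aligned frame.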
To validate the SEC method, we considered tracking with SEC while correcting only the heading data, only the step data, or both simultaneously, as well as tracking with no error correction. These variations are denoted by SEC and no error correction (NEC), according to the experimental condition. The abbreviations and mean distance error for the methods implemented in this experiment are summarized in Table 2. The accuracy of each method is compared against reference cases with no error (denoted by ANS) in either the step or heading data, or in both. The purpose of the experiment is to evaluate how closely the proposed SEC method brings the NEC result toward the values obtained by ANS. Fig. 16 presents the tracking accuracy with respect to the distance data in testbed N5. The tracking accuracies of NEC, SEC, and ANS converge at 3.5 m, 3.4 m, and 3.2 m, respectively. SEC achieves an improvement in accuracy of 2.86% compared with NEC. Although the accuracy is further improved by ANS, the difference in accuracy between ANS and NEC is only approximately 0.3 m. Fig. 17 presents the tracking accuracy with respect to the heading data. NEC, SEC, and ANS converge at 3.5 m, 2.8 m, and 2.4 m, respectively. Compared with NEC, SEC converges with a 20% improvement in accuracy. ANS converges with a 0.8 m improvement in accuracy for the heading data compared to the distance. SEC exhibits a similar pattern, as it converges with a 0.7 m improvement in accuracy for the heading data. Finally, SEC has similar accuracy to NEC for the distance and heading data before sequences 50 and 20, respectively. This means that the effect of SEC becomes evident near these sequences. In fact, the accuracy continues to improve and reaches a maximum at sequences 90 and 70, respectively.
Although the EP database is generated for the sensor bias and applied to the probability distribution, it does not substantially improve the accuracy compared to NEC. In the case of SEC, the improvement in accuracy for the heading data with respect to that for the distance is less than 0.2 m. This is because the deviation in the stride is not large, even if SEC is applied. By contrast, the accuracy is highly improved from sequence 20 in the case of SEC in Fig. 17 because the weight is accumulated in the value obtained by applying the EP database to the heading distribution. A comparison of the results of Fig. 16 and Fig. 17 shows that the accuracy improves and degrades intermittently because the methods shown in Fig. 16 correct the distance error but not the heading error. This indicates that the bias of the heading data degrades the tracking result relatively strongly. Fig. 18 shows the tracking accuracy for the simultaneous correction of distance and heading sensor data in testbed N5. SEC converges at 2.6 m and ANS converges at 2.2 m. Here, SEC improved accuracy by about 25.7% with respect to NEC. The accuracy begins to improve as SEC moves past sequence 20, similarly to the results for SEC presented in Fig. 17. Even in ANS, where there is no inertial sensor error in the sensor data, the accuracy for the simultaneous correction of distance and heading sensor data is improved by 31.25% and 8.3% compared to the ANS accuracies of Fig. 16 and Fig. 17, respectively. The relatively low improvement rate of ANS shown in Fig. 16 with respect to that of Fig. 17 confirms that the moving direction of a user has a higher effect on the tracking result. In this regard, if the conditional probability of the moving distance and direction is applied, the accuracies of SEC and ANS in Fig. 18 are expected to improve.
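The improvement rates quoted for Fig. 16 to Fig. 18 follow from the converged mean distance errors. The short check below recomputes them; the converged values are those reported in the text, the NEC value for the simultaneous correction is assumed to be the same 3.5 m as in the other two experiments because NEC applies no correction, and the function name `improvement` is hypothetical.

```python
def improvement(baseline_m, improved_m):
    """Relative reduction of the mean distance error, in percent."""
    return (baseline_m - improved_m) / baseline_m * 100.0


# Converged mean distance errors in metres, as quoted in the text.
converged = {
    "distance": {"NEC": 3.5, "SEC": 3.4, "ANS": 3.2},
    "heading": {"NEC": 3.5, "SEC": 2.8, "ANS": 2.4},
    "both": {"NEC": 3.5, "SEC": 2.6, "ANS": 2.2},  # NEC value assumed
}

# SEC relative to NEC in each experiment.
for name, values in converged.items():
    print(name, round(improvement(values["NEC"], values["SEC"]), 2))

# ANS with simultaneous correction relative to ANS in Fig. 16 and Fig. 17.
print(round(improvement(converged["distance"]["ANS"], converged["both"]["ANS"]), 2))
print(round(improvement(converged["heading"]["ANS"], converged["both"]["ANS"]), 2))
```

The recomputed rates (2.86%, 20%, and 25.71% for SEC, and 31.25% and 8.33% for ANS) agree with the figures stated in the text.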

VI. CONCLUSION
This paper presented a crowdsourcing-based radio map construction and error correction method for user tracking. It was confirmed that the proposed method could construct an accurate WRM for data collection without human effort and provide accurate tracking results in an indoor environment. When performing a fingerprint arrangement in CBRC, errors may occur during path prediction, which need to be eliminated by accurately reflecting the actual space. In addition, it is necessary to study the outliers that may occur when conducting SEC and to analyze the convergence time. These issues remain to be investigated in future work. We expect that this study will provide a basis for the integration of real-time error correction in different research fields.