Deep Learning-Based Multi-Floor Indoor Tracking Scheme Using Smartphone Sensors

Having recently become an important research topic, indoor tracking in a multi-floor building delivers comprehensive and efficient location-based services. In this paper, we present a deep learning (DL)-based indoor multi-floor tracking scheme that is independent of infrastructure and only uses the smartphone as a terminal device to measure and analyze the user’s mobility information. Our method detects the floor transition according to changes in barometer readings. We compiled the time-series barometer data to train the DL model, and applied a data augmentation method to avoid overfitting and data imbalance during model training. Furthermore, we developed a floor decision algorithm to process the DL model’s output and generate the floor detection result. In the proposed scheme, the smartphone’s inertial measurement unit sensors are used to measure the user’s mobility information, and pedestrian dead reckoning (PDR) is exploited to update the user’s 2D location. We implemented multi-floor tracking by combining the floor detection algorithm with PDR. To avoid the accumulated error problem that commonly arises in infrastructure-free approaches, calibration nodes (CNs) were configured in the floor plan to correct the estimated location by matching the possible CNs during the floor transition. We conducted several experiments in multi-floor buildings to evaluate our scheme’s performance, and found that our floor detection method achieves a 99.6% average floor number accuracy, with all floor transition types (i.e., stairs and elevator) being successfully recognized. Furthermore, we compared localization performance with conventional methods to validate the effectiveness of our approach.


I. INTRODUCTION
Over the years, localization techniques have been widely applied for location-based services (LBS). In outdoor environments, a pedestrian's location can be estimated via wireless signals transmitted from satellites, as in the global positioning system. In indoor environments, however, localization generally requires higher accuracy, while occlusion by the building makes satellite signals inaccurate or unusable. To provide better LBS in indoor environments, technologies dedicated to indoor localization are rapidly being developed. Considering increasing smartphone penetration and the various micro-electromechanical systems (MEMS) sensors integrated into smartphones, research on smartphone-based indoor localization systems is being actively conducted [1]-[3]. Smartphone-based indoor localization technologies can be broadly classified as infrastructure-dependent or infrastructure-independent [4]. Infrastructure-dependent technologies, which provide absolute positional information according to the strength of the signal received from the building infrastructure, include ultra-wideband (UWB) [5]-[7], Wi-Fi [8], [9], Bluetooth low energy (BLE) [10]-[12], and others. These technologies generally face issues such as path loss, shadowing, and fading. Furthermore, because these techniques depend on anchor nodes, they are not always available in specific situations, such as during disasters or power outages.
As an infrastructure-independent technology, pedestrian dead reckoning (PDR) is an alternative indoor localization method that exploits the user's mobility information measured by the inertial measurement unit (IMU) sensors inside the smartphone to update the location at each step. The smartphone's IMU typically consists of an accelerometer, a gyroscope, and a magnetometer, which measure the inertial signals generated by the user's movement. As a relative method, PDR derives the user's current location from the previous estimate; thus, the errors (e.g., missing steps, inaccurate stride length and heading direction estimation) accumulate over time. Due to the limited accuracy of consumer-grade MEMS sensors in smartphones and the inevitable noise present in indoor environments, PDR only achieves high performance within a short tracking trajectory. To improve the performance of smartphone-based PDR methods, fusion algorithms such as the Kalman filter [13]-[15], complementary filter [16], [17], and particle filter [18]-[20] have been developed to combine signals from multiple sensors according to certain criteria, thus reducing uncertainty in the results and improving accuracy. Furthermore, several studies have focused on individual components of the PDR method. Zhang et al. improved step detection by filtering and differentiating the acceleration data [21], and Wang et al. used the walking speed as an influencing factor of stride length to achieve a stable dynamic stride length estimation [22]. In [20], a particle filter and an extended Kalman filter were introduced to calibrate the output deviation of the magnetometer in environments with soft/hard iron effects. Although these approaches mitigate the accumulated error problem in PDR, the absence of estimated location calibration or boundary constraints still results in errors that accumulate over time.
Furthermore, whereas most buildings are multi-layered structures, the calculation result of PDR represents a two-dimensional (2D) location. In an indoor environment, the altitude information can be represented as the floor level and determined by detecting floor transition events. Indoor multi-floor localization can be achieved by combining the pedestrian's estimated altitude and 2D location. The smartphone sensor readings exhibit different patterns as the pedestrian walks in different areas, and these signals can be exploited to achieve floor detection. The methods in [23]-[25] localize the floor according to information extracted from acceleration. The method in [23] converts the step counts and traveling time to the number of floors traveled, and [24] proposes acceleration signatures for six types of user movement, which can be used to verify the floor number. Because the signal of height change can be captured by the smartphone's barometer, this feature has also been incorporated into floor detection techniques. An approach was proposed in [26] to extract the patterns between the floor levels and barometer readings according to the measurements on different floors at different times. Ye et al. [27] built the barometer fingerprint of a building by exploiting crowdsourcing, which includes the barometric pressure value for each floor, and localized the floor by matching the calibrated barometer readings with the fingerprint. Typical indoor localization scenarios involve medium to large buildings (e.g., malls, school buildings), and barometric pressure is sensitive to environmental conditions. Consequently, the barometer data collected from different areas may exhibit different characteristics. However, some methods only yield high accuracy in uniform environments such as closed corridors, or have a large delay in determining the floor change event.
Recently, deep learning (DL) approaches have been widely applied in LBS. DL methods convert raw input features into abstract, higher-level representations by composing nonlinear modules to learn the relationship between the input and output data [28]. Owing to their powerful learning capabilities, DL methods generate efficient solutions for localization applications. Therefore, they have been utilized to extract features from received signal strength (RSS) datasets [29], [30], and to optimize step length calculation [31], [32] and heading direction estimation [33]. However, most DL applied in LBS involves supervised learning, which poses a challenge in terms of the costs of training data collection and labeling. Moreover, considering the computational and energy limitations of mobile platforms, the DL model should be lightweight and of low complexity.
This study proposes an infrastructure-free DL-based indoor multi-floor localization scheme using smartphone sensors for the following purposes.
• Provide a DL-based floor transition detection approach capable of promptly delivering stable and accurate floor detection results in complex indoor environments, and of distinguishing the transition type and direction of usage.
• Realize multi-floor tracking based on the 2D location and altitude information obtained from PDR and floor detection.
• Correct the estimated location by matching possible calibration nodes (CNs) to prevent error accumulation.
To meet these goals, the proposed scheme records barometer readings as the user walks across various regions (e.g., floor, stairs), takes the time-series barometric pressure data as input data, and uses the real action of each step as label data for supervised learning. Next, a DL model is built to extract the predictive relationship between the changes in barometer readings and the corresponding step actions from the shape of the data, and its output is then converted to floor transition results using a floor decision algorithm. To improve the performance of the DL model, we propose a data augmentation method that handles the issues of imbalanced data, overfitting, and the cost of dataset preparation. Based on the floor (transition) detection, we improve the PDR estimation and develop a multi-floor tracking service by combining altitude information with the 2D location. Moreover, to prevent error accumulation over time, the proposed scheme sets up CNs on each floor. Once the floor transition signal is detected, the estimated location can be corrected by matching the CNs. To ensure the accuracy of floor detection, the decision algorithm waits for the DL prediction of several steps whenever the user's walking state changes. Thus, the proposed scheme is best suited for situations where an output delay is acceptable. The rest of the paper is organized as follows. In Section II, we present the proposed scheme and describe each step of the overall multi-floor localization process. The experiments and their results are analyzed in Section III, and the concluding remarks are presented in Section IV.

II. COMPONENTS OF THE PROPOSED TRACKING SCHEME
As illustrated in Figure 1, while the user is walking in an indoor environment, our proposed scheme collects data using the IMU and barometer sensors in the smartphone, and feeds them into the localization engine, which consists of the following three components: (1) Floor detection is performed first to detect the floor transition and roughly estimate the user's current location. (2) PDR updates the 2D location according to the mobility measured by the IMU sensors, and the floor detection output is used to improve the PDR estimation. (3) The location correction combines the estimated 2D location with the floor transition signal to calibrate the localization by CN matching, and outputs the current location in the multi-floor building.

A. DL-BASED FLOOR DETECTION
The floor transition detection component in the localization engine operates as shown in Figure 2. In this subsection, the step action prediction scenario and data collection process, the data augmentation and preprocessing to obtain training data, the hyperparameter setting and training of the model, and the floor decision algorithm to process the output from the DL model are described in detail.

1) Step Action Prediction Scenario and Data Collection Procedure
Because atmospheric pressure decreases as height increases, the altitude information of the user holding a smartphone can be measured via barometer readings. In an ideal case, the barometer measurements should be similar while the user is walking along a flat floor, and change significantly when descending or climbing stairs. Therefore, the action of a step can be recognized by comparing the barometric pressure values of adjacent steps (e.g., difference, slope). However, barometric pressure is easily affected by environmental features, such as changes in building structure along the walking path. To demonstrate the effect of environmental conditions on barometer measurements, we collected barometric data in two regions: a closed corridor, and a combination of a closed corridor and an open corridor, as shown in Figure 3. We shifted the barometric values close to 0 for easy visual comparison. In the first region (blue), the barometer readings did not change significantly on the flat floor, and exhibited a clear altitude correlation; however, the second region (orange) exhibited significant variations in barometric readings even over the flat surface. Therefore, it is difficult to determine whether a change in barometer readings is caused by noise or by a floor transition using only the barometric pressure values of adjacent steps. If the prior barometer readings are smooth, we presume that the current barometric fluctuation is caused by a change in altitude; whereas if the prior barometric pressure values are unstable and sloping, and the current reading manifests a similar instability, the barometric fluctuation is likely caused by noise. In this study, step action recognition is approached as a classification problem, and the changes in barometric readings are used to generate the probability distribution over the actions.
Because the changes in barometric pressure caused by user movement are largely low-frequency, low-pass filters are generally employed in data collection to eliminate high-frequency noise and provide a better dataset for DL training. We used a method called weight smoothing to smooth outlier data as follows [34]:
$$B^d_t = \beta x_t + (1 - \beta) B^d_{t-1} \tag{1}$$
where $x_t$ is the sensor reading sampled at time $t$, $B^d_t$ is the barometric pressure value after applying the low-pass filter, and $\beta$ is a weight factor that controls the effect of $x_t$ on $B^d_t$. The efficiency of denoising depends on the sampling frequency and the magnitude of $\beta$. Although values of $\beta$ closer to 0 yield a more significant denoising effect, they also cause considerable delay. During data collection, the corresponding barometer data is stored in the smartphone database whenever a new step is detected. When a floor transition occurs, the current number of steps is recorded to label the collected data. For example, if a user goes upstairs at the 50th step and returns to a flat floor at the 70th step, the ground-truth label of the 50th to 70th steps is "Go upstairs." Data were collected in the Hyeongnam Engineering Building at Soongsil University.
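The smoothing step can be illustrated with a short Python sketch. It assumes the standard exponentially weighted form, in which each output mixes the new reading with the previous output; the function name `weight_smooth`, the default `beta`, and the choice to seed the filter with the first reading are illustrative assumptions, not details taken from the paper.

```python
def weight_smooth(readings, beta=0.2):
    """Exponentially weighted smoothing: b_t = beta * x_t + (1 - beta) * b_{t-1}.

    beta close to 0 smooths strongly but reacts slowly; beta = 1 returns
    the raw readings unchanged. Seeding with the first reading is an
    assumption made for this sketch.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    smoothed = []
    previous = None
    for x in readings:
        previous = x if previous is None else beta * x + (1.0 - beta) * previous
        smoothed.append(previous)
    return smoothed


if __name__ == "__main__":
    raw = [1000.0, 1000.0, 1003.0, 1000.0, 1000.0]  # one outlier spike (hPa)
    print(weight_smooth(raw, beta=0.2))
```

With a small `beta`, the single spike is strongly attenuated, at the cost of the output lagging behind genuine pressure changes.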

2) Data Augmentation and Preprocessing
In DL, the reliability and quality of the dataset directly determine the model's performance. For example, imbalanced data leads to a heavy focus on the majority class, which creates an underfitting problem for the minority class samples. In contrast, insufficient training data causes an overfitting problem, as it is difficult to obtain comprehensive, representative information from a limited dataset. These problems are common in DL because data collection is expensive, and the amount and quality of collected data are often lower than expected, which causes machine learning algorithms to underperform. We identify four events in the pedestrian floor transition: (1) from floor to downstairs, (2) from downstairs to floor, (3) from floor to upstairs, and (4) from upstairs to floor. For example, when a user ascends the stairs from the first floor to the second floor, they first transition from floor to upstairs, and then from upstairs to floor. The number of events of each of the four types in the dataset should be similar to prevent unbalanced performance in the model (e.g., good action recognition accuracy for upstairs steps but inaccurate recognition for downstairs steps). Therefore, we utilized a data augmentation method to prevent the deterioration of learning performance caused by insufficient and imbalanced data, as well as to reduce the cost of data collection.
The international barometric formula, which relates height $h$ to barometric pressure, is given by [35]
$$h(p, p_0) = 44330 \cdot \left(1 - \left(\frac{p}{p_0}\right)^{\frac{1}{5.255}}\right) \tag{2}$$
where $p$ and $p_0$ are the pressure measured by the barometer and the reference pressure at sea level in mbar, respectively. Although the calculated absolute altitude is inaccurate owing to atmospheric pressure drift arising from weather, (2) shows that the pressure changes caused by ascending and descending the same distance are almost identical. According to this symmetry, pressure data that decreases as when going upstairs can be generated by subtracting the pressure increase recorded when going downstairs. Thus, the data augmentation in the proposed scheme was performed as follows.
Here, $\delta$ is the pressure change between adjacent steps, $N$ is the total number of collected data points, and $B^a$ is the barometer data obtained by data augmentation. Subsequently, the training dataset $B$ was obtained by concatenating $B^d$ and $B^a$. By performing data augmentation, the sample size of the dataset almost doubled, and the balance of the dataset was maintained.
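As an illustration of the symmetry idea, the following Python sketch mirrors a recorded pressure sequence by negating each step-to-step change and swapping the up/down labels. This is one plausible reading of the augmentation, not a reproduction of the paper's equations; the function name `mirror_augment`, the label strings, and the choice to keep the first reading as the anchor are assumptions.

```python
# Label swap used when a sequence is mirrored (label strings are assumed).
SWAPPED_LABEL = {
    "Going up": "Going down",
    "Going down": "Going up",
    "Normal walking": "Normal walking",
}


def mirror_augment(pressures, labels):
    """Return a mirrored copy of a per-step pressure sequence and its labels.

    Each pressure change delta between adjacent steps is subtracted instead
    of added, so a recorded descent becomes a synthetic ascent and vice versa.
    """
    if len(pressures) != len(labels):
        raise ValueError("pressures and labels must have the same length")
    if not pressures:
        return [], []
    mirrored = [pressures[0]]  # keep the first reading as the anchor
    for previous, current in zip(pressures, pressures[1:]):
        delta = current - previous
        mirrored.append(mirrored[-1] - delta)
    return mirrored, [SWAPPED_LABEL[label] for label in labels]


if __name__ == "__main__":
    recorded = [1000.0, 999.5, 999.0]  # pressure drops while going upstairs
    actions = ["Normal walking", "Going up", "Going up"]
    print(mirror_augment(recorded, actions))
```

Concatenating the recorded and mirrored sequences roughly doubles the sample count and gives upward and downward transitions equal representation.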
Owing to the large values and small variation of barometric pressure, the training data need to be converted into a form suitable for learning through preprocessing. (6) and (7) show two feature scaling methods commonly used in data processing, min-max normalization and standardization, respectively. Normalization scales the data into a specific range such as [0, 1] according to the maximum and minimum values of the features, while standardization transforms the features to a distribution with zero mean and unit standard deviation. Because the collected data contain unavoidable outliers, and the maximum or minimum values are easily replaced when the floor level is updated, normalization is not suitable for this study. Standardization is not influenced by outliers, since the data are not restricted to a predefined range, but it changes the magnitude scale of the features. We want to retain the unit of the features to express the real changes in barometer readings; thus, mean centering is applied in our work, as shown in (8).
Here, µ is the mean of features, and CB is the centered barometric data. Because the DL model predicts the action of a step based on the changes in barometer readings, we used the time-series pressure data to represent pressure changes during walking. A sliding window algorithm was used to obtain subsets WB of the dataset as the time-series pressure data.
where $s$ is the sliding window size. Because the window size should be large enough to contain the pressure change information, we set $s = 15$. $WB_k$ contains the barometric pressure change over $s$ steps, corresponding to the action of the $k$-th step. The proposed DL model requires $s$ barometer readings to predict an action, which means it has to wait until the data of $s$ steps have been collected at the start of tracking. We will discuss how to overcome this drawback in Section II-C2.
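The preprocessing described above can be sketched in Python as mean centering followed by a sliding window. The function names are illustrative, and whether the mean is taken over the whole series or over each window is not specified here, so this sketch assumes the whole series.

```python
def mean_center(values):
    """Subtract the mean so the data keep their unit (pressure differences)."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def sliding_windows(values, s=15):
    """Return one window of the last s readings for every step k >= s - 1.

    The window ending at step k is the input used to predict the action
    of the k-th step; no window exists until s readings are available.
    """
    return [values[k - s + 1 : k + 1] for k in range(s - 1, len(values))]


if __name__ == "__main__":
    pressures = [1000.0 + 0.01 * i for i in range(20)]
    windows = sliding_windows(mean_center(pressures), s=15)
    print(len(windows), len(windows[0]))
```

The example makes the start-up delay visible: 20 readings with $s = 15$ yield only 6 windows, because the first 14 steps have no complete window.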

3) Model Training and Learning Results
In this study, step action recognition is treated as a classification problem. In time-series analysis tasks, the recurrent neural network (RNN) [36] is widely employed because it includes internal memory, and can therefore consider previous inputs when making a decision. However, the multi-layer perceptron (MLP) [37] is sufficient to learn the relationship between pressure changes and the action of a step through $WB$ and to provide accurate step action prediction. Moreover, an MLP trains and runs faster than an RNN, making it more suitable for mobile platforms. In this study, an MLP was used to implement step action recognition. Figure 4 illustrates an overview of the proposed MLP model. The model takes $WB_k$ as input features, and outputs the probability of the three classes (i.e., "Normal walking", "Going up", "Going down") for the $k$-th step after passing through the hidden and output layers. The parameters used in the DL model are listed in Table 1. There were 677 samples collected for model training, and the dataset was extended to 1,125 samples after performing the data augmentation. Because the collected sensor data include outliers, the robustness of the DL model to perturbations should be considered in model training. Therefore, a batch normalization layer was inserted after each hidden layer to maintain neuron activations at zero mean and unit variance through shifting and scaling operations, and the scaled exponential linear unit (SELU) was used instead of the rectified linear unit (ReLU) to induce self-normalizing properties and prevent units from being killed off by the dying ReLU problem [38]. Moreover, the LeCun normal initializer was used to initialize parameters with a scaled Gaussian distribution, thus improving performance as well as training speed [39]. A simple step action recognition experiment was conducted to demonstrate how the model predicts the action of a step, as shown in Figure 5.
The barometer data were collected as the user ascended the stairs from the first floor to the second floor, and then took the elevator back to the first floor. From Figure 5, we noticed that the model detected not only the flat floor and stairs, but also the height change caused by the elevator. In addition, the model is sensitive to sudden changes in barometric pressure, and it was immediately able to detect the first step after floor changes when the user took the elevator. Due to the cost of dataset creation and the model size in the mobile platform, we did not design a complex MLP model with a larger dataset to classify the action more finely, such as by making a distinction between elevators and stairs. Instead, these functions were realized in the floor decision algorithm based on the characteristics of model prediction in stairs and elevators we observed in Figure 5.
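To make the model structure concrete, the following pure-Python sketch runs the forward pass of a small MLP with SELU activations, LeCun-normal-style initialization, and a softmax output over the three classes. The layer sizes are hypothetical (the values from Table 1 are not reproduced here), batch normalization and training are omitted, and the initializer draws from a plain Gaussian with standard deviation sqrt(1/fan_in).

```python
import math
import random

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def lecun_normal(n_in, n_out, rng):
    """Weights drawn from a Gaussian with std = sqrt(1 / fan_in)."""
    std = math.sqrt(1.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, bias, inputs):
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_mlp(sizes, seed=0):
    """sizes such as [15, 32, 16, 3] are hypothetical, not the paper's."""
    rng = random.Random(seed)
    return [(lecun_normal(a, b, rng), [0.0] * b) for a, b in zip(sizes, sizes[1:])]


def predict(layers, window):
    """Return class probabilities for one window of s pressure values."""
    hidden = list(window)
    for weights, bias in layers[:-1]:
        hidden = [selu(v) for v in dense(weights, bias, hidden)]
    weights, bias = layers[-1]
    return softmax(dense(weights, bias, hidden))


if __name__ == "__main__":
    model = build_mlp([15, 32, 16, 3])
    print(predict(model, [0.01 * i for i in range(15)]))
```

The three outputs would map to "Normal walking", "Going up", and "Going down"; an untrained network as above only demonstrates the data flow, not meaningful predictions.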

4) Floor Decision Algorithm
As shown in Algorithm 1, the floor decision algorithm in this study was developed for the following functions: (a) convert the step action to the floor number, (b) handle any incorrect predictions arising from outlier data, and (c) identify the floor transition type. Because the prediction received from the DL model is the action of a step, it needs to be converted to a floor number to represent the altitude information in a multi-floor building. The step action of "Going up/down" implies that the user is going upstairs/downstairs, while "Normal walking" means that the user remains on a flat floor. Therefore, we can assume that the user enters the transition zone when the walking state changes from "Normal walking" to "Going up/down." Likewise, the user exits the transition zone when their state changes back to "Normal walking." The floor number is updated based on these signals. However, due to various noise recorded during the measurement, the model may produce incorrect predictions that severely affect the accuracy of floor (transition) detection. Therefore, the floor decision algorithm compares the output received from the DL model to that of the previous step. If the state has not changed, the same floor detection result is returned.
Otherwise, the algorithm waits for $N_{wait}$ steps and saves the step data in a queue to confirm the possible effect of outliers, where the state of the steps in the queue is determined according to the decision result. A small waiting threshold $N_{wait}$ implies that the floor detection result can be obtained immediately but is easily affected by incorrect predictions, whereas a large $N_{wait}$ means that floor detection is robust to outlier data but requires a waiting period of $N_{wait}$ steps when the user changes floors. Regarding the transition type, because the model is sensitive to sudden changes in the pressure values, the algorithm records the barometer reading of the current step if the predicted action is different from that of the previous step, and calculates the difference when the state change is confirmed. The transition type can be obtained easily from the difference, since the pressure change caused by the elevator is much larger than that caused by stairs. Therefore, the possible outputs of the floor decision algorithm are a floor number, "Stairs up," "Stairs down," "Elevator up," and "Elevator down." The floor number indicates the current floor where the user is located, and the rest are used to represent floor transition information.
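The decision logic described above can be sketched as a small state machine in Python. This is a simplified illustration, not Algorithm 1 itself: the class name, the `elevator_hpa` threshold, the one-floor-per-transition update, and the use of the last reading before the change as the pressure reference are all assumptions, and the relabeling of queued steps is omitted.

```python
class FloorDecider:
    """Confirm a change of walking state only after n_wait consistent steps."""

    def __init__(self, start_floor=1, n_wait=3, elevator_hpa=0.3):
        self.floor = start_floor
        self.state = "Normal walking"     # last confirmed walking state
        self.transition = None            # last confirmed transition label
        self.n_wait = n_wait
        self.elevator_hpa = elevator_hpa  # hypothetical stairs/elevator threshold
        self.candidate = None             # unconfirmed new state
        self.count = 0
        self.ref_pressure = None          # reading just before the change
        self.last_pressure = None

    def _current(self):
        return self.floor if self.state == "Normal walking" else self.transition

    def update(self, action, pressure):
        before = self.last_pressure
        self.last_pressure = pressure
        if action == self.state:
            # Any pending candidate was an outlier: discard it.
            self.candidate, self.count = None, 0
            return self._current()
        if action != self.candidate:
            self.candidate, self.count = action, 0
            self.ref_pressure = pressure if before is None else before
        self.count += 1
        if self.count < self.n_wait:
            return self._current()        # keep the previous result while waiting
        previous, self.state = self.state, action
        self.candidate, self.count = None, 0
        if action == "Normal walking":
            # Assumes one floor per transition.
            if previous == "Going up":
                self.floor += 1
            elif previous == "Going down":
                self.floor -= 1
            return self.floor
        jump = abs(pressure - self.ref_pressure)
        kind = "Elevator" if jump >= self.elevator_hpa else "Stairs"
        direction = "up" if action == "Going up" else "down"
        self.transition = f"{kind} {direction}"
        return self.transition


if __name__ == "__main__":
    decider = FloorDecider(n_wait=2)
    steps = [("Normal walking", 1000.00), ("Going up", 999.98),
             ("Going up", 999.96), ("Normal walking", 999.60),
             ("Normal walking", 999.60)]
    print([decider.update(a, p) for a, p in steps])
```

A single contradictory prediction is ignored because it never reaches `n_wait` consecutive steps, which is the trade-off between robustness and delay discussed above.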

B. PDR METHOD USING SMARTPHONE SENSORS
In this subsection, we introduce the smartphone IMU-based PDR used in this study. The proposed scheme assumes that the smartphone always points forward, which means that the device heading and walking direction are identical; thus, they are both referred to as the heading direction throughout the paper. The user mobility information can be obtained using the SensorEvent and SensorManager classes in Android OS [40], [41], or the CoreMotion and CLHeading classes in iOS [42], [43]. A typical IMU-based PDR method is composed of step detection, stride length estimation, and heading direction estimation, and provides polar vectors such as {step length, heading direction} that can be summed to estimate the location at each step. Next, we describe how the three basic functions of the PDR method work in practical localization.

1) Step Detection
Step detection is typically implemented by detecting the peaks and valleys of the vertical acceleration generated by the feet periodically touching the ground during walking [46]. To obtain the vertical acceleration, the raw acceleration values measured in the local coordinate system (LCS) need to be transferred to the global coordinate system (GCS) through a rotation transformation. The rotation matrix $R$ can be obtained from the cross-product of a gravity vector from the accelerometer and a magnetic vector from the magnetometer. We used the getRotationMatrix function provided by SensorManager to perform this calculation. Subsequently, the acceleration in the GCS, $A^g$, can be computed from the acceleration in the LCS, $A$, as
$$A^g = R \cdot A \tag{10}$$
Assuming that $a^z$ is the vertical acceleration in $A^g$, step detection can be expressed as
$$\{\, a^z_k > a_{upper},\; a^z_{k+p} < a_{lower},\; p_{min} < p < p_{max} \,\} \tag{11}$$
Here, $a_{upper}$ and $a_{lower}$ are the positive and negative thresholds, respectively, and we set $a_{upper} = 0.9\ \mathrm{m/s^2}$ and $a_{lower} = -0.8\ \mathrm{m/s^2}$. $p_{min}$ and $p_{max}$ are the minimum and maximum duration of one step, respectively, and we set $p_{min} = 3$ and $p_{max} = 12$, which correspond to 0.15 s and 0.6 s at a sampling frequency of 20 Hz. Once a step is detected, PDR calculates the corresponding stride length and heading direction.
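The peak-and-valley condition can be sketched in Python as a scan over the vertical acceleration samples. The function name and the scanning strategy (take the first valley that satisfies the timing constraint, then resume after it) are assumptions for illustration; the input is assumed to be vertical acceleration in the GCS with gravity already removed.

```python
def detect_steps(a_z, a_upper=0.9, a_lower=-0.8, p_min=3, p_max=12):
    """Return the sample indices at which a step is confirmed.

    A step is counted when a sample above a_upper is followed, p samples
    later with p_min < p < p_max, by a sample below a_lower.
    """
    steps = []
    k = 0
    n = len(a_z)
    while k < n:
        if a_z[k] > a_upper:
            for p in range(p_min + 1, p_max):
                if k + p < n and a_z[k + p] < a_lower:
                    steps.append(k + p)
                    k += p  # resume scanning after the valley
                    break
        k += 1
    return steps


if __name__ == "__main__":
    signal = [0.0] * 20
    signal[2], signal[7] = 1.2, -1.0    # first peak and valley
    signal[12], signal[17] = 1.2, -1.0  # second peak and valley
    print(detect_steps(signal))
```

The timing bounds reject a valley that follows a peak too quickly or too slowly, which filters out jolts that do not match a plausible walking cadence.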

2) Stride Length Estimation
A commonly used approach that describes the relationship between stride length and vertical acceleration is given in [44].
where $\lambda_k$ is the stride length of the $k$-th step, $a^z_{max,k}$ and $a^z_{min,k}$ are the maximum and minimum vertical acceleration during the $k$-th step, and $\tau_k$ is the coefficient for adjusting the magnitude of $\lambda_k$.
When a user takes the stairs, the variance in vertical acceleration is generally greater than that during normal walking due to gravity. Therefore, when calculated by (12), the estimated step length on the stairs is usually slightly longer than that during normal walking, whereas the real step length is typically similar to or even shorter than that during normal walking owing to the length limitation imposed by the stairs. To obtain an accurate step length, we need to compensate for the stride length estimation by adjusting the parameter $\tau$ in (12). According to the floor transition detection method presented in Section II-A, we know the region (e.g., a certain floor, stairs) where a detected step is located. Therefore, $\tau_1$ and $\tau_2$ are given to compensate for the step length calculation in different regions. In this study, we set the step length coefficient $\tau_k$ of the $k$-th step as
$$\tau_k = \begin{cases} \tau_1, & \text{if the } k\text{-th step is on the stairs} \\ \tau_2, & \text{otherwise} \end{cases} \tag{13}$$
where $\tau_1 = 0.7\tau_2$, and $\tau_1$ and $\tau_2$ can be determined empirically or from the building information. By distinguishing between stairs and normal walking, the stride length estimation in multi-floor tracking is improved. The accelerometer reading is mainly affected by the force generated by user movement rather than by environmental factors; thus, the performance of step detection and stride length estimation in an indoor environment is almost identical to that in an outdoor environment.
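A minimal Python sketch of region-dependent stride length estimation follows. It assumes a Weinberg-style model, in which the stride length is the coefficient times the fourth root of the vertical acceleration swing; this is a common choice but is an assumption here, not a quotation of (12). The function name and the default coefficient value are hypothetical; only the 0.7 ratio comes from the text.

```python
def stride_length(a_z_step, on_stairs, tau_flat=0.5, stairs_ratio=0.7):
    """Estimate the stride length of one step from its vertical acceleration.

    a_z_step: vertical acceleration samples recorded during the step.
    on_stairs: region reported by floor detection for this step.
    tau_flat: hypothetical coefficient for flat walking (tau_2).
    stairs_ratio: tau_1 / tau_2, set to 0.7 as in the text.
    """
    tau = stairs_ratio * tau_flat if on_stairs else tau_flat
    swing = max(a_z_step) - min(a_z_step)
    return tau * swing ** 0.25  # assumed Weinberg-style fourth root


if __name__ == "__main__":
    samples = [0.2, 1.4, 0.3, -1.1, -0.2]
    print(stride_length(samples, on_stairs=False))
    print(stride_length(samples, on_stairs=True))
```

For the same acceleration swing, the stairs estimate is 70% of the flat estimate, which counteracts the overestimation caused by the larger vertical acceleration on stairs.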

3) Heading Direction Estimation
The traditional way to estimate the orientation of a device is to compute the pitch and roll angles from the accelerometer reading and the azimuth angle from the magnetometer reading, and then combine them to obtain a vector that represents the 3-axis orientation [34], [45]. Another way to update the orientation is to integrate the angular velocity measured by the gyroscope to obtain the 3-axis rotation angle during a sampling time interval. We denote $\alpha^{am}_k$ as the orientation calculated from the accelerometer and magnetometer, and $\omega_k$ as the gyroscope measurement at the $k$-th step. $\alpha^{am}_k$ provides the absolute heading angle relative to the GCS, which is prone to errors from magnetic interference, whereas $\omega_k$ is characterized by short-term accuracy but suffers from the drift problem. In this study, we estimated the orientation $\alpha^f_k$ by fusing $\alpha^{am}_k$ and $\omega_k$ as follows to compensate for their respective disadvantages.
Here, ∆t is the sampling time interval and β is the parameter that determines fusion proportion. We set β = 0.98, which means that the heading update was dominated by gyroscope readings, and the accelerometer/magnetometer was used to compensate for the gyroscope drift over time.
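The fusion can be illustrated with a standard complementary filter in Python. The exact form of the paper's fusion equation is not reproduced here, so this sketch assumes the usual structure: integrate the gyroscope rate, then pull the result toward the accelerometer/magnetometer heading with weight 1 − β. The function names and the angle-wrapping step are illustrative additions.

```python
import math


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def fuse_heading(prev_fused, gyro_rate, dt, heading_am, beta=0.98):
    """One complementary-filter update of the heading (radians).

    Without wrap-around this equals
    beta * (prev_fused + gyro_rate * dt) + (1 - beta) * heading_am.
    """
    predicted = prev_fused + gyro_rate * dt
    # Blend along the shortest angular difference to avoid jumps at +/- pi.
    error = wrap(heading_am - predicted)
    return wrap(predicted + (1.0 - beta) * error)


if __name__ == "__main__":
    heading = 0.0
    for _ in range(5):
        # Gyroscope reports no rotation; magnetometer heading sits at 0.1 rad.
        heading = fuse_heading(heading, gyro_rate=0.0, dt=0.05, heading_am=0.1)
    print(heading)
```

With β = 0.98, each update moves the fused heading only 2% of the way toward the magnetometer value, so short magnetic disturbances have little effect while slow gyroscope drift is still corrected.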
Note that although there are many approaches to implementing the components of PDR, the optimal solution is still an open problem. We did not focus on providing the best PDR strategy, and the appropriate implementation of each PDR component can be chosen depending on practical use. After the step length calculation and heading direction estimation, the location at the $k$-th step, $P_k(x_k, y_k)$, is updated by adding the polar vector {step length, heading direction} of the step to the previous location.
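The position update at each step can be sketched in Python as follows. The axis convention is an assumption: the heading is taken as the angle measured clockwise from the positive y axis, so a heading of 0 moves the user along +y. The function name is illustrative.

```python
import math


def pdr_update(position, stride, heading):
    """Advance a 2D position by one step.

    position: (x, y) of the previous step in meters.
    stride: estimated stride length in meters.
    heading: heading in radians, clockwise from the +y axis (assumed).
    """
    x, y = position
    return (x + stride * math.sin(heading), y + stride * math.cos(heading))


if __name__ == "__main__":
    location = (0.0, 0.0)
    for stride, heading in [(0.7, 0.0), (0.7, 0.0), (0.6, math.pi / 2)]:
        location = pdr_update(location, stride, heading)
    print(location)
```

Because each position depends on the previous one, any error in stride or heading is carried into every later step, which is why the calibration described in the next subsection is needed.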

C. LOCATION CORRECTION AND TRACKING PROCEDURE
In this subsection, we introduce location correction to prevent the accumulation of estimated error, and the complete indoor multi-floor tracking procedure. From the DL-based floor transition detection method introduced in Section II-A, we know whether a detected step serves as a transition. The proposed scheme uses these transition signals to correct the estimated location, and then combines the altitude information with PDR tracking to generate the multi-floor localization result.

1) Calibration Node Matching-Based Location Correction
We observed that the distribution of common facilities in a building is always designed with a balance between convenience and cost in mind, due to limited resources. Therefore, common facilities (e.g., toilets, stairs) that perform the same function on each floor are distributed as evenly as possible over a wide area. When a pedestrian changes floors, we know that they are near a specific region, such as stairs or an elevator. From this observation, we create CNs near stairs and elevators. Once a floor transition signal is detected, the proposed scheme retrieves the possible CNs and calibrates the estimated location using the closest CN's information. When a new step is detected, the proposed scheme not only calculates the step length and direction, but also estimates altitude information, which represents the rough region of the user's location. By comparing the region of the current step to that of the previous step, we can confirm whether the user has transitioned between regions. The following information can be obtained when a signal of region change is detected: (i) the region where the previous step is located, (ii) the direction of stairs/elevator usage, and (iii) the transition type. This information limits the possible locations of the user to a few places. Figure 6 shows an example of the user changing floors. Assuming that this floor has only one staircase and one elevator, we know that the user was previously on the second floor according to (i), is heading downstairs according to (ii), and is taking the stairs rather than the elevator according to (iii). With the presence of a nearby CN, the estimated location can be directly calibrated based on the CN's information.
Before this information can be used, the CN profile needs to be defined for matching. We generated CN profiles according to facility usage and location as $CN_{set} = \{P^{cn}_j, Feature_j,\ j = 1, \cdots, n_{cn}\}$, where $P^{cn}_j = (x^{cn}_j, y^{cn}_j)$ is the location of the CN, $Feature_j$ = <previous region, current region> is the CN characteristic, and $n_{cn}$ denotes the total number of CNs. Here, the previous region in $Feature_j$ corresponds to (i), and the current region corresponds to (ii) and (iii), as the height information obtained from floor detection denotes both the transition direction and type. Thus, there is at least one CN for matching and correction when the user changes regions. Next, the location correction is performed according to the following rules:
• Once a region change is detected, we retrieve the possible CNs only at the first step of the new region, and then generate a list that includes the possible CNs.
• We compute the Euclidean distances between the current location and the CNs in the list by (17). If the length of the list is 1 (i.e., only one possible CN), this CN is matched directly; otherwise, the closest CN is matched.
• We denote the distance of the closest CN from the current location as $d_c$. If $d_c \geq 1.5$ m, the proposed scheme sets the location of the closest CN as the calibrated location. Otherwise, the current location is not corrected.
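The three rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `CalibrationNode`, the function `correct_location`, and the region strings are hypothetical, and returning the uncorrected location when no CN matches is a defensive choice that the text does not require (it states that at least one CN always exists).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationNode:
    """One CN profile entry: a 2D location plus the <previous, current> region feature."""
    x: float
    y: float
    previous_region: str
    current_region: str


def correct_location(location, previous_region, current_region, cn_set, threshold_m=1.5):
    """Apply the CN matching rules at the first step of a new region.

    `location` is the current (x, y) estimate. Names and defaults are
    illustrative assumptions; only the 1.5 m threshold comes from the text.
    """
    # Rule 1: retrieve the CNs whose feature matches the detected region change.
    candidates = [
        cn for cn in cn_set
        if cn.previous_region == previous_region and cn.current_region == current_region
    ]
    if not candidates:
        # Assumption: leave the estimate unchanged if nothing matches.
        return location

    # Rule 2: match the closest candidate by Euclidean distance
    # (with a single candidate this trivially selects it).
    closest = min(candidates, key=lambda cn: math.dist(location, (cn.x, cn.y)))
    d_c = math.dist(location, (closest.x, closest.y))

    # Rule 3: snap to the CN only when the estimate is at least threshold_m away.
    if d_c >= threshold_m:
        return (closest.x, closest.y)
    return location
```

For example, with two "Stairs down" CNs leaving the second floor, an estimate 2 m from the nearer CN is snapped to it, whereas an estimate 0.5 m away is left unchanged.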

2) Indoor Multi-Floor Tracking Procedures
In this subsection, we describe how the proposed scheme performs multi-floor tracking in an indoor environment. Our scheme consists of a smartphone and a CN profile database, and is therefore applicable in areas that cannot be covered by infrastructure-dependent approaches. As the user carrying a smartphone walks inside a building, the scheme uses the smartphone's sensors to measure mobility information over time, updates the location at each step, and corrects the location by matching it to a CN.
Application initialization: Regarding the floor transition detection presented in Section II-A, two problems must be addressed first in the online phase, as follows.
(a) The DL-based step action recognition receives s (e.g., s = 15) barometer readings as input features and predicts the action of the current step, which implies that the DL model has to wait until the data of s steps have been collected. This is a severe delay, since the user may already have changed floors during those s steps.
(b) In the data preprocessing stage, we apply mean centering to the training data by subtracting the mean of the barometer data. This is impractical in the online phase, since the mean of the measured pressure data is unknown in advance. We need an alternative way of shifting the barometer readings close to 0.
To overcome the problem described in (a), our scheme generates s − 1 synthetic barometer values based on the reading of the first detected step to simulate the barometer data before the first step. Because a window of identical pressure values makes the model sensitive to any pressure change, the synthetic values are drawn from a normal distribution. Assuming that the barometer reading of the first detected step is $B^{d}_1$, we generate s − 1 barometer values $B^{gen} \sim \mathcal{N}(B^{d}_1, 0.15)$ as the simulated pressure data. Subsequently, we concatenate $B^{gen}$ and $B^{d}_1$ to obtain $\{B^{gen}_1, \cdots, B^{gen}_{14}, B^{d}_1\}$ as the first input of the DL model. The simulated pressure data are gradually replaced by measured data as the tracking progresses, until all input features are measured pressure values.
Regarding the problem described in (b), because mean centering is unusable in the online phase, we center the barometer readings by subtracting the first barometric value. The pressure value used for centering can then be updated after a certain tracking path length to avoid the effect of atmospheric pressure drift.
This article has been accepted for publication in IEEE Access.
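The online initialization described above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the value 0.15 is treated as a standard deviation, which is an assumption because the text does not say whether the second parameter of the normal distribution is a variance or a standard deviation.

```python
import random


def initial_window(first_reading, s=15, spread=0.15, rng=None):
    """Build the first model input: s - 1 synthetic readings plus the first measured one.

    Assumption: `spread` is used as the standard deviation of the normal
    distribution centered on the first measured reading.
    """
    if rng is None:
        rng = random.Random()
    synthetic = [rng.gauss(first_reading, spread) for _ in range(s - 1)]
    return synthetic + [first_reading]


def slide_window(window, new_reading):
    """Drop the oldest value and append the newest measurement.

    After s - 1 further steps, every synthetic value has been replaced.
    """
    return window[1:] + [new_reading]


def center(window, reference):
    """Shift readings toward 0 by subtracting a reference pressure.

    The text uses the first barometric value as the reference and allows it
    to be refreshed later to limit the effect of atmospheric drift.
    """
    return [value - reference for value in window]
```

A typical use would be to call `initial_window` once at the first detected step, then `slide_window` at every later step, passing `center(window, reference)` to the model each time.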
Multi-floor tracking: As illustrated in Figure 7, the proposed scheme executes as follows.
(1) The sensors in the smartphone continuously measure and analyze mobility data during user movement. When a step is detected, the DL model predicts the step action and delivers its height attribute through the floor decision algorithm. The height information of a detected step is represented as a rough region, such as a certain floor, "Stairs up," etc.
(2) After the height information has been determined, we detect a region change by comparing adjacent steps. If the region has changed, the scheme retrieves the possible CNs from the database and corrects the user's location by matching it to a CN according to the location correction rules described in Section II-C1. In particular, if the floor transition type is stairs, the scheme performs the location calibration twice to eliminate the localization errors generated on the flat floor and on the stairs. If the floor transition type is elevator, the scheme performs the location calibration only once, since the elevator transports the user vertically without changing the 2D location.
(3) If the region has not changed, the scheme estimates the stride length λ and heading direction α f z of this step according to the smartphone-based PDR method. The step length calculation uses two gain coefficients, τ 1 for stairs and τ 2 for all other cases, to improve the accuracy of stride length estimation. Afterwards, PDR updates the current location according to (15).
(4) Finally, the scheme combines the height information obtained from DL-based floor detection with the 2D location obtained from PDR to generate the multi-floor tracking result.
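Steps (2) to (4) can be sketched as a single per-step update in Python. This is a rough sketch under several assumptions: the function name `track_step`, the callable `correct`, and the default gain values are hypothetical; the gain is assumed to multiply a raw stride length; and the heading is assumed to be measured from the x-axis, which may differ from the convention of (15).

```python
import math

STAIR_REGIONS = {"Stairs up", "Stairs down"}


def track_step(location, prev_region, region, stride, heading_rad,
               correct, tau_stairs=0.7, tau_other=1.0):
    """Return (region, (x, y)) for one detected step.

    `correct(location, prev_region, region)` stands in for the CN
    matching-based location correction and must return an (x, y) tuple.
    The gain defaults are placeholders, not values from the paper.
    """
    if region != prev_region:
        # Step (2): a region change triggers CN matching instead of a PDR update.
        x, y = correct(location, prev_region, region)
    else:
        # Step (3): PDR update with a region-dependent stride gain.
        gain = tau_stairs if region in STAIR_REGIONS else tau_other
        length = gain * stride
        x = location[0] + length * math.cos(heading_rad)
        y = location[1] + length * math.sin(heading_rad)
    # Step (4): the height information (region) is attached to the 2D location.
    return region, (x, y)
```

Passing the correction routine as a callable keeps this sketch independent of how the CN database is stored.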

III. EXPERIMENT RESULTS
In this section, we present several experimental results to validate the proposed scheme. The smartphone used for tracking was a Samsung Note 10+ running Android 11, with 256 GB of storage, a barometer, and triaxial IMU sensors. We built a data collection application on the smartphone to measure and save the sensor data. Once a step was detected, the application stored the sensor data for location estimation, which was implemented in Python 3. In addition, the development, training, and prediction of the DL model were conducted using the machine learning library TensorFlow. We assumed that the user carries the smartphone in front of their body during tracking, and walks in the middle of the corridor if there are no other pedestrians or obstacles along the path.

A. DL-BASED FLOOR TRANSITION DETECTION
The accuracy of floor transition detection determines the scheme's performance, as it not only provides the altitude information but also affects the PDR estimation and location correction. To demonstrate the effectiveness of the proposed method, we assessed the accuracy of the DL-based step action recognition and of the floor decision algorithm. In this study, the step action is represented as "Normal walking," "Going up," or "Going down." The floor decision algorithm receives the step action and outputs the altitude information for each step, which is represented as one of five classes: a certain floor, "Stairs up," "Stairs down," "Elevator up," or "Elevator down." The action and region of each step were recorded during tracking and used to evaluate the floor transition detection. In particular, if the floor transition type is stairs, approximately 20 steps are taken during the floor transition; therefore, we can easily assess the floor transition accuracy for steps on stairs by comparing the predicted label of each step with the ground-truth label. However, because the floor change through an elevator is sudden, no step is taken during the floor transition; therefore, alternative evaluation criteria are required for the elevator. We determined that a prediction of "Elevator up/down" is correct when more than $N_{wait}$ consecutive steps are classified as "Elevator up/down" after the user takes the elevator.
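The elevator criterion above amounts to a run-length check over the per-step labels, sketched below in Python. The function name and label strings are illustrative assumptions, and "more than $N_{wait}$" is read here as a strict inequality.

```python
def elevator_detected(labels, n_wait, target="Elevator up"):
    """Return True if more than n_wait consecutive steps carry the target label."""
    run = 0
    for label in labels:
        # Extend the current run on a match, otherwise reset it.
        run = run + 1 if label == target else 0
        if run > n_wait:
            return True
    return False
```

With `n_wait = 2`, three consecutive "Elevator up" steps count as a correct detection, whereas two such steps do not.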
The accuracy rate (AR) of floor detection was used to evaluate accuracy by checking, for each step, whether the estimated floor is consistent with the true floor, as follows.
$$AR_{FD} = \frac{1}{N_{normal}} \sum_{i=1}^{N_{normal}} \mathbb{1}\left(f_i = \hat{f}_i\right)$$
where $AR_{FD}$ is the AR for floor detection, $\mathbb{1}(\cdot)$ equals 1 when its argument holds and 0 otherwise, and $f_i$ and $\hat{f}_i$ are the ground-truth floor number and detected floor number of the $i$-th step, respectively. $N_{normal}$ indicates the total number of steps whose ground-truth label is "Normal walking," since the steps in the transition zone (i.e., stairs or elevator) are not included in the evaluation.
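The floor-detection AR described above can be computed as in the following minimal Python sketch. The function name and argument layout are assumptions; the only logic taken from the text is that steps whose ground-truth label is not "Normal walking" are excluded, and that the remaining steps are scored by whether the detected floor equals the true floor.

```python
def floor_detection_ar(true_actions, true_floors, detected_floors):
    """Fraction of "Normal walking" steps whose detected floor equals the true floor."""
    pairs = [
        (f, f_hat)
        for action, f, f_hat in zip(true_actions, true_floors, detected_floors)
        if action == "Normal walking"  # transition-zone steps are not evaluated
    ]
    if not pairs:
        raise ValueError("no 'Normal walking' steps to evaluate")
    return sum(f == f_hat for f, f_hat in pairs) / len(pairs)
```

The result is a fraction in [0, 1]; multiplying by 100 gives the percentage form used when reporting results.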
We conducted the experiments in four buildings, including Cho Man-sik Memorial Hall and Sung-deok Hall at Soongsil University. With three experiments per building, we used 13 elevators and 16 sets of stairs. The floor height of each building is approximately 3.0-3.5 m. Figure 8(a) shows one result for step action recognition in Cho Man-sik Memorial Hall, where the barometer readings have been denoised by (1). Figure 8(b) shows the results for floor detection after applying the floor decision algorithm. Due to significant noise during tracking, the barometer readings still fluctuated after denoising. In Figure 8(a), several actions are incorrectly identified when the region changes, whereas in Figure 8(b) they are discarded as errors by the floor decision algorithm and do not affect the floor detection accuracy.
Table 2 and Table 3 show the AR values of floor detection in each building and the confusion matrix of the floor decision algorithm results, respectively. Based on Table 3, we observe that the detection accuracies of the elevator cases are 100%, which implies that all elevator usage was detected. For "Stairs up" and "Stairs down," the detection accuracies are close to 90% because data augmentation eliminated the imbalanced data problem. Most false-negative errors were "Stairs up/down" steps detected as floor steps at the start of stair transitions, as shown in Figure 8(b). This occurred because the DL model requires 2-5 changes in barometric readings to confirm a floor transition. Although these errors did not affect the accuracy of floor detection, as corroborated by the total accuracy of over 99% shown in Table 2, they caused a delay of several steps when the user entered and exited the stairs, which could lead to inaccurate location corrections. Because we know the direction of stairs usage, the CNs can be placed a short distance in front of the junction between the stairs and the floor to compensate for the delay.
We set one CN at each available elevator, and two CNs at each junction between stairs and floor, based on the walking direction, to avoid confusion in CN matching and to compensate for the transition delay. In terms of the performance metrics, the AR of localization was computed to represent the probability that the estimated steps fall within the error boundary ϵ, as follows [47].
where P(ϵ) indicates the desired path with thickness ϵ, and $N_{step}$ denotes the total number of detected steps.
Before presenting the experimental results, we describe how the tracking trajectory is visualized. Because the altitude information of the pedestrian is represented as a floor level, we plotted the 2D location of each detected step on the floor plan that corresponds to the floor number where the user is located, to show more details of the user's activities on each floor. When the user was detected walking through a transition zone, the floor number was updated at the first "Elevator up/down" step or the last "Stairs up/down" step (i.e., when exiting the elevator or stairs).
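The localization AR defined above (the share of detected steps that lie inside the desired path of thickness ϵ) can be sketched in Python for a single floor plan. This is an illustration under assumptions rather than the metric's reference implementation: the desired path is modeled as a polyline, a step counts as inside when its distance to the polyline is at most ϵ (the text does not say whether the thickness is a full width or a half width), and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b (all 2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def localization_ar(steps, path, epsilon):
    """Fraction of estimated steps within epsilon of the desired path polyline."""
    segments = list(zip(path, path[1:]))
    inside = sum(
        1 for p in steps
        if min(point_segment_distance(p, a, b) for a, b in segments) <= epsilon
    )
    return inside / len(steps)
```

For a multi-floor trajectory, one option is to evaluate each step against the desired path of the floor on which it was plotted and pool the counts.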
We conducted a tracking experiment using sensor data collected over 483 steps of walking. To illustrate the performance of the CN matching-based location correction algorithm, Figures 9 and 10 show two localization trajectories: indoor multi-floor tracking without location correction, and indoor multi-floor tracking with CN matching-based location correction. The user holding the smartphone walked along the desired path plotted in green. Each grid cell in the map represents a distance of 2 m. Each empty circle in the figures is a detected step: a red circle indicates a step recognized as "Normal walking," a purple circle indicates "Stairs up/down," and a blue circle indicates "Elevator up/down." We set the CNs 1-3 m in front of the transition area between the stairs and the floor to avoid the delay problem that arises when entering and exiting the stairs. Only the CNs that we used are plotted in the figures. The start and end points are indicated by a square and a diamond, respectively. The activities during walking are described below; they include all transition cases, i.e., going upstairs and downstairs by elevator and by stairs.
• [...] corridor to the open staircase » go downstairs through the staircase to the 3rd floor. The estimated location could be corrected via S5 and S1 during the floor transition.
• 3rd floor activities: Back on the F3 corridor and move to the end point.
The floor transition detection results are presented in Figure 8(b). Based on these results, all transition types and directions were successfully detected, and the fluctuation in barometer readings caused by several complex regions on each floor did not affect the floor detection accuracy.
These floor detection results provide stable altitude information for the estimated 2D location. As shown in Figures 9 and 10, the stride length estimation of PDR improved owing to the different gain coefficients used in different areas. Meanwhile, we noticed that the calculation for each detected step contains inevitable errors owing to the tracking noise generated by external magnetic fields, pedestrians, and obstacles along the pathway. These errors accumulate over time, causing the estimated location to deviate completely from the desired path, as shown in Figure 9. The solution to this problem is presented in Figure 10, where the proposed scheme eliminated the accumulated errors by CN matching at each floor transition. Even when significant errors were present along the path, they did not affect the subsequent tracking accuracy. Because we set the CNs 1-3 m in front of the transition area between the stairs and the floor, the delay problem when entering and exiting the stairs mentioned in Section III-A did not affect the location correction.
Figure 11 shows the AR for different error boundaries ϵ in two cases: updating the 2D location with PDR only (labeled "conventional" in Figure 11), and the proposed scheme with location correction. We added the AR of the first half of the tracking path and compared it with the AR of the whole path to confirm the effect of the accumulated error on location estimation in both cases. With an error boundary of ϵ = 2.5 m, the PDR-only case achieved an AR of 89% over the first half of the path, which is close to that of the calibrated case, but only 45% over the whole path. This indicates that although PDR-only localization works well for short tracking paths, its performance decreases sharply for long tracking paths. Meanwhile, the AR values of the proposed scheme with location correction are relatively consistent between the first half and the whole path, reaching approximately 95% with ϵ = 2.5 m.
As illustrated in Figure 11, the proposed CN matching-based location correction improved the performance of long path tracking by eliminating the accumulated errors.

IV. CONCLUSION AND DISCUSSION
In this study, we presented an indoor multi-floor tracking scheme that functions without infrastructure. The proposed scheme consists of three components: DL-based floor transition detection, IMU-based PDR, and CN matching-based location correction. Considering the limitations of the mobile platform, we designed a lightweight MLP model and trained it using time-series pressure data collected from the barometer sensor. In addition, we presented a data augmentation method to solve the overfitting and imbalanced performance issues caused by insufficient and imbalanced training data, as well as to reduce the cost of dataset preparation. A floor decision algorithm was developed to obtain a robust prediction of each step in complex areas, and to identify the floor transition direction and type. The 2D location update was implemented by the PDR method, and the stride length estimation of PDR was improved by the floor detection information. The proposed scheme realizes multi-floor tracking by combining floor detection and PDR. To avoid the error accumulation of PDR estimation, we placed CNs near transition zones according to the building's floor plan. When the user changes floors, the scheme matches the possible CN based on the location and mobility of the user, and corrects the estimated location using the CN information to eliminate the accumulated error during tracking. To evaluate the scheme's performance, we conducted 12 experiments in four buildings and compared the accuracy of multi-floor tracking with and without location correction. The experimental results show that DL-based floor detection delivers stable and accurate tracking results in varying areas, and that the localization performance in long path tracking is significantly improved by location correction.
The CNs presented in the scheme can be used not only in infrastructure-independent approaches but also in infrastructure-dependent approaches, where they reduce the requirement for anchor nodes. In the future, we will optimize the DL model size to make it more mobile-platform friendly, introduce additional feature data to improve the prediction of the DL