Indoor Positioning Using Deep-Learning-Based Pedestrian Dead Reckoning and Optical Camera Communication

The need for accurate indoor positioning systems has rapidly increased with the development of large complex malls and underground spaces. As global positioning system signals cannot be received inside buildings, only approximate locations can be estimated using Wi-Fi routers or cellular base station information, and exact locations cannot be determined. A pedestrian dead reckoning (PDR) scheme using several smartphone sensors is therefore considered in this work. However, conventional PDR requires users to hold their smartphones in a specific manner; furthermore, user-dependent parameters, such as height and step length, are necessary because the sensor signals vary from user to user. This study uses deep-learning algorithms to overcome these limitations of the existing smartphone-based PDR scheme. A convolutional neural network algorithm classifies the smartphone positions, after which appropriate sensor data are selected and adjusted. A long short-term memory algorithm estimates the user's step length. Although deep learning enhances the PDR performance, accumulated error is unavoidable because the algorithm traces only the relative position with reference to the original location. Therefore, optical camera communication is introduced to provide the reference location and periodically compensate for the accumulated PDR error. The proposed algorithm is experimentally demonstrated, and its results are analyzed.


I. INTRODUCTION
With the increasing development of large complex malls and underground spaces, the need for indoor location recognition is growing. In particular, such location recognition is essential for delivery services, where destinations must be found within short durations. Smartphones primarily use the global positioning system (GPS) to obtain location information; however, GPS signals cannot be received indoors. Therefore, radio frequency (RF) signals, such as Wi-Fi or Bluetooth, are used instead. However, the intensities of RF signals vary with the environment, which causes large location errors. Enhanced particle filters (PFs) have been introduced to reduce such errors, but they require several Wi-Fi antennas and complex communication protocols between the smartphones and each of the antennas [1].
Inertial sensors have been used as alternatives to estimate the relative positions of pedestrians, a scheme referred to as pedestrian dead reckoning (PDR). The most common type of PDR estimates the step counts and strides using acceleration sensors [2]. This method, which uses acceleration patterns and peak values to determine the number of steps and the stride length, respectively, can estimate the traveled distance with relatively simple calculations. However, PDR schemes are susceptible to accumulated sensor errors and usually require fixed smartphone positions, i.e., placed flat on the user's palm, which limits how pedestrians can use their smartphones. In [3], quaternion and zero velocity update (ZVU) methods were used to reduce the errors accumulated from sensor installation and path integration. In [4], multiple virtual tracking (MVT) was used to compensate for the error of a pedestrian deviating from the dominant direction while walking along a nominally straight line. There are also studies on allowing more flexibility in smartphone positions. In [5], it was assumed that the smartphone was in the front pocket of a pedestrian's pants while walking. Although the stride length is estimated using this scheme, the height of the smartphone from the ground must be known in advance, which is not easily available. Another scheme determines the walking direction using a rotation matrix and principal component analysis with the smartphone placed in a pocket [6]; however, personalized parameters are required to calculate the step length. In [7], the step length was calculated from the walking speed using a model of the relationship between them, which requires separate coefficients for each person. Deep learning has been introduced in several studies to handle the aforementioned problems.
Smartphone positions can be estimated by different types of deep-learning algorithms [8]. The stride length is often estimated using long short-term memory (LSTM) and denoising autoencoders, where the pedestrians walk while holding their smartphones horizontally in front of their chests [9]. In other cases, walking information is obtained through the Vicon system and then used to calculate the stride length and direction via deep learning [10]. A bidirectional LSTM trained with data obtained at different walking speeds has been used to determine pedestrian stride lengths [11]. In [12], a convolutional neural network (CNN) algorithm was used to match user-collected magnetic patterns for location estimation. These studies show the potential of deep learning to solve the existing problems of PDR. However, the methods are still not independent of smartphone positions and cannot solve the accumulated error problem.
In this study, we employ deep-learning algorithms, namely a CNN and an LSTM, to detect smartphone positions and strides, respectively. These results are then combined with those of the PDR scheme to enhance location estimation. In addition, optical camera communication (OCC) is used to avoid error accumulation, which is the main disadvantage of most PDR systems. The OCC scheme transmits signals by modulating a light-emitting diode (LED) lamp attached to the ceiling and receives data using the rolling shutter effect of the image sensor in the smartphone camera. LED lamps are deployed in almost all buildings, so OCC can be installed easily on such lamps. Although the transmission distance of OCC is short at high data rates, this is not a concern because very low data rates suffice for sending location information. Therefore, OCC is a good candidate to compensate for the accumulated errors of PDR schemes. Fig. 1 shows the combined PDR and OCC indoor positioning scheme. When the service commences, the absolute location information stored in a nearby LED lamp is transmitted via OCC; a deep-learning-based PDR algorithm then traces the pedestrian movements until the next OCC reception is available. Fig. 2 presents the flowchart of the indoor location recognition scheme used in this study. The sensor values of a smartphone vary according to its position. Therefore, the smartphone position is first classified using a CNN deep-learning technique. Three position categories are used in this study: flat hand, pocket, and calling. We employ another deep-learning algorithm, the LSTM, to determine the stride. By learning from the data of different users and postures, these deep-learning algorithms can handle the problems of existing PDR schemes.
The position errors accumulated in the above computation processes, which are caused by stride estimations, gyro sensor drift, and other sensor noise, are compensated using the absolute positions obtained via OCC transmission.
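The overall scheme can be summarized as a dead-reckoning update loop with occasional absolute fixes. The following is a minimal sketch of that structure, assuming a 2-D position and a heading in radians; the stride, heading change, and landmark coordinate are hypothetical values, not the implementation used in the experiments:

```python
import math

def pdr_update(x, y, heading, stride, d_heading):
    """One PDR step: apply the gyro-derived heading change,
    then advance by the estimated stride length."""
    heading += d_heading
    x += stride * math.cos(heading)
    y += stride * math.sin(heading)
    return x, y, heading

def occ_correction(landmark_xy):
    """Replace the accumulated PDR estimate with the absolute
    position decoded from a nearby LED lamp via OCC."""
    return landmark_xy

# Walk four 0.5 m steps heading east, then receive an OCC fix
# (the landmark coordinate below is hypothetical).
x, y, h = 0.0, 0.0, 0.0
for _ in range(4):
    x, y, h = pdr_update(x, y, h, 0.5, 0.0)
x, y = occ_correction((2.1, 0.05))
```

Between OCC fixes the error grows with each `pdr_update`; the fix resets it to the lamp's stored absolute position.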

II. DEEP-LEARNING-BASED PDR
The term PDR refers to a location tracking method that determines the current position of a pedestrian by tracking their relative position changes from a given known location. The PDR scheme usually employs various sensors, such as accelerometers, gyroscopes, and magnetometers. Fig. 3 shows the three axes of the smartphone for different positions. The sensor data change according to the posture of the pedestrian holding the smartphone, and different calculations must be applied for each posture, which makes the estimation complex. Deep learning is used in this study to avoid such problems. The postures of pedestrians using their smartphones are classified into three types, as shown in Fig. 3: the ''flat hand'' category applies when the smartphone is held horizontally on the user's hand; the ''pocket'' category applies when the smartphone is placed in the front pocket of the user's pants; the ''calling'' category applies when the smartphone is held to the ear for a call.

A. POSITION CLASSIFICATION BY CNN
The position of a smartphone must first be determined to find the steps and walking direction. A CNN algorithm using gravity sensor data is employed for this purpose, as CNNs perform well in classifying images and environments [13]. Gravity sensor data are collected for each position, as shown in Fig. 4, and used for CNN training. No preprocessing is performed in this step because the gravitational acceleration bias itself is required to classify the positions. Three-axis gravity data are used as the inputs, and 200 consecutive samples are grouped into one input, giving an input size of (3, 200). A sampling period of 20 ms is used for data collection.
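As a sketch of this input shaping, assuming the raw samples arrive as an (N, 3) array (the exact data layout is not specified in the text):

```python
import numpy as np

def make_cnn_inputs(gravity_xyz, window=200):
    """Group raw three-axis gravity samples (one every 20 ms) into
    non-overlapping (3, window) arrays for CNN classification.
    No normalization is applied: the gravity bias itself
    distinguishes the flat hand, pocket, and calling positions."""
    n = gravity_xyz.shape[0] // window
    trimmed = gravity_xyz[: n * window]        # drop a partial tail window
    return trimmed.reshape(n, window, 3).transpose(0, 2, 1)

samples = np.random.randn(450, 3)              # 9 s of synthetic 50 Hz data
batch = make_cnn_inputs(samples)               # shape (2, 3, 200)
```

Each (3, 200) window covers 4 s of walking, long enough to span several steps.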
The learning layers of the CNN algorithm for classifying the smartphone positions are illustrated in Fig. 5. As shown in the figure, Conv2D and max pooling layers are used to learn from the two-dimensional input data. Dropout is used to prevent overfitting; a flatten layer converts the data to one-dimensional vectors, and a dense layer provides the outputs. The functions used in the CNN learning process are described in Table 1. Softmax is used as the activation function of the output layer, and the Glorot normal distribution is applied to initialize the weights and biases. In addition, categorical cross-entropy, which is suitable for multiclass classification, is used as the loss function; Adam is used as the optimizer; and one-hot encoding is applied to represent the labels as one-hot vectors.
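The full Conv2D stack is defined in Fig. 5 and Table 1, which are not reproduced here. As a small illustration of the output-side functions named above, a NumPy sketch of softmax and categorical cross-entropy with one-hot labels (the logit values are hypothetical):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true_onehot, y_pred):
    """Multiclass loss used with one-hot-encoded labels."""
    eps = 1e-12
    return -np.mean(np.sum(y_true_onehot * np.log(y_pred + eps), axis=-1))

# Three classes: flat hand, pocket, calling (one-hot encoded).
logits = np.array([[2.0, 0.1, -1.0]])
probs = softmax(logits)                      # sums to 1 across classes
label = np.array([[1.0, 0.0, 0.0]])          # "flat hand"
loss = categorical_cross_entropy(label, probs)
```

With a one-hot label, the loss reduces to the negative log-probability of the correct class, so a confident correct prediction drives it toward zero.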

B. STEP LENGTH ESTIMATION BY LSTM
While the CNN is efficient for determining smartphone positions, the LSTM is well suited to estimating the step length because sensor data captured during walking form a closely correlated time sequence. Inertial sensor data are collected while walking with three step lengths of 40 cm, 50 cm, and 60 cm. The LSTM inputs comprise the gyro sensor, magnetic sensor, and gravity sensor data, together with the pitch (x-axis), roll (y-axis), and azimuth (z-axis) rotation angles of the smartphone. The sampling period is 20 ms, and 200 samples form one input sequence. Mean normalization is used for preprocessing, and learning is performed using the layers presented in Fig. 6. Three LSTM layers followed by two dense layers for the output connection are employed, and dropout is applied to prevent overfitting. The functions used in the LSTM learning process are listed in Table 2. The mean-squared error is used as the loss function, while Adam is used as the optimizer.
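The exact mean-normalization variant is not specified in the text; a common form subtracts the per-channel mean and divides by the per-channel range, sketched here with hypothetical two-channel data:

```python
import numpy as np

def mean_normalize(seq):
    """Mean normalization per channel: subtract the channel mean
    and divide by the channel range (max - min), a typical
    preprocessing step for LSTM input sequences."""
    mean = seq.mean(axis=0)
    span = seq.max(axis=0) - seq.min(axis=0)
    span[span == 0] = 1.0                 # guard constant channels
    return (seq - mean) / span

seq = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
norm = mean_normalize(seq)
```

This bounds each channel to roughly [-0.5, 0.5], so no single sensor dominates the LSTM input scale.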

C. STEP DETECTION BY ACCELEROMETER AND MAGNETOMETER
To calculate the walking distance, each step of a pedestrian must be detected, and the corresponding step lengths must be accumulated. For the flat hand position, as the smartphone moves vertically with the pedestrian's hand, the z-axis component of the acceleration sensor, which is used to detect the steps, changes periodically, as shown in Fig. 7(a). One cycle of this signal corresponds to one step. When walking with the smartphone inside the pocket, the pitch value is meaningful because the smartphone rotates about the x-axis. The pitch values are obtained from the magnetic field sensor of the smartphone, as depicted in Fig. 7(b). As the feet are extended alternately, one cycle of the pitch corresponds to two steps. In the calling position, the smartphone is held at an angle during walking; thus, the x-axis component of the acceleration has the shape shown in Fig. 7(c), where each period corresponds to one step.
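Peak-based step detection of this kind can be sketched as follows, using a synthetic 2 Hz sinusoid in place of real accelerometer data; the threshold value is an assumption:

```python
import numpy as np

def count_steps(accel_z, threshold=1.0):
    """Count local maxima of the z-axis acceleration that exceed a
    threshold; for the flat hand position, one peak is one step."""
    steps = 0
    for i in range(1, len(accel_z) - 1):
        if (accel_z[i] > threshold
                and accel_z[i] > accel_z[i - 1]
                and accel_z[i] >= accel_z[i + 1]):
            steps += 1
    return steps

# Synthetic walking signal: 2 steps/s for 3 s, sampled every 20 ms.
t = np.arange(0, 3, 0.02)
z = 2.0 * np.sin(2 * np.pi * 2.0 * t)
steps = count_steps(z)                 # 6 peaks -> 6 steps
```

For the pocket position the same peak logic would apply to the pitch signal, with each detected cycle counted as two steps.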

D. DIRECTION ESTIMATION BY GYRO SENSOR
In the flat hand position, the smartphone rotates about the z-axis when a pedestrian turns. Therefore, as depicted in Fig. 8(a), no significant changes are noted in the x- and y-axis components of the gyro sensor; only the z-axis component changes. In the pocket position, the smartphone is placed vertically in the pocket and rotates about the y-axis according to the pedestrian's movements, so the y-axis component of the gyro sensor changes, as noted in Fig. 8(b). For the calling position, the smartphone is obliquely oriented along the side of the face, so the x- and y-axis values of the gyro sensor must be observed simultaneously. In this manner, the walking direction can be determined by integrating the useful gyro sensor output values according to the position, as shown in (1). Fig. 9 presents the measured results for the three positions while walking along a rectangular path inside a building.
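Equation (1) is not reproduced in this text; assuming simple rectangular integration of the selected gyro axis over the 20 ms sampling period, the heading update can be sketched as:

```python
import math

def integrate_heading(gyro_rates, dt=0.02, heading0=0.0):
    """Accumulate the angular rate (rad/s) of the position-dependent
    useful gyro axis (z for flat hand, y for pocket) over the 20 ms
    sampling period to track the walking direction."""
    heading = heading0
    for rate in gyro_rates:
        heading += rate * dt
    return heading

# A 90-degree turn: pi/2 rad/s sustained for 1 s (50 samples).
turn = [math.pi / 2] * 50
heading = integrate_heading(turn)      # about pi/2 rad
```

Any bias in the gyro rates accumulates linearly through this sum, which is exactly the drift that the OCC corrections in Section III remove.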

III. ABSOLUTE POSITION ACQUISITION USING OCC
The term OCC refers to a transmission method where data are modulated by blinking an LED lamp (on/off) and subsequently received by the complementary metal-oxide-semiconductor (CMOS) image sensor of the smartphone camera. The CMOS image sensor uses a rolling shutter, receiving images line by line instead of the entire frame at once. Data are transmitted by modulating the LED installed on the building ceiling, as illustrated in Fig. 10(a). Stripe images, such as that in Fig. 10(b), are formed at the camera by the rolling shutter effect [14]. In this study, the location of the LED lamp is transmitted using this method. Specifically, OCC with digital zoom is used to receive the data when the distance between the LED lamp and camera exceeds several meters [15]. In this manner, the position estimation errors of the PDR are corrected using the absolute positions received via OCC. Fig. 11 depicts the transmission algorithm of the OCC. At the transmitter, the location information is converted to binary data, and the number of data repetitions is determined. Then, a preamble indicating the start of the packet is added, and the packet is Manchester encoded to prevent LED flickering. Finally, the data are transmitted as light through the LED driving circuit. At the receiver, the smartphone captures the data using the image sensor, finds the preamble, and restores the location data through Manchester decoding.
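Manchester coding maps each bit to a pair of half-bit symbols so that every bit period contains a transition, keeping the LED's average brightness constant and the blinking above the flicker-perception rate. A sketch, assuming the IEEE 802.3 mapping (the paper does not state which of the two conventions it uses):

```python
def manchester_encode(bits):
    """0 -> (0, 1), 1 -> (1, 0): every bit has a mid-bit transition,
    so the LED duty cycle stays at 50% and flicker is avoided."""
    out = []
    for b in bits:
        out += [1, 0] if b else [0, 1]
    return out

def manchester_decode(symbols):
    """Recover each bit from its half-bit symbol pair."""
    return [1 if symbols[i] == 1 else 0
            for i in range(0, len(symbols), 2)]

payload = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical location bits
decoded = manchester_decode(manchester_encode(payload))
```

The cost is a doubled symbol rate, which is acceptable here given the very small location payload.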
As the distance between the LED lamp and smartphone increases, the rolling shutter image size decreases, as shown in Fig. 12(a). Therefore, only a small part of the received image carries meaningful information. To transmit data without loss under this condition, the same data should be transmitted multiple times. The OCC packet structure shown in Fig. 12(b) is used to meet this requirement. Each packet is composed of multiple data blocks, where each block comprises a preamble, a block serial number (BSN), and data. The smartphone selects only one successfully received block by checking the BSN. The number of data blocks is determined by the maximum target distance; in this study, the OCC can transmit data over distances exceeding 5 m when 37 repeated blocks are used in the packet.
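The receiver-side selection can be sketched as follows. The `(bsn, data, valid)` tuple and its validity flag are a hypothetical representation of whatever decoding check the receiver applies; the paper only states that one successfully received block is chosen via its BSN:

```python
def select_block(blocks):
    """Return the first cleanly decoded (BSN, data) block from a
    packet of repeated blocks; every block carries the same location
    payload, so any one valid block is sufficient."""
    for bsn, data, valid in blocks:
        if valid:
            return bsn, data
    return None

# A partial capture: only the middle block inside the stripe image
# decoded cleanly (all values hypothetical).
blocks = [(11, None, False), (12, b"\x03\x7f", True), (13, None, False)]
chosen = select_block(blocks)
```

Because the same payload is repeated 37 times per packet, the stripe region only needs to cover one whole block for reception to succeed.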
During OCC reception, the pedestrian would otherwise have to stop walking directly below the LED lamp. If the walking path is set apart from the LED lamp and the smartphone is tilted to face the lamp while receiving data, as shown in Fig. 13, then the location information from the lamp can be corrected using the following equations, where α is the tilt angle toward the lamp, H is the height of the lamp, and (Δx, Δy) is the correction distance. The two-dimensional location error between the LED and smartphone is measured to be within 15 cm.
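The correction equations themselves are not reproduced in this text. Under the stated geometry, a plausible reconstruction is that the ground offset between the pedestrian and the point below the lamp is H·tan(α), resolved along the walking heading; the sketch below follows that assumption and is not necessarily the paper's exact formulation:

```python
import math

def occ_offset(alpha_deg, lamp_height_m, heading_rad):
    """Ground-plane correction distance from the pedestrian to the
    point below the lamp: H * tan(alpha), resolved into (dx, dy)
    along the current walking heading."""
    d = lamp_height_m * math.tan(math.radians(alpha_deg))
    return d * math.cos(heading_rad), d * math.sin(heading_rad)

# Hypothetical values: lamp 2.5 m above the camera, phone tilted
# 30 degrees toward it, pedestrian heading along the x-axis.
dx, dy = occ_offset(30.0, 2.5, 0.0)
```

The corrected pedestrian position is then the lamp's stored location minus this offset, so the walker never needs to stand directly beneath the lamp.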

IV. ENVIRONMENT OF THE TESTBED
An experiment was conducted while walking along a rectangular path of 16.6 m width and 9.5 m length to test the deep-learning-based PDR and OCC proposed in this study. The smartphone (Samsung Electronics SM-N960N) was held in one of the three positions during walking, and the sensor data were collected with a sampling period of 20 ms. First, the CNN learning and testing proposed in Section II.A were implemented to determine the smartphone position. The number of data points used for training was 9,382, and the number used for testing was 12,861. The accuracy and loss curves for CNN learning are presented in Figs. 14(a) and (b): the learning stabilizes after a small number of epochs, and the loss approaches zero after about 10 epochs. The accuracy of the CNN-based classification was measured while walking a distance of about 55 m; the resulting accuracy, listed in Table 3, is almost 100%.
The sensor values were collected every 20 ms to estimate the step length using the LSTM. Fig. 15 presents the test results comparing the actual and estimated values while walking about 20 m with step lengths of 40, 50, and 60 cm for each position. Each figure illustrates the acceleration used to detect the steps and the corresponding LSTM outputs; each step length is estimated by the LSTM at the moment the step is detected. In the flat hand and pocket positions shown in Figs. 15(a) and (b), the 40 cm steps are estimated accurately, whereas the 50 cm and 60 cm steps show errors more often. These errors are attributed to the very small changes in the sensor data between these step lengths. In contrast, the calling position shows relatively high accuracy with very few errors, as illustrated in Fig. 15(c).
An experiment to estimate the indoor position was performed by incorporating all the schemes proposed in this study, including position classification, stride length, and direction estimation, while walking through a corridor inside a building. Two LED lamps on the ceiling, as depicted in Fig. 16(a), were used to provide the reference locations by OCC. The experimenter walked along the corridor on a closed path of about 54 m, as shown in Fig. 16(b). As before, the Samsung Electronics SM-N960N smartphone was used, and data were collected every 20 ms.
Three estimation schemes were tested for each position: PDR without deep learning, PDR with deep learning, and PDR with deep learning and OCC. Figs. 17-19 display the experimental results for each position. When only the PDR was used without deep learning, the smartphone position was identified using various sensor values, and the step length was calculated using the peak values of the accelerometer [2]. Figs. 17-19(a) show the results of this scheme, where the estimated paths deviate from the actual paths and the end points differ from the starting points; the errors are larger in the pocket and calling positions. Figs. 17-19(b) show the results when the PDR is combined with the deep-learning schemes; the estimated paths in this case do not differ considerably from those without deep learning. However, when the error over the entire traveled path is considered, the PDR with deep learning shows better accuracy, as seen from Table 4. Importantly, the deep-learning approach does not require the person-specific parameters that PDR without deep learning needs.
When OCC was used in addition to the PDR, the accumulated errors were removed, as shown in Figs. 17-19(c). As the pedestrian approaches the LED lamps and the smartphone camera receives the OCC signals, the absolute locations replace the current estimated positions. The figures show the walking path jumping to the center of the corridor; the corrected paths are indicated by different colors.

V. CONCLUSION
In this study, deep-learning algorithms are applied to PDR to enable more efficient position estimation, and OCC transmission is used to correct the errors accumulated by the PDR. Using the CNN and LSTM deep-learning algorithms, the smartphone positions are classified and the step lengths are estimated, respectively, for three different smartphone positions: flat hand, pocket, and calling. The position classification is observed to be very precise, and the stride length estimation is fairly good for the calling position. However, errors occur more often for the 50 cm and 60 cm step lengths in the flat hand and pocket positions. These errors are periodically corrected using the absolute location information received via OCC.
The advantage of deep-learning-based PDR is that it does not depend on the user: if data from many people are used in the learning process, the PDR does not require a pedestrian's personal parameters. In addition, the performance of the PDR can be enhanced over time. The pedestrian's step length learned by the LSTM can be refined continuously using the OCC information: since the distance between the LED lamps is known, the step lengths can be calculated from the step counts and fed back into the learning process. The combined deep-learning-based PDR and OCC is thus expected to enable very precise indoor location estimation.