A Hybrid Markov and LSTM Model for Indoor Location Prediction

Accurate and robust indoor location prediction plays an important role in indoor location services. Markov chains (MCs) have been widely adopted for location prediction due to their strong interpretability. However, multi-order Markov chains ($k$-MCs) are not suitable for predicting long sequences due to problems of dimensionality. This study proposes a hybrid Markov model for location prediction that integrates a long short-term memory model (LSTM); this hybrid model is referred to as the Markov-LSTM. First, a multi-step Markov transition matrix is defined to decompose the $k$-MC into multiple first-order MCs. The LSTM is then introduced to combine multiple first-order MCs to improve prediction performance. Extensive experiments are conducted using real indoor Wi-Fi positioning datasets collected in a shopping mall. The results show that the Markov-LSTM model significantly outperforms five existing baseline methods in terms of its predictive performance.


I. INTRODUCTION
In recent years, traditional "brick-and-mortar" industries have been severely affected by rapid developments in e-commerce [1]. Therefore, these industries urgently need to find ways to help merchants establish relationships with customers and provide them with a personalized shopping experience that improves their marketing ability [2]. With the development of indoor positioning technology and the popularization of mobile terminal devices, there has been an explosive growth in the availability of indoor mobile trajectory data [3]. Such data are an important basis for indoor location services and provide new opportunities for the development of these industries [4], [5].
Location prediction technology can infer the next location of a user from their historical trajectory and provide flexible services for users, which is a current concern for organizations [6]-[8]. Previous studies have shown that 93% of user behavior is predictable [9]. To date, this technology has been widely used in social security [10], [11], intelligent transportation [12]-[17], and location services [18]-[20].
The associate editor coordinating the review of this manuscript and approving it for publication was Juan A. Lara.
As a classical statistical model, the first-order Markov chain (1-MC) has strong interpretability and is widely used in time-series data prediction. However, the 1-MC assumes that the location at the next moment is only related to the current location, which significantly limits the predictive performance of the model [21], [22]. For this reason, Sha et al. [23] proposed a multi-order Markov chain (k-MC) based on the 1-MC. The k-MC assumes that the location at the next moment is related to the previous k locations but is prone to problems related to dimensionality; i.e., its state space explodes with an increase in k, which renders the k-MC less practical in the field of time-series prediction. In addition to the MC model, the hidden Markov model (HMM) [24], [25] and association rule mining algorithms [26], [27] can also be used for time-series location prediction, but are still not suitable for predicting long-term time-series data. To solve the long-term dependence problem, previous studies have applied deep learning models to time-series data prediction, such as the recurrent neural network (RNN) [28], long short-term memory (LSTM) [29], and gated-recurrent-unit (GRU) models [30]. Compared with classical statistical models, deep learning models achieve higher prediction accuracy; however, they are data-driven empirical models, and the causal relationships inside them are difficult to interpret.
Therefore, we propose a hybrid Markov-LSTM model, which combines the advantages of the Markov and LSTM models to mine user movement patterns based on the user's transition probability (i.e., the transition probability is interpretable and describes the movement tendency of the user) and to improve the performance of the location prediction model. This study makes several contributions, which are summarized as follows:
(1) A new multi-step Markov transition probability matrix is presented, which divides the multi-order Markov model into multiple first-order models and overcomes the curse of dimensionality that affects the multi-order Markov model.
(2) The prediction results of the multiple first-order Markov models are combined based on the advantages of the LSTM for predicting long-sequence data. This improves the practicality of the multi-order Markov model for location prediction.
(3) The performance of the Markov-LSTM model is evaluated using real indoor trajectories. The results demonstrate the advantages of our approach compared with five baseline methods.
The remainder of this study is organized as follows. In Section II, we review current literature on location prediction models based on trajectories. The basic definitions and problems are described in Section III. In Section IV, we propose a new methodological framework for destination prediction. The performance of this method and those proposed in previous studies are compared using real indoor Wi-Fi positioning data. These results are presented in Section V. In Section VI, we summarize the study and provide suggestions for future research.

II. BACKGROUND
Existing location prediction methods can be broadly divided into two types: group-based and individual-based prediction models.
Group-based models consider that movement behavior "follows the crowd" to a certain degree and use the historical trajectories of other users to predict the next location of a user. These models are predominantly used to mine similar behaviors from groups of users. For example, Morzy [27] used an improved apriori algorithm that uses association rules to predict the next location of a user; Ang et al. [31] utilized a Markov chain to convert location sequences into transition probabilities for location prediction; Qiang et al. [32] presented a spatiotemporal RNN (ST-RNN) based on the RNN [33] to model the location of group users; and Ying et al. [34] presented a geographic-temporal-semantic-based location prediction model to predict the next location of group users. Unlike single-object models, group-based models can mine the movement patterns of group users in certain scenarios [35]. In addition, group-based models do not require long-term trajectories of individual users. However, these models have several deficiencies. Group-based models build a single model for all users, ignoring the existence of similar subgroups. Therefore, some models use only the movement trajectories of people who are related to the user in some way. Zhang et al. [36] found a strong correlation between the calling patterns and co-cell patterns of users. Based on these results, they presented the NextCell model, which aims to enhance location prediction by harnessing the social interplay revealed by cellular call records. Moreover, Wen et al. [37] presented a fallback social-temporal-hierarchic Markov model (FSTHM), which used modified cross-sample entropy to quantify similarities between an individual and their peers to enhance predictive performance. Furthermore, Peixiao et al. [38] proposed a location prediction framework based on the similarity of location sequences.
Conversely, individual-based models consider that the movement behavior of each individual is independent; thus, they use only the movement history of the specific user to predict their next location. Individual-based models are predominantly used to mine the periodic behavior of individual users. For example, Lee et al. [39] presented a spatiotemporal-periodic (STP) pattern to capture the periodic behavior of an individual. Then, using an association rule algorithm to mine periodic patterns in the STP pattern, Vu et al. [40], [41] proposed a novel framework, named Jyotish, to obtain the periodic movement of people based on Wi-Fi/Bluetooth positioning data. Bayesian classifiers and support vector machines were utilized to predict the next most likely location. Minh Tri Do et al. [42] redefined the location prediction problem from a new perspective and presented a probabilistic kernel method to determine the dependence between the user location and multivariate context variables from sparse data. Moreover, Wu et al. [43] presented a spatial-temporal-semantic neural network algorithm (STS-LSTM) for location prediction, and Zhang et al. [12] combined the respective advantages of support vector regression and deep learning to present a novel data embedding and ensemble learning method. Furthermore, Zhou et al. [44] defined a novel Markov chain via Markov transition matrix multiplication and proposed the DestPD model.
However, the existing models suffer from certain deficiencies. First, group-based models require a large number of user trajectories, and their prediction accuracy is relatively low. Second, individual-based models have better predictive performance but often require a significant amount of personal information. Finally, previous research has focused on location prediction in outdoor environments, with relatively few studies on indoor environments. Therefore, in this study, we develop a novel indoor location prediction model for an individual user, termed the Markov-LSTM. Compared with existing models, the proposed model only requires the trajectory of the user and combines the advantages of Markov and LSTM methods to improve the prediction performance.

III. PRELIMINARIES AND PROBLEM DEFINITIONS
Definition 1 (Trajectory): A trajectory, $traj = \{pt_i\}_{i=1}^{n}$, is an ordered sequence of points $pt_i = (id, t_i, x_i, y_i, f_i)$, where $id$ is a unique user identifier, $t_i$ is the time at which $pt_i$ was collected, and $(x_i, y_i, f_i)$ corresponds to the longitude, latitude, and floor, respectively, of the user at time $t_i$.
Definition 2 (Stay Point): A stay point, $sp^{id} = (x, y, f, arrT, levT)$, represents a geographic region in which the user remained over a certain time interval, where $id$ is the unique user identifier, $(x, y, f)$ corresponds to the average x-coordinate, average y-coordinate, and floor, respectively, at which the user stayed, and $(arrT, levT)$ represents the arrival and departure times, respectively, of the user in the geographic region [45]. As shown in Fig. 1a, the stay point of user u is expressed as

Definition 3 (Location Set): The location set, $lset = \{l_i\}_{i=1}^{N}$, represents the set of regions in a specific application. The application employed in this study is shops in a mall, where $l_i = (lid, shape_i, f_i)$, $lid$ represents the unique identifier of shop $l_i$, $shape_i$ represents the bounded area of shop $l_i$, $f_i$ represents the identifier of the floor on which shop $l_i$ is located, and N represents the number of shops in the mall.
Definition 4 (Location Sequence): A location sequence, $locSeq^{id} = \{l_i^{id}\}_{i=1}^{m}$, is an ordered sequence of locations visited by the user, where $l_i^{id}$ represents the shop visited at the stay point $sp_i^{id}$. As shown in Fig. 1b, $locSeq^{u} = \{l_1^u, l_2^u, l_3^u, l_4^u\}$ represents the location sequence of user u. The main objective of this study is to analyze the location sequence, $locSeq^{id}$, of individual users in order to determine their behavior patterns and living habits from their historical location sequences, which could aid future location prediction for that user. Taking user u as an example, their next location is defined in (1) and (2):

$$\hat{y}^{res} = M^u\left(\{l_i^u\}_{i=1}^{m}\right) \quad (1)$$

$$l_{m+1}^u = l_{j^*}, \quad j^* = \operatorname{argmax}\left(\hat{y}^{res}\right) \quad (2)$$

where $\{l_i^u\}_{i=1}^{m}$ represents the recent location sequence of user u; $M^u$ represents the prediction model established from the historical location sequence of user u; $\hat{y}^{res} = \{\hat{y}_1^{res}, \hat{y}_2^{res}, \ldots, \hat{y}_i^{res}, \ldots, \hat{y}_N^{res}\}^T$ represents the prediction result of model $M^u$, where $\hat{y}_i^{res}$ represents the probability that the user's next visit location is $l_i$; $l_{m+1}^u$ represents the final prediction of the model; and argmax is a function that finds the index of the maximum value in $\hat{y}^{res}$.
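As an illustration of the final selection step in the problem definition, the following minimal sketch picks the most probable location (or the top X locations) from a probability vector. The function name `predict_next` and the probability values are hypothetical and are not taken from the study.

```python
import numpy as np

def predict_next(y_res, top_x=1):
    """Return the indices of the top_x most probable locations, best first."""
    y_res = np.asarray(y_res, dtype=float)
    # Sort by descending probability; the stable sort keeps ties in index order.
    order = np.argsort(-y_res, kind="stable")
    return order[:top_x].tolist()

# Hypothetical prediction result for a mall with N = 4 shops.
y_res = [0.10, 0.55, 0.05, 0.30]
best = predict_next(y_res)[0]          # index of the predicted next location
top3 = predict_next(y_res, top_x=3)    # candidates used by the @3 metrics
```

With `top_x=1` this is the argmax of the problem definition; larger values of `top_x` give the candidate lists evaluated by the top-X metrics in Section V.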

IV. METHODOLOGY
In this section, we describe the proposed hybrid model for indoor location prediction, whose structure is presented in Fig. 2. Based on the bottom-up design principle, our method is divided into four phases: location sequence detection, multi-step transition probability matrix definition, adjacent location selection, and fusion of multiple Markov chains. The first phase is discussed in Section IV.A and the remaining three in Section IV.B. First, considering the continuity of space, the trajectory is not suitable for direct input into the prediction model. Therefore, we must convert the trajectory into a location sequence associated with specific shops. Second, a novel multi-step Markov transition probability matrix is defined, which converts a higher-order Markov chain into multiple first-order Markov chains. Third, we select the most appropriate number of adjacent locations for each user. Finally, the LSTM model is used to integrate these first-order Markov chains to obtain the predicted results for the target user.

A. LOCATION SEQUENCE DETECTION METHOD
Stay point identification is one of the important steps in location sequence conversion. When the user is staying in one place, there is a greater probability that they will view location service information [46]. In this study, we used the Indoor-STDBSCAN algorithm to detect the stay points, $sp^{id}$, from the indoor individual trajectory. The Indoor-STDBSCAN algorithm [38] divides the indoor individual trajectory, $traj$, into k disjoint clusters, where k clusters yield k stay points. Indoor-STDBSCAN, which is an improved version of the DBSCAN algorithm [47], redefines the spatiotemporal neighborhood of the indoor space based on DBSCAN. The spatiotemporal neighborhood of the trajectory point, $pt_i$, can be defined using the following expression:

$$N_{\varepsilon_1, \varepsilon_2}(pt_i) = \{pt_j \in traj \mid sd(pt_i, pt_j) \le \varepsilon_1,\ td(pt_i, pt_j) \le \varepsilon_2\} \quad (3)$$

where $sd$ is a function that calculates the spatial distance between $pt_i$ and $pt_j$, $td$ is used to calculate the time distance between $pt_i$ and $pt_j$, and $N_{\varepsilon_1, \varepsilon_2}(pt_i)$ represents the set of points contained in the spatiotemporal neighborhood. The stay point detected by Indoor-STDBSCAN only contains spatial information, not semantic information. Therefore, we use the nearest-neighbor search to assign semantics to each stay point, as shown in Fig. 3. For a stay point $sp_1^u$ that is inside a shop, we use the intersection method to obtain the shop that user u visited at stay point $sp_1^u$. For a stay point $sp_2^u$ that is outside any shop, the shop nearest to point $sp_2^u$ and the corresponding distance d are determined. If d is less than the distance threshold δ, the nearest shop is taken as the shop that the user visits at stay point $sp_2^u$.
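The spatiotemporal neighborhood used by the clustering step can be sketched as follows. This is a minimal illustration rather than the Indoor-STDBSCAN implementation of [38]: the `Point` class, the use of Euclidean distance, the inclusive thresholds, and the same-floor restriction are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot

@dataclass(frozen=True)
class Point:
    t: float      # timestamp in seconds
    x: float      # planar coordinate in metres
    y: float      # planar coordinate in metres
    floor: int    # floor identifier

def neighborhood(points, i, eps1, eps2):
    """Indices of points within eps1 metres and eps2 seconds of points[i]."""
    p = points[i]
    result = []
    for j, q in enumerate(points):
        if q.floor != p.floor:              # assumption: neighbours share a floor
            continue
        sd = hypot(p.x - q.x, p.y - q.y)    # spatial distance (Euclidean, assumed)
        td = abs(p.t - q.t)                 # temporal distance
        if sd <= eps1 and td <= eps2:
            result.append(j)
    return result

# Hypothetical trajectory; 5 m and 7 min (420 s) are the values reported in Section V.C.
traj = [Point(0, 0, 0, 1), Point(10, 1, 1, 1), Point(20, 30, 30, 1),
        Point(5000, 0.5, 0.5, 1), Point(15, 0, 1, 2)]
near_first = neighborhood(traj, 0, eps1=5.0, eps2=420.0)
```

A density-based clustering pass would then treat a point as a core point when its neighborhood contains at least MinPts points.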

B. HYBRID MARKOV-LSTM MODEL
1) DEFINITION OF THE MULTI-STEP TRANSITION PROBABILITY MATRIX
The k-MC is a classic statistical model that describes the movement of a user between locations from a transition probability perspective.The k-MC treats each location in the user location sequence as a random variable, using the joint probability to predict the location of the user's next visit.
Taking user u as an example, a location sequence, $\{l_i^u\}_{i=1}^{m}$, of length m can be expressed as a sequence of random variables, $\{L_i^u\}_{i=1}^{m}$, of length m. Moreover, user u has a k-order transition probability matrix, $Y^u \in \mathbb{R}^{k \times N \times N}$. As k increases, the size of $Y^u$ grows rapidly, which renders the k-MC less practical for location prediction. For this reason, we propose a novel k-step Markov chain, $MC^{(k)}$.
Definition 5 (1-Step Transition Probability Matrix): The 1-step transition probability matrix, $Y^{u(1)}$, of user u is equivalent to the 1-order transition probability matrix. Its element $Y_{ij}^{u(1)}$ represents the probability that user u moves from location $l_i$ through one step to location $l_j$, and is defined by the following expression:

$$Y_{ij}^{u(1)} = \frac{\sum_{p=1}^{m-1} \left|\{l_p^u = l_i \cap l_{p+1}^u = l_j\}\right|}{\sum_{p=1}^{m} \left|\{l_p^u = l_i\}\right|}, \quad i, j \in \{1, 2, \ldots, N\} \quad (4)$$

where $locSeq^u$ represents the location sequence, $\{l_i^u\}_{i=1}^{m}$, of user u; $\sum_{p=1}^{m-1} |\{l_p^u = l_i \cap l_{p+1}^u = l_j\}|$ represents the number of times that user u moves from location $l_i$ through one step to location $l_j$; $\sum_{p=1}^{m} |\{l_p^u = l_i\}|$ represents the total number of times that user u moves from location $l_i$ through one step to other locations; and N represents the total number of shops in the mall.
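The counting procedure behind the 1-step transition probability matrix can be sketched as below. This is a minimal illustration under stated assumptions: locations are encoded as integer indices 0..N-1, the hypothetical function `one_step_matrix` normalizes each row by its number of outgoing transitions (so that every visited row sums to one), and rows for locations with no outgoing transition are left at zero.

```python
import numpy as np

def one_step_matrix(loc_seq, n_locations):
    """Estimate the 1-step transition matrix from a sequence of location indices."""
    counts = np.zeros((n_locations, n_locations))
    # Count each observed move l_p -> l_{p+1}.
    for a, b in zip(loc_seq[:-1], loc_seq[1:]):
        counts[a, b] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Divide only where a row has outgoing moves; other rows stay all-zero.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

# Hypothetical location sequence over N = 3 shops.
Y1 = one_step_matrix([0, 1, 0, 2, 0, 1], n_locations=3)
```

In this toy sequence the user leaves shop 0 three times, twice toward shop 1 and once toward shop 2, so the first row is (0, 2/3, 1/3).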
Definition 6 (k-Step Transition Probability Matrix): The k-step transition probability matrix, $Y^{u(k)}$, of user u is an $N \times N$ matrix whose i-th row, $\hat{y}^{u(l_i \to *:k)}$, represents the probability that user u moves from location $l_i$ through k steps to other locations. The definitions of $Y^{u(k)}$ and $\hat{y}^{u(l_i \to *:k)}$ for user u are expressed in (5) and (6), respectively:

$$Y^{u(k)} = \left[\hat{y}^{u(l_1 \to *:k)}, \hat{y}^{u(l_2 \to *:k)}, \ldots, \hat{y}^{u(l_N \to *:k)}\right]^{T} \quad (5)$$

$$\hat{y}^{u(l_{m-k+1}^u \to *:k)} = P\left(L_{m+1}^u \mid L_{m-k+1}^u = l_{m-k+1}^u\right) \quad (6)$$

where $Y^{u(k)}$ can be directly obtained from $Y^{u(1)}$, i.e., $Y^{u(k)} = (Y^{u(1)})^{k}$; $L_{m-k+1}^u = l_{m-k+1}^u$ indicates that user u visits location $l_{m-k+1}^u$ at random variable $L_{m-k+1}^u$ ($l_{m-k+1}^u$ can be obtained from the location sequence $locSeq^u$); $Y^{u(k)}$ describes the effect that cross-location has on the prediction results from another perspective; and N represents the total number of shops in the mall.
The aim of the $MC^{(k)}$ is to establish a transition probability matrix of the same size as the 1-MC transition probability matrix. Using this matrix, the k-MC can be decomposed into k first-order Markov chains, which avoids solving the joint probability of the k-MC and reduces the dimensions of the transition probability matrix. To make this theoretical analysis more rigorous, we provide a mathematical proof for the k-step transition probability matrix $Y^{u(k)}$. If the location of user u at the random variable $L_{m-k}^u$ is $l_{m-k}^u = l_i$, the probability that user u moves from location $l_{m-k}^u$ to other locations through k steps can be defined using the following expressions:

$$P\left(L_m^u = l_j \mid L_{m-k}^u = l_i\right) = \sum_{r_1=1}^{N} \cdots \sum_{r_{k-1}=1}^{N} Y_{i r_1}^{u(1)} Y_{r_1 r_2}^{u(1)} \cdots Y_{r_{k-1} j}^{u(1)} = Y_{ij}^{u(k)} \quad (7)$$

$$Y^{u(k)} = Y^{u(1)} * Y^{u(1)} * \cdots * Y^{u(1)} \quad (8)$$

where the product in (8) contains k factors of $Y^{u(1)}$. As matrix multiplication satisfies the associative law, $Y^{u(1)} * Y^{u(1)} * \cdots * Y^{u(1)}$ can be expressed as $(Y^{u(1)})^{k}$, i.e., $Y^{u(k)} = (Y^{u(1)})^{k}$.
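The relationship between the k-step matrix and the first-order matrix can be checked numerically. The sketch below raises a hypothetical 3-shop transition matrix to the k-th power with `numpy.linalg.matrix_power` and compares the 2-step result with an explicit sum over intermediate locations; the matrix values and the helper name `k_step_matrix` are illustrative only.

```python
import numpy as np

# Hypothetical first-order transition matrix for N = 3 shops (rows sum to 1).
Y1 = np.array([[0.0, 2/3, 1/3],
               [1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]])

def k_step_matrix(Y1, k):
    """k-step transition matrix obtained as the k-th power of the 1-step matrix."""
    return np.linalg.matrix_power(Y1, k)

Y2 = k_step_matrix(Y1, 2)

# Explicit two-step probabilities: sum over every intermediate location r.
n = Y1.shape[0]
explicit = np.zeros_like(Y1)
for i in range(n):
    for j in range(n):
        explicit[i, j] = sum(Y1[i, r] * Y1[r, j] for r in range(n))
```

Whatever the value of k, the result keeps the N x N shape of the first-order matrix, which is the property that the decomposition relies on.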

2) SELECTION OF THE BEST ADJACENT LOCATIONS
Similar to the k-MC, the Markov-LSTM model must also determine the hyper-parameter k, i.e., the number of locations on which the prediction result depends. The appropriateness of this parameter has a substantial influence on the prediction performance of the model. The k value is primarily employed to determine the number of adjacent locations. If the k value is too small, the model approaches a first-order Markov chain, which reduces the prediction performance. If the k value is too large, the model becomes more complex and overfitting is possible. Because the selection of the k value significantly influences the prediction performance, this value is typically determined using cross-validation, which selects the k value that minimizes the model prediction error [15], [48]. In this study, each user is an independent individual; therefore, we select the optimal k value separately for each user. Taking user u with a k value of $k_u$ as an example, when $k_u > 1$, the k-MC can be decomposed into the following set of predictions:

$$\left\{\hat{y}^{u(l_{m-i+1}^u \to *:i)}\right\}_{i=1}^{k_u} \quad (9)$$

where $\hat{y}^{u(l_{m-i+1}^u \to *:i)}$ represents the prediction result of the i-th first-order Markov model for user u.
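The decomposition into multiple first-order predictions can be sketched as follows: for each i from 1 to k, the i-th most recent location selects one row of the i-step matrix. The function name `markov_outputs`, the integer encoding of locations, the example matrix, and the ordering of the stacked rows (i = 1 first) are assumptions made for illustration.

```python
import numpy as np

def markov_outputs(Y1, recent, k):
    """Stack the i-step predictions (i = 1..k) made from the i-th most recent location."""
    Y1 = np.asarray(Y1, dtype=float)
    if not 1 <= k <= len(recent):
        raise ValueError("k must be between 1 and the number of recent locations")
    outputs = []
    Yi = np.eye(Y1.shape[0])
    for i in range(1, k + 1):
        Yi = Yi @ Y1                     # Yi now equals Y1 raised to the power i
        outputs.append(Yi[recent[-i]])   # row for location l_{m-i+1}
    return np.stack(outputs)             # shape (k, N)

# Hypothetical first-order matrix and recent location sequence (l_m is last).
Y1 = np.array([[0.1, 0.6, 0.3],
               [0.5, 0.2, 0.3],
               [0.4, 0.4, 0.2]])
recent = [2, 0, 1]
outs = markov_outputs(Y1, recent, k=3)
```

Each row of `outs` is a probability distribution over the N locations, and the k rows form the sequence that is later fused by the LSTM.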

3) FUSION OF MULTIPLE MARKOV MODELS
For each user, u, we establish $k_u$ first-order Markov models. Each model, however, has a limited prediction ability for the next position. Therefore, this study combines the $k_u$ first-order Markov models to ensure good location prediction performance. Considering the order of the $k_u$ first-order Markov model prediction results, i.e., $\{\hat{y}^{u(l_{m-i+1}^u \to *:i)}\}_{i=1}^{k_u}$, we use the LSTM model to merge the $k_u$ results. Improvements in model prediction performance can be considered from two aspects: (1) from a Markov model perspective, the multi-step transition probability matrix allows the use of multiple 1-MCs to achieve k-MC predictive performance without the curse of dimensionality; (2) from an LSTM model perspective, our model does not directly mine the location pattern from the simple identification sequence but rather mines the location pattern from the transition probabilities, which contain more of the user's movement tendencies.
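As an extension of the RNN model, the LSTM model has a unique cell that effectively controls the rate of information accumulation by introducing gate mechanisms (i.e., input gate, forgetting gate, and output gate) and selectively forgetting certain accumulated historical information. As shown in Fig. 4, the outputs of the $k_u$ first-order Markov models are merged in turn using the input gate, forgetting gate, and output gate. This fusion method not only integrates the independent influence of multiple outputs on the prediction results but also determines the interaction between multiple outputs. The fusion process for user u can be expressed with the following equations:

$$\begin{aligned}
f_m &= \sigma\left(W_{hf} h_{m-1} + W_{yf}\, \hat{y}^{u(\cdot)}_m + b_f\right) \\
i_m &= \sigma\left(W_{hi} h_{m-1} + W_{yi}\, \hat{y}^{u(\cdot)}_m + b_i\right) \\
a_m &= \tanh\left(W_{ha} h_{m-1} + W_{ya}\, \hat{y}^{u(\cdot)}_m + b_a\right) \\
C_m &= f_m \odot C_{m-1} + i_m \odot a_m \\
o_m &= \sigma\left(W_{ho} h_{m-1} + W_{oy}\, \hat{y}^{u(\cdot)}_m + b_o\right) \\
h_m &= o_m \odot \tanh(C_m) \\
\hat{y}_m &= \mathrm{softmax}\left(W_h h_m + b_y\right)
\end{aligned} \quad (10)$$

In this algorithm, $f_m$, $i_m$, $C_m$, and $o_m$ represent the forgetting gate, input gate, control unit, and output gate, respectively; $a_m$ represents the candidate state; $\hat{y}^{u(\cdot)}_m$ denotes the first-order Markov output fed to the LSTM at step m; $h_{m-1}$ represents the hidden unit of the correlation between the outputs of multiple Markov models; $W_{hf}$, $W_{yf}$, $W_{hi}$, $W_{yi}$, $W_{ha}$, $W_{ya}$, $W_{ho}$, $W_{oy}$, and $W_h$ represent the weight matrices; $\odot$ represents the Hadamard operation; $\hat{y}_m$ represents the output of the Markov-LSTM model, i.e., $\hat{y}^{res}$ in the problem definition; and σ represents the sigmoid activation function.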
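To make the gate-based fusion concrete, the sketch below runs a standard LSTM cell over a short sequence of Markov outputs and maps the final hidden state to a probability vector. It is a forward pass with random, untrained weights in plain NumPy, not the authors' implementation; the parameter names, the hidden size of 8, the softmax output layer, and the helper names `init_params` and `fuse` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def init_params(n_locations, hidden, seed=0):
    """Random (untrained) weights for the four gate blocks and the output layer."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("f", "i", "a", "o"):    # forget, input, candidate, output
        p["Wh" + gate] = rng.normal(0, 0.1, (hidden, hidden))
        p["Wy" + gate] = rng.normal(0, 0.1, (hidden, n_locations))
        p["b" + gate] = np.zeros(hidden)
    p["Wh"] = rng.normal(0, 0.1, (n_locations, hidden))
    p["by"] = np.zeros(n_locations)
    return p

def fuse(markov_outputs, p):
    """Feed the Markov outputs through the LSTM cell one step at a time."""
    h = np.zeros(p["bf"].shape[0])
    c = np.zeros_like(h)
    for y in markov_outputs:
        f = sigmoid(p["Whf"] @ h + p["Wyf"] @ y + p["bf"])   # forgetting gate
        i = sigmoid(p["Whi"] @ h + p["Wyi"] @ y + p["bi"])   # input gate
        a = np.tanh(p["Wha"] @ h + p["Wya"] @ y + p["ba"])   # candidate state
        o = sigmoid(p["Who"] @ h + p["Wyo"] @ y + p["bo"])   # output gate
        c = f * c + i * a                                    # Hadamard products
        h = o * np.tanh(c)
    return softmax(p["Wh"] @ h + p["by"])

# Two hypothetical first-order Markov outputs over N = 3 shops.
outputs = np.array([[0.1, 0.6, 0.3],
                    [0.5, 0.2, 0.3]])
y_hat = fuse(outputs, init_params(n_locations=3, hidden=8))
```

In practice the weights would be learned by gradient descent in a deep learning framework; the sketch only shows how the sequence of transition-probability vectors flows through the gates.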
Our Markov-LSTM can be trained to predict $y_m$ by merging multiple Markov outputs in order to minimize the cross-entropy loss between the predicted and true locations of a user. This loss function can be defined by the following expression:

$$L(\theta) = -\sum_{j=1}^{N} y_{mj} \log \hat{y}_{mj} \quad (11)$$

where θ represents all learnable parameters, i.e., all W and b parameters, in the Markov-LSTM model; N represents the total number of locations, i.e., the number of shops; $\hat{y}_{mj}$ represents the output of the model; and $y_{mj}$ represents the expected output (true value) of the model.
From the perspective of space complexity, the number of elements in the k-order transition probability matrix is k * N * N, whereas the number of elements in the k-step transition probability matrix is N * N. Compared with the k-order Markov transition probability matrix, the storage space required for the k-step Markov transition probability matrix is significantly reduced. As k increases, the advantages of the k-step Markov model become increasingly significant.
From the perspective of computational complexity, according to (7) and (8), the k-step Markov model only needs to calculate the first-order Markov transition probability matrix. By comparison, calculating the k-order Markov transition probability matrix has a higher computational complexity, which grows increasingly large as k increases [49]. In contrast, the size of the k-step Markov transition probability matrix does not increase with an increase of k.
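As a concrete reading of the cross-entropy loss between the predicted distribution and the true location, the sketch below evaluates it for a single sample with a one-hot target. The function name `cross_entropy`, the small `eps` guard against log(0), and the example values are assumptions for illustration.

```python
import numpy as np

def cross_entropy(y_pred, y_true_onehot, eps=1e-12):
    """Cross-entropy between a predicted distribution and a one-hot target."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true_onehot, dtype=float)
    # eps avoids log(0) when the model assigns zero probability to a class.
    return float(-np.sum(y_true * np.log(y_pred + eps)))

# Hypothetical prediction over N = 3 shops; the user actually visited shop 1.
loss_uncertain = cross_entropy([0.25, 0.50, 0.25], [0, 1, 0])
loss_confident = cross_entropy([0.05, 0.90, 0.05], [0, 1, 0])
```

With a one-hot target, the sum reduces to the negative log-probability assigned to the shop that was actually visited, so a more confident correct prediction yields a smaller loss.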
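To make the storage comparison concrete, consider the mall studied in Section V, which has N = 489 shops, and take k = 5 purely as an illustrative value (it is not a setting prescribed by the study). Using the element counts stated in the text, N * N for the k-step matrix and k * N * N for the k-order matrix:

$$N \times N = 489 \times 489 = 239{,}121, \qquad k \times N \times N = 5 \times 239{,}121 = 1{,}195{,}605.$$

Under these assumptions the k-step representation stores one fifth of the elements, and the ratio grows linearly with k.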

V. EXPERIMENTAL RESULTS AND ANALYSIS
A. DATA PREPARATION
1) DATASETS
The experimental data predominantly consisted of Wi-Fi positioning information for 50 users and shop data for a shopping mall in Jinan City, China. The indoor Wi-Fi data were provided by Shanghai Palmap Science & Technology Company Limited (http://www.palmap.cn/) and collected using fingerprint positioning technology. The data covered the eight floors of the shopping mall from December 20, 2017, to February 1, 2018. The positioning accuracy was approximately 3 m. Fig. 5 shows the data sampling interval. Trajectory points with a sampling interval of 1-5 s accounted for more than 70% of the collected data points. There were a total of 11,677,438 trajectory points, and each user had an average of 200,000 trajectory points. As shown in Table 1, the data fields included the unique user identifier, the record upload time, the user's XY-coordinates, and the unique identifier of the floor. As shown in Table 2, there are 489 shops in the mall, each with an average size of approximately 40 m². Data for each shop included its unique ID, its shape (a polygon consisting of a sequence of coordinates), its name, and the floor ID.

2) DATA PREPROCESSING
The original trajectory data for the indoor users were collected through Wi-Fi positioning. Due to unstable mobile terminal signals and the manual shutdown of the Wi-Fi signal, abnormal, erroneous, and invalid data were easily generated. There were three types of noise in our dataset:
(1) Abnormal coordinate points. If a trajectory point fell outside the study area, it was treated as an abnormal coordinate point.
(2) Abnormal time points. If the sampling interval of two adjacent trajectory points was 0 s, it was considered an abnormal time point.
(3) Abnormal floor points. If a trajectory point was not in the study area or jumped between different floors in a short time period, it was considered an abnormal floor point.
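The three noise rules can be sketched as a single filtering pass. This is a minimal illustration rather than the preprocessing code used in the study: the tuple layout `(t, x, y, floor)`, the rectangular study area, the function name `remove_noise`, and the 10 s threshold for an implausibly fast floor change are all hypothetical.

```python
def remove_noise(points, x_range, y_range, floors, min_floor_change_s=10.0):
    """Drop abnormal coordinate, time, and floor points from (t, x, y, floor) tuples."""
    kept = []
    for t, x, y, floor in sorted(points):
        inside = x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
        if not inside:
            continue                      # (1) abnormal coordinate point
        if floor not in floors:
            continue                      # (3) floor outside the study area
        if kept:
            prev_t, _, _, prev_floor = kept[-1]
            if t == prev_t:
                continue                  # (2) sampling interval of 0 s
            if floor != prev_floor and t - prev_t < min_floor_change_s:
                continue                  # (3) floor jump within a short period
        kept.append((t, x, y, floor))
    return kept

# Hypothetical raw points for one user in an eight-floor mall.
raw = [(0, 1, 1, 1), (0, 1.2, 1, 1), (2, 500, 1, 1), (3, 1.5, 1, 4),
       (5, 2, 1, 1), (9, 2, 2, 9), (60, 3, 3, 2)]
clean = remove_noise(raw, x_range=(0, 100), y_range=(0, 100),
                     floors=set(range(1, 9)))
```

Comparing each point with the last point that was kept, rather than with the raw predecessor, prevents one noisy point from causing the valid point after it to be discarded.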

3) TESTBED AND TEST DEVICE
Table 3 lists the experimental environment from two aspects: hardware and software.

B. EVALUATION METRICS AND COMPARATIVE METHODS
1) EVALUATION METRICS
In this study, we treat location prediction as a classification problem using Accuracy@X, Precision@X, Recall@X, and F1-Measure@X (top X locations) as quantitative indicators for evaluating the model [50]. Accuracy@X evaluates the predictive performance of the model from the perspective of binary classification, i.e., whether the top X predicted shops include the shop that was actually visited. Precision@X, Recall@X, and F1-Measure@X use macro-averaging to evaluate model performance as a multi-class classification. To fully test the prediction performance of the Markov-LSTM model, this study used the top 1, 3, and 5 locations to test the prediction ability of the model, i.e., X ∈ {1, 3, 5}. Accuracy@X, Precision@X, Recall@X, and F1-Measure@X are defined in (12), (13), (14), and (15), respectively:

$$Accuracy@X = \frac{\text{number of samples correctly predicted}}{\text{total number of test samples}} \quad (12)$$

$$Precision@X = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i} \quad (13)$$

$$Recall@X = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i} \quad (14)$$

$$F1\text{-}Measure@X = \frac{2 \times Precision@X \times Recall@X}{Precision@X + Recall@X} \quad (15)$$

where N represents the total number of locations, that is, the total number of shops; $TP_i$ represents the number of samples for which the model correctly predicts that a user will visit shop $l_i$; $FP_i$ represents the number of samples for which the model incorrectly predicts that a user will visit shop $l_i$; and $FN_i$ represents the number of samples for which the model incorrectly predicts that a user will not visit shop $l_i$. When the predicted shop ID is equal to the shop ID of the actual visit, the prediction is considered correct; otherwise, it is considered incorrect.
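The evaluation metrics can be sketched as follows. The accuracy computation follows the top-X definition directly. For the macro-averaged metrics with X > 1, the source does not spell out how a single predicted label is derived from X candidates, so the hypothetical helper `effective_labels` assumes a common convention: the prediction counts as the true shop when it appears among the top X, and as the top-1 shop otherwise. Classes without any predicted or true samples contribute zero, which is also an assumption.

```python
import numpy as np

def top_x(scores, x):
    """Indices of the x highest-scoring locations per sample, best first."""
    return np.argsort(-scores, axis=1, kind="stable")[:, :x]

def accuracy_at_x(scores, y_true, x):
    hits = (top_x(scores, x) == y_true[:, None]).any(axis=1)
    return float(hits.mean())

def effective_labels(scores, y_true, x):
    """Assumed convention: true label if it is in the top x, else the top-1 label."""
    top = top_x(scores, x)
    hit = (top == y_true[:, None]).any(axis=1)
    return np.where(hit, y_true, top[:, 0])

def macro_prf(y_pred, y_true, n_classes):
    """Macro-averaged precision and recall, and the F1 computed from them."""
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = float(np.mean(precisions)), float(np.mean(recalls))
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical scores for four test samples over N = 3 shops.
scores = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.5, 0.4, 0.1],
                   [0.1, 0.2, 0.7]])
y_true = np.array([0, 2, 1, 2])
acc1 = accuracy_at_x(scores, y_true, 1)
acc2 = accuracy_at_x(scores, y_true, 2)
p1, r1, f1 = macro_prf(effective_labels(scores, y_true, 1), y_true, 3)
```

Computing F1 from the macro-averaged precision and recall, rather than averaging per-class F1 values, matches the relationship between the precision, recall, and F1 values reported later in this section.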

2) COMPARATIVE METHODS
To comprehensively evaluate the performance of the Markov-LSTM model, we used five baseline methods for comparison:
MC [21]: A Markov chain (MC) is a statistical model used to describe discrete-time stochastic processes with Markov properties. In our experiment, we compared against the 1-MC, i.e., the first-order Markov chain.
HMM [24]: A hidden Markov model (HMM) is a statistical model used to describe a Markov process with implicit unknown parameters. The performance of the HMM depends on the number of states. We fixed the number of states to one of {10, 15, 20}. The three HMM variants compared were therefore HMM-10, HMM-15, and HMM-20.
RNN [28]: A recurrent neural network (RNN) is a deep learning model that can determine temporal dependencies. The performance of the RNN model depends on the number of hidden states. In our experiment, the number of hidden states was fixed to one of {64, 128, 256}. Therefore, the three RNN variants were RNN-64, RNN-128, and RNN-256.
GRU [30]: A gated-recurrent-unit network (GRU) is a special RNN that can be used to determine long-term temporal dependencies. The GRU variants selected as baselines were GRU-64, GRU-128, and GRU-256, which had identical settings to the RNN.

C. VARIABLE ESTIMATION
The hyper-parameters of the Markov-LSTM model predominantly include the radius, $\varepsilon_1$, the time window, $\varepsilon_2$, the minimum number of points, MinPts, the distance threshold, δ, the number of best adjacent locations, k, and the parameters of the LSTM. For the Indoor-STDBSCAN algorithm, $\varepsilon_1$, $\varepsilon_2$, and MinPts were fixed to 5 m, 7 min, and 100, respectively [38]. To determine δ, k, and the LSTM parameters, we used the control variable method to obtain the combination of parameter values. In the parameter calibration phase, the distance threshold, δ, was determined first, followed by the optimal k value based on that distance threshold. Finally, the LSTM parameters were adjusted to obtain the optimal model parameter combination.

1) CALIBRATING THE DISTANCE THRESHOLD
The distance threshold, δ, determines how the shop matching results influence the prediction performance. If δ tends toward 0, shop information is only matched to stay points inside a shop. If δ tends toward infinity, any stay point will be matched to a shop. In this study, the optimal δ was selected from {0, 2, 4, ..., 18} m. Fig. 6 shows the effect of the distance threshold, δ, on the prediction performance. For X ∈ {1, 3, 5}, Accuracy@X and Precision@X first showed an increasing trend followed by a decreasing trend that finally stabilized. Recall@X and F1-Measure@X first showed an increasing trend and then a stable trend. When δ > 6 m, the model prediction result did not change significantly because the indoor space was small; if δ is too large, it no longer acts as a constraint. When 4 m ≤ δ ≤ 6 m, the model exhibited better prediction performance. In this work, we fixed the distance threshold, δ, to 4 m.
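The role of δ in shop matching can be illustrated with a small pure-Python sketch. A stay point inside a shop polygon is matched by containment; otherwise the nearest shop is accepted only if its distance is below δ. The helper names, the restriction to shops on the same floor, and the two square shops are assumptions made for the example.

```python
from math import hypot

def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies strictly inside the polygon."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_segment(pt, a, b):
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))             # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_polygon(pt, poly):
    return min(distance_to_segment(pt, poly[i], poly[(i + 1) % len(poly)])
               for i in range(len(poly)))

def match_shop(stay_xy, stay_floor, shops, delta):
    """shops: list of (shop_id, polygon, floor). Returns a shop_id or None."""
    same_floor = [s for s in shops if s[2] == stay_floor]   # assumed restriction
    for shop_id, poly, _ in same_floor:
        if point_in_polygon(stay_xy, poly):
            return shop_id                                   # stay point inside a shop
    if not same_floor:
        return None
    shop_id, poly, _ = min(same_floor,
                           key=lambda s: distance_to_polygon(stay_xy, s[1]))
    return shop_id if distance_to_polygon(stay_xy, poly) < delta else None

# Two hypothetical 4 m x 4 m shops on floor 1.
shops = [("A", [(0, 0), (4, 0), (4, 4), (0, 4)], 1),
         ("B", [(10, 0), (14, 0), (14, 4), (10, 4)], 1)]
matched = match_shop((6, 2), 1, shops, delta=4.0)
```

With δ = 0 only stay points inside a polygon are matched, and with a very large δ every stay point is assigned to its nearest shop, which mirrors the two limiting cases discussed above.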

2) CALIBRATING THE NUMBER OF BEST ADJACENT LOCATIONS
In the Markov-LSTM model, the number of best adjacent locations, k, plays an important role in the prediction process. We consider all users to be independent individuals; thus, we select an optimal k value for each individual. During the parameter calibration process, we set the range of k to {1, 2, ..., 30} and used cross-validation to obtain the optimal combination of parameters for each user; this was performed to obtain the best k value and optimal prediction performance. To illustrate the experimental results more simply, we randomly selected three users and used F1-Measure@1 as an example with which to calibrate the hyper-parameters. These users were User 2, User 28, and User 42. Fig. 7 shows the effect of hyper-parameter k on the prediction model. The three users showed a consistent trend. As the k value increased, the prediction performance exhibited a rapid increase. When the k value reached a certain value, the predictive performance of the model began to stabilize. These results allowed us to obtain the optimal k value for the three users (i.e., $k_{user2} = 5$, $k_{user28} = 8$, and $k_{user42} = 6$). Fig. 8 shows the different optimal k values for each user, which reflects the fact that each user is an independent individual. These results also reflect the appropriateness of the method.
VOLUME 7, 2019
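The per-user selection of k can be sketched as a search over candidate values with a validation score. The function `select_k`, its `tolerance` option (which prefers the smallest k once the score has plateaued), and the validation scores below are hypothetical; they are not results from the study.

```python
def select_k(candidate_ks, score_fn, tolerance=0.0):
    """Return the smallest k whose validation score is within tolerance of the best."""
    scores = {k: score_fn(k) for k in candidate_ks}
    best = max(scores.values())
    chosen = min(k for k, s in scores.items() if s >= best - tolerance)
    return chosen, scores

# Hypothetical validation F1-Measure@1 values for one user.
toy_scores = {1: 0.40, 2: 0.52, 3: 0.58, 4: 0.61, 5: 0.63, 6: 0.63, 7: 0.62}
k_best, _ = select_k(range(1, 8), toy_scores.get)
k_small, _ = select_k(range(1, 8), toy_scores.get, tolerance=0.03)
```

In a full pipeline `score_fn(k)` would train and validate the model for that user with k adjacent locations; a non-zero tolerance trades a small amount of validation score for a simpler model.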

3) CALIBRATING THE LSTM PARAMETERS
In addition, we further validated the impact of the LSTM parameters on model performance. In the LSTM, we calibrated the number of hidden states and set the range of hidden states to {32, 64, 128, 256, 512, 1024}. Fig. 9 shows the prediction results. As the hidden size increased, the model predictive performance first increased and then became stable. When the hidden size was 128, the model exhibited the best prediction performance.

4) MARKOV-LSTM MODEL PERFORMANCE
After determining the optimal combination of parameters, the model proposed in this study was further analyzed from the perspective of model prediction performance.Fig. 10 displays the results, which are summarized below.
(1) Comparing the four indicators for the dataset, the model prediction performance gradually increased with an increase in X. When X = 3, Accuracy@3, Precision@3, Recall@3, and F1-Measure@3 reached 72.07%, 69.57%, 61.38%, and 65.22%, respectively. Compared with X = 1, the indicators improved by 6.4%, 6.53%, 4.4%, and 5.36%, respectively. Compared with X = 5, the indicators were lower by only 1.7%, 1.96%, 3.49%, and 2.82%, respectively. When X increased from 1 to 3, the performance of the model was greatly improved. However, when X increased beyond 3, the additional gain was small. Thus, if the value of X is too low, the prediction performance of the model is low. If the value of X is too high, the model prediction results would not have much practical value. The experiments determined that X = 3 is the most suitable value for this study.
(2) Compared to the Accuracy@3 value of the model, the Precision@3, Recall@3, and F1-Measure@3 for the Markov-LSTM were lower by 2.5%, 10.69%, and 6.85%, respectively. This is predominantly because Accuracy@3, which is more suitable for a binary classifier, can be misleading. In contrast, Precision@3, Recall@3, and F1-Measure@3 treat location prediction as a multi-class classification problem. As the test samples of each class were unbalanced, there was a slight decrease in these indicator values.
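As a consistency check on the reported values, and assuming that F1-Measure@3 is the harmonic mean of the macro-averaged Precision@3 and Recall@3, the reported F1 value and the gaps relative to Accuracy@3 can be reproduced by direct arithmetic:

$$F1\text{-}Measure@3 = \frac{2 \times 69.57 \times 61.38}{69.57 + 61.38} = \frac{8540.41}{130.95} \approx 65.22$$

$$72.07 - 69.57 = 2.50, \qquad 72.07 - 61.38 = 10.69, \qquad 72.07 - 65.22 = 6.85$$

The differences are therefore absolute differences in percentage points rather than relative changes.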

5) COMPARISON WITH BASELINE METHODS
In this section, the collected indoor trajectory data were used to compare the prediction performance of the Markov-LSTM model against five existing baseline methods. These baseline methods can be broadly divided into two categories. The first category includes the MC and HMM methods, which are regarded as classic statistical prediction models. The second category includes the RNN, LSTM, and GRU methods, which are regarded as data-driven deep learning prediction models. The experimental results were analyzed using Accuracy@X, Precision@X, Recall@X, and F1-Measure@X. Fig. 11 compares the prediction performance of the five baseline methods and the Markov-LSTM model.

VI. CONCLUSION AND FUTURE WORK
Accurate and robust indoor location prediction plays an important role in indoor location services, particularly in the retail industry. For example, the ability to predict the next shop a user will visit, and to push information about shops of interest to that user, not only provides a personalized shopping experience but also boosts profits for retailers. Markov chains have been widely adopted for location prediction due to their strong interpretability; however, the k-MC is not suitable for predicting long sequences due to problems related to dimensionality. In this study, we proposed a novel hybrid Markov-LSTM model for indoor location prediction. First, a multi-step Markov transition matrix was defined, which decomposed a k-MC into multiple 1-MCs, solving the dimensionality problem of the k-MC. Then, the LSTM model was introduced to merge the multiple 1-MCs and improve prediction performance. Experiments were conducted using real indoor trajectories from 50 users over 45 days to verify the predictive performance of the proposed model. First, we used the control variable method to obtain the optimal parameter combination of the Markov-LSTM model. With the optimal parameter combination, the model evaluation indicators Accuracy@3, Precision@3, Recall@3, and F1-Measure@3 were 72.07%, 69.57%, 61.38%, and 65.22%, respectively. Then, we analyzed the predictive performance of the Markov-LSTM model using the test dataset. We conducted a comparison with five existing baseline methods: the MC, HMM, RNN, LSTM, and GRU models. Compared with the existing methods, the Markov-LSTM model significantly improved indoor location prediction performance, enhancing Accuracy@3 by 6.29-43.43%, Precision@3 by 3.79-44.8%, Recall@3 by 9.23-35.02%, and F1-Measure@3 by 13.80-39.68%. These results demonstrated the predictive performance of the Markov-LSTM model.
The hybrid Markov-LSTM model is a generalized prediction model that can be applied beyond indoor environments in future research. However, before its wider application, the following aspects require further study: (1) verification of the proposed model with a variety of data sources, such as GPS trajectories; (2) comprehensive comparisons with other prediction models; and (3) integration of more factors to boost model robustness, thereby further improving the performance of location prediction.

FIGURE 1. Basic definitions used in the prediction model: (a) movement of a user on the third floor and (b) the location sequence of a user in the indoor space.

FIGURE 2. Schematic of the overall Markov-LSTM model process.

FIGURE 3. Schematic showing the method of nearest-neighbor search.

4) ALGORITHMS AND OPTIMIZATION
In this study, the location sequence of user u, locSeq_u, is divided into three parts: historical samples, training samples, and test samples. The historical location sequence is used to construct the k-step transition probability matrix, Y_u^(k), of user u; the training location sequence is used to train the parameter θ of model M_u; and the test location sequence is used to test the prediction performance of model M_u. Algorithm 1 shows the M_u training process.

Algorithm 1 Markov-LSTM Training Process
Require:
  Individual trajectory: traj = {(u, t_i, x_i, y_i, f_i)}, i = 1, ..., m
  Hyper-parameters of Indoor-STDBSCAN: ε_1, ε_2, MinPts
  Distance threshold: δ
  Length of adjacency locations: k_u
1: Construct locSeq_u based on ε_1, ε_2, MinPts, and δ
   //construct first-step transition probability matrix
2: Divide locSeq_u into his_locSeq_u, tr_locSeq_u, and te_locSeq_u
3: Construct Y_u^(1) based on (4) with his_locSeq_u
   //construct training instances
4: D ← ∅
5: For next i ∈ [k_u, k_u + 1, ..., |tr_locSeq_u|] do
6:   Construct Y_u^(2), Y_u^(3), ..., Y_u^(k_u) by Y_u^(1)
7:   Obtain index of the previous k_u locations of tr_locSeq_u[i]: ix_i, ix_(i+1), ..., ix_(k_u)
8:   Put a training instance ({Y_u^(1)[ix_i], ..., Y_u^(k_u)[ix_(k_u)]}, tr_locSeq_u[i]) into D
   //train the model
9: Initialize the parameters θ
10: Repeat
11:   Randomly select a batch of instances D_b from D
12:   Find θ by minimizing (11) with D_b
13: Until stopping criteria is met
14: Output the learned Markov-LSTM model M_u

5) MODEL COMPLEXITY ANALYSIS
The definition of the k-step Markov model is a key step in the Markov-LSTM model. In this section, we analyze the advantages of the k-step Markov model from two perspectives: space complexity and computational complexity.
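As a rough illustration of steps 3 to 8 of Algorithm 1, the following Python sketch builds the step matrices and the training instances for one user. It rests on several assumptions that are not confirmed by the text: locations are encoded as integers 0..n-1, Y_u^(1) is estimated by row-normalized transition counts, and Y_u^(k) is obtained as the k-th matrix power of Y_u^(1). The helper names (`first_step_matrix`, `k_step_matrices`, `build_instances`) are hypothetical, and the LSTM training loop (steps 9 to 13) is omitted.

```python
import numpy as np


def first_step_matrix(hist_seq, n_locations):
    """Row-normalized first-step transition counts (assumed form of Y^(1))."""
    counts = np.zeros((n_locations, n_locations))
    for prev, nxt in zip(hist_seq[:-1], hist_seq[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as all zeros.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def k_step_matrices(y1, k):
    """Return [Y^(1), ..., Y^(k)], assuming Y^(j) is the j-th power of Y^(1)."""
    mats = [y1]
    for _ in range(k - 1):
        mats.append(mats[-1] @ y1)
    return mats


def build_instances(train_seq, mats):
    """Pair each target location with k rows, one per step matrix.

    Row j-1 of the feature array is the row of Y^(j) indexed by the
    location visited j steps before the target.
    """
    k = len(mats)
    instances = []
    for i in range(k, len(train_seq)):
        features = np.stack(
            [mats[j - 1][train_seq[i - j]] for j in range(1, k + 1)]
        )
        instances.append((features, train_seq[i]))
    return instances


# Toy example (hypothetical data): 3 locations, k_u = 2.
hist = [0, 1, 2, 0, 1, 2, 0, 2]
train = [0, 1, 2, 0]
y1 = first_step_matrix(hist, n_locations=3)
mats = k_step_matrices(y1, k=2)
instances = build_instances(train, mats)
print(len(instances), instances[0][0].shape)  # 2 (2, 3)
```

Each instance is a (k_u, n) array that could be fed to an LSTM as a sequence of length k_u. Under these assumptions, the sketch stores k_u matrices of size n × n, whereas a k-order Markov chain over n locations needs a transition table with n^k rows.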

FIGURE 5. Sampling interval distribution of the trajectory data.

FIGURE 6. Impact of the distance threshold (δ) on the prediction performance.

FIGURE 7. Impact of the number of best adjacent locations, k, on F1-Measure@1 for different users: (a) User 2, (b) User 28, and (c) User 42.

FIGURE 8. Number of best adjacent locations for each user.

FIGURE 10. Location prediction performance of the Markov-LSTM model.

FIGURE 11. Comparisons of baseline methods and the Markov-LSTM model using the shopping mall dataset: (a) location prediction accuracy, (b) location prediction precision, (c) location prediction recall, and (d) location prediction F1-measure.

TABLE 1. Samples of user trajectory data.

TABLE 2. Sample shops in the shopping mall.

TABLE 3. Specifications of the experimental environment.