An Attention-Based Spatiotemporal GGNN for Next POI Recommendation

The task of Point-of-Interest (POI) recommendation is to recommend the next locations of interest to users. The Gated Graph Neural Network (GGNN) has proven effective on POI recommendation tasks. However, existing GGNN solutions rarely consider the spatiotemporal information between nodes in the sequence graph, which is essential for modeling user check-in behaviors in next POI recommendation. In this paper, we propose an attention-based spatiotemporal gated graph neural network model (ATST-GGNN) for next POI recommendation. Firstly, the user's check-in sequence is represented as a graph structure. Secondly, we use spatiotemporal context information to dynamically update nodes in the sequence graph and obtain the complex transfer relationships between the check-ins. Thirdly, each session is represented as the composition of the long-term and short-term preferences using an attention network. Because the current short-term preference fails to model union-level sequential patterns, we improve the local embedding representation of graph nodes with a window pooling method, and improve the global embedding representation by integrating the local representation into the attention mechanism. Finally, the objective function is constructed with cross entropy and the model parameters are learned. The experimental results show that the precision and mean reciprocal rank of ATST-GGNN are substantially improved compared with state-of-the-art methods, which indicates good application prospects.


I. INTRODUCTION
WITH the rapid development of smartphones and global positioning systems, the positioning function provided by intelligent terminals is becoming increasingly accurate. Against this background, location-based social network (LBSN) services such as Gowalla, Yelp and Facebook Places have developed rapidly and attracted many users. Compared with traditional social networks, the advantage of an LBSN is that users can publish their geographic information in the form of check-ins at places such as gymnasiums, coffee shops and restaurants, and share their experiences with friends. Point-of-interest (POI) recommendation helps users find information of interest in the massive data of location-based social networks, discover new geographical locations, and make their lives more convenient. The task of POI recommendation is therefore to recommend the next location of interest for users [1], [2], [3], [4].
The traditional POI recommendation algorithm generally adopts matrix factorization. This method decomposes the user's check-in data, extracts the latent feature matrices of the users and the POIs, and makes recommendations for the user [5], [6], [7]. However, a user's check-in data are sequential. Current solutions generally use recurrent neural networks (RNNs) to analyze check-in sequence data [8], such as long short-term memory (LSTM) [9] and the gated recurrent unit (GRU) [10]. Research shows, however, that these methods can only model the one-way transitions between adjacent check-in points in a sequence, while ignoring transitions among the other check-ins. A more recent solution converts the user's check-in sequence to a graph and updates the nodes of the sequence graph with a gated graph neural network (GGNN), so as to obtain the complex transitions between different check-in points [11]. However, this method still has some problems. Firstly, GGNN does not consider the spatial and temporal relationships between nodes in the sequence graph. Secondly, every POI session is represented as the composition of the long-term and short-term preferences using an attention network, but the current short-term preference is just the last location and fails to model union-level sequential patterns.
To address the problems mentioned above, this paper proposes an attention-based spatiotemporal gated graph neural network (ATST-GGNN) for next POI recommendation, which further improves the accuracy of POI recommendation. The main contributions are as follows:
(1) A spatiotemporal GGNN model is proposed. The method first converts the user's check-in sequence to a graph. Then, three kinds of contextual information are integrated into the GGNN model. Finally, every node of the sequence graph is dynamically updated.
(2) A window-based pooling method is proposed to obtain the short-term representation of the user, and every POI sequence is represented as the composition of the long-term and short-term preferences using an attention network.
(3) We conduct empirical studies on two real-world datasets, Foursquare and Gowalla. Experimental results show that the proposed ATST-GGNN outperforms the state-of-the-art methods: compared with existing next POI recommendation models, the P@5, P@10, P@20, MRR@5, MRR@10 and MRR@20 values of ATST-GGNN all increase.
The remainder of this paper is organized as follows. Section II reviews work related to collaborative filtering recommendation, neural network recommendation, sequence recommendation and graph neural network recommendation. Section III introduces the preliminaries to our study. Section IV introduces the details of the proposed attention-based spatiotemporal GGNN network. Section V describes the experimental environment and results. Finally, Section VI concludes this paper and outlines our future work.

II. RELATED WORK
A. COLLABORATIVE FILTERING RECOMMENDATION
Collaborative filtering recommendation includes user-based collaborative filtering [12], item-based collaborative filtering [13] and matrix factorization recommendation [14], [15]. Zhang et al. [16] proposed a personalized and efficient geographical location recommendation framework (iGeoRec). It predicts the probability of a user visiting any new location using the user's personal distribution, with an efficient approximation method to compute this probability for all users and all new locations. Zhao et al. [17] proposed a POI mining method and a personalized recommendation model that fuse sentimental and spatial context. The sentimental-spatial POI mining method mines POIs by fusing the sentimental and geographical attributes of locations, and the sentimental-spatial POI recommendation model incorporates the factors of sentiment similarity and geographical distance. Lian et al. [18] proposed a scalable and flexible framework, dubbed GeoMF++, for joint geographical modeling and implicit feedback-based matrix factorization. They then developed an efficient optimization algorithm for parameter learning, which scales linearly with data size and the total number of neighbor grids of all locations. Gao et al. [19] proposed a POI recommendation algorithm that fuses time influence and matrix factorization. They introduced four time aggregation strategies and integrated different time states into the user's check-in preference.

B. NEURAL NETWORK RECOMMENDATION
Neural networks include fully connected neural networks [20], convolutional neural networks [21] and recurrent neural networks [8]. They have been applied to machine translation [22], image classification [23] and recommendation systems [24], [25]. Yang et al. [26] bridged collaborative filtering and semi-supervised learning: the latent characteristics of users and POIs are learned in a deep neural network, and the preferences of users for the next POI are predicted. Xing et al. [27] proposed a POI recommendation model based on convolution matrix factorization, which combines a convolutional neural network with probabilistic matrix factorization. The model combines user check-in information, social relationships, comment information, and so on. Ma et al. [28] proposed a POI recommendation algorithm combining self-attentive coding and neighbor-aware decoding. Self-attentive coding uses a multi-dimensional attention mechanism to analyze user preferences from different aspects, and neighbor-aware decoding takes advantage of geographic context information.

C. SEQUENCE RECOMMENDATION
With the development of POI recommendation, next POI recommendation based on sequence information has become a new hot topic. Cheng et al. [29] proposed a novel matrix factorization method, namely factorizing personalized Markov chain and region localization (FPMC-LR). It exploits tensor decomposition to recommend POIs close to the user's geographical position. Feng et al. [30] proposed a personalized ranking metric embedding method (PRME). It maps each location to a low-dimensional space to calculate the transfer probability between locations, and considers two influencing factors: user preference and sequential transition. With the development of natural language processing technology, recurrent neural networks have been successfully applied in the fields of text classification and machine translation. Liu et al. [31] proposed a recommendation method based on a spatial temporal recurrent neural network (ST-RNN). It combines time information and geographic information through time and distance transfer matrices. Zhao et al. [32] proposed a variant of LSTM, named spatial temporal LSTM (ST-LSTM), which adds time gates and distance gates to LSTM to capture the spatiotemporal relation between check-ins. Lian et al. [33] proposed a geography-aware sequential recommender based on the self-attention network (GeoSAN) for location recommendation. They represented the hierarchical gridding of each GPS point with a self-attention based geography encoder, and put forward geography-aware negative samplers to promote the informativeness of negative samples. Hao et al. [34] proposed an annular-graph attention based sequential recommendation model (AGSR) that explores the user's long-term and short-term preferences for personalized sequential recommendation. Huang et al. [35] proposed an attention-based spatiotemporal LSTM POI recommendation algorithm (ATST-LSTM). It combines time and space information on the basis of LSTM, and extracts effective information by soft attention. Wu et al. [36] proposed a novel method named personalized long and short term preference learning (PLSPL) to learn the specific preference of each user. They combined the long-term and short-term preferences via a user-based linear combination unit to learn personalized weights on the different parts for different users.

D. GRAPH NEURAL NETWORKS RECOMMENDATION
With the development of neural networks, graph neural networks can process graph-structured data, including social networks, knowledge graphs, and so on. The graph representation learning algorithm DeepWalk learns vector representations of graph nodes based on random walks [37]. The unsupervised network embedding algorithm LINE is designed to embed arbitrary types of graphs, including undirected, directed and weighted graphs [38]. Graph neural networks can also be applied to recommendation systems. Wu et al. [11] proposed session-based recommendation with graph neural networks (SR-GNN). The session sequences are modeled as graph structures, the vector representations of nodes in the graph are obtained by a gated graph neural network, and a soft-attention mechanism is then used to produce the session recommendation. Xu et al. [39] proposed a graph contextualized self-attention model (GC-SAN), which utilizes both a graph neural network and a self-attention mechanism for session-based recommendation. However, some challenges prevent graph neural networks from becoming the best solution for next POI recommendation.
First of all, graph neural networks fail to model the spatial and temporal relationships between nodes in the sequence graph. This context information can be very useful for learning node representations in the sequence graph [40], [41], [42]. Secondly, the attention mechanism has been designed for the user's long-term preference. Zhao et al. [32] reported that users' short-term and long-term preferences are both significant for achieving the best performance. However, the short-term preference assumes that the recommended POIs depend only on the last visited POI, so it fails to model union-level sequential patterns. In fact, several of the last actions jointly influence the target action [43].
To this end, in this paper, we propose an attention-based spatiotemporal GGNN, named ATST-GGNN. The GGNN-based check-in sequence graph of the user incorporates spatiotemporal contextual information. The local embedding representation of graph nodes is improved by a window pooling method, and the global embedding representation is improved by integrating the local representation into the attention mechanism. The proposed method improves the accuracy of POI recommendation.

III. PRELIMINARIES TO THIS STUDY
A. NOTATIONS AND DEFINITIONS
Notations are summarized in Table 1.
Definition 1 (POI). In LBSNs, a point-of-interest (POI) is a geographical location associated with spatial and temporal information.
Definition 2 (Check-in activity). A user's check-in activity is represented as $v^u_{t_i}$, which stands for user $u$ visiting location $v$ at time point $t_i$.
Definition 3 (Check-in sequence). A check-in sequence of a user $u$ is a set of check-in activities of the user, denoted by $S^u = \{v^u_{t_1}, v^u_{t_2}, \ldots, v^u_{t_n}\}$.
Definition 4 (Check-in sequence graph). A check-in sequence graph of a sequence consists of sets of check-ins and edges, $G^u = (Q^u, E^u)$, where $Q^u$ stands for the set of check-ins and $E^u$ stands for the set of edges between two adjacent nodes in the sequence graph; an edge means that the next location is $v^u_{t+1}$ after visiting location $v^u_t$.
The primary goal of this study is to offer the target user a list of locations of interest that the user is likely to visit next.

B. GATED GRAPH NEURAL NETWORK
The primary challenge of the next POI recommendation problem is learning personalized user preferences for POIs. A common choice is an RNN, LSTM or GRU architecture. However, how to obtain the transfer relationships between the check-ins in a check-in sequence and how to learn the embedding representations of graph-structured data are important problems that remain to be solved. As the standard GGNN architecture can automatically extract features of the sequence graph while taking rich node connections into account [44], in this study we use it as a building block for next POI recommendation. Formally, the update functions of graph nodes are given by formulas (1)-(5). In formula (5), the left part controls the current input information through $z^t_i$, and the right part controls the previously retained information through $1 - z^t_i$.
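For reference, the standard GGNN update of [44], in the form used by SR-GNN [11], can be sketched in the notation of this section as follows. This is a sketch under stated assumptions rather than the exact formulas of this paper: the matrix names $W_{in}$, $W_{out}$, $W_{ra}$, $W_{za}$, $W_{qa}$, $U_r$, $U_z$, $U_q$ and the bias terms are assumed names chosen for illustration, $Q^{t-1} = [q^{t-1}_1, \ldots, q^{t-1}_n]^\top$ stacks the node vectors, $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication. The last line is ordered so that the term weighted by $z^t_i$ comes first and the term weighted by $1 - z^t_i$ comes second.

```latex
\begin{align*}
a^t_i &= \left[\, A^{in}_i Q^{t-1} W_{in} + b_{in} \;;\; A^{out}_i Q^{t-1} W_{out} + b_{out} \,\right] \\
r^t_i &= \sigma\left( W_{ra}\, a^t_i + U_r\, q^{t-1}_i \right) \\
z^t_i &= \sigma\left( W_{za}\, a^t_i + U_z\, q^{t-1}_i \right) \\
\tilde{q}^t_i &= \tanh\left( W_{qa}\, a^t_i + U_q \left( r^t_i \odot q^{t-1}_i \right) \right) \\
q^t_i &= z^t_i \odot \tilde{q}^t_i + \left( 1 - z^t_i \right) \odot q^{t-1}_i
\end{align*}
```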
$A^{in}_i$ and $A^{out}_i$ are defined as the $i$-th rows of the neighbor connection matrix $A = [A^{in}, A^{out}]$, which represents the weighted connections of the sequence graph. For an example sequence, the corresponding graph $G^u = (Q^u, E^u)$ and the matrix $A$ are shown in Figure 1.
Each node has incoming edges $e^{in}_{ji}$ and outgoing edges $e^{out}_{ij}$. The weight of incoming edge $e^{in}_{ji}$ is represented by $A^{in}_{ji}$, and the weight of outgoing edge $e^{out}_{ij}$ is represented by $A^{out}_{ij}$ [45].
The function $count(x, y)$ indicates whether there is an edge between nodes $x$ and $y$. $N^{in}_i$ is the set of predecessor nodes of $v_i$, and $N^{out}_i$ is the set of successor nodes of $v_i$. The neighbor connection matrix element can be written as:
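A minimal Python sketch of one way to build such a neighbor connection matrix is given below. It assumes the normalization used by SR-GNN [11], where each edge indicator is divided by the number of predecessors (for incoming weights) or successors (for outgoing weights) of the node; the function name and the example sequence are hypothetical and only serve as illustration.

```python
def neighbor_connection_matrix(sequence):
    """Build normalized incoming/outgoing connection matrices for one sequence.

    `sequence` is a list of POI ids in visiting order. Returns (nodes, a_in, a_out):
    a_out[i][j] is the weight of edge nodes[i] -> nodes[j], and
    a_in[i][j] is the weight of edge nodes[j] -> nodes[i].
    """
    nodes = list(dict.fromkeys(sequence))        # unique POIs, first-visit order
    index = {poi: k for k, poi in enumerate(nodes)}
    n = len(nodes)

    # count(x, y) = 1 if y is visited directly after x at least once
    edges = set(zip(sequence, sequence[1:]))

    a_in = [[0.0] * n for _ in range(n)]
    a_out = [[0.0] * n for _ in range(n)]
    for x, y in edges:
        a_out[index[x]][index[y]] = 1.0
        a_in[index[y]][index[x]] = 1.0

    # Assumed normalization: divide each row by its sum (degree of the node)
    for matrix in (a_in, a_out):
        for row in matrix:
            total = sum(row)
            if total > 0:
                for j in range(n):
                    row[j] /= total
    return nodes, a_in, a_out


nodes, a_in, a_out = neighbor_connection_matrix(["v1", "v2", "v3", "v2", "v4"])
```

In this example, node v2 has two successors (v3 and v4), so each outgoing edge of v2 receives weight 0.5, while the last node v4 has no outgoing edges and its row of `a_out` stays zero.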

IV. OUR APPROACH
In this section, we introduce how we build the proposed ATST-GGNN. The architecture of ATST-GGNN is shown in Figure 2. Firstly, a user's check-in sequence is converted to a sequence graph fusing spatiotemporal information. Secondly, three kinds of contextual information are integrated into the ST-GGNN to realize the dynamic update of node vectors. Thirdly, the short-term representation of the user is obtained by a window-based pooling method, and an attention network is used to learn the long-term representation of the user. Finally, the loss function is constructed with cross entropy, and the model parameters are optimized.

A. SPATIOTEMPORAL CHECK-IN SEQUENCE GRAPH
Considering the effect of spatiotemporal contextual information on real-world check-in activities, modeling the geographical and temporal influence to dynamically update the nodes in the sequence graph is essential for predicting the next location. To model such information more effectively in the sequence graph, a model called spatiotemporal GGNN (ST-GGNN) is proposed. Firstly, we employ the distance interval $\Delta d_{x,y}$ and the time interval $\Delta t_{x,y}$ to define the spatial weight and the temporal weight between nodes in the sequence graph, respectively. The graph structure of the user's check-in sequence is converted to the neighbor connection matrix, the time connection matrix and the space connection matrix. Secondly, we dynamically update node vectors in the ST-GGNN model using three kinds of contextual information: neighbor, time and space. Finally, POI embedding representations of the graph nodes are obtained by iterating ST-GGNN.
In the sequence graph, the incoming and outgoing weights based on the time interval are denoted $T\_A^{in}_{ji}$ and $T\_A^{out}_{ij}$, and those based on the distance interval are denoted $D\_A^{in}_{ji}$ and $D\_A^{out}_{ij}$.
The function $T\_weight(x, y)$ is the weight of the time interval between locations $x$ and $y$:

$$T\_weight(x, y) = \begin{cases} \dfrac{\Delta t_{x,y}}{tt}, & \Delta t_{x,y} > tt \\ 0, & \Delta t_{x,y} < tt \end{cases}$$

in which $\Delta t_{x,y}$ is the time interval between the locations and the parameter $tt$ is the time scaling factor. The function $D\_weight(x, y)$ is the weight of the distance interval between locations $x$ and $y$:

$$D\_weight(x, y) = \begin{cases} \dfrac{\Delta d_{x,y}}{dd}, & \Delta d_{x,y} > dd \\ 0, & \Delta d_{x,y} < dd \end{cases}$$

in which $\Delta d_{x,y}$ is the distance interval between the locations, computed with the haversine formula from the longitude and latitude of the two locations, and the parameter $dd$ is the distance scaling factor. Besides the neighbor connection matrix $A$, the check-in sequence graph fusing spatiotemporal information can be converted into two further connection matrices, the time connection matrix $A_T$ and the spatial connection matrix $A_D$, which can be written as:
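The two weight functions and the haversine distance can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, distances are assumed to be in kilometres, and the case where the interval equals the scaling factor, which the definition leaves unspecified, is assumed here to give a weight of 0.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed unit: kilometres)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def interval_weight(interval, scale):
    """Piecewise weight: interval / scale when the interval exceeds the scale, else 0.

    Serves for both T_weight (scale = tt) and D_weight (scale = dd).
    The equality case is not specified in the text; 0 is an assumption.
    """
    return interval / scale if interval > scale else 0.0


# Hypothetical example: two check-ins 60 time units apart with tt = 30
t_weight = interval_weight(60.0, 30.0)
# Hypothetical example: distance between two coordinates, scaled with dd = 200
d_weight = interval_weight(haversine_km(40.7128, -74.0060, 34.0522, -118.2437), 200.0)
```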

B. DYNAMIC UPDATE OF NODE VECTORS
In order to combine spatiotemporal information in the dynamic update of nodes, the graph node vectors $q^u_i \in \mathbb{R}^{d}$ are converted into graph input vectors $a^u_i \in \mathbb{R}^{2d}$, $a^u_{Ti} \in \mathbb{R}^{2d}$ and $a^u_{Di} \in \mathbb{R}^{2d}$ by $A$, $A_T$ and $A_D$, respectively.
The graph input vectors $a^u_i$, $a^u_{Ti}$ and $a^u_{Di}$ combine neighbor connection information, time connection information and spatial connection information, respectively. Formulas (18), (19) and (20) are improvements on formulas (2), (3) and (4). $W_{dis\_ra}$, $W_{time\_ra}$, $W_{dis\_za}$, $W_{time\_za}$, $W_{dis\_qa}$ and $W_{time\_qa} \in \mathbb{R}^{d \times d}$ are also parameter matrices. The hyperparameter $\gamma$ adjusts the relative effect of time and distance on the recommended next POI.
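To make the role of the three graph input vectors and of $\gamma$ concrete, the following NumPy sketch shows one possible form of the spatiotemporal gated update. It is an assumption-laden illustration, not the paper's formulas (18)-(20): the gates follow the standard GGNN structure, the time input is weighted by $1 - \gamma$ and the distance input by $\gamma$ (consistent with the later observation that a small $\gamma$ gives the time factor the larger weight), the matrices named `U_*` are hypothetical, and the `W_*` matrices are given shape (d, 2d) here so that they can multiply the 2d-dimensional input vectors.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def st_ggnn_step(q_prev, a, a_time, a_dist, params, gamma):
    """One node update mixing neighbor, time and distance inputs (illustrative form).

    q_prev: (d,) previous node vector; a, a_time, a_dist: (2d,) graph input vectors.
    params: dict of weights; "W_*" have shape (d, 2d) and "U_*" have shape (d, d).
    gamma: distance weight in [0, 1]; the time input is weighted by 1 - gamma (assumed).
    """
    def mixed(gate):
        return (params["W_" + gate + "a"] @ a
                + (1.0 - gamma) * (params["W_time_" + gate + "a"] @ a_time)
                + gamma * (params["W_dis_" + gate + "a"] @ a_dist))

    r = sigmoid(mixed("r") + params["U_r"] @ q_prev)            # reset gate
    z = sigmoid(mixed("z") + params["U_z"] @ q_prev)            # update gate
    cand = np.tanh(mixed("q") + params["U_q"] @ (r * q_prev))   # candidate state
    return z * cand + (1.0 - z) * q_prev                        # gated combination


d = 4
rng = np.random.default_rng(0)
names = [p + g + "a" for p in ("W_", "W_time_", "W_dis_") for g in "rzq"]
params = {name: rng.normal(scale=0.1, size=(d, 2 * d)) for name in names}
params.update({"U_" + g: rng.normal(scale=0.1, size=(d, d)) for g in "rzq"})
q_new = st_ggnn_step(rng.normal(size=d), rng.normal(size=2 * d),
                     rng.normal(size=2 * d), rng.normal(size=2 * d),
                     params, gamma=0.4)
```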

C. EMBEDDING REPRESENTATION OF SEQUENCE
The check-in behaviors of users are periodic and regular, so each location in the check-in sequence contributes differently to next POI recommendation, and irrelevant check-ins can generate noise. Thus, an attention mechanism is used in this paper, and an attention-based spatiotemporal GGNN model is proposed.
The POI embedding of the check-in sequence obtained by ST-GGNN is $hidden \in \mathbb{R}^{n \times d}$. In order to obtain the user's short-term representation $s_l$, a window-based pooling method is proposed.
$hidden_j$ is the embedded representation of the $j$-th node closest to the next POI, and $w$ is the size of the window. The long-term representation $s_g$ of the user is obtained by the attention mechanism. The weight coefficient $\alpha_i$ is designed as follows.
$W_\alpha, W_l \in \mathbb{R}^{h \times l}$ are the weight matrices of the embedded vectors. The long-term representation $s_g$ is shown in the following formula.
$\alpha_i$ is the corresponding weight coefficient for sequence graph node $q_i$. Finally, the long-term and short-term representations are combined, and the embedded representation of the sequence graph, $s_h$, is obtained by linear transformation.
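The pipeline described in this subsection can be sketched in NumPy as follows. Several details are assumptions made only for illustration: mean pooling is used as the window pooling operator, the attention weights follow the soft-attention form of SR-GNN [11] with the last-node vector replaced by $s_l$, and the parameters `c`, `p` and `W_h` are hypothetical names.

```python
import numpy as np


def session_embedding(hidden, w, W_alpha, W_l, c, p, W_h):
    """Combine short- and long-term representations of one check-in sequence.

    hidden: (n, d) node embeddings in visiting order.
    w: window size; the last w rows are mean-pooled (pooling operator assumed).
    W_alpha, W_l: (d, d); c, p: (d,); W_h: (d, 2d). All parameters are hypothetical.
    """
    s_l = hidden[-w:].mean(axis=0)                                      # short-term
    gate = 1.0 / (1.0 + np.exp(-(hidden @ W_alpha.T + W_l @ s_l + c)))  # (n, d)
    alpha = gate @ p                                                    # (n,) weights
    s_g = alpha @ hidden                                                # long-term
    s_h = W_h @ np.concatenate([s_l, s_g])                              # linear map
    return s_l, s_g, s_h


# Tiny example with n = 3 nodes and d = 2 dimensions
hidden = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = hidden.shape[1]
s_l, s_g, s_h = session_embedding(
    hidden, w=2,
    W_alpha=np.zeros((d, d)), W_l=np.zeros((d, d)),
    c=np.zeros(d), p=np.ones(d),
    W_h=np.hstack([np.eye(d), np.eye(d)]),
)
```

With $w = 1$ the short-term representation reduces to the embedding of the last visited POI, which is the special case that the window-based method generalizes.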

D. MODEL TRAINING
The score of each candidate check-in is calculated from the embedding representation $s_h$ and the check-in feature matrix $L$.
The probability distribution over candidate check-ins is then calculated by the softmax function:

$$\hat{y} = \mathrm{softmax}\left(s_h \times L^{T}\right) \quad (25)$$

Let $y$ denote the true next location in the POI sequence. The loss function is defined as the cross-entropy of the prediction $\hat{y}$ and the ground truth $y$.
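The prediction step of formula (25) and a cross-entropy loss can be sketched in NumPy as below. The softmax follows formula (25) directly; the loss uses the one-hot categorical form of cross-entropy, which is an assumption, and the function names are hypothetical.

```python
import numpy as np


def predict_proba(s_h, L):
    """y_hat = softmax(s_h × L^T); L is the (num_pois, d) check-in feature matrix."""
    scores = L @ s_h                  # same values as s_h × L^T
    scores = scores - scores.max()    # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()


def cross_entropy(y_hat, target_index, eps=1e-12):
    """Cross-entropy between y_hat and a one-hot ground truth at target_index."""
    return -float(np.log(y_hat[target_index] + eps))


# Hypothetical example: 3 candidate POIs with 2-dimensional embeddings
L = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_hat = predict_proba(np.array([0.2, 0.9]), L)
loss = cross_entropy(y_hat, target_index=2)
```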
All parameters are updated by stochastic gradient descent to minimize the objective function until it converges. The steps of the attention-based spatiotemporal GGNN POI recommendation algorithm are as follows:
Algorithm 1. Training of ATST-GGNN
Input: set of check-in sequences C, time scaling factor tt, distance scaling factor dd, weight coefficient γ.
Output: ATST-GGNN model parameters Θ
// construct training instances
1. For each user u in U do
2. For each location $v^u_t \in S^u$ do
3.
7. Add (train_time_i, train_dist_i) to $D^u$;
8. End for
9. Add a training list $D^u$ to D;

V. EXPERIMENTS
In this section, we first present the experimental datasets, evaluation metrics and parameter settings. Then, we compare the proposed method with baseline methods. We also analyze performance with different components, different vector dimensions, window sizes ($w$), and the effect of time and distance ($\gamma$). Finally, the influence of the number of layers of ATST-GGNN is analyzed on the two datasets. Through these experiments we intend to answer four research questions, concerning the overall performance against baselines (RQ1), the contribution of each component (RQ2), the influence of the parameters (RQ3), and the influence of the number of model layers (RQ4).

A. DATASETS
The experiments use check-in data from Foursquare and Gowalla. Both sites offer check-in services that allow users to share information about their current location, which contains user information, check-in information, location information, time information, and so on [33]. The two datasets are first preprocessed. Following Zhang et al. (2019), users with fewer than 10 check-ins are removed, and POIs visited fewer than 10 times are removed. 70% of each dataset is used as the training set, 20% as the test set, and the remaining 10% as the validation set to tune parameters. Statistical characteristics of the two datasets are shown in Table 2.
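The preprocessing and split described above can be sketched in plain Python as follows. The representation of a check-in as a `(user, poi)` tuple, the order of the two filtering passes, and the order-preserving split are assumptions made for illustration; the text does not specify them.

```python
from collections import Counter


def filter_checkins(checkins, min_count=10):
    """Drop users, then POIs, with fewer than `min_count` check-ins (one pass each)."""
    user_counts = Counter(user for user, _ in checkins)
    kept = [(user, poi) for user, poi in checkins if user_counts[user] >= min_count]
    poi_counts = Counter(poi for _, poi in kept)
    return [(user, poi) for user, poi in kept if poi_counts[poi] >= min_count]


def split_70_20_10(items):
    """Split a list into 70% train, 20% test and 10% validation, preserving order."""
    n_train = len(items) * 7 // 10
    n_test = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])


# Hypothetical toy data: user "a" has 10 check-ins, user "b" only 3
checkins = [("a", "p")] * 10 + [("b", "p")] * 3
filtered = filter_checkins(checkins)
train, test, valid = split_70_20_10(filtered)
```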

B. EVALUATION METRICS
Precision@K and MRR@K (mean reciprocal rank) are used as evaluation metrics, where K is the number of recommended POIs. Precision is the proportion of correctly recommended locations among the top-K locations. MRR is the average of the reciprocal ranks of the correctly recommended items. The MRR measure takes the order of the recommendation ranking into account: a larger MRR indicates that the correct recommendations are nearer the top of the ranking list.
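A minimal Python sketch of the two metrics is given below. It assumes that each test case has a single true next POI, so that Precision@K is computed as the fraction of test cases whose true POI appears in the top-K list (the convention of SR-GNN [11]), and that a target ranked below K contributes 0 to MRR@K. The function names are hypothetical.

```python
def precision_at_k(ranked_lists, targets, k):
    """Fraction of test cases whose true next POI appears in the top-k list."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank; a target ranked below k contributes 0."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / (top.index(t) + 1)
    return total / len(targets)


# Three hypothetical test cases whose true next POI is always "a"
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
p_at_2 = precision_at_k(ranked_lists, targets, k=2)
mrr_at_2 = mrr_at_k(ranked_lists, targets, k=2)
```

In this example the target is ranked first, second and third respectively, so two of three cases count as hits at K = 2 and the reciprocal ranks are 1, 1/2 and 0.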

C. PARAMETER SETTING
The experimental platform consists of an Intel i5 CPU, 8 GB of RAM and the Windows 7 operating system. PyCharm is used as the development tool, Python 3.5 as the programming language, and PyTorch as the neural network learning framework. In the experiments, the learning rate lr is 0.001, the L2 penalty λ is 0.003, the batch size batch_size is 20, and the number of training epochs epoch_num is 32. The distance scale factor is 200 and the time scale factor is 30. The recommendation length K is 5, 10 and 20, respectively.

D. PERFORMANCE COMPARISON (RQ1)
First, for question RQ1, to demonstrate the overall performance of the proposed model, we compare it with seven state-of-the-art POI recommendation methods.
• BPR-MF [46] optimizes the matrix factorization model on implicit feedback data using a pairwise ranking loss.
• FPMC-LR [29] is a POI recommendation method based on three-order tensor decomposition.
• LSTM [9] applies a recurrent neural network to learn users' sequential behaviors from the check-in location sequences.
• ST-RNN [31] combines spatial and temporal information based on an RNN for POI recommendation.
• PLSPL [36] considers the spatial and temporal features of POIs in user history records and leverages an attention mechanism to capture user preference for POI recommendation.
• SR-GNN [11] is a session recommendation method based on a gated graph neural network. Here it is applied to POI recommendation.
• GC-SAN [39] utilizes both a gated graph neural network and a self-attention mechanism for session-based recommendation.
The overall performance in terms of Precision@K and MRR@K is shown in Table 3.
On both datasets, BPR-MF and FPMC-LR are the worst performers, because they do not analyze the sequence features of check-ins. The LSTM algorithm is superior to BPR-MF and FPMC-LR in Precision and MRR, because it analyzes the sequence characteristics of check-ins with a long short-term memory network.
For the Foursquare, compared with the ST-RNN, P@5, P@10, P@20, MRR@5, MRR@10 and MRR@20 of PLSPL are increased, on average, by 27 Compared with the PLSPL, SR-GNN and GC-SAN recommendation algorithms have greatly improved in Precision and MRR. It shows that the gated graph neural network has obvious advantages in improving the performance of POI recommendation.
Finally, our method ATST-GGNN always achieves the best performance regardless of dataset and evaluation metric, and gains 11.39% in Precision and 13.79% in MRR on average against the strongest baseline, GC-SAN. In ATST-GGNN, we use spatiotemporal context information to dynamically update nodes in the sequence graph and obtain the complex transfer relationships between the check-ins. We also improve the local embedding representation of graph nodes with the window pooling method, and the global embedding representation by integrating the local representation into the attention mechanism. These results indicate that ATST-GGNN can indeed enhance the performance of next POI recommendation.

E. ABLATION STUDY (RQ2)
Next, we turn to RQ2. To verify the importance of the designed ST-GGNN and the proposed attention module, the different components of the model are compared. GGNN indicates that neither the spatiotemporal information nor the attention module is used, and only the GGNN model is used. ST-GGNN indicates that the attention module is not used, and only the spatiotemporal information and the GGNN model are used. The experimental results are reported on Foursquare. We draw the following conclusions from Table 4. The GGNN model performs worst in terms of Precision and MRR compared with ST-GGNN and ATST-GGNN. The ATST-GGNN model improves Precision by 3%-11% and MRR by 8%-10% compared with the ST-GGNN model. This shows that both the spatiotemporal information and the attention module are important for POI recommendation.

F. PARAMETER STUDY (RQ3)
To answer RQ3, we study how the vector dimension and the hyperparameters $w$ and $\gamma$ affect the recommendation results of the ATST-GGNN model.

2) Analysis with w
In this subsection, we investigate the impact of the hyperparameter w. It is set to 1, 2, 3, 4, 5, 6, 7, 8 and 9, respectively. As shown in Fig. 4, the performance of our model first improves and then decreases as the window size grows. MRR reaches its maximum at w = 5 on Foursquare and at w = 7 on Gowalla. This indicates that the window-based pooling method helps to improve the accuracy of recommendations. For training efficiency, we choose w = 5 on Foursquare and w = 7 on Gowalla for our task.

3) Analysis with γ
The hyperparameter γ reflects the effect of time and distance on POI recommendation in the experiment. The experimental results are observed by adjusting the value of γ on the Foursquare dataset. The hyperparameter γ is set to 0.2, 0.4, 0.6 and 0.8, respectively. Fig. 5 shows that the effects of the distance factor and the time factor on POI recommendation differ as γ changes. When 0.2 ≤ γ ≤ 0.4, that is, when the weight of the time factor is larger than that of the distance factor, the performance of POI recommendation improves as γ increases. When 0.4 < γ ≤ 0.8, that is, when the weight of the time factor is less than that of the distance factor, the performance of POI recommendation decreases as γ increases. The experimental results show that time factors better determine the next POI that the user will visit. For example, within their daily life circle, users tend to go to a restaurant at noon, a coffee shop in the afternoon, and probably a bar in the evening.

G. MODEL LAYER STUDY (RQ4)
The number of layers of ATST-GGNN reflects the order of the neighborhood from which each node obtains adjacency relationships in the check-in sequence graph. To answer RQ4, we compare the effects of different numbers of layers on Foursquare and Gowalla. Fig. 6 shows that on Foursquare, the Precision and MRR of POI recommendation are largest when layer = 1, and decrease gradually when layer > 1. This indicates that extracting the first-order neighbor features of nodes gives better performance. On Gowalla, the Precision and MRR of POI recommendation increase gradually when 1 < layer ≤ 4, reach their maximum when layer = 4, and decrease gradually when 4 < layer ≤ 6. This shows that extracting the fourth-order neighbor features of nodes gives better performance. The experimental results show that a lower-density dataset needs higher-order neighbor features of nodes to be extracted.

VI. CONCLUSIONS
In this paper, we have proposed an attention-based spatiotemporal GGNN model for next POI recommendation. Firstly, the GGNN-based check-in sequence graph of the user incorporates spatiotemporal contextual information. Secondly, the local embedding representation of graph nodes is improved by the window pooling method, and the global embedding representation is improved by integrating the local representation into the attention mechanism. Finally, we test the algorithm on two open datasets. The experimental results show that the Precision and MRR of ATST-GGNN are greatly improved compared with other recommendation methods. In future work, we will further improve the performance of POI recommendation by incorporating social networks and more effective attention mechanisms.