A Point-of-Interest Recommendation Algorithm Combining Social Influence and Geographic Location Based on Belief Propagation

Location-based social networks (LBSNs) contain a large amount of user check-in data. To improve recommendation performance and mitigate the impact of check-in data sparsity, this paper proposes mining the time-category information in users' check-in data to model a network, and then applying the belief propagation algorithm on the resulting time-category Markov network to obtain each user's social influence set. The similarity and familiarity of the social users in this set are calculated, and the unified social influence factor is linearly combined with the geographical influence to recommend locations. Experiments on the Foursquare dataset show that, compared with other algorithms, the proposed recommendation algorithm combining social influence and geographic location based on belief propagation achieves improved performance.

In recent years, with the rapid development of communication networks, mobile devices and positioning systems, more and more users check in at locations using smart mobile devices and generate check-in information on social software. This "user check-in" behavior has given rise to a growing number of location-based social networks (LBSNs). At present, typical LBSNs include WeChat and public reviews domestically, and Foursquare, Gowalla and Facebook locations abroad [1]. In an LBSN, when users check in at a location, they share location information on social platforms or smart devices and post comments about the location. Using the variety of information in the LBSN to recommend, out of the tens of thousands of locations in a large city, those that users are interested in or most likely to check in at (such as scenic spots, restaurants and shopping malls) is called point-of-interest (POI) recommendation. Different from traditional recommendation, location recommendations are location-aware services that combine contextual information. For example, student A, who is studying at school, usually goes shopping in an entertainment mall near the school. On holidays, student A wants to go sightseeing in other provinces and cities, so how should locations for vacation travel be chosen? What tourist locations would student A be interested in? Answering these questions requires combining context and other information for location recommendation.
The associate editor coordinating the review of this manuscript and approving it for publication was Fabrizio Messina.
In an LBSN, the rich information attached to locations is used to analyze and mine users' mobile check-in behavior and to recommend corresponding locations. On the one hand, this provides users with a quick and convenient retrieval service at the right time, effectively avoiding the inconvenience caused by information overload; on the other hand, it stimulates users' desire to explore locations and brings commercial value by pushing recommended information. Research on POI recommendation is attracting increasing attention. Common recommendation methods include content-based, collaborative filtering-based, model-based and hybrid methods [2]. At present, much existing research is based on collaborative filtering, whose main idea is to find the similar neighbors of the target object and recommend the items those neighbors like or are interested in. With the rapid development of LBSNs, providing people with location services is increasingly important, and scholars are paying more and more attention to location-based recommendation. Ye et al. [3] first proposed using collaborative filtering for POI recommendation, integrating the user's personal preferences, social influence and geographic location to improve the algorithm and the recommendation results. Ren et al. [4] used a Gaussian model to model the social correlation between users and their friends and to mine similar locations for users, which not only improves the recommendation effect but can also be used to address the cold start problem. Lee et al. [5] proposed an improved entropy-based calculation of user similarity that, combined with the differences in user ratings, improves the selection of neighboring user sets.
In reality, however, the number of POIs is very large, while the locations at which any one user checks in are very few. In particular, many POIs remain unknown to a given user, so collaborative filtering for location recommendation faces serious data sparsity, along with challenges such as poor recommendation quality and cold start. Moreover, users' preferences change over time. For example, in daily life, people tend to eat at home in the morning but choose a nearby entertainment location for dinner at night. Therefore, when recommending locations to users, it is necessary to comprehensively consider time, geographical, social and other factors.
To address the above problems, this paper proposes an improved location recommendation algorithm. 1) To account for the influence of time, time is segmented, and a network is modeled by mining the connections among users' check-ins at different categories of locations in the corresponding time period; 2) the belief propagation algorithm is applied on this network to obtain the set of users with greater influence under each location category; 3) based on the obtained user set, the similarity and trust degree of users' check-ins are calculated and linearly combined with the influence of geographical location to form a location recommendation list. This algorithm not only mitigates the influence of data sparsity and improves recommendation performance, but can also recommend locations to new users.

II. RELATED WORKS
In an LBSN, users describe and share locations at particular times, and the resulting location recommendations are becoming more and more popular. Many scholars at home and abroad have merged rich geographic, temporal and social information to achieve location recommendation and improve its performance.

A. IMPACT BASED ON GEOGRAPHIC LOCATION
In location recommendation algorithms, research on the influence of geographic location is particularly important. In reality, everything is related, but nearby things are more closely related, and the probability of POI check-in follows a power-law distribution [3]. Existing research usually uses power-law and Gaussian distributions to characterize the distribution of distances between locations and to analyze the influence of location. In [6], a multi-center Gaussian model is proposed, based on the view that the locations where a user usually checks in are clustered around certain centers. Mining these centers improves the use of geographic influence in recommendation; the disadvantage is that it is difficult for the model to determine the location centers for different users. In [7], kernel density estimation is used to convert the geographic location model into the distance distribution between users. This method is computationally difficult when estimating distances and has a narrow application area. In [8], the influence of geographic location is incorporated into the recommendation process, and a power-law distribution is used to characterize the geographic influence between locations. The influence score of geographic location on the recommendation is then computed with the Naive Bayes method, and a unified recommendation is made. In [9], the influence of geographic location is modeled by integrating three factors: geographical influence, sensitivity and the distance between locations.

B. INFLUENCE BASED ON SOCIAL FACTORS
As research progressed, it was found that recommending locations based only on geographic factors leaves considerable room for improvement, and that other factors in the user's check-in data also affect the final recommendation.
With the development of social networks, [3] added the influence of social relationships to a collaborative filtering-based algorithm, observing that users are influenced not only by friends but also by others around them. In [10], a social probabilistic matrix factorization model was first proposed, which integrates the trust relationships between users' social friends into the matrix factorization process to optimize the recommendation.
In many POI recommendation processes, when constructing a user preference model, only the similarity between users or the similarity of user check-in categories is considered, without combining the two; in constructing the social model, only direct trust relationships are used, which makes social modeling difficult for users who have no or few friends. In [11], based on social networks, the transfer of local trust relationships and the transfer of distrust relationships are studied separately, and the two are combined to strengthen the influence of social relationships and further improve the recommendation effect.
In summary, much research on social relationship-based location recommendation considers only the target user's set of friends. It does not take into account the impact of non-friend relationships, and it ignores the fact that friends of friends may also be connected to the user.

C. INFLUENCE BASED ON TIME
In real life, people attach great importance to time, and their activities revolve around its periodicity. For example, on weekdays people usually appear at companies, organizations and similar locations, while at other times they go to shopping malls, restaurants and other entertainment locations. As time changes, user behaviors and interests change as well.
Therefore, in an LBSN, a user's check-in at a location generates time information, and locations can be recommended by mining and analyzing the temporal characteristics of user behavior. Accordingly, [12] divides time into working days and rest days, and [13] segments the user's check-in time by dividing the 24 hours of the day into 24 time segments. In [14], time is treated as continuous, and a Markov chain is constructed to predict the location to be checked in at next.
Many time-based location recommendation methods study location changes only with respect to time. Either the time granularity is too coarse or dynamic recommendation cannot be carried out at specific time points, and the hidden relationship between time and the other influencing factors in location recommendation is not considered.

D. IMPACT BASED ON LOCATION CATEGORY
At present, there is little research on recommendation based on location categories, although categories can reflect users' potential preferences and play an important role in certain scenarios.
The check-in data of an LBSN contains a lot of category information, and a location can carry multiple category labels. These category labels can be used to complete location-based recommendations. In [15], time components were used to describe the user's check-in behavior at different types of locations as a time curve. The influence of the time curve on check-ins was studied, and user behavior similarity was then calculated based on the time curve. In [16], a third-order tensor optimized by the List Wise Bayesian Personalized Ranking (LBPR) method was used to predict the category of a user's check-in; two functions, the Plackett-Luce model and cross entropy, were introduced to calculate the subsequent ranking. The predicted categories were then filtered according to spatial influence and category ranking influence.
Based on the deficiencies in the above research, this paper proposes an improved location recommendation algorithm based on time-category, social relationship and geographical factors. The main contributions are as follows:
(1) The algorithm segments time in order to study the dynamic changes of user preferences.
(2) By combining location category and time, the hidden connection between category and time in users' check-ins is mined to build a model, alleviating the impact of data sparsity on recommendations.
(3) The belief propagation algorithm is applied on the modeled network to generate a social relationship-based influence set, which contains both friend and non-friend relationships. By calculating similarity and trust degree, the influence of social relationships on recommendation is studied under the time-category division.
(4) For the cold start problem, new users can be recommended the locations checked in at by users in the obtained influence set, or the locations with a high check-in probability.
(5) Experimental verification on the data set shows that the proposed algorithm outperforms the comparison algorithms.

III. TIME-CATEGORY NETWORK MODELING
A. NETWORK MODELING BASED ON TIME-CATEGORY FEATURES
The check-in data in an LBSN contains timestamp information and location category information. The category information indicates the style of a location and the services it provides. By analyzing the categories of the places where a user has checked in, the user's personal preferences and intentions can be obtained. In the Foursquare data set, all POIs are divided into eight different categories, and users check in at different categories of locations at different times, so considering the relationship between time and category helps produce better recommendations.
Based on experience [13], this paper divides the 24 hours of a day into 24 time periods; for example, 0:00 to 1:00 is the first time period and 23:00 to 0:00 is the 24th. Users check in at a certain type of location within a certain time period, and different users plan their time differently. By mining the patterns with which users check in at different types of places in each time period, hidden relationships can be discovered. For example, user A goes to restaurant A near company 1 for lunch at 12:10, and after lunch, at 12:30, goes to store 1 to buy a drink; user B picks up a packed meal at restaurant B near company 2 at 12:40, and then goes to convenience store 2 at 12:50 to buy water. In the same time period, user A and user B have checked in at different locations, but the locations belong to the same categories. If this behavior occurs multiple times, the relationship can be mined.
To discover such connections between users based on time-category check-ins, we construct a category-time check-in network graph [17]. In this time-category check-in network graph $G = (C, E)$, $C$ is the set of location categories and $E$ is the set of edges. The edge weight $W$ indicates how many users have checked in at the category locations at both ends of the edge within a certain time period; the value in parentheses "()" is the number of check-in locations. If a user checks in at locations of categories $c_i$ and $c_j$ within a certain time period, the weight $W_{i \to j}$ of the edge $E_{i \to j}$ is increased by 1. After all users have been traversed, the accumulated user count is the weight $W_{i \to j}$ of the directed edge from $c_i$ to $c_j$, as shown in Fig. 1.
Similarly, a user-based category-time check-in graph $G = (U, E)$ is constructed. In this network graph, $U$ is the set of users, $E$ is the set of edges and $W$ is the edge weight; the value in parentheses "()" indicates the number of users who checked in. If, within a certain time period, users $u_i$ and $u_j$ have checked in at a location of the same category, the weight $W_{i \to j}$ of their edge $E_{i \to j}$ is increased by 1. After all check-in locations have been traversed, the accumulated count is the weight of the directed edge from $u_i$ to $u_j$, as shown in Fig. 2.
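The construction of the directed category graph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: check-ins are assumed to arrive as hypothetical `(user, hour, category)` tuples already sorted by time, the hour of day (0 to 23) is used directly as the time period, and the function name `build_category_graph` is introduced here for illustration only.

```python
from collections import defaultdict

def build_category_graph(checkins):
    """Return {(c_i, c_j): W_ij}, the number of distinct users who checked in
    at category c_i and then at category c_j within the same hourly period.

    checkins: iterable of (user, hour, category), assumed sorted by time.
    """
    # Group each user's categories by hourly period, preserving visit order.
    visits = defaultdict(list)
    for user, hour, category in checkins:
        visits[(user, hour)].append(category)

    # For every ordered pair of distinct categories in a period, remember the user.
    users_on_edge = defaultdict(set)
    for (user, _hour), categories in visits.items():
        for first in range(len(categories)):
            for second in range(first + 1, len(categories)):
                c_i, c_j = categories[first], categories[second]
                if c_i != c_j:
                    users_on_edge[(c_i, c_j)].add(user)

    # Each user contributes at most 1 to an edge weight.
    return {edge: len(users) for edge, users in users_on_edge.items()}

# The lunch example from the text: users A and B both go from food to shop at noon.
example_checkins = [
    ("A", 12, "food"), ("A", 12, "shop"),
    ("B", 12, "food"), ("B", 12, "shop"),
    ("C", 9, "food"),
]
category_weights = build_category_graph(example_checkins)
```

In this toy input only the edge from `food` to `shop` is created, with weight 2, because user C has a single check-in in its period. The user graph $G = (U, E)$ could be built the same way by swapping the roles of users and categories.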

B. NETWORK UNDIRECTED GRAPH CONVERSION
The (user) time-category check-in network graph constructed above is then converted to an undirected graph. For a given location category set $C = \{C_1, C_2, \cdots, C_N\}$, each category sample $c_i$ in the set is regarded as a vertex $v_i$ of the undirected graph model, which yields the vertex set $V = \{V_1, V_2, \cdots, V_N\}$ [18]. The edge set is defined as $E$ and expressed as the adjacency matrix of the graph. The element $E(i, j)$ in the adjacency matrix indicates that the vertices $v_i$ and $v_j$ have a time-dependent relationship, that is, there is a temporal relationship between the corresponding categories of check-in locations.
Its value is determined as follows: (1) if, in the entire location category data set, users have checked in successively at locations of categories $c_i$ and $c_j$ within a certain time segment, an edge is defined between $c_i$ and $c_j$; (2) the maximum number of users who have checked in at these categories in sequence is defined as the weight value of $E(i, j)$.
Here, $c_{i \leftrightarrow j}$ denotes the number of users who have checked in at categories $c_i$ and $c_j$ successively, and $c_i \leftrightarrow c_j$ indicates that there is an edge between categories $c_i$ and $c_j$, that is, that such check-in behavior exists.
The undirected graph network corresponding to Fig. 1 is shown in Fig. 3.
Similarly, the undirected graph model corresponding to Fig. 2 can be obtained, as shown in Fig. 4.
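One possible reading of rule (2) is that the undirected weight of an edge is the larger of the two directed weights between the same pair of categories. The sketch below implements that reading; it is an assumption made for illustration, and the function name `to_undirected` and the toy weights are hypothetical.

```python
def to_undirected(directed_weights):
    """Collapse {(c_i, c_j): W_ij} into {frozenset({c_i, c_j}): E(i, j)},
    keeping the maximum of the two directed weights (assumed reading of rule (2))."""
    undirected = {}
    for (c_i, c_j), weight in directed_weights.items():
        key = frozenset((c_i, c_j))  # unordered pair, so (i, j) and (j, i) coincide
        undirected[key] = max(undirected.get(key, 0), weight)
    return undirected

# Hypothetical directed weights between three categories.
directed_example = {("food", "shop"): 2, ("shop", "food"): 3, ("food", "park"): 1}
undirected_example = to_undirected(directed_example)
```

Pairs with no check-in sequence in either direction simply do not appear in the result, which corresponds to a zero entry in the adjacency matrix.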
In this paper, the implicit relationship between the category and time of the user's check-in is mined to construct an undirected graph of the check-in network.
Define the random variable $Y_i$ corresponding to the vertex of category $c_i$ to obtain a set of random variables $Y = \{Y_1, Y_2, \cdots, Y_N\}$ [19]. The following shows that the constructed category undirected graph network has the Markov property. In the category sample data set, category data samples $c_i$ and $c_j$ have a time-series relationship; that is, within a specified time, the number of users who have checked in at these two categories in succession does not change with the presence or absence of other categories. The time-sequential influence relationship $E(i, j)$ between categories $c_i$ and $c_j$ is independent of the influence relationships among any other category data samples $c$ ($c \in [1, M]$, $c \notin \{i, j\}$). The elements of the edge set $E$ are also independent of each other.
For data samples $c_i$ and $c_j$, the corresponding vertices in the vertex set are $v_i$ and $v_j$, the corresponding random events in $Y$ are $Y_i$ and $Y_j$, and the corresponding element in the edge set $E$ is $E(i, j)$; any vertex other than $v_i$ and $v_j$ is denoted $v_k$, and the random event corresponding to $v_k$ is denoted $Y_k$. From the known conditions, the random events $Y_i$ and $Y_j$ corresponding to the vertices $v_i$ and $v_j$ in the undirected graph are independent of each other, that is, the joint distribution of the events $Y_i$ and $Y_j$ satisfies the relation in (23), which proves that the undirected graph $G = (V, E)$ of the constructed category network satisfies the pairwise Markov property. According to the definition of the probabilistic undirected graph (PUG) model, the established undirected graph $G = (V, E)$ is a Markov network [19].

C. BELIEF PROPAGATION ALGORITHM
Belief propagation (BP) is widely used in Markov networks to infer the state of a target node from the states of its neighboring nodes. Messages are passed between nodes to obtain each node's confidence (belief) value, from which the corresponding state of the node is inferred.
The message sent from one node to another is a vector, called the message vector, whose elements correspond to the possible states [20]. In the message-passing equation, $m_{ij}$ is the message sent from node $i$ to node $j$, which expresses node $i$'s confidence that node $j$ is in state $x_c$ [20]. The message from node $i$ to node $j$ is composed of all messages arriving at node $i$ from its neighboring nodes, except the one from node $j$ itself.
After the message update is completed, the confidence value of each node is calculated. The confidence of a node obtained by the belief propagation algorithm is the marginal probability of the node. As shown in Fig. 5, the messages propagated among node 2, its neighboring nodes 1 and 3, and the other nodes 4 and 5 are used to obtain the marginal probability of node 2. After the corresponding equations are simplified, the confidence of node 2 is obtained, which is exactly its marginal probability.
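For reference, the two steps described above are commonly written in the standard sum-product form shown below. This is the textbook formulation and may differ in notation from the equations intended here; the node potential $\phi_i$, the edge potential $\psi_{ij}$, the neighbor set $N(i)$ and the normalizing constant $Z_i$ are symbols introduced for illustration.

```latex
% Message from node i to node j: combine the local evidence at i with all
% incoming messages except the one from j, then sum out the state of i.
m_{ij}(x_j) = \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j)
              \prod_{k \in N(i) \setminus \{j\}} m_{ki}(x_i)

% Belief (confidence) of node i: local evidence times all incoming messages,
% normalized so that the belief sums to one over the states of i.
b_i(x_i) = \frac{1}{Z_i}\, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)
```

On a tree-structured network the belief $b_i$ equals the exact marginal probability of node $i$; on a network with cycles it is generally an approximation.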
When the BP algorithm is run on the Markov network, the nodes are not searched layer by layer from a starting node for recursive calculation. Instead, a node and its neighbors are selected at random, and the messages sent by the neighbors are used to calculate the belief value of the node; the beliefs of the other nodes are then calculated in the same way, and the process is repeated. The resulting node confidence is the marginal probability.
The marginal probabilities obtained by the BP algorithm are sorted in descending order, and the Top-N nodes with the largest values are selected to construct the user set $U$ with large social influence under the time-category condition.
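The message passing and Top-N selection can be illustrated with a small sum-product sketch in Python using NumPy. It rests on several assumptions that are not stated in the text: each node has two states, state 1 is taken to mean "influential", messages are updated synchronously rather than in random order, and the potentials in the example are made-up numbers. The function name `belief_propagation` is hypothetical.

```python
import numpy as np

def belief_propagation(node_potentials, edge_potentials, n_iters=20):
    """Sum-product BP with synchronous updates on a pairwise Markov network.

    node_potentials: {node: 1-D array over that node's states}
    edge_potentials: {(i, j): 2-D array psi[x_i, x_j]}, one entry per undirected edge
    Returns {node: normalized belief vector}.
    """
    neighbors = {node: set() for node in node_potentials}
    psi = {}
    for (i, j), table in edge_potentials.items():
        neighbors[i].add(j)
        neighbors[j].add(i)
        psi[(i, j)] = np.asarray(table, dtype=float)
        psi[(j, i)] = psi[(i, j)].T  # same potential seen from the other end

    # One message per directed edge, initialized uniformly over the receiver's states.
    messages = {(i, j): np.ones(len(node_potentials[j])) for (i, j) in psi}

    for _ in range(n_iters):
        updated = {}
        for (i, j) in messages:
            # Local evidence at i times all incoming messages except the one from j.
            incoming = np.asarray(node_potentials[i], dtype=float).copy()
            for k in neighbors[i] - {j}:
                incoming = incoming * messages[(k, i)]
            message = incoming @ psi[(i, j)]  # sums over the states of node i
            updated[(i, j)] = message / message.sum()
        messages = updated

    beliefs = {}
    for i in node_potentials:
        belief = np.asarray(node_potentials[i], dtype=float).copy()
        for k in neighbors[i]:
            belief = belief * messages[(k, i)]
        beliefs[i] = belief / belief.sum()
    return beliefs

# Toy chain u1 - u2 - u3 with made-up potentials.
agree = [[2.0, 1.0], [1.0, 2.0]]  # neighbors prefer to share the same state
beliefs = belief_propagation(
    {"u1": [0.7, 0.3], "u2": [0.5, 0.5], "u3": [0.2, 0.8]},
    {("u1", "u2"): agree, ("u2", "u3"): agree},
)

# Rank nodes by the probability of state 1 and keep the Top-N (here N = 2).
top_2 = sorted(beliefs, key=lambda node: beliefs[node][1], reverse=True)[:2]
```

Because the toy network is a chain, the beliefs coincide with the exact marginals, which makes the sketch easy to check against a brute-force sum over the joint distribution.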

IV. HYBRID RECOMMENDATION ALGORITHM
A. INFLUENCE OF SOCIAL FACTORS
In the previous section, the belief propagation algorithm was used to obtain the set of users with the greatest influence. Users in this set may be friends or non-friends. Therefore, we divide the user set obtained above into a friend set $U_F$ and a non-friend (stranger) set $U_S$.

1) SOCIAL SIMILARITY CALCULATION
Suppose that the set of friend users is $U_F$, the set of check-in locations is $L$, and the user-location check-in matrix is $C$, where $C_{u,l} = 1$ indicates that user $u \in U_F$ has checked in at location $l \in L$, and $C_{u,l} = 0$ indicates that the user has not checked in at location $l$. Given a user $u$ who has not checked in at location $l$, the probability that $u$ will check in at $l$ under the influence of social relationships is given by an expression in which $SJ_{u,v}$ represents the unified social similarity between user $u$ and user $v$.
There are many ways to calculate user similarity. In this paper, cosine similarity is used to calculate the similarity between users $u$ and $v$ in the friend set [21].
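As a concrete illustration, cosine similarity between two users can be computed from their rows of the check-in matrix $C$. The sketch below uses the standard definition; the function name and the example vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(checkins_u, checkins_v):
    """Cosine of the angle between two check-in vectors (rows of C)."""
    u = np.asarray(checkins_u, dtype=float)
    v = np.asarray(checkins_v, dtype=float)
    denominator = np.linalg.norm(u) * np.linalg.norm(v)
    # A user with no check-ins has a zero vector; define the similarity as 0.
    return float(u @ v / denominator) if denominator > 0 else 0.0

# Two users who each visited two of four locations and share one of them.
similarity_uv = cosine_similarity([1, 1, 0, 0], [1, 0, 1, 0])
```

With binary check-in vectors the numerator is the number of shared locations and each norm is the square root of a user's location count.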

2) SOCIAL FAMILIARITY CALCULATION
In social networks, any two people are connected through no more than six intermediaries, which means that friends of friends also have some kind of connection. The familiarity between non-friends can therefore be calculated based on the transfer of relationships between friends.
In the non-friend set, the more common friends two strangers have, the greater their familiarity. Here the Jaccard coefficient [22] is used to calculate the familiarity between sets, where $U_{S,u}$ and $U_{S,v}$ denote the sets associated with users $u$ and $v$ in the non-friend set $U_S$.
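The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows, assuming that the sets being compared are the friend lists of the two strangers; the names and the example data are hypothetical.

```python
def jaccard(set_u, set_v):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(set_u), set(set_v)
    union = a | b
    # Two empty sets share nothing measurable; define the coefficient as 0.
    return len(a & b) / len(union) if union else 0.0

# Two strangers with two common friends out of four distinct friends overall.
familiarity_uv = jaccard({"x", "y", "z"}, {"y", "z", "w"})
```

The value grows with the number of common friends and reaches 1 only when both friend lists are identical.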
In summary, the similarity on the friend set $U_F$ and the familiarity on the non-friend set $U_S$ together represent the influence of the social factors of the user check-in set $U$ on location recommendation.
Combining the two yields the unified user social similarity.
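A plausible form of the combination, assumed here for illustration only, is a weighted sum with the balance parameter $\alpha$ that is tuned in the experiments, followed by a similarity-weighted average of other users' check-ins in the style of user-based collaborative filtering. Both formulas and all names below are assumptions rather than the exact equations of this paper.

```python
def unified_similarity(similarity, familiarity, alpha=0.5):
    """Assumed form: SJ = alpha * similarity + (1 - alpha) * familiarity."""
    return alpha * similarity + (1 - alpha) * familiarity

def social_checkin_score(sj_by_user, visited_by_user):
    """Assumed form: similarity-weighted average of other users' check-ins at one location.

    sj_by_user: {v: SJ_uv} for the users v in the influence set
    visited_by_user: {v: 1 if v checked in at the location, else 0}
    """
    total = sum(sj_by_user.values())
    if total == 0:
        return 0.0
    weighted = sum(sj * visited_by_user.get(v, 0) for v, sj in sj_by_user.items())
    return weighted / total

# Hypothetical influence set with one friend-like and one stranger-like user.
sj_example = {
    "v1": unified_similarity(0.5, 0.5),
    "v2": unified_similarity(1.0, 0.0),
}
social_score = social_checkin_score(sj_example, {"v1": 1, "v2": 0})
```

With $\alpha = 0.5$ both users receive the same unified similarity, so the score is simply the fraction of them who visited the location.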

B. GEOGRAPHICAL FACTORS
When a user checks in, the closer a location is, the higher the probability of checking in there, and check-ins conform to a power-law distribution [3]. The power-law distribution describes the check-in probability as a function of the distance between two locations at which a user checks in, as expressed in (14), where $p[d(l_1, l_2)]$ represents the check-in probability, $d(l_1, l_2)$ represents the distance between locations $l_1$ and $l_2$, and the remaining quantities are the parameters of the distribution.
Taking the logarithm of both sides of the above equation gives a linear relation, and the parameter values are then obtained by the least squares method.
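The log-linear fit can be sketched as follows, assuming the usual power-law form $p = a \cdot d^{b}$, so that $\log p = \log a + b \log d$ is a straight line in log-log space. The parameter names `a` and `b` and the synthetic data are assumptions introduced here for illustration.

```python
import numpy as np

def fit_power_law(distances, probabilities):
    """Least-squares fit of p = a * d**b via a straight line in log-log space."""
    log_d = np.log(np.asarray(distances, dtype=float))
    log_p = np.log(np.asarray(probabilities, dtype=float))
    slope, intercept = np.polyfit(log_d, log_p, 1)  # highest degree first
    return float(np.exp(intercept)), float(slope)   # (a, b)

# Synthetic, noise-free data generated from a = 0.5, b = -1.5.
sample_d = np.array([1.0, 2.0, 4.0, 8.0])
sample_p = 0.5 * sample_d ** -1.5
a_hat, b_hat = fit_power_law(sample_d, sample_p)
```

Distances and probabilities must be strictly positive for the logarithm to be defined, so zero-distance pairs would need to be dropped or clamped before fitting.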
Let the location where the user currently checks in be $l_i$, the location where the user will check in be $l_j$, and the distance between $l_i$ and $l_j$ be $dis(l_i, l_j)$. The conditional probability $\Pr(l_j \mid l_i)$ of the user checking in at $l_j$ is then the ratio of the probability of checking in at $l_j$ to that of all locations. Given user $u_i$ and the user's check-in history $L_{u_i}$, the Bayesian method is used to obtain the conditional probability $p(l \mid L_{u_i})$ of the location where the user will check in.
The locations that have not been checked in at are then sorted in descending order of this probability, and the top-ranked locations are selected for recommendation.
The geographical influence factor is calculated as in (18):
$p_{u,l} = \max p(l \mid L_{u_i})$ (18)
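One way to turn these steps into code is sketched below. It assumes a fitted power law $p = a \cdot d^{b}$, planar coordinates with Euclidean distance, normalization of the power-law values over the candidate locations, and a naive Bayes product over the visited locations (accumulated in log space). All of these choices, and every name in the block, are illustrative assumptions rather than the exact procedure of this paper.

```python
import math

def geo_log_scores(history, candidates, coords, a, b, min_dist=0.1):
    """Return {candidate: log p(candidate | history)} under the assumptions above.

    history: visited location ids; candidates: unvisited location ids
    coords: {location id: (x, y)}; a, b: power-law parameters
    min_dist: clamp to avoid a zero distance, where d**b is undefined for b < 0
    """
    def power_law(origin, target):
        d = max(math.dist(coords[origin], coords[target]), min_dist)
        return a * d ** b

    scores = {}
    for candidate in candidates:
        log_score = 0.0
        for visited in history:
            # Pr(candidate | visited): power-law value normalized over all candidates.
            normalizer = sum(power_law(visited, other) for other in candidates)
            log_score += math.log(power_law(visited, candidate) / normalizer)
        scores[candidate] = log_score
    return scores

# One visited location and two candidates at distances 1 and 10.
toy_coords = {"home": (0.0, 0.0), "near": (1.0, 0.0), "far": (10.0, 0.0)}
geo_scores = geo_log_scores(["home"], ["near", "far"], toy_coords, a=1.0, b=-1.5)
best_candidate = max(geo_scores, key=geo_scores.get)
```

Working in log space avoids numerical underflow when a user's history is long, and the ranking of candidates is unchanged because the logarithm is monotonic.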

C. HYBRID RECOMMENDATION ALGORITHM
The social influence factor and the geographical influence factor obtained above are combined linearly.
Because different influencing factors have different value ranges, they should be normalized before being combined.
The final recommendation is determined by $S_{u,l}$, and the top $K$ locations with the largest values are selected as the recommended check-in targets.
In the above equation, $\beta$ is the balance parameter, with value range $[0, 1]$.
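The normalization and linear combination can be sketched as follows. The sketch assumes min-max normalization and that $\beta$ weights the social factor while $1 - \beta$ weights the geographical factor; the text does not specify either choice, and the function names and example scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {location: score} mapping to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {location: 0.0 for location in scores}
    return {location: (value - low) / (high - low) for location, value in scores.items()}

def hybrid_top_k(social, geo, beta=0.6, k=10):
    """Rank locations by S = beta * social + (1 - beta) * geo (assumed weighting).

    social and geo must be scored over the same set of locations.
    """
    social_n, geo_n = min_max(social), min_max(geo)
    combined = {loc: beta * social_n[loc] + (1 - beta) * geo_n[loc] for loc in social}
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Hypothetical scores for three candidate locations.
top_locations = hybrid_top_k(
    {"a": 0.9, "b": 0.1, "c": 0.5},
    {"a": 0.0, "b": 1.0, "c": 0.6},
    beta=0.6,
    k=2,
)
```

In this toy case location `a` wins on the social factor alone, while `c` overtakes `b` because it scores reasonably on both factors.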

V. EXPERIMENTAL ANALYSIS
A. EXPERIMENTAL DATA
The experiment used a Foursquare data set [23] from a location-based social networking site in the United States. This data set contains the historical information left by users' check-ins, such as the check-in timestamp, the check-in location type and the user information. The experimental data set is a part of it. First, the data are pre-processed to remove users with fewer than 4 check-ins and locations with fewer than 4 check-ins. The resulting data contain more than 1,000 users, more than 300 types of locations and more than 5,200 locations. 80% of the data are randomly selected for training and the rest for testing.
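The pre-processing and split can be sketched as below. The sketch assumes check-ins are hypothetical `(user, location)` pairs, applies the two thresholds in a single pass, and uses a fixed random seed for reproducibility; none of these details are specified in the text.

```python
import random
from collections import Counter

def filter_sparse(checkins, min_count=4):
    """Drop check-ins whose user or location has fewer than min_count check-ins."""
    user_counts = Counter(user for user, _ in checkins)
    location_counts = Counter(location for _, location in checkins)
    return [
        (user, location)
        for user, location in checkins
        if user_counts[user] >= min_count and location_counts[location] >= min_count
    ]

def train_test_split(checkins, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and cut it into training and test parts."""
    shuffled = list(checkins)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy data: u1 has 10 check-ins and survives, u2 has only 2 and is removed.
raw_checkins = [("u1", "l1")] * 10 + [("u2", "l1")] * 2
kept_checkins = filter_sparse(raw_checkins)
train_set, test_set = train_test_split(kept_checkins)
```

A single pass can leave a location below the threshold once sparse users are removed, so in practice the filter may need to be repeated until the data stop changing.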

B. EVALUATION CRITERIA
1) ACCURACY RATE
The accuracy rate is the ratio of the number of locations in a given recommendation list at which the user actually checked in to the total number of locations in the list. Let $R(u)$ denote the recommendation list and $T(u)$ the list of locations actually checked in at; the accuracy rate is defined in terms of these two sets.

2) RECALL RATE
The recall rate is the ratio of the number of recommended locations that also appear in the test set to the number of check-in locations in the test set. Let $R(u)$ denote the recommendation list and $T(u)$ the list of locations actually checked in at in the test set; the recall rate is defined in terms of these two sets.
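Following the verbal definitions above, both metrics for a single user reduce to the overlap $|R(u) \cap T(u)|$ divided by $|R(u)|$ for the accuracy rate and by $|T(u)|$ for the recall rate. The sketch below computes them for one user; averaging over users is left out, and the function name and example data are hypothetical.

```python
def accuracy_and_recall(recommended, actual):
    """Return (accuracy rate, recall rate) for one user.

    recommended: list R(u) of recommended locations, assumed free of duplicates
    actual: collection T(u) of locations the user checked in at in the test set
    """
    actual_set = set(actual)
    hits = len(set(recommended) & actual_set)
    accuracy = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual_set) if actual_set else 0.0
    return accuracy, recall

# Four recommendations, three true check-ins, two of which were recommended.
accuracy_u, recall_u = accuracy_and_recall(["a", "b", "c", "d"], {"b", "d", "e"})
```

Lengthening the recommendation list can only keep recall the same or raise it, while the accuracy rate typically falls, which is why both are reported together.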

1) EFFECT OF PARAMETER α
When considering the user's social influence, the obtained influence set is divided into two parts, the friend set and the non-friend set, and the similarity is calculated in each set separately.
Parameter $\alpha$ is used to balance the friend similarity and the non-friend trust degree, so its value affects the social influence factor. As shown in Fig. 6, under Top-10 conditions the recommendation accuracy is highest when $\alpha = 0.5$. This shows that the social influence factor is best when the similarity calculated on the friend set and the trust degree calculated on the non-friend set carry equal weight. In the following experiments, $\alpha = 0.5$.

2) EFFECT OF PARAMETER β
Parameter $\beta$ is used to balance the user's social influence factor and the geographic location influence factor. In this experiment, different values of $\beta$ are set and the resulting recommendation accuracy and recall are observed. As Fig. 7 shows, the effect is best when $\beta = 0.6$.
The experimental setting is $\alpha = 0.5$, $\beta = 0.6$. As Fig. 8 and Fig. 9 show, SFTL considers only the user's social factors and not the geographical factors, so its recommendation effect is low; FLIL takes geographical influence factors into consideration, which improves the recommendation effect; STSL adds the time factor on top of the geographical factors, and its accuracy is much improved compared with the previous two algorithms. In addition, MFFL fuses social relationships with the factors of time and geography, which shows that a hybrid recommendation algorithm gives better results. However, the above algorithms all consider the influence of the various factors separately and then combine them linearly. In this paper, the time factor and the category factor are combined and analyzed jointly to strengthen their influence, and the social factor and the geographical factor are then combined linearly. The experiments show that jointly modeling influencing factors, in addition to considering each factor alone, improves the recommendation results. Therefore, the recommendation algorithm in this paper is effective.

4) COLD START PROBLEM
For new users who have no similar users and for whom the various factors cannot be considered, the user social influence set obtained in this paper can be used: locations where users in the influence set check in frequently can be recommended to the new users, as can new locations at which users in the influence set subsequently check in.

VI. CONCLUSION
In an LBSN, users' check-in data are sparse, and fusing multiple kinds of information in the check-in history, such as time, social and geographic location information, can improve recommendation performance. In this paper, an improved algorithm is proposed. A network is modeled by mining time-category information, and the belief propagation algorithm is used to obtain the set of the most influential users. By combining the influence of social similarity and geographic location, the recommendation results are optimized.