PredicTour: Predicting Mobility Patterns of Tourists Based on Social Media User’s Profiles

This paper proposes PredicTour, an approach to process check-ins made by users of location-based social networks (LBSNs), and predict mobility patterns of tourists visiting new countries with or without previous visiting records. PredicTour is composed of three key parts: mobility modeling, profile extraction, and tourist mobility prediction. In the first part, sequences of check-ins within a time interval are associated with other user information to produce a new structure called “mobility descriptor”. In the profile extraction, self-organizing maps and fuzzy C-means work jointly to group users according to their mobility descriptors. PredicTour then identifies tourist profiles and estimates mobility patterns of tourists visiting new countries. When comparing the performance of PredicTour with three well-known machine learning-based models, the results indicate that PredicTour outperforms the baseline approaches. Therefore, it is a good alternative for predicting and understanding international tourists’ mobility, which has an economic impact on the tourism industry when services and logistics across international borders should be provided. The proposed approach can be used in different applications, such as in recommender systems for tourists or in decision-making support for urban planners interested in improving tourists’ experiences and attractiveness of venues through personalized services.


I. INTRODUCTION
The tourism industry is essential in several economies. According to the World Tourism Organization (WTO), the flux of tourists around the world generated revenues of more than one billion US dollars in 2019 [1], and it created millions of direct and indirect jobs [1].
In this context, it is relevant to understand patterns of tourists' behaviors to improve the attractiveness of venues with more efficient and personalized services. In particular, the study of tourist mobility is an under-explored aspect of tourism scholarship [2], [3]. Despite previous efforts, very few works have attempted to model the mobility patterns of tourists on a large scale [4], [5]. One of the challenges involves finding the appropriate type of data.
The associate editor coordinating the review of this manuscript and approving it for publication was Qilian Liang.
Location-based social networks (LBSNs) such as Foursquare, Waze, Twitter, and Instagram provide a new range of possibilities to obtain data on a large scale, especially given the considerable increase in social media users. LBSNs have been successfully explored in large-scale studies on users' behavioral patterns [6]. These studies range from the identification of specific groups of people with the same interest [7] and the study of socio-economic problems in different areas of a city [8], [9], to the understanding of cultural boundaries and similarities between societies [10]-[12]. In addition, because LBSN data can be obtained from different places around the world, this type of data represents an alternative for studies interested in behavioral patterns of users acting as tourists [7], [13], [14].
The present work aims to predict international tourists' mobility patterns using LBSNs. We provide a novel approach called PredicTour. First, it models the mobility of users from different perspectives to produce a mobility descriptor of each user. Next, it explores this structure to extract profiles of those users classified as tourists with similar characteristics. Finally, taking all the obtained information together, i.e., the model of tourists' previous mobilities associated with their identified profile, the proposed approach predicts the international tourists' mobility pattern in different countries. The experiments show that PredicTour can be extended to cases with no prior information about the tourist's behavior in other countries.
In the present paper, we aim at answering three research questions: (i) what kind of intrinsic relationships can be observed when we group users? (ii) what kind of pattern can be observed in each profile? (iii) how does PredicTour perform when compared with baselines under different difficulty levels? To comparatively evaluate the performance of PredicTour, we consider well-known machine learning-based models (Deep AutoEncoder, Multi-layer Perceptron, and Collaborative Filtering) as comparison approaches. The results indicate that PredicTour outperforms all the baseline approaches, improving the understanding and prediction of international tourists' mobility. The main contributions of the study can therefore be summarized as follows:
• The proposition and exploration of a new structure, the mobility descriptor, that considers different data features ranging from straightforward information, such as users' origin and destination countries, to sophisticated information extracted from users' mobility in LBSNs.
• An approach to perform profile extraction which separates groups of tourists with similar mobility patterns and describes each group based on its profile.
• A novel methodology to predict mobility patterns of international tourists when visiting new countries with and without previous information.
Based on these contributions, we believe that PredicTour can be helpful for many applications in tourist planning. For instance, it can be used to build recommender systems that suggest new places to particular groups of international tourists.
The remainder of the paper is organized as follows. Section II discusses related works. Section III details PredicTour. Section IV describes the methodology used in the experiments, with comparison approaches and metrics being discussed in Section V. Results are presented and analyzed in Section VI, with conclusion and future perspectives discussed in Section VII.

II. RELATED WORKS
In the literature, there are many studies related to our research in different aspects. To summarize the main related topics, we consider the aspects of human mobility using user-generated data, prediction of trends from LBSN data, and characterization of tourist activity patterns.

A. CHARACTERIZATION OF HUMAN MOBILITY
According to Barbosa et al. [15], the study of human mobility is especially important for applications such as estimating migratory flows, traffic forecasting, urban planning, and epidemic modeling. As the authors show, there are several initiatives in this direction. For instance, Pappalardo et al. [16] use mobile phone and GPS data to explore patterns of human mobility. They discovered the existence of two distinct classes of individuals: returners and explorers. Lima et al. [17] use mobile network data records (from call detail record data) to analyze the behavior of a large number of individuals. According to the authors, human mobility and social structure are important characteristics to understand how diseases spread. Mourchi et al. [18] build a set of features that capture spatial, temporal, and similarity characteristics of user mobility and combine these features for future location prediction.
In the study of Amoretti et al. [19], a smart mobility application that recommends points of interest (POI) is proposed based on users' behavior. Aiming to build individual and group behavior profiles, the authors consider user actions, for instance, through check-ins and preferences. Luceri et al. [20] investigate social influence and how it impacts human behavior using event-based social network data. They study how influence propagates among subjects in a social network. In a similar direction, Roy et al. [21] quantify the impacts of an extreme event on human mobility, showing that geolocated social media data allow studying socio-economic impacts and help to guide policies toward developing disaster strategies. The study developed by Rajashekar et al. [22] shows that the behavioral models proposed by the authors are capable of uniquely identifying each user under a one-class learning constraint. They used smartphone data such as those provided by specific applications, cell towers, and websites to construct a user-specific behavioral model.

B. PREDICTION OF HUMAN MOBILITY
Regarding the predictability of human mobility, Gonzalez et al. [23] present a deep study with 100,000 mobile phone users whose positions have been tracked for six months, presenting evidence supporting such a phenomenon. Connected to that, Song et al. [24] raise an important question: To what degree is human behavior predictable? The authors explore the limits of predictability in human dynamics by studying anonymized mobile phone users' mobility patterns. Brockmann et al. [25] show that human traveling behavior can be described mathematically on many spatiotemporal scales by a two-parameter continuous-time random walk model with surprising accuracy. Moreover, Zheng et al. [4] explore multiple users' GPS trajectories to mine interesting locations and classical travel sequences in a given geospatial region. Ben Zion and Lerner [26] show evidence that by using cellular data, it is possible to identify 9258 VOLUME 10, 2022 social lifestyles and extract mobility patterns to better predict the trajectory of users.
Considering related works that also explore LBSN data, Hsieh et al. [27] propose a system to recommend time-sensitive trip routes. They use a sequence of locations with associated timestamps, based on the knowledge extracted from large-scale check-in data. In the same direction, Gu et al. [28] developed a system that can accomplish fast routing in LBSNs, leveraging geographical knowledge predicted from check-in data. Wang et al. [29] use LBSN data to construct a prediction model for POIs, such as restaurants, stores, popular attractions, and hotels.
Silva et al. [30] propose a technique based on transition graphs that summarizes people's movements between location categories of venues; the authors highlight that specific transitions are much more probable than others. Domenico et al. [31] focus on the study of the interdependence and predictability of human mobility and social interactions. Senefonte et al. [14] propose a novel classification approach k-FN to identify the venue categories from unlabeled geolocated check-ins with noisy data using mobility patterns of users. Additionally, D'Silva et al. [32] propose a prediction framework able to forecast weekly popularity dynamics of new places by using mobility data from Foursquare and k-nearest neighbor metrics.

C. TOURISTS' ACTIVITY PATTERNS
There are also studies related to tourists' activity patterns in the literature exploring large-scale mobile data. For example, Vu et al. [33] analyze cross-country tourist activities. Grinberger and Shoval [34] explore smartphone data to study tourists' activity patterns and the time-space resource allocation decisions they support. Lozano and Gutiérrez [35] use WTO data to study the global tourism network. Ferreira et al. [36] consider spatio-temporal aspects of the behavior of tourists and residents to analyze the tourists' behavior. Yochum et al. [37] present a current systematic review and map linked open data, i.e., structured information in a format meant for machines, to location-based recommendation systems in the tourism domain.
More closely related to our present study, Ferreira et al. [38] explore LBSNs to study tourist movements through time and space. The authors propose an approach based on a topic model (Latent Dirichlet Allocation, LDA) to automatically identify mobility pattern themes, which are used to better understand users' profiles.
We explore some of the gaps found in the related works. For example, no other work has explored complex data regarding users' mobility patterns. Moreover, researchers that cluster users by a profile approach usually employ crisp clustering algorithms such as K-means. As human behavior is quite complex, fuzzy clustering methods (as the C-means explored in our approach) tend to be more appropriate because they allow us to capture a more complete and realistic scenario, especially with the idea of one individual belonging to more than one cluster, i.e., the individual might present characteristics of more than one profile. Another important advantage of our method is the use of LBSN data since it improves scalability compared to other sources of data explored in the literature. It is also worth mentioning that, to the best of our knowledge, there is no previous approach proposed to help predict international tourists' mobility in different countries in the way we do in this study. As detailed in the next section, we use mobility descriptors to provide clusters of tourists with similar profiles. Then, the proposal predicts unknown mobility patterns based on those profiles, which can be associated with tourist historical data whenever they are available.

III. PredicTour
Here we describe the new proposed approach called PredicTour, which considers relevant features extracted from LBSN data beyond the trivial tourist origin and destination countries. An overview of the proposed approach is depicted in Figure 1, which includes three main parts: (i) Mobility Modeling, (ii) Profile Extraction, and (iii) Tourists' Mobility Prediction. The first two parts are building blocks for the main task of the third block. Figure 1 also highlights the key outputs of each part of PredicTour. The output of the first block is a mobility descriptor, which aggregates essential features of tourists' behavioral patterns. The second block identifies user profiles based on mobility descriptors. The prediction of user mobility patterns is the output of the last block.

A. MOBILITY MODELING
One of the contributions of the present work is a structure, namely the mobility descriptor, that describes tourists' mobility patterns considering features from several perspectives, providing a rich description of the phenomenon under study. A mobility descriptor is a vector $d = (m \,|\, v_{class} \,|\, v_{home} \,|\, v_{dest})$ obtained from the concatenation of a mobility vector $m \in \mathbb{N}^{V \cdot V}$ with a binary vector $v_{class} \in \{0, 1\}^2$ composed of two elements, and two binary vectors $v_{home} \in \{0, 1\}^L$ and $v_{dest} \in \{0, 1\}^L$, both in one-hot encoding, with as many components as the number $L$ of locations (countries in our case).
The mobility vector $m$ contains the number of transitions made by a tourist between two venue categories in a set of $V$ different categories (or types of visited places) according to [7], [13], [14]. The other vectors enrich the data from different perspectives. Vector $v_{class}$ expresses the user's classification as returner or explorer, a concept and methodology proposed by [16]. The user's origin country $v_{home}$ and destination country $v_{dest}$ are then aggregated to the previous vectors to complete the information needed. Appendix A details the algorithm for building the mobility descriptor $d_u^{(dest)}$ of a user $u$ visiting a destination country $dest$.

B. PROFILE EXTRACTION
The second block of PredicTour extracts profile patterns from mobility descriptors. It encompasses two main tasks: (i) spatial organization of mobility descriptors using a Self-Organizing Map (SOM) [39]; (ii) clustering of outputs provided by SOM into different profiles with the Fuzzy C-means (FCM) algorithm [40]. SOM has been chosen due to its unsupervised learning capability of discovering intrinsic relationships among data and spatially organizing it. FCM generates a membership degree for each input pattern to its corresponding cluster which is further limited by crisp boundaries. The combination of SOM+FCM allows one to use the spatial organization provided by SOM for clustering data in different profiles with membership degrees indicating "how much" a user belongs to a cluster. This information is essential for dealing with cases in a gray zone, i.e., close to cluster boundaries, by providing more robust information.
The mobility descriptor is presented at the SOM's input layer as shown in the architecture map of Figure 2.
The output layer is a map grid of $K$ neurons. A neighborhood $N$ considered in the map grid defines how the weights of each output neuron are updated during the training phase. The neighborhood structures can assume different shapes, such as the hexagon in Figure 2. The algorithm for building the profile extraction is shown in Appendix B.
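The clustering step of this block can be illustrated with a minimal NumPy sketch of the standard fuzzy C-means updates. It is not the authors' implementation (Appendix B gives their algorithm): the SOM training is omitted, the rows of `X` merely stand in for the weight vectors of the SOM output neurons, and the function name, the fuzzifier `m = 2`, and the stopping rule are assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (centers, membership matrix U).

    X is an (n_samples, n_features) array. In this sketch it is assumed to
    hold the weight vectors of the SOM output neurons.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance of every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Each row of `U` sums to one, so a user (or neuron) near a cluster boundary receives comparable membership degrees for the neighboring profiles, while `U.argmax(axis=1)` gives the corresponding crisp partition.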

C. TOURISTS' MOBILITY PREDICTION
The last block of PredicTour performs two tasks: (i) identification of the profile most associated with a target tourist; and (ii) prediction of the mobility pattern for unvisited countries based on the tourist's profile. The prediction process considers particular and collective behaviors (or signatures) of users visiting different countries:
• $\bar{m}_t$ (tourist signature): it is computed either as the average of the mobility vectors of tourist $t$ visiting destination countries or as the average of the mobility vectors of other tourists with the same origin and destination as $t$. Therefore, computing the tourist signature $\bar{m}_t$ requires the analysis of two different situations: (i) if a tourist has visited other destinations, the calculation uses the tourist's previous visits; (ii) if the tourist has no history, it considers the visits of other tourists with the same origin and destination.
• $c_p$ (profile signature): it is the weighted average of the mobility vectors of users in the same profile $p$ as tourist $t$ (weights are membership degrees). The profile signature $c_p$ represents the collective behavior of users in the same profile.
The prediction of the mobility vector $\hat{m}_t^{(dest)}$ of a target tourist $t$ in a new destination $dest$ is accomplished by aggregating the tourist and profile signatures according to (1).
In this case, the tourist signature is more relevant when $t$ has a non-zero number $L_p$ of previous visits to other countries. Otherwise, both signatures have the same weight. The algorithm for building the prediction is shown in Appendix C.
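A minimal sketch of the two signatures is given below, assuming mobility vectors are stored as NumPy arrays. The function names are hypothetical, and because the aggregation rule (1) is not reproduced in this text, `aggregate` uses a plain convex combination whose weight `alpha` is only a placeholder for the actual weighting; `alpha = 0.5` corresponds to the stated case in which both signatures have the same weight.

```python
import numpy as np


def tourist_signature(own_vectors, peer_vectors):
    """Average of the tourist's own mobility vectors when a history exists;
    otherwise average of the vectors of tourists with the same origin and
    destination (peer_vectors)."""
    source = own_vectors if len(own_vectors) > 0 else peer_vectors
    return np.mean(np.asarray(source, dtype=float), axis=0)


def profile_signature(vectors, memberships):
    """Average of the mobility vectors of users in a profile, weighted by
    their membership degrees."""
    return np.average(np.asarray(vectors, dtype=float), axis=0,
                      weights=np.asarray(memberships, dtype=float))


def aggregate(m_t, c_p, alpha=0.5):
    """Hypothetical stand-in for Eq. (1): convex combination of signatures.

    alpha is an assumed parameter, not the weighting used in the paper.
    """
    return alpha * np.asarray(m_t) + (1.0 - alpha) * np.asarray(c_p)
```

For a tourist with previous visits, a value of `alpha` above 0.5 would reflect the statement that the tourist signature is more relevant in that case.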

IV. DATASET AND PredicTour SETUP
This section describes the dataset and setup considered in the experiments conducted to address the questions raised in Section I.

A. MODELING MOBILITY WITH FOURSQUARE CHECK-INS
Here we present the characteristics of the addressed dataset and how data is used to provide the mobility descriptor of any tourist in the dataset.

1) DATASET
In the experiments, we explore the same Foursquare dataset of [12] and [41]. Foursquare is a location-based social network where users can use a mobile app to perform a check-in, which is the act of disclosing their current location to friends. Each venue in Foursquare has a category with subcategories according to a hierarchical taxonomy provided by Foursquare. For instance, a given venue may have Food as its category, with Burger Place being its subcategory in the hierarchy. A complete list of venue categories and subcategories in Foursquare is available on the corresponding website (https://developer.foursquare.com/docs/resources/categories). As check-ins from Foursquare are not public by default, useful data are compiled from Twitter. The dataset addressed in this paper represents around 15 million tweets containing publicly available URLs of check-ins from Foursquare. Each check-in in the raw dataset is composed of seven fields: user ID (iduser); date and time of the check-in (date); latitude (latitude); longitude (longitude); venue ID (idvenue); venue category (categorievenue); and venue subcategory (subcategorievenue).
There is no specific information on the city and country of each post in the raw dataset. This information can be retrieved using standard reverse geolocation procedures. In the present paper, we focus on popular countries of different continents. Figure 3 shows the number of check-ins worldwide on a base-10 log scale. Figure 4 shows the distribution of check-ins for the main categories of the eight most popular countries. We can observe similarities and differences between the behaviors of users in different countries. For example, the Residence category is widely shared by Brazilians, whereas the Japanese almost avoid it, preferring the Travel & Transport category to perform a check-in. We can also observe that Food is a very popular category for check-ins in all addressed countries.

2) MOBILITY DESCRIPTORS
We explore transitions between Foursquare's venue categories in a set of $V = 10$ categories. If a user makes consecutive check-ins at venues $v_i$ and $v_j$, respectively, within no more than two periods of the day, a transition links $v_i$ to $v_j$ (see Appendix A for details). We use the same division proposed by Veiga et al. [42] to define a period of the day: (i) morning, between 6:00 am and 9:59 am; (ii) noon, between 10:00 am and 2:59 pm; (iii) afternoon, between 3:00 pm and 6:59 pm; (iv) night, between 7:00 pm and 11:59 pm; and (v) dawn, between 12:00 am and 5:59 am. In the performed experiments, we consider the set of locations $\mathcal{L} = \{BR, GB, ID, JP, MX, MY, TR, US\}$, encompassing the $L = 8$ countries with the highest numbers of check-ins.
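The period-of-day rule can be sketched as follows. This is only an illustration, not the procedure of Appendix A: check-ins are assumed to be time-sorted `(timestamp, category_index)` pairs, the function names are hypothetical, and "within no more than two periods of the day" is interpreted here as consecutive check-ins falling in the same period or in two adjacent periods (including night followed by the next day's dawn).

```python
from datetime import datetime


def period_index(ts):
    """Map a timestamp to a period: 0 dawn, 1 morning, 2 noon,
    3 afternoon, 4 night (boundaries as listed in the text)."""
    h = ts.hour
    if h < 6:
        return 0
    if h < 10:
        return 1
    if h < 15:
        return 2
    if h < 19:
        return 3
    return 4


def count_transitions(checkins, n_categories=10, max_span=1):
    """Count category-to-category transitions between consecutive check-ins.

    max_span=1 encodes the assumed reading of the rule: the two check-ins
    lie in the same period or in adjacent periods.
    """
    counts = [[0] * n_categories for _ in range(n_categories)]
    for (t1, c1), (t2, c2) in zip(checkins, checkins[1:]):
        # Global period counter: five periods per calendar day.
        g1 = t1.date().toordinal() * 5 + period_index(t1)
        g2 = t2.date().toordinal() * 5 + period_index(t2)
        if 0 <= g2 - g1 <= max_span:
            counts[c1][c2] += 1
    return counts
```

Flattening the resulting $V \times V$ matrix row by row gives a mobility vector with $V \cdot V$ components.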
In the experiments, the dataset is filtered by considering only users with a minimum of two check-ins and one visiting country other than their home country (see Appendix A for this classification). A minimum of two check-ins guarantees at least one transition between a pair of categories, and one visiting country guides the analysis to the international flow of tourists, which is the main interest of this work. The filtered dataset is then composed of 974 mobility descriptors. Each one has dimension $\dim(d) = \dim(m) + \dim(v_{class}) + \dim(v_{home}) + \dim(v_{dest}) = 10^2 + 2 + 8 + 8 = 118$ components. A mobility descriptor of 118 components is computed for each LBSN user visiting a specific country. The number of check-ins among venue categories is normalized by user, avoiding scale problems when comparing users with different numbers of check-ins.
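The assembly of a 118-component descriptor can be sketched as below. The helper names, the ordering of the two class bits (returner first), and the per-user normalization by the total number of transitions are assumptions made for illustration; Appendix A describes the actual algorithm.

```python
import numpy as np

COUNTRIES = ["BR", "GB", "ID", "JP", "MX", "MY", "TR", "US"]  # L = 8


def one_hot(index, size):
    """Binary vector of the given size with a single 1 at `index`."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


def build_descriptor(transitions, is_returner, home, dest):
    """Concatenate (m | v_class | v_home | v_dest) into one vector."""
    m = np.asarray(transitions, dtype=float).reshape(-1)  # 10 x 10 -> 100
    total = m.sum()
    if total > 0:
        m = m / total  # assumed normalization: divide by the user's total
    v_class = one_hot(0 if is_returner else 1, 2)  # assumed bit order
    v_home = one_hot(COUNTRIES.index(home), len(COUNTRIES))
    v_dest = one_hot(COUNTRIES.index(dest), len(COUNTRIES))
    return np.concatenate([m, v_class, v_home, v_dest])
```

With a $10 \times 10$ transition matrix the result has $100 + 2 + 8 + 8 = 118$ components, matching the dimension stated above.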

B. DISCOVERING TOURIST PROFILES
Aiming to discover tourists' profiles, we adopt a self-organizing map to obtain intrinsic relationships among the whole set $\{d\}$ of $|\{d\}| = 974$ mobility descriptors, and to spatially organize such relationships, as mentioned in Section III. The SOM's architecture is defined by 118 input neurons, each one associated with a component of $d$ presented at the input layer. The SOM's input encompasses features such as the normalized mobility transitions $t_{ij}$ between each pair of venue categories $(v_i, v_j)$, $i, j = 1, \ldots, 10$, the returner/explorer classification (two binary inputs with one-hot encoding), and the origin and destination countries (eight binary inputs each, with one-hot encoding).
A total of $K = 144$ units (neurons), organized in a hexagonal topology, form the SOM's output (see Figure 2 for more details on the SOM's architecture and hexagonal topology). This setup provides a suitable distribution of samples among the output neurons, more than those proposed by other works [43].
Then, FCM groups the output neurons into $C = 3$ clusters (or profiles), a number defined experimentally. The membership degree $\mu_{up}$ establishes how much the mobility descriptor of user $u$ belongs to profile $p$. Each profile can be characterized by the average membership degree $\bar{\mu}_p = \frac{1}{N_p} \sum_{u} \mu_{up}$ of the $N_p$ mobility descriptors of cluster $p$.
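As a small illustration of the average membership degree, the sketch below takes a membership matrix with one row per mobility descriptor and one column per profile. It assumes that the $N_p$ descriptors of cluster $p$ are those whose maximum membership is in $p$ (the crisp assignment); the function name is hypothetical.

```python
import numpy as np


def average_membership(U):
    """Average membership degree per profile.

    U[u, p] is the membership degree of descriptor u in profile p. A
    descriptor is counted in the profile where its membership is highest
    (assumed crisp assignment). Profiles with no descriptors are omitted.
    """
    U = np.asarray(U, dtype=float)
    labels = U.argmax(axis=1)
    return {int(p): float(U[labels == p, p].mean()) for p in np.unique(labels)}
```

For example, a matrix with rows `(0.8, 0.1, 0.1)`, `(0.6, 0.3, 0.1)` and `(0.2, 0.7, 0.1)` assigns two descriptors to the first profile and one to the second, and leaves the third profile empty.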

C. PREDICTING MOBILITY PATTERNS FOR DIFFERENT DIFFICULTY LEVELS
To assess the PredicTour performance when predicting the mobility pattern of users with and without previous information, we separate the set of users into two groups: (i) set $u_h$ of users with history, i.e., users that have visited more than one destination country (2 to 5 in our case); (ii) set $u_\emptyset$ of users with empty history, i.e., users that have visited only one country (which is the target destination when they form the test set). In the addressed dataset, $|u_h| = 36$ users generate a set $\{d\}_h$ of $|\{d\}_h| = 83$ mobility descriptors, and $u_\emptyset$ generates a set $\{d\}_\emptyset$ of $|\{d\}_\emptyset|$ mobility descriptors. The dataset used for the training and testing sets has size $|\{d\}_h| + |\{d\}_\emptyset|$.
According to Table 1, we can evaluate PredicTour considering several difficulty levels from 1 to 20, each one with a different dataset size. For instance, difficulty level 2 is set with $|\{d\}_\emptyset| = 10\% \cdot |\{d\}_h| = 8$ mobility descriptors. Therefore, the dataset size is $83 + 8 = 91$.
Depending on the difficulty level and for $|\{d\}_\emptyset| \neq 0$, there are $R$ repetitions (alternatives) in Table 1 for choosing the elements of $\{d\}_\emptyset$ among the whole set $\{d\}$ according to (2).
For each repetition $rep$ out of $R$, we perform the well-known 10-fold cross-validation method to evaluate the performance of PredicTour (or any comparison approach) for different training and test sets. The dataset $\{d\}_h \cup \{d\}_\emptyset$ is partitioned into one fold for testing, $\{d\}_{ts}$, and nine folds with the remaining mobility descriptors for training, $\{d\}_{tr}$. This strategy results in a pair $(\{d\}_{tr}, \{d\}_{ts})_{it,rep}$, which is repeated for iterations (folds) $it = 1, \ldots, 10$ and repetitions $rep = 1, \ldots, R$.
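The repeated cross-validation protocol can be sketched as a generator of index splits. This is an illustration under assumptions: descriptors are referred to by integer indices, the empty-history descriptors of each repetition are drawn uniformly at random from a pool (rule (2) for choosing them is not reproduced here), and all names are hypothetical.

```python
import numpy as np


def repeated_kfold(n_history, n_empty_pool, n_empty, n_repetitions,
                   n_folds=10, seed=0):
    """Yield (rep, it, train_idx, test_idx) for every repetition and fold.

    Indices 0..n_history-1 denote descriptors with history; indices from
    n_history upward denote the pool of empty-history descriptors.
    """
    rng = np.random.default_rng(seed)
    for rep in range(n_repetitions):
        # Assumed selection rule: uniform sampling without replacement.
        chosen = rng.choice(n_empty_pool, size=n_empty, replace=False)
        data = np.concatenate([np.arange(n_history), n_history + chosen])
        rng.shuffle(data)
        folds = np.array_split(data, n_folds)
        for it in range(n_folds):
            test_idx = folds[it]
            train_idx = np.concatenate(
                [f for j, f in enumerate(folds) if j != it])
            yield rep, it, train_idx, test_idx
```

For difficulty level 2, `repeated_kfold(83, 891, 8, R)` would produce splits over 91 descriptors, where the pool size 891 assumes that the empty-history descriptors are the remaining $974 - 83$ descriptors of the filtered dataset.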

V. COMPARISON APPROACHES
The comparison of approaches is an additional challenge, as we did not find a proposal like ours in the literature. We compare PredicTour with five different standard baseline approaches. The baselines range from simple statistics taken from destination information to more complex machine learning-based models (Deep AutoEncoder, Multi-Layer Perceptron, and Collaborative Filtering). Trivial statistics of users with the same destination are considered for Baselines 1 (random choice) and 2 (average behavior). The recommender system (Baseline 3) is a natural choice for predicting preferences of users visiting new countries. Additionally, we consider two machine learning techniques (Baselines 4 and 5) successfully applied to regression/prediction problems.
[...], i.e., using the same method adopted in the test phase to calculate the inputs. However, as the results became worse, we decided to disregard this method and maintain $m_u^{(dest)}$ as the input in the training phase.
Baseline 5 (DeepAC): In this last approach, a deep autoencoder addresses the problem considering the original representation of mobility information: the transition matrix (see Appendix A for details). This baseline provides an estimate computed with a more complex and recent machine learning technique. As depicted in Figure 5, instead of using mobility vectors like the other methods, it considers transition matrices: as input, it receives the tourist signature in matrix form, and as output, it produces the matrix representing the predicted transitions, which can be further submitted to a linearization process to provide the predicted mobility vector $\hat{m}_t^{(dest)}$. Notice in Figure 5 that the model's architecture is composed of convolutional, max pooling, upsampling, and deconvolution layers.

A. METRICS FOR PERFORMANCE EVALUATION
The mobility vector $\hat{m}_t^{(dest)}$ estimated by each approach for every user in the test set can be compared with the actual $m_t^{(dest)}$ from $d_t^{(dest)} \in \{d\}_{ts}$ to compute the performance $P$ in terms of the Root Mean Square Error (RMSE) as in (3).
Assuming that the test set has cardinality $S = |\{d\}_{ts}|$, the performance is measured for each approach $appr$ at iteration $it$ and repetition $rep$. A total of 10 iterations (folds) and $R$ repetitions are made, and the overall performance is then calculated over all of them. Aiming to avoid biased results, particularly for Baseline 1, the prediction error in (3) is calculated according to (5).
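For reference, a generic RMSE between predicted and actual mobility vectors can be sketched as follows. Since Equations (3) to (5) are not reproduced in this text, the sketch uses the textbook definition, averaging squared errors over all components of all $S$ test vectors; the exact averaging used in the paper may differ, and the function name is hypothetical.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean square error over all components of all test vectors.

    Both arguments are array-likes of shape (S, n_components).
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

An overall score for one approach could then be obtained by averaging the per-fold values over the 10 folds and $R$ repetitions, e.g. with `np.mean` over a list of `rmse` results.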
We also perform an additional evaluation of the prediction results using the normalized discounted cumulative gain (nDCG). It is a technique proposed by Wang et al. [44] to measure ranking quality. The goal here is to evaluate the results incorporating the idea of ranking relevance for each mobility vector of every target tourist $t$ in the test set. Assuming that we have the actual mobility vector $m$ and the predicted mobility vector $\hat{m}$, we sort the $top \in \{1, \ldots, |m|\}$ largest features of $m$ in decreasing order. Then, we attribute levels of relevance $rel_i(m)$ to each feature $i = 1, \ldots, top$, reinforcing that high-relevance features represent more active transitions. For example, assuming $top = 10$, we would have for a linear scale: $rel_1(m) = 10$ and $rel_{10}(m) = 1$. Next, we find the respective $rel_i(\hat{m})$ for the same top features found in $m$.
Based on this ranking applied to a tourist t at destination d, we calculate nDCG top (t) by Equations (6) to (8).
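The ranking-based evaluation can be illustrated with the standard nDCG formulation. Equations (6) to (8) are not reproduced in this text, so the sketch below is an assumption: it uses the common $\log_2(i + 1)$ discount, assigns the linear relevance levels described above to the $top$ largest features of the actual vector (zero to all others), and scores the order induced by the predicted vector. The function name is hypothetical, and `top` must not exceed the vector length.

```python
import numpy as np


def ndcg_top(actual, predicted, top=10):
    """nDCG of the ranking induced by `predicted`, with linear relevance
    levels (top, top-1, ..., 1) given to the `top` largest features of
    `actual`. Assumes top <= len(actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    levels = np.arange(top, 0, -1, dtype=float)
    # Relevance per feature, derived from the actual vector.
    true_order = np.argsort(-actual, kind="stable")[:top]
    rel = np.zeros(actual.size)
    rel[true_order] = levels
    # Features ranked by the predicted vector.
    pred_order = np.argsort(-predicted, kind="stable")[:top]
    discounts = np.log2(np.arange(2, top + 2))
    dcg = np.sum(rel[pred_order] / discounts)
    idcg = np.sum(levels / discounts)
    return float(dcg / idcg)
```

A prediction that ranks the most active transitions in the same order as the actual vector scores 1, and the score decreases as relevant transitions are pushed down or out of the top positions.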
The overall performance of an approach is then averaged over the $S$ tourists of the test set: $nDCG_{top} = \frac{1}{S} \sum_{t=1}^{S} nDCG_{top}(t)$.

VI. RESULTS
Here, results are presented and discussed according to the three research questions stated in the beginning.

A. WHAT KIND OF INTRINSIC RELATIONSHIPS CAN BE OBSERVED WHEN WE GROUP USERS?
As discussed in Sections III-B and IV-B, an important result of SOM is the neighborhood map generated from the set of weights of the neurons that compose the output layer. Figure 6 presents the neighborhood map generated by the proposed model applied to the whole set $\{d\}$. Red represents close neurons, i.e., neurons whose neighbors' weights are similar, whereas light yellow neurons are at high distances from their neighbors. After the fuzzy clustering performed by FCM, we obtain a fuzzy partition of the grid map. Crisp partitions can also be obtained by considering the maximum membership degree of each element over all clusters. The black line of Figure 6 illustrates a crisp partition of neurons into three clusters. From the mapping $f_{SOM+FCM}$ applied to the descriptors $d$, we notice that all profiles present similar average membership degrees, but profiles 2 and 3 encompass most users. High values of $\bar{\mu}_p$ indicate that the data are properly clustered. Other algorithms, such as K-means and hierarchical clustering, were tested but did not yield good clustering indicators. The same was observed for numbers of clusters other than three.

B. WHAT KIND OF PATTERN CAN BE OBSERVED IN EACH PROFILE?
Clustering mobility descriptors of Foursquare users in different profiles (see Appendix B for more details) allows us to characterize each group according to specific patterns. Here we highlight some of the most distinct characteristics to better understand each profile. Considering only users with membership degrees higher than 0.75, we investigate separately the following information: $m$ (user's mobility vector), $v_{home}$ (user's origin information), $v_{dest}$ (user's destination information), and $v_{class}$ (user's returner/explorer information). Moreover, we focus the analysis on transitions with high values because they represent more relevant users' preferences. The number of transitions is normalized in [0,1] in each profile to provide a fair comparison between profiles. Figure 7 shows important transition preferences of each profile regarding the 100 elements of $m$, i.e., only transitions with values higher than the profile's average are depicted. Different line colors represent each profile: yellow for profile 1, blue for profile 2, and pink for profile 3. It is possible to note that, in terms of mobility, profiles have key characteristics or signatures. For instance, whereas transition $m_{56}$ is high for users of Profile 2, it is low for users of Profile 1. On the other hand, transition $m_{89}$ is preferred in Profile 3, and it is not as popular in Profiles 1 and 2. Profile 1 shares some characteristics with Profile 2 and others with Profile 3. Therefore, the second block of PredicTour could extract some particular mobility patterns from each profile. Table 2 presents the top three category transitions (those with the highest number of transitions between the subcategories) for each profile, with TT representing the Trav/Trans category. The last row of this table describes the preferences of transitions among the whole set of users (global), i.e., without considering users' profiles.
By observing the top three characteristics, we notice that most transitions, particularly all the Top 1, occur within the same category (represented by category$^2$). Moreover, according to Top 1, each profile could be taken as representative of one preferential transition. We can also see that most preferences in each profile are different compared to global choices, especially in Profile 3, where distinct patterns are observed for Top 1 and Top 2. Considering $v_{home}$ and $v_{dest}$ to characterize users of each profile, Figure 8 highlights the users' origin and destination countries, respectively. According to Figure 8(a), regarding the origin country of users, we can see that most users of profile 1 (yellow) are from the United States. Most Brazilian users are in profile 2 (blue). Profile 3 (pink) is mostly composed of people from Mexico, Turkey, and Japan. We can see from Figure 8(b) that the United States is the favorite destination country among all tourists except for users classified as profile 1 (yellow), because this profile is composed of many American users and we only consider international tourism, i.e., data from users visiting countries different from their origin.
By considering the returner and explorer characteristics ($v_{class}$), the results depicted in Figure 9 show that the proposed approach distinguishes Profile 1 from the other two.
Profiles 2 and 3 consist mostly of returner users (more than 98% and 99%, respectively), whereas explorer users are mainly included in Profile 1 (100%). Figure 9 also shows that, considering all data (without profile classification), 70% of users are returners and 30% are explorers. Therefore, we reinforce that PredicTour can appropriately split users into particular mobility patterns.
The results presented in this section show that discovering tourists' profiles (task performed by block 2 of PredicTour) highlights particular mobility patterns of users by transition preferences, country, and explorer or returner behaviors grouped into profiles 1, 2, and 3.

C. HOW DOES PredicTour PERFORM WHEN COMPARED WITH BASELINES ON DIFFERENT DIFFICULTY LEVELS?
To compare the results of block 3 of PredicTour, which uses profile information to predict mobility patterns, we consider five baseline approaches (see Section IV-C). The overall RMSE performance is shown in Figure 10 for different difficulty levels. It compares the performance of PredicTour with all baselines, which consider as input an approximated mobility pattern of a tourist t and as output: Baseline 1) the mobility vector of a randomly chosen tourist with the same destination; Baseline 2) an average of mobility vectors among tourists with the same destination; Baseline 3) the output vector provided by collaborative filtering; Baseline 4) the output vector provided by a multi-layer perceptron; Baseline 5) the linearization of the matrix provided by a deep autoencoder applied to the input transition matrix.
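As an illustration of the two simplest baselines and of the error measure, the following is a minimal sketch using only the Python standard library. It assumes that RMSE is computed component-wise between a predicted and a true mobility vector; the function names and the toy vectors are hypothetical and are not taken from the PredicTour implementation.

```python
import math
import random

def rmse(predicted, actual):
    """Root-mean-square error between two equally sized mobility vectors."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def baseline_random(same_dest_vectors, rng):
    """Baseline 1: vector of a randomly chosen tourist with the same destination."""
    return rng.choice(same_dest_vectors)

def baseline_average(same_dest_vectors):
    """Baseline 2: component-wise average over tourists with the same destination."""
    n = len(same_dest_vectors)
    return [sum(column) / n for column in zip(*same_dest_vectors)]

# Toy data (made up): three tourists, V = 2 categories, so m has V^2 = 4 entries.
same_dest = [[1.0, 0.0, 0.0, 1.0],
             [0.0, 1.0, 1.0, 0.0],
             [1.0, 1.0, 0.0, 0.0]]
truth = [1.0, 0.0, 0.0, 1.0]
rng = random.Random(0)
print("Baseline 1 RMSE:", rmse(baseline_random(same_dest, rng), truth))
print("Baseline 2 RMSE:", rmse(baseline_average(same_dest), truth))
```

Baselines 3 to 5 rely on trained models and are not sketched here.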
The RMSE results show that PredicTour produces smaller errors when predicting tourists' mobility in different countries, especially when the number of tourists with historical information is representative. This characteristic can be better analyzed if we consider three different scenarios when observing the red region of Figure 10:
• optimistic: the left side of the red region, where the majority of users have historical information and the performance of PredicTour is far better than the others;
• pessimistic: the right side of the red region, where the majority of users do not have historical information and the performance of PredicTour is similar to the other approaches;
• realistic: the red region, where the proportion between non-historical and historical information is balanced and PredicTour still outperforms the remaining approaches.
Aiming to understand whether the prediction errors occur in the more relevant components of the mobility vectors, we consider the nDCG metric as in (6). The average nDCG over the whole test set is presented in Figures 11 and 12 for the top 5 and top 10 most relevant components, respectively.
It is possible to see that the results are significantly better for PredicTour in almost all cases. Therefore, the general prediction of tourists' behavior is satisfactory if we consider the more important components (top 5 and top 10) that should be correctly predicted in practice.
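To make the top-k evaluation concrete, the sketch below computes nDCG@k for one tourist. Since (6) is not reproduced in this section, the code assumes the common formulation with linear gain and a logarithmic discount, where components are ranked by their predicted values and scored by their true values; the paper's exact definition may differ.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(predicted, actual, k):
    """nDCG@k: rank components by predicted value, score them by true value."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    gained = [actual[i] for i in order[:k]]
    ideal = sorted(actual, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gained) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy mobility vectors (made up): the prediction ranks the components badly.
actual = [3.0, 1.0, 2.0, 0.0]
predicted = [0.0, 2.0, 1.0, 3.0]
print(ndcg_at_k(predicted, actual, k=2))
```

A prediction that orders the k largest components exactly as in the true vector obtains the maximum score, regardless of the predicted magnitudes.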
Another important observation present in Figures 10, 11 and 12 regards the evolution of PredicTour in different scenarios. Even in more challenging scenarios, PredicTour can effectively predict users' preferences. We can thus affirm that prediction with profile information accomplished by PredicTour obtains better results than those obtained by approaches that do not apply user profiling (Baselines 1 to 5). We notice that the performance of deep-AC increases as the training process accesses more information, even when the percentage of users without history increases. This is a characteristic that can be further evaluated in another work, being out of the scope of this present study.

VII. CONCLUSION
This study aimed to extract and explore patterns available in LBSN data to improve the understanding of users' behavior and to use this information to predict international tourists' mobility patterns in different countries. In the present paper, the proposed approach, PredicTour, explored LBSN data to construct a mobility descriptor necessary to express nontrivial information regarding international tourists' mobility. PredicTour is then capable of predicting the mobility of tourists in unvisited countries based on the tourists' profiles extracted with self-organizing maps (SOMs) and fuzzy C-means (FCM) clustering.
We showed evidence that PredicTour can extract important characteristics of each user profile, which can be further explored in a tourism context. In this paper, we evaluated the performance of PredicTour in different scenarios and against relevant baselines. The results showed that our approach could achieve satisfactory performance, providing smaller RMSE than the addressed baselines, especially for non-pessimistic scenarios. Furthermore, if we focus on the performance for the top 5 and top 10 features, the most important ones, PredicTour outperformed the baselines in virtually all test cases, except on the extreme ones expected to occur less in practice.
Our approach can be helpful in different kinds of applications for tourists, for instance, in the construction of specialized place recommendation systems, the suggestion of attractive services and products, and the improvement of transport and attraction strategies. The map of intrinsic relationships provided by SOM could also be useful to show (in a visual tool for tourism planning, for example) tourists with similar behaviors, who could be grouped into the same activities. Although this paper has focused on international tourism, we believe that PredicTour could also provide results for domestic tourism with slight adaptations in home and destination locations.
This study can be expanded in numerous ways. In future work, we intend to address larger datasets to evaluate the impact on the prediction performance of all the considered approaches. We can also pay more attention to outliers, i.e., tourists whose behavior is far from the behavior represented by the profile centroid. Another possible expansion is the idea of grouping users by geographic proximity, considering spatial distances of places. In the same way, the availability of places that are part of tourists' routines should also be studied in the future, since these factors can influence their choices. For example, a tourist from Indonesia may have difficulty finding Indonesian cultural preferences in a western country due to religious or gastronomic differences, whereas the opposite holds when visiting a country with a culture similar to that of the tourist's home.

APPENDIX A MOBILITY MODELING
Algorithm 1 describes the main steps performed in the first block of PredicTour for building mobility descriptors.
2: Compute $\{x\}_{u,dest} = \{x \mid u_{id} = u,\ l = dest\}$;
3: Calculate the mobility vector $m$ according to (9);
4: Calculate the binary vector $v_{class}$ according to (10);
5: Define the binary vector $v_{home}$ with the one-hot encoding of the origin country;
6: Define the binary vector $v_{dest}$ with the one-hot encoding of the visited country;
7: Concatenate $m$, $v_{class}$, $v_{home}$, $v_{dest}$ to get $d$.
The algorithm receives a time window and check-in data as inputs. The time window defines the period elapsed between two consecutive check-ins made by the same user for characterizing a transition between two venue categories. The check-in data of a user contain information about the country and venue category where each check-in was made. A user is classified as a resident or tourist in a country depending on how much time is spent in each country. This procedure is key for setting $v_{home}$ and $v_{dest}$.
The time of stay is computed from check-in sequences performed in each country. For example, if a user checked in on May 5th and again on June 10th of the same year in Brazil, we assume the user stayed 36 days in Brazil (BR). Therefore, BR is considered the user's home country if the user stays more than 30 days (for instance) posting from BR and stays no longer in any other country. All remaining countries with check-ins other than BR are considered destinations d of user u. Note that users are only considered tourists if we can spot a home country according to our approach. Previous studies in the literature have successfully applied similar strategies [13], [42], [45], [46].
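The home-country rule described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the time of stay is the span between the first and last check-in in each country (so interleaved trips are not handled), and the function names and the `min_home_days` parameter are hypothetical.

```python
from datetime import date

def time_of_stay(checkins):
    """Days between the first and last check-in per country.

    checkins: iterable of (country_code, date) pairs for one user.
    """
    first, last = {}, {}
    for country, day in checkins:
        first[country] = min(first.get(country, day), day)
        last[country] = max(last.get(country, day), day)
    return {c: (last[c] - first[c]).days for c in first}

def home_and_destinations(checkins, min_home_days=30):
    """Return (home, destinations); home is None when no country qualifies."""
    stay = time_of_stay(checkins)
    if not stay:
        return None, []
    home = max(stay, key=stay.get)        # country with the longest stay
    if stay[home] <= min_home_days:
        return None, []                   # no home spotted: user is not treated as a tourist
    return home, sorted(c for c in stay if c != home)

# Toy check-ins (made up) matching the Brazilian example in the text.
checkins = [("BR", date(2022, 5, 5)), ("BR", date(2022, 6, 10)),
            ("US", date(2022, 7, 1)), ("US", date(2022, 7, 8))]
print(home_and_destinations(checkins))
```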
The mobility vector $m$ is computed from the adjacency matrix $T_{V \times V}$ of a transition graph $G$ whose elements are $w_{ij}$. $G$ is a directed graph $G(\mathcal{V}, E)$ with a set $\mathcal{V}$ of $V$ venue categories and a set $E \subseteq \mathcal{V} \times \mathcal{V}$ of edges or transitions. An edge $e_{ij} = (v_i, v_j) \in E$ with weight $w_{ij}$ represents the number of transitions made within a given time interval from venue $v_i$ to $v_j$. The row vectors $t_i$ of the transition matrix $T$ are concatenated to generate $m$ of dimension $V^2$ according to (9).
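A minimal sketch of this construction is given below. It counts a transition whenever two consecutive check-ins of the same user fall within the time window and then concatenates the rows of the count matrix. The category names other than TT and the two-hour window are illustrative assumptions.

```python
from datetime import datetime, timedelta

def mobility_vector(checkins, categories, window):
    """Flatten the V x V transition-count matrix T row by row into m (length V^2).

    checkins: list of (timestamp, venue_category) for one user in one country.
    window: maximum time between consecutive check-ins that counts as a transition.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    size = len(categories)
    T = [[0] * size for _ in range(size)]
    ordered = sorted(checkins)
    for (t_prev, c_prev), (t_next, c_next) in zip(ordered, ordered[1:]):
        if t_next - t_prev <= window:
            T[index[c_prev]][index[c_next]] += 1   # w_ij: transitions from v_i to v_j
    return [w for row in T for w in row]            # concatenate the rows t_i

# Toy example (made up): three categories, so m has 9 entries.
categories = ["Food", "Shop", "TT"]
checkins = [(datetime(2022, 7, 1, 9, 0), "Food"),
            (datetime(2022, 7, 1, 10, 0), "Shop"),
            (datetime(2022, 7, 1, 10, 30), "Food"),
            (datetime(2022, 7, 2, 9, 0), "TT")]
m = mobility_vector(checkins, categories, window=timedelta(hours=2))
print(m)
```

The last check-in occurs the next day, outside the window, so it does not contribute a transition.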
The classification $v_{class}$ of users into returners or explorers is defined by how far a tourist goes around the most visited places [42], [47]. It is based on the gyration radius information according to (10).
$$v_{class} = \begin{cases} \text{Returner} & \text{if } r_g(M) > 0.5\, r_g(n_a) \\ \text{Explorer} & \text{otherwise} \end{cases} \tag{10}$$
The gyration radius $r_g(M)$ is computed by (11) over the $M$ most visited venues of a user making $n_i$ visits to venue $i$ with coordinate $l_i$, $i = 1, \ldots, M$.
The gyration radius $r_g(n_a)$ is taken over all $n_a$ venues of the same user. The term $dist(l_i - l_M)$ can be calculated by the Haversine distance metric, which measures the distance between venue $i$ and the center $l_M$ given by (12).
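The returner/explorer rule in (10) can be sketched as below. Because (11) and (12) are not reproduced in this section, the code assumes the usual definition of the radius of gyration (visit-weighted root-mean-square distance to a center) and approximates the center by the visit-weighted mean of latitude and longitude. The default $M = 2$ and all function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def gyration_radius(venues):
    """Visit-weighted RMS distance of venues from their weighted center.

    venues: list of (visits n_i, (lat, lon) l_i).
    """
    total = sum(n for n, _ in venues)
    # Assumed center: visit-weighted mean of coordinates (ignores longitude wrap-around).
    center = (sum(n * l[0] for n, l in venues) / total,
              sum(n * l[1] for n, l in venues) / total)
    return math.sqrt(sum(n * haversine(l, center) ** 2 for n, l in venues) / total)

def classify(venues, M=2):
    """Rule (10): Returner if r_g(M) > 0.5 * r_g(n_a), Explorer otherwise."""
    top = sorted(venues, key=lambda v: v[0], reverse=True)[:M]
    return "Returner" if gyration_radius(top) > 0.5 * gyration_radius(venues) else "Explorer"

# Toy users (made up): one moves mostly between two venues, the other roams widely.
returner = [(10, (0.0, 0.0)), (10, (0.0, 1.0)), (1, (0.0, 0.5))]
explorer = [(5, (0.0, 0.0)), (5, (0.0, 0.01)), (4, (0.0, 10.0)), (4, (10.0, 0.0))]
print(classify(returner), classify(explorer))
```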
Finally, the vector $d_u^{(dest)} = (m \mid v_{class} \mid v_{home} \mid v_{dest})$ describing a user $u$ visiting a particular country $dest$ is a concatenation of the previous vectors.
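The concatenation can be illustrated with a short sketch. The two-position encoding of $v_{class}$ and the country list are assumptions made for the example; the text only states that these vectors are binary and that countries are one-hot encoded.

```python
def one_hot(value, vocabulary):
    """Binary vector with a single 1 at the position of `value`."""
    return [1 if item == value else 0 for item in vocabulary]

def mobility_descriptor(m, user_class, home, dest, countries):
    """d = (m | v_class | v_home | v_dest) as one flat list."""
    v_class = one_hot(user_class, ["Returner", "Explorer"])   # assumed encoding order
    return list(m) + v_class + one_hot(home, countries) + one_hot(dest, countries)

# Toy example (made up): V = 2 categories and three countries.
countries = ["BR", "US", "JP"]
d = mobility_descriptor([0, 1, 1, 0], "Returner", "BR", "US", countries)
print(d)
```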

APPENDIX B PROFILE EXTRACTION
The main steps performed in the second block of PredicTour are shown in Algorithm 2.
It is divided into two parts: the task performed by SOM and the task performed by FCM. Algorithm 2 starts with a random initialization of all weights $w_k$, $k = 1, \ldots, K$. Then it enters the main loop by setting the training data and subsequently starts the inner loop. It randomly picks an input ($d_u^{(dest)}$) from the current training dataset. Following the "winner takes all" strategy, it identifies the output neuron with the most similar weights to the picked input. As an attempt to provide region maps in the output grid, it updates not only the winner's weights but also those of its neighborhood $N$. At each iteration, the process of updating weights is repeated for all inputs of the training dataset, which are chosen in random order without repetition. Then, the algorithm updates the neighborhood size (usually using a non-increasing update function) and starts the next iteration. This process repeats until it reaches the maximum of 100 iterations. After finishing the SOM task, FCM receives the resulting set of weights as input for the second task of Algorithm 2. It starts by randomly initializing the membership degrees of all weights to $C$ clusters, with $C$ set as a parameter of FCM. The algorithm enters the main loop by computing the centroid $c_j$ of cluster $j$ according to (13).
The parameter $z = 2$ controls how fuzzy the clusters will be. The process is repeated for all clusters. Subsequently, the membership degree $\mu_{kj}$ of neuron $k$ in cluster $j$ is updated according to (14).
Algorithm 2, steps 4 to 21:
4: while $S \neq \emptyset$ do
5:   Randomly pick (without replacement) one input pattern $d_u^{(dest)} \in S$;
6:   Track the output neuron that produces the smallest distance $dist(d_u^{(dest)}, w_k)$ (the winner node);
7:   Update weight vectors of the winner's neighborhood $N$ to approximate them to the input;
8: end while
9: Update size($N$);
10: end while
// Task 2: FCM clusters the output map of SOM
Inputs: weights of SOM output neurons;
11: Randomly initialize the membership degrees;
12: while (FCM stop condition is FALSE) do
13:   for ($j = 1 : C$ number of clusters) do
14:     Compute the centroid of cluster $j$ as in (13);
15:   end for
16:   for ($k = 1 : K$ number of neurons) do
17:     for ($j = 1 : C$) do
18:       Update the membership degree of neuron $k$ to cluster $j$ according to (14);
19:     end for
20:   end for
21: end while
The update of the membership degree is performed for all $K$ neurons of the output map. The main loop repeats until the stop condition is achieved, i.e., when the maximum number of 100 iterations is reached or when the difference between the values of $J$ obtained by (15) in two consecutive iterations is arbitrarily small.
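The FCM task can be sketched as follows. Since (13) to (15) are not reproduced in this section, the code assumes the standard fuzzy C-means updates: centroids as averages weighted by $\mu_{kj}^z$, memberships from ratios of distances with exponent $2/(z-1)$, and $J$ as the weighted sum of squared distances. The tolerance `tol`, the seed, and the toy weights are hypothetical.

```python
import math
import random

def fcm(weights, C, z=2.0, max_iter=100, tol=1e-6, seed=0):
    """Cluster SOM weight vectors w_k into C fuzzy clusters.

    Returns (centroids c_j, memberships mu[k][j]).
    """
    rng = random.Random(seed)
    K, dim = len(weights), len(weights[0])
    # Random membership degrees, normalized so that each row sums to 1.
    mu = [[rng.random() + 1e-9 for _ in range(C)] for _ in range(K)]
    mu = [[v / sum(row) for v in row] for row in mu]
    previous_J = math.inf
    power = 2.0 / (z - 1.0)
    for _ in range(max_iter):
        # Centroid update: mean of the weights, weighted by mu_kj ** z.
        centroids = []
        for j in range(C):
            total = sum(mu[k][j] ** z for k in range(K))
            centroids.append([sum(mu[k][j] ** z * weights[k][d] for k in range(K)) / total
                              for d in range(dim)])
        # Distances, floored to avoid division by zero when a weight equals a centroid.
        dist = [[max(math.dist(weights[k], centroids[j]), 1e-12) for j in range(C)]
                for k in range(K)]
        # Membership update for every neuron k and cluster j.
        mu = [[1.0 / sum((dist[k][j] / dist[k][l]) ** power for l in range(C))
               for j in range(C)] for k in range(K)]
        # Objective J; stop when it changes by less than tol.
        J = sum(mu[k][j] ** z * dist[k][j] ** 2 for k in range(K) for j in range(C))
        if abs(previous_J - J) < tol:
            break
        previous_J = J
    return centroids, mu

# Toy SOM weights (made up): two well-separated groups of neurons.
toy = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]]
centroids, mu = fcm(toy, C=2)
print([row.index(max(row)) for row in mu])
```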
In summary, we assume a training dataset $\{d\}_{tr}$ of $N$ mobility descriptors. Then, $f_{SOM+FCM}$ provides the mapping $d_u^{(dest)} \rightarrow w_k \rightarrow c_j \rightarrow \mu_{kj}$, which characterizes, by setting the membership degree $\mu_{kj}$, how strongly $d_u^{(dest)}$ belongs to a profile with centroid $c_j$. The complete description of the technical parameters is available in the GitHub repository (footnote 3).
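The SOM stage of this mapping can be sketched with the standard library as follows. This is a deliberately simplified version of the first task of Algorithm 2: the constant learning rate, the hard (non-Gaussian) neighborhood, and the linear decay of the neighborhood radius are assumptions, and the actual parameters are those documented in the authors' repository.

```python
import math
import random

def winner(weights, x):
    """Index of the output neuron whose weights are closest to x."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))

def train_som(data, rows, cols, iterations=100, lr=0.5, seed=0):
    """Train a rows x cols SOM; returns one weight vector w_k per output neuron."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]   # random initialization
    radius0 = max(rows, cols) / 2.0
    for it in range(iterations):
        # Non-increasing neighborhood size (linear decay is an assumption).
        radius = max(radius0 * (1.0 - it / iterations), 0.5)
        S = list(data)
        rng.shuffle(S)                      # random order, without replacement
        for x in S:
            k_win = winner(weights, x)
            for k, pos in enumerate(grid):
                if math.dist(pos, grid[k_win]) <= radius:
                    # Move the winner and its grid neighbors toward the input.
                    weights[k] = [w + lr * (xi - w) for w, xi in zip(weights[k], x)]
    return weights

# Toy descriptors (made up): two groups of two-dimensional inputs.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som_weights = train_som(data, rows=3, cols=3)
print(winner(som_weights, [0.0, 0.0]), winner(som_weights, [1.0, 1.0]))
```

The returned weight vectors are what the FCM task receives as input, and `winner` is the lookup that maps a descriptor to its output neuron.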

APPENDIX C TOURISTS' MOBILITY PREDICTION
The main steps accomplished in the last block of PredicTour are described in Algorithm 3.

Algorithm 3 PredicTour Block3 Mobility Prediction
// Profile Identification
1: Obtain vectors $v_{home}$, $v_{dest}$;
2: Calculate the average $\tilde{m}_t$ of the queried/target tourist $t$ using (16);
3: Calculate $v_{class}$ of the queried/target tourist $t$;
4: Concatenate all vectors in $\tilde{d}_t = (\tilde{m}_t \mid v_{class} \mid v_{home} \mid v_{dest})$;
5: Input $\tilde{d}_t$ to SOM and obtain the winner output neuron;
6: Identify the profile $p$ of the winner neuron;
// Mobility Prediction
7: Calculate the profile signature $c_p$ using (17);
8: Calculate $\hat{m}_t^{(dest)}$ according to (1);
The algorithm starts by defining the origin and destination of the target tourist $t$. Then, it computes the remaining information of $t$, considering that:
• when $t$ has previous information, the algorithm calculates $\tilde{m}_t$ using (16) as the average of the mobility patterns of $t$ in other previously visited locations. It also calculates the classification vector $v_{class}$ (returner or explorer) as in Algorithm 1;
• when $t$ has no previous information, the algorithm computes $\tilde{m}_t$ using (16) based on the average mobility vectors of other tourists with the same origin and destination as $t$, and the classification vector $v_{class}$ is the average of the classifications of those tourists.
Formally, the tourist signature $\tilde{m}_t$ is calculated by (16).
$$\tilde{m}_t = \begin{cases} \sum_{l=1}^{L_p} m_t^{(l)} / L_p & \text{if } L_p \neq 0 \text{ ($t$ has historical data)} \\ \sum_{u=1}^{U_t} m_u / U_t & \text{otherwise} \end{cases} \tag{16}$$
where $m_t^{(l)}$ is the mobility vector of tourist $t$ visiting country $l$ among the $L_p$ countries previously visited by $t$, and $m_u$ is the mobility vector of every user $u$ in $U_t$, the set of tourists with the same origin and destination as $t$.
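The two cases of (16) translate directly into a short sketch. The function and argument names are hypothetical; `history` holds the mobility vectors of $t$ in previously visited countries, and `peers` holds the vectors of the tourists with the same origin and destination.

```python
def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def tourist_signature(history, peers):
    """Approximate mobility vector of target tourist t, following (16).

    history: mobility vectors of t in the L_p previously visited countries.
    peers:   mobility vectors of the U_t tourists sharing t's origin and destination.
    """
    if history:                      # L_p != 0: t has historical data
        return mean_vector(history)
    return mean_vector(peers)        # otherwise fall back to similar tourists

# Toy vectors (made up) with V^2 = 2 components for brevity.
print(tourist_signature([[1, 0], [0, 1]], peers=[[9, 9]]))
print(tourist_signature([], peers=[[2, 0], [4, 2]]))
```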
Further, the algorithm aggregates all information in the approximated vector $\tilde{d}_t = (\tilde{m}_t \mid v_{class} \mid v_{home} \mid v_{dest})$ of the target tourist $t$. Then it presents $\tilde{d}_t$ at the input layer of a trained SOM, which maps it into the output grid region. The output neuron with the most similar weights to the approximation $\tilde{d}_t$ is activated as the winner neuron. The profile $p$ of the winner neuron is chosen as the profile of $t$, and the profile signature $c_p$ is calculated by (17), with $N_p$ mobility vectors $d_i$ clustered in the same profile $p$, $m_i$ mobility vectors extracted from $d_i$, and $\mu_{ip}$ defined as the membership degree of the neuron fired when $d_i$ is presented at the SOM input. Joining $c_p$ calculated by (17) with $\tilde{m}_t$ calculated by (16), we obtain $\hat{m}_t^{(dest)}$ according to (1).