A Tourism Route-Planning Approach Based on Comprehensive Attractiveness

In recent years, “free travel” has become increasingly popular, and there is a strong need to plan personalized travel routes from the perspective of tourists rather than that of tourism intermediaries. However, related work ignores some factors that reflect tourists’ preferences, evaluates scenic spots incompletely, and seldom uses real data sets. We propose a novel route-planning method that comprehensively considers multiple factors (the distance between sites, the initial travel position, the initial departure time, the duration of the tour, the total cost, and the scores and popularity of sites) and rates routes by what we call a comprehensive attractiveness index. We conducted case studies based on real-world site data from the Baidu and Xiecheng websites and found that the proposed method is feasible. We also found that the genetic algorithm outperformed two baseline algorithms in terms of run time.


I. INTRODUCTION
With the continuing maturation of cloud computing and intelligent terminal technology, the personalized demands of users can increasingly be satisfied, and people can obtain a real-time travel route through mobile terminals such as a cell phone or Pad.
In recent years, the “free travel” tourism mode has become increasingly popular. How to plan personalized travel routes from the perspective of tourists (rather than that of tourism intermediaries) remains to be studied. Consider a scenario in which several undergraduates intend to visit Beijing for their graduation trip. Given their limited travel time and travel budget, how can they visit as many scenic spots as possible?
At present, several problems in the related research make it difficult to satisfy the demand for personalized tourism. Firstly, related work mainly considers the types of scenic spots, travel costs, and the distance between scenic spots. Other factors reflecting tourists' preferences, such as the starting time, the starting place, and the travel time, are ignored, although they might play a decisive role in the planning of travel routes. Individualized travel routes can be planned for different tourists once these factors are considered. Secondly, related work usually indicates the attraction of a single scenic spot by a score. Other people's evaluations of scenic spots are not taken into account, such as the Baidu Index (http://index.baidu.com/), the rating of scenic spots on the Meituan website (https://www.meituan.com/), and the number of photographs in the sight reviews on the Xiecheng website (https://www.ctrip.com/). These factors are vital in evaluating the attraction of scenic spots. Thirdly, few related works use real data sets for travel route planning.
The associate editor coordinating the review of this manuscript and approving it for publication was Ying Li.
In this paper, we consider not only travel costs, the distance between scenic spots, and other commonly used factors, but all the factors that tourists pay attention to. Only when all these factors (the opening times of the sites, the travel budget, the length of the tour, the initial travel location, and so forth) are considered can the planned travel routes satisfy tourists' demands. In addition, the popularity of scenic spots is measured mainly by the Baidu index (http://index.baidu.com/) and the number of photographs in the sight reviews on the Xiecheng website (https://www.ctrip.com/). This study therefore uses a comprehensive attractiveness index, which incorporates the attraction of scenic spots and tourism costs, as the target function. As constraints on the model, we considered many factors, including the opening times of the sites, the initial departure time, the initial travel position, and travel expenses. We tried to make the planned routes realistic and to satisfy the tourists' demands as much as possible. Finally, the experiments were based on the real data set we built. A genetic algorithm (abbreviated as GA) was adopted to achieve these goals and to plan travel routes more closely aligned with the real situation and with tourist demands.
The main contributions of this study can be summarized as follows. Firstly, the innovation lies in the consideration of multidimensional user preferences. We combined the opening time, the popularity of the sites, the travel duration, the initial departure time, and the initial travel position to construct a comprehensive attractiveness index that incorporates the attraction of sites as the target function, so the planned travel routes were closer to reality and better suited to tourists' demands.
Secondly, the popularity of the scenic spots was measured mainly by the Baidu index and the number of photographs in the sights reviewed on the Xiecheng website. Those criteria, based mainly on user comments and Internet search records, can better reflect the popularity of sites. Planned routes which use those criteria can better meet the demands of tourists in the actual situation and have strong practicability.
Thirdly, this study used real data on the classic sites in Beijing. The data were obtained from the Baidu and Xiecheng websites. There are 24936 photographs in the Xiecheng website reviews. Real data improves the effectiveness of the model and algorithm proposed in this paper.
The rest of the paper is organized as follows. Section II describes related works about travel route planning from three aspects. Section III introduces the research approach using a concept graph and some related definitions. A mathematical model is constructed in Section IV. Section V introduces a GA. Section VI analyzes the experiments and their results. The conclusion is presented in Section VII.

II. RELATED WORK
In 1959, the famous mathematician Dantzig proposed a solution to the traveling salesman problem (TSP) that found the shortest route to be used by salesmen to sell products in different cities. The solution has been widely used. The TSP is described as follows. Through coordinate calculation, the coordinates of a set of n cities and the distances between each pair of cities are known. A merchant is to start from and return to the same city using the shortest route and visiting each city only once [1].
In recent years, travel route planning has attracted a great deal of research attention from such fields as operations research, computer science and applications, graph theory, and mathematics [2]. The related works fall into three categories: work on the factors considered, research on modeling methods, and research on route-planning algorithms.

A. RELATED WORK OF CONSIDERING FACTORS
In [3], survey results and analysis are combined to determine the main factors influencing tourists' choice of scenic spots, including scenic spot types and a hard work index. That work establishes a relatively complete optimal route-planning model based on tourists' expectations, introduces the grey entropy evaluation method into the model, treats the influencing factors as multiple attributes of uncertain decision-making, and analyzes the evaluation indexes of scenic spots. The Dijkstra algorithm is then applied to obtain the optimal tourist route [3].
Lu Guofeng et al. designed a comprehensive scoring mechanism that can be regarded as a measure of attraction. The mechanism introduces three factors: the rating of scenic spots, the rating of the arrival time at scenic spots, and the rating of scenic spots' opening times. In that work these factors are defined separately and integrated with a general formula, and an improved greedy algorithm is used to plan more realistic tour routes [4]. An ensemble model that combines model-based and neighborhood-based collaborative filtering (CF) is proposed in [5] to address several defects that limit the application of CF-based methods. To fully utilize hidden features, [6] proposes a new matrix factorization (MF) model with deep feature learning that integrates a convolutional neural network (CNN); according to its authors, the model achieved consistently higher accuracy at both low and high data densities. A novel quality-of-service prediction approach based on probabilistic matrix factorization (PMF) is proposed in [7]; it can incorporate network location (an important factor in mobile computing) and implicit associations among users and services. Yuyu Yin et al. propose a novel service recommendation method that uses network location as context information and contains three prediction models based on random walks [8]. Chen et al. [9] propose a data-intensive service edge deployment scheme based on a genetic algorithm (DSEGA); their experimental results show that the algorithm achieves the shortest response time among the service, data components, and edge servers. Zhang et al. [10] use a strategy based on the density of Internet of Things (IoT) devices and the k-means algorithm to partition a network of edge servers, and propose an algorithm for the computation offloading decisions of IoT devices. Xiang et al. [11] focus on improving the performance of a service provisioning system by deploying and replacing services on edge servers. A cost-driven service composition approach for enterprise workflows, which employs formal verification to recommend appropriate services for abstract workflows, is proposed in [12]. To cope with the challenge of managing services, Gao et al. [13] present an extension of the data, information, knowledge, and wisdom architecture as a resource expression model to construct a systematic approach to modeling both entity and relationship elements.
R. Fdhila and W. Elloumi use their algorithm (pMOPSO) to solve the TSP with two conflicting objectives: minimizing the total distance traveled by a particle and minimizing the total time [14]. SA Bouziaren and B Aghezzaf propose an approach called TSPP, an extension of the traveling salesman problem (TSP) in which a prize is associated with each vertex. A branch-and-cut algorithm is used to find a route that simultaneously minimizes the tour length and maximizes the collected prize [15]. The work in [16] uses multi-objective ant colony systems to solve the single-depot multiple TSP (SD-MTSP), a simple extension of the standard TSP in which more than one salesman is allowed to visit the set of interconnected cities, such that each city is visited exactly once (by a single salesman) and the total cost of the traveled tours is minimized. The quality of points of interest and the traffic distance between them are the factors considered in the travel route optimization of [17]. In the proposed TripPlanner model, the choice of scenic spots takes into account not only the traffic time but also the quality of scenic spots and the preferences of tourists. Travel routes planned through TripPlanner need more traffic time, but they are more in line with the special demands of tourists [17]. The quality of scenic spots is also considered in [18], whose authors hold that the average quality of the scenic spots included in a route needs to meet certain standards. Through the ETOTP method, tourists can get the travel route with the shortest time whose points of interest meet the required average quality [18].
In summary, the most popular factors used in these related works include the users' ratings of scenic spots, the starting time of each scenic spot, and the traffic distance between scenic spots. Besides these popular factors, our work considered other important factors that are closely related to the real situation, including the number of pictures of each site in the Xiecheng website's reviews, the Baidu index, and the ticket prices of the sites.

B. RESEARCH ON MODELING METHODS
At present, establishing a mathematical optimization model is the main way to obtain the best travel route in travel route planning. Related works use two kinds of mathematical optimization model: the single-objective optimization model and the multi-objective optimization model.

1) SINGLE-OBJECTIVE OPTIMIZATION MODEL
The rating of scenic spots, the rating of the arrival time at scenic spots, and the rating of scenic spots' opening times are introduced in the scoring mechanism constructed by Lu Guofeng et al. These factors are combined into a single objective function to obtain more realistic travel routes [4]. R Necula solves the single-depot multiple TSP (SD-MTSP), which is a simple extension of the basic TSP. The objective function of the model is to minimize the total travel cost [16].

2) MULTI-OBJECTIVE OPTIMIZATION MODEL
R. Fdhila and W. Elloumi use their algorithm (pMOPSO) to solve the TSP with two conflicting objectives: minimizing the total distance traveled by a particle and minimizing the total time [14]. SA Bouziaren and B Aghezzaf propose a method called TSPP, an extension of the traveling salesman problem (TSP), and use a branch-and-cut algorithm to find travel routes that simultaneously minimize the tour length and maximize the collected prize [15]. Wang Yongzhen uses an improved grouping genetic algorithm to solve the multiple traveling salesman problem [19]. Based on the multi-objective evolutionary algorithm NSGA-II, a dual-objective evolutionary algorithm is designed in [20] to minimize both distance and cost. Chen Biao proposes an evolutionary multi-objective optimization method for the TSP: a bi-objective model with path length and average outlier distance as objectives is established and solved with an improved non-dominated sorting genetic algorithm (NSGA-II) [21]. Liang Xingxing constructs a multi-objective multiple traveling salesman problem model and seeks its Pareto solutions; the objectives are to minimize the number of traveling salesmen and to minimize their access paths. An improved multi-objective simulated annealing (IMOSA) algorithm and a traditional multi-objective genetic algorithm are used to solve the problem [22]. An improved particle swarm optimization is introduced into the quality-of-service evaluation of dynamic service composition to meet the mobility requirements of hybrid networks [23]. This method is used as a monitoring mechanism and can guarantee the availability and reliability of the service composition. Gao et al. [24] propose a service selection method for workflow reconfiguration based on interface operation matching.
In summary, both single-objective and multi-objective optimization models are used to obtain better travel routes, and neither kind of model is inherently preferable for this purpose. The choice of optimization model depends mainly on the relevant factors affecting travel route planning.

C. RESEARCH ON ROUTE-PLANNING ALGORITHM
Many studies have focused on optimizing the route-planning algorithm to improve planning results and reduce the duration of route planning. Qi and Thomas Weise provided two easy and general methods to represent the time-solution quality relationships of anytime algorithms: function fitting and artificial neural network training. Through a detailed case study on the TSP, they showed that such models have a wide variety of viable and easy-to-implement applications. Those models are particularly suitable for basic representation of experimental data for benchmarking, performance comparison, and algorithm behavior analysis [25].
Weichen Liu and Thomas Weise introduced the concept of hybridizing two local search (LS) methods by combining them with a crossover operator, and explored the usefulness of this method for solving the TSP. Their experiments led to three conclusions. The new LS-LS-X hybrids were faster than their pure LS algorithm and LS-LS hybrid components. Their new evolutionary computation (EC)-LS-LS-X hybrids were faster than the LS-LS-X algorithms and the EC-LS and EC-LS-LS hybrids. Different LS-LS hybrids had different suitable crossover operators [26]. Yuezhong Wu and Thomas Weise thoroughly compared Lin-Kernighan (LK), mixed neighborhood selection (MNS), and their hybrid versions with evolutionary algorithms (EAs) and population-based ant colony optimization (PACO). That was the first statistically sound comparison of the two efficient heuristics and their hybrids with EAs and PACO over time based on a large-scale experimental study. They not only showed that hybrid PACO-MNS and PACO-LK were both very efficient, but also found that a full runtime behavior comparison provided deeper and clearer insights than focusing on final results, which could lead to a deceptive conclusion [27]. A new hybrid method was proposed that uses particle swarm optimization to tune the parameters affecting the performance of the ant colony optimization algorithm. In addition, a 3-opt heuristic was added to the proposed method to improve local solutions. Experimental results showed that in most cases the proposed method, which used fewer ants than the number of cities for the TSP, had better solution quality and robustness than the compared methods [28]. Another paper [29] introduced a new hybrid nature-inspired approach based on honeybee mating optimization for solving the Euclidean TSP.
It showed that the honeybee mating optimization algorithm could be used in hybrid synthesis with other metaheuristics for the solution of the TSP with results remarkable for both quality and computational efficiency [30]. Also, Yin et al. [31] propose a group-wise itinerary planning framework to improve the mobile users' experiences and save travel cost.
The existing literature shows that many scholars have made great achievements in travel route planning, but there are still some problems. The first problem is that most scholars considered the improvement of the route-planning algorithm rather than the factors affecting the route. In real travel route planning, more factors must be considered to improve the resulting travel route. Specifically, this study considered the popularity of the sites, the length of the tour, the opening times of the sites, the initial departure time, and the initial travel place. Considering these factors can make the mathematical problems comply more closely with the real situation. The second problem is that some scholars used virtual data sets, which could not produce an algorithm as effective and applicable for real-life route planning as could real data on the sites. The third problem is that most scholars studied only the shortest-route-planning model, so the planned travel routes were not dynamically adjusted to the demands of tourists.

III. OVERVIEW
The concept graph in Fig. 1 clarifies the concepts of the study, whose aim was to determine travel routes to classic tourist destinations dynamically. As the graph shows, when users have different demands, such as initial travel positions, initial departure times, and tour durations, we use a genetic algorithm to recommend different top routes with high comprehensive attractiveness based on the information we obtained from websites. We get evaluation scores of sites from Meituan, the Baidu index from Baidu, photographs of sites from Xiecheng, and distances between sites from the Beijing map, and we use this information to maximize the comprehensive attractiveness index while meeting the users' diverse demands.

A. PROBLEM DEFINITION
Considering the dynamics of such factors as the initial travel positions, the initial departure times, and the durations of tours, we use a genetic algorithm to recommend different travel routes. For example, in Fig. 2, starting his travel at place A, Jack can choose route 1 (the green one) if he wants to minimize his cost, or route 2 (the orange one) if he wants a longer travel duration. Likewise, when the departure time is considered, he may choose yet another route to meet his schedule.

B. BASIC CONCEPTS
The following definitions introduce the basic concepts needed; the meaning of each symbol can be looked up quickly in Table 1.

Definition 1 (Site Information):
For every site $s \in S$, the information about site $s$ can be represented as a vector $s = (t, c, e)$, where $t$ is the stay time (duration) at the scenic site recommended by the Meituan or Baidu website for a user unfamiliar with the site, $c$ is the ticket price for the site, and $e$ is the Xiecheng website's rating for the site, ranging from 1 to 5.

Definition 2 (Travel Duration Between Every Pair of Sites):
The paper assumes $TT(s_i, s_j) = TT(s_j, s_i)$; that is, the travel durations to and from the sites are the same. In addition, the travel duration between two sites can be approximated by dividing the distance between the sites by a reasonable average speed, $V_{average}$. $TT(s_i, s_j)$ can be defined as (1):
$$TT(s_i, s_j) = \frac{d(s_i, s_j)}{V_{average}} \tag{1}$$
where $d(s_i, s_j)$ denotes the distance between sites $s_i$ and $s_j$.

Definition 3 (Travel Route):
We define a travel route as $tp = \langle s_f, s_{f+1}, \ldots, s_{f+n} \rangle$, which includes one or more sites; $n$ is the number of sites in a trip, that is, $|tp| = n$. For all $s_{f+2} \in S$, $f > 1$, $AT(s_{f+2})$ is the time at which the tourist reaches site $s_{f+2}$. It can be defined as (2):
$$AT(s_{f+2}) = AT(s_{f+1}) + t(s_{f+1}) + TT(s_{f+1}, s_{f+2}) \tag{2}$$
In Eq. (2), $t(s_{f+1})$ is the stay time (duration) at site $s_{f+1}$ recommended by the Meituan or Baidu website for a user unfamiliar with the site. The arrival time at $s_{f+2}$ is thus the sum of the arrival time at $s_{f+1}$, the duration of the stay at $s_{f+1}$, and the travel duration from $s_{f+1}$ to $s_{f+2}$. It is a cumulative calculation.

Definition 4 (Travel Duration):
Take $s_f$ as the starting point of the travel route. For a travel route $tp = \langle s_f, s_{f+1}, \ldots, s_{f+n} \rangle$, the travel duration $TPT(s_f, tp)$ indicates how long the tourist spends on the trip. It can be calculated using (3):
$$TPT(s_f, tp) = \sum_{k=0}^{n-1} TT(s_{f+k}, s_{f+k+1}) + \sum_{k=1}^{n} t(s_{f+k}) \tag{3}$$
The total time to visit $n$ sites starting at $s_f$ includes the travel duration from $s_f$ to the first site, the travel duration between every two consecutive sites in the travel route, and the sum of the stay durations at all the sites.

Definition 5 (Travel Cost):
For a travel route $tp = \langle s_f, s_{f+1}, \ldots, s_{f+n} \rangle$, the travel cost $TPC(tp)$, the amount of money the tourist will spend on the trip, can be calculated by (4):
$$TPC(tp) = \sum_{k=1}^{n} c(s_{f+k}) \tag{4}$$
where $c(s_{f+k})$ is the ticket price of site $s_{f+k}$ (Definition 1).
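The cumulative quantities in Definitions 3 to 5 can be illustrated with a minimal Python sketch. It assumes that the initial location has no stay time or ticket price and that travel durations are stored in a symmetric lookup table; the function names and the sample numbers are hypothetical and are not taken from the data set used in this paper.

```python
from typing import Dict, FrozenSet, List


def leg(travel: Dict[FrozenSet[str], float], a: str, b: str) -> float:
    """Symmetric travel duration TT(a, b) = TT(b, a), in hours."""
    return travel[frozenset((a, b))]


def arrival_times(route: List[str], stay: Dict[str, float],
                  travel: Dict[FrozenSet[str], float], depart: float) -> List[float]:
    """Arrival time at every stop, accumulated as in Eq. (2)."""
    times = [depart]  # the tourist leaves route[0] at the departure time
    for prev, cur in zip(route, route[1:]):
        # previous arrival + stay at the previous stop + travel to the next stop
        times.append(times[-1] + stay.get(prev, 0.0) + leg(travel, prev, cur))
    return times


def travel_duration(route: List[str], stay: Dict[str, float],
                    travel: Dict[FrozenSet[str], float]) -> float:
    """Total trip time: all travel legs plus all stays at visited sites."""
    legs = sum(leg(travel, a, b) for a, b in zip(route, route[1:]))
    stays = sum(stay.get(site, 0.0) for site in route[1:])
    return legs + stays


def travel_cost(route: List[str], ticket: Dict[str, float]) -> float:
    """Total ticket cost of the visited sites (the start has no ticket)."""
    return sum(ticket.get(site, 0.0) for site in route[1:])


# Hypothetical example: start at a station, then visit sites A and B.
ROUTE = ["station", "A", "B"]
STAY = {"A": 2.0, "B": 1.5}            # hours
TICKET = {"A": 40.0, "B": 60.0}        # ticket prices
TRAVEL = {frozenset(("station", "A")): 0.5, frozenset(("A", "B")): 0.25}

print(arrival_times(ROUTE, STAY, TRAVEL, depart=8.0))
print(travel_duration(ROUTE, STAY, TRAVEL), travel_cost(ROUTE, TICKET))
```

Keying the travel table by an unordered pair is one simple way to encode the symmetry assumption of Definition 2.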

Definition 6 (The Proportion of the Number of Photographs of a Given Site to That of all Candidate Sites):
For a site, the number of photographs on a travel website such as the Xiecheng site reflects the spot's popularity. Therefore, the ratio obtained by dividing the number of photographs of a certain site by the number of photographs of all the candidate sites can be used to reflect the popularity of the site. We used $PP(s_i)$, defined by (5), to represent popularity:
$$PP(s_i) = \frac{N_p(s_i)}{\sum_{s_j \in S} N_p(s_j)} \tag{5}$$
where $N_p(s_i)$ denotes the number of photographs of site $s_i$.

Definition 7 (The Proportion of the Baidu Index):
The Baidu index is composed of the search index $r(s)$ and the information index $z(s)$, both of which reflect the popularity of a site. Therefore, for one site $s$, its Baidu index, written here as $B(s)$, can be calculated by (6):
$$B(s) = 0.8\, r(s) + 0.2\, z(s) \tag{6}$$
The information index comprehensively measures netizens' passive attention to intelligently distributed and recommended content. The search index reflects the search volume of a keyword in search engines and represents the active interest of netizens.
We set the weight of the search index to 0.8 and the weight of the information index to 0.2, for two main reasons. Firstly, the search index represents the active attention of netizens, while the information index represents their passive attention. When planning travel routes, people usually obtain information on their own initiative, by browsing relevant travel notes or searching for information about scenic spots on the Baidu website, and obtain little information through intelligent distribution or recommendation. We therefore give the search index the higher weight. Secondly, in the real data set obtained from the Baidu website, the average value of the information index is 9 times that of the search index, as can be seen from Table 2. To weaken the impact of this gap in magnitude, we give the information index a lower weight.
To carry out normalization, we defined the proportion of the Baidu index of a site as $R(s)$, calculated by (7):
$$R(s) = \frac{B(s)}{\sum_{s' \in S} B(s')} \tag{7}$$

Definition 8 (Composite Index c(s)):
The composite index takes account of both the photograph ratio and the proportion of the Baidu index. The weights of these two factors were determined through questionnaires. By surveying 200 tourists with the Likert scale method, we found that the weights of the two factors are both around 0.5. The composite index reflects the popularity of sites more comprehensively. It can be calculated by (8):
$$c(s) = 0.5\, PP(s) + 0.5\, R(s) \tag{8}$$
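Definitions 6 to 8 amount to a small normalization pipeline, sketched below in Python under the weights stated in the text (0.8 and 0.2 for the search and information indices, 0.5 for each popularity share). The function name and the sample counts are hypothetical; they only illustrate the arithmetic.

```python
from typing import Dict


def composite_index(photos: Dict[str, float], search: Dict[str, float],
                    info: Dict[str, float], w_search: float = 0.8,
                    w_info: float = 0.2, w_photo: float = 0.5) -> Dict[str, float]:
    """Composite popularity c(s) from photo share and Baidu-index share."""
    total_photos = sum(photos.values())
    # Weighted Baidu index per site (search index weighted higher than information index)
    baidu = {s: w_search * search[s] + w_info * info[s] for s in photos}
    total_baidu = sum(baidu.values())
    return {
        s: w_photo * photos[s] / total_photos           # photo share PP(s)
        + (1.0 - w_photo) * baidu[s] / total_baidu      # Baidu-index share R(s)
        for s in photos
    }


# Hypothetical counts for two candidate sites.
PHOTOS = {"A": 300, "B": 100}
SEARCH = {"A": 1000, "B": 1000}
INFO = {"A": 5000, "B": 5000}
print(composite_index(PHOTOS, SEARCH, INFO))
```

Because both shares sum to 1 over the candidate sites, the composite values also sum to 1, which keeps every c(s) between 0 and 1.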

Definition 9 (Departure Time and Travel Duration):
The departure time at the initial location $s_f$ is the initial departure time $TI_f$, and the travel duration is $IN$. $TI_f$ can be set to 6:00, 7:00, 8:00, 9:00, and so on. $IN$ can be set to 8 h, 10 h, 12 h, or 14 h.
Definition 10: The time of arrival at a site s f+k can be determined using (9). It can be represented by a directed graph (Fig. 2).

Definition 11 (Initial Location):
The initial location s f is defined as the bus station, train station, or airport where the tourists start their trip.

Definition 12 (Cost Budget):
The cost budget of a trip is the sum of all the sites' ticket costs. Cost budget can be set to 100, 150, 200, 250, or 300.

IV. OPTIMIZATION PROBLEM MODELING
To plan travel routes that meet the demands of tourists, we considered multidimensional user preferences. In the model, these preferences were represented as target functions and constraints. This section describes how we combined these factors to build a comprehensive attractiveness index of the travel route as the target function and the corresponding constraint conditions to replace the real travel route planning problem with a mathematical model.

A. BASELINE ALGORITHMS

1) GREEDY ALGORITHM
A greedy algorithm is a heuristic that solves a problem without backtracking. It divides the solving process into several stages and builds an overall solution from the locally optimal choice made at each stage. A greedy algorithm is therefore appropriate when two conditions are satisfied: first, the problem can be decomposed into several stages of smaller problems; second, the overall solution can be assembled step by step from the locally optimal solutions of the stages. For some mathematical problems a greedy algorithm yields the global optimum, but for others it produces only a local optimum, although that solution is often close to the global one. This is the main shortcoming of the greedy algorithm. Its advantages are clear logic, short running time, and little code. Compared with the greedy algorithm, the simulated annealing algorithm has a certain probability of jumping out of a local optimum and tending toward the global optimum. Its advantages are a short running time and the possibility of reaching the global optimum; its disadvantage is that the parameters are not easy to tune. Although it can escape local optima, it usually requires multiple runs and in most cases still returns a local optimum.
Simulated annealing can jump out of a local optimum because it accepts a solution worse than the current one with a certain probability, so that the global optimum can be found by searching the neighborhood of the current solution, although the global optimum may not be obtained every time. In the implementation of simulated annealing used in this paper, the annealing probability is set to 0.64, which is a commonly used setting for tourism route planning problems.
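The acceptance mechanism described above can be sketched with the standard Metropolis criterion, written here for a maximization problem such as route attractiveness. This is a generic textbook rule, not necessarily the exact rule used in the experiments: the text does not specify how the annealing probability of 0.64 enters the algorithm, so the sketch does not try to reproduce that setting, and the function name and temperature parameter are assumptions.

```python
import math
import random


def accept_move(current_score: float, candidate_score: float,
                temperature: float, rng: random.Random) -> bool:
    """Metropolis rule for maximization: always accept an improvement,
    accept a worse candidate with probability exp(delta / temperature)."""
    if candidate_score >= current_score:
        return True
    delta = candidate_score - current_score  # negative for a worse candidate
    return rng.random() < math.exp(delta / temperature)


rng = random.Random(42)
# A worse candidate is accepted often at high temperature, rarely at low temperature.
hot = sum(accept_move(1.0, 0.9, 1.0, rng) for _ in range(1000))
cold = sum(accept_move(1.0, 0.9, 0.01, rng) for _ in range(1000))
print(hot, cold)
```

Lowering the temperature over time makes the search behave more and more like a greedy algorithm, which is why the cooling parameters are delicate to tune.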

B. COMPREHENSIVE ATTRACTIVENESS OF TRAVEL ROUTE

1) SCORING BASED ON RATINGS FOR SITES
We standardized the expectations of all sites so that they take values between 0 and 1. We defined the result as ECI, which can be calculated from (10):
$$ECI(s_{f+k}) = \frac{e(s_{f+k})/5 + c(s_{f+k})}{2} \tag{10}$$
where $e(s_{f+k})$ is the Xiecheng website's rating for the site $s_{f+k}$, and $c(s_{f+k})$ is the composite index of the site $s_{f+k}$ (Definition 8).

2) SCORING BASED ON TRAVEL DURATION TO SITES
We used the GPS coordinates on the Baidu map to calculate the distances between the sites by converting the standard geographic coordinates to actual distances. Then we calculated the travel duration between every two sites using the average speed (60 km/h) of vehicles inside and outside cities. We assumed that the shorter the travel duration to a site, the greater its appeal and the higher the allocated score. In the same way, we standardized the grading; the score was defined as DCI, which was between 0 and 1. We calculated DCI using (11) and (12):
$$DCI(s_{f+k}) = \frac{avg(TT(s_{f+k-1}, s_{f+k}))}{avg(TT(s_{f+k-1}, s_{f+k})) + TT(s_{f+k-1}, s_{f+k})} \tag{11}$$
$$avg(TT(s_{f+k-1}, s_{f+k})) = \frac{1}{n} \sum_{i=1}^{n} TT(s_{f+k-1}, s_{f+i}) \tag{12}$$
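The distance-based score can be illustrated with a short Python sketch. The text does not state which coordinate-to-distance conversion was used, so the haversine great-circle formula below is an assumption, as are the function names and the sample coordinates (arbitrary points in the Beijing area); the 60 km/h average speed comes from the text.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0     # mean Earth radius
AVERAGE_SPEED_KMH = 60.0     # average vehicle speed stated in the text


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two GPS points (assumed conversion)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def travel_hours(a: Coord, b: Coord) -> float:
    """Travel duration TT as distance divided by the average speed."""
    return haversine_km(a, b) / AVERAGE_SPEED_KMH


def dci(prev: Coord, target: Coord, candidates: List[Coord]) -> float:
    """Eq. (11): average travel time from prev, divided by that average
    plus the travel time from prev to the target site."""
    avg = sum(travel_hours(prev, c) for c in candidates) / len(candidates)
    return avg / (avg + travel_hours(prev, target))


# Hypothetical coordinates: a current position and two candidate sites.
HERE, NEAR, FAR = (39.90, 116.40), (39.95, 116.40), (40.40, 116.60)
print(dci(HERE, NEAR, [NEAR, FAR]), dci(HERE, FAR, [NEAR, FAR]))
```

A site whose travel time equals the average receives a score of 0.5; closer sites score above 0.5 and more distant ones below it.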

3) SCORING BASED ON OPENING TIME OF SITE
For each site $s_{f+k}$, we define a rating $\beta(s_{f+k}, TI_{f+k})$ over the 24 hours of a day, where $TI_{f+k}$ is the arrival time at the site $s_{f+k}$. $\beta$ was considered from two aspects. The first aspect was which time period was suitable for visiting the site. We knew the daily opening and closing times of each site, and a site could be visited only during its opening hours. The second aspect was which period during the opening hours was more attractive to the tourist. We assumed that tourists would have the most visiting time around the middle of the opening hours, regardless of early or late arrival, so the score should be highest at that time. In addition, the closer the arrival time was to the opening or closing time, the lower the interest of the tourists. Similarly, we normalized $\beta$ to lie between 0 and 1. $\beta(s_{f+k}, TI_{f+k})$ can be calculated by (13), where $time_{f+k,min}$ is the opening time of the site and $time_{f+k,max}$ is the closing time.

4) ATTRACTION SS(s_{f+k})
The attraction $SS(s_{f+k})$ of the site $s_{f+k}$ can be defined as (14):
$$SS(s_{f+k}) = \beta(s_{f+k}, TI_{f+k}) \cdot \left(\alpha \cdot ECI(s_{f+k}) + (1 - \alpha) \cdot DCI(s_{f+k})\right) \tag{14}$$
The higher the value of $\alpha$, the greater the influence of the rating data of the site and the comments of past tourists. The lower the value of $\alpha$, the greater the influence of the distance to the site.
The comprehensive attractiveness index of a travel route, which we termed TSS, was the sum of the sites' attractions. TSS can be defined as (15):
$$TSS(SS(s_i)) = \sum_{k=1}^{n} SS(s_{f+k}) \tag{15}$$
We wanted to choose the travel route with the maximum value of $TSS(SS(s_i))$ when all the constraints were satisfied.
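The way the three scores combine into a route-level index can be sketched in a few lines of Python. Two assumptions are made loudly here: the opening-time rating β is passed in as a precomputed number because its formula is not reproduced in this section, and the c(s) term inside ECI is read as the composite popularity index of Definition 8, since ECI is stated to lie between 0 and 1. All names and sample values are hypothetical.

```python
from typing import List, Tuple


def eci(rating: float, composite: float) -> float:
    """Rating-based score: rating e in [1, 5], composite (assumed) in [0, 1]."""
    return (rating / 5.0 + composite) / 2.0


def attraction(beta: float, eci_value: float, dci_value: float, alpha: float) -> float:
    """Eq. (14): the opening-time rating scales a weighted mix of ECI and DCI."""
    return beta * (alpha * eci_value + (1.0 - alpha) * dci_value)


def route_attractiveness(stops: List[Tuple[float, float, float]], alpha: float) -> float:
    """Route index TSS: sum of site attractions; each stop is (beta, ECI, DCI)."""
    return sum(attraction(b, e, d, alpha) for b, e, d in stops)


# Hypothetical route with two sites; alpha = 0.5 weighs ratings and distance equally.
STOPS = [(1.0, eci(4.5, 0.6), 0.7), (0.5, eci(4.0, 0.2), 0.4)]
print(route_attractiveness(STOPS, alpha=0.5))
```

Because β multiplies the whole bracket, a site reached outside its opening hours (β = 0) contributes nothing, whatever its rating or distance.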
The planned total travel duration, the sum of the travel and visiting durations, was subject to the time constraint:
$$TPT(s_f, tp) \le IN$$

3) BUDGET
The budget considered only the site ticket prices:
$$TPC(tp) \le cost$$

4) INITIAL LOCATION OF TRAVEL ROUTE s_f

D. MODEL
The purpose of the model was to find the travel route that satisfied the following constraints and maximized the comprehensive attractiveness index.
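The structure of the model, a constrained maximization, can be sketched in Python as follows. The sketch covers only what the text states: the time constraint sums travel and visiting durations, the budget counts ticket prices only, and the objective is the sum of the sites' attractions. The `Stop` class, its field names, and the sample values are hypothetical, and the attraction of each stop is assumed to be computed beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stop:
    name: str
    travel_h: float  # travel duration from the previous location (hours)
    visit_h: float   # visiting duration at the site (hours)
    ticket: float    # ticket price of the site
    ss: float        # attraction SS of the site, assumed precomputed


def is_feasible(route: List[Stop], max_hours: float, budget: float) -> bool:
    """Check the time and budget constraints for a candidate route."""
    total_hours = sum(stop.travel_h + stop.visit_h for stop in route)
    total_cost = sum(stop.ticket for stop in route)  # ticket prices only
    return total_hours <= max_hours and total_cost <= budget


def tss(route: List[Stop]) -> float:
    """Comprehensive attractiveness index: sum of the sites' attractions."""
    return sum(stop.ss for stop in route)


# Hypothetical two-site route, for illustration only
route = [
    Stop("site A", travel_h=0.5, visit_h=2.0, ticket=60.0, ss=0.6),
    Stop("site B", travel_h=0.25, visit_h=1.5, ticket=40.0, ss=0.5),
]
ok = is_feasible(route, max_hours=8.0, budget=100.0)
score = tss(route)
```

A solver then searches for the feasible route with the largest `tss` value.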

V. GENETIC ALGORITHM
This section shows how we used a genetic algorithm (GA) to solve the above model and determine the best travel route. After comparing the GA with the greedy and simulated annealing algorithms, we concluded that for this model, the GA was the most suitable for determining the best travel route.
A GA is a computational model based on the natural selection and genetic mechanisms of Darwin's theory of biological evolution. The model searches for an optimal solution by simulating the natural evolution process. It was first proposed by Professor J. Holland of the University of Michigan in 1975. With the publication of his influential monograph Adaptation in Natural and Artificial Systems, the genetic algorithm gradually became known. The GA proposed by Professor J. Holland is a simple version.
The basic process of solving a problem using a GA is as follows. First, the parameters of the problem are encoded to form a certain number of ''chromosomes.'' These chromosomes make up the initial population on which the algorithm operates. Next, better chromosomes are generated iteratively from the initial population through selection, crossover, and mutation. The chromosomes that best satisfy the target function eventually remain. In this study, we used a basic GA to plan travel routes. Algorithm 1 is the GA adopted in this paper.

Algorithm 1 Genetic Algorithm
Input: travel duration, IN; travel expenses, cost; place of departure, $s_f$; departure time, $TI_f$; number of sites in the travel route, n.
Output: travel route, tp; comprehensive attractiveness, $TSS(SS(s_i))$; run time.
1: initialize P(0);
2: t = 0; # t is the current generation; T is the maximum number of generations
3: while t <= T do
4:     for i = 1 to M do # M is the number of individuals in the initial population, M = 500
5:         Evaluate fitness of P(t);
6:     end for
7:     for i = 1 to M do
8:         Select operation to P(t);
9:     end for
10:    for i = 1 to M/2 do
11:        Crossover operation to P(t);
12:    end for
13:    for i = 1 to M do
14:        Mutation operation to P(t);
15:    end for
16:    for i = 1 to M do
17:        P(t+1) = P(t);
18:    end for
19:    t = t + 1; # while t <= T, continue from step 3
20: end while

Lines 1 and 2 initialize the algorithm. While the loop condition holds, the cycle in lines 3-20 is carried out. Lines 4-6 calculate the fitness of the population (the comprehensive attractiveness index). Lines 7-9 apply the selection operation to the population. Lines 10-12 apply the crossover operation to the population. Lines 13-15 apply the mutation operation to the population. Lines 16-18 make a roulette operation on the population. After the above operations, lines 19 and 20 output the best two groups (travel routes) in the population.
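The loop of Algorithm 1 can be sketched as runnable Python. This is a minimal sketch, not the authors' implementation: the text does not specify the chromosome encoding or the operators, so a route is assumed to be a list of distinct site indices, selection is roulette-wheel, crossover keeps a prefix of one parent and fills the rest from the other, and mutation either replaces or swaps a site. All function and parameter names are hypothetical, the fitness function is assumed to be non-negative, and the sketch returns the single best route seen.

```python
import random


def crossover(a, b, rng):
    """Keep a prefix of parent a, then fill with b's sites that are not used yet."""
    cut = rng.randrange(1, len(a))  # requires routes of at least two sites
    head = a[:cut]
    tail = [gene for gene in b if gene not in head]
    return head + tail[: len(a) - cut]


def mutate(route, n_sites_total, rng):
    """Replace one site with an unused one, or swap two positions (in place)."""
    i = rng.randrange(len(route))
    unused = [s for s in range(n_sites_total) if s not in route]
    if unused and rng.random() < 0.5:
        route[i] = rng.choice(unused)
    else:
        j = rng.randrange(len(route))
        route[i], route[j] = route[j], route[i]


def run_ga(n_sites_total, route_len, fitness, pop_size=500, generations=500,
           p_cross=0.8, p_mut=0.1, seed=0):
    """Basic GA; pop_size should be even so that parents pair up."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n_sites_total), route_len) for _ in range(pop_size)]
    best = list(max(pop, key=fitness))
    for _ in range(generations):
        # Evaluate fitness; the small constant keeps the total weight positive
        weights = [fitness(ind) + 1e-9 for ind in pop]
        # Roulette-wheel selection of parents
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                a, b = crossover(a, b, rng), crossover(b, a, rng)
            children += [list(a), list(b)]  # copies, so mutation never aliases
        for child in children:
            if rng.random() < p_mut:
                mutate(child, n_sites_total, rng)
        pop = children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = list(gen_best)
    return best, fitness(best)


# Toy usage with hypothetical attraction values and no constraints
site_scores = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3, 0.5]


def toy_fitness(route):
    return sum(site_scores[g] for g in route)


best_route, best_score = run_ga(len(site_scores), 3, toy_fitness,
                                pop_size=40, generations=30)
```

In a full planner, the fitness function would return the comprehensive attractiveness index of a route and, for instance, 0 for routes that violate the time or budget constraints; that penalty scheme is likewise an assumption.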

VI. EXPERIMENTS
To evaluate the performance of the GA and of the model based on users' multidimensional preferences, we conducted three sets of experiments on the travel route planning model. In the first set, we changed the number of sites and compared the results of the GA with those of other methods. In the second set, we changed the planned total travel duration constraint and compared the results of the GA with those of the other methods. In the third set, we changed the budget constraints and compared the results of the GA with those of the other methods.

A. EXPERIMENTAL DATA
The data used in the experiments were directly obtained from the Internet. This ensured the authenticity and validity of the data. The specific sources of the related data are as follows:
The longitudes and latitudes of 20 classic sites in Beijing were obtained from the Baidu map site at http://api.map.baidu.com/lbsapi/getpoint/index.html.
The photographs of sites in Beijing were extracted from the comments of various sites on the Xiecheng website (https://www.ctrip.com/). We extracted 24 936 photographs in total.
The detailed information (opening times, ticket prices, best visiting hours, and so forth) and the Baidu index of 20 classic sites in Beijing were obtained from the Baidu website.
The evaluation scores of 20 classic sites in Beijing were obtained from the Meituan website.

B. EXPERIMENT SETUP AND EVALUATION METRICS
We implemented the algorithms in Python 3.6 (64-bit). The experiments were run on a computer with an Intel Core i5-3317U CPU at 1.70 GHz and 4.00 GB of RAM.
The evaluation metrics used in this experiment were the run time (s) and the comprehensive attractiveness score of the travel routes obtained by each algorithm.
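The two metrics can be collected with a small harness such as the following Python sketch. It assumes that each planner is a callable returning a route and its comprehensive attractiveness score; the `evaluate` helper and the dummy planner are hypothetical, and the text does not say which timer the authors used.

```python
import time


def evaluate(planner, *args, **kwargs):
    """Run one planner and record its route, score, and wall-clock run time (s)."""
    start = time.perf_counter()
    route, score = planner(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return {"route": route, "score": score, "run_time_s": elapsed}


def dummy_planner(sites):
    """Placeholder planner: visits the sites in the given order."""
    return list(sites), 0.5 * len(sites)


result = evaluate(dummy_planner, ["site A", "site B"])
```

Running each algorithm through the same harness keeps the run-time comparison consistent across algorithms.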

C. EXPERIMENTAL RESULTS
The real data of 20 classic sites in Beijing were used to carry out the experiments. The GA was used to plan the travel routes under certain constraints. The baseline algorithms were the greedy and simulated annealing algorithms. The evaluation metrics were the run time and the comprehensive attractiveness score. This experiment verified the route-planning results of the three algorithms with various numbers of sites contained in the route, budget constraints, and planned total travel duration constraints.

1) EXPERIMENT SET 1: INFLUENCE OF NUMBER OF SITES IN THE ROUTE
The following experiments and the graphs from Fig. 4 to Fig. 7 show the effects of the number of sites in the routes on run time and comprehensive attractiveness. Fig. 4 shows that the run time of the GA was longer than that of the greedy and simulated annealing algorithms. Keeping other factors constant, the run times of the greedy and simulated annealing algorithms were basically unchanged as the number of sites included in the route increased. However, the run time of the GA increased because a larger number of sites prolonged the time needed to check the constraints. Fig. 5 shows that the comprehensive attractiveness score of the GA was higher than that of the greedy and simulated annealing algorithms. Thus, it can be seen that the GA can plan better travel routes than the other algorithms. In addition, as the number of sites in the travel route increased, the comprehensive attractiveness scores of the three algorithms increased. This is in line with the real situation, where tourist satisfaction is improved by visiting more sites within a limited period.
Because the GA ran for 500 generations, its final comprehensive attractiveness score was also higher than that of the other two algorithms. However, it is more informative to compare the run time of the greedy algorithm with the run time that the GA needed to achieve the same comprehensive attractiveness score as the greedy algorithm. In Fig. 6, it can be seen that this difference grew as the number of sites included in the route increased. Fig. 7 shows that, in most situations, the run time of the GA was much longer than that of the simulated annealing algorithm when the GA achieved the same comprehensive attractiveness score as the greedy algorithm. Thus, it can be concluded that the GA compares well with the simulated annealing algorithm in comprehensive attractiveness score.
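One way to measure the run time the GA needs to match a baseline score is to stop the clock at the first generation whose best score reaches that target. The Python sketch below assumes the GA exposes its best-so-far score once per generation as an iterable; the `time_to_target` helper is hypothetical, and the text does not describe how the authors took this measurement.

```python
import time


def time_to_target(score_stream, target):
    """Return (generation, seconds) when the score first reaches the target.

    score_stream yields the best-so-far score after each generation.
    The generation is None if the target is never reached.
    """
    start = time.perf_counter()
    for generation, score in enumerate(score_stream, start=1):
        if score >= target:
            return generation, time.perf_counter() - start
    return None, time.perf_counter() - start


# Hypothetical best-so-far scores of a GA run and a baseline score of 0.5
generation, seconds = time_to_target(iter([0.2, 0.5, 0.9]), target=0.5)
never, _ = time_to_target(iter([0.2, 0.3]), target=0.5)
```

Passing a generator lets the timing include the work of each generation, since the scores are produced lazily while the clock runs.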

2) EXPERIMENT SET 2: INFLUENCE OF TIME CONSTRAINT
The following experiments and the graphs from Fig. 8 to Fig. 11 show the effects of the planned travel duration on run time and comprehensive attractiveness. Fig. 8 shows that the run time of the GA was longer than that of the greedy and simulated annealing algorithms. However, the run times of the three algorithms were basically unchanged as the planned total travel duration constraint increased. Fig. 9 shows that the comprehensive attractiveness score of the GA was higher than those of the other algorithms. Thus, it can be seen that the GA can plan the best travel routes. In addition, as the planned total travel duration constraint increased, the comprehensive attractiveness scores of the travel routes obtained from the greedy algorithm and the GA remained basically unchanged. However, the comprehensive attractiveness score of the simulated annealing algorithm fluctuated with the increase in planned total travel duration and became stable after reaching a certain value. This is due to the limit on the number of sites in the travel route. Fig. 10 shows that the run time the GA needed to achieve the same comprehensive attractiveness score as the greedy algorithm was close to that of the greedy algorithm. From Fig. 11, it can be seen that, in most situations, the run time the GA needed to achieve the same comprehensive attractiveness score as the simulated annealing algorithm was shorter than that of the simulated annealing algorithm. It can be concluded that the GA planned better routes than the simulated annealing algorithm based on the run times and comprehensive attractiveness scores.

3) EXPERIMENT SET 3: INFLUENCE OF BUDGET CONSTRAINT
The following experiments and the graphs from Fig. 12 to Fig. 15 show the effects of cost on run time and comprehensive attractiveness. Fig. 12 shows that the run time of the GA was longer than that of the greedy and simulated annealing algorithms. Also, the run times of the three algorithms remained basically unchanged as the budget constraint on the travel route increased.
The comprehensive attractiveness score of the route obtained by the GA was higher than that of the other algorithms. In addition, the comprehensive attractiveness scores of the greedy and simulated annealing algorithms were quite close, as seen in Fig. 13. From this, we can see that the GA can plan the best travel route. In addition, with an increasing budget constraint, the comprehensive attractiveness score of the route obtained from the greedy algorithm remained basically unchanged; however, the comprehensive attractiveness scores of the GA and simulated annealing algorithms fluctuated. Fig. 14 seems to show that the run time the GA needed to achieve the same comprehensive attractiveness score as the greedy algorithm was longer than that of the greedy algorithm. Fig. 15 seems to show that, in most situations, the run time the GA needed to achieve the same comprehensive attractiveness score as the simulated annealing algorithm was much shorter than that of the simulated annealing algorithm. It can be concluded that the GA's run time and comprehensive attractiveness score were better than those of the simulated annealing algorithm.
In summary, the results of the three sets of experiments show that the GA's run time was longer than that of the greedy and simulated annealing algorithms, but the comprehensive attractiveness score of the GA's travel route was far higher than that of the other two algorithms. Therefore, the GA was slightly slower, but it planned travel routes that would satisfy tourists. In addition, considering the run time and the comprehensive attractiveness score together, the run time the GA needed to achieve the same comprehensive attractiveness score as the greedy algorithm was much longer than the greedy algorithm's run time. However, the run time of the GA was close to the run time of the simulated annealing algorithm. Therefore, the GA was better than the greedy and simulated annealing algorithms for the two evaluation metrics used in travel route planning.
Although the dataset may not be large, we hold the view that this matters little, because its size affects only the run time and has almost no influence on the comprehensive attractiveness index.

VII. CONCLUSION
Most travel route planning research focuses only on planning the shortest route and optimizing the TSP algorithm. However, in real life, the best route needs to satisfy the tourists' personalized demands. Therefore, this research was based on the multidimensional preferences of tourists, including travel distances among sites, the popularity and evaluation scores of sites, the total travel duration and cost, the initial departure time, and the initial travel location. We constructed a comprehensive attractiveness index as the target function of the model. The GA was then used to run experiments on real data from 20 classic sites in Beijing and to obtain the corresponding travel routes. Our work considered multidimensional tourist preferences to improve the travel routes, but future improvements can be made in three aspects. The first would be to combine the personal preferences of different tourists and then plan routes that meet those interests [32]. Second, the GA should be improved to reduce its run time. This would spare tourists long waits and improve their satisfaction [33]. Third, more experiments are needed using more classic sites in Beijing or other cities to verify the effect of the model and the GA.
YANMEI ZHANG received the Ph.D. degree in 2010. She was a Visiting Scholar with Northwestern University, Evanston, IL, USA, in 2007 and 2017. She is currently an Associate Professor with the Information School, Central University of Finance and Economics (CUFE), Beijing, China. Her current research interests include service computing, personalized recommendation systems, and web information processing.
LINJIE JIAO is currently pursuing the master's degree with the School of Information, Central University of Finance and Economics, China. Her current research interests include POI recommendation and tourism route planning.
ZHIJIE YU is currently pursuing the bachelor's degree with the School of Information, Central University of Finance and Economics, China. His current research interests include POI recommendation and services mashups.
ZHENG LIN was born in 1975. She is currently a Professor with the School of Information, Central University of Finance and Economics, China. Her current research interests include enterprise information systems and electronic commerce.
MENGJIAO GAN is currently pursuing the bachelor's degree with the School of Information, Central University of Finance and Economics, China. Her current research interests include recommendation systems and services computing. VOLUME 8, 2020