Tour Route Planning Algorithm Based on Precise Interested Tourist Sight Data Mining

To address current problems in tour route planning, this research proposes a tour route planning algorithm based on the data mining of precise interested tourist sights. It studies tourist interests and needs, tourist sight feature attributes, the data mining of precise interested tourist sights, and optimal tour route planning, and then designs and performs an experiment to verify the feasibility and superiority of the algorithm. First, a tourist interest and need label matrix and a tourist sight feature attribute matrix are set up. A tourist sight attribute factor is formed by mining text encyclopedic data and, combined with other factors, a tourist sight clustering algorithm is set up to output tourist sight clusters. Second, a data mining algorithm for precise interested tourist sights is set up. It outputs the optimal tourist sights, which not only match the tourists' interests and needs but are also optimally distributed so as to produce the lowest travel costs. Third, based on the precise interested tourist sights and combined with geographic information data, traffic information data and tourist sight information data, an optimal tour route planning algorithm based on interest labels is set up. It outputs a maximum heap and a complete binary tree of interval motive values in descending order, together with the corresponding edge clipping circle, which directly reflect how well each tour route satisfies tourists. The tourist sight domain and geographic spatial data of Zhengzhou city are collected as basic data for the experiment, which verifies the algorithm's feasibility, effectiveness and practicality.
Meanwhile, the experiment introduces three commonly used route planning algorithms as a control group and compares them with the developed algorithm in terms of optimal route, motive value, motive value difference, algorithm performance, guide map, etc. The comparison shows that the developed algorithm is superior to the traditional algorithms. The research indicates that the developed algorithm solves the problems in tour route planning and is effective, feasible and practical.


I. INTRODUCTION
Tour route planning is one of the most important research topics in smart tourism. Setting up a smart tourist sight and tour route recommender system is an indispensable component of smart tourism construction. Its purpose is to provide recommendation and decision support for tourists, especially for tourists who are not familiar with the tourism city and its tourist sights. These tourists have different tourism purposes and motives, and they hope that a smart recommender system will give them the optimal tourist sights and tour route that meet their interests and needs, so that they get the best motive benefit satisfaction. Currently, research on tour route planning mainly focuses on computer algorithm modeling and on optimization through simulation experiments. Most studies work on realizing the shortest path: they simply and mechanically cascade several or more tourist sights along a path and then output the shortest travel distance and duration for the whole trip. Such methods are concerned with the execution efficiency and running space of the algorithm but neglect the basic principle that tourists are the main subject of tourism activity. Tourists' specific interests and needs, as well as their motive benefits, should be the critical factors of tour route planning, and the tourist sights and routes should fit these factors. This is precisely the problem that current methods neglect and that should be studied in depth. Whether the interests and needs are satisfied is the standard by which the success of tourist sight and tour route planning is judged. Thus, this research focuses on tourists' interests and needs and aims to solve the current problems in tour route planning.
The associate editor coordinating the review of this manuscript and approving it for publication was Yi Zhang.
The current research is summarized and analyzed as follows. Literature [1] summarizes and analyzes the existing tourism route algorithms, focusing on a large number of real tourist sight attributes and user constraints, and discusses the existing problems and development prospects of tourism planning research. Literature [2] studies a heuristic algorithm based on a clustering method for public transport modes. Combining various factors, it calculates the travel time of tour routes and obtains the optimal solution. Literature [3] uses an emotional perception method to provide personalized tourist sight and tour route recommendation. It recommends tourist sights that match tourists' interests according to emotional evaluation index data, and finally plans the optimal tour route. That research focuses on tourists' emotional perception. Literature [4] introduces the application and research progress of recommender systems (RSs) in the tourism industry. It focuses on the classification of mobile RSs and provides opinions on their service application in the tourism industry. Literature [5] makes a comprehensive analysis of the research on tour route recommendation. It proposes a general classification of related tourism research and discusses the whole process of tour route recommendation research. Literature [6] analyzes individual tour planning systems and summarizes their functions, characteristics and problems. References [7] and [8] study the application of the orienteering problem and its methods in tour route planning. They discuss exact solution methods and metaheuristic methods. Reference [9] designs and develops a tour route planning tool that provides each tourist with the itinerary most suitable for their needs. The factors considered in the tool include the time schedule, the expected activities and the characteristics of the tourism area. Interactive multiple-criteria and mathematical modeling methods are used to realize the tour route planning.
Literature [10] proposes the "filter-first, tour-second" framework. Individual tourist sights and tour routes are obtained from social media. A collaborative filtering method is used to select the interested tourist sights. Considering various factors, an iterated tabu search algorithm is used to plan the visiting sequence of tourist sights and provide tour routes for tourists. In literature [11], a non-dominated sorting heuristic method combining ant colony optimization and a differential evolution algorithm is designed to plan tour routes.
An analysis of the current research on tour route planning shows that the following problems should be studied in further depth. First, the research on tourists' feature attributes and interests is insufficient, although this is the principal condition for meeting motive benefit satisfaction. Without thorough research on tourists' feature attributes and interests, an optimal tour route cannot be planned [12]-[16]. Second, the research on typical tourist sights of a tourism city is insufficient: the feature attributes and functions by which tourist sights match tourists' interests and needs are not studied, and tourist sights are not clustered. This is the second condition for meeting motive benefit satisfaction. Without thorough research on tourist sight feature attributes and functions, the precise tourist sights that can meet the needs of tourists cannot be mined, and these precise tourist sights are the necessary nodes for planning optimal tour routes [13], [17]-[19]. Third, there is no research on the data mining of precise interested tourist sights, which is the precondition for planning tour routes. The tourist sights provided for tourists should all meet the tourists' interests and needs, avoiding any unexpected one during the trip. If the recommender system provides tourists with unexpected tourist sights, the degree of satisfaction over the whole trip will decrease greatly, and the tour route experience will be negatively affected. Fourth, the method of outputting the shortest distance and duration of the whole trip considers only the optimization of a computer algorithm, which simply and mechanically cascades plenty of tourist sights.
The drawback of this method is that it is feasible only in computer simulation and not in real-world circumstances, because travel activities are carried out in actual geographic space and are influenced by many real-world factors. That is, this method neglects city geographic information data, traffic information data, tourist sight information data, etc., all of which influence motive benefit satisfaction. Meanwhile, cascading plenty of tourist sights in one route is also impractical: tourists cannot visit all of the tourist sights at once in limited time, nor is it realistic for them to stay in the city for enough days just to visit all of the tourist sights. The most likely and feasible situation is that tourists, depending on their interests and needs, visit a limited number of tourist sights that match those interests and needs and bring them the best motive satisfaction in limited time, and that the whole trip suits their physical condition, time schedule, tour expense, etc. [20]-[24]. According to this analysis, this research takes tourists' interests and needs as the precondition and also ensures that the planned tour routes fit the real-world tour process. The research results can provide technical support for smart recommender systems and be applied in tourism activity.
VOLUME 8, 2020
To solve the problems mentioned above, this article proposes a tour route planning algorithm based on the data mining of precise interested tourist sights. First, to address the insufficient research on tourists' feature attributes and interests, a tourist interest and need label matrix is set up, through which tourists' interests and needs are quantified to precisely describe their motives and purposes. This is the first condition for planning tourist sights and tour routes. Second, to address the insufficient study of typical tourist sights in a tourism city, this research sets up the clustering objective function $\sigma(c_r, c_{\neg r})$ based on parameter factors such as text encyclopedia data mining, attraction index, optimal visiting time and tour expense, and then sets up the tourist sight clustering algorithm model. This model is the precondition for the data mining of precise interested tourist sights, and it provides the knowledge of how tourist sights meet the needs of tourists. Third, to address the lack of research on tourist sight mining, this article puts forward a precise interested tourist sight mining algorithm based on interest labels and geographic location. Its key idea is to search, among the tourist sights that match tourists' interests and needs, for those nearest to the starting point within a certain neighbourhood buffer. As a result, all the selected tourist sights not only meet the tourists' interests and needs but also have the optimal geographic distribution, which makes the tour expense lowest.
Fourth, to address the problem of tour route planning, the developed algorithm combines geographic information data, traffic information data and tourist sight information data, and takes the mined precise interested tourist sights as nodes, to output tour routes that not only meet the interests and needs but also bring the best motive benefits [25]-[29]. In summary, the research emphasizes tourists' interests and needs, tourist sight feature attributes, the data mining of precise interested tourist sights and optimal tour route planning, and it aims to satisfy tourists. The main structure, research contents and methods are as follows.
The first section: Introduction. This section analyzes the background of tour route planning research and the problems to be solved. To address these problems, it proposes a tour route planning algorithm based on the data mining of precise interested tourist sights. The structure, research contents and methods are also introduced in this section.
The second section: Tourist sight clustering algorithm model based on interest mining. This section studies tourists' interests and needs as well as the feature attributes of city tourist sights, on the basis of which the tourist sight clustering algorithm is designed and set up. This is the precondition for the data mining of precise interested tourist sights. First, the interest label matrix is set up, and tourists' specific interests and needs are quantified through the motive factor indexes. Second, tourist sight feature attributes are standardized in accordance with the interest labels to set up the tourist sight feature attribute matrix. Based on the tourist interest label matrix and the tourist sight feature attribute matrix, the tourist sight attribute factor is obtained by mining tourist sight text encyclopedia data and, combined with attraction indexes, optimal visiting time, tour expenses, etc., the tourist sight clustering objective function and the related algorithm are designed to generate tourist sight clusters. Thus, the tourist interest label matrix, the tourist sight feature attribute matrix and the tourist sight clusters are the precondition for the data mining of precise interested tourist sights.
The third section: Optimal tour route planning algorithm based on interest label. This section includes two research parts. The first part sets up the precise interested tourist sight mining algorithm. Based on the tourist interest label matrix, the tourist sight feature attribute matrix and the tourist sight clusters, a precise tourist sight mining algorithm based on interest labels and geographic location is designed. The basic idea of the algorithm is to search, within the tourist sights that meet tourists' interests and needs, for those with the optimal geographic distribution. This ensures that the selected tourist sights not only meet tourists' interests and needs but also incur the lowest tour expense because they are nearest to the starting point. In other words, among the tourist sights that fit the conditions, the nearer the better. The second part, based on the mined tourist sights and combined with geographic information data, traffic information data and tourist sight information data, sets up the optimal tour route planning algorithm to output tour routes that meet tourists' interests and needs while bringing the best motive benefits. In the whole route, each tourist sight is a precise interested one for the tourists, and the tour routes accord with real-world environment and conditions. Thus, the algorithm can be used as a key technique for developing smart recommender systems for tourism activity.
The fourth section: Experiment and example analysis. In this section, an experiment is designed to verify the algorithm's feasibility and practicality. A control group of algorithms is designed for comparison with the experimental group algorithm in terms of output optimal tour routes, motive values, motive value differences, algorithm performance, and tour guide maps. In the experiment, the tourist sight research domain is determined in order to collect, calculate and obtain the basic experimental data, and tourist sight clusters are generated. Through the data mining algorithm for precise interested tourist sights, the interest tourist sight basic vector is obtained to determine the tourist sights to be visited. Taking these tourist sights as nodes, the developed algorithm outputs tour routes and motive values. Meanwhile, the motive value maximum heap, the complete binary tree and the corresponding edge clipping circles are also obtained, from which the optimal tour route sequence can be visualized and the best routes determined. The experiment sets three traditional and common route planning algorithms as the control group and outputs the top four tour routes of each. The two groups of algorithms are compared and analyzed in terms of sub-interval motive value, interval value, time complexity and space complexity to draw the experimental conclusion. This research work is also compared with some current research works in the literature to identify similarities and differences.
The fifth section: Conclusion and future work. This section summarizes the whole research work, the research method and the experimental results, and finally outlines the future work.

II. TOURIST SIGHT CLUSTERING ALGORITHM MODEL BASED ON INTEREST DATA MINING
In a tour process, to make sure that tourists get the best motive benefit satisfaction, a smart recommender system should consider tourists' individualized interests and needs, that is, the experience they most expect to get in the tour process, and its internal algorithm should be designed on this basis. Usually, tourists tend to choose a tourism city and tourist sights that they have not visited, to satisfy the desire to explore the unknown world. Therefore, when scheduling the trip, they are usually concerned with three issues: first, what interests and needs should be satisfied during the trip; second, whether the tourism city and tourist sights can satisfy those interests and needs; and third, how to satisfy them. If tourists get the best motive benefit satisfaction on these three issues, they will finally get the best motive satisfaction. This is what a smart recommender system is most concerned with when selecting tourist sights and planning tour routes, and it is also the purpose of smart tourism construction in providing individualized service [30]-[32]. To address these issues, the research obtains tourists' interests and needs, matches the interest tourist sights and plans the optimal tour routes. This is the key process in setting up the internal algorithms of a smart recommender system. Generating the tourist sight clusters in accordance with feature attributes is the precondition for setting up the critical algorithms.

A. FOUNDATION OF TOURIST INTEREST AND NEED LABEL MATRIX
Usually, tourists are not familiar with the tourism city and its tourist sights when they plan their trips, but they have definite interests and needs. According to the factors of most concern when making a tour schedule, interests and needs can be divided into two levels, and all the divided interests and needs can be coded and stored as labels [33]-[35]. Here is the first group of definitions.
Def 1.1: Interest motive factor $T_{(i)}$. A single factor that represents the tourists' interests and needs, and that determines which tourist sight classification and which specific tourist sight the smart recommender system searches for when tourists plan their schedule, is called an interest motive factor $T_{(i)}$, where $i$ is the serial number of the factor. The conditions for searching a certain tourist sight classification and a specific tourist sight are determined by $\alpha$ interest motive factors $T_{(i)}$, where $\alpha$ is the maximum number of factors and $i \in (0, \alpha] \subset \mathbb{Z}^+$. Each interest motive factor $T_{(i)}$ is called one interest label.
Def 1.2: Interest motive factor feature attribute $T_{(ij)}$. A feature attribute that is owned by a single interest motive factor $T_{(i)}$ and that determines its character as well as tourists' specific interests and needs is called an interest motive factor feature attribute $T_{(ij)}$. According to the definition, each interest motive factor $T_{(i)}$ has its own feature attributes, and the maximum number of feature attributes $\beta_{(i)}$ varies among different factors, $j_{(i)} \in (0, \beta_{(i)}] \subset \mathbb{Z}^+$.

Def 1.3: Interest and need label set $T$ and subset $T_{(i)}$. The vector formed by all interest motive factors $T_{(i)}$ is called the interest and need label set $T$. The maximum number $\alpha$ of contained interest motive factors is called the rank of the interest and need label set $T$, noted as $\mathrm{rank}(T)$. Each interest motive factor $T_{(i)}$ contains a certain number of feature attributes $T_{(ij)}$. The matrix formed by the feature attributes $T_{(ij)}$ that represent the interest motive factor $T_{(i)}$ is called the interest and need label subset $T_{(i)}$. The maximum number $\beta_{(i)}$ of contained interest motive factor feature attributes $T_{(ij)}$ is called the rank of the interest and need label subset $T_{(i)}$, noted as $\mathrm{rank}(T_{(i)})$.
Def 1.4: Interest and need label matrix $T_{(\alpha\beta)}$. The matrix formed by the $\alpha$ interest and need label subsets $T_{(i)}$ and their elements is called the interest and need label matrix $T_{(\alpha\beta)}$. Row $i$ of matrix $T_{(\alpha\beta)}$ stores the elements of the interest and need label subset $T_{(i)}$, and column $j$ stores the $j$-th feature attribute $T_{(ij)}$. Different subsets $T_{(i)}$ have different ranks and feature attributes; thus, any two rows or any two columns of matrix $T_{(\alpha\beta)}$ are linearly independent of each other. The row rank of matrix $T_{(\alpha\beta)}$ is $\alpha$, and its column rank is $\max \beta_{(i)}$. If the rank of the interest and need label subset $T_{(i)}$ stored in a certain row meets the condition $\mathrm{rank}(T_{(i)}) < \max \beta_{(i)}$, then $\mathrm{rank}(T_{(i)})$ elements of the row are interest motive factor feature attributes $T_{(ij)}$ and the other $\max \beta_{(i)} - \mathrm{rank}(T_{(i)})$ elements are 0, as shown in formula (1). In the schematic matrix, the first row has three elements, the second row has $\beta_{(2)}$ elements, and row $i$ has $\beta_{(i)}$ elements. In each row, the first $\beta_{(i)}$ elements are non-zero and all supplementary 0 elements are placed at the end.
$$T_{(\alpha\beta)} = \begin{bmatrix}
T_{(11)} & T_{(12)} & T_{(13)} & \cdots & 0 & 0 \\
T_{(21)} & T_{(22)} & T_{(23)} & \cdots & T_{(2\beta_{(2)})} & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
T_{(i1)} & T_{(i2)} & T_{(i3)} & \cdots & T_{(i\beta_{(i)})} & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{bmatrix} \tag{1}$$
When selecting tourist sights to visit, tourists are usually concerned with tourist sight classification, popularity, optimal visiting time and basic expense, which are the key factors in the selection. Accordingly, the tourist interest motive factors $T_{(i)}$ are: $T_{(1)}$: tourist sight feature; $T_{(2)}$: tourist sight attraction index (noted as $a$), $a \in (0, 1) \subset \mathbb{R}^+$; $T_{(3)}$: optimal visiting time (noted as $t$, unit: hour); $T_{(4)}$: basic tour expense (cost, noted as $c_o$, unit: ¥ yuan). The four factors form the four labels. Tourist sight feature and relevancy degree are obtained by text mining, and they represent the tourists' interest tendency toward tourist sight clusters. This is the first precondition for tourists to select tourist sight clusters.
The tourist sight attraction index is a comprehensive value, and it represents tourists' interest tendency toward one tourist sight within a certain tourist sight classification. The tourist sight visiting time is the time tourists are expected to stay in one tourist sight. The basic tour expense is the lowest fee that tourists must pay for the tourist sight, such as the ticket fee. It does not include other individual expenses during the trip, which cannot be calculated. According to the definition, the interest and need label set $T$ is formed as $T = \langle T_{(1)}, T_{(2)}, T_{(3)}, T_{(4)} \rangle$. Each label has interest motive factor feature attributes $T_{(ij)}$. Each interest and need label $T_{(i)}$ and its feature attributes $T_{(ij)}$ are defined as follows.
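The zero-padding rule of Def 1.4 can be illustrated with a minimal Python sketch. The function name `build_label_matrix` and the numeric attribute codes are hypothetical and serve only as an illustration. The sketch assumes that feature attributes are encoded as non-zero values so that the supplementary 0 elements remain distinguishable.

```python
from typing import List, Sequence


def build_label_matrix(subsets: Sequence[Sequence[int]]) -> List[List[int]]:
    """Stack label subsets T_(i) as rows and right-pad short rows with 0.

    Each inner sequence holds the (non-zero) feature attribute codes of
    one interest motive factor; the matrix width is max beta_(i).
    """
    width = max(len(row) for row in subsets)  # max beta_(i)
    return [list(row) + [0] * (width - len(row)) for row in subsets]


# Hypothetical attribute codes for four labels with ranks 3, 5, 2 and 4.
subsets = [[1, 2, 3], [1, 2, 3, 4, 5], [1, 2], [1, 2, 3, 4]]
matrix = build_label_matrix(subsets)
for row in matrix:
    print(row)
```

Each row keeps its own attributes at the front and receives zeros at the end, so the rank of a subset can be recovered as the number of leading non-zero entries.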

B. TOURIST SIGHT CLUSTERING ALGORITHM MODEL BASED ON INTEREST MINING
Selecting specific tourist sights is the key process in planning the optimal tour route for tourists, and effective clustering of tourist sights is the precondition for mining precise tourist sights according to tourists' interests and needs. The purpose of tourist sight clustering is to enable rapid and efficient searching and mining of the optimal tourist sights that match tourists' interests and needs. The smart recommender system first generates tourist sight clusters and thereby sets up the basic data source for tourist sight mining [27], [36]. Here is the second group of definitions.
Def 2.1: Tourist sight domain $C$ and tourist sight cluster $C_i$. The initially defined data set composed of $m$ tourist sights $c_r$ that have tour conditions and reflect tourists' interest feature attributes in one tourism city is called the tourist sight domain $C$, $r \in (0, m] \subset \mathbb{Z}^+$. The classifications with similar and common feature attributes, which are generated by the tourist sight clustering objective function from the initially defined $m$ tourist sights $c_r$ so that tourists can mine interested tourist sights, are called tourist sight clusters $C_i$. According to the definition, the generation of a tourist sight cluster $C_i$ is controlled by a certain principle, namely the tourist sight clustering objective function. An arbitrary tourist sight cluster $C_i$ is a non-empty set ($C_i \neq \emptyset$). It is one classification of tourist sights representing tourists' interest tendency, and also the data unit that the smart recommender system recognizes in order to precisely match tourist sights with tourists' interests and needs. Each tourist sight cluster $C_i$ is a proper subset of the tourist sight domain $C$. Any two tourist sight clusters $C_i$ and $C_{\neg i}$ are non-empty and mutually exclusive, which meets the condition of formula (3), where $i$ is the code of the cluster and $p$ is the maximum number of clusters. This formula shows that the domain $C$ is the union of all clusters, and that two clusters $C_i$ and $C_{\neg i}$ have no common element.
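The conditions that Def 2.1 attaches to formula (3) can be written compactly as follows. This is a reconstruction from the prose description only (union of all clusters, non-empty clusters, no common elements), so the exact typeset form of formula (3) is an assumption.

```latex
C = \bigcup_{i=1}^{p} C_i, \qquad
C_i \neq \emptyset, \qquad
C_i \cap C_{\neg i} = \emptyset
\quad \bigl(i \neq \neg i;\ i, \neg i \in (0, p] \subset \mathbb{Z}^+\bigr)
```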
Def 2.2: Tourist sight meta-data $c(i, v_i)$. A single tourist sight stored in the tourist sight cluster $C_i$ is called tourist sight meta-data $c(i, v_i)$. Of the initially defined $m$ tourist sights $c_r$ in the tourist sight domain $C$, an arbitrary tourist sight cluster $C_i$ contains $m_i$ tourist sight meta-data $c(i, v_i)$, where $i$ is the code of the cluster and $v_i$ is the code of the tourist sight meta-data $c(i, v_i)$ in the tourist sight domain $C$, and the condition of formula (4) is met. The formula shows that the total quantity of tourist sights in domain $C$ is the sum of the tourist sight quantities of all clusters.
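The cluster conditions of Def 2.1 and the counting condition of Def 2.2 (presumably $\sum_{i=1}^{p} m_i = m$, inferred from the prose) can be checked with a minimal Python sketch. The function name `is_valid_clustering` and the sample sight names are hypothetical.

```python
from typing import Dict, Set


def is_valid_clustering(domain: Set[str], clusters: Dict[int, Set[str]]) -> bool:
    """Check the cluster conditions described in Def 2.1 and Def 2.2.

    - every cluster is non-empty and a proper subset of the domain C;
    - the union of all clusters equals the domain C;
    - the cluster sizes m_i sum to m, which, together with the union
      condition, implies that the clusters are pairwise disjoint.
    """
    sets = list(clusters.values())
    if any(len(s) == 0 or not s < domain for s in sets):
        return False
    union = set().union(*sets)
    sizes_sum = sum(len(s) for s in sets)
    return union == domain and sizes_sum == len(domain)


# Hypothetical domain of four tourist sights split into two clusters.
domain = {"sight_a", "sight_b", "sight_c", "sight_d"}
good = {1: {"sight_a", "sight_b"}, 2: {"sight_c", "sight_d"}}
overlapping = {1: {"sight_a", "sight_b"}, 2: {"sight_b", "sight_c", "sight_d"}}
print(is_valid_clustering(domain, good))         # True
print(is_valid_clustering(domain, overlapping))  # False
```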
Def 2.3: Tourist sight clustering factor $k_o$ and tourist sight clustering objective function $\sigma(c_r, c_{\neg r})$. The process by which a single tourist sight $c_r$ in the tourist sight domain $C$ is assigned to a tourist sight cluster $C_i$ and noted as tourist sight meta-data $c(i, v_i)$ is determined by certain factors that influence the clustering process. These factors are called tourist sight clustering factors $k_o$, $o \in \mathbb{Z}^+$. The function that is set up from the tourist sight clustering factors $k_o$ and that determines this assignment process is called the tourist sight clustering objective function $\sigma(c_r, c_{\neg r})$. For two relatively independent tourist sights $c_r$ and $c_{\neg r}$, their similarity is determined by the interest and need label set $T$, including the tourist sight attribute relevancy degree $k_1$, the attraction index $k_2$ (parameter $a$), the optimal visiting time $k_3$ (parameter $t$) and the tour expense $k_4$ (parameter $c_o$). Of all factors $k_o$, the tourist sight attribute relevancy degree $k_1$ is obtained by the tourist sight text similarity data mining algorithm, while the attraction index $k_2$, the optimal visiting time $k_3$ and the tour expense $k_4$ are statistical parameters. The algorithm process for setting up the tourist sight clustering objective function $\sigma(c_r, c_{\neg r})$ is as follows.
Step 1 Set up the keyword label set $L$ and the keyword label subsets $L_{(u)}$.
According to the feature attributes and text encyclopedia data of tourist sight $c_r$, and through the confirmed interest motive factors $T_{(i)}$ and attributes $T_{(ij)}$, the keyword label set $L$ and the keyword label subsets $L_{(u)}$ in the tourist sight domain $C$ are set up, where $l_{(uv)}$ denotes a label element of $L_{(u)}$. Set $L$ and subsets $L_{(u)}$ must meet the following conditions: the label set and the label subsets are non-empty text sets; any two label subsets are mutually exclusive and independent, that is, they do not contain the same label $l_{(uv)}$; any two labels are mutually exclusive and independent, that is, they do not contain the same text keywords; and any label subset $L_{(u)}$ is a proper subset of the label set $L$. They are noted as follows.
According to the definition, the label set is $L = \{\{l_{(11)}, \dots, l_{(1v_{(1)})}\}; \{l_{(21)}, \dots, l_{(2v_{(2)})}\}; \dots\}$.
Step 2 Set up the tourist sight keyword frequency matrix $T^{c_r}_{(uv)}$.
An element of the tourist sight keyword frequency matrix $T^{c_r}_{(uv)}$ is defined as the occurrence frequency of the keyword label $l_{(uv)}$ in this tourist sight's text encyclopedia data. Each row of matrix $T^{c_r}_{(uv)}$ relates to a keyword label subset $L_{(u)}$, and each column relates to the $v$-th label $l_{(uv)}$ of the keyword label subset $L_{(u)}$. According to the label set constraint conditions defined in step 1, any two rows or columns of matrix $T^{c_r}_{(uv)}$ are linearly independent of each other. The row rank $\mathrm{rank}(T^{c_r}_{(u)})$ of matrix $T^{c_r}_{(uv)}$ is the maximum number of interest motive factor attributes $\max \beta_{(1)}$. The column rank $\mathrm{rank}(T^{c_r}_{(v)})$ of matrix $T^{c_r}_{(uv)}$ is the maximum label quantity $\max v_{(u)}$ over all label subsets $L_{(u)}$. According to the definition, the total quantity of elements of matrix $T^{c_r}_{(uv)}$ that relate to a label is $\sum_{u=1}^{\max \beta_{(1)}} v_{(u)}$.
Sub-step 1: Initialize the keyword frequency matrix $T^{c_r}_{(0)}$ of the tourist sight $c_r$. At this point, the matrix $T^{c_r}_{(0)}$ is the null matrix with dimension $\max \beta_{(1)} \times \max v$.
Sub-step 2: Set the keyword frequency of the related label $l_{(uv)}$ in the keyword frequency matrix $T^{c_r}_{(uv)}$ as $n^{c_r}_{(uv)}$. For each element $l_{(1v)}$ of the first-row label subset $L_{(1)}$ in matrix $T^{c_r}_{(uv)}$, search the text encyclopedia data of tourist sight $c_r$ to obtain $n^{c_r}_{(1v)}$, and then store $n^{c_r}_{(1v)}$ into the first row of $T^{c_r}_{(0)}$.
Sub-step 3: Turn back to sub-step 2. For each element $l_{(2v)}$ of the second-row label subset $L_{(2)}$ in matrix $T^{c_r}_{(uv)}$, search the text encyclopedia data of tourist sight $c_r$ to obtain $n^{c_r}_{(2v)}$, and then store $n^{c_r}_{(2v)}$ into the second row of $T^{c_r}_{(0)}$. In the same way, the labels of the other rows of matrix $T^{c_r}_{(uv)}$ are searched to finally obtain the keyword frequency matrix $T^{c_r}_{(uv)}$. The storage rules are as follows.
(1) For the row $L_{(u)}$ of matrix $T^{c_r}_{(uv)}$, store $n^{c_r}_{(uv)}$ into the $v$-th element of this row;
(2) For the row of keyword label subset $L_{(u)}$: ① the first $v_{(u)}$ elements are the keyword frequencies of the labels $l_{(u1)}$ to $l_{(uv_{(u)})}$ in label subset $L_{(u)}$, and the other elements are 0; ② if $v_{(u)} - \max v_{(u)} = 0$, row $L_{(u)}$ is full rank, and all of its elements are the keyword frequencies of the labels $l_{(u1)}$ to $l_{(uv_{(u)})}$ in label subset $L_{(u)}$;
(3) $\sum_{u=1}^{\max \beta_{(1)}} \sum_{v=1}^{\max v} n^{c_r}_{(uv)}$ is the total keyword frequency of matrix $T^{c_r}_{(uv)}$.
Formula (5) is the tourist sight keyword frequency matrix $T^{c_r}_{(uv)}$. In the schematic matrix, the first row has three elements and row $i$ has $v_{(i)}$ elements. In each row, the first $v_{(i)}$ elements relate to labels and all supplementary 0 elements are placed at the end.
$$T^{c_r}_{(uv)} = \begin{bmatrix}
n^{c_r}_{(11)} & n^{c_r}_{(12)} & n^{c_r}_{(13)} & \cdots & 0 & 0 \\
n^{c_r}_{(21)} & n^{c_r}_{(22)} & n^{c_r}_{(23)} & \cdots & n^{c_r}_{(2v_{(2)})} & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
n^{c_r}_{(i1)} & n^{c_r}_{(i2)} & n^{c_r}_{(i3)} & \cdots & n^{c_r}_{(iv_{(i)})} & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{bmatrix} \tag{5}$$
Sub-step 4: Turn back to sub-step 1. Build the tourist sight keyword frequency matrix $T^{c_{\neg r}}_{(uv)}$ for every other tourist sight $c_{\neg r}$ in the tourist sight domain $C$. The element distribution and related labels $l_{(uv)}$ are identical for the matrices $T^{c_r}_{(uv)}$ of all tourist sights, while the keyword frequencies of the elements differ.
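Sub-steps 1 to 3 of step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: keyword frequency is taken as a plain, case-sensitive, non-overlapping substring count via `str.count` (the text does not specify how keywords are tokenized or matched), and the function name, label subsets and sample text are all hypothetical.

```python
from typing import List, Sequence


def keyword_frequency_matrix(
    text: str, label_subsets: Sequence[Sequence[str]]
) -> List[List[int]]:
    """Build a zero-padded keyword frequency matrix for one tourist sight.

    Row u corresponds to label subset L_(u); column v holds the number of
    occurrences of label l_(uv) in the sight's encyclopedia text.
    """
    width = max(len(subset) for subset in label_subsets)  # max v_(u)
    # Sub-step 1: start from the null matrix.
    matrix = [[0] * width for _ in label_subsets]
    # Sub-steps 2 and 3: fill the matrix row by row.
    for u, subset in enumerate(label_subsets):
        for v, keyword in enumerate(subset):
            matrix[u][v] = text.count(keyword)
    return matrix


# Hypothetical label subsets and encyclopedia text.
label_subsets = [["temple", "pagoda", "monastery"], ["lake", "garden"], ["museum"]]
text = "The temple has a pagoda. The temple garden faces a lake and a museum."
freq = keyword_frequency_matrix(text, label_subsets)
print(freq)  # [[2, 1, 0], [1, 1, 0], [1, 0, 0]]
```

Note that a 0 in the matrix can mean either a padding position or a real label that never occurs in the text, which is why the later conditions distinguish supplementary 0 elements from label-related elements.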
Step 3 Confirm the tourist sight attribute clustering factor $k_1$.
The tourist sight attribute clustering factor $k_1$ is the principal factor in setting up the tourist sight clustering objective function $\sigma(c_r, c_{\neg r})$, and it influences the tourist sight clustering process with respect to tourist sight attributes. The tourist sight attribute clustering factor $k_1$ is determined by the tourist sight keyword frequency matrix $T^{c_{\neg r}}_{(uv)}$. The algorithm process for setting up the tourist sight attribute clustering factor $k_1(c_r, c_{\neg r})$ is as follows.
Sub-step 1: Confirm the objective tourist sights $c_{r1}$ and $c_{r2}$, and obtain the related tourist sight keyword frequency matrices $T^{c_{r1}}_{(uv)}$ and $T^{c_{r2}}_{(uv)}$.
Sub-step 2: Search for the non-zero labels with the same element in the keyword frequency matrices $T^{c_{r1}}_{(uv)}$ and $T^{c_{r2}}_{(uv)}$.
The non-zero label with the same element is defined as follows. If the elements at the same position in the tourist sight keyword frequency matrices $T^{c_{r1}}_{(uv)}$ and $T^{c_{r2}}_{(uv)}$ simultaneously meet the following conditions, the related label $l_{(uv)}$ of these elements is a non-zero label with the same element.
(1) The elements should relate to one label and must not be the supplementary 0 elements that pad a matrix row which is not full rank;
(2) For the same element label $l(uv)$, the keyword frequency is non-zero in both matrices, that is, $n_{c_1}(uv) \neq 0$ and $n_{c_2}(uv) \neq 0$;
(3) Differences in the text encyclopedia data of tourist sights lead to differences in the text data mining results, that is, the situation $|n_{c_1}(uv) - n_{c_2}(uv)| = 0$ is a rather small probability event.

Sub-step 3: Set up the matrix $S$ of the non-zero labels with the same element. Following the sequence from the left element to the right one and from the top element to the bottom one in the matrix $T_{c_r}(uv)$, search for the non-zero labels with the same element in the matrices $T_{c_{r1}}(uv)$ and $T_{c_{r2}}(uv)$. Set the total quantity of the non-zero labels with the same element in the No. $u$ row as $\mathrm{Count}(u)$, $\mathrm{Count}(u) \in (0, v(u)] \subset Z^{+}$. Set up the matrix $S$ of the non-zero labels with the same element, whose dimension is $2 \times \sum_{u=1}^{\beta(1)} \mathrm{Count}(u)$ and whose element is $S(ew)$. The element $S(ew)$ is defined as the keyword frequency of the related non-zero label $l(uv)$ with the same element, with $e \in (0, 2] \subset Z^{+}$ and $w \in (0, \sum_{u=1}^{\beta(1)} \mathrm{Count}(u)] \subset Z^{+}$. The first row holds the keyword frequencies of the non-zero labels with the same element in matrix $T_{c_{r1}}(uv)$, and the second row holds the keyword frequencies of the non-zero labels with the same element in matrix $T_{c_{r2}}(uv)$. In the matrix $S$, one column represents the keyword frequencies of the same label $l(uv)$ in matrices $T_{c_{r1}}(uv)$ and $T_{c_{r2}}(uv)$ respectively. Set up the matrix $S$ according to the following steps and rules.
① Search for the No.1 element of the first row in matrices $T_{c_{r1}}(uv)$ and $T_{c_{r2}}(uv)$ respectively, with initial value $\mathrm{Count}(u) = 0$.
i) If $n_{c_1}(11) \neq 0$ and $n_{c_2}(11) \neq 0$, the related label $l(11)$ of the No.1 element is a non-zero label with the same element. Store $n_{c_1}(11)$ into $S(11)$ and $n_{c_2}(11)$ into $S(21)$, set $\mathrm{Count}(u) = 1$, and turn to step ②;
ii) If $n_{c_1}(11) = 0$ or $n_{c_2}(11) = 0$, the related label $l(11)$ of the No.1 element is not a non-zero label with the same element. Keep $\mathrm{Count}(u) = 0$ and turn to step ②.

② Search for the No.2 element of the first row in matrices $T_{c_{r1}}(uv)$ and $T_{c_{r2}}(uv)$ respectively.
i) If $n_{c_1}(12) \neq 0$ and $n_{c_2}(12) \neq 0$, the related label $l(12)$ of the No.2 element is a non-zero label with the same element. If condition i) was met in step ①, store $n_{c_1}(12)$ into $S(12)$ and $n_{c_2}(12)$ into $S(22)$, set $\mathrm{Count}(u) = 2$, and turn to step ③. If condition ii) was met in step ①, store $n_{c_1}(12)$ into $S(11)$ and $n_{c_2}(12)$ into $S(21)$, set $\mathrm{Count}(u) = 1$, and turn to step ③;
ii) If $n_{c_1}(12) = 0$ or $n_{c_2}(12) = 0$, the related label $l(12)$ of the No.2 element is not a non-zero label with the same element. If condition i) was met in step ①, $\mathrm{Count}(u) = 1$; if condition ii) was met in step ①, $\mathrm{Count}(u) = 0$. Turn to step ③.

③ According to the rules of step ① and step ②, search the other elements of matrices $T_{c_{r1}}(uv)$ and $T_{c_{r2}}(uv)$. The termination condition of the whole searching process is that the total count reaches $\sum_{u=1}^{\beta(1)} \mathrm{Count}(u)$ and the matrix $S$ of the non-zero labels with the same element is full rank. The row rank is $\mathrm{rank}(S(e)) = 2$, and the column rank is $\mathrm{rank}(S(w)) = \sum_{u=1}^{\beta(1)} \mathrm{Count}(u)$.

Sub-step 4 Set up the tourist sight attribute clustering factor $k_1$ algorithm based on the similarity measurement.
When the matrices $T_{c_{r1}}(uv)$ and $T_{c_{r2}}(uv)$ are both non-zero matrices and contain at least one non-zero label with the same element, set up the tourist sight attribute clustering factor $k_1(c_r, c_{\neg r})$ algorithm based on similarity measurement as formula (6) shows. Set $p = \sum_{u=1}^{\beta(1)} \mathrm{Count}(u)$. Factor $k_1(c_r, c_{\neg r})$ is obtained from text data mining of keyword frequencies, and it is the key factor of tourist sight clustering. The formula for $k_1(c_r, c_{\neg r})$ is determined by the four parameters $n_{c_{r1}}(uv)$, $n_{c_{r2}}(uv)$, $S(e_1 w)$ and $S(e_2 w)$.

$$\cdots \; S(e_1 w) \times S(e_2 w) \; \cdots \qquad (6)$$

Step 4 Set up the clustering objective function $\sigma(c_r, c_{\neg r})$. According to the definition and the clustering criterion, the clustering objective function $\sigma(c_r, c_{\neg r})$ is determined by the attribute factor $k_1$, attraction index $k_2$, optimal visiting time $k_3$ and tour expense $k_4$ of tourist sights $c_r$ and $c_{\neg r}$. It directly measures the attribute affinity of two tourist sights. The larger the clustering objective function value is, the higher the attribute affinity of tourist sights $c_r$ and $c_{\neg r}$ will be, and the smaller the value is, the lower the affinity will be. In the process of tourist sight clustering, the attribute factors $k_1 \sim k_4$ are in a nonlinear relationship, and they have similar influence on the clustering of tourist sights $c_r$ and $c_{\neg r}$. In the clustering criterion, suppose that $k_i(c_r)$ and $k_i(c_{\neg r})$ are measures of the same feature factor of the two tourist sights, $i \in (1, n] \subset Z^{+}$, where $n$ is the dimension of the tourist sight feature factor. According to the definition of Minkowski distance, the clustering objective function between tourist sights $c_r$ and $c_{\neg r}$ is shown as formula (7). When the parameter is $d = 2$, the Minkowski distance is the Euclidean distance, which conforms to the tourist sight clustering process. According to the influence degree of the factors on tourist sight clustering, all the factors are normalized so that a factor with a relatively large value does not reduce the influence of the other factors. Disturbance and restraint coefficients $\varepsilon_i$, $\varepsilon_i \in (0, 1] \subset R^{+}$, are introduced. The coefficients $\varepsilon_i$ are determined by the factors' order of magnitude. The formula shows that $\sigma(c_r, c_{\neg r})$ is determined by $n$ parameters $k_i$, and the impact of each $k_i$ on $\sigma(c_r, c_{\neg r})$ is modified by the coefficient $\varepsilon_i$. The aim of the formula is to find the maximum value of $\sigma(c_r, c_{\neg r})$.
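The construction of matrix S in Sub-step 3 and the distance underlying Step 4 can be sketched as follows. This is an illustrative sketch only: the function names and sample numbers are assumptions, and because formulas (6) and (7) are not reproduced here, the way the coefficients eps scale each factor, and the way a distance is converted into an affinity value, are assumptions rather than the paper's definitions.

```python
def shared_nonzero_matrix(T1, T2):
    """Build S (2 rows): keep labels whose frequency is non-zero in both matrices."""
    first_row, second_row = [], []
    for row1, row2 in zip(T1, T2):        # top row to bottom row
        for n1, n2 in zip(row1, row2):    # left element to right element
            if n1 != 0 and n2 != 0:       # padded zeros are skipped by the same test
                first_row.append(n1)
                second_row.append(n2)
    return [first_row, second_row]


def weighted_minkowski(factors_a, factors_b, eps, d=2):
    """Minkowski distance of order d, each factor difference scaled by eps[i]."""
    total = sum((e * abs(a - b)) ** d for a, b, e in zip(factors_a, factors_b, eps))
    return total ** (1.0 / d)


# Hypothetical keyword frequency matrices of two tourist sights.
T1 = [[2, 0, 1, 0], [1, 2, 0, 0]]
T2 = [[3, 1, 4, 0], [0, 5, 0, 0]]
S = shared_nonzero_matrix(T1, T2)
print(S)   # [[2, 1, 2], [3, 4, 5]]

# Hypothetical, already normalized factor vectors (k1..k4) of two sights.
k_a = [0.8, 0.6, 0.5, 0.2]
k_b = [0.5, 0.2, 0.5, 0.2]
print(weighted_minkowski(k_a, k_b, eps=[1.0, 1.0, 1.0, 1.0]))
```

With d = 2 and all coefficients equal to 1 the function reduces to the ordinary Euclidean distance. A smaller distance means closer attributes, so any affinity score built on it has to decrease as the distance grows.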
From formula (7), when clustering the tourist sights, the criterion for deciding whether tourist sights $c_r$ and $c_{\neg r}$ belong to the same cluster is whether the value of the clustering objective function $\sigma(c_r, c_{\neg r})$ is larger than the preset threshold value. If the value of $\sigma(c_r, c_{\neg r})$ between the standard tourist sight and a tourist sight to be clustered is larger than the $\sigma(c_r, c_{\neg r})$ values between the standard tourist sight and the other tourist sights to be clustered and, at the same time, larger than the preset threshold value, then this tourist sight and the standard tourist sight belong to the same cluster. Following the second group of definitions, here is the third group of definitions.
Def 3.1: Two matrices: the tourist sight clustering dynamic matrix $C^{\wedge}(p \times \max m(i))$ and the tourist sight clustering steady matrix $C(p \times \max m(i))$. According to Def 2.1, after the dynamic clustering process there will be $p$ tourist sight clusters $C_i$ in the tourist sight domain $C$. Each tourist sight cluster $C_i$ contains $m_i$ tourist sight meta-data $c(i, v_i)$.
The matrix with dimension $p \times \max m_i$ that contains the $m$ tourist sight meta-data $c(i, v_i)$ as randomly distributed elements is defined as the tourist sight clustering dynamic matrix $C^{\wedge}(p \times \max m(i))$. After the dynamic clustering process, the tourist sight meta-data $c(i, v_i)$ are stored as row vectors, in the sequence of cluster codes, in a matrix, and this matrix is defined as the tourist sight clustering steady matrix $C(p \times \max m(i))$. The process of setting up the two matrices $C^{\wedge}(p \times \max m(i))$ and $C(p \times \max m(i))$ meets the following conditions, and formula (8) and formula (9) show one possible distribution of matrices $C^{\wedge}(p \times \max m(i))$ and $C(p \times \max m(i))$.
(1) The matrix has $p \times \max m_i$ elements, of which $m$ elements relate to tourist sight meta-data $c(i, v_i)$ and the other $p \times \max m_i - m$ elements are 0;
(2) Row rank meets the condition: $\mathrm{rank}(C^{\wedge}(p\cdot)) \leq 0$, $\mathrm{rank}(C(p\cdot)) = p$;
(3) Column rank meets the condition: …;
(4) The elements of matrix $C^{\wedge}(p \times \max m(i))$ are randomly distributed;
(5) The first row of matrix $C(p \times \max m(i))$ only arranges tourist sights belonging to one cluster, which contains one and only one tourist sight growing seed point $c(i, v_i)$. The meta-data tourist sights $c(i, v_i)$ are randomly distributed in the same row.
In the formula (8), each tourist sight is randomly stored in the matrix C ∧ (p×max m (i) ) . After the algorithm sorting, each seed point is located at the first element and the matrix C (p×max m (i) ) is formed, empty elements are supplemented by 0, like formula (9).
Def 3.2: Tourist sight growing seed point $c(i, v_i)$ and tourist sight growing tree $tree(i)$. The initial tourist sight that relies on the clustering objective function $\sigma(c_r, c_{\neg r})$ to search for its nearest tourist sights is called the tourist sight growing seed point $c(i, v_i)$. In the tourist sight domain, a growing seed point represents one tourist sight cluster, and the tourist sights searched and clustered through it have the most similar attributes. The number of growing seed points is $p$ and meets the condition …. The clustering objective function value $\sigma(c_r, c_{\neg r})$ of any two growing seed points is a minimum value that can be neglected, that is, the two seed points are only distantly related. The growing tree structure formed by the tourist sights of one cluster and the searching paths mined from one tourist sight growing seed point $c(i, v_i)$ is defined as the tourist sight growing tree $tree(i)$. The tourist sight growing seed point $c(i, v_i)$ is the initial root node of the tourist sight growing tree $tree(i)$. One seed point $c(i, v_i)$ relates to only one tourist sight growing tree $tree(i)$, $i \in (0, p] \subset Z^{+}$, while one tourist sight growing tree $tree(i)$ relates to one tourist sight cluster $C_i$ and to one row $C(i)$, with rank $m_i$, of the tourist sight clustering steady matrix $C(p \times \max m(i))$.

Def 3.3: Tourist sight searching immunity $c^{+}(i, v_i)$. In the process of tourist sight clustering, in order to avoid repeated searching, a tourist sight that has been absorbed into one cluster is noted as $c^{+}(i, v_i)$ and is said to have obtained searching immunity $c^{+}(i, v_i)$. The tourist sights that have not obtained searching immunity are noted as $c^{-}(i, v_i)$. In this way, the immune tourist sights $c^{+}$ will not be searched again by the tourist sight growing seed point $c(i, v_i)$ of a new cluster. Tourist sight searching immunity gradually narrows the scope of tourist sight clustering and finally absorbs the $m$ tourist sights into the $p$ clusters $C_i$.

Def 3.4: One value and one matrix: the tourist sight membership degree $\mu(c(i, v_i), c(i, v_i))$ between a seed point and a meta-data tourist sight, and the tourist sight membership degree matrix $\mu_i(p \times \max m(i))$. In the process of searching the tourist sights outward from the selected seed point $c(i, v_i)$ to form its related cluster $C_i$, if the clustering objective function value $\sigma(c(i, v_i), c(i, v_i))$ between the seed point and a meta-data tourist sight is the maximum one and larger than the threshold value, then the membership degree of this tourist sight is 1, and the tourist sight with membership degree value 1 will be absorbed into cluster $C_i$ and obtain the tourist sight searching immunity $c^{+}(i, v_i)$. Based on the tourist sight clustering dynamic matrix $C^{\wedge}(p \times \max m(i))$, in the process of forming the $p$ clusters $C_i$, a matrix $\mu_i(p \times \max m(i))$ with the same dimension as $C^{\wedge}(p \times \max m(i))$ is set up, and this matrix stores the membership degrees $\mu(c(i, v_i), c(i, v_i))$ at the positions of the randomly distributed elements of matrix $C^{\wedge}(p \times \max m(i))$. This matrix is called the tourist sight membership degree matrix $\mu_i(p \times \max m(i))$. The tourist sight membership degree matrix is the transition matrix from the initial tourist sight clustering dynamic matrix $C^{\wedge}(p \times \max m(i))$ to the terminal tourist sight clustering steady matrix $C(p \times \max m(i))$. A tourist sight with membership degree value 1 will be absorbed into the cluster $C_i$ of seed point $c(i, v_i)$ and simultaneously stored into the related row of matrix $C(p \times \max m(i))$. Each time one seed point $c(i, v_i)$ is searched, a new tourist sight membership degree matrix $\mu_i(p \times \max m(i))$ is generated. Formula (10) is the tourist sight membership degree function, and formula (11) is the tourist sight membership degree matrix $\mu_i(p \times \max m(i))$. Formula (10) shows the cluster relationship between the seed point $c(i, v_i)$ and the meta-data tourist sight $c(i, v_i)$. Formula (11) shows the distribution of the membership degrees.
According to the third group of definition and algorithm thought, the algorithm process is set up as follows. After the tourist sight clustering process, the confirmation method of the tourist interest label will be set in accordance with the feature attributes of the tourist sight clusters and attribute factors k 1 ∼ k 4 . Tourists will confirm interest label by certain rule.
Step 1 Set up the Open list and Closed list. Randomly store the $m$ tourist sights $c_r$ of the tourist sight clustering dynamic matrix $C^{\wedge}(p \times \max m(i))$ into Open list $O(\cdot)$ in element order. Define the Open list $O(\cdot)$ as a $1 \times m$ dimension row vector whose element is $O(r)$. Tourist sight $c_r$ is randomly stored in an element $O(r)$, so the tourist sight code and the element index need not correspond. At this moment, none of the tourist sights in the Open list $O(\cdot)$ has obtained searching immunity $c^{+}(i, v_i)$. Set up Closed list $C(\cdot)$ to store the tourist sights that have obtained searching immunity $c^{+}(i, v_i)$. Closed list $C(\cdot)$ is also a $1 \times m$ dimension row vector, and its element is $C(r)$.
Step 2 Confirm the No.1 tourist sight seed point c (1, v 1 ). Randomly select one tourist sight ∀O (r) ∼ c r in Open list O (·) as the No.1 tourist sight seed point c (1, v 1 ), and this seed point is also the initial seed point for cluster C 1 .
Step 3 Initialize the seed point c (1, v 1 ), and search other tourist sights ¬ c (1, v 1 ) in the Open list O (·) .
Sub-step 1 Search the No.1 element tourist sight O (1) in Open list O (·) , the subsequent searching process follows the same rule.
① If $O(1) = c(1, v_1)$, turn to Sub-step 2 and continue by searching the No.2 element tourist sight $O(2)$ in the Open list.

Sub-step 2 Search the No.2 element tourist sight $O(2)$ in Open list $O(\cdot)$.

① If $O(2) = c(1, v_1)$, turn to Sub-step 3 and continue by searching the No.3 element tourist sight $O(3)$ in the Open list.

i) If the two critical conditions are met simultaneously, that is, $\sigma(c(1, v_1), O(1))$ is larger than the preset threshold value and larger than $\sigma(c(1, v_1), O(2))$, then tourist sight $O(1)$ belongs to cluster $C_1$ with seed point $c(1, v_1)$. Carry out the following steps.
(1) Note the membership degree of tourist sight $O(1)$ as 1, so that it obtains searching immunity $c^{+}(1, 1)$;
(2) Initialize the membership degree matrix $\mu_1(p \times \max m(i))$;
(3) Store $O(1)$ into the No.1 element $C(1)$ of the Closed list $C(\cdot)$, and delete it from the Open list $O(\cdot)$;
(4) Update the tourist sight clustering steady matrix $C(p \times \max m(i))$, and store the seed point $c(1, v_1)$ into the No.1 element $c(1, 1)$ in the first row of $C(p \times \max m(i))$;
(5) Note tourist sight $O(2)$ as seed point $c(2, v_2)$, store it into the No.2 element $C(2)$ of the Closed list $C(\cdot)$ and into the No.1 element $c(2, 1)$ in the second row of $C(p \times \max m(i))$, and delete it from the Open list $O(\cdot)$. Seed point $c(2, v_2)$ relates to cluster $C_2$. Initialize membership degree matrix $\mu_2(p \times \max m(i))$.

ii) If the two critical conditions are met simultaneously, that is, $\sigma(c(1, v_1), O(2))$ is larger than the preset threshold value and larger than $\sigma(c(1, v_1), O(1))$, then the tourist sight $O(2)$ belongs to cluster $C_1$ with seed point $c(1, v_1)$. Note tourist sight $O(1)$ as seed point $c(2, v_2)$. Initialize membership degree matrix $\mu_2(p \times \max m(i))$. The operation process is the same as sub-steps (1)∼(5) in step i).

iii) If the conditions in i) and ii) cannot be satisfied simultaneously, then tourist sights $O(1)$ and $O(2)$ do not belong to cluster $C_1$ with seed point $c(1, v_1)$. Turn to Step 4.

Sub-step 3 Set up the first tourist sight growing tree $tree(1)$ based on seed point $c(1, v_1)$ and its subordinate tourist sights.
Step 5 Continue searching. Continue searching the other tourist sights in Open list $O(\cdot)$ according to Step 2∼Step 4. The immune tourist sights will not be searched again. Finally, output the $p$ seed points $c(i, v_i)$, the $p$ clusters $C_i$, the $p$ tourist sight growing trees, the $p$ tourist sight membership degree matrices $\mu_i(p \times \max m(i))$ and the tourist sight clustering steady matrix $C(p \times \max m(i))$. According to the tourist sight growing trees and the tourist sight clustering steady matrix $C(p \times \max m(i))$, the tourist sight clustering distributions are shown in Figure 1. Figure 1 is the process simulation of generating tourist sight seed points and clusters. Figure 1(1) shows the process of randomly confirming seed point $c(1, 1)$, searching its subordinate tourist sights and searching another cluster seed point $c(2, 1)$ while noting the tourist sights with searching immunity. Figure 1(2) shows the process of searching the subordinate tourist sights of $c(2, 1)$ and another cluster seed point $c(3, 1)$ while noting the tourist sights with searching immunity. Figure 1(3) shows the process of searching the subordinate tourist sights of $c(3, 1)$ while noting the tourist sights with searching immunity. Figure 1(4) shows the three tourist sight clustering growing trees $tree(i)$ with the seed points $c(1, 1)$, $c(2, 1)$ and $c(3, 1)$, $0 < i \leq 3$, $i \in Z^{+}$: the red one is the growing tree $tree(1)$ with seed point $c(1, 1)$, the yellow one is the growing tree $tree(2)$ with seed point $c(2, 1)$, and the green one is the growing tree $tree(3)$ with seed point $c(3, 1)$.
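The overall flow of Step 1 to Step 5 (Open list, Closed list, seed points and searching immunity) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: the seed is taken as the first Open-list element instead of a random one so that the run is repeatable, every non-immune sight whose σ with the seed exceeds the threshold is absorbed in one pass rather than through the pairwise comparison of the sub-steps, and the `sigma` function and the one-dimensional `feature` values are made up.

```python
def grow_clusters(sights, sigma, threshold):
    """Seed-based clustering with an Open list and a Closed (immune) list."""
    open_list = list(sights)      # sights without searching immunity
    closed_list = []              # sights that have obtained searching immunity
    clusters = []
    while open_list:
        seed = open_list.pop(0)   # the paper selects the seed randomly
        closed_list.append(seed)
        # Absorb every remaining sight whose affinity with the seed passes the threshold.
        members = [s for s in open_list if sigma(seed, s) > threshold]
        for s in members:
            open_list.remove(s)   # immune sights are never searched again
            closed_list.append(s)
        clusters.append([seed] + members)
    width = max((len(c) for c in clusters), default=0)
    # Steady matrix: one row per cluster, seed first, padded with 0.
    steady = [c + [0] * (width - len(c)) for c in clusters]
    return clusters, steady, closed_list


# Hypothetical one-dimensional feature per sight and a toy affinity function.
feature = {"A": 1.0, "B": 1.5, "C": 5.0, "D": 5.2, "E": 9.0}


def sigma(a, b):
    return 1.0 / (1.0 + abs(feature[a] - feature[b]))


clusters, steady, closed = grow_clusters(["A", "C", "B", "E", "D"], sigma, threshold=0.4)
print(clusters)  # [['A', 'B'], ['C', 'D'], ['E']]
print(steady)    # [['A', 'B'], ['C', 'D'], ['E', 0]]
```

Each row of `steady` starts with its seed point, which mirrors the layout described for the steady matrix. The number of clusters is not fixed in advance but emerges from the threshold.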

III. OPTIMAL TOUR ROUTE PLANNING ALGORITHM BASED ON INTEREST LABEL
Before visiting an unfamiliar tourism city, a tourist expects to find the most interesting tourist sight classifications, the specific tourist sights and the optimal tour route, so as to obtain the best motive benefit satisfaction within a proper tour schedule. If one or several tourist sights in a tour route do not conform to the tourist's interests and needs, or cannot provide the expected tour experience, the tour route will directly affect the tourist's overall feeling about and experience of the whole trip, lower the satisfaction degree and finally harm the tourist's evaluation of the tourism city and its tourist sights. The long-term consequence will be negative decisions by subsequent tourists [15], [37]-[39]. Thus, tour route planning should first emphasize precise tourist sight mining to ensure that each tourist sight in the tour route conforms to tourists' interests and needs. Second, it should plan the optimal tour routes based on the precise tourist sights and confirm that tourists can obtain the best motive benefits. The key techniques for setting up the optimal tour route planning algorithm based on interest label are precise tourist sight mining and the design of the optimal tour route planning algorithm.

A. PRECISE TOURIST SIGHT MINING ALGORITHM BASED ON INTEREST LABEL AND GEOGRAPHIC LOCATION
Tourists' interests and needs are collected in the form of interest labels. According to their own interests and needs, tourists provide the quantity of tourist sights to be visited and the specific feature attributes. The smart recommender system then precisely mines and matches the interested tourist sights according to the quantity and attributes. The mining and matching process is based on the interest and need quantization matrix $T^{\wedge}(\alpha\beta)$. When a tourist confirms one interest label, matrix $T^{\wedge}(\alpha\beta)$ is updated immediately. Taking a one-day city trip as an example, and considering tourists' time schedule, physical conditions and tour experience degree, the quantity of interest labels is capped at an upper limit $b$, $b \in Z^{+}$. Here is the fourth group of definitions.
Def 4.1: Interest and need quantization matrix $T^{\wedge}(\alpha\beta)$. Based on the interest and need label matrix $T(\alpha\beta)$, the process of quantifying each single column of the matrix so that it represents one specific interest label is called matrix quantization. The matrix $T(\alpha\beta)$ that has been quantified and whose column rank has been expanded is called the interest and need quantization matrix $T^{\wedge}(\alpha\beta)$. In the interest and need quantization matrix $T^{\wedge}(\alpha\beta)$, one column of data represents one tourist sight and the interests that tourists expect of it. Its column rank is determined by the maximum quantity of interest and need labels. Since the upper limit of the quantity of interest labels is $b$, the column rank cannot exceed $b$. According to the definition and to the individual differences between tourists, an arbitrary row of the matrix represents the same attribute for different interests and needs, while an arbitrary column represents the specific attribute matching for one interest and need label, that is, the tourist sight feature attributes that tourists expect in order to satisfy their interests and needs. When tourists' interests and needs differ, interest and need labels with different values and matchings are output, which leads to differences in the value distribution and in the column rank $\mathrm{rank}(T^{\wedge}(\beta))$ of matrix $T^{\wedge}(\alpha\beta)$. Therefore, the element values of matrix $T^{\wedge}(\alpha\beta)$ are determined by tourists' interest and need labels, which show strong randomness and individual difference.
Def 4.2: Interest tourist sight basic vector $I(\alpha\beta)$. Suppose the quantity of tourist sight interest labels confirmed by a tourist is $h$, of which each tourist sight cluster $C_i$ contains $h(i)$ labels. Set up the $1 \times h$ dimension row vector $I(\alpha\beta)$ to store the tourist sights that are matched and mined by the interest labels. This vector is called the interest tourist sight basic vector $I(\alpha\beta)$. By the definition, the vector's element is noted as $I(z(i))$, and its interval $(z(i), z(i+1)]$ stores the $h(i)$ tourist sights of cluster $C_i$ in element order. Take one tour day as the research object. Considering tourists' time schedule, tour experience and physical conditions, the quantity of tourist sights visited in one tour day should not exceed 5, that is, the maximum rank of the basic vector $I(\alpha\beta)$ is 5. It meets the following constraint conditions: (1) … (3) Row rank: $\mathrm{rank}(I(\alpha)) = 4$; (4) Column rank: $\mathrm{rank}(I(\beta)) = h$. The interest tourist sight basic vector $I(\alpha\beta)$ whose elements are all 0 is called the interest tourist sight null vector $I_0(\alpha\beta)$.

Def 4.3: Starting searching point $P$ and interest seed point $P_\tau(C(i), x(i))$ of congeneric or heterogeneous cluster. When tourists arrange a one-day trip schedule, they usually take their temporary accommodation as the trip center and the starting point, and after visiting all tourist sights they return to the accommodation to rest, which forms a complete closed-loop structure. Tourists' temporary accommodation in a tourism city is defined as the starting searching point $P$. Taking the point $P$ as the center, search and mine the tourist sights, store them in element order in the null vector $I_0(\alpha\beta)$ and finally make the vector $I(\alpha\beta)$ full rank. The tourist sights obtained in this way are called interest seed points $P_\tau(C(i), x(i))$. $C(i)$ represents the tourist sight's cluster, and it corresponds to the interval $(z(i), z(i+1)]$ of the vector $I(\alpha\beta)$. $x(i)$ represents the code of seed point $P_\tau(C(i), x(i))$ in cluster $C(i)$. $\tau$ represents the code of the seed mined from the point $P$. If the starting seed point $P_\tau(C(i_1), x(i_1))$ and the next searched seed point $P_{\tau+1}(C(i_2), x(i_2))$ are in the same cluster, that is, $i_1 = i_2$, they are defined as congeneric cluster seed points, noted as $P^{+}_{\tau+1}(C(i_2), x(i_2))$. If they are in two different clusters, they are defined as heterogeneous cluster seed points, noted as $P^{-}_{\tau+1}(C(i_2), x(i_2))$.

Def 4.4: Interest seed point full rank $C_i \sim C_i^{*}$. In the process of searching seed points, if the quantity of seed points related to cluster $C_i$ in the interest tourist sight basic vector $I(\alpha\beta)$ reaches $h(i)$, the cluster is said to have interest seed point full rank, noted as $C_i \sim C_i^{*}$. A cluster that reaches full rank stops searching for seed points.
Def 4.5: Searching orientation angle $\phi$. In the tourist sight distribution map, take one center point and send a ray $l_1$ from it in the geographic north direction. Connect the center point with another point in its neighborhood area to make another ray $l_2$. The included angle measured clockwise from ray $l_1$ to ray $l_2$ is called the searching orientation angle $\phi$. The searching orientation angle from the starting point $P$ or a selected seed point, taken as the center, to a meta-data tourist sight $\mu_r$ is noted as $\phi[P, \mu_r]$. Confirm the longitude and latitude of all the meta-data tourist sights $c(i, v_i)$ in the cluster $C_i$, as well as the ferry distance between each of them and the starting searching point. When the spatial coordinate distance and the ferry distance between one meta-data tourist sight $c(i, v_i)$ and the starting searching point are the minimum ones, the two points are the closest in distribution. If, at the same time, the difference in feature attributes between this closest meta-data tourist sight $c(i, v_i)$ and the tourist interest and need label is the minimum one, then the meta-data tourist sight $c(i, v_i)$ not only meets the interests and needs of tourists but is also optimally distributed, which makes the cost the lowest. According to these constraint conditions, a meta-data tourist sight $c(i, v_i)$ becomes the seed point $P_\tau(C(i), x(i))$ when it meets the following criterion, in which $T(z)$ is the No. $z$ column of interest labels in the interest and need quantization matrix $T^{\wedge}(\alpha\beta)$.
According to the above constraint conditions, combining with formula (6) and formula (7) , the interest seed point searching objective function (µ 1 , µ 2 ) is set up as formula (12), in which µ 1 represents the starting point P or seed point P τ (C (i1) , x (i1) ), and µ 2 represents the meta-data tourist sight c(i, v (i) ) to be searched.
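The searching orientation angle of Def 4.5 is a clockwise angle measured from geographic north, which corresponds to the initial bearing between two points given by latitude and longitude. The sketch below uses the standard spherical initial-bearing formula. The function name and the sample coordinates are illustrative assumptions, and the paper does not state which formula it uses.

```python
import math


def orientation_angle(lat1, lon1, lat2, lon2):
    """Clockwise angle in degrees from geographic north at point 1 toward point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    # atan2(east, north) is measured from north toward east, i.e. clockwise.
    return math.degrees(math.atan2(east, north)) % 360.0


# Made-up coordinates: a starting point P and a sight slightly to its north-east.
P = (34.75, 113.62)
sight = (34.80, 113.70)
print(orientation_angle(P[0], P[1], sight[0], sight[1]))
```

A sight due north of the center yields 0 degrees, due east 90, due south 180 and due west 270, which matches the clockwise convention of rays $l_1$ and $l_2$.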
According to the constraint conditions and the searching criterion, for the No. $z$ column of matrix $T^{\wedge}(\alpha\beta)$, store the potential seed points in descending order through the searching algorithm process, and then traverse the interest labels $T(z)$, $z \in (0, h] \subset Z^{+}$. The algorithm process of searching the interest seed points is as follows.

Step 1 Take the starting searching point $P$ as the central point and then confirm the searching orientation angles $\phi[P, \mu_1], \phi[P, \mu_2], \ldots, \phi[P, \mu_m]$ of the $m$ meta-data tourist sights $\mu_r \sim c(i, v(ir))$ in tourist sight domain $C$. Step 1 starts from searching cluster $C_1$, with $i = 1$.
Sub-step 1 Take $z = 1$. Iterate over the meta-data tourist sights in each of the orientation angles and calculate against the first column label $T(1)$ in matrix $T^{\wedge}(\alpha\beta)$. Obtain the value $\sigma(T(z), c(i, v(i1)))$.
(3) According to the algorithm of step (1) and step (2), continue iterating to calculate the meta-data tourist sight $\mu_r \sim c(i, v(ir))$ in each orientation angle $\phi[P, \mu_r]$ and obtain the objective function value for $(P, \mu_r)$. Determine the potential seed point $P_\tau(C(i), x(i))^{*}$ relating to the No.1 interest label $T(1)$ and the arrangement of all the potential seed points in the element interval $(z(i), z(i+1)]$ of cluster $C_i$ in the vector $I_0(\alpha\beta)$. Traverse all the orientation angles $\phi[P, \mu_r]$, $r \in (0, m]$, and output all elements in the interval $(z(i), z(i+1)]$ of cluster $C_i$ in the vector $I_0(\alpha\beta)$, ordered from the smallest to the largest objective function value for $(P, \mu_{h(i)})$.
(4) Take the No.1 element potential seed point as the seed point $P_1(C(i), x(1))$ of label $T(1)$ and store it into the related element of vector $I(\alpha\beta)$. Vector $I_0(\alpha\beta)$ returns to the zero vector.

Sub-step 2 Take $z = 2$. Traverse all the orientation angles $\phi[P, \mu_r]$, $r \in (0, m]$, in accordance with the Sub-step 1 process and finally output all elements in the interval $(z(i), z(i+1)]$ of cluster $C_i$ in the vector $I_0(\alpha\beta)$, ordered from the smallest to the largest objective function value for $(P, \mu_{h(i)})$:
(1) If the No.1 element potential seed point meets the condition $P_1(C(i), x(1))^{*} = P_1(C(i), x(1))$, that is, it has already been selected for label $T(1)$, take the No.2 element potential seed point $P_2(C(i), x(2))^{*}$ as the seed point $P_2(C(i), x(2))$ of the interest label $T(2)$;
(2) If the No.1 element potential seed point meets the condition $P_1(C(i), x(1))^{*} \neq P_1(C(i), x(1))$, take the No.1 element potential seed point $P_1(C(i), x(1))^{*}$ as the seed point $P_2(C(i), x(2))$ of the interest label $T(2)$.

Sub-step 3 Continue searching, and finally confirm all seed points.

Step 2 Continue searching all seed points $P_\tau(C(i), x(i))$ in the interval $(z(i), z(i+1)]$ of cluster $C_i$ in the vector $I(\alpha\beta)$ in accordance with the Step 1 process and its sub-steps.

Step 3 Continue searching until the $h$ seed points $P_\tau(C(i), x(i))$ relating to all of the interest labels in the matrix $T^{\wedge}(\alpha\beta)$ are confirmed, and then output the vector $I(\alpha\beta)$.
The seed points stored in the vector I (αβ) are all the tourist sights that are controlled and mined by the algorithm to meet the tourists' interests and needs and have the optimal geographic distribution in the starting point neighborhood buffer with the shortest spatial distances. Based on the seed points of vector I (αβ) , optimal tour route planning algorithm is designed and developed.
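The label-by-label selection in Steps 1 to 3 can be sketched as follows: for every interest label, rank the candidate sights of the related cluster by an objective value and take the best one that has not already been chosen. This is a hedged illustration only. Formula (12) is not reproduced in this section, so the `objective` used here (distance from the starting point plus attribute mismatch) and all sample data are assumptions.

```python
def mine_seed_points(labels, candidates, objective):
    """labels: list of (label_value, cluster_id); candidates: cluster_id -> sights."""
    chosen = []
    for label, cluster_id in labels:
        # Rank candidates from the smallest to the largest objective value.
        ranked = sorted(candidates[cluster_id], key=lambda s: objective(label, s))
        for sight in ranked:
            if sight not in chosen:   # skip a sight already taken by an earlier label
                chosen.append(sight)
                break
    return chosen


# Hypothetical sights: (distance from P in km, one attribute value in [0, 1]).
sights = {"s1": (2.0, 0.9), "s2": (1.0, 0.8), "s3": (4.0, 0.2), "s4": (3.0, 0.3)}
candidates = {1: ["s1", "s2"], 2: ["s3", "s4"]}


def objective(label, sight):
    distance, attribute = sights[sight]
    return distance + abs(attribute - label)   # assumed stand-in for formula (12)


labels = [(0.9, 1), (0.8, 1), (0.2, 2)]   # (expected attribute, cluster)
print(mine_seed_points(labels, candidates, objective))   # ['s2', 's1', 's4']
```

For the second label the best-ranked sight is again s2, which is already a seed point, so the next-ranked sight s1 is taken. This corresponds to rule (1) of Sub-step 2.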

B. OPTIMAL TOUR ROUTE PLANNING ALGORITHM BASED ON PRECISE TOURIST SIGHT MINING
Tourist sight seed points in the vector $I(\alpha\beta)$ are related to the columns of the interest and need quantization matrix $T^{\wedge}(\alpha\beta)$. The precise tourist sight mining algorithm based on the interest label and geographic location confirms the optimal tourist sights, which are centered on the starting point and have the best geographic distribution. Taking a one-day trip as an example, based on the precise optimal tourist sight mining, tourists start the trip from the point $P$, visit $h$ tourist sights and finally return to the starting point $P$, which forms an integrated closed loop. Because the tourist sights are distributed randomly and discretely, there are multiple possible tour routes. In an integrated closed loop, the process of ferrying from the starting point $P$ to an arbitrary interested tourist sight $\forall I(z(i))$, or from an arbitrary interested tourist sight $\forall I(z(i))$ to the next interested tourist sight $I(z(i+1))$, forms an integrated ferry interval. While moving in this ferry interval, the distance traveled by tourists gradually increases from zero. The ferry transportation method can be bus, taxi, private car, bicycle or even walking. The ferry time is influenced by the interval distance, the number of traffic lights, traffic congestion and so on. For tourists, the ferry distance, traffic convenience, traffic congestion, waiting time for public transportation and so on also directly influence their psychological and physical feelings. The less energy tourists spend on the tour, the more comfortable they feel and the better the experience they get in a single tour, and thus the more easily they obtain motive benefit satisfaction.
One single ferry process will to some extent influence tourists' feeling and satisfaction in the next ferry interval. That is, as time passes, as the ferry distance increases and as the quantity of visited tourist sights increases, tourists' motive benefit satisfaction degree also increases. Because it is influenced by different intervals, the different tour routes formed by the $h$ interested tourist sights differ in their motive benefit increasing curves, that is, different tour routes iterate to output different motive benefit satisfaction degrees as time passes and the ferry distance increases. Thus, in one single closed-loop tour route, tourists' motive benefits can be represented by a monotone increasing function. This function is the iteration function of the motive benefit satisfaction degree over the intervals, and it is determined by multiple factors.
The smart recommender system provides $h$ interested tourist sights for tourists. The tourist sights are relatively discrete in geographic space and are connected to one another by city roads. Mathematically, the closed-loop tour route that starts from the point $P$, connects the $h$ interested tourist sights and ends at the same point $P$ is not unique. In total, there are $A_h^h = h!$ possible tour routes. According to the analysis, because tourists feel differently about the geographic information service, the traffic information service and the tourist sights, the final experience and the motive benefit satisfaction degree differ from route to route. The smart recommender system should provide the optimal route so that tourists obtain the best motive benefit satisfaction. Here is the fifth group of definitions.
Def 5.1: Two intervals: the ferry interval H(P, P) and the ferry sub-interval H(ξ(*), I(z(i)))(z). Starting from the point P, visiting h interested tourist sights and finally returning to the point P forms an integrated closed-loop trip, which is called the ferry interval H(P, P). The tourists' ferry from the starting point P to an arbitrary interested tourist sight ∀I(z(i)), or from an arbitrary interested tourist sight ∀I(z(i)) to another interested tourist sight I(z(i±1)), is a one-way, non-closed interval; this interval is called the ferry sub-interval H(ξ(*), I(z(i)))(z), with z ∈ (0, h + 1] ⊂ Z+. Here z is the code of each ferry sub-interval and z(i) is the code of the interested tourist sight; the two codes may differ. The symbol ξ(*) represents P or I(z(i±1)). According to the definition, within one ferry interval formed by the point P and the h interested tourist sights, there are h + 1 sub-intervals H(ξ(*), I(z(i)))(z). The sub-intervals connect with each other and together form the ferry interval.
Def 5.2: Interval motive value W(P, P) and sub-interval motive value W(ξ(*), I(z(i)))(z). During the tour in the interval, tourists' motive benefit satisfaction degree is determined by multiple factors, and it increases as the ferry distance increases. The interval is formed by h + 1 sub-intervals H(ξ(*), I(z(i)))(z). Starting from the initial motive value w(P) at the starting point P, the maximum value output by iterating over the h + 1 sub-intervals H(ξ(*), I(z(i)))(z) until tourists return to the same point P is called the interval motive value W(P, P). The interval motive value W(P, P) measures the motive benefit satisfaction degree of one tour route: starting from the point P, the motive value increases nonlinearly with the ferry distance and the number of visited tourist sights, and it reaches its maximum, W(P, P), when tourists return to the point P. In an arbitrary ferry sub-interval ∀H(ξ(*), I(z(i)))(z), tourists start from the initial point ξ(*), and the motive value increases nonlinearly as the ferry distance increases. Starting from the initial motive value W(I(z(i−1)), ξ(*))(z − 1) at the point ξ(*), the maximum value output by iterating the sub-interval's multiple factors when tourists reach the next tourist sight I(z(i)) is defined as the sub-interval motive value W(ξ(*), I(z(i)))(z). According to the definition, the sub-interval motive value W(ξ(*), I(z(i)))(z) is the initial value for the next sub-interval, whose motive value is W(I(z(i)), I(z(i+1)))(z + 1); thus two adjacent sub-intervals are linked by iteration on the motive benefit value. Within one ferry sub-interval, the motive value function is monotonically increasing, and over the whole ferry interval, the interval motive value function is also monotonically increasing. The latter is determined by the former.
Def 5.3: Motive benefit positive factor λ+(g1) and motive benefit negative factor λ−(g2). A factor that has a direct and positive impact on the process by which tourists obtain motive benefit satisfaction during the tour is called a motive benefit positive factor λ+(g1), while a factor that has an indirect and negative impact on that process is called a motive benefit negative factor λ−(g2), g1, g2 ∈ Z+. In the sub-interval between ξ(*) and ∀I(z(i)), according to the tourists' actual conditions, city geographic information data, traffic information data and tourist sight information data, etc., the positive factors λ [...]
Def 5.4: Sub-interval motive function f(ξ(*), I(z(i))) and interval motive function f(P, P). The function that iterates each motive benefit positive factor and motive benefit negative factor from the initial point ξ(*) to output the sub-interval motive value W(ξ(*), I(z(i)))(z) is defined as the sub-interval motive function f(ξ(*), I(z(i))). This function reflects the motive benefit satisfaction degree while tourists ferry in the corresponding sub-interval. According to the definition, the value of f(ξ(*), I(z(i))) increases monotonically with the ferry distance in the sub-interval, and it reaches the maximum value W(ξ(*), I(z(i)))(z) at the terminal point. The function that iterates each sub-interval motive function f(ξ(*), I(z(i))), starting from the motive value at the initial point P, to output the interval motive value W(P, P) is defined as the interval motive function f(P, P). This function reflects tourists' motive benefit satisfaction degree over the whole tour route. According to the definition, the value of f(P, P) increases monotonically with the ferry distance over the whole interval, and it reaches the maximum value W(P, P) at the terminal point P.
According to the iteration relationship, the function f(P, P) is a piecewise function whose segment nodes are ξ(*) and ∀I(z(i)) and whose segments are the functions f(ξ(*), I(z(i))). Figure 2 is the schematic diagram of the ferry interval H(P, P), the ferry sub-interval H(ξ(*), I(z(i)))(z), the sub-interval motive function f(ξ(*), I(z(i)))(z), the interval motive function f(P, P), the sub-interval motive value W(ξ(*), I(z(i)))(z) and the interval motive value W(P, P). Figure 2(1) represents the relationship of the first sub-interval H(P, I(z(i)))(1), the sub-interval function f(ξ(*), I(z(i)))(1) and the sub-interval motive value W(P, I(z(i)))(1), with the initial motive value w(P) and the starting point P. Figure 2(2) and Figure 2(3) represent the same relationship in the second and the third sub-intervals. Figure 2(4) represents the relationship of the ferry interval H(P, P), the interval function f(P, P) and the interval motive value W(P, P). In Figure 2, the blue curves represent the sub-interval function and the interval function, the red curves represent the ferry sub-intervals, and the dark red curve represents the ferry interval.
According to the definition, the interval motive values should be sorted from the maximum to the minimum. Thus a maximum heap and a complete binary tree can be used to store the values and to visualize the data. The two structures are direct, concise and convenient for storage, and this research uses both to store the values.
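A minimal sketch of this storage idea follows, assuming the usual 1-indexed array layout of a complete binary tree, in which the children of position i sit at positions 2i and 2i + 1. The function names and the sample motive values are hypothetical and serve only to illustrate that a descending array satisfies the maximum heap property.

```python
def build_descending_heap(values):
    """Return a 1-indexed array D holding the values in descending order.

    D[0] is an unused placeholder, so the children of D[i] are D[2*i]
    and D[2*i + 1], as in an array-stored complete binary tree.
    """
    return [None] + sorted(values, reverse=True)


def is_max_heap(D):
    """Check that every parent is at least as large as each of its children."""
    n = len(D) - 1
    for i in range(1, n + 1):
        for child in (2 * i, 2 * i + 1):
            if child <= n and D[i] < D[child]:
                return False
    return True


# Hypothetical interval motive values of six routes (illustrative numbers only).
motive_values = [3.2, 7.5, 1.1, 6.0, 4.4, 5.9]
D = build_descending_heap(motive_values)

print(D[1])            # root element: the largest motive value
print(is_max_heap(D))  # heap property check on the descending array
```

Sorting fully is stronger than what a heap requires, but it also gives the left-to-right ordering within each level, so the array can be read directly as a ranking of routes.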
Def 5.5: Interval motive value growing tree Gt(k(i)) and interval motive value maximum heap R. The h + 1 sub-intervals formed by the point P and h interested tourist sights generate an integrated closed-loop tour route, which relates to one increasing iteration interval motive function f(P, P) and its interval motive value W(P, P), together with the increasing iteration sub-interval motive functions f(ξ(*), I(z(i))) and the sub-interval motive values W(ξ(*), I(z(i)))(z) in the interval. The topological tree of visited tourist sights that relates to the interval motive value W(P, P), and that is formed while the sub-interval motive function f(ξ(*), I(z(i))) and the interval motive function f(P, P) are iterated to accumulate the sub-interval motive values W(ξ(*), I(z(i)))(z) and the interval motive value W(P, P), is defined as the interval motive value growing tree Gt(k(i)), 0 < k(i) ≤ $A_h^h$, k(i) ∈ Z+. Each interval motive value growing tree Gt(k(i)) relates to one tour route and one interval motive value. The maximum heap formed by the interval motive values W(P, P) of the $A_h^h$ growing trees Gt(k(i)) is defined as the interval motive value maximum heap R. Its heap element is R(k(i)), with 0 < k(i) ≤ $A_h^h$, k(i) ∈ Z+. The interval motive value maximum heap R directly relates to each tour route's motive value, and it is also the ordered heap of the tour routes' motive benefit satisfaction degrees. Thus, there is the relationship Gt(k(i)) ∼ R(k(i)). According to the quantity h of interested tourist sights and the quantity $A_h^h$ of growing trees Gt(k(i)), the maximum heap meets the following conditions: (1) The initial heap is a zero-value heap R(∅) with $A_h^h$ elements; (2) After all tour sequences are traversed, the heap becomes full rank and meets the condition rank(R(∅)) = $A_h^h$; (3) Set n = $A_h^h$.
The heap element code sequence k(1), k(2), ..., k(n) meets the following conditions: (4) The level of the root node is 0 and the height of the tree is d; the other child nodes are either in level d or in level d − 1; (5) When d ≥ 1, there are $2^{d-1}$ nodes in level d − 1; (6) All branch nodes in level d − 1 are gathered on the left side of the tree; (7) The element value of each node is larger than those of its child and grandchild nodes; (8) Among the element values of the nodes in the same level, those on the left are larger than those on the right.
The storage and sequence of the heap values meet the following conditions: (1) Store the maximum heap values in database D, with the root node value in D[1]; (2) Let node x be stored in element D[i]. If x has a left child node, that child is stored in D[2i]; if x has a right child node, that child is stored in D[2i+1]; (3) The parent node of a non-root node D[i] is stored in D[⌊i/2⌋].
According to the definitions, the optimal tour route planning algorithm based on precise tourist sight searching is set up. The basic idea and principle of the algorithm are as follows. For the No. k(i) closed-loop tour route, starting from the initial motive value w(P), traverse z ∼ (0, h + 1] and iterate the sub-interval motive function f_{k(i)}(I(z−1), I(z))(z) from the first sub-interval to the No. z sub-interval to output the sub-interval motive value W_{k(i)}(I(z−1), I(z))(z), and finally output the interval motive function f_{k(i)}(P, P), the interval motive value W_{k(i)}(P, P) and the interval motive value growing tree Gt(k(i)). Traverse k(i) ∼ (0, $A_h^h$], iterate all $A_h^h$ tour routes with the same initial value w(P) and confirm each interval motive value growing tree Gt(k(i)). Through the sorting algorithm, the interval motive value maximum heap R is confirmed. The specific algorithm steps are as follows.
Step 1 Confirm the basic parameters of the algorithm. Search the city geographic information data, traffic information data and tourist sight information data, and confirm three groups of parameters between two arbitrary tourist sights ∀I(z(i)) and ∀¬I(z(i)), as well as for all tourist sights in the interested tourist sight vector I(αβ), 0 < z(i) ≤ h, z(i) ∈ Z+: (1) The longitude and latitude data l_{I(z(i))} and B_{I(z(i))} of the interested tourist sight I(z(i)); (2) The attraction index a(I(z(i))) of the interested tourist sight I(z(i)); (3) The motive benefit positive factors λ+(g1) between two arbitrary tourist sights ∀I(z(i)) and ∀¬I(z(i)); (4) The motive benefit negative factors λ−(g2) between two arbitrary tourist sights ∀I(z(i)) and ∀¬I(z(i)).
Step 2 Set up the sub-interval motive value vector J(z) with dimension 1 × (h + 1). While iterating the interval motive function f_{k(i)}(P, P), the interval motive value W_{k(i)}(P, P) and the interval motive value growing tree Gt(k(i)), store the maximum output value of each sub-interval motive function f_{k(i)}(I(z−1), I(z))(z) in the corresponding element of the vector. This maximum value is exactly the sub-interval motive value W_{k(i)}(I(z−1), I(z))(z), z ∈ (0, h + 1] ⊂ Z+. According to the definition, the element values of the vector J(z) increase with the element code, and the element with the maximum code relates to W_{k(i)}(I(z−1), I(z))(z) with z = h + 1.
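The chaining of sub-interval motive values into the vector J can be sketched as below. This is an illustration under stated assumptions, not the motive function of this research: `sub_interval_gain` is a hypothetical stand-in that merely rewards positive factors and damps by negative factors, and all factor values are made up. What the sketch does show faithfully is the iteration structure, in which each sub-interval starts from the value reached at the end of the previous one.

```python
from itertools import accumulate


def sub_interval_gain(positive, negative):
    """Hypothetical gain of one ferry sub-interval (NOT the paper's formula).

    Positive factors raise the gain and negative factors damp it; the
    result is non-negative whenever all factors are non-negative.
    """
    return sum(positive) / (1.0 + sum(negative))


def motive_value_vector(w_start, sub_intervals):
    """Return J: the motive value reached at the end of each sub-interval.

    Each sub-interval starts from the value the previous one ended with,
    so J is a running total that begins from the initial value w(P).
    """
    gains = [sub_interval_gain(pos, neg) for pos, neg in sub_intervals]
    return list(accumulate(gains, initial=w_start))[1:]


# h = 2 sights, hence h + 1 = 3 sub-intervals; all factor values are made up.
sub_intervals = [([0.8, 0.6], [0.4]), ([0.5], [1.0]), ([0.9, 0.3], [0.2])]
J = motive_value_vector(1.0, sub_intervals)

print(len(J))  # h + 1 entries, one per sub-interval
print(J[-1])   # last entry plays the role of the interval motive value W(P, P)
```

With non-negative gains the entries of J never decrease, which mirrors the monotonicity required by the definitions.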
Step 3 Set up the edge clipping circle for the closed-loop tour route. In a real-world situation, tourists start from the point P, visit h tourist sights I(z(i)) and finally return to the point P. This process forms an integrated closed-loop tour route, which obeys the principle of motive value iteration. It is therefore a digraph with weighted edges formed by the tour route in its geographic distribution. The edge weight is the value of the sub-interval motive function f_{k(i)}(ξ(*), I(z))(z). Abstract the digraph with weighted edges into a circle with a virtual center O, and define this circle structure as the edge clipping circle for the closed-loop tour route. The distribution of the point P and the h tourist sights I(z(i)) on the circle should meet the following conditions: (1) Set the intersection of the north orientation ray l(O→North) from the center O with the circle as the starting point P; (2) Divide the circle into h + 1 equal arcs, each subtending the central angle θ = 2π/(h + 1); (3) The tourist sights I(z(i)) are distributed on the circle so that the included angle between the ray l(O→North) from the center O to the north and the ray l(O→I(z(i))) from the center O to the tourist sight I(z(i)) meets the condition l(P, I(z(i))) = z(i) · θ, 0 < z(i) ≤ h, z(i) ∈ Z+; (4) The digraph with weighted edges between any two of the point P and the h tourist sights I(z(i)) on the circle should simultaneously meet the following conditions:
① Take two arbitrary points from P and the tourist sight set I(αβ): i) If 0 < l(P, ∀I(z(i))) ≤ θ or 0 < l(∀I(z(i)), ∀¬I(z(i))) ≤ θ, the weight edge between P and I(z(i)), or between I(z(i)) and ¬I(z(i)), is the corresponding θ arc of l(P, ∀I(z(i))) or l(∀I(z(i)), ∀¬I(z(i))); ii) If θ < l(P, ∀I(z(i))) < 2π or θ < l(∀I(z(i)), ∀¬I(z(i))) < 2π, the weight edge between P and I(z(i)), or between I(z(i)) and ¬I(z(i)), is the straight line connecting the two points, that is, there is no other point between P and I(z(i)) or between I(z(i)) and ¬I(z(i)).
② The moving direction of a weight edge meets the following conditions: i) Between the point P and an arbitrary point ∀I(z(i)) in I(αβ): a) If the tour moves from the point P to an arbitrary point ∀I(z(i)) in I(αβ), the arrow points from P to I(z(i)). Set this weight edge as a positive direction weight edge, noted as C+(P, ∀I(z(i))); b) If the tour moves from an arbitrary point ∀I(z(i)) in I(αβ) to the point P, the arrow points from I(z(i)) to P. Set this weight edge as a negative direction weight edge, noted as C−(∀I(z(i)), P); ii) Between an arbitrary point ∀I(z(i1)) in I(αβ) and another point I(z(i2)): a) If z(i1) < z(i2), the arrow points from I(z(i1)) to I(z(i2)); set this weight edge as a positive direction weight edge, noted as C+(I(z(i1)), I(z(i2))); b) If z(i1) > z(i2), the arrow points from I(z(i2)) to I(z(i1)); set this weight edge as a negative direction weight edge, noted as C−(I(z(i2)), I(z(i1))).
③ Each weight edge relates to one ferry sub-interval. Along the moving direction, the sub-interval motive function f_{k(i)}(ξ(*), I(z))(z) is monotonically increasing, and the maximum value W_{k(i)}(ξ(*), I(z))(z) is output at the sub-interval's terminal point.
④ The digraph formed by h + 1 weight edges connected to each other composes an interval motive value growing tree Gt(k(i)), which relates to one interval motive function f_{k(i)}(P, P) and one interval motive value W_{k(i)}(P, P). Figure 3(1) shows the constructed closed-loop edge clipping circle. Figure 3(2) is an example of a closed-loop structure formed by weight edges.
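The geometric layout of the edge clipping circle can be sketched as follows. The helper names are hypothetical, and one assumption is made explicit: the angular gap between two points is taken as the absolute difference of their clockwise angles from north, which is one possible reading of the arc and straight-line conditions in Step 3.

```python
import math


def circle_layout(h):
    """Return theta and the clockwise angle (from north) of P and each sight."""
    theta = 2 * math.pi / (h + 1)
    angles = {"P": 0.0}
    for z in range(1, h + 1):
        angles[f"I{z}"] = z * theta
    return theta, angles


def to_xy(angle, radius=1.0):
    """Convert a clockwise-from-north angle to planar (x, y) with north up."""
    return radius * math.sin(angle), radius * math.cos(angle)


def edge_kind(angle_a, angle_b, theta, tol=1e-12):
    """Classify a weight edge as the theta arc or the straight chord.

    Assumption: the gap is the absolute difference of the two clockwise
    angles; a gap of at most theta gives an arc, a larger gap a chord.
    """
    gap = abs(angle_a - angle_b)
    return "arc" if 0 < gap <= theta + tol else "chord"


theta, angles = circle_layout(h=4)
print(edge_kind(angles["P"], angles["I1"], theta))   # gap equals theta
print(edge_kind(angles["I1"], angles["I3"], theta))  # gap equals 2 * theta
```

The small tolerance guards against floating-point rounding when the gap is exactly one arc wide.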
Sub-step 1 Iterate the first closed-loop edge clipping circle (1) and output the vector J (1) as well as its related elements. Output the interval motive value growing tree Gt (1) , store the last element of value W 1 (P, P) in vector J (1)
(2) If W_2(P, P) > W_1(P, P): ① If W_2(P, P) > W_1(P, P) ≥ W_3(P, P), keep W_1(P, P) and W_2(P, P) stored in the No.2 and No.1 elements R∼(2) and R∼(1) unchanged, and store W_3(P, P) in the No.3 element R∼(3) of the temporary heap R∼; ② If W_2(P, P) > W_3(P, P) > W_1(P, P), keep W_2(P, P) in the No.1 element R∼(1) of the temporary heap R∼ unchanged, move W_1(P, P) down to the No.3 element R∼(3), and store W_3(P, P) in the No.2 element R∼(2) of the temporary heap R∼; ③ If W_3(P, P) > W_2(P, P) > W_1(P, P), move W_2(P, P) and W_1(P, P) down to the No.2 and No.3 elements R∼(2) and R∼(3) of the temporary heap R∼, and store W_3(P, P) in the No.1 element R∼(1) of the temporary heap R∼.
(1) If κ(2, k(i)) < $A_h^h$, continue iterating. Compare the k(i) values W_{k(i)}(P, P), k(i) ∼ (0, max k(i)], and store them in descending order in the first k(i) elements R(k(i)) of the temporary heap R∼; (2) If κ(2, [...] Store the interval motive values W_{k(i)}(P, P) in descending order in the $A_h^h$ elements R(k(i)) of the temporary heap R∼. Output the interval motive value maximum heap R, which meets its own conditions (1)∼(7).
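The descending storage of each newly computed interval motive value into the temporary heap behaves like an ordered insertion. The sketch below is a simplified illustration with a hypothetical helper name and made-up values; it uses a binary search on negated values, and on ties it keeps the earlier route ahead of the later one.

```python
import bisect


def insert_descending(temp_heap, route_id, value):
    """Insert (value, route_id) so temp_heap stays sorted from largest to smallest.

    A new value is placed after existing entries of equal value, so an
    earlier route keeps its position on ties.
    """
    keys = [-v for v, _ in temp_heap]  # ascending keys for bisect
    position = bisect.bisect_right(keys, -value)
    temp_heap.insert(position, (value, route_id))


# Made-up interval motive values for routes 1, 2 and 3, with W2 > W3 > W1.
temp_heap = []
for route_id, value in [(1, 5.0), (2, 8.0), (3, 6.5)]:
    insert_descending(temp_heap, route_id, value)

print(temp_heap)  # [(8.0, 2), (6.5, 3), (5.0, 1)]
```

In this example W2 stays in the first element, W3 takes the second and W1 moves down to the third, matching the case W2 > W3 > W1.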
Step 5 Output the sorted sequence of tour routes. The elements R(k(i)) of the interval motive value maximum heap R are related to the interval motive value growing trees Gt(k(i)) and to the tour routes. According to the definition and the algorithm rule, the value finally stored in the No.1 element of the heap R, that is, the interval motive value W_{k(i)}(P, P) in the tree's root node, is the maximum interval motive value and relates to the tour route with the highest motive benefit satisfaction degree. Thus, the optimal tour route is comprehensively the best with respect to tourist sight classification, quantity, specific tourist sights, interest and need matching, geographic location and distribution, visiting sequence, geographic information service, traffic information service, tourist sight attraction index and star level, etc. The other child and grandchild nodes relate to the sub-optimal tour routes. The smart recommender system outputs a certain number of tour routes for reference according to tourists' interests and needs.

IV. EXPERIMENT AND EXAMPLE ANALYSIS
To verify the effectiveness and feasibility of the algorithm, this research designs an experiment that tests the developed tourist sight clustering algorithm, the interested tourist sight mining algorithm and the optimal tour route planning algorithm. The experiment chooses the tourism city Zhengzhou and the typical tourist sights in its downtown area as the study range, excluding the tourist sights in the subordinate counties; the purpose is to ensure accessibility, so that tourists can reach any tourist sight by urban public transportation. The selected tourist sights should meet the following conditions [16], [22], [40]-[43]: (1) They should be representative and attractive; (2) They are all located in the downtown area of the city, where tourists can reach them easily by urban public transportation; (3) The tourist sights are not continuously distributed in geographic space; they are discrete in distribution; (4) Any two tourist sights are connected by city roads, so tourists can ferry between any two of them.
According to the conditions for selecting tourist sights, the experiment selects 20 typical tourist sights in the Zhengzhou downtown area. First, the developed algorithm is used for clustering and for confirming the motive benefit positive factors λ+(g1), the motive benefit negative factors λ−(g2), the tourist sight longitude and latitude (l, B), the attraction index a, the optimal visiting time t, the basic tour fee c_o, etc. The tourist sight attribute similarity degree is also mined and confirmed. It is the basis for the interested tourist sight mining and the optimal tour route searching. By setting up an experimental group and a control group, the experiment compares the developed algorithm with the commonly used shortest path algorithms with respect to optimal tour route output, motive value, motive value difference, space complexity and time complexity, and then analyzes the result data and the visualized guide maps.

A. DATA COLLECTION AND BASIC DATA CALCULATING
According to the experiment description and the basic data requirement, the basic data that the experiment needs include the tourist sight domain, geographic information data, traffic information data, tourist sight basic information data, tourist sight text encyclopedia data, etc. The experiment data are the basic inputs that the clustering objective function, the seed point searching objective function and the tour route planning algorithm require for building the algorithm.
[...] West fun park; c_20: Guomao}. Take the main roads and streets that connect the tourist sights in the Zhengzhou downtown area as the basic structure and output the geographic spatial distribution maps of all tourist sights and of the 20 typical ones. Figure 4(1) shows all tourist sights in the Zhengzhou downtown area, and Figure 4(2) shows the 20 typical tourist sights that conform to the experiment conditions; they compose the domain C for the experiment. In the maps, the selected tourist sights are feasible and all conform to the conditions for tourists to choose.

2) TOURIST SIGHT BASIC INFORMATION DATA COLLECTION
Use the Baidu search engine to obtain the text encyclopedia data of the 20 typical tourist sights. Get all the tourist sights' longitude and latitude (l, B) from the website GPSspg. Use a web crawler to obtain the tourist sights' attraction index a, the optimal visiting time t and the basic tour fee c_o from popular mainstream tourism websites such as Fengwo, Xiecheng and Feizhu. Table 1 shows the basic data for the tourist sight clustering objective function σ(c_r, c_¬r): the attraction index k_2 ∼ a, the optimal visiting time k_3 ∼ t (unit: h) and the tour fee k_4 ∼ c_o (unit: ¥ yuan). The tourist sight attribute k_1 is obtained by the text similarity mining algorithm. The tourist sight basic information data are the basis for the clustering objective function and the seed point searching objective function.

B. THE FOUNDATION OF TOURIST SIGHT CLUSTER
According to the confirmed 20 typical tourist sights and the tourist sight domain C, the experiment uses the developed text similarity mining algorithm to confirm the tourist sight attribute clustering factor k_1 and combines it with the tourist sight attraction index k_2, the optimal visiting time k_3 and the basic tour fee k_4 to iterate and calculate the tourist sight clustering objective function σ(c_r, c_¬r). When iterating the function value σ(c_r, c_¬r), the four parameters k_1, k_2, k_3 and k_4 must be normalized so that they act on the clustering result with the same magnitude. According to the actual data magnitude of the four parameters, set the normalization coefficients ε_i in the function σ(c_r, c_¬r) as ε_1 = 1.0, ε_2 = 1.0, ε_3 = 0.1, ε_4 = 0.01. Set up the tourist sight keyword label subsets. L(1): park, greenland, sightseeing, flower, plant, scenery; L(2): venue, memorial, history, science, technology, knowledge, natural science; L(3): amusement park, entertainment, theme park, swim, athletics, child cartoon; L(4): shopping, commerce, trade, carting, cinema. Together these subsets form the tourist sight label set L. Use the developed text word frequency mining algorithm to calculate the tourist sight keyword word frequency matrix T_{c_r}(uv), and obtain the attribute clustering factor k_1 and the tourist sight clustering objective function σ(c_r, c_¬r) between two tourist sights ∀c_r and c_¬r, shown in Table 2. In the table, each tourist sight c_r relates to two rows of data. In each table cell, the first number is the k_1 value and the second number is the σ(c_r, c_¬r) value.
The first row of each tourist sight c_r gives the k_1 and σ(c_r, c_¬r) values of c_r with c_1 ∼ c_10, and the second row gives the k_1 and σ(c_r, c_¬r) values of c_r with c_11 ∼ c_20.
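Keyword frequency counting against the label subsets can be sketched as below. This is a simplified illustration, not the developed word frequency mining algorithm: it assumes English text and single-word keywords only (so the two-word keyword "natural science" in L(2) is left out), and the sample sentence is made up.

```python
import re
from collections import Counter

# Single-word keywords from the label subsets L(1) and L(2) given in the text.
LABEL_SUBSETS = {
    "L1": ["park", "greenland", "sightseeing", "flower", "plant", "scenery"],
    "L2": ["venue", "memorial", "history", "science", "technology", "knowledge"],
}


def keyword_frequencies(text, label_subsets):
    """Count how often each keyword of each label subset occurs in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        label: [words[keyword] for keyword in keywords]
        for label, keywords in label_subsets.items()
    }


# Made-up encyclopedia-style sentence, used for illustration only.
sample = ("The park has a flower garden and a plant nursery; "
          "the park also hosts a science memorial.")
frequencies = keyword_frequencies(sample, LABEL_SUBSETS)

print(frequencies["L1"])  # [2, 0, 0, 1, 1, 0]
print(frequencies["L2"])  # [0, 1, 0, 1, 0, 0]
```

Each list is one row of a keyword frequency matrix for a single tourist sight; a similarity measure between two sights would then compare such rows.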
According to the attribute clustering factor k_1 value and the clustering objective function σ(c_r, c_¬r) value of the tourist sights ∀c_r and c_¬r, and using the tourist sight clustering algorithm formed by seed point, searching immunity and growing tree, the tourist sight clusters C_i based on the meta-data tourist sights in the domain C are generated. According to the calculated results, when the k_1 value and the σ(c_r, c_¬r) value of tourist sight c_r and tourist sight c_¬r are far larger than the corresponding k_1 and σ(c_r, c_¬r) values of another two tourist sights and, at the same time, larger than the preset threshold values, the two tourist sights are grouped into the same (congeneric) cluster; otherwise they belong to different (heterogeneous) clusters. Here the preset threshold values are k_1 = 0.500 and σ(c_r, c_¬r) = 0.500. Formula (16) is the tourist sight clustering dynamic matrix C∧(p×max m(i)) according to the tourist sight domain C. Formula (17) is the final output tourist sight clustering steady matrix C(p×max m(i)). Figure 5 shows the process of searching for the other tourist sight seed points c(i, 1) and their subordinate tourist sights c(i, v_i), starting from the randomly selected seed point c(1, 1) of the meta-data tourist sight c_1. The process finally outputs the tourist sight membership degree matrix µ_i(p × max m(i)), shown in formula (18), the tourist sight clustering steady matrix C(p×max m(i)), the tourist sight growing tree tree(i) and the tourist sight clusters C_i. One growing tree relates to one cluster C_i and to the No. i row of the steady matrix C(p×max m(i)). The matrix in the figure represents the No. i tourist sight membership degree matrix µ_i(p×max m(i)) generated during this search.
Matrix µ_i(p × max m(i)) relates to the different steps of seed point generation and of searching for its subordinate tourist sights. In the matrix µ_i(p × max m(i)), the bold black number 1 represents the seed point, as does the No.1 element of each row of the matrix C(p×max m(i)).
FIGURE 5. The generation process of tourist sight cluster C_i, growing tree tree(i), membership degree matrix µ_i(p × max m(i)) and steady matrix C(p×max m(i)).
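The threshold part of the grouping rule can be sketched as a simple predicate. This is a deliberately reduced illustration: it checks only the two preset thresholds (0.500 each) and omits the comparison against other pairs as well as the seed point and growing tree mechanics. The function name and the sample values are hypothetical.

```python
K1_THRESHOLD = 0.500     # preset threshold for the attribute clustering factor k1
SIGMA_THRESHOLD = 0.500  # preset threshold for the clustering objective function


def exceeds_thresholds(k1, sigma,
                       k1_threshold=K1_THRESHOLD,
                       sigma_threshold=SIGMA_THRESHOLD):
    """Return True when both values are larger than their preset thresholds."""
    return k1 > k1_threshold and sigma > sigma_threshold


# Made-up (k1, sigma) values for three pairs of tourist sights.
pairs = {
    ("c_a", "c_b"): (0.812, 0.774),
    ("c_a", "c_c"): (0.655, 0.431),
    ("c_b", "c_c"): (0.120, 0.095),
}
for pair, (k1, sigma) in pairs.items():
    print(pair, exceeds_thresholds(k1, sigma))
```

Only the first pair passes both thresholds, so only that pair would be a candidate for the same cluster under this reduced rule.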
According to the final output tourist sight clustering steady matrix C(p×max m(i)), the tourist sight clusters C_i of the Zhengzhou downtown area are output as follows. The generation of the tourist sight clusters conforms to the searching process of the seed points c(i, v_1). In each cluster, the tourist sight in bold black is the seed point c(i, v_1). The codes of each seed point c(i, v_1) and of the other non-seed points c(i, v_1) are controlled and generated by the algorithm and are not random. The code of a tourist sight c_r in a cluster conforms to the code sequence in the domain C.
Based on the generated tourist sight clusters C_i, the matched tourist sights and the optimal tour route conforming to tourists' interests and needs are mined and confirmed. The experiment subject is one tourist who is unfamiliar with Zhengzhou and its tourist sights. His requirements for the tourist sights he expects to visit in one day, according to his interests and needs, are listed in Table 3. The number of tourist sights to be visited that the smart recommender system provides is 4. According to the Table 3 data and the method for building the interest and need quantization matrix, the interest and need quantization matrix of formula (19) is output. Each column of the matrix relates to one requirement of the tourist's interests and needs, that is, to one row of Table 3. When the longitude and latitude (l, B) of the tourist's starting point P are confirmed, the experiment uses the developed interest seed point searching objective function (µ_1, µ_2), together with the searching algorithm and its conditions, to output the interest and need quantization matrix T∧(αβ) and the tourist sights that have the optimal geographic distribution and the lowest tour expense and that match the tourist's interests and needs.
According to the tourist sight clusters C_i of the output tourist sights and the algorithm of the interested tourist sight basic vector I(αβ), the experiment outputs the interested tourist sight basic vector I(αβ) with full rank elements, and its rank meets rank(I(αβ)) = 4.
Suppose the longitude and latitude of the starting point P are l = 113.644 and B = 34.736. For the point P, initialize the attraction index a(P) = 0, the optimal visiting time t(P) = 0 and the basic tour fee c_o(P) = 0. To ensure that the factors of the interest seed point searching objective function (µ_1, µ_2) act in the same dimension and magnitude, the experiment sets the coefficients ε_5 = 1.0, ε_6 = 0.01. Table 4 shows the objective function (µ_1, µ_2) values and the σ(T(z), µ_2) values confirmed by the interest seed point searching algorithm based on the matrix T∧(αβ) data. The σ(T(z), µ_2) columns in Table 4 give the σ(T(z), µ_2) values related to each column of the matrix T∧(αβ). Figure 6 shows the trend curves of the (µ_1, µ_2) value and the σ(T(z), µ_2) value of the tourist sights in cluster C_i relating to each interest and need column of the matrix T∧(αβ), in which the blue curve represents the (µ_1, µ_2) value and the brown curve represents the σ(T(z), µ_2) value. When the geographic location of the point P changes, the (µ_1, µ_2) value, the σ(T(z), µ_2) value and the curves change too. The cluster and its specific tourist sights are determined by the matrix T∧(αβ) and the geographic location of P, which conforms to tourists' freedom in selecting temporary accommodation before the trip. The smart recommender system can automatically output the optimal tourist sights according to the interest label and the starting point.
According to the mining conditions for the optimal tourist sights, the tourist sights that meet the interest and need of each column of the matrix T∧(αβ) should first be optimally distributed in geographic space, and then their feature attributes should match tourists' interests and needs to the maximum extent, that is, attain the maximum σ(T(z), µ_2) value or, equivalently, the minimum σ(T(z), µ_2)^−1 value. The two conditions should be considered together before the optimal tourist sights are output. If either of the two conditions is not satisfied, the tourist sight is not selected. According to the developed seed point searching algorithm and the Table 3 data, and by means of the storage method for the interested tourist sight basic vector I(αβ), the experiment outputs the interested tourist sight basic vector as I(αβ) = {c_15, c_2, c_16, c_5}.
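Since the mining depends on where each tourist sight lies relative to the starting point P, a distance between two longitude and latitude pairs is needed somewhere in the computation. The sketch below uses the haversine great-circle formula as a common stand-in; the distance measure actually used in this research is not stated in this section, and the sight coordinate in the example is made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Starting point P from the experiment; the sight coordinate is hypothetical.
P = (113.644, 34.736)
hypothetical_sight = (113.700, 34.760)
print(round(haversine_km(*P, *hypothetical_sight), 2))
```

A straight-line distance of this kind ignores the road network, so it is at best a first filter before the ferry distances along city roads are taken into account.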

D. OPTIMAL TOUR ROUTE SEARCHING RESULT BASED ON PRECISE TOURIST SIGHTS
On the basis of mining the optimal tourist sights, according to the optimal tour route planning algorithm based on the precise tourist sights, the ferry interval H k (i) (P, P) and its related sub-intervals H k (i) (ξ ( * ), I (z (i) ) )(z) that composes the tour route are built. Sub-interval motive benefit positive factors λ + (g1) , motive benefit negative factors λ − (g2) and tourist sight basic information data are calculated and obtained. The ferry interval H k (i) (P, P) is determined by the starting point P and the elements I (z (i) ) of the basic vector I (αβ) . When the geographic location of the point P or at least arbitrary one element ∀I (z (i) ) in the basic vector I (αβ) changes, the ferry interval H k (i) (P, P) and the certain sub-interval H k (i) (ξ ( * ), I (z (i) ) )(z) will alter, too. Each factor λ + (g1) and λ − (g2) are the basic parameters for setting up the optimal tour route, which are determined by tourism city's geographic information data and traffic information data, and the parameter values will alter with the point P and I (αβ) . According to the algorithm method and the function feature attribute of the subinterval motive function f k (i) (ξ ( * ), I (z (i) ) )(z) as well as the interval motive function f k (i) (P, P), within each sub-interval H k (i) (ξ ( * ), I (z (i) ) )(z), the motive function f k (i) (ξ ( * ), I (z (i) ) )(z) VOLUME 8, 2020 TABLE 4. Objective function (µ 1 , µ 2 ) value and σ (T (z) , µ 2 ) value confirmed by the matrix T ∧ (αβ) , point P and the mining algorithm. is a monotone function, and it outputs the sub-interval motive value W k (i) (ξ ( * ), I (z (i) ) )(z) at the sub-interval's terminal point. The relationship of the ferry sub-intervals and ferry interval determines that the motive function f k (i) (P, P) of the ferry interval H k (i) (P, P) is also a monotone function. 
Thus, the interval motive function f k (i) (P, P) is the iterated, piecewise function of the sub-interval motive functions f k (i) (ξ ( * ), I (z (i) ) )(z). Each ferry interval H k (i) (P, P) corresponds to one interval motive value growing tree Gt (k (i) ) , and the A h h growing trees Gt (k (i) ) , one for each permutation of the h tourist sights, together output the interval motive value maximum heap R. Of the shortest path algorithms used to plan tour routes, the most common are the A * searching algorithm, the Dijkstra searching algorithm and the Floyd searching algorithm. The experiment takes these three shortest path algorithms as the control group and the algorithm developed in this research as the experimental group, outputs their corresponding optimal tour routes, compares the differences, and draws conclusions from the results.
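The number of ferry intervals can be checked with a short sketch. It enumerates every closed-loop route that starts at the point P, visits each element of the basic vector I (αβ) = {c 15 , c 2 , c 16 , c 5 } exactly once, and returns to P. The function name `closed_loop_routes` is hypothetical, and the sketch only enumerates routes; it does not compute motive values.

```python
from itertools import permutations


def closed_loop_routes(start, sights):
    """Return every closed-loop route start -> (all sights, any order) -> start."""
    return [(start, *order, start) for order in permutations(sights)]


# The four mined tourist sights, in the order given by the basic vector.
routes = closed_loop_routes("P", ["c15", "c2", "c16", "c5"])
route_count = len(routes)
```

For h = 4 tourist sights this yields 4! = 24 routes, which agrees with the 24 closed-loop tour routes and growing trees reported in the experiment.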

1) DATA CALCULATION AND OBTAINING OF THE ALGORITHM INFLUENCE FACTOR
According to the modeling conditions of the interested tourist sight mining algorithm and the tour route planning algorithm, the algorithms should combine the motive benefit positive factors λ + (g1) , the motive benefit negative factors λ − (g2) and the basic data of each tourist sight. City geographic information data and traffic information data can be obtained from Baidu map and the Zhengzhou city geographic information database, including tourist sight ferry distance δ ... c(4, 2), and then return to the point P; this process composes a closed-loop tour route. Each closed-loop tour route is an integrated ferry interval. By mining the geographic information data and traffic information data, the experiment uses the motive benefit positive factors λ + (g1) and negative factors λ − (g2) to build the algorithm and calculates each factor, as shown in Table 5.

2) THE GENERATION OF OPTIMAL TOUR ROUTE
Based on the mined tourist sights that best match tourists' interest and need labels, and taking the tourist sight geographic information data, traffic information data and tourist sight information data as factors, the experiment uses the algorithm developed in this research to output the interval motive value growing trees Gt (k (i) ) and the interval motive value maximum heap R. In the process of generating the growing trees and the maximum heap, the values of the sub-interval motive function f (ξ ( * ), I (z (i) ) )(z) and the interval motive function f (P, P) are generated, and finally the interval motive value W (P, P) and the sub-interval motive values W (ξ ( * ), I (z (i) ) )(z) are obtained. When the f (ξ ( * ), I (z (i) ) ) and f (P, P) values are iterated by the edge clipping algorithm, each edge clipping creates a new closed-loop tour route.
To ensure that the output results have the same dimension and magnitude, the initial iteration value of every ferry interval H (P, P) is set as w(p) = 1.000. According to the developed algorithm, through formula (13), the formula (14) sub-interval motive function and the formula (15) interval motive function, the experiment iterates and outputs the sub-interval motive values W (ξ ( * ), I (z (i) ) )(z) and interval motive values W (P, P) of the A h h tour routes along which tourists ferry from the point P through the h interested tourist sights and return to the point P, shown in Table 6. To simplify the table content, each closed-loop tour route in Table 6 is noted as c r1,r2,r3,r4 , standing for the closed-loop tour route Pc r1 c r2 c r3 c r4 P. Figure 7 shows the sub-interval motive values W (ξ ( * ), I (z (i) ) )(z) and the interval motive values W (P, P) output by the functions f (ξ ( * ), I (z (i) ) ) and f (P, P) for the No.1 to No.24 closed-loop tour routes. In each figure, the blue curve represents the iterating and increasing tendency of the interval function f (P, P) of one closed-loop tour route, and this increasing curve is composed of the iterations of each sub-interval function f (ξ ( * ), I (z (i) ) ). The brown curve represents the increasing difference value W between the motive values W (ξ ( * ), I (z (i) ) )(z) and W (I (z (i) ) , ξ ( * ))(z + 1) of two adjacent sub-intervals.
FIGURE 7. The output W (ξ ( * ), I (z (i) ) )(z) value and W (P, P) value of each interval function f (ξ ( * ), I (z (i) ) ) and f (P, P).
According to the interval motive value W k (i) (P, P) and the tendency of the function f k (i) (P, P) of each ferry interval H k (i) (P, P), and combining the developed interval motive value maximum heap R algorithm, the experiment iterates from the initial value w(p) by the edge clipping algorithm to obtain the maximum heap R, shown in Figure 8(1). The complete binary tree with the maximum value max W k (i) (P, P) in the root node is shown in Figure 8(2). In the figure, the maximum value at the initial element of the maximum heap R and at the root node of the complete binary tree is 112.260 in both cases, relating to the tour route growing trees Gt (2) and Gt (21) , that is, the No.2 and No.21 tour routes. When tourists choose either of these two tour routes, they visit the tourist sights that best match their interests and needs and that have the optimal geographic distribution and the lowest expense, and they obtain the best motive benefit satisfaction. Thus, the No.2 and No.21 tour routes are the optimal tour routes. The third and fourth elements of the maximum heap R, both 109.641, relate to the sub-optimal routes, and they are at the No.2 and No.3 nodes on the second level of the complete binary tree. According to the edge clipping algorithm and the output routes, the experiment outputs the edge clipping circles shown in Figure 9. The gray circle represents the basic structure of the edge clipping circle, and the red lines and arcs are the related closed-loop tour routes.
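The descending order of the maximum heap R can be sketched with Python's `heapq` module, which implements a min-heap, so values are negated on insertion. The values 112.260 and 109.641 are the reported motive values of the two optimal and two sub-optimal routes. The value 95.000 is a hypothetical placeholder for a lower-ranked route, and the function names are hypothetical. This is a generic max-heap sketch, not the edge clipping algorithm of this research.

```python
import heapq


def build_max_heap(values):
    """Build a max-heap on top of heapq's min-heap by negating the values."""
    heap = [-v for v in values]
    heapq.heapify(heap)
    return heap


def pop_descending(heap):
    """Pop all elements, largest first. Empties the heap."""
    ordered = []
    while heap:
        ordered.append(-heapq.heappop(heap))
    return ordered


# Reported values (112.260 twice, 109.641 twice) plus one hypothetical value.
interval_motive_values = [109.641, 95.000, 112.260, 109.641, 112.260]
heap_r = build_max_heap(interval_motive_values)
root_value = -heap_r[0]           # the heap root holds the maximum
ranking = pop_descending(heap_r)  # descending interval motive values
```

The root of the heap holds the maximum interval motive value, and popping repeatedly yields the routes from most to least satisfying.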

E. DATA RESULT ON THE ALGORITHM COMPARISON
Of the shortest path algorithms based on known nodes, the Dijkstra searching algorithm, the Floyd searching algorithm and the A * searching algorithm are the three most commonly used. To verify the feasibility, rationality and advantage of the developed algorithm in tour route planning, the experiment compares the algorithms. The experimental group is the developed algorithm, and the control group consists of the Dijkstra searching algorithm, the Floyd searching algorithm and the A * searching algorithm. For the experimental group and the control group, the starting point P and the node tourist sights c 2 , c 5 , c 15 and c 16 are the same. The basic idea of the three control group algorithms is to search for the shortest path over the preset nodes. The Dijkstra searching algorithm starts from a set containing the point P and expands this set through a greedy searching process. A node is absorbed into the set when its distance from the point P is the shortest among the remaining nodes. The Floyd searching algorithm is a dynamic programming algorithm. Let Dist(i, j, k) be the length of the shortest path from point i to point j whose intermediate nodes all belong to the set (1 . . . k).
If the shortest path goes through the point k, then
$\mathrm{Dist}(i,j,k) = \mathrm{Dist}(i,k,k-1) + \mathrm{Dist}(k,j,k-1)$.
If the shortest path does not go through the point k, then
$\mathrm{Dist}(i,j,k) = \mathrm{Dist}(i,j,k-1)$.
Combining the two cases gives
$\mathrm{Dist}(i,j,k) = \min\bigl(\mathrm{Dist}(i,j,k-1),\ \mathrm{Dist}(i,k,k-1) + \mathrm{Dist}(k,j,k-1)\bigr)$.
The A * searching algorithm is a heuristic searching algorithm, and its accuracy and efficiency are determined by the selection of the heuristic function.
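The Floyd recurrence can be sketched as the standard triple loop, in which the third index k of Dist is folded into an in-place update of a single distance matrix. The example graph is hypothetical and is not the Zhengzhou experiment data.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest path lengths from an adjacency matrix.

    weights[i][j] is the edge length from i to j, math.inf if there is
    no edge, and 0 on the diagonal.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):          # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


INF = math.inf
# Hypothetical symmetric graph with four nodes.
example_weights = [
    [0, 4, INF, 1],
    [4, 0, 2, INF],
    [INF, 2, 0, 5],
    [1, INF, 5, 0],
]
shortest = floyd_warshall(example_weights)
```

The update inside the innermost loop is exactly the minimum of the "does not go through k" and "goes through k" cases.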

1) SEARCHING RESULT AND DATA COMPARISON
According to the searching method of the control group algorithms and the searching results of the developed algorithm, the Table 7 data are output. The seventh column of the table is the interval motive difference value W (c, e) of the three control group algorithms relative to the experimental group algorithm. It is calculated as the control group algorithm's interval motive value minus the experimental algorithm's interval motive value, in which x represents the interval motive difference value W (c, e) of the Dijkstra searching algorithm to the experimental group algorithm, y represents that of the Floyd searching algorithm, and z represents that of the A * searching algorithm. According to the Table 7 data, the distribution of the sub-interval motive difference values and interval motive difference values of the optimal tour routes output by the four algorithms is shown in Figure 10, from which the different features, performance and results of the algorithms in outputting the optimal tour routes can be analyzed. Figure 10(1)∼(6) show the sub-interval motive values W (ξ ( * ), I (z (i) ) )(z) and interval motive values W (P, P) output by the control group and experimental group algorithms. The blue columns represent route 1, the brown columns route 2, the gray columns route 3, and the yellow columns route 4. The abscissa values 1∼4 represent the Dijkstra searching algorithm, the Floyd searching algorithm, the A * searching algorithm and the developed algorithm in this research, respectively. Figure 10(1)∼(5) represent sub-intervals 1 to 5, and Figure 10(6) represents the whole tour route.
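The difference value calculation described above (control group value minus experimental group value) can be sketched as follows. The experimental value 112.260 is the reported maximum interval motive value. The three control group values are hypothetical placeholders, not Table 7 data, and the function name is hypothetical.

```python
def motive_differences(control_values, experimental_value):
    """Return control minus experimental for each control group algorithm."""
    return {
        name: round(value - experimental_value, 3)
        for name, value in control_values.items()
    }


# Hypothetical control group interval motive values for one tour route.
control = {"Dijkstra": 111.000, "Floyd": 111.000, "A*": 108.500}
differences = motive_differences(control, 112.260)
x, y, z = differences["Dijkstra"], differences["Floyd"], differences["A*"]
```

A negative difference means the experimental group algorithm outputs the larger interval motive value on that route.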

2) THE COMPARISON ON THE ALGORITHMS EFFICIENCY
Since the design method and the realization process are different, the four algorithms all have different time complexity and space complexity. Under the same condition of point P and tourist sights c 2 , c 5 , c 15 , c 16 , the four algorithms all have different efficiency on outputting the optimal tour routes. According to their own principles of searching process, the optimal tour routes, the sub-interval motive difference values and the interval motive difference values are output as Table 7 shows. According to the experiment example, the quantity of nodes for the algorithms is n = 6. The time complexity and space complexity of each algorithm in the process of generating Table 7 data are shown in Table 8.
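As a point of reference only, and not a reproduction of Table 8, the textbook complexity bounds of two of the control group algorithms can be evaluated at n = 6. The bounds below assume the standard triple-loop Floyd implementation and an array-based Dijkstra implementation, and the space figure for Dijkstra counts only the auxiliary arrays. The complexity of the A * searching algorithm depends on the chosen heuristic function and is therefore not listed.

```latex
\begin{aligned}
\text{Floyd:}\quad & T(n) = O(n^{3}),\ n^{3} = 6^{3} = 216, & S(n) &= O(n^{2}),\ n^{2} = 36 \\
\text{Dijkstra (array-based):}\quad & T(n) = O(n^{2}),\ n^{2} = 6^{2} = 36, & S(n) &= O(n),\ n = 6
\end{aligned}
```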

3) THE COMPARISON ON TOUR ROUTE GUIDE MAPS
According to the top four optimal tour routes in Table 7 output by the four algorithms, the interval growing trees and tour route guide maps are generated as Figure 11 shows, in which Figure 11(1)∼(4) are the four optimal tour routes output by the Dijkstra searching algorithm, Figure 11(5)∼(8) are those output by the Floyd searching algorithm, Figure 11(9)∼(12) are those output by the A * searching algorithm, and Figure 11(13)∼(16) are those output by the algorithm developed in this research. The arrow lines represent the abstract tour route sequences, and one growing tree relates to one tour route sequence. In the tour route guide maps, circles of different colors represent the tourist sights c 2 , c 5 , c 15 and c 16 , which belong to different clusters. The bold red lines represent the tour route in the real-world geographic environment, and the arrow direction represents the tour sequence and direction. From the maps, tourists can visually identify the tourist sights and tour routes, including the current starting point, the geographic locations of the tourist sights, the specific tour route and the visiting sequence, which makes it convenient for tourists to obtain a visualized information service.

A. THE ANALYSIS ON THE EXPERIMENT BASIC DATA
The experiment results in Figure 4(1) show that the tourist sight domain C contains a certain quantity of tourist sights that are well known and have a steady stream of tourists. The distribution of the tourist sights meets the four basic conditions preset in the experiment, so they can serve as the basic data source for the experiment. The collected tourist sight data are preprocessed, and 20 typical tourist sights with high popularity, a large stream of tourists and a high attraction index are selected from all of the city's tourist sights and set as the domain C. The data show that the tourist sight classifications are complete and the quantity of tourist sights is sufficient for the experiment. Taking the tourist sight domain as the basic data, the geographic locations of the tourist sights are obtained from the website GPSspg, and the attraction index a, the optimal visiting time t and the basic tour expense c o are obtained through crawler techniques from the mainstream tourism websites Fengwo, Xiecheng, Feizhu, etc. The text encyclopedia data of all the tourist sights are obtained from the Baidu website and are used as the basic data for setting up the clustering objective function. The Table 1 data show that the tourist sight basic data differ considerably in their feature attributes, which meets the basic conditions for setting up the clusters. Meanwhile, the mining result of the text encyclopedia data composes the vital factor k 1 of the tourist sight clustering objective function, which influences the process of cluster generation.

B. THE ANALYSIS ON THE TOURIST SIGHT CLUSTERING RESULT
The experiment uses the developed tourist sight clustering algorithm to generate tourist sight clusters. The table data show that the tourist sight feature attribute factor k 1 values and the clustering objective function σ (c r , c ¬r ) values output for any two tourist sights c r and c ¬r fluctuate within the range (0, 2.000) and differ considerably. By presetting the threshold values k 1 = 0.500 and σ (c r , c ¬r ) = 0.500 as the criteria for judging tourist sight attribute and clustering affinity degree, the experiment outputs the tourist sight membership degree matrix µ i (p × max m (i) ), the tourist sight steady matrix C (p×max m (i) ) , the tourist sight growing trees tree (i) and the tourist sight clusters C i . From formula (17), formula (18) and Figure 5, the clusters C 1 , C 2 , C 3 , C 4 and the related growing trees tree (1) , tree (2) , tree (3) and tree (4) are generated. The tourist sights in the same cluster indeed meet the condition that the k 1 and σ (c r , c ¬r ) values are larger than the set threshold values and have the minimum absolute difference. The experiment finally outputs the tourist sight membership matrix µ i (p × max m (i) ), which is the transition matrix formed in the process of generating the clusters C i . It represents the affinity degree between the searched tourist sight and the seed point tourist sight, and is shown as the bottom-left matrix in the figure. The membership degrees in the membership matrices formed by the four clusters are distributed differently and have different ranks, which indicates relatively large heterogeneity, so the tourist sights fall into clusters with different attributes. This shows that the developed clustering algorithm is practical and effective.
The steady matrix C (p×max m (i) ) is the tourist sight cluster distribution matrix output from the four tourist sight membership matrices µ i (p × max m (i) ), in which the first element of each row is the seed point, so each tourist sight cluster can be read from it directly. According to Figure 5, the growing trees generated in the process of clustering have different geometrical morphology, and the seed points are their root nodes. The growing direction, the edge length and the included angle between two edges are all different, as they are controlled by the algorithm. The process of generating the four growing trees is also the process of generating the four clusters. The different geometrical morphology is the outward manifestation of cluster heterogeneity.

C. THE ANALYSIS OF PRECISE OPTIMAL TOURIST SIGHT MINING
The tourists' interests and needs are set in the table, and the data compose the interest and need quantization matrix T ∧ (αβ) . According to the developed interest seed point searching objective function (µ 1 , µ 2 ), the table data of the objective function (µ 1 , µ 2 ) values and the σ (T (z) , µ 2 ) values are output. Considering the selecting conditions of the precise optimal tourist sights, the confirmed tourist sight should have the minimum distance to the point P and the maximum value of σ (T (z) , µ 2 ). If the two conditions cannot be met simultaneously, the distance should be relatively small and the value of σ (T (z) , µ 2 ) relatively large. Figure 6 shows, for each column of the matrix, the curve tendency of the objective function (µ 1 , µ 2 ) values and the σ (T (z) , µ 2 ) values obtained by matching tourists' specific interests and needs to the tourist sights. The matching of the same column of the matrix T ∧ (αβ) to different tourist sights differs greatly, which makes the curves fluctuate strongly and form peaks and troughs. The peaks and troughs relate to the maximum and minimum values of the function σ (T (z) , µ 2 ). However, the tourist sight relating to a peak may not be the optimal one, because the selection is simultaneously influenced by the function (µ 1 , µ 2 ) curve. The fluctuation range of the function (µ 1 , µ 2 ) curve is relatively small, and it also has peaks and troughs. The tourist sight at a trough is close to the point P. The selection of the optimal tourist sight considers both curves and functions, so it is a relative result. The curves of the matching result σ (T (z) , µ 2 ) values for the columns of the matrix T ∧ (αβ) are entirely different, that is, the geometrical morphology of the curves in Figure 6(1)∼(4) differs greatly. This is determined by tourists' specific interests and needs and by the generated tourist sight clusters.
If the variation degree between two columns of tourists' interests and needs in the matrix T ∧ (αβ) is much smaller than the preset threshold value, the matching results of the two columns to the tourist sights will have high homogeneity, and the fluctuating range and the curve's geometrical morphology of the function σ (T (z) , µ 2 ) will be close to each other, with only tiny differences in some sections. If the variation degree between the two columns is much larger than the preset threshold value, the matching results of the two columns to the tourist sights will have high heterogeneity, that is, the fluctuating range, the curve's geometrical morphology and the partial sections of the function σ (T (z) , µ 2 ) will all differ from each other. The experiment results illustrate that the tourists' interests and needs, the geographic location of the point P and the tourist sight clusters have a direct impact on the confirmation of the optimal tourist sights, among which the tourists' interests and needs play the decisive role. This conforms to the design and modeling idea of the algorithm that tourists are the core of tourism activity.

D. ANALYSIS OF THE OPTIMAL TOUR ROUTE SEARCHING RESULTS
Based on the mined precise interested tourist sights, the developed iteration algorithm is used to search the optimal tour routes and output the guide maps. The output results are analyzed as follows.

1) ANALYSIS ON THE OUTPUT RESULTS ON THE ALGORITHM INFLUENCE FACTORS
According to the algorithm design, once the optimal tourist sights are confirmed, the most important factor influencing tourists' motive benefit satisfaction is the selection of the tour route, and the tour route is determined by the motive benefit positive factors λ + (g1) and the motive benefit negative factors λ − (g2) . According to the Table 5 data, which take Zhengzhou city's geographic spatial information, traffic information and tourist sight information as the basic data, the output maximum value 0.625 of λ + (1) appears in the sub-interval c 15 c 2 , which illustrates that the ferry distance in this sub-interval is the minimum one. The larger the value λ + (1) is, the bigger the positive influence of the ferry distance on motive benefit, and vice versa. The output maximum value 0.500 of λ + (2) appears in many sub-intervals and differs little from 0.400, which illustrates that the influence of public transportation on motive benefit varies little. The output maximum value 0.144 of λ − (2) appears in many sub-intervals and differs little from −0.002, which illustrates that the negative influence of the distance from the tourist sight to the nearest public transportation varies little. The output minimum value −0.004 of λ − (3) appears in the sub-interval Pc 16 . The smaller the value λ − (3) is, the bigger the negative influence of the average taxi waiting time on motive benefit, and vice versa. The output minimum value −0.020 of λ − (4) appears in many sub-intervals and differs little from −0.010, which illustrates that the negative influence of the average quantity of congested roads on motive benefit varies little between sub-intervals.

2) ANALYSIS OF THE OUTPUT RESULTS OF MOTIVE FUNCTION, MOTIVE VALUE AND TOUR ROUTE
According to the developed algorithm, the Table 6 data and Figure 7, the output results of the sub-interval motive function, the interval motive function, the sub-interval motive value and the interval motive value are analyzed. In the experiment, the quantity of closed-loop tour routes that meet tourists' interests and needs is 24, and they form 24 growing trees. The edge clipping algorithm outputs the table data, from which the interval motive value maximum heap R and the complete binary tree are formed as Figure 8 shows. The output sub-interval motive values and interval motive values differ greatly between tour routes. For any tour route's growing tree Gt (k (i) ) , once the initial value w(p) is set, the tourists' moving distance gradually increases from 0 to the maximum value at the terminal. The function f k (i) (ξ ( * ), I (z (i) ) ) in each sub-interval increases monotonically with the moving distance and finally outputs the sub-interval motive value W k (i) (ξ ( * ), I (z (i) ) ). In this process, the interval motive function f k (i) (P, P) also increases monotonically with the moving distance and finally outputs the interval motive value W k (i) (P, P). In Figure 7, across the growing trees of the different tour routes, the blue curves of the sub-interval functions f (ξ ( * ), I (z (i) ) ) have different slopes and progressive increasing attributes. The progressive increasing attributes of the interval functions f (P, P) differ considerably, which is determined by the factors λ + (g1) and λ − (g2) in each sub-interval. The experiment output results conform to tourists' real-world tour routes in the geographic environment. The maximum interval motive value 112.260 appears in the No.2 tour route c 15,2,5,16 and the No.21 tour route c 16,5,2,15 . When tourists choose the No.2 or No.21 tour route, they obtain the best satisfaction of their interests and needs, as they visit the optimal tourist sights while traveling along the optimal tour route. The No.7 and No.17 tour routes rank next.
For any tour route growing tree Gt (k (i) ) , the sub-interval motive function increasing difference value W fluctuates up and down as tourists' moving distance increases, and the fluctuating range and curve slopes of the value W differ greatly between tour route growing trees. This is also the joint result of the factors λ + (g1) and λ − (g2) in each sub-interval. In Figure 8, the interval motive value elements in the maximum heap R are arranged in descending order, which directly reflects the order of the tour routes' impact on tourists' motive benefits. The complete binary tree is another form of the maximum heap R. From the complete binary tree, the distribution of the motive values over the different tour routes can be obtained, and the ability of each tour route to meet tourists' interests and needs is ranked. In the Figure 9 edge clipping circles of the top four motive values, all the circles are closed-loop structures. The two optimal tour routes c 15,2,5,16 and c 16,5,2,15 do not pass along the circle arc and form an arrow closed-loop structure inside the circle, whereas the two sub-optimal tour routes c 2,15,16,5 and c 5,16,15,2 pass along the circle arc and form an arrow closed-loop structure composed of lines inside the circle and the arc. The difference in the closed-loop structures directly reflects the difference in the ferry direction and motive tendency of the tour routes.

E. COMPARISON AND ANALYSIS OF THE EXPERIMENTAL GROUP AND THE CONTROL GROUP
The experiment takes the Dijkstra searching algorithm, the Floyd searching algorithm and the A* searching algorithm as the control group, and the algorithm developed in this research as the experimental group. The control group algorithms find the optimal tour routes by searching for the shortest path. The two groups of algorithms are used to output their respective top four optimal tour routes, with the results shown in Table 7 and Figure 10. The results show certain differences between the control group and the experimental group in the output optimal tour routes, motive values and performance.
First, the histograms of the sub-interval motive values in Figure 10(1)∼(5) are analyzed. For the top four optimal tour routes output by the four algorithms, the motive values in the sub-interval with the same number differ from each other. In the first sub-interval, this difference is not apparent. In the second and third sub-intervals, for the first tour route, the motive values of the Dijkstra searching algorithm and the Floyd searching algorithm are smaller than those of the A* searching algorithm and the developed algorithm; for the second and fourth tour routes, the motive values of the Dijkstra searching algorithm and the Floyd searching algorithm are larger than those of the A* searching algorithm and the developed algorithm; for the third tour route, the A* searching algorithm outputs the minimum motive value in the second sub-interval and the developed algorithm outputs the maximum motive value in the third sub-interval, while the other algorithms are almost the same. In the fourth and fifth sub-intervals, the third and fourth tour routes output by the A* searching algorithm differ from those of the other algorithms, which are almost the same in each tour route. For the whole tour route in Figure 10(6), the interval motive value of the A* searching algorithm is relatively smaller than those of the other algorithms in each tour route, while the other algorithms are almost the same. In total, the developed algorithm outputs the maximum interval motive value over the whole tour route, which indicates that it performs best in outputting interval motive value and meeting tourists' motive benefits, followed by the Dijkstra searching algorithm and the Floyd searching algorithm, and then the A* searching algorithm.
Second, the histograms of the sub-interval motive difference values in Figure 10(7)∼(10) are analyzed. The difference values between the control group algorithms and the experimental group algorithm fluctuate in each numbered sub-interval. In the first sub-interval of the first and second tour routes, the motive difference values of the Dijkstra searching algorithm and the Floyd searching algorithm relative to the developed algorithm are positive, which illustrates that these two control group algorithms perform better in the first sub-interval of these two tour routes. In the third and fourth tour routes, the motive difference values between the control group and the experimental group are negative, which illustrates that the experimental group performs better on these two tour routes. In the second and third sub-intervals of the first, third and fourth tour routes, the motive difference values between the control group and the experimental group are all large positive values, and the control group performs better. In the fourth sub-interval, only in the fourth tour route is the motive difference value between the Floyd searching algorithm and the developed algorithm positive, and the others are negative. In the third and fourth tour routes, the A* searching algorithm has the maximum negative motive difference value, which illustrates that in this sub-interval the developed algorithm performs best in each tour route. In the fifth sub-interval, the motive difference values between the control group and the experimental group are all negative, and the A* searching algorithm has the maximum negative motive difference value, which illustrates that the experimental group algorithm performs best in the fifth sub-interval.
For the whole tour route motive difference values in Figure 10(12), the difference values between the control group and the experimental group are all negative, and the A* searching algorithm has the maximum negative motive difference value. This illustrates that the developed algorithm performs best over the whole tour route.
Third, the Table 8 data on each algorithm's effectiveness and performance are analyzed. Since each algorithm has a different searching and modeling method, the algorithms differ in time complexity and space complexity. According to the Table 8 data, the time complexity and space complexity of the four algorithms increase with the quantity of nodes n. Those of the Floyd searching algorithm increase fastest, followed by the Dijkstra searching algorithm and the A* searching algorithm, and those of the developed algorithm increase slowest. This illustrates that the developed algorithm performs better in algorithm efficiency and memory consumption. When the unit time is 1 nanosecond, each algorithm's run time is at the nanosecond level. In the experiment, the total node quantity is n = 6, and the specific time complexity and space complexity are calculated. Compared with the A* searching algorithm, the advantage of the developed algorithm is that there is no need to preset a heuristic function or to use a distance function to estimate the spatial distance between points. It therefore avoids redundant data and the situation in which the A* searching process cannot find the optimal solution. Compared with the Dijkstra searching algorithm and the Floyd searching algorithm, the developed algorithm runs relatively faster and has smaller space complexity.
Fourth, the guide maps relating to the top four optimal tour routes output by the algorithms are analyzed. Certain sub-intervals or whole intervals differ in tour route direction and line, which is controlled and visualized by the algorithms. By analyzing the visualized guide maps, the geographic location of the starting point, the distribution of the tourist sights to be visited, the direction tendency of the tour routes, the tourist sight sequence, the streets and avenues passed through, the districts to be visited, etc., can be obtained.

F. ANALYSIS ON SIMILARITIES AND DIFFERENCES BETWEEN THE RESEARCHES IN LITERATURE AND THIS RESEARCH
Compared with literature [1]- [6], the algorithm in this article has some similarities and differences. First, the similarities are analyzed. The research methods in this article and in literature [1]- [6] all consider the tourists' interests as the important research object and the aim that should be satisfied, and set multiple constraint factors to plan tour routes. Literature [1] also plans the tour route in accordance with the order of tourist sights and then outputs the tour route by tourists' interests. In literature [2], a clustering algorithm is also used, and the time factor is taken as an important parameter. The research results in this article can be used as a module embedded in mobile RS, which can be well combined with the research content of literature [4]. Literature [5] also uses data sets to collect tourism data, and designs algorithms based on the data and factors. Literature [6] uses a social network information collection method to mine interest data, which is similar to the text encyclopedia big data mining in this study. It also uses a clustering algorithm to cluster the interest points, which is similar to the tourist sight clustering in this research. Second, the differences are analyzed. Compared with the research methods in the literature, this article has the following differences and advantages. This research develops a dedicated algorithm for tourist sight mining, which ensures that each tourist sight conforms to tourists' interests. The optimal tour route planning is based on the selected tourist sights, and uses a complete traversing algorithm to output the globally optimal solution. Literature [2] uses a heuristic search algorithm, which is similar to the three control group algorithms in this article and easily falls into a local optimum, and thus has certain disadvantages.
Different from literature [3], this research does not involve emotional perception factors, but develops the algorithm from the factors that tourists care about most. Emotional perception involves tourists' subjective evaluation of the tourist sights and tour routes, which is suitable for statistical research on tourist groups, whereas this research mainly focuses on the individual tourist. Among the factors that affect the tour route in literature [1], the weather condition is not considered in this research, because bad weather is not suitable for tourism activity and thus should not be taken as a factor in tour route planning. Literature [5] focuses on the analysis and evaluation of recommendation methods rather than on specific algorithm design, while this research focuses on algorithm development. The difference from literature [6] is that this research focuses on text encyclopedia big data mining of tourist sights, rather than on big data from social networks, so the standards and contact points are different. Literature [6] also involves a manual recommendation method, while the optimal tourist sights and tour routes in this research are all recommended automatically according to tourists' interests, so the mode is different.
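The contrast drawn above between complete traversal and heuristic search can be illustrated with a small sketch. It is not the algorithm developed in this article: the coordinates in `POINTS`, the straight-line cost in `route_length`, and the nearest-neighbor heuristic are all assumptions chosen only to show how a greedy search can settle on a locally optimal route that a complete traversal avoids.

```python
from itertools import permutations
from math import dist

# Hypothetical planar coordinates (not from the paper); index 0 is the start.
POINTS = [(0.0, 0.0), (1.0, 0.0), (-2.0, 0.0), (5.0, 0.0)]


def route_length(order, points):
    """Total straight-line length of the open route visiting `order`."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def nearest_neighbor(points, start=0):
    """Greedy heuristic: always move to the closest unvisited point."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        last = order[-1]
        # Ties are broken by the smaller index to keep the result deterministic.
        nxt = min(unvisited, key=lambda j: (dist(points[last], points[j]), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def complete_traversal(points, start=0):
    """Enumerate every visiting order and keep the shortest route."""
    others = [i for i in range(len(points)) if i != start]
    candidates = ([start, *perm] for perm in permutations(others))
    return min(candidates, key=lambda order: route_length(order, points))


greedy = nearest_neighbor(POINTS)
optimal = complete_traversal(POINTS)
print(greedy, route_length(greedy, POINTS))
print(optimal, route_length(optimal, POINTS))
```

On this toy input the greedy route is longer than the route found by complete traversal. The price is factorial growth in the number of enumerated routes, which is tolerable only because the number of mined tourist sights per tour is small.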

VI. CONCLUSION AND THE FUTURE WORK
The developed smart tour route planning algorithm based on precise interested tourist sight data mining can solve the problems in current tour route planning methods. The research covers four aspects. First, the tourists' interests and needs are analyzed. Second, the tourist sight feature attributes are researched. Third, the matching between the tourist sight feature attributes and the tourists' interests and needs, as well as the mining of precise interested tourist sights, is researched. Fourth, the optimal tour route planning algorithm based on the mined tourist sights is researched. The study of tourists' interests and needs is the core of the work, because these interests and needs are the source of tourists' motive benefits. The tour routes that best satisfy tourists' interests and needs are truly the optimal ones for meeting their motive benefits. Thus, on the basis of researching the tourists' interests and needs, the tourist sights' feature attributes are researched to confirm the factors by which tourist sights can satisfy those needs. Combining the tourist sight attribute factor with the attraction index, the optimal visiting time, the basic tour expense, etc., the tourist sight clusters are set up as the precondition for mining precise interested tourist sights. In mining precise interested tourist sights, the algorithm considers two critical conditions: one is the matching degree between the tourist sight and the interests and needs, and the other is the tourist sights' geographic locations. By combining the two conditions, the algorithm ensures that the mined tourist sights not only meet tourists' needs but also have the optimal geographic distribution, which makes the tour expense lowest.
Taking the mined interested tourist sights as nodes and combining the geographic information data, traffic information data and tourist sight information data, the optimal tour route planning algorithm is developed to output the motive value maximum heap, complete binary tree and edge clipping circle, and finally to confirm the optimal tour route plans. By designing and running the experiment, the feasibility and practicality of the algorithm are verified. The commonly used shortest path algorithms are set as the control group and compared with the developed algorithm in terms of the output optimal tour routes, motive values, motive value differences, performance and guide maps, which verifies that the developed algorithm has the best performance.
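As a rough illustration of ranking tour routes with a motive value maximum heap, the sketch below enumerates all routes over a few sights and stores them in a binary heap. The matrix `MOTIVE`, the additive definition in `route_motive_value`, and the function names are hypothetical and are not the article's formulas. Python's `heapq` implements a min-heap, so motive values are negated to obtain max-heap behavior; the underlying list is the array form of a complete binary tree.

```python
import heapq
from itertools import permutations

# Hypothetical interval motive values (not from the paper): MOTIVE[a][b] is
# the value of travelling between sights a and b, higher is better.
# Index 0 is the starting point.
MOTIVE = [
    [0, 5, 2, 4],
    [5, 0, 6, 1],
    [2, 6, 0, 7],
    [4, 1, 7, 0],
]


def route_motive_value(route, motive):
    """Assumed route score: the sum of motive values over consecutive legs."""
    return sum(motive[a][b] for a, b in zip(route, route[1:]))


def build_max_heap(motive, start=0):
    """Enumerate every route from `start` and heapify by negated motive value."""
    others = [i for i in range(len(motive)) if i != start]
    heap = []
    for perm in permutations(others):
        route = (start, *perm)
        heap.append((-route_motive_value(route, motive), route))
    heapq.heapify(heap)  # heap[0] now holds the route with the largest value
    return heap


def top_routes(heap, k):
    """Return the k best routes in descending order of motive value."""
    return [(-neg, route) for neg, route in heapq.nsmallest(k, heap)]


heap = build_max_heap(MOTIVE)
for value, route in top_routes(heap, 4):
    print(value, route)
```

The root of the heap is the route with the highest motive value, and reading off the first four entries mirrors the comparison of the top four tour routes in the experiment.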
On the basis of this research, the future work can be summarized in the following three aspects. First, the tourist groups could be classified more precisely. For example, for tourists with specialist knowledge or for senior tourists with abundant travel experience, the research should consider how to set up a special interest and need model for these groups. Second, for each precise tourist group, the tourist sight feature attributes could be further researched to determine how they meet tourists' interests and needs, and then set as the factors of the tourist sight clustering model, for example, history, culture, custom, etc. In mining precise interested tourist sights, a more specific and specialized algorithm model could be considered to output the optimal tourist sights and tour routes for the special tourist group and meet their special motive benefits. Third, since there are multiple transportation modes in the city, different tourists might choose different modes, and for each mode the optimal tour route might also be different. Thus the recommender system should take this condition into account and provide tourists with more elaborate choices. In our next step, we will specifically research the impact of the transportation mode on the output of the optimal tour route.

$T(i)$: Interest motive factor.
$T(ij)$: Interest motive factor feature attribute.
$T$: Interest and need label set.
$T(i)$: Interest and need label subset.
$T(\alpha\beta)$: Interest and need label matrix.
$\mathrm{rank}(\cdot)$: The rank of a matrix or a vector.
$C$: Tourist sight domain.
$C_i$: Tourist sight cluster.
$c_r$: A single tourist sight.
$c(i, v_i)$: Tourist sight meta-data.
$k_o$: Tourist sight clustering factor.
$\sigma(c_r, c_{\neg r})$: Tourist sight clustering objective function.
$L$: Keyword label set.
$L(u)$: Keyword label subset.
$T_{c_r}(uv)$: Tourist sight keyword word frequency matrix.
$S$: The matrix store labels.
$\hat{C}(p \times \max m(i))$: Tourist sight clustering dynamic matrix.
$C(p \times \max m(i))$: Tourist sight clustering steady matrix.
$c(i, v_i)$: Tourist sight growing seed point.
$\mathrm{tree}(i)$: Tourist sight growing tree.
$c^{+}(i, v_i)$: Tourist sight searching immunity.
$\mu$: Tourist sight membership degree.
$\mu_i$: Tourist sight membership degree matrix.
$O(\cdot)$: An Open list.
$C(\cdot)$: A Closed list.
$\hat{T}(\alpha\beta)$: Interest and need quantization matrix.
$I(\alpha\beta)$: Interest tourist sight basic vector.
$P$: Starting searching point.
$R$: Interval motive value maximum heap.
$\tilde{R}$: Interval motive value temporary heap.
$J(z)$: Sub-interval motive value vector.
[symbol missing]: Edge clipping circle.
$l$: The longitude.
$B$: The latitude.
$Z^{+}$: The positive integer.
$a$: The attraction index of tourist sight.
$t$: The best visiting time of tourist sight.