Chinese Tourists in Malaysia: An Analysis of Spatio–Temporal Behavior Based on Tourism Digital Footprints

Travel digital footprints can be used to analyze tourists’ spatio-temporal behavior to customize travel plans and recommendations. This study focuses on Chinese tourists in Malaysia as the research subject and uses the tourism digital footprint of Chinese tourists’ travel texts on Qunar.com as the primary data source. It connects traditional quantitative analysis with complex network analysis to study tourists’ behavior, temporal patterns, and complex network effects. The research results were as follows: (1) there was an uneven distribution of core tourism nodes with structural holes in Malaysia, which form a network pattern of imbalanced power and fierce internal competition; (2) the digital footprints of Chinese tourists were mainly concentrated in traditional tourist hot spots in Malaysia; (3) besides the competition between domestic islands, Thailand and Singapore are Malaysia’s main competitors for Chinese tourists. These results provide helpful information for the tourism management departments of Malaysia to improve their marketing and development efforts directed at Chinese tourists.


I. INTRODUCTION
Under human-centered development, the study of tourists' spatio-temporal behavior has attracted increasing attention from scholars in tourism, urban planning, geography, and other fields [1]. Understanding how time and space affect tourists' travel patterns is crucial for predicting and analyzing tourism markets [2]. It can also help planners assess the attractiveness and carrying capacity of tourist destinations. In addition, it gives local and national governments the insight into tourism mobility that they need to improve transport policies according to tourist distribution and travel distance, thereby reducing environmental pollution and promoting sustainable development. The lack of proper tourism activity data has always been a significant constraint in tourism research [3], [4]. However, as the Internet continues to develop, tourists' "digital footprint" has generated a large amount of data, making it possible to acquire customer-centered personal spatio-temporal behavior and contextual information. This information offers long time sequences, large quantities, high precision, and many other advantages, which can effectively reduce the cost and inconvenience of offline questionnaires and interviews [5]. It helps researchers understand tourists' spatio-temporal behavior patterns in destinations and their interactions with the tourism environment. These advantages make the "digital footprint" a vital data source for studying tourists' spatio-temporal behavior [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Yichuan Jiang.
According to a barometer compiled by the United Nations World Tourism Organization, Malaysia has been one of the most popular tourist destinations in the Asia-Pacific region in recent years. In 2018, 25.83 million international tourists visited Malaysia. In 2019, the number rose to 26.1 million, and tourism revenue reached RM86.14 billion. The tourism industry has become the second-largest source of foreign exchange income for Malaysia. In 2019, the tourism revenue from Chinese tourists in Malaysia was RM12.3 billion, accounting for 14% of Malaysia's total tourism revenue [7]. Analyzing Chinese tourists' behavior in Malaysia is crucial for further developing tourism markets, improving tourism services, and strengthening bilateral cultural exchanges and cooperation. It can also provide a reference for government departments, destination planners, and tourism service providers when developing suitable facilities in specific destinations.
In the current research on tourism in Malaysia, researchers mainly concentrate on the relationship between tourism and the economy [8]. Secondly, scholars pay more attention to the sustainable development of tourism in Malaysia and the protection of the local natural environment. Besides, they also focus on heritage tourism [9] and Muslim tourism [10]. There is very little research on Chinese tourists, and the current research focuses on medical tourism [7], [11] and economic factors that affect Chinese tourists' travel demand to Malaysia [12]. There is no literature on the temporal and spatial behavior of Chinese tourists in Malaysia.
In this context, this paper uses the tourism digital footprint, represented by tourists' online Chinese travel texts, as the primary data source. It analyzes the spatio-temporal behavior characteristics of Chinese tourists by mining tourist information in the two dimensions of time and space. The structure of this paper is as follows: the second section reviews the most relevant literature on tourism digital footprints and tourists' spatio-temporal behavior; the third section describes the research design (data and methods) applied in this paper, including the combination of traditional quantitative methods and complex network analysis; the fourth section presents the results and puts forward corresponding suggestions; finally, the conclusion summarizes the main points of this paper according to the temporal and spatial behavior characteristics of Chinese tourists in Malaysia.

II. LITERATURE REVIEW
As China improves its national economy and relaxes its outbound tourism policies, Chinese tourists have become major participants in the global tourism market [13]. However, due to the uniqueness of Chinese culture, Chinese tourists' behavior is often different from that of other international tourists [14]. Therefore, Chinese tourists have shown strong behavioral diversity in tourist destinations [15]. Chinese tourists have become typical research subjects in the tourism behavior literature. Researchers have conducted in-depth explorations of demand [16], motivation [3], [17], perception of destinations [18], and travel characteristics [19], [20]. In recent years, travel social media platforms have become increasingly popular in China, and more and more Chinese tourists have begun to write and share their travel diaries on these platforms. In particular, the travel itineraries shared by tourists make the analysis of Chinese tourists' spatio-temporal behavior more accurate than traditional data sources, such as statistical yearbooks, and can bring a new perspective to the detailed study of Chinese tourists' behavior [21]. In the following sections, tourists' spatio-temporal behavior and travel digital footprints are reviewed and summarized.

A. TOURISTS' SPATIO-TEMPORAL BEHAVIOR
Tourists' spatio-temporal behavior refers to tourists' spatial movement and time allocation during the travel process from the origin to the destination and back to the origin. Tourists' spatio-temporal characteristics can be analyzed along three dimensions: time, space, and flow (Table 1). One common type of research classifies tourists according to their temporal characteristics and then compares their spatial behavior, such as analyzing tourism demand in Spain according to seasonal changes [28]. In the example of Nanjing, three tourism flow networks corresponded to different travel durations [24], which demonstrates the importance of time in tourism studies. Regarding the selection of indicators, the Gini coefficient, the coefficient of variation, the Theil index, and entropy are the indicators most frequently used to study tourists' temporal behavior. The other common type of study analyzes spatio-temporal changes in tourists' behavior from a geospatial perspective, such as extracting popular tourist attractions in Beijing through spatial clustering [22] or analyzing the movement of Chinese tourists to ASEAN through the gravity center model and the tourism market model [26]. In terms of research methods, most researchers have adopted Markov chains [29], cluster analysis [30], and logistic regression and general log-linear models [31] to analyze tourist flows.
However, most of the above studies are qualitative, only analyzing the types of tourism flow under static conditions, without quantitative verification of the dynamic network characteristics of tourism flow. Therefore, scholars have begun to reveal the spatio-temporal characteristics of tourism through the study of tourism flows, mainly concentrating on the flow of tourists and the network patterns and characteristics generated by tourism flows, and have summarized relevant flow laws and behavior patterns [32], [33]. These researchers analyzed tourism flow density, centrality, the core-periphery model, and cohesive subgroups. For example, a network model of Chinese tourists' tourism flow in Japan was proposed using network density, node degree centrality, and other indicators [23]. However, these research methods concentrate on the explicit flow and ignore the potential, internal network structure. Therefore, they cannot fully present the spatio-temporal behavior characteristics of tourists in destinations and lack breadth and depth in pattern monitoring. Community detection in complex networks can make up for this shortcoming; it is widely used in many fields, such as social networks [34], [35], [36], computer networks [37], [38], and biology and ecology [39], [40]. Community detection is a typical method for identifying subgroups with strong connections in complex networks. For tourism research, community detection provides a method to divide the network into multiple communities so as to maximize the connections within the same community and minimize the connections between different communities [25]. This is the key to understanding the complex network structure in virtual space and, more importantly, in geographical space [41].

B. TRAVEL DIGITAL FOOTPRINT
In 2008, American scholars Girardin and others defined the concept of "digital footprint" [42]. It provides a new method for collecting tourist flow data and a new research perspective for studying tourist mobility [43]. Travel digital footprints make it possible to analyze the routes and emotions of tourists accurately and quickly, which reveals the behavior patterns of tourists. Travel digital footprints can be collected from different types of data sources, such as GPS [27], mobile network data [44], geo-labeled photos [45], and User Generated Content data [46], [47].
In recent years, travel notes have become increasingly popular among User Generated Content (UGC) data sources. As a new type of travel data voluntarily shared by tourists, online travel diaries include traditional text and image information and record tourists' location information [43]. These timely, wide-coverage tourism digital footprints carry clear space and time labels, which makes it possible to position tourists accurately and fully restore their time traces [1]. Therefore, the tourism digital footprint has become a new measurement tool for studying the spatio-temporal characteristics of tourism flow in the era of intelligent tourism [45]. It can help reveal the spatio-temporal characteristics of tourist flow and accurately restore the process by which tourists select destinations. For example, online travel notes have been used to study the spatial behavior characteristics of tourists, tourism flows, and patterns of temporal heterogeneity [24], [46]. However, most scholars obtain the location information of tourists by extracting it from travel text. This method may lead to the following problems. First, travel notes reflect tourist attractions more than administrative locations. After the specific locations are extracted from the text, the location of each scenic spot has to be rearranged and summarized, which requires a great deal of work. Second, tourists generally write travel notes from memory after traveling, so the notes are highly random. Many notes are not written in the chronological order of travel, leading to incomplete and inaccurate descriptions of tourism flows [48]. These problems introduce errors into the data and affect the analysis results.
With the development of location-based services (LBS), an increasing number of social media platforms have integrated location-based service modules into travel note-writing tools so that tourists can directly share their travel itineraries to improve the reading experience of travel blogs [49]. These itineraries were directly edited by tourists and included rich spatio-temporal label information, which can provide a detailed description of tourists' itineraries [50]. These data overcome the deficiencies in traditional data (statistical yearbooks, questionnaires, etc.) and avoid the problems of information redundancy and record deviation, which are common to other types of digital footprint data sources.
Therefore, this paper used tourists' itineraries with geographic information as the source of digital footprint data and used complex network indicators such as centrality, structural holes, and community detection to construct a research framework from the three dimensions of time, space, and flow, in order to carry out an in-depth study of tourists' spatio-temporal behavior.

A. RESEARCH FRAMEWORK ON SPATIAL-TEMPORAL BEHAVIOR
This paper proposes a research framework for analyzing tourists' spatio-temporal behavior based on itinerary text data containing geographic location information. This research framework (Figure 1) combines traditional quantitative analysis with complex network analysis. First, intelligent data acquisition and processing rules are designed to construct the data set. Second, to comprehensively analyze the characteristics of spatio-temporal networks, the detection methods (flow direction statistics and complex network analysis) were divided into two parts: (1) community detection (to analyze the clusters formed by tourism destinations and the "implicit" spatial network characteristics); (2) frequent flow directions, centrality, and structural holes (to analyze the "explicit" flow characteristics of tourist flow between attractions).

B. STATISTICS OF FLOW DIRECTIONS
Tourist flow direction refers to the movement sequence of tourists between two attractions, such as "From Kuala Lumpur to Malacca". The frequency formula of flow direction is as follows:
$$P_i = \frac{v_i}{V} \tag{1}$$
where $P_i$ is the frequency of flow direction $i$; $v_i$ is the quantity of flow direction $i$, that is, the number of occurrences of flow direction $i$ in the sequence data; and $V$ is the total flow over all flow directions, that is, the total number of tourists' movements between two attractions in the data. By calculating and sorting the frequency of each flow direction in the data, the frequent directions of tourist flows can be obtained.
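The frequency calculation can be illustrated with a minimal Python sketch. It assumes that each movement has already been extracted as an (origin, destination) pair; the function name `flow_frequencies` and the sample movements are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def flow_frequencies(flows):
    """Return {(origin, destination): P_i}, with P_i = v_i / V."""
    counts = Counter(flows)        # v_i: occurrences of each flow direction
    total = sum(counts.values())   # V: total number of movements
    return {direction: v / total for direction, v in counts.items()}

# Hypothetical movements for illustration only.
flows = [
    ("Kuala Lumpur", "Malacca"),
    ("Kuala Lumpur", "Malacca"),
    ("Malacca", "Kuala Lumpur"),
    ("Tawau", "Semporna"),
]
freq = flow_frequencies(flows)
# Sort directions from most to least frequent.
ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
```

Note that the two directions between a pair of attractions are counted separately here, in line with the definition of a flow direction as an ordered movement.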

C. COMPLEX NETWORK ANALYSIS
Complex network analysis is used to analyze human movement or behavior and to reveal spatial structure or interaction. If a user travels from one city to another, the two cities are considered to interact with each other. All cities are connected by tourism flows, and a network structure can be constructed with vertices representing cities and edges representing interactive relationships [46]. On this basis, complex network analysis was adopted to study the spatial network structure of tourist destinations, which can reveal various internal relationship characteristics of tourism flows [43]. This study employed the Ucinet and Netdraw software for the quantitative processing and visualization of the complex network analysis. It used community detection, centrality analysis, and structural hole measurement to quantitatively measure the spatial behavior patterns, network structure, and node characteristics of Chinese tourists in Malaysia. The network structure was then visualized using the analysis results.

1) COMMUNITY DETECTION
In complex network theory, a community refers to a dense subnetwork within a more extensive network [51], meaning that its nodes are more closely related to in-group nodes than to out-group nodes [52]. Community detection provides a typical method for understanding the spatial structure of complex networks in geographic space [53]. We detected tourism communities by establishing connections between different locations according to tourists' large numbers of digital footprints. Such a community reflects the compactness of tourists' spatial movements [54]. Generally, nodes that are more closely connected within the same network are more likely to form clusters. The calculation formula of cluster partition density is as follows:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
In this formula, $A_{ij}$ represents the connection weight between node $i$ and node $j$; $k_i = \sum_{j} A_{ij}$ represents the sum of the weights of all edges connected to node $i$; $c_i$ and $c_j$ represent the clusters to which node $i$ and node $j$ belong; and the function $\delta(c_i, c_j)$ indicates whether node $i$ and node $j$ are in the same cluster (it equals 1 if they are in the same cluster and 0 otherwise). Finally, $m = \frac{1}{2} \sum_{i,j} A_{ij}$ represents the sum of the connection weights of the whole network.
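As an illustration of how the partition quality defined by $A_{ij}$, $k_i$, $m$, and $\delta(c_i, c_j)$ can be evaluated, the following Python sketch scores a given partition of a small weighted network. It is not the procedure implemented in Ucinet; it assumes that the network is treated as undirected with each edge listed once, and the function name `modularity` and the toy network are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j).

    edges: iterable of (i, j, weight) for an undirected graph, each edge once.
    community: dict mapping every node to its community label.
    """
    A = defaultdict(lambda: defaultdict(float))
    for i, j, w in edges:
        A[i][j] += w
        A[j][i] += w              # symmetric adjacency (assumption: undirected)
    nodes = list(A)
    k = {i: sum(A[i].values()) for i in nodes}   # weighted degree k_i
    two_m = sum(k.values())                      # 2m = sum of all A_ij
    q = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:     # delta(c_i, c_j) = 1
                q += A[i].get(j, 0.0) - k[i] * k[j] / two_m
    return q / two_m

# Toy network: two triangles joined by a single edge (illustrative only).
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

In this toy case the two-triangle partition scores higher than placing all nodes in one community, which is the sense in which a community detection method prefers partitions with dense internal and sparse external connections.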

2) CENTRALITY ANALYSIS
Centrality analysis quantifies how central the position of each node in the network is, and is an important means of quantifying power in a social network. The important nodes in the measured network are characterized by degree centrality, betweenness centrality, and closeness centrality. Degree centrality is divided into in-degree and out-degree centrality, representing the direct association between the target node and other nodes; the higher the value, the better known and more popular the site is among tourists, and the more essential it is to tourists' travel activities. Betweenness centrality indicates the degree of the target node's control over other nodes. The higher the betweenness centrality, the less replaceable the site is in the tourist flow network, since it serves as an important "bridge" for tourists transferring to other sites; other sites should therefore strengthen their connections with such sites to attract more tourist flow. Closeness centrality is used to measure the degree of closeness between nodes [32]. Higher values indicate relatively higher overall accessibility and more convenient transportation, which can better attract tourists. The details can be found in the following formulas:

a: DEGREE CENTRALITY
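In-degree centrality formula:
$$C_{D,\mathrm{in}}(i) = \sum_{j=1}^{n} r_{ij,\mathrm{in}}$$
Out-degree centrality formula:
$$C_{D,\mathrm{out}}(i) = \sum_{j=1}^{n} r_{ij,\mathrm{out}}$$
where $r_{ij,\mathrm{in}}$ and $r_{ij,\mathrm{out}}$ represent the directional relationship between nodes $i$ and $j$, i.e., the number of tourists moving between attraction $i$ and attraction $j$; the former indicates that $j$ flows to $i$, and the latter indicates that $i$ flows to $j$; and $n$ is the number of nodes in the tourist flow network.

b: BETWEENNESS CENTRALITY
$$C_B(i) = \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{g_{jk}(i)}{g_{jk}}, \quad j \neq k \neq i$$
where $g_{jk}$ is the number of shortest paths by which a traveler reaches node $k$ from node $j$, $g_{jk}(i)$ is the number of those paths from node $j$ to node $k$ that pass through node $i$, and $n$ is the number of nodes in the tourist flow network.

c: CLOSENESS CENTRALITY
$$C_C(i) = \left[ \sum_{j=1}^{n} g_{ij} \right]^{-1}$$
where $g_{ij}$ is the number of steps on the shortest path from node $i$ to node $j$, and $n$ is the number of nodes in the tourist flow network.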
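The three centrality measures described above can be sketched in plain Python as follows. This is an illustrative, unnormalized version and not the Ucinet implementation, which may normalize the values. Degree centrality uses the tourist counts as weights, while betweenness (computed with Brandes' algorithm over ordered node pairs) and closeness use unweighted shortest paths; these choices, the function names, and the sample flows are assumptions.

```python
from collections import deque

def degree_centrality(flows):
    """flows: {(origin, destination): tourists}. Returns (in_degree, out_degree)."""
    indeg, outdeg = {}, {}
    for (i, j), r in flows.items():
        outdeg[i] = outdeg.get(i, 0) + r
        indeg[j] = indeg.get(j, 0) + r
        indeg.setdefault(i, 0)
        outdeg.setdefault(j, 0)
    return indeg, outdeg

def successors(flows):
    """Directed, unweighted adjacency lists derived from the flow keys."""
    adj = {}
    for i, j in flows:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, [])
    return adj

def closeness_centrality(adj):
    """Inverse of the summed shortest-path steps to all reachable nodes."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = 1 / total if total else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm; sums g_jk(i)/g_jk over ordered pairs (j, k)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                         # accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

# Hypothetical hub-and-spoke flows for illustration only.
flows = {("Kuala Lumpur", "Malacca"): 5, ("Malacca", "Kuala Lumpur"): 4,
         ("Kuala Lumpur", "Penang"): 3, ("Penang", "Kuala Lumpur"): 2}
in_degree, out_degree = degree_centrality(flows)
adj = successors(flows)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In this toy network the hub has the highest value on every measure, which mirrors the interpretation of a "bridge" node given in the text.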

3) STRUCTURAL HOLES MEASUREMENT
In 1992, Burt proposed the theory of "structural holes" [55]. The calculation of structural holes enables the identification of bottleneck problems in regional tourist flow [56]. The dominant nodes of structural holes generally have strong regional competitive advantages and are less affected by the tourist flow of the surrounding nodes. They are irreplaceable, and there are significant differences in accessibility between them and the surrounding nodes. To this end, Burt proposed the use of effective size and constraint as metrics of structural holes in social networks. These metrics are widely used and are thus employed here [32].

a: EFFECTIVE SIZE
Effective size refers to the non-redundant part of a node's network; that is, the effective size of a node is equal to the individual network scale of the node minus the network redundancy. The individual network scale of a node is the number of nodes contained in its neighborhood. The redundancy is equal to the average degree of the other nodes within the individual network of the node; therefore, the effective size is equivalent to the individual network scale minus the average degree of all members of the node's individual network. Effective size measures the overall influence of a node and can be used to measure the importance of structural hole nodes: the higher the effective size, the more obvious the competitive advantage of the target node in attracting visitors and the higher the number of visitors. The calculation formula is as follows:
$$ES_i = \sum_{j} \left( 1 - \sum_{q} p_{iq} m_{jq} \right), \quad q \neq i, j$$
where $z_{iq}$ is the number of connections from node $i$ to node $q$; $p_{iq}$ is the proportional relationship between tourist nodes $i$ and $q$, i.e., the number of connections between node $i$ and node $q$ divided by the number of all connections of node $i$; $m_{jq}$ is the marginal strength between nodes $j$ and $q$, which is the number of connections between node $j$ and node $q$ divided by the maximum number of connections between node $j$ and any other node; and $n$ is the number of nodes in the tourist flow network.

b: CONSTRAINT
Constraint refers to the ability of a node to use structural holes in its network. The dependence of the node on other nodes is regarded as the evaluation criterion. The larger the value, the stronger the constraint, the more obvious the competitive disadvantage of the target node in attracting visitors, and the lower the number of visitors.
$$C_{ij} = \left( p_{ij} + \sum_{q} p_{iq} p_{qj} \right)^2, \quad q \neq i, j$$
where $p_{ij}$ is the proportional relationship between node $i$ and node $j$; $p_{iq}$ is the proportional relationship between node $i$ and node $q$; and $p_{qj}$ is the proportional relationship between node $q$ and node $j$. The proportional relationship between nodes is calculated in the same way as in the effective size formula. $n$ is the number of nodes in the tourist flow network.
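A minimal Python sketch of the two structural hole metrics is given below, following the verbal definitions of the proportional relationship and marginal strength. It assumes that the tie strength between two nodes is the sum of the flows in both directions and that the constraint of a node is the sum of its pairwise constraints over all neighbors (Burt's aggregate constraint); the function names and toy networks are hypothetical, and Ucinet's output may differ in detail.

```python
def tie_strengths(flows):
    """Symmetrize directed flows: z[i][j] = flow(i -> j) + flow(j -> i)."""
    z = {}
    for (i, j), r in flows.items():
        for a, b in ((i, j), (j, i)):
            row = z.setdefault(a, {})
            row[b] = row.get(b, 0) + r
    return z

def proportion(z, i, j):
    """p_ij: ties between i and j divided by all ties of i."""
    return z[i].get(j, 0) / sum(z[i].values())

def marginal(z, j, q):
    """m_jq: ties between j and q divided by j's strongest tie."""
    return z[j].get(q, 0) / max(z[j].values())

def effective_size(z, i):
    """Sum over neighbors j of (1 - sum_q p_iq * m_jq), with q != i, j."""
    total = 0.0
    for j in z[i]:
        redundancy = sum(proportion(z, i, q) * marginal(z, j, q)
                         for q in z[i] if q != j)
        total += 1 - redundancy
    return total

def constraint(z, i):
    """Sum over neighbors j of (p_ij + sum_q p_iq * p_qj)^2, with q != i, j."""
    total = 0.0
    for j in z[i]:
        indirect = sum(proportion(z, i, q) * proportion(z, q, j)
                       for q in z[i] if q != j)
        total += (proportion(z, i, j) + indirect) ** 2
    return total

# Toy networks (illustrative only): a star and a closed triangle.
star = tie_strengths({("Hub", "A"): 1, ("Hub", "B"): 1, ("Hub", "C"): 1})
triangle = tie_strengths({("X", "Y"): 1, ("Y", "Z"): 1, ("X", "Z"): 1})
```

The hub of the star spans three structural holes, so its effective size equals its number of neighbors and its constraint is low; in the closed triangle every contact is redundant, so effective size drops and constraint rises.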

D. STUDY AREA
Malaysia is located in Southeast Asia and consists of Malaya in the south of the Malay Peninsula and Sarawak and Sabah in the north of Kalimantan Island. Its rich tourism resources are an advantage for Malaysia, making it a destination for tourists from around the world [57]. According to the Malaysian Department of Statistics, tourism accounted for 15.9% of the country's Gross Domestic Product (GDP) in 2019 (Department of Statistics, Malaysia, 2020). However, the COVID-19 outbreak in 2020 had an enormous negative impact on Malaysia's tourism industry, and the Malaysian government then introduced a wide range of policy measures to attract more Chinese tourists [58]. Taking Malaysia as a case study to clarify the spatial characteristics of its tourism and the strengths and weaknesses of its development is therefore of great significance for the sustainable development of Malaysia's tourism industry.

E. DIGITAL FOOTPRINT DATA
This paper used the Chinese website Qunar (http://www.qunar.com) as its data source. Qunar.com is the most famous travel media platform in China. It includes travel guides, reviews, and other information on more than 60,000 tourism destinations worldwide and has more than 130 million registered users. Users can share their travel experiences on the website, whose information is relatively complete and updated in a timely manner [32]. This study used a web crawler written in Python and collected 1,990 Malaysian travel routes voluntarily shared by Chinese tourists as initial data. To obtain a sufficient amount of data, the time period was extended to cover 2010 to 2019, so that all 1,990 records could be used. Due to the impact of the epidemic in early 2020, all countries took strict measures to restrict inbound and outbound travel to curb the spread of the disease [59]. As a result, no new data on visits to Malaysia have been generated since January 2020.
Considering the possibility of information errors and logic errors in the initial data, we designed the following data cleaning rules: 1) exclude travel records with incomplete itineraries; 2) if a travel record involves places outside Malaysia, retain only the point of interest (POI) records of the latest entry into or departure from Malaysia; 3) remove repeated scenic spots (if the POI of the same scenic spot appears consecutively in the data, the user is considered not to have left the scenic spot, so the redundant records are deleted). After this screening of the primary data, 1,148 valid samples were obtained and sorted to build the digital footprint database of Chinese tourists in Malaysia, including tourist names, travel dates, travel nodes, length of stay, and other data. In total, 49 pieces of node information for Malaysia and 3,270 edges between them were obtained.
According to the order of visits, the actual daily trips of tourists were split into directed node pairs. For example, Kuala Lumpur - Penang - Langkawi - Kuala Lumpur was divided into Kuala Lumpur - Penang, Penang - Langkawi, and Langkawi - Kuala Lumpur. If there was a direct flow between two nodes, it was denoted as 1; if there was no direct flow, it was denoted as 0. These records were used to construct the data matrix of flow directions between the travel nodes. By cleaning the travel information and extracting the node paths, a 49×49 multi-valued directed relation matrix was obtained in this study (Figure 2). The characteristics of Malaysia's tourism flow network structure were analyzed using this matrix.
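Two of the cleaning rules can be sketched in Python as follows. The sketch covers only rule 1 (dropping incomplete itineraries) and rule 3 (collapsing consecutive repeats of the same POI); rule 2 is omitted because it depends on how border crossings are encoded in the raw data. Representing an itinerary as a list of POI names, and treating an empty list or a missing entry as "incomplete", are assumptions made for illustration.

```python
from itertools import groupby

def collapse_repeats(pois):
    """Rule 3: keep a single record when the same POI appears consecutively."""
    return [poi for poi, _ in groupby(pois)]

def clean_itineraries(itineraries):
    """Apply rule 1 and rule 3 to a list of itineraries (lists of POI names)."""
    cleaned = []
    for pois in itineraries:
        # Rule 1 (assumed encoding): an empty itinerary or one with a
        # missing POI entry is treated as incomplete and excluded.
        if not pois or any(not poi for poi in pois):
            continue
        cleaned.append(collapse_repeats(pois))
    return cleaned

# Hypothetical raw records for illustration only.
raw = [
    ["Kuala Lumpur", "Kuala Lumpur", "Penang"],
    ["Langkawi", None, "Penang"],
    [],
    ["Tawau", "Semporna", "Semporna", "Tawau"],
]
valid = clean_itineraries(raw)
```

Only consecutive repeats are removed, so a return visit to an earlier node (as in the last record) is preserved as a separate movement.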
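The splitting of an itinerary into directed node pairs and the construction of the relation matrix can be sketched as follows. The sketch assumes that each observed movement contributes 1 to its cell and that contributions are summed over all itineraries, which is one way to arrive at a multi-valued matrix; the function names and sample itineraries are hypothetical.

```python
def directed_pairs(itinerary):
    """Split an ordered itinerary into consecutive (origin, destination) pairs."""
    return list(zip(itinerary, itinerary[1:]))

def flow_matrix(itineraries):
    """Return (nodes, matrix) where matrix[a][b] counts moves from nodes[a] to nodes[b]."""
    nodes = sorted({poi for trip in itineraries for poi in trip})
    index = {poi: n for n, poi in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for trip in itineraries:
        for origin, destination in directed_pairs(trip):
            matrix[index[origin]][index[destination]] += 1  # assumed: counts accumulate
    return nodes, matrix

# Hypothetical itineraries for illustration only.
trips = [
    ["Kuala Lumpur", "Penang", "Langkawi", "Kuala Lumpur"],
    ["Kuala Lumpur", "Penang"],
]
pairs = directed_pairs(trips[0])
nodes, matrix = flow_matrix(trips)
```

With the full data set, the same procedure would yield a square matrix whose side equals the number of nodes.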

A. TEMPORAL CHARACTERISTICS
Based on the premise that "the number of travel notes is positively correlated with the flow of tourists", this paper examined the regularities in the time distribution of Chinese tourists in Malaysia. Time is a type of widespread data that objectively records the states of the observed characteristic values at different times or time points. The time involved is mainly divided into two types: the travel period, including the distribution of traveling years and months, and the length of stay, expressed as the number of days of traveling.
The itinerary text released by tourists reflects the specific travel time. As shown in Figure 3, the number of Chinese tourists was relatively small but rose steadily from 2010 to 2014. This is mainly due to the Malaysian government's ninth Five-Year Plan for Tourism Development, which for the first time regarded tourism as one of the four priorities for economic development [26]. The sudden drop in the number of Chinese tourists in 2015 was a consequence of the global economic slowdown and unfavourable domestic events in Malaysia, such as the worst flood in 30 years, which affected several states in early 2015, and the earthquakes in Ranau, Sabah, in mid-2015, together with the lingering effects of the MH370 and MH17 incidents. Efforts were made to counter the adverse effects of the MH370 tragedy and the kidnapping incident in Sabah on Chinese visitors [12]. During times of political instability, hazards, and economic crises, Malaysia has made many changes aimed at the Chinese market, such as relaxing visa requirements, introducing electronic visa measures, and changing marketing strategies to attract Chinese tourists [60]. As a result, the number of Chinese tourists has increased annually since 2016.
As shown in Figure 4, the tourist itineraries from 2010 to 2019 were divided into three periods (2010-2013, 2014-2016, and 2017-2019) and plotted by month to study the overall distribution characteristics of Chinese tourists in Malaysia. The monthly distribution of visits is basically the same in these three periods, with little change. Figure 4 shows that the number of travel notes was the highest in July and August, followed by January and February; these were the peak tourist seasons. July and August are the summer months in China, coinciding with the summer holiday for students. Thus, more groups of students and parents with children decide to travel during this period. January and February are winter months in China, when the weather is cold, and Chinese tourists prefer to spend their holidays in places with warm climates. This period also coincides with the Chinese New Year, when more people tend to travel with their families. Figure 5 was obtained by combing through and comparing the travel notes. The temporal behavior of tourists can be divided into two types. The first was the sightseeing tour, in which visitors stayed 5 to 7 days; these accounted for 42%. The primary purpose of this kind of outbound tourist is to visit the classic areas of Malaysia, and they choose one or two famous tourist destinations to visit. They choose either an island tour, as shown in travel notes like "Independent travel to Langkawi", or just a city tour, as seen in travel notes like "Tour to Kuala Lumpur and Malacca". The second was the in-depth tour, during which visitors spent 8 to 10 days, accounting for 29%. Examples include travel notes like "eight days' in-depth delicacy enjoying and touring in Malaysia" and "Independent spring travel in Malaysia". Most of these tourists travel extensively and deeply in Malaysia. Their itineraries usually contain more than three tourism nodes, including both urban and island tourism.
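The grouping by length of stay can be illustrated with a short Python sketch. The two day ranges follow the text (5 to 7 days and 8 to 10 days); the label "other" for all remaining stays, the function names, and the sample values are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

def stay_category(days):
    """Classify a trip by its length of stay in days."""
    if 5 <= days <= 7:
        return "sightseeing tour"
    if 8 <= days <= 10:
        return "in-depth tour"
    return "other"  # assumed label for stays outside the two ranges

def category_shares(stays):
    """Share of trips falling into each length-of-stay category."""
    counts = Counter(stay_category(days) for days in stays)
    return {name: n / len(stays) for name, n in counts.items()}

# Hypothetical lengths of stay (days) for illustration only.
stays = [5, 6, 7, 8, 10, 3, 12, 6]
shares = category_shares(stays)
```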

B. SPATIAL CHARACTERISTICS

1) FREQUENT DIRECTIONS OF TOURIST FLOWS
The data show that, of the 3,270 edges, 452 were produced by tourists arriving from or leaving for other countries. After removing these 452 edges involving Malaysia's neighboring countries and screening the data, 2,818 records of tourist flows within Malaysia were obtained. We calculated the number of Chinese tourists visiting each destination to determine the destinations in Malaysia most frequently visited by Chinese tourists. As shown in Table 2, there are seven destinations most visited by Chinese tourists. Kuala Lumpur was the most popular among Chinese tourists because it is the capital of Malaysia and serves as a major transportation hub. Chinese tourists usually visit Kuala Lumpur and then transfer to other places. Second was Tawau, which, because of its airport, is often used as a springboard for tourists traveling to and from Semporna. Overall, traditional destination cities remain the most popular with Chinese tourists, which suggests that Chinese tourists may not know enough about Malaysia's tourism resources. There may also be time constraints: visitors stayed for relatively short periods, so they were more likely to visit the best-known places.
The online travel itinerary text data showed a total of 1146 visitors' tourist activities between the scenic spots. According to formula (1), these movements were sorted by the frequency of tourist flow between scenic spots, and the ranking of passenger flows in Malaysia was obtained. Table 3 shows the top 10 tourist flow directions. Frequent tourist flow directions reflect the connections between scenic spots. The tourist flow relationship between Kuala Lumpur and Malacca was the most noteworthy (two-way tourist flow accounted for 18.36% of the total tourist flow). The second was the tourist flow relationship between Tawau and Semporna, accounting for 14.82%. The passenger flow between Kota Kinabalu and Tawau accounted for 9.55%, that between Kuala Lumpur and Penang for 8.83%, and that between Kuala Lumpur and Langkawi for 6.59%. This shows that these routes were the core routes for Chinese tourists in Malaysia.
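The two-way shares reported here combine both directions between a pair of scenic spots. A minimal Python sketch of that aggregation is shown below, assuming directed flow counts are already available; the function name `two_way_shares` and the sample counts are hypothetical.

```python
from collections import Counter

def two_way_shares(flow_counts):
    """Merge A->B and B->A into one unordered pair and return its share of all flow."""
    pair_counts = Counter()
    for (origin, destination), v in flow_counts.items():
        pair_counts[frozenset((origin, destination))] += v
    total = sum(pair_counts.values())
    return {pair: v / total for pair, v in pair_counts.items()}

# Hypothetical directed flow counts for illustration only.
flow_counts = {
    ("Kuala Lumpur", "Malacca"): 3,
    ("Malacca", "Kuala Lumpur"): 2,
    ("Tawau", "Semporna"): 5,
}
shares = two_way_shares(flow_counts)
kl_malacca = shares[frozenset(("Kuala Lumpur", "Malacca"))]
```

Using an unordered pair as the key makes the result independent of which direction happens to be listed first.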

2) COMMUNITY DETECTION
A community detection method was adopted to divide destination cities into different tourist groups based on the linking strength between cities in the inbound tourist flow network. As shown in Figure 6, Malaysia is divided into three communities. Community A was a city tour from Kuala Lumpur to various major cities in Malaysia. The most popular cities were Malacca, Penang, and Langkawi. Community A can be further divided into cultural tours (e.g., Malacca, Kudat, and Kuching) and nature tours (e.g., Sarawak, Langkawi, and Pahang). Communities B and C were both island tours. Community B started in Kuala Lumpur or Kota Kinabalu, went to Tawau, turned to Semporna, and headed for each island. Community C started from Kuala Lumpur, flowed to Terengganu, and transferred to Pulau Redang and the other two islands. Because both B and C are island tours, they compete with each other, and Chinese tourists generally choose one of the two. The data showed that more Chinese tourists chose Community B. The reasons can be inferred from the Chinese travel notes: (1) the Malaysian government developed Pulau Redang in 2000, so its facilities are relatively outdated compared with those in Semporna; (2) the scenery of Semporna is considered much better, and its popularity is higher in China thanks to the Internet and vigorous promotion by travel agencies. Therefore, more Chinese tourists tend to choose Semporna for island tours.
In terms of island tours, besides the competition between Communities B and C, neighboring countries also competed for the Chinese tourists in Community A. The data show that, of the 3270 edges, 452 edges and 25 POIs were produced by tourists returning from or leaving for other countries. Many Chinese tourists traveled to islands in countries such as Singapore, Thailand, Indonesia, and the Philippines after visiting Community A, rather than choosing islands within Malaysia.
Table 4 lists the top seven locations with the most frequent flows. The results show that Singapore, Thailand, and Indonesia were the main alternative destinations to Malaysia, consistent with the results presented by Habibi et al. [61] and Suppiah et al. [62].
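The community division described in this subsection groups cities by the strength of the flows between them. The paper does not state which detection algorithm was used, so the sketch below substitutes a simple deterministic weighted label propagation as a stand-in. The edge weights and the function name `label_propagation` are hypothetical.

```python
from collections import defaultdict


def label_propagation(weighted_edges, max_rounds=100):
    """Group nodes by asynchronous, deterministic weighted label propagation."""
    neighbors = defaultdict(dict)
    for a, b, w in weighted_edges:
        neighbors[a][b] = neighbors[a].get(b, 0) + w
        neighbors[b][a] = neighbors[b].get(a, 0) + w

    # Every node starts in its own community, labeled by its own name.
    labels = {node: node for node in neighbors}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            score = defaultdict(float)
            for other, w in neighbors[node].items():
                score[labels[other]] += w
            # The label with the largest total weight wins;
            # ties go to the alphabetically first label.
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical undirected flow weights (not the paper's data).
toy_flows = [
    ("Kuala Lumpur", "Malacca", 10),
    ("Kuala Lumpur", "Penang", 8),
    ("Malacca", "Penang", 3),
    ("Kota Kinabalu", "Tawau", 9),
    ("Tawau", "Semporna", 12),
    ("Kota Kinabalu", "Semporna", 2),
    ("Kuala Lumpur", "Kota Kinabalu", 1),
]

communities = label_propagation(toy_flows)
for members in communities:
    print(members)
```

On this toy network, the weak Kuala Lumpur to Kota Kinabalu link is not enough to merge the two groups, so a city-tour cluster and an island-tour cluster emerge. Modularity-based methods are a common alternative and may give different partitions on real data.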

3) CENTRALITY ANALYSIS OF TOURIST FLOW NETWORK
Based on the Chinese online text data, a flow matrix between attractions was constructed for the social network analysis, and Malaysia's tourist flow network structure was derived from the online travel diary data. Node centrality metrics (in-degree centrality, out-degree centrality, betweenness centrality, and closeness centrality) were calculated according to Equations (3)-(5) using UCINET 6. The results for the most relevant attractions are shown in Table 5. Figure 7 shows the tourism flow network according to the centrality values of these nodes. The size of a node in Fig. 7 represents its level of centrality, while the thickness of the connection between two nodes indicates the volume of tourist flow.
Comparing Figure 7(a) and 7(b), the calculated out-degrees and in-degrees of the Malaysian tourism flow network nodes were the same. Kuala Lumpur, Kota Kinabalu, Malacca, Semporna, and Penang ranked in the top five by degree centrality. Kuala Lumpur had the highest out-degree and in-degree centrality, which indicated that its core position was prominent and that its degree of radiation was more significant than that of aggregation.
As seen in Figure 7(c), Kuala Lumpur, Kota Kinabalu, Putrajaya, Semporna, and Langkawi ranked in the top five by betweenness centrality, which shows that these five attractions have the highest ''irreplaceability'' and the greatest ability to disperse tourist flows in the network. However, betweenness centrality varied greatly among attractions: the difference between the maximum and minimum values was 291.72. The range of influence and control differs greatly between core nodes and edge nodes, which leads to unbalanced tourism development across regions. The attraction with the highest betweenness centrality was Kuala Lumpur, a ''core intermediary'' in the Malaysian tourism flow network. In comparison, the betweenness centrality was zero for several areas, including Kapalai Island and Kuala Selangor.
As shown in Figure 7(d), the closeness centrality values of the nodes of the Malaysian tourism flow network were more evenly distributed than those of degree centrality and betweenness centrality. Kuala Lumpur, Kota Kinabalu, Penang, Langkawi, and Malacca had higher closeness centrality values. The locations with the lowest closeness centrality values were Mataking Island, Kudat, Pontian, Pulau Tioman, Sepangar Island, and Tuaran. This showed that most tourist attractions in Malaysia were relatively accessible overall, had good independence, and were close to the network center. However, the construction of tourism transportation needs to be strengthened in marginal areas.
In conclusion, (1) Kuala Lumpur, Kota Kinabalu, Malacca, Semporna, Penang, and other places are the core nodes of the Malaysian tourism flow network. These places are the earliest developed scenic spots in Malaysia, and they are also the places most promoted through cooperation with Chinese travel agencies. With convenient transportation, well-developed tourism infrastructure, and large passenger flows, they are popular destinations for Chinese tourists. (2) The distribution of nodes in the tourism flow network in Malaysia is not balanced. The circulation of the tourism flow network is mainly ''controlled'' by these core nodes, and most other nodes are highly ''dependent'' on them. While traveling, Chinese tourists treat these nodes as core tourist areas, dispersing from them and gathering at them. (3) The nodes neighboring the core nodes had lower centrality values. Therefore, it can be assumed that the core nodes place certain constraints on the tourism flows of the surrounding attractions (i.e., there may be structural holes in the tourism flow network).
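For readers who want to reproduce the centrality analysis without UCINET, the following sketch computes in-degree, out-degree, and betweenness centrality on a small directed network. It uses the standard unnormalized definitions (betweenness via Brandes' algorithm), which may differ in scaling from Equations (3)-(5) and from UCINET's output. Closeness is omitted for brevity. The toy edges and function names are hypothetical.

```python
from collections import defaultdict, deque


def degree_centrality(edges):
    """Return (in_degree, out_degree) dicts counting distinct directed links per node."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for origin, dest in set(edges):
        outdeg[origin] += 1
        indeg[dest] += 1
    return dict(indeg), dict(outdeg)


def betweenness(edges):
    """Unnormalized betweenness on a directed, unweighted graph (Brandes' algorithm)."""
    adj = defaultdict(list)
    nodes = set()
    for origin, dest in set(edges):
        adj[origin].append(dest)
        nodes.update((origin, dest))

    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths in sigma.
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical hub-and-spoke network with two-way flows (not the paper's data).
spokes = ["Malacca", "Penang", "Langkawi"]
toy_edges = [("Kuala Lumpur", s) for s in spokes] + [(s, "Kuala Lumpur") for s in spokes]

in_deg, out_deg = degree_centrality(toy_edges)
btw = betweenness(toy_edges)
print(in_deg["Kuala Lumpur"], out_deg["Kuala Lumpur"], btw["Kuala Lumpur"])
```

In this toy network every shortest path between two spokes passes through the hub, so the hub collects all of the betweenness while each spoke scores zero, which is the pattern the text describes for a ''core intermediary''.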

4) STRUCTURAL HOLES MEASUREMENT OF TOURIST FLOW
UCINET was used to calculate the structural hole indicators, and the results are shown in Table 6. The size of a node in Fig. 8 represents the level of its structural hole indicator, and the width of the connection between two nodes represents the size of the tourist flow.
Comparing the results of the structural hole indicators in Figure 8(a) and (b), the effective size and constraint were higher in Kuala Lumpur and Kota Kinabalu. This showed that they were at the center of the network, playing the role of a ''bridge'' that can organically connect the tourism of other cities into a network. They are also affected by other popular attractions nearby. For example, tourists to Kuala Lumpur generally disperse to Malacca and Penang, so Kuala Lumpur has formed a structural hole with Malacca and Penang. Kuala Lumpur strongly constrains tourism in Kota Kinabalu because Kota Kinabalu is situated in East Malaysia and Kuala Lumpur in West Malaysia, and both cities have direct flights from China. Most tourists who go to East Malaysia choose to fly directly to Kota Kinabalu, so there was a certain degree of competition between these two places. Surrounding nodes such as Kudat, Pontian, Pulau Tioman, Pulau Lang Tengah, Sepangar Island, and Tuaran had low effective size and high constraint, which indicates that the development of tourist flows in these surrounding areas was limited and that these attractions were at a disadvantage in the competition to attract tourists.
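The two structural hole indicators discussed above, effective size and constraint, can be computed directly from a flow matrix. The sketch below follows Burt's standard definitions on symmetrized ties. It is an assumption that UCINET's routine matches this formulation exactly. The toy edges and all function names are hypothetical, and all weights are assumed to be positive.

```python
from collections import defaultdict


def build_ties(weighted_edges):
    """Symmetrize flows: tie[a][b] is the total flow between a and b in both directions."""
    tie = defaultdict(lambda: defaultdict(float))
    for a, b, w in weighted_edges:
        if a != b:  # self-loops carry no brokerage information
            tie[a][b] += w
            tie[b][a] += w
    return {node: dict(nbrs) for node, nbrs in tie.items()}


def p(tie, x, y):
    """Proportion of x's total tie strength that is invested in y."""
    return tie[x].get(y, 0.0) / sum(tie[x].values())


def m(tie, x, y):
    """Strength of the x-y tie relative to x's strongest tie."""
    return tie[x].get(y, 0.0) / max(tie[x].values())


def structural_holes(weighted_edges):
    """Return {node: (effective_size, constraint)} using Burt's definitions."""
    tie = build_ties(weighted_edges)
    result = {}
    for i, contacts in tie.items():
        effective_size = 0.0
        constraint = 0.0
        for j in contacts:
            others = [q for q in contacts if q != j]
            # Redundancy of contact j: how much j overlaps with i's other contacts.
            redundancy = sum(p(tie, i, q) * m(tie, j, q) for q in others)
            effective_size += 1.0 - redundancy
            # Direct plus indirect investment of i in j, squared.
            indirect = sum(p(tie, i, q) * p(tie, q, j) for q in others)
            constraint += (p(tie, i, j) + indirect) ** 2
        result[i] = (effective_size, constraint)
    return result


# Hypothetical star network (not the paper's data).
toy_edges = [
    ("Kuala Lumpur", "Malacca", 1),
    ("Kuala Lumpur", "Penang", 1),
    ("Kuala Lumpur", "Langkawi", 1),
]

holes = structural_holes(toy_edges)
for node, (size, cons) in sorted(holes.items()):
    print(f"{node}: effective size = {size:.2f}, constraint = {cons:.2f}")
```

In this star, the hub's three contacts are not connected to each other, so its effective size equals its number of contacts (3) and its constraint is 1/3. Each spoke depends entirely on the hub and has an effective size of 1 and a constraint of 1.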

V. CONCLUSION
This paper took the temporal pattern and complex network of tourists' behavior as its research object, used the tourism itinerary texts on Qunar to construct a tourist trajectory database, and applied mathematical statistics and complex network analysis methods to analyze the spatio-temporal behavior of Chinese tourists in Malaysia in detail. The main conclusions are as follows.

A. THEORETICAL IMPLICATIONS
This paper proposes a research framework for analyzing the spatio-temporal behavior characteristics of tourists based on the digital footprint data of online tourism itineraries. This framework combines the traditional spatial quantitative analysis method with complex network analysis, providing a new perspective for the future study of tourists' spatio-temporal behavior. The framework combines community detection with centrality and structural holes and analyzes the tourism spatial pattern from both ''implicit'' and ''explicit'' aspects. It provides a comprehensive research framework for tourism spatial patterns and enriches theoretical research on the tourism flow network structure.

B. POLICY IMPLICATIONS
Since the outbreak of the COVID-19 pandemic in 2020, Malaysia's tourism industry has faced a sharp decline in tourist sources, homogeneous competition, and periodic operating difficulties. Based on the research results above, this paper proposes the following tourism policy suggestions: 1) The government should give the highest priority to the role of core cities. On the one hand, the government needs to deepen and improve the cultural content and quality of urban tourism products. On the other hand, it must actively guide tourists to move on to the next destination and develop tourism routes better suited to short-term travel. 2) The government should pursue institutional and technological innovation to further optimize the tourism ecosystem, including policies, environment, infrastructure, and cultural resources, in ways that are conducive to tourism development in the Chinese market. During the epidemic, Chinese tourists could not leave the country. Malaysia can create an immersive tourism experience using 5G, virtual reality, and other cross-application technologies, showing its unique and eye-catching cultural relics, natural scenery, and folk customs to Chinese short-video users through live videos of Malaysian tourism. Building on existing urban tourism, the country should vigorously promote emerging or potential tourist destinations, take advantage of its resources, ensure sustainable development, and enhance Malaysia's international tourism competitiveness. 3) The Malaysian government should focus on formulating and implementing marketing strategies for the Chinese market, integrate regional resources, enrich the perception of tourist destinations, segment the tourism market to meet the needs of different tourist groups, guide the diverse development of featured tourism products, and further promote regional collaboration.
The government should make full use of digital marketing and channel tourist flows across destinations through its product mix. Meanwhile, drawing on big data on tourists' consumption behavior, it should carry out precision digital marketing to effectively attract potential Chinese tourists.

VI. LIMITATIONS AND FURTHER STUDY
The limitations of the research framework and suggestions for subsequent research are as follows. First, there are some deficiencies in the digital footprint data source we used. Duggan points out that young and educated travelers are more likely to use online travel sites, so the sample may be biased toward this group. Online travel data should be combined with official survey data in the future, and the existing data can be supplemented by questionnaires and other methods to support the corresponding analysis. Second, infrastructure changes were not specifically considered because of the limited scope of this study; this shortcoming can be explored in future studies. Third, the research framework lacks information about tourists' nonspatial behavior. This study only uses the POI data of tourists' itineraries and does not analyze the contents of the tourism notes. That content can help explain the deep-seated reasons behind tourism spatial patterns and the driving factors of tourists' behavior, and analyzing it will be the next step. Finally, the COVID-19 pandemic has heavily impacted the global tourism industry. How to meet the needs of tourists during the pandemic and provide innovative paths to the full recovery of the tourism industry in the post-pandemic era is also an area that requires further study.