Extracting Network Patterns of Tourist Flows in an Urban Agglomeration Through Digital Footprints: The Case of Greater Bay Area

Network patterns of tourist flows can reveal differences in tourism resources among destinations from the perspective of network science, providing valuable suggestions for tourism managers and policymakers to promote the balanced and sustainable development of tourism. This paper focuses on urban agglomerations, a highly developed spatial form of integrated cities, and proposes a research framework to extract the network patterns of tourist flows through digital footprints. Based on an illustrative case study using geo-located travel blog data from Qunar.com, we built a tourist flow network for the Guangdong-Hong Kong-Macao Greater Bay Area (GBA), China. The analysis shows: (1) GBA’s tourist flow network is obviously heterogeneous, showing a pattern of “four cores and three poles”; (2) the strong “administrative barrier effect,” revealed by community detection within the network, is the main obstacle to integrating regional tourism; (3) strengthening the infrastructure of tourism mediation cities such as Guangzhou, Zhuhai, and Shenzhen, so as to avoid the “structural hole” caused by the tourist flows between cities, is an urgent issue that the GBA government needs to address. To summarize, the research framework can provide a theoretical basis and concrete suggestions for the planning and management of the tourism industry in urban agglomerations.


I. INTRODUCTION
Urban agglomerations are highly developed spatial forms of integrated cities, and are the spatial carrier of regional economic and industrial development [1]. Tourism, as an important driving factor for economic growth in cities, is an external manifestation of the integrated development process of urban agglomerations [2], [3]. Evaluating tourism development in urban agglomerations not only provides an important reference for the spatial restructuring of industry, but also has important significance for promoting industrial collaboration and resource exchange within and outside the urban agglomeration, thereby contributing to building an international trade platform [4].
Patterns of human mobility can provide insights into social phenomena in cities [5], [6]. Tourist flow, generated by the movement of tourists between destinations, is therefore an important metric to evaluate tourism development [7].
Current tourist flow studies mainly focus on theoretical frameworks [8], influencing factors [9], flow patterns [10], and prediction models [11], among which analysis of the spatial characteristics of tourist flow has been a research hotspot in recent years. However, survey-based data sources used in traditional tourism research often cannot accurately and comprehensively reflect the movement of tourists, which creates difficulties when researching spatial patterns from a quantitative perspective. In recent years, scholars have begun to introduce the concept of ''digital footprints'' [12], which facilitate detailed modeling of tourist movement.
The associate editor coordinating the review of this manuscript and approving it for publication was Gang Mei.
Tourist flow can depict the spatial distribution of tourists from a dynamic perspective. Scholars usually use Markov chains [13], spatial clustering [14], kernel density estimation [15], exploratory spatial data analysis [16], and other methods to detect the spatial patterns of tourist flows. In recent years, the introduction of network science has provided a new way to analyze tourist flow [17]. Through the construction of tourist flow networks, tourist flows and destinations can be integrated into an efficient spatial system, and the role relationship among tourist destinations can be analyzed mathematically and conceptually from a sociological perspective.
In view of this approach, this paper proposes a research framework for extracting the network patterns of tourist flows in urban agglomerations through digital footprints. Taking the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) in China as a case study, tourists' digital footprints (collected from travel blogs on Qunar.com) are used to verify the proposed research framework. The structure of this paper is as follows: the second section reviews the most relevant previous literature on tourists' digital footprints and tourism network research. The third section describes the research design, including the research framework and case study data. In the fourth and fifth sections, the results of the case study are presented and discussed. Finally, the last section summarizes the conclusions and practical implications of the research framework.

II. LITERATURE REVIEW
A. TOURISTS' DIGITAL FOOTPRINTS
Tourists' digital footprints are the electronic traces left by tourists during their trips [12]; they include many data sources, such as GPS trajectory data, check-in data, mobile phone signaling data, travel blogs, geotagged photos, etc. As more and more tourists are willing to share their travel experiences on social platforms, the ability of digital footprints to depict tourists' behavior has increased [18]. At present, many scholars have utilized different digital footprints to study tourist behavior in cities. For instance, Roura et al. [19], Li et al. [20], and Jing et al. [21] have utilized, respectively, travel blogs, GPS trajectory data, and geotagged photos.
Many scholars have proposed research frameworks based on tourists' digital footprints. For example, Walden-Schreine et al. [22], Mou et al. [23], and Martí et al. [24] have proposed research frameworks for, respectively, exploring the spatial relationship between tourists and infrastructure in nature reserves, detecting the spatial patterns of tourist flows in cities, and identifying urban tourism hotspots. However, these research frameworks are mostly limited to a single city or scenic area, while agglomerations of tourist destinations are rarely considered. The mechanism of tourism in urban agglomerations is more complex, and exploring it can yield more valuable implications for tourism management. To meet this need, in this study we propose a research framework for large-scale urban agglomerations based on digital footprint data.

B. NETWORK SCIENCE IN TOURIST FLOW RESEARCH
Network science, derived from graph theory, can describe the structure of relations between given tourism entities in the form of nodes and links [25], providing a new perspective for tourist flow research [26]. The relevant research can be divided into two types according to its analysis approach. One explores the network characteristics among tourist destinations at the level of administrative units such as cities or countries. For instance, Qin et al. [27], Xu et al. [28], and Zeng [29] have studied the network structure of inbound tourist flow of cities in South Korea, China, and Japan, respectively. Provenzano et al. [30] and Chung et al. [31] both discussed the dynamic characteristics of tourist flows among European countries. Another common research stream is to explore the network characteristics of tourist flows in inner cities at the attraction level. For instance, Liu et al. [32] and Jin et al. [33] have built tourist flow networks at the attraction level and explored, respectively, the networks' underlying mechanisms and temporal heterogeneity.
At the theoretical level, many scholars have introduced social network theory in their research to explore the sociological relationship between tourist destinations. For example, Liu et al. [7] utilized centrality analysis and automorphic equivalence analysis to reveal the ''roles'' played by destinations in the Tourism Region of South Anhui; Mou et al. [23] explored the ''power distribution'' of attractions in Qingdao by detecting structural holes in the network. In addition, as the number of network nodes increases, the complexity of tourist flow networks has also attracted the attention of scholars. For example, Miguéns et al. [34], Wu et al. [35], and Gao et al. [36] have studied, respectively, the scale-free characteristic, small-world effect, and community structure of tourist flow networks.
In view of this, the data source and theoretical basis of the research framework proposed in this paper are digital footprints and network science, respectively. In addition, the research framework divides the tourist flow network into two levels: attractions and cities. With the help of various network indicators, the network patterns of tourist flow in urban agglomerations can be extracted in detail.

A. RESEARCH FRAMEWORK
We propose a novel research framework, as shown in Figure 1. In the framework, digital footprints are used to build the tourist flow network at the level of attractions and cities. By dividing the tourist flow network into two levels, a multi-scale analysis can be performed, which has been proved to provide a more systematic and coherent knowledge of tourism [37], [38]. In the network, attractions or cities are abstracted as nodes, and the tourist flows between nodes are abstracted as weighted edges. The weights are based on the number of tourist movements between two nodes in the network. On this basis, four types of network indicators are selected to analyze the network characteristics of tourist flows.
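The network construction described above can be illustrated with a minimal sketch. It assumes each blog is reduced to an ordered list of visited POIs and that every pair of consecutive POIs counts as one tourist movement; the function name `build_flow_network`, the rule of skipping repeated consecutive visits, and the sample sequences are all hypothetical and are not taken from the case study data.

```python
from collections import Counter

def build_flow_network(blogs):
    """Count directed movements between consecutive POIs in each blog.

    blogs: iterable of POI sequences, one ordered list per travel blog.
    Returns a Counter mapping (source, target) -> number of movements.
    """
    weights = Counter()
    for pois in blogs:
        for src, dst in zip(pois, pois[1:]):
            # Assumption: a repeated consecutive visit is not a movement.
            if src != dst:
                weights[(src, dst)] += 1
    return weights

# Hypothetical POI sequences, for illustration only.
blogs = [
    ["Canton Tower", "Window of the World", "Victoria Harbor"],
    ["Canton Tower", "Window of the World"],
    ["Victoria Harbor", "Ruins of St. Paul's"],
]
edges = build_flow_network(blogs)
```

Each key of `edges` is a directed edge and each value is its weight, which is the form assumed by the other sketches in this section.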

1) NODE DEGREE, WEIGHTED DEGREE, AND FLOW BETWEENNESS CENTRALITY
The three indicators (node degree, weighted degree, and flow betweenness centrality) reflect the importance of particular nodes to tourist flows. Node degree is commonly used to plot the degree distribution and thereby analyze the heterogeneity of a complex network with a large number of nodes. Therefore, this indicator is applied to the attraction level of the research framework. Weighted degree and flow betweenness centrality are applied to the city level of the research framework, since these two indicators are more focused on a fine-grained analysis of each node's importance in the network. The three indicators are described below.
Node degree and weighted degree reflect the link strength of the node in the network. The main difference between the two indicators is that weighted degree takes into account the influence of the weight of the edge (i.e., the volume of tourist flow) on the link strength of the node. The two indicators are defined as follows:
$$ND_i = \sum_{j \in N_i} l_{ij} \tag{1}$$
$$WD_i = \sum_{j \in N_i} w_{ij} \tag{2}$$
where $ND_i$ is the node degree of node $v_i$ and $WD_i$ is the weighted degree of node $v_i$, $l_{ij}$ is the number of directed edges between node $v_i$ and node $v_j$, $w_{ij}$ is the sum of the weights of directed edges between node $v_i$ and node $v_j$ (which can be regarded as the volume of the inflows and outflows of the target destination), and $N_i$ is the collection of adjacent nodes of node $v_i$.
Flow betweenness centrality measures the degree of control of the target node over other nodes. Our research framework utilized the form of flow betweenness centrality in directed-weighted networks [39]:
$$BC_i = \frac{\sum_{j}^{n} \sum_{k}^{n} m_{jk}(i)}{\sum_{j}^{n} \sum_{k}^{n} m_{jk}}, \quad j \neq k \neq i \tag{3}$$
where $BC_i$ is the flow betweenness centrality of node $v_i$, $m_{jk}$ is the maximum flow from node $v_j$ to node $v_k$ (i.e., the weight value of the path with the largest weight from node $v_j$ to node $v_k$), $m_{jk}(i)$ is the maximum flow from node $v_j$ to node $v_k$ through node $v_i$, and $n$ is the total number of nodes.
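As an illustration of the two degree indicators, the following sketch computes node degree and weighted degree from a dictionary of directed weighted edges. It assumes that both incoming and outgoing edges count toward a node, in line with the definitions of $l_{ij}$ and $w_{ij}$ above, and that the network contains no self-loops; the function name `degree_indicators` and the toy edge list are hypothetical.

```python
from collections import defaultdict

def degree_indicators(edges):
    """Return (node degree, weighted degree) for every node.

    edges: dict mapping (source, target) -> weight of that directed edge.
    Every directed edge contributes 1 to the node degree and its weight
    to the weighted degree of both of its endpoints.
    """
    nd = defaultdict(int)
    wd = defaultdict(float)
    for (src, dst), weight in edges.items():
        for node in (src, dst):
            nd[node] += 1
            wd[node] += weight
    return dict(nd), dict(wd)

# Toy network: A <-> B in both directions, plus B -> C.
toy_edges = {("A", "B"): 3, ("B", "A"): 1, ("B", "C"): 2}
nd, wd = degree_indicators(toy_edges)
```

In this toy network, node B has the largest value on both indicators because it takes part in all three directed edges.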

2) STRUCTURAL HOLES MEASUREMENT
Structural holes are an indicator of connection fractures between network nodes. The calculation of structural holes enables the identification of bottleneck problems in regional tourist flows. The nodes with structural hole advantages generally have strong regional competitive advantages [40]. Therefore, our research framework utilizes structural holes at the city level to evaluate the competitive advantage of each city as a tourist destination. Effective size and constraint are utilized to measure the positive and negative aspects of structural holes, as described below.
Effective size measures the non-redundant part of the target node's connections to all other nodes. The higher the effective size, the more obvious the competitive advantage of the target node. It is calculated as follows:
$$ES_i = \sum_{j}^{n} \left(1 - \sum_{q}^{n} p_{iq} m_{jq}\right), \quad q \neq i, j \tag{4}$$
where $ES_i$ is the effective size of node $v_i$, $z_{iq}$ is the number of connections from node $v_i$ to node $v_q$, $p_{iq}$ is the proportional relationship between node $v_i$ and node $v_q$ (i.e., the number of connections between node $v_i$ and node $v_q$ divided by the number of all the connections of node $v_i$), $m_{jq}$ is the marginal strength between node $v_j$ and node $v_q$ (i.e., the number of connections between node $v_j$ and node $v_q$ divided by the maximum number of connections between node $v_j$ and other nodes), and $n$ is the number of nodes in the network.
Constraint reflects the degree of the target node's dependence on other nodes. The smaller the constraint, the higher the status and therefore the competitive advantage of the target node in the region. Conversely, the greater the constraint, the greater the impact of other nodes on the target node, signifying lower competitiveness. It is calculated as follows:
$$C_i = \sum_{j}^{n} \left(p_{ij} + \sum_{q}^{n} p_{iq} p_{qj}\right)^2, \quad q \neq i, j \tag{5}$$
where $C_i$ is the constraint of node $v_i$, $p_{ij}$, $p_{iq}$, and $p_{qj}$ are the proportional relationships between nodes $v_i$ and $v_j$, nodes $v_i$ and $v_q$, and nodes $v_q$ and $v_j$, respectively, and $n$ is the number of nodes in the network.
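The two structural hole indicators can be sketched in a few lines. The sketch below follows the verbal definitions given above and makes two assumptions that the text does not state: the "connections between" two nodes are taken as the sum of the connections in both directions, and the outer sums run only over nodes that are directly connected to the target node. The function name `structural_holes` and the star-shaped toy network are hypothetical.

```python
def structural_holes(z):
    """Return (effective size, constraint) for every node.

    z: dict mapping (i, j) -> number of connections from node i to node j.
    """
    nodes = sorted({n for pair in z for n in pair})

    def tie(a, b):
        # Assumption: connections between a and b, both directions summed.
        return z.get((a, b), 0) + z.get((b, a), 0)

    def p(a, b):
        # Share of a's connections that go to b (proportional relationship).
        total = sum(tie(a, q) for q in nodes if q != a)
        return tie(a, b) / total if total else 0.0

    def m(a, b):
        # Tie a-b relative to a's strongest tie (marginal strength).
        strongest = max((tie(a, q) for q in nodes if q != a), default=0)
        return tie(a, b) / strongest if strongest else 0.0

    effective_size, constraint = {}, {}
    for i in nodes:
        contacts = [j for j in nodes if j != i and tie(i, j) > 0]
        effective_size[i] = sum(
            1.0 - sum(p(i, q) * m(j, q) for q in nodes if q not in (i, j))
            for j in contacts
        )
        constraint[i] = sum(
            (p(i, j) + sum(p(i, q) * p(q, j) for q in nodes if q not in (i, j))) ** 2
            for j in contacts
        )
    return effective_size, constraint

# Toy star network: hub H is the only link between A, B, and C.
star = {("H", "A"): 1, ("H", "B"): 1, ("H", "C"): 1}
es, con = structural_holes(star)
```

In the star network the hub has no redundant contacts, so it obtains the largest effective size and the smallest constraint, whereas each peripheral node depends entirely on the hub.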

3) PAGERANK
PageRank was originally designed to measure the importance of specific webpages relative to other webpages in a search engine. Later, it was introduced into complex network analysis to measure the importance of nodes [41].
Compared with centrality indicators such as node degree and weighted degree, PageRank can achieve a more comprehensive consideration of a target node's network status. Our research framework utilizes this indicator at the attraction level to observe the spatial distribution of high-importance attractions. The indicator is defined as follows:
$$PR_i = \frac{1-\alpha}{n} + \alpha \sum_{j \in N_i} \frac{w(v_j, v_i)}{\sum_{k \in N_j} w(v_j, v_k)} PR_j \tag{6}$$
where $PR_i$ is the PageRank value of node $v_i$, $N_i$ and $N_j$ are the collections of adjacent nodes of nodes $v_i$ and $v_j$, respectively, $w(v_j, v_k)$ is the weight value of the directed edge from node $v_j$ to $v_k$, $n$ is the number of nodes in the network, and $\alpha$ ($0 < \alpha < 1$) is the damping coefficient. The larger $\alpha$ is, the easier it is to distinguish nodes' PageRank values. Generally, $\alpha$ takes the empirical value 0.85. Through several iterations, the PageRank value of each node will eventually stabilize, and the sum of the PageRank values of all nodes will approach 1.
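The iterative calculation described above can be sketched as a simple power iteration on a directed weighted network. The sketch is an assumption-laden illustration rather than the implementation used in the case study: the function name `weighted_pagerank`, the stopping tolerance, and the uniform redistribution of the score of nodes without outgoing edges are all choices made here so that the scores keep summing to 1.

```python
def weighted_pagerank(edges, alpha=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges: dict mapping (source, target) -> weight of that directed edge.
    alpha: damping coefficient (0.85 is the usual empirical value).
    """
    nodes = sorted({n for pair in edges for n in pair})
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w

    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: nodes without outgoing edges spread their score evenly.
        dangling = sum(pr[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for (src, dst), w in edges.items():
            # Each node passes its score on in proportion to edge weight.
            new[dst] += alpha * pr[src] * w / out_weight[src]
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr

# Toy network: B receives flow from both A and C; C receives none.
toy_flows = {("A", "B"): 2, ("C", "B"): 1, ("B", "A"): 1}
scores = weighted_pagerank(toy_flows)
```

Node C has no incoming flow and therefore keeps only the baseline score $(1-\alpha)/n$, while B, which receives flow from both other nodes, ranks highest.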

4) COMMUNITY DETECTION
Communities are dense subnets in the whole complex network. Therefore, the community detection algorithm is applied to the attraction level of our research framework. The structural strength of communities can be measured by their modularity, which is defined as follows:
$$Q = \frac{1}{2m} \sum_{i,j} \left[A_{ij} - \frac{k_i k_j}{2m}\right] \delta(c_i, c_j) \tag{7}$$
where $Q$ is the modularity, $m$ is the number of edges in the network, $A_{ij}$ is the adjacency matrix representing the weight of edges between nodes $v_i$ and $v_j$, $k_i$ is the weight of all edges connected with node $v_i$, and $c_i$ is the community number of node $v_i$. If nodes $v_i$ and $v_j$ are in the same community, $\delta(c_i, c_j) = 1$; otherwise $\delta(c_i, c_j) = 0$. Communities can be detected by finding the approximate optimal partition with larger modularity among all possible community partitions. Traag et al. [42] recently proposed the Leiden algorithm, which is considered to be the best community detection method based on modularity optimization in terms of operating performance and effectiveness. Therefore, our research framework utilized this method to effectively detect community structures in the network.
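To make the modularity measure concrete, the following sketch evaluates it for a given partition. It is not the Leiden algorithm, which searches for a partition; it only scores one. The sketch assumes an undirected weighted network in which every node pair is listed once, and it takes $m$ as the total edge weight; the function name `modularity` and the two-triangle toy network are hypothetical.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected weighted network.

    edges: dict mapping (i, j) -> weight, each undirected pair listed once.
    community: dict mapping node -> community label.
    """
    strength = {}   # k_i: total weight of edges attached to node i
    two_m = 0.0     # 2m: twice the total edge weight
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
        two_m += 2 * w

    # Sum of A_ij over ordered pairs in the same community.
    within = sum(2 * w for (i, j), w in edges.items()
                 if community[i] == community[j])

    # Sum of k_i * k_j over ordered pairs in the same community.
    totals = {}
    for node, k in strength.items():
        totals[community[node]] = totals.get(community[node], 0.0) + k
    expected = sum(t * t for t in totals.values()) / two_m

    return (within - expected) / two_m

# Toy network: two triangles joined by a single bridge edge (c, d).
pairs = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
tri_edges = {pair: 1.0 for pair in pairs}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(tri_edges, split)
q_merged = modularity(tri_edges, merged)
```

Splitting the toy network into its two triangles yields a positive modularity, whereas placing all nodes in one community yields zero, which is why a modularity-based method would prefer the split.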

B. STUDY AREA
We chose GBA, a world-class urban agglomeration in China, as the study area for the validation of our framework. GBA, also known as the ''upgraded version of the Pearl River Delta Urban Agglomeration,'' is a large urban agglomeration composed of 11 cities (Guangzhou, Zhuhai, Shenzhen, Foshan, Dongguan, Huizhou, Zhongshan, Jiangmen, Zhaoqing, Hong Kong and Macao), with a total land area of about 56,000 square kilometers. By the end of 2017, the permanent residential population of GBA had reached nearly 70 million, and its GDP had exceeded 10 trillion yuan (US$ 1.5 trillion), making it one of the most economically dynamic bay areas in the world. The aim of Chinese government planning is that GBA should not only be built as a world-class urban agglomeration with high-quality development, but also become a high-quality region suitable for living, business, and tourism. GBA has been identified as the fastest growing area in China's tourism industry, and received more than 400 million visitors from home and abroad in 2016 [43].
According to the statistical data of 2017, Guangzhou, Shenzhen, Hong Kong, and Macao attracted 204 million, 130 million, 58 million, and 33 million visitors, respectively, and their tourism revenue accounted for 73 percent of that of the whole GBA [44]. Therefore, these four cities are generally regarded as the leading tourism cities in GBA. There are also numerous well-known attractions in GBA, such as the Window of the World in Shenzhen, the Canton Tower in Guangzhou, the Victoria Harbor in Hong Kong, and the Ruins of St. Paul's in Macao.

C. DIGITAL FOOTPRINT DATA
Since different data sources for digital footprints portray different aspects of tourists' behavior, their selection has proved controversial. Tourists' location data as recorded by sensor-based data sources (such as geotagged photos, check-in data, GPS trajectory data, etc.) are affected by regulatory issues (such as ethics, areas where photography is prohibited, signal shielding of position sensors, etc.) [45] and often report redundant locational information. Geo-located travel blog data, directly edited by tourists, are therefore chosen as the digital footprint data source within our framework.
The two most popular travel websites in China are Ctrip (www.ctrip.com/) and Qunar (https://www.qunar.com/) [46]. Although Ctrip has a slightly larger user base than Qunar, its travel blog data lack standardized geotags, making them unsuitable for our geospatial research demands. Therefore, we chose Qunar as the case's primary data source. Qunar.com provides a smart travel blog editing platform; when users write travel blogs on the website, they can insert point of interest (POI) tags for specific content. When the travel blog is published, these POI tags are recorded in the source code of the blog's webpage and can be utilized to visualize the user's travel route. Kaufmann et al. [47] and Zheng et al. [48] called this type of data ''geo-located travel blog data.''
We collected 8786 travel blogs from Qunar.com dating from 2010 to 2019, which were voluntarily shared by tourists, as the initial data. The collected data contained user ID, blog ID, departure date, and the sequence of POIs visited (Table 1). The travel blog data often have information errors and other problems, so we designed rules to clean the data, as shown in Table 2. After data cleaning, a total of 26843 visits to 1419 attractions in 3765 blogs were retained. The spatial distribution of the data is shown in Figure 2.

A. ATTRACTION LEVEL
1) DEGREE DISTRIBUTION
We built a GBA tourist flow network at the attraction level. In the tourist flow network, attractions were abstracted as nodes and tourist flows between attractions were abstracted as weighted edges. Figure 3 shows the distribution of the node degrees in the network (based on Equation 1) and its power fitting curve. The node degree distribution of the tourist flow network at the attraction level approximately follows a power-law distribution, and the $R^2$ value (i.e., the goodness of the power fit) reaches 0.886, which indicates that the network has obvious scale-free characteristics. Tourist flow networks with scale-free characteristics are usually heterogeneous [49]; that is, there are few nodes with large degree values and many nodes with small degree values, indicating that most of the tourist flows in GBA are concentrated in a few attractions.
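One common way to obtain a power fitting curve and its $R^2$ value is a least-squares fit on log-log axes, sketched below. The fitting procedure behind Figure 3 is not specified in the text, so this is only an assumed approach; the function name `power_law_fit` and the synthetic degree sequence are hypothetical.

```python
import math
from collections import Counter

def power_law_fit(degrees):
    """Fit count(d) ~ c * d**(-gamma) by least squares in log-log space.

    degrees: iterable of node degrees (zero degrees are ignored).
    Returns (gamma, r_squared). Needs at least two distinct degree values.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot

# Synthetic degrees that follow count(d) = 16 * d**(-2) exactly.
synthetic_degrees = [1] * 16 + [2] * 4 + [4] * 1
gamma, r_squared = power_law_fit(synthetic_degrees)
```

For real degree sequences the points scatter around the fitted line, so $R^2$ falls below 1; a value close to 1 is what supports the reading of a scale-free network.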

2) NODE IMPORTANCE
To comprehensively evaluate the status of network nodes, the PageRank algorithm (based on Equation 6) was utilized to calculate the importance of nodes in the network at the attraction level. The spatial distribution of the calculation result of each node (attraction) is shown in Figure 4.
As shown in Figure 4, most of the important nodes in the GBA tourist flow network are located in Guangzhou, Hong Kong, Macao, and Shenzhen. These four cities are therefore the most popular tourist destinations in GBA. In addition, the attractions in the cities near the three most popular cities (Guangzhou, Hong Kong, and Macao), such as those in eastern Foshan, southern Shenzhen, and eastern Zhuhai, also have higher PageRank values, showing the effect of geographical proximity. Therefore, the results highlight the ''four cores and three poles'' distribution of the tourism industry in GBA: there are four core cities (Hong Kong, Macao, Guangzhou, and Shenzhen) and three growth poles (Hong Kong-Shenzhen, Macao-Zhuhai, and Guangzhou-Foshan).

3) COMMUNITY DETECTION
The Leiden algorithm was utilized to detect communities in the tourist flow network at the attraction level; a total of 32 communities were detected. The spatial distribution of the results is shown in Figure 5. Since the number of nodes in the 10th to 32nd communities is less than 1% of the total, and most of these communities are located on the geographic edge of GBA, they are unified as one independent community in the map.
As shown in Figure 5, although the spatial distribution of attractions in GBA is relatively dispersed, the communities of attractions have clear geographical boundaries. The communities of attractions in Hong Kong, Macao, and Shenzhen are close to the actual administrative divisions. The community boundaries between Zhuhai and Macao, Hong Kong and Shenzhen, and Guangzhou and Foshan are also consistent with the actual administrative boundaries. In particular, the attractions of the Wanshan archipelago, located to the east of Zhuhai, are not grouped into the communities of Hong Kong or Macao, as their geographical proximity would suggest, but belong to the same communities as the attractions of inland Zhuhai. Therefore, the network structure of tourist flows in GBA has been affected by real-world administrative divisions, showing a strong ''administrative barrier effect.''

B. CITY LEVEL
1) WEIGHTED DEGREE AND FLOW BETWEENNESS CENTRALITY
We also built a tourist flow network at the city level. In this network, attractions located in the same city are merged into one node. Since the weighted degree (based on Equation 2) of a city in the tourist flow network is the volume of its bidirectional tourist flow, Figure 6 was drawn to show the flow relation between cities in GBA.
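The merging step can be sketched as a relabeling of the attraction-level edges. The sketch assumes a lookup table from attraction to city and assumes that movements within the same city are dropped rather than kept as self-loops, which the text does not specify; the function name `aggregate_to_cities` and the sample weights are hypothetical.

```python
from collections import Counter

def aggregate_to_cities(attraction_edges, city_of):
    """Merge attraction-level edges into a city-level flow network.

    attraction_edges: dict mapping (source, target) attraction pairs -> weight.
    city_of: dict mapping attraction name -> city name.
    Returns a Counter mapping (source city, target city) -> summed weight.
    """
    city_edges = Counter()
    for (src, dst), weight in attraction_edges.items():
        a, b = city_of[src], city_of[dst]
        # Assumption: movements inside one city are not inter-city flows.
        if a != b:
            city_edges[(a, b)] += weight
    return city_edges

# Hypothetical weights, for illustration only.
city_of = {
    "Canton Tower": "Guangzhou",
    "Window of the World": "Shenzhen",
    "Victoria Harbor": "Hong Kong",
    "Ruins of St. Paul's": "Macao",
    "Senado Square": "Macao",
}
attraction_edges = {
    ("Canton Tower", "Window of the World"): 2,
    ("Window of the World", "Victoria Harbor"): 3,
    ("Victoria Harbor", "Ruins of St. Paul's"): 4,
    ("Victoria Harbor", "Senado Square"): 1,
    ("Ruins of St. Paul's", "Senado Square"): 5,
}
city_edges = aggregate_to_cities(attraction_edges, city_of)
```

Summing the weights of all edges attached to a city in `city_edges` then gives that city's bidirectional flow volume, i.e., its weighted degree.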
As shown in Figure 6, the tourist flows between Hong Kong and Macao, Zhuhai and Macao, Guangzhou and Zhuhai, and Hong Kong and Shenzhen have the largest volume. In terms of weighted degree, Hong Kong, Macao, Zhuhai, Guangzhou, and Shenzhen have the highest values. Among these five cities, Zhuhai, Guangzhou, and Shenzhen are the cities with the highest weighted degree values in the part of GBA that lies within mainland China. Considering that transportation between cities in mainland China is more convenient, it seems that Guangzhou, Shenzhen, and Zhuhai play a ''mediation role'' in the GBA network pattern. This interpretation is further confirmed by the calculation of flow betweenness centrality (based on Equation 3) as shown in Figure 7. It can be seen from Figure 7 that Zhuhai, Guangzhou, and Shenzhen are indeed the three mediation nodes of the tourist flow network.

2) MEASUREMENT OF STRUCTURAL HOLES
The two indicators of effective size (based on Equation 4) and constraint (based on Equation 5) were utilized to measure structural holes in the GBA tourist flow network. The results for each city are shown in Table 3, and a visualization of the tourist flow network, according to the effective size and constraint values, is shown in Figure 8. The size of the nodes in the figure represents the level of the structural hole indicators.
[Table caption: Overview of the data cleaning procedure for geo-located travel blog data.]
When comparing the results of structural hole indicators in Figure 8(a) and (b), it becomes evident that the distribution of nodes' effective size and constraint in the tourist flow network follows a regular pattern. The four city nodes of Guangzhou, Zhuhai, Hong Kong, and Shenzhen have higher effective size values but lower constraint values, indicating that these cities have obvious competitive advantages. Zhaoqing has the lowest effective size value and the highest constraint value. Therefore, Zhaoqing is the city with the least competitive advantage, which is consistent with Zhaoqing's marginal status in GBA. In addition to Zhaoqing, Zhongshan and Jiangmen also have competitive disadvantages due to their lower effective size values and higher constraint values. This indicates that the tourist flows to and from these cities have been limited; they are disadvantaged in the competition for tourists. In addition, the two indicators for Macao are at a medium level, indicating that Macao hardly participates in the competition for tourism; i.e., it neither significantly impacts, nor is impacted by, other cities.

3) VISUALIZATION OF NETWORK PATTERNS
Based on the above analysis, the network patterns of tourist flows in GBA can be summarized according to the proposed research framework (Figure 1). A schematic diagram of the visualization of network patterns is presented in Figure 9.
As shown in Figure 9, the tourist flows in GBA form a network pattern showing fierce competition and

V. DISCUSSION
Building GBA into a world-class tourist destination is an important strategic goal for the Chinese government. The analysis results verified that the proposed research framework was able to effectively extract the network patterns of tourist flows in an urban agglomeration at the attraction and city levels. The framework can provide valuable implications for tourism managers and policymakers to promote the development and integration of the tourist industry, as described below. First, the GBA tourist flow network has been shown to be heterogeneous at the attraction level (Figure 3). GBA is a large-scale bay area, with diverse tourism resources and environments among member cities. Therefore, it seems difficult to promote balanced tourism development in GBA.
Three city groups (also the development poles), Guangzhou and Foshan, Hong Kong and Shenzhen, and Macao and Zhuhai, have many important nodes in the tourist flow network (Figure 4). Efforts to strengthen the tourism connections between these three city groups and their surrounding cities, such as by promoting tourism industry cooperation and combined marketing, will drive the development of tourism in the entire region to a certain extent.
Second, ''administrative barriers'' are the main obstacle to promoting the integration of tourism in GBA. For example, the spatial distribution of attraction communities located at the three development poles of Guangzhou-Foshan, Hong Kong-Shenzhen, and Macao-Zhuhai is obviously influenced by administrative boundaries (Figure 5). GBA has a unique ''9+2'' structure, which covers nine mainland cities and two special administrative regions (Hong Kong and Macao). This structure appears to make it difficult to break the ''administrative barriers'' of the tourism industry in this region through policy measures. In order to solve this problem, the local tourism management departments need to take the ''Outline Development Plan for the Guangdong-Hong Kong-Macao Greater Bay Area'' [50], promulgated by the Chinese government, as an important reference in promoting reform, innovation, and cross-regional industrial cooperation. One such effort is the construction of ''Guangfo city'' by integrating Guangzhou and Foshan into one connected city. Furthermore, cities should cooperate to formulate a development plan for tourism resource integration, so as to weaken the impact of the ''administrative barriers'' on GBA tourism. Additionally, GBA's transport network needs to be optimized. At present, the Guangzhou-Hong Kong High-Speed Railway and the Hong Kong-Zhuhai-Macao Bridge have been built to enhance transport links between cities in GBA. However, due to various policy issues, especially complicated transport procedures, the benefits of these two transport infrastructures have not been maximized. In the future, it will be necessary to enhance the convenience of transportation and create high-quality tourism routes, to promote the coordinated development of tourism in GBA.
VOLUME 10, 2022
Finally, more attention should be paid to the ''interaction mechanism'' of tourist flows among cities in GBA.
Guangzhou, Zhuhai, and Shenzhen have high flow betweenness centrality values in the tourist flow network. These three cities are, thus, identified as the important mediation cities of tourist flows, which is consistent with their ''hub'' role in the GBA transportation system. Strengthening the construction of expressways, ports, airports, and other transportation infrastructures in these three hub cities will promote the efficient connection of tourist flows among cities in GBA. In addition, the structural hole effect caused by the adjustment of tourist flows requires vigilance. For example, the calculation of structural hole indicators shows that although Zhongshan and Jiangmen are located between Guangzhou-Foshan and Macao-Zhuhai, two of the development poles of GBA, they lack a competitive advantage in the tourist flow network. Therefore, giving full play to the positive role of competitive cities and avoiding their negative impact on tourist flows in surrounding cities is a vital issue that tourism managers and policymakers need to address.
The analysis clarified the development status of cities in GBA. For example, Zhaoqing has the most obvious competitive disadvantage in the tourist flow network, which is consistent with the city's status as the lowest ranking city in total annual GDP and proportion of service industry output value. Zhaoqing is a city that has yet to break through the development dilemma in terms of its industrial structure. Although Zhuhai's ''static'' tourism popularity (i.e., number of visits) is not high, Zhuhai's high weighted degree and high flow betweenness centrality in the network reveal its strong ''dynamic'' role in circulation of tourist flows, which shows its tourism development potential. Macao has no obvious structural hole effect in the tourist flow network, and lacks participation in the competition of tourist flows. Macao is thus indeed a suitable place for building recreation centres in GBA.

VI. CONCLUSION
This paper proposed a novel research framework for extracting the network patterns of tourist flows in urban agglomeration through digital footprints. Geo-located travel blog data from Qunar.com were collected for the GBA case study. The conclusions can be summarized as follows: At the attraction level, the research framework effectively evaluated the complexity of the tourist flow network. In the case of GBA, the heterogeneity of the network was proved by its significant scale-free characteristics, and the development issue of ''administrative barriers'' was revealed by the detected communities.
At the city level, the research framework clarified the interaction of tourist flows among cities from a sociological perspective. In the case of GBA, Guangzhou, Zhuhai, and Shenzhen play a mediating role in the tourist flow network. Zhongshan and Jiangmen are the most disadvantaged in competition for tourism.
The empirical results of the research framework can provide a theoretical basis and practical implications for the planning and management of tourism industry in urban agglomerations. The new findings of empirical results are mainly reflected in three aspects. Firstly, the pattern of ''four cores and three poles'' in GBA was extracted from the perspective of tourism, which corresponds to the ''Outline Development Plan for the Guangdong-Hong Kong-Macao Greater Bay Area'' promulgated by the Chinese government [50]. Secondly, the ''administrative barrier effect'' was quantitatively confirmed to have an impact on the tourism in GBA. Finally, the status of cities in the tourism of GBA was effectively determined through network indicators. In addition, our case study of GBA also provides implications for other bay areas or urban agglomerations. For example, the ''four cores and three poles'' pattern of tourism in GBA may give guidelines for tourism development in single-core Bay areas such as Tokyo Bay Area in Japan. The structural hole effect, caused by the limits of surrounding dominant cities, has impacts on the tourism of Zhaoqing, Zhongshan, and Jiangmen in GBA. Similar phenomena may have occurred in other urban agglomerations similar to GBA, such as the Yangtze River Delta urban agglomeration, which should arouse the vigilance of tourism managers.
In summary, the framework proposed in this paper is highly feasible and can be applied to other tourists' digital footprint data sources. At the theoretical level, our research framework simultaneously introduces network science and multi-scale analysis into tourist flow research, allowing for a more comprehensive and systematic extraction of network patterns of tourist flows. At the application level, our research framework was applied to GBA, the first Bay Area Urban Agglomeration in China, and exhibited the pattern detection findings in the form of a simplified map (Figure 9), which provided intuitive implications for tourism managers and policymakers.
However, the limitations of the research and suggestions for future work are as follows. Although travel blog data have advantages in describing tourists' behavior compared with other digital footprint data sources (see Section III-C), the sample size is usually small, and research is thus often based on only hundreds of blogs [29], [48], [51]. Therefore, the use of travel blog data is often accompanied by problems of user representativeness and data bias. The users of Qunar were mainly born between 1980 and 1999 and have high educational levels and income [52]. Furthermore, although Qunar has been operating in Hong Kong and Macao for nearly ten years, it is still less popular in these two regions than in mainland China. Therefore, the case study in this paper may be affected by this ''skewness.'' As with other typical social media data sources, an ideal bias evaluation of travel blog data can be carried out from four perspectives (who, where, when, and what), but there is still a lack of solutions to completely address or quantify these perspectives [53]. In view of the above limitations, an obvious next step is to improve the data collection part of the research framework, such as by incorporating multi-source data fusion and designing more effective methods of data cleaning and data bias evaluation. Furthermore, our research framework only utilizes the geotags of travel blogs, with no regard for the blog text. We will utilize the text of travel blogs to expand the framework, since it can help us understand the driving forces of tourist flows.

ACKNOWLEDGMENT
The author is grateful to Yunhao Zheng for inspiring the research idea. He also offered valuable suggestions for data visualization and paper writing.