An Adaptive Network-Constrained Clustering (ANCC) Model for Fine-Scale Urban Functional Zones

Urban functional zones are significant components for understanding urban landscape patterns in the socioeconomic environment. Although the spatial configuration of road networks contributes to delineating urban functions at the block level, the morphological uncertainties that road network structure introduces into fine-scale urban function retrieval have been largely ignored. This paper proposes an adaptive network-constrained clustering (ANCC) model to map urban function distributions at a finer level. Using points of interest (POIs) to indicate independent functional places, an adaptive road configuration with a multilevel bandwidth selection strategy is proposed. On this basis, a term frequency–inverse document frequency-weighted latent Dirichlet allocation (TW-LDA) topic model is designed to delineate urban functions from semantic information. In a case study of Futian District, Shenzhen, the model achieves an overall accuracy of approximately 77.10% in urban function mapping. A block-level mapping model, a non-adaptive network-based model and the ANCC model achieve accuracies of 53.10%, 59.20% and 77.10%, respectively, indicating the advantages of the ANCC model for improving urban function mapping accuracy. The proposed ANCC model shows potential for monitoring urban land use in support of sustainable city planning.


I. INTRODUCTION
Rapid urbanization has led to drastic socioeconomic evolution [1], [2]. Many studies have revealed the potential impacts and interactions of population, roads, and social and economic activities in urban development [3], [4]. These socioeconomic changes have improved transportation and further diversified urban functions, such as residential, industrial, commercial and sports areas [5]. Identifying these functional areas and delineating their distribution characteristics are essential to understanding urban spatial structures and guiding sustainable city development [6]-[8]. Conventional approaches to identifying urban functions rely heavily on remote sensing, land use data and professional surveying to delineate block-level urban functions [9]-[11], and these approaches are limited by time- and labor-consuming data acquisition.
The associate editor coordinating the review of this manuscript and approving it for publication was Weimin Huang.
The widespread availability of mobile positioning technology has created opportunities to obtain large-scale geotagged data that contain rich human activity information, such as mobile phone signaling data, points of interest (POIs), social media data, GPS positioning records and street view images [12]-[14]. Geotagged data have been widely used for urban function retrieval. Jiang et al. [15] demonstrated that Twitter density can represent population density under certain conditions, thus reflecting the activities of urban residents. Lloyd et al. [16] explored the spatial distribution of the urban retail industry by studying the spatial distribution of Twitter data. In addition, single functional areas and mixed functional areas have been identified quantitatively using reclassified POI data [17]. Yunliang et al. [18] calculated the overlap rate between clustering results and POI distributions with obvious category characteristics to determine the functions of urban functional areas. Despite these achievements, the dispersed distributions of geotagged data, which bias the retrieval of continuous urban functional zones, have not been considered. To address this issue, Luo et al. [19] integrated user trajectory data with density-based spatial clustering of applications with noise (DBSCAN) to explore high-density regions. Tang et al. [20] proposed a model, namely, Detecting and Evaluating Urban Clusters, which uses agglomerative hierarchical clustering to detect urban clusters based on similarities in the daily travel space of urban residents. Tu et al. [21] proposed a hierarchical clustering framework that integrates remote sensing imagery and mobile phone positioning data to analyze urban functional zones with landscape and human activity metrics. In addition, they systematically integrated geotagged data distributions and building morphologies to depict urban functional characteristics. Spyrou et al. [22] proposed a tile-like partitioning algorithm to divide predefined geographical regions. Yuan et al. [23] developed a topic modeling-based approach to cluster segmented regions into functional zones by leveraging mobility and location semantics mined from latent activity trajectories.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Building on retrieved continuous urban functions, the factors driving changes in urban functional zones should be considered to improve retrieval accuracy. Road networks can effectively reflect the spatial heterogeneity of urban functions from the perspective of city morphology [24], [25], and existing research has provided evidence of the significance of road networks in developing various urban functions [26]. Okabe et al. [27] proposed a clustering method based on road network constraints and verified that human mobility is essentially limited by traffic networks. Moreover, Ma et al. [28] proposed an Epanechnikov-based kernel density estimation (KDE) with a bandwidth selection strategy to extract road-constrained areas of interest.
Although road networks represent the morphological characteristics of urban structures, the differences among the urban functions that road networks support have been overlooked. Previous studies considered road networks solely as corridors connecting the functions of blocks and buildings, rather than as independent spaces. In fact, in addition to transportation, road networks serve as a significant component of various functions [29], [30]. For instance, Zhu et al. classified the areas around roads into different functions based on the spatial-temporal patterns of urban mobility on the roads. It is therefore essential to retrieve network-based functional zones from urban structures, and accurately depicting these zones requires an adaptive approach to retrieving network-constrained urban functions.
In this study, an adaptive network-constrained clustering (ANCC) model is proposed to retrieve fine-scale urban functional zones. First, an adaptive road configuration with a multilevel bandwidth selection strategy is proposed to delineate fine-scale zones. A term frequency–inverse document frequency-weighted latent Dirichlet allocation (TW-LDA) topic model is then proposed to delineate urban functions. Using POIs to indicate independent functional places, we apply the proposed ANCC model to Futian District, Shenzhen. The accuracy of the ANCC model is further evaluated and compared with that of a block-level mapping model and a non-adaptive network-based model.
The remainder of this paper is arranged as follows. Section II describes the datasets and study area. Section III introduces the overall framework and methodology of the ANCC model and TW-LDA. Section IV presents the experimental results. Section V evaluates the model accuracy and compares the ANCC model with a block-level mapping model and a non-adaptive network-based model. Section VI summarizes the conclusions and future work of this study.

II. STUDY AREA AND DATA
The study area is Futian District, Shenzhen, Guangdong Province. Shenzhen is one of the national economic centers and international cities of China and plays an important role in economic development, including high-tech industry, financial services, foreign trade and export, marine transportation and creative culture. Futian District, the political and business center of Shenzhen, has a wide variety and complex distribution of urban functional zones.
The study area and the corresponding data sources, including POIs and road networks, are shown in Figure 1. Both POIs and road network data were collected from Gaode (available at: lbs.amap.com). Since Gaode updates POIs at the minute level (lbs.amap.com/home/advantage?active=data), the timeliness of the data can be ensured, and the data can reflect the spatial distribution of human activities related to urban functions with high accuracy. In total, we collected 132,584 POIs with accurate location coordinates and detailed classification information. These POIs are classified into nine primary categories, such as educational, commercial and residential; 68 secondary categories, such as accommodation and companies; and 217 tertiary categories, such as Park Plaza and industrial parks. Visual inspection shows that the POIs cover the entire study area, excluding green land and water bodies, which are not accessible. The road networks collected from Gaode were preprocessed by eliminating urban expressways and correcting position deviations. Based on the correlation between road types and urban functional zones, freeways and ramps were also excluded. After preprocessing, 2,522 roads were used in this experiment.

III. ANCC MODEL FOR RETRIEVING URBAN FUNCTIONAL ZONES
In this study, an ANCC model is proposed to retrieve fine-scale urban functional zones. The overall framework consists of two components, adaptive road configuration and TW-LDA semantic function recognition, as shown in Figure 2. First, the adaptive road configuration with a multilevel bandwidth selection strategy depicts urban zones; POIs are adaptively constrained by roads using different bandwidths that fit multilevel street networks. Then, the TW-LDA model identifies the semantic function of each urban zone from the POI categories.

A. ADAPTIVE ROAD CONFIGURATION
To depict the morphological characteristics affected by road networks, this section proposes an adaptive road configuration approach with a multilevel bandwidth selection strategy. The approach involves a road-constrained KDE method modified from the traditional KDE approach. Traditional KDE usually maps homogeneous spaces based on Euclidean distance or emphasizes the constraining role of road networks [31], and it ignores the uncertainties of zones caused by road networks. Since POIs representing different urban functions are usually distributed along the sides of roads, different bandwidths must be considered in KDE to depict the POIs within road zones.
To address this issue, a network-constrained KDE is proposed based on the traditional KDE. The traditional KDE is defined in terms of the following quantities: λ(x) is the kernel density estimate; x represents the location of a POI; x_i represents the surrounding POIs within the bandwidth; h is the attenuation threshold of the path distance (i.e., the bandwidth); n is the number of POIs whose distance from position x is less than or equal to h; and k(·) is the kernel function. Among the many available kernel functions, the Epanechnikov kernel is chosen because it can provide sufficient smoothing for the independent POI data used here and, combined with adaptive bandwidths, improves the delineation of region boundaries. By this definition, traditional KDE uses a fixed bandwidth and therefore fails to depict the characteristics of the road-network spatial distribution [32]. Considering urban function-driven road networks, the limitations of traditional KDE can be summarized in two aspects: (i) Because roads are heterogeneously distributed, with dense road networks usually located in central urban areas, the continuous urban areas generated by traditional KDE can be dominated by densely distributed road networks, producing closely packed and overlapping zones. (ii) Morphological characteristics cannot be depicted accurately in residential and industrial areas where few roads are distributed. Examples of such areas are parks and manufacturing districts, which are often located in small clusters away from major roads. Accordingly, proper bandwidths must be determined to accurately identify the spatial zones of different urban functions.
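As an illustration of the fixed-bandwidth estimator discussed above, the following Python sketch evaluates an Epanechnikov KDE at one location. It assumes the textbook form λ(x) = 1/(nh) · Σ k(d(x, x_i)/h), with k(u) = 0.75(1 - u²) for |u| ≤ 1 and Euclidean distances; the paper's exact normalization is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, otherwise 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde_at(x: Point, pois: Sequence[Point], h: float) -> float:
    """Fixed-bandwidth KDE at location x (assumed form, Euclidean distance).

    lambda(x) = 1 / (n * h) * sum_i k(d(x, x_i) / h), where the sum and n
    run over the POIs whose distance from x is at most h.
    """
    within = [d for d in (math.dist(x, p) for p in pois) if d <= h]
    n = len(within)
    if n == 0:
        return 0.0  # no POI inside the bandwidth
    return sum(epanechnikov(d / h) for d in within) / (n * h)


print(kde_at((0.0, 0.0), [(0.0, 0.0), (50.0, 0.0), (200.0, 0.0)], 100.0))
```

Replacing `math.dist` with a shortest-path distance along the road graph would turn this into a network-constrained variant; that step is not shown.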
To address these challenges, the road network is categorized into levels according to road density in urban space. The key consideration in obtaining an appropriate road network density is the search radius used to compute it; through a large number of control experiments, a search radius suitable for the experimental area is determined. Then, the Natural Breaks (Jenks) method is used to adaptively classify the density intervals of the road network. The Natural Breaks (Jenks) classification method is designed to optimize the arrangement of a set of values into ''natural'' classes: it seeks to minimize the average deviation from each class mean while maximizing the deviation from the means of the other classes. The average deviation of road network density is defined as $AD = \frac{1}{n}\sum |d - \bar{d}|$, where AD is the average deviation of road network density, d is the pixel value of each density, $\bar{d}$ is the arithmetic mean, and n is the number of pixels. The Natural Breaks (Jenks) classification then yields the road network density classes, and the coefficient of variation is used to set the appropriate bandwidth for each class. Because the dimensions of the classes differ, the coefficient of variation (CV), which depends on both the dispersion and the mean of the values, is used to measure the relationship between the bandwidths of the classes. The adaptive bandwidth based on the road network density classification is defined in terms of the following quantities: $h_i$ is the bandwidth of roads in the ith class; $\sigma_i$ is the standard deviation of the ith class; $\bar{d}_i$ is the average density of the ith class; $n_i$ is the number of pixels in the ith class; and α is the calculation coefficient.
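The following Python sketch illustrates the classification step: an exact one-dimensional natural-breaks partition, the average deviation AD, and the per-class CV (σ_i / d̄_i). It is a sketch under stated assumptions: as is common in implementations of Jenks' method, the optimization minimizes the within-class sum of squared deviations (Fisher's dynamic program) rather than the average absolute deviation, and the mapping from CV and α to the bandwidths h_i is not reproduced because its exact form is not given here. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def average_deviation(values: Sequence[float]) -> float:
    """AD = (1/n) * sum(|d - d_bar|): mean absolute deviation from the mean."""
    d_bar = mean(values)
    return sum(abs(d - d_bar) for d in values) / len(values)


def jenks_breaks(values: Sequence[float], k: int) -> List[float]:
    """Upper limit of each of k classes from an exact optimal 1-D partition.

    Minimizes the total within-class sum of squared deviations by dynamic
    programming over the sorted values (O(k * n^2), adequate for a sketch).
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Prefix sums give the squared deviation of any slice in O(1).
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i: int, j: int) -> float:
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last class is data[i:j]
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], start[c][j] = cand, i
    # Backtrack the end index of every class.
    ends, j = [], n
    for c in range(k, 0, -1):
        ends.append(j)
        j = start[c][j]
    return [data[e - 1] for e in reversed(ends)]


def class_cv(values: Sequence[float], uppers: Sequence[float]) -> List[float]:
    """CV (sigma_i / mean_i) of each class; values must lie within the breaks."""
    groups: List[List[float]] = [[] for _ in uppers]
    for v in values:
        idx = next(i for i, u in enumerate(uppers) if v <= u)
        groups[idx].append(v)
    return [pstdev(g) / mean(g) if g else float("nan") for g in groups]


densities = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
print(jenks_breaks(densities, 3))  # prints [3.0, 12.0, 22.0]
```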
After the relationship between classes is measured by the CV, a large number of control experiments are conducted to determine the value of α; α = 5 proved most suitable for the experimental area. As a result, the network-constrained KDE can adaptively set the bandwidth according to road network density and depict urban zones more accurately. The classes of the adaptive bandwidth strategy and the bandwidth at each level are determined by the variation in network density across the experimental area; in other words, they can vary flexibly across study areas according to the density distribution and complexity of the network. The network-constrained KDE effectively retrieves urban functional zones within the scope of road distribution but fails to capture POIs located away from the road networks. As a supplement, we extract the POIs located away from roads according to the adaptive bandwidth strategy and obtain their KDE results with the traditional KDE method, so that urban functional zones that are not significantly affected by road networks are also retrieved.
The overall adaptive road configuration process can be summarized in the following steps. (i) The network-constrained KDE is applied to the discrete POIs to obtain kernel density results near the road networks. (ii) The remaining POIs are extracted, and their kernel density results are obtained by traditional KDE. (iii) Since the network-constrained and traditional KDE results are obtained by different methods and cannot be integrated directly, the two pixel-based KDE results are normalized; the contours of urban functional areas are then extracted by vectorization. In the normalization of density values, $\delta_i$ is the normalized KDE value of pixel i, $P_i$ is the traditional KDE value at pixel i, and $N_i$ is the network-constrained KDE value at pixel i.
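The normalization formula itself is not reproduced above, so the following Python sketch shows only one plausible scheme, labeled as an assumption: each pixel-based KDE surface (flattened to a list over the same grid) is min-max scaled to [0, 1], and the two scaled surfaces are merged by taking the per-pixel maximum. Neither the min-max scaling nor the maximum-based merge should be read as the authors' δ_i formula; the function names are hypothetical.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale a surface to [0, 1]; a constant surface maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def merge_surfaces(traditional: Sequence[float], network: Sequence[float]) -> List[float]:
    """Hypothetical merge: per-pixel max of the two min-max scaled surfaces.

    traditional[i] plays the role of P_i and network[i] of N_i; both lists
    must index the same pixels.
    """
    if len(traditional) != len(network):
        raise ValueError("surfaces must cover the same pixels")
    p, q = min_max(traditional), min_max(network)
    return [max(a, b) for a, b in zip(p, q)]


print(merge_surfaces([0.0, 5.0, 10.0], [0.0, 200.0, 100.0]))  # prints [0.0, 1.0, 1.0]
```

Scaling each surface separately removes the difference in magnitude between the two estimators before they are combined, which is the motivation stated for step (iii).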

B. TW-LDA TOPIC MODEL
Based on the spatial zones of urban functional areas obtained by the adaptive road configuration approach, the specific urban function of each zone must be identified. Treating urban functional zones as independent units, the three-level textual types of POIs are used as the basis for function identification.
In particular, because urban functions are mixed in real cases, a small number of representative functions, such as medical care and education, can easily be concealed by a large number of commercial functions. To address this issue, a TW-LDA topic model is proposed that assigns appropriate weights to each type to improve the accuracy of function recognition. First, a term frequency–inverse document frequency (TF-IDF) model is used to determine the weights of POI types. The text of all POI types in an urban functional zone is treated as a document, and the set of documents for all urban functional zones is treated as the corpus. The weight of a POI type increases with its frequency in the document but decreases with the number of documents in the corpus that contain it. The TF-IDF is defined as $\text{TF-IDF}_{ij} = tf_{ij} \times \mathrm{IDF}(V_i)$ with $\mathrm{IDF}(V_i) = \log\frac{n}{n_i + a}$, where j represents the document of an urban functional zone, $tf_{ij}$ denotes the frequency of type word $V_i$ in document j, $\mathrm{IDF}(V_i)$ denotes the inverse document frequency of type word $V_i$ in the corpus, n is the total number of documents in the corpus, $n_i$ is the number of documents in the corpus containing type word $V_i$, and a is the adjustment factor. Usually, a = 1 is set to avoid a zero denominator. Because this experiment calculates weights from POI type text, some types are either overly abundant or excessively sparse, which yields unsuitable weights and reduces the accuracy of urban function recognition. Therefore, to correct this weight inaccuracy, we tune the adjustment factor a to obtain more realistic recognition results.
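A minimal Python sketch of the TF-IDF weighting with adjustment factor a, following the definition IDF(V_i) = log(n / (n_i + a)). It assumes that tf_ij is the relative frequency of a type word in its document (count divided by document length); raw counts are an equally common choice. The function name `tfidf_weights` and the toy POI type tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_weights(docs: Sequence[Sequence[str]], a: float = 1.0) -> List[Dict[str, float]]:
    """TF-IDF weight of every type word in every (non-empty) zone document.

    tf is the relative frequency of the type in its document; idf follows
    IDF(V_i) = log(n / (n_i + a)) with adjustment factor a.
    """
    n = len(docs)
    df = Counter()  # n_i: number of documents containing each type
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / (df[t] + a))
                        for t, c in counts.items()})
    return weights


corpus = [
    ["Commerce", "Commerce", "Commerce", "Medical"],
    ["Commerce", "Residence"],
    ["Commerce", "Residence"],
]
w = tfidf_weights(corpus)
print(round(w[0]["Medical"], 3), round(w[0]["Commerce"], 3))  # prints 0.101 -0.216
```

In this toy corpus the rare ''Medical'' type outweighs the abundant ''Commerce'' type, the same qualitative effect the weighting is meant to achieve. With a = 1, a type occurring in every document receives log(n/(n + 1)) < 0, one illustration of why the adjustment factor may need tuning.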
On this basis, the weight of each type word w, weight(w), is obtained from its TF-IDF value. Using these weights, a latent Dirichlet allocation (LDA) model [33] with word weights is used to calculate the topic distribution of representative functions. In this weighted model, W is the total number of type words, K is the number of topics, V is the number of distinct type words, k is the topic variable, weight(w) is the weight of a word of type w, $n_m^{(k)}$ denotes the number of words of type i assigned to topic k in document m, $\neg i$ indicates that the current word is excluded during sampling, and α and β are the hyperparameters of the Dirichlet priors on θ and ϕ, respectively. From the sampling results, the ''functional document–topic'' distribution θ is estimated, and function types are then identified from the resulting topic distributions of POI types. For each functional zone document m, type words with a probability higher than 0.07 and of similar type within each topic are selected to infer the specific function of that topic, and type words indicating different types within the topic are deleted. Finally, the topic with the highest distribution probability is selected as the target function.
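To make the weighted sampling concrete, the Python sketch below implements collapsed Gibbs sampling for LDA in which every token contributes its weight, rather than 1, to the document-topic and topic-word counts. This is one common way to build a term-weighted LDA and is an assumption here, not a reproduction of the paper's exact sampling equation; the function name, default hyperparameters and iteration count are hypothetical. Weights must be positive, so TF-IDF values would need to be shifted or clipped first.

```python
import numpy as np
from typing import Sequence, Tuple


def weighted_lda(docs: Sequence[Sequence[int]],
                 weights: Sequence[Sequence[float]],
                 vocab_size: int, n_topics: int,
                 alpha: float = 0.1, beta: float = 0.01,
                 iters: int = 200, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Weighted collapsed Gibbs sampling; returns (theta, phi).

    docs[m][i] is a type-word id and weights[m][i] its positive weight.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # weighted document-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # weighted topic-word counts
    nk = np.zeros(n_topics)                 # weighted token mass per topic
    z = []
    for m, (doc, wts) in enumerate(zip(docs, weights)):
        if any(wt <= 0 for wt in wts):
            raise ValueError("weights must be positive")
        zm = rng.integers(n_topics, size=len(doc))  # random initial topics
        for word, wt, k in zip(doc, wts, zm):
            ndk[m, k] += wt
            nkw[k, word] += wt
            nk[k] += wt
        z.append(zm)
    for _ in range(iters):
        for m, (doc, wts) in enumerate(zip(docs, weights)):
            for i, (word, wt) in enumerate(zip(doc, wts)):
                k = z[m][i]
                # Remove the current token (the "not i" counts).
                ndk[m, k] -= wt
                nkw[k, word] -= wt
                nk[k] -= wt
                p = (nkw[:, word] + beta) / (nk + vocab_size * beta) * (ndk[m] + alpha)
                p = np.clip(p, 0.0, None)  # guard against tiny negative rounding
                k = rng.choice(n_topics, p=p / p.sum())
                z[m][i] = k
                ndk[m, k] += wt
                nkw[k, word] += wt
                nk[k] += wt
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (nkw + beta) / (nkw.sum(axis=1, keepdims=True) + vocab_size * beta)
    return theta, phi
```

The document-side denominator is the same for every topic, so it is omitted from the unnormalized sampling probabilities `p`.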
A comparative study is conducted to assess the accuracy of the proposed ANCC model against a block-level mapping model and a non-adaptive network-based model. Details are given in Section V.

A. SPATIAL RETRIEVAL OF URBAN FUNCTIONS
1) ADAPTIVE BANDWIDTH SELECTION BASED ON ROAD NETWORK DENSITY
Since traditional KDE with a fixed bandwidth fails to capture urban functional zones driven by road networks, the adaptive road configuration approach with a multilevel bandwidth selection strategy is applied. The key issue of this strategy is bandwidth selection: bandwidths that are too large or too small in the network-constrained KDE may bias the retrieved urban zones. To determine the target bandwidths, the relationship between road network density and bandwidth variation must be investigated. Specifically, bandwidth selection depends on an appropriate road network density, which in turn depends on the search radius. Comparative experiments exploring the appropriate search radius are displayed in Figure 3. Although smaller search radii (h = 100 m or 150 m) generate fine-scale morphological patterns, the patterns restricted to road networks are discretely distributed (Figure 3(a) and (b)). Conversely, larger search radii (h = 200 m or 250 m) cannot effectively distinguish road network densities within areas and produce underfit continuous patterns (Figure 3(c) and (d)). The results indicate that a search radius from 150 m to 200 m depicts urban zones in higher-density road networks more accurately. While a smaller search radius overfits discrete morphological patterns, a larger search radius produces continuous morphological areas, and neither can be used to establish an adaptive bandwidth strategy.
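The road network density that drives bandwidth selection can be sketched as the total road length falling inside a circle of the search radius, divided by the circle's area. The Python sketch below is a simple line-density estimate under that assumption; reporting it in km/km² is also an assumption, since the density units are not stated in the text, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def length_in_circle(a: Point, b: Point, center: Point, r: float) -> float:
    """Length (m) of segment a-b that lies inside the circle (center, r)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    fx, fy = a[0] - center[0], a[1] - center[1]
    qa = dx * dx + dy * dy
    if qa == 0.0:
        return 0.0  # degenerate segment
    # Solve |a + t*(b - a) - center|^2 = r^2 for the segment parameter t.
    qb = 2.0 * (fx * dx + fy * dy)
    qc = fx * fx + fy * fy - r * r
    disc = qb * qb - 4.0 * qa * qc
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the circle
    root = math.sqrt(disc)
    t1 = (-qb - root) / (2.0 * qa)
    t2 = (-qb + root) / (2.0 * qa)
    lo, hi = max(t1, 0.0), min(t2, 1.0)  # clip to the segment, t in [0, 1]
    return (hi - lo) * math.sqrt(qa) if hi > lo else 0.0


def road_density(center: Point, segments: Iterable[Tuple[Point, Point]],
                 radius_m: float = 170.0) -> float:
    """Road length per unit area within the search radius, in km/km^2."""
    length_km = sum(length_in_circle(a, b, center, radius_m) for a, b in segments) / 1000.0
    area_km2 = math.pi * radius_m ** 2 / 1e6
    return length_km / area_km2
```

Evaluating `road_density` at every pixel center of a raster produces a density surface that can then be classified.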
Through comparative evaluations, a search radius of 170 m is selected to fit the patterns in our experimental study, as shown in Figure 4. With this radius, the calculated road network densities range from 0 to 34, and the effective range is 3-34 after invalid data are filtered out. Visual inspection shows that road densities in the eastern and southern areas are much higher, whereas areas far from the center have relatively sparse roads and much lower road densities. Adaptive bandwidth selection is therefore required to fit the variation in road density. Using the Natural Breaks (Jenks) method with four classes, the road densities within 3-34 are divided into four levels: 3-9, 9-14, 14-19 and 19-35, as shown in Figure 5. Based on these classification results, the CV in the adaptive bandwidth calculation is computed to measure the bandwidth relationship among road classes. The calculation coefficient α is determined by comparing the bandwidths and road network constraints obtained with α from 1 to 10; the experimental results show that values that are too small or too large bias the network-constrained KDE. Finally, α is set to 5. With this approach, the bandwidths of the four road classes are set to 55 m, 80 m, 100 m and 125 m (Table 1). The spatial distribution of road classes under the four bandwidth levels is displayed in Figure 6. As a result, each road is assigned an appropriate bandwidth for the network-constrained KDE.
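Once each road has a density value, assigning its bandwidth is a lookup against the class breaks and bandwidths reported above (3-9, 9-14, 14-19 and 19-35 mapped to 55 m, 80 m, 100 m and 125 m). In the Python sketch below, placing a value equal to a break in the lower class and returning None below the valid minimum of 3 are assumptions; the names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional

# Class upper limits and bandwidths as reported in the text (Table 1).
UPPER_BREAKS = (9.0, 14.0, 19.0)
BANDWIDTHS_M = (55.0, 80.0, 100.0, 125.0)
MIN_VALID_DENSITY = 3.0


def bandwidth_for_density(density: float) -> Optional[float]:
    """Bandwidth (m) for a road density; None for filtered-out densities."""
    if density < MIN_VALID_DENSITY:
        return None
    # bisect_left places a value equal to a break into the lower class.
    return BANDWIDTHS_M[bisect_left(UPPER_BREAKS, density)]


print(bandwidth_for_density(12.5))  # prints 80.0
```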

2) NETWORK-CONSTRAINED ZONE RETRIEVAL
Based on the classified road network levels and the corresponding bandwidths, the POI and road network data are input into the adaptive road configuration. Note that POIs were divided into two parts: road-constrained POIs and POIs deviating from roads. The division was made according to the bandwidth of each road: POIs within a road's bandwidth were considered road-constrained, and the remaining POIs were considered to deviate from the roads.
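The split into road-constrained POIs and POIs deviating from roads can be sketched as a distance test: a POI is road-constrained if it lies within the bandwidth of at least one road. The Python sketch below assumes projected coordinates in meters, roads given as polylines paired with their bandwidths, and straight-line point-to-segment distance; the names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Road = Tuple[Sequence[Point], float]  # (polyline vertices, bandwidth in m)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def split_pois(pois: Sequence[Point], roads: Sequence[Road]) -> Tuple[List[Point], List[Point]]:
    """Split POIs into (road-constrained, deviating from roads)."""
    near: List[Point] = []
    far: List[Point] = []
    for p in pois:
        constrained = any(
            point_segment_distance(p, a, b) <= bandwidth
            for line, bandwidth in roads
            for a, b in zip(line, line[1:])
        )
        (near if constrained else far).append(p)
    return near, far
```

The first list would feed the network-constrained KDE and the second the fixed-bandwidth KDE.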
For the network-constrained KDE, road-constrained POIs are used to estimate network-driven urban zones, which are displayed in Figure 7. KDE values are higher in areas with a dense road network and much lower in areas with a sparse road network, consistent with the road distribution. However, because the network-constrained KDE is limited to the vicinity of these distribution patterns, urban functions that are less significantly affected by roads cannot be captured.
To resolve this issue, traditional KDE is integrated with the network-constrained KDE. Specifically, POIs deviating from the road networks (the POIs remaining after the near-road data are extracted) are used. As shown in Figure 8, these POIs are mostly distributed in blocks far from the road network or in large areas with few roads, especially in residential and industrial areas. Traditional KDE with a fixed bandwidth of 85 m is applied to these data, with the bandwidth determined by the KDE parameter selection strategy proposed in this study. The estimated results are shown in Figure 9. Consistent with the POI distribution, most of the traditional KDE results lie in areas far from roads at the edge of the city and in downtown blocks.
Since the traditional and network-constrained KDE estimates are calculated with different approaches, a normalization method was applied to integrate them. As shown in Figure 10, both urban functional zones affected by the road network and those influenced solely by POI distributions are retrieved. By considering whether functional zones are constrained by road networks, the proposed adaptive road configuration approach improves the accuracy of urban functional zone retrieval.

B. SEMANTIC RETRIEVAL OF URBAN FUNCTIONS
Based on the estimated urban functional zones, the types of the corresponding POIs can be obtained. All POI types in an urban functional zone are regarded as one document, and all the documents in the region are input into the TW-LDA model. Each POI has three levels of types, all of which affect the TW-LDA results. Because TF-IDF is used to calculate the type weights, the weight of a type varies across urban functional zone documents. Usually, the adjustment factor a = 1 is set to avoid a zero denominator; however, when some types have extreme counts in the document set, it is more appropriate to tune the adjustment factor a. Table 2 shows examples of the type weights calculated for one urban functional zone document. The weight of ''Medical'' at level 1 is 0.28, much greater than that of ''Commerce'' (0.09). This result indicates that the weighting strategy can reduce the weight-allocation bias caused by large numbers of a specific textual class (''Commerce'' in this example), which helps improve the accuracy of function identification.
After the TF-IDF weights of type words were obtained, all documents built from POI textual types were input into the LDA model with the calculated weights. The number of topics, K = 9, was set to the number of urban function classes. For each functional zone document, type words with a probability higher than 0.07 and similar urban functions within each topic were selected to infer the specific function of that topic, and type words indicating different urban functions within the topic were deleted. Finally, the topic with the highest distribution probability was selected as the target function. The urban function classification results within the urban zones are shown in Figure 11.
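The selection rule described above can be sketched in Python as two steps: listing, for each topic, the type words whose topic-word probability exceeds 0.07, and assigning each zone document the label of its most probable topic. The mapping from topics to function labels is done by inspection in the text, so `topic_labels` here is a hypothetical, manually supplied list, and the step that deletes inconsistent type words is not modeled.

```python
import numpy as np
from typing import List, Sequence


def topic_keywords(phi: np.ndarray, vocab: Sequence[str],
                   threshold: float = 0.07) -> List[List[str]]:
    """Type words whose probability in each topic exceeds the threshold."""
    return [[vocab[j] for j in np.flatnonzero(row > threshold)] for row in phi]


def assign_functions(theta: np.ndarray, topic_labels: Sequence[str]) -> List[str]:
    """Label each zone document with the function of its most probable topic."""
    return [topic_labels[k] for k in np.argmax(theta, axis=1)]


phi = np.array([[0.50, 0.45, 0.05], [0.02, 0.08, 0.90]])
print(topic_keywords(phi, ["School", "Kindergarten", "Mall"]))
```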

A. MODEL VALIDATION
A gridding method was used to verify the urban functional zone classification results. The study area was divided into grid cells with a resolution of 50 m, and the classification results of these cells were validated using high-resolution remote sensing images, Internet maps and street view images. Noise cells, such as roads and open spaces, were removed to extract the target cells. The outline and function of each urban functional zone were evaluated as accurate or inaccurate based on the remote sensing and street view images. When multiple types are mixed in one cell, we rely mainly on street view images for verification. Street view images are collected from the field and provide 360-degree panoramas that reflect the actual function types comprehensively. In mixed-function cells, the main function is identified from the information contained in the street view images, such as shop signboards and the locations of facilities. Identifying the main function is especially challenging for residential-commercial mixed functions, so the location of mixed-function pixels in the street view images is used as an important basis for identification. For example, mixed-function pixels in residential communities with densely distributed residential buildings are identified as residential, whereas those in commercial areas with densely distributed shops and restaurants are identified as commercial.
The model validation is divided into two parts: (1) identifying mixed function zones in specific areas and (2) assessing the overall accuracy of the entire study area based on stratified random sampling.
In the first part, four sites, Shenzhen Art School, Shenzhen Sports Center, Shenzhen Children's Paradise and Xiasha Cultural Square, were selected for urban functional zone verification. The verification areas were divided into grid cells, as shown in Figures 12 (c). The verification indicates that the selected sites are correctly identified by the proposed ANCC model. Figure 12 shows that the roads around Shenzhen Art School are sparse while the textual information in POIs is rich, indicating that the distribution patterns of both road networks and POIs contributed to the improved classification accuracy. Figure 15 shows that Xiasha Cultural Square contains dense roads and near-road POIs, indicating that its urban functional zones are strongly restricted by road networks. With the TW-LDA model, Children's Paradise and the Sports Center are accurately identified in areas with highly mixed POIs. In addition, Figures 13 and 14 show that the park area and cultural area are accurately classified among commercial-function POIs, revealing that the proposed TW-LDA model can reduce textual information biases. The verification results for the four sites are shown in Table 3. The accuracies for Shenzhen Art School, Shenzhen Sports Center, Shenzhen Children's Paradise and Xiasha Cultural Square are 86.49%, 76.16%, 77.30% and 81.17%, respectively, indicating that the proposed ANCC model can effectively identify urban functional zones where urban functions are mixed.
In the second part, the entire experimental area is divided into 50 m × 50 m grid cells, and 12,371 valid cells are obtained for validation, as shown in Figure 16. Using stratified random sampling, cells are selected for each class in proportion to the classified urban functions. The verification results are shown in Table 4. It can be seen that the accuracies of urban functions, including residence, commerce and business functions, are relatively lower
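A minimal Python sketch of the grid-based validation: assigning points to 50 m cells, drawing a stratified random sample with allocation proportional to each classified function, and computing overall accuracy against reference labels. The grid origin, the rounding rule for allocation (which can make the sample size differ slightly from the target), and the availability of reference labels as a dictionary (in the study they come from imagery and street view inspection) are assumptions; the names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def grid_cell(x: float, y: float, size: float = 50.0,
              x0: float = 0.0, y0: float = 0.0) -> Cell:
    """Index of the size-by-size cell containing (x, y), origin at (x0, y0)."""
    return int((x - x0) // size), int((y - y0) // size)


def stratified_sample(predicted: Dict[Cell, str], total: int, seed: int = 0) -> List[Cell]:
    """Sample cells per predicted class, proportional to class size (at least 1)."""
    rng = random.Random(seed)
    strata: Dict[str, List[Cell]] = defaultdict(list)
    for cell, cls in predicted.items():
        strata[cls].append(cell)
    sample: List[Cell] = []
    for cls in sorted(strata):  # fixed order keeps the draw reproducible
        cells = sorted(strata[cls])
        k = max(1, round(total * len(cells) / len(predicted)))
        sample.extend(rng.sample(cells, min(k, len(cells))))
    return sample


def overall_accuracy(sample: Sequence[Cell], predicted: Dict[Cell, str],
                     reference: Dict[Cell, str]) -> float:
    """Share of sampled cells whose predicted function matches the reference."""
    return sum(predicted[c] == reference[c] for c in sample) / len(sample)
```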

B. MODEL COMPARISON
To evaluate the proposed ANCC model, two comparative models are used: a block-level model and a non-adaptive network-based model. In the block-level model, road networks are used to delineate urban functional zones. The non-adaptive network-based model applies network-constrained KDE with a fixed bandwidth, which ignores how the restricting effect of the road network varies across the study area.
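To illustrate the non-adaptive baseline, the sketch below computes a fixed-bandwidth, network-constrained KDE. It makes several simplifying assumptions. POIs are snapped to road nodes, and density is evaluated only at nodes, whereas network KDE is usually evaluated on short edge segments (lixels). A quartic kernel shape is used without its normalizing constant, and distances are shortest-path lengths found with Dijkstra's algorithm. The graph and all function names are hypothetical.

```python
import heapq

Graph = dict[str, list[tuple[str, float]]]


def add_edge(graph: Graph, a: str, b: str, length: float) -> None:
    """Add an undirected road segment of the given length (metres)."""
    graph.setdefault(a, []).append((b, length))
    graph.setdefault(b, []).append((a, length))


def shortest_paths(graph: Graph, source: str, cutoff: float) -> dict[str, float]:
    """Dijkstra's algorithm: network distances from source, ignoring nodes beyond cutoff."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in graph[u]:
            nd = d + length
            if nd <= cutoff and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def fixed_network_kde(graph: Graph, poi_nodes: list[str], h: float) -> dict[str, float]:
    """Relative POI density at each road node, using the same bandwidth h for every POI."""
    density = {node: 0.0 for node in graph}
    for p in poi_nodes:
        for node, d in shortest_paths(graph, p, cutoff=h).items():
            u = d / h
            density[node] += (1.0 - u * u) ** 2  # quartic kernel shape; constant omitted
    return density


# Hypothetical road: A-B-C-D with 100 m segments; two POIs at A and one at B.
road: Graph = {}
for a, b in [("A", "B"), ("B", "C"), ("C", "D")]:
    add_edge(road, a, b, 100.0)
density = fixed_network_kde(road, ["A", "A", "B"], h=150.0)
print(density)
```

Node D lies more than h away (along the road) from every POI, so it receives zero density. A single bandwidth tuned to dense areas can leave sparsely networked areas empty in this way.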
Using the same experimental area and the same stratified random sampling, the urban functional zones produced by the three models are displayed in Figure 17.
The classification results of the block-level model are shown in Figure 17 (a). Because this approach treats road networks as zone boundaries, the delineated spatial units contain more mixed urban functions and fail to capture the road structures. Figure 17 (b) displays the classification results of the non-adaptive network-based model. With a fixed bandwidth, functional zones tend to be generated in areas of high road density, while those in areas with sparse networks are ignored. For example, downtown areas with high-density roads are identified as continuous urban zones that are not delimited by the road structure. Moreover, most urban zones were classified as commercial because of the biased allocation of POI textual information. Because residential and industrial functions tend to lie far from the road network, relying solely on network-constrained KDE biases the extraction of functional areas where roads are absent. Figure 17 (c) shows the urban functional zones classified by the ANCC model. The adaptive bandwidth strategy makes it possible to identify functional areas flexibly by integrating POI aggregation with the road distribution.
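The ANCC's multilevel bandwidth selection strategy is not detailed in this section. The sketch below therefore substitutes a common adaptive scheme, k-nearest-neighbour bandwidths, to show why adapting the bandwidth helps in sparse areas. It is not the paper's rule. Each POI's bandwidth is its network distance to the k-th nearest other POI, clamped to hypothetical bounds `h_min` and `h_max`. The same simplifications as a node-only network KDE apply: POIs are snapped to nodes and a quartic kernel shape is used. All names and the graph are hypothetical.

```python
import heapq

Graph = dict[str, list[tuple[str, float]]]


def add_edge(graph: Graph, a: str, b: str, length: float) -> None:
    """Add an undirected road segment of the given length (metres)."""
    graph.setdefault(a, []).append((b, length))
    graph.setdefault(b, []).append((a, length))


def shortest_paths(graph: Graph, source: str) -> dict[str, float]:
    """Dijkstra's algorithm: network distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in graph[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def knn_bandwidths(graph: Graph, pois: list[str], k: int, h_min: float, h_max: float) -> list[float]:
    """Per-POI bandwidth = network distance to its k-th nearest other POI, clamped to [h_min, h_max]."""
    bandwidths = []
    for i, p in enumerate(pois):
        dist = shortest_paths(graph, p)
        others = sorted(dist.get(q, float("inf")) for j, q in enumerate(pois) if j != i)
        kth = others[k - 1] if len(others) >= k else float("inf")
        bandwidths.append(min(max(kth, h_min), h_max))  # unreachable -> h_max
    return bandwidths


def adaptive_network_kde(graph: Graph, pois: list[str], bandwidths: list[float]) -> dict[str, float]:
    """Relative POI density at each road node, each POI using its own bandwidth."""
    density = {node: 0.0 for node in graph}
    for p, h in zip(pois, bandwidths):
        for node, d in shortest_paths(graph, p).items():
            if d < h:
                u = d / h
                density[node] += (1.0 - u * u) ** 2  # quartic kernel shape; constant omitted
    return density


# Hypothetical road: a dense cluster A-B-C (50 m segments) and a sparse arm C-D-E (400 m).
road: Graph = {}
for a, b, length in [("A", "B", 50.0), ("B", "C", 50.0), ("C", "D", 400.0), ("D", "E", 400.0)]:
    add_edge(road, a, b, length)
pois = ["A", "B", "C", "E"]
bw = knn_bandwidths(road, pois, k=1, h_min=100.0, h_max=1000.0)
adaptive = adaptive_network_kde(road, pois, bw)
fixed = adaptive_network_kde(road, pois, [100.0] * len(pois))  # one bandwidth for all POIs
print(bw, adaptive["D"], fixed["D"])
```

The isolated POI at E receives a wide bandwidth, so node D on the sparse arm gets non-zero density. With a single 100 m bandwidth for every POI, D gets none. Clusters of POIs keep narrow bandwidths, which preserves detail in dense areas.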
To assess the classification accuracies of the three models, the validation samples from the previous section were used. As shown in Table 5, the overall accuracies of the block-level model, the non-adaptive network-based model and the ANCC model are 53.10%, 59.20% and 77.10%, respectively. The accuracy of the proposed ANCC model is therefore 24.0 and 17.9 percentage points higher than those of the block-level model and the non-adaptive network-based model, respectively.
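The improvements reported from Table 5 are absolute differences in overall accuracy, that is, percentage points (pp). Expressed relative to each baseline, the same gains are larger, as the following arithmetic on the reported accuracies shows:

```latex
\begin{aligned}
\Delta_{\text{block}} &= 77.10 - 53.10 = 24.00\ \text{pp}, &
\frac{\Delta_{\text{block}}}{53.10} &\approx 0.452 \;(45.2\%),\\
\Delta_{\text{non-adaptive}} &= 77.10 - 59.20 = 17.90\ \text{pp}, &
\frac{\Delta_{\text{non-adaptive}}}{59.20} &\approx 0.302 \;(30.2\%).
\end{aligned}
```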

VI. CONCLUSION AND FUTURE WORK
This paper introduces an ANCC model for mapping urban functional zones. By utilizing POIs to indicate independent functional places, an adaptive road configuration approach with a multilevel bandwidth selection strategy is proposed to capture network-constrained morphological characteristics. A TW-LDA topic model is then designed to delineate urban functions from POI textual information. The proposed ANCC model is further evaluated by comparing it with a block-level model and a non-adaptive network-based model. Taking Futian District, Shenzhen, as a case study, the urban functional zones determined by this model show an overall accuracy of approximately 77.10%, which is 24.0 and 17.9 percentage points higher than those of the block-level model and the non-adaptive network-based model, respectively. The proposed ANCC model thus improves on the accuracy of traditional urban functional zone mapping methods. However, the current research has several limitations. The proposed ANCC model relies heavily on the spatial distribution of the road network, so missing or erroneous road network data can bias the delineated functional zones. Moreover, places where several urban functions are mixed can be misidentified. The limited semantic information in POIs also prevents them from fully representing the human activities related to urban functions. In addition, road network types and accessibility have significant impacts on urban function allocation [34]. Different types of roads, such as main roads, auxiliary roads and community roads, can exert different constraint effects on urban zones. In future work, we will explore the heterogeneous patterns of road networks and incorporate multisource semantic data to improve the accuracy of urban function retrieval.

JIE SONG received the B.S. degree in geographic information science from Shandong Normal University, China, in 2014, where he is currently pursuing the M.S. degree.
His research interests include the spatial analysis of land use/land cover and applied urban modelling.
HANFA XING received the Ph.D. degree in cartography and geographical information engineering from Central South University, China, in 2012. He is currently working with the College of Geography and Environment, Shandong Normal University. His research interests include urban landscape analysis, human activity mining, and applied urban modelling.
HUANXUE ZHANG received the master's and Ph.D. degrees in cartography and remote sensing from the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China, in 2014 and 2017, respectively.
She is currently a Lecturer with the College of Geography and Environment, Shandong Normal University, Jinan, China. Her research interests include agriculture monitoring with remote sensing, including crop identification, crop acreage estimation, and so on.
YUETONG XU received the Ph.D. degree from the Department of Earth Sciences, Nanjing University, China, in 1994. He is currently working with the College of Geography and Environment, Shandong Normal University. His research interests include tectonic geochemistry, sulfide deposits, and Arctic glaciology.
YUAN MENG received the M.S. degree in cartography and geographical information science from Shandong Normal University, China, in 2019. She is currently pursuing the Ph.D. degree with The Hong Kong Polytechnic University. Her research interests include the spatial analysis of land use/land cover, built environment, and human behavior patterns.
VOLUME 9, 2021