Using POI Data and Baidu Migration Big Data to Modify Nighttime Light Data to Identify Urban and Rural Area

The spatial difference between urban and rural areas is a direct result of urban-rural relations. Accurately identifying urban and rural areas helps in assessing the urban-rural mechanism and in promoting integrated urban-rural development. Previous studies used only single-source nighttime light (NTL) data to identify urban and rural areas, and the large differences in light brightness are likely to bias the identification results. Therefore, starting from NTL data and a data-level fusion algorithm, this study separately fuses NTL data with point of interest (POI) data, which represent the quantity and distribution of urban infrastructure, and with Baidu migration (BM) big data, which represent changes in regional population mobility, and then identifies urban and rural areas using a deep learning method. The results show that the highest accuracy of urban-rural spatial identification with single NTL data is 84.32% (kappa 0.6952), while the highest accuracy achieved with data fusion is 95.02% (kappa 0.8259). The differences caused by light brightness are thus effectively corrected by data fusion, which greatly improves the accuracy of urban and rural spatial identification. By comparing NTL data modified by different big data and evaluating the resulting identification accuracy with a deep learning method, this study enriches research on data fusion in urban areas and provides a basis for analyzing regional urban-rural relations and urban-rural development. This study is therefore believed to have important practical value for the coordinated development of urban and rural areas.

phenomenon [42], [43], which means that the areas with nighttime light are larger than the actual urban areas [44], [45]. Therefore, studies that aim to improve the extraction accuracy of NTL data have been conducted [46].

Compared with traditional data, big data not only has the advantages of a wider sampling range and faster collection speed [47], but has also been found to fit the internal structure of cities well [48], [49]. At present, three kinds of spatial location big data are widely used in studies of urban space: POI data [50], cellular signaling data, and population migration data [51], [52]. Among these, POI data are a dataset in which different urban entities are abstracted as points in virtual geographical space, so that urban functions with different attributes can be represented [53], [54]. POI data are used to delineate urban boundaries and to identify urban centers with different functions [55]. BM data reflect population flow between regions by showing the degree of population agglomeration in a specific period on the map [56]. Additionally, the spatial vitality of different regions can be analyzed through the population heat provided by BM data [57]. It can therefore be concluded that there is a strong relevance between POI data and NTL data in urban-related studies, and researchers have fused POI data and NTL data in the hope of observing urban space better after data fusion [58].

Although the information provided by different data differs significantly, it is becoming increasingly hard for single-source data to accurately represent the complex information within cities, so it has become increasingly popular to use data from different sources to modify NTL data in urban studies [59]. Data fusion refers to the fusion of multiband information from the same sensor, or of information from different types of sensors, to remove the redundancy and contradiction that might exist among sensors and to improve the timeliness and reliability of remote sensing information extraction, so as to obtain a clearer, more robust, and more reliable estimation and judgment than single-source data allow [60], [61]. To identify urban and rural areas more accurately, this study attempts to fuse POI data and BM data with NTL data and to apply a deep learning method, making up for the deficiency of single-source data. Then the results of using the different data to compensate NTL data are verified. Finally, feasible methods and paths for urban and rural spatial identification are proposed. By accurately identifying urban and rural space, this study on the one hand enriches the theoretical study of urban-rural differences. On the other hand, accurate identification of urban and rural areas contributes to the accurate judgment of urban-rural differences, thus providing a theoretical foundation for regional governance and policy-making on harmonious urban-rural development.

The data used in this study are mainly NDVI data, Luojia-01 NTL data, POI data and BM data.

The NDVI data are obtained from MODIS with a spatial

POI data refer to the point dataset in a networked electronic map, in which each point carries four attributes: name, address, coordinates and category [31]. At present, numerous map companies, such as Baidu Maps, Amap and QQ Maps, provide developers with API (application programming interface) access services, which allow users to obtain many kinds of data. By accessing the API provided by Baidu Maps (www.baidumap.com), this study collected POI data for Zhengzhou in December 2021, comprising 22 categories and 623,354 records. After screening, duplicate checking, filtering and cleaning, 16 categories and 385,632 records remained. The quantity and spatial distribution of POI data in Zhengzhou are shown in Figure 3.

BM data can directly represent the spatial distribution, density, and variation trend of the regional population through the color depth and brightness of the data, which makes them an important data source for reflecting regional population change. Since BM data have different spatial resolutions at different levels, the 130 m spatial resolution of the POI data and NTL data is adopted in this study. The migration data from January to December 2020 were obtained by accessing the Baidu Map API. After averaging the obtained data, the population heat distribution in the main urban area of Zhengzhou is obtained, as shown in Figure 4.

VOLUME 10, 2022

gradual increasing trend from suburbs to urban centers. When the value of NTL-NDVI is zero, the area is mostly the transition zone between urban centers and suburbs. Taking this transition zone as the critical point, the closer an area is to the urban center, the greater its saturation degree. Therefore, it can be concluded that the distribution trend of composite urban population density has higher modification value for NTL data on the premise of reflecting urban spatial heterogeneity.
where f(t) is the signal vector, ϕ(t) is the wavelet function, α controls the scaling of the wavelet function, τ controls the translation of the wavelet function, and b is the parameter.
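For orientation, the continuous wavelet transform of a signal is commonly written in the form below, using the symbols defined above for a real-valued wavelet. This is the textbook definition, offered only as an assumption about the general shape of Formula 3 and not as a reproduction of it. The label WT is introduced here for illustration, and the parameter b does not appear in this standard form.

```latex
WT(\alpha, \tau) = \frac{1}{\sqrt{\alpha}} \int_{-\infty}^{+\infty} f(t)\, \phi\!\left(\frac{t - \tau}{\alpha}\right) \mathrm{d}t
```

Read this way, τ shifts the wavelet along the signal and α stretches or compresses it, and the integral is the inner product of the signal with the shifted and scaled wavelet.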

The main idea of Formula 3 is first to translate the wavelet function by τ, and then to take its inner product with the analysis signal f(t) at different scales α, which achieves multi-scale image fusion. In the actual fusion process, the original images, such as the NTL data, POI data and BM data, are decomposed by wavelet into a series of sub-images in different high and low frequency bands, which reflect the local features of the image. Then, the high and low frequency components are processed by different fusion rules. Finally, the fused images are obtained by inverse wavelet transform.

The identification of urban and rural spatial scope is essentially the classification and extraction of urban and rural spatial characteristics, and deep learning has obvious advantages in image feature extraction. Firstly, the main feature and advantage of deep learning is that it can greatly improve the interpretation of data by learning the inherent laws and representation levels of sample data so as to achieve accurate extraction of image

The main process of urban and rural spatial identification is shown in Figure 5.

The ideas of this study are as follows: (1) perform desaturation on NTL data to obtain desaturated NTL data, and

The NTL data modified by the CEANI index are shown in Figure 6. Comparing Figure 6 with Figure 2, it can be found that the oversaturation of NTL data has been corrected to a certain extent, and the internal contour and spatial heterogeneity of the urban area have been highlighted, especially in the main areas of population activity with high nighttime light values, such as Jinshui District, the high-speed railway stations and the university town.

and Xingyang District. Therefore, it can be seen from the distribution of high and low NTL values that there is a significant spatial difference in the development of Zhengzhou's urban-rural area.
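The wavelet fusion steps described for Formula 3 (decompose each image, fuse the low and high frequency components with separate rules, then invert the transform) can be sketched as follows. This is a minimal illustration, not the study's implementation. The single-level Haar wavelet, the averaging rule for the low-frequency band, the maximum-absolute-value rule for the detail bands, and the tiny `ntl` and `poi` grids are all assumptions made for the example, and the function names are hypothetical.

```python
def haar2d(img):
    """Single-level 2D Haar decomposition of a grid with even height and width."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # low-frequency approximation
            rows[1].append((a - b + d - e) / 2)  # detail across columns
            rows[2].append((a + b - d - e) / 2)  # detail across rows
            rows[3].append((a - b - d + e) / 2)  # diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: rebuild the full-resolution grid from the four bands."""
    out = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        out += [top, bottom]
    return out


def fuse(img_a, img_b):
    """Fuse two equally sized grids in the Haar domain (assumed fusion rules)."""
    bands_a, bands_b = haar2d(img_a), haar2d(img_b)
    # Assumed rule for the low-frequency band: average the two approximations.
    ll = [[(p + q) / 2 for p, q in zip(ra, rb)]
          for ra, rb in zip(bands_a[0], bands_b[0])]
    # Assumed rule for the detail bands: keep the coefficient of larger magnitude.
    details = [
        [[p if abs(p) >= abs(q) else q for p, q in zip(ra, rb)]
         for ra, rb in zip(band_a, band_b)]
        for band_a, band_b in zip(bands_a[1:], bands_b[1:])
    ]
    return ihaar2d(ll, *details)


# Made-up 4 x 4 grids standing in for co-registered NTL and POI density rasters.
ntl = [[9, 9, 1, 0], [9, 8, 0, 0], [2, 1, 0, 0], [1, 0, 0, 0]]
poi = [[5, 6, 4, 1], [6, 7, 3, 0], [3, 2, 0, 0], [1, 1, 0, 0]]
fused = fuse(ntl, poi)
for row in fused:
    print(row)
```

Averaging the approximation band blends the overall intensity of the two sources, while keeping the stronger detail coefficient preserves whichever source has the sharper local edge. Other fusion rules are equally plausible, and a production workflow would more likely rely on an established wavelet library than on hand-written transforms.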

In this study, U-net is used to extract image features, and the identified urban-rural area is shown in Figure 7. Figure 7 shows that the area identified by NTL data mainly has the following features. First, the urban and rural areas identified by NTL data are 512.69 and 1514.51 km², respectively, accounting for 25.29% and 74.71% of the whole urban-rural area of Zhengzhou. The urban space identified by NTL data is smaller than the rural area, accounting for 66.33% of the whole urban-rural area of Zhengzhou. Second, the identified urban area is mainly concentrated in Erqi District, Guancheng District, Zhongyuan District and Jinshui District, while the identified rural area is mainly concentrated

Guancheng District, Zhongyuan District and Jinshui District, there are only single urban area clusters in both Xingyang city and Xinzheng District. Third, from the perspective of the identified urban areas, not only did the number of rural clusters identified within urban clusters decrease significantly, but the degree of patch fragmentation and the complexity of urban-rural boundaries also decreased. In general, the urban-rural area identification results obtained with POI_NTL data are significantly improved.

In this study, BM data and NTL data are first fused by wavelet transform, and the fused image is shown in Figure 9.a; the features of the fused image are then extracted by U-net, and the identified urban and rural area is shown in Figure 9.b. Figure 9.b shows that the area identified after the fusion of NTL data with BM data mainly has two features. First, the urban and rural areas identified by the fused BM_NTL data are 561.37 and 1465.83 km², respectively, accounting for 27.69% and 72.31% of the whole urban-rural area of Zhengzhou. The urban space identified by BM_NTL data is bigger than that identified by POI_NTL

Zhengzhou's whole urban built-up area. Therefore, it can be concluded that, in terms of the area of the identified urban space, although the area identified by the fused BM_NTL data is the largest, the improvement that BM_NTL data bring to urban area identification is not as obvious as that of POI_NTL data.
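The area shares quoted for the NTL and BM_NTL results follow directly from the reported areas. The short check below uses only the figures given in the text; the function name `area_shares` is introduced here for illustration.

```python
def area_shares(urban_km2, rural_km2):
    """Return the urban and rural shares (in percent) of the combined area."""
    total = urban_km2 + rural_km2
    return (round(100 * urban_km2 / total, 2),
            round(100 * rural_km2 / total, 2))


ntl_shares = area_shares(512.69, 1514.51)     # areas identified by NTL data
bm_ntl_shares = area_shares(561.37, 1465.83)  # areas identified by BM_NTL data

print(ntl_shares)     # (25.29, 74.71)
print(bm_ntl_shares)  # (27.69, 72.31)
```

Both pairs of areas sum to the same total of 2027.20 km², so the two results are shares of one and the same study area.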

From the identification results of NTL data, POI data and BM data, it can first be seen that the three kinds of data are highly similar in macro geographical space and directly reflect the urban internal spatial structure, which helps to distinguish urban and rural space more clearly. Secondly, from the urban center to the edge of the city and on to the rural areas, the values of the three kinds of data all show a declining trend. Additionally, the urban and rural areas identified by POI_NTL data are similar to those identified by BM_NTL data. In general, different data can accurately identify the urban internal spatial structure.

By comparing the urban and rural areas identified by different data (Figure 10), it can be found that, since NTL data carry only the single attribute of nighttime light brightness, they identify areas with high light values as urban and areas with low or no light as rural, which causes errors in the identification results to a certain extent. Specifically, on the one hand, only a small amount of light is generated at night in residential and commercial areas inside the city, resulting in obvious light holes in urban space, and these light holes are often identified as rural space by NTL data. On the other hand, NTL data often show discontinuity in areas where the light value declines rapidly; that is, the light value changes sharply at the urban-rural edge, which makes the urban-rural boundary identified by NTL data more complicated and inconsistent with the actual situation of urban-rural spatial development. Additionally, in the process of identifying urban and rural areas, the level of the NTL value cannot reflect the possible connections between urban and rural areas, especially the mutual flow of population.

After the fusion of POI data, the fused POI_NTL data also take the concentration of POIs into account when identifying urban-rural areas, rather than considering only nighttime light brightness. Figure 10 shows that the fused POI_NTL data have the following two characteristics. First, the number of interlaced rural patches decreases; that is, the overall patch fragmentation improves. The reason is that POI data are also widely distributed in some areas with weak light at night, which weakens the impact of light holes. Secondly, near the main roads connecting urban and rural areas, strong light is generated at night owing to the needs of urban development, forming a high concentration of NTL values around the roads; however, there are almost no POIs next to these roads, which reduces the complexity of the identified urban-rural boundary. In short, the fused POI_NTL data are essentially still a static element and need to be supplemented by the mutual flow of dynamic elements within the urban-rural area.

rural patches identified by different data, it can be found that the number of urban spatial clusters identified by NTL data is the largest, and each cluster is relatively isolated. Although the number of urban patches identified by the fused POI_NTL data is significantly reduced, there is no connection between the patches, while the urban patches identified by the fused BM_NTL data are clearly connected by linear spaces. The reason is that population flow within the region is often connected, and such virtual connections in geographical space are identified as urban area.

In general, although NTL data, POI data and BM data can all represent urban spatial structure, they play different roles. The advantage of NTL data is that they distinguish urban and rural areas through differences in light brightness, but those same differences in turn affect the identification results. The fused POI_NTL data and the fused BM_NTL data consider not only the level of urban development but also the actual conditions of different cities, including infrastructure distribution and population flow, which greatly modifies NTL data and allows urban and rural areas to be identified better.

2) PRECISION VERIFICATION OF IDENTIFICATION RESULTS
Since the changing state of urban and rural area, it is very

Table 2 is the proportion of all the pixels that have been successfully verified. The kappa coefficient is used to verify classification precision and to further check consistency. Its possible values range from −1 to 1, and the closer the value is to 1, the better the extraction.
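The two measures described above can be computed from a confusion matrix of verification pixels. The sketch below uses the standard definitions of overall accuracy and Cohen's kappa; the 2 x 2 matrix is made-up illustrative data, not the study's verification sample, and the function name is hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts pixels of reference class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of pixels on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative counts only: rows are reference (urban, rural), columns predicted.
example = [[40, 10],
           [5, 45]]
accuracy, kappa = accuracy_and_kappa(example)
print(f"accuracy = {accuracy:.2%}, kappa = {kappa:.4f}")
```

Kappa discounts the agreement that would be expected by chance from the class proportions alone, which is why it is lower than the overall accuracy for the same matrix.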

As can be seen from

In this study, the characteristics of NTL data, POI data and BM data within the urban-rural area are analyzed first. The advantages and disadvantages of identifying urban-rural areas by NTL data are then used as a reference for fusing POI data and BM data with NTL data, and a deep learning method is used to improve identification accuracy after data fusion. A comparative analysis of the urban-rural area identified by NTL data with the areas identified by the fused POI_NTL data and by the fused BM_NTL data shows that identification by data fusion is superior to identification by single-source data.

Although NTL data are one of the commonly used types of data in urban-related studies, their deficiencies lead to large errors in the results of urban-rural identification [78]. Moreover, NTL data suffer from light overflow and oversaturation. Therefore, considering the strong spatial correlation between big data and NTL data in urban space, researchers began to fuse urban big data with NTL data to improve accuracy in urban space [79]. It is most common to fuse POI data with NTL data to extract urban built-up areas, to delineate urban agglomeration boundaries, and so on. The accuracy of some studies even exceeds 95%, which indicates the high accuracy of data fusion in urban-related studies and makes a great contribution to the study of data fusion [80], [81]. With reference to these studies, this study verifies the previous conclusions through case analysis. The highest accuracy achieved in this study is 95.02%, which is close to the highest accuracy of other studies [82]; beyond that, this study further compares and analyzes the fusion of different big data with NTL data. The results show that, unlike the finding of other studies that the more data are used, the higher the accuracy, there is no significant difference between the two kinds of big data in how they modify NTL data in this study. This indicates that while big data modify NTL data, NTL data also modify the two kinds of big data, so the final difference is not obvious.

The traditional identification and extraction of urban-rural areas mainly depends on the subjective will of the government or on the use of socioeconomic and statistical data [83]. The extensive application of remote sensing data represented by NTL has made the identification of urban-rural area gradually