Simultaneous Update of High-Resolution Land-Cover Mapping Attempt: Wuhan and the Surrounding Satellite Cities Cartography Using L2HNet

Land-cover mapping is important for urban planning and management, but current land-cover mapping products cannot meet the needs of cities because the land surface changes frequently. In this study, we generate a high-resolution land-cover mapping product for Wuhan and its surrounding areas based on the low-to-high network (L2HNet). We adopt a simplified L2HNet, removing the confident area selection and the L2H loss module to shorten the cycle time of the entire mapping process. The mapping process uses ESA LandCover (2021) as low-resolution labels and Google Maps as high-resolution remote sensing images. In the experiments, we calculate four indicators, namely mean intersection over union (MIoU), overall accuracy (OA), frequency weighted intersection over union (FWIoU), and Kappa, evaluate the accuracy of our product in predicting fine feature structure using a point-based test method, and compare it with six mainstream land-cover mapping products. The product reaches 1m-resolution in the study areas while maintaining an average MIoU of 75.21%. OA, FWIoU, and Kappa all remain above 85.00%, showing strong prediction results. In the quantitative analysis, compared with ESA LandCover (2021), the L2HNet product significantly improves the mapping accuracy for built-up and permanent water, including a 21.08% improvement in permanent water accuracy and a marked improvement in built-up accuracy. The comparison with mainstream products also supports the credibility and practicality of the product. The result of this research fills a gap in 1m-resolution land-cover mapping products for Wuhan and its surrounding areas. While significantly improving the product's resolution, L2HNet makes time- and labor-saving periodic mapping a reality.

could serve as the primary information sources for managing water resources [1] and the classification of forest species [2]. Land-cover mapping directly provides the appearance characteristics of natural landscapes and an informative description of human development, which is of great significance in applications such as agricultural research, infrastructure planning, and resource development planning [3]. In many respects, land-cover information is updated only intermittently, from the local to the global scale, while the demand for regularly updated high-precision land-cover information is growing rapidly [4]. For example, climate change and natural disasters cause land-cover maps to fall out of date rapidly, so there is an urgent demand for constantly updated high-resolution (HR) land-cover information. Therefore, efficient large-scale land-cover mapping has great significance for social development. In the early stage, low-resolution (LR) land-cover products were generally produced, since low/medium spatial resolution images were the only ones available. With the improvement of the spatial resolution of remote sensing images [5], [6], high spatial resolution images have become much easier to access, which makes it possible to constantly update HR land-cover products and to markedly increase the accuracy of land-cover classification.
Since the late 1980s, land-cover mapping with LR multispectral images has been a research focus [4], and a variety of machine learning methods have been used to tackle LR mapping tasks, including support vector machine (SVM) [7], decision tree (DT) [8], and random forest (RF) [9]. However, these pixel-based approaches take only the spectral or texture information of the LR images as input, so they do not consider enough neighborhood information and introduce noise into the mapping process.
Because of the lack of neighborhood information, researchers found that these pixel-based methods are not suitable for HR mapping tasks. In order to reduce noise in HR land-cover mapping, the object-based image analysis (OBIA) method was proposed. In the OBIA method, an object's shape, size, and spatial and spectral characteristics are incorporated directly into the classification process, leading to greatly improved mapping accuracy. Though the OBIA method has been applied in some research, the low efficiency of the manually derived rule-based classification scheme hinders its application to large-scale automatic land-cover mapping.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
With the boom in deep learning techniques, land-cover mapping based on convolutional neural networks (CNN) has been progressing at a dramatic pace [6], [10]. Using deep learning techniques, we can greatly reduce the economic and time costs and obtain large-scale HR land-cover products with higher accuracy in the era of remote sensing Big Data [11]. Wang et al. [12] combined OBIA with CNN and proposed a new object-scale adaptive convolutional neural network (OSA-CNN) for high-speed remote sensing image classification, which effectively improves the accuracy of image classification. Huang et al. [13] proposed a semitransfer deep convolutional neural network (STDCNN) approach to produce a land-cover mapping product with an overall accuracy (OA) of 91.25%, a Kappa coefficient of 0.903, and an OA of 80.00% and a Kappa of 0.78 for land-cover classification in Shenzhen, which is excellent mapping accuracy.
However, because supervised classification methods rely on training data collection, the mapping coverage of methods that need manually annotated features is severely limited by time-consuming and laborious manual annotation [14], [15].
Brown et al. [16] proposed a new method to continuously generate HR land-cover maps at 10 m spatial resolution on a global scale by using deep learning and Sentinel-2 images. In order to train the fully convolutional neural network, a great effort was made in collecting reference data. About 4000 Sentinel-2 images were manually annotated by experienced image interpreters. All annotators were asked to annotate at least 70% of the images within a maximum time of 60 min per image. This problem is common in deep learning-based HR land-cover mapping. Li et al. [17], [18] point out that change detection is a geospatial application for social good whose development is limited by the slow development of remote sensing image tagging technology and by outdated classification labels. They proposed a change cross-detection approach for multitemporal semantic change detection with weak, noisy, and LR labels based on label improvements and multimodel fusion. The suggested approach won the first place prize in both phases of the 2021 Data Fusion Contest's MSD track. Wang et al. [19] proposed a new coarse-to-fine deep learning-based land-use change detection method, which is also an attempt at fast and efficient land-cover mapping. Claudia et al. proposed an operating system for fast, efficient, and unsupervised automatic generation of HR large-scale land-cover maps [20]. Therefore, the pursuit of fast and efficient land mapping methods has become a new research focus.
Nowadays, the available HR land-cover products covering China are still limited. Using a global sample set at 30m-resolution from around 2015, Gong et al. created FROM-GLC30, which reached an accuracy of 72.43% [21]. More recently, by employing a pixel-based classification system on Landsat imagery, Zhang et al. [22] made a global product called GLC_FCS30 with an accuracy of 82.5%. Through a pixel- and OBIA-based classification method, GlobeLand30(2020) was made with an accuracy of 72.23% [23]. In the past three years, many institutions and scholars have proposed global land-cover products with higher resolution; the highest-resolution products are ESRI_LandCover (2021) [24], ESA_LandCover(2021) [25], and FROM_GLC10 [26]. In 2019, Gong et al. [21] employed RF classifiers to create a 10m-resolution global land-cover product called FROM-GLC10 (2019), which reached an accuracy of 75.9%. These products have a wide range of application scenarios and high reference value, and have played an important role in many fields. According to Mi et al. [27], GlobeLand30 has an important reference value for the updating of land-cover mapping used in change detection. Dong et al. [28] demonstrated the contribution of FROM-GLC10 to understanding Chinese agricultural systems. However, land-cover products with 30m- and 10m-resolutions still cannot meet the needs of local applications. At the city scale, urban planning and survey applications generally require land-cover products with higher resolution. The need for HR land-cover mapping is ubiquitous in the following application scenarios.
1) Urban land use mapping is an important but challenging task in the field of remote sensing. Although a variety of classification techniques have been created to acquire land use information in urban areas, their accuracy and efficacy are insufficient to meet the needs of real-world applications such as urban planning and land management [13]. Wu Shuosheng et al. [29] proposed a classification of detailed urban land use based on geometrical, textural, and contextual information of land parcels using a case study in Austin, Texas, where 50 parcel attributes were tested to classify nine urban land uses, overcoming the heterogeneity of urban land use categories in land use classification. However, because the GIS data necessary for land use classification are either not readily available or too expensive to acquire, the classification may not be applicable to rapidly expanding metropolitan areas in developing countries. L2HNet enables rapid mapping and can provide the latest land-cover mapping products for developing countries.
2) Points of Interest (POI) data can link the informal world of everyday human discourse and the formal world of geographic information system (GIS). According to the survey, there are no reports on the use of POI data and satellite data together to produce detailed land use maps. To fill this gap, Hu Tengyun et al. [30] developed a protocol to map urban land use using medium-resolution satellite images and POI data to determine detailed land use categories in urban areas. The global pattern of urban land use distribution was well reflected. However, the accuracy of this method is relatively low for most secondary urban land use categories. L2HNet can realize 1 m HR land-cover mapping with high accuracy, which reflects the land use situation of urban areas relatively accurately, and its short update cycle makes the product more timely for promoting urban development.
3) Land use/cover change is a key aspect of global environmental change and in a sense indicates the impact of human activities on the natural environment [31]. The University of Maryland Global Land Analysis and Discovery (GLAD) team developed and implemented an automated Landsat data processing system in 2020 that generates globally consistent analysis-ready data (GLAD ARD) as input for land-cover and land use mapping and change analysis. Together, the GLAD ARD dataset and the ARD analysis and characterization tools provide an end-to-end solution for national and regional users for Landsat-based complimentary natural resource assessment and monitoring [32]. L2HNet enables rapid cycle mapping and can monitor land-cover changes in real time, thus reflecting changes in the natural environment in a more timely manner.
Wuhan is the capital of Hubei Province, the only subprovincial city and megacity in the six central provinces, an important industrial base, a science and education base, and a comprehensive transportation hub in China.
This region is the location for the majority of Wuhan's universities as well as numerous high-tech businesses, well-known tourist attractions, a significant number of residential neighborhoods, and industrial zones, creating a complex urban environment that makes the land-cover product in this area more representative. In the process of Wuhan's development, urban planning, natural resource management, and epidemic prevention all need a comprehensive and detailed survey of land-cover. Therefore, it is significant to have a periodic HR land-cover product covering Wuhan and its surrounding cities.
As society's demand for HR land-cover mapping grows, the low-to-high task has advanced [17], [18]. Dong et al. proposed a solution that could produce 3m-resolution land-cover maps across the country without involving human effort, a major breakthrough in the low-to-high task [33]. In the context of the demand for higher resolution land-cover mapping, a low-to-high network for large-scale high-resolution land-cover mapping (L2HNet) [34] is used in this article, which breaks the barrier of resolution between the HR remote sensing images and the LR training labels. Technically speaking, L2HNet can create HR land-cover maps using only LR labeled data and HR remote sensing images. Because the mapping process does not need any HR labels or related auxiliary information, the method can accelerate the automatic updating pipeline of large-scale HR land-cover maps and greatly reduces the time spent on manual data annotation.
In this article, the confident area selection (CAS) module and the low-to-high (L2H) loss of the L2HNet method are removed while the RP-backbone of L2HNet is retained, in order to improve the efficiency of land-cover mapping. Based on the simplified L2HNet method, we generated a 1m-resolution land-cover product covering Wuhan and its surrounding cities. In order to evaluate the performance of the product based on the simplified L2HNet, we constructed ground-truth (GT) label maps and examined the classification quality of the product as well as attributes such as accuracy and image consistency. To assess the finer mapping results, a test point-based accuracy measurement method was also employed. We compared the output with other widely used land-cover mapping products to show its dependability. To the best of our knowledge, this is the first attempt at large-scale land-cover mapping with the L2HNet approach since it was proposed. Besides addressing the gap in 1m-resolution land-cover mapping in China, we are able to save time and effort in large-scale land-cover mapping.
The rest of this article is organized as follows: Section II explains the working mechanism of L2HNet and the method of obtaining time-saving land-cover products with 1m-resolution. Section III analyzes the performance of the obtained L2HNet land-cover mapping product and compares it with common land-cover mapping products. Finally, Section IV concludes this article.

II. METHOD
The study focuses on four cities in Hubei, including one provincial capital and three large cities, which represent a variety of urban sizes, landscape characteristics, and urbanization intensities.
This section will outline the workflow of HR (1 m) land-cover mapping. The whole workflow is comprised of three stages, as illustrated in Fig. 1. The following subsections will provide an introduction to related work and Fig. 1's three phases.

A. Related Work
In the complete L2HNet mapping process used in Li's experiments [34], a CAS module and an L2H loss are set up to utilize the unmatched LR labels more reasonably, given the mismatched resolution between the labels and the predicted product. The CAS module splits predictions into confident areas and vague areas according to the confidence probability maps for the subsequent calculation of the L2H loss. Finally, the L2H loss, including a modified weak-supervised cross entropy loss and a novel self-supervised dynamic vague area loss, is calculated based on the split areas and concentrated features. In extensive experiments in the USA, L2HNet has shown outstanding low-to-high mapping performance [34].
In our mapping process, we remove the CAS module and the L2H loss module in order to use a simplified L2HNet. Our experiments therefore modify the L2HNet method and are not exactly the same as Li's; the specific optimization strategies and the reasons for the modifications are as follows.
1) Optimization Strategies: In Li's experiment [34], in order to calculate the loss and further adjust the prediction outcomes, the prediction maps output by the RP-backbone were separated into confident areas and vague areas. In the simplified L2HNet method, the CAS module and the L2H loss module are removed, leaving only the RP-backbone. The output of the RP-backbone is then used as the actual prediction result.
2) Reasons for the Improvements: In Li's experiment [34], a 30m-resolution training label was used for the 1m-resolution land-cover mapping product. The resolution span between 30 and 1 m is large and there is a big difference between the products; therefore, the parts of the prediction that agree with the label carry a greater degree of confidence. The CAS module can extract the more confident areas of the prediction results and pass the results of the confident areas into the L2H loss module for calculation, while the vague areas are not passed into the L2H loss module. After such an operation, it is possible to better retain the parts with better prediction results and make a larger adjustment for the areas with less confidence. In this experiment, the training label is 10m-resolution, which matches the 1m-resolution prediction results well.
Most of the prediction results have a good degree of confidence, so even after dropping the CAS module and the L2H loss module, we can still get good prediction results. After eliminating these two modules, the simplified L2HNet method is optimized in terms of operation time and is more practical. The model is trained with a batch size of 16.
The RP-backbone is designed to reliably extract hierarchical features with HR representations from the images. We establish numerous RP-blocks in the RP-backbone, as seen in Fig. 2. In each block, we set three convolution layers with sizes of 1 × 1, 3 × 3, and 5 × 5 to sample the input map. We refer to the input, middle, and fusion feature maps of the $b$th block as $I^{(b)}$, $M^{(b)}$, and $F^{(b)}$, respectively. As shown in Fig. 2, the kernel numbers of the multiscale convolution layers are inversely proportional to their kernel sizes. The 1 × 1 kernels, which have the largest kernel numbers, encourage the block to extract feature details. Second, kernels with larger sizes (i.e., 3 × 3 and 5 × 5) provide the necessary neighboring information. The convolution kernels in each layer are set to the proportion in Fig. 2. Therefore, the parallel layers can scan $I^{(b)}$ with appropriate receptive fields, avoiding the reduction of feature resolution that excessive down-sampling of $I^{(b)}$ would cause. In particular, a 3 × 3 convolution input layer with four input channels (the bands R-G-B-NIR of images) and $C_I$ output channels is used to create the input feature map of the first block.
In the input layer, RP-backbone collects information from Google Maps images to form a dense input feature map, and then a large number of RP-blocks continue to integrate features while maintaining the size, channel, and resolution of the map in the process.
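The claim that the parallel 1 × 1, 3 × 3, and 5 × 5 layers keep the map size unchanged can be checked with the standard convolution output-size formula. The sketch below is illustrative only: it assumes stride 1 and "same"-style padding of (k − 1)/2, which the text does not state explicitly, and the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=None):
    """Spatial size after one convolution: floor((n + 2p - k) / s) + 1."""
    if padding is None:
        # Assumption: 'same'-style padding for odd kernels, p = (k - 1) / 2.
        padding = (kernel - 1) // 2
    return (size + 2 * padding - kernel) // stride + 1


chip = 312  # chip edge length used for training in this article

# Output size of each parallel branch of an RP-block under the assumptions above.
sizes = {k: conv_output_size(chip, k) for k in (1, 3, 5)}

# For contrast, a single stride-2 (down-sampling) 3 x 3 layer shrinks the map.
halved = conv_output_size(chip, 3, stride=2)

print(sizes, halved)
```

Under these assumptions all three branches return a map of the same size as the input, so their outputs can be fused without resampling, whereas a down-sampling layer would reduce the resolution.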

B. Database and Data Preprocessing
To train our model in the initial stage of this research, LR labels and HR remote sensing images are required, but suitable and available HR images for China are lacking. HR satellite images from commercial US satellites such as Worldview and Planet are available for large areas but are expensive, while images from commercial Chinese satellites such as Gaofen-2 and Jilin-1 are free for some areas but have incomplete coverage. Google Maps imagery with 1m-resolution, which can also give vector maps comprising information on international urban regions, is therefore chosen as the HR remote sensing imagery, and ESA_LandCover(2021) is chosen as the label source since it gives access to high-precision data for the most recent 10, 30, and 100 m products.
In the preprocessing module, since Wuhan and the surrounding cities are the focus of our investigation, we need to crop Google Maps to the province-scale first.
In the overall workflow, data preprocessing mainly includes two parts: 1) aligning and 2) cropping. For each city, the experiment first upsamples the 10m-labels and then aligns Google Maps and ESA_LandCover(2021). As shown in Fig. 3, every city is cropped into several tiles with a size of 6000 × 6000. Then, as training samples, every tile is randomly cropped into 200 chips with a size of 312 × 312. In order to avoid splicing gaps in the final map, adjacent images overlap by 156 pixels.
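The preprocessing steps described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: nearest-neighbour upsampling is an assumption (the text does not name the upsampling method), and the helper names `upsample_nearest`, `random_chip_origins`, and `crop` are hypothetical.

```python
import random


def upsample_nearest(label, factor):
    """Nearest-neighbour upsampling of a 2-D label grid given as a list of rows."""
    out = []
    for row in label:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def random_chip_origins(tile_size, chip_size, n_chips, seed=0):
    """Top-left corners of n_chips random chips that lie fully inside a tile."""
    rng = random.Random(seed)
    limit = tile_size - chip_size
    return [(rng.randint(0, limit), rng.randint(0, limit)) for _ in range(n_chips)]


def crop(grid, top, left, size):
    """Cut a size x size window out of a 2-D grid."""
    return [row[left:left + size] for row in grid[top:top + size]]


# Toy 2 x 2 label at 10 m becomes a 20 x 20 label at 1 m (factor 10).
lr_label = [[1, 2], [3, 4]]
hr_label = upsample_nearest(lr_label, 10)

# 200 random 312 x 312 chips per 6000 x 6000 tile, as stated in the text.
origins = random_chip_origins(6000, 312, 200)

# A window straddling all four upsampled label cells.
window = crop(hr_label, 5, 5, 10)
```

The same origins would be applied to the image tile and to the aligned, upsampled label tile so that each training chip has a matching label chip.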

C. L2HNet Training
At this stage, each model is trained separately. In order to obtain a better fitting effect, the experiments train each model using the corresponding city, which takes the various development stages and geographic dispersion of each city into account. Therefore, training models with variable feature patterns can be eventually obtained by this training method.

D. Mapping
After completing the model training, the experiment selects some 1m images from Google Maps as the input in the last stage. Because the model uses chips as input during training, the predicted chips need to be merged in the last stage to generate a city-scale image. As Fig. 4 shows, during the inference process, the experiment merges chip-scale predictions into tile-scale images and finally arranges the 6000 × 6000 tile-scale images in order to get city-scale images.
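One way to merge overlapping chip predictions into a tile is sketched below. The text does not specify the merging rule, so the per-pixel majority vote used here is an assumption, as are the helper names `window_origins` and `merge_chips` and the stand-in predictor `toy_predict`; the demonstration runs on a tiny 6 × 6 tile rather than a full 6000 × 6000 one.

```python
def window_origins(tile_size, chip_size, overlap):
    """Start offsets of sliding windows that cover the whole tile edge."""
    stride = chip_size - overlap
    origins = list(range(0, tile_size - chip_size + 1, stride))
    if origins[-1] != tile_size - chip_size:
        origins.append(tile_size - chip_size)  # final window flush with the edge
    return origins


def merge_chips(tile_size, chip_size, overlap, predict_chip, n_classes):
    """Merge overlapping chip predictions by per-pixel majority vote (assumed rule)."""
    votes = [[[0] * n_classes for _ in range(tile_size)] for _ in range(tile_size)]
    starts = window_origins(tile_size, chip_size, overlap)
    for top in starts:
        for left in starts:
            chip = predict_chip(top, left, chip_size)  # chip_size x chip_size class ids
            for r in range(chip_size):
                for c in range(chip_size):
                    votes[top + r][left + c][chip[r][c]] += 1
    return [[max(range(n_classes), key=px.__getitem__) for px in row] for row in votes]


# Toy tile: class 1 on the right half, class 0 on the left half.
truth = [[1 if c >= 3 else 0 for c in range(6)] for _ in range(6)]


def toy_predict(top, left, size):
    # Hypothetical stand-in for the trained model: returns the true labels.
    return [row[left:left + size] for row in truth[top:top + size]]


merged = merge_chips(6, 4, 2, toy_predict, n_classes=2)
full_tile_starts = window_origins(6000, 312, 156)
```

With the chip size of 312 and overlap of 156 given in the preprocessing description, every interior pixel is covered by more than one chip, which is what lets the merge step hide seams between chips.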
It takes two to three weeks to complete the mapping procedure at the province-scale. The only step that requires manual operation is data download. Preprocessing, training, and mapping are all highly automated, which makes the mapping process time-saving and labor-saving.

III. EXPERIMENT
The results for Wuhan and three additional representative cities are provided in Fig. 5. These products demonstrate that the characteristics of the land-cover can be well depicted by the L2HNet, leading to satisfactory classification results. The L2HNet product can be used to distinguish the smaller or finer structures that are visible in contemporary urban environments. The product not only offers improved building recognition accuracy, but its higher resolution also provides more detail than existing land-cover products. This is of strategic importance for the growth of towns and cities as well as for land use.

A. Experiment Setting
Fig. 5 also displays the classification system applied in this investigation. Eleven land-cover classes are adopted: tree cover, shrubland, grassland, cropland, built-up, bare/sparse veg, snow and ice, permanent water, herbaceous wetland, mangroves, and moss and lichen. This legend is modified from the legend of ESA_Landcover (2021), which has the same land classes but different colors. In this study, the model is implemented on GeForce RTX 3090 GPUs. The optimizer and the criterion are set as AdamW and cross-entropy loss, respectively. The weight for the loss function is determined based on the frequency of occurrence of each category. The initial learning rate is set at 0.01, and it is reduced to 10% of the current learning rate when the loss stops decreasing after two epochs. The model was trained for 80 epochs with a batch size of 16. Additionally, the random seed is set to 0 in order to ensure that the trained model is reproducible. The equipment and parameters of the training process are shown in Table I.
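The two training rules above, frequency-based class weights and a learning rate cut to 10% on a plateau, can be illustrated without any deep learning framework. The sketch makes two assumptions that the text does not confirm: weights are taken as normalized inverse frequencies, and a "plateau" means two consecutive epochs without a new best loss. The names `inverse_frequency_weights` and `PlateauSchedule` and the pixel counts are hypothetical.

```python
def inverse_frequency_weights(pixel_counts):
    """Class weights proportional to 1 / frequency, normalized to sum to 1 (assumed rule)."""
    total = sum(pixel_counts.values())
    raw = {name: total / n for name, n in pixel_counts.items() if n > 0}
    norm = sum(raw.values())
    return {name: w / norm for name, w in raw.items()}


class PlateauSchedule:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=0.01, factor=0.1, patience=2):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.stale = loss, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor  # reduce to 10% of the current rate
                self.stale = 0
        return self.lr


# Made-up pixel counts for three classes, for illustration only.
weights = inverse_frequency_weights(
    {"tree cover": 600, "built-up": 300, "permanent water": 100}
)

schedule = PlateauSchedule(lr=0.01, factor=0.1, patience=2)
history = [schedule.step(loss) for loss in [1.0, 0.8, 0.8, 0.8, 0.7]]
```

In this toy run the rarest class receives the largest weight, and the learning rate drops once, after the loss has failed to improve for two epochs in a row.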

1) Qualitative Analysis:
In order to better demonstrate the effectiveness of the L2HNet-based land-cover mapping product for the Wuhan region, the following four metrics were computed for the product.

c) Frequency weighted intersection over union (FWIoU):
FWIoU is a slight enhancement on MIoU. Compared to MIoU, FWIoU sets weights for each class based on its frequency of occurrence, which makes it an important evaluation metric in practical semantic segmentation problems.
d) Kappa coefficient:
The Kappa coefficient is a measure of classification accuracy that quantifies the degree of agreement between two models when judging the same image; the results usually range from 0 to 1 and are interpreted as follows: 0.0-0.20 very low agreement (slight), 0.21-0.40 fair agreement (fair), 0.41-0.60 moderate agreement (moderate), 0.61-0.80 high agreement (substantial), and 0.81-1 almost perfect agreement (almost perfect).
The MIoU and the OA indicate the accuracy in categories and pixels, respectively. The FWIoU combines the category occurrence frequencies to assess the accuracy of the prediction results. The Kappa coefficient measures the extent to which two models agree when judging the same image. These four metrics have a wide range of applications in the quality analysis of semantic segmentation tasks [35], [36], [37]. During the evaluation, we replaced one of the models with the GT result. The research randomly selected 20 sets of images as test samples. The coefficients calculated from the test samples are provided in Table II. The accuracy throughout the 20 images is quite consistent, suggesting reasonably reliable mapping outcomes.
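For reference, the four metrics can all be computed from a single confusion matrix using their standard definitions. The sketch below is not the evaluation code of this study; the function name `segmentation_metrics` and the 2 × 2 example matrix are made up for illustration.

```python
def segmentation_metrics(cm):
    """OA, MIoU, FWIoU, and Kappa from a confusion matrix.

    cm[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    gt = [sum(cm[i]) for i in range(n)]                          # pixels per GT class
    pred = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # pixels per predicted class
    diag = [cm[i][i] for i in range(n)]

    oa = sum(diag) / total

    ious, fwiou = [], 0.0
    for i in range(n):
        union = gt[i] + pred[i] - diag[i]
        if union == 0:
            continue  # class absent from both GT and prediction
        iou = diag[i] / union
        ious.append(iou)
        fwiou += gt[i] / total * iou  # weight each IoU by GT class frequency
    miou = sum(ious) / len(ious)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(gt[i] * pred[i] for i in range(n)) / (total * total)
    kappa = (oa - expected) / (1 - expected)

    return {"OA": oa, "MIoU": miou, "FWIoU": fwiou, "Kappa": kappa}


# Toy two-class example with 100 pixels.
metrics = segmentation_metrics([[50, 10], [5, 35]])
print(metrics)
```

MIoU averages the per-class IoU with equal weight, while FWIoU weights each class by its share of ground-truth pixels, so the two diverge most when rare classes are poorly predicted.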
The average MIoU score across all images in these test samples is 75.21%, the FWIoU and Kappa scores are always above 80%, with average values of 87.28% and 87.26%, respectively, and the average OA score is 85.23%. Individual images scored extremely well, demonstrating the excellent mapping results of the L2HNet product. Meanwhile, the overall variance of each indicator is low, indicating that our mapping results are stable across land-cover classes. In order to visually represent the predicted results of the product, we have selected five of the test samples and show their close-up images and GT labels in Fig. 6. In the five selected images, it is easy to visualize the prediction effect of the L2HNet product in towns as well as in the countryside. In densely built-up areas, the L2HNet product offers a significant improvement in accuracy over LR land-cover mapping products. Since buildings are an important component of urban scenes, the accuracy of the L2HNet product for them can be regarded as satisfactory. The average FWIoU (93.40%) and Kappa (93.14%) scores in these five images are extremely good. This indicates that common land classes in urban scenes as well as in suburban scenarios can be predicted well in the L2HNet product. Although in the figure we still find some errors in the predictions for some water bodies, there is little variation in the accuracy among the 20 images, implying fairly robust mapping results. It is important to note that, when comparing the GT labels with the predicted images, we found that roads play a significant role in 1m-resolution land-cover mapping. As a result, we highlighted the obvious roads in red on the GT labels, even though no such land class appears in our prediction results.
2) Quantitative Analysis: In the four cities, a total of 900 test points were generated. Specifically, 300 test points were generated in Wuhan, and the other three cities contain 200 test points each. To avoid large measurement errors due to small sample sizes and to better verify the validity of the product, land-cover classes that account for a relatively large proportion of the test points should be selected for testing. Fig. 7 shows the percentage of each land type among the 900 test points. Based on these percentages, tree cover, built-up, cropland, and permanent water account for the larger proportions of all land-cover classes. Since the point generation process in the experiment is random, a few test points are generated at the edge of the land-cover mapping product, and these points are considered invalid. The number of invalid points is small and has no effect on the experimental results. Table III indicates the accuracy of these four land-cover classes in each city. The bolded parts of the table indicate where the L2HNet product outperformed ESA_LandCover (2021). In the Wuhan region, the product exhibits a significant improvement in the accuracy of the four land-cover classes that make up the larger proportion of all classes. Hubei's provincial capital city, Wuhan, contains a wide variety of dense urban structures. Built-up features are frequently poorly detected in the LR product. In addition, in very fine point-based measurements, the results of the LR product frequently fail to meet expectations. The accuracy of built-up in the L2HNet product is 63%, a significant improvement over ESA_LandCover(2021)'s accuracy of 7%. The accuracy for cropland has dropped, but at 80% it is still at a high level.
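The point-based test can be summarized as a per-class accuracy over valid test points. The following sketch assumes each test point is stored as a pair of reference class and mapped class, with `None` marking an invalid point at the product edge; this data layout, the function name `per_class_point_accuracy`, and the toy points are hypothetical and are not the study's data.

```python
def per_class_point_accuracy(points, classes):
    """Share of valid test points of each reference class that the map labels correctly."""
    hits = {c: 0 for c in classes}
    totals = {c: 0 for c in classes}
    for reference, mapped in points:
        if mapped is None or reference not in totals:
            continue  # invalid point (e.g. at the product edge) or class not under test
        totals[reference] += 1
        hits[reference] += int(reference == mapped)
    # Classes with no valid points are left out rather than reported as 0.
    return {c: hits[c] / totals[c] for c in classes if totals[c] > 0}


classes = ["tree cover", "built-up", "cropland", "permanent water"]

# Toy points: (reference class, class read from the mapping product).
toy_points = [
    ("built-up", "built-up"),
    ("built-up", "cropland"),
    ("cropland", "cropland"),
    ("permanent water", "permanent water"),
    ("permanent water", None),       # invalid point, ignored
    ("tree cover", "tree cover"),
    ("grassland", "grassland"),      # class outside the four tested classes
]

accuracy = per_class_point_accuracy(toy_points, classes)
print(accuracy)
```

Running the same function on points sampled from two products (for example the L2HNet product and the LR label product) gives directly comparable per-class figures of the kind reported in Table III.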
The distribution of land-cover classes differs among cities. Fig. 8 shows the number of test points for each of the major land-cover classes in each city. The other three cities differ significantly from Wuhan, the provincial capital, in terms of urban development and building density. In Wuhan, where building density is high, the L2HNet product is far ahead of ESA_LandCover (2021) in terms of OA. In places where building density is low, the L2HNet product nevertheless maintains a robust level of accuracy, with an accuracy difference of no more than 2.5% compared to ESA_LandCover (2021) in each city. L2HNet's mapping outcomes in these cities continue to perform well in terms of building recognition, even reaching 100% accuracy in Ezhou. The data also demonstrate that the mapping outcomes for water bodies have improved, with an average accuracy that is 21.08% higher than that of ESA_LandCover(2021). Comparing 16 indicators across four cities demonstrates that L2HNet has a strong capability for producing HR mapping results. It is important to note that the L2HNet product delivers 1m-resolution land-cover mapping and can be updated periodically and swiftly, producing a time-saving HR mapping result while preserving a more robust result.
3) Comparative Products: In Fig. 9, we illustrate five areas of Wuhan with a high number of fine features, such as dense areas of buildings, water bodies, forests, and suburbs, to demonstrate the mapping performance of the L2HNet product, and compare them with the current mainstream land-cover mapping products. In order to represent the semantic segmentation effect of the products more intuitively, we have made some adjustments to the legends of the mainstream products. For similar land classes, we use the colors corresponding to those in the L2HNet product. For land categories specific to other products, we kept the original legend colors. Fig. 10 shows the legend used for each product. The results in Fig. 9 show that the L2HNet product has robust and accurate mapping capacity in different kinds of situations while the resolution is increased. The legend in Fig. 10 illustrates that the L2HNet product has a moderate number of classes and provides more diverse land information based on higher resolution mapping results. The outlines of built-up areas can be clearly observed in the L2HNet product, and land-cover classes with larger areas, such as forests and cropland, are better represented. These classes are the main land-cover classes in the city, which means that the L2HNet product has made great progress in the land-cover mapping of the city.

C. Discussion
Based on Google Maps imagery and the ESA_LandCover (2021) dataset, the L2HNet product for Wuhan and its surrounding areas is produced in a relatively short time and discriminates land-cover classes well. Compared to the land-cover products mentioned above, the L2HNet product maintains a high level of accuracy while increasing resolution. The results suggest that the L2HNet product for Wuhan and its surrounding cities can meet the demand for analyzing the spatial patterns of cities.
To further evaluate the land-cover mapping results in the Wuhan area, Fig. 11 shows the L2HNet product alongside the ESA_LandCover (2021) mapping results for the same area. The L2HNet product shows powerful and accurate mapping capabilities and retains a greater degree of land-cover detail in its predictions. In areas with more detail, the L2HNet product not only has higher accuracy than other products but also fits the ground features closely and displays them in detail.

D. Limitations and Possible Improvements
Although Google Maps images are free and have wide coverage, the database contains mosaicked data acquired at inconsistent times. As shown in Fig. 12, a single image tile can contain imagery from three different acquisition times. Such temporal inconsistency in the Google images can lead to unreliable products.
Although Wuhan's land-cover mapping is somewhat representative of the distribution in most cities in China, individual cities differ in development direction and management policies, which can lead to differences in the distribution of land classes in each city's land-cover mapping product. For example, there are large differences in land distribution between highly modern cities and forest-dominated tourist cities. In this study, the weight of each land-cover class is adjusted for each city in order to achieve more accurate mapping results. In large-scale land-cover mapping, adjusting the weights for each city is still a time-consuming and laborious process.
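One way to reduce the manual effort of per-city weight adjustment would be to derive initial weights automatically from each city's low-resolution label statistics. The sketch below uses inverse class frequency, which is a common heuristic and an assumption made here for illustration; it is not the weighting scheme used in this study, and the function name and pixel counts are hypothetical.

```python
from typing import Dict


def inverse_frequency_weights(pixel_counts: Dict[str, int]) -> Dict[str, float]:
    """Derive per-class weights from label pixel counts for one city.

    Rare classes receive larger weights. Weights are normalized so that
    their mean is 1.0. Classes with zero pixels are skipped.
    """
    present = {c: n for c, n in pixel_counts.items() if n > 0}
    if not present:
        raise ValueError("no labeled pixels")
    total = sum(present.values())
    raw = {c: total / n for c, n in present.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {c: w / mean_raw for c, w in raw.items()}


# Hypothetical pixel counts for one city (not data from the study).
counts = {"built-up": 50, "water": 25, "forest": 25}
weights = inverse_frequency_weights(counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```

Such automatically derived weights would only be a starting point; the differences between cities noted above may still call for manual tuning.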
Roads are commonly not considered in products with 10 or 30 m resolution, but they are clearly visible in 1m images. However, the L2HNet product does not achieve good classification results for the road class. Because of their similar spectral reflectance, roads are misclassified as cropland or bare/sparse vegetation. Therefore, open-source street maps could be a valuable complement to the labels.
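The idea of complementing the labels with street-map data can be sketched as a simple overlay. The example assumes that road geometries have already been rasterized to a boolean mask on the same grid as the label map; the function name `overlay_roads`, the class name "road", and the toy grids are hypothetical and not part of the study.

```python
from typing import List


def overlay_roads(
    labels: List[List[str]],
    road_mask: List[List[bool]],
    road_class: str = "road",
) -> List[List[str]]:
    """Return a copy of the label grid with road pixels relabeled.

    Assumes road_mask is aligned with labels (same rows and columns).
    """
    if len(labels) != len(road_mask) or any(
        len(l_row) != len(m_row) for l_row, m_row in zip(labels, road_mask)
    ):
        raise ValueError("labels and road_mask must have the same shape")
    return [
        [road_class if is_road else label for label, is_road in zip(l_row, m_row)]
        for l_row, m_row in zip(labels, road_mask)
    ]


# Toy 2 x 3 label grid and an aligned road mask (illustrative only).
labels = [
    ["cropland", "cropland", "built-up"],
    ["cropland", "bare", "built-up"],
]
road_mask = [
    [False, True, False],
    [False, True, False],
]
for row in overlay_roads(labels, road_mask):
    print(row)
```

In practice, the alignment between the rasterized roads and the 1m imagery would need to be checked, since any offset would introduce label noise along road edges.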
In subsequent experiments, we hope to complete the mapping of all of China after overcoming the shortcomings described above.

IV. CONCLUSION
Highly automated HR land-cover mapping is an important task as the demand for fine and extensive land-cover products grows dramatically. In this article, we propose a land-cover mapping product based on a simplified L2HNet neural network and describe its production process. Compared to traditional networks that require properly labeled data sources, such as UNet, Deeplab v3+, and HRNet [38], the mapping process based on L2HNet is more efficient because it does not demand large amounts of properly labeled data. During the mapping process, since the 10m land-cover maps still match well with the 1m images, the separate work of the RP-backbone ensures an efficient mapping process.
Comprehensive experiments on the Google Maps and ESA_LandCover (2021) datasets in the Wuhan area show the superiority of the proposed L2HNet product compared to other mainstream land-cover mapping products. For the L2HNet product, the average MIoU reaches 75.219%, while the OA reaches 85.737%. The FWIoU of 87.289% and the Kappa of 87.267% further illustrate the accuracy and high consistency of the land-cover mapping, respectively. In areas such as dense buildings, L2HNet yields finer semantic segmentation results, and the OA in the quantitative analysis is improved by 20% compared to ESA_LandCover (2021).
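For reference, the four indicators reported above can all be derived from a single confusion matrix. The following sketch uses the standard textbook definitions of OA, MIoU, FWIoU, and Cohen's Kappa; it is not the evaluation code of this study, and the function name and the example matrix are hypothetical.

```python
from typing import Dict, List


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute OA, MIoU, FWIoU, and Kappa from a confusion matrix.

    cm[i][j] is the number of samples whose reference class is i and
    whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    row_sums = [sum(cm[i]) for i in range(n)]  # reference counts per class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    diag = [cm[i][i] for i in range(n)]

    oa = sum(diag) / total
    ious = []
    fwiou = 0.0
    for i in range(n):
        union = row_sums[i] + col_sums[i] - diag[i]
        if union == 0:
            continue  # class absent from both reference and prediction
        iou = diag[i] / union
        ious.append(iou)
        fwiou += (row_sums[i] / total) * iou  # weight by reference frequency
    miou = sum(ious) / len(ious)

    # Expected agreement by chance, used by Cohen's Kappa.
    pe = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 0.0
    return {"OA": oa, "MIoU": miou, "FWIoU": fwiou, "Kappa": kappa}


# Hypothetical two-class confusion matrix (not data from the study).
example = [[50, 10], [5, 35]]
for name, value in segmentation_metrics(example).items():
    print(f"{name}: {value:.4f}")
```

OA and Kappa summarize agreement over all samples, while MIoU and FWIoU average the per-class overlap, which is why the four indicators can diverge when class sizes are unbalanced.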
Overall, based on the analysis and comparison in this article, the proposed L2HNet product is considered an effective HR land-cover mapping product. The mapping process in Wuhan and surrounding areas has demonstrated the feasibility and high accuracy of the L2HNet network for land-cover mapping, filling the gap of 1m-resolution land-cover mapping in Wuhan. Future work on the L2HNet network will focus on national and global scale data. With high-quality HR images and a wide selection of LR land-cover products, we will continue to produce HR land-cover mapping products for China.

Yan Huang is currently working toward the bachelor's degree in electronic information engineering with Wuhan University, Wuhan, China.
His research interests include semantic segmentation, image processing, and deep learning.
Yuqing Wang is currently working toward the bachelor's degree in communication engineering with Wuhan University, Wuhan, China.
She joined the artificial intelligence special class, Wuhan University. Her research interests include deep learning and image processing.
Ms. Wang was the recipient of the first prize scholarship from the School of Electronic Information, Wuhan University.
Zhanbo Li is currently working toward the bachelor's degree in communication engineering with the College of Electronic Information Technology, Wuhan University, Wuhan, China.
His research interests include in-depth machine learning and land cover mapping.
Mr. Li was the recipient of the second prize scholarship from the School of Electronic Information, Wuhan University.
He is currently a Senior Engineer with the School of Electronic Information, Wuhan University. His research interests include machine vision and image processing.