Land cover classification of resources survey remote sensing images based on segmentation model

Land type survey is an important task of land resources survey and the basis for the scientific management of land resources. As problems of population, resources, and the environment become increasingly prominent, there is an urgent need for a fast and accurate method of classifying large-scale land use and land cover from remote sensing (RS) data. Traditional pixel-based machine learning classification methods, such as maximum likelihood classification and random forests, achieve satisfactory results and are widely used. However, with the development of deep learning, how to classify multi-class land resources quickly and effectively from low and medium resolution RS images in practical applications still needs further research. Taking the land resource classification of the Tonghe medium resolution RS dataset from China's third land survey as an example, this paper screens and compares traditional machine learning classification methods with the semantic segmentation models FC-DenseNet56, GCN, BiSeNet, U-Net, DeepLabV3, AdapNet, and PSPNet, with the aim of selecting the optimal feature extraction model. The results show that the U-Net model reaches a classification accuracy of 93.62%, making it more accurate and effective than the traditional machine learning methods and the other semantic segmentation models. It is suitable for multi-class land cover classification of low and medium resolution RS images and performs well in practical application. In addition, the conclusions of this study can serve as a demonstration for large-scale land cover resource investigations using low and medium resolution RS images.


I. INTRODUCTION
Remote sensing (RS) data are widely used in land use and land cover (LULC) research, and automated image classification is one of the easiest and most preferred techniques for preparing LULC maps of a resource survey area [1]. With the progress of RS satellite observation technology and easier image acquisition, pixel-based multi-class classification of land resources has become an important research topic [2]. Over the past few decades, the traditional machine learning methods of maximum likelihood classification (MLC) and random forests (RF) have been fast and feasible RS image classification methods for LULC, but they place high demands on RS image quality [3]. Deep learning (DL), by contrast, improves the extraction of feature information for LULC classification, has a stronger learning ability, and places lower demands on RS image resolution. In RS image recognition, DL can derive complex, hierarchical, and non-linear features from data and overcome several limitations of traditional methods. Because of their modeling and learning capabilities, DL approaches represent the link between an image object and its real-world elements as closely as possible, capturing more real-world information [4], [5]. In some situations, the classification effect and accuracy of DL may exceed those of traditional machine learning methods. Recent research on DL methods mainly uses high resolution RS images for detailed classification. Although these methods can produce accurate land cover maps, whether DL classification methods relying only on medium and low resolution RS images can achieve large-scale land cover classification for land resources surveys remains to be explored.
In the actual LULC classification of resource surveys, the categories and boundaries of land resources in large-scale land cover surveys are usually marked manually from low and medium resolution satellite RS images, which demands complex engineering skills and relies strongly on expert experience. For these reasons, such methods are impractical for large-scale land cover resource surveys. To automate these surveys, we apply existing DL segmentation methods to low and medium resolution RS images to address the problems above.
Past LULC practice shows that RS image classification based on traditional machine learning is mature and stable. The most widely used LULC methods are MLC and RF. MLC is mainly applied in RS image classification to monitor changes in vegetation coverage [6], [7] and to analyze the causes of vegetation change or damage [8], [9]. However, some comparative analyses show that the accuracy of MLC is lower than that of other classification methods in some cases [10]. In contrast, the RF machine learning algorithm can achieve almost real-time monitoring with excellent results [11]. Some scholars combine optical and radar imagery and use the RF method for classification [12]. To improve classification accuracy, most of these studies rely on higher resolution RS data or additional auxiliary information. Although many new machine learning classification algorithms have been built on MLC and RF, a breakthrough in these algorithms seems unlikely [13]. Especially when RS images are of low or medium resolution and sufficient auxiliary information is lacking, DL classification methods may be more effective.
The use of RS image data with DL classification and semantic segmentation methods for the investigation, application, and analysis of LULC resources has attracted great interest from researchers in RS science. Articles published in recent years can broadly be divided into two lines of research. On the one hand, many scholars try to improve the classification, segmentation, or detection ability of the DL algorithms themselves: increasing the accuracy of multi-object classification, improving the resolution of details, and addressing problems such as insufficient training samples and unbalanced labeled samples [14], [15]. A popular current approach is to build on a classical convolutional neural network (CNN) and incorporate attention mechanisms into the design [16], [17]. Comparisons of experimental metrics are then used to show that the newly designed model has advantages [18]-[20]. On the other hand, applied LULC research that uses DL methods to solve actual problems in land resources science is scarce. Research on DL algorithms, including model improvement and redesign, does improve the automatic classification of RS images, but most of it consists of comparative experiments on high resolution public datasets. In many cases, the large-scale LULC resource surveys carried out by the government distribute only low and medium resolution satellite RS data, which are more complex and more challenging. The slight accuracy improvements achieved by algorithm research alone are of little significance for actual survey work. In the land resources survey of part of Heilongjiang Province, the LULC work still relies on RS data to manually mark the categories and boundaries of land resources.
Although expert experience, field investigation, and evidence collection can accurately delineate the boundaries of land resource categories, relying entirely on manually produced LULC classification maps greatly increases the workload and reduces work efficiency. Therefore, how to use existing classical DL semantic segmentation models for LULC classification of low and medium resolution RS images has become a growing concern for many geoscientists.
Many classical approaches have been proposed for scene classification. The most commonly used is the CNN, which has been widely applied to RS image semantic segmentation. Ronneberger et al. proposed the U-Net CNN model [21], whose network and training strategy rely on data augmentation to use the available annotated samples more effectively; improved networks based on it have since been widely used in image segmentation and classification [22]-[24]. Zhao et al. [25] exploited global context information through different-region-based context aggregation with the pyramid pooling module of their pyramid scene parsing network (PSPNet), which is well established and has been shown to perform well in image recognition [26]-[28]. Kipf and Welling proposed a semi-supervised learning method based on CNNs that operates directly on graphs [29]; this graph convolutional network (GCN) is an extension of the CNN. Compared with the classical CNN, the GCN is considered more effective in learning feature representations of graph-structured data [30]. To improve the performance of semantic segmentation models, researchers have designed a series of CNN frameworks in recent years, including AdapNet [31], [32], BiSeNet [33], DeepLabV3 [34], and FC-DenseNet [35]. The AdapNet architecture is considered to perform better than other end-to-end semantic segmentation networks. AdapNet is designed to run efficiently on platforms with dynamic resource allocation, as an adaptive runtime model for executing streaming applications on multiprocessor systems [36], while BiSeNet was proposed to improve both the speed and accuracy of real-time semantic segmentation.
Among these CNNs, AdapNet and BiSeNet are rarely used in RS image classification and recognition, while DeepLabV3 and FC-DenseNet are usually regarded as stable standard CNN models against which researchers compare and verify their own CNN frameworks. For instance, Zhang et al. proposed an improved algorithm for high spatial resolution images [37] and compared it with DeepLabV3 to demonstrate its improved accuracy. Lin et al. proposed a DL model based on the DeepLabV3 structure and compared the experimental results with FC-DenseNet to demonstrate improved segmentation accuracy [38]. Li et al. developed a deep adversarial network whose generator produces pixel-wise image classification maps using the FC-DenseNet model [39]. Gao et al. proposed a semantic segmentation network based on FC-DenseNet to recognize and map landslide disasters [40]. Guo et al. proposed a deep conditional generative adversarial network integrating FC-DenseNet for high resolution RS images [41]. To improve the semantic segmentation of land cover in high spatial resolution satellite RS images and obtain pixel-wise land cover classification, Khan et al. combined the advantages of DenseNet and U-Net in a hybrid DL model [42], which demonstrated state-of-the-art performance on public datasets. Clearly, most current DL methods for pixel-based image segmentation and classification in RS are improvements on the classical network models above. The suitability of these semantic segmentation models for LULC can be attributed to the following points. First, unlike hand-crafted features based on specific domain knowledge, they use deep feature extraction networks, usually with dozens to hundreds of layers.
Moreover, many experiments have shown that DL pixel-level segmentation models are computationally efficient and that the feature learning ability of CNNs is highly effective in large-scale object detection and image recognition. Most importantly, LULC algorithms based on semantic segmentation with these CNN models are representative, incorporating a series of improvements in feature extraction, upsampling, and feature fusion.
More and more researchers are paying attention to the application of DL methods in LULC resource surveys.
For example, researchers have studied generating classification maps of LULC changes over time [43]. Others use urban hyperspectral RS images for feature extraction and land cover classification [44]. A weakly-towards-strongly supervised learning framework for land cover classification has been proposed to cope with insufficient datasets [45]. Usually, large-scale high resolution RS images combined with an optimized network yield more accurate classification results [46], [47]. However, these studies require high resolution or hyperspectral RS images, and the national resource survey is not yet able to distribute such large-scale RS satellite data. At present, large-scale LULC resource surveys mainly depend on low and medium resolution RS images, so research on classification methods and implementation processes for such images better meets current needs. On actual RS data, CNNs with different structures perform differently, so it is worth exploring how existing segmentation or classification models can be used to automate large-scale LULC. To further improve RS scene classification, we compare the metrics of the classical networks above and, based on an analysis of their segmentation and classification abilities, select the model most appropriate for our medium and low resolution RS dataset. The major contributions of this paper are as follows:
1. The current state of semantic segmentation and classification of LULC resources, and the limitations of manual labeling and traditional machine learning classification methods, are discussed. We then explain why DL segmentation models are needed to segment and classify medium and low resolution RS images in nationwide large-scale land resources surveys, which mainly rely on such images.
2. Taking the Tonghe RS image of China's third land and resources survey as an example, the low and medium resolution RS image data are preprocessed and the manual classification results are verified. The traditional machine learning methods MLC and RF are used to segment and classify the LULC resources, and the results are briefly analyzed.
3. To select a segmentation model suitable for our low and medium resolution RS images, we systematically compare the performance of seven classical segmentation models to obtain the optimal model.
4. Finally, experimental results in the test area demonstrate the effectiveness and practicability of the selected segmentation model.
This paper focuses on the practical application of low and medium resolution satellite RS images to LULC. The experimental conclusions can serve as a demonstration for future large-scale LULC resource investigations.

II. BACKGROUND
A land survey is an important means of determining the distribution of land resources. Mastering the current state of and changes in land use, including land types, location, area, and distribution, is the main content of China's third land resources survey, in which the categories and distribution of farmland and forest are the key content. The RS database in this paper is part of China's land resources survey database. The purpose of this task is to improve the basic land use data and to master the current state of land-use change in natural resources.
In this section, we briefly introduce the preprocessing and statistics of the RS image data, as well as the field verification in the land resources survey. These form the basis for studying LULC methods based on DL segmentation and classification models.

A. REMOTE SENSING DATABASE
The study area, shown in Figure 1, is located in northeast China. The RS data are taken from the third land survey database of Heilongjiang Province and cover the whole of Tonghe County in Heilongjiang Province, between 128°09′ E and 129°25′ E and between 45°53′ N and 46°40′ N. The mountainous northern part of Tonghe County has dense vegetation and rich forests, and the southern part is the alluvial plain of the Songhua River. The ecological environment is healthy, the soil is fertile, and the county contains a national scenic spot. Therefore, investigating LULC resources here is of great significance for protecting the ecological environment and regulating cultivated land resources. The third land survey database of Heilongjiang Province is built from RS images taken by China's Gaofen and Resources satellites. It is mainly used for dynamic land use monitoring, mineral resources surveys, monitoring and evaluation of urban and rural planning, traffic network planning, forest resources surveys, and desertification monitoring. The land survey database used here includes satellite RS images taken by the Gaofen-2 and Resources satellite 3 observation systems. After processing, the RS image data consist of two parts, one with a resolution of 17 m and the other with a resolution of 20 m. Figure 1(a) shows the RS image after light correction, mosaicking, and clipping. In Figure 1(b), the land resource classification data are derived through on-screen manual delineation from the RS images. Manual labeling of land types is laborious and complex, but it is essential and fundamental. In the last step, the whole study area is labeled by land resource type, giving 34 different land resource types. To facilitate subsequent research, we cut the RS image map and the manually labeled LULC map into tiles together; as shown in Figure 1(c), this yields more than 5000 image-label pairs.
In particular, each cut image is 512 × 512 pixels, and the corresponding area is 512 m × 512 m.
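The joint cutting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `tile_pairs` is hypothetical, rasters are represented as nested Python lists with one element per pixel, and incomplete tiles at the right and bottom edges are simply dropped, which is an assumption. The essential point is that the image and the label map are cut with the same offsets, so every tile pair stays pixel-aligned.

```python
def tile_pairs(image, label, tile=512):
    """Cut an image and its label map into aligned, non-overlapping tiles.

    image, label: 2-D grids (lists of rows) of identical height and width.
    Returns a list of ((top, left), image_tile, label_tile) tuples.
    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    h, w = len(image), len(image[0])
    if len(label) != h or len(label[0]) != w:
        raise ValueError("image and label must have the same shape")
    pairs = []
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            # The same window is cut from both rasters, keeping them aligned.
            img_tile = [row[left:left + tile] for row in image[top:top + tile]]
            lab_tile = [row[left:left + tile] for row in label[top:top + tile]]
            pairs.append(((top, left), img_tile, lab_tile))
    return pairs


# Usage: a 10 x 10 toy raster cut into 4 x 4 tiles gives 2 x 2 = 4 pairs.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
label = [[(r * 10 + c) % 3 for c in range(10)] for r in range(10)]
pairs = tile_pairs(image, label, tile=4)
```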

B. VERIFICATION OF RESOURCES CLASSIFICATION
It should be noted that when the land cover classification map is drawn manually on the computer screen, investigators need to verify the LULC resource types on site. After an extensive field investigation, we obtained evidence of the LULC resource types in the study area. Figure 2 shows some of the evidence collection photos; each photo represents a different land resource class.
The pictures show that many land categories resemble one another, such as highways and urban roads, or buildings and facilities for various purposes. This fieldwork not only compensates for the limited resolution of medium and low resolution RS images but also verifies the accuracy of the manually labeled LULC types.

C. STATISTICS OF LAND COVER CLASSIFICATION
As shown in Table 1, the total area covered by the RS images is approximately 5661.54 km², of which growing arbor forest accounts for 68.69%, cultivated land for 25.55%, and all other categories together for 5.76%. Cultivated land can be further divided into paddy fields and upland fields, accounting for 19.17% and 6.38% of the area, respectively. From the perspective of traditional LULC, the land types in this area can be roughly divided into five categories: forests, rivers and lakes, arable land, roads, and buildings. More detailed multi-class classification clearly faces great challenges. The land resource statistics show that the categories are severely unbalanced: arbor forest, paddy field, and upland field together account for more than 90% of the region. As shown in Figure 3, 30 land cover classes each account for less than 1% of the study area. When DL training is dominated by easily classified background examples, segmentation and classification become inefficient. Therefore, an appropriate segmentation model must be selected through comparative tests on the class-imbalanced RS image data of the study area.
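Class shares like those reported above can be obtained by counting labeled pixels. The sketch below assumes that every pixel covers the same ground area, so pixel fractions stand in for area fractions; `class_proportions` and `rare_classes` are hypothetical helper names, and the toy tile is illustrative rather than data from the study area.

```python
from collections import Counter


def class_proportions(label_tiles):
    """Return {class_id: fraction of all labelled pixels} over a set of tiles."""
    counts = Counter()
    for tile in label_tiles:
        for row in tile:
            counts.update(row)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def rare_classes(proportions, threshold=0.01):
    """Sorted ids of classes whose share is below `threshold` (1% by default)."""
    return sorted(cls for cls, p in proportions.items() if p < threshold)


# Usage: one toy tile holding a single 100-pixel row of classes 0-3.
tiles = [[[0] * 69 + [1] * 26 + [2] * 4 + [3] * 1]]
p = class_proportions(tiles)
```

Here class 0 receives a share of 0.69, and `rare_classes(p, 0.05)` returns `[2, 3]`.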

III. CLASSIFICATION AND SEGMENTATION METHOD
As mentioned above, current LULC classification mainly relies on manual annotation or traditional machine learning methods to segment the categories and boundaries of land resources. This section is divided into two parts. First, we discuss the feasibility and effect of MLC and RF for LULC classification of low and medium resolution RS images. Then, because the traditional machine learning methods perform poorly at large-scale automatic classification, we discuss a general DL-based method for automatic LULC classification. We use the Environment for Visualizing Images (ENVI) software for supervised classification with the traditional MLC and RF methods, and build the DL segmentation and classification models with TensorFlow.

A. MAXIMUM LIKELIHOOD AND RANDOM FORESTS
Maximum likelihood and random forests have long been important methods for multi-category LULC classification of RS images. Because differences in image acquisition time affect the results of MLC and RF classification, we select only the Gaofen-2 satellite data of the region for supervised classification and comparative tests. We determine the classification system according to the actual land use situation and establish suitable training samples. The separability of the land types is assessed by calculating the Jeffries-Matusita (J-M) distance between samples. In the end, the land is labeled with five general types: forests, rivers and lakes, arable land, roads, and buildings. After supervised classification, majority/minority analysis is used to reduce noise. The final classification results are shown in Figure 4.
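The Jeffries-Matusita separability measure mentioned above can be sketched as follows. The sketch assumes that the training pixels of each class follow a Gaussian distribution with a diagonal covariance (one variance per band), which simplifies the full-covariance form; the function name is hypothetical. The J-M distance is computed from the Bhattacharyya distance B as 2(1 - e^{-B}), so it lies between 0 and 2, and values close to 2 indicate well-separated classes. Some texts use the square root of this quantity instead.

```python
import math


def jeffries_matusita(mean1, var1, mean2, var2):
    """J-M distance between two classes, each modelled as a Gaussian with a
    diagonal covariance (per-band variances); a simplifying assumption.

    Returns a value in [0, 2]; values near 2 indicate well-separated classes.
    """
    mean_term = 0.0
    log_det_avg = log_det1 = log_det2 = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = (v1 + v2) / 2.0                  # averaged covariance (diagonal)
        mean_term += (m1 - m2) ** 2 / v
        log_det_avg += math.log(v)
        log_det1 += math.log(v1)
        log_det2 += math.log(v2)
    # Bhattacharyya distance: mean-separation term + covariance term.
    b = mean_term / 8.0 + 0.5 * (log_det_avg - 0.5 * (log_det1 + log_det2))
    return 2.0 * (1.0 - math.exp(-b))
```

Identical class statistics give 0, and two single-band classes with unit variance and means 0 and 2 give B = 0.5, that is, J-M = 2(1 - e^{-0.5}).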
The overall accuracy of MLC is 86.7%, while that of RF is 90.3%. Both methods are effective for forest categories. However, in the areas marked on the map, their results easily confuse roads, buildings, and arable land, and they have even more difficulty identifying rivers and lakes between cities and towns. The effect of MLC or RF land classification also depends on human operation, but image resolution has a major impact on the classification results.
In fact, after many experiments with MLC and RF, we were still unable to classify the Resources satellite 3 RS images. These two methods also have disadvantages when extracting feature information from Gaofen-2 RS images through supervised classification. They tend to ignore sporadic small-area features, which leads to some mixed and incorrect classifications. They also require RS images of high precision and quality, and classifications of images from different seasons deviate considerably.
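The majority analysis used for noise reduction replaces each pixel with the most frequent class in its neighborhood. The sketch below is a plain-Python approximation of that idea, not ENVI's implementation: the 3 × 3 window, the tie rule (the center class is kept if it is among the most frequent), and the name `majority_filter` are assumptions. It also illustrates why sporadic small-area features are easily lost: an isolated pixel surrounded by a single other class is overwritten.

```python
from collections import Counter


def majority_filter(class_map, radius=1):
    """Replace each pixel by the most frequent class in its (2r+1) x (2r+1)
    window, clipped at the image border. If the centre class is among the most
    frequent, it is kept (an assumed tie rule)."""
    h, w = len(class_map), len(class_map[0])
    out = [row[:] for row in class_map]
    for i in range(h):
        for j in range(w):
            window = Counter(
                class_map[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            )
            top = max(window.values())
            if window[class_map[i][j]] < top:
                # Among other tied classes, the first one encountered wins.
                out[i][j] = max(window, key=window.get)
    return out


# Usage: an isolated "road" pixel (class 1) inside forest (class 0) disappears.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1
smoothed = majority_filter(grid)
```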
In the areas outlined by dotted lines in Figure 4, the classification results of the two traditional machine learning methods, MLC and RF, differ greatly from the actual land resource types. Most of the arable land is identified as buildings, and both methods classify the mountain shadow areas of the RS images poorly. Such results clearly cannot meet the needs of multi-class LULC resource classification in this area.

B. THE BASIC ARCHITECTURE OF THE ALGORITHM
As shown in Figure 5, the preprocessed RS image data first pass through the residual network ResNet-50 [48]. ResNet-50 consists of convolution and pooling layers and uses shortcut connections. It is simple and practical and is widely used in object detection, classification, and recognition. For the Tonghe RS image dataset, ResNet-50 is selected as the backbone because it can effectively extract the key information.
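The shortcut connection mentioned above adds the input of a block back to the output of its convolutional branch, y = ReLU(F(x) + x). The plain-Python sketch below models the branch F as an arbitrary function on a feature vector, a hypothetical stand-in for ResNet-50's 1 × 1, 3 × 3, 1 × 1 bottleneck convolutions, and shows only the identity shortcut; when the shapes differ, ResNet uses a projection shortcut instead.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def residual_block(x, branch):
    """Identity-shortcut residual unit: y = ReLU(F(x) + x).

    `branch` is a hypothetical stand-in for the convolutional layers F and must
    return a vector with the same length as x (otherwise ResNet would use a
    projection shortcut)."""
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires matching shapes")
    return relu([a + b for a, b in zip(fx, x)])


# If the branch learns F(x) = 0, the unit passes non-negative inputs unchanged.
print(residual_block([1.0, 2.0], lambda v: [0.0] * len(v)))  # [1.0, 2.0]
```

The design choice is that each block only has to learn the residual relative to the identity mapping, which eases the optimization of very deep networks.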
As noted in Section II, the land cover categories are severely imbalanced, so it is uncertain which segmentation model best suits our dataset. We therefore train seven classical convolutional neural networks. The RS image data are fed into the CNN as three RGB channels, and the corresponding output targets are the manually labeled land resource types. The experiments run independently on seven computers using GeForce RTX 3070 Laptop GPUs. After the learning rate, activation function, optimizer, and other parameters and hyperparameters were tuned many times, training the network models took about 20 days.
The following implementation details of the experiment deserve emphasis.
To cope with the severe imbalance between positive and negative samples during training, we use the Focal Loss [49] as a unified loss function. It is derived from the cross-entropy (CE) loss for binary or multi-class classification. The form of the Focal Loss we adopt is as follows:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}$$
where y is the ground-truth label, p is the model's estimated probability for the class with label y = 1, and α_t is defined analogously to p_t. This α-balanced form is used in all experiments, as it yields slightly improved accuracy over the non-α-balanced form.
After tuning, we found that α_t = 0.25 and γ = 5 give the best results. Compared with the CE loss, the core of the Focal Loss is the modulating factor (1 - p_t)^γ, which reduces the contribution of easily classified samples to the total loss so that hard samples receive more weight.
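A minimal sketch of the α-balanced Focal Loss above, using the α_t = 0.25 and γ = 5 reported in this paper as defaults, is shown below. It is written in plain Python for a single prediction rather than as the TensorFlow loss used in the experiments; the function names, the probability clipping constant, and the single scalar α in the multi-class variant are assumptions.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=5.0, eps=1e-7):
    """Alpha-balanced binary focal loss for a single prediction.

    p: predicted probability of the positive class; y: true label (1 or 0).
    """
    p = min(max(p, eps), 1.0 - eps)           # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy for a single prediction, for comparison."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)


def focal_loss_multiclass(probs, target, alpha=0.25, gamma=5.0, eps=1e-7):
    """Multi-class variant: p_t is the softmax probability of the true class.
    A single scalar alpha for all classes is an assumption of this sketch."""
    p_t = min(max(probs[target], eps), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

Relative to the CE loss, an easy positive example with p = 0.9 is scaled by 0.25 × 0.1^5 = 2.5 × 10^-6, whereas a hard one with p = 0.1 keeps a factor of 0.25 × 0.9^5 ≈ 0.148, so hard samples dominate the total loss. Setting α = 1 and γ = 0 recovers the plain CE loss.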
The choice of optimization algorithm is also an important part of the model. To overcome the shortcomings of plain gradient descent, we select adaptive moment estimation (Adam) [50] as the optimizer. Adam is a first-order optimization algorithm that can replace traditional stochastic gradient descent (SGD). It iteratively updates the neural network weights based on the training data; the learning rate is uniformly set to 10^-3 during training.
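The Adam update can be sketched as follows for a parameter vector stored as a Python list. The hyperparameters β1 = 0.9, β2 = 0.999, and ε = 10^-8 are the defaults from the original Adam paper and are assumed here, since the text only specifies the learning rate of 10^-3; `adam_minimize` is a hypothetical name.

```python
import math


def adam_minimize(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a function with Adam, given a callable returning its gradient
    (a list) at the current parameter list."""
    x = list(x0)
    m = [0.0] * len(x)   # first-moment (mean) estimate of the gradient
    v = [0.0] * len(x)   # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            m_hat = m[i] / (1 - beta1 ** t)   # bias correction of the moments
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

For example, when minimizing (x - 3)^2 from x = 0, the first step moves the parameter by almost exactly the learning rate, because the bias-corrected ratio of the moment estimates is close to 1 in magnitude; this per-parameter normalization is what distinguishes Adam from plain SGD.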
The number of training epochs has to be chosen separately for each dataset to prevent underfitting and overfitting. In this paper we set it to 200, at which point training performs best without overfitting.

C. DATA AUGMENTATION
Because training samples are limited and naturally sampled data are likely to follow a long-tailed distribution, it is difficult to balance the samples of different categories. To mitigate the long-tailed distribution, data augmentation is usually adopted during training to increase data diversity, prevent overfitting, and give the neural network good generalization ability. In the experiment, we adopted four data augmentation techniques: random flipping, random cropping, random rotation, and random translation. Random flipping flips the original image vertically or horizontally. Randomly cropping regions of interest from the RS images adds random perturbation. Random rotation by some angle or translation by some distance alters the data relative to the original image, increasing the number of samples without changing the label values.
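The four augmentations can be sketched as below. The essential detail is that each random transform is drawn once and applied to both the image tile and its label map, so the pixel-to-label correspondence is preserved. Assumptions of this sketch: grids are nested lists, rotation is restricted to multiples of 90° (arbitrary angles would require resampling), pixels shifted in by translation are filled with 0 in the image and with a hypothetical ignore label of 255 in the label map, and all function names are illustrative. In a TensorFlow pipeline, functions such as `tf.image.flip_left_right` and `tf.image.rot90` play the same role.

```python
import random


def flip(grid, horizontal):
    """Mirror left-right (horizontal=True) or top-bottom (horizontal=False)."""
    return [row[::-1] for row in grid] if horizontal else [row[:] for row in grid[::-1]]


def rot90(grid):
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def translate(grid, dy, dx, fill):
    """Shift content down by dy and right by dx; uncovered pixels get `fill`."""
    h, w = len(grid), len(grid[0])
    return [[grid[i - dy][j - dx] if 0 <= i - dy < h and 0 <= j - dx < w else fill
             for j in range(w)] for i in range(h)]


def crop(grid, top, left, size):
    return [row[left:left + size] for row in grid[top:top + size]]


def augment(image, label, crop_size, max_shift=8, ignore_label=255, rng=random):
    """Apply one random flip / rotation / translation / crop to BOTH the image
    tile and its label map, so every pixel keeps its land-cover label."""
    if rng.random() < 0.5:
        horizontal = rng.random() < 0.5
        image, label = flip(image, horizontal), flip(label, horizontal)
    for _ in range(rng.randrange(4)):              # 0, 90, 180 or 270 degrees
        image, label = rot90(image), rot90(label)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    image = translate(image, dy, dx, fill=0)              # fill value assumed
    label = translate(label, dy, dx, fill=ignore_label)   # hypothetical ignore id
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_size)
    left = rng.randint(0, w - crop_size)
    return crop(image, top, left, crop_size), crop(label, top, left, crop_size)
```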

D. DIVISION OF THE DATA SET
The dataset of this experiment is divided into two mutually exclusive subsets with the common hold-out method, which randomly samples the dataset in a given proportion.
The larger subset is used as the training set and the other as the test (validation) set. During the split, the proportion of samples of each category must be kept consistent across the subsets. This preserves the data distribution and prevents bias introduced by the split from affecting the evaluation results. Practical experience in neural network training shows that allocating two-thirds to four-fifths of the whole dataset to the training set is a reasonable scheme. Our training set accounts for 70% and the validation set for 30%.
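A stratified hold-out split of this kind can be sketched as follows. Because a 512 × 512 tile usually contains several land cover classes, this sketch assumes that each tile is assigned to a stratum by a key such as its dominant class; that choice, the seed, and the name `stratified_holdout` are hypothetical and not taken from the experiments.

```python
import random
from collections import defaultdict


def stratified_holdout(samples, key, train_fraction=0.7, seed=0):
    """Hold-out split that keeps each stratum's train/validation ratio.

    key(sample) returns the stratum of a sample, for example the dominant
    land-cover class of a tile (an assumption: a tile usually holds several
    classes, so exact per-pixel stratification is not possible this way).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[key(s)].append(s)
    train, val = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = round(len(group) * train_fraction)
        train.extend(group[:n_train])
        val.extend(group[n_train:])
    return train, val


# Usage: 100 hypothetical tiles labelled by their dominant class.
tiles = ([("forest", i) for i in range(70)]
         + [("paddy", i) for i in range(20)]
         + [("road", i) for i in range(10)])
train, val = stratified_holdout(tiles, key=lambda t: t[0])
```

Each stratum is split 70/30 on its own, so even a small class such as the 10 road tiles contributes 7 training and 3 validation samples.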

IV. PERFORMANCE COMPARISON OF DL MODELS
In this section, before comparing the classification results of the models, we describe the major metrics used to evaluate the DL algorithms: accuracy, precision [51], recall [52], and IoU [53].
Accuracy measures the proportion of observations for which the model makes a correct prediction. It is defined as follows.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision and Recall are defined as follows.

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP and TN denote true positive and true negative. Similarly, FP and FN represent false positive and false negative.
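IoU (Intersection over Union) is considered the most popular metric:
$$\mathrm{IoU} = \frac{|B \cap B_{gt}|}{|B \cup B_{gt}|}$$
where B_{gt} is the ground truth and B is the predicted box (for pixel-wise segmentation, the predicted region of a class).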
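The metrics above can be computed from a pixel-level confusion matrix, as in the following plain-Python sketch (function names are hypothetical). For segmentation, B and B_gt are the sets of pixels predicted as, and labeled as, a given class, so IoU = TP / (TP + FP + FN). Precision, recall, and IoU are computed one-vs-rest for each class, and the overall accuracy is the trace of the matrix divided by the number of pixels. Returning None for zero denominators and averaging IoU only over classes that occur are assumptions of this sketch.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _ratio(a, b):
    return a / b if b else None


def per_class_metrics(cm):
    """One-vs-rest accuracy, precision, recall and IoU for every class.
    Undefined ratios (zero denominators) are returned as None."""
    n = len(cm)
    total = sum(map(sum, cm))
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, truly other
        fn = sum(cm[k]) - tp                         # truly k, predicted other
        tn = total - tp - fp - fn
        metrics[k] = {
            "accuracy": (tp + tn) / total,
            "precision": _ratio(tp, tp + fp),
            "recall": _ratio(tp, tp + fn),
            "iou": _ratio(tp, tp + fp + fn),         # intersection / union
        }
    return metrics


def overall_accuracy(cm):
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def mean_iou(cm):
    """Mean IoU over classes that occur in the ground truth or the prediction."""
    ious = [m["iou"] for m in per_class_metrics(cm).values() if m["iou"] is not None]
    return sum(ious) / len(ious)
```

The sketch also shows why overall accuracy can mislead on an imbalanced map: for 7 majority-class and 3 minority-class pixels, a model that predicts the majority class everywhere reaches an overall accuracy of 0.7 but a mean IoU of only 0.35.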
Finally, the seven segmentation models are used for classification, and their performance is listed in Table 2. Because forests account for about 69% of the area, overall classification accuracy tends to be high, whereas the IoU metric reflects the classification effect more directly. On the same dataset, we focus on differences in the metrics to identify the performance of the network models. The IoU metrics show that the segmentation models differ greatly on the same RS data. The U-Net segmentation model performs best, while the DeepLabV3 model performs worst. In addition, the IoU of the BiSeNet and AdapNet segmentation models does not exceed 50%. FC-DenseNet56, GCN, and PSPNet perform almost as well as U-Net, so we focus on these four segmentation models. Among them, the FC-DenseNet56 model has the fewest parameters and the PSPNet model has the most. Their metrics are almost identical, which shows the potential of FC-DenseNet56 for classifying low resolution RS images. Figure 6 shows the performance curves of the different segmentation models on the validation dataset during the experiments on the study area's RS data, giving the most intuitive view of their performance. The validation loss curves of the models are given in Figure 7. As shown, the loss decreases gradually as the number of epochs increases and finally stabilizes when the number of epochs reaches 200.
Specifically, when the number of epochs reaches 200, the loss values of all the segmentation models are below 3×10^-4, showing relatively stable performance. Notably, the loss value of PSPNet is below 5×10^-5, the lowest among all the models. The IoU metric of PSPNet also reflects the excellent performance of the model, but it has not shown a clear advantage on the low resolution RS dataset. The two models whose loss curves decay most slowly are DeepLabV3 and FC-DenseNet56, which indicates poor convergence during training.

V. RESULTS AND DISCUSSION
The above analysis is based only on the model metrics and loss curves, whereas the experimental results reflect the effect of land cover classification most intuitively and are our main concern. To monitor training in real time, the prediction results of the model at each epoch were recorded. We took the prediction results of the seven models at epoch 200 and selected representative RS images, ordered from simple to complex, to classify land cover resources.

A. PERFORMANCE ANALYSIS OF DL MODELS
Figure 8 displays, in turn, the satellite RS images, the manual annotation images, and the model prediction images. Based on these prediction results, we draw the following conclusions about land cover classification.
Firstly, after training, all the selected segmentation models can classify and segment the low-resolution RS images to a certain extent. In particular, all the models classify arbor forest, paddy field, and upland field well. However, the predictions for roads, rivers, and similar classes lack detail. This result is acceptable because the RS images are of low resolution, while the manual labels were produced with the help of field surveys, so the ground truth labels contain information that is not visible in the images.
Secondly, after training, all the selected models perform poorly on the detailed classification of buildings. Building categories are the most difficult to classify, and their reflectance characteristics are the most complex. Building samples act as hard samples during DL training and therefore strongly affect the training metrics. Because building categories are not defined by the shape or outline of the buildings but are determined through on-site visits and investigation, the DL models cannot learn their characteristic patterns, and poor predictions are inevitable. Even so, the selected models recognize building groups in the RS images to some extent.
Finally, based on the analysis and discussion of this computational experiment, we consider U-Net, PSPNet, and GCN the most suitable models for our RS data. For our land cover resources survey, U-Net is the best choice, provided that the DL model is not relied upon to segment and classify fine details. Classifying such details still depends on manual field investigation, especially determining the use of building groups and handling areas that are hard to recognize because of the limited resolution of the RS images. These observations are based on the prediction results for the validation set images. The selected model should therefore be tested on a separate test area to support its subsequent practical use in land cover resources investigation and classification.

B. REMOTE SENSING TEST AREA
As shown in Figure 9, the test area is located in the west-central part of the study area, spanning 128°28′39.6″ E–129°43′51.6″ E and 46°12′50.4″ N–46°18′21.6″ N, and consists of 190 cropped RS images. These RS data were set aside and did not participate in training the CNN model at any point. The area is mainly covered by forests and fields, contains inhabited cities and towns, and is crossed by rivers. Paddy fields and upland fields are interleaved, and the roads between towns form a complex network. Based on the detailed experiments on the seven models in the previous section, we selected the U-Net model. The trained model combines a ResNet-50 backbone with the U-Net convolutional neural network, and it was applied to land cover classification in this test area.
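Since the test area is handled as 190 cropped images, predictions on the crops presumably need to be mosaicked back into a single land cover map. The following sketch shows one simple way to do this with non-overlapping tiles and zero padding at the edges. The tile size of 256 pixels and the `fake_unet` stand-in for the trained ResNet-50 U-Net are hypothetical; the paper does not state its crop size or inference pipeline.

```python
import numpy as np


def tile_image(image, tile):
    """Split an H x W (x C) array into non-overlapping tile x tile patches.

    The bottom and right edges are zero-padded so every patch has the same size.
    Returns the patches in row-major order and the (rows, cols) grid shape.
    """
    h, w = image.shape[:2]
    rows = -(-h // tile)  # ceiling division
    cols = -(-w // tile)
    pad = [(0, rows * tile - h), (0, cols * tile - w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)  # constant (zero) padding by default
    patches = [
        padded[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
        for r in range(rows)
        for c in range(cols)
    ]
    return patches, (rows, cols)


def stitch_masks(masks, grid, out_shape):
    """Reassemble row-major tile x tile label masks and crop away the padding."""
    rows, cols = grid
    tile = masks[0].shape[0]
    full = np.zeros((rows * tile, cols * tile), dtype=masks[0].dtype)
    for i, mask in enumerate(masks):
        r, c = divmod(i, cols)
        full[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = mask
    return full[:out_shape[0], :out_shape[1]]


def fake_unet(patch):
    """Hypothetical stand-in for the trained U-Net: thresholds the first band."""
    return (patch[..., 0] > 0.5).astype(np.uint8)


rng = np.random.default_rng(0)
scene = rng.random((300, 500, 3))  # hypothetical 3-band scene
patches, grid = tile_image(scene, 256)  # assumed tile size
label_map = stitch_masks([fake_unet(p) for p in patches], grid, scene.shape[:2])
print(grid, label_map.shape)
```

Non-overlapping tiles keep the bookkeeping simple; overlapping tiles with averaged predictions are a common alternative that reduces seams at tile borders.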

C. THE RESULTS OF USING THE U-NET DL MODEL
We compare the manual land cover classification results with the predictions of the U-Net segmentation model. As shown in Figure 9 (c), the predictions largely reproduce the overall land cover classification, quickly and accurately distinguishing forests, cultivated land, towns, villages, and other classes.
However, a careful comparison of the manual annotations with the classification results of the DL segmentation model reveals differences between them. The DL segmentation model cannot fully capture rivers and rural roads, nor can it distinguish the specific uses of buildings as manual annotation does. Nevertheless, even though its results are rough, the DL segmentation model is of great significance for land cover resources investigation. Manual labeling is time-consuming and laborious, especially when no reference is available and classes must be judged from satellite RS images alone, so the DL segmentation model provides great convenience for investigators.
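The comparison in Figure 9 is visual. A simple quantitative complement, sketched here under the assumption that both the manual annotation and the U-Net prediction are available as integer label maps on the same grid, is to compute pixel agreement, per-class area fractions, and a disagreement mask marking where investigators might re-check the prediction, for example along rivers and rural roads. The function names, class indices, and toy arrays are hypothetical and are not results from the paper.

```python
import numpy as np


def class_area_fractions(label_map, num_classes):
    """Share of pixels assigned to each class index 0..num_classes-1."""
    counts = np.bincount(np.asarray(label_map).ravel(), minlength=num_classes)
    return counts / counts.sum()


def compare_maps(manual, predicted, num_classes):
    """Compare a manual label map with a model prediction on the same grid."""
    manual = np.asarray(manual)
    predicted = np.asarray(predicted)
    disagreement = manual != predicted  # True where the pixel should be re-checked
    return {
        "pixel_agreement": 1.0 - disagreement.mean(),
        "manual_fractions": class_area_fractions(manual, num_classes),
        "predicted_fractions": class_area_fractions(predicted, num_classes),
        "disagreement_mask": disagreement,
    }


# Hypothetical 2 x 3 toy maps: 0 = forest, 1 = road, 2 = building.
manual = np.array([[0, 0, 1], [0, 2, 1]])
predicted = np.array([[0, 0, 1], [0, 0, 0]])
report = compare_maps(manual, predicted, 3)
print(report["pixel_agreement"], report["predicted_fractions"])
```

The disagreement mask is the part most relevant to the workflow described above: instead of labeling the whole area by hand, investigators could focus on the flagged pixels.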

VI. CONCLUSIONS
Traditional supervised classification methods such as RF and MLC train quickly, need few labeled samples, and are stable for simple classification tasks. However, they place high demands on the resolution of RS images and have limited feature extraction ability. To address resources classification in land resources investigation, this paper evaluates the performance of classical CNN segmentation models, including FC-DenseNet56, GCN, BiSeNet, U-Net, DeepLabV3, AdapNet, and PSPNet. The metrics show that U-Net is the most competitive segmentation model for low- and medium-resolution satellite RS images. The classification results on the validation set samples were also analyzed, and the trained U-Net model was then applied to the test area to obtain its land cover resources classification. The resulting land resources classification images are almost as good as the manual annotation images. Furthermore, this research covers the whole process of a land resources classification survey, including satellite RS image processing in the early stage, field exploration and verification by investigators, and careful labeling and classification of land resources by indoor personnel in the later stage. In the process of completing the above investigation work, the experimental result of the optimal