High-Confidence Sample Generation Technology and Application for Global Land-Cover Classification

Deep learning technology has become one of the most important technologies in remote sensing land classification applications. Its powerful sample-learning and information-mining abilities promote the continuous improvement of classification accuracy. A large volume of high-quality and representative sample data is the premise for the successful application of deep learning technology. Conventional methods of obtaining samples through manual delineation or surface surveys require a great deal of manpower and material resources. Therefore, the inability to obtain sufficient and widely representative high-quality samples is one of the key factors limiting the application of deep learning technology. In this study, the method of generating sample data obtains high-confidence classification results from a variety of existing high-quality classification products as deep learning samples, which are then used to support the application of deep learning technology for land-cover classification. When the three global land-cover classification products, FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30, have the same type of discrimination, the sample is considered a high-confidence sample. Based on this, a large volume of sample data widely distributed around the world was obtained. Using the extracted samples, a random forest classifier was trained using multiple types of information from the Landsat data, and land-cover classification was achieved. Application experiments were conducted in several typical regions, and the classification results were verified. The results showed that the classification accuracy of random forests under the support of samples generated using the sample extraction method proposed in this article was considerably higher than that of the three land-cover classification products.


I. INTRODUCTION
Land cover refers to a combination of various types of materials on the earth's surface and their natural attributes and characteristics. Accurate land-cover/use classification products are important for natural resource management and urban governance [1].
A variety of land-cover/use classification methods based on remote sensing images have been developed [2], [3]. The visual interpretation method based on expert prior knowledge is one of the earliest and most widely used classification methods. Many typical global and regional land-cover classification products were produced based on this method, such as the coordination of information on the environment land cover [4] and the land use and cover area frame survey, which has a spatial resolution of 2 km [5]. In addition, Jiyuan et al. [6] of the Chinese Academy of Sciences produced the Chinese 30-m land-cover classification product, the accuracy of which was more than 92%. Visual interpretation methods have great advantages in classification, but they also have several disadvantages: they are highly subjective and inefficient, and some multidimensional feature information may not be used.
To solve the problem of subjectivity in visual interpretation methods, a variety of unsupervised classification algorithms, including clustering analysis and K-means clustering, have been applied to land-cover classification research. Vogelmann et al. [7] produced 30-m NLCD-1992 in the United States. Combined with an unsupervised clustering algorithm for land-cover classification, Loveland et al. [8] produced IGBP-DISCover with a spatial resolution of 1 km. Grascon et al. [9] produced the 2009 South American continental land-cover classification product MERISSAM2009 (MERIS South America) with a spatial resolution of 30 m. Although unsupervised classification algorithms have great advantages in terms of time and labor costs, several shortcomings remain. First, when the volume of data to be processed is too large, clustering algorithms often consume a considerable amount of computational resources and time [10]. Second, the complete automation of an unsupervised classification algorithm can lead to uncontrolled fragmentation of classes, causing the classification results to fall into a local optimum instead of a global optimum [11].
With the enrichment and production of various auxiliary datasets over the years, land-cover classification has gradually transitioned from unsupervised classification to supervised classification [12] and from parametric classifiers to nonparametric classifiers. Parametric classifiers generally require knowledge of the prior probability distribution in the research area, which greatly limits their practical application. Nonparametric classifiers do not require any prior knowledge, but they need enough training sample data to train the classifier, thereby obtaining better classification results than parametric classifiers [13], [14]. In recent years, corresponding land-cover classification products have been increasingly developed. For example, the European Joint Research Centre [15] produced GLC2000 with a spatial resolution of 1 km, and the European Space Agency (ESA) produced GlobCover [16]. Hansen et al. [17] produced the University of Maryland (UMD) product, and Gong Peng et al. [18] produced FROM_GLC_Africa30. Although the supervised classification model has been optimized for land-cover classification to a large extent, some deficiencies remain. The ANN-based classifier offers no time-efficiency advantages. The decision tree classifier can be prone to problems such as overfitting. Gómez et al. [19] suggested that the random forest classifier has the best performance among several nonparametric classifiers.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In recent years, deep learning technology has developed rapidly. In an era of Big Data, its powerful sample learning and information mining abilities provide new technical support for land-cover classification research, and it has become an important remote sensing land-cover classification technology [20]. For example, Kussul et al. [21] used the convolutional neural network (CNN) to complete land-cover classification in Ukraine. Zhao et al. [22] used the multiscale convolutional neural network algorithm to classify various hyperspectral datasets for land-cover classification and found that the accuracy of the algorithm was considerably improved in urban areas.
However, the advantage of deep learning technology depends on a large number of high-quality samples. When the number of training samples is limited, the network model will be overgeneralized, and the accuracy of the prediction data will be greatly reduced [23]. In the early days, training samples were often expanded through simple image transformations, such as translation and rotation, but these solutions depend strongly on the data itself.
In the current research, sample acquisition was primarily accomplished by means of manual delineation, a method that consumes a great deal of manpower and time, making the sample collection speed low. Consequently, it is difficult to obtain a large number of samples using this method [24]. Professional personnel are required to conduct onsite inspections of real objects, determine the spectral information and classification labels of objects, and then delineate artificial samples. However, there is a certain quality difference between manual delineation samples. Expert samples are more expensive, and ordinary samples have a high error rate. Labeling large amounts of high-quality sample data is time-consuming and expensive [25].
To solve this problem, many experts and scholars are committed to increasing the number of samples to improve classification performance. Wu et al. [26] clustered labeled and unlabeled pixels based on a clustering algorithm and finally used samples with pseudolabels to train a CNN. Although this method of sample augmentation improves the image classification performance to a certain extent, the algorithm is severely limited by the quality of the sample. Damodaran et al. [27] addressed the problem of poorly trained deep network models caused by inaccurate training sample labels in pseudosamples and proposed an optimal transport distance based on entropy regularization to improve the model's tolerance to noisy labels with an end-to-end deep neural network structure. To make full use of the limited sample data, Rao et al. [28] input the three-dimensional structural block of pixels as an independent unit into the network, provided a specific amount of sample data for each category of ground objects, measured the similarity between the existing marked samples and query samples, and determined the query sample data. Finally, they expanded the training samples and improved the classification performance of images.
To improve the relative accuracy of the sample data, Ma et al. [29] proposed a joint local decision-making strategy and a global strategy of deep learning based on the pixel weight information of the window neighborhood to jointly determine the semisupervised learning scheme of the sample data. The purpose of this method is to reduce the effect of noise in cheap sample data, thereby improving classification accuracy. Shi et al. [30] proposed a multiview collaborative learning enhanced sample strategy based on the CNN network, first using superpixels to enhance the initial training samples, then dividing the independent training set according to the feature multiview, and finally enhancing the classification performance according to the training set collaborative learning.
Although some progress has been made in solving the shortage of training samples, for deep learning, a competitive neural network model often requires more high-quality training samples. The training samples collected by the existing sample generation methods still cannot meet the needs of deep learning given the amount of input data, and there are specific noise labels in a large number of pseudolabels, which may mislead the network model. Particularly, when the training dataset is limited, the noisy labels will cause fatal interference to the network model. Clearly, the deep network structure is complex and computationally intensive. Additionally, with extremely high requirements on equipment and considerable time required to obtain a large number of high-quality sample data, its time efficiency is low. The problem of improving the quantity and quality of sample acquisition under the premise of ensuring time efficiency is an important one that currently limits the development of deep learning technology.
Aiming at the problem of insufficient high-quality samples in remote sensing land-cover classification, which limits the application of deep learning technology, this research study proposes a method to synergistically generate high-quality sample data based on the existing high-quality land-cover classification product data. Based on the existing high-quality land-cover classification products, this method can obtain a large number of high-confidence training samples under the condition of high time efficiency to support the data requirements of deep learning for training samples. For the methodology, in Section Ⅱ-A, high-quality sample sources, classification systems, and sample generation methods for sample generation techniques are introduced. To verify the improvement in the classification performance of the training samples produced by this sample generation technology, this article completes land-cover classification experiments with the support of high-confidence samples. In Section Ⅱ-B, the selection of study areas, high-confidence sample support, and assessment of classification accuracy for applied experiments are presented.
FROM-GLC-2015 is a global land-cover classification product with a spatial resolution of 30 m. It used Landsat 5/7/8 reflectance data, Chinese high-resolution satellites, resource satellites, environmental satellites, and the shuttle radar topography mission terrain data as the original dataset. It primarily used Landsat 8 reflectance data from 2014 to 2015, and Landsat 8 data with cloud cover exceeding 50% were replaced with Landsat 5 data. Compared to FROM-GLC-2010, FROM-GLC-2015 solved the problem in FROM-GLC-2010 caused by the long time intervals of the original data acquisition. The sample data of FROM-GLC-2015 were classified by visual interpretation combined with the Global Mapper software application. Global Mapper obtained a preliminary dataset based on MODIS EVI data, monthly temperature data, precipitation data, and Google Earth images of time series for interpretation. Training samples and validation samples were then cross-validated by experts and scholars with more than five years of experience in visual interpretation of global land-cover images. Finally, training samples and validation samples for the classifier were determined. These training samples were used to train the random forest classifier to obtain the land-cover classification results on a global scale. The land-cover types of FROM-GLC-2015 included 9 first-level categories and 25 second-level categories. The accuracy of the classification results was verified using the validation samples, and finally, the global land-cover classification product FROM-GLC-2015 with an overall accuracy (OA) of 70.2% for the first-level category was produced.
However, the research team of FROM-GLC-2015 indicated that there were some potential problems in this dataset. First, there were specific potential biases in the process of visual interpretation. The second problem was spectral reflectance data. Finally, only random forest classifiers were trained in the FROM-GLC-2015 production process, and some deep learning-based classifiers could be expected to improve the accuracy of global land-cover mapping [14].
GLC_FCS30-2015 is aimed at global-scale land-cover mapping. First, this research study used the Global Spatial Temporal Spectra Library global training sample data, a spatiotemporal image spectral library with a temporal resolution of 8 days constructed using the 2015 MCD43A4 NBAR surface reflectance product and the Climate Change Initiative Global Land Cover (CCI_LC) land-cover product produced by the ESA. Second, using the spectral and texture features of the time-series Landsat data and the corresponding training data, this research study proposed to extract the stable spectral position information of ground objects during the year from the image spectral library and import it into the Google Earth Engine to map the global-scale land cover. Consequently, the study adopted a time-series data synthesis scheme to carry out dimension reduction processing with Landsat surface reflectance data. Finally, random forest model training and prediction on tile-by-tile data using prior spectral location information of ground objects was conducted; a locally adaptive random forest model for each 5° × 5° geographic tile was built, and GLC_FCS30-2015 containing 30 land-cover types for each tile was generated. The overall validation accuracy of GLC_FCS30-2015 was 82.5% [31]. However, the research team of GLC_FCS30-2015 also noted a few problems. First, because the training samples of GLC_FCS30-2015 came from the CCI_LC land-cover product, it inherited the fine classification system used in some areas of the source data, whereas other regions used the characteristics of the rough classification system. Consequently, although GLC_FCS30-2015 is a global 30-m land-cover product of 30 land-cover types, the 14 Land-Cover Classification System (LCCS) second-level detailed land-cover classifications are only applicable to some regions but not globally. Second, although the patch problem that occurs in the single-day land-cover classification has been resolved in GLC_FCS30-2015, there is still a slight boundary effect between adjacent tiles in the transition region [32].
To meet the needs of global change research and earth system model research for high-resolution land-cover data, the National Basic Geographic Information Centre led the development of GlobeLand30 and produced the world's first 30-m global land-cover data product. GlobeLand30 divided the global area into five working areas, including Asia, Europe, Africa, America, and Oceania, and it extracted ten first-level categories by performing layered masking from simple to difficult as follows: water body, wetland, artificial surfaces, arable land, ice and snow, bare land, forest, shrub, grass, and tundra. First, when extracting each type of land cover, they used the product data layer of the extracted type to mask the 30-m resolution remote sensing image, leaving only the image data that did not include the extracted type. Second, in the production process, a global land-cover remote sensing mapping method combining "Pixel-Object-Knowledge" was developed, which overcame the difficulty of applying existing classification methods universally at a global scale. The classification errors caused by the same object with different spectra and different objects with the same spectrum were significantly reduced. Finally, the splicing and cutting from the single-scene classification data to the standard frame were completed. The integrity of the data and the correctness of the code were checked, and the relevant coordinate information and metadata were established. Eventually, GlobeLand30 was produced with an OA of 83.50% [33], [34], [35].
2) Classification System: Because FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 address different problems, the three products use different classification systems.
FROM-GLC-2015 is the first full-season training and validation sample set using the Landsat 8 data for global land-cover classification to meet the needs of cartography and to study the global land-cover distribution characteristics of different land-cover types at different resolutions. When designing the classification system for FROM-GLC-2015, the research team simultaneously modified the FROM-GLC-2015 land-cover system and the Globe Land Cover 2000 (GLC2000) classification system. The 30 second-level categories and 11 first-level categories in the FROM-GLC-2015 classification system and the 33 second-level categories and 8 first-level categories in the GLC2000 classification system were modified into the unique classification system of FROM-GLC-2015, including 25 second-level categories and 9 first-level categories, reflecting the all-seasonal characteristics of various vegetation land-cover types. Table I presents the classification system of FROM-GLC-2015 [14].
TABLE I: FROM-GLC-2015 LAND-COVER CLASSIFICATION TYPES CHANGED TO THE UNIFIED CLASSIFICATION SYSTEM
GLC_FCS30-2015 aimed to address the lack of land-cover classification products with both fine classification systems and spatial resolution in global land-cover products. In the production process of GLC_FCS30-2015, the CCI_LC was selected as the training sample. The CCI_LC product has a detailed classification scheme, including 36 types of land cover, and has achieved high classification accuracy in homogeneous areas. The fine classification system used in GLC_FCS30-2015 inherits the classification system in the CCI_LC product after removing the four mosaic land-cover types, including mosaic natural vegetation, cultivated land, mosaic forest, grassland, and shrubs. The revised fine classification system includes 16 LCCS land-cover types and 14 detailed regional land-cover types. Table II presents the classification system of GLC_FCS30-2015 [31], [32].
TABLE II: GLC_FCS30-2015 LAND-COVER CLASSIFICATION TYPES CHANGED TO THE UNIFIED CLASSIFICATION SYSTEM
GlobeLand30 was designed to meet the growing needs of scientific research applications, such as global change research and Earth system models. In the production process of GlobeLand30, a unique classification system suitable for the product was developed, and technical specifications, such as the global land-cover classification method and classification system, were formulated based on the classification accuracy target. Finally, a classification system was determined, including cultivated land, forest, grass land, shrubland, wetland, water body, tundra, artificial surfaces, permanent snow and ice, and bare land [34]. Table III shows the classification system of GlobeLand30.
This research study aims to generate high-confidence samples based on the existing global land-cover classification products, but it does not make a detailed distinction of vegetation defoliation. Thus, second-level categories of FROM-GLC-2015 were not used. This study does not make a detailed distinction within the cultivated land or forest and grass land. Consequently, this study did not use the fine LCCS designed by GLC_FCS30-2015. After comprehensively considering the needs of high-confidence sample extraction in this study, we decided to adopt the classification system of ten main land cover types designed by the GlobeLand30 dataset, including cultivated land, forest, grass land, shrubland, wetland, water body, tundra, artificial surfaces, permanent snow and ice, and bare land.
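The unification step described above amounts to a lookup from each product's own codes to the ten shared types. The following is a minimal sketch of that idea, not the authors' implementation. The ten class names come from the text, while the numeric source codes in `EXAMPLE_LOOKUP` and the helper names `check_lookup` and `to_unified` are hypothetical placeholders; the real correspondences are those given in Tables I to III.

```python
# Unified ten-class scheme named in the text (GlobeLand30 first-level types).
UNIFIED_CLASSES = (
    "cultivated land", "forest", "grass land", "shrubland", "wetland",
    "water body", "tundra", "artificial surfaces",
    "permanent snow and ice", "bare land",
)

# HYPOTHETICAL lookup table: these source codes are placeholders and do not
# reproduce the legend of any real product. Several fine-grained source
# classes may collapse into a single unified class.
EXAMPLE_LOOKUP = {
    11: "cultivated land",
    12: "cultivated land",
    21: "forest",
    22: "forest",
    30: "grass land",
    60: "water body",
}


def check_lookup(lookup):
    """Raise ValueError if the lookup targets a class outside the unified scheme."""
    unknown = sorted(set(lookup.values()) - set(UNIFIED_CLASSES))
    if unknown:
        raise ValueError(f"targets outside the unified scheme: {unknown}")


def to_unified(codes, lookup):
    """Map product-specific codes to unified class names.

    Codes missing from the lookup become None so that they can never
    contribute to a later agreement test.
    """
    return [lookup.get(code) for code in codes]


check_lookup(EXAMPLE_LOOKUP)
unified = to_unified([11, 22, 99], EXAMPLE_LOOKUP)
```

Mapping unknown codes to `None` instead of guessing a class keeps unmapped pixels out of any later comparison between products.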
3) Sample Generation Method: Existing high-quality land-cover classification products ensure classification accuracy, but with all land-cover classification products, there are specific misclassifications. When using too many wrong samples as training samples for land-cover classification, although the number of training samples is expanded, this process reduces the relative accuracy of the samples, which will significantly reduce the classification accuracy. To rapidly expand the number of training samples and improve the accuracy of training samples at low time cost, this study proposes a new sample generation method.
This study selects three high-quality land-cover classification products as the sample sources for high-confidence samples (i.e., FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30). The former two products were produced in 2015 and the last one in 2020. Because the time span of these products is not significantly long, changes in land-cover types caused by a long time span were not considered. Moreover, only the areas where the land-cover type has not changed are retained as the final high-confidence samples in this study. Thus, changes in land-cover types in a relatively short period of time will not have a large impact on the final acquisition of large-scale sample data. Then, the classification systems of FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 were unified. Finally, a series of training samples were generated that integrated the three characteristics of high efficiency, high quality, and large quantity.
For the same sample, different products may have different discrimination results for land-cover types, but the high-quality products have been verified by their research teams to ensure that the land-cover types of most samples are consistent with the real land-cover types. In this method, each high-quality land-cover product provides an initial discrimination of the land-cover type of a sample, and on the basis of these verified results, the land-cover type of the sample is voted on again. When a sample has zero votes, one vote, or two votes for a specific type of land cover, it is considered that there is a dispute in the type of the sample and that it is a low-confidence sample. To reduce the impact of low-accuracy samples on the classification accuracy, the low-confidence samples that cannot be completely guaranteed to be accurate are discarded. A sample is retained only when its type receives all three votes. That is, when the three high-quality products have the same type of discrimination, it is considered that the sample is unified in type discrimination, is consistent with the real surface type, and is a high-confidence sample. The time cost of this method of generating high-confidence samples is significantly less than that of other methods, and the initial judgment of the samples saves considerable time for the generation of high-confidence samples. The method of voting on a sample-by-sample basis after the initial judgment is also low cost. This sample generation method significantly saves time, expands the number of samples to the greatest extent, ensures the quality of the samples, and allows the acquisition of a large number of high-confidence samples.
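The three-vote rule can be illustrated with a short sketch. It assumes the three products have already been resampled to the same pixel grid and remapped to the unified classification system; the function name `high_confidence_labels` and the flat-list representation of the rasters are illustrative choices, not part of the published method.

```python
def high_confidence_labels(product_a, product_b, product_c, nodata=None):
    """Keep a label only where all three products agree (three votes).

    Each argument is a sequence of unified class labels, one entry per
    pixel, in the same pixel order. Pixels with fewer than three matching
    votes, or with a missing label in any product, are set to `nodata`.
    """
    if not (len(product_a) == len(product_b) == len(product_c)):
        raise ValueError("the three products must cover the same pixels")
    result = []
    for a, b, c in zip(product_a, product_b, product_c):
        if a is not None and a == b == c:
            result.append(a)       # unanimous: high-confidence sample
        else:
            result.append(nodata)  # disputed: discarded
    return result


# Toy example with four pixels.
from_glc = ["forest", "cultivated land", "water body", "wetland"]
glc_fcs = ["forest", "grass land", "water body", "wetland"]
globeland = ["forest", "cultivated land", "water body", "grass land"]

labels = high_confidence_labels(from_glc, glc_fcs, globeland)
kept = [label for label in labels if label is not None]
```

In this toy case the second and fourth pixels receive only two votes for any single type, so they are dropped and two high-confidence samples remain.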

B. Typical Region Experiments Supported by High-Confidence Samples
1) Typical Region Selection and Landsat Data Preprocessing: For the land-cover classification application supported by high-confidence samples, this study selected three typical regions around the world for experiments. Fig. 1 shows the true color images of the three typical regions. The three typical regions are the northern part of Qingdao City, Shandong Province, China, and its surrounding areas (hereinafter referred to as Typical Region 1), Moscow and its surrounding areas (hereinafter referred to as Typical Region 2), and southwest of Larkana, Sindh Province, Pakistan (hereinafter referred to as Typical Region 3).
Typical Region 1 is located at 119°30'E-121°00'E and 35°35'N-37°09'N, in East China, the southeastern part of the Shandong Peninsula. There are many types of vegetation in the northern part of Qingdao and its surrounding areas, with the land cover being highly complex and the types of land cover being relatively complete, enabling better use of high-confidence samples for land-cover classification research [5].
Typical Region 2 is located between 55°-56°N and 37°-38°E, in the middle of the Eastern European Plain, across the Moskva River and its tributaries, and the Yauza River. Moscow and its surrounding area were chosen as a typical region because vegetation surrounds the urban area there. Compared with Typical Region 1, Typical Region 2 can better demonstrate the separation of forest, cultivated land, and artificial surfaces, and thus shows how high-confidence samples perform under a complex mix of surface types.
Typical Region 3 is located between 27°-28°N and 67°-69°E. It belongs to the Sindh Province, Pakistan, in terms of administration. Typical Region 3 comprises the southwestern part of Larkana in Sindh Province, and this region has a large area of bare land that the other two typical regions do not have in terms of land cover. It is also dominated by the agricultural economy. In terms of surface type, the selection of Typical Region 3 complements the lack of surface type in the first two typical regions, making the high-confidence sample type more complete and ensuring the realization of application experiments of multiple surface types.
Landsat data have been used in remote sensing land-cover classification research for many years because of their availability and stability. In this research study, the 2015 Landsat 8 data are selected as the training data, and four phase data covering typical regions in spring, summer, autumn, and winter in 2015 are selected. This study selects the data of the study area in 2020 as the data to be classified, again selecting four phases from spring, summer, autumn, and winter. For Typical Region 1, each phase consists of two Landsat 8 scenes acquired on the same date with lower cloud cover; for Typical Region 2 and Typical Region 3, one Landsat 8 scene with lower cloud cover is selected for each phase.
2) High-Confidence Sample Support: This study selected FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 as the sample extraction datasets. Through the method of high-quality land-cover classification products cooperating to generate high-confidence samples, the datasets of high-confidence samples corresponding to three typical regions were produced. In terms of the number of samples, the number of high-confidence samples is less than any of the three high-quality land-cover classification datasets, but the number of samples obtained using this sample acquisition method is considerably larger than that obtained using the traditional sample acquisition methods. Additionally, to a large extent, it ensures the quality of the samples and reduces time consumption.
3) Random Forest: The features selected by this study comprise six surface reflectance features and nine spectral index features in each of the four phases of spring, summer, autumn, and winter, for a total of 60 features. Multiple spectral index features were added to help distinguish the various types of land cover, including the normalized difference vegetation index (NDVI), normalized burn ratio (NBR), enhanced vegetation index (EVI), automated water extraction index (AWEI), normalized difference snow index (NDSI), tasseled cap greenness (TCG), tasseled cap brightness (TCB), tasseled cap wetness (TCW), and tasseled cap angle (TCA) [36]. Table IV contains formulas for calculating these spectral indices. Random forest, one of the most widely used machine learning models, estimates feature importance as part of the prediction process [37], [38], [39]. In this study, the importance of features was ranked to show how input features influence the classification results. Fig. 2 shows the ranking of feature importance.
Among them, the NDVI, NBR, and EVI could reflect the impact of different types of land cover on vegetation-related indices. AWEI could reflect the impact of changes in land-cover types on the proportion of water bodies, and NDSI could reflect the impact of changes in land-cover types on ice and snow. TCB, TCG, TCW, and TCA are the components of the spectrum obtained by linear transformation and spectral space rotation. TCB represents the brightness of soil and reflects soil spectral information. TCG represents greenness and reflects vegetation spectral information. TCW represents the moisture content of ground objects and reflects the humidity information of the object. TCA represents the angle information of the object. In this study, the phenological periods and vegetation changes are considered, and four temporal characteristics are introduced to distinguish different land-cover types.
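As a rough illustration of how some of these indices are computed from surface reflectance, the sketch below uses the commonly published forms of NDVI, NBR, NDSI, and EVI. These are assumptions drawn from general usage, so the exact expressions in Table IV may differ. AWEI and the tasseled cap components are omitted because their coefficients depend on the variant and sensor. The function and variable names are illustrative.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = band_a + band_b
    return (band_a - band_b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def nbr(nir, swir2):
    """Normalized burn ratio."""
    return normalized_difference(nir, swir2)


def ndsi(green, swir1):
    """Normalized difference snow index."""
    return normalized_difference(green, swir1)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    denominator = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denominator if denominator != 0 else 0.0


# Feature count stated in the text: (6 reflectance + 9 indices) per phase,
# over 4 seasonal phases.
N_REFLECTANCE, N_INDICES, N_PHASES = 6, 9, 4
N_FEATURES = (N_REFLECTANCE + N_INDICES) * N_PHASES

# Example reflectance values (unitless, 0 to 1) for a vegetated pixel.
example_ndvi = ndvi(nir=0.5, red=0.1)
example_evi = evi(nir=0.5, red=0.1, blue=0.05)
```

Inputs are assumed to be scaled surface reflectance in the range 0 to 1; the constant term in the EVI denominator is only meaningful on that scale.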
The results of high-confidence samples in typical regions are regarded as high-confidence sample datasets in the typical region, and the sample quality of these datasets meets the requirements for training samples. To verify the general applicability of the datasets produced by this sample generation method, only a set of sample data is randomly selected from the datasets corresponding to typical regions as training samples. Because the real land-cover types in all regions cannot reach a natural balance, there must be specific types of land-cover types that are the main land cover types in the region. In the application experiment, the balance of selecting training samples was also considered. Zhu et al. [40] found that extracting training data proportionally to the occurrence of land cover classes was superior to an equal distribution of training data per class. The problem of unbalanced training data was alleviated by extracting a minimum and a maximum.
Therefore, in the typical region application experiment, a group of high-confidence samples is randomly and proportionally selected from the corresponding typical region high-confidence sample dataset as the training samples of the typical region. In general, high-confidence samples are selected in proportion, and for the land-cover classification categories with too few high-confidence samples, all samples were selected. Additionally, the categories with too many high-confidence samples after the proportional selection process were restricted and reduced, with the proportion being maintained in principle. The final sample selection was completed after the number of overall samples was relatively balanced. The randomly selected samples were divided into training samples and verification samples based on a 7:3 ratio. The training samples were combined with the Landsat 8 images to train the random forest classifier, and the verification samples were used to verify the accuracy of the random forest classifier [41], [42].
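The selection and splitting procedure can be sketched as follows. The text does not state the sampling fraction or the per-class limits, so `fraction`, `min_per_class`, and `max_per_class` are hypothetical parameters, and the function names are illustrative.

```python
import random


def select_samples(samples_by_class, fraction, min_per_class, max_per_class, seed=0):
    """Draw samples per class in proportion, bounded by a minimum and a maximum.

    Classes with fewer samples than the minimum contribute all of their
    samples; classes whose proportional share exceeds the maximum are capped.
    """
    rng = random.Random(seed)
    selected = {}
    for label, samples in samples_by_class.items():
        target = round(len(samples) * fraction)
        target = max(target, min_per_class)
        target = min(target, max_per_class, len(samples))
        selected[label] = rng.sample(samples, target)
    return selected


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle and split samples into training and verification sets (7:3 by default)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy pools of sample identifiers per class (sizes are illustrative only).
pools = {
    "cultivated land": list(range(10000)),
    "forest": list(range(1000)),
    "wetland": list(range(3)),
}
chosen = select_samples(pools, fraction=0.1, min_per_class=50, max_per_class=500)
train, validation = split_train_validation(chosen["forest"])
```

With these toy values, cultivated land is capped at 500 samples, forest keeps its proportional 100, and wetland contributes all 3 of its samples.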

4) Accuracy Evaluation:
The random forest classifier was trained using high-confidence samples as training labels. The 2020 Landsat 8 image data of each typical region were input into the trained classifier to obtain the final classification results. One thousand sample points were randomly selected in each typical region, and these points were visually interpreted using Google Earth to obtain the actual land-cover category. Accuracy verification was conducted on the classification results supported by high-confidence samples and on FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30. A confusion matrix was calculated for the verification points of each typical region, and measures of classification accuracy were derived from the matrix, i.e., the OA, producer's accuracy (PA), user's accuracy (UA), and kappa coefficient.
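As a sketch of how these four measures follow from a confusion matrix, the snippet below uses the standard definitions. It assumes that rows hold the reference (visually interpreted) classes and columns hold the mapped classes; the 2 x 2 matrix is a made-up example, not data from this study.

```python
import numpy as np


def accuracy_metrics(cm):
    """cm[i, j] = number of points with reference class i mapped as class j."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    oa = diag.sum() / n                       # overall accuracy
    pa = diag / cm.sum(axis=1)                # producer's accuracy per reference class
    ua = diag / cm.sum(axis=0)                # user's accuracy per mapped class
    # Chance agreement expected from the row and column totals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, pa, ua, kappa


# Hypothetical two-class example with 100 verification points.
cm = [[40, 10],
      [5, 45]]
oa, pa, ua, kappa = accuracy_metrics(cm)
```

A class with no reference points or no mapped points would produce a division by zero in PA or UA, so such classes would need separate handling in practice.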

A. High-Confidence Sample Dataset
Using Typical Region 2 as an example of high-confidence sample extraction, the source datasets were FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30. The proportion of each land-cover type selected as high-confidence samples varied among the three high-quality land-cover products. Fig. 3 shows the proportion of high-confidence samples provided by each high-quality dataset. The high-confidence sample generation method selects pixels to which all three products assign the same land-cover type at the same geographical location. Because the samples are built from high-quality land-cover classification products, the quality of the sample source is guaranteed, and the samples obtained using this method have higher quality and reliability. For cultivated land, forest, grassland, water body, and artificial surfaces, the three products assign different types in some areas, so only pixels with the same type at the same location were used as high-confidence samples. For wetlands, although wetland pixels are present in all three high-quality land-cover classification products, only one pixel sample was generated by the high-confidence sample extraction process, which means that only one wetland pixel was labeled consistently by the three products.
In the high-confidence dataset of Typical Region 1, cultivated land has the largest number of high-confidence samples, followed by artificial surface and water body. Fig. 4(a) shows the proportion of high-confidence sample types in Typical Region 1. Fig. 5(a) shows the sample distribution of FROM-GLC-2015 in Typical Region 1. The figure shows that cultivated land is the land-cover type with the largest coverage area, followed by artificial surface and water body, whereas grassland also has a relatively wide distribution. Fig. 5(b) shows the sample distribution of GLC_FCS30-2015 in Typical Region 1. The figure shows that cultivated land is the land-cover type with the largest coverage area, followed by artificial surface and water body, and the coverage areas of grassland and forest are significantly smaller than in FROM-GLC-2015. Fig. 5(c) shows the sample distribution of GlobeLand30 in Typical Region 1. The figure shows that cultivated land is the land-cover type with the largest coverage area, followed by artificial surface. Partly because the sea is absent from GlobeLand30, its water-body coverage area is significantly smaller than that of FROM-GLC-2015 and GLC_FCS30-2015. Fig. 5(d) shows the distribution of high-confidence samples in Typical Region 1. Cultivated land and artificial surface are the main land-cover types in Typical Region 1, and a large number of high-confidence samples of these types are obtained through the high-confidence sample generation method. Because the distributions of forest and grassland differ among FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30, only a small number of forest and grassland samples were selected as high-confidence samples.
In the high-confidence dataset of Typical Region 2, artificial surface has the largest number of high-confidence samples, followed by forest. Fig. 4(b) shows the proportion of high-confidence sample types in Typical Region 2. Fig. 6(a) shows the sample distribution of FROM-GLC-2015 in Typical Region 2. The figure shows that the interior of Typical Region 2 is dominated by artificial surface, and the exterior is interspersed with forest and grassland. Fig. 6(b) shows the sample distribution of GLC_FCS30-2015 in Typical Region 2, which is spatially similar to that of FROM-GLC-2015: the interior of Typical Region 2 is dominated by artificial surface, and the exterior is interspersed with forest. Fig. 6(c) shows the sample distribution of GlobeLand30 in Typical Region 2. As shown in the figure, the interior of Typical Region 2 is dominated by artificial surface, whereas the exterior is interspersed with forest and cultivated land, and the coverage area of grassland is small. According to the high-confidence sample generation method, the pixels for which the land-cover types in FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 are consistent are selected as high-confidence samples. Fig. 6(d) shows the distribution of high-confidence samples in Typical Region 2. A large number of high-confidence artificial surface samples were generated in Typical Region 2. In the exterior of Typical Region 2, a large number of high-confidence forest samples were generated, whereas, owing to differences among the products in determining land-surface types, only a small number of high-confidence grassland and cultivated land samples were generated.
In the high-confidence dataset of Typical Region 3, cultivated land has the greatest number of high-confidence samples, followed by water body and bare land. Fig. 4(c) shows the proportion of high-confidence sample types in Typical Region 3. Fig. 7(a) shows the sample distribution of FROM-GLC-2015 in Typical Region 3. The figure shows that cultivated land is the land-cover type with the largest coverage area, followed by bare land. The bare land is concentrated in the west of Typical Region 3, and the grassland is discretely distributed inside the region. Water body is distributed in the west and southeast, and its coverage is smaller than that of grassland. Fig. 7(b) shows the sample distribution of GLC_FCS30-2015 in Typical Region 3. Cultivated land is the land-cover type with the largest coverage area. However, unlike in FROM-GLC-2015, the land-cover types distributed in the west of Typical Region 3 are grassland and shrubland, with only a little bare land. The water body distribution is basically consistent with that of FROM-GLC-2015. Fig. 7(c) shows the sample distribution of GlobeLand30 in Typical Region 3. Cultivated land has the largest coverage area, followed by bare land. Grassland and shrubland are mainly distributed in discrete areas near the water body on the southeast side and on the north side. Fig. 7(d) shows the distribution of high-confidence samples in Typical Region 3. The high-confidence sample dataset of Typical Region 3 contains a large number of high-confidence samples of cultivated land. Because FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 differ in determining land-cover types in the west of Typical Region 3, only a small number of high-confidence samples are generated there.
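The agreement rule applied in all three regions can be sketched as a pixel-wise comparison of three co-registered label rasters. The sketch assumes that the three products have already been resampled to a common grid and remapped to a common legend; the class codes and the no-data value below are hypothetical and do not correspond to the legends of the actual products.

```python
import numpy as np

NODATA = 0  # hypothetical code meaning "no high-confidence label"


def high_confidence_labels(prod_a, prod_b, prod_c, nodata=NODATA):
    """Keep a pixel's label only where all three products agree on it."""
    a, b, c = (np.asarray(p) for p in (prod_a, prod_b, prod_c))
    agree = (a == b) & (b == c) & (a != nodata)
    return np.where(agree, a, nodata)


# Toy 2 x 3 rasters with hypothetical codes:
# 1 cultivated land, 2 forest, 3 grassland, 6 water body, 8 artificial surface.
from_glc = np.array([[1, 1, 2], [8, 3, 6]])
glc_fcs = np.array([[1, 2, 2], [8, 3, 6]])
globeland = np.array([[1, 1, 2], [8, 1, 0]])

samples = high_confidence_labels(from_glc, glc_fcs, globeland)
n_samples = int((samples != NODATA).sum())
```

Pixels on which any product disagrees, or for which any product has no data, are discarded, which is why regions with divergent products (such as the west of Typical Region 3) yield few samples.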

B. Land-Cover Classification Results Based on High-Confidence Sample Dataset
In this study, the high-confidence sample generation method was used to generate a high-confidence sample dataset for each of the three typical regions. A set of balanced land-cover sample data was selected from the high-confidence dataset of each typical region, and Landsat data and the random forest model were used to conduct high-confidence sample application experiments in these regions. Visually, the classification results supported by high-confidence samples in the three typical regions accurately show the spatial distributions of the various land-cover types, which are consistent with the actual spatial pattern. Table V shows the accuracy evaluation of the experimental results supported by high-confidence samples and of the three land-cover products, FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30, in the three typical regions. In all three typical regions, the OA of the land-cover classification results supported by high-confidence samples is significantly higher than that of FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30.
In Typical Region 1, the OA of the classification results of the application experiments supported by high-confidence samples is 83.9%, and the kappa coefficient is 0.707. Fig. 8 shows the land-cover classification results supported by high-confidence samples of Typical Region 1. Table VI summarizes UA and PA for different land-cover types in the classification results supported by high-confidence samples, FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 in Typical Region 1. From the perspective of UA, among the land-cover classification results supported by high-confidence samples, artificial surface has the highest accuracy, followed by cultivated land, forest, water body, and grassland. As the main land-cover types in Typical Region 1, cultivated land and artificial surface maintain high accuracy. The accuracy for grassland is relatively low, with 26.7% of it classified as cultivated land and 13.3% as artificial surface. From the perspective of UA, grassland and cultivated land are the most accurate in FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30. From the perspective of PA, among the land-cover classification results supported by high-confidence samples, cultivated land has the highest accuracy, followed by grassland, water body, forest, and artificial surface.
In Typical Region 2, the OA of the classification results of the application experiments supported by high-confidence samples is 86.6%, and the kappa coefficient is 0.769. Fig. 9 shows the land-cover classification results supported by the high-confidence samples of Typical Region 2. Table VII summarizes UA and PA for different land-cover types in the classification results supported by high-confidence samples, FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 in Typical Region 2. From the perspective of UA, among the land-cover classification results supported by high-confidence samples, grassland has the highest accuracy, followed by artificial surface, water body, forest, and cultivated land. For cultivated land, 4.6% is classified as forest, 9.1% as grassland, and 18.2% as artificial surface. For forest, 2% is classified as cultivated land, 11.4% as grassland, and 14.1% as artificial surface. For grassland, 1.4% is classified as cultivated land and 3.7% as artificial surface. This indicates that there are different degrees of confusion among cultivated land, forest, and grassland, and that these three types are also misclassified as artificial surface in Typical Region 2. From the perspective of UA, forest and artificial surface are the most accurate in FROM-GLC-2015 and GLC_FCS30-2015. In GlobeLand30, the accuracy for cultivated land and artificial surface was the highest, whereas the accuracy for grassland was the lowest. For grassland, 8% was classified as cultivated land, 14.3% as forest, and 29.5% as artificial surface. This suggests that the confusion between grassland and artificial surface results in a loss of accuracy. From the perspective of PA, among the classification results supported by high-confidence samples, forest and artificial surface were the most accurate, followed by grassland, cultivated land, and water body.
In Typical Region 3, the OA of the classification results of the application experiments supported by high-confidence samples is 88.8%, and the kappa coefficient is 0.721. Fig. 10 shows the land-cover classification results supported by high-confidence samples of Typical Region 3. Table VIII summarizes UA and PA for different land-cover types in the classification results supported by high-confidence samples, FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 in Typical Region 3. From the perspective of UA, among the land-cover classification results supported by high-confidence samples, cultivated land has the highest accuracy, followed by artificial surface, shrubland, and bare land. Grassland has the lowest accuracy: 20.8% was classified as cultivated land, 6.2% as shrubland, 6.2% as water body, 4.2% as artificial surface, and 2% as bare land. This indicates that there is some confusion between grassland and cultivated land, resulting in a loss of grassland accuracy. In FROM-GLC-2015, the accuracy for bare land and water body was the highest, whereas that for artificial surface was the lowest. In GLC_FCS30-2015, the accuracy for cultivated land and artificial surface was the highest, whereas the accuracy for water body was the lowest. In GlobeLand30, the accuracy for cultivated land and grassland was the highest, whereas the accuracy for forest was the lowest. From the perspective of PA, among the classification results supported by high-confidence samples, cultivated land and bare land are the most accurate, followed by water body, artificial surface, grassland, shrubland, and forest.

IV. DISCUSSION
It is challenging to efficiently obtain large numbers of high-quality land-cover samples, and sample quality greatly influences the accuracy of remote sensing land-cover classification. The application of deep learning technology in remote sensing land-cover classification improves the classification accuracy to a large extent, but it also places higher requirements on the quality and quantity of samples. The acquisition of a large number of high-quality land-cover classification samples has become a major factor restricting the application of deep learning technology in remote sensing land-cover classification. In this study, we propose a method for the collaborative generation of high-quality sample data based on existing high-quality land-cover classification products. By comparing the land-cover types of FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 pixel by pixel, the pixels with the same land-cover type in all three products were selected as high-confidence samples. In this way, three high-confidence sample datasets were generated in three typical regions, and a large number of high-confidence samples were obtained while ensuring time efficiency. The samples generated in this study provide a large number of high-confidence samples for the typical regions, reducing the impact of erroneous samples on the classification accuracy. The typical region application experiments using samples cogenerated from high-quality products also show that the classification accuracy in the same region is improved.
In Typical Region 1, the classification results supported by high-confidence samples are spatially consistent with FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30. However, the OA of the classification results supported by the high-confidence samples is higher than that of the three land-cover products. There are different degrees of confusion among cultivated land, forest, and artificial surface in Typical Region 1. Because cultivated land, forest, and grassland are prone to serious confusion with one another, multiple vegetation indices, such as the NDVI, EVI, NBR, and TCG, were introduced for their classification. The feature importance ranking showed that TCG had the greatest correlation with the land-cover classification results. This indicates that the addition of TCG has somewhat alleviated the confusion among land-cover types consisting of different vegetation, such as cultivated land, forest, and grassland. The vegetation indices were added as features to the training of the random forest classifier, and their effect on cultivated land and forest was relatively clear. However, some confusion involving grassland remained. Most of the confusion between cultivated land and artificial surface in Typical Region 1 arose in complex areas where small rural residential plots and cultivated land were mixed. The mariculture area is misclassified because its texture features are similar to those of cultivated land, which somewhat affects the classification accuracy for water body in FROM-GLC-2015 [43], [44].
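For reference, the four indices named above can be computed from Landsat 8 reflectance bands as in the following sketch. The NDVI, EVI, and NBR expressions are the standard definitions. The tasseled cap greenness (TCG) coefficients are an assumption: they are a widely used published set for Landsat 8 at-satellite reflectance, and this article does not state which coefficients were actually applied. The function name and the toy reflectance values are hypothetical.

```python
import numpy as np

# Assumed TCG coefficients for Landsat 8 bands 2-7 (blue, green, red, NIR,
# SWIR1, SWIR2); not confirmed by the article.
TCG_COEFFS = np.array([-0.2941, -0.2430, -0.5424, 0.7276, 0.0713, -0.1608])


def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Compute NDVI, EVI, NBR, and TCG from reflectance arrays scaled to 0-1."""
    bands = [np.asarray(b, dtype=float)
             for b in (blue, green, red, nir, swir1, swir2)]
    blue, green, red, nir, swir1, swir2 = bands
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    nbr = (nir - swir2) / (nir + swir2)
    tcg = sum(c * b for c, b in zip(TCG_COEFFS, bands))  # weighted band sum
    return {"NDVI": ndvi, "EVI": evi, "NBR": nbr, "TCG": tcg}


# Two toy pixels: index 0 resembles dense vegetation, index 1 bare soil.
idx = vegetation_indices(
    blue=[0.03, 0.10], green=[0.06, 0.14], red=[0.04, 0.18],
    nir=[0.40, 0.25], swir1=[0.20, 0.30], swir2=[0.10, 0.28],
)
```

The resulting arrays can be stacked with the spectral bands as additional feature columns for the random forest classifier, whose feature importance ranking then indicates which index contributes most.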
In the spatial distribution of land-cover types in Typical Region 2, the central area is dominated by artificial surface, and the surrounding area is dominated by grassland and forest. The high-confidence sample support method provides a large amount of sample data for these surface types and ensures the quality of the sample data, thereby improving the classification accuracy. At the urban boundary of Typical Region 2, forest and artificial surface are somewhat intermingled, causing confusion between the two surface types in the classification results. However, the experimental results show that the high-quality samples provided by the high-confidence sample dataset for grassland and artificial surface significantly improve the classification accuracy. Even though artificial surface is already classified with high accuracy in all of the products, the high-quality samples produced in this study further improve its classification accuracy. Experiments show that the high-confidence sample generation discards some controversial samples, thus improving the sample quality and ultimately the classification accuracy. FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 provide a large number of high-confidence samples in Typical Region 2, and the three land-cover classification products were also verified to have high accuracy in Typical Region 2 [45], [46]. The results of land-cover classification supported by high-confidence samples showed high consistency with those of FROM-GLC-2015 and GLC_FCS30-2015. GlobeLand30 shows a large area of cultivated land in Typical Region 2, and this study verified that some of these cultivated land areas are confused with grassland.
The main land-cover type of Typical Region 3 is cultivated land, and there is also a large amount of bare land in the region. In terms of land-cover types, Typical Region 3 was selected to supplement the bare land missing from the previous two typical regions. In the nonphenological period, no crops were planted in the cultivated land areas, which somewhat affected the distinction between bare land and cultivated land. When high-confidence samples were obtained in the west of Typical Region 3, differences in the discrimination of bare land among FROM-GLC-2015, GLC_FCS30-2015, and GlobeLand30 reduced the number of high-confidence bare land samples, and only uncontroversial high-confidence bare land samples were generated. The misjudged areas are primarily located in the west of Typical Region 3. After visually interpreting many sample points, it was determined that this area is indeed bare land, and the misjudgment of many bare land pixels reduces the OA. In GlobeLand30, the classification accuracy for bare land is higher than that in GLC_FCS30-2015 but lower than that in FROM-GLC-2015 and in the classification supported by high-confidence samples. The high-confidence samples used in the typical region experiments are derived from high-quality products, yet the classification accuracy supported by these samples is higher than that of the high-quality products themselves. In this study, the most credible samples in the high-quality products were selected as training samples, which improved the OA for Typical Region 3 [47], [48].

V. CONCLUSION
Deep learning technology has been widely used in the field of remote sensing and has become one of the more important technologies in land-cover classification. Traditional methods of obtaining samples, such as the manual delineation of land-cover classification samples, not only consume a great deal of time and resources but also cannot meet the requirements of sample quantity and quality at the same time. Moreover, inefficient sample acquisition methods limit the development of deep learning techniques for remote sensing land-cover classification.
The method proposed in this research, which synergistically generates sample data from existing high-quality land-cover classification products, can obtain a large number of samples on a global scale, with the advantages of high efficiency, low time consumption, and large data volumes. This study selected several typical regions around the world and conducted experiments supported by high-confidence samples, demonstrating the following results.
1) The method proposed in this research to synergistically generate sample data using existing high-quality land-cover classification products could generate a large volume of sample data for selected regions on a global scale in a short period of time. The study selected three typical regions around the world to conduct high-confidence sample support experiments. The results showed that the random forest classification accuracy supported by the samples generated using the proposed method was considerably higher than that of the three land-cover products examined.
2) With the support of high-confidence samples, combined with spectral, index, and temporal features, a random forest classifier was trained to classify the land cover in the study areas. According to the research in this article, the sample generation method proposed in this study can provide a large number of high-confidence samples for deep learning.
Some problems that need further investigation remain. In this study, only spectral features were considered during feature extraction; texture features were not considered. Consequently, future research should consider introducing texture features in combination with spectral features. Further research on land-cover classification should also consider how to further optimize the balance of the samples. In the classification results, some artificial surfaces were incorrectly classified as cultivated land, with the confusion occurring primarily between villages and cultivated land. This study also tried to use the difference between different phases to resolve this kind of confusion, but the results showed that adding this difference feature did not mitigate the problem, which remains to be solved in future research. Combining all high-quality samples in the region for deep learning classification could improve the classification accuracy, and it remains a key research direction for high-confidence sample extraction methods.