An Ensemble Learning Approach for Land Use/Land Cover Classification of Arid Regions for Climate Simulation: A Case Study of Xinjiang, Northwest China

Accurate classifications of land use/land cover (LULC) in arid regions are vital for analyzing changes in climate. We propose an ensemble learning approach for improving LULC classification accuracy in Xinjiang, northwest China. First, multisource geographical datasets were applied, and the study area was divided into Northern Xinjiang, Tianshan, and Southern Xinjiang. Second, five machine learning algorithms, namely k-nearest neighbor, support vector machine (SVM), random forest (RF), artificial neural network (ANN), and C4.5, were chosen to develop different ensemble learning strategies according to the climatic and topographic characteristics of each subregion. Third, stratified random sampling was used to obtain training samples and optimal parameters for each machine learning algorithm. Lastly, each derived approach was applied across Xinjiang, and subregion performance was evaluated. The results showed that the LULC classification accuracy achieved across Xinjiang via the proposed ensemble learning approach was improved by ≥6.85% compared with individual machine learning algorithms. By specific subregion, the accuracies for Northern Xinjiang, Tianshan, and Southern Xinjiang increased by ≥6.70%, 5.87%, and 6.86%, respectively. Moreover, the ensemble learning strategy combining four machine learning algorithms (i.e., SVM, RF, ANN, and C4.5) was superior across Xinjiang and Tianshan, whereas the three-algorithm strategy (i.e., SVM, RF, and ANN) worked best for Northern and Southern Xinjiang. The novelty of this study lies in developing an ensemble learning approach that divides Xinjiang into different subregions, accurately classifies land cover, and generates a new land cover product for simulating climate change in Xinjiang.


I. INTRODUCTION
Monitoring land use/land cover (LULC) is indispensable for investigating earth system processes, as it can provide thematic information of the earth's surface while capturing biotic and abiotic characteristics that closely correlate with the ecological conditions on the ground [1]. Moreover, accurate mapping of LULC can greatly improve analyses of environmental change and is more capable of reflecting the interactions between human activities and the geographical environment [2], [3].
Although these LULC products are diverse and readily available, they are plagued by large inconsistencies in their accuracies [16]. For example, the overall accuracies of FROM-GLC, MCD12Q1, GLC2000, and GlobeLand30 are 64.9%, 78.8%, 77.9%, and 80.3%, respectively [11], [13], [14], [17]. Accordingly, it remains difficult to compare and combine these products to extract more accurate LULC information related to ecological, hydrological, and climatological studies [18], [19], especially the study of climate change. Previous studies have shown that land-cover data with an accuracy below 80% have a great impact on the results of precipitation research, and the results may become worse as the accuracy continues to decrease [20]. Unfortunately, both the overall and class-specific accuracies of most datasets do not meet the common requirements for regional climate modeling [21]. Furthermore, research has shown that the accuracy of these products over Xinjiang is significantly lower than that in other regions [22], [23], thus failing to meet the high requirements for optimal use in this area and restricting further improvements in research related to regional-climate simulations, desertification monitoring, and ecosystem service assessments. Therefore, it is necessary to generate high-precision land-cover datasets for climate simulation based on existing land-use, land-cover, and some auxiliary datasets.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To overcome these limitations, the objective of this study is to present an ensemble learning method for LULC classification in the topographically complex region of Xinjiang. Furthermore, this study aims to generate a set of high-precision land-cover data via the application of multisource geographical datasets, including Landsat 8 OLI images, FROM-GLC, MCD12Q1, ESA-LC, a digital elevation model (DEM), the enhanced vegetation index (EVI), net primary production (NPP), and leaf area index (LAI). Xinjiang, in northwest China, was chosen as a typical case, and it was divided into three subregions according to its complex topography and climate. Subsequently, different ensemble learning strategies were developed for each subregion. Lastly, training samples were obtained, and the parameters of the machine learning algorithms were determined to assess the accuracy of each ensemble learning-enabled LULC classification. The main contribution of our study lies in taking topographic conditions into account in land-cover classification and developing a new ensemble learning method to accurately classify land cover in Xinjiang. Furthermore, we developed a new land-cover product using a two-level classification system, which addresses the low accuracy of land-cover classification in Xinjiang climate simulation and can be used in other research areas, such as territorial space planning and hydrological simulation. The outcomes of this study will help gain a deeper understanding of the interactions between land and atmosphere in arid areas.
The article is organized as follows: Section II summarizes the theoretical basis of machine and ensemble learning in LULC classification. Section III introduces the study area and data used. Section IV describes the basic framework of the proposed method. Section V analyzes the experimental results. Section VI discusses the broader application and future work. Finally, Section VII summarizes the findings and conclusions of this study.

A. LULC Classification
LULC classification accuracy is closely related to the remote sensing data used and the classification method chosen [24]. With the advancement of computational abilities and remote sensing technologies, machine learning has gradually become one of the most effective methods for LULC classification [25], [26]. Among the available methods, the most widely employed classification algorithms for remote sensing imagery include K-nearest neighbor (KNN), decision trees (DTs), random forest (RF), support vector machine (SVM), artificial neural network (ANN), and extreme learning machine [27].
Previous studies have shown that owing to the complexity of multisource remote sensing imagery data, distinct machine learning algorithms have different advantages when classifying certain LULC types; thus, any single classifier is limited in its ability to significantly improve the accuracy of all LULC types [28]. Accordingly, identification of the optimal classification algorithms remains challenging. For example, in one LULC classification within an individual area, SVM accuracy for irrigated herbs was 92%, while that for oak forests was only 72% [24]. Recently, classifier ensembles (i.e., multiple classifier systems) have received considerable attention in remote sensing image analyses owing to their high classification accuracy by exploiting the advantages of different classifiers while minimizing their limitations [29], [30], [31]. Various combinatorial strategies have been developed and widely used to integrate different classifications. For example, Chen et al. employed SVM, C4.5, DTs, and ANNs to construct ensemble learning classification, resulting in an overall accuracy (OA; 88.12%) and Kappa (0.87) value superior to those of any basic classifiers [32]. Hu et al. used two ensemble methods based on ANNs to classify LULC in the Zoige wetlands of China, showing that the ensemble technology improved the classification ability and stability of any single ANN [26].
In addition, the successful implementation of classification methods largely depends on the characteristics of the study area and the nature of the relevant data [33]. Therefore, to improve the effects of multiclassifier ensemble learning, both the machine learning algorithm and the characteristics of the feature data must be considered [32]. Ultimately, compared to traditional classification methods, machine learning algorithms are efficient and effective because they do not rely on normality assumptions or statistical parameters [34], yet appropriate classification-algorithm selection for a given area remains essential.

B. Machine Learning
The theoretical basis of the ensemble learning classification methods used in the present study is summarized in this section. For a deeper understanding of the theoretical background of a particular algorithm, refer to the references provided.
1) KNN: KNN is a nonparametric machine learning algorithm and one of the simplest instance-based regression and supervised classification techniques [35]. It assigns a category to an unknown sample by evaluating the distance between that sample and its K nearest training samples, using this distance as a measure of similarity [36]. For further details on the KNN algorithm, see [35].
2) SVM: SVM is a machine learning method proposed by Vapnik in the 1990s and was initially applied to the recognition of hand-written digits (i.e., pattern recognition) [37], [38], [39]. It requires only a small number of training samples and is characterized by high noise resistance, support for high-dimensional data, and strong stability [32]. Moreover, it can model complex nonlinear decision boundaries with high accuracy and is not prone to overfitting. Therefore, it is a supervised learning technique commonly used in a range of remote sensing applications [25]. A detailed description of the SVM algorithm can be found in [40].
3) RF: RF is a classification algorithm that integrates multiple unrelated classification and regression (decision) trees using a bagging strategy. It is characterized by reducing generalization errors to achieve unbiased classification, thereby enhancing classification accuracy, particularly for multisource remote sensing classification [41]. Accordingly, RF has been widely employed in LULC classification due to its high accuracy [42]. For a detailed introduction to RF, refer to [43].
4) ANN: ANN is a machine learning algorithm developed to simulate the ability of the human brain to resolve problems related to pattern recognition [34]. It is advantageous owing to its nonlinearity, strong anti-interference, high adaptability, parallel processing, and self-organizing learning process; therefore, ANN has been increasingly applied to remote sensing image classification in recent years [44]. For further details on the ANN algorithm, refer to [45], [46].
5) C4.5: C4.5 is a decision tree algorithm modified on the basis of the previously developed Iterative Dichotomiser 3 algorithm [47], and is characterized by its strong logic, simple rule set, and effective suppression of image noise, which make it suitable for classifying multisource remote sensing image data [48]. For further details on C4.5, refer to [47].

C. Ensemble Learning
Ensemble learning is a method of training multiple machine learning algorithms to improve the predictive performance and classification accuracy based on the additive effect of their advantageous characteristics [49]. In addition to fully utilizing the respective advantages of different classifiers, ensemble learning can address the problem of over-fitting by any single classifier in instances with small amounts of data [50], [51]. Refer to [52] for further details on ensemble learning.

…, respectively [53]. The average annual precipitation in the Taklamakan Desert is <50 mm·yr⁻¹, whereas that in Tianshan is ∼800 mm·yr⁻¹ [54]. Xinjiang has a unique mountain-basin spatial structure [55], with the general topographical characteristics of three mountains and two basins: the Altai Mountains to the north, Tianshan in the middle, and the Kunlun Mountains to the south. The Gurbantungut Desert in the Junggar Basin lies between the Altai Mountains and Tianshan, while the Taklamakan Desert in the Tarim Basin lies between Tianshan and the Kunlun Mountains [56]. The unique geographical location and topographical characteristics of Xinjiang have created a unique ecosystem in the region.

B. Data
Landsat satellite-image data are often used to classify regional-level LULC owing to their free availability and global coverage [57]. Here, because spring and summer images contain most of the phenological changes [24], [58], Landsat 8 L2 images were downloaded from the United States Geological Survey website for Xinjiang. The images from April 1, 2015 to August 31, 2015 were used in LULC classification. The EVI, normalized difference built-up index (NDBI), and modified normalized difference water index (MNDWI) were calculated from Landsat 8 bands 2, 3, 4, 5, and 6, together with the composite spectral index of Landsat 8. Additionally, four sources of LULC data in Xinjiang were used in the present study: 1) CLUDs, 2) FROM-GLC, 3) CCI-LC, and 4) MCD12Q1. To ensure highly accurate LULC classification, the following auxiliary data were also employed: 1) vegetation type maps of 11 populations and 54 types; 2) physical geographical data including elevation (DEM) and slope information; 3) NPP; and 4) LAI. Table I lists all data and sources used in the present study.
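The text does not give the index formulas, so the following is only a minimal sketch using the commonly cited definitions of EVI, NDBI, and MNDWI. It assumes surface reflectance scaled to the 0-1 range and the usual Landsat 8 OLI band assignment (band 2 blue, band 3 green, band 4 red, band 5 near infrared, band 6 shortwave infrared 1). The function names are hypothetical and not taken from the study.

```python
# Sketch under stated assumptions: standard index definitions, reflectance in 0-1.
# Band assignment assumed: B2 blue, B3 green, B4 red, B5 NIR, B6 SWIR1.

def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else float("nan")


def evi(blue, red, nir):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * _ratio(nir - red, nir + 6.0 * red - 7.5 * blue + 1.0)


def ndbi(nir, swir1):
    """Normalized difference built-up index."""
    return _ratio(swir1 - nir, swir1 + nir)


def mndwi(green, swir1):
    """Modified normalized difference water index."""
    return _ratio(green - swir1, green + swir1)


if __name__ == "__main__":
    # Illustrative reflectances for a vegetated pixel (made-up values).
    pixel = {"blue": 0.03, "green": 0.06, "red": 0.04, "nir": 0.40, "swir1": 0.20}
    print("EVI  ", round(evi(pixel["blue"], pixel["red"], pixel["nir"]), 3))
    print("NDBI ", round(ndbi(pixel["nir"], pixel["swir1"]), 3))
    print("MNDWI", round(mndwi(pixel["green"], pixel["swir1"]), 3))
```

In practice the same expressions would be applied per pixel to whole band arrays; scalars are used here only to keep the sketch dependency-free.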

A. Basic Framework
Previous studies have shown that geographical characteristics (e.g., complex landforms) and diverse climates affect LULC classification accuracy [24]. Considering the complex topography and diverse climate of the study area, Xinjiang was divided into three sub-regions, and different ensemble learning strategies were developed for each to optimize overall classification accuracy. The basic framework was as follows (see Fig. 1).
First, the subregional division according to climate and topographic features is discussed in Section IV-B. Second, the spatial consistency of different products was analyzed, and, on this basis, a stratified random sampling method was adopted to obtain machine learning training samples (see Section IV-C). Third, machine learning algorithms suitable for the study area were selected according to different data characteristics of Landsat 8, NPP, EVI, DEM, LAI, etc. (see Section IV-D). Lastly, a confusion matrix was used to evaluate the accuracy of different ensemble strategies of LULC classification across each subregion, and the most suitable technique was selected to map the spatial patterns of LULC in Xinjiang.
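The sampling step in this framework can be illustrated with a small sketch that draws the same fraction of grid cells from every LULC class. The sampling fraction, the `(cell_id, lulc_class)` record layout, and the function name are assumptions made for illustration; the study does not specify them.

```python
import random
from collections import defaultdict


def stratified_sample(cells, fraction, seed=0):
    """Draw the same fraction of cells from each class (at least one per class).

    cells: iterable of (cell_id, lulc_class) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for cell_id, lulc_class in cells:
        by_class[lulc_class].append(cell_id)

    sample = []
    for lulc_class in sorted(by_class):
        ids = by_class[lulc_class]
        k = max(1, round(fraction * len(ids)))
        # rng.sample draws k distinct cells without replacement.
        sample.extend((cell_id, lulc_class) for cell_id in rng.sample(ids, k))
    return sample


if __name__ == "__main__":
    cells = [(i, "grassland") for i in range(100)]
    cells += [(100 + i, "water") for i in range(20)]
    picked = stratified_sample(cells, fraction=0.1)
    print(len(picked), "cells sampled")
```

Sampling per class keeps rare classes represented, which a simple random draw over all cells would not guarantee.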

B. Regionalization
The spatial distribution of vegetation is affected by both climate and terrain variability. For example, changes in topographic characteristics drive the gradual shift in surface vegetation from low mountain deserts, arid grasslands, irrigated farmlands, and broadleaf forests to mid-mountain grasslands, evergreen needleleaf, and mixed forests, and further to alpine meadows/dwarf shrubs [59]. Accordingly, during LULC classifications, the introduction of geographic divisions by climate and topography can reduce the probability of errors. Based on meteorological station data and the DEM, KNN was used to further divide the research region into three subregions: Northern Xinjiang, Tianshan, and Southern Xinjiang (see Fig. 2). Refer to [55] and [60] for details of the partitioning method.

C. Spatial Consistency Analysis
In the present study, an inference rule for spatial data mining based on grid consistency was proposed to identify LULC types, where the primary methods consisted of spatial feature extraction and consistency analyses. First, to facilitate LULC classification within the zoning shape, a fishnet (cell size, 0.25 km²) was created using ArcGIS v.10.2. Second, the four LULC products, vegetation type, and other auxiliary data were extracted at the fishnet feature points using the Extract Multi Values to Points tool. Then, the fishnet feature points and fishnet data were spatially aggregated to obtain fishnet data containing 13 types of attribute information. Lastly, the spatial consistency levels of CLUDs, FROM-GLC, CCI-LC, MCD12Q1, and the vegetation type maps were analyzed using the grid comparison method under the IGBP classification system.

D. Machine Learning Algorithm Selection
During data mining, the characteristics of remote sensing data, as well as the advantages and disadvantages of various machine learning methods, must be considered before classifying LULC [34], [41]. Considering the different characteristics of Landsat 8, NPP, EVI, DEM, LAI, etc., the most suitable machine learning algorithms for LULC classification in Xinjiang were screened according to the following process. First, the EVI and LAI of different vegetation types can be similar; therefore, KNN classification was adopted here because it is based on the similarity of training sample data. Second, because remote sensing data (e.g., Landsat 8 spectral information and the DEM) have nonlinear properties, SVM was employed in the present study: its radial basis function (RBF) kernel maps nonlinear data into a high-dimensional space in which a separating hyperplane can be constructed, making the data linearly separable. Third, we observed that Landsat 8 contained "salt and pepper" noise. When compared with other machine learning techniques, RF showed the greatest insensitivity to noise, training stability, and efficiency; therefore, it was employed here. Fourth, because the spectral mixing degree of Landsat 8 data is high, especially in montane areas with complex topography, ANN was also employed here because it adapts well to the rich texture and high spectral confusion of remote sensing data. Lastly, numerous missing values were observed in the remote sensing data throughout alpine areas (especially in Tianshan) owing to cloud cover. Because C4.5 handles samples with missing values well, it was also selected here. For different machine learning combinations, model training was carried out across Xinjiang as well as in the three subregions to identify the most suitable parameters (see Section V-B for specific parameters).

E. Ensemble Learning Strategy
Knowledge of the differences between base classifiers is key to constructing ensemble learning [61]. Considering the differences, advantages, and disadvantages of different machine learning techniques, different stacking ensemble strategies across Xinjiang and its three subregions are proposed here. First, the random sample data were divided into training and validation data at a 3:1 ratio. Second, two or more of the base classifiers (KNN, SVM, RF, ANN, and C4.5) were selected as level 0 of the stacking ensemble model, and RF was selected as level 1. Lastly, across Xinjiang and the three subregions, the ensemble learning models under different stacking strategies were trained, and the OA, producer's accuracy (PA), user's accuracy (UA), and Kappa were compared to obtain the optimal LULC classification ensemble strategy.
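A two-level stacking model of this kind can be sketched with scikit-learn, purely as an illustration and not as the implementation used in the study. Several choices below are assumptions: the data are synthetic, all hyperparameter values are placeholders, and scikit-learn offers no C4.5, so an entropy-based `DecisionTreeClassifier` (a CART variant) stands in for it. The function names `build_stack` and `run_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stack(seed=0):
    """Level 0: SVM, RF, ANN, and a tree stand-in for C4.5; level 1: RF."""
    level0 = [
        ("svm", make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", C=1.0, gamma="scale"))),
        ("rf", RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                      random_state=seed)),
        ("ann", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(20,),
                                            activation="logistic",  # sigmoid
                                            max_iter=1000,
                                            random_state=seed))),
        # Assumption: entropy-based CART used in place of C4.5.
        ("tree", DecisionTreeClassifier(criterion="entropy", random_state=seed)),
    ]
    level1 = RandomForestClassifier(n_estimators=100, random_state=seed)
    return StackingClassifier(estimators=level0, final_estimator=level1, cv=3)


def run_demo(seed=0):
    """Train on synthetic data with a 3:1 split and return (OA, Kappa)."""
    X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                               n_classes=3, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = build_stack(seed).fit(X_train, y_train)
    predicted = model.predict(X_val)
    return accuracy_score(y_val, predicted), cohen_kappa_score(y_val, predicted)


if __name__ == "__main__":
    oa, kappa = run_demo()
    print(f"OA = {oa:.3f}, Kappa = {kappa:.3f}")
```

Dropping or adding entries in `level0` gives the three-algorithm and four-algorithm variants compared in the text; the level-1 learner is trained on cross-validated level-0 predictions, which limits leakage from the training data.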

F. Evaluation Indicators
The classification effect of machine learning algorithms represents the most critical evaluation index. A confusion matrix was used to evaluate the classification abilities of different algorithms and ensemble learning strategies. Accordingly, the OA, PA, UA, and Kappa were calculated from the confusion matrix according to the following equations:

$$\mathrm{OA} = \frac{1}{N}\sum_{i=1}^{r} X_{ii}$$

$$\mathrm{PA}_i = \frac{X_{ii}}{X_{+i}}, \qquad \mathrm{UA}_i = \frac{X_{ii}}{X_{i+}}$$

$$\mathrm{Kappa} = \frac{N\sum_{i=1}^{r} X_{ii} - \sum_{i=1}^{r} X_{i+}X_{+i}}{N^{2} - \sum_{i=1}^{r} X_{i+}X_{+i}}$$

where N represents the total number of training samples; r is the number of rows in the confusion matrix; X_{ii} is the number of samples in row i and column i of the confusion matrix (i.e., on the diagonal); and X_{i+} and X_{+i} are the marginal totals of row i and column i, respectively. Although the OA and Kappa are the two most popular metrics for assessing classification accuracy, the samples used to calculate Kappa are not always independent because the same test set is used when evaluating the accuracy of each map [34]. Hence, the pairwise Z-score test was used to evaluate whether the differences in classification accuracy among different ensemble learning strategies were statistically significant [62]. A Z-score greater than 1.96 was considered statistically significant at the 5% level.
Field surveys and visual inspection are the ideal methods for selecting samples to obtain high land-cover classification accuracy [13], [63]. Therefore, to compensate for the shortcomings of the confusion-matrix-based evaluation, 42 sampling points were selected across Xinjiang, which together covered all land cover types described in this study. For each sampling point, we identified the real land cover type through field photography and Google Earth.
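The accuracy measures in this section can be computed from a confusion matrix with a few lines of code. The sketch below assumes that rows hold the mapped classes and columns hold the reference classes, so PA uses column totals and UA uses row totals; if the matrix is laid out the other way, the two swap. The Z-score helper takes the Kappa variances as inputs rather than estimating them, because the variance estimator used in the study is not stated. All names are hypothetical.

```python
import math


def accuracy_metrics(matrix):
    """Return OA, Kappa, and per-class PA and UA for a square confusion matrix.

    Assumed layout: matrix[i][j] counts samples mapped as class i whose
    reference class is j.
    """
    r = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(r)]
    row_totals = [sum(matrix[i]) for i in range(r)]
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]

    oa = sum(diagonal) / n
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    kappa = (n * sum(diagonal) - chance) / (n * n - chance)
    nan = float("nan")
    pa = [diagonal[i] / col_totals[i] if col_totals[i] else nan for i in range(r)]
    ua = [diagonal[i] / row_totals[i] if row_totals[i] else nan for i in range(r)]
    return {"OA": oa, "Kappa": kappa, "PA": pa, "UA": ua}


def kappa_z_score(kappa_a, var_a, kappa_b, var_b):
    """Pairwise Z statistic for two Kappa values with given variances."""
    return abs(kappa_a - kappa_b) / math.sqrt(var_a + var_b)


if __name__ == "__main__":
    demo = [[45, 5],
            [10, 40]]  # made-up two-class example
    print(accuracy_metrics(demo))
    z = kappa_z_score(0.90, 1e-4, 0.85, 1e-4)  # made-up Kappa values and variances
    print("significant at 5%:", z > 1.96)
```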

A. Spatial Consistency
On comparing the five remote sensing products, the inconsistency among the LULC classifications was readily apparent, particularly for the forest classes. The CAS product divided forest into forested land, open forest, shrubland, and other forests, whereas the other products split forest cover into evergreen and deciduous on the basis of leaf fall, and into broadleaf and needleleaf on the basis of leaf size. Accordingly, to facilitate comparison, the different classification schemes were converted into the IGBP scheme (Table S1, see the Supporting Information). A consistency level of 0 represents complete disagreement between the five products, and 5 indicates perfect consistency among them. The highest consistency for forest was 4 because the forest classes in the CAS classification cannot be converted into the IGBP scheme, whereas for other land types it was 5. Here, a cell was assigned to a forest type when the consistency for that forest type was ≥3, whereas all other land types were assigned only when the consistency was ≥4. Fig. 3 shows the spatial consistency level of the five LULC products as well as the sampling points under the IGBP classification system. Notably, complete or level 4 consistency was achieved for water areas, deserts, and the Gobi belt. In the transition zone between oasis and desert or mountains, the observed consistency level was generally <3. Fig. 4 shows the proportion of machine learning training samples in different regions.
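The consistency rule described above amounts to counting how many products agree on the class of a grid cell and accepting the modal class only above a class-dependent threshold. The following sketch assumes the labels have already been converted to IGBP names, with `None` marking a product whose class could not be converted. The set of forest class names and the function name are illustrative assumptions.

```python
from collections import Counter

# Hypothetical IGBP forest class names used only for this illustration.
FOREST_CLASSES = {
    "evergreen needleleaf forest",
    "deciduous needleleaf forest",
    "deciduous broadleaf forest",
    "mixed forest",
}


def consensus_label(labels, forest_classes=FOREST_CLASSES):
    """Return (accepted_label_or_None, consistency_level) for one grid cell.

    labels: the class assigned by each product; None means no usable class.
    Forest types are accepted at level >= 3, all other types at level >= 4.
    """
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None, 0
    label, level = votes.most_common(1)[0]
    threshold = 3 if label in forest_classes else 4
    return (label if level >= threshold else None), level


if __name__ == "__main__":
    print(consensus_label(["water"] * 5))
    print(consensus_label(["mixed forest"] * 3 + ["grassland", None]))
    print(consensus_label(["grassland"] * 3 + ["barren"] * 2))
```

Cells that return `None` correspond to low-consistency areas such as the oasis-desert transition zone and would be excluded from the training sample pool.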

B. Machine Learning Parameters
The key to KNN classification lies in selecting an appropriate K value; thus, KNN classification accuracy was tested here using K ranging from 1 to 30. The RBF kernel of SVM is primarily controlled by two parameters: the cost (C) and the kernel parameter (γ). Here, the optimal combination of C (2⁻², 2⁻¹, 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, 2⁶, 2⁷) and γ (2⁻⁵, 2⁻⁴, 2⁻³, 2⁻², 2⁻¹, 2⁰, 2¹, 2², 2³, 2⁴) was tested on training and validation samples. RF classification relies on the number of variables randomly sampled as candidates at each split (Mtry) and the number of trees (Ntree) [34], [64]. It is recommended that Mtry be set as the square root of the number of input variables, and Ntree be set as a multiple of 500 to achieve the optimal classification effect [42]. Therefore, different Mtry values from 1 to 16 and Ntree values from 500 to 5000 (at intervals of 500) were examined here. ANN accuracy depends largely on the number of hidden layer nodes, which ranged from 1 to 50 in the present study [34]. Furthermore, the input received by each neuron is transformed using an activation function to ensure nonlinear prediction [26]. This study used the sigmoid function as the activation function [65]. In contrast, C4.5 does not require any parameter setting due to its inherent simplicity. The machine learning parameters are shown in Table II.
Table III shows the effects of different classifiers across Xinjiang and the three subregions, revealing that KNN had the best classification effect in all assessed regions (except for Tianshan), with Northern Xinjiang showing the best performance (OA = 94.67%); however, according to the PA and UA (see Fig. 5), KNN maintained poor classification effects on evergreen needleleaf forests, shrublands, paddy fields, and industrial and mining land across Xinjiang. Specifically, the classification accuracy for evergreen needleleaf forests, mixed forests, shrublands, dry lands, paddy fields, and industrial and mining land in Tianshan was poor, whereas the classification of evergreen needleleaf forests, mixed forests, and paddy fields was poor in Northern and Southern Xinjiang. This might be due to the sensitivity of KNN to class imbalance [66]; some of the LULC types with high PA or UA increased the OA [25]. Therefore, OA and Kappa, in addition to PA and UA, must be considered when employing machine learning to classify LULC [32], [67], [68].
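The parameter ranges given in this section can be enumerated as explicit search grids, which makes their sizes easy to check. The sketch below only lists the candidate settings and picks the best one given a scoring function; the dictionary keys and function names are hypothetical, and the scoring function in the usage example is a placeholder rather than a real accuracy estimate.

```python
from itertools import product


def parameter_grids():
    """Enumerate the candidate settings described in the text."""
    svm_c = [2.0 ** e for e in range(-2, 8)]       # 2^-2 ... 2^7
    svm_gamma = [2.0 ** e for e in range(-5, 5)]   # 2^-5 ... 2^4
    rf_mtry = range(1, 17)                         # 1 ... 16
    rf_ntree = range(500, 5001, 500)               # 500 ... 5000
    return {
        "knn": [{"k": k} for k in range(1, 31)],
        "svm": [{"C": c, "gamma": g} for c, g in product(svm_c, svm_gamma)],
        "rf": [{"mtry": m, "ntree": n} for m, n in product(rf_mtry, rf_ntree)],
        "ann": [{"hidden_nodes": h} for h in range(1, 51)],
    }


def select_best(candidates, score):
    """Return the candidate with the highest score (e.g., validation OA)."""
    return max(candidates, key=score)


if __name__ == "__main__":
    grids = parameter_grids()
    for name, candidates in grids.items():
        print(name, len(candidates), "settings")
    # Placeholder score: prefers K close to 7; replace with validation accuracy.
    print(select_best(grids["knn"], score=lambda p: -abs(p["k"] - 7)))
```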

C. Classification Effect of Single Classifier
Excluding KNN, C4.5 had the highest classification accuracy across Xinjiang (OA = 86.91%; Table III), whereas SVM had the highest classification accuracy in Tianshan (OA = 90.73%), and RF had the highest classification accuracy in Northern and Southern Xinjiang (OA = 92.87% and 92.54%, respectively). When comparing the classification effects for the three subregions, the average OAs for Southern Xinjiang, Northern Xinjiang, and Tianshan were 91.98%, 91.46%, and 90.13%, respectively, notably higher than the average OA across Xinjiang (86.26%). Thus, geographically subdividing the study area based on topographic conditions further improved the LULC classification accuracy of machine learning. Additionally, Tianshan maintained the lowest accuracy among the three subregions, indicating that terrain may be the reason for the low classification accuracy and underscoring the importance of considering topographic factors during LULC classification [69], [70].

D. Ensemble Learning Classification
Accordingly, this study adopted SVM, RF, ANN, and C4.5 to construct various ensemble learning models for LULC classification in Xinjiang and its three subregions. The OA and Kappa of each ensemble strategy were compared to select the optimal classification method, and the accuracy results are shown in Table IV.
The greatest level of improvement in LULC classification accuracy was achieved across Xinjiang, with the OA and Kappa increasing by ≥6.85% and 8.25%, respectively. By subregion, Tianshan and Northern Xinjiang were the next most improved, with OAs increasing by ≥5.60% and 5.24%, and Kappa values increasing by ≥6.63% and 5.99%, respectively. The lowest increase was observed in Southern Xinjiang, where the OA and Kappa increased by ≥5.01% and 6.25%, respectively. Thus, it was concluded here that ensemble learning classification based on different single classifier combinations can significantly improve LULC classification accuracy across Xinjiang and its three subregions, especially in montane areas.
Moreover, by comparing the different ensemble learning classifications in the three subregions and across Xinjiang, the OA and Kappa of the four-algorithm ensemble strategy (SVM, RF, ANN, and C4.5) in Tianshan and across Xinjiang were found to be significantly higher than those of any three-algorithm ensemble, where the OA increased by 1.99%-2.33% and 1.88%-2.75%, and the Kappa increased by 2.58%-4.52% and 3.12%-5.48%, respectively. However, in Northern and Southern Xinjiang, the OA values of the three-algorithm ensemble strategies were similar, with an increase of ≤1.00% and 0.80%, and Kappa increased by ≤0.48% and 1.01%, respectively (see Table IV). Thus, the number and type of classifiers in ensemble learning have a strong influence, with a theoretical ideal number and type necessary for achieving an optimal classification effect [71], [72].
Table V shows the Z-scores of the significance tests for pairs of ensemble learning combination strategies across Xinjiang and the three subregions. There were significant differences between the combination of four different machine learning algorithms (i.e., SVM, RF, C4.5, and ANN) and each of the combinations of three different machine learning algorithms (Z …). This showed that the ensemble learning strategy combining four machine learning algorithms (i.e., SVM, RF, ANN, and C4.5) was superior across Xinjiang and Tianshan, whereas the strategy combining three algorithms (i.e., SVM, RF, and ANN) worked best for Northern and Southern Xinjiang.
Additionally, compared with the ensemble learning classification across Xinjiang, the ensemble learning in the three subregions can significantly improve the accuracy of LULC classification, with the OAs of Southern Xinjiang, Northern Xinjiang, and Tianshan increasing by 6.86%, 6.70%, and 5.87%, respectively. Thus, geographically subdividing Xinjiang based on topographic conditions effectively reduced the misclassification of vegetation types caused by terrain differences or highly heterogeneous surface vegetation in the LULC classification across the complex conditions of Xinjiang.
Fig. 6 shows the PA and UA of different LULC types throughout Xinjiang and its subregions under different ensemble learning strategies. Compared with single-classifier machine learning classification (see Fig. 5), ensemble learning significantly improved the PA and UA of different land classes. Across the entire region of Xinjiang, forests (evergreen needleleaf, deciduous needleleaf, deciduous broadleaf, mixed, and shrubs), urban, industrial and mining, and snow and ice exhibited the most obvious improvements, with PA and UA increasing by 3.55%-20.22% and 4.81%-14.98% for forests, 11.52% and 12.82% for urban, 14.81% and 11.85% for industrial and mining, and 12.44% and 19.68% for snow and ice, respectively. By subregion, the PA and UA improvements in Northern Xinjiang were similar to those across the entire region of Xinjiang for forests (8.42%-19.93% and 8.54%-13.42%, respectively), urban (15.29% and 11.29%), paddy fields (21.41% and 18.81%), industrial and mining (17.85% and 13.98%), as well as snow and ice (11.59% and 13.68%).

E. Ensemble Learning Effects on Classification
The PA and UA in Tianshan increased most markedly for forests (6.15%-24.02% and 6.70%-18.01%, respectively), paddy fields (17.36% and 24.56%), industrial and mining (20.67% and 25.54%), and permanent wetlands (14.32% and 13.68%). In Southern Xinjiang, the largest improvements in PA and UA were observed for forests (5.07%-16.12% and 5.73%-24.64%, respectively), paddy fields (10.66% and 15.13%), industrial and mining (13.48% and 10.98%), permanent wetlands (17.84% and 13.85%), and snow and ice (20.64% and 12.08%). Overall, the correct classification of forests, permanent wetlands, paddy fields, urban, industrial and mining, and snow and ice was the key to the higher classification performance of ensemble learning compared with any single classifier.
Table VI compares the PAs and UAs for Xinjiang and its subregions. Compared with ensemble learning over the entire region of Xinjiang, subregion ensemble learning showed three improvement trends across land categories. First, shrublands, permanent wetlands, dry land, and paddy fields showed significant improvements, with their PAs and UAs increasing by an average of 7.04% and 7.21%; 11.15% and 6.82%; 5.29% and 5.38%; and 7.03% and 7.18%, respectively. A likely reason is that the spectral and phenological characteristics of the same vegetation type differ widely between geographical regions, which makes that type difficult to identify with a single ensemble across Xinjiang; subregion-level ensemble learning reduces the within-type differences in surface spectra and vegetation phenology, thereby improving the classification. Second, the improvements for mixed forests, evergreen needleleaf, deciduous needleleaf, and deciduous broadleaf forests, as well as grasslands, croplands/natural vegetation mosaics, urban, rural, and industrial and mining, were clear in only one or two subregions. For example, grassland classification improved significantly in Northern Xinjiang but not in either of the other two subregions. Lastly, snow and ice, and water showed insignificant improvement, and even decreases in accuracy, in the three subregions. For example, the PA of snow and ice in Northern Xinjiang and Southern Xinjiang decreased by 1.08% and 3.94%, respectively, whereas the UA of water in Tianshan decreased by 2.45%, possibly because these LULC types are distinctive and do not require ensemble learning methods to identify them. The ensemble may overfit the spectral signatures of these land classes, resulting in the mild decrease in accuracy observed. Furthermore, ensemble learning under complex topographic conditions may have been limited by too few training samples.
Fig. 7 shows the spatial distribution of LULC in Xinjiang in 2015 based on ensemble learning classification, in which the three-algorithm (SVM, RF, and ANN) ensemble strategy was used in Northern and Southern Xinjiang and the four-algorithm (SVM, RF, ANN, and C4.5) ensemble strategy was used in Tianshan. Evergreen needleleaf forests were mainly distributed in the Altai Mountains of Northern Xinjiang, deciduous needleleaf and mixed forests mainly in Tianshan, and deciduous broadleaf and shrub forests mainly on the oasis edges of Southern Xinjiang. Grassland and sparse vegetation were mainly distributed in the Altai Mountains, Tianshan, the Kunlun Mountains, and the desert transition zone of Northern Xinjiang; grasslands were also abundant at the oasis edges. Dry land, paddy fields, urban, and rural were mainly distributed near river oases, wetlands around rivers and lakes, and snow and ice in the Tianshan and Kunlun Mountains.
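The PA and UA values discussed in this subsection are per-class ratios read from a confusion matrix. As a minimal sketch, assuming a matrix whose rows are reference classes and whose columns are mapped classes, they could be computed as follows. The class names and counts are hypothetical and are not taken from Table VI.

```python
def producer_user_accuracy(matrix, labels):
    """Return {label: (PA, UA)} from a square confusion matrix.

    Assumed layout: matrix[i][j] counts samples whose reference class is
    labels[i] and whose mapped (predicted) class is labels[j].
    """
    n = len(labels)
    result = {}
    for k in range(n):
        reference_total = sum(matrix[k])                    # row sum
        mapped_total = sum(matrix[i][k] for i in range(n))  # column sum
        correct = matrix[k][k]
        pa = correct / reference_total if reference_total else float("nan")
        ua = correct / mapped_total if mapped_total else float("nan")
        result[labels[k]] = (pa, ua)
    return result


def overall_accuracy(matrix):
    """Share of all samples that lie on the matrix diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


# Hypothetical counts for three classes (illustration only).
LABELS = ["forest", "water", "snow_ice"]
MATRIX = [
    [45, 3, 2],   # reference forest
    [6, 40, 4],   # reference water
    [1, 7, 42],   # reference snow_ice
]

if __name__ == "__main__":
    for label, (pa, ua) in producer_user_accuracy(MATRIX, LABELS).items():
        print(f"{label}: PA={pa:.2%} UA={ua:.2%}")
    print(f"OA={overall_accuracy(MATRIX):.2%}")
```

A class can gain PA while losing UA (or the reverse) when an ensemble shifts omission errors into commission errors, which is why both values are reported per class.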

F. LULC Products Based on Ensemble Learning
Based on the proposed LULC classification method, an LULC map (resolution 500 m) covering the whole of Xinjiang and its three subregions was generated under the IGBP classification scheme. The coordinate system is WGS84 with a horizontal Mercator projection. The data can be opened, viewed, and processed using Esri ArcGIS software. The product can be used for climate change simulations, as well as for assessing ecological services and the hydrological cycle across Xinjiang.

VI. DISCUSSION
This section further discusses potential improvements, comparison with other land-cover products, broader application, and future scope of this study.

A. Limitations
In this study, owing to limited data sources, remote sensing data with different spatial resolutions were resampled to a coarser resolution to produce the final land cover product at 500 m. However, the raster data model represents space with a regular grid, and its discrete cells cannot reproduce the exact boundaries of spatial objects. Therefore, resampling high-resolution data to coarser resolutions inevitably loses spatial information. Properties of a spatial object such as its area, shape, and location can be lost during the process [73]. The consistency of topological relations between different spatial objects may also be lost [74]. This introduces different types of errors and uncertainties when the resampled product is used in further research.
For classified land-use data, effective approaches to reduce the loss in spatial properties have been proposed. Specifically, many studies have focused on vector-to-raster conversion and have put forward methods that preserve an individual spatial property such as area, topology, or shape [75], [76], or that achieve balanced preservation of several properties [77]. In addition, some researchers have addressed the resampling process itself and presented an area-preserving method for maintaining the area property after resampling categorical raster data [78]. In this study, we chose the majority resampling method to upscale the remote sensing data and reduce the loss in spatial properties. However, the errors and uncertainties introduced by the resampling process remain unclear. In the future, we intend to evaluate these errors and investigate effective approaches to better preserve important spatial properties for land cover classification.
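To make the area loss concrete, the following is a minimal sketch of majority (modal) resampling of a categorical raster, assuming square aggregation blocks and integer class codes. The function names, the tie-breaking rule, and the small grid are hypothetical illustrations, not the processing chain used in this study.

```python
from collections import Counter


def majority_resample(grid, factor):
    """Upscale a categorical raster by taking the modal class per block.

    grid   : list of equal-length rows of integer class codes
    factor : number of fine cells per coarse cell along each axis
    Ties are broken in favour of the smallest class code (an arbitrary
    rule chosen here only to make the result deterministic).
    Partial blocks at the right and bottom edges are aggregated as they are.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            counts = Counter(
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            )
            # Highest count wins; among equal counts, the smallest code wins.
            best = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
            coarse_row.append(best[0])
        coarse.append(coarse_row)
    return coarse


def class_area_share(grid, code):
    """Fraction of cells in the grid that carry the given class code."""
    cells = [v for row in grid for v in row]
    return cells.count(code) / len(cells)


# Hypothetical 4 x 4 fine-resolution grid with three classes.
FINE = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 1, 2],
    [3, 1, 1, 1],
]

if __name__ == "__main__":
    coarse = majority_resample(FINE, 2)
    print(coarse)
    # Compare the area share of class 2 before and after upscaling.
    print(class_area_share(FINE, 2), class_area_share(coarse, 2))
```

In this toy grid the share of class 2 changes after aggregation because its minority cell in the lower-right block is absorbed by the dominant class, which is the kind of area error that area-preserving resampling methods aim to limit.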

B. Comparison With Other Land Cover Products
The difference between classification schemes is one of the main factors leading to inconsistent classification results [16]. This also makes rigorous comparison and collaborative use of different maps challenging [13], [23]. Moreover, to compare land cover datasets that use different schemes, it is generally necessary to convert land cover products with more detailed classes to land cover classification schemes with fewer classes. This may reduce the ability to describe detailed land cover features, and in some cases the conversion is not possible; for example, forest land in the CAS classification scheme cannot be converted to the IGBP classification scheme [79].
The main purpose of this study was to generate a new land cover data product suitable for simulating climate change across Xinjiang, and the internationally used IGBP land cover classification system is well suited to widely used climate models such as WRF and MM5 [80]. Because the MCD12Q1 [81] and CCI-LC [12] land cover products either use the IGBP classification system or can be easily converted to it, we took both products at a resolution of 500 m and compared their accuracy with that of our product using field sampling points. As can be seen from Table S2 (see the Supporting Information), the classification accuracy of the MCD12Q1 and CCI-LC land cover products in Xinjiang was only 59.52% (25 of 42 sampling points correct) and 66.67% (28 of 42 sampling points correct), respectively, which is consistent with the results of previous studies [16], [23], [79]. In comparison, the sampling accuracy of our land cover product reached 88.10% (37 of 42 sampling points correct).

C. Ensemble Learning Strategy of This Study
In this study, Xinjiang, with its highly heterogeneous LULC under complex topographical conditions, was chosen as a representative study region and was divided into three subregions according to climatic and topographic characteristics. Different ensemble learning strategies were then formulated, and the accuracy of land cover classification was compared for each subregion. Furthermore, the significance test results (see Table V) give the Z-scores for pairwise comparisons of the ensemble learning combination strategies across Xinjiang and its three subregions.
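The subregion-specific strategy can be pictured as a lookup from subregion to ensemble members, followed by a step that combines the members' outputs. The sketch below uses the member lists reported for the final map, but combines labels by plurality vote purely as a simple stand-in: the discussion names stacking as the primary ensemble strategy, so the combiner, the tie-breaking rule, and the function name here are assumptions for illustration only.

```python
from collections import Counter

# Ensemble members per subregion, as reported for the final LULC map.
ENSEMBLE_MEMBERS = {
    "Northern Xinjiang": ("SVM", "RF", "ANN"),
    "Tianshan": ("SVM", "RF", "ANN", "C4.5"),
    "Southern Xinjiang": ("SVM", "RF", "ANN"),
}


def combine_labels(subregion, predictions):
    """Combine per-algorithm labels for one pixel by plurality vote.

    predictions : dict mapping algorithm name -> predicted class label.
    Only the algorithms listed for the subregion take part in the vote.
    Ties go to the label voted by the earliest-listed member
    (a hypothetical rule, not taken from the study).
    """
    members = ENSEMBLE_MEMBERS[subregion]
    votes = [predictions[name] for name in members]
    counts = Counter(votes)
    top = max(counts.values())
    # Return the first vote, in member order, that reaches the top count.
    for label in votes:
        if counts[label] == top:
            return label


if __name__ == "__main__":
    pixel = {"SVM": "grassland", "RF": "shrubland",
             "ANN": "grassland", "C4.5": "shrubland"}
    # C4.5 is ignored outside Tianshan, so the same pixel can be
    # resolved differently depending on the subregion it falls in.
    print(combine_labels("Northern Xinjiang", pixel))
    print(combine_labels("Tianshan", pixel))
```

Keeping the member list in a table separates the geographic decision (which algorithms to trust where) from the combination rule, so a stacking meta-classifier could replace the vote without changing the lookup.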
The results of this study show that in the complex topographic area of Tianshan, the four-algorithm ensemble strategy improved the OA by 1.99%-2.33% compared with the three other ensemble learning classifications (see Table IV), and the difference in classification accuracy passed the Z-score significance test (Z ≥ 1.96, p ≤ 0.05) (see Table V). In Tianshan, with its complex topographical conditions and extremely uneven spatial distribution of LULC, the integration of more machine learning algorithms may be necessary for greater accuracy; using fewer algorithms may be insufficient for constructing an ensemble that is stronger than a single classifier in the LULC classification of Tianshan [71]. In contrast, the OA for Northern Xinjiang and Southern Xinjiang improved by at most 0.99% and 0.80%, respectively, and the differences were not significant. As previous studies have indicated, an ensemble has an optimal number of component classifiers at which it obtains the most accurate results. Increasing or decreasing the number of classifiers from this ideal point may worsen the prediction or add no benefit to the overall performance [72], [82]. Some studies suggest the number of class labels in a dataset as the ideal number of component classifiers [72], [83]. However, real-world data are very complex, and it is still challenging to determine the ideal number of classifiers in ensemble learning. Adding more machine learning algorithms to an ensemble might not always yield a better classification [72]. For example, compared with the combination of three machine learning algorithms (i.e., SVM, RF, and ANN), the OA of the combination of four machine learning algorithms (i.e., SVM, RF, C4.5, and ANN) improved by only 0.04% and 0.51% in Northern and Southern Xinjiang, respectively.
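The significance threshold quoted above (Z ≥ 1.96, p ≤ 0.05) corresponds to a two-sided test at the 5% level under the standard normal distribution. The exact test statistic behind Table V is not restated in this section, so the sketch below shows only one common form, a two-proportion Z-test on overall accuracy that assumes independent validation samples. The accuracies and sample sizes in the example are hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic for the difference between two accuracies.

    Assumes the two accuracies come from independent validation samples
    of sizes n1 and n2 and uses the unpooled standard error.
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical accuracies and sample sizes (illustration only).
    z = two_proportion_z(0.96, 3000, 0.94, 3000)
    print(f"Z = {z:.2f}, p = {two_sided_p(z):.4f}, "
          f"significant = {abs(z) >= 1.96}")
```

When both classifications are validated on the same samples, a paired test such as McNemar's test is often preferred, because the independence assumption above no longer holds.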

D. Broader Application and Future Work
Previous studies have shown that the complexity of topographic conditions is the main factor leading to the high heterogeneity of LULC, which greatly limits the improvement of LULC classification accuracy [24]. Accordingly, the influence of topographic conditions must be considered when classifying LULC for large areas [69]. To reduce the impact of topographic factors on ensemble learning classification, it is necessary to subdivide complex topographic areas according to climate and topographic characteristics. The method used here proved effective for improving accuracy. Following this logic, geographically subdividing the study area based on topographic conditions can potentially be an effective method for further improving the classification accuracy of typical arid regions [34], [84]. Therefore, the subregions used here could potentially be subdivided further, especially the Tianshan region, for additional improvements in accuracy [60], [85], [86].
Future efforts should focus on the following aspects. First, five machine learning algorithms (KNN, SVM, RF, ANN, and C4.5) were applied in the present study; a larger or more diverse set of machine learning algorithms could be employed for more efficient ensemble learning. In addition, the stacking strategy is primarily used in ensemble learning for classification, and other strategies, such as bagging and boosting, should also be assessed for their capacity to improve LULC classification accuracy. Moreover, Xinjiang was divided into only three regions: Northern Xinjiang, Tianshan, and Southern Xinjiang. Future research should subdivide Xinjiang further to improve the classification accuracy of land cover; for example, Tianshan can be subdivided into North Tianshan, East Tianshan, and West Tianshan [34]. Furthermore, the presented ensemble learning approach, which accounts for differences between geographical regions, can also be applied in other locations with similar geographical characteristics, including montane areas with complex topographic conditions and arid desert-oasis mosaic landscapes with highly heterogeneous surface vegetation types. Lastly, the resolution of the LULC product generated in this study was 500 m, and higher-resolution remote sensing imagery could be incorporated to improve the resolution of the results.

VII. CONCLUSION
Here, an ensemble learning approach was proposed for performing LULC classification in Xinjiang, northwest China. The study area was divided into three subregions according to climate and topographic characteristics, and five machine learning algorithms (KNN, SVM, RF, ANN, and C4.5) were integrated to develop different ensemble learning strategies for LULC classification. The accuracy and efficiency of each proposed ensemble learning approach were evaluated and analyzed, and the following three primary conclusions were drawn.
First, compared with individual machine learning algorithms, the ensemble learning strategy proposed here significantly improved LULC classification accuracy, with the approach having the greatest effect on Tianshan (OA and Kappa values were increased by 5.60% and 6.63%, respectively), followed by Northern Xinjiang (5.24% and 5.99%), and Southern Xinjiang (5.01% and 6.25%). In addition, the correct classification of different forest types via ensemble learning was the main contributor driving the higher classification accuracy.
Second, optimal combinations of machine learning algorithms were revealed for use in ensemble learning so that LULC classification was optimized for each subregion. Specifically, adopting SVM, RF, and ANN in the ensemble learning strategy was the most efficient for Northern and Southern Xinjiang (OA values of 96.35% and 96.92%, respectively), whereas a strategy employing SVM, RF, ANN, and C4.5 for ensemble learning was the most efficient for Tianshan (OA = 96.33%).
Lastly, the impacts of the proposed ensemble learning approach on classifying different land types can be summarized in three scenarios. First, the proposed approach was effective in all three subregions for shrublands, permanent wetlands, dry land, and paddy fields. Second, the approach was effective in only one or two subregions for mixed forests, evergreen needleleaf, deciduous needleleaf, and deciduous broadleaf forests, as well as grasslands, croplands/natural vegetation mosaics, urban, rural, and industrial and mining. Third, the approach was not effective in any of the three subregions for snow and ice, and water.
In summary, we incorporated the complex terrain and climate conditions of Xinjiang into land cover classification and developed a new ensemble learning method to achieve accurate land cover classification. Furthermore, we produced a new land cover product with a two-level classification system, which addresses the problem of low land cover classification accuracy in climate simulation for Xinjiang and can be used in other research areas, such as territorial space planning and hydrological simulation.