Local and Global Spatial Information for Land Cover Semisupervised Classification of Complex Polarimetric SAR Data

Each of the three satellites constituting the RADARSAT Constellation Mission (RCM) provides compact polarimetric synthetic aperture radar (CP SAR) data. The complex CP data have similar properties to the complex quad polarimetric (QP) data provided by prior RADARSAT missions. In this article, a land cover classification method using spatial information is designed based on the statistical characteristics of the complex CP and QP SAR data. First, the local spatial dependency among pixels is captured by superpixels. Second, a graph is constructed on the superpixels to model the global spatial dependency among superpixels. The land cover classification image with land cover type labels is then estimated by propagating labels from the few labeled superpixels to the unlabeled superpixels. Classification of two RCM complex CP and QP scenes demonstrates that the proposed method, with few labeled pixels, provides much higher classification accuracy than methods that do not exploit global spatial dependency.


I. INTRODUCTION
T HE RADARSAT Constellation Mission (RCM) satellites, as the successors to the RADARSAT 1-2 satellites, were launched in June 2019. These satellites provide data in a variety of acquisition modes that are suitable for different applications such as maritime surveillance, ecosystem modeling, and disaster management. In addition to the RADARSAT 1-2 dual-polarized (DP) and quad polarimetric (QP) synthetic aperture radar (SAR) modes, RCM also has a compact polarimetric (CP) SAR mode. A CP SAR is a dual-polarized SAR that coherently captures the received backscatter, and, in this manner, the phase difference information between the two received linear polarizations is maintained [1]. Therefore, using CP is advantageous over using only intensity images: like-and cross-polarized two-dimensional mappings of radar brightness. A simple yet robust implementation of compact polarimetry is to radiate a circular polarization in transmit and receive two orthogonal linear polarization on receive. This mode has been implemented on the RCM.
Land cover classification is an important task in monitoring the Earth's surface. The high-resolution RCM complex SAR data can be used for land cover classification [2]. QP SAR data have the full information acquired by a polarimetric SAR; however, the typical swaths covered by a QP SAR are relatively small. Therefore, CP SAR is an attractive alternative since, as demonstrated by many studies [3], [4], [5], [6], CP classifications are comparable to those from a QP SAR, and the RCM CP scenes can be acquired in wide swaths (∼350 km), which are more suitable for studying larger Earth regions. Many studies on land cover classification using SAR data are dedicated to QP SAR data [7], [8], [9], [10], and there has been very limited work on CP land cover classification [11], [12]. In this article, we propose a land cover classification method that models the "local" (using a superpixel-based approach) and "global" (using a graph-based formulation) spatial dependency information in the QP and CP SAR data types.
Using superpixels [13] is advocated in computer vision algorithms since compared with the rigid pixel representation of images, a superpixel representation utilizes the "local" spatial dependency between adjacent pixels and, by forming pixel groups, greatly reduces the number of image primitives [14], [15], [16]. A superpixel segmentation algorithm is expected to preserve the image boundaries by generating homogeneous superpixels that contain only one surface object type [16]. Moreover, the superpixel segmentation should be adaptive, in that it should capture the important local details without having to utilize many superpixels in homogeneous regions [17], i.e., unnecessarily segmenting homogeneous regions into too many superpixels. Our proposed segmentation algorithm utilizes statistical properties of QP and CP SAR data calculated based on superpixels.
Coherent acquisition of the backscatter in a CP SAR allows for constructing the Stokes vector, or equivalently, the 2 × 2 complex coherence matrix. As demonstrated in the previous work [18], [19], both the CP 2 × 2 complex coherence and the QP 3 × 3 covariance matrices have a complex Wishart distribution. Based on this characteristic of the QP covariance and CP coherence matrices, Yu et al. [20] and Ghanbari et al. [19] proposed QP SAR and CP SAR extensions of an unsupervised segmentation algorithm called Iterative Region Growing with Semantics (IRGS). IRGS is a region-based approach combined with a Markov random field (MRF) spatial context model where the data likelihood (the unary potential) and the labeling (the pairwise potential) terms are uniquely defined [21].
Full polarimetric IRGS (PolarIRGS) [20] and compact polarimetric IRGS (CP-IRGS) [19] utilize the statistical properties of the complex QP and CP data types in both the unary and pairwise potentials of IRGS. IRGS is region-based, in that the statistical properties of regions are used, and, as a result, it is less sensitive to speckle noise and incidence-angle-induced spatial nonstationarities in SAR images [22]. The nonstationarities in SAR images are defined as the variation in the backscatter values for the same radar target in near and far ranges. In other words, mainly due to the incidence effect, the statistics for a particular class (open water for instance) vary across the scene. Another advantage of IRGS is that it uses edge strength in its formulation to assist in determining when adjacent superpixels should be merged [21]. These characteristics make PolarIRGS and CP-IRGS algorithms effective for superpixel segmentation. PolarIRGS and CP-IRGS model local spatial dependency using edge strength and information from neighboring superpixels.
The superpixels are then used in a graph-based classification approach. Considering each superpixel as a vertex, a fullyconnected undirected graph is constructed on the superpixels to model the "global" spatial dependency between superpixels. Once the graph is constructed, learning involves assigning labels to the superpixels. Given a few labeled superpixels, graph learning is performed in a semisupervised manner [23] where the information from labeled as well as unlabeled superpixels is used to predict the labels of unlabeled superpixels. The advantage of this method is that a very limited number of labeled superpixels, which are spatially distributed over the scene, are needed as the labeled data [24].
A semisupervised graph learning is based on the assumption that the vertices connected by a high-similarity edge are likely to have the same label [25]. In essence, the labels of labeled data propagate to the unlabeled data according to the adjacency information of the graph vertices [26]. Inspired by the recent work on hyperspectral classification by Sellars et al. [24] and Jia et al. [27], in this article, we propose semisupervised fully-connected graph learning based on label propagation [26] for complex QP and CP SAR data. The adjacency relationship between the superpixels is measured by a metric that consists of two components: 1) spatial and 2) backscatter difference. While the spatial difference is based on the spatial distance between superpixel centroids, the backscatter distance is measured by a distance measure between complex matrices [28].
In summary, we have designed, implemented, and tested a scene classification approach that models the local and global spatial (LGS) dependency in a unified framework using complex QP and CP via superpixels and a graph-based approach. The novel approach involves semisupervised fully-connected graph learning to classify superpixels by modeling the global dependency among all scene superpixels. The graph-based approach uniquely characterizes superpixel differences by spatial distance and statistical properties of complex matrices.

II. BACKGROUND
Recently, the incorporation of spatial dependency among pixels in image classification methods has drawn increasing attention [27], [29], [30]. In general, studies have incorporated spatial dependency information in classification in two main categories: 1) using derived features for classification and 2) designing a classification method that inherently incorporates spatial dependency. In the first category, "hand-crafted" texture features based on gray-level co-occurrence matrices (GLCM) [31], [32] and Gabor filters [33] are extracted and used in classification. To overcome the difficulties in effective feature representation, deep convolutional neural networks (CNNs) inherently learn features [30], [34], [35].
In the second category, the spatial dependency effect is embodied in the classification method [27], [36], [37]. Remote sensing image classification methods inherently incorporating the spatial dependency effect are based on either pixels or superpixels. In a pixel-based classification method [38], a hypergraph constructed on image pixels with spatial and spectral hyperedges was used in a semisupervised method for image classification. In more recent studies, image classification based on superpixels has attracted significant interest. In a study by Lv et al. [37], to overcome the high computation demand of CNNs, a superpixel CNN classification was proposed. They compared various superpixel segmentation methods while using different-scale deep CNN features. Recently, a graph was constructed on superpixels generated from a hyperspectral image using the entropy rate segmentation method [27]. Then, an adaptive version of the dynamic label propagation method [39] was designed to pass labels from labeled to unlabeled superpixels.
The advantage of capturing spatial dependency in QP SAR data was demonstrated by Frey et al. [40] using the Potts model in a Bayesian classification scheme. They showed that to improve the classification results, incorporating spatial dependency information has more impact than enhancing the statistical modeling of complex QP data. Yang et al. [9] employed a sequence of polarimetric responses with different orientations called dynamic texture to model spatial dependency. Recently, semisupervised classification of QP SAR data has been used by many researchers to exploit both labeled and unlabeled data information to obtain accurate classifications using limited labeled samples. Hou et al. developed a graph-based semi-supervised classification that incorporates the spatial consistency of labels. The classifier was proposed to deal with the impurity of QP SAR pixels-containing more than one land cover type-and inaccurate labeled data [41]. A multiattribute graph model was developed by Liu et al. [42] for QP SAR land cover classification. In their method, after the spatial dependency effect was modeled as a term in their objective function, a weight for each graph and the label of unlabeled pixels were optimized. Following the idea of increasing the diversity of classification, Wang et al. [43] proposed a tritraining-based algorithm where three groups of QP-derived features were used to train three different classifiers. In recent semisupervised classification studies, CNNs were employed to model the spatial dependency effect in the classification framework [44], [45]. A CNN classification network incorporates two semantic priors to preserve the spatial consistency and boundaries [44]. Active learning has also been integrated into a CNN-based architecture [45] to select the most informative training data for annotation based on the CNNs output. Also, an MRF model was applied to the output probability maps of the CNN to encourage spatial consistency.
A few methods also aimed to integrate superpixel-driven information into the deep neural networks [46], [47]. Other QP semisupervised classification methods based on CNNs either perform on a pixel-level basis or lack an effective model for the spatial dependency effect [48], [49]. Recently, a multiscale graph constructed on superpixels was proposed to overcome the limitations of pixel-based classification techniques [50].
From this discussion, we recognize the following shortcomings based on observations drawn from the research literature.
1) Using CP SAR data as a source of data for the classification of land cover types has yet to be studied using spatial dependency information. Although there has been a great amount of work on land cover classification using QP data, there is a limited number of CP land cover classification studies. 2) Research using dependency among superpixels in SAR images in a graph-based approach is limited. We are not aware of published methods that learn the global dependency using superpixel-based graphs for the purpose of SAR land cover classification. 3) Superpixel segmentation methods used in the land cover classification studies are generic and do not account for the statistical properties of QP SAR data. We implement a method that is able to address these shortcomings, which are as follows.
1) We design and implement a land cover classification method for CP complex data that is also applicable to complex QP. 2) This method utilizes the local spatial information via superpixels. A graph-learning method is then developed to effectively model the global spatial dependency among the superpixels.
3) The statistical properties of QP and CP SAR data are used to perform the superpixel segmentation. In the following, the fundamental properties of SLC CP and QP SAR data are described.

A. Fundamental Properties of Complex SAR Data
In an SLC CP SAR dataset, the data measurement is a 2 × 1 complex vector E that corresponds to the backscattered field. The radar scattering matrix S relates the incident field to the backscattered field [51] whereû t is the unit Jones vector related to the incident field (transmit) and E r is the backscattered field (receive). E HC and E V C are the CP measurements where the transmit field has circular polarization and on receive, the polarizations are horizontal and vertical, respectively. It is defined here that E r is shown by E CP in the CP case to be distinguished from the QP case.
The measurement in the case of an SLC QP SAR dataset is the scattering matrix S in (1) that can be represented as a vector using the lexicographic basis set as Ω QP = [S HH √ 2S HV S V V ] when the reciprocity assumption, S HV = S V H , holds.
By multiplying E CP and Ω QP by their Hermitian conjugates, the Hermitian positive semidefinite complex CP coherence (C CP ) and QP covariance (C QP ) matrices are derived, respectively [51], [52] (2) where · · · shows temporal or spatial averaging, † indicates Hermitian conjugate, * is complex conjugate, and n l is the number of looks used for averaging. As demonstrated in previous studies [18], [19], assuming the E CP and Ω QP measurement data are complex Gaussian distributed, the matrices In other words, the 2 × 2 CP coherence matrix and the 3 × 3 QP covariance matrix are both positive semidefinite and both follow the complex Wishart distribution [18], [19]. Therefore, in terms of statistical properties, the main difference between these two is the size of these matrices. The similar statistical properties of the complex CP and QP data allow us to design a unified classification method. Using both data types to evaluate the proposed method allows for a better assessment of the robustness and generalization capability of the proposed semisupervised method.

A. Overview
The proposed land cover classification method mainly consists of two components. First, using the multilook complex QP/CP data, the PolarIRGS/CP-IRGS algorithm generates the superpixels on the scene. In this manner, the local spatial dependency among pixels as well as the strong edges in the image is preserved. Second, a fully-connected undirected graph is constructed on the superpixels. Each vertex in the graph corresponds to a superpixel for which three different features are derived: 1) the mean complex matrix (calculated using the superpixel's pixels), 2) the weighted superpixel-based mean complex matrix (calculated using the mean complex matrices of a superpixel and its neighboring superpixels), and 3) the superpixel spatial centroid. The affinity matrix, where each element represents the similarity between the corresponding superpixels, is then constructed using these three features. The affinity matrix comprises the global dependency information between all pairs of superpixels across the complex data.
A few pixels that are spatially distributed all over the image were manually labeled and, then, the superpixels containing the labeled pixels were assigned the same label. The spatial distribution of the labeled superpixels is important since the semisupervised method propagates labels from labeled to unlabeled superpixels. Using the labeled superpixels and the affinity matrix, the graph-based method estimates the labels of all superpixelsincluding the labeled and unlabeled ones-generating the land cover classification map of the SAR scene. The block diagram of the proposed land cover classification method is shown in Fig. 1. The PolarIRGS and CP-IRGS superpixel generation algorithms are described next. Then, the details of the calculation of the graph affinity matrix and the graph-based approach are explained.

B. Superpixel Generation Using PolarIRGS and CP-IRGS
To generate homogeneous superpixels, we apply PolarIRGS and CP-IRGS segmentation algorithms. PolarIRGS and CP-IRGS, as the extensions of the IRGS algorithm for QP and CP SAR data, were inherently developed for the application of unsupervised image segmentation [19], [20]. In the historic names of IRGS algorithms [19], [20], [21], a "region" is used to describe a group of pixels that belong to a class. The term "superpixel" is also used to refer to a group of pixels; however, a superpixel does not have a class label. Many different superpixel generation algorithms have also been proposed in the literature [16], [53], [54]. The simple linear iterative clustering (SLIC) algorithm is a well-referenced superpixel generation method [53]. Compared to SLIC, PolarIRGS and CP-IRGS algorithms have the following three main advantages.
1) The PolarIRGS and CP-IRGS were designed based on the statistical properties of complex QP and CP data, and as such, they are more suitable for these data types.
2) The IRGS-based algorithms explicitly incorporate the concept of edge strength to better preserve the edges when generating superpixels.
3) IRGS-based algorithms merge oversegmented initial superpixels to perform segmentation. This allows the adjacent small superpixels to be merged and create a minimal number of homogeneous superpixels that match the nature of the local region.
Here, a short description of the PolarIRGS and CP-IRGS segmentation methods is provided. Assuming S is the image and s ∈ S is an image pixel. Also, let x = {x s |s ∈ S} represent the image data and y = {y s |y s ∈ M, s ∈ S} is a label configuration on the image with discrete-valued random variables y s having a value from the label set M = {1, . . ., m}. The purpose of image segmentation is to find the optimum label configuration y * from the set of possible label configurations Y. IRGS is superpixel-based and uses a region adjacency graph (RAG) [55], G = (V, E), where V and E denote the image superpixels as vertices and arcs that are the boundaries of adjacent superpixels. Thus, a superpixel v ∈ V in the image contains a set of image pixels denoted by S v . The optimization problem in PolarIRGS and CP-IRGS is solved by minimizing two energy terms [19], [20], [21] where C s is the complex QP or CP matrix of the pixel s, C i is the mean complex matrix over all the pixels that are labeled i from the set M, g(∇ s ) is called the edge penalty term [21], v i is a subset of all superpixels with label i, and ∂v ij indicates all the boundary pixels from superpixels labeled i and j where these superpixels are adjacent to each other. The edge penalty function g(∇ s ) is a monotonically decreasing function that is smaller for a strong edge than when the edge between two superpixels assigned to different classes is weak. In this manner, two neighboring superpixels are assigned the same labels in segmentation only when the edge between two superpixels is weak [21]. The parameter β controls the smoothness of the segmentation with the greater values of β leading to smoother segmentation results. This parameter, in particular, allows us to control the level of oversegmentation of superpixels.
A main advantage of the IRGS algorithm is incorporating a greedy superpixel merging method in each iteration of the optimization. This increases the algorithm's speed in moving toward the optimized segmentation. Starting from an oversegmentation, a superpixel merging process is executed in each iteration. For each pair of neighboring superpixels with like labels, (4) is calculated. Adjacent superpixels with like labels that reduce the energy the most are merged [21].

C. Semisupervised Graph-Based Classification Method
The output of the PolarIRGS and CP-IRGS segmentation is a label configuration with labels from the set M. To use this segmentation, we assign a unique label to each superpixel in the segmentation map. In this manner, each superpixel in the segmentation is assigned a unique label. Then, an undirected graph is constructed with each superpixel as a vertex in the graph. The graph is uniquely shown by the corresponding affinity matrix A ∈ R N ×N (also called weight/similarity matrix) that is N × N where N is the number of vertices/superpixels and each element in the matrix indicates the similarity between a pair of superpixels in the graph. The affinity matrix in the proposed method is defined as where A l ij is related to the similarity of the superpixels i and j in terms of their location in the image where L i is a vector of two scalars corresponding to the mean x and y coordinates for all the pixels in the ith superpixel. Also, · represents the l 2 -norm distance, and σ l is the width of the RBF kernel. A c ij is the term corresponding to the similarity of superpixels based on the mean C m and the weighted superpixelbased mean C w complex matrices where D MH (C 1 , C 2 ) represents the statistical dissimilarity between the complex matrices C 1 and C 2 calculated by the maximum value of Hotelling-Lawley traces [56] This metric has been demonstrated to be more effective in the application of change detection as compared to the likelihood ratio test [56] and other similar metrics such as symmetric Wishart distance and Kullback-Leibler divergence [57]. In (7), C m i is the mean complex matrix over all the pixels in the superpixel i and C w i is defined as which calculates a weighted average of the K neighboring superpixels of the superpixel i. The weight from the kth superpixel is defined as where h is a scale parameter. γ ∈ [0, 1] in (7) is a scale parameter that balances the effect from the mean complex matrices as against the weighted superpixel-based mean complex matrix. σ c indicates the width of the RBF kernel. A complete affinity matrix, that is equivalent to a fullyconnected graph, models the global spatial dependency effect between all pairs of superpixels in the image. Then, a semisupervised classification method based on label propagation [26] is performed. First, the superpixels containing labeled pixels are assigned labels. Then, from a conceptual perspective, the semisupervised method propagates the labels from the labeled superpixels to the unlabeled ones.
Assume Z ∈ R N ×T is the initial label information, where T is the number of land cover types. Each row of the matrix Z corresponds to a superpixel: If the superpixel i is labeled j, the element Z ij = 1, otherwise, for all j, Z ij = 0. For all the unlabeled superpixels, Z ij = 0 for all j. The label propagation is performed based on the assumption that the classification matrix F ∈ R N ×T in each iteration is a function of the spatial dependency information between superpixels and the initial label information [26] where F(i) indicates the classification matrix in iteration i, B ∈ R N ×N is a diagonal matrix with its ii-element equal to the sum of the i th row of A, and the parameter α ∈ [0, 1] balances the relative effect from the global spatial dependency information and the initial labeling information. The classification matrix F converges to a closed-form solution F * as follows [26]: where I is the N × N identity matrix and μ is a weight parameter, with α = 1 μ+1 . The final land cover labels of the superpixels are then calculated as f i = arg max j F * ij , in which f i , the label of superpixel i, corresponds to the maximum value of the elements in the row j of the matrix F * .

IV. DATASETS AND RESULTS
In this section, the RCM SLC datasets used in the experiments are described. The preprocessing of the datasets is then explained followed by the experimental setup to evaluate the proposed method and analyze the effects of local as well as global spatial dependency in the land cover classification method.

A. RCM Datasets
Two RCM SLC datasets were used in the experiments. The first dataset is a very high-resolution SLC CP dataset that was acquired over Winnipeg city in Manitoba, Canada, on February 2, 2020. The sampled pixel and line spacing for the dataset are 1.39 and 2.16 m, respectively. The CP scene has a size of 14066 × 9734 pixels. The second dataset is an SLC full QP data acquired over the Québec City in Québec, Canada, on December 29, 2019. The sampled pixel and line spacing for the SLC QP dataset are 3.13 m and 3.31 m, respectively. The size of the QP scene is 8007 × 2935.
To evaluate the performance of the proposed method, two subscenes were acquired from each SLC scene. Figs. 2 and 3 show the Google Earth and intensity images of the CP and QP scenes, respectively. Each of the subscenes from the CP scene, indicated in Fig. 2, consists of four land cover types that are visually identified. The size of Subscene 1 of CP data is 1622 × 1272 pixels and includes low-rise residential (LRR), high-rise residential (HRR), vegetation (VEG), and asphalt (ASP) land cover classes. Subscene 2 of CP data has a size of 1000 × 1200 with LRR, HRR, VEG, and train (TRA) land cover types.
The two subscenes taken from the QP scene, shown in Fig. 3, have 700 × 500 pixels, for Subscene 1, and 500 × 700 pixels, for Subscene 2. Subscene 1 from the QP scene consists of six classes: river (RIV), LRR, HRR, VEG1, VEG2, and shore (SHR). Subscene 2 includes all the classes in Subscene 1 except the class HRR. Visual analysis of the Google Earth images and the SAR intensity images shown in Fig. 3(a) show that two types of "vegetation" were seen in the QP images: VEG1 and VEG2. VEG1 is the label of forested areas where the volume scattering of the incident SAR signal on the forest contributes to high pixel value in the intensity images shown in Fig. 3(b). VEG2 is the label of the agricultural areas in the scene where a notable amount of the SAR signal power scatters away, and therefore, the pixel values are low in these areas in the intensity images.
The total area covered by the datasets can also be calculated from the image size and the pixel and line spacing numbers. The total area covered by CP subscene 1 and subscene 2 are approximately 6.2 and 3.6 km 2 , respectively. Also, the QP subscenes, which have equal size, both cover 3.6 km 2 . Next, the experimental setup including the preprocessing of the SLC datasets, the parameter setting of the proposed method, and the compared methods are described.

B. Experimental Setup
The RCM SLC CP dataset contains two files that represent the complex elements E H and E V in (1). Also, the SLC QP dataset includes three files corresponding to the complex elements of the scattering matrix in (1). The complex CP coherence (C CP ) and the complex QP covariance (C QP ) were, respectively, derived based on (2) and (3) with the number of looks, n l = 1. Box-car averaging with a window size of 5 × 5 was then performed only on the complex CP data. The averaging was found unnecessary for the complex QP data for both segmentation and classification-since the pixel and line spacing for the QP scene are larger than those for the CP scene, to fully preserve the boundaries in segmentation and classification, the averaging was not performed in the case of QP data.
Each of the superpixel segmentation methods PolarIRGS [20] and CP-IRGS [19] produces an image where each pixel is assigned to a particular class with an unknown label. To start these unsupervised methods, the number of segmentation classes needs to be set. In this article, CP-IRGS and PolarIRGS were performed with 10 segmentation classes. The number 10 for segmentation classes was found to perform well for all cases. Previous works [6], [58] have shown that the segmentation results were not particularly sensitive to minor variations of the segmentation input parameters. After the segmentation, each superpixel in the segmentation image, regardless of its segmentation class, is given a unique label. Then, the graph is constructed on the superpixels.
To set the values of the parameters of the proposed LGS method including h (10), σ l (6), σ c (7), γ (7), and μ (12), a coarse-to-fine search method was performed. Using this search method first, in a coarse grid of discretized set of values for each parameter and, then, in a fine grid, provides the parameter values that lead to the desired classification results for any test complex SAR scene. The impact of each parameter value should be analyzed to set the range of discretized values for each parameter in the search method.
The two components of the affinity matrix elements in (5), naming A l ij and A c ij , should have comparable values so that the effect from both spatial and backscatter differences between superpixel pairs is balanced. The parameters σ l and σ c have a significant impact on the values of affinity matrix elements. To incorporate the effect from both spatial and backscatter information, the width of the spatial RBF kernel σ l is set to a large value to balance the large values of l 2 -norm distance between superpixels far apart from each other across the scene. Incorporating the similarity of the superpixels apart from each other helps information to properly propagate across the scene. Also, as shown in previous work [24], with a complex land cover structure, information from neighboring superpixels (here modeled by the weighted complex matrix, C w i ) should be limited by choosing a high value for the parameter γ to incorporate more information from within each superpixel (here modeled by the mean complex matrix, C m i ). In general, the classification results were not considerably sensitive to the values of the parameters γ, h, and μ.
The following parameter values were found after performing the coarse-to-fine search. The parameter h was set to 10. The parameters σ l and σ c were set to 1000 and 1, respectively. The large value of σ l relative to σ c is due to the large values of l 2 -norm distance between superpixels across the image. The parameter γ was set to 0.9 allowing for more impact from C m i (the superpixel itself) than that of C w i (the neighboring superpixels). Finally, the parameter μ was also set to 0.1. All these values were kept the same for all experiments.
To evaluate the performance of the land cover classification method, the user's accuracy values of the classes, the overall accuracy (OA), and the Kappa (κ) coefficient were used. For comparison, these values were also calculated for four other methods: 1) support vector machine (SVM) [59], 2) random forest (RF) [60], 3) superpixel-based SVM (SSVM), and 4) superpixel-based RF (SRF). The latter two methods, SSVM and SRF, exploit the superpixels for the classification. In particular, the mean value of the feature vectors for all pixels in each superpixel was calculated and used for estimating the label of the superpixel. The input to the proposed method is the complex data that were used to extract several features as the input to the compared methods including SVM, RF, SSVM, and SRF. In the case of CP data, all the Stokes-derived features, and in the case of QP data, the features extracted from the QP covariance matrix (C QP ) including the original SAR features (full QP coherency matrix elements), SAR discriminators (SPAN and dependency coefficients), and various decomposition parameters were used in the experiments [6].
For each class, around 50 training pixels and a minimum of 30 test pixels (usually a much higher number) that were independently collected were used for all the classification methods. The number of train and test samples for each land cover type in the QP and CP SAR subscenes is provided in Tables I  and II. For the four compared methods, a hyperparameter tuning  TABLE I  NUMBER OF TRAIN AND TEST PIXELS FOR THE TWO SUBSCENES OF CP SAR  DATA   TABLE II  NUMBER  (increment factor of one for the power of two) or n e ∈ [50,2000] (increment factor of 50) and m d ∈ [1,110] (increment factor of 2) was executed to select the hyperparameter values that provide the highest κ when half of the training pixels were used for training and the remaining half for testing.

C. Results of CP Data
The RH images, superpixel segmentation boundaries overlaid on the RH images, and the classification images of the proposed method along with the SRF and SSVM are presented in Fig. 4. The quantitative results for the CP subscenes are shown in Tables III and IV. In the first CP subscene, the proposed LGS method provides slightly better OA and κ values than the compared methods, as demonstrated in Table III. This is because, as seen in the first subscene (the first row in Fig. 4), the land cover types have distinguishable radar backscatter. Although the SRF and SSVM methods provide higher class accuracy values for HRR and ASP classes than the proposed method, class accuracy values for LRR and VEG classes and higher using the proposed method than those using SRF and SSVM methods. The pixel-based RF and SVM classifiers, which do not exploit the local spatial dependency effect of superpixels, perform poorly in land cover classification.
In the second subscene, the OA and κ values for the proposed method, in this case, are noticeably higher than the compared methods, as given in Table IV. The class TRA has a similar radar backscatter to that of HRR and LRR classes, as given in Fig. 4(f). The proposed method incorporates the global spatial information among superpixels and prevents the misclassifications of LRR and HRR classes to TRA class, which happen in the case of SRF and SSVM methods [see Fig. 4(h)-(j)]. The pixel-based RF, in this case, assigns all the pixels to only VEG and TRA classes failing to correctly classify this subscene.

D. Results of QP Data
In Fig. 5, the results for the two QP subscenes were shown. As shown in the HH images of the two subscenes [see Fig. 5(a) and (f)], due to the similar backscatter values for the classes, the classification task is more challenging in this case than the CP dataset. Also, as mentioned in the dataset description in Section IV-A, the QP dataset has a coarser resolution than that of the CP data. This is the reason for lower accuracy values in the case of the QP dataset. In subscene 1 of the QP data, shown in the first row of Fig. 5, the proposed method performed much better than the methods SRF and SSVM, which is also supported by the accuracy values in Table V. There are many misclassified areas using compared methods. These areas include the misclassification of vegetation to the river across the scene in the case of the SRF method [see In the second QP subscene, the SRF method provides classification results as accurate as the proposed method. The other three methods including the SSVM, RF, and SVM were unable to perform the classification accurately where each of these methods misclassifies a whole class (see the user's accuracy values in Table VI). Finally, the standard deviation of the estimated Kappa values [61] were also calculated and presented in the classification quantitative results. For both the CP and QP subscenes, the standard deviation values were close in the five compared methods with the proposed method having the smallest value (more reliability of the estimated Kappa) in all the cases as compared to the other classification methods. Next, the analyses of the variation of OA values as functions of the number of training pixels and the number of superpixels are provided.

V. DISCUSSION
In the previous section, a description of the experiment setup and classification results to assess the performance of the proposed semisupervised superpixel-based classification method was provided. Here, the proposed LGS method is discussed from different aspects: the data used in the experiments, the type of SAR data as the input to the LGS method, the number of graph nodes, and the number of training pixels. In the literature, previous research was conducted mostly based on using simulating CP from QP data in the experiments and analysis [4], [6], [11], [12]; however, here the effort was to use real CP   data from the RCM mission. This was performed using two pairs of RCM complex SAR subscenes. These scenes were captured from the rural areas and it would be an interesting future study to apply graph-based approaches for SAR data acquired from other applications such as sea-ice classification and concentration.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.   An important characteristic of the proposed method is that it employs the complex CP coherence and QP covariance matrices, and in this manner, it preserves the phase information inherent in the complex SAR data. In comparison, the main body of previous studies [3], [4], [62], [63] have used the intensity images and the child parameters derived from CP in their experiments. From the perspective of using superpixels (instead of pixels), it was shown in the previous section that the poor performance of the pixel-based SVM and RF methods noticeably improves using superpixels. Also, superpixels make the application of the proposed graph-based method feasible since it is not effective in terms of computation time to construct a graph with the pixels as the graph nodes because the number of superpixels is much less than the number of pixels.
An important parameter that can reduce the exponentially increasing computational cost of the proposed method with larger graphs is to choose a higher value for the parameter β, the weight of the spatial context model, which controls the smoothness of the segmentation results. By setting a large value of β in (4), the PolarIRGS and CP-IRGS segmentation methods provide a smoother segmentation image with fewer superpixels. The sensitivity of OA as a function of the number of superpixels was shown in Fig. 6. In this experiment, the β values were reduced gradually allowing for more superpixels in the segmentation image. For all the cases, after an increase in OA accuracy, the values of OA remain relatively steady. This shows that it is unnecessary to construct the graph using a highly oversegmented image as with a lower number of superpixels the algorithm executes much more quickly. Using the spatial resolution and size of the scenes, as well as the number of superpixels, the average ground area per superpixel can be calculated. For the number of superpixels associated with the highest OA accuracy value for each subscene from Fig. 6, the average ground area per superpixel was approximately 3191 and 3486 m 2 for the CP and QP subscenes, respectively.
The CP and QP scenes were acquired from different areas with different land cover types. Although some land cover types are similar in both datasets (e.g., residential and vegetation), depending on the nature of the land cover types that are different from CP to QP scenes (e.g., asphalt and train in the CP scenes and river and shore in the QP scenes), the classification performance varies for the similar land cover types from CP to QP scenes. For instance, HRR class in the CP scene has a similar SAR backscatter to TRA. However, in the QP scenes, HRR has a similar backscatter to SHR and RIV classes, and therefore, the classification of the class HRR is more challenging in the QP scene than in the CP scene. This makes it difficult to draw any conclusion about comparing classification performance for the class HRR between the CP and QP scenes. As another example, the misclassification of the VEG class is due to the presence of the class ASP in the CP scene; however, there are two types of VEG in the QP scene and they have a similar appearance to the class RIV. Thus, the classifier is more challenged when it is applied to the QP scene than the CP scene.
Furthermore, CP and QP scenes have different resolution values; as discussed before, the CP scene has a better resolution than the QP one. This as well can make it difficult to draw a fair conclusion between these two modes (CP and QP) using the current SAR datasets. Future work should investigate the performance of these modes using a pair of CP and QP scenes that are acquired from the same area-and preferably with the same resolution.
The number of graph nodes increases with a larger scene. Although the computation time still increases as the number of superpixels becomes larger for large scenes, it is not a big concern due to the following two reasons: 1) the calculation of the affinity matrix elements needs to be done only once (not an iterative process) for the whole classification process, 2) the number of superpixels in an image does not depend on the size of the image but mainly depends on the homogeneity of the image, and as such a large homogeneous image will not necessarily have an exponentially-increased number of superpixels.
Another aspect that is discussed here is to evaluate the performance of the LGS method using a very low number of training pixels. This is important because collecting labeled pixels is not only costly but can also be inaccurate. To assess the performance of the proposed method as compared to the other methods with small numbers of training pixels, the classification OA values were calculated by choosing the number of training pixels from the set {2, 5, 10, 20, 30, 40, 50}, and plotted, as presented in Fig. 7. As expected, in general, the OA values of all the methods increase with more number of training pixels. The proposed method in all datasets provides consistently higher classification rates compared to the other methods. Also, the accuracy values of the RF method are mostly higher than those of the SVM method comparing either pixel-based or superpixel-based versions of these two classifiers. In all cases, with only five training pixels per class, the proposed method provides accuracy of ∼ 80% where the other methods typically need a much higher number of training pixels to obtain accuracy as high as 80%.
In comparison with the CNN-based methods, although the proposed method can have the disadvantage of relying on superpixels (erroneous superpixels may lead to wrong boundaries) and the need for having labeled pixels in the scene that is being classified, it provides a simple closed-form solution and incorporates spatial information globally. Future studies involve developing different CNN architectures that are defined on a graph (an example study is a work by [64]) and applied to complex SAR data input based on their unique statistical properties. Such studies could be suitable compared to the proposed LGS method.

VI. CONCLUSION
In this article, a unified complex QP and CP SAR land cover classification method is proposed. The proposed method incorporates spatial dependency information on both global and local scales. The local spatial information is obtained by using the superpixels generated by the PolarIRGS and CP-IRGS segmentation methods. A semisupervised graph-based learning method models the spatial dependency among the superpixels in a global manner. Two RCM SLC datasets were used to evaluate the performance of the proposed method.
The effect of local spatial dependency was demonstrated by comparing the results of the SVM and RF classifiers in pixelbased (SVM and RF) and superpixel-based methods (SSVM and SRF), where much higher classification accuracy values were obtained using superpixel-based methods. The effect of global spatial dependency was also shown by comparing the results of the proposed graph-based method and those of the superpixel-based SVM and RF classifiers, which do not take advantage of the global-scale spatial information. The results show that by using global spatial dependency, the proposed method prevents the misclassifications that happen in the case of superpixel-based SVM and RF methods. Another important advantage of the proposed method is that, with a very limited number of training pixels, the method provides highly-accurate classification images.