A Fast and Accurate Similarity-constrained Subspace Clustering Framework for Unsupervised Hyperspectral Image Classification

Accurate land cover segmentation of spectral images is challenging and has drawn widespread attention in remote sensing due to its inherent complexity. Although significant efforts have been made to develop a variety of methods, most of them rely on supervised strategies. Subspace clustering methods, such as Sparse Subspace Clustering (SSC), have become a popular tool for unsupervised learning due to their high performance. However, the computational complexity of SSC methods prevents their use on large spectral remotely sensed datasets. Furthermore, since SSC ignores the spatial information in the spectral images, its discrimination capability is limited, hampering the spatial homogeneity of the clustering results. To address these two relevant issues, in this paper, we propose a fast algorithm that obtains a sparse representation coefficient matrix by first selecting a small set of pixels that best represent their neighborhood. Then, it performs spatial filtering to enforce the connectivity of neighboring pixels and uses fast spectral clustering to obtain the final segmentation. Extensive simulations with our method demonstrate its effectiveness in land cover segmentation, obtaining remarkably high clustering performance compared with state-of-the-art SSC-based algorithms and even novel unsupervised-deep-learning-based methods. Moreover, the proposed method is up to three orders of magnitude faster than SSC when clustering more than $2 \times 10^4$ spectral pixels.


I. INTRODUCTION
Spectral remote sensing systems acquire information of the Earth's surface by sensing a large amount of spatial data at different electromagnetic radiation frequencies. Spectral images (SI) are commonly regarded as three-dimensional datasets or data cubes with two dimensions in the spatial domain (x, y) and one in the spectral domain (λ) [1]. Based on the acquired spectral/spatial resolution, spectral imaging sensors can be categorized into hyperspectral (HS) and multispectral (MS). Typically, HS devices capture hundreds of spectral bands of the scene; however, their spatial resolution is often lower than that obtained with an MS sensor, which in turn has a low spectral resolution [2].
As shown in Fig. 1, every spatial location in a spectral image is represented by a vector whose values correspond to the intensity at different spectral bands. These vectors are also known as the spectral signatures of the pixels, or spectral pixels. Since different materials usually reflect electromagnetic energy differently at specific wavelengths [1], the information provided by the spectral signatures allows distinguishing different physical materials and objects within an image. In remote sensing, the classification of spectral images is also referred to as land cover segmentation or mapping, and it is an important computer vision task for many practical applications, such as precision agriculture [3], vegetation classification [4], monitoring and management of the environment [5], [6], as well as security and defense issues [7]. Accurate land cover segmentation is challenging due to the high-dimensional feature space, and it has drawn widespread attention in remote sensing [8], [9]. In the past decade, significant efforts have been made in the development of numerous SI classification methods; however, most of them rely on supervised approaches [10], [11]. More recently, with the blooming of deep learning techniques for big data analysis, several deep neural networks have been developed to extract high-level features of SIs, achieving state-of-the-art supervised classification performance [12]. However, the success of such deep learning approaches hinges on a large amount of labeled data, which is not always available and often prohibitively expensive to acquire. As a result, the computer vision community is currently focused on developing unsupervised methods that can adapt to new conditions without requiring a massive amount of data [13].

Carlos Hinojosa and Henry Arguello are with the Department of System Engineering and Informatics, Universidad Industrial de Santander, Bucaramanga, Colombia (e-mail: carlos.hinojosa@saber.uis.edu.co; henarfu@uis.edu.co). Manuscript received April 19, 2005; revised August 26, 2015.

Fig. 2. Clustering accuracy (left) and running time (right) of the SSC algorithm compared with the proposed method for land cover segmentation. In this example, we performed the two subspace clustering algorithms on the full image and two regions of interest (ROIs) of the Indian Pines dataset (see Section IV). The first ROI has N = 4900 pixels and k = 4 classes; the second has N = 10000 pixels and k = 12 classes; the whole Indian Pines image has N = 21025 pixels and k = 17 classes.
Most successful unsupervised learning methods exploit the fact that high-dimensional datasets can be well approximated by a union of low-dimensional subspaces. Under this assumption, the sparse subspace clustering (SSC) algorithm captures the relationship among all data points by exploiting the self-expressiveness property [14]. This property states that each data point in a union of subspaces can be written as a linear combination of other points from its own subspace. Then, the set of solutions is restricted to be sparse by minimizing the $\ell_1$ norm. Finally, an affinity matrix is built using the obtained sparse coefficients, and the normalized spectral clustering algorithm [15] is applied to achieve the final segmentation.
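As a concrete illustration, the SSC pipeline described above (sparse self-representation, affinity construction, spectral clustering) can be prototyped in a few lines. This is a minimal sketch on hypothetical toy data; the regularization weight, subspace dimensions, and cluster count are illustrative choices, not settings from the paper:

```python
# Minimal SSC sketch on toy data drawn from two 1D subspaces of R^5.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
b1, b2 = rng.standard_normal(5), rng.standard_normal(5)
# Columns of X are data points: 10 points per subspace (D x N, N = 20).
X = np.hstack([np.outer(b1, rng.standard_normal(10)),
               np.outer(b2, rng.standard_normal(10))])
D, N = X.shape

C = np.zeros((N, N))
for j in range(N):
    # Self-expressiveness: x_j ~ X c_j with c_jj = 0 and sparse c_j (l1 penalty).
    idx = [i for i in range(N) if i != j]
    lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=5000)
    lasso.fit(X[:, idx], X[:, j])
    C[idx, j] = lasso.coef_

A = np.abs(C) + np.abs(C).T  # symmetric affinity A_ij = |c_ij| + |c_ji|
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
```

The $O(N^2)$ Lasso problems in the loop are exactly what makes plain SSC impractical at remote-sensing scale, which motivates the fast variant proposed in this paper.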
Assuming that spectral pixels with a similar spectrum approximately belong to the same low-dimensional structure, the SSC algorithm can be successfully applied for land cover segmentation [16], [17], [18], [19], [20]. Despite the great success of SSC in land cover segmentation, two main problems have been identified: (1) The overall computational complexity of SSC prohibits its usage on large spectral remote sensing datasets. For instance, given a SI with $N_r$ rows, $N_c$ columns, and $L$ spectral bands, SSC needs to compute the $N \times N$ sparse coefficient matrix corresponding to the $N = N_r N_c$ spectral pixels, whose computational complexity is $O(LN^3)$. Moreover, after building the affinity matrix, spectral clustering performs an eigenvalue decomposition over the $N \times N$ graph Laplacian matrix, which also has cubic time complexity, or quadratic using approximation algorithms [21] (see Fig. 2, right). (2) Under the context of SI, the SSC model only captures the relationship of pixels by analyzing the spectral features, without considering the spatial information. Indeed, the sparse coefficient matrix is piecewise smooth since spectral pixels belonging to the same land cover material are arranged in a common region; hence, there is a spatial relationship between the representation coefficient vector of one pixel and those of its neighbors.
Paper contribution. This paper proposes a fast and accurate similarity-constrained subspace clustering algorithm that improves both the clustering accuracy and the execution time of land cover segmentation. Specifically, our main contributions are as follows:
1) We propose to first group similar spatial neighboring pixels into subsets using a "superpixels" technique [22], [23]. Then, instead of expressing each pixel as a linear combination of all pixels in the dataset, we constrain each pixel to be represented solely as a linear combination of other pixels in the same subset. Therefore, the obtained sparse coefficient matrix encodes information about similarities between the most representative pixels of each subset and the whole dataset. We present an efficient algorithm for selecting the most representative pixels of each subset by minimizing the maximum representation cost of the data.
2) Our second contribution is the enhancement of the obtained sparse coefficient matrix via a 2D smoothing convolution before applying a fast spectral clustering algorithm that significantly reduces the computational cost. Specifically, the proposed method enforces the connectivity in the affinity matrix and then efficiently obtains the spectral embedding without computing the eigenvalue decomposition, whose computational complexity is $O(N^3)$ in general.
Increasing the number of data points and classes enlarges the computation time and makes clustering more challenging. The proposed method, shown with the blue line in Fig. 2, can be up to three orders of magnitude faster than SSC and outperforms it in terms of accuracy when clustering more than $2 \times 10^4$ spectral pixels. This paper evaluates and compares our approach on three real remote sensing spectral images with different imaging environments and spectral-spatial resolution.

II. RELATED WORKS
In the literature, the scalability issue of SSC and its ability to perform land cover segmentation on spectral images have been studied separately. In this section, we review some related works from these two points of view. Consider a given collection of $N$ data points $X = \{x_1, \cdots, x_N\}$ that lie in the union of $k$ linear subspaces of $\mathbb{R}^D$. SSC expresses each data point $x_j$ as a linear combination of all other points in $X$, i.e., $x_j = \sum_{i \neq j} c_{ij} x_i$, where $c_{ij}$ is nonzero only if $x_i$ and $x_j$ are from the same subspace, for $(i, j) \in \{1, \cdots, N\}$. Such representations $\{c_{ij}\}$ are called subspace-preserving. In general, assuming that $c_j$ is sparse, SSC solves the following optimization problem:

$$\min_{c_j}\ \|c_j\|_1 + \frac{\tau}{2} \left\| x_j - X c_j \right\|_2^2 \quad \text{s.t.} \quad c_{jj} = 0, \tag{1}$$

where $\tau > 0$ and $c_j = [c_{1j}, \cdots, c_{Nj}]^T$ encodes information about the membership of $x_j$ to the subspaces. Subsequently, an affinity matrix between any pair of points $x_i$ and $x_j$ is defined as $A_{ij} = |c_{ij}| + |c_{ji}|$, and it is used in a spectral clustering framework to infer the clustering of the data [15], [14]. Although the representation produced by SSC is guaranteed to be subspace-preserving, the affinity matrix may lack connectedness [24], i.e., the data points from the same subspace may not form a connected component of the affinity graph due to the sparseness of the connections, which may cause over-segmentation.

A. Fast and Scalable Subspace Clustering Methods
Taking into account the self-expressiveness property, an early approach to address the SSC scalability issue assumes that a small number of data points can represent the whole dataset without loss of information. In this regard, the authors in [25] proposed the Scalable Sparse Subspace Clustering (SSSC) algorithm, which clusters a small subset of the original data and then classifies the rest of the data based on the learned groups. However, this strategy is suboptimal since it sacrifices clustering accuracy for computational efficiency.
In [26], the authors replace the $\ell_1$ optimization in the original SSC algorithm [14] with a greedy pursuit, e.g., orthogonal matching pursuit (OMP) [27], for sparse self-representation [28]. While SSC-OMP improves the time efficiency of SSC by several orders of magnitude, it significantly loses clustering accuracy [29]. Besides, SSC-OMP also suffers from the connectivity issue present in the original SSC algorithm. To solve this issue, the authors in [30] proposed to mix the $\ell_1$ and $\ell_2$ norms to take advantage of the subspace-preserving property of the $\ell_1$ norm and the dense connectivity of the $\ell_2$ norm. Specifically, this algorithm, named ORacle Guided Elastic Net solver (ORGEN), identifies a support set for each sample. However, in this approach, a convex optimization problem is solved several times for each sample, which limits the scalability of the algorithm.
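The OMP-based replacement of the $\ell_1$ program can be sketched as follows; the toy data, the sparsity cap `n_nonzero_coefs=3`, and the matrix sizes are hypothetical illustrations, not the settings of [26]:

```python
# Sketch of sparse self-representation via greedy OMP (the SSC-OMP idea):
# each column of X is approximated by at most K other columns.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 30))      # D x N toy data matrix
X /= np.linalg.norm(X, axis=0)        # unit-norm columns, as is customary
N = X.shape[1]

C = np.zeros((N, N))
for j in range(N):
    idx = [i for i in range(N) if i != j]   # enforce c_jj = 0
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False)
    omp.fit(X[:, idx], X[:, j])
    C[idx, j] = omp.coef_
```

Each greedy fit is far cheaper than a Lasso solve, which is the source of the speedup, at the cost of the accuracy and connectivity issues discussed above.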
More recent works [31], [32], [33] use a different subset selection method for subspace clustering. In particular, the method named Scalable and Robust SSC (SR-SSC) [33] selects a few sets of anchor points using a randomized hierarchical clustering method. Then, within each set of anchor points, it solves the LASSO [34] problem for each data point, allowing only anchor points to have non-zero weights. However, the authors do not demonstrate that the selected anchor points are representative of the subspaces.
Similar to the SSC-OMP paper, the authors in [35] proposed an approximation algorithm [36] to solve the optimization problem in Eq. (1). Specifically, instead of using the whole dataset $X$, the Exemplar-based Subspace Clustering (ESC) algorithm in [35] selects a small subset $\hat{X} \subseteq X$ that represents all data points, and then each point is expressed as a linear combination of the points in $\hat{X} \in \mathbb{R}^{D \times M}$, where $M < N$. In particular, the selection of $\hat{X}$ is obtained by using the Farthest First Search (FFS) algorithm, which is a modified version of the Farthest-First Traversal (FarFT) algorithm [36]. Indeed, the main difference between FarFT and FFS is the distance metric: FarFT uses the Euclidean distance, while FFS uses a custom metric, derived from Eq. (1), that geometrically measures how well a data point $x_j \in X$ is covered by a subset $\hat{X}$. The authors construct $\hat{X}$ by first performing random sampling to select a base point and then progressively adding new representative data points using the defined metric. However, a careful selection of the search space and of the first selected data point could speed up the unsupervised learning process. The complete algorithm proposed in [35] is known as ESC-FFS, and we compare it against our proposed method in Section IV.
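The farthest-first idea behind both FarFT and FFS can be sketched with the plain Euclidean metric (the FarFT variant); ESC-FFS would swap the distance below for the representation-cost metric derived from Eq. (1). The data and subset size are toy assumptions:

```python
# Farthest-first traversal sketch: greedily pick the column of X that is
# farthest (in Euclidean distance) from the set already selected.
import numpy as np

def farthest_first(X, m, start=0):
    """Select m column indices of X, each farthest from those already chosen."""
    chosen = [start]
    d = np.linalg.norm(X - X[:, [start]], axis=0)   # distance to selected set
    while len(chosen) < m:
        nxt = int(np.argmax(d))                     # farthest remaining point
        chosen.append(nxt)
        # Keep, for every point, its distance to the nearest selected point.
        d = np.minimum(d, np.linalg.norm(X - X[:, [nxt]], axis=0))
    return chosen

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 50))
reps = farthest_first(X, 5)
```

Note the role of the starting point: here it is picked arbitrarily (`start=0`), which is precisely the aspect the proposed method improves by initializing with the centroid pixel of each subset.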
In general, the previously described algorithms provide an acceptable subspace clustering performance on large-scale datasets. However, these general-purpose methods do not fully exploit the complex structure of remotely sensed spectral images, ignoring the rich spatial information of the spectral images, which could boost the accuracy of these algorithms.

B. SSC-based Methods for Land Cover Segmentation
Some SSC-based methods have been proposed for land cover segmentation that take advantage of the neighboring spatial information but still present the scalability issue of SSC. Under the context of SIs, the $N_r \times N_c \times L$ 3D image data cube can be rearranged into a 2D matrix $X \in \mathbb{R}^{D \times N}$ to apply the SSC algorithm, where $N = N_r N_c$ and $D < L$ is the number of features extracted from the spectral signatures after applying principal component analysis (PCA) [10]. Taking into account that the spectral pixels belonging to the same land cover material are arranged in common regions, different works [18], [19], [37], [38], [17], [16], [39] aim at obtaining a piecewise-smooth sparse coefficient matrix to incorporate such contextual dependence. In particular, S-SSC [18] helps to guarantee spatial smoothness and reduce the representation bias by adding a regularization term to the SSC optimization problem which enforces a local averaging constraint on the sparse coefficient matrix. More recently, the authors in [39] propose the 3DS-SSC algorithm, which incorporates a 3D Gaussian filter in the optimization problem to perform a 3D convolution on the sparse coefficients, obtaining a piecewise-smooth representation matrix.
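The cube-to-matrix rearrangement described above is a common preprocessing step and can be sketched as follows; the cube dimensions and the number of PCA features `D` are hypothetical toy values:

```python
# Rearranging an Nr x Nc x L spectral cube into X in R^{D x N}:
# one PCA-reduced spectral signature per column, N = Nr * Nc pixels.
import numpy as np
from sklearn.decomposition import PCA

Nr, Nc, L = 10, 12, 32
cube = np.random.default_rng(3).random((Nr, Nc, L))   # toy data cube

pixels = cube.reshape(Nr * Nc, L)                     # one signature per row
D = 8                                                 # D < L features
X = PCA(n_components=D).fit_transform(pixels).T       # D x N matrix
```

The resulting `X` is the matrix form on which SSC-style algorithms (and the proposed method) operate.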

III. FAST AND ACCURATE SIMILARITY-CONSTRAINED SUBSPACE CLUSTERING (SC-SSC)
This section presents a subspace clustering algorithm for land cover segmentation that combines two desirable properties: it can handle large-scale datasets, and it takes advantage of the neighboring spatial information of SIs to boost the clustering accuracy. The complete workflow of the proposed method is shown in Fig. 3. In general, we exploit the self-representation property within subsets of neighboring similar pixels to select the most representative data points of the whole spectral image. Then, we enhance the sparse representation and perform fast spectral clustering to obtain the segmentation result.

A. Similarity-constrained Most Representative Spectral Pixels Selection
As neighboring spatial pixels commonly belong to the same land cover material, the proposed method aims to select a small subset of pixels that best represent their neighborhood. In this regard, we start by obtaining a segmentation map of the overall SI using a superpixels algorithm, which commonly expects a three-band image as input. Therefore, we first perform PCA to retrieve the three principal components of $X$ and form the matrix $X_{PCA} \in \mathbb{R}^{3 \times N}$. Then, we use the SLIC algorithm [40] to obtain a segmentation map $\tilde{m} \in \mathbb{R}^N$ from $X_{PCA}$, such that $\tilde{m}_j \in \{1, \cdots, E\}$, where $E$ is the number of segments. For instance, $\tilde{m}_j = e$ means that the pixel $x_j$ belongs to the segment $e$. Note that PCA is only performed to obtain $\tilde{m}$ from $X_{PCA}$ via SLIC; then, we use $\tilde{m}$ to select the most representative spectral pixels $x_j$ from $X$ within each segment $e$.

Fig. 3. Workflow of the proposed method. In the first stage, we apply PCA to obtain the three principal components of the spectral image. Then, we segment the image into different subsets using a superpixel algorithm. Note that we only use PCA to extract spatial similarities, but we perform the following procedures on the spectral pixels, as depicted with the circular flow symbols. In the second stage, the most representative spectral pixels from each subset are obtained by solving Eq. (2) via Algorithm 1, and all the representative spectral pixels from each subset are stacked as columns in the matrix $\hat{X}$. In the third stage, the $\{c_j\}$ vectors are obtained by solving Eq. (5). Finally, we reshape each row of the matrix $C = [c_1, \cdots, c_N]$, perform a 2D convolution with a $K_s \times K_s$ kernel, and reshape the result back to obtain a piecewise-smooth coefficient matrix. We obtain the final data segmentation via fast spectral clustering, as described in Section III-B. The computational complexity of the overall algorithm is $O(\rho^2 N^3)$, as analyzed in Section III-C2.
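This first stage can be sketched as follows. Since the paper uses SLIC [40], and to keep the sketch dependency-light, the superpixel grouping below is approximated by k-means on spatial coordinates concatenated with the three PCA features (a SLIC-like construction, not the SLIC implementation itself); all sizes and the spatial/spectral weighting are hypothetical:

```python
# Stage 1 sketch: PCA to three components, then a SLIC-like grouping of
# spatially coherent, spectrally similar pixels into E segments.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

Nr, Nc, L, E = 20, 20, 16, 8
cube = np.random.default_rng(4).random((Nr, Nc, L))
X = cube.reshape(Nr * Nc, L)                     # N x L spectral pixels

X_pca = PCA(n_components=3).fit_transform(X)     # three principal components
yy, xx = np.mgrid[0:Nr, 0:Nc]
# Concatenate pixel coordinates with (weighted) spectral-PCA features so the
# grouping is both spatially compact and spectrally coherent.
feats = np.column_stack([yy.ravel(), xx.ravel(), 10 * X_pca])
m = KMeans(n_clusters=E, n_init=10, random_state=0).fit_predict(feats)
```

The label vector `m` plays the role of the segmentation map $\tilde{m}$: each entry assigns a pixel to one of the $E$ segments used in the subsequent representative-pixel selection.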
Let $p_e \in \mathbb{R}^{N_e}$ be the vector containing the indices of the $N_e$ most similar spectral pixels belonging to the subset $e$. We are interested in selecting the $M_e = \rho N_e$ most representative pixels from each subset, where $\rho \in (0, 1)$. Taking advantage of the self-expressiveness property, the selection of the pixels within each neighborhood $e$ is obtained by searching for a subset $X_e^* \subseteq X$ that minimizes

$$X_e^* = \operatorname*{arg\,min}_{X_e \subseteq X,\ |X_e| = M_e} F_\tau(X_e), \tag{2}$$

where $F_\tau$ is the self-representation cost function defined as

$$F_\tau(X_e) = \max_{j \in p_e} f_\tau(x_j, X_e). \tag{3}$$

The metric function $f_\tau(x_j, X_e)$ geometrically measures how well a data point $x_j \in X : j \in p_e$ can be represented by the subset $X_e$, and we define it as

$$f_\tau(x_j, X_e) = \min_{c_j}\ \|c_j\|_1 + \frac{\tau}{2} \left\| x_j - X_e c_j \right\|_2^2, \tag{4}$$

where $\tau \in (1, \infty)$ is a parameter. Note that with Eq. (3), we constrain Eq. (2) to search only for pixels $x_j$ within the subset $e$, using the vector $p_e$. To efficiently solve Eq. (2) for each subset $e$, we use the approximation algorithm described in Algorithm 1. Note that, instead of using a random initialization, we select the centroid spectral pixel $\bar{x}_e$ as the initialization data point since it is the most similar point, in the Euclidean distance, to all other data points in $e$. The search space constraint, given by dividing the SI into subsets, in conjunction with selecting the centroid pixel as the starting point, speeds up the unsupervised learning process.

Algorithm 1: Similarity-constrained spectral pixels selection
Input: Data $X \in \mathbb{R}^{D \times N}$, indices vector $p_e \in \mathbb{R}^{N_e}$, parameters $0 < \rho < 1$, and $\tau > 1$. Here, $(p_e)_k$ denotes the $k$-th element of the vector $p_e$.
1 Initialize $X_e^*$ with the centroid spectral pixel $\bar{x}_e$ of the subset $e$.
2 Compute $b_k = f_\tau(x_j, \bar{x}_e)$ for $k = 1, \cdots, N_e$, and $j = (p_e)_k$.
3 Let $o_1, \cdots, o_{N_e}$ be an ordering of $1, \cdots, N_e$ such that $b_{o_p} \geq b_{o_q}$ when $p < q$.
4 Initialize max_cost = 0.
5 While $|X_e^*| < M_e$, add to $X_e^*$ the pixel $x_j$, $j \in p_e$, with the largest representation cost $f_\tau(x_j, X_e^*)$.
6 return $X_e^*$.

B. Enhancing the sparse representation coefficients for fast spectral clustering

Once the most representative spectral pixels from each subset are obtained, we build the matrix $\hat{X}$ by stacking the results as columns, i.e., $\hat{X} = [\hat{X}_1, \cdots, \hat{X}_E]$. Then, the sparse coefficient matrix $C$ of size $M \times N$, with $M = \rho N$, can be obtained by solving the following optimization problem, similar to Eq. (4),

$$\min_{c_j}\ \|c_j\|_1 + \frac{\tau}{2} \left\| x_j - \hat{X} c_j \right\|_2^2, \quad \text{for } j = 1, \cdots, N. \tag{5}$$

Note that $C$ encodes information about the similarities between $\hat{X}$ and $X$. Besides, each row of $C$ contains the representation coefficient distribution of the whole image with respect to a single representative pixel. Spectral pixels belonging to the same land cover material should be regionally distributed in the image, i.e., two spatially neighboring pixels in a SI usually have a high probability of belonging to the same class. Then, according to the self-expressiveness property, their representation coefficients with respect to the same sparse basis should also be very close; hence, each row of $C$ should be piecewise-smooth. Therefore, an intuitive approach to include the spatial information and boost the clustering performance is to apply a 2D smoothing convolution on the sparse coefficients $C$. Given a blur kernel matrix $I_{K_s}$ of size $K_s \times K_s$, we denote the 2D convolution process as $\hat{C} = G(C, I_{K_s})$. Specifically, as depicted in Fig. 3 within the dashed blue line, we propose to perform $G$ by first reshaping each row of $C$ to a window of size $N_r \times N_c$, which corresponds to the spatial dimensions of the SI, and then conducting the convolution with $I_{K_s}$. Finally, the convolution result is rearranged back as a row vector of the piecewise-smooth coefficient matrix $\hat{C} = [\hat{c}_1, \cdots, \hat{c}_N] \in \mathbb{R}^{M \times N}$. Since $\hat{C}$ is not square, it is not feasible to directly build the affinity matrix $A$ to be used with spectral clustering as in SSC [15], [14]. To resolve this issue, we use a fast spectral clustering approach to efficiently obtain the spectral embedding of the input data.

Algorithm 2: SC-SSC for land cover segmentation
Input: The spectral image in matrix form $X \in \mathbb{R}^{D \times N}$, parameters $\tau > 1$, $0 < \rho < 1$, $E > 1$, $K_s > 1$.
Output: The segmentation of $X$.
1 Obtain the segmentation map $\tilde{m}$ from the three principal components of $X$ via SLIC.
2 For each subset $e = 1, \cdots, E$, run Algorithm 1 and stack the selected pixels as columns of $\hat{X}$.
3 Solve Eq. (5) to obtain the sparse coefficient matrix $C$.
4 Compute $\hat{C} = G(C, I_{K_s})$, where $I_{K_s} = \frac{1}{K_s^2}\mathbf{1}$ and $\mathbf{1}$ is an all-ones matrix of size $K_s \times K_s$.
5 Run the k-means clustering algorithm on the top $k$ right singular vectors of $\tilde{C}D^{-1/2}$ to obtain the segmentation of $X$.
6 return The cluster assignments of $X$.
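The smoothing step $G(C, I_{K_s})$, which reshapes each row of $C$ to the spatial grid, convolves it, and flattens it back, can be sketched as follows; the matrix sizes are toy values and the normalized all-ones kernel is an assumed instantiation of the blur kernel:

```python
# Sketch of G(C, I_Ks): per-row 2D smoothing of the coefficient matrix.
import numpy as np
from scipy.ndimage import convolve

Nr, Nc, M, Ks = 8, 9, 5, 3
N = Nr * Nc
C = np.random.default_rng(5).random((M, N))      # toy coefficient matrix
kernel = np.ones((Ks, Ks)) / Ks**2               # normalized all-ones blur

C_hat = np.vstack([
    # Reshape row -> Nr x Nc window, convolve, flatten back to a row vector.
    convolve(row.reshape(Nr, Nc), kernel, mode="nearest").ravel()
    for row in C
])
```

Local averaging spreads each coefficient over its spatial neighborhood, which is what enforces the piecewise-smooth structure and the improved connectivity discussed in Section III-C.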
Specifically, let us consider the columns of $\tilde{C} = [\tilde{c}_1, \cdots, \tilde{c}_N] \in \mathbb{R}^{M \times N}$, where $\tilde{c}_j = |\hat{c}_j| / \|\hat{c}_j\|_2$, and compute the $i$-th diagonal element of the degree matrix $D$ as

$$D_{ii} = \tilde{c}_i^T \alpha, \quad \text{where } \alpha = \sum_{j=1}^{N} \tilde{c}_j \in \mathbb{R}^M.$$

Next, we can find the eigenvalue decomposition of $D^{-1/2} A D^{-1/2}$, with $A = \tilde{C}^T \tilde{C}$, by computing the singular value decomposition [41] of $\tilde{C} D^{-1/2} \in \mathbb{R}^{M \times N}$. Finally, the segmentation of the data can be obtained by running the k-means algorithm on the top $k$ right singular vectors of $\tilde{C} D^{-1/2} = U \Sigma P^T$. As a result, the computational complexity of spectral clustering in our framework is linear with respect to the size of the data $N$, which makes it suitable for large-scale datasets. The proposed SC-SSC method is summarized in Algorithm 2, and its computational complexity is analyzed in Section III-C2.
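A minimal sketch of this SVD-based spectral embedding, on hypothetical random coefficients, could look as follows; sizes, seed, and the number of clusters are toy assumptions:

```python
# Fast spectral clustering sketch: the degrees and the embedding are obtained
# from the M x N matrix directly, never forming the N x N affinity A.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
M, N, k = 6, 200, 3
C_hat = np.abs(rng.standard_normal((M, N)))       # stand-in for smoothed coeffs

C_t = C_hat / np.linalg.norm(C_hat, axis=0)       # tilde-c_j = |c_j| / ||c_j||_2
alpha = C_t.sum(axis=1)                           # alpha = sum_j tilde-c_j
d = C_t.T @ alpha                                 # D_ii = tilde-c_i^T alpha
Z = C_t / np.sqrt(d)                              # C_t D^{-1/2}, still M x N
_, _, Pt = np.linalg.svd(Z, full_matrices=False)  # right singular vectors
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Pt[:k].T)
```

Because only an $M \times N$ SVD (or a truncated one) is required, the cost of this step grows linearly with $N$, matching the complexity analysis of Section III-C2.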

C. Analysis of the Proposed Method
We now analyze how the proposed method optimizes the sparsity (subspace-preserving property) and the connectivity in the representation coefficient matrix. Furthermore, we analyze the computational complexity of Algorithm 2.
1) Subspace-preserving Property and Connectivity: As mentioned in Section II, one of the main requirements for the success of subspace clustering methods is that the optimization process recovers a subspace-preserving solution. Specifically, the non-zero entries of the sparse representation vector c j should be related only to the intra-subspace samples of x j . Indeed, as the following definition states, the representation coefficients among intra-subspace data points are always larger than those among inter-cluster points.

Definition 1 (Intra-subspace projection dominance, IPD [42])
The IPD property of a coefficient matrix $C$ indicates that for all $x_u, x_v \in S$ and $x_q \notin S$, where $u, v, q \in \{1, \cdots, N\}$ and $S$ is a subspace of $X$, we have $C_{uv} \geq C_{uq}$.
Since the proposed method selects the most representative spectral pixels for each subset $e$ based on the self-representation property, it is expected that each subset is subspace-preserving, i.e., $c_{ij}$ is nonzero only if $x_i$ and $x_j$, for $i, j \in p_e$, belong to the same subspace $S$. Furthermore, note that it is very probable that a subset $e$ has more spectral pixels from the same class due to the spatial dependence in SI; then, the resulting coefficient vectors will have large values for those spectral pixels within $e$. Therefore, the strategy adopted in the proposed method improves the structure of the vectors $c_j$ obtained by Eq. (5) and increases the probability that $c_j$ satisfies the IPD.
Besides, using the 2D smoothing convolution procedure $G(C, I_{K_s})$, the proposed method improves the connectivity of the data points by preserving the most significant values in the coefficient matrix $C$ and reducing the small or noisy isolated values, based on the IPD property [42]. Then, the resulting matrix $\hat{C}$ will have localized neighborhoods in the sparse codes, making the representation coefficients of spatially neighboring pixels very close as well, following our main assumption in Section III-B.
2) Computational Complexity Analysis: As shown in Fig. 3, the proposed method mainly involves four stages: the extraction of spatial similarities, the selection of similarity-constrained representative spectral pixels, the sparse coefficient matrix estimation by solving Eq. (5), and the enhancement of the representation coefficients for fast spectral clustering. Given a spectral image in matrix form $X \in \mathbb{R}^{D \times N}$ and $E$ subsets $X_e \subseteq X$ of dimensions $D \times M_e$, with $M_e = \rho N_e$, we will show the complexity of each stage before establishing the total complexity of Algorithm 2. Specifically, in the first stage, we acquire the segmentation map $\tilde{m}$ for an SI. Such a procedure involves computing PCA over $X$ to retrieve only the three principal components, which takes $O(N)$, and performing SLIC superpixels [40], which also has linear time complexity $O(N)$. The second stage requires executing Algorithm 1, which has $O(\rho N_e^2)$ time complexity, over the $E$ subsets; then, the overall complexity of this stage is $O(\rho \max(N_1^2, \cdots, N_E^2))$. The third stage entails solving Eq. (5), which is a LASSO problem that can be efficiently computed in $O(M^2 N)$ using the LARS algorithm [43]. Finally, in the last stage, the 2D convolution takes $O(N)$ since $K_s \ll N$ and, since for the spectral clustering we only need the $k$ largest singular values, we can use the truncated singular value decomposition (SVD), which takes $O(k^2 N)$. Thus, the overall complexity of this stage is $O(k^2 N)$. Therefore, the complexity of Algorithm 2 is dominated by the complexity of the third stage; hence, it runs in $O(M^2 N) = O(\rho^2 N^3)$, where $\rho \in (0, 1)$.

IV. EXPERIMENTAL EVALUATION
In this section, we show the performance of SC-SSC for land cover segmentation. The sparse optimization problems in Eqs. (4) and (5) are solved by the LASSO version of the LARS algorithm [43] implemented in the SPAMS package [44]. All the experiments were run on an Intel Core i7 9750H CPU (2.60GHz, 6 cores), with 32 GB of RAM.

A. Setup
Databases. The proposed subspace clustering approach (SC-SSC) was tested on three well-known hyperspectral images with different imaging environments, see Fig. 4. The Indian Pines hyperspectral dataset has 145 × 145 pixels and 200 spectral bands in the range of 0.4 − 2.5µm. The second scene, Salinas, has 512 × 217 pixels and 204 spectral bands in the range of 0.24 − 2.40µm. The third scene, University of Pavia, comprises 610 × 340 pixels and has 103 spectral bands with spectral coverage ranging from 0.43 − 0.84µm. In order to make a fair comparison with non-scalable methods, we select, for each image, the most frequently used region of interest (ROI) in spectral image clustering, as shown in Fig. 4. The Indian Pines ROI has a size of 70 × 70 pixels, which includes four main land-cover classes: corn-no-till, grass, soybeans-no-till, and soybeans-min-till. The Salinas ROI comprises 83 × 83 pixels and includes six classes: brocoli-1, corn-senesced, lettuce-4wk, lettuce-5wk, lettuce-6wk, and lettuce-7wk. Finally, the University of Pavia ROI is composed of 200 × 200 pixels and includes all nine classes of the full image: asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, bricks, and shadows. For all experiments (including the baseline methods), we reduce the spectral dimension of each image using PCA to $D = 0.25L$, where $L$ is the number of spectral bands. Then, we rearrange the data cube to form a matrix $X \in \mathbb{R}^{D \times N}$ and normalize the columns (spectral pixels) to have unit $\ell_2$ norm.
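The preprocessing just described (PCA to $D = 0.25L$ features, matrix rearrangement, unit-norm columns) can be sketched as follows on a hypothetical toy cube:

```python
# Experimental preprocessing sketch: PCA to D = 0.25 * L features, then
# rearrange to X in R^{D x N} and normalize each spectral pixel to unit l2 norm.
import numpy as np
from sklearn.decomposition import PCA

Nr, Nc, L = 12, 12, 40
cube = np.random.default_rng(7).random((Nr, Nc, L))   # toy data cube

D = int(0.25 * L)                                     # D = 10 features here
X = PCA(n_components=D).fit_transform(cube.reshape(-1, L)).T
X /= np.linalg.norm(X, axis=0)                        # unit-norm columns
```

Unit-norm columns make the self-representation costs of Eqs. (4) and (5) comparable across pixels with different overall reflectance.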
Baselines and Evaluation Metrics. We compare our approach with the SSC-based methods highlighted in Section II-A: SSC [14], SSSC [25], SSC-OMP [26], ORGEN [30], SR-SSC [33], ESC-FFS [35], S-SSC [18], and 3DS-SSC [39]. We also show the results with SSC as an additional reference. To make a fair comparison, we compare the performance of the proposed SC-SSC algorithm with the non-scalable methods (SSC, S-SSC, 3DS-SSC, and ORGEN) on the ROIs of the remote sensing images shown in Fig. 4. Then, we compare the performance of SC-SSC with the scalable methods (SSC-OMP, ESC-FFS, SR-SSC, and SSSC) on the full hyperspectral images. For the sake of completeness, we also compare our approach with non-SSC-based methods that use fast spectral clustering [45], [46], [47]. Specifically, we gather the clustering results on the Indian Pines image reported in such works and compare them against the proposed method. To assess the clustering performance of our model, we rely on five standard metrics: user's accuracy (UA), average accuracy (AA), overall accuracy (OA), Kappa coefficient, and normalized mutual information (NMI) [48], [49]. In particular, UA, AA, OA, and the Kappa coefficient can be obtained by means of an error matrix (a.k.a. confusion matrix) [48]. UA represents the clustering accuracy of each class, AA is the mean of UA, and OA is computed by dividing the total number of correctly classified pixels by the total number of reference pixels. UA, AA, and OA values are presented in percentage, while Kappa coefficients and NMI values range from 0 (poor clustering) to 1 (perfect clustering). We also compare the methods in terms of clustering time, and the results are presented in seconds.
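The accuracy metrics derived from the error matrix can be sketched as follows. The labels are toy values and are assumed to be already aligned to the reference classes (in a clustering setting, a label-matching step would precede this computation):

```python
# Metric sketch: per-class accuracy (UA), average accuracy (AA), overall
# accuracy (OA), and the Kappa coefficient from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])   # toy reference labels
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])   # toy cluster labels

cm = confusion_matrix(y_true, y_pred)
ua = np.diag(cm) / cm.sum(axis=1)                   # per-class accuracy
aa = ua.mean()                                      # average accuracy
oa = np.diag(cm).sum() / cm.sum()                   # overall accuracy
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
kappa = (oa - pe) / (1 - pe)                        # Kappa coefficient
```

Kappa discounts the agreement `pe` expected by chance, which is why it is a stricter score than OA for imbalanced class distributions.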

B. Parameters analysis and tuning
The parameters ρ, E, and K s of Algorithm 2 were manually adjusted for each dataset. We conduct different experiments varying each parameter, with the others fixed, to obtain the best overall accuracy with each spectral image. During simulations, we observe that the parameter ρ has a direct impact on the execution time of the proposed method. Figure 5 presents the running time of SC-SSC for all the databases. As shown, increasing ρ directly increases the running time; however, the most significant increment in time is given by the number of spectral pixels N , as observed with the differences in time between the curves. As we analyze in Section III-C2, this behavior is expected since the computational complexity of the algorithm is O(ρ 2 N 3 ).
To find the best configuration, the parameters were varied over the following values: τ ∈ {5, 10, 15, 20}, ρ ∈ {0.2, 0.25, 0.3, 0.35}, E ∈ {100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900}, and K_s ∈ {3, 5, 8, 16}. Figure 6 shows the performance of the proposed method with different combinations of the parameters for all the databases, where the overall accuracy is shown between 0 and 1, and the parameter τ was fixed. Given the results of the different combinations of the parameters, we selected the best ones and summarized them in Table I. By analyzing Figure 6, we observe that the precision changes with different values of ρ and K_s. The parameter ρ determines the number of the selected most representative data points within each of the E segments, and K_s is the kernel size used in the 2D convolution. Then, we can conclude that an adequate balance between the selection of the number of representative pixels and the inclusion of spatial information in the spectral clustering algorithm is crucial to obtain the best performance.
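The exhaustive sweep over this grid can be sketched as follows; `run_sc_ssc` is a hypothetical stand-in scorer (not the real Algorithm 2), used only to show the search structure:

```python
# Grid-search sketch over the tuning grid used in the experiments.
from itertools import product

grid = {
    "tau": [5, 10, 15, 20],
    "rho": [0.2, 0.25, 0.3, 0.35],
    "E":   [100, 300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900],
    "Ks":  [3, 5, 8, 16],
}

def run_sc_ssc(tau, rho, E, Ks):
    # Hypothetical placeholder returning an OA-like score in [0, 1];
    # in practice this would run Algorithm 2 and evaluate the OA.
    return rho * Ks / (tau + E)

best_oa, best_cfg = -1.0, None
for tau, rho, E, Ks in product(*grid.values()):
    oa = run_sc_ssc(tau, rho, E, Ks)
    if oa > best_oa:
        best_oa, best_cfg = oa, (tau, rho, E, Ks)
```

With this grid, 4 × 4 × 10 × 4 = 640 configurations are evaluated per image, which is why each parameter is also studied with the others held fixed.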

Table II (fragment): SSC-OMP [26]: K=40, thr=1e-07; K=30, thr=0.001; K=10, thr=1e-06.

In order to make a fair comparison, we also performed several simulations with the baseline methods to manually select the best parameters in their configurations. Table II presents the selected parameters for each method after running the experiments. We present the same parameter symbols used in the original works. Note that, for the SSC, S-SSC, and 3DS-SSC algorithms, we select the same parameters reported in [16] and [39], since they are already optimal for spectral image clustering.

C. Ablation Studies
We conduct six ablation experiments to investigate different configurations of the proposed subspace clustering approach. Specifically, we compare the performance of the proposed workflow in Fig. 3 when incorporating or excluding PCA, superpixels, and the 2D convolution. Table III presents the results obtained from the different combinations in terms of OA and NMI for the three tested images. We observe that using superpixels to extract spatial similarities improves the clustering performance for the three tested images in all cases, which evidences the importance of neighboring spatial information in our workflow. Also, using superpixels together with the 2D convolution (Experiment II) leads to the second-best result, while using the 2D convolution alone (Experiment III) does not yield a significant clustering improvement. Finally, Experiment VI corresponds to our proposed approach, which achieves the best results in terms of OA and NMI by using the three operations as described in the workflow of Fig. 3.
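The ablation grid can be sketched by toggling each of the three pipeline components on or off; the function and key names below are illustrative stand-ins, not part of the original implementation.

```python
from itertools import product

# Hypothetical sketch of the ablation grid: each experiment toggles PCA,
# superpixels, and the 2D convolution independently.
def ablation_configs():
    """Enumerate the on/off combinations of the three pipeline components."""
    names = ("pca", "superpixels", "conv2d")
    return [dict(zip(names, flags)) for flags in product([False, True], repeat=3)]

configs = ablation_configs()
print(len(configs))  # → 8 combinations in total; the paper reports six of them
```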

D. Visual and Quantitative Results
1) Comparison with non-scalable methods: Figure 7 presents the obtained land cover maps on the Indian Pines, Salinas, and University of Pavia ROIs, where we compare the performance of our SC-SSC method with the non-scalable methods. The quantitative results are reported in Table IV, in which the best results are shown in bold and the second-best are underlined. From Table IV, it can be clearly observed that, in general, the proposed SC-SSC method outperforms the others. Specifically, SC-SSC achieves an OA of 93.14% and 99.42%, in only 1.63 and 2.06 seconds, for the Indian Pines and Salinas datasets, respectively, which are remarkable results in unsupervised learning settings. Similarly, for the University of Pavia ROI, Table IV shows that the proposed SC-SSC achieves the best clustering performance on all the accuracy evaluation metrics among all the compared algorithms.
2) Comparison with scalable methods: We now compare the performance of SC-SSC with the scalable approaches: SSC-OMP, SSSC, ESC-FFS, and SR-SSC. Figure 8 and Table VI present the visual and quantitative results, respectively, on the full spectral images. From both the qualitative and quantitative results, we observe that the proposed SC-SSC method outperforms the other approaches in terms of OA, Kappa, and NMI score. Note that, although the proposed method is not the fastest one, it achieves high clustering performance in a short amount of time compared with the other methods.
3) Comparison with unsupervised deep-learning-based methods: For the sake of completeness, we compare the proposed SC-SSC method with unsupervised deep-learning-based methods based on autoencoders (AE) for spectral image clustering. Three of them were proposed in [50] (VAE, AE-GRU, and AE-LSTM), and the 3D-CAE method was proposed in [51], which is based on a 3D convolutional AE. Note that we only compare our method with fully unsupervised deep learning approaches to make a fair comparison. Table V shows the quantitative results in terms of the NMI score. In the table, the best result is shown in bold font, and the second-best is underlined. As observed, our method obtains a competitive NMI score.

Fig. 7. Land cover maps of (first row) the Indian Pines ROI, (second row) the Salinas ROI, and (last row) the University of Pavia ROI. The proposed method is compared with the methods that perform best on these spectral images.

V. CONCLUSION
In this work, we presented a new subspace clustering algorithm for land cover segmentation that can handle large-scale datasets and takes advantage of the neighboring spatial information in spectral images to boost clustering accuracy. Our method considers the spatial similarity among spectral pixels to select the most representative ones, such that all other neighboring points can be well represented by those representative pixels in terms of a sparse representation cost. Then, the obtained sparse coefficient matrix is enhanced by filtering the coefficients, and a fast spectral clustering algorithm yields the final segmentation. Through simulations on traditional test spectral images, we demonstrated the effectiveness of our method for fast land cover segmentation, obtaining remarkably high clustering performance compared with state-of-the-art SSC algorithms and even novel unsupervised-deep-learning-based methods.