Locally Homogeneous Covariance Matrix Representation for Hyperspectral Image Classification

Combining spectral and spatial information has been proven to be an effective approach to hyperspectral image (HSI) classification. However, making full use of the spectral–spatial information of HSI remains an open problem, especially when only a small number of labeled samples are available. In this article, a new spectral–spatial feature extraction method called locally homogeneous covariance matrix representation (CMR) is proposed for the fusion of spectral and spatial information. Specifically, to make use of the neighborhood homogeneity of land covers, the original HSI is first segmented into many superpixels using modified entropy rate superpixel segmentation. Then, to acquire the most similar pixels, we propose to construct the neighborhood of each pixel from the overlapping area between the corresponding superpixel and the sliding window centered on it. Subsequently, the CMRs of different pixels can be obtained. In the classification stage, we feed the obtained CMRs into an SVM with a Log-Euclidean-based kernel for classification. Compared to the traditional approach that utilizes neighboring information only within a fixed window, the proposed local homogeneity strategy can absorb more discriminative spectral–spatial features. Experimental results from a series of available HSI datasets show that our proposed method is superior to several state-of-the-art methods, especially when the training set is very limited.


I. INTRODUCTION
Different from ordinary RGB images, hyperspectral images (HSIs) usually contain hundreds of spectral channels from ultraviolet to infrared, which provides valuable information for detailed material analysis [1], [2]. Therefore, HSIs have been widely applied in many fields, such as environmental monitoring, agriculture [3], medical diagnosis, and target detection [1], [4], [5]. In the last few decades, HSI classification technology, which assigns a unique class label to each pixel, has attracted great attention in the field of remote sensing [6]. However, it is very challenging to obtain satisfactory results due to the limited labeled samples and existing noise [2], [7]-[9].
In the past few years, a large number of HSI classification methods have been designed [10], [11]. Due to the presence of a large number of bands in the HSI data, many spectral dimensionality reduction-based HSI classification methods have been proposed, such as the methods based on principal component analysis (PCA) [12] and maximum noise fraction (MNF) [13]. In [14], an unsupervised classification framework based on robust manifold matrix factorization, which can address the high dimensionality of HSI, has been proposed. However, because of the high intraclass variability and interclass similarity in HSI data [7], which result from variations in light, climate, and other uncontrolled factors, the classification performance of these spectral-based methods is usually unsatisfactory [15].
To tackle this issue, a variety of spatial-spectral frameworks, which consider both spectral and spatial information, have been widely investigated [16]-[20]. For example, extended morphological profiles are put forward to adaptively extract spatial characteristics in HSI [17]. Kang et al. [21] proposed an edge-preserving filtering-based framework to improve the classification performance obtained by the pixelwise classifier. In [22], superpixel-based classification is adopted to utilize spatial-spectral features via multiple kernels (SCMK). Meanwhile, Li et al. [23] proposed a method based on local binary patterns (LBPs) to exploit the contextual information of HSI. To overcome the oversmoothing phenomenon caused by the Gabor filtering, a method called spectral-spatial range Gabor filtering has also been developed in [24]. Huang et al. [25] proposed a method called local linear spatial-spectral probabilistic distribution by constructing a multiclass probability map to make full use of spatial correlation in HSI.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In addition, owing to their successful application in computer vision over the past few years, deep-learning-based methods have also been employed in HSI classification [26]-[30]. For example, a regularized deep feature extraction and classification method is proposed in [29], which develops a 3-D convolutional neural network (CNN) to explore spatial-spectral features. However, the 3-D convolution model has a high time cost and computational complexity [31]. To address this problem, Roy et al. [32] have achieved competitive performance by combining 2-D CNN and 3-D CNN. Although these deep-learning-based methods can achieve good classification performance, they usually require a large number of training samples to avoid overfitting, and acquiring labeled samples is very time-consuming in the field of remote sensing [33], [34]. Although some improvements have been put forward, such as random patches networks [35] and a deep-metric-learning-based framework [36], this is still an open problem.
Recently, the covariance matrix representation (CMR) has been proven to be an effective feature representation method that can make use of the correlation between different features [37]-[39]. Each nondiagonal element in the covariance matrix (CM) represents the covariance between two different features. Taking advantage of CMR, Fang et al. [37] proposed a spatial-spectral feature extraction method called local CMR (LCMR). However, its classification maps contain much noise. Although Zhang et al. [40] proposed a method that combines multiscale adaptive weighted filtering with LCMR to obtain smoother classification maps, local neighbor selection is still an important problem to be solved. The main idea of LCMR is to obtain the discriminative features of different pixels by computing the CMR from the local neighborhood determined by a sliding window. However, the pixels in the same sliding window may not belong to the same class, which leads to noisy classification maps [41]. Therefore, this article proposes a new representation method called locally homogeneous CMR (LHCMR), which makes full use of local-homogeneity-based spatial-spectral information to enhance the classification performance. Specifically, we select the most similar pixels in the overlapping area between the superpixel and the sliding window, which is called the locally homogeneous area. This article extends our previous conference paper [40], which contains only very preliminary results. Compared to it, this article first adopts our modified entropy rate superpixel (ERS) segmentation to segment HSIs into many homogeneous regions based on spatial-spectral information and then obtains CMRs based on the local homogeneity strategy, which provides a new solution for selecting similar neighboring pixels.
The remainder of this article is organized as follows. In Section II, some related works are briefly reviewed. Section III presents the proposed LHCMR-based method for HSI classification. Section IV presents experimental results using four well-known HSI datasets and shows the results of the comparison with other state-of-the-art methods. Finally, Section V concludes this article.

II. RELATED WORK

A. ERS Segmentation
ERS segmentation is one of the commonly used preprocessing methods for HSIs [22], [42]. It is a graph-based segmentation algorithm with high efficiency [43]. Specifically, ERS first constructs a graph $G = (V, E)$ on the image to be segmented, where $V$ is the vertex set corresponding to the pixels in the image and $E$ is the edge set consisting of the pairwise similarities between adjacent pixels. The weight $W_{i,j}$ between vertices $v_i$ and $v_j$ is defined using a Gaussian kernel as

$$W_{i,j} = \exp\left(-\frac{d(v_i, v_j)^2}{2\sigma^2}\right) \tag{1}$$

where $d(v_i, v_j)$ is the spatial distance and $\sigma$ is the kernel width. Then, the graph is segmented into smaller connected subgraphs by selecting a subset of edges $A \subseteq E$. To balance compactness and similarities of cluster sizes, an entropy rate term $H(A)$ and a balancing term $B(A)$ are incorporated into the objective function

$$A^{*} = \arg\max_{A} \left\{ H(A) + \alpha B(A) \right\} \quad \text{s.t.} \quad A \subseteq E \tag{2}$$

where $\alpha$ is used to control the contributions of the entropy rate term $H(A)$ and the balancing term $B(A)$. Then, a 2-D superpixel map can be generated. The optimization problem in (2) can be effectively solved by a greedy algorithm [42].

B. CM Representation
Suppose an image is defined as $I$, and a given region $R \subseteq I$ has $K$ pixels. Let $\{x_i\}_{i=1}^{K}$ denote the feature vectors of the pixels within $R$, where $x_i$ is the feature of the $i$th pixel, and let $\bar{x}$ denote their mean vector. The CMR of region $R$ can then be obtained as follows:

$$C = \frac{1}{K-1} \sum_{i=1}^{K} (x_i - \bar{x})(x_i - \bar{x})^{T} \tag{3}$$

In the CM, each nondiagonal entry represents the covariance between two different features. Since the obtained CM is a symmetric positive definite matrix, it lies on a nonlinear Riemannian manifold. Therefore, a manifold-based distance metric, the Log-Euclidean distance, can be used to compare CMs. Specifically, for two given CMs $C_1$ and $C_2$, the Log-Euclidean distance between them is defined by the following formula:

$$d_{LE}(C_1, C_2) = \left\| \operatorname{logm}(C_1) - \operatorname{logm}(C_2) \right\|_F \tag{4}$$

where $\|\cdot\|_F$ is the Frobenius norm and $\operatorname{logm}(\cdot)$ denotes the ordinary matrix logarithm operator [38]. For a given symmetric positive definite matrix $C$, let $C = U \Sigma U^{T}$ represent its eigendecomposition; then its logm can be computed by

$$\operatorname{logm}(C) = U \log(\Sigma) U^{T} \tag{5}$$

where $\log(\Sigma)$ applies the scalar logarithm to each diagonal entry of $\Sigma$.
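The covariance representation, the eigendecomposition-based matrix logarithm, and the Log-Euclidean distance described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names and the small eigenvalue floor `eps` (added so that a nearly singular CM does not produce `log(0)`) are assumptions made for this example.

```python
import numpy as np

def covariance_representation(X):
    """Covariance matrix of K feature vectors stored as rows of X (shape K x d)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)              # subtract the mean vector
    return centered.T @ centered / (X.shape[0] - 1)

def logm_spd(C, eps=1e-10):
    """Matrix logarithm of a symmetric positive definite matrix.

    Uses C = U diag(s) U^T and returns U diag(log s) U^T.
    eps is a hypothetical safeguard that floors tiny eigenvalues.
    """
    s, U = np.linalg.eigh(C)
    s = np.maximum(s, eps)
    return (U * np.log(s)) @ U.T               # scales the columns of U by log(s)

def log_euclidean_distance(C1, C2):
    """Frobenius norm of the difference between the two matrix logarithms."""
    return np.linalg.norm(logm_spd(C1) - logm_spd(C2), ord="fro")
```

Because both matrices are mapped to their logarithms before the norm is taken, the distance reduces to an ordinary Euclidean computation once `logm_spd` has been applied, which is what makes this metric inexpensive compared with other Riemannian distances.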

III. PROPOSED METHOD
To overcome the problem with LCMR based on a fixed sliding window, the LHCMR-based spectral-spatial feature extraction method is described in this section; its framework is shown in Fig. 1. Specifically, we first introduce superpixel segmentation to reveal the homogeneity of land covers with different sizes. Then, the CM is computed based on the most similar pixels selected from the overlapping area of the superpixel and the sliding window. After that, we feed the obtained CMs into an SVM with a Log-Euclidean-based kernel for final classification. The proposed method takes advantage of superpixel segmentation and obtains the CM within the locally homogeneous area, which is conducive to enhancing classification performance.

A. Superpixel Segmentation
A common way to apply superpixel segmentation to HSI is to first apply PCA and keep only the first principal component or the first three principal components. However, HSIs have the advantage of high spectral resolution, and the PCA step before segmentation may discard it. Thus, we carry out the ERS segmentation on the full bands to capture the structure of HSI more accurately. In (1), both vertices $v_i$ and $v_j$ are $n$-dimensional spectral vectors in our modified version. Then, the weight $W_{i,j}$ is calculated as

$$W_{i,j} = \exp\left(-\frac{d(v_i, v_j)^2}{2\sigma^2}\right), \quad v_i, v_j \in \mathbb{R}^{n} \tag{6}$$

where $d(v_i, v_j)$ is now the distance between the two full-band spectral vectors. Then, we perform ERS on the original HSI $I_{hsi}$ to produce the superpixel map. Let $N$ denote the number of superpixels; the superpixel segmentation result can be defined as

$$I_{hsi} = \bigcup_{s=1}^{N} \eta_s, \quad \eta_s \cap \eta_t = \emptyset \ (s \neq t) \tag{7}$$

where $\eta_s$ is the $s$th superpixel in the superpixel map. Through the segmentation algorithm, the HSI can be divided into several meaningful subregions. In this way, we can effectively extract spatial information in HSI, and then combine the spectral features for further representation.
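As an illustration of computing graph weights from full-band spectral vectors instead of a PCA projection, the sketch below evaluates a Gaussian kernel on neighboring pixel pairs. Several choices here are assumptions made for the example and are not taken from the article: the Euclidean distance between spectral vectors, the 4-connected neighborhood, the kernel width `sigma`, and the function name. The greedy ERS optimization itself is not shown.

```python
import numpy as np

def spectral_edge_weights(hsi, sigma=1.0):
    """Gaussian-kernel weights between 4-connected neighbours of an HSI cube.

    hsi has shape (H, W, n): every pixel is an n-dimensional spectral vector.
    Returns (w_h, w_v): weights of horizontal edges, shape (H, W-1),
    and of vertical edges, shape (H-1, W).
    """
    hsi = np.asarray(hsi, dtype=float)
    # Euclidean distance between each pixel and its right / lower neighbour
    # (assumed distance; the article does not fix it in the surviving text)
    d_h = np.linalg.norm(hsi[:, 1:, :] - hsi[:, :-1, :], axis=2)
    d_v = np.linalg.norm(hsi[1:, :, :] - hsi[:-1, :, :], axis=2)
    w_h = np.exp(-d_h ** 2 / (2.0 * sigma ** 2))
    w_v = np.exp(-d_v ** 2 / (2.0 * sigma ** 2))
    return w_h, w_v
```

Spectrally identical neighbors receive a weight of 1, and the weight decays toward 0 as the spectra diverge, so edges crossing a material boundary become cheap to cut.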

B. Construction of LHCMR
After that, LHCMR will be derived based on the results of superpixel segmentation. We can calculate the similarity between each pixel and its neighbors, and then select the most similar K − 1 pixels to construct the CM.
However, constructing CMs directly from superpixels may mix pixels belonging to different classes. One of the reasons is that the pixels in a superpixel may not be as uniform as expected [44]. For example, the pixels in the same superpixel may be far away from each other in some cases, which is inconsistent with the locality assumption. Therefore, we need a constraint that makes full use of local spatial information and ensures local homogeneity.
To this end, we introduce a sliding window on top of the superpixels. Although directly selecting neighbors in a fixed window will cause some noise, constructing neighborhoods in the overlapping area between the superpixel and the sliding window can provide a solution to the aforementioned problem. Fig. 2 shows the overlapping area, from which we select the most similar pixels. Assume that there are $m$ pixels in the superpixel $\eta_s$. For each pixel $p_i \in \eta_s$ $(i = 1, 2, \ldots, m)$, the overlapping area of the $T \times T$ window $W$ centered on the given pixel and the superpixel $\eta_s$ is determined by

$$D_{ol} = W \cap \eta_s \tag{8}$$

Subsequently, for each pixel, we employ the cosine-distance-based KNN to select the $K - 1$ most similar pixels. Suppose there are $M$ pixels in the overlapping area $D_{ol}$; the cosine distance between the pixel $\rho_1$ and its neighboring pixels can be calculated by

$$d(\rho_1, \rho_j) = 1 - \frac{\langle \rho_1, \rho_j \rangle}{\|\rho_1\| \, \|\rho_j\|}, \quad j = 2, \ldots, M \tag{9}$$

where $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ represent the inner product and Frobenius norm, respectively. Based on the position index of these pixels, we can associate them with the HSI.
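The neighbor selection described above (intersect the sliding window with the superpixel of the center pixel, then keep the spectrally closest pixels) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name, the clipping of the window at image borders, the form of the cosine distance (one minus the cosine similarity), and the behavior when the overlap holds fewer than K pixels are all choices made for this example.

```python
import numpy as np

def select_homogeneous_neighbors(hsi, labels, row, col, T, K):
    """Pick up to K pixels (center included) from the window/superpixel overlap.

    hsi:    (H, W, n) cube; labels: (H, W) superpixel map.
    Returns the row and column indices of the selected pixels.
    If the overlap contains fewer than K pixels, all of them are returned.
    """
    H, W, _ = hsi.shape
    half = T // 2
    # T x T window centered on (row, col), clipped at the image border (assumption)
    r0, r1 = max(0, row - half), min(H, row + half + 1)
    c0, c1 = max(0, col - half), min(W, col + half + 1)
    # Overlap: window pixels that share the center pixel's superpixel label
    rr, cc = np.nonzero(labels[r0:r1, c0:c1] == labels[row, col])
    rr, cc = rr + r0, cc + c0
    center = hsi[row, col]
    cand = hsi[rr, cc]                                   # (M, n) candidate spectra
    cos_sim = cand @ center / (
        np.linalg.norm(cand, axis=1) * np.linalg.norm(center) + 1e-12
    )
    dist = 1.0 - cos_sim                                 # cosine distance
    order = np.argsort(dist, kind="stable")[:K]          # K smallest distances
    return rr[order], cc[order]
```

The center pixel always lies in the overlap and has (numerically) zero distance to itself, so it is among the selected pixels; the remaining K − 1 entries are its most similar neighbors within the locally homogeneous area.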
To reduce noise and computational complexity, we first apply the MNF method for dimension reduction. After obtaining the neighbor pixels in the reduced-dimensional HSI, we can derive the LHCMR of K pixels according to (3). In this way, the local homogeneity-based spatial-spectral information of HSI will be naturally integrated into the LHCMR. Each off-diagonal element in the matrix represents the correlation between different spectral bands of this type of material, which helps improve the classification performance.

C. Classification
Finally, the CMRs obtained in the locally homogeneous areas are fed into the SVM for training and classification. In this article, a commonly used kernel function called the Log-Euclidean kernel [45] is adopted to map the CM from the Riemannian manifold to Euclidean space. It can be defined by

$$k_{logm}(C_1, C_2) = \operatorname{trace}\left[\operatorname{logm}(C_1) \cdot \operatorname{logm}(C_2)\right] \tag{10}$$

where $C_1$ and $C_2$ are two given CMs.
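To make the kernel step concrete, the sketch below builds a Gram matrix between two sets of CMs. It assumes the trace form of the Log-Euclidean kernel, k(C1, C2) = trace[logm(C1) · logm(C2)]; the function names and the eigenvalue floor `eps` are hypothetical and introduced only for this example.

```python
import numpy as np

def logm_spd(C, eps=1e-10):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    s, U = np.linalg.eigh(C)
    s = np.maximum(s, eps)          # hypothetical floor against log(0)
    return (U * np.log(s)) @ U.T

def log_euclidean_gram(cms_a, cms_b):
    """Gram matrix G[a, b] = trace(logm(A_a) @ logm(B_b)) (assumed kernel form)."""
    La = np.stack([logm_spd(C) for C in cms_a])   # shape (a, d, d)
    Lb = np.stack([logm_spd(C) for C in cms_b])   # shape (b, d, d)
    # trace(X @ Y) = sum_ij X_ij * Y_ji
    return np.einsum("aij,bji->ab", La, Lb)
```

A Gram matrix computed this way between training CMs can be handed to any SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`), with a second Gram matrix between test and training CMs used at prediction time.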

IV. EXPERIMENTAL RESULTS AND DISCUSSION
To demonstrate the effectiveness of the proposed LHCMR-based method, extensive experiments have been performed on four well-known HSI datasets. The classification performance is objectively evaluated by three widely used quality indicators: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient.
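For reference, the three indicators can be computed from a confusion matrix as in the sketch below, which uses their standard definitions. The function name is hypothetical, and the code assumes integer class labels in the range 0 to n_classes − 1 with every class present in the ground truth.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, Kappa) computed from integer class labels."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (y_true, y_pred), 1)            # rows: truth, columns: prediction
    total = cm.sum()
    oa = np.trace(cm) / total                     # fraction of correctly labeled pixels
    aa = (np.diag(cm) / cm.sum(axis=1)).mean()    # mean of per-class accuracies
    # Chance agreement from the row and column marginals
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```

OA is dominated by large classes, whereas AA weights every class equally, which is why both are reported for datasets with strongly imbalanced classes; Kappa additionally discounts the agreement expected by chance.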

A. Datasets
The first experimental dataset is Indian Pines. The number of spectral bands has been reduced to 200 after 24 water absorption bands were removed, and each band has a size of 145 × 145 pixels. The available ground truth is divided into 16 classes. The second HSI is the Pavia University image, which was acquired by the ROSIS sensor over the campus of the University of Pavia, Italy. This scene has a size of 610 × 340 × 103 after the noise-corrupted bands were removed. The ground truth is composed of 9 land cover classes. The third HSI used in the experiment is Salinas, which was gathered by the AVIRIS sensor over the Salinas Valley, California. This image has a size of 512 × 217 × 204 after 20 water absorption bands were discarded, and the dataset contains 16 classes. The fourth HSI is the Pavia Center image, which has a size of 1096 × 492 pixels and 102 spectral bands. The ground-truth map covers 9 classes. The details of all the datasets are tabulated in Table I.

B. Parameter Analysis
The effects of the number of superpixels N, the size of the sliding window T, and the number of local neighboring pixels K are investigated in this section. First, we discuss the effect of N (from 50 to 350 with a step of 50) and K (from 100 to 350) on the performance of the proposed method, with T fixed. Fig. 3 shows the OAs achieved with different N and K on the four datasets. For the number of superpixels N, we can observe that the classification performance is sensitive to N on the Indian Pines image. This is because a small number of superpixels is not sufficient to capture the image structure, whereas a large number of superpixels leads to oversegmentation for the small Indian Pines image. Furthermore, the number of superpixels partly reflects the level of detail in the image. Thus, it can be seen from Fig. 3 that competitive results can be obtained with a small number of superpixels on the Salinas dataset, whereas a large number of superpixels is needed for the Pavia Center dataset. For the number of local neighboring pixels K, the OAs on all datasets are particularly low when K is small. The reason is that a small number of local neighboring pixels may not be enough to extract sufficient spatial information. It can also be observed that, as K increases, the classification accuracies of the proposed method tend to rise on the Pavia University, Salinas, and Pavia Center images, whereas the OAs first improve and then decrease on the Indian Pines image. This is mainly because the Indian Pines image is smaller, so a large number of local neighboring pixels may mix in more dissimilar pixels, which hampers the classification performance. The values of K and N selected for the four datasets are detailed in Table II.
Then, parameter T, which is used to ensure local homogeneity, is tested with parameters N and K fixed. In our experiment, T ranges from 19 to 39 with a step of 2. The effects of different T on the accuracies obtained on the four datasets are shown in Fig. 4. As can be seen, the OAs, AAs, and Kappa show an upward trend when T grows from 19 to 35 on all the images. In addition, the best performance is achieved when T is 35, and the accuracies decrease when T is larger than 35 on the Indian Pines, Salinas, and Pavia Center images. On the Pavia University image, the best results are obtained when T is 37. In the proposed method, the sliding window ensures the spatial similarity of the pixels selected from a superpixel. A small window therefore leaves few neighboring pixels in the overlapping area, which makes the extracted features unrepresentative. However, when the window is too large (larger than the size of the superpixel block), it no longer imposes any constraint. As a result, the value of T is set to 35 for all datasets.

C. Comparison With Different Methods
In this article, the performance of our proposed LHCMR algorithm is compared with seven state-of-the-art methods. Traditional spatial-spectral methods include the LBP-based method [23], the superpixel-based classification method via multiple kernels (SCMK) [22], and LCMR [37]. In addition, some advanced deep-learning-based methods are also included: random patches network (RPNet) [35], an improved 2-D CNN-based approach called deepNRD [30], a deep-metric-learning-based model (S-DMM) [36], and hybrid spectral CNN (HybridSN) [32]. For deepNRD, the parameters are tuned to their best values, and the default parameters are used for the other methods.
The first experiment was conducted on the Indian Pines dataset. In this experiment, ten training samples per class are randomly selected, and the remaining samples are used for testing. The average classification results for the various methods over ten trials with different randomly selected training samples are reported in Table III. The best results are highlighted in bold font. As can be observed, the deep-learning-based methods achieved lower accuracies, mainly because these methods suffer from overfitting when very few samples are chosen for training. Among them, RPNet achieves better results because its convolution kernels are randomly selected from feature images without any training. However, the proposed LHCMR performs the best among all compared algorithms in terms of OA, AA, and Kappa coefficient. In addition, the proposed method significantly enhances the classification performance of LCMR. This demonstrates that the local homogeneity strategy can obtain more similar neighboring pixels and extract more effective spatial-spectral information than a fixed window alone. Fig. 5 illustrates the visual classification results obtained by different methods on the Indian Pines image.
It can be observed that when using limited training samples, the classification maps generated by SCMK, RPNet, LCMR, and S-DMM contain much noise, whereas LBP, deepNRD, and HybridSN lead to oversmoothed maps. In contrast, the proposed LHCMR not only preserves the structure of the HSI but also classifies pixels more accurately. The reason is that the superpixel segmentation strategy can naturally capture the structure of the HSI. Furthermore, we select the representative neighboring pixels from locally homogeneous areas.
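The sampling protocol used in these experiments (a fixed number of randomly selected training samples per class, with the remaining labeled pixels used for testing, repeated over several trials) can be sketched as follows. The function name, the use of a seeded NumPy generator, and the choice to raise an error when a class is too small are assumptions made for this illustration.

```python
import numpy as np

def split_per_class(labels, n_train, rng):
    """Randomly pick n_train samples of every class; the rest form the test set.

    labels: 1-D array with the class id of every labeled pixel.
    Returns (train_idx, test_idx) as sorted index arrays into labels.
    """
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size <= n_train:
            # Assumed policy: refuse splits that leave a class without test samples
            raise ValueError(f"class {c} has only {idx.size} samples")
        picked.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.sort(np.concatenate(picked))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx
```

Repeating the split with ten different seeds, for example `split_per_class(labels, 10, np.random.default_rng(seed))` for `seed` in `range(10)`, and averaging the resulting scores mirrors the ten-trial averaging reported in the tables.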
In addition, we have also investigated the effect of different numbers of training samples on the performance of the proposed LHCMR and other methods. Fig. 6 shows the OAs, AAs, and Kappa obtained by different methods. We randomly select 5 to 15 training samples from each class in steps of 5 (since one class in the Indian Pines image has only 20 samples). As can be seen, the proposed ...
The second experiment was performed on the Pavia University image, with ten labeled samples per class randomly chosen for training. Table IV tabulates the classification results (averaged over ten experiments) of the different methods, and Fig. 7 shows the classification maps. It can be seen that the proposed LHCMR achieves the best classification accuracy among all compared methods. As shown in Fig. 7, the proposed LHCMR method yields a smoother and more accurate map. Furthermore, we also randomly choose 10, 15, 20, 25, and 30 samples per class for training, and the average classification results are shown in Fig. 8. It can be seen that, as the number of training samples increases, the performance of all competing methods rises, but our method remains the best. These results objectively ...
... Table V, and the full classification maps are shown in Fig. 9. As can also be observed, the proposed LHCMR method delivers better performance than the other compared methods in terms of both quantitative metrics and visual results.
The fourth experiment is performed on the Pavia Center dataset. In this case, only five labeled samples per class are randomly chosen for training. Table VI reports the quantitative classification accuracies of different methods, and Fig. 11 illustrates the full classification maps. Fig. 12 also shows the effect of the number of training samples (ranging from 5 to 25 per class, with a step of 5) on the performance of different methods. The same conclusion can be drawn from this image: the proposed local-homogeneity-based approach can deliver competitive results with only a few training samples.

V. CONCLUSION
In this article, an LHCMR-based HSI classification method has been proposed. The proposed method not only overcomes the drawbacks of nearest-neighbor selection in a fixed window but also achieves competitive classification performance. The quantitative and visual results on homogeneous and nonhomogeneous HSI datasets illustrate that the proposed method is superior to several existing state-of-the-art classification methods. The reason for this may be twofold. First, by using a superpixel segmentation strategy on the full bands, the HSI can be accurately segmented into many homogeneous regions based on spatial-spectral information. Second, the locally homogeneous CMRs calculated on the locally homogeneous area (the overlapping area of the superpixels and the sliding window) fully utilize the correlation among different spectral bands. In future work, we will introduce various promising distance metrics to exploit the spectral-spatial information residing in HSI.
Zhijing Ye received the B.Sc. and Ph.D. degrees in mathematics from the Huazhong University of Science and Technology, Wuhan, China, in 2011 and 2016, respectively.
He is currently with the School of Science, Wuhan University of Technology, Wuhan. His research interests include statistical learning, pattern recognition, and hyperspectral image processing.
She is currently a Lecturer with the School of Computer Science, Hubei University, Wuhan, China. Her research interests include computer vision, pattern recognition, and machine learning.