Adaptive Hyperspectral Image Classification Based on the Fusion of Manifolds Filter and Spatial Correlation Features


Many scholars have applied a variety of machine learning methods to HSI and proposed various classifiers to improve classification accuracy, such as the support vector machine (SVM) [9], [10]. Considering that SVM optimizes a single margin and cannot represent the margin distribution of all training data, Zhang et al. propose the large margin distribution machine (LDM) [11], which maximizes the margin mean and minimizes the margin variance to construct a new hyperplane for classification. Many investigators have since attempted to improve classification with LDM. For instance, Zhan et al. apply LDM to hyperspectral image applications and give an LDM-FL method with spatial information extracted by the recursive filter (RF) to enhance the classification [12]. Moreover, Liao et al. improve HSI classification using an LDM that combines spatial information obtained by the bilateral filter (BF) with linear spatial correlation information [13]. Similarly, this paper also adopts LDM to enhance HSI classification. (The associate editor coordinating the review of this manuscript and approving it for publication was Geng-Ming Jiang.)
In recent years, many investigators have proposed a wide variety of classification methods based on spatial information extraction, such as image segmentation [14], morphological feature extraction [15], and Markov random fields [16]. For example, Mehta et al. propose an approach that assimilates segmentation, clustering, and local band selection within a single framework [17]. Li et al. merge region information into a single classification framework by combining semi-supervised learning and conditional random fields [18]. To extract spatial features more deeply, many investigators use superpixel-based segmentation to obtain spatial information for classification. Jiang et al. propose a SuperPCA method that incorporates spatial context information into the dimensionality reduction and obtains low-dimensional features by performing PCA on each homogeneous region partitioned by superpixel segmentation [19]. Chen et al. propose a SuperBF method that divides an HSI into homogeneous regions obtained by superpixel segmentation and extracts spatial texture information on each homogeneous region with BF [20]. Xu et al. present a hypergraph-based low-rank subspace clustering to obtain a more complex manifold structure that is meaningful both spatially and spectrally for HSI classification [21]. Besides, morphological feature extraction can be applied to HSI classification. For instance, Zhou et al. propose a method based on morphological component analysis that decomposes the original hyperspectral data into its morphological components; in each feature domain, the combination of active learning and semi-supervised learning enlarges the superpixel-based training data set [22]. Liao et al. utilize a morphological profile filter and a domain transform normalized convolution filter (DTNCF) [23] to extract spatial information, input the extracted information into an SVM, and thus implement a two-step optimization in the classification process [24]. Imani et al. propose a morphology-based feature extraction and classification framework, which includes the local neighborhood information in a spatial window to extend the training set and tries to preserve the data structure in the spectral-spatial feature space [25]. Besides, some researchers apply Markov random fields (MRF) to HSI classification; e.g., Cao et al. present a supervised classification method that integrates spectral and spatial information into a unified Bayesian framework, in which a convolutional neural network combined with MRF obtains the spatial features that enhance the HSI classification [26]. Ghasrodashti et al. present a spectral-spatial classification method in which an extended hidden MRF adopts a Poisson distribution and weighted mean regularization to overcome Poisson noise, increasing the local neighborhood energy consistency of the classification with iterative maximum likelihood [27]. Although the extraction of spatial features for classification has produced good results in the past, most of these methods rely on a single spatial feature. Therefore, the spatial features of HSI have not been fully extracted, and such methods cannot suit various HSI data sets.
To obtain better spatial information, a multitude of texture filters have been employed to extract spatial information for enhancing HSI classification [28], such as the Gabor filter (GF) [29]. In particular, the two-dimensional GF has been used to extract spatial features from HSI for the nearest regularized subspace classifier [30]. Moreover, Li et al. project the Gabor features of HSI onto the kernel-induced space through a weighted-summation composite kernel technique for the SVM classifier [31]. Jia et al. provide a multi-feature learning algorithm that learns proper dictionaries for the Gabor feature and three other features until a stable code is obtained for SVM [32]. Kang et al. use GF on the first three components of principal component analysis (PCA) [33] to extract spatial features from hyperspectral images, forming fused features for a deep network; Ref. [34] presents this GFDN method for HSI classification. Jia et al. propose a cascade superpixel regularized Gabor feature fusion method, which obtains spatial features from the original HSI with Gabor filters and exploits the decision information with SVM, then applies quadrant bit coding and the Hamming distance metric to encode the Gabor phase features, and finally obtains the classification with a series of superpixel graphs from over-segmentation to under-segmentation [35]. Besides, edge-preserving filters can be applied to extract spatial information. For example, Wang et al. apply a joint BF to smooth images on a probability map obtained by the SVM classifier; retaining object boundaries while smoothing the salt-and-pepper classification noise in homogeneous regions contributes to the classification result when combined with the segmentation map obtained by the minimum-cut algorithm [36]. Sahadevan et al. integrate spatial contextual information obtained by BF into the spectral domain to improve SVM performance [37]. Qiao et al. extract spatial features by joint bilateral filtering with the first principal component as the guidance image, and the filtered image is classified with a sparse representation method [38]. In the work of Guo et al., a guided filter (GDF) is used to extract spatial information for optimizing the classification [39]. Liao et al. attempt to improve HSI classification by combining spatial information from GDF and a domain transform interpolated convolution filter (DTICF) [40]. To achieve a better classification, Kang et al. propose a classification method using edge-preserving filtering (EPF) and SVM [41], which is further enhanced by the proposed PCA-based EPFs (PCA-EPFs) [42]. Besides, based on image fusion with multiple subsets of adjacent bands and RF [43], a spatial texture feature is obtained by Kang et al. to improve the classification of HSI [44]. Moreover, the adaptive manifold filter (AMF) [45] realizes high-dimensional filtering in real time, and good spatial features can be extracted with it for HSI. For example, Xie et al. try to enhance the extreme learning machine (ELM) classification of HSI based on spatial information obtained by AMF [46]. It turns out that spatial features extracted by texture filters can improve the performance of HSI classification. However, filters that extract spatial texture features tend to fall into local feature extraction through window functions, and it is difficult to attain better spatial texture features to assist the classifier.
To make full use of spatial features for improving classification capabilities, we propose a new method. Considering that AMF can obtain rich texture features through local and global optimization, we adopt AMF for spatial texture feature extraction in this work. However, AMF can only extract effective spatial texture features and may ignore informative spatial correlation features. Consequently, some investigators use DTNCF and DTICF to obtain optimal spatial correlation features to compensate for the deficiencies of spatial texture features in HSI classification [24], [40]. Nevertheless, DTNCF and DTICF perform differently on different HSIs, resulting in different classification outcomes. To deal with these problems, this paper presents a new method based on the fusion of adaptive manifold filter and spatial correlation features (AMSCF) for adaptive hyperspectral image classification. Specifically, spatial texture features are extracted by AMF, and spatial correlation features are acquired by DTNCF and DTICF. The two kinds of spatial features are fused, separately classified by LDM with the maximum margin mean and the minimum margin variance, and the optimal classification is output after comparison. The contributions of this work are as follows:
(1) AMF is used to extract the spatial texture feature of the hyperspectral image, and the spatial correlation feature obtained by DTNCF is used for classification.
(2) To achieve the optimal classification, the proposed method uses DTNCF and DTICF to extract the most suitable spatial correlation feature and adapt to types of HSI for improving classification performance.
(3) To enhance the performance of the LDM classification, the two kinds of spatial information are combined, and the experimental results indicate that the proposed AMSCF method is better than other classification methods.
The rest of this paper is organized as follows. Section II describes the proposed method. Section III describes the hyperspectral image datasets used to verify the effectiveness of the method and analyzes the experimental results of AMSCF. Section IV states the conclusions.

1) LDM-BASED CLASSIFICATION METHODS
The LDM proposed by Zhang et al. attempts to maximize the margin mean and minimize the margin variance simultaneously, constructing a new hyperplane that improves on the SVM, which only maximizes the minimum margin. A training set can be defined as S = {(x_1, y_1), . . . , (x_n, y_n)}, in which x_i is a training sample labeled by y_i ∈ {+1, −1}, i = 1, 2, · · · , n, where n is the number of training samples. The goal of SVM is to obtain a predictive function with strong generalization performance and to predict unlabeled data using the obtained function. With w denoting the weight vector of the linear decision function, the linear model f is

f(x) = w^T ω(x) (1)

where ω(x) is a feature mapping of x induced by a kernel k, i.e., k(x_i, x_j) = ω(x_i)^T ω(x_j). With the maximization of the minimum margin, the margin of an instance (x_i, y_i) for SVM can be formulated as

γ_i = y_i w^T ω(x_i) ≥ 1 − ζ_i (2)

where ζ_i is a slack variable that measures the degree of misclassification of x_i. To improve on the single margin of SVM, the margin mean and the margin variance were proposed to construct a new hyperplane for LDM and can be characterized as Eq. 3 and Eq. 4, respectively:

γ̄ = (1/n) Σ_{i=1}^{n} y_i w^T ω(x_i) = w^T X y / n (3)

γ̂ = (2/n^2) (n w^T X X^T w − w^T X y y^T X^T w) (4)
where X = [ω(x_1) · · · ω(x_n)] and y = [y_1 · · · y_n]^T. For the inseparable case, the soft-margin LDM can be expressed as

min_{w,ζ} (1/2) w^T w + α_1 γ̂ − α_2 γ̄ + E Σ_{i=1}^{n} ζ_i
s.t. y_i w^T ω(x_i) ≥ 1 − ζ_i, ζ_i ≥ 0, i = 1, . . . , n (5)

where α_1 and α_2 are the parameters trading off the margin variance and the margin mean, respectively, and E is a trade-off parameter that penalizes the slack variables. The terms +α_1 γ̂ and −α_2 γ̄ in Eq. 5 minimize the margin variance and maximize the margin mean, respectively. Owing to this simultaneous optimization of the margin distribution, LDM can achieve better generalization performance in classification.
The classification hyperplanes of SVM and LDM are shown in Fig. 1, where the two categories of HSI ground samples are plotted with two different markers. The SVM margin hyperplane H_SVM is drawn as a red line perpendicular to the dotted red line, and the LDM margin hyperplane H_LDM is drawn as a blue line. Fig. 1 shows that the hyperplane of SVM maximizes the minimum margin over all samples, while the hyperplane of LDM simultaneously maximizes the margin mean and minimizes the margin variance over all samples. We can conclude that the hyperplane of LDM is more effective than that of SVM for classification.
To improve the effectiveness of LDM and handle nonlinear problems, substituting Eq. 3 and Eq. 4 into Eq. 5 yields

min_{w,ζ} (1/2) w^T w + (2α_1/n^2)(n w^T X X^T w − w^T X y y^T X^T w) − α_2 w^T X y / n + E Σ_{i=1}^{n} ζ_i (6)

According to the representer theorem [11], [47], the optimal solution of Eq. 6 has the following form:

w = Σ_{i=1}^{n} β_i ω(x_i) = X β (7)

where β = [β_1, · · · , β_n]^T is a set of coefficients. It can be concluded that, with the kernel matrix

K = X^T X (8)

Eq. 6 can be rewritten as

min_{β,ζ} (1/2) β^T H β + b^T β + E Σ_{i=1}^{n} ζ_i
s.t. y_i K_i^T β ≥ 1 − ζ_i, ζ_i ≥ 0 (9)

where H = 4α_1 (n K^T K − (K y)(K y)^T)/n^2 + K, b = −2α_2 K y / n, and K_i denotes the i-th column of K.
Similar to SVM, LDM can also be improved by kernel methods, and several types of kernel functions are available for LDM classifiers, such as linear kernels, polynomial kernels, and radial basis function (RBF) kernels. We adopt the RBF kernel for LDM in this paper.
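As an illustrative sketch (not the authors' implementation), the quadratic term H and the linear term b of the kernelized LDM objective quoted above can be assembled with NumPy. The helper names `rbf_kernel` and `ldm_qp_terms` are hypothetical, and a complete classifier would still require a QP solver for the margin constraints.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """RBF kernel matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def ldm_qp_terms(K, y, alpha1=1.0, alpha2=1.0):
    """Quadratic term H and linear term b of the kernelized LDM objective (Eq. 9 above)."""
    n = len(y)
    Ky = K @ y
    H = K + 4.0 * alpha1 * (n * K @ K - np.outer(Ky, Ky)) / n ** 2
    b = -2.0 * alpha2 * Ky / n
    return H, b

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
H, b = ldm_qp_terms(rbf_kernel(X), y)
```

Since K and K @ K are symmetric, H inherits symmetry, which is what a QP solver would expect.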

2) ADAPTIVE MANIFOLD FILTER
The linear filtering [48] of a hyperspectral image can be expressed as

E_i = O_i(R_i) (10)

where R_i is the i-th band of the hyperspectral image, E_i is the filtering result for the band R_i, and O_i denotes the linear-combination filtering operation applied to that band. The AMF [45] proposed by Gastal et al. is a high-dimensional filtering method that consists of three steps (i.e., splatting, blurring, and slicing); the filtering of a hyperspectral image proceeds as follows.

a: SPLATTING
Pixels of the image are projected onto the current manifold, and a Gaussian distance weighting is performed on each manifold, of the form

w_ki = exp( −(1/2) (α_ki − f_i)^T Σ^{−1} (α_ki − f_i) )

where f_i is a pixel of the hyperspectral image, α_ki is the corresponding sampled point on the k-th manifold, and the Gaussian distance weight w_ki of α_ki at a position in the band is controlled by the covariance matrix Σ. This weighting fully utilizes the spatial correlation of the pixels and has global diffusion and spatial correlation retention characteristics.
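As a minimal sketch of the Gaussian distance weighting described above, assuming for simplicity an isotropic covariance Σ = σ²I (the actual AMF uses a full covariance matrix); `splat_weight` is a hypothetical helper name:

```python
import numpy as np

def splat_weight(f_i, alpha_ki, sigma):
    """Gaussian distance weight of pixel f_i relative to the sampled manifold
    point alpha_ki, assuming an isotropic covariance sigma**2 * I."""
    d = np.asarray(f_i, dtype=float) - np.asarray(alpha_ki, dtype=float)
    return float(np.exp(-0.5 * np.dot(d, d) / sigma ** 2))
```

The weight is 1 when the pixel lies exactly on the manifold sample and decays smoothly with distance, which is what gives the splatting step its global diffusion behavior.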

b: BLURRING
To smooth the splatted values φ_splat, the recursive filter (RF) is adopted for the blurring step in this paper.
c: SLICING

The filtering response E_i of each pixel is obtained from the blurred values φ_blur by interpolation:

E_i = Σ_{k=1}^{K} w_ki φ_blur(η_ki) / Σ_{k=1}^{K} w_ki φ⁰_blur(η_ki) (14)

where K is the total number of adaptive manifolds, w_ki is the Gaussian distance weight, and φ⁰_blur(η_ki) is the result of smoothing the splatted weights φ⁰_splat(η_ki). Eq. 14 is substituted into Eq. 11 to obtain the linear filtering result, which is the manifold filtering of the hyperspectral image.
The adaptive manifold filtering algorithm for a hyperspectral image is a recursive process of splatting, blurring, and slicing, and the recursion depth is determined by the height of the manifold tree. The height of the manifold tree determines the recursive hierarchy and the number of manifold tree nodes, and it greatly affects the filtering effect. The manifold tree height and the number of manifolds are calculated as follows:
1) Compute the tree height H from D_s and D_r (Eqs. 15-17), where ⌈·⌉ is the round-up operator, ⌊·⌋ is the round-down operator, n ∈ {1, 2, · · · , N}, D_s is a height computed from the spatial standard deviation σ_s of the filter, and D_r is a linear correlation computed from the range standard deviation σ_r. The manifold number K is then obtained from the tree height H (Eq. 18).
2) Generate the first manifold: the first manifold µ_1(p_i) is obtained by applying a low-pass filter h_Σ to the signal in S (Eq. 19), where the filter's covariance matrix Σ_S is a diagonal matrix that controls the decay of the Gaussian kernel.
3) Calculate the eigenvector v_1: solve for the eigenvector corresponding to the largest eigenvalue of the scatter of the differences between the pixels and the current manifold (Eq. 21).
4) Segment the pixels into two clusters: the dot product d is first calculated by Eq. 22, and the pixels of the HSI are then divided into two clusters according to Eq. 23.
6) The next manifolds and the new manifold high-dimensional filtering are recursively calculated and performed until the manifold tree height is reached.
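The eigenvector computation and two-way pixel segmentation of steps 3)-4) can be sketched in NumPy as follows. This is an illustrative version under simplified assumptions (the current manifold is passed in as a per-pixel array `mu`, and `segment_by_principal_direction` is a hypothetical name), not the authors' code:

```python
import numpy as np

def segment_by_principal_direction(F, mu):
    """Steps 3)-4): find the eigenvector of the largest eigenvalue of the
    scatter of (pixel - manifold) differences and split pixels by the sign
    of their projection onto it."""
    D = F - mu                    # differences to the current manifold
    C = D.T @ D                   # scatter matrix of the differences
    _, V = np.linalg.eigh(C)      # eigenvectors, eigenvalues in ascending order
    v1 = V[:, -1]                 # eigenvector of the largest eigenvalue
    d = D @ v1                    # dot products (step 4)
    return d >= 0, d < 0          # the two pixel clusters

# Toy pixels: two groups separated along the first spectral axis
F = np.array([[2.0, 0.1], [3.0, -0.1], [-2.0, 0.1], [-3.0, -0.1]])
c1, c2 = segment_by_principal_direction(F, mu=np.zeros(2))
```

Each cluster then seeds the next, deeper manifold, which is how the manifold tree grows.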
The manifolds generated by the adaptive manifold filter are always compatible with the input reference signal in the high-dimensional space, so the manifolds represent the input reference signal well and, with the adaptive manifolds as the standard, clear edge features can be maintained. In addition, the adaptive manifold filter builds the manifold tree while performing the filtering: after each manifold is calculated, the input signal is filtered with it, and the process continues recursively until all manifolds are completed. As a result, the filtering acts along more dimensional directions [49].
Therefore, manifold filtering has good global diffusivity and edge retention characteristics. The higher the manifold tree, the better the manifolds adapt to local populations, yielding better local optimization. Manifold filtering can thus extract better spatial texture information through local and global optimization, obtaining excellent global spatial features. In short, the higher the manifold tree, the more manifolds there are, and the stronger the local and global optimization capabilities become.
The filtering flow of the AMF for HSI is shown in Fig. 2.

3) DOMAIN TRANSFORM FILTER
In the past, supervised HSI classification methods obtained useful features only from spectral information, while spatial-spectral classification methods have focused on the extraction of spatial texture features. However, in the process of extracting texture features, spatial correlation features are often ignored. DTNCF and DTICF [23], proposed by Gastal et al., can be used for image filtering, transforming two-dimensional image filtering into one-dimensional filtering. For a uniform discretization S(Ω) of the original domain Ω, the energy function of DTNCF for the hyperspectral image R at the i-th band can be expressed as

E_i(a) = (1/Z_a) Σ_{b∈S(Ω)} R_i(b) G(ϕ(a), ϕ(b)) (24)

where Z_a is the normalization factor of a,

Z_a = Σ_{b∈S(Ω)} G(ϕ(a), ϕ(b)) (25)

and G(·) denotes the kernel of the filter. The implication of Eq. 26,

G(ϕ(a), ϕ(b)) = δ(|ϕ(a) − ϕ(b)| ≤ r) (26)

is that neighboring pixels belong to the same ground cover, with δ(·) the Boolean function of Eq. 27:

δ(true) = 1, δ(false) = 0 (27)
Here ϕ(·) is the domain transform given in Eq. 28, which cumulatively integrates the partial derivatives of the image and thus maps each image row (or column) to an increasing function over a one-dimensional domain; ϕ(u) is therefore used to reshape an image into a one-dimensional vector:

ϕ(u) = ∫_0^u ( 1 + (σ_s/σ_r) Σ_k |R_k'(x)| ) dx (28)

r is the radius of the filter as shown in Eq. 29,

r = √3 σ_{J_q} (29)

where σ_s and σ_r are the spatial and range standard deviations, respectively, and N is the total number of iterations. σ_{J_q} is the standard deviation of the kernel used in the q-th iteration, expressed in Eq. 30:

σ_{J_q} = σ_s √3 · 2^{N−q} / √(4^N − 1) (30)
At the i-th band, the energy function of DTICF for the hyperspectral image R can be represented as

E_i(a) = ∫_Ω D_w(x) Q(ϕ(a), ϕ(x)) dx (31)

Equation (31) filters a reconstructed signal D_w obtained by linear interpolation (in Ω_w) of the samples; the filtering of D_w is performed by a continuous convolution with

Q(ϕ(a), ϕ(x)) = (1/2r) δ(|ϕ(a) − ϕ(x)| ≤ r) (32)

where Q is a normalized box kernel represented by the Boolean function and r is the filter radius as shown in Eq. 33. The significance is that the pixel neighborhood can be considered to be the same ground cover, so the spatial correlation of the hyperspectral image is preserved during the filtering process.
Substituting Eq. 32 into Eq. 31 yields the DTICF response, where ϕ(u) and r are given in Eq. 28 and Eq. 29, respectively. There always exists a strong spatial correlation among pixels in HSI when the ground distribution is appropriately uniform. The spatial correlation feature reflects the relationship between the reflection intensities of adjacent pixels. However, this feature is often ignored in texture feature extraction with edge-preserving filtering.
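For a discrete image row, the domain transform ϕ of Eq. 28 reduces to a cumulative sum of the absolute inter-pixel differences, summed over channels. The sketch below is illustrative only and assumes a row stored as an (n_pixels × n_channels) array:

```python
import numpy as np

def domain_transform(row, sigma_s, sigma_r):
    """Discrete version of the domain transform phi(u) of Eq. 28 for one
    image row: cumulative sum of 1 + (sigma_s / sigma_r) * |channel diffs|."""
    diffs = np.abs(np.diff(row, axis=0)).sum(axis=1)  # sum over channels
    ct = 1.0 + (sigma_s / sigma_r) * diffs
    return np.concatenate(([0.0], np.cumsum(ct)))

# A single-channel row with a sharp edge between pixels 1 and 2
row = np.array([[0.1], [0.1], [0.9], [0.9]])
phi = domain_transform(row, sigma_s=2.0, sigma_r=0.2)
```

Pixels on the same side of the edge stay close in the transformed domain, while the edge produces a large gap in ϕ, which is exactly what lets a 1-D box kernel preserve edges.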
To analyze the spatial correlation characteristics of AMF, DTICF, and DTNCF, the spatial correlation of hyperspectral images can be measured by Moran's I [50], [51] before and after the filtering:

I = n Σ_i Σ_j a_ij (X_i − X̄)(X_j − X̄) / ( Σ_i Σ_j a_ij · Σ_i (X_i − X̄)^2 )

where X_i and X_j are the reflection intensities of two hyperspectral pixels, X̄ is the mean of X, n is the number of pixels in one band, and a_ij is the spatial weight.
The larger I is, the stronger the spatial correlation, and vice versa. Subsection D of Section III provides validation tests of DTNCF for spatial correlation information extraction.
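Moran's I as defined above can be computed directly. The sketch below is illustrative only, using a 4-pixel chain with binary adjacency weights as a toy example:

```python
import numpy as np

def morans_i(X, A):
    """Moran's I for one band of values X (length n) with spatial weights A (n x n)."""
    n = X.size
    z = X - X.mean()
    return n * (z @ A @ z) / (A.sum() * (z @ z))

# Spatial weights of a 4-pixel chain: immediate neighbors get weight 1
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
smooth = morans_i(np.array([1.0, 2.0, 3.0, 4.0]), A)   # spatially smooth signal
rough = morans_i(np.array([1.0, -1.0, 1.0, -1.0]), A)  # alternating signal
```

The smooth ramp gives a positive I and the alternating signal a negative one, matching the interpretation that larger I means stronger spatial correlation.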
DTNCF and DTICF have excellent spatial correlation feature retention characteristics for HSI. However, the two filters have their own advantages for different HSIs and can have different effects on the classification. Therefore, DTNCF and DTICF are both used in this work to adapt to different HSI datasets and obtain the best classification performance.

B. PROPOSED METHODS
To fully utilize the two kinds of spatial features, a new classification approach (AMSCF) is proposed based on the combination of AMF, DTNCF, and DTICF, in which the spatial texture feature is extracted with AMF, and the spatial correlation features are obtained with DTNCF and DTICF separately. The fusions of the two kinds of spatial features are classified by the LDM classifier, and the optimal classification is exported.
The detailed flow chart of the AMSCF can be found in Fig. 3, which is roughly divided into 8 steps: 1) the HSI is normalized; 2) the dimensionality of the HSI is reduced by PCA; 3) the spatial texture feature is extracted by AMF from the top 10% of principal components; 4) the first spatial correlation feature is obtained by DTNCF with all bands of the HSI; 5) the second spatial correlation feature is obtained by DTICF with all bands of the HSI; 6) the two kinds of spatial features are fused respectively; 7) the fused features are classified by LDM; 8) the optimal classification is output by comparing the classification results.
The systematic implementation process is given as follows.
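The eight steps above can be sketched end to end. In this illustrative NumPy version, a simple box blur stands in for AMF, DTNCF, and DTICF (which are far more sophisticated), and the final LDM classification and comparison of steps 7)-8) are omitted; all function names are hypothetical:

```python
import numpy as np

def normalize(H):
    """Step 1: scale every band of the cube H (rows x cols x bands) to [0, 1]."""
    mn = H.min(axis=(0, 1), keepdims=True)
    mx = H.max(axis=(0, 1), keepdims=True)
    return (H - mn) / (mx - mn + 1e-12)

def pca_reduce(H, keep_ratio=0.1):
    """Steps 2-3: PCA over the spectral dimension, keeping the top 10% of components."""
    rows, cols, bands = H.shape
    Zc = H.reshape(-1, bands)
    Zc = Zc - Zc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    k = max(1, int(np.ceil(keep_ratio * bands)))
    return (Zc @ Vt[:k].T).reshape(rows, cols, k)

def box_blur(H, radius=1):
    """Placeholder smoothing filter standing in for AMF / DTNCF / DTICF."""
    out = np.copy(H)
    for s in range(1, radius + 1):
        out += np.roll(H, s, axis=0) + np.roll(H, -s, axis=0)
        out += np.roll(H, s, axis=1) + np.roll(H, -s, axis=1)
    return out / (4 * radius + 1)

# Steps 1)-6) on a toy 8 x 8 x 20 cube
rng = np.random.default_rng(1)
cube = normalize(rng.random((8, 8, 20)))            # 1) normalize
texture = box_blur(pca_reduce(cube))                # 2)-3) PCA, then texture feature
corr1 = box_blur(cube, radius=2)                    # 4) first correlation feature
corr2 = box_blur(cube, radius=1)                    # 5) second correlation feature
fused1 = np.concatenate([texture, corr1], axis=2)   # 6) fuse the two feature kinds
fused2 = np.concatenate([texture, corr2], axis=2)
```

In the actual AMSCF, each fused cube would be fed to LDM (step 7) and the better of the two classification results kept (step 8).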

A. HYPERSPECTRAL DATA DESCRIPTION
The effectiveness of AMSCF has been tested on three different hyperspectral image datasets. The first dataset is Indian Pines [52], collected in 1992 by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines region in Northwestern Indiana. This dataset includes 220 spectral bands with a spatial size of 145 × 145 pixels; 20 bands were removed due to noise and water absorption, and 200 bands remain. The image has 16 classes, whose specific types and sample numbers are shown in Table 2.
The second dataset is Salinas Valley [53], acquired in 1998 by AVIRIS over Salinas Valley in Southern California; it possesses a high spatial resolution of 3.7 m, a spatial size of 512 × 217 pixels, and 206 spectral bands. Similarly, because of noise and water absorption, 200 bands are retained. This image also contains 16 classes, whose specific types and sample numbers are shown in Table 3.
The third dataset is Kennedy Space Center [54], acquired by the NASA AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) sensor over the Kennedy Space Center in Florida on March 23, 1996. 224 bands with a width of 10 nm were obtained. The specific classes and sample numbers of the Kennedy Space Center dataset are shown in Table 4.

B. COMPARED METHOD AND PARAMETERS SETTING
To verify the superiority of the proposed method, the following methods are compared with AMSCF.
(1) SVM [9]: According to the raw features of hyperspectral images, SVM can be applied with the Gaussian radial basis function kernel, and the main parameters include gamma g = 0.18 and penalty factor C = 2500.
(2) IFRF [44]: This method attains the classified results with SVM based on image fusion and RF. The spatial standard deviation σ s = 200 and the range standard deviation σ r = 0.3 are set for RF.
(3) LDM: Based on the raw features of hyperspectral images, a Gaussian radial basis function kernel is applied here. The margin variance and margin mean parameters are set as α_1 = 125000 and α_2 = 50000, respectively, while the trade-off parameter is set as E = 1000000.
(4) LDM-FL [12]: This method obtains the classified results with LDM from RF. The parameter of LDM-FL is the same as the LDM method.
(5) PCA-EPFs [42]: The spatial information firstly constructed by applying edge-preserving filters is stacked to form the fused features, and the dimension is reduced by PCA for the classifier of SVM. The spatial standard deviation σ s = 3 and the range standard deviation σ r = 0.1 are set for PCA-EPFs.
(6) GFDN [34]: This method extracts spatial features by GF on the first three principal components of the hyperspectral image to form fused features, and then combines them with the original features for deep network classification, with the scale, orientation, row number, and column number set to 5, 8, 55, and 55, respectively.
(7) SuperPCA [19]: In this method, the spatial features acquired by performing PCA on the homogeneous regions obtained with superpixel segmentation are classified by SVM, and the final result is obtained by majority voting.
(8) SuperBF [20]: In this method, the spatial texture information obtained by BF on the homogeneous regions divided via superpixel segmentation is classified by SVM.
(9) AM-SVM: The hyperspectral dimensionality is reduced with PCA, and the first 10% of principal components are selected for SVM based on AMF; the spatial standard deviation σ_s and range standard deviation σ_r are set to 5 and 1.0, respectively.
(10) AM-LDM: The hyperspectral dimensionality is reduced with PCA, and the first 10% of principal components are selected for LDM based on AMF; σ_s and σ_r are set to the same values as in AM-SVM.
(11) DTNC-SVM: The hyperspectral dimensionality is reduced with PCA, and the first 10% of principal components are selected for SVM based on DTNCF; σ_s and σ_r are set to 200 and 0.3, respectively.
(12) DTNC-LDM: The hyperspectral dimensionality is reduced with PCA, and the first 10% of principal components are selected for LDM based on DTNCF; σ_s and σ_r follow the values of DTNC-SVM.
(13) DTIC-SVM: The hyperspectral dimensionality is reduced with PCA, and the first 10% of principal components are selected for SVM based on DTICF; σ_s and σ_r are set to 200 and 0.3, respectively.
(14) DTIC-LDM: The hyperspectral dimensionality is reduced with PCA, and the first 10% of principal components are selected for LDM based on DTICF; σ_s and σ_r are equal to the values of DTIC-SVM.
(15) AMDTIC-SVM: The spatial texture feature is extracted from the top 10% of principal components by AMF, and the spatial correlation feature is obtained by DTICF. Finally, the two kinds of spatial features are fused and classified by SVM.
(16) AMDTNC-SVM: The classification process is similar to that of AMDTIC-SVM, except that the spatial correlation feature is obtained via DTNCF.
(17) AMSCF: The spatial texture feature is extracted from the top 10% of principal components by AMF, and spatial correlation features are obtained by DTNCF and DTICF. The two kinds of spatial features are fused and separately classified by LDM, and the optimal classification is output after comparison.
In this paper, overall accuracy (OA), average accuracy (AA), and the Kappa statistic (Kappa) were adopted to measure classification accuracy. To avoid biased estimation, 20 independent tests were carried out in MATLAB R2012b on a computer with an i7-6700 CPU and 8 GB RAM.
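OA, AA, and Kappa can all be computed from a confusion matrix, as in the following sketch (`classification_scores` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA), and the Kappa statistic."""
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1                               # confusion matrix
    n = C.sum()
    oa = np.trace(C) / n                           # fraction of correctly labeled samples
    aa = np.mean(np.diag(C) / C.sum(axis=1))       # mean of per-class accuracies
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / n ** 2  # expected chance agreement
    kappa = (oa - pe) / (1 - pe)                   # agreement corrected for chance
    return oa, aa, kappa

oa, aa, kappa = classification_scores([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
```

Kappa discounts the agreement expected by chance, which is why it is commonly reported alongside OA for imbalanced HSI ground truths.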

C. THE VALIDATION TEST OF AMF
To verify the effectiveness of AMF, the 15th, 90th, 125th, and 190th bands of Indian Pines were processed with AMF. As shown in Fig. 4, the four images in the first row are the original spectral images of Indian Pines, and the four images in the second row are the corresponding filtered images processed by AMF. It can be seen that AMF extracts good boundary features from the hyperspectral images and has great advantages in smoothing and removing hyperspectral image noise.

D. TEST OF SPATIAL CORRELATION INFORMATION
To compare the spatial correlation of AMF and DTICF, we calculated the mean of Moran's I for each band of the Indian Pines and Salinas Valley datasets. The average Moran's I of the two filters is shown in Fig. 5. We can find that the average Moran's I obtained from DTICF is higher than the averages of AMF and of the raw spectral features. Besides, the average Moran's I obtained by AMF is lower than that of the spectral images, suggesting weak spatial correlation information. Therefore, DTICF can extract good spatial correlation information and effectively compensate for the deficiency of AMF.

E. INVESTIGATION OF THE PROPOSED METHOD 1) OPTIMIZATION OF AMF
The spatial standard deviation σ_s and range standard deviation σ_r of AMF influence the spatial information extraction, so a classification test on the Indian Pines dataset was conducted to verify the effectiveness of parameter optimization. For the Indian Pines dataset, 6% of the samples were randomly selected for training, and the optimal parameters were found by exhaustive search to obtain the best LDM classification results. We set σ_s ∈ {1, 2, · · · , 20} and σ_r ∈ {0.1, 0.2, · · · , 4.0}, and the experiments were performed sequentially for the classification. Fig. 6 shows the classification test, where m is the number of iterations: for each σ_s, better classification is attained when σ_r ∈ {0.4, 0.5, · · · , 1.5} than for the other tested values of σ_r. Furthermore, the best classification is achieved when σ_s = 5 and σ_r = 1.0. Therefore, to acquire a better classification, the parameters σ_s = 5 and σ_r = 1.0 were adopted in the following experiments.
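The exhaustive parameter search described above amounts to a grid search over (σ_s, σ_r). In the sketch below a toy objective stands in for the actual LDM classification accuracy, and `grid_search` is a hypothetical helper name:

```python
import numpy as np
from itertools import product

def grid_search(evaluate, sigma_s_grid, sigma_r_grid):
    """Exhaustive search over (sigma_s, sigma_r); `evaluate` returns a score (e.g. OA)."""
    best_s, best_r, best_score = None, None, -np.inf
    for s, r in product(sigma_s_grid, sigma_r_grid):
        score = evaluate(s, r)
        if score > best_score:
            best_s, best_r, best_score = s, r, score
    return best_s, best_r, best_score

# Toy objective peaking at sigma_s = 5, sigma_r = 1.0 (stand-in for LDM accuracy)
toy_oa = lambda s, r: -((s - 5) ** 2 + (r - 1.0) ** 2)
best_s, best_r, _ = grid_search(toy_oa, range(1, 21), np.arange(0.1, 4.01, 0.1))
```

In practice `evaluate` would train and score LDM on the AMF-filtered features for each parameter pair, which is why the search is confined to the coarse grids given in the text.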
To verify the importance of the manifold tree height for LDM classification, Table 1 shows the classification results on the hyperspectral image, where n is the height of the manifold tree. From the table, the optimal OA of 97.66% is obtained when the height of the manifold tree is 6 and the number of manifolds is 255. In contrast, the worst classification result corresponds to a manifold tree height of 2, which indicates that a larger number of manifolds extracts better spatial features from the hyperspectral image and can significantly improve classification performance. However, the complexity of the algorithm increases with the number of nodes in the manifold tree, so an appropriate tree height is the key to extracting good spatial texture features. Therefore, a manifold tree height of 6 was selected in this paper for the following classification tests.

2) EXPERIMENT OF INDIAN PINES
To assess the performance of AMSCF, the methods listed above were employed to classify the Indian Pines data. The ground-truth distribution of the Indian Pines dataset is shown in Fig. 7(a). All 16 categories were used: for thirteen of the classes, 4% of the samples were selected as the training set and the remaining samples were employed as the test set, while for the three classes with insufficient samples, 16% were selected for training. Table 2 shows the classification accuracy of the compared methods, and the classification maps are shown in Fig. 7. Table 2 reports the OA, AA, Kappa, and per-class accuracies of the various methods, indicating that AMSCF achieves excellent accuracy, e.g., OA = 97.88%, AA = 96.28%, and Kappa = 97.58%. In addition, the accuracy of AMSCF exceeds 99% in 4 classes. This experiment shows that the classification performance of AMSCF is significantly enhanced compared with the other approaches.

3) EXPERIMENT OF SALINAS VALLEY
Likewise, the ground-truth distribution of the Salinas Valley dataset is shown in Fig. 8(a). All 16 classes were selected, with 0.7% of the samples as the training set and the remaining 99.3% as the test set. Table 3 lists the classification accuracy of the Salinas Valley dataset under the different methods, and the classification maps are shown in Fig. 8.
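The per-class random sampling used to build such training sets can be sketched as a stratified split. The details below (per-class rounding, a minimum of one sample per class) are illustrative assumptions, not the authors' exact sampling code.

```python
import random

def stratified_split(labels, ratio, seed=0):
    """Randomly pick `ratio` of the indices of each class as training samples."""
    rng = random.Random(seed)
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)
    train = []
    for c, idxs in by_class.items():
        rng.shuffle(idxs)
        k = max(1, round(len(idxs) * ratio))  # keep at least one sample per class
        train.extend(idxs[:k])
    test = sorted(set(range(len(labels))) - set(train))
    return sorted(train), test

# Toy label map with three classes of unequal size; 0.7% training ratio
# mirrors the Salinas Valley setting.
labels = [0] * 1000 + [1] * 500 + [2] * 200
train, test = stratified_split(labels, 0.007)
print(len(train), len(test))
```

Sampling per class rather than globally keeps every class represented even at a ratio as small as 0.7%.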
Fig. 8 shows the classification results for the Salinas Valley, while Table 3 reports the OA, AA, Kappa, and per-class accuracy of each method. The best accuracy is again achieved by AMSCF, with OA = 99.34%.

4) EXPERIMENT OF KENNEDY SPACE CENTER
Similarly, the ground-truth distribution of the Kennedy Space Center dataset is shown in Fig. 9(a); 5% (about 261) of the samples of all classes were selected as the training set, and the remaining 95% were used as the test set. Table 4 lists the classification results of the Kennedy Space Center dataset for the 16 methods, and the classification maps are shown in Fig. 9.
The classification results for the Kennedy Space Center are shown in Fig. 9, while Table 4 reports the OA, AA, and Kappa accuracies of all methods; the best accuracy is achieved by AMSCF, with OA = 99.05%, AA = 98.37%, and Kappa = 98.94%. Moreover, AMSCF attains 100% accuracy in seven classes. This experiment indicates that AMSCF outperforms the compared methods.
First, LDM has the ability to maximize the margin mean and minimize the margin variance, which makes it better than SVM.
Second, the AM-SVM and AM-LDM OA values on Indian Pines are 16.93% and 18.17% higher than those of SVM and LDM, respectively. In addition, the OA values of AM-SVM and AM-LDM on Salinas Valley are 5.65% and 7.52% higher than those of SVM and LDM, respectively. Moreover, the OA values of AM-SVM and AM-LDM on Kennedy Space Center are correspondingly 9.79% and 10.51% higher than those of SVM and LDM. This finding indicates that the spatial texture features extracted by AMF have good edge-retention characteristics given an appropriate manifold tree height, and are effective for enhancing the classification performance of SVM and LDM. Third, the OA values of DTNC-LDM were also markedly higher than those of LDM for Indian Pines, Salinas Valley, and Kennedy Space Center, respectively. Similarly, the OA values of DTIC-LDM were 18.85%, 10.53%, and 12.14% higher than those of LDM for the three datasets. Thus, the spatial correlation features extracted by DTNCF or DTICF in this work are effective for enhancing hyperspectral classification. Moreover, the fusion of the two spatial features is more effective than a single spatial feature for improving HSI classification.
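The paper states that the texture and correlation features are linearly fused before LDM classification. A minimal sketch of such a fusion is shown below; the single weight `beta` and the convex-combination form are illustrative assumptions, not the paper's exact fusion formula.

```python
import numpy as np

def fuse_features(texture, correlation, beta=0.5):
    """Linearly fuse two spatial feature cubes of identical shape.

    `beta` weights the AMF texture feature against the DTNCF/DTICF
    spatial correlation feature.
    """
    texture = np.asarray(texture, dtype=float)
    correlation = np.asarray(correlation, dtype=float)
    assert texture.shape == correlation.shape
    return beta * texture + (1.0 - beta) * correlation

t = np.ones((4, 4, 3))   # toy AMF spatial texture feature cube
c = np.zeros((4, 4, 3))  # toy DTNCF/DTICF spatial correlation feature cube
fused = fuse_features(t, c, beta=0.7)
print(fused[0, 0, 0])  # 0.7
```

The fused cube is then flattened per pixel and fed to the classifier, so both feature sources contribute to every training vector.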
Seventh, the OA values of AMDTNC-SVM and AMSCF-NL on Indian Pines are 0.49% and 0.54% higher than those of AMDTIC-SVM and AMDTIC-LDM, respectively. Similarly, the OA values of AMDTNC-SVM and AMSCF-NL on Salinas Valley are 0.66% and 0.15% higher than those of AMDTIC-SVM and AMDTIC-LDM. However, the OA values of AMDTNC-SVM and AMSCF-NL on Kennedy Space Center are 0.45% and 0.95% lower than those of AMDTIC-SVM and AMDTIC-LDM. It can be concluded that DTICF and DTNCF have different adaptability to HSI datasets, and that the AMSCF algorithm can extract the optimal spatial correlation features and fuse them with the AMF features, demonstrating adaptive classification performance with LDM on different HSI datasets.
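The adaptive choice between DTICF and DTNCF described above can be sketched as selecting the filter with the higher validation accuracy. `validate` below is a stand-in for running the full AMF + filter + LDM pipeline on held-out data, and the example scores are hypothetical numbers that merely mimic the Kennedy Space Center case, where DTICF performed better.

```python
def select_correlation_filter(validate):
    """Choose between DTICF and DTNCF by validation overall accuracy."""
    scores = {name: validate(name) for name in ("DTICF", "DTNCF")}
    return max(scores, key=scores.get), scores

# Hypothetical validation OAs (DTICF wins, as on Kennedy Space Center).
best, scores = select_correlation_filter({"DTICF": 99.05, "DTNCF": 98.60}.get)
print(best)  # DTICF
```

Because the better filter differs across datasets, this per-dataset selection is what makes the method "adaptive" rather than committing to one filter globally.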
Last but not least, the OA values of AMSCF on Indian Pines, Salinas Valley, and Kennedy Space Center are 5.51%, 0.84%, and 3.83% higher than those of LDM-FL, respectively. Therefore, the spatial texture feature and spatial correlation feature extracted by AMF and DTNCF or DTICF in this paper improve the performance of LDM more than the features extracted by RF. In addition, the OA values of AMSCF on the three datasets are 2.18%, 3.75%, and 4.39% higher than those of GFDN, showing that AMSCF also outperforms classification methods based on deep learning.
To demonstrate the effect of the training ratio on classification, different ratios were tested on the three datasets, as shown in Fig. 10. The figure indicates that the OA of the proposed method reaches 96.38% when the training sample ratio of the Indian Pines dataset is 3%, and rises to 98.31% when the ratio increases to 6%. In addition, when the training sample ratio of the Salinas Valley dataset varies from 0.1% to 0.2%, the OA increases from 83.23% to 94.28%, and it exceeds 99% when the ratio reaches 0.7%. Moreover, the OA of the proposed method exceeds 94% when the training sample ratio of the Kennedy Space Center dataset is 2%, and reaches 99.05% when the ratio increases to 5%. Thus, the proposed method provides the best classification performance and remains stable across the different training ratios on the HSI datasets.
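A sweep like the one plotted in Fig. 10 can be expressed as evaluating the pipeline at increasing training ratios and checking that OA saturates. `toy_evaluate` below is a generic saturating model used only to make the sketch runnable; it is not fitted to the paper's reported numbers.

```python
def oa_curve(evaluate, ratios):
    """Record (ratio, OA) pairs as the training ratio grows (cf. Fig. 10)."""
    return [(r, evaluate(r)) for r in ratios]

# Generic saturating accuracy model (illustrative assumption only).
def toy_evaluate(ratio):
    return 100.0 - 10.0 / (1.0 + 100.0 * ratio)

curve = oa_curve(toy_evaluate, [0.01, 0.03, 0.06])
# OA should be non-decreasing as more training data becomes available.
assert all(b[1] >= a[1] for a, b in zip(curve, curve[1:]))
```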

IV. CONCLUSION
This paper combined two kinds of spatial features for LDM classification and proposed a hyperspectral image classification method named AMSCF. First, after splatting, blurring, and slicing, AMF with an appropriate manifold tree height can extract better local and global spatial features and thus obtain better spatial texture features of the HSI to improve the classification. Second, the spatial correlation features extracted by DTNCF and DTICF effectively compensate for the deficiency of AMF in HSI classification. Last, the two spatial features were linearly fused for LDM classification. To verify the superior performance of AMSCF, three hyperspectral image datasets were tested. Although the training sample ratios are only 4%, 0.7%, and 5% for the Indian Pines, Salinas Valley, and Kennedy Space Center datasets, the OA values reach 97.88%, 99.34%, and 99.05%, respectively. It can be concluded that the proposed method is greatly enhanced compared with other methods. The first advantage of this method is that the superior spatial texture feature obtained by AMF at an appropriate manifold height, together with its local and global optimization capabilities, effectively assists the classification. The second benefit is that the spatial correlation features extracted by DTICF and DTNCF can adapt to different HSI datasets to attain the optimal spatial correlation feature and compensate for the deficiency of AMF, thus effectively improving LDM classification for HSI. However, the disadvantage of the proposed method is that AMSCF needs many parameters and cannot automatically adapt to all datasets. Therefore, our future work will focus on how to automatically determine the optimal parameters of AMSCF.