Invariant Attribute Profiles: A Spatial-Frequency Joint Feature Extractor for Hyperspectral Image Classification

So far, a large number of advanced techniques have been developed to enhance and extract the spatially semantic information in hyperspectral image processing and analysis. However, locally semantic change, such as scene composition, relative position between objects, spectral variability caused by illumination, atmospheric effects, and material mixture, has been less frequently investigated in modeling spatial information. Consequently, identifying the same materials from spatially different scenes or positions can be difficult. In this article, we propose a solution to address this issue by locally extracting invariant features from hyperspectral imagery (HSI) in both spatial and frequency domains, using a method called invariant attribute profiles (IAPs). IAPs extract the spatial invariant features by exploiting isotropic filter banks or convolutional kernels on HSI and spatial aggregation techniques (e.g., superpixel segmentation) in the Cartesian coordinate system. Furthermore, they model invariant behaviors (e.g., shift, rotation) by the means of a continuous histogram of oriented gradients constructed in a Fourier polar coordinate. This yields a combinatorial representation of spatial-frequency invariant features with application to HSI classification. Extensive experiments conducted on three promising hyperspectral data sets (Houston2013 and Houston2018) to demonstrate the superiority and effectiveness of the proposed IAP method in comparison with several state-of-the-art profile-related techniques. The codes will be available from the website: https://sites.google.com/view/danfeng-hong/data-code.


I. INTRODUCTION
L AND use and land cover (LULC) classification has been playing an increasingly vital role in the high-level image interpretation and analysis of remote sensing [1], [2].Owing to the rich spectral information, hyperspectral imagery (HSI) enables the identification and detection of the materials at a more accurate level, which has been proven to be effective for LULC-related tasks, such as HSI classification [3], [4], [5], [6], multi-modality data analysis [7], [8], [9], [10], and anomaly detection [11], [12], [13].In spite of the fine spectral discrepancies in HSI, the noisy pixels, manual labeling uncertainty, and the intrinsic or extrinsic spectral variability inevitably degrade the classification performance when only the spectral profile is considered as the feature input.Fortunately, except for the spectral dimension, the two-dimensional image space can provide the extra spatial information to correct the errors in a local region by linking with different objects and robustly eliminating the effects of spectral variability [14], [15], [16], [17] between the same materials.
In [19], mathematical morphology has shown its superiority in modeling and extracting the spatial information of an image related to the geometric shape and scale of different objects.Based on this concept, Pesaresi and Benediktsson [20] developed morphological profiles (MPs) to segment high resolution satellite imagery by applying a sequence of opening and closing operators to reconstruct or connect the targeted objects with a size-increasing structurized element (SE).The resulting morphological operator has been successfully extended and applied in the HSI classification [21], where the extended MPs (EMPs) are built on the first principal component of HSI obtained using principal component analysis (PCA) [22].The authors of [23] designed a novel strategy of jointly using SVMs and MPs for spatial-spectral hyperspectral classification.With the growing attention to MPs, a considerable volume of work related to HSI classification has frequently been reported in the literature [24], [25], [9].Nevertheless, the MPs' ability to extracting the diverse geometric features (e.g., textural, semantic) from hyperspectral images still remains  Due to the difference in composition between the Red and Bule patches -that is, the Blue patch holds more materials, there is a big difference in their APs, while the proposed IAPs are only slightly different for the two patches.Similarly, for the shift or rotation of the local scene, the IAPs are obviously more robust than the conventional APs.Given an HSI's patch, PCA is first performed on the patch to reduce its dimension to be 3. Then the APs can be extracted from the three PCs, respectively, and the final APs can be obtained by stacking all APs.The AF used in the APs is the region attribute and the code can be found in [18].Note that the APs or IAPs are extracted based on the whole image and the cropped patches are only visual examples and the corresponding histograms are only the profile representation of the centered pixels.
limited, since its concept has a few limitations: for example, the shape of SEs is fixed and SEs are not able to characterize information related to the gray-level or higher-level (e.g., HSI) characteristics of the regions.More specifically, rich spectral information in HSIs makes the geometric structure of hyperspectral regions or scenes more complex, leading to difficulties in representing and extracting MPs with the SEs in an appropriate way.
To reduce the limitations of MPs, morphological attribute profiles (APs) were developed in [26] by applying a set of attribute filters (AFs) [27], connected operators utilized to process an image by considering only its connected components, to integrate the attribute components of the given image at different levels.Moreover, APs are flexible tools since they can be of different types (e.g., they can be purely geometric, or related to the spectral values of the pixels or different characteristics such as spatial relations to other connected components).APs can be viewed as a generalized extension of MPs, yet APs are advantageous over MPs due to their flexibility in capturing a variety of region-based attributes (e.g., scale, geometry, size).For example, APs allow geometrical characterization to be extracted hierarchically [28], thereby yielding a more effective analysis in remote sensing images [29], [30], [31].
It is well known, however, that the connectivity between these defined AFs relies heavily on the geodesic reconstruction.This might lead to a problem of so-called leakage, also known as over-reconstruction [18], where multiple regions corresponding to different objects could be merged into a single region due to improper linking.To alleviate the problem, many advanced models were proposed by using tree-based image representations to construct the AFs or APs by the means of non-redundant representations, automatic strategies, and optimization techniques [32], [33], [34], [35], [36].Recently, a feasible solution to address the leakage problem was proposed in [37] in which the AFs were partially reconstructed in order to model and extract the spatially geometrical information to a particular specification.Another representative AFs-based method to automatically and precisely extract the spatial and contextual information is extinction profiles (EPs) [38] built based on extinction filters (EFs).EPs are less sensitive to changeable image resolution, as the used EFs are determined by extrema rather than a threshold given manually in the AP.Thus, EPs have shown their effectiveness in reducing the redundant details and preserving the discriminative geometrical features, making them more useful for the classification of remote sensing data [39], [40], [41].Similarly, the applications of APs and EPs in HSI classification are called as extended attribute profiles (EAPs) and extended extinction profiles (EEPs), respectively.Yet the APs and its variants hardly consider and investigate semantic variations.For example, two visually similar patches from a particular scene, e.g., Patch (Red) and Patch (Blue), Patch (Green) and Patch (Orange) in Fig. 1, should share basically identical feature representation.However, in reality, 1) on one hand, due to the shift and rotation of pixels (objects), the extracted features inevitably suffer from substantial differences between the two similar batches: Green and Orange (see the histograms of Fig. 1 in the lower left-hand corner).More specifically, the centered  pixels for the two patches stand for the same material: trees, but the locally semantic change (e.g., rotation) makes the extracted APs largely different between two patches.This might be explained by the fact that these previous APs-based methods fail to robustly merge the structure information of surrounding pixels, tending to absorb "negative" or "easy-changing" characteristics that are sensitive to a semantic change; 2) On the other hand, despite presenting similar semantic characteristics, the slightly different components (e.g., some objects added or missed), the changed arrangement in the location and order of the objects, or spectral variability caused by illumination, topology change, atmospheric effects, and intrinsic mixing of the materials) would lead to substantial differences in the APs of the two similar patches: Red and Blue (see the histograms of Fig. 1 in the upper left-hand corner).Moreover, for Patch (Red) and Patch (Blue) that are expected to be identified as the same material from the centered pixel perspective, there is a big difference between their APs, due to the moderately semantic change (other materials involved, e.g., trees) in Patch (Blue) compared to Patch (Red).This indicates that the APs are relatively sensitive to semantic change.Limited by the two factors described above, the resulting APs might be vulnerable and sensitive to semantic change in a local region, tending to further enlarge the negative effects on follow-up feature matching or classifier learning.Although some researchers from the remote sensing community have attempted to investigate the invariant feature representation [42] by integrating the AFs and geometric invariant moments (GIM) [43], the poor discriminative ability of GIM limits the classification performance of the HSI to a great extent.
To overcome the challenges and pitfalls of those previouslyproposed AP approaches, we propose to extract the invariant attributes (IAs) that can be robust against the semantic change in a hyperspectral scene by empowering the invariance to the AFs, yielding the proposed invariant attribute profiles (IAPs).Direct evidence is shown on the right hand side of Fig. 1, where there are only slight perturbations between the extracted features of the two-paired patches.The IAPs consist of two parts: spatial invariant features (SIFs) and frequency invariant features (FIFs).The former are constructed in the Cartesian coordinates, where the isotropic filter banks or convolutional kernels are first exploited for HSI and the over-segmentation techniques are further used to spatially aggregate the filtered features.The latter convert discrete APs to continuous profiles by modeling the variant behaviors of pixels or objects (e.g., shift, rotation) in a Fourier polar coordinate system.More specifically, our contributions can be highlighted as follows: • We propose a novel feature extractor for HSI, called invariant attribute profiles (IAPs).As the name suggests, these aim at extracting invariant feature representation by applying a sequence of well-designed AFs that are insensitive and even invariant to the change of pixels or materials in local regions.• The proposed approach is capable of modeling SIFs by isotropically filtering HSI in the first step.The filtered features can then be grouped into the semantically meaningful object-based representation.• To further improve the completeness and discrimination of the invariant features, the FIFs in this paper are also designed by extracting the continuous Fourier convolutional features in polar coordinates.• Two relatively new and challenging hyperspectral datasets are used to assess and compare the classification accuracies of state-of-the-art APs and our IAPs.Experimental results indicate that the classification performance of IAPs is superior to that of using the APs-based approaches, demonstrating the necessity and progressiveness of investigating and handling the issue of local semantic change in HSI classification.The rest of this paper is organized as follows.Section II presents the methodology for extracting IAPs and the workflow for HSI classification, focusing on the design of SIFs and FIFs.In Section III, we provide the experimental results and analysis as well as brief discussion of two hyperspectral datasets from both a qualitative and quantitative perspective.Finally, our main conclusions are summarized in Section IV.

II. METHODOLOGY
In this section, the proposed IAPs are first introduced from two different aspects: SIFs and FIFs.The holistic workflow with the designed IAPs as the input is then developed for the HSI classification.

A. Invariant Feature Extraction in the Spatial Domain
As shown in Fig. 1, the APs in spatial domain are sensitive to various factors that can give rise to semantic change, further leading to unexpected degradation in modeling spatial information.One feasible solution for this problem is to find and extract the invariant feature representation from the image space.It is well known that isotropic filtering (or convolution) [44] is a good tool that performs robustly against shift or rotation behavior of image patches and can simultaneously eliminate various other variabilities (e.g., salt-pepper noise, missing local information) effectively.Therefore, the robust convolutional features (RCF) can be extracted from the HSI in the form of feature set Ω(•) where the features of the k-th band (F k ) can be computed by I ∈ R W ×H×D is the HSI with D bands by W × H pixels, where I k denotes the k-th band of I; K conv is defined as the convolutional kernel for isotropically aggregating the local spatial information, and the operator ⊗ denotes the convolutional operation.Further, we combine the filtered or convolutional features (F RCF ) with spatial aggregation (SA) techniques, such as super-pixel segmentation, to enhance the object-based semantic features, i.e., edges, shape, and the invariance of these features.In our case, we select a popular super-pixel segmentation approach: simple linear iterative clustering (SLIC) [45].More specifically, the SA step is implemented on the basis of RCF; thus the pixel-wise (e.g., i-th pixel) SIFs can be represented as where N q is the number of pixels in the q-th super-pixel, while φ i,q is defined as a pixel set (q-th super-pixel) including the i-th targeted pixel.The final SIFs are simply stacked as where N = W × H denotes the number of pixels in a given hyperspectral scene.It should be noted that in the original SLIC algorithm, the superpixels are segmented on the CIELAB color space rather than RGB space, since the differences or changes between pixels (or materials) in the CIELAB space can be perceived or captured more easily.
From the perspective of feature level, CIELAB should hold a more discriminative feature representation compared to the RGB space.Naturally, the HSI is capable of better identifying the pixels due to the richer and more discriminative spectral information.However, considering the spectral redundancy in HSI, we first applied the PCA on the whole HSI to reduce the spectral dimension and simultaneously preserve the spectral information as much as possible, then performed the SLIC on the first three principal components (PCs).
The aforementioned two steps yield the SIFs of the IAPs, as illustrated in Fig. 2.

B. Invariant Feature Extraction in the Frequency Domain
Although SIFs are insensitive to the semantic changes in a local neighborhood, it still fails to accurately describe the shift or rotation behavior of an image (or an object) due to the quantization artifacts in a discrete coordinate system [46].Instead of locally estimating the discrete coordinates with the pose normalization, i.e., histogram of oriented gradients (HOG) [47], a Fourier-based analysis technique has guaranteed the invariance of feature extraction in a polar coordinate system [46], [48], [49].More significantly, the features only extracted from the spatial domain are relatively limited in representation ability and diversity, particularly for the complex hyperspectral scene, e.g., including various spectral variabilities.Therefore, the benefits of feature extraction in the frequency domain are, on the one hand, to enrich the diversity of the features, thereby further improving the performance of HSI classification; and, on the other hand, to be capable of effectively modeling the invariant behaviors by the the means of continuous signal representation, yielding robust feature extraction against a variety of semantic change in HSI.
For example, the rotation behavior of an object or local image patch can not be well modeled by a discrete histogram, but can be explained by circular shift by continuous and smooth signal representation.For this reason, the continuous Fourier transformation has been proven to be an effective tool to model the rotation behavior with any angles, e.g., either integer or non-integer, more accurately [46].Fig. 3 illustrates the two strategies for extracting feature descriptors for the traditional discrete HOG and continuous Fourier-based HOG.In detail, the differences from the pixel-wise feature design to the spatially regional feature aggregation are clarified by the two approaches, respectively.Another can be seen in Fig. 3, where there is a relatively obvious change between two image regions or patches in the form of a discrete histogram (e.g., HOG), due to the 10 • rotation behavior.Conversely, the Fourier-based HOG descriptor is robust to the big semantic gap when the image scene changes significantly, which only yields a slight shift along the horizontal coordinate and basically keeps the feature shape unchanged.In the following, we will detail the procedure for extracting the Fourier-based continuous HOG from the pixel-wise features to the regionbased representation.
1) Analysis of Rotation-invariance: As the name suggests, rotation-invariance refers to the extracted features f (x, y) for a given pixel located in the (x, y) of the image remaining the same or unchanged when the image rotates with a g o angle.
Similar to [49], the rotation behavior can be formulated as gf (x, y) = f (x, y), or h(I(x, y) • T g ) = h(I(x, y)), (5) where h(I) is abstracted as a feature extractor from the input image I, and T g represents an operation of coordinate transformation.For those pixels that do not change the locations, the invariant features can be simply deduced by Eq. ( 5) as gf (x, y) = f (x, y).For other pixels whose locations are changed, the coordinate transformation T g has to be considered to formulate the rotation-invariance [49] as We then have the equivalent condition gf = f • T g , which has been proven and widely used in many works [50], [51], [52].
2) Pixel-wise Fourier Features: Let a 2-D location of any one pixel in a given image I be (x, y) in Cartesian coordinates or ( D(x, y) , θ(D(x, y))) in polar coordinates, where D(x, y) and θ(D(x, y)) are defined as the magnitude and the phase information of a complex number: D(x, y) = dx + dyi, and dx and dy are denoted as the horizontal and vertical gradients, respectively, in the location of (x, y) of I in Cartesian coordinates.Thus, given an input pixel p, its mth order Fourier representation, denoted as F m (x, y), can be obtained by where h(ϕ) is an orientation distribution function with respect to the angle ϕ in the pixel p.This function can be given by an impulse response function δ with integral D(x, y) : Unlike the Fourier transformation in Cartesian coordinates, the polarized Fourier transformation has been proven to be effective for separating the angular information (e −imθ(D(x,y)) ) and radial basis from the Fourier representations [46].As shown in Eq. ( 7), the rotated angle in Cartesian coordinates can be equivalently explained by a shift behavior in the polar representation.
As shown in Eq. ( 7), one natural and intuitive way to eliminate the effects of the image rotation in feature extraction is to fit the angular information.It is, however, hardly possible to directly estimate the phase information (θ(D(x, y))), due to its complexity and uncertainty.Alternatively, we may enforce the Fourier order m to be zero, thereby achieving the same goal for removing the phase information.When a g o rotation is applied on F m (x, y), accordingly to the mathematical property of Fourier transformation in Polar coordinates [46], we then deduce the rotated Fourier representation gF m (x, y) as Owing to the self-steerability of the Fourier basis under polar coordinates [53] -that is, such a basis can be self-steered to any orientation -we can construct a sequence of Fourier base with self-adaptive rotation angles.According to the special property, the rotated Fourier representation in Eq. ( 9) can be multiplied or convoluted by another Fourier basis with the same rotation behavior (e.g., a g o angle): where * and • operators denote the convolution and multiplication behaviors, respectively.In order to make the extracted features rotation-invariant, we have to meet the condition of Eq. ( 6): that is, gf (x, y) = f (x, y)•T g .It is, therefore, natural to have a solution of rotation-invariance as long as m+m = 0 is satisfied.Note that the convolutional representation of Eq. ( 10) is regarded as the final rotation-invariant output.Supported by the above analysis and derivation of rotationinvariance in theory, we further detail the procedures to extract the rotation-invariant features from the images in practice.This process consists of three parts, as follows: • Part 1: magnitude.By applying the polarized Fourier transformation on each pixel of the input image, we can obtain the m magnitude features corresponding to m different Fourier orders, which can be formulated for each pixel as F 1 m (x, y) = D m (x, y) , m = 0, 1, . . ., m. • Part 2: absolute rotation-invariant features.The rotation behavior can be compensated by completely removing the phase information by using Eq. ( 10) and arranging m + m = 0.Then, the features of part 2 can be represented as , and m = −m .• Part 3: relative rotation-invariant features.To reduce the loss of rich phase information, the relative rotationinvariant features are developed by coupling the adjunct convolutional Fourier representations obtained with two neighbouring convolutional kernel-radii [54].This process can be performed by which is subject to m = m .The terms r 1 and r 2 are the radii of two different convolutional kernels, and the symbol F is defined as the complex conjugate.Combined with the three parts, the final pixel-wise Fourier features (PWFF) can be written as By collecting all PWFF, we then have 3) Regional Descriptors: Analogous to the HOG blocks, whcih aim to describe object-based contextual information, we aggregate the PWFF into region-based descriptors with the use of isotropically triangular convolutional kernels.In our case, multi-scaled convolutional kernels are used to capture the semantic information of different receptive fields.The scaling setting will be discussed in the experimental section.Finally, the resulting FIFs are where F Cj P W F F denotes the regional descriptors with the j-th convolutional kernel.

C. Invariant Attribute Profiles (IAPs)
For our proposed IAPs, the fusion strategy of SIFs and FIFs is nothing but a stacking operation.Despite its simplicity, this fusion strategy has been widely and successfully applied in various feature extraction tasks.Fig. 4 illustrates the step-wise workflow of extracting SIFs and FIFs, which can be delineated more specifically as follows.
• Step 1: Group the HSI and compute its gradients.Due to the redundancy of HSI, some adjunct bands might share similar invariance.To address that circumstance, we first group the HSI, e.g., using clustering techniques (k-means algorithm in our case), then compute the horizontal and vertical gradients for each group by means of the maximum magnitude response (see the first step in Fig. 4).Note that the final classification performance is sensitive to the number of the grouped HSIs, that is, excessively large or small ones would make the information redundant or coupled across bands.Therefore, a proper parameter setting is needed.In our case, the number of the grouped HSIs can be effectively determined by crossvalidation on the labeled training set.• Step 2: Extract polarized Fourier features by gradients.
The Fourier complex form can be generated by gradients (see the subsection entitled Pixel-wise Fourier Features).Thus, the polarized Fourier representation given in Eq. ( 7) can be obtained by applying Fourier transform to the complex number.• Step 3: Construct the regional descriptors on both spatial and frequency domains.We model the region-based representation by using isotropically spatial filters and Fourier convolutional kernels.• Step 4: Generate the proposed IAPs.The spatial aggregation technique, i.e., superpixel segmentation, is used to generate the SIFs.In the frequency domain, there are two parts in the FIFs: magnitude and order fitting.The latter consists of absolute rotation-invariant features and relative rotation-invariant features.In that way, the IAPs can be abstracted as

D. IAPs-based HSI Classification Framework
With the proposed IAPs, we intend to develop an automatic HSI classification system.For this purpose, an IAPs-based HSI classification framework is designed.As shown in Fig. 5, the framework is mainly composed of HSI grouping, IAPs extraction, feature stacking, and feature learning (FL) (or dimensionality reduction (DR)) [55].For more details, the HSI is grouped in a band-wise way, while the feature learning step is simply and effectively conducted by PCA, in our case.In the experiments below, we found that the FL or refinement step plays a significant role in improving the classification performance.This could be reasonably explained by the redundant and noisy concentrated features.

E. Feasibility and Effectiveness Analysis of the Proposed IAPs for HSI Classification
Different from pixel-wise semantic labeling of natural images, an effective and accurate HSI classification algorithm usually depends on jointly modeling spatial and spectral information.Owing to the effective spatial structure (or pattern) modeling, the MPs (or APs) and their variants have been widely and successfully applied for HSI classification.With this motivation, the proposed IAPs similarly consider the spatially structural information in the form of semantic patches or objects with spatial aggregation strategy (e.g., superpixel), yielding SIFs.In [56], the SIFs can be regarded as the lowlevel shift-invariant feature representation, which has been theoretically proven to be effective for robustly addressing variability within the same class.On the other hand, the IAPs model the irregular textural information from the frequency domain, yielding FIFs, in order to further improve the feature discrimination.Similarly, there is also a good theoretical support for the extracted FIFs in [46], [48], demonstrating its effectiveness in extracting the invariant features from optical remote sensing images for classification and detection tasks.The compact combination of the two features with their invariant attributes makes it possible for the proposed IAPs to classify the HSI more robustly and accurately.More notably, unlike those previously-proposed MPs and APs that are only performed on first few components obtained by PCA, our IAPs are not only capable of extracting spatial semantic features with spatial aggregation techniques and convolutional (or filtering) operators, but also fully considering the spectral information by the means of grouping strategy, maximum magnitude response, and FL or DR after feature extraction instead of DR before feature extraction (e.g., using PCA).

A. Data Description
We quantitatively and qualitatively evaluate the algorithm performance on three representative and promising HSI datasets: Pavia University, Houston2013, and Houston2018, in the form of image classification.These two datasets are briefly introduced as follows.1) Pavia University Dataset: The hyperspectral scene was acquired by the ROSIS sensor over the campus of Pavia University, Paiva, Italy.The image consists of 610 × 340 pixels at a ground sampling distance (GSD) of 1.3m with 103 spectral bands in the range of 430nm to 860nm.In this dataset, nine main categories are investigated for land cover classification task.The number of training and test samples are specifically listed in Table I, while the corresponding sample distribution is given in Fig. 6.
2) Houston2013 Dataset: The ITRES CASI-1500 sensor was used to acquire the data over the campus of University of Houston and its surrounding areas, in Houston, Texas, USA.This dataset was provided for the 2013 IEEE GRSS data fusion contest, and is composed of 349 × 1905 pixels with 144 spectral channels ranging from 364nm to 1046nm at a spectral sampling of 10nm.There are 15 challenging classes in the form of LULC in the scene.Table II lists the scene categories and the number of training and test samples used in the classification task, while Fig. 7 visualizes the false-color image of the hyperspectral scene and the sample distribution of the training and test set.
3) Houston2018 Dataset: The dataset is an airborne multimodal data product, where the hyperspectral data was acquired by the same IRTES CASI-1500 sensor at a GSD of 1m.This data has become available from the 2018 IEEE GRSS data In the second dataset, we artificially chose a number of challenging pixels to act as the training samples with multiple different experiments on the given ground truth (see [57] for more details).The benefits of the strategy to separate the training and test sets are two-fold.On one hand, unlike random selection for training and test set that usually yields a very high performance, such a challenging setting may help us assess the potential and performance of extracted features more effectively.On the other hand, using the fixed training and test samples also contributes to making a consistent fair comparison by reproducing the experimental results using different methods.

B. Experimental Setup 1) Evaluation Metrics:
Pixel-wise classification is explored as a potential application for quantitatively evaluating the performance of these feature extraction algorithms.Three commonly-used criteria, Overall Accuracy (OA), Average Accuracy (AA), and Kappa Coefficient (κ), are adopted to intuitively quantify the experimental results using two simple classifiers: nearest neighbor (NN) and random forest (RF).The two classifiers have been widely used in many works related to HSI classification [58], [59].Please note that the reason for selecting the two classifiers in our case is to emphasize the performance gain from the features rather than from the complex and advanced classifiers.
2   Therefore, the spatial-spectral features are extracted from the whole HSI.To demonstrate the effectiveness and superiority of IAPs, the baseline (original spectral features (OSF)) and other six advanced MPs or APs are compared with the proposed method.Specifically, they are EMPs, EAPs a , EAPs s , EAPs i , EAPs all , and EEPs, where a, s, i, and all denote the attributes for the area of the regions, the standard deviation in the regions, moment of inertia, and the stack of the previous three attributes, respectively.
3) Algorithm Configuration: As a matter of routine, PCA was performed on the original HSI to obtain several PCs (the first three PCs in our case) before applying opening and closing operations in MPs or thinning and thickening ones in APs.Moreover, the parameters for MPs and each attribute of APs are set by following the studies in [18], specifically: • MPs: the scales are 10, that is, there were ten openings and closings; • AP a : λ a = [100, 500, 1000, 2000, 3000, . . ., 7000, 8000]; • AP s : λ s = [0.1,0.5, 1, 2, 3, 4, 5, 6, 7, 8]; • AP i : λ i = [0.1,0.15, 0.2, 0.25, 0.3, . . ., 0.5, 0.55].For the EP, only one parameter, called desired levels (dl), needs to be given, due to its automatic process.To extract the EPs in our case, we fully follow the same setup steps suggested in [38]; the dl is assigned to be 7.More details can be found in [38] and [60].
Furthermore, the parameters in our IAPs consist of the number of scaled convolution kernels (n s ) and their radii of convolution kernels in spatially isotropic filtering (r), the number of Fourier orders (m), and the dimension in FL or DR (d) as well as the number of the grouped HSIs (n g ).In practical applications, these parameters can be determined by cross-validation on the available training set in order to achieve an automatic system for HSI feature extraction and classification.More specifically, the parameter combination is set to   for different compared methods using NN and RF classifiers are given in Fig. 6 and Table IV, respectively.
In detail, the classification accuracies using extracted features are much higher than that of using only OSF (at least 7% increase) for both classifiers.For example, the EMPs can effectively represent the structure information of objects in the HSI, yielding competitive classification performance even better than many single-attribute EAPs, such as EAPs s , EAPs i .By focusing on HSI's spatial information, the EAP a method is able to excavate the regional features, leading to a more smooth classification result (see Fig. 6).As expected, collecting multiple attributes into a stacked vector representation (EAPs all ) achieves a great improvement in classification performance (around 5% in terms of OAs) compared to the single-attribute one.However, the use of extinction filters makes another breakthrough on the basis of AF-based profiles, with an additional 2% increase in terms of OAs using the two classifiers (EEPs versus EAPs all ).
Although the above-mentioned methods have shown superiority in spatial information modeling and further obtained excellent performance on the pixel-wise classification task, yet their ability in addressing various spectral variabilities and semantic changes of local scene or objects is relatively weak.In contrast, the proposed IAP method is capable of extracting the intrinsic invariant feature representation from the HSI, achieving a more effective feature extraction.As shown in Fig. 6, there are less noisy points in the classification maps obtained by our method.In particular, the Meadows class located in the bottom of the studied scene shows a more reasonable result, which is approaching to the manually-labeled ground truth.Similarly, the IAP approach dramatically outperforms the others with a quantitative comparison in Table IV.
2) Houston2013 Dataset: Fig. 7 visualizes the classification maps of different feature extraction methods using two classifiers on the Houston2013 dataset, while Table V quantifies the corresponding experimental results in terms of OA, AA, and κ as well as the classification accuracy for each class.
Generally speaking, the classification performance with the feature extraction step is obviously superior to that with the OSF (Baseline), which is consistent for both NN and RF classifiers.Although the EMPs yield a similar performance compared to the EAPs with a single attribute (e.g., EAPs a , EAPs s , and EAPs i ) using the two classifiers, it is worth noting that those single-attribute EAPs usually hold a lower feature dimension.This indicates, to some extent, that EAPs would have the advantages over EMPs in extracting the spatial information of the image, since EAPs-based methods could make it possible to model more geometrical features (e.g., size, shape, texture).As expected, stacking all single EAPs is of great benefit in finding a more discriminative feature representation; thus the resulting EAPs all performs better at classifying each material than any single EAP or EMP.With the use of extinction filters, the developed EEPs outperform the aforementioned methods.Particularly when using the RF classifier, there is an obvious improvement in classification accuracy, improving the OAs by approximately 2% and 5%, over the EAPs all and EMPs, respectively.
Remarkably, our proposed IAP method not only outperforms other feature extraction operators overall in terms of OA, AA, and κ, but also it achieves the desirable results for each class, especially for challenging classes like Commercial, Highway, and Parking Lot1.For these classes, the IAPs make a significant performance improvement, at an increase of at least 10% accuracy.We can also observe from Fig. 7 that the classification maps obtained by the proposed feature extractor are more smooth in the regions sharing the same materials and sharper on the edges between different materials.Furthermore, there are some object deformations (e.g., shift, rotation) in the stadium and its surroundings located in the middle of the studied HSI.Owing to the robustness in local semantic changes of the scene, the IAPs obtain more accurate classification maps in the area in and around the stadium.
3) Houston2018 Dataset: For the Houston2018 dataset, we make a similar visual comparison for eight compared approaches, as given in Fig. 8; the specific numerical results are detailed in Table VI.
As observed in Table VI, the same basic trend appears when using NN and RF for all candidates: that is, the classification accuracies using RF are higher than those using NN.More specifically, with the original spectral features as the input, the baseline only holds 60.99% and 65.85% results using NN and RF classifiers, respectively, due to the lack of spatial information modeling.In MPs, the spatial information is considered in the form of openings and closings, yet the EMPs yield a relatively poor classification performance.This might result from some intrinsic limitations of MPs.For example, the MPs are not capable of adaptively making an effective connection between different scaled objects, further limiting their ability to characterize the semantic structural information.Despite the use of mathematical morphological attribute filers that are able to generate a richer geometrical description, EAPrelated approaches fail to improve classification performance dramatically, since there are more challenging categories and more complex semantic variations in the studied scene.Nevertheless, the ability of EAPs all to fuse the different attributes still contributes to its power in spatial information extraction, yielding an increase in accuracy of around 5% using the RF classifier, compared to the others previously mentioned.Unlike EMPs and EAPs, EEPs utilize a sequence of extrema-oriented connected filters to extract the extinction values as the features.Due to these constructed features, the classification accuracy of EEPs increases by approximately 5% using NN, in spite of a slight decrease in using RF (about 2 percentage points less than EAPs all ).
Beyond those algorithms developed based on MPs and APs, we propose to model the invariant features from both the spatial and frequency domains, which has been theoretically proven to be robust against the semantic change caused by various image deformations (e.g., shift, rotation, sensor noises, or distortions).Therefore, the resulting IAPs outperform the previously-proposed MPs or APs methods, showing the best overall performance and the highly competitive classification results for each class, as shown in Table VI.Fig. 8 also highlights the superiority of the proposed IAP method by means of classification maps.Generally speaking, our method tends to lead to more smooth classification maps: that is, the IAPs aggregate the same materials more easily while separating the different materials.Our IAPs also remove the effects of hot pixels like salt-and-pepper noise from classification maps effectively and simultaneously preserve the semantically meaningful structure or objects, leading to an increase of about 20% in classification accuracy.This is particularly the case for the class of Non-Residential Buildings that constitute large-scale coverage of the whole scene.
From the experimental results on the different datasets, we can observe an interesting and meaningful phenomenon that the proposed IAPs are more apt to recognize objects or materials with a regular structure (or shape), such as Road, Residential, and Commercial, while for those irregular classes (e.g., grass and tree) that seem to hold more disorderly textual features, the extracted IAPs might not be so discriminative to identify these materials, limiting the gain in the final classification performance to some extent.

D. Parameter Sensitivity Analysis
The effectiveness of a feature extractor largely depends on parameter selection.It is therefore indispensable to discuss the sensitivity of parameters involved in the proposed IAP.These parameters include the number of scaled convolution kernels (n s ) and the corresponding radii of these convolution kernels (r) in spatially isotropic filtering, the number of Fourier orders (m) in the process of extracting frequency features, and the reduced dimension (d) in FL or DR.Among them, • the search range of n s is constrained from 1 to 5; • the r can be investigated through the combination of [1,2,3], [2,4,6], [4,6,8], and [6, 8, 10]1 ; • the order m can be selected from the different set, i.e., [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], . . ., [0, 1, 2, . . ., 9, 10]; • the feature dimension after FL or DR is determined ranging from 10 to the dimension of the original IAPs (e.g., 300 for the Pavia University, 390 for the Houston2013 and 180 for the Houston2018) at a 10 interval.We experimentally analyze the effects of different parameters for the overall classification accuracy (OA in our case) with two classifiers.The resulting quantitative results are shown in Fig. 9 for the three datasets.
It is evident from Fig. 9 that the optimal parameter combinations are n s = 3, r = .Moreover, we also discovered an interesting phenomenon: classification performance is insensitive to the parameters of n s and r on the Houston2018 datasets, while for the Pavia University and Houston2013 they have a moderate effect on OAs.Notably, the Fourier orders (m) are associated with classification results, but a progressively increased m starts with a sharp performance decrease and then becomes stable with a relatively poor result.This reveals that m is a noteworthy and important parameter during the process of extracting IAPs.The graph showing results for the parameter d in Fig. 9 demonstrates that the extracted IAPs are, to some extent, redundant and can be further improved in feature discriminative ability by means of FL or DR, as the OAs start to quickly reach a certain optimal value (e.g., 30 for the Pavia University dataset, 30 for the Houston2013 dataset, and 40 for the Houston2018 dataset), then hold basically unchanged over a period of the time, and finally gradually diminish for the robust RF classifier.For the NN classifier, the OAs dramatically decrease when inputting the IAPs with a higher dimension.This could be result of the curse of dimensionality.Effects of DR or FL in Feature Extraction: For a fair comparison, we also investigate the effects of a DR technique on all studied methods, as listed in Table VII.
There is an interesting regular pattern in Table VII.Generally, the classification performance using the dimensionreduced features is obviously superior to that using original features (without DR), e.g., Baseline and our IAP method.It should be noted that, however, those EsP-based approaches with a DR technique usually perform worse than their previous versions without DR.A possible reason is the use of a DR technique (e.g., PCA) before feature extraction.If the DR technique is applied again after feature extraction, then this might suffer from the effects of reuse, leading to the limitation and even degradation in the final classification performance.

E. Ablation Studies
In addition, we investigate and analyze the performance improvement of our IAP method by step-wise adding the different components, since the IAPs are involved with multiple components, i.e., SIFs, FIFs, and FL or DR.Table VIII shows an increasing performance gain as the different techniques are gradually fused.We found that successively adding each component into the proposed IAPs led to a progressive enhancement in feature representation ability, yielding a higher classification result for the HSI.This also indicates the reasonableness and advancement of the proposed IAPs-based HSI classification framework we designed (see the Section II.D and Fig. 5).
The key points from the ablation analysis can be summarized as follows: 1) There is a great improvement in classification performance after applying the FL or DR on the original IAPs, from which we conclude that the FL or DR step plays a significant role in our proposed IAPs-based HSI classification framework.This might be explained by the fact that directly stacking the OSF, SIFs, and FIFs as a final input would hurt classification performance, since they come from different feature spaces.Moreover, this is also due to the reason that FL or DR can address the curse of dimensionality (the lack of balance between the number of features and training samples) as well as the existence of high redundancy among IAPs.2) Although the results with SIFs are superior to those with FIFs alone, performance is still limited without FIFs, showing the power of FIFs in enriching the diversity and robustness of the extracted features, particularly when they are jointly used, i.e., IAPs (OSF+SIFs+FIFs).3) Using FIFs individually would yield poor results, which indicates that frequency information alone is not sufficiently discriminative to identify a variety of materials.

F. Computational Cost in the Proposed IAP
All experiments in this paper were implemented with Matlab 2016a on a Windows 10 operation system and conducted on Intel Core i7-6700HQ 2.60GHZ laptop with 16GB memory.With the setting, the running time of the proposed IAP is given in Table VIII.Overall, the running time is acceptable in practice and it shows an approximately linear increase as the size of the image increases.

IV. CONCLUSION
To effectively improve the robustness of a feature extractor in modeling spatial information of HSI, we propose a novel AP-like feature descriptor called IAP that designs the invariant APs in both spatial and frequency domains.The proposed IAP method aims at overcoming the shortcomings of previous MPs or APs, in which extracted spatial features are apt to be degraded, leading to a large difference between the same materials, or confusion with other different materials, due to the local semantic change in a hyperspectral scene.Combined with the sound theory in modeling our proposed SIFs and FIFs, the resulting IAPs have demonstrated their potential and superiority in the HSI classification task.In the future, we will further extend our model to a supervised or semi-supervised version in an end-to-end fashion (e.g., deep learning) to conceive invariant feature extraction and classification as a whole.
Materials Added or Missed) Shift or Rotation of Objects, Relative Location Change False-color Image of Hyperspectral Scene

Fig. 1 .
Fig.1.Illustration clarifying the motivation of the proposed IAPs.Due to the difference in composition between the Red and Bule patches -that is, the Blue patch holds more materials, there is a big difference in their APs, while the proposed IAPs are only slightly different for the two patches.Similarly, for the shift or rotation of the local scene, the IAPs are obviously more robust than the conventional APs.Given an HSI's patch, PCA is first performed on the patch to reduce its dimension to be 3. Then the APs can be extracted from the three PCs, respectively, and the final APs can be obtained by stacking all APs.The AF used in the APs is the region attribute and the code can be found in[18].Note that the APs or IAPs are extracted based on the whole image and the cropped patches are only visual examples and the corresponding histograms are only the profile representation of the centered pixels.

Fig. 2 .
Fig. 2. Example illustrating the two-step extraction process of SIFs, where SLIC means simple linear iterative clustering.

Fig. 3 .
Fig.3.Illustration clarifying the advantage of continuous HOG, represented by the polarized Fourier analysis from pixel-wise features to regional descriptors over the discrete HOG encoded by gradient binning (cells) and regional aggregation (blocks) in the real image regions before and after 10 • rotation.

Fig. 4 .
Fig. 4. A step-wise workflow to extract the proposed IAPs that consist of spatial invariant features (SIFs) and frequency invariant features (FIFs) of order fitting and magnitude maps.

Fig. 6 .
Fig. 6.Scene categories and visualization of false-color image, the distribution of training and test samples, and classification maps of different compared algorithms using two classifiers (top: NN, bottom: RF) on the Pavia University dataset.

Fig. 7 .
Fig. 7. Scene categories and visualization of false-color image, the distribution of training and test samples, and classification maps of different compared algorithms using two classifiers (top: NN, bottom: RF) on the Houston2013 dataset.

Fig. 8 .
Fig. 8. Scene categories and visualization of false-color image, the distribution of training and test samples, and classification maps of different compared algorithms using two classifiers (NN and RF) on the Houston2018 dataset.

Fig. 9 .
Fig. 9. Parameter sensitivity analysis of the proposed IAPs with respect to the number of scaled convolution kernels (ns) and their radii of these convolution kernels (r) in spatially isotropic filtering, the number of Fourier orders (m), and feature dimension (d) after FL or DR, respectively.(a) Analysis results of two classifiers (NN and RF) in terms of OAs on the Pavia University dataset; (b) Analysis results of two classifiers (NN and RF) in terms of OAs on the Houston2013 dataset; (c) Analysis results of two classifiers in terms of OAs on the Houston2018 dataset.

TABLE I SCENE
CATEGORIES OF THE PAVIA UNIVERSITY DATASET WITH THE NUMBER OF TRAINING AND TEST SAMPLES SHOWN FOR EACH CLASS.

TABLE III SCENE
CATEGORIES OF THE HOUSTON2018 DATASET, WITH THE NUMBER OF TRAINING AND TEST SAMPLES SHOWN FOR EACH CLASS.
fusion contest and its size is 601×2384 with 50 spectral bands sampling the wavelength of between 380nm to 1050nm at intervals of 10nm.The specific training and test information for the data is detailed in TableIII, and a false-color image and the locations of labeled samples of training and test set are also given in Fig.8.

TABLE IV QUANTITATIVE
CLASSIFICATION PERFORMANCE FOR EIGHT DIFFERENT ALGORITHMS ON THE PAVIA UNIVERSITY DATASET.
*The best is shown in bold.The feature dimensions are given in the parentheses after the method's name.

TABLE V QUANTITATIVE
CLASSIFICATION PERFORMANCE FOR EIGHT DIFFERENT ALGORITHMS ON THE HOUSTON2013 DATASET *The best is shown in bold.The feature dimensions are given in the parentheses after the method's name.C. Results and Discussion1) Pavia University Dataset: The classification maps and corresponding quantitative results in terms of OA, AA, and κ

TABLE VI QUANTITATIVE
CLASSIFICATION PERFORMANCE FOR EIGHT DIFFERENT ALGORITHMS ON THE HOUSTON2018 DATASET.
*The best is shown in bold.The feature dimensions are given in the parentheses after the method's name.

TABLE VII CLASSIFICATION
PERFORMANCE IN TERMS OF OA FOR DIFFERENT STUDIED METHODS WITH DR TECHNIQUES ON THE THREE DATASETS.NOTE THAT THE OPTIMAL DIMENSION FOR EACH METHOD IS SELECTED BY THE MEANS OF A CROSS-VALIDATION STRATEGY.mean the current component with and without being involved or considered in the HSI classification framework, respectively.

TABLE IX PROCESSING
TIME FOR THE PROPOSED IAP ON THE THREE DATASETS