Mixed-Noise Band Selection for Hyperspectral Images

Hyperspectral images (HSIs) with abundant spectral information are generally susceptible to various types of noise, such as Gaussian noise and stripe noise. Recently, a few quality-based selection algorithms have been proposed to remove noise bands from HSIs. However, these methods suffer from an inability to discriminate the mixed-noise bands of HSIs and are sensitive to image content variation and luminance changes. Here, we develop a mixed-noise band selection framework that can effectively separate the Gaussian and stripe noise bands of HSIs. We first improve tensor decomposition to reconstruct the mixed-noise components and low-rank components, which reduces the influence of image content and luminance changes. Spectral smoothness constraints and unidirectional total variation are incorporated into the decomposition model to enhance the separation performance for Gaussian and stripe noise. Then, different statistical features, including Weibull and histogram of oriented gradient (HOG) features, are applied to extract robust parameters from the mixed-noise components. More importantly, an extreme learning machine (ELM) is trained to predict the noise bands. The ELM has an extremely fast learning speed and tends to achieve better performance than other networks. Finally, by aggregating all these strategies, our method can select the mixed-noise bands efficiently. The experimental results on both synthetic and real noisy HSIs indicate that the proposed method outperforms the state-of-the-art methods.


I. INTRODUCTION
Hyperspectral images (HSIs) are widely applied in different fields, including hyperspectral classification [1], [2], object recognition and detection [3], [4], and superresolution [5], [6]. Some bands of HSIs are inevitably corrupted by various types of noise, which greatly degrades the image quality [7], [8] and directly deteriorates subsequent HSI processing. Therefore, it is crucial and useful to develop band selection algorithms for HSIs. In the last few years, a number of noise band selection methods [9], [10] have been developed, which can be classified into two types: subjective and objective band selection. Since subjective judgment is time consuming and expensive, especially for multiband images, it is important to develop objective selection, which is the focus of this article.
Objective noise band selection methods employ quality metrics to remove the noise bands. According to the availability of a reference image, quality metrics can be divided into full-reference (FR) methods and no-reference (NR) methods. The former class of approaches [11]-[15], which employ reference images to assess the distorted data, have a high correlation with the human visual system. The algorithm proposed in [11] used the popular structural similarity (SSIM) index, which combines luminance, contrast, and structural information to estimate image quality. In [12], Sheikh applied visual information fidelity (VIF) to quantify the loss of image information. In addition to these metrics, feature similarity (FSIM) [13] exploits a combination of phase congruency and gradients to describe the quality of an image. Wang et al. [14] advocated the use of non-negative matrix factorization (NMF) to measure image degradation. In [15], Zhu designed a simple multivariate SSIM (MvSSIM) index for HSIs that can be applied to assess spectral structural similarity via the covariance among different bands. However, reference images are unavailable in actual situations, which strictly limits the application domain of FR models. Later algorithms [16]-[22] predict the image quality by extracting information only from the distorted images whose quality is being evaluated. These strategies also suffer from various drawbacks. For example, Saad et al. [16] computed discrete cosine transform (DCT) coefficients and built a generalized Gaussian distribution to indicate perceptual quality.
In [17], Mittal proposed the blind/referenceless image spatial quality evaluator (BRISQUE), and the natural image quality evaluator was proposed separately [18]; both employ the natural scene statistics (NSS) of local luminance values to quantify distorted image quality. These models are highly sensitive to image content and illumination variation, and approximation issues arise in the method of [18]. The model in [19] utilized entropies calculated from local image blocks in the spatial and DCT domains. Xue et al. [20] subsequently developed a blind quality metric based on the bandpass image gradient magnitude (GM) and Laplacian of Gaussian (LOG) maps. Gu et al. [21] proposed an NR method using free-energy-based brain theory integrated with classical human visual system-inspired features and NSS features obtained by fitting the mean subtracted contrast-normalized coefficients. Liu et al. [22] collected gradient statistics features and employed an AdaBoost neural network to represent image quality. These models are sensitive to luminance variation, cannot deal with stripe noise, and cannot be applied to HSI quality assessment directly. In [23], Yang et al. designed a spectral-spatial quality metric that extracts quality-sensitive features from the image structure, texture and spectra. However, this metric estimates the quality of an HSI as a whole and cannot be applied to remove mixed-noise bands.
To address the abovementioned drawbacks, we propose a mixed-noise band selection algorithm for HSIs. Three major steps are involved. First, tensor factorization [24] is improved to acquire the mixed-noise components of real HSIs. Then, multiple statistical models are calculated from the noise components to extract the robust features. Finally, these features are fed to an extreme learning machine (ELM), which learns to select the noise bands contaminated by Gaussian and stripe noise. Experiments demonstrate that our model correlates highly with the noise degree and is robust to the image background and illumination.
In summary, we make the following three contributions: (1) A novel noise band selection framework is proposed, where mixed-noise bands of HSIs can be evaluated exactly and selected from real HSIs.
(2) A spectral smoothness constraint is designed to enhance the reconstruction performance so that Gaussian noise can be effectively extracted from low-rank components.
(3) A new unidirectional total variation method is incorporated into our model to separate the stripe noise from the noise HSIs, thereby improving the stripe noise detection rate on complicated backgrounds.
The rest of this article is organized as follows: Section II presents the related tensor decomposition model. In Section III, we give a detailed introduction and discussion of our metric. The experimental results are reported in Section IV, and related analyses are given. Finally, we conclude in Section V.

II. RELATED TENSOR DECOMPOSITION MODEL
HSIs are regarded as a third-order tensor [24] with two spatial dimensions and one spectral dimension comprising its three modes. Tensor decomposition is one of the main processing methods for HSI tensors and has developed rapidly for recovering low-rank tensors from noisy HSIs. Karami et al. [25] adopted a kernel Tucker decomposition algorithm to reconstruct HSIs, which combines spatial and spectral information to separate noise. To address the Tucker model's problems with the uniqueness of the decomposition and the prediction of multiple ranks, Liu et al. [26] proposed the parallel factor analysis (PARAFAC) method to improve performance. More recently, [27], [28] extended robust principal component analysis (RPCA) [29] to the tensor case; a better convexification was proposed compared with other tensor decompositions [30]-[32], and the method achieved excellent performance for sparse noise separation.
Suppose we are given an HSI data tensor D ∈ R^(n1×n2×n3), which can be decomposed as D = L + E, where L ∈ R^(n1×n2×n3) is the low-rank term, E ∈ R^(n1×n2×n3) is the sparse noise term, and n1, n2 and n3 are the HSI height, width and number of spectral bands, respectively. In [27], it is shown that the L component can be recovered by solving the following convex problem:

min_{L,E} ||L||_* + λ ||E||_1   s.t.   D = L + E,   (1)

where ||L||_* denotes the tensor nuclear norm [33], λ = 1/sqrt(max(n1, n2) n3) is the regularization parameter, and ||E||_1 denotes the l1-norm. Meanwhile, Fan et al. [33] classified noise data as dense noise or sparse noise, where Gaussian noise is dense and stripe and impulse noise are sparse. Therefore, the separation model for mixed noise can be represented as

D = L + E + G,

where G ∈ R^(n1×n2×n3) is the Gaussian noise term, as shown in Fig. 1: a corrupted HSI tensor D consists of a low-rank component L, a sparse noise tensor E and a dense noise tensor G. Suppose the standard deviation of the dense noise is σ; then, Equ. (1) can be expressed as

min_{L,E} ||L||_* + λ ||E||_1   s.t.   ||D − L − E||_F ≤ σ,   (2)

where ||·||_F is the Frobenius tensor norm (see Section III-A).
In [27], Lu employed a t-SVD algorithm to analyze the rank of tensors and used the alternating direction method of multipliers (ADMM) [34] to optimize the convex problem. However, the abovementioned model cannot fully utilize locally continuous spectra. More importantly, this model fails to separate stripe noise from the low-rank component, which degrades stripe noise detection. Therefore, we propose the strategy in the following sections to improve the separation performance.

III. METHOD
To overcome the aforementioned drawbacks, we improve the tensor robust principal component analysis (TRPCA) model to enhance the separation performance for the noise components of HSIs. The workflow of our selection algorithm is shown in Fig. 2. We employ tensor decomposition to separate the low-rank components from the original images and reduce the influence of the image content variations. The spectral smoothness constraint is incorporated into the low-rank component to enhance the separation performance for luminance changes between different bands of HSIs. Additionally, we design a unidirectional total variation method to separate the stripe noise from the low-rank components. After several iterations, we can reconstruct the mixed-noise components from the HSI cube. Furthermore, we extract multiple quality-sensitive parameters from the noise components and employ an ELM to predict the noise bands. In this way, the noise bands can be detected and selected accurately.

A. RELATED NOTIONS
We briefly review some tensor-related notions and preliminaries in this subsection. Throughout this article, we employ Euler script letters, e.g., A ∈ R^(n1×n2×n3), to denote tensors and boldface capital letters, e.g., A ∈ R^(n1×n2), to denote matrices. The (i, j, k)-th entry of tensor A is denoted as a_ijk, and n3 is the number of spectral bands of an HSI tensor. We use MATLAB notation to denote slices of the tensor; in particular, A^(k) = A(:, :, k) denotes the k-th frontal slice. The l1-norm of a tensor is ||A||_1 = Σ_ijk |a_ijk|, and the Frobenius norm is ||A||_F = sqrt(Σ_ijk |a_ijk|^2). We apply the MATLAB command fft to define Ā as the result of the discrete Fourier transform of A along the 3rd dimension, i.e., Ā = fft(A, [], 3), and the block diagonal matrix bdiag(Ā) has the frontal slices Ā^(k) as its diagonal blocks. The multiplication of tensors, called the t-product [27], [28], is defined below.
Definition 1 (t-Product): Consider A ∈ R^(n1×n2×n3); its block circulant matrix, of size n1 n3 × n2 n3, is defined as

bcirc(A) = [ A^(1)    A^(n3)     ...  A^(2)
             A^(2)    A^(1)      ...  A^(3)
             ...      ...        ...  ...
             A^(n3)   A^(n3−1)   ...  A^(1) ].

The unfolding and folding operators are defined as

unfold(A) = [ A^(1); A^(2); ...; A^(n3) ],   fold(unfold(A)) = A.
Given A ∈ R^(n1×n2×n3) and B ∈ R^(n2×l×n3), the t-product A * B is defined to be a tensor of size n1 × l × n3:

A * B = fold(bcirc(A) · unfold(B)).
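As a concrete illustration, the t-product can be computed in the Fourier domain instead of forming bcirc(A) explicitly: after an FFT along the third mode, the block circulant system decouples into independent matrix products per slice. A minimal NumPy sketch (function name is ours, not from the paper):

```python
import numpy as np

def t_product(A, B):
    """t-product A * B computed in the Fourier domain.

    Equivalent to fold(bcirc(A) @ unfold(B)) but cheaper: the FFT along
    the third mode block-diagonalizes bcirc(A), leaving n3 independent
    frontal-slice products."""
    n1, n2, n3 = A.shape
    assert B.shape[0] == n2 and B.shape[2] == n3
    Af = np.fft.fft(A, axis=2)                    # A_bar = fft(A, [], 3)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((n1, B.shape[1], n3), dtype=complex)
    for k in range(n3):                           # slice-wise products
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))      # back to the spatial domain
```

This is the standard trick that makes t-SVD-based models tractable: a circulant structure is diagonalized by the DFT, so the cost drops from one large matrix product to n3 small ones plus FFTs.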

Definition 2 (Conjugate Transpose):
The conjugate transpose of a tensor A of size n 1 × n 2 × n 3 is A * , with size n 2 × n 1 × n 3 , which is obtained by taking the conjugate transpose of each frontal slice and then reversing the order of the transposed frontal slices through n 3 .
Definition 3 (Orthogonal Tensor): A tensor Q ∈ R^(n1×n1×n3) is orthogonal if it satisfies Q^* * Q = Q * Q^* = I, where I ∈ R^(n1×n1×n3) is the identity tensor, whose first frontal slice is the identity matrix and whose other frontal slices are all zeros.
Definition 4 (Tensor Nuclear Norm): The tensor nuclear norm of A is defined as ||A||_* = (1/n3) Σ_{k=1}^{n3} ||Ā^(k)||_*, i.e., the average of the nuclear norms of the frontal slices of Ā = fft(A, [], 3). The tensor nuclear norm is thus defined in the Fourier domain and is closely related to the nuclear norm of the block circulant matrix in the original domain.
To complete the decomposition, matrix singular value decompositions (SVDs) need to be calculated in the Fourier domain [27]; for a detailed discussion, please refer to [27].
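For instance, the tensor nuclear norm can be evaluated directly from matrix SVDs of the Fourier-domain frontal slices; a sketch assuming the common averaged-slice definition ||A||_* = (1/n3) Σ_k ||Ā^(k)||_* used in [27]:

```python
import numpy as np

def tensor_nuclear_norm(A):
    # Average nuclear norm of the frontal slices of fft(A, [], 3):
    # sum of singular values per Fourier slice, divided by n3.
    Af = np.fft.fft(A, axis=2)
    n3 = A.shape[2]
    return sum(np.linalg.svd(Af[:, :, k], compute_uv=False).sum()
               for k in range(n3)) / n3
```

When A has a single nonzero frontal slice M, every Fourier slice equals M and the value reduces to the matrix nuclear norm of M, which is a convenient sanity check.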

B. LOCALLY SMOOTH SPECTRUM CONSTRAINT
We can use Equ. (2) to recover the low-rank component and noise component of HSIs from sparse noise, but model (2) performs poorly in the separation of HSIs with mixed noise, such as Gaussian noise and stripe noise. Therefore, we modify model (2) and add different constraints to the objective function.
Since the spectra of HSIs are locally continuous [35], two adjacent bands are not greatly different from each other. Tikhonov regularization is therefore employed by adding an extra term to the objective function:

||L_e W||_F^2,

where L_e ∈ R^((n1 n2)×n3) is the matricization [30] of the tensor L ∈ R^(n1×n2×n3) and W denotes a first-order difference operator along the spectral dimension. Essentially, this extra term ''encourages'' the algorithm to minimize the difference between the coefficients of adjacent spectra. As shown in Fig. 3, the spectrum learned with the smoothness constraint (Fig. 3(c)) resembles the original one (Fig. 3(a)) much more closely than the result without the smoothness constraint (Fig. 3(b)). Ultimately, this operation greatly improves the separation performance.
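The smoothness penalty amounts to a squared Frobenius norm of adjacent-band differences of the matricized tensor; a small sketch (unit weighting is illustrative, not necessarily the paper's exact choice):

```python
import numpy as np

def spectral_smoothness(L):
    """Tikhonov-style penalty on the matricized low-rank tensor:
    squared Frobenius norm of the first-order difference between
    adjacent spectral bands."""
    n1, n2, n3 = L.shape
    Le = L.reshape(n1 * n2, n3)          # matricize: pixels x bands
    diff = Le[:, 1:] - Le[:, :-1]        # adjacent-band differences
    return float(np.sum(diff ** 2))
```

A spectrally flat cube incurs zero penalty, while abrupt band-to-band jumps (which in practice signal leaked noise) are penalized quadratically.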
The optimization of our model can be implemented by a two-stage iterative algorithm: updating L with fixed E and updating E with fixed L. When we update the low-rank component L, the smoothness constraint is calculated in the spectral domain.

C. UNIDIRECTIONAL TOTAL VARIATION
Stripe noise, which has a linear structure, is a common degradation phenomenon in space-borne and airborne remote sensing imaging systems. It is necessary to remove the stripe noise bands for HSI application. Surface texture prevents the recognition of stripe noise, as shown in Fig. 4(a), where the HOG features are disordered. The orientation of the HOG features in Fig. 4(b) is consistent with stripe noise. Therefore, it is crucial to separate the stripe noise from the low-rank component to improve the performance of stripe noise detection.
Suppose that an image L_a ∈ R^(n1×n2) is contaminated by vertical stripe noise E_a ∈ R^(n1×n2). Unlike Gaussian noise, stripe noise is relatively sparse and has a definite direction. To better remove the stripe noise while preserving the texture information of HSIs, we incorporate a UTV constraint,

||∇_y E||_1,

where ∇_y denotes the finite-difference operator along the stripe direction. The final objective function is then

min_{L,E} ||L||_* + λ ||E||_1 + τ_1 ||L_e W||_F^2 + τ_2 ||∇_y E||_1   s.t.   ||D − L − E||_F ≤ σ,

where ||L_e W||_F^2 is the spectral smoothness term of Section III-B (W being a first-order difference along the spectral dimension) and τ_1, τ_2 are regularization parameters balancing the smoothness and UTV terms.
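To illustrate why a unidirectional penalty discriminates stripes, consider the l1 norm of the finite difference along a single axis: it vanishes along the stripe direction of a pure stripe pattern but not across it (the function and axis convention below are our illustration):

```python
import numpy as np

def utv(E, axis=0):
    # Unidirectional total variation: l1 norm of the finite
    # difference taken along one axis only.
    return float(np.abs(np.diff(E, axis=axis)).sum())
```

A vertical-stripe image (constant down each column) gives utv(E, axis=0) == 0 while utv(E, axis=1) is large; natural textures have gradients in both directions, which is the property the separation exploits.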

Algorithm 1 Separation Algorithm With ADMM
Input: tensor data D, parameter λ.
Initialization: L^0 = E^0 = 0, k = 0.
During processing, the low-rank component L is shrunk along the y direction by a proximal gradient step,

L^(k+1) = prox_{τG}(L^k − τ ∇_y L^k),

where prox_{τG} is the proximity operator described in [36], [37]; this preserves the main structure of the HSI. The termination condition is

||L^(k+1) − L^k||_F / ||L^k||_F ≤ ε,

where ε is an empirical threshold that can be adjusted to obtain different separation effects. The algorithm proposed in this article is summarized as Algorithm 1.

D. MULTIPLE STATISTICAL FEATURES OF THE NOISE COMPONENT
Distortion changes the regular statistical features of an image, and measuring the deviations of these statistical features makes it possible to predict the image quality without a reference [23]. To describe the image quality, several models are built, and multiple statistical parameters are extracted from the mixed-noise components. Fig. 5(a) shows an example noise map histogram, which clearly follows a Weibull distribution, and the curves are distinct for images with various levels of Gaussian noise. Steven et al. [38] have also suggested that the human visual system seems to be tuned to the shape and scale parameters of images with a Weibull distribution. Therefore, we build a Weibull model to fit the noise maps.

1) WEIBULL STATISTICS
The Weibull probability density function is

f(x; k, λ1) = (k/λ1)(x/λ1)^(k−1) e^(−(x/λ1)^k),   x ≥ 0,

where x is an element of the noise components and k and λ1 are the shape and scale parameters, respectively. Fig. 5(b) shows the shape parameter k under Gaussian noise with σ = 0.001, 0.005, 0.010, 0.050. Clearly, the acquired parameter k increases with increasing Gaussian noise. More importantly, each curve is relatively stable across different channel images, proving that our feature is insensitive to illumination and content.
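As a sketch of the fitting step, SciPy's weibull_min can recover the shape and scale parameters from noise magnitudes (the 0.05 noise level and sample size are arbitrary examples, not the paper's settings):

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
# Magnitudes of simulated Gaussian noise residuals stand in for a noise map.
sample = np.abs(rng.normal(0.0, 0.05, size=20000))
# Fix the location at 0 so only shape k and scale lambda_1 are estimated.
k, loc, lam1 = weibull_min.fit(sample, floc=0)
```

The pair [k, lam1] then serves as the two-dimensional Weibull feature that is later fed to the ELM.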

2) HOG FEATURES
Since stripe noise has a linear structure, we adopt the HOG features [39] to detect it in the noise components. The structure of the HOG descriptor applied in our algorithm is shown in Fig. 6. In this article, the images have dimensions of 64 × 64, a cell is 8 × 8 pixels in size, and each cell consists of 9 orientation bins. Each block, which contains 2 × 2 cells, yields a 36-dimensional feature vector. Then, the whole image, which includes 7 × 7 blocks, has a 1764-dimensional descriptor vector. These high-dimensional descriptor vectors are used to train an ELM model to determine whether the images are affected by stripe noise. The model can be further optimized in practical applications.
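The descriptor length quoted above follows from sliding-block arithmetic; a small self-check (parameter names are ours):

```python
def hog_descriptor_length(img=64, cell=8, block=2, bins=9, stride=1):
    """Length of a HOG descriptor for a square image.

    img:    image side in pixels,   cell:  cell side in pixels,
    block:  block side in cells,    stride: block stride in cells.
    """
    cells = img // cell                      # 64/8 = 8 cells per side
    blocks = (cells - block) // stride + 1   # (8-2)/1+1 = 7 blocks per side
    return blocks * blocks * block * block * bins
```

With the defaults this returns 1764 = 7 × 7 × 2 × 2 × 9, matching the dimensionality stated above.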

E. NOISE BAND SELECTION BASED ON AN EXTREME LEARNING MACHINE
In this subsection, we introduce the band selection process of this work. Different approaches for feature mapping are compared, such as support vector regression (SVR) [16], [17] and neural networks (NNs) [22]. Usually, the extracted features are input into these models to fit a complex relationship. However, learning-based tools are often time consuming and suffer from problems such as overfitting and local optimization. Therefore, we employ an emergent machine technique, i.e., an extreme learning machine, to predict the noise bands.
In [40]-[42], Huang et al. originally proposed the ELM for generalized single-hidden-layer feedforward neural networks (SLFNs), which has been used in various applications [43]-[45]. The ELM aims to learn an approximation function based on the training data. An SLFN with K hidden nodes can be represented by the following equation:

f(x_i) = Σ_{j=1}^{K} β_j g(a_j · x_i + b_j),

where a_j is the input weight vector connecting the inputs to the j-th hidden node, b_j is the bias of the j-th hidden node, g(·) is the activation function, and β_j is the output weight of the j-th hidden node. The activation function g(·) can be any nonlinear piecewise continuous function, such as the sigmoid function

g(x; a, b) = 1 / (1 + exp(−(a · x + b)))   (12)

or the radial basis function (RBF)

g(x; a, b) = exp(−b ||x − a||_2^2),   (13)

where θ = (a, b) are the parameters of the mapping function and ||·||_2 denotes the Euclidean norm.
In [41], Huang et al. proved that SLFNs with the above sigmoid and RBF activation functions can approximate any continuous target function over any compact subset X. Training an ELM is equivalent to solving a regularized least-squares problem, which is considerably more efficient than training an SVM or learning with back-propagation. Therefore, in our model, an ELM is employed to map the HSI features [k, λ1] to different noise degrees and to classify the HOG features of stripe noise bands in various scenes.

IV. EXPERIMENT
We first introduce the test data and evaluation criteria. Then, we compare the performance of our proposed method with other denoising methods, including TV [36], TGV [36], BM4D [46], NMoG [47], and TRPCA [28] for the low-rank component. We report the mean peak signal-to-noise ratio (MPSNR) and mean structural similarity index (MSSIM) values of the reference images and reconstructed images. We also conduct an experiment on the data collected from the real sensor, and the corresponding results are presented. Furthermore, we compare the performance of our proposed method with other no-reference indices, including BLIINDS2 [16], BRISQUE [17], NIQE [18], SSEQ [19], GM-LOG [20] and OG [22]. We report the obtained Spearman rank-order correlation coefficient (SRCC) and Pearson linear correlation coefficient (PLCC) values between the image prediction result and the corresponding noise degree. The scatter plots of the predictions versus the image noise degree are provided. The detection results for the stripe noise are also presented.

A. EXPERIMENTAL IMAGE AND EVALUATION CRITERIA
Two datasets are used in the experiments. The first dataset is from Pavia University; the size of the tested images is 300 × 300 × 103. Gaussian noise and mixed noise (Gaussian and stripe noise) are added to the test images. The second dataset was acquired by the airborne visible infrared imaging spectrometer (AVIRIS) sensor and contains 224 spectral bands in the range of 400-2500 nm. The size of each hyperspectral image is 200 × 200 × 224. We randomly selected 120 HSIs with diverse image contents, including rural, urban, port, mountain, agricultural, and water areas, as shown in Fig. 7. The pixel values of the HSIs are normalized to [0, 1].
The MPSNR [48] measures the mean of the peak signalto-noise ratio over all bands between the original HSI and reconstructed HSI.
The MSSIM [11] measures the mean of the structural similarity (SSIM) between a reference HSI and the recovered HSIs over all channels.
The PLCC [49] measures the prediction accuracy of different quality metrics.
The SRCC [49] measures the prediction monotonicity of different quality metrics. A better quality metric has both a higher PLCC and higher SRCC [50].
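Both criteria are available in SciPy; a toy sketch with made-up scores (a metric whose score decreases as noise grows yields strong negative correlations):

```python
from scipy.stats import pearsonr, spearmanr

noise_levels = [0.00, 0.01, 0.02, 0.03, 0.04]    # hypothetical degradations
quality_scores = [0.95, 0.80, 0.55, 0.40, 0.15]  # hypothetical predictions

plcc, _ = pearsonr(quality_scores, noise_levels)
srcc, _ = spearmanr(quality_scores, noise_levels)
```

Perfectly monotone scores give |SRCC| = 1 even when the relationship is not linear, while the PLCC additionally rewards linearity; this is why both are reported.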

B. SEPARATION PERFORMANCE FOR GAUSSIAN NOISE DATA
To compare our model with other separation algorithms [28], [36], [46], [47], we first add Gaussian noise with a standard deviation of σ = 0.1 to the Pavia University dataset. Fig. 8 shows the separation performance of the different algorithms, and the MPSNR and MSSIM values of the compared methods are presented in Table 1. Fig. 8(a) is the Gaussian noise-corrupted image, and Fig. 8(b)-(e) are the corresponding separation results of TGV, BM4D, NMoG and TRPCA. The results show that BM4D can filter Gaussian noise effectively but suffers from the computational complexity of its block-based processing; furthermore, BM4D performs poorly on stripe noise. NMoG preserves structure and texture well, while its MPSNR is relatively low. Our method performs well on both the MPSNR and MSSIM. The denoised results show that our method can separate the low-rank and noise components effectively.

C. SEPARATION RESULTS FOR MIXED-NOISE DATA
We also add mixed noise, including Gaussian noise and stripe noise, to the Pavia University dataset. The standard deviation of the Gaussian noise is 0.1. The stripe noise is added as follows: suppose the input image is I ∈ R^(n1×n2) and m_r ∈ R^(1×n2) is a random vector drawn from N(0, 1). The stripe noise image I_str is given by

I_str(i, j) = I(i, j) + µ_r(i) m_r(j),

where µ_r(i) is the mean value of row I(i, :), i.e., µ_r(i) = (1/n2) Σ_{j=1}^{n2} I(i, j), and ϕ = 0.3 is the percentage of columns corrupted by stripe noise. The mixed-noise image is shown in Fig. 9, and Fig. 9(b)-(e) are the corresponding separation results of TGV, BM4D, NMoG and TRPCA. The MPSNR and MSSIM values of the compared methods are presented in Table 2. BM4D can remove Gaussian noise but performs poorly on stripe noise. NMoG achieves an MSSIM similar to that of our model, while its MPSNR remains relatively low. It is obvious that our approach is superior to the others, which demonstrates the efficiency of the multiple constraints adopted in our method.
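The stripe-injection procedure can be sketched as follows (a hypothetical rendering of the model above; the choice of which columns carry stripes is our assumption):

```python
import numpy as np

def add_stripes(I, phi=0.3, seed=0):
    """Add vertical stripes to a fraction phi of the columns of I.

    Each selected column j receives an offset mu_r(i) * m_r(j), where
    mu_r(i) is the mean of row i and m_r(j) ~ N(0, 1); m_r is zero on
    the unselected columns."""
    rng = np.random.default_rng(seed)
    n1, n2 = I.shape
    cols = rng.choice(n2, size=int(phi * n2), replace=False)
    m = np.zeros(n2)
    m[cols] = rng.standard_normal(len(cols))   # m_r, zero off the stripes
    mu = I.mean(axis=1, keepdims=True)         # row means mu_r(i)
    return I + mu * m[np.newaxis, :]
```

Scaling the offsets by the row means ties the stripe amplitude to the local image brightness, so bright scenes receive proportionally stronger stripes.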

D. SEPARATION RESULTS FOR ACTUAL NOISY DATA
We compare our model with BM4D, NMoG, and TRPCA on real HSI datasets acquired from the AVIRIS sensor. Some bands of AVIRIS hyperspectral images are seriously polluted by the atmosphere and water absorption. We use all of them without removing any band data to demonstrate the robustness of the proposed method for denoising. Fig. 10 shows bands 2, 107 and 108 of AVIRIS for all competing methods. The original images are polluted by complex noise, such as Gaussian noise, stripe noise, impulse noise and blur. Our model still obtains a better separation visualization and preserves the structure and texture of the original images well. The reconstructed results substantiate the robustness of the proposed method in noise component separation.

E. GAUSSIAN NOISE QUALITY PERFORMANCE COMPARISON
To compare the proposed algorithm with other quality metrics [16]- [20], [22] on high-frequency noise assessment, we add white Gaussian noise to the second database, which contains 120 HSIs, where each HSI consists of 189 spectral bands (without the water absorption bands and noisy bands). The variance σ 2 of Gaussian noise increases from 0 to 0.004 by 0.0002 in each band. We randomly select 5%, 10% and 15% of the images from the noisy data to train the ELM model, and the rest are used for testing. The PLCCs and SRCCs of  the objective quality scores and the corresponding Gaussian noise degrees are presented in Table 3. Clearly, our proposed model delivers a better PLCC and SRCC than the compared methods. Moreover, the PLCCs and SRCCs of the models grow with increasing training data, which accords with the actual conditions. Fig. 11 shows the scatter plots of the Gaussian noise degree versus the compared model in the second dataset. Note that the scatter plots of the proposed model exhibit good linearity, tight clustering, and a relatively uniform density. They are consistent with the high PLCC and SRCC shown in the bottom row of Table 3.

F. STRIPE NOISE BAND SELECTION USING HOG FEATURES
We apply our model to the HSIs and separate the noise components of the HSIs. We use 15% of the data to train the ELM model, and the confusion matrices of our models are shown in Fig. 12, indicating that our algorithm effectively distinguishes stripe noise. The recalls for stripe noise detection on the raw HSIs and on the corresponding reconstructed noise components are 0.924 and 0.994, respectively, satisfying the demands of practical application. More importantly, the recall clearly increases when the noise component is used, demonstrating that tensor factorization plays a significant role in stripe noise detection.

V. CONCLUSION
In this work, we study mixed-noise band selection for HSIs. First, the tensor decomposition model is improved by adding different constraints for separating the mixed-noise components from the low-rank tensor. Then, multiple statistical features are calculated to extract the robust parameters. Finally, an ELM is trained to select the noise bands from real HSIs. The experimental results show that our model detects noise images more accurately and robustly than the compared methods.