A Denoising Network Based on Frequency-Spectral-Spatial-Feature for Hyperspectral Image

The poor quality of hyperspectral images (HSIs) seriously impedes subsequent high-level vision tasks such as image segmentation, image encoding, and target detection. However, existing image denoising algorithms do not fully exploit the frequency, spectral, and spatial properties of noisy HSIs. To address this issue, a novel convolutional network based on united Octave convolution and attention mechanism (UOANet) is proposed to extract the frequency-spectral-spatial feature for removing real noise from HSIs. In particular, negative residual mapping embedded in Unet is proposed for multiscale abstract representation, and two modules are designed for modeling global noisy HSI features in the frequency-spectral-spatial domain. First, with the residual Octave convolution module, our model can focus on the intrinsic properties of the HSI noise distribution for effective noise removal. Next, a parallel spatial-spectral attention module is used to fully exploit the rich spectral information and the varied spatial information of each band in the HSI, which improves the richness of HSI details after denoising. Experimental results on both synthetic and real HSIs demonstrate the validity and superiority of UOANet compared with state-of-the-art methods under various noise settings.

A hyperspectral image (HSI) can collect the signal of the whole electromagnetic spectrum, allowing researchers to obtain the spectral characteristics of various substances in a specific wave band and to analyze the physical properties of those substances. Therefore, hyperspectral images are widely used in various fields, including ground object recognition [1], [2], [3], water retrieval [4], [5], and target tracking [6], [7]. However, hyperspectral images inevitably suffer from various corruptions and degradations, and contaminated observations seriously impede subsequent high-level vision tasks. As a result, it is of great importance to denoise HSIs before performing high-level tasks.

A. Related Work
This section briefly introduces the recent hyperspectral denoising works and the Octave-based approaches. Hyperspectral denoising has always been an ill-posed problem. Effective conditions that assist the denoising are required to address this issue, and various denoising approaches have been suggested to handle different types of noises. The existing methods can be coarsely divided into two categories: 1) Model-based approaches and 2) CNN-based approaches.
1) Model-Based Approaches: Most early hyperspectral remote sensing images were denoised by filtering techniques, which can be divided into spatial-domain filtering and transform-domain filtering. Spatial filtering is the most direct method for image denoising; it works by combining the adjacent pixels in a window to achieve local smoothing. For example, Dabov et al. [8] proposed the BM3D algorithm, which is used for three-dimensional (3-D) data denoising and can be directly applied to hyperspectral image denoising. To enhance the denoising effect, Maggioni et al. [9] proposed BM4D, which extends BM3D by employing 3-D cubes of voxels that are stacked into a 4-D group, and models the correlation between bands by jointly processing the multidimensional image data. In addition, 1-D signal or 2-D image filtering methods can be extended to denoise the hyperspectral data cube. Heo et al. [10] proposed a joint bilateral filter for hyperspectral image denoising, in which the bilateral filter and the fused image are applied to the hyperspectral image after all bands of the image have been weighted.
These filter-based approaches are simple and efficient. The key is the filter design and the selection of the noise threshold. For periodic noise, the signal components can be separated accurately. However, mixed noise without obvious distribution features can easily lead to spatial losses such as local oversmoothing and detail blurring. Additionally, both the filter design and the selection of the noise threshold depend on expert priors, which cannot satisfy the demands of intelligent data processing.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In general, because they exploit the structural characteristics of HSIs, regularization-based approaches are more flexible and can be used for different types of noise. The key is the design of the prior term and the selection of optimal regularization parameters. However, because of the limitations of hand-crafted noise models and prior information, the generalization performance of a single model is limited, making it difficult to apply to the different types of noise pollution found in real scenes. What is more, it is challenging to meet the practical processing requirements of massive high-dimensional hyperspectral remote sensing images.
2) Convolutional Neural Network (CNN) Based Approaches: With the development of deep learning technology, the CNN has recently seen widespread use in several low-level vision tasks of HSIs because of its excellent nonlinear fitting capability and automatic feature selection. In recent research, many scholars have attempted to build an appropriate denoising network by training the network to implicitly learn the relationship between the model parameters and the noisy image. In this way, effective noise reduction can be achieved without relying on manual constraints. Zhang et al. [28] proposed the 2-D image denoising architecture DnCNN to remove various noises in HSIs. They argued that the learned filters can extract the spatial structural information well. Furthermore, utilizing residual learning based on DnCNN, Chang et al. [29] proposed HSI-DeNet, which can remove many types of noise. Other methods add image gradient information to the network, such as the networks proposed by Maffei et al. [30] and Yuan et al. [31], which take the spectral data and the directional image gradient information as the network input to remove noise. Recently, the attention mechanism has played a critical role in computer vision tasks, and many works have applied it to explore the correlation between the spatial and spectral properties in HSI denoising. For example, Zou et al. [48] proposed an enhanced channel attention to make the network focus on features that are more conducive to spectral reconstruction. Wang et al. [49] applied the attention mechanism to select distinctive pixels in the feature maps for HSI denoising. Li et al. [50], [51] applied the vision transformer to capture the nonlocal self-similarity of HSIs. On the other hand, given that deep learning lacks interpretability, some scholars combine model-based and learning-based approaches [52], [53], [54].
However, these studies currently focus mainly on simple Gaussian noise and have difficulty handling complex real noisy HSIs acquired by different sensors with varying numbers of bands.
To sum up, although many CNN-based methods have been developed for hyperspectral image denoising, most of these approaches rely heavily on a large amount of HSI data, resulting in low generalization and significant parameter redundancy. Therefore, fully mining the structural characteristics of real hyperspectral remote sensing images is an important task for improving the denoising effect of CNN-based approaches.
3) Octave-Based Approaches: Usually, a natural image can be defined as a discrete frequency signal, and the frequency distribution of a noisy image Y can be expressed as a combination of high-frequency information and low-frequency information, represented as $F(Y) = \{F_H(Y), F_L(Y)\}$. Based on the Octave convolution proposed by Chen et al. [37], we can separate the feature channels of an image by convolution and obtain the high- and low-frequency information of the image, from which the frequency distribution of the noise is derived.
As shown in Fig. 1, in the Octave kernel, the ratio α represents the low-frequency proportion. The low-frequency features are represented by α × c channels, whose spatial resolutions are decreased to 0.5 H × 0.5 W. The high-frequency features are represented by the (1 − α) × c channels, whose spatial resolutions remain H × W.
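In detail, because the input and output features are separated by frequency, the Octave convolution weight W also needs to be separated, represented as $\{W_H, W_L\}$, and the feature vectors of different frequencies are fused. Specifically, $F_{H\to H}(Y)$ and $F_{L\to L}(Y)$ denote intra-frequency forward propagation, while $F_{L\to H}(Y)$ and $F_{H\to L}(Y)$ denote inter-frequency forward propagation. Following [37], let $Y_H$ and $Y_L$ denote the high- and low-frequency input features, let $f(\cdot\,; W)$ denote a convolution with weights $W$, and let $W_H = [W_{H\to H}, W_{L\to H}]$ and $W_L = [W_{L\to L}, W_{H\to L}]$. The output features of the low and high frequencies are as follows:
$$\begin{aligned} F_L(Y) &= F_{L\to L}(Y) + F_{H\to L}(Y) = f(Y_L; W_{L\to L}) + f(\mathrm{pool}(Y_H, 2); W_{H\to L}) \\ F_H(Y) &= F_{H\to H}(Y) + F_{L\to H}(Y) = f(Y_H; W_{H\to H}) + \mathrm{upsample}(f(Y_L; W_{L\to H}), 2) \end{aligned} \tag{2}$$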
where upsample(·) represents the up-sampling operation using the nearest neighbor method, and pool(·) represents the down-sampling operation using average pooling. Because of its ability to extract high-frequency features, Octave convolution has been applied in many HSI high-level tasks [55], [56], [57]. Following this line, we introduce this structure into our task for more effective HSI denoising.
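The frequency exchange described above can be sketched in NumPy. This is only a minimal illustration that follows the general Octave convolution formulation of [37], not the authors' implementation: it uses pointwise (1 × 1) convolutions for brevity, and the helper names (`avg_pool2`, `upsample2`, `conv1x1`, `octave_conv`) as well as the toy sizes are assumptions made for the example.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbor up-sampling by a factor of 2 on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """Pointwise convolution; weight is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def octave_conv(y_h, y_l, w_hh, w_hl, w_lh, w_ll):
    """Hypothetical 1x1 Octave convolution with two inputs and two outputs."""
    f_hh = conv1x1(y_h, w_hh)                 # intra-frequency, high -> high
    f_ll = conv1x1(y_l, w_ll)                 # intra-frequency, low -> low
    f_hl = conv1x1(avg_pool2(y_h), w_hl)      # inter-frequency, high -> low
    f_lh = upsample2(conv1x1(y_l, w_lh))      # inter-frequency, low -> high
    return f_hh + f_lh, f_ll + f_hl

# Toy sizes (assumed): c channels, ratio alpha of low-frequency channels.
rng = np.random.default_rng(0)
c, height, width, alpha = 8, 16, 16, 0.25
c_l = int(alpha * c)
c_h = c - c_l
y_h = rng.standard_normal((c_h, height, width))            # H x W resolution
y_l = rng.standard_normal((c_l, height // 2, width // 2))  # 0.5H x 0.5W
w_hh = rng.standard_normal((c_h, c_h))
w_hl = rng.standard_normal((c_l, c_h))
w_lh = rng.standard_normal((c_h, c_l))
w_ll = rng.standard_normal((c_l, c_l))

out_h, out_l = octave_conv(y_h, y_l, w_hh, w_hl, w_lh, w_ll)
print(out_h.shape, out_l.shape)
```

The high-frequency branch keeps the full spatial resolution, while the low-frequency branch works on the half-resolution grid, which is where the computational saving of the Octave design comes from.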

B. Motivations and Contributions
To overcome the shortcomings mentioned previously, this article proposes a denoising method based on the physical characteristics of hyperspectral noisy images to balance denoising performance and retain the noise-free component.
1) Spectral-Feature: An HSI has a large number of strongly correlated spectral bands, which affects denoising performance. However, noise is randomly spread across different bands and lacks high spectral correlation. Therefore, capturing the strong spectral correlation is a key issue for better denoising performance.
2) Spatial-Feature: In natural scenes, most of the components in an image are similar. Capturing the local and nonlocal spatial similarities benefits the restoration of HSI structural details. However, because of its random shape and distribution over different regions, noise lacks high spatial correlation. Therefore, mining high correlation in the HSI spatial domain is of great significance for denoising.
3) Frequency-Feature: The high-frequency component is considered to contain more noise and texture details, while the low-frequency component is considered to contain more content. This feature motivates us to focus the network on the high-frequency parts to better suppress noise and preserve the HSI content.
From the above, the high correlation in the spectral and spatial domains and the high-frequency distribution of noise are significant factors to be considered. Based on these feature priors, we propose UOANet to extract the frequency-spectral-spatial feature for removing the real noise of HSIs. Specifically, first, we introduce the parallel spatial-spectral attention mechanism to extract the highly correlated information in the spectral-spatial domain. Second, inspired by Chen's work [37], we introduce Octave convolution to separate the high-frequency and low-frequency information. This allows the network to focus on learning high-frequency noise information and reduces the computation over the solution space. Finally, we train UOANet end-to-end to learn all of its parameters. The contributions of this article can be summarized as follows.
1) To better utilize the frequency-spectral-spatial feature for HSI denoising, we propose two noisy-image feature extraction modules, the residual Octave convolution module (ResOct) and the parallel spatial-spectral attention module (SSAT). The novel ResOct module is introduced in the encoding phase to extract high-frequency features, allowing the network to locate the noise information. Considering the spectral-spatial relationships between HSI pixels, in the decoding phase, an innovative spatial-spectral attention mechanism, SSAT, is proposed for noise feature learning, which fully captures the correlation information in feature maps.
2) An end-to-end denoising scheme is proposed based on the physical characteristics of hyperspectral noisy images, which considers the frequency distribution of noise and the spatial and spectral correlation of hyperspectral images. What is more, we use negative residual mapping to significantly reduce the mapping range, ensuring the generalization of the model.
3) Experimental results on both synthetic and real HSI datasets confirm that our proposed model achieves comparable or better performance than other state-of-the-art methods in the richness of image high-frequency details and in model convergence.
The rest of this article is organized as follows. In Section II, we introduce the proposed network in detail. In Section III, we conduct a variety of experiments on synthetic and real HSI datasets. In Section IV, we verify the effectiveness of each module design. Finally, Section V concludes this article.

II. PROPOSED MODEL
An HSI is degraded by many factors during the imaging process. Therefore, it is necessary to improve the quality of hyperspectral imaging and increase its capacity for expression and information extraction. Image degradations caused by various mechanisms produce various types of noise. In this article, we discuss additive and signal-independent noise (specifically, Gaussian noise, impulse noise, deadline noise, and stripe noise), which can be linearly modeled as
$$Y = X + N$$
where $Y \in \mathbb{R}^{H \times W \times B}$ is the observed noisy HSI, $X \in \mathbb{R}^{H \times W \times B}$ is the clean HSI, and $N \in \mathbb{R}^{H \times W \times B}$ is the additive random noise. H, W, and B indicate the spatial height, spatial width, and number of spectral bands, respectively. Given a noisy HSI, our goal is to recover the clean HSI X from the observed noisy HSI Y. In this section, we introduce the overall network architecture of UOANet for HSI denoising and then present the core building blocks of our network in detail.

A. Overall Network Architecture
As shown in Fig. 2, UOANet takes Unet as the backbone and noisy HSIs as the input to predict clean HSIs. Unet makes better use of image context and location information than a plain CNN and is widely used in various low-level tasks [32], [33], [34], [35]. The network contains four encoding layers and three decoding layers. The left side of the network is the feature extraction network (encoder), which obtains the abstract semantic features. The right side is the feature fusion network (decoder), which reconstructs the clean HSI from the denoised semantic features. Specifically, we adopt nearest neighbor interpolation for resizing to help recover details lost during encoding. Symmetric skip connections are added in each layer to facilitate information transfer across levels and to promote gradient back-propagation, which helps network training.
Compressing the mapping range is crucial for narrowing the solution space and enhancing network learning [36]. As shown in Fig. 3, taking the Indian Pines hyperspectral image as an example, with the noisy band denoted as Y and the clean band as X, we observe that, compared with the clean image X, the residual Y − X of the noisy image has a significantly reduced range of pixel values. This implies that the residual can be introduced into the network to aid mapping learning. This skip connection can also directly propagate lossless information through the entire network, which is useful for estimating the final denoised image. In light of this idea, the mean squared error is employed as the loss function over a training group of N triplets $\{y_i, s_i, x_i\}_{i=1}^{N}$, where $y_i$ is the observed corrupted ith band data, $x_i$ is the corresponding ith noise-free data, and $H(\cdot)$ represents the UOANet.
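The range-reduction argument can be checked numerically with a short NumPy sketch. Everything here is illustrative: the clean band is random stand-in data rather than Indian Pines, the noise level `sigma = 0.05` is an assumed value, and `mse_residual_loss` is a hypothetical helper that assumes the network is trained to predict the residual Y − X, which is one possible reading of the residual mapping and not necessarily the exact loss used by UOANet.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.05                                    # assumed noise level
x = rng.uniform(0.0, 1.0, size=(64, 64))        # stand-in clean band X
y = x + rng.normal(0.0, sigma, size=x.shape)    # noisy band Y = X + N

residual = y - x                                # mapping target Y - X
range_clean = x.max() - x.min()
range_residual = residual.max() - residual.min()
print(f"range of X:     {range_clean:.3f}")
print(f"range of Y - X: {range_residual:.3f}")

def mse_residual_loss(pred_residual, noisy, clean):
    """Mean squared error between a predicted residual and Y - X
    (hypothetical loss form, labeled as an assumption)."""
    return float(np.mean((pred_residual - (noisy - clean)) ** 2))

# A perfect residual prediction gives zero loss, and the denoised
# estimate is then recovered as Y minus the predicted residual.
perfect = residual.copy()
loss_perfect = mse_residual_loss(perfect, y, x)
x_hat = y - perfect
```

Under these assumptions the residual occupies a much narrower value range than the clean band, which is the property the residual mapping relies on.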

B. Residual Octave Convolution Module
Inspired by Chen's work [37], we compare the spectra of the noisy and clean HSIs of the same scene in the frequency domain. As shown in Fig. 4, compared with the clean background spectrum, the spectrum of the noisy HSI diffuses energy from the high-frequency part to the surroundings. Therefore, the noise frequency distribution F(N) mostly exists in the high-frequency information and can be captured by the two-branch structure. Let $F_{H\to H}(Y)$ represent the noise frequency present in the high-frequency component and $F_{L\to H}(Y)$ the noise frequency present between the high- and low-frequency components. The noise frequency distribution can then be expressed as
$$F(N) = F_{H\to H}(Y) + F_{L\to H}(Y) \tag{5}$$
Substituting (5) into (2) expresses F(N) through the high-frequency output of the Octave convolution.
As shown in Fig. 5(a), we introduce the ResOct in the encoding layers of UOANet, enabling the network to locate the noise information. ResOct consists of residual blocks (RoctB) and long and short skip connections. First, the high- and low-frequency information is separated by an Octave convolution, and each branch is then passed through batch normalization (BN) and ReLU activation. After three RoctBs in series, the high- and low-frequency information is recombined by an inverse Octave convolution. Finally, the edge information of the original image is fused into the output through a long skip connection.
The ResOct structure combines Octave convolution with the residual structure to enable cross-layer feature interaction and to extract deeper high-frequency semantic features while reducing the impact of low-frequency features. By preserving the raw information and avoiding vanishing gradients, this structure optimizes the network training process and effectively improves the effect of hyperspectral image denoising.
The structure of RoctB is shown in Fig. 5(b). The design of the skip connection follows the structure of ResNet50: instead of simply adding the identity mapping, the identity branch passes through an Octave convolution followed by a BN layer and a ReLU layer. The high- and low-frequency features extracted by the Octave convolution are then fused along the feature channel, which makes the model converge more easily and makes network training simpler and more efficient.
The operations of the ResOct are represented as

C. Parallel Spatial-Spectral Attention Module
As 3-D data, the hyperspectral image is characterized by its spectral-spatial structure, global spectral correlation, and local/nonlocal spatial interactions. To model the spatial and spectral coherence of HSIs, an attention mechanism is introduced for more detailed clean HSI restoration. First, we resize the encoded feature map, concatenate it along the channel with the shallow encoded feature map of the same size, and use a 1 × 1 convolution to model the global context. Then, we design SSAT to adaptively recalibrate spatial, spectral, and channel characteristics. The SSAT, which adopts the ideas of CBAM [38], weights the feature map to better align the denoising result with the physical properties of HSIs.
The structure of SSAT is shown in Fig. 6. It consists of two parts: 1) a spatial attention module and 2) a spectral attention module. Because what later modules learn is affected by what earlier modules have processed, cascading spatial and channel attention in either order [39] makes the model unstable and cannot guarantee an effective improvement. Therefore, this article integrates the spatial and spectral attention information simultaneously to avoid such interference, with λ1 and λ2 as the weighting parameters of the two branches.
1) Spatial Attention Module: Spatial attention focuses on the interspatial domain. First, average pooling and max pooling operations along the channel axis are used to generate the descriptors $F^S_{avg} \in \mathbb{R}^{M \times N \times 1}$ and $F^S_{max} \in \mathbb{R}^{M \times N \times 1}$. The two descriptors are concatenated and fed into a vanilla convolution. The spatial attention process can be represented as
$$M_S = \sigma\left(f\left(\left[F^S_{avg}; F^S_{max}\right]\right)\right)$$
where $M_S$ is the spatial attention map, σ is the Sigmoid activation function, and f is a 2-D convolution with a 7 × 7 kernel.

2) Spectral Attention Module:
This module aggregates the feature maps along the spatial domain using average pooling and max pooling. The two pooled descriptors are forwarded through a shared multilayer perceptron (MLP) with a single hidden layer, and the channel attention map $M_C$ is created by merging the results of the two branches. The process can be represented as
$$M_C = \sigma\left(W_1\left(W_0\left(F^C_{avg}\right)\right) + W_1\left(W_0\left(F^C_{max}\right)\right)\right) \tag{10}$$
where σ is the Sigmoid activation function and $W_1$ and $W_0$ are the shared MLP parameters. $F^C_{avg} \in \mathbb{R}^{1 \times 1 \times B}$ and $F^C_{max} \in \mathbb{R}^{1 \times 1 \times B}$ are the features generated by the average and max pooling operations in the spatial domain, respectively.
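The two attention branches and their parallel combination can be sketched in NumPy as follows. This is a hedged illustration in the spirit of CBAM-style attention, not the authors' SSAT code. Several points are assumptions: the channel-first layout `(B, M, N)`, the ReLU inside the shared MLP, the hidden width `B // 4`, the random (untrained) weights, and in particular the fusion rule `lam1 * (M_S * F) + lam2 * (M_C * F)`, which is only one plausible way of weighting the two branches by λ1 and λ2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Zero-padded 2-D convolution (cross-correlation form) that maps
    x of shape (C, M, N) to a single (M, N) map; kernel is (C, kh, kw)."""
    _, m, n = x.shape
    _, kh, kw = kernel.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((m, n))
    for i in range(kh):
        for j in range(kw):
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + m, j:j + n])
    return out

def spatial_attention(f, kernel):
    """Pool along the band axis, concatenate, apply a 7x7 conv and a sigmoid."""
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])   # (2, M, N)
    return sigmoid(conv2d_same(desc, kernel))[None]     # (1, M, N)

def spectral_attention(f, w0, w1):
    """Pool over the spatial axes and pass both descriptors through a shared MLP."""
    avg, mx = f.mean(axis=(1, 2)), f.max(axis=(1, 2))   # (B,) each
    mlp = lambda v: w1 @ np.maximum(w0 @ v, 0.0)        # ReLU is an assumption
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]   # (B, 1, 1)

def ssat(f, kernel, w0, w1, lam1=0.5, lam2=0.5):
    """Parallel fusion of both branches (assumed weighting form)."""
    return (lam1 * spatial_attention(f, kernel) * f
            + lam2 * spectral_attention(f, w0, w1) * f)

rng = np.random.default_rng(0)
bands, m, n = 8, 8, 8
f = rng.standard_normal((bands, m, n))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
w0 = rng.standard_normal((bands // 4, bands))
w1 = rng.standard_normal((bands, bands // 4))
out = ssat(f, kernel, w0, w1)
print(out.shape)
```

Because both branches read the same input feature map, neither attention map depends on the output of the other, which is the property the parallel design is meant to provide.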

A. Training Experimental Settings
1) Training Data Set: We conduct several experiments using data from the ICVL hyperspectral dataset [40], which comprises 201 images at a resolution of 1392 × 1300 over 31 spectral bands. We use 100 images for training, 1 image for validation, and 40 images for testing. The training set is expanded by cropping the images into 64 × 64 × 31 patch pairs. Each image is normalized to [0, 1].
Then, ten bands in the ICVL data set are chosen randomly to add deadline noise. The number of deadlines in each band is 5% to 15% of the columns.
Case 4: Gaussian + Impulse noise. All bands are corrupted by Gaussian noise as mentioned in case 1. Then, ten bands in the ICVL data set are randomly selected to add impulse noise with different intensities, and the percentage of impulse noise ranges from 10% to 70%.
Case 5: Mixture noise. First, all bands are corrupted by Gaussian noise as previously mentioned. Then each band is randomly corrupted by at least one kind of the other four noise mentioned previously.
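The deadline and impulse corruptions described above can be simulated with a few lines of NumPy. This is a sketch under stated assumptions rather than the authors' data pipeline: the Gaussian level `sigma = 0.1` is a placeholder (the exact setting of case 1 is not given here), deadlines are modeled as columns set to zero, impulse noise is modeled as salt-and-pepper values, and the function names are hypothetical.

```python
import numpy as np

def add_gaussian(x, sigma, rng):
    """Add zero-mean Gaussian noise to every band (sigma is an assumed value)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def add_deadlines(y, rng, n_bands=10, frac=(0.05, 0.15)):
    """Zero out 5% to 15% of the columns in n_bands randomly chosen bands."""
    y = y.copy()
    _, w, b = y.shape
    bands = rng.choice(b, size=n_bands, replace=False)
    for k in bands:
        n_lines = int(rng.uniform(*frac) * w)
        cols = rng.choice(w, size=n_lines, replace=False)
        y[:, cols, k] = 0.0
    return y, bands

def add_impulse(y, rng, n_bands=10, ratio=(0.10, 0.70)):
    """Replace 10% to 70% of the pixels in n_bands bands by 0 or 1."""
    y = y.copy()
    h, w, b = y.shape
    bands = rng.choice(b, size=n_bands, replace=False)
    for k in bands:
        hit = rng.random((h, w)) < rng.uniform(*ratio)
        salt = rng.random((h, w)) < 0.5
        band = y[:, :, k]            # view into y, modified in place
        band[hit & salt] = 1.0
        band[hit & ~salt] = 0.0
    return y, bands

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(64, 64, 31))      # stand-in 64 x 64 x 31 patch
y_gauss = add_gaussian(x, sigma=0.1, rng=rng)
y_dead, dead_bands = add_deadlines(y_gauss, rng)   # Gaussian + deadline
y_imp, imp_bands = add_impulse(y_gauss, rng)       # Gaussian + impulse (case 4)
print(sorted(dead_bands.tolist()), sorted(imp_bands.tolist()))
```

Both corruption functions return the indices of the affected bands so that the degraded bands can be inspected or excluded when computing per-band metrics.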

4) Evaluation Indexes:
In order to evaluate the denoising performance of simulated experiments in both the spatial domain and spectral domain, three quantitative criteria are introduced as follows. Smaller SAM and larger PSNR and SSIM imply better denoising.
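Mean peak signal-to-noise ratio (MPSNR) [41]
$$\mathrm{MPSNR} = \frac{1}{B}\sum_{k=1}^{B} 10\log_{10}\frac{A^2}{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(y(i,j,k) - x(i,j,k)\right)^2}$$
where M, N, and B represent the HSI width, height, and number of bands, respectively. A is the maximum value of all the gray values. y(i, j, k) represents the original clean image, while x(i, j, k) represents the approximated image.

Mean structural similarity (MSSIM)
$$\mathrm{MSSIM} = \frac{1}{B}\sum_{i=1}^{B}\frac{\left(2\mu_{x_i}\mu_{y_i} + C_1\right)\left(2\sigma_{x_i y_i} + C_2\right)}{\left(\mu_{x_i}^2 + \mu_{y_i}^2 + C_1\right)\left(\sigma_{x_i}^2 + \sigma_{y_i}^2 + C_2\right)}$$
where $\mu_{x_i}$ and $\mu_{y_i}$ stand for the mean values of the ith estimated and original clean band, respectively, $\sigma^2_{x_i}$ and $\sigma^2_{y_i}$ are the variances, $\sigma_{x_i y_i}$ is the covariance, and $C_1$ and $C_2$ are constants that prevent the denominator from being 0. The mean value of the SSIM of each band is adopted to assess the structural similarity of the whole image.
Mean spectral angle mapper (MSAM) [43]
$$\mathrm{MSAM} = \frac{1}{MN}\sum_{i=1}^{MN}\arccos\frac{t_i^{T} p_i}{\left\|t_i\right\|_2 \left\|p_i\right\|_2}$$
where $t_i$ denotes the estimated spectrum and $p_i$ is the original spectrum of the ith pixel. This metric is adopted to assess the spectral fidelity of denoising algorithms.
5) Implementation Detail: We adopted the incremental learning method to stabilize and accelerate the training, which also prevents the network from converging to a poor local minimum.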
Hyperparameter values were set empirically to make network learning fast yet stable. A small batch size (i.e., 16) is used to accelerate training in the first stage, while a large batch size (i.e., 64) is adopted to stabilize training when tackling harder cases (e.g., the complex noise case). An overview of our training procedure, with detailed hyperparameter settings, is shown in Table I. We used the Adam algorithm as the optimizer.
All experiments were performed on a PC with an Intel(R) Xeon(R) Gold 5218R CPU, and an NVIDIA 2080Ti GPU. A quantitative and qualitative analysis has been conducted for both simulated and real data.
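The three evaluation indexes introduced in this subsection can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the evaluation code used in the experiments: images are stored as `(M, N, B)` arrays scaled to [0, 1], the SSIM is computed from whole-band statistics instead of a sliding window, and the constants `c1` and `c2` are commonly used default values chosen here for the example.

```python
import numpy as np

def mpsnr(clean, est, peak=1.0):
    """Mean PSNR over bands; clean and est have shape (M, N, B)."""
    mse = np.mean((clean - est) ** 2, axis=(0, 1))          # one MSE per band
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))

def mssim(clean, est, c1=1e-4, c2=9e-4):
    """Mean SSIM over bands using global band statistics (a simplification)."""
    mu_x, mu_y = est.mean(axis=(0, 1)), clean.mean(axis=(0, 1))
    var_x, var_y = est.var(axis=(0, 1)), clean.var(axis=(0, 1))
    cov = ((est - mu_x) * (clean - mu_y)).mean(axis=(0, 1))
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(ssim.mean())

def msam(clean, est, eps=1e-12):
    """Mean spectral angle (radians) between estimated and original spectra."""
    t = est.reshape(-1, est.shape[-1])       # estimated spectra, one per pixel
    p = clean.reshape(-1, clean.shape[-1])   # original spectra
    cos = (t * p).sum(axis=1) / (
        np.linalg.norm(t, axis=1) * np.linalg.norm(p, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(0)
clean = rng.uniform(0.1, 1.0, size=(16, 16, 5))
noisy = clean + rng.normal(0.0, 0.05, size=clean.shape)
print(mpsnr(clean, noisy), mssim(clean, noisy), msam(clean, noisy))
```

Larger MPSNR and MSSIM and a smaller MSAM indicate better denoising, matching the convention stated above.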

B. Experiment on ICVL HSIs
1) Testing Data Set: We design two different scenarios to verify and evaluate the denoising performance of UOANet.
2) Results of ICVL HSIs: For the Gaussian noise case, Table II shows the MPSNR, MSSIM, and MSAM values obtained by the proposed algorithm and seven competing algorithms. As can be seen from the table, UOANet achieves the best or second-best index in most bands, because our method fully considers the spatial-spectral correlation of noisy HSIs. In addition, UOANet uses Octave convolution to preserve low-frequency information while denoising the high-frequency information. It can be observed from Fig. 7 that our method better removes the noise and retains the details. In addition, the PSNR values of each band for the results in Fig. 7 are shown in Fig. 9(a) and (b), from which it can be observed that our method achieves a higher PSNR in almost all bands compared with other methods.
For the complex noise case, the quantitative denoising results are shown in Table III. From Table III, we can see that our method achieves significantly better denoising results than some of the most advanced methods, such as LRMR and LRTV, because these methods are based on low-rank matrices and lose some structural information in the process of denoising. Compared with the two methods based on deep learning (DnCNN and HSID-CNN), our method can explore spectral-spatial information and suppress noise thanks to the SSAT attention module. From Fig. 8, we can observe that although the competing methods LRTV, LRMR, and HSID-CNN can obtain cleaner denoising results, their denoised images still contain some noise, or the structural information is not well preserved. In contrast, our method not only removes the complex noise well but also preserves the structure and details better, obtaining better visual reconstruction results. Furthermore, we show the PSNR value of each band in Fig. 9(c) and (d), from which we can observe that our method achieves a higher PSNR in almost all bands compared with other competing methods. The spectral curves of pixel (130, 74) in Case 5 are plotted in Fig. 10. They show that, compared with other methods, our result is also closer to the ground truth. UOANet can thus reconstruct HSIs with higher quality in both the spatial and spectral domains.

C. Experiment on Remote Sensing Images With Simulated Noise
1) Testing Data Set: The main motivation for proposing UOANet was to improve the generalization ability of the model. Therefore, besides the experiments with close-range images such as ICVL HSIs, we also ran all the competing methods on remote sensing images, namely SDG images acquired by the SDGSAT-1 satellite. Compared with ICVL HSIs, SDG images have a higher spectral resolution (456 × 444) but a much lower spatial resolution of 30 m per pixel. Experiments on real data from different satellites and payloads were therefore conducted to verify the generalization ability. We design three different scenarios to verify and evaluate the denoising performance of UOANet.
2) Results of SDG Images: For the blind Gaussian noise case, we can observe from Fig. 11 that although BM4D, TDL, ITSReg, and LLRT obtain cleaner denoising results, the structural information of the denoised image is not well preserved, resulting in oversmoothing. In contrast, the results of DnCNN and HSID-CNN still contain more noise because the test scene differs from the training set, which shows that the generalization ability of these models is weak. Our method performs better in detail preservation, noise removal, and model generalization.
For the mixture complex noise case, we can observe from Fig. 12 that although LRMR and LRTV obtain cleaner denoising results for deadline noise, the LRMR result still retains some Gaussian speckle noise, and the structural information of the LRTV result is not well preserved, so it is too smooth. Benefiting from the ability to integrate spatial context information and interchannel dependencies, our method preserves the structural details better while finely removing the complex noise.
In particular, we used the average of all the pixels in each band to evaluate the effect of denoising. Figs. 13 and 14 show the longitudinal averages of the SDG images of Scene 1 and Scene 2 before and after denoising, respectively. It can be seen that the curve of the original noisy image fluctuates sharply, which indicates that the image contains banded noise. Compared with the competing algorithms, the curve of our algorithm is smoother, which shows that our algorithm removes noise better.

1) Testing Data Set:
In this article, we evaluate our model on remotely sensed hyperspectral datasets, including EO-01 data and Indian Pines data. Both have been used for real HSI denoising experiments [44], [45], [46]. The EO-01 data are captured by the Hyperion sensor with a size of 400 × 1000 × 242 and are mainly degraded by stripe, deadline, and Gaussian mixed noise. For simplicity, we select an EO-01 subimage with a size of 240 × 240 × 31. The Indian Pines data are captured by AVIRIS with a size of 145 × 145 × 220 and a spatial resolution of 20 m per pixel. Some bands are seriously polluted by the atmosphere and water and are degraded by stripe, deadline, and Gaussian mixed noise, making this noise difficult to remove.
2) Results of Real Noisy Images: For the EO-01 data, it can be observed in Fig. 15 that the scene is affected by striping noise and deadlines. The results show that the visual effect of BM4D and BWBM3D processing is not good: only a small amount of stripe noise is slightly suppressed and many obvious stripes remain. The LRMR, HSID-CNN, and DnCNN methods generally remove the stripes, but a few stripes are not removed locally. After LRTV, 3DQRNN, MACNet, T3SC, SST, and SERT processing, some of the stripe interference is removed, but the restored image is excessively smoothed because texture information is lost. Among these methods, our method produces the best denoising result, as the restored image retains the original structural features. Fig. 16 depicts the spectra of the denoised and noisy HSIs at position (152, 82). As can be observed in Fig. 16, DnCNN, HSID-CNN, LRTV, 3DQRNN, and UOANet provide good denoising, but UOANet has the best spectral fidelity.
For the Indian Pines data, it can be observed in Fig. 17 that severe atmospheric and water absorption obstructs the view of the real scene, severely degrading the quality of the images. The Gaussian denoising methods, such as BM4D, BWBM3D, and DnCNN, cannot accurately estimate the underlying clean image because of the non-Gaussian noise structure. The LRMR, LRTV, HSID-CNN, HSI-SDeCNN, MACNet, T3SC, SST, and SERT methods generally remove the noise, but some noise is not removed locally. Our method successfully tackles this unknown noise and produces sharper and clearer results than the others. To compare the denoising effect comprehensively, we also show false color images of the reconstructed results of Indian Pines (bands 144, 154, and 164) in Fig. 18. It can be seen that the results of the other competing methods still contain much dense noise in the restored bands, while our proposed method removes most of the complex noise. The spectral reflectance of pixel (103, 64) is plotted in Fig. 19, and it can be seen that all methods provide visually similar spectra. However, our method preserves the curve details of the spectrum more completely and yields clear restored bands.

A. Effectiveness of the ResOct and SSAT Module
In this section, we examine the effectiveness of ResOct and SSAT on the denoising performance. Table V presents the denoising results of different module settings. Meanwhile, the sensitivity to the attention module weighting parameters λ1 and λ2 is discussed. Fig. 20 presents the visualizations of the SSAT modules.
-II) to the model, we can obtain a certain improvement. However, it can be seen that UOANet (λ1 = 0.5, λ2 = 0.5) achieves better index evaluation results, showing that the parallel fusion of spatial and spectral attention information improves the denoising performance significantly.
To demonstrate the effectiveness of the SSAT module in exploring the spectral and spatial relationships among features, we show feature maps learned by the SSAT module in Fig. 20. From Fig. 20(b), it can be seen that features with a strong correlation to spectral information have a large response. For example, the whole sky appears red. This shows that the SSAT module is able to capture the spectral interrelationship along the channel dimension. From Fig. 20(c), it can be observed that features with similar information have a high response. For instance, the edges of the two cars appear red. This shows that the SSAT module is able to explore the spatial relationship among pixels.

B. Sensitivity Analysis of α
By adjusting the α value, the parameters of Octave convolution can be changed to save network parameters and computing resources. The larger α is, the greater the proportion of low-frequency features selected. Although this reduces the complexity of the network, it may lead to the loss of high-frequency features. Therefore, when using Octave convolution, we need to select an appropriate α value by experiment.
As shown in Table V, as α increases, the complexity of the model decreases. When α is too large, the effect of image denoising will be degraded by overcompression of spatial information. When α is too small, low-frequency redundancy makes it difficult for the network to pay attention to high-frequency features, and the network gets poor results. As can be observed, the performance of the proposed UOANet is best when α = 0.2.
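The trade-off between α and complexity can be illustrated with a simple cost count. The sketch below is an assumption-laden estimate and not a measurement of UOANet: it assumes equal input and output channel counts, counts only the multiply-accumulate operations of the four Octave paths, and uses the fact that every path involving the low-frequency branch runs on a half-resolution grid with one quarter of the spatial positions. The function name `octave_cost_ratio` is hypothetical.

```python
def octave_cost_ratio(alpha):
    """Multiply-accumulate cost of an Octave convolution relative to a
    vanilla convolution with the same kernel size and channel count."""
    high_to_high = (1 - alpha) ** 2           # full-resolution path
    high_to_low = (1 - alpha) * alpha / 4     # computed after 2x pooling
    low_to_high = alpha * (1 - alpha) / 4     # computed before 2x up-sampling
    low_to_low = alpha ** 2 / 4               # half-resolution path
    return high_to_high + high_to_low + low_to_high + low_to_low

for alpha in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"alpha = {alpha:.1f}: relative cost = {octave_cost_ratio(alpha):.4f}")
```

Under these assumptions the relative cost falls monotonically as α grows, which is consistent with the decreasing complexity reported for larger α, while the denoising quality has to be checked experimentally for each setting.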

C. Running Time Assessment
In this section, we compare the training and testing costs of different algorithms. For the training cost, we compare the number of parameters of different CNN-based methods. As Table VI shows, with the advantage of the ResOct module, our method significantly reduces the number of parameters required. For the testing cost, we compare the average running time required by different methods for noise removal in the blind Gaussian noise and mixture complex noise cases. As Table VII shows, with the benefits of GPUs and end-to-end structures, deep-learning-based methods exhibit less runtime than the traditional methods. Our method performs best and requires the least processing time.

V. CONCLUSION
Although many denoising methods have been suggested, most of them are unable to fully exploit the physical properties of noisy hyperspectral images. In this article, we propose two key modules, ResOct and SSAT, in light of the frequency distribution and spatial-spectral correlation of HSIs. Based on these two modules, we improve the Unet network and propose an HSI denoising network, UOANet, which combines Octave convolution and the attention mechanism. ResOct is embedded in the down-sampling (encoding) process of the Unet network and uses the down-sampled low-frequency features to map the frequency features of noise, to remove spatial redundancy, and to improve the network speed. SSAT is embedded in the up-sampling (decoding) process, and the attention mechanism performs both global average and global max pooling over the channel and spatial dimensions, which provides more effective global and local details for the network during decoding. At the same time, residual modules are fused into the Unet network to avoid the vanishing gradient problem and to further enhance the denoising ability.
Finally, we compare the denoising results, efficiency, and visual effects of different methods on the ICVL, SDG, EO-01, and Indian Pines data. The comparison demonstrates that the proposed method is superior to both model-based and deep-learning-based methods in subjective visual effects and objective quantitative measurements.