Tensor-Based Low-Rank and Sparse Prior Information Constraints for Hyperspectral Image Denoising

Hyperspectral data have been widely used in various fields in recent years owing to their rich spectral and spatial information. Yet hyperspectral images are almost always tainted by a variety of mixed noises, which seriously limit the accuracy of subsequent applications. To remove such noise, this paper builds on low-rank tensor decomposition combined with a non-local self-similarity prior and proposes a tensor-based non-local low-rank denoising model, in which non-local self-similarity mainly exploits spatial correlation while low-rank tensor decomposition mainly exploits spectral correlation between bands. Traditional tensor-based methods are commonly NP-hard to compute and are sensitive to sparse noise. In contrast, the method proposed in this paper can efficiently separate the low-rank clean image from Gaussian noise and sparse noise (impulses, dead lines, stripes, speckle, etc.) by using the tensor singular value decomposition (T-SVD) and the tensor nuclear norm (TNN). The resulting non-convex problem is also solved effectively by the alternating direction method of multipliers. Because the spectral and spatial information of the data are fully used, Gaussian noise and sparse noise can be removed effectively. The effectiveness of our algorithm was verified through experiments on simulated and real data.


I. INTRODUCTION
The hyperspectral imaging spectrometer makes important contributions to earth observation and remote sensing. Hyperspectral images obtained by such sensors have rich spectral and spatial information [1]. In recent years, hyperspectral data have been used widely in various fields, e.g., earth observation, environmental protection [2], target identification and tracking [3], food safety [4], precision farming, and urban planning. Because acquired images are inevitably contaminated by noise, many hyperspectral denoising algorithms have been proposed; these methods can be roughly divided into two categories: 2D extended methods and tensor-based methods.
The 2D extended methods use traditional 2D image denoising algorithms to process the individual bands of hyperspectral images [9]. Over the past 50 years, several 2D denoising methods have been proposed, including the classic K-SVD [10] and BM3D [11], which can be applied directly to hyperspectral denoising. However, this extension ignores the inherent characteristics of hyperspectral images and usually fails to achieve good performance in practical applications. In recent years, many methods for hyperspectral noise reduction based on low-rank matrix decomposition [12]-[15] have been proposed. These methods consider clean hyperspectral data to lie in a low-rank subspace, which can be modeled as a linear combination of a finite number of endmembers. Zhang et al. [7] proposed the low-rank matrix recovery (LRMR) method, which performed excellently for removal of mixed noise. However, LRMR takes only spectral correlation, not spatial correlation, into account. To overcome this limitation, the non-local self-similarity across space (NSS) prior [16]-[21] and other spatial-information methods have been used for noise reduction. In [22], a low-rank spectral non-local approach (LRSNL) was proposed; this method considers both spectral and spatial information and can effectively remove mixed noise. Similarly, to improve the low-rank model, a total variation regularization term was considered, and a total variation (TV) regularized low-rank matrix factorization (LRTV) method [9] was proposed. These improved methods are effective for hyperspectral denoising. However, they transform hyperspectral images into 2D matrices by vectorizing the data of each band, resulting in the loss of some useful structural information. To overcome this limitation, many tensor-based methods [23], [24] have been proposed for hyperspectral image denoising.
A hyperspectral image is composed of a stack of two-dimensional images and can thus naturally be considered a third-order tensor. Tensor-based methods mainly apply tensor decomposition to hyperspectral image denoising [25]. Two kinds of tensor decomposition are usually used for denoising, namely CANDECOMP/PARAFAC (CP) decomposition [26] and Tucker decomposition [27]. Using Tucker decomposition, a low-rank tensor approximation (LRTA) method [28] was proposed. Liu et al. [24] designed the PARAFAC method by using parallel factor analysis. Both methods consider the correlation between spectral bands but ignore spatial structure information. To improve on these algorithms, a method based on tensor dictionary learning was proposed in [29], which considered both the global correlation among bands and the non-local self-similarity across space. In recent years, researchers have proposed many low-rank tensor denoising methods [30]-[32]. These methods use the tensor singular value decomposition (T-SVD) [33]; in other words, a tensor nuclear norm (TNN) [34] defined via the T-SVD is used for noise reduction. A tensor robust principal component analysis (TRPCA) method [35] was proposed using the T-SVD and the tensor nuclear norm. This method takes advantage of the low rank and sparsity of the data, so it can effectively remove mixed noise in hyperspectral image data. Recently, Fan et al. [36] proposed a spatial-spectral total variation regularized low-rank tensor factorization (SSTV-LRTF) method. Total variation (TV) regularization [37] can preserve edge information effectively while capturing the low-rank correlation among adjacent bands. Since tensor-based denoising methods fully retain useful spatial structure information, they perform better than the 2D extended methods [38]-[40]. These existing algorithms have achieved good results in hyperspectral noise reduction.
However, because noise is complex and diverse, no method exists that can deal with all kinds of noise simultaneously. Therefore, there is still a lot of room for improvement.
As mentioned above, non-local self-similarity (NSS) [41], [42] mainly exploits spatial correlation, while the low-rank matrix decomposition method [41] mainly exploits spectral correlation between bands. To utilize both spectral correlation and spatial information in hyperspectral data, a tensor-based non-local low-rank denoising model is proposed here. This method can eliminate both Gaussian noise and sparse noise (including impulse noise, dead lines, stripes, and speckle). The main ideas of this work can be summarized as follows: 1) Hyperspectral data are regarded as a third-order tensor. Based on the low-rank prior of clean hyperspectral data and the sparse nature of non-Gaussian noise, the low-rank model is applied to tensors composed of non-local similar full-band patch (FBP) [25] groups, and a tensor-based low-rank non-local denoising model is proposed.
2) In our model, the low-rank tensor is constrained by the tensor nuclear norm based on the T-SVD, and a non-convex log-sum term is used in place of the tensor sparsity measure (the $\ell_0$ term). Finally, ADMM (the alternating direction method of multipliers) [9], [32] is used to design an effective algorithm to solve this problem. ADMM turns a complex problem into a series of simpler problems and has been used successfully in [35], [36], and [40] with global convergence guarantees.
3) Experiments verify that the proposed algorithm can not only effectively remove Gaussian noise, but also performs well on mixtures of several kinds of noise.
The organization of the remainder of this paper is as follows. The next section gives some notations and explanations for tensors. In the third section, the proposed tensor-based non-local low-rank denoising model is described. Experimental results and analysis are described in the fourth section. Conclusions are then given in the last section.

II. NOTATIONS AND PRELIMINARIES
Succinct definitions and notations used in this paper are given in this section. Here, scalars, vectors, matrices and tensors are represented as lowercase non-bold letters, lowercase boldface letters, capitalized boldface letters and uppercase calligraphic letters, respectively. An $N$-order tensor is expressed as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. Elements of $\mathcal{A}$ are denoted as $a_{i_1 \cdots i_n \cdots i_N}$, where $1 \le i_n \le I_n$.
Two norms are often used: the Frobenius norm of a tensor $\mathcal{A}$, defined as $\|\mathcal{A}\|_F = \big(\sum_{i_1,\cdots,i_N} a_{i_1 \cdots i_N}^2\big)^{1/2}$, and the matrix nuclear norm $\|A\|_* = \sum_i \sigma_i(A)$, where $\sigma_i(A)$ is the $i$th singular value of the matrix $A$.
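As a concrete illustration (our own sketch, not from the paper), the two norms can be computed with NumPy as follows:

```python
import numpy as np

def frobenius_norm(A):
    # ||A||_F: square root of the sum of squared entries, for any tensor order
    return float(np.sqrt(np.sum(np.abs(A) ** 2)))

def nuclear_norm(X):
    # ||X||_*: sum of the singular values of the matrix X
    return float(np.sum(np.linalg.svd(X, compute_uv=False)))
```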
In the following sections, we consider a third-order tensor whose $i$th horizontal, lateral and frontal slices are expressed as $\mathcal{A}(i,:,:)$, $\mathcal{A}(:,i,:)$, and $\mathcal{A}(:,:,i)$, respectively. A frontal slice $\mathcal{A}(:,:,i)$ can also be expressed as $A^{(i)}$. $\hat{\mathcal{A}}$ represents the fast Fourier transform (FFT) along the third dimension of tensor $\mathcal{A}$, namely $\hat{\mathcal{A}} = \mathrm{fft}(\mathcal{A}, [\,], 3)$. Similarly, $\mathcal{A}$ can be obtained by $\mathcal{A} = \mathrm{ifft}(\hat{\mathcal{A}}, [\,], 3)$. For a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, let $\bar{A}$ be the block diagonal matrix whose diagonal blocks are the frontal slices $\hat{A}^{(i)}$ of $\hat{\mathcal{A}}$:
$$\bar{A} = \mathrm{bdiag}(\hat{\mathcal{A}}) = \begin{bmatrix} \hat{A}^{(1)} & & \\ & \ddots & \\ & & \hat{A}^{(I_3)} \end{bmatrix}.$$
In order to define the multiplication between tensors, called the t-product [33], [35], [40], we first define the block circulant matrix. For a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, its block circulant matrix has size $I_1 I_3 \times I_2 I_3$ and is defined as
$$\mathrm{bcirc}(\mathcal{A}) = \begin{bmatrix} A^{(1)} & A^{(I_3)} & \cdots & A^{(2)} \\ A^{(2)} & A^{(1)} & \cdots & A^{(3)} \\ \vdots & \vdots & \ddots & \vdots \\ A^{(I_3)} & A^{(I_3-1)} & \cdots & A^{(1)} \end{bmatrix}.$$
Likewise, the two operators unfold and fold are defined as
$$\mathrm{unfold}(\mathcal{A}) = \begin{bmatrix} A^{(1)} \\ A^{(2)} \\ \vdots \\ A^{(I_3)} \end{bmatrix}, \qquad \mathrm{fold}(\mathrm{unfold}(\mathcal{A})) = \mathcal{A}.$$
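These operators can be sketched in NumPy as follows (our code; it also checks the standard fact, used for the t-product below, that the block circulant construction matches slice-wise matrix products in the Fourier domain):

```python
import numpy as np

def unfold(A):
    # Stack the frontal slices A^(1), ..., A^(I3) vertically: (I1*I3) x I2
    return np.concatenate([A[:, :, i] for i in range(A.shape[2])], axis=0)

def fold(M, I1, I3):
    # Inverse of unfold: split the tall matrix back into I3 frontal slices
    return np.stack(np.split(M, I3, axis=0), axis=2)

def bcirc(A):
    # Block circulant matrix: block (i, j) is the frontal slice A^((i-j) mod I3)
    I3 = A.shape[2]
    return np.block([[A[:, :, (i - j) % I3] for j in range(I3)] for i in range(I3)])

def t_product_fft(A, B):
    # Equivalent t-product: slice-wise matrix products in the Fourier domain
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)
    return np.fft.ifft(Ch, axis=2).real
```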
Below we give the product of two tensors, defined as follows.

Definition 1 (T-Product) [33]: Let $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and $\mathcal{B} \in \mathbb{R}^{I_2 \times L \times I_3}$ be two third-order tensors. Then the t-product $\mathcal{A} * \mathcal{B}$ is the $I_1 \times L \times I_3$ tensor
$$\mathcal{A} * \mathcal{B} = \mathrm{fold}\big(\mathrm{bcirc}(\mathcal{A}) \cdot \mathrm{unfold}(\mathcal{B})\big).$$

Definition 2 (Conjugate Transpose) [35]: The conjugate transpose of a tensor $\mathcal{A}$ of size $I_1 \times I_2 \times I_3$ is the $I_2 \times I_1 \times I_3$ tensor $\mathcal{A}^*$ obtained by taking the conjugate transpose of each frontal slice and then reversing the order of transposed frontal slices 2 through $I_3$.
Definition 3 (Orthogonal Tensor) [35]: A tensor $\mathcal{Q} \in \mathbb{R}^{I_1 \times I_1 \times I_3}$ is orthogonal if
$$\mathcal{Q}^* * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^* = \mathcal{I},$$
where $\mathcal{I} \in \mathbb{R}^{I_1 \times I_1 \times I_3}$ is the identity tensor, whose first frontal slice is the $I_1 \times I_1$ identity matrix and whose other frontal slices are all zeros.

Definition 4 (T-SVD) [33]: Let $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ be a third-order tensor. Then $\mathcal{A}$ can be factored as
$$\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^*,$$
where $\mathcal{U} \in \mathbb{R}^{I_1 \times I_1 \times I_3}$ and $\mathcal{V} \in \mathbb{R}^{I_2 \times I_2 \times I_3}$ are orthogonal, and $\mathcal{S} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is an f-diagonal tensor (each of its frontal slices is a diagonal matrix).

Definition 5 (Tensor Multi-Rank and Tubal Rank) [40]: The tensor multi-rank of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is a vector $r \in \mathbb{R}^{I_3}$ whose $i$th entry is the rank of $\hat{A}^{(i)}$. The tensor tubal rank, denoted $\mathrm{rank}_t(\mathcal{A})$, is defined as the number of non-zero singular tubes of $\mathcal{S}$, where $\mathcal{S}$ is obtained from the T-SVD of $\mathcal{A}$ in Definition 4. That is,
$$\mathrm{rank}_t(\mathcal{A}) = \#\{i : \mathcal{S}(i,i,:) \neq 0\}.$$

Definition 6 (Tensor Nuclear Norm) [35], [40]: The tensor nuclear norm of a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, denoted $\|\mathcal{A}\|_*$, is defined as the average of the nuclear norms of all the frontal slices of $\hat{\mathcal{A}}$, that is,
$$\|\mathcal{A}\|_* = \frac{1}{I_3} \sum_{i=1}^{I_3} \big\|\hat{A}^{(i)}\big\|_*.$$
Notice that the tensor nuclear norm above is defined in the Fourier domain and is akin to the nuclear norm of the block circulant matrix in the original domain. For a detailed discussion, please refer to [35].
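Definitions 5 and 6 can be illustrated with a short NumPy sketch (ours; the function names and the `tol` threshold are our own choices):

```python
import numpy as np

def tensor_nuclear_norm(A):
    # Definition 6: average of the nuclear norms of the frontal slices of fft(A, [], 3)
    Ah = np.fft.fft(A, axis=2)
    I3 = A.shape[2]
    return float(sum(np.linalg.svd(Ah[:, :, i], compute_uv=False).sum()
                     for i in range(I3)) / I3)

def tubal_rank(A, tol=1e-10):
    # Definition 5: number of non-zero singular tubes of S from the T-SVD
    Ah = np.fft.fft(A, axis=2)
    s = np.stack([np.linalg.svd(Ah[:, :, i], compute_uv=False)
                  for i in range(A.shape[2])], axis=1)  # (min(I1, I2), I3)
    return int(np.sum(np.any(s > tol, axis=1)))
```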

III. TENSOR-BASED NON-LOCAL LOW-RANK DENOISING MODEL
A. TENSOR-BASED LOW-RANK DENOISING MODEL
As mentioned above, a third-order tensor can be used to express a hyperspectral image. Hyperspectral images are typically contaminated by a variety of mixed noise. This paper mainly studies Gaussian noise and sparse noise, such as impulse noise, dead pixels or lines, and stripes [4]. Denoting $d_H$, $d_W$, and $d_S$ as the spatial height, spatial width and number of spectral bands of a hyperspectral image, the observed hyperspectral image $\mathcal{Y} \in \mathbb{R}^{d_H \times d_W \times d_S}$ can be expressed as the sum of three parts:
$$\mathcal{Y} = \mathcal{L} + \mathcal{S} + \mathcal{N},$$
where $\mathcal{Y}$ is the noisy hyperspectral image, $\mathcal{L}$ is the clean hyperspectral data, $\mathcal{S}$ is sparse noise, and $\mathcal{N}$ is Gaussian noise; $\mathcal{L}$ is a low tubal rank tensor [32]. In fact, if a tensor is a low-rank tensor, then it is a low tubal rank tensor as well. Thus, the low-rank tensor $\mathcal{L}$ is likewise recoverable under a low tubal rank constraint [40]. To restore the tensor $\mathcal{L}$, the following optimization model is considered:
$$\min_{\mathcal{L}, \mathcal{S}, \mathcal{N}} \mathrm{rank}_t(\mathcal{L}) + \lambda \|\mathcal{S}\|_0 + \tau \|\mathcal{N}\|_F^2 \quad \text{s.t.} \quad \mathcal{Y} = \mathcal{L} + \mathcal{S} + \mathcal{N}. \tag{9}$$
The rank and $\ell_0$ terms in (9) can take only discrete values, which makes the combinatorial optimization problem difficult to solve in applications. For ease of solving, the tensor tubal rank is replaced by the tensor nuclear norm, which is a convex function. In addition, the $\ell_0$ norm is relaxed to a log-sum form to simplify the calculation. This relaxation has been proved an effective strategy for solving minimization problems [43], [44]. The objective function is then transformed into the following optimization problem:
$$\min_{\mathcal{L}, \mathcal{S}, \mathcal{N}} \|\mathcal{L}\|_* + \lambda P_{ls}(\mathcal{S}) + \tau \|\mathcal{N}\|_F^2 \quad \text{s.t.} \quad \mathcal{Y} = \mathcal{L} + \mathcal{S} + \mathcal{N},$$
where $P_{ls}(\mathcal{A}) = \sum_{i_1, i_2, i_3} \log\big(|a_{i_1 i_2 i_3}| + \varepsilon\big)$ and $\varepsilon$ is a small positive value. However, this low-rank recovery model only considers the global correlation along the spectral direction and ignores the spatial structure of the hyperspectral image. To make full use of spatial structure information, we establish a tensor-based non-local low-rank denoising model by introducing the prior information of NSS [45].
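As a toy illustration (entirely our own, not the paper's data pipeline), the degradation model and the log-sum penalty can be written as:

```python
import numpy as np

def log_sum_penalty(S, eps=1e-3):
    # P_ls(S): sum over all entries of log(|s| + eps)
    return float(np.sum(np.log(np.abs(S) + eps)))

def synthesize_observation(dH=16, dW=16, dS=8, seed=0):
    # Y = L + S + N with a rank-1 (hence low tubal rank) clean tensor L,
    # 5% sparse corruption S, and Gaussian noise N
    rng = np.random.default_rng(seed)
    L = np.einsum('i,j,k->ijk', rng.random(dH), rng.random(dW), rng.random(dS))
    S = np.zeros_like(L)
    idx = rng.random(L.shape) < 0.05
    S[idx] = rng.uniform(-1, 1, idx.sum())
    N = rng.normal(0.0, 0.05, L.shape)
    return L + S + N, L, S, N
```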

B. TENSOR-BASED NON-LOCAL LOW-RANK DENOISING MODEL
For a hyperspectral image, we define the full-band patch (FBP), which is stacked from the patches at the same spatial location over all bands; the spatial size of each patch is $d_h \times d_w$, with $d_h = d_w = 6$ in this paper, where $d_h$ and $d_w$ represent the spatial height and width of a patch, respectively. The non-local self-similarity prior then means that, for a given local FBP, there exist many FBPs similar to it [25]. Non-local techniques mainly use spatial correlations between small patches of the hyperspectral image. To take advantage of both the global spectral correlation and the spatial structure priors of hyperspectral images, a non-local method is used to reconstruct a third-order tensor, and the newly constructed tensor retains both kinds of prior information. A hyperspectral image can be expressed as a third-order tensor $\mathcal{Y} \in \mathbb{R}^{d_H \times d_W \times d_S}$, which is first partitioned into a group of overlapping full-band patches (FBPs). Then, by converting each band of an FBP into a column vector, each FBP becomes a 2D matrix $P_{ij} \in \mathbb{R}^{d_h d_w \times d_S}$, so all FBPs form a group of 2D patches whose number equals the number of patches in the whole hyperspectral image. According to the NSS of the hyperspectral image, and by performing block matching [11], [25], for a given local FBP $Y_i$ we can find a set of FBPs similar to it in its non-local neighborhood. $\mathcal{Y}_i \in \mathbb{R}^{d_h d_w \times d_S \times d_n}$ (where $d_n$ is the number of non-local similar FBPs of $Y_i$) represents the third-order tensor stacked from $Y_i$ and its non-local similar FBPs. Thus, the global correlation along the spectral direction and the prior information of non-local self-similarity across space are both well preserved.
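The grouping step above can be sketched as follows (our simplified code: an exhaustive search in a square window; the paper's block matching [11], [25] may differ in detail):

```python
import numpy as np

def fbp(Y, r, c, p=6):
    # Full-band patch at (r, c): vectorize each band -> (p*p) x dS matrix
    return Y[r:r + p, c:c + p, :].reshape(p * p, Y.shape[2])

def group_similar_fbps(Y, r0, c0, p=6, d_n=8, search=10):
    # Stack the d_n FBPs closest (in Frobenius distance) to the reference FBP
    dH, dW, _ = Y.shape
    ref = fbp(Y, r0, c0, p)
    cands = []
    for r in range(max(0, r0 - search), min(dH - p, r0 + search) + 1):
        for c in range(max(0, c0 - search), min(dW - p, c0 + search) + 1):
            P = fbp(Y, r, c, p)
            cands.append((np.linalg.norm(P - ref), P))
    cands.sort(key=lambda t: t[0])
    return np.stack([P for _, P in cands[:d_n]], axis=2)  # (p*p, dS, d_n)
```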
By using the low-rank nature of hyperspectral data and solving the following optimization problem, we can estimate the true non-local similar FBPs $\mathcal{L}_i$ from the noisy group $\mathcal{Y}_i$:
$$\min_{\mathcal{L}_i, \mathcal{S}_i, \mathcal{N}_i} \|\mathcal{L}_i\|_* + \lambda P_{ls}(\mathcal{S}_i) + \tau \|\mathcal{N}_i\|_F^2 \quad \text{s.t.} \quad \mathcal{Y}_i = \mathcal{L}_i + \mathcal{S}_i + \mathcal{N}_i, \tag{11}$$
where $\mathcal{L}_i$ represents the clean non-local similar FBPs, $\mathcal{S}_i$ is sparse noise, and $\mathcal{N}_i$ represents Gaussian noise. It is obvious that $\mathcal{L}_i$ is a low-rank tensor. We can reconstruct the estimated hyperspectral image by aggregating all reconstructed $\mathcal{L}_i$. The denoising process is portrayed in Figure 1.

C. A TENSOR-BASED NON-LOCAL LOW-RANK DENOISING ALGORITHM BY USING ADMM
To solve equation (11), we use the well-known ADMM [9], [32], [35], [36], [46], [47], which turns a complex problem into a series of simpler ones. By introducing a Lagrangian multiplier $\mathcal{D}$, the augmented Lagrangian function corresponding to the optimization problem in (11) is:
$$L_\mu(\mathcal{L}_i, \mathcal{S}_i, \mathcal{N}_i, \mathcal{D}) = \|\mathcal{L}_i\|_* + \lambda P_{ls}(\mathcal{S}_i) + \tau \|\mathcal{N}_i\|_F^2 + \langle \mathcal{D}, \mathcal{Y}_i - \mathcal{L}_i - \mathcal{S}_i - \mathcal{N}_i \rangle + \frac{\mu}{2} \|\mathcal{Y}_i - \mathcal{L}_i - \mathcal{S}_i - \mathcal{N}_i\|_F^2, \tag{12}$$
where $\mu$ is the penalty parameter. For (12), ADMM updates each variable in turn, with the remaining variables fixed, by minimizing the following equivalent functions.
In the $(k+1)$th iteration, $\mathcal{L}_i^{k+1}$ is obtained by solving
$$\mathcal{L}_i^{k+1} = \arg\min_{\mathcal{L}_i} \|\mathcal{L}_i\|_* + \frac{\mu}{2} \Big\| \mathcal{Y}_i - \mathcal{L}_i - \mathcal{S}_i^k - \mathcal{N}_i^k + \frac{\mathcal{D}^k}{\mu} \Big\|_F^2. \tag{14}$$
To simplify the calculation, let $\mathcal{X}_i^k = \mathcal{Y}_i - \mathcal{S}_i^k - \mathcal{N}_i^k + \mathcal{D}^k / \mu$; then (14) can be rewritten as
$$\mathcal{L}_i^{k+1} = \arg\min_{\mathcal{L}_i} \|\mathcal{L}_i\|_* + \frac{\mu}{2} \|\mathcal{X}_i^k - \mathcal{L}_i\|_F^2. \tag{15}$$
Solving (15) corresponds to solving, in the frequency domain, the following tensor restoration problem:
$$\hat{\mathcal{L}}_i^{k+1} = \arg\min_{\hat{\mathcal{L}}_i} \|\hat{\mathcal{L}}_i\|_* + \frac{\mu}{2} \|\hat{\mathcal{X}}_i^k - \hat{\mathcal{L}}_i\|_F^2, \tag{16}$$
where $\hat{\mathcal{L}}_i$ is the FFT along the third dimension of $\mathcal{L}_i$, and $\hat{\mathcal{X}}_i^k$ is the FFT along the third dimension of $\mathcal{X}_i^k$. Equation (16) has a closed-form solution and is solvable by the method of singular value thresholding [32]. To solve the optimization problem in (16), we can decompose it into $d_n$ independent minimization problems, one per frontal slice. Before solving this problem, let us give some definitions. If $r = \mathrm{rank}(X)$ and $\sigma_i$ ($i = 1, 2, \cdots, r$) denotes the $i$th singular value of $X$, then the singular value decomposition (SVD) of $X$ can be expressed as $X = U \,\mathrm{diag}\{\sigma_i\}_{1 \le i \le r}\, V^T$. Thus, we can define the singular value thresholding operator $\mathrm{SVT}_\tau(X)$ as
$$\mathrm{SVT}_\tau(X) = U \,\mathrm{diag}\{(\sigma_i - \tau)_+\}\, V^T,$$
where $(\cdot)_+ = \max(\cdot, 0)$. We have
$$\hat{\mathcal{L}}_i^{k+1}(:,:,j) = \mathrm{SVT}_{1/\mu}\big(\hat{\mathcal{X}}_i^k(:,:,j)\big), \quad j = 1, \cdots, d_n,$$
and the update of $\mathcal{L}_i^{k+1}$ is achievable via the inverse Fourier transform, namely
$$\mathcal{L}_i^{k+1} = \mathrm{ifft}\big(\hat{\mathcal{L}}_i^{k+1}, [\,], 3\big).$$
To update $\mathcal{S}_i^{k+1}$, the following sub-problem is solved:
$$\mathcal{S}_i^{k+1} = \arg\min_{\mathcal{S}_i} \lambda P_{ls}(\mathcal{S}_i) + \frac{\mu}{2} \Big\| \mathcal{Y}_i - \mathcal{L}_i^{k+1} - \mathcal{S}_i - \mathcal{N}_i^k + \frac{\mathcal{D}^k}{\mu} \Big\|_F^2. \tag{21}$$
Equation (21) is convertible to an element-wise optimization problem of the form
$$\min_s \alpha \log(|s| + \varepsilon) + \frac{1}{2}(s - x)^2,$$
where $x$ is the corresponding entry of $\mathcal{Y}_i - \mathcal{L}_i^{k+1} - \mathcal{N}_i^k + \mathcal{D}^k / \mu$. This has been proved to have a closed-form solution [48]; thus
$$\mathcal{S}_i^{k+1} = D_{\alpha, \varepsilon}\Big( \mathcal{Y}_i - \mathcal{L}_i^{k+1} - \mathcal{N}_i^k + \frac{\mathcal{D}^k}{\mu} \Big),$$
where $\alpha = \lambda / \mu$ and $D_{\alpha, \varepsilon}(\cdot)$ is the thresholding operator, applied element-wise as
$$D_{\alpha, \varepsilon}(x) = \begin{cases} \mathrm{sign}(x)\, \dfrac{(|x| - \varepsilon) + \sqrt{(|x| + \varepsilon)^2 - 4\alpha}}{2}, & (|x| + \varepsilon)^2 \ge 4\alpha \text{ and this stationary point lowers the objective}, \\[4pt] 0, & \text{otherwise}. \end{cases}$$
For $\mathcal{N}_i^{k+1}$, we have the following sub-problem:
$$\mathcal{N}_i^{k+1} = \arg\min_{\mathcal{N}_i} \tau \|\mathcal{N}_i\|_F^2 + \frac{\mu}{2} \Big\| \mathcal{Y}_i - \mathcal{L}_i^{k+1} - \mathcal{S}_i^{k+1} - \mathcal{N}_i + \frac{\mathcal{D}^k}{\mu} \Big\|_F^2. \tag{25}$$
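The S-step shrinkage just described, with α = λ/µ, can be sketched element-wise as follows (our reconstruction of the closed form; the exact case analysis should be checked against [48]):

```python
import numpy as np

def log_sum_shrink(X, alpha, eps=1e-3):
    # Element-wise minimizer of alpha*log(|s| + eps) + 0.5*(s - x)^2
    absx = np.abs(X)
    disc = (absx + eps) ** 2 - 4.0 * alpha
    # positive root of the stationarity equation, when it exists
    cand = np.where(disc > 0,
                    ((absx - eps) + np.sqrt(np.maximum(disc, 0.0))) / 2.0, 0.0)
    cand = np.maximum(cand, 0.0)
    # keep the stationary point only where it beats s = 0 in objective value
    f0 = alpha * np.log(eps) + 0.5 * absx ** 2
    fc = alpha * np.log(cand + eps) + 0.5 * (cand - absx) ** 2
    return np.where(fc < f0, np.sign(X) * cand, 0.0)
```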
Equation (25) is a least squares problem, and we can obtain its closed-form solution as
$$\mathcal{N}_i^{k+1} = \frac{\mu\big(\mathcal{Y}_i - \mathcal{L}_i^{k+1} - \mathcal{S}_i^{k+1}\big) + \mathcal{D}^k}{2\tau + \mu}.$$
The following formulation is used to update $\mathcal{D}^{k+1}$:
$$\mathcal{D}^{k+1} = \mathcal{D}^k + \mu\big(\mathcal{Y}_i - \mathcal{L}_i^{k+1} - \mathcal{S}_i^{k+1} - \mathcal{N}_i^{k+1}\big).$$
The hyperspectral denoising algorithm proposed in this paper is summarized in Algorithm 1. It is worth noting that the penalty parameter $\mu$ adopts the adaptive updating strategy $\mu^{k+1} \leftarrow \min(\eta \mu^k, \mu_{\max})$ in our algorithm, where $\mu_{\max}$ represents the upper bound of $\mu$. This is because ADMM converges very slowly with a fixed $\mu$ [40].
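Putting the four updates together, a compact per-group ADMM sketch (our own simplified code, not the authors' implementation; default parameters follow the values stated in Section IV) might look like:

```python
import numpy as np

def _shrink_log_sum(X, alpha, eps):
    # Closed-form minimizer of alpha*log(|s| + eps) + 0.5*(s - x)^2, element-wise
    absx = np.abs(X)
    disc = (absx + eps) ** 2 - 4.0 * alpha
    cand = np.maximum(np.where(disc > 0,
                               ((absx - eps) + np.sqrt(np.maximum(disc, 0.0))) / 2.0,
                               0.0), 0.0)
    f0 = alpha * np.log(eps) + 0.5 * absx ** 2
    fc = alpha * np.log(cand + eps) + 0.5 * (cand - absx) ** 2
    return np.where(fc < f0, np.sign(X) * cand, 0.0)

def denoise_group(Y, lam=0.005, tau=0.05, eps=1e-3,
                  mu=1e-3, mu_max=1e10, eta=1.1, iters=100):
    L = np.zeros_like(Y); S = np.zeros_like(Y)
    N = np.zeros_like(Y); D = np.zeros_like(Y)
    for _ in range(iters):
        # L-step: slice-wise SVT with threshold 1/mu in the Fourier domain
        Xh = np.fft.fft(Y - S - N + D / mu, axis=2)
        Lh = np.empty_like(Xh)
        for j in range(Y.shape[2]):
            U, s, Vt = np.linalg.svd(Xh[:, :, j], full_matrices=False)
            Lh[:, :, j] = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        L = np.fft.ifft(Lh, axis=2).real
        # S-step: element-wise log-sum shrinkage with alpha = lam / mu
        S = _shrink_log_sum(Y - L - N + D / mu, lam / mu, eps)
        # N-step: closed-form least squares
        N = mu * (Y - L - S + D / mu) / (2.0 * tau + mu)
        # dual ascent and adaptive penalty update
        D = D + mu * (Y - L - S - N)
        mu = min(eta * mu, mu_max)
    return L, S, N
```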

IV. RESULTS OF EXPERIMENTS AND ANALYSIS
To verify the performance of the proposed hyperspectral image denoising algorithm, we performed experiments using simulated and real data. To analyze the performance of the algorithm quantitatively and qualitatively, seven typical denoising algorithms were selected for comparative experiments. These algorithms include band-wise K-SVD [10], band-wise BM3D [11], BM4D [21] and LRMR [7], which are 2D methods extended to 3D, and PARAFAC [24], LRTA [28] and tensor dictionary learning (TDL) [29], which are tensor-based methods. The parameters of all seven algorithms were adjusted according to their descriptions in the corresponding reference papers to ensure their best performance. The parameters of the algorithm proposed in this paper are discussed in detail in Section IV-C below. The spatial size of the patch is important: the larger the size, the better the accuracy but the longer the computational time. In principle, 3×3, 4×4, · · · , 8×8 are all fine, but we finally chose 6 × 6 as the tradeoff between accuracy and computational time for all cases.
The methods were run on a PC with an Intel(R) Core (TM) i9-9900K Processor @3.60 GHz and 64.00 GB of RAM. To evaluate quantitatively the performance of the proposed algorithm, the peak signal-to-noise ratio (PSNR), structural similarity index measurement (SSIM) [7] and feature similarity (FSIM) [49] were calculated. For hyperspectral data, we calculated the PSNR, SSIM and FSIM of each band and took their means, called mean PSNR (MPSNR), mean SSIM (MSSIM) and mean FSIM (MFSIM), respectively. The larger each of these three mean measurements, the better the noise reduction performance of the algorithm.
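The band-wise averaging of PSNR used here can be sketched as follows (our code; SSIM and FSIM would come from an image-quality library and are omitted):

```python
import numpy as np

def mpsnr(ref, est, peak=1.0):
    # MPSNR: mean over bands of the per-band PSNR (in dB)
    vals = []
    for b in range(ref.shape[2]):
        mse = np.mean((ref[:, :, b] - est[:, :, b]) ** 2)
        vals.append(10.0 * np.log10(peak ** 2 / mse))
    return float(np.mean(vals))
```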

A. SIMULATED DATA EXPERIMENTS
Two public-domain datasets were used in these experiments to evaluate the performance of the proposed algorithm. These include the Washington DC Mall dataset and the Pavia City Center dataset [2]. The former contains 1208 × 307 pixels with 191 bands; a sub-image with size of 256 × 256 × 191 was used for our experiment (Figure 2(a)). The latter was acquired by the reflective optics system imaging spectrometer (ROSIS-03), and the first 22 bands (containing all the noisy bands) were removed to simulate a clean image; a sub-image of size 200 × 200 × 80 was extracted for our experiment (Figure 2(b)). In these experiments, Gaussian noise and sparse noise were added to the two reference hyperspectral datasets and the following two cases were considered: Case 1: We added Gaussian noise of independent and identical distribution to all bands of the two hyperspectral datasets. According to [2] and [4], considering the impact of strong noise on performance of the proposed algorithm, the mean value of Gaussian noise was set to zero and its variances were set to 0.04, 0.06, 0.08 and 0.1, respectively. In fact, different variances correspond to different noise levels. Each of these four variance values corresponds to the mean signal noise ratio (SNR) value of all bands of a hyperspectral image, i.e., 0.04, 0.06, 0.08, and 0.10 correspond to mean SNR values of 11.12, 7.99, 6.00, and 4.64 dB for the Washington DC Mall dataset, and to 13.68, 10.39, 8.19, and 6.61 dB for the Pavia City Center dataset.
Case 2: We added Gaussian noise and speckle noise to all bands of the two hyperspectral datasets. The mean value of Gaussian noise was set to zero and its variance was fixed at 0.02. According to [4] and [40], speckle noise was generated by adding salt-and-pepper noise to the hyperspectral image, and the percentage of sparse noise was set to 0.05, 0.1 and 0.15. Through comparative analysis, we believe that these three values can lead to effective verification of the performance of the proposed algorithm. Similarly, different percentages of sparse noise correspond to different noise levels. Each of these three values corresponds to the mean signal noise ratio (SNR) value of all bands of a hyperspectral image, i.e., 0.05, 0.10, and 0.15 correspond to mean SNR values of 3.07, 1.99, and 1.55 dB for the Washington DC Mall dataset, and 5.07, 3.49, and 2.82 dB for the Pavia City Center dataset.
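The Case 2 corruption can be simulated along these lines (our sketch; the exact noise generation in the paper follows [4] and [40] and may differ in detail):

```python
import numpy as np

def add_case2_noise(L, var=0.02, p=0.15, seed=0):
    # Zero-mean Gaussian noise with fixed variance, plus salt-and-pepper
    # corruption on a fraction p of the entries
    rng = np.random.default_rng(seed)
    Y = L + rng.normal(0.0, np.sqrt(var), L.shape)
    u = rng.random(L.shape)
    Y = np.where(u < p / 2, 0.0, Y)          # pepper
    Y = np.where(u > 1.0 - p / 2, 1.0, Y)    # salt
    return Y
```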
For Case 1, the results of applying the different denoising algorithms to the Washington DC Mall and the Pavia City Center datasets (Table 1) show that the proposed algorithm was superior to the other seven algorithms on every performance index; our method thus performed best and can remove Gaussian noise effectively. Table 2 shows values of the three image quality indicators for the different methods in Case 2. It can be seen from this table that the tensor-based methods achieved better results than K-SVD, BM3D, and BM4D, and that LRMR provided higher MPSNR, MSSIM, and MFSIM values than the first six algorithms, since it is a method based on low-rank matrix recovery. We can also see that the proposed algorithm was superior to the other algorithms. These results show that our proposed algorithm can remove a mixture of Gaussian noise and sparse noise effectively. Figures 3 and 4 show the results of the different algorithms in denoising the Washington DC Mall dataset in the two simulated cases, and Figures 5 and 6 show the corresponding results for the Pavia City Center dataset. The original band images in Figures 3 and 5 were contaminated with Gaussian noise (zero mean, variance 0.1). As can be seen from these figures, the K-SVD and BM3D methods lost some information because they did not take into account the differences between different pixels. The tensor-based LRTA and PARAFAC methods cannot remove Gaussian noise completely, and denoising was over-smoothed by the BM4D method. The image restored by TDL properly removes Gaussian noise while finely preserving the structure underlying the image. The proposed algorithm and LRMR achieved better results than the other algorithms, as images restored by these two methods preserve edge and detail information, showing that they can remove Gaussian noise effectively.
The original band images in Figures 4 and 6 were simultaneously contaminated with Gaussian noise (zero mean, variance 0.02) and sparse noise (percentage set to 0.15). From the two figures, it can be seen that the K-SVD, BM3D and BM4D methods almost failed to remove the noise; combined with the preceding results, it can be concluded that they cannot remove salt-and-pepper noise. The tensor-based LRTA and PARAFAC methods were capable of removing sparse noise, but some structural information was lost. The TDL method performs comparatively better at structure preservation than the LRTA and PARAFAC methods. Although LRMR obtains much better results than TDL, some impulse noise still exists in the image. The algorithm proposed in this paper performed better still, which shows that it can remove mixed noise effectively.

B. REAL DATA EXPERIMENTS
We used the Indian Pines dataset [40] to verify the proposed algorithm's performance. This dataset was acquired by the AVIRIS sensor in 1992. It consists of 145 × 145 pixels with 220 spectral bands. After removing water absorption bands 104-108, 150-163 and 220, the final dataset we used contained 200 bands (Fig. 7). The image was corrupted mainly by Gaussian noise and impulse noise. To verify the performance of our proposed algorithm, we selected typical original bands 3, 112, and 219. Band 3 had both Gaussian noise and stripe noise. Band 112 had larger gray values than band 3, while band 219 was seriously polluted by noise, with almost no usable information before noise reduction. Figures 8 to 10 show the results of denoising the three bands, respectively, using the eight different algorithms. As can be seen from Figures 8 and 9, the band-wise K-SVD and BM3D methods performed poorly because the difference between different pixels was not considered, especially in the case of high noise. The tensor-based LRTA method can effectively remove Gaussian noise but cannot remove stripes, while the PARAFAC method obtains better results than the first three algorithms, although with some over-smoothed areas. The BM4D method resulted in some loss of boundary information. TDL and LRMR preserve the structure underlying the image; however, the images restored by these two methods contain some stripes. The method proposed here obtained the best results, retaining the boundary and structure information of the image, which demonstrates that our method can deal with both Gaussian noise and sparse noise. As can be seen from Figure 10, the K-SVD, BM3D, LRTA, and BM4D methods failed to remove the noise, and after denoising the image had almost no usable information. The PARAFAC method achieved better results than those four methods, but heavy noise can still be observed. TDL preserves the structure of the image.
However, the images restored by TDL retain some sharp residual noise. The LRMR method obtains good results except in marginal areas, because its patch-based processing may lose some inter-dimensional information there. The method proposed here obtained the best denoising results, with the boundary information best preserved.

1) CHOICE OF PARAMETERS
We studied the influence of the parameters in the proposed algorithm, which include two regularization parameters λ and τ and one penalty parameter µ. The parameter λ constrains the sparse noise, and τ limits the Gaussian noise, so these parameters must be selected properly. To evaluate and analyze the optimal parameters, we used the Washington DC Mall dataset and the Pavia City Center dataset as examples in Case 2 (zero-mean Gaussian noise with variance 0.02, and sparse noise percentage 0.15), and used MPSNR as the evaluation measure. To obtain $\mathcal{L}_i$ accurately, the candidate values for the parameter λ in equation (11) were $\{10^{-4}, 5 \times 10^{-4}, 10^{-3}, 5 \times 10^{-3}, 10^{-2}, 5 \times 10^{-2}, 0.1, 0.5, 1\}$, and the candidates for τ were $\{10^{-4}, 5 \times 10^{-4}, 10^{-3}, 5 \times 10^{-3}, 10^{-2}, 5 \times 10^{-2}, 0.1, 0.5, 1, 1.5\}$. Figures 11 and 12 show the MPSNR values of the proposed algorithm as functions of the regularization parameters λ and τ on the Washington DC Mall dataset and the Pavia City Center dataset in Case 2. As can be seen from Figures 11 and 12, when the values of these two parameters were small, the value of MPSNR was almost constant. When τ changes from 0.005 to 1.5, the value of MPSNR first increases rapidly with the increase of λ, then decreases, and finally tends to be stable. In addition, the advantage of the proposed method becomes apparent when the noise level increases. As we can see, the best performance of the proposed algorithm was obtained with λ = 0.005 and τ = 0.05 for the Washington DC Mall dataset, and with λ = 0.01 and τ = 0.1 for the Pavia City Center dataset. We likewise set λ = 0.005 and τ = 0.05 for the Indian Pines dataset. The penalty parameter µ adopts the adaptive updating strategy $\mu^{k+1} \leftarrow \min(\eta \mu^k, \mu_{\max})$, where $\mu_{\max}$ represents the upper bound of µ. We empirically set η = 1.1, $\mu^0 = 10^{-3}$, and $\mu_{\max} = 10^{10}$.
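The parameter sweep described above amounts to a simple grid search; a generic sketch (ours, with a placeholder `evaluate` callable standing in for a full denoise-and-score run on a dataset):

```python
def grid_search(evaluate, lams, taus):
    # evaluate(lam, tau) -> MPSNR; return the (lam, tau, score) with highest score
    return max(((l, t, evaluate(l, t)) for l in lams for t in taus),
               key=lambda r: r[2])

# candidate grids from the text
lams = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 0.1, 0.5, 1]
taus = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 0.1, 0.5, 1, 1.5]
```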

2) COMPUTATIONAL TIME COMPARISON
To evaluate the proposed algorithm further, we compared its running time with those of the other seven algorithms. Table 3 reports the running times of applying the different methods to the Indian Pines dataset; it shows that the proposed method ran much faster than PARAFAC and BM4D, but slower than the other five methods. The latter result was mainly due to the use of non-local search and of the SVD in solving the low-rank problems, which slows the method down. We consider this cost a reasonable tradeoff for the best denoising performance achieved. In future work, we will optimize our algorithm to make it run faster.

V. CONCLUSIONS
In this paper, we proposed a tensor-based low-rank sparse prior denoising model. The main results of this study can be summarized as follows: (1) Hyperspectral data tensor was reconstructed by utilizing the non-local self-similarity property of hyperspectral data. The global correlation of the reconstructed data along the spectral direction and the prior information of non-local self-similarity along the space were well preserved.
(2) A non-local low-rank sparse model was constructed based on the low-rank property and the non-local self-similarity prior of hyperspectral data. The low-rank model mainly uses spectral information, while non-local self-similarity captures the spatial information of hyperspectral data. Therefore, our model fully considers both spectral and spatial information.
(3) The $\ell_0$ term was replaced by a non-convex log-sum term, which promotes sparsity more strongly than the $\ell_1$ regularization term.
(4) A tensor-based non-local low-rank denoising algorithm was designed using the alternating direction method of multipliers. Two experiments using simulated data and one experiment using real data showed that our proposed algorithm can remove Gaussian noise and sparse noise (including stripes, dead lines, and impulses) effectively, and better than the other denoising methods.
There is no doubt that the proposed algorithm still has room for improvement. In the future, we can try to incorporate more prior information into our proposed model to improve further the performance of the denoising method.
Furthermore, further improving the running speed of the proposed algorithm is our future research direction.