Weighted t-Schatten-p Norm Minimization for Real Color Image Denoising

In this paper, to fully exploit the spatial and spectral correlation information, we present a new real color image denoising scheme using tensor Schatten-$p$ norm (t-Schatten-$p$ norm) minimization based on t-SVD to recover the underlying low-rank tensor. Similar to the matrix Schatten-$p$ norm, non-convex t-Schatten-$p$ ($0 < p < 1$) norm minimization could obtain better results than tensor nuclear norm minimization, which is a convex relaxation of the non-convex tensor tubal rank. To avoid over-shrinking the tensor tubal rank components, a flexible weighted t-Schatten-$p$ norm model is proposed with weights assigned to different elements of tensor singular tubes. We adopt the generalized iterated shrinkage algorithm to solve the minimization problem efficiently. Extensive experiments on one synthetic and two realistic datasets demonstrate, both quantitatively and qualitatively, the effectiveness of our proposed method in removing noise.


I. INTRODUCTION
As a low-level image processing technique, image denoising is crucial to high-level computer vision tasks such as segmentation [32], [45], feature extraction and classification. Since noise is ubiquitous in real-world data, a number of image denoising methods have been proposed to date. Most of the existing methods can be roughly categorized into total variation (TV) regularized methods [1], [30], [34], sparse representation methods [7], [8], [10], [11], [36], [37], [39]-[41], nonlocal self-similarity prior based methods [2], [7]-[9], [12], [35], [38]-[41], low-rank regularized methods [9], [12], [21], [35], [38], generative learning methods [23], [24], [29], [47] and discriminative learning methods [3], [6], [13], [42]. For color image denoising, one of the representative works is color BM3D (CBM3D) [7], which first transforms the color image into a less correlated luminance-chrominance space and then applies BM3D in each channel of the transformed space. Xu et al. [36]-[38] perform denoising on the RGB channels simultaneously to make better use of the spectral correlation. However, such matrix-based works break the intrinsic structure of the color image and inevitably suffer from loss of structural information and global correlation among color channels. Tensors provide an intuitive and faithful way to represent the structural properties of multidimensional data such as real color images. As a generalization of the matrix, a tensor can be viewed as a multidimensional array. Real-world color images can be naturally regarded as third-order tensors with column, row and color modes. There is growing interest in recovering a low-rank tensor from a degraded observation, since low-rank tensors can better characterize the local and nonlocal structures of grouped multidimensional data [27], [33], [46].

The associate editor coordinating the review of this manuscript and approving it for publication was Michele Nappi.
Apart from the above model-based methods, discriminative learning based denoising methods have been proposed following the advent of deep learning. Although deep learning [13], [42], [43] is powerful at extracting features, it is difficult to train an accurate network when the training sets are insufficient.
Low-rank tensor recovery is an effective way to exploit low-dimensional structure in high-dimensional data. However, the tensor rank has several definitions, based on different tensor decomposition frameworks, and these definitions lack tight convex relaxations. There are basically three types of tensor factorizations: CANDECOMP/PARAFAC (CP) factorization [4], Tucker factorization [31] and tensor singular value decomposition (t-SVD) [16]. Specifically, given an $N$th-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the CP decomposition model is $\mathcal{A}_{cp} = \sum_{r=1}^{R} u_r^{(1)} \circ u_r^{(2)} \circ \cdots \circ u_r^{(N)}$, where $\circ$ is the outer product, $u_r^{(n)} \in \mathbb{R}^{I_n}$ is the $r$th mode-$n$ factor, and $R$ is the number of rank-one terms. According to [14], determining the CP rank is NP-hard and its convex relaxation is ill-posed. The Tucker model decomposes the $N$th-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ as $\mathcal{A}_{tc} = \mathcal{Z} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}$, where $\times_n$ denotes the mode-$n$ matrix product, $\mathcal{Z}$ is the core tensor, and $U^{(n)} \in \mathbb{R}^{I_n \times R_n}$ is the mode-$n$ factor matrix. The Tucker rank is defined as a vector, $\mathrm{rank}_{tc}(\mathcal{A}) := (\mathrm{rank}(A_{(1)}), \mathrm{rank}(A_{(2)}), \cdots, \mathrm{rank}(A_{(N)}))$, where $A_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$ is the mode-$n$ unfolding (matricization) of $\mathcal{A}$, and $U^{(n)}$ is composed of an orthogonal basis of $A_{(n)}$. Motivated by the fact that the nuclear norm is the convex relaxation of the matrix rank, some methods [19] adopt the sum of matrix nuclear norms (SNN), i.e., $\sum_i \|A_{(i)}\|_*$, as a convex surrogate of the Tucker rank. Although the Tucker rank and its convex relaxation are tractable and widely used, the direct unfolding and folding operations tend to damage the intrinsic structure information of the tensor, and SNN is not the convex envelope of the Tucker rank according to [28]. Based on the tensor-tensor product (t-product), the t-SVD model decomposes a third-order tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ as $\mathcal{Y} = \mathcal{U} * \mathcal{S} * \mathcal{V}^*$, similar to the matrix SVD. The tubal rank is defined as the number of nonzero singular tubes of $\mathcal{S}$ [15], [44].
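To make the Tucker rank and its SNN surrogate concrete, the following NumPy sketch computes the mode-$n$ unfolding, the Tucker rank vector and the sum of nuclear norms of a small tensor. It is only an illustration: the function names are hypothetical, and the column ordering of the unfolding is an assumption that may differ from the convention of [31], although it does not affect the rank or the nuclear norm.

```python
import numpy as np

def unfold_mode(a, n):
    """Mode-n unfolding: bring axis n to the front and flatten the remaining axes."""
    return np.moveaxis(a, n, 0).reshape(a.shape[n], -1)

def tucker_rank(a, tol=1e-8):
    """Tucker rank: the vector of matrix ranks of all mode-n unfoldings."""
    return tuple(int(np.linalg.matrix_rank(unfold_mode(a, n), tol=tol))
                 for n in range(a.ndim))

def sum_of_nuclear_norms(a):
    """SNN surrogate: sum of the nuclear norms of all mode-n unfoldings."""
    return sum(np.linalg.norm(unfold_mode(a, n), ord="nuc")
               for n in range(a.ndim))

# A rank-one tensor built as an outer product of three vectors.
u1, u2, u3 = np.array([1.0, 2.0]), np.array([3.0, 0.0, 4.0]), np.ones(4)
rank_one = np.einsum("i,j,k->ijk", u1, u2, u3)
print(tucker_rank(rank_one))
print(sum_of_nuclear_norms(rank_one))
```

For a rank-one tensor every unfolding has rank one, so the Tucker rank is a vector of ones and the SNN equals the order of the tensor times the product of the factor norms.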
To recover a low tubal rank tensor, tensor nuclear norm (TNN) minimization has achieved competitive performance in [22], [44], since t-SVD based methods are better at exploiting the intrinsic structure and correlation information than the matricization operation of the Tucker model. The tensor nuclear norm is defined as the sum of the matrix nuclear norms of all frontal slices of the tensor in the Fourier domain. Since matrix nuclear norm minimization tends to over-shrink the rank components and treats each rank component equally, tensor nuclear norm minimization has similar restrictions, and weighted tensor nuclear norm minimization is introduced in [20].
Inspired by the fact that non-convex $\ell_p$-norm minimization ($0 < p < 1$) could get better results than convex $\ell_1$-norm minimization [35], we propose the weighted t-Schatten-$p$ norm minimization model to denoise real color images, which is flexible in dealing with different rank components. In [35], WSNM has demonstrated superior denoising performance on grayscale images. However, such matrix-based works have to restructure the three-dimensional data into a matrix, which leads to information loss for a real color image. In this paper, we treat a real color image as a third-order tensor with each color channel corresponding to a frontal slice, and introduce low tubal rank minimization to exploit the spatial and spectral information by utilizing the nonlocal self-similarity and the DFT (Discrete Fourier Transform) along the color channel of the image. We first search similar patches for each local patch to form the nonlocal patch group tensor (NPGT) $\mathcal{Y}$. Then we adopt the weighted t-Schatten-$p$ norm as the low tubal rank tensor regularization and solve the minimization problem via the generalized iterated shrinkage algorithm. In summary, our contribution is three-fold. Firstly, we propose a weighted t-Schatten-$p$ norm minimization for real color image denoising. Our method employs the low tubal rank tensor prior to model the spatial nonlocal self-similarity and spectral correlation information of real color images. The weighted t-Schatten-$p$ norm is introduced as the surrogate functional of the low tubal rank constraint to avoid over-shrinking the tensor tubal rank components. Secondly, we obtain the optimal closed-form solution via the generalized iterated shrinkage algorithm. Finally, extensive experiments on both synthetic and realistic noisy image datasets validate that the proposed method could produce competitive performance compared to the state-of-the-art methods.
The rest of this paper is organized as follows. Section II briefly introduces some notations and preliminaries. Sections III and IV give the proposed weighted t-Schatten-$p$ norm minimization model and the implementation details of the weighted t-Schatten-$p$ norm minimization for real color image denoising. The experimental results, comparisons with the state-of-the-art algorithms and convergence analysis are reported in Section V. Section VI concludes this paper.

II. NOTATIONS AND PRELIMINARIES
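Throughout the paper, we use lowercase letters, bold lowercase letters, bold uppercase letters and calligraphic script letters to denote scalars, vectors, matrices and tensors respectively. For a third-order tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, its $i$-th horizontal, lateral and frontal slices are denoted as $\mathcal{Y}(i, :, :)$, $\mathcal{Y}(:, i, :)$ and $\mathcal{Y}(:, :, i)$. $\mathcal{Y}(i, j, :)$ and $\mathcal{Y}_{ijk}$ are the $(i, j)$-th tube and $(i, j, k)$-th entry of tensor $\mathcal{Y}$ respectively. The $i$-th $I_1 \times I_2$ frontal slice of $\mathcal{Y}$ is denoted as $Y^{(i)}$ for brevity. The Frobenius norm of tensor $\mathcal{Y}$ is defined as $\|\mathcal{Y}\|_F = \sqrt{\sum_{ijk} |y_{ijk}|^2}$. The identity tensor $\mathcal{I} \in \mathbb{R}^{I \times I \times I_3}$ is the tensor whose first frontal slice is the $I \times I$ identity matrix and whose other slices are zero matrices. We denote by $\lceil a \rceil$ the nearest integer greater than or equal to $a$. We use $\bar{\mathcal{Y}}$ to denote the result of the Discrete Fourier Transform of $\mathcal{Y}$ along the third dimension.

Definition 1 (t-Product [16]): Let $\mathcal{A} \in \mathbb{R}^{I_1 \times r \times I_3}$ and $\mathcal{B} \in \mathbb{R}^{r \times I_2 \times I_3}$. The t-product $\mathcal{A} * \mathcal{B}$ is the $I_1 \times I_2 \times I_3$ tensor defined as
$$\mathcal{A} * \mathcal{B} = \mathrm{fold}(\mathrm{bcirc}(\mathcal{A}) \cdot \mathrm{unfold}(\mathcal{B})),$$
where $\mathrm{bcirc}(\mathcal{A})$ is the block circulant matrix formed from the frontal slices of $\mathcal{A}$, $\mathrm{unfold}(\mathcal{A}) = [A^{(1)}; A^{(2)}; \cdots; A^{(I_3)}]$, and $\mathrm{fold}(\mathrm{unfold}(\mathcal{A})) = \mathcal{A}$.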
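$\bar{A}$ denotes the block diagonal matrix whose diagonal blocks are the frontal slices of $\bar{\mathcal{A}}$, that is, $\bar{A} = \mathrm{diag}(\bar{A}^{(1)}, \bar{A}^{(2)}, \cdots, \bar{A}^{(I_3)})$. According to [22], the t-product is equivalent to the familiar matrix multiplication in the Fourier domain, i.e., $\mathcal{Y} = \mathcal{A} * \mathcal{B}$ is equivalent to $\bar{Y} = \bar{A}\bar{B}$, and the property $\|\mathcal{Y}\|_F^2 = \frac{1}{I_3}\|\bar{Y}\|_F^2$ holds.

Definition 2 (t-SVD [16], [22]): A tensor $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ can be factorized as $\mathcal{Y} = \mathcal{U} * \mathcal{S} * \mathcal{V}^*$, where $\mathcal{U} \in \mathbb{R}^{I_1 \times r \times I_3}$ and $\mathcal{V} \in \mathbb{R}^{I_2 \times r \times I_3}$ are orthogonal tensors, $\mathcal{S} \in \mathbb{R}^{r \times r \times I_3}$ is a tensor whose frontal slices are diagonal matrices, and $\mathcal{V}^*$ is the conjugate transpose of $\mathcal{V}$, obtained by conjugate transposing each frontal slice of $\mathcal{V}$ and then reversing the order of the transposed frontal slices 2 through $I_3$. The t-SVD computation is detailed in Algorithm 1.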
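Because the t-product reduces to slice-wise matrix products in the Fourier domain, both the t-product and the t-SVD can be sketched in a few lines of NumPy. The code below is a minimal illustration for real-valued third-order tensors and is not a transcription of Algorithm 1; the function names are hypothetical, and the use of conjugate symmetry to fill the second half of the Fourier slices is an implementation choice assumed here so that the factors come back real.

```python
import numpy as np

def t_product(a, b):
    """t-product: multiply matching frontal slices in the Fourier domain."""
    a_f = np.fft.fft(a, axis=2)
    b_f = np.fft.fft(b, axis=2)
    y_f = np.einsum("irk,rjk->ijk", a_f, b_f)
    return np.real(np.fft.ifft(y_f, axis=2))

def t_transpose(a):
    """Transpose every frontal slice, then reverse the order of slices 2..I3."""
    at = np.transpose(a, (1, 0, 2))
    return np.concatenate([at[:, :, :1], at[:, :, :0:-1]], axis=2)

def t_svd(y):
    """Skinny t-SVD of a real tensor: y = u * s * t_transpose(v)."""
    i1, i2, i3 = y.shape
    r = min(i1, i2)
    y_f = np.fft.fft(y, axis=2)
    u_f = np.zeros((i1, r, i3), dtype=complex)
    s_f = np.zeros((r, r, i3), dtype=complex)
    v_f = np.zeros((i2, r, i3), dtype=complex)
    for k in range(i3 // 2 + 1):
        slice_k = y_f[:, :, k]
        if k == 0 or 2 * k == i3:
            slice_k = slice_k.real  # these Fourier slices are real for real input
        u, s, vh = np.linalg.svd(slice_k, full_matrices=False)
        u_f[:, :, k] = u
        s_f[:, :, k] = np.diag(s)
        v_f[:, :, k] = vh.conj().T
    for k in range(i3 // 2 + 1, i3):
        # Remaining slices follow from conjugate symmetry of the DFT of real data.
        u_f[:, :, k] = u_f[:, :, i3 - k].conj()
        s_f[:, :, k] = s_f[:, :, i3 - k]
        v_f[:, :, k] = v_f[:, :, i3 - k].conj()
    to_real = lambda t: np.real(np.fft.ifft(t, axis=2))
    return to_real(u_f), to_real(s_f), to_real(v_f)

rng = np.random.default_rng(0)
y = rng.standard_normal((5, 4, 3))
u, s, v = t_svd(y)
recon = t_product(t_product(u, s), t_transpose(v))
print(np.allclose(recon, y))
```

The same Fourier-domain view gives a quick numerical check of the norm identity stated above: the squared Frobenius norm of the tensor equals the squared Frobenius norm of its Fourier slices divided by $I_3$.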

Definition 3 (Tensor Multi-Rank and Tubal Rank
Definition 4 (Tensor Schatten-$p$ Norm [17]): The tensor Schatten-$p$ norm (t-Schatten-$p$ norm), denoted by $\|\mathcal{Y}\|_{S_p}$, is defined to be

IV. APPLYING T-Schatten-p NORM MINIMIZATION TO REAL COLOR IMAGE DENOISING
We treat a real color image as a third-order tensor with each color channel corresponding to a frontal slice, and introduce weighted t-Schatten-$p$ norm minimization to real color image denoising, exploiting the spatial and spectral information of the image. For an observed noisy color image $\mathcal{Y} \in \mathbb{R}^{H \times W \times C}$, we separate it into a set of overlapping image patches of size $p \times p \times C$, where $H$ and $W$ are the spatial height and width, $C = 3$ is the number of color channels, $p$ is the patch size and $M = (H - p + 1) \times (W - p + 1)$ is the total number of patches over the whole image. For each local patch, we search its $N$ similar patches over the whole image to form a 4D group of $N$ patches $\mathcal{G} \in \mathbb{R}^{p \times p \times C \times N}$. Then we reshape $\mathcal{G}$ into a third-order tensor $\mathcal{X}_l \in \mathbb{R}^{p^2 \times N \times C}$ by vectorising its spatial frontal slices, so that both the nonlocal similarity and the spectral correlation information are well preserved by this representation.
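The grouping step can be sketched as follows. This is a minimal illustration under stated assumptions: patch similarity is measured by the squared Euclidean distance over all color channels, the search covers every patch position in the image, and the function name `build_patch_group` and its `stride` parameter are hypothetical. The text above does not specify the similarity measure or any search acceleration.

```python
import numpy as np

def build_patch_group(img, top, left, p=8, n_similar=90, stride=1):
    """Return the (p*p, N, C) group tensor for the reference patch at (top, left)."""
    img = np.asarray(img, dtype=np.float64)
    h, w, c = img.shape
    ref = img[top:top + p, left:left + p, :]
    candidates, dists = [], []
    for i in range(0, h - p + 1, stride):
        for j in range(0, w - p + 1, stride):
            patch = img[i:i + p, j:j + p, :]
            candidates.append(patch)
            # Assumed similarity measure: squared Euclidean distance.
            dists.append(np.sum((patch - ref) ** 2))
    order = np.argsort(dists, kind="stable")[:n_similar]
    group = np.stack([candidates[k] for k in order], axis=3)  # p x p x C x N
    # Vectorise the spatial dimensions: (p, p, C, N) -> (p*p, C, N) -> (p*p, N, C).
    return group.reshape(p * p, c, -1).transpose(0, 2, 1)

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(16, 16, 3))
group = build_patch_group(image, top=2, left=3, p=4, n_similar=5)
print(group.shape)
```

With `stride=1` the reference patch is always among the candidates, so it appears as the first lateral slice of the group; each frontal slice of the result holds one color channel of all grouped patches.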
To estimate the underlying clean nonlocal similar patch group tensor X l from its noisy observation Y l , we solve the following optimization problem.
Following the analysis in Section III, each frontal slice of $\mathcal{X}_l$ could be computed by solving the minimization problem (12), where $\sigma_n^2$ is the noise variance, the first term in (12) represents the data conformity term, and the second term is the weighted t-Schatten-$p$ norm acting as the low-rank regularization. Similar to [35], the diagonal elements of $W^{(k)}$ are set empirically from a constant $c$, the number of similar patches $N$, the $i$-th singular value $\delta_i$ of $\bar{X}^{(k)}$, and a small positive number $\varepsilon$ that avoids division by zero. $\delta_i$ can be initialized from $s_i$, a diagonal entry of $S^{(k)}$. The optimization problem (11) could be solved via Algorithm 3. The underlying clean color image $\hat{\mathcal{X}}$ could be recovered by aggregating all the estimated $\hat{\mathcal{X}}_l$. To restore the clean image iteratively, we utilize the iterative regularization method in [9], [26] to enhance the denoising performance. The central idea of the iterative regularization is to add the filtered residual back to the denoised image.
Following [9], this update can be written as $\mathcal{Y}^{t} = \hat{\mathcal{X}}^{t-1} + \alpha(\mathcal{Y} - \hat{\mathcal{X}}^{t-1})$, where $t$ is the iteration number and $\alpha$ is a relaxation parameter. The proposed t-Schatten-$p$ norm minimization for real color image denoising is detailed in Algorithm 4. As the iterations progress, the estimated standard deviation of the noise decreases monotonically and the underlying clean image is progressively recovered until convergence.
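The shrinkage applied to the singular values can be illustrated with the generalized soft-thresholding (GST) operator that underlies the generalized iterated shrinkage algorithm. The sketch below assumes the scalar subproblem $\min_x \frac{1}{2}(x - y)^2 + w|x|^p$ for each singular value, uses one weight vector for all frontal slices, and does not reproduce the weight rule or the scaling of problem (12); the function names and the fixed number of inner iterations are hypothetical choices.

```python
import numpy as np

def gst(y, w, p, n_iter=10):
    """Approximately solve min_x 0.5*(x - y)**2 + w*|x|**p elementwise, 0 < p <= 1."""
    y = np.asarray(y, dtype=float)
    w = np.broadcast_to(np.asarray(w, dtype=float), y.shape)
    base = 2.0 * w * (1.0 - p)
    with np.errstate(divide="ignore", invalid="ignore"):
        tau = base ** (1.0 / (2.0 - p)) + w * p * base ** ((p - 1.0) / (2.0 - p))
    tau = np.where(w > 0, tau, 0.0)  # zero weight means no shrinkage
    mag = np.abs(y)
    keep = mag > tau                 # entries at or below the threshold are set to zero
    x = np.where(keep, mag, 0.0)
    for _ in range(n_iter):
        # Fixed-point iteration x <- |y| - w*p*x**(p-1) on the surviving entries.
        x_safe = np.where(keep, x, 1.0)
        x = np.where(keep, mag - w * p * x_safe ** (p - 1.0), 0.0)
    return np.sign(y) * x

def weighted_schatten_p_shrink(y, weights, p):
    """Apply GST to the singular values of every Fourier-domain frontal slice."""
    y_f = np.fft.fft(y, axis=2)
    x_f = np.empty_like(y_f)
    for k in range(y.shape[2]):
        u, s, vh = np.linalg.svd(y_f[:, :, k], full_matrices=False)
        x_f[:, :, k] = (u * gst(s, weights, p)) @ vh
    return np.real(np.fft.ifft(x_f, axis=2))

print(gst([3.0, 0.5, -2.0], 1.0, 1.0))  # p = 1 reduces to soft-thresholding
```

For $p = 1$ the operator reduces to ordinary soft-thresholding, which is the shrinkage used by nuclear norm minimization; for $p < 1$ large singular values are shrunk less, which is the motivation given above for the non-convex model.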

Algorithm 4 The Proposed Algorithm for Real Color Image Denoising
Input: real color image $\mathcal{Y}$
Output: denoised image $\hat{\mathcal{X}}$
Initialization: $\hat{\mathcal{X}}^0 = \mathcal{Y}$
for $t = 1$ to $T$ do
    Extract local patches $\{\mathcal{Y}_l^t\}_{1 \le l \le L}$ from $\mathcal{Y}^t$;
    for each patch $\mathcal{Y}_l^t$ do
        Search nonlocal similar patches to form $\mathcal{Y}_l^t$;
        Obtain the estimated $\hat{\mathcal{X}}_l^t$ via the weighted t-Schatten-$p$ norm minimization.
    end for
    Aggregate $\hat{\mathcal{X}}_l^t$ to form the color image $\hat{\mathcal{X}}^t$.
end for

V. EXPERIMENTAL RESULTS
In this section, we first evaluate the denoising performance of the proposed method on one synthetic and two public realistic image datasets, and then we compare it with nine representative methods, including color block-matching 3D filtering (CBM3D) [7], multi-channel weighted nuclear norm minimization (MCWNNM) [38], guided image denoising (GID) [36], trilateral weighted sparse coding (TWSC) [37], multi-channel weighted Schatten-$p$ norm minimization (MCWSNM), which combines WSNM in [35] and MCWNNM in [38] for color image denoising, color multispectral t-SVD (CMSt-SVD) [18], denoising convolutional neural networks (DnCNN) [42], fast and flexible denoising network (FFDNet) [43] and weighted tensor nuclear norm minimization (WTNNM) [20]. All the denoising results of the compared methods are obtained with the source code released on the authors' websites, using fine-tuned parameters. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indices are adopted as objective evaluation criteria. All the experiments are implemented in Matlab 2017a on a PC with an Intel Xeon E5-2620 2.10GHz CPU, 256GB RAM and an Nvidia Titan X GPU.
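For reference, the PSNR criterion can be computed as in the following sketch. It assumes 8-bit intensities with a peak value of 255 and a mean squared error taken jointly over all pixels and color channels; the exact evaluation protocol is not stated above, so these choices and the function name are assumptions. SSIM is not covered here.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a clean and a denoised image."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)  # averaged over pixels and channels (assumed)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.full((4, 4, 3), 100.0)
print(psnr(clean, clean + 1.0))
```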

A. EXPERIMENTAL RESULTS ON SYNTHETIC DATASET
The Kodak dataset (http://r0k.us/graphics/kodak/) which includes 24 lossless color images is utilized in our simulated experiments. The simulated noisy images are generated by adding AWGN with σ r = 40, σ g = 20, σ b = 20 to each color channel independently.
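The synthetic degradation described above can be reproduced along the following lines. This is a sketch under assumptions: intensities are taken on the 0 to 255 scale, the noisy image is neither clipped nor quantized, and the function name and the fixed random seed are hypothetical.

```python
import numpy as np

def add_channelwise_awgn(img, sigmas=(40.0, 20.0, 20.0), seed=0):
    """Add independent zero-mean Gaussian noise with a different sigma per channel."""
    rng = np.random.default_rng(seed)
    img = np.asarray(img, dtype=np.float64)
    # sigmas broadcasts over the last (color) axis of an H x W x 3 image.
    noise = rng.standard_normal(img.shape) * np.asarray(sigmas, dtype=np.float64)
    return img + noise

clean = np.full((256, 256, 3), 128.0)
noisy = add_channelwise_awgn(clean)
print((noisy - clean).std(axis=(0, 1)))
```

The printed per-channel standard deviations should be close to the requested values for the R, G and B channels.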
We empirically set the patch size to $8 \times 8$ and the number of nonlocal similar patches to $N = 90$. The involved parameters $\alpha$, $c$ and $p$ are set as 0.1, $2\sqrt{2}$ and 0.95 respectively. Table 1 lists the PSNR and SSIM results of all comparative denoising methods, and the best results are highlighted in bold. It can be seen that our method has higher average PSNR and SSIM than the other methods. Specifically, our proposed method brings an average PSNR gain of 0.56dB, 0.37dB, 0.18dB, 0.48dB, 0.27dB, 0.63dB, 0.44dB, 0.23dB and 0.09dB over CBM3D, MCWNNM, GID, MCWSNM, TWSC, CMSt-SVD, DnCNN, FFDNet and WTNNM respectively. We show the denoised images in Figure 1 to further depict the denoising performance. It can be observed from the red box that TWSC tends to over-smooth the image details, while CBM3D, MCWNNM, GID, MCWSNM, CMSt-SVD, DnCNN, FFDNet and WTNNM are likely to generate some undesirable color artifacts. The proposed t-Schatten-$p$ norm minimization method could recover fine details and suppress undesirable color artifacts, providing competitive and visually pleasant denoised images.
When the standard deviations of the AWGN are σ r = 75, σ g = 60, σ b = 60 for R, G, B color channels, the results of competing denoising methods for Kodak dataset are listed in Table 2 and our proposed method has higher average PSNR and SSIM than other denoising methods. It can be seen that our proposed method could produce competitive performance consistently on different noise levels compared with other competing methods.

B. EXPERIMENTAL RESULTS ON REAL-WORLD DATASETS
We adopt two publicly available realistic image datasets to evaluate our proposed method. The CC15 dataset includes 15 pairs of noisy and ground-truth images (of size $512 \times 512 \times 3$) of 11 static scenes, provided in [25] as shown in Figure 2. The CC60 dataset comprises 60 pairs of images of size $500 \times 500 \times 3$, cropped in [36]. The images in the CC15 and CC60 datasets are from different shots. For the synthetic noisy image dataset, the standard deviation of the noise is given as a parameter. For the real color image datasets, however, the noise in each color channel is assumed to be Gaussian and its level is estimated by the noise estimation method in [5].
The PSNR and SSIM results on the CC15 and CC60 datasets are tabulated in Tables 3 and 4, and the best results are highlighted in bold. In Table 3, the PSNR and SSIM results on CC15 are categorized into five groups based on different camera settings, and each group comprises three different scenes numbered as 1, 2, and 3. The average PSNR and SSIM results of the competing methods over the whole datasets (CC15 and CC60) are also listed in Tables 3 and 4. It can be seen that the proposed method achieves better average PSNR and SSIM results over both datasets than the other competing methods.
The visual performance of the comparative denoising methods is shown in Figure 3 and Figure 4. Figure 3 shows the denoised results of different methods on a real noisy image with rich fur texture from the CC15 dataset, captured by a Nikon D800 with ISO=6400. It can be observed that our proposed method is effective in removing noise as well as preserving the fur texture, while the other competing methods leave some noise in their denoised images. In comparison, our proposed method removes the noise without introducing noticeable artifacts. The denoising results on the realistic image from the CC60 dataset in Figure 4 demonstrate that CBM3D, MCWNNM, GID, TWSC, CMSt-SVD and FFDNet are likely to generate some color artifacts, while WTNNM and our proposed method are effective in removing noise. Overall, our proposed method yields visually pleasant results.

C. CONVERGENCE ANALYSIS
Since the proposed t-Schatten-$p$ norm minimization model is non-convex, it is difficult to prove the model's global convergence theoretically. Here we provide quantitative results to demonstrate the convergence of the proposed method. Figure 5 shows the curves of PSNR versus iteration number for two images from the Kodak dataset. It can be observed that the PSNR values gradually increase and then flatten as the iterations proceed, which empirically illustrates the convergence of our proposed method.

D. COMPARISON OF COMPLEXITY ANALYSIS
We assume that the size of the extracted patch is $p \times p$, the number of similar patches within a group is $N$, and the number of patch groups is $M$. The main computation cost of the proposed method concentrates on operations such as the FFT. According to [18], the time complexities of CBM3D and CMSt-SVD are $O([p^2\log(p) + p^2\log(N)]NM)$ and $O([p^3 + p^4]NM)$ respectively. The computation costs of MCWNNM in [38] and TWSC in [37] are $O([p^4N + N^3]MT)$ and $O([\max(p^6, p^2N)]MT)$ respectively. Thus, our proposed method is less efficient than CBM3D and CMSt-SVD. The running times of different methods are listed in Table 4 and the fastest result is highlighted in bold. Although DnCNN and FFDNet are much faster than the other methods thanks to GPU computation, they take about 1 day to train specific models on a GPU. It should be noted that CBM3D and CMSt-SVD are implemented with compiled C++ mex-functions and parallel computation, so their computational time is much lower than that of other competing methods such as MCWNNM, TWSC, WTNNM and the proposed method, which are implemented in Matlab. Although the proposed method is computationally more expensive, it delivers better quantitative denoising results with finer details and fewer artifacts.

VI. CONCLUSION
In this paper, we treat a real color image as a third-order tensor with each color channel corresponding to a frontal slice, and introduce low tubal rank minimization to exploit the spatial and spectral information of the image. We propose a weighted t-Schatten-$p$ norm minimization for real color image denoising. The proposed method is flexible enough to treat different rank components differently, so the denoised image is closer to the latent low-rank tensor. Extensive experiments on one synthetic and two realistic image datasets demonstrate that our proposed method could produce competitive performance compared to other state-of-the-art methods for real color image denoising.