Local Image Denoising using RAISR

Digital images are frequently degraded by Gaussian noise during capture. This paper proposes a rapid and highly accurate Gaussian noise removal method that applies the learned linear filter used in RAISR for super-resolution. Denoising methods can be classified into local, nonlocal, and deep-learning-based methods. Conventional local processing loses high-frequency components of the original image while reducing the noise. Nonlocal and deep-learning-based methods achieve higher denoising performance but require long training and execution times. To solve these problems, we apply a super-resolution method to a local denoising method as post-processing, because super-resolution can efficiently recover high-frequency components. The super-resolution method uses learned linear filters selected according to the features of each patch. The novelty of this paper is that the same processing as in super-resolution is incorporated into denoising. The proposed algorithm is a rapid local denoising method and achieves performance comparable to highly accurate nonlocal denoising methods. Experimental results show that the proposed method provides accurate denoising at a low computational cost compared with nonlocal processing such as BM3D.

Image denoising is the reconstruction of an original image from noisy observations gathered by a digital camera sensor without affecting critical features such as edges, textures, and singularities in the image. During image acquisition and transmission, digital images are frequently contaminated by various types of noise. Common types of noise encountered in real-world digital images include Gaussian noise, impulse noise, Poisson noise, and speckle noise. Gaussian noise is statistical noise whose probability density function equals that of the normal distribution. It is usually caused by the thermal motion of electrons in the camera sensor while taking digital images. We focus on the removal of Gaussian noise, which has been widely researched. The goal of image denoising is to recover a clean image from a noisy measurement

$$ z = x + n, \qquad (1) $$

where $z$ is the observed signal, $x$ is the original signal, and $n$ is Gaussian noise. A common assumption is that $n$ is additive white Gaussian noise (AWGN) with zero mean and variance $\sigma^2$. Many algorithms for removing Gaussian noise have been studied over the past few decades. Classically, the mean (averaging) filter [1] oversmooths images when the noise level is high. Besides linear filtering, non-linear filters such as median filtering [1], [2], weighted median filtering [3], wavelet transform [4], and edge-preserving filtering, so-called bilateral filtering [5], are used to suppress Gaussian noise. Since denoising is one of the most classic and fundamental image processing tasks, there are far too many noise removal methods to review in this paper. Even though these methods can reduce noise, the edge regions of the image deteriorate because they operate locally.
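The observation model in Eq. (1) can be simulated directly. The following is a minimal sketch, assuming NumPy, a grayscale image stored as a float array, and no clipping to the valid intensity range; `add_awgn` is a hypothetical helper name, not part of the paper.

```python
import numpy as np


def add_awgn(x: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Simulate Eq. (1): z = x + n with n ~ N(0, sigma^2), i.i.d. per pixel."""
    rng = np.random.default_rng(seed)
    n = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # zero-mean white noise
    return x.astype(float) + n  # additive model, no clipping applied


# Example: a flat gray image corrupted at noise level sigma = 30.
x = np.full((256, 256), 128.0)
z = add_awgn(x, sigma=30.0)
```

The sample standard deviation of `z - x` should be close to 30, matching the assumed noise level.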
In contrast, nonlocal methods [6]-[13] estimate each pixel by extracting the target patch together with similar patches within a search range and processing them jointly. In nonlocal means (NLM) [6], a target pixel is obtained as a weighted average of the center pixel of the reference patch and the center pixels of similar patches found at different locations within the image. Block matching and 3D filtering (BM3D) [7] is well known as a state-of-the-art Gaussian noise removal method. The main idea of BM3D is to stack similar patches into 3-D groups by block matching. Hard-Thresholding and Wiener filtering are used to remove noise in the first and second steps, respectively. The denoised image is obtained by aggregating the inverse 3-D transformed patches. Learned simultaneous sparse coding (LSSC) [8], [9] is a nonlocal framework that combines nonlocal means and sparse coding with learned dictionaries to restore the image.
Nonlocally centralized sparse representation (NCSR) [10] recovers the original image from noisy measurements by suppressing the sparse coding noise, that is, the difference between the sparse codes of the observed image and those of the reference image. Weighted nuclear norm minimization (WNNM) [11] stacks vectorized similar patches from the noisy image into matrices and removes the noise through singular value decomposition. Because nonlocal methods gather similar patches from the whole image, their accuracy is significantly better than that of local methods. However, they are slow because they must search for similar patches across the image.
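To make the NLM weighting described above concrete, the following sketch estimates a single pixel. It is a simplified version under stated assumptions: the search is restricted to a small window rather than the whole image, the patch distance is the plain mean squared difference, and the parameters `patch`, `search`, and `h` are illustrative values, not those of [6]. The function name `nlm_pixel` is hypothetical.

```python
import numpy as np


def nlm_pixel(img: np.ndarray, i: int, j: int, patch: int = 3,
              search: int = 7, h: float = 10.0) -> float:
    """Estimate pixel (i, j) as a weighted average of the centers of patches
    that resemble the reference patch centered at (i, j)."""
    pr, sr = patch // 2, search // 2
    pad = np.pad(img.astype(float), pr + sr, mode="reflect")
    ci, cj = i + pr + sr, j + pr + sr                  # (i, j) in padded coordinates
    ref = pad[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
    num, den = 0.0, 0.0
    for di in range(-sr, sr + 1):
        for dj in range(-sr, sr + 1):
            qi, qj = ci + di, cj + dj                  # center of a candidate patch
            cand = pad[qi - pr:qi + pr + 1, qj - pr:qj + pr + 1]
            w = np.exp(-np.mean((ref - cand) ** 2) / h**2)  # similarity weight
            num += w * pad[qi, qj]
            den += w
    return num / den


rng = np.random.default_rng(0)
img = rng.random((16, 16)) * 255.0
estimate = nlm_pixel(img, 8, 8)
```

Patches that look like the reference patch receive weights close to 1, so they dominate the average; this is what distinguishes NLM from a purely spatial local filter.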
Recently, deep-learning-based methods such as DnCNN [14] … To the best of our knowledge, most local methods, including bilateral filtering and the Wiener filter, have low computational cost but poor denoising performance. In contrast, nonlocal methods such as NLM, BM3D, LSSC, NCSR, and WNNM, as well as learning-based methods such as DnCNN and FFDNet, achieve impressive denoising performance but at a high computational cost. Therefore, there is strong demand for methods that deliver excellent quantitative performance within a short processing time. To be practical for real applications on mobile devices, especially smartphones, we propose a rapid and accurate local denoising method for Gaussian noise in this paper.
The proposed method consists of two phases: noise reduction and reconstruction. Noise reduction is performed mainly by a joint bilateral filter guided by a cleaner reference image, which is produced by Hard-Thresholding followed by RAISR. In the reconstruction phase, a second RAISR stage improves the primary removal image produced by the noise reduction phase.
The four main contributions are as follows:
1) A Hard-Thresholding operation removes Gaussian noise from the noisy input image at a low computational cost.
2) RAISR is applied to the image denoised by Hard-Thresholding to compensate for the high-frequency components of the original image. RAISR is a highly accurate and fast super-resolution method that uses learned linear filters selected by local gradient statistics.
3) A joint bilateral filter is exploited to obtain a high-quality denoised image by local processing. It calculates its weights from a synthesized reference image. Since the output quality depends on this reference image, a reliable pre-estimated image must be obtained from the denoised image.
4) The final denoised image is obtained by a second RAISR stage that further improves performance.
The proposed method achieves accuracy comparable to nonlocal methods such as BM3D with a shorter runtime. The rest of the paper is organized as follows. Section II briefly reviews RAISR and describes its mathematical formulation. Section III presents the proposed method in detail. In Section IV, extensive experiments are conducted to evaluate the performance of our method both quantitatively and visually. Finally, Section V concludes the paper.

RAISR [19] is an efficient learning-based single-image super-resolution method. It is widely applicable to super-resolution tasks because it produces high-quality images with a fast execution time. The main idea of RAISR is to enhance image quality by applying pre-learned filters to image patches extracted from an initially upscaled low-resolution (LR) image. The filters are learned from pairs of LR patches and high-resolution (HR) pixels. The basic structure of RAISR for super-resolution is illustrated in Fig.1. The patches extracted from the LR image are 11 × 11, and the hash is calculated on the central 9 × 9 region of each patch. Each learned filter has the same size as the patches. The target HR pixel is estimated by applying the pre-learned filter selected by the hash to the LR patch, that is, by taking the inner product of the filter and the patch.
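The per-pixel filtering step can be sketched as follows. This is a minimal NumPy sketch, assuming the input has already been upscaled (for example by bicubic interpolation), that `filters` holds one flattened d × d filter per hash class, and that `hash_fn` is a hypothetical callable mapping a patch to its class index (in RAISR it would hash the central 9 × 9 region). The name `raisr_filter` is also hypothetical.

```python
import numpy as np


def raisr_filter(img: np.ndarray, filters: np.ndarray, hash_fn, d: int = 11) -> np.ndarray:
    """Apply per-class learned filters: each output pixel = <patch, filters[class]>."""
    r = d // 2
    padded = np.pad(img.astype(float), r, mode="reflect")  # keep output size = input size
    # One d x d patch per pixel, flattened to a row of length d*d.
    P = np.lib.stride_tricks.sliding_window_view(padded, (d, d)).reshape(-1, d * d)
    keys = np.array([hash_fn(p.reshape(d, d)) for p in P])  # hash class per pixel
    # Row-wise inner product between each patch and the filter of its class.
    return np.einsum("nk,nk->n", P, filters[keys]).reshape(img.shape)


# Example: a single class whose filter is a centered delta reproduces the input.
d = 3
delta = np.zeros((1, d * d))
delta[0, (d * d) // 2] = 1.0
img = np.arange(36, dtype=float).reshape(6, 6)
out = raisr_filter(img, delta, hash_fn=lambda p: 0, d=d)
```

Because every output pixel is a single inner product, the cost per pixel is fixed and no search over the image is needed, which is the source of RAISR's speed.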

A. HASH CALCULATION
While preserving the low complexity of linear filtering, the hashing approach can divide image patches into clusters without expensive clustering methods (e.g., K-means [20] or Gaussian mixture models (GMM) [21], [22]). The hash-table keys of each patch are computed by eigenanalysis [23]. For the k-th pixel, the local gradients are computed in its $\sqrt{n} \times \sqrt{n}$ neighborhood, whose pixels are located at $k_1, \ldots, k_n$. The $n \times 2$ gradient matrix $G_k$ is composed of the horizontal gradient $g_x$ and the vertical gradient $g_y$:

$$ G_k = \begin{bmatrix} g_x(k_1) & g_y(k_1) \\ \vdots & \vdots \\ g_x(k_n) & g_y(k_n) \end{bmatrix}. $$

The matrix $G_k^{\top} W_k G_k$ is then constructed with a diagonal weighting matrix $W_k$, built from a separable normalized Gaussian kernel, to incorporate a small neighborhood of gradient samples per pixel.
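From the eigendecomposition of $G_k^{\top} W_k G_k$, three gradient parameters are computed from the larger eigenvalue $\lambda_1^k$, the smaller eigenvalue $\lambda_2^k$, and their respective eigenvectors $\phi_1^k$ and $\phi_2^k$ as

$$ \lambda^k = \sqrt{\lambda_1^k}, \qquad \theta^k = \arctan\!\left(\frac{\phi_{1,y}^k}{\phi_{1,x}^k}\right), \qquad \mu^k = \frac{\sqrt{\lambda_1^k} - \sqrt{\lambda_2^k}}{\sqrt{\lambda_1^k} + \sqrt{\lambda_2^k}}, $$

where $\lambda^k$, $\theta^k$, and $\mu^k$ correspond to the strength, angle, and coherence, respectively. The three hash indices $\lambda$, $\theta$, and $\mu$ are obtained by quantizing these parameters:

$$ \lambda = \left\lceil \frac{\lambda^k}{Q_s} \right\rceil, \qquad \theta = \left\lceil \frac{\theta^k}{Q_\theta} \right\rceil, \qquad \mu = \left\lceil \frac{\mu^k}{Q_\mu} \right\rceil, $$

where $\lceil \cdot \rceil$ is the ceiling function, and $Q_s$, $Q_\theta$, and $Q_\mu$ are the quantization factors for strength, angle, and coherence, respectively. To learn the filters, the strength is quantized into 3 classes, the angle into 24 classes, and the coherence into 3 classes. Hence, the input patches are classified into 3 × 24 × 3 = 216 classes by the combination of these three quantized parameters.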
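The hash computation can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: gradients come from `np.gradient`, the diagonal of $W_k$ is a normalized Gaussian with σ = 2 (our choice), the angle is taken with `arctan2` and folded into [0, π), and the step sizes `Q_S` and `Q_MU` are hypothetical values that the text does not give. Out-of-range indices are clipped, and the three 1-based indices are combined into one class id in [0, 216). The function names are hypothetical.

```python
import numpy as np

N_ANGLE, N_STRENGTH, N_COHERENCE = 24, 3, 3
Q_THETA = np.pi / N_ANGLE  # 24 angle bins over [0, pi)
Q_S = 0.01                 # hypothetical strength step (image scaled to [0, 1])
Q_MU = 1.0 / 3             # hypothetical coherence step (mu lies in [0, 1])


def gaussian_weights(size: int, sigma: float = 2.0) -> np.ndarray:
    """Separable normalized Gaussian kernel, flattened: the diagonal of W_k."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    w = np.outer(g, g)
    return (w / w.sum()).ravel()


def hash_patch(patch: np.ndarray) -> int:
    """Map a square patch (e.g. the central 9 x 9 region) to one of 216 classes."""
    gy, gx = np.gradient(patch.astype(float))       # vertical, horizontal gradients
    G = np.stack([gx.ravel(), gy.ravel()], axis=1)  # n x 2 gradient matrix G_k
    w = gaussian_weights(patch.shape[0])
    GtWG = G.T @ (w[:, None] * G)                   # G_k^T W_k G_k (2 x 2)
    evals, evecs = np.linalg.eigh(GtWG)             # eigenvalues in ascending order
    l2, l1 = np.maximum(evals, 0.0)                 # l1 >= l2, clamp rounding noise
    phi1 = evecs[:, 1]                              # eigenvector of the larger eigenvalue
    theta = np.arctan2(phi1[1], phi1[0]) % np.pi    # angle folded into [0, pi)
    s1, s2 = np.sqrt(l1), np.sqrt(l2)
    strength = s1
    coherence = (s1 - s2) / (s1 + s2) if s1 + s2 > 0 else 0.0
    # Ceiling quantization, clipped to valid 1-based bin ranges.
    t_idx = int(np.clip(np.ceil(theta / Q_THETA), 1, N_ANGLE))
    s_idx = int(np.clip(np.ceil(strength / Q_S), 1, N_STRENGTH))
    m_idx = int(np.clip(np.ceil(coherence / Q_MU), 1, N_COHERENCE))
    # Combine the three indices into a single 0-based class id.
    return ((t_idx - 1) * N_STRENGTH + (s_idx - 1)) * N_COHERENCE + (m_idx - 1)
```

A vertical step edge and a horizontal step edge have the same strength and coherence but orthogonal dominant gradients, so they fall into different angle bins and therefore different classes.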

B. FILTER LEARNING
The filters $h$ are learned from a training set of image pairs consisting of upscaled LR images $y_i \in \mathbb{R}^{M \times N}$ and HR images $x_i \in \mathbb{R}^{M \times N}$, $i = 1, \ldots, L$, where $L$ is the number of images in the training set. A filter is learned for each hash class determined by the hash table by solving the least-squares minimization problem

$$ \min_{h} \sum_{i=1}^{L} \lVert A_i h - b_i \rVert_2^2, \qquad (7) $$

where $h$ is the filter in vector notation with size $d^2 \times 1$, $A_i$ is the matrix composed of the $d \times d$ patches of that class extracted from image $y_i$, and $b_i$ is the vector composed of the pixels of image $x_i$ at the center coordinates of the $y_i$ patches, as shown in Fig.1. Since this is a linear least-squares problem, it can be solved efficiently [19].
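Eq. (7) is solved independently for each hash class. The sketch below, assuming NumPy and a hypothetical `hash_fn`, accumulates the normal-equation terms $A^{\top}A$ and $A^{\top}b$ per class across all training pairs (so individual patches need not be stored) and then solves each class with `np.linalg.lstsq`, which also returns a solution for classes that received too few patches. The function names are hypothetical.

```python
import numpy as np


def extract_patches(img: np.ndarray, d: int) -> np.ndarray:
    """One flattened d x d patch per pixel (reflect padding), shape (H*W, d*d)."""
    r = d // 2
    padded = np.pad(img.astype(float), r, mode="reflect")
    return np.lib.stride_tricks.sliding_window_view(padded, (d, d)).reshape(-1, d * d)


def learn_filters(pairs, hash_fn, n_classes: int, d: int = 11) -> np.ndarray:
    """Solve min_h sum_i ||A_i h - b_i||^2 (Eq. (7)) separately for every hash class.

    pairs: iterable of (y_i, x_i), with y_i the filter input and x_i the ground truth.
    Returns an array of shape (n_classes, d*d), one flattened filter per class.
    """
    Q = np.zeros((n_classes, d * d, d * d))  # accumulated A^T A per class
    V = np.zeros((n_classes, d * d))         # accumulated A^T b per class
    for y, x in pairs:
        A = extract_patches(y, d)            # rows: patches of y_i
        b = x.astype(float).ravel()          # targets: pixels of x_i at patch centers
        keys = np.array([hash_fn(p.reshape(d, d)) for p in A])
        for c in np.unique(keys):
            sel = keys == c
            Q[c] += A[sel].T @ A[sel]
            V[c] += A[sel].T @ b[sel]
    # lstsq still returns a least-squares solution when Q[c] is singular.
    return np.stack([np.linalg.lstsq(Q[c], V[c], rcond=None)[0] for c in range(n_classes)])


# Example: when the target equals the input, the learned filter is a centered delta.
rng = np.random.default_rng(0)
y = rng.random((32, 32))
h = learn_filters([(y, y)], hash_fn=lambda p: 0, n_classes=1, d=3)
```

Accumulating the small $d^2 \times d^2$ systems keeps memory independent of the number of training images, which matters when 191 images are used.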

III. PROPOSED METHOD
The proposed method consists of two steps: the noise reduction phase and the reconstruction phase. Noise reduction is performed mainly by a joint bilateral filter guided by a cleaner reference image, which is produced by Hard-Thresholding and RAISR. In the reconstruction phase, another RAISR stage improves the primary removal image. A different set of linear filters is learned in RAISR for each phase.

A. NOISE REDUCTION PHASE
The first step of the proposed method mainly consists of three parts: Hard-Threshold processing, RAISR 1, and joint bilateral filtering [24], as illustrated in Fig.2. Each part is explained in detail below.

1) Hard-Threshold process
Hard-Thresholding aims to remove the Gaussian noise from the input noisy image $Z$. We conduct the Hard-Threshold operation in the frequency domain to preserve image details. Although it removes noise with a short processing time, high-frequency information is lost. The smoothed image is calculated as follows:

$$ B = T^{\top}\, \Gamma_{\lambda_{thr}}\!\left( T Z T^{\top} \right) T, $$

where $T$ is the $M \times M$ 1D-DCT transform matrix, $Z$ is the input image, $B$ is the Hard-Thresholded image, $\lambda_{thr}$ is the threshold value, and $\Gamma_{\lambda_{thr}}(\cdot)$ sets to zero every transform coefficient whose magnitude is below $\lambda_{thr}$.
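The Hard-Threshold step can be sketched as below, assuming $B = T^{\top} \Gamma(T Z T^{\top}) T$ applied to the whole image with separable orthonormal DCT-II matrices (one per image dimension, so non-square images also work) and that coefficients with magnitude below $\lambda_{thr}$ are zeroed. The function names are hypothetical. For the setting $\lambda_{thr} = 1.08 \times 10^{-2} \times \sigma$ from Section IV-A, we assume pixel values scaled to [0, 1] and σ given on the 0-255 scale; the text does not state this explicitly.

```python
import numpy as np


def dct_matrix(M: int) -> np.ndarray:
    """Orthonormal M x M DCT-II matrix T, so that T @ T.T == I."""
    n = np.arange(M)
    k = n[:, None]
    T = np.sqrt(2.0 / M) * np.cos(np.pi * (2 * n + 1) * k / (2 * M))
    T[0, :] = np.sqrt(1.0 / M)  # the DC row uses a different normalization
    return T


def hard_threshold_dct(Z: np.ndarray, lam_thr: float) -> np.ndarray:
    """B = T^T Gamma(T Z T^T) T: zero small DCT coefficients, then invert."""
    Tm, Tn = dct_matrix(Z.shape[0]), dct_matrix(Z.shape[1])
    C = Tm @ Z.astype(float) @ Tn.T  # separable 2D DCT
    C[np.abs(C) < lam_thr] = 0.0     # hard thresholding
    return Tm.T @ C @ Tn             # inverse 2D DCT


# Example (assumed scaling: image in [0, 1], sigma on the 0-255 scale).
sigma = 30.0
lam_thr = 1.08e-2 * sigma
rng = np.random.default_rng(0)
Z = np.clip(0.5 + rng.normal(0.0, sigma / 255.0, size=(64, 64)), 0.0, 1.0)
B = hard_threshold_dct(Z, lam_thr)
```

Because the transform is orthonormal, zeroing coefficients can only remove energy, so `B` is smoother than `Z`; the same operation also discards fine image detail, which is what RAISR 1 is meant to restore.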

2) RAISR
Since preserving image details is desirable, the high-frequency components lost by Hard-Thresholding are compensated by RAISR 1.
RAISR processing with 216 learned filters is applied to the Hard-Thresholded image to obtain the image $G_1$. In the training phase of RAISR 1, the input image is the image denoised by Hard-Threshold processing, and the target image is the ground truth. At inference, RAISR 1 generates the pre-estimated image $G_1$ by recovering the high-frequency components missing after Hard-Threshold processing.
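RAISR 1 is trained on pairs whose input is the Hard-Thresholded noisy image and whose target is the ground truth. A minimal sketch of building such pairs, assuming noise is synthesized with Eq. (1) at a fixed σ and that `denoise_fn` is a hypothetical callable standing in for the Hard-Threshold step; `make_raisr1_pairs` is also a hypothetical name. The pairs have the (input, target) form consumed by a per-class least-squares fit.

```python
import numpy as np


def make_raisr1_pairs(clean_images, sigma: float, denoise_fn, seed: int = 0):
    """Return (input, target) pairs for training RAISR 1.

    input  = denoise_fn(x + n): the noisy image after Hard-Threshold processing
    target = x: the ground truth
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for x in clean_images:
        x = x.astype(float)
        z = x + rng.normal(0.0, sigma, size=x.shape)  # synthetic AWGN, Eq. (1)
        pairs.append((denoise_fn(z), x))
    return pairs


# Example with a trivial stand-in denoiser; a real pipeline would pass the
# Hard-Threshold function here instead.
clean = [np.zeros((16, 16)), np.ones((8, 8))]
pairs = make_raisr1_pairs(clean, sigma=0.1, denoise_fn=lambda z: z)
```

Training on the Hard-Thresholded output, rather than on the noisy image, lets the filters specialize in restoring exactly the detail that the thresholding removes.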

3) Joint bilateral filter
VOLUME 3, 2020
Noise reduction is performed mainly by the joint bilateral filter. Joint bilateral filtering can produce high-quality denoised images when a cleaner reference image is used as the guide [25]. To this end, the pre-estimated image $G_1$ produced by RAISR 1 is further cleaned by reducing its low-frequency noise and then used as the guide for the joint bilateral filter. The target pixel $\hat{g}_i$ is estimated by a joint bilateral filter whose weights account for both the spatial difference, through $G_s$, and the luminance difference, through $G_r$:

$$ \hat{g}_i = \frac{\sum_{j} G_s(u_{ij})\, G_r(q_i - q_j)\, z_j}{\sum_{j} G_s(u_{ij})\, G_r(q_i - q_j)}, $$

where $u_{ij}$ is the Euclidean distance between positions $i$ and $j$, $z$ is the input patch, $q$ is the guide image, position $i$ is the center of the target patch, and $j$ runs over the patch. $G_s$ and $G_r$ are Gaussian functions for image smoothing. In the proposed joint bilateral filter, a guide image is produced for the luminance difference term. The cleaner guide image is synthesized as

$$ q = M \odot G_1 + (\mathbf{1} - M) \odot G_2, $$

where $G_2$ is obtained by applying a Gaussian filter to $G_1$ to further reduce the low-frequency noise, $\odot$ denotes the element-wise product, and $\mathbf{1}$ is the all-ones matrix. $M$ is a binary matrix whose entries are $m_i = 1$ in edge regions and $m_i = 0$ in smooth regions; it is obtained by applying a Sobel filter to $G_1$. Finally, the primary removal image $\hat{G}$ is obtained by the joint bilateral filter in the first step of the proposed method.
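The guide synthesis and the joint bilateral filter can be sketched as follows, assuming NumPy, images scaled to [0, 1], a guide formed as $q = M \odot G_1 + (\mathbf{1} - M) \odot G_2$ (keeping $G_1$ on edges and the blurred $G_2$ elsewhere), and illustrative parameter values (window radius, σ_s, σ_r, blur width, Sobel threshold) that are not reported here. All function names are hypothetical.

```python
import numpy as np


def _gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian filter with reflect padding (output size = input size)."""
    ax = np.arange(-radius, radius + 1)
    k = np.exp(-ax**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img.astype(float), radius, mode="reflect")
    tmp = np.apply_along_axis(np.convolve, 1, pad, k, mode="valid")  # rows
    return np.apply_along_axis(np.convolve, 0, tmp, k, mode="valid")  # columns


def _sobel_edge_mask(img, thresh):
    """Binary M: 1 where the Sobel gradient magnitude exceeds thresh, else 0."""
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    win = np.lib.stride_tricks.sliding_window_view(np.pad(img, 1, mode="reflect"), (3, 3))
    gx = np.einsum("ijkl,kl->ij", win, kx)
    gy = np.einsum("ijkl,kl->ij", win, kx.T)
    return (np.hypot(gx, gy) > thresh).astype(float)


def synthesize_guide(G1, edge_thresh=0.1, blur_sigma=1.0):
    """q = M * G1 + (1 - M) * G2, with G2 a Gaussian-blurred copy of G1."""
    G2 = _gaussian_blur(G1, sigma=blur_sigma)
    M = _sobel_edge_mask(G1, edge_thresh)
    return M * G1 + (1.0 - M) * G2


def joint_bilateral(z, q, radius=3, sigma_s=2.0, sigma_r=0.1):
    """g_i = sum_j Gs(u_ij) Gr(q_i - q_j) z_j / sum_j Gs(u_ij) Gr(q_i - q_j)."""
    H, W = z.shape
    zp = np.pad(z.astype(float), radius, mode="reflect")
    qp = np.pad(q.astype(float), radius, mode="reflect")
    num, den = np.zeros((H, W)), np.zeros((H, W))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            zs = zp[radius + dy:radius + dy + H, radius + dx:radius + dx + W]  # z_j
            qs = qp[radius + dy:radius + dy + H, radius + dx:radius + dx + W]  # q_j
            w = np.exp(-(dy**2 + dx**2) / (2 * sigma_s**2))     # spatial term Gs
            w = w * np.exp(-((q - qs) ** 2) / (2 * sigma_r**2))  # range term Gr on the guide
            num += w * zs
            den += w
    return num / den


rng = np.random.default_rng(0)
G1 = rng.random((32, 32))                        # stand-in for the RAISR 1 output
z = G1 + rng.normal(0.0, 0.05, size=G1.shape)    # stand-in noisy input
q = synthesize_guide(G1)
G_hat = joint_bilateral(z, q)
```

The range weights are computed from the guide `q`, not from the noisy input `z`, so a cleaner guide yields weights that follow edges without being disturbed by noise.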

B. RECONSTRUCTION PHASE
Hard-Threshold processing cannot suppress the noise components in the low-frequency domain. Therefore, the pre-estimated image $G_1$ may still contain some low-frequency noise. In RAISR 2, we use not only the primary removal image $\hat{G}$ but also the pre-estimated image $G_1$ as reference images to build the feature vector. We then apply another learned filter to this feature vector, which consists of the vectorized patches from the image $G_1$ after RAISR 1 and from the primary removal image $\hat{G}$. The details of the second step are shown in Fig. 3. In RAISR 2, the hash is calculated only from the primary removal image $\hat{G}$, because it yields a more accurate estimate of the image structure. The number of classes in RAISR 2 is the same as in RAISR 1; the difference from RAISR 1 lies in how the filters are learned. To achieve higher denoising performance, the filters for RAISR 2 are learned from the concatenated patches extracted from the pre-estimated image $G_1$ and the primary removal image $\hat{G}$, paired with the corresponding ground-truth pixels. Therefore, $h$ in Eq. (7) is the filter in vector notation with size $2d^2 \times 1$, and $A_i$ is the matrix composed of pairs of $d \times d$ patches extracted from $G_1$ and $\hat{G}$. The learned filters can recover the high-frequency components removed in the first step. Moreover, because the feature vector includes $\hat{G}$, it benefits from the low-frequency noise reduction performed by the joint bilateral filter. Therefore, we can reconstruct the denoised image without degrading the edges of the primary removal image.
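The RAISR 2 feature construction can be sketched as follows: patches from $G_1$ and $\hat{G}$ are concatenated into a feature vector of length $2d^2$, the class is chosen by hashing the $\hat{G}$ patch only, and each output pixel is the inner product with that class's filter. This is a minimal NumPy sketch; `hash_fn` is a hypothetical placeholder for the hash, and the function names are hypothetical. Training would solve the same per-class least-squares problem as Eq. (7), with these concatenated rows forming $A_i$.

```python
import numpy as np


def _patches(img: np.ndarray, d: int) -> np.ndarray:
    """One flattened d x d patch per pixel (reflect padding), shape (H*W, d*d)."""
    r = d // 2
    padded = np.pad(img.astype(float), r, mode="reflect")
    return np.lib.stride_tricks.sliding_window_view(padded, (d, d)).reshape(-1, d * d)


def raisr2_apply(G1, G_hat, filters, hash_fn, d=11):
    """Apply RAISR 2 filters of length 2*d*d to concatenated [G1 | G_hat] patches.

    The hash class is computed from the G_hat patch only.
    """
    P1, P2 = _patches(G1, d), _patches(G_hat, d)
    A = np.concatenate([P1, P2], axis=1)                     # feature vectors, length 2*d*d
    keys = np.array([hash_fn(p.reshape(d, d)) for p in P2])  # hash from G_hat only
    return np.einsum("nk,nk->n", A, filters[keys]).reshape(G1.shape)


# Example: a filter that selects the center pixel of the G_hat half returns G_hat.
d = 3
f = np.zeros((1, 2 * d * d))
f[0, d * d + (d * d) // 2] = 1.0
rng = np.random.default_rng(0)
G1, G_hat = rng.random((8, 8)), rng.random((8, 8))
out = raisr2_apply(G1, G_hat, f, hash_fn=lambda p: 0, d=d)
```

Doubling the feature length lets each learned filter weigh the sharper but noisier $G_1$ against the cleaner $\hat{G}$ pixel by pixel, per structural class.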

IV. EXPERIMENTAL RESULTS
We compare our proposed Gaussian noise removal method with several image denoising methods, including bilateral filter [5], NLM [6], BM3D [7], WNNM [11], and DnCNN [14]. The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used as quantitative metrics for performance evaluation. All experiments in this paper are run on a 2.2 GHz Intel Core i7 processor with 8 GB of 1600 MHz DDR3 memory using MATLAB (R2018b).
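For reference, PSNR can be computed as below, assuming 8-bit intensities (peak value 255). The exact PSNR implementation used in the experiments is not stated, and SSIM is omitted here because it involves additional window and constant parameters.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak**2 / mse))


x = np.zeros((4, 4))
print(round(psnr(x, x + 1.0), 2))  # MSE = 1, so PSNR = 20*log10(255), about 48.13 dB
```

Because PSNR is logarithmic in the MSE, the differences of about 0.4 to 0.7 dB reported against WNNM and DnCNN correspond to MSE ratios of roughly 1.1 to 1.2.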

A. PARAMETER SETTINGS
The parameters of the proposed method are set as follows. The RAISR filter size is 11 × 11, and the hash key parameters are computed on the central 9 × 9 region of each patch. These parameters are the same as in the original RAISR. The threshold is set to $\lambda_{thr} = 1.08 \times 10^{-2} \times \sigma$ for both steps of the proposed method. We use 191 training images, consisting of Yang et al.'s Set91 [26] and General100 [27], to train the filters in RAISR 1 and RAISR 2. The implementations of the compared methods were downloaded from the corresponding authors' websites, and we use their default parameters in our experiments.
[Figure panels: (d) WNNM [11], (e) DnCNN [14], (f) proposed]

B. QUANTITATIVE AND VISUAL EVALUATION
Tab.1 and Tab.2 present the quantitative comparison of the proposed method with other noise reduction methods in terms of PSNR and SSIM, respectively. Experiments are performed on 12 widely used test images corrupted by Gaussian noise at levels σ = 10, 30, and 50. The best PSNR and SSIM values are marked in boldface. As can be observed, the proposed method provides better PSNR and SSIM values than the local bilateral filter [5] for all test images at all noise levels. Compared with NLM [6], the proposed method achieves higher average PSNR and SSIM values at all noise levels, although it underperforms NLM on images such as Starfish, Parrot, and House at noise level σ = 30. The denoising performance of our method is comparable to that of the benchmark BM3D [7] on most images, especially the Butterfly and Man images. However, the nonlocal method WNNM [11] outperforms our method by 0.52 dB, 0.44 dB, and 0.4 dB on average at noise levels σ = 10, 30, and 50, respectively. Similarly, the PSNR of our method is approximately 0.7 dB lower than that of the learning-based method DnCNN [14] at all noise levels. These results show that the proposed method achieves denoising performance comparable to that of computationally expensive methods such as BM3D, WNNM, and DnCNN.
The processing time of the proposed method is compared with those of other denoising methods, including bilateral filter [5], NLM [6], BM3D [7], WNNM [11], and DnCNN [14], in Tab.3. We conduct the experiments separately on 256 × 256 and 512 × 512 images at Gaussian noise level σ = 50. For both image sizes, our method runs much faster than BM3D and the other learning-based methods because it is local, i.e., it processes only the target patch. In contrast, nonlocal methods such as BM3D incur huge computational costs because they have to search the whole image for patches similar to the target patch. In addition, our method requires neither the self-exemplars used by WNNM nor the huge number of parameters of DnCNN.
Besides the quantitative measurements, the visual quality of the Lena image corrupted by noise level σ = 30 is illustrated in Fig.4 to compare the proposed method with other noise removal methods. The cropped region is highlighted by a black box so that image features can be distinguished clearly. As with BM3D and WNNM, the proposed method restores some fine details well, while producing sharper edges than these methods. Compared with DnCNN, our method produces some artifacts in flat areas, although the edges are well recovered. Fig.5 shows visual results on the Butterfly image with Gaussian noise level σ = 50. The proposed method reduces noise in smooth areas more effectively than BM3D. In addition, strong edges are preserved as well as by WNNM and DnCNN, owing to the RAISR filters in the proposed method. Visual results on an image from the Kodak dataset corrupted by Gaussian noise level σ = 30 and an image from the BSD68 dataset with noise level σ = 50 are illustrated in Fig.6 and Fig.7, respectively. The zoomed-in region of each image is shown in a rectangular box. In the Kodak image, the proposed method not only removes the noise but also restores some image details (e.g., the letters on the hat), although flat regions are over-smoothed. In the BSD68 image, the cropped region is well preserved by the proposed method because the RAISR filters compensate for the high-frequency components.

D. VALIDATION STUDY OF DENOISING RESULTS
Tab.5 presents the denoising performance on 12 widely used test images at noise levels σ = 10, 30, and 50 to analyze the intermediate results individually. We compare the image $G_1$ obtained after Hard-Thresholding and RAISR 1, the primary removal image $\hat{G}$, and the final output. The best PSNR value for each image at each noise level is shown in bold. The final output achieves the highest PSNR for almost all test images at all three noise levels, except for the Butterfly image at σ = 10 and the Montage image at σ = 30 and 50. This is because the RAISR 2 filters in the second step cannot remove the noise remaining in the flat areas of the Butterfly and Montage images. Additionally, the final output has the best average PSNR over all test images at all noise levels. Therefore, by repeating RAISR-based compensation in both steps, our method achieves denoising results comparable to those of nonlocal denoising methods.

V. CONCLUSION
This paper proposes a Gaussian noise removal method with a rapid processing speed that applies the learned linear filters used in RAISR. The proposed method has two main steps. In each step, individually learned filters compensate for the high-frequency components lost in the preceding processing. As a result, we achieve quantitative performance and perceptual quality comparable to nonlocal methods such as BM3D and to learning-based methods, with a faster processing speed. In future work, we will extend the method to RGB color image denoising and to images with unknown noise levels by using noise level estimation methods. Extending the method to real-world image denoising is also an important direction.
[Figure panels: (c) BM3D [7], (d) WNNM [11], (e) Proposed]