
SECTION I

## INTRODUCTION

During acquisition and transmission, images are inevitably contaminated by noise. As an essential step that improves the accuracy of subsequent processing, image denoising is highly desirable for numerous applications, such as visual enhancement, feature extraction, and object recognition [1], [2].

The purpose of denoising is to reconstruct the original image from its noisy observation as accurately as possible, while preserving important detail features such as edges and textures in the denoised image. To achieve this goal, over the past several decades, image denoising has been extensively studied in the signal processing community, and numerous denoising techniques have been proposed in the literature. In general, denoising algorithms can be roughly classified into three categories: 1) spatial domain methods; 2) transform domain methods; and 3) hybrid methods [3], [4]. The first class utilizes the spatial correlation of pixels to smooth the noisy image, the second one exploits the sparsity of representation coefficients of the signal to distinguish the signal and noise, and the third one takes advantage of spatial correlation and sparse representation to suppress noise.

Spatial domain methods, also called spatial filters, estimate each pixel of the image by performing a weighted average of its local/nonlocal neighbors, in which the weights are determined by pixel similarities and higher weights are given to more similar pixels. Accordingly, spatial filters can be further divided into local filters and nonlocal filters. Smith and Brady [5] proposed a structure-preserving local filter called SUSAN, which uses the intensity distance as a quantitative measure of the similarity between pixels. Tomasi and Manduchi [6] proposed bilateral filtering by generalizing the SUSAN filter, in which both the intensity and spatial distances are used to measure the similarity between pixels. Although these local filters are effective for preserving edges, they do not perform well when the noise level is high, because severe noise destroys the correlations of pixels within local regions [7]. To overcome this disadvantage of local filters, Buades et al. [8] proposed the nonlocal means (NLM) filter, which estimates each pixel by a nonlocal averaging of all the pixels in the image. The weight assigned to a pixel is based on the Euclidean distance between the patch centered around the pixel being denoised and the one centered around a given neighboring pixel. In essence, NLM exploits the structural redundancy, namely, the self-similarity inherent in natural images, to estimate each pixel. NLM can be considered an extension of the bilateral filter in which pointwise photometric distances are replaced with patch distances. Several variants of NLM have been proposed to improve the adaptivity of the nonlocal filter [9], [10]. Talebi et al. [3] proposed spatially adaptive iterative filtering (SAIF) to improve the performance of NLM. Recently, there has been growing interest in exploiting the self-similarity of images to suppress noise.
Chatterjee and Milanfar [11], [12] proposed a patch-based locally optimal Wiener (PLOW) filter, which also exploits structural redundancy for image denoising and achieves near-optimal performance in the minimum mean-squared error (MMSE) sense. Zhang et al. [13] proposed a two-direction nonlocal (TDNL) variational model for image denoising using the horizontal and vertical similarities in the matrix formed by similar image patches. SAIF, PLOW, and TDNL are currently considered the state of the art among spatial domain denoising methods.

Transform domain methods assume that the image can be sparsely represented by some representation basis, such as the wavelet basis and its directional extensions. Due to the sparsity of the representation coefficients, noise is uniformly spread throughout the coefficients in the transform domain, while most of the image information is concentrated in the few largest ones. Therefore, noise can be effectively removed by different coefficient shrinkage strategies, including BayesShrink [14], ProbShrink [15], BiShrink [16], MultiShrink [17], and SUREShrink [18], [19]. Despite its remarkable success in dealing with point and line singularities, the fixed wavelet transform fails to provide an adaptive sparse representation for images containing complex singularities. To overcome the problems caused by fixed transforms, Aharon et al. [20] proposed an adaptive representation technique using K-means and singular value decomposition (called K-SVD), which uses a greedy algorithm to learn an overcomplete dictionary for image representation and denoising. Under the assumption that each image patch can be represented by the learned dictionary, Elad and Aharon [21] proposed a K-SVD based denoising algorithm, in which each image patch is expressed as a linear combination of a few atoms of the dictionary. Although the dictionary-based methods are more robust to noise, they are computationally expensive.

Spatial-based filters and transform-based filters have achieved great success in image denoising. Their overall performance, however, does not generally surpass that of hybrid methods. The most well-known hybrid method for image denoising, owing to its impressive performance, is block-matching and 3-D (BM3D) filtering [22], which groups similar patches into 3-D arrays and processes these arrays by sparse collaborative filtering. To the best of the authors’ knowledge, it was the first method to utilize both nonlocal self-similarity and sparsity for image denoising. However, the fixed 3-D transform cannot deliver a sparse representation for image patches containing edges, singularities, or textures; thus, BM3D may introduce visual artifacts. Dabov et al. [23] proposed an improved BM3D filter (called BM3D-SAPCA) that exploits adaptive-shape patches and principal component analysis (PCA). Although BM3D-SAPCA achieves state-of-the-art denoising results, its computational cost is very high (Table III). Zhang et al. [24] proposed an adaptive image denoising scheme using PCA with local pixel grouping (LPG-PCA). This method uses block matching to group the pixels with similar local structures, transforms each group of pixels using a locally learned PCA basis, and shrinks the PCA transform coefficients using the linear MMSE estimation technique. Both LPG-PCA and BM3D-SAPCA use the PCA basis to represent image patches. A key difference between them is that LPG-PCA applies PCA on 2-D groups of fixed-size image patches, while BM3D-SAPCA applies PCA on 3-D groups of adaptive-shape image patches. He et al. [25] presented an adaptive hybrid method called ASVD, which uses SVD to learn a local basis for representing image patches. Another SVD-based denoising method is spatially adaptive iterative singular-value thresholding (SAIST) [26].
This method uses SVD as a sparse representation of image patches and reduces noise in images by iteratively shrinking the singular values with BayesShrink. BM3D-SAPCA and SAIST are considered to be the current state of the art in image denoising.

Table III Comparison of the Computational Time and the Implementation Language of Different Denoising Methods

In this paper, we propose a simple and efficient denoising method that combines patch grouping with SVD. The proposed method first groups image patches by a classification algorithm to form groups of similar patches. Each group of similar patches is then estimated by the low-rank approximation (LRA) in SVD domain. The denoised image is finally obtained by aggregating all processed patches. The SVD is a very suitable tool for estimating each group because it provides the optimal energy compaction in the least square sense [27]. This implies that we can achieve a good estimate of the group by keeping only the few largest singular values and their corresponding singular vectors. While ASVD uses SVD to learn a set of local bases for representing image patches and SAIST uses SVD as a sparse representation of image patches, the proposed method exploits the optimal energy compaction property of SVD to obtain an LRA of image patches. Experiments indicate that the proposed method achieves highly competitive performance in visual quality, with a lower computational cost than most existing state-of-the-art denoising algorithms.

The rest of this paper is organized as follows. In Section II, we briefly review image representation tools for the sake of completeness. We present the proposed algorithm in detail in Section III, which fuses the nonlocal self-similarity and the LRA using patch clustering and SVD. In Section IV, we report the experimental results of our method to validate its efficacy and compare it with the state-of-the-art methods. In Section V, we discuss the differences between our method and other state-of-the-art methods. Finally, we conclude this paper with some possible future work in Section VI.

SECTION II

## LINEAR IMAGE REPRESENTATION

Let $\mathbf {X}$ be a grayscale image. The basic principle of linear image representation is that the signal of interest can be decomposed into a weighted sum of a given representation basis. Thus, $\mathbf {X}$ can be represented as $$\mathbf {X}=\sum _{i=1}^{N}a_{i}\phi _{i}$$ where $a_{i} (i=1,\ldots ,N)$ are the representation coefficients of the image $\mathbf {X}$ in terms of the basis functions $\phi _{i} (i=1,\ldots ,N)$. $\phi _{i}$ can either be chosen as a prespecified basis, such as wavelets [28], curvelets [29], contourlets [30], shearlets [31], and other directional bases, or designed by adapting its content to fit a given set of images. In general, an adaptive basis performs better than a prespecified one.

Aharon et al. [20] proposed a learning method to obtain an adaptive basis (also called a dictionary). This method extracts all the $\sqrt {m}\times \sqrt {m}$ patches from the image $\mathbf {X}$ to form a data matrix $\mathbf {S}=(\mathbf {s}_{1},\mathbf {s}_{2},\ldots ,\mathbf {s}_{n})\in \mathcal {R}^{m\times n}$, where $m$ is the number of pixels in each patch, $\mathbf {s}_{i} (i=1,\ldots ,n)$ are the image patches ordered as columns of $\mathbf {S}$, and $n$ is the number of patches. The dictionary is then learned by solving the following optimization problem: $$\min _{\boldsymbol{\Phi },\mathbf {A}}\sum _{i=1}^{n}\|\mathbf {s}_{i}-\boldsymbol{\Phi }\mathbf {a}_{i}\|_{2}^{2} \quad {\rm s.t.} \quad \|\mathbf {a}_{i}\|_{0}\leq \beta$$ where $\boldsymbol{\Phi }\in \mathcal {R}^{m\times p}$ is the dictionary of $p$ column atoms, $\mathbf {A}=(\mathbf {a}_{1},\mathbf {a}_{2},\ldots ,\mathbf {a}_{n})\in \mathcal {R}^{p\times n}$ is a matrix of coefficients, $\beta$ indicates the desired sparsity level of the solution, and the notation $\|\mathbf {a}_{i}\|_{0}$ stands for the number of nonzero entries in $\mathbf {a}_{i}$. Based on the learned dictionary $\boldsymbol{\Phi }$, $\mathbf {S}$ can be represented as $$\mathbf {S}=\mathbf {\Phi A}.$$
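The sparse-coding half of this problem, i.e., solving for $\mathbf {a}_{i}$ with $\boldsymbol{\Phi }$ fixed, is typically handled by a greedy pursuit. The following minimal NumPy sketch illustrates one such pursuit (orthogonal matching pursuit) on a toy orthonormal dictionary; the function name and toy data are ours for illustration, and the dictionary-update half of K-SVD is omitted.

```python
import numpy as np

def omp(Phi, s, beta):
    """A minimal orthogonal-matching-pursuit sketch of the sparse-coding
    step: greedily select at most beta atoms of Phi and refit the
    coefficients by least squares."""
    residual, support = s.astype(float), []
    a = np.zeros(Phi.shape[1])
    for _ in range(beta):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated atom
        if j not in support:
            support.append(j)
        # refit coefficients on the current support
        coef, *_ = np.linalg.lstsq(Phi[:, support], s, rcond=None)
        residual = s - Phi[:, support] @ coef
    a[support] = coef
    return a

# toy check with an orthonormal dictionary and a 2-sparse signal
Phi = np.eye(5)
s = np.array([3.0, 0.0, 2.0, 0.0, 0.0])
print(omp(Phi, s, beta=2))  # recovers the two nonzero coefficients
```

With a well-conditioned dictionary and a sufficiently sparse signal, the pursuit recovers the coefficients exactly; real dictionaries are overcomplete ($p>m$), which this toy example does not capture.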

Another method for image representation with adaptive basis selection is PCA [32], which determines the basis from the covariance statistics of the data matrix $\mathbf {S}$. The principal components transform of $\mathbf {S}$ is calculated as [33] $$\mathbf {A}=\boldsymbol{\Phi }^{T}(\mathbf {S}-E(\mathbf {S}))$$ with $\boldsymbol{\Phi }$ defined by $$\mathbf {\Omega _{S}}=\boldsymbol{\Phi }\boldsymbol{\Lambda }\boldsymbol{\Phi }^{T}$$ where $E(\mathbf {S})$ is the matrix of mean vectors, $\mathbf {\Omega _{S}}$ is the covariance matrix of $\mathbf {S}$, $\boldsymbol{\Phi }$ is the eigenvector matrix, and $\boldsymbol{\Lambda }=\textrm {diag}(\lambda _{1},\ldots ,\lambda _{m})$ is the diagonal eigenvalue matrix with $$\lambda _{1}\geq \lambda _{2}\geq \cdots \geq \lambda _{m}.$$ It can easily be shown that the covariance matrix $\mathbf {\Omega _{A}}$ of the matrix $\mathbf {A}$ equals $$\mathbf {\Omega _{A}}=\boldsymbol{\Phi }^{T}\mathbf {\Omega _{S}}\boldsymbol{\Phi }=\boldsymbol{\Lambda }$$ which implies that the entries of $\mathbf {A}$ are uncorrelated. This property of PCA can be used to distinguish between signal and noise, because the energy of noise is generally spread over all transform coefficients, while the energy of a signal is concentrated in a small number of coefficients.
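The decorrelation property above can be verified numerically. The sketch below (illustrative NumPy code with toy data of our choosing) builds a correlated data matrix, computes the PCA transform $\mathbf {A}=\boldsymbol{\Phi }^{T}(\mathbf {S}-E(\mathbf {S}))$, and checks that the covariance of $\mathbf {A}$ is numerically diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 500
# correlated data: patches as columns of S
S = rng.standard_normal((m, m)) @ rng.standard_normal((m, n))

mean = S.mean(axis=1, keepdims=True)
cov_S = (S - mean) @ (S - mean).T / n          # covariance Omega_S

# Omega_S = Phi Lambda Phi^T; eigh returns ascending eigenvalues,
# so we reverse to match the descending order used in the text
lam, Phi = np.linalg.eigh(cov_S)
lam, Phi = lam[::-1], Phi[:, ::-1]

A = Phi.T @ (S - mean)                         # PCA transform of S
cov_A = A @ A.T / n
off_diag = cov_A - np.diag(np.diag(cov_A))
print(np.max(np.abs(off_diag)))                # ~ 0: coefficients uncorrelated
```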

One major shortcoming of the adaptive dictionary and PCA is that they impose a very high computational burden. An alternative method for adaptive basis selection is SVD. The SVD of the data matrix $\mathbf {S}$ is a decomposition of the form [34] $$\mathbf {S}=\mathbf {U}\boldsymbol{\Sigma }\mathbf {V}^{T}=\sum _{i=1}^{n}\sigma _{i}\mathbf {u}_{i}\mathbf {v}_{i}^{T}$$ where $\mathbf {U}=(\mathbf {u}_{1},\ldots ,\mathbf {u}_{n})\in \mathcal {R}^{m\times n}$ and $\mathbf {V}=(\mathbf {v}_{1},\ldots ,\mathbf {v}_{n})\in \mathcal {R}^{n\times n}$ are matrices with orthonormal columns, $\mathbf {U}^{T}\mathbf {U}=\mathbf {V}^{T}\mathbf {V}=\mathbf {I}$, and the diagonal matrix $\boldsymbol{\Sigma }= \textrm {diag}(\sigma _{1},\ldots ,\sigma _{n})$ has nonnegative diagonal elements appearing in nonincreasing order such that $$\sigma _{1}\geq \sigma _{2}\geq \cdots \geq \sigma _{n}\geq 0.$$ The diagonal entries $\sigma _{i}$ of $\boldsymbol{\Sigma }$ are called the singular values of $\mathbf {S}$, while the vectors $\mathbf {u}_{i}$ and $\mathbf {v}_{i}$ are the left and right singular vectors of $\mathbf {S}$, respectively. The product $\mathbf {u}_{i}\mathbf {v}_{i}^{T}$ in (8) can be considered an adaptive basis, and $\sigma _{i}$ its representation coefficient.
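The energy compaction property that motivates the proposed method can be illustrated with a few lines of NumPy (the toy matrix below is ours): for a low-rank data matrix, the first few terms $\sigma _{i}\mathbf {u}_{i}\mathbf {v}_{i}^{T}$ carry essentially all of the Frobenius energy.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 20, 3
# a rank-3 data matrix standing in for a stack of similar patches
S = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# thin SVD: S = U diag(sigma) V^T, singular values in nonincreasing order
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
print(np.all(np.diff(sigma) <= 1e-12))  # True: nonincreasing

# energy compaction: the first r terms sigma_i u_i v_i^T carry
# essentially all of the Frobenius energy of S
print(np.sum(sigma[:r] ** 2) / np.sum(sigma ** 2))  # ~ 1.0
```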

In fact, SVD and PCA are intimately related. PCA can be performed by calculating the SVD of the data matrix $({1}/{\sqrt {n}})\mathbf {S}^{T}$ (refer to [35] for more details). In addition, if a matrix is low rank, we can easily estimate it from its noisy version by the LRA in SVD domain. Thus, we propose a new denoising method using SVD instead of PCA in the following section, which has a low computational complexity.
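The relation between PCA and SVD can likewise be checked numerically: the squared singular values of $({1}/{\sqrt {n}})\mathbf {S}^{T}$ (with centered data, as made explicit in [35]) coincide with the eigenvalues of the covariance matrix $\mathbf {\Omega _{S}}$. A small illustrative sketch with toy data of our choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 400
# columns of S are samples with unequal variances per dimension
S = rng.standard_normal((m, n)) * np.arange(1, m + 1)[:, None]

Sc = S - S.mean(axis=1, keepdims=True)             # centered data
cov = Sc @ Sc.T / n                                # covariance Omega_S

eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending eigenvalues
svals = np.linalg.svd(Sc.T / np.sqrt(n), compute_uv=False)

# squared singular values of (1/sqrt(n)) S^T = eigenvalues of Omega_S
print(np.allclose(svals ** 2, eigvals))  # True
```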

SECTION III

## PROPOSED METHOD

Based on the analysis of SVD in Section II, we propose an efficient method to estimate the noise-free image by combining patch grouping with the LRA of SVD, which leads to an improvement of denoising performance. The main motivation to use SVD in our method is that it provides the optimal energy compaction in the least square sense, which implies that the signal and noise can be better distinguished in SVD domain. Fig. 1 shows a block diagram of the proposed approach. Concretely, the patch grouping step identifies similar image patches by the Euclidean-distance-based similarity metric. Once the similar patches are identified, they can be estimated by the LRA in the SVD-based denoising step. In the aggregation step, all processed patches are aggregated to form the denoised image. The back projection step uses the residual image to further improve the denoised result.

Fig. 1. Block diagram of the proposed denoising algorithm.

For ease of presentation, let $\mathbf {Y}$ denote a noisy image defined by $$\mathbf {Y}=\mathbf {X}+\mathbf {E}$$ where $\mathbf {X}$ is the noise-free image, and $\mathbf {E}$ represents additive white Gaussian noise (AWGN) with standard deviation $\tau$ that, in practice, can be estimated by various methods such as the median absolute deviation (MAD) [36], an SVD-based estimation algorithm [37], and block-based ones [38], [39]. In this paper, we use a vectorized version of the model (10) $$\mathbf {y}=\mathbf {x}+\mathbf {e}.$$ Given a noisy observation $\mathbf {y}$, our aim is to estimate $\mathbf {x}$ as accurately as possible.
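As an illustration of how $\tau$ might be estimated in practice, the following sketch implements a MAD-style estimator in the spirit of [36]. For self-containedness we compute a crude one-level Haar diagonal band directly from $2\times 2$ blocks instead of a full wavelet transform; this simplification, and the function name, are our own assumptions.

```python
import numpy as np

def estimate_noise_mad(img):
    """MAD-style noise estimate: median absolute deviation of a
    high-frequency band, scaled by the 0.6745 Gaussian constant.
    The band here is a crude one-level Haar HH band computed from
    2x2 blocks (a stand-in for the finest wavelet diagonal band)."""
    hh = (img[0::2, 0::2] - img[0::2, 1::2]
          - img[1::2, 0::2] + img[1::2, 1::2]) / 2.0
    return np.median(np.abs(hh)) / 0.6745

# flat image plus AWGN with tau = 5; the estimate should be close to 5
rng = np.random.default_rng(4)
noisy = 100.0 + 5.0 * rng.standard_normal((256, 256))
print(estimate_noise_mad(noisy))  # close to 5
```

The median-based estimate is robust to image structure because edges and textures contribute only a small fraction of the high-frequency coefficients.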

As in BM3D and LPG-PCA, the proposed method has two stages: 1) the first stage produces an initial estimate of the image $\mathbf {x}$ and 2) the second stage further improves the result of the first stage. Unlike those methods, ours adopts the LRA to estimate image patches and uses back projection to avoid losing detailed image information. Each stage contains three steps: 1) patch grouping; 2) SVD-based denoising; and 3) aggregation. In the first stage, the noisy image $\mathbf {y}$ is first divided into $M$ overlapping patches denoted by $\{\mathbf {y}_{i}\}_{i=1}^{M}$, where $\mathbf {y}_{i}$ is a vectorized version of the $i$th image patch. For each patch $\mathbf {y}_{j}$, a group of similar patches is formed by searching $\{\mathbf {y}_{i}\}_{i=1}^{M}$. Next, each group is denoised by the low-rank approximation in SVD domain. Finally, the denoised image $\mathbf {\widehat {x}}_{0}$ is obtained by aggregating all denoised patches. In the second stage, the final denoised image is obtained by applying the same processing steps to the image $\widetilde {\mathbf {y}}$ produced by the back projection process. In the rest of this section, the procedures of the proposed method are described in detail.

### A. Patch Grouping

Grouping similar patches, as a classification problem, is an important and fundamental issue in image and video processing with a wide range of applications. While many classification algorithms are available [40], e.g., block matching, $K$-means clustering, and nearest neighbor clustering, we adopt block matching for image patch grouping due to its simplicity.

For each given reference patch $\mathbf {y}_{j}$ of size $\sqrt {m}\times \sqrt {m}$, the block-matching method finds its similar patches in $\{\mathbf {y}_{i}\}_{i=1}^{M}$ by a similarity metric. In [22], the Euclidean distance between transform coefficients is used to identify similar square patches. A shape-adaptive version of this similarity metric is presented in [23], but it incurs a high computational cost. The simplest measure of similarity between two patches is the Euclidean distance computed directly in the spatial domain. Thus, we employ the spatial Euclidean distance as our similarity metric, defined by $$S(\mathbf {y}_{j},\mathbf {y}_{c})=\|\mathbf {y}_{j}-\mathbf {y}_{c}\|^{2}_{2}$$ where $\|\cdot \|_{2}$ denotes the Euclidean norm and $\mathbf {y}_{c}$ is a candidate patch. The smaller $S(\mathbf {y}_{j},\mathbf {y}_{c})$ is, the more similar $\mathbf {y}_{j}$ and $\mathbf {y}_{c}$ are. The reference patch $\mathbf {y}_{j}$ and its $L$ most similar patches, denoted by $\{\mathbf {y}_{c,i}\}_{i=1}^{L}$, are chosen to construct a group matrix, with each patch forming a column: $$\mathbf {P}_{j}=[\mathbf {y}_{j},\mathbf {y}_{c,1},\ldots ,\mathbf {y}_{c,L}].$$ Since $\mathbf {P}_{j}$ is made up of noisy patches, it can be represented as $$\mathbf {P}_{j}=\mathbf {Q}_{j}+\mathbf {N}_{j}$$ where $\mathbf {Q}_{j}$ and $\mathbf {N}_{j}$ denote the noise-free group matrix and the noise matrix, respectively.
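The grouping step above can be sketched as follows (illustrative NumPy code; the function name and toy patches are ours). Distances are computed against all candidate patches at once, and the reference patch appears as the first column since its distance to itself is zero:

```python
import numpy as np

def group_similar_patches(patches, j, L):
    """Build the group matrix P_j: the reference patch j and its L most
    similar patches (spatial Euclidean distance), stacked as columns.
    `patches` is an m x M array whose columns are vectorized patches."""
    ref = patches[:, j:j + 1]
    dist = np.sum((patches - ref) ** 2, axis=0)   # S(y_j, y_c) for all c
    order = np.argsort(dist)                      # reference first (distance 0)
    return patches[:, order[:L + 1]]

# toy example: five 4-pixel patches; columns 0, 1, 3 are mutually similar
patches = np.array([[0., 0., 9., 1., 5.],
                    [0., 1., 9., 0., 5.],
                    [0., 0., 9., 1., 5.],
                    [0., 1., 9., 0., 5.]])
P = group_similar_patches(patches, 0, 2)
print(P.shape)  # (4, 3): reference plus L = 2 neighbors
```

In the full method this search is repeated for every reference patch, typically restricted to a local search window for efficiency.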

In general, the number $L$ of similar patches in the group matrix must be chosen carefully. Too small an $L$ leaves too few patches in each group matrix, which makes the SVD-based denoising less robust. Conversely, too large an $L$ causes dissimilar patches to be grouped together, which results in an inaccurate estimate of $\mathbf {P}_{j}$. Similarly, the patch size $\sqrt {m}\times \sqrt {m}$ also affects the performance of our method. We discuss the influence of $L$ and the patch size in Section IV-C.

### B. SVD-Based Denoising

For simplicity of description, we will use $\mathbf {Q}$ and $\mathbf {P}$ instead of $\mathbf {Q}_{j}$ and $\mathbf {P}_{j}$ by a slight abuse of notation. Now our task is to estimate the noise-free group matrix $\mathbf {Q}$ from its noisy version $\mathbf {P}$ as accurately as possible. Ideally, the estimate $\widehat {\mathbf {Q}}$ should satisfy $$\|\mathbf {P}-\widehat {\mathbf {Q}}\|_{F}^{2}=\tau ^{2}$$ where $\|\cdot \|_{F}$ is the Frobenius norm and $\tau$ is the standard deviation of the noise.

The similarity between patches within the noise-free image $\mathbf {x}$ leads to a high correlation between them, which means that $\mathbf {Q}$ is a low-rank matrix. Fig. 2 shows the low-rank property of $\mathbf {Q}$ by displaying the singular values of group matrices of the Lena image at different noise levels, where each point is the $i$th singular value averaged over all group matrices. The estimate of $\mathbf {Q}$ can be obtained by the LRA in the least square sense. Therefore, we can estimate $\mathbf {Q}$ from $\mathbf {P}$ by solving the following optimization problem: $$\widehat {\mathbf {Q}}=\arg \min _{\mathbf {Z}}\|\mathbf {P}-\mathbf {Z}\|_{F}^{2} \quad {\rm s.t.} \quad \textrm {rank}(\mathbf {Z})=k$$ where $\textrm {rank}(\cdot )$ denotes the rank of a matrix.

Fig. 2. Singular values of group matrices of Lena image with different noise levels.

In SVD domain, $\mathbf {P}$ can be represented as $$\mathbf {P}=\mathbf {U}\boldsymbol{\Sigma }\mathbf {V}^{T}.$$ Let $$\mathbf {P}_{k}=\mathbf {U}\boldsymbol{\Sigma }_{k}\mathbf {V}^{T}$$ where $\boldsymbol{\Sigma }_{k}$ is obtained from the matrix $\boldsymbol{\Sigma }$ by setting all but the first $k$ singular values on the diagonal to zero $$\boldsymbol{\Sigma }_{k}=\textrm {diag}(\sigma _{1},\ldots ,\sigma _{k}, 0, \ldots , 0).$$ Then $\mathbf {P}_{k}$ is the solution of (16), a classical result given by the Eckart–Young–Mirsky theorem [41], [42].

#### Theorem 1 (Eckart–Young–Mirsky)

For any real matrix $\mathbf {P}$, if the matrix $\mathbf {Q}$ is of rank $k$, then $$\|\mathbf {P}-\mathbf {Q}\|_{F}^{2}\geq \sum _{i=k+1}^{n}\sigma _{i}^{2}$$ where $\sigma _{i} (i=1,\ldots ,n)$ are the singular values of $\mathbf {P}$, and equality is attained when $\mathbf {Q}=\mathbf {P}_{k}$ as defined in (18).

This theorem shows that $\mathbf {P}_{k}$ is the optimal solution of (16) in the Frobenius norm sense. Thus, we have $$\widehat {\mathbf {Q}}=\mathbf {P}_{k}.$$ The key issue for this method is to determine the value of $k$. Comparing (15) with (20), we find that $\mathbf {P}_{k}$ is the ideal estimate of $\mathbf {Q}$ when $\sum _{i=k+1}^{n}\sigma _{i}^{2}$ equals $\tau ^{2}$. Therefore, $k$ can be determined by the following criterion: $$\sum _{i=k}^{n}\sigma _{i}^{2}>\tau ^{2}\geq \sum _{i=k+1}^{n}\sigma _{i}^{2}.$$
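The SVD-based denoising step, combining the truncation in (18) and (21) with the rank-selection criterion (22), can be sketched as follows (illustrative NumPy code). Here the argument `noise_norm` plays the role of $\tau$ in (15), i.e., the Frobenius norm of the noise in the group; the guard keeping at least one component, the function name, and the toy data are our own assumptions.

```python
import numpy as np

def svd_lowrank_denoise(P, noise_norm):
    """Denoise a group matrix by the LRA P_k = U Sigma_k V^T, with k
    chosen by criterion (22): the discarded singular-value energy
    sum_{i>k} sigma_i^2 must not exceed noise_norm^2."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    discarded, k = 0.0, len(s)
    for i in range(len(s) - 1, -1, -1):      # drop smallest values first
        if discarded + s[i] ** 2 > noise_norm ** 2:
            break
        discarded += s[i] ** 2
        k = i
    k = max(k, 1)                            # keep at least one component
    s[k:] = 0.0
    return U @ np.diag(s) @ Vt, k

# toy group: a rank-2 clean matrix Q plus small noise E
rng = np.random.default_rng(5)
m, n = 36, 12
Q = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))
E = 0.05 * rng.standard_normal((m, n))
Q_hat, k = svd_lowrank_denoise(Q + E, np.linalg.norm(E))
print(k)  # a small rank, close to the true rank of Q
```

Because the clean singular values dominate the noise energy, the criterion truncates near the true rank and the estimation error of `Q_hat` is well below the noise energy.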

### C. Aggregation

Thus far, we have estimated each group matrix by applying the LRA defined by (21). The denoised patches can then be obtained by rearranging the column vectors of each denoised group matrix. Because the $L$ nearest neighbors of each patch are taken to construct a group matrix, a single patch may belong to several groups, yielding multiple estimates of that patch. Thus, we aggregate the different estimates of a patch to obtain its denoised version by the following averaging process: $$\widehat {\mathbf {x}}_{i}=\frac {1}{n}\sum _{j=1}^{n}\widehat {\mathbf {x}}_{i,j}$$ where $\widehat {\mathbf {x}}_{i}$ is the denoised version of a patch $\mathbf {y}_{i}$, and $\widehat {\mathbf {x}}_{i,j} (j=1,\ldots ,n)$ denote the $n$ different estimates of $\mathbf {y}_{i}$.

The next step is to synthesize the denoised image from the denoised patches. Since the patches are sampled with overlapping regions to avoid block artifacts at patch boundaries, multiple estimates are obtained for each pixel. These estimates of each pixel need to be aggregated to reconstruct the final denoised image. The common way to combine such multiple estimates is a weighted average, which also suppresses noise further. The simplest form of aggregation is uniformly weighted averaging, which assigns the same weight to all estimates. However, uniform weights lead to an oversmoothed result. In general, adaptive weights derived from various biased and unbiased estimators, such as variance-based weights, SURE-based weights, and exponential weights [10], lead to better results. Different from these adaptive weights, in this paper we use weights that depend on the rank $k$ of each group matrix, due to their simplicity. For the $j$th group matrix $\widehat {\mathbf {Q}}_{j}$, our weight is defined by $$w_{j}= \begin{cases} 1-\frac {k}{L+1}, & k<L+1 \\ \frac {1}{L+1}, & k=L+1. \end{cases}$$

If $k<L+1$, the patches in the group matrix are linearly correlated. The higher the degree of correlation, the smaller the rank $k$ of the group matrix, and the better the estimate yielded by the LRA; such an estimate is therefore assigned a high weight. If $k=L+1$, there is no correlation among the patches, and the simplest uniform weight is used. Based on the weights defined in (24), the denoised estimate for the $i$th pixel of the image can be expressed as $$\hat {x}_{i}=\frac {1}{W}\sum _{j\in \Gamma (x_{i})}w_{j}\hat {x}_{i,j}$$ where $W$ is a normalizing factor defined by $$W=\sum _{j}w_{j}.$$ $\Gamma (x_{i})$ denotes the index set of all similar group matrices containing the pixel $x_{i}$, described as $$\Gamma (x_{i})=\{j|x_{i}\in \mathbf {Q}_{j}, j=1,\ldots ,C\}$$ and $\hat {x}_{i,j}$ denotes the denoised estimate of the $i$th pixel in the $j$th similar group matrix $\widehat {\mathbf {Q}}_{j}$. Once all pixels are estimated by (25), the final denoised image is obtained by reshaping the estimates of all pixels.
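The rank-dependent weights (24) and the weighted aggregation (25) can be illustrated with a toy example (the pixel estimates and group ranks below are invented for illustration):

```python
import numpy as np

def group_weight(k, L):
    """Rank-based aggregation weight from (24): strongly correlated
    groups (small rank k) receive larger weights."""
    return 1.0 - k / (L + 1) if k < L + 1 else 1.0 / (L + 1)

# three estimates of one pixel, from groups of rank 2, 5, and 11 (L = 10)
estimates = np.array([10.2, 9.8, 10.5])
ranks = [2, 5, 11]
w = np.array([group_weight(k, L=10) for k in ranks])

# normalized weighted average, as in (25) and (26)
pixel = np.sum(w * estimates) / np.sum(w)
print(pixel)  # weighted toward the low-rank (most reliable) estimates
```

The full-rank group ($k=L+1=11$) falls back to the uniform weight $1/(L+1)$ and contributes least to the final pixel value.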

### D. Back Projection

Although most of the noise can be removed using the denoising procedures described above, a small amount of noise residual remains in the denoised image. The noise residual stems from the fact that noise in the original noisy image affects the accuracy of the patch grouping, which leads to inaccurate groups. The grouping errors in turn affect the SVD-based denoising. There is also another reason for the noise residual. Ideally, based on the discussion in Section III-B, the optimal estimate $\widehat {\mathbf {Q}}$ satisfies \begin{align} \|\mathbf {P}-\widehat {\mathbf {Q}}\|_{F}^{2} = \|\mathbf {P}-\mathbf {Q}\|_{F}^{2}\Longrightarrow&\|\mathbf {P}-\widehat {\mathbf {Q}}\|_{F}^{2} = \|\mathbf {N}\|_{F}^{2}\notag \\ \Longrightarrow&\sum _{i=k+1}^{n}\sigma _{i}^{2}=\tau ^{2}. \end{align} Unfortunately, the left side of (28) is not usually equal to the right side; in most cases, $\tau ^{2}> \sum _{i=k+1}^{n}\sigma _{i}^{2}$. Therefore, we need to further improve the denoising performance of our method.

The commonly used way to further improve the performance of a denoising method, as used by the clustering-based denoising method using locally learned dictionaries (named K-LLD) [43] and SAIST, is to develop an iterative version for the basic denoising method. While the iterative strategy for image denoising has been widely used in the literature, it has a very high computational cost, which limits the scope of applications. An alternative way exploited by BM3D and LPG-PCA is the two-stage approach, in which the basic estimate of the noisy image yielded by the denoising method is used as a reference image to perform improved grouping and parameter estimation.

In this paper, unlike the iteration-based or reference-based strategies, we use the two-stage strategy with a back projection step to further suppress the noise residual. Back projection is an efficient method that uses the residual image to improve the denoised result [44], [45]. In fact, the use of residuals to improve estimates dates back at least to [46], in which the idea is termed twicing. The concept is also known by several other names, such as Bregman iterations, $l_{2}$-boosting, and biased diffusion; Milanfar [47] provides a good overview of these methods. The basic idea of back projection is to generate a new noisy image by adding filtered noise back to the denoised image $$\widetilde {\mathbf {y}}=\widehat {\mathbf {x}}_{0}+\delta (\mathbf {y}-\widehat {\mathbf {x}}_{0})$$ where $\delta \in (0,1)$ is a constant projection parameter and $\widehat {\mathbf {x}}_{0}$ is the denoised result produced by the first stage. Note that as $\delta \rightarrow 0$, $\widetilde {\mathbf {y}}\rightarrow \widehat {\mathbf {x}}_{0}$; conversely, as $\delta \rightarrow 1$, $\widetilde {\mathbf {y}}\rightarrow \mathbf {y}$. For simplicity, in our experiments we set $\delta =0.5$, a tradeoff between these two extremes.

Now we can obtain an improved version of $\widehat {\mathbf {x}}_{0}$ by denoising $\widetilde {\mathbf {y}}$ with the three processing steps proposed in Sections III-A–III-C, i.e., patch grouping, SVD-based denoising, and aggregation. It is necessary to point out that the noise variance of $\widetilde {\mathbf {y}}$, denoted by $\widetilde {\tau }^{2}$, needs to be updated in the SVD-based denoising step. We employ the estimator presented in [26] to determine $\widetilde {\tau }$, written as $$\widetilde {\tau }=\gamma \sqrt {\tau ^{2}-\|\mathbf {y}-\widehat {\mathbf {x}}_{0}\|_{F}^{2}}$$ where $\gamma$ is a scaling factor.
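The back projection step (29) and the noise-level update (30) amount to only a few lines. In the sketch below, the vectors are toy data, `tau` is treated as the Frobenius norm of the noise, and the `max(..., 0)` guard against a negative radicand is our own defensive assumption:

```python
import numpy as np

def back_project(y, x_hat0, delta=0.5):
    """Back projection (29): add the scaled residual back to the
    first-stage estimate to recover lost detail."""
    return x_hat0 + delta * (y - x_hat0)

def updated_noise_level(y, x_hat0, tau, gamma=0.65):
    """Noise level for the second stage, following (30); the max(..., 0)
    guard is an added safety assumption."""
    return gamma * np.sqrt(max(tau ** 2 - np.sum((y - x_hat0) ** 2), 0.0))

y = np.array([1.0, 2.0, 3.0])        # toy noisy signal
x0 = np.array([1.1, 1.9, 3.0])       # first-stage estimate
print(back_project(y, x0))           # midway between x0 and y (delta = 0.5)
print(updated_noise_level(y, x0, tau=1.0))
```

With $\delta =0.5$, the projected image lies halfway between the noisy input and the first-stage estimate, and the updated noise level shrinks as the first-stage residual grows.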

To summarize, the complete procedure of our proposed method is algorithmically described in Algorithm 1.

SECTION IV

## EXPERIMENTAL RESULTS

To demonstrate the efficacy of the proposed denoising algorithm, in this section we report experimental results of simulations conducted on ten natural grayscale images of size $512\times 512$. These images have been commonly used to validate many state-of-the-art denoising methods. The noisy images are generated by adding zero-mean white Gaussian noise of different levels to the test images. The noise level $\tau$ ranges from 10 to 50, and the intensity value of each pixel ranges from 0 to 255.

### A. Evaluation Criteria

Two objective criteria, namely, the peak signal-to-noise ratio (PSNR) and the feature-similarity (FSIM) index [48], are adopted to provide quantitative quality evaluations of the denoising results. PSNR is the most widely used quality measure in the literature, even though it is often inconsistent with human perception. FSIM measures the similarity between two images by combining the phase congruency and gradient magnitude features, based on the fact that the human visual system interprets an image mainly according to its low-level features. Together, these criteria comprehensively reflect the performance of the denoising methods.

### B. Denoising Performance

To quantitatively evaluate the denoising performance of our method, we compare it with five state-of-the-art image denoising methods: 1) PLOW [11]; 2) K-SVD [21]; 3) LPG-PCA [24]; 4) SAIST [26]; and 5) BM3D-SAPCA [23]. All of these methods utilize the self-similarity of natural images to suppress noise. They contain control parameters that should be tuned according to the noise level of the image; in our experiments, we use the default parameter settings suggested by the respective authors. The source codes of these methods can be downloaded from the respective authors’ websites. The proposed method was implemented in the MATLAB programming language due to its simplicity. In our experiments, we empirically set $L=85$, $\delta =0.5$, and $\gamma =0.65$ for all noise levels. Depending on the amount of noise present in the image, we set the patch size to $9\times 9$ if $\tau <20$, $10\times 10$ if $20\leq \tau <40$, and $11\times 11$ if $\tau \geq 40$. The influence of the different parameters is evaluated in Section IV-C.

In Table I, we quantify the performance of the six competing algorithms on the test images at different noise levels in terms of PSNR and FSIM. From Table I, we can observe that BM3D-SAPCA, which is considered the state of the art in image denoising, achieves the highest PSNR values on average and slightly outperforms our method. However, our method performs better on images with highly repetitive patterns such as Elaine, Zelda, and Barbara. This is because our method sufficiently exploits the nonlocal redundancies in these images through the patch grouping procedure described in Section III-A. For example, on the Elaine image, our method is superior on average to BM3D-SAPCA by 0.14 dB, to SAIST by 0.28 dB, to LPG-PCA by 0.58 dB, to PLOW by 0.47 dB, and to K-SVD by 0.43 dB. The FSIM results of the different algorithms are also tabulated in Table I. It can be observed that our method achieves higher FSIM measures than all other methods except BM3D-SAPCA. In summary, the quantitative results of our method are competitive with BM3D-SAPCA and SAIST, and clearly superior to LPG-PCA, PLOW, and K-SVD.

Table I Comparison of the PSNR (dB) and FSIM Results of Different Denoising Methods on Test Images With Different Noise Levels. The Best Results are Highlighted in Bold

In terms of visual quality, our method is also comparable and even superior to the state-of-the-art denoising methods. Fig. 3 shows the denoising results on the Lena image with a noise level of $\tau =30$. As can be seen, the results of the proposed method are visually close to those of SAIST and LPG-PCA, and better than those of BM3D-SAPCA, PLOW, and K-SVD, especially in the edge and texture regions. The visual comparisons are further illustrated in Fig. 4, which shows zoomed-in denoising results on the noisy Barbara image by the different methods. It can be observed that the denoised images produced by SAIST, LPG-PCA, and the proposed method are very similar in visual perception: some edges and textures are better preserved, and fewer artifacts are introduced. We note that although BM3D-SAPCA achieves higher PSNR and FSIM measures than our method, its denoised results contain more noticeable artifacts around edges and in smooth regions than ours. The main reason is that BM3D-SAPCA uses an orthogonal transform to represent similar image patches and reduces noise by thresholding the representation coefficients, which produces clearly visible artifacts. (This phenomenon was also discussed in [49].)

Fig. 3. Visual comparisons of denoising results on Lena image corrupted by AWGN with standard deviation 30. (a) Original image. (b) Noisy image. (c) PLOW [11]. (d) K-SVD [21]. (e) LPG-PCA [24]. (f) SAIST [26]. (g) BM3D-SAPCA [23]. (h) Proposed method.
Fig. 4. Visual comparisons of denoising results on Barbara image corrupted by AWGN with standard deviation 30. (a) Original image. (b) Noisy image. (c) PLOW [11]. (d) K-SVD [21]. (e) LPG-PCA [24]. (f) SAIST [26]. (g) BM3D-SAPCA [23]. (h) Proposed method.

As shown in Fig. 5, we also calculated the absolute difference images between the original Barbara image and the denoised versions produced by the six denoising algorithms. The mean absolute error (MAE) values of PLOW, K-SVD, LPG-PCA, SAIST, BM3D-SAPCA, and our method are 4.73, 5.31, 5.23, 4.75, 4.44, and 4.35, respectively. The MAE value produced by our method is lower than those of the other denoising algorithms. To further demonstrate its performance, we apply the proposed method to some real noisy images.3 Fig. 6 displays the denoised images yielded by PLOW and our method. Our method reduces the noise effectively while preserving the finer features. In short, our denoising results are both quantitatively and visually comparable with the state of the art.
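The MAE figures above follow the usual definition; a minimal sketch (the helper name `mae` is ours):

```python
import numpy as np

def mae(reference, estimate):
    """Mean absolute error between an original image and a denoised version."""
    return np.mean(np.abs(reference.astype(np.float64) -
                          estimate.astype(np.float64)))

a = np.zeros((4, 4))
b = np.arange(16, dtype=np.float64).reshape(4, 4)
print(mae(a, b))  # mean of 0..15 is 7.5
```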

Fig. 5. Zoomed absolute difference between the original Barbara image and the denoised version. (a) PLOW [11] (${\rm MAE}=4.73$). (b) K-SVD [21] (${\rm MAE}=5.31$). (c) LPG-PCA [24] (${\rm MAE}=5.23$). (d) SAIST [26] (${\rm MAE}=4.75$). (e) BM3D-SAPCA [23] (${\rm MAE}=4.44$). (f) Proposed method (${\rm MAE}={4.35})$.
Fig. 6. Visual comparisons of denoising results on real noisy images with unknown noise characteristics.

### C. Influence of Parameters

In our method, there are four tuning parameters: 1) the patch size; 2) the number of similar patches in each group matrix, $L$; 3) the projection factor $\delta$; and 4) the scaling factor $\gamma$. The patch size plays an important role in the proposed denoising algorithm. On the one hand, a patch size that is too large cannot capture the varying local geometry and also leads to a high computational cost. On the other hand, a patch size that is too small reduces the denoising performance of the proposed method. To study the influence of the patch size, we set $L=85$, $\delta =0.5$, and $\gamma =0.65$, and run our method on the Barbara image with different patch sizes and noise levels. The PSNR results are listed in Table II. It can be observed that choosing the patch size from $9\times 9$ to $11\times 11$ in our experiments is a reasonable tradeoff between accuracy and speed.

Table II PSNR (dB) Results of the Proposed Denoising Method on Barbara Image With Different Patch Sizes and Noise Levels

The projection parameter $\delta$ controls the amount of residual image added to the output of the first stage. To analyze the effect of this parameter, we run our algorithm with different values, $\delta =0.1, 0.2, \ldots , 0.9$. Fig. 7(a) shows the denoising performance of the proposed algorithm applied to the noisy image Barbara ($\tau =30$) as a function of the parameter $\delta$. As can be seen, the highest PSNR is reached when $\delta =0.5$. Similar curves for $\gamma$ and $L$ are shown in Fig. 7(b) and (c), respectively. The best denoising result is obtained with $\gamma =0.65$. In addition, we find that our algorithm is insensitive to $L$ in the range [70, 100]. Thus, we choose $L=85$ as a tradeoff between accuracy and speed.

Fig. 7. PSNR results on Barbara image ($\tau =30$) as a function of (a) varying $\delta$ with a patch size of $10\times 10$, $L=85$, and $\gamma =0.65$, (b) varying $\gamma$ with a patch size of $10\times 10$, $\delta =0.5$, and $L=85$, and (c) varying $L$ with a patch size of $10\times 10$, $\delta =0.5$, and $\gamma =0.65$.

### D. Analysis of Iterative Denoising

Since noise reduces the accuracy of the patch grouping, the resulting grouping errors cause the residual image of the first stage to contain many visible edge and texture details. In the second denoising stage, the new noisy image produced by back projection contains more structural details than the denoised image of the first stage and has a lower noise level than the original noisy image, which improves the accuracy of the patch grouping and reduces the error of the LRA in the SVD domain. Therefore, a better result is reached at the end of the second denoising round. The experimental results presented above demonstrate that the two-stage denoising scheme based on back projection is very effective in suppressing noise.
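Assuming the residual is simply the difference between the noisy input and the first-stage estimate, as the text suggests, the back-projection step can be sketched as follows (the function name `back_project` is ours, not the authors'):

```python
import numpy as np

def back_project(noisy, first_stage, delta=0.5):
    """Form the input of the second denoising stage by adding a fraction
    delta of the residual (method noise) back to the first-stage estimate."""
    residual = noisy - first_stage
    return first_stage + delta * residual

# With delta = 0.5, half of the removed residual is restored
noisy = np.array([10.0, 12.0])
first = np.array([6.0, 12.0])
out = back_project(noisy, first)
print(float(out[0]))  # 8.0, the midpoint where the two images differ
```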

Intuitively, the denoised results might be further improved if the proposed denoising scheme were iterated more than twice. Fig. 8 shows the evolution of the PSNR over the iterations. We note that further iterations do not significantly improve the denoising performance. In theory, the truncated SVD shrinkage used in our method guarantees an optimal approximation of the noisy group matrix in the least-squares sense, so most of the noise is already suppressed in the first two iterations.
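The truncated SVD shrinkage at the core of the method can be illustrated as follows (a generic rank-k approximation sketch, not the authors' implementation):

```python
import numpy as np

def lra_svd(G, k):
    """Rank-k approximation of a group matrix G: keep only the k largest
    singular values and their singular vectors.  By the Eckart-Young
    theorem this is optimal in the least-squares (Frobenius) sense."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A rank-1 matrix plus weak noise is almost perfectly recovered at k = 1
rng = np.random.default_rng(0)
G_clean = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 30))
G_noisy = G_clean + 0.01 * rng.normal(size=(20, 30))
rel_err = np.linalg.norm(G_clean - lra_svd(G_noisy, 1)) / np.linalg.norm(G_clean)
print(round(float(rel_err), 3))  # small relative error
```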

Fig. 8. Evolution of the PSNR with iterations for different images ($\tau =30$).

### E. Computational Cost

To evaluate the computational cost of the six denoising methods, we compare their running times on the ten test images at different noise levels. All source codes were run with their default settings, and all experiments were performed on a platform with an Intel Core i7-870 CPU at 2.93 GHz and 4 GB of memory. For denoising a $512\times 512$ grayscale image, SAIST, K-SVD, BM3D-SAPCA, LPG-PCA, and PLOW take, on average, roughly 197, 393, 925, 997, and 1875 s, respectively. The computational cost of the proposed method is quite low in comparison: for the test images, our MATLAB implementation requires only 77 s on average. There are two main computational components in our algorithm: grouping similar patches to form group matrices, and computing the SVD of each group matrix. In general, the patch grouping step takes approximately 23% of the execution time, whereas 75% of the time is spent estimating the group matrices by the LRA in the SVD domain. The execution times of the various algorithms are presented in Table III. It can be seen that the proposed method provides the fastest running speed among the six denoising algorithms.

SECTION V

## DISCUSSION

The nonlocal self-similarity of natural images plays an important role in image denoising. The most well-known denoising method based on nonlocal self-similarity is BM3D, which often produces state-of-the-art denoising results. BM3D-SAPCA is an improved version of BM3D that exploits PCA and shape-adaptive image patches and achieves remarkable performance. Our method utilizes the nonlocal self-similarity to construct low-rank group matrices that can be easily estimated by the LRA.

The main differences between these methods are threefold.

1. The basis functions of the image representations are different. BM3D uses fixed 3-D basis functions (joint wavelet and cosine bases) that are less adapted to edges and textures. To improve the denoising performance, BM3D-SAPCA applies PCA to the shape-adaptive patch group, but this leads to a high computational cost. Our method uses an adaptive basis derived by SVD, which outperforms BM3D and BM3D-SAPCA in preserving the local geometric structure.
2. BM3D uses the Euclidean distance between transform coefficients to identify similar square patches, which improves the robustness of the block matching. In BM3D-SAPCA, the usual square patches are replaced by shape-adaptive patches for block matching, but this leads to a complex aggregation process with a high computational cost. Different from BM3D and BM3D-SAPCA, the proposed method calculates the similarity metric based on the Euclidean distance directly in the spatial domain, for simplicity.
3. In BM3D and BM3D-SAPCA, the second denoising stage is applied directly to the original noisy image, which is grouped into 3-D data arrays based on the patch similarities obtained from the denoised image of the first stage. However, the first denoising stage might contain grouping errors due to the effect of noise, which yields an inaccurate basic estimate of the noisy image. Using this basic estimate as the pilot signal is therefore not ideal, because it degrades the accuracy of the second denoising stage. Unlike BM3D and BM3D-SAPCA, the second denoising stage in our method is applied to a new noisy image obtained by adding a part of the residual image back to the basic estimate, i.e., back projection. The new noisy image contains more structural details than the output of the first stage, which improves the accuracy of the patch grouping and of the LRA in the SVD domain.
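The spatial-domain patch grouping in point 2 can be sketched as follows (a simplified illustration with a hypothetical helper; the actual method operates on vectorized image patches extracted from the noisy image):

```python
import numpy as np

def group_patches(patches, ref_idx, L):
    """Stack the L patches nearest to a reference patch (squared Euclidean
    distance computed directly in the spatial domain) as the columns of a
    group matrix.  `patches` is an (N, d) array of vectorized patches."""
    d2 = np.sum((patches - patches[ref_idx]) ** 2, axis=1)
    nearest = np.argsort(d2)[:L]   # the reference patch itself comes first
    return patches[nearest].T      # d x L group matrix

# Toy "patches" in 2-D: the outlier at (5, 5) is excluded from the group
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]])
G = group_patches(pts, ref_idx=0, L=3)
print(G.shape)  # (2, 3)
```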

In addition, existing methods based on adaptive representations, such as K-SVD and ASVD, often need to learn a set of adaptive bases from given training images. Unfortunately, this learning process is computationally expensive. The proposed method is intrinsically simpler than K-SVD and ASVD: it does not need to be trained for each image separately or on a given training data set, which avoids the high computational cost of the training process.

SECTION VI

## CONCLUSION

In this paper, we have presented a simple and efficient method for image denoising, which takes advantage of the nonlocal redundancy and the LRA to attenuate noise. The nonlocal redundancy is implicitly exploited by the block-matching technique to construct low-rank group matrices. After factorization by SVD, each group matrix is efficiently approximated by preserving only the few largest singular values and the corresponding singular vectors, owing to the optimal energy compaction property of SVD. In fact, the small singular values have little effect on the approximation of a group matrix when it has a low-rank structure. The experimental results demonstrate the advantages of the proposed method in comparison with current state-of-the-art denoising methods.

The computational complexity of the proposed algorithm is lower than that of most existing state-of-the-art denoising algorithms, but higher than that of BM3D: the fixed transform used by BM3D is less complex than SVD, although it is less adapted to edges and textures. The main computational cost of our algorithm is the calculation of the SVD of each patch group matrix. As each group matrix can be processed independently, our method is well suited to parallel processing. In practice, a parallel implementation can therefore speed it up considerably, making it feasible for real-time or near real-time image denoising. In addition, while developed for grayscale images, our method can be extended to color image and video denoising by taking into account shape-adaptive patches and the redundancy across color components and frames. We will study these extensions in future work.

### Acknowledgment

The authors would like to thank Chatterjee and Milanfar [11], Elad and Aharon [21], Dabov et al. [23], Zhang et al. [24], and Dong et al. [26] for sharing their source codes and Prof. J. Wang with Shandong University and the anonymous reviewers for their helpful comments and suggestions to improve the quality of this paper.

## Footnotes

This work was supported in part by the National Natural Science Foundation of China under Grant 61202150, Grant 61272245, Grant 61373078, Grant 61332015, and Grant 61472220; the National Natural Science Foundation of China Joint Fund with Guangdong under Grant U1201258; the China Post-Doctoral Science Foundation under Grant 2013M531600; and the Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province.

This paper was recommended by Associate Editor A. Kaup.

1For an $m\times n$ matrix $A$ with elements $a_{ij}$ $(i=1,\dots,m;\ j=1,\dots,n)$, the Frobenius norm is defined as the square root of the sum of the absolute squares of its elements, i.e., $\|A\|_{F} = \sqrt {\sum _{i=1}^{m}\sum _{j=1}^{n}a_{ij}^{2}}$.

2A demo will be available at http://qguo.weebly.com when this paper is published.

3Available at http://users.soe.ucsc.edu/%7epriyam/PLOW/. In our experiments, we use their grayscale versions.
