
SECTION I

INTRODUCTION

During acquisition and transmission, images are inevitably contaminated by noise. As an essential step toward improving the accuracy of subsequent processing, image denoising is highly desirable for numerous applications, such as visual enhancement, feature extraction, and object recognition [1], [2].

The purpose of denoising is to reconstruct the original image from its noisy observation as accurately as possible, while preserving important detail features such as edges and textures in the denoised image. To achieve this goal, over the past several decades, image denoising has been extensively studied in the signal processing community, and numerous denoising techniques have been proposed in the literature. In general, denoising algorithms can be roughly classified into three categories: 1) spatial domain methods; 2) transform domain methods; and 3) hybrid methods [3], [4]. The first class utilizes the spatial correlation of pixels to smooth the noisy image, the second one exploits the sparsity of representation coefficients of the signal to distinguish the signal and noise, and the third one takes advantage of spatial correlation and sparse representation to suppress noise.

Spatial domain methods, also called spatial filters, estimate each pixel of the image by performing a weighted average of its local/nonlocal neighbors, in which the weights can be determined by their similarities and higher weights are given to similar pixels. Therefore, spatial filters can be further divided into local filters and nonlocal filters. Smith and Brady [5] proposed a structure-preserving local filter called SUSAN, which uses the intensity distance as a quantitative measure of the similarity between pixels. Tomasi and Manduchi [6] proposed bilateral filtering by generalizing the SUSAN filter, in which both the intensity and spatial distances are used to measure the similarity between pixels. Although these local filters are effective for preserving edges, they cannot perform very well when the noise level is high. The reason is that severe noise destroys the correlations of pixels within local regions [7]. To overcome this disadvantage of local filters, Buades et al. [8] proposed the nonlocal means (NLM) filter, which estimates each pixel by a nonlocal averaging of all the pixels in the image. The weight assigned to a pixel is based on the Euclidean distance between the patch centered around the pixel being denoised and the one centered around a given neighboring pixel. In essence, NLM uses the structural redundancy, namely, the self-similarity that is inherent in natural images, to estimate each pixel. NLM can be considered as an extension of the bilateral filter obtained by replacing pointwise photometric distances with patch distances. Several variants of NLM have been proposed to improve the adaptivity of the nonlocal filter [9], [10]. Talebi et al. [3] proposed a spatially adaptive iterative filtering (SAIF) to improve the performance of NLM. Recently, there has been a growing interest in exploiting the self-similarity of images to suppress noise. Chatterjee and Milanfar [11], [12] proposed a patch-based locally optimal Wiener (PLOW) filter, which also exploits the structural redundancy for image denoising and achieves near-optimal performance in the minimum mean-squared error (MMSE) sense. Zhang et al. [13] proposed a two-direction nonlocal (TDNL) variational model for image denoising using the horizontal and vertical similarities in the matrix formed by similar image patches. SAIF, PLOW, and TDNL are currently considered to be the state of the art among spatial domain denoising methods.

Transform domain methods assume that the image can be sparsely represented by some representation basis, such as the wavelet basis and its directional extensions. Due to the sparsity of the representation coefficients, noise is uniformly spread throughout the coefficients in the transform domain, while most of the image information is concentrated on the few largest ones. Therefore, noise can be effectively distinguished by different coefficient shrinkage strategies, including BayesShrink [14], ProbShrink [15], BiShrink [16], MultiShrink [17], and SUREShrink [18], [19]. Despite its remarkable success in dealing with point and line singularities, the fixed wavelet transform fails to provide an adaptive sparse representation for images containing complex singularities. To overcome the problems caused by fixed transforms, Aharon et al. [20] proposed an adaptive representation technique using K-means and singular value decomposition (called K-SVD), which uses a greedy algorithm to learn an overcomplete dictionary for image representation and denoising. Under the assumption that each image patch can be represented by the learned dictionary, Elad and Aharon [21] proposed a K-SVD based denoising algorithm, in which each image patch is expressed as a linear combination of a few atoms of the dictionary. Although the dictionary-based methods are more robust to noise, they are computationally expensive.

Spatial-based filters and transform-based filters have achieved great success in image denoising. Their overall performance, however, does not generally surpass the hybrid methods. Owing to its impressive performance, the best-known hybrid method for image denoising is block-matching and 3-D (BM3D) filtering [22], which groups similar patches into 3-D arrays and processes these arrays by sparse collaborative filtering. To the best of the authors' knowledge, it is the first method to utilize both nonlocal self-similarity and sparsity for image denoising. However, the fixed 3-D transform is not able to deliver a sparse representation for image patches containing edges, singularities, or textures; thus, BM3D may introduce visual artifacts. Dabov et al. [23] proposed an improved BM3D filter (called BM3D-SAPCA) that exploits adaptive-shape patches and principal component analysis (PCA). Although BM3D-SAPCA achieves state-of-the-art denoising results, its computational cost is very high (Table III). Zhang et al. [24] proposed an adaptive image denoising scheme using PCA with local pixel grouping (LPG-PCA). This method uses block matching to group the pixels with similar local structures, transforms each group of pixels using a locally learned PCA basis, and shrinks the PCA transformation coefficients using the linear MMSE estimation technique. Both LPG-PCA and BM3D-SAPCA use a PCA basis to represent image patches. A key difference between them is that LPG-PCA applies PCA to 2-D groups of fixed-size image patches, while BM3D-SAPCA applies PCA to 3-D groups of adaptive-shape image patches. He et al. [25] presented an adaptive hybrid method called ASVD, which uses SVD to learn the local basis for representing image patches. Another SVD-based denoising method is spatially adaptive iterative singular-value thresholding (SAIST) [26]. This method uses SVD as a sparse representation of image patches and reduces noise by iteratively shrinking the singular values with BayesShrink. BM3D-SAPCA and SAIST are considered to be the current state of the art in image denoising.

Table III Comparison of the Computational Time and the Implementation Language of Different Denoising Methods

In this paper, we propose a simple and efficient denoising method by combining patch grouping with SVD. The proposed method first groups image patches by a classification algorithm to obtain many groups of similar patches. Each group of similar patches is then estimated by the low-rank approximation (LRA) in SVD domain. The denoised image is finally obtained by aggregating all processed patches. The SVD is a very suitable tool for estimating each group because it provides the optimal energy compaction in the least square sense [27]. This implies that we can achieve a good estimate of the group by taking only a few of the largest singular values and the corresponding singular vectors. While ASVD uses SVD to learn a set of local bases for representing image patches and SAIST uses SVD as a sparse representation of image patches, the proposed method exploits the optimal energy compaction property of SVD to obtain an LRA of image patches. Experiments indicate that the proposed method achieves highly competitive performance in visual quality, and it also has a lower computational cost than most existing state-of-the-art denoising algorithms.

The rest of this paper is organized as follows. In Section II, we briefly review image representation tools for the sake of completeness. We present the proposed algorithm in detail in Section III, which fuses the nonlocal self-similarity and the LRA using patch clustering and SVD. In Section IV, we report the experimental results of our method to validate its efficacy and compare it with the state-of-the-art methods. In Section V, we discuss the differences between our method and other state-of-the-art methods. Finally, we conclude this paper with some possible future work in Section VI.

SECTION II

LINEAR IMAGE REPRESENTATION

Let $\mathbf{X}$ be a grayscale image. The basic principle of linear image representation is that the signal of interest can be decomposed into a weighted sum of a given representation basis. Thus, $\mathbf{X}$ can be represented as
\begin{equation}
\mathbf{X}=\sum_{i=1}^{N}a_{i}\phi_{i}\tag{1}
\end{equation}
where $a_{i}\ (i=1,\ldots,N)$ are the representation coefficients of the image $\mathbf{X}$ in terms of the basis functions $\phi_{i}\ (i=1,\ldots,N)$. $\phi_{i}$ can either be chosen as a prespecified basis, such as the wavelet [28], curvelet [29], contourlet [30], shearlet [31], and other directional bases, or designed by adapting its content to fit a given set of images. In general, an adaptive basis has better performance than a prespecified one.
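
As a concrete illustration of (1) (a toy sketch, not part of the original paper), the following Python/NumPy snippet represents a vectorized patch in an arbitrary orthonormal basis and verifies the reconstruction; the variable names x, Phi, and a simply mirror the symbols above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                 # dimension of a vectorized 8x8 patch
x = rng.random(N)                      # a toy signal standing in for X

# An arbitrary orthonormal basis {phi_i}: the columns of Phi (via QR).
Phi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# Representation coefficients a_i = <x, phi_i> (valid for an orthonormal basis).
a = Phi.T @ x

# Reconstruction as the weighted sum in (1): x = sum_i a_i * phi_i.
x_rec = Phi @ a
print(np.allclose(x, x_rec))           # True: exact reconstruction
```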

Aharon et al. [20] proposed a learning method to obtain a set of adaptive basis functions (also called a dictionary). This method extracts all the $\sqrt{m}\times\sqrt{m}$ patches from the image $\mathbf{X}$ to form a data matrix $\mathbf{S}=(\mathbf{s}_{1},\mathbf{s}_{2},\ldots,\mathbf{s}_{n})\in\mathcal{R}^{m\times n}$, where $m$ is the number of pixels in each patch, $\mathbf{s}_{i}\ (i=1,\ldots,n)$ are the image patches ordered as columns of $\mathbf{S}$, and $n$ is the number of patches. Then the dictionary is learned by solving the following optimization problem:
\begin{equation}
\min_{\boldsymbol{\Phi},\mathbf{A}}\sum_{i=1}^{n}\|\mathbf{s}_{i}-\boldsymbol{\Phi}\mathbf{a}_{i}\|_{2}^{2} \quad {\rm s.t.} \quad \|\mathbf{a}_{i}\|_{0}\leq\beta\tag{2}
\end{equation}
where $\boldsymbol{\Phi}\in\mathcal{R}^{m\times p}$ is the dictionary of $p$ column atoms, $\mathbf{A}=(\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n})\in\mathcal{R}^{p\times n}$ is the matrix of coefficients, $\beta$ indicates the desired sparsity level of the solution, and the notation $\|\mathbf{a}_{i}\|_{0}$ stands for the number of nonzero entries in $\mathbf{a}_{i}$. Based on the learned dictionary $\boldsymbol{\Phi}$, $\mathbf{S}$ can be represented as
\begin{equation}
\mathbf{S}=\boldsymbol{\Phi}\mathbf{A}.\tag{3}
\end{equation}
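
The dictionary in (2) is learned with K-SVD [20]. As a rough stand-in (scikit-learn is not used in the paper, and its DictionaryLearning solver is related to but not identical with K-SVD), the following sketch learns an overcomplete dictionary for 8x8 patches of a toy image:

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import DictionaryLearning

# img: a 2-D grayscale array; here a random stand-in for illustration.
img = np.random.default_rng(0).random((64, 64))

# S in the paper holds vectorized sqrt(m) x sqrt(m) patches as columns;
# scikit-learn expects patches as rows, so we work with S^T.
patches = extract_patches_2d(img, (8, 8), max_patches=500, random_state=0)
S_T = patches.reshape(len(patches), -1)              # n x m

# Learn an overcomplete dictionary (p = 128 atoms > m = 64) with a sparsity
# budget of beta = 5 nonzero coefficients per patch.  Note: this is not
# K-SVD itself, but it optimizes a similar sparse representation objective.
dl = DictionaryLearning(n_components=128, transform_algorithm='omp',
                        transform_n_nonzero_coefs=5, max_iter=10,
                        random_state=0)
A_T = dl.fit_transform(S_T)                          # coefficients, n x p
Phi_T = dl.components_                               # dictionary atoms, p x m

# Approximate reconstruction S ~= Phi A (rows/columns swapped here).
recon_err = np.linalg.norm(S_T - A_T @ Phi_T) / np.linalg.norm(S_T)
print(recon_err)
```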

Another method for image representation with adaptive basis selection is PCA [32], which determines the basis from the covariance statistics of the data matrix $\mathbf{S}$. The principal components transform of $\mathbf{S}$ is calculated as [33]
\begin{equation}
\mathbf{A}=\boldsymbol{\Phi}^{T}(\mathbf{S}-E(\mathbf{S}))\tag{4}
\end{equation}
with $\boldsymbol{\Phi}$ defined by
\begin{equation}
\boldsymbol{\Omega}_{\mathbf{S}}=\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^{T}\tag{5}
\end{equation}
where $E(\mathbf{S})$ is the matrix of mean vectors, $\boldsymbol{\Omega}_{\mathbf{S}}$ is the covariance matrix of $\mathbf{S}$, $\boldsymbol{\Phi}$ is the eigenvector matrix, and $\boldsymbol{\Lambda}=\textrm{diag}(\lambda_{1},\ldots,\lambda_{m})$ is the diagonal eigenvalue matrix with
\begin{equation}
\lambda_{1}\geq\lambda_{2}\geq\cdots\geq\lambda_{m}.\tag{6}
\end{equation}
It can easily be derived that the covariance matrix $\boldsymbol{\Omega}_{\mathbf{A}}$ of the matrix $\mathbf{A}$ equals
\begin{equation}
\boldsymbol{\Omega}_{\mathbf{A}}=\boldsymbol{\Phi}^{T}\boldsymbol{\Omega}_{\mathbf{S}}\boldsymbol{\Phi}=\boldsymbol{\Lambda}\tag{7}
\end{equation}
which implies that the entries of $\mathbf{A}$ are uncorrelated. This property of PCA can be used to distinguish between the signal and noise, because the energy of noise is generally spread over all transform coefficients, while the energy of the signal is concentrated on a small number of coefficients.
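
A minimal NumPy sketch of (4)-(7) on a toy data matrix with vectorized patches as columns: the eigendecomposition of the covariance supplies the PCA basis, and the covariance of the resulting coefficients is numerically diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 64, 500
S = rng.random((m, n))                     # data matrix: patches as columns

mean = S.mean(axis=1, keepdims=True)       # E(S), one mean per pixel position
Omega_S = np.cov(S)                        # m x m covariance matrix of S

# Eigendecomposition Omega_S = Phi Lambda Phi^T, eigenvalues sorted descending.
lam, Phi = np.linalg.eigh(Omega_S)
order = np.argsort(lam)[::-1]
lam, Phi = lam[order], Phi[:, order]

# Principal components transform (4): A = Phi^T (S - E(S)).
A = Phi.T @ (S - mean)

# The covariance of A equals Lambda (7): the coefficients are uncorrelated.
Omega_A = np.cov(A)
print(np.allclose(Omega_A, np.diag(lam), atol=1e-10))
```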

One major shortcoming of the adaptive dictionary and PCA is that they impose a very high computational burden. An alternative method for adaptive basis selection is SVD. The SVD of the data matrix $\mathbf{S}$ is a decomposition of the form [34]
\begin{equation}
\mathbf{S}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}=\sum_{i=1}^{n}\sigma_{i}\mathbf{u}_{i}\mathbf{v}_{i}^{T}\tag{8}
\end{equation}
where $\mathbf{U}=(\mathbf{u}_{1},\ldots,\mathbf{u}_{n})\in\mathcal{R}^{m\times n}$ and $\mathbf{V}=(\mathbf{v}_{1},\ldots,\mathbf{v}_{n})\in\mathcal{R}^{n\times n}$ are the matrices with orthonormal columns, $\mathbf{U}^{T}\mathbf{U}=\mathbf{V}^{T}\mathbf{V}=\mathbf{I}$, and where the diagonal matrix $\boldsymbol{\Sigma}=\textrm{diag}(\sigma_{1},\ldots,\sigma_{n})$ has nonnegative diagonal elements appearing in nonincreasing order such that
\begin{equation}
\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{n}\geq 0.\tag{9}
\end{equation}
The diagonal entries $\sigma_{i}$ of $\boldsymbol{\Sigma}$ are called the singular values of $\mathbf{S}$, while the vectors $\mathbf{u}_{i}$ and $\mathbf{v}_{i}$ are the left and right singular vectors of $\mathbf{S}$, respectively. The product $\mathbf{u}_{i}\mathbf{v}_{i}^{T}$ in (8) can be considered as an adaptive basis, and $\sigma_{i}$ as the representation coefficient.

In fact, SVD and PCA are intimately related. PCA can be performed by calculating the SVD of the data matrix $({1}/{\sqrt{n}})\mathbf{S}^{T}$ (refer to [35] for more details). In addition, if a matrix is low rank, we can easily estimate it from its noisy version by the LRA in SVD domain. Thus, in the following section we propose a new denoising method that uses SVD instead of PCA and has a low computational complexity.
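
To illustrate the LRA mentioned above (a toy example with assumed sizes, not the paper's code), the following sketch corrupts a synthetic rank-k matrix with Gaussian noise and recovers it by keeping only the k largest singular values of the SVD in (8):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 64, 40, 3

# A synthetic rank-k matrix plus additive Gaussian noise.
Q = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
P = Q + 0.1 * rng.standard_normal((m, n))

# SVD of the noisy matrix (8); NumPy returns the singular values in
# nonincreasing order, matching (9).
U, s, Vt = np.linalg.svd(P, full_matrices=False)

# Rank-k approximation: keep the k largest singular values and their vectors.
Q_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The truncated reconstruction is closer to Q than the noisy observation is.
print(np.linalg.norm(P - Q, 'fro'), np.linalg.norm(Q_hat - Q, 'fro'))
```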

SECTION III

PROPOSED METHOD

Based on the analysis of SVD in Section II, we propose an efficient method to estimate the noise-free image by combining patch grouping with the LRA of SVD, which leads to an improvement of denoising performance. The main motivation to use SVD in our method is that it provides the optimal energy compaction in the least square sense, which implies that the signal and noise can be better distinguished in SVD domain. Fig. 1 shows a block diagram of the proposed approach. Concretely, the patch grouping step identifies similar image patches by the Euclidean-distance-based similarity metric. Once the similar patches are identified, they can be estimated by the LRA in the SVD-based denoising step. In the aggregation step, all processed patches are aggregated to form the denoised image. The back projection step uses the residual image to further improve the denoised result.

Fig. 1. Block diagram of the proposed denoising algorithm.

For ease of presentation, let $\mathbf{Y}$ denote a noisy image defined by
\begin{equation}
\mathbf{Y}=\mathbf{X}+\mathbf{E}\tag{10}
\end{equation}
where $\mathbf{X}$ is the noise-free image, and $\mathbf{E}$ represents additive white Gaussian noise (AWGN) with standard deviation $\tau$, which, in practice, can be estimated by various methods such as the median absolute deviation (MAD) [36], an SVD-based estimation algorithm [37], and block-based ones [38], [39]. In this paper, we use a vectorized version of model (10)
\begin{equation}
\mathbf{y}=\mathbf{x}+\mathbf{e}.\tag{11}
\end{equation}
Given a noisy observation $\mathbf{y}$, our aim is to estimate $\mathbf{x}$ as accurately as possible.
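
Purely as an illustration of how $\tau$ might be estimated in practice (the sketch uses the PyWavelets package, which the paper does not mention), a common MAD-style estimator reads the noise level from the finest diagonal wavelet subband:

```python
import numpy as np
import pywt

def estimate_noise_std_mad(noisy_image):
    """Robust median-based estimate of the AWGN standard deviation."""
    # One-level 2-D wavelet transform; the diagonal detail band cD is
    # dominated by noise for most natural images.
    _, (_, _, cD) = pywt.dwt2(noisy_image, 'db1')
    return np.median(np.abs(cD)) / 0.6745

# Example: a smooth toy image plus AWGN with tau = 20.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 255.0, 256), (256, 1))
noisy = clean + 20.0 * rng.standard_normal(clean.shape)
print(estimate_noise_std_mad(noisy))      # roughly 20
```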

As in BM3D and LPG-PCA, the proposed method has two stages: 1) the first stage produces an initial estimate of the image $\mathbf{x}$ and 2) the second stage further improves the result of the first stage. Different from them, our method adopts the LRA to estimate image patches and uses back projection to avoid losing detailed image information. Each stage contains three steps: 1) patch grouping; 2) SVD-based denoising; and 3) aggregation. In the first stage, the noisy image $\mathbf{y}$ is first divided into $M$ overlapping patches denoted by $\{\mathbf{y}_{i}\}_{i=1}^{M}$, where $\mathbf{y}_{i}$ is the vectorized form of the $i$th image patch. For each patch $\mathbf{y}_{j}$, its similar patch group is formed by searching for similar patches in $\{\mathbf{y}_{i}\}_{i=1}^{M}$. Next, each similar patch group is denoised by the LRA in SVD domain. Third, the denoised image $\widehat{\mathbf{x}}_{0}$ is obtained by aggregating all denoised patches. In the second stage, the final denoised image is obtained by applying the processing steps described above to the image $\widetilde{\mathbf{y}}$ produced by the back projection process. In the rest of this section, the procedures of the proposed method are described in detail.

A. Patch Grouping

Grouping similar patches, as a classification problem, is an important and fundamental issue in image and video processing with a wide range of applications. While many classification algorithms are available [40], e.g., block matching, $K$-means clustering, nearest neighbor clustering, and others, we adopt the block-matching method for image patch grouping due to its simplicity.

For each given reference patch $\mathbf{y}_{j}$ of size $\sqrt{m}\times\sqrt{m}$, the block-matching method finds its similar patches in $\{\mathbf{y}_{i}\}_{i=1}^{M}$ by a similarity metric. In [22], the Euclidean distance between transform coefficients is used to identify similar square patches. A shape-adaptive version of this similarity metric is presented in [23], but it leads to a high computational cost. The simplest measure of similarity between two patches is the Euclidean distance computed directly in the spatial domain. Thus, we employ the spatial Euclidean distance as our similarity metric, which is defined by
\begin{equation}
S(\mathbf{y}_{j},\mathbf{y}_{c})=\|\mathbf{y}_{j}-\mathbf{y}_{c}\|^{2}_{2}\tag{12}
\end{equation}
where $\|\cdot\|_{2}$ denotes the Euclidean distance and $\mathbf{y}_{c}$ is a candidate patch. The smaller $S(\mathbf{y}_{j},\mathbf{y}_{c})$ is, the more similar $\mathbf{y}_{j}$ and $\mathbf{y}_{c}$ are. The reference patch $\mathbf{y}_{j}$ and its $L$ most similar patches, denoted by $\{\mathbf{y}_{c,i}\}_{i=1}^{L}$, are chosen to construct a group matrix, with each similar patch used as a column; the corresponding group matrix $\mathbf{P}_{j}$ is formed as
\begin{equation}
\mathbf{P}_{j}=[\mathbf{y}_{j},\mathbf{y}_{c,1},\ldots,\mathbf{y}_{c,L}].\tag{13}
\end{equation}
Since $\mathbf{P}_{j}$ is made up of noisy patches, it can be represented as
\begin{equation}
\mathbf{P}_{j}=\mathbf{Q}_{j}+\mathbf{N}_{j}\tag{14}
\end{equation}
where $\mathbf{Q}_{j}$ and $\mathbf{N}_{j}$ denote the noise-free group matrix and the noise matrix, respectively.
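
A minimal sketch of the grouping step just described, under the simplifying assumptions that all overlapping patches come from a single 2-D array and that the $L$ most similar patches are searched over the whole image rather than within a local window; the function names are illustrative, not taken from the paper:

```python
import numpy as np

def extract_patches(image, patch_size):
    """Vectorize all overlapping patch_size x patch_size patches as columns."""
    H, W = image.shape
    p = patch_size
    cols, coords = [], []
    for r in range(H - p + 1):
        for c in range(W - p + 1):
            cols.append(image[r:r + p, c:c + p].ravel())
            coords.append((r, c))
    return np.stack(cols, axis=1), coords        # m x M matrix, patch positions

def group_similar_patches(patches, ref_index, L):
    """Form the group matrix P_j = [y_j, y_{c,1}, ..., y_{c,L}] of (13)."""
    ref = patches[:, ref_index:ref_index + 1]
    dists = np.sum((patches - ref) ** 2, axis=0)  # squared Euclidean distance (12)
    nearest = np.argsort(dists)[:L + 1]           # the reference itself comes first
    return patches[:, nearest], nearest

# Toy usage on a random "noisy image".
rng = np.random.default_rng(0)
noisy = rng.random((32, 32))
patches, coords = extract_patches(noisy, 8)
P_j, members = group_similar_patches(patches, ref_index=0, L=10)
print(P_j.shape)                                  # (64, 11)
```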

In general, the number $L$ of similar patches in the group matrix cannot be too small. Too small an $L$ leads to too few patches within each group matrix, which makes the SVD-based denoising less robust. Conversely, too large an $L$ causes dissimilar patches to be grouped together, which results in an incorrect estimate of $\mathbf{P}_{j}$. Similarly, the patch size $\sqrt{m}\times\sqrt{m}$ also has an impact on the performance of our method. We discuss the influence of $L$ and the patch size in Section IV-C.

B. SVD-Based Denoising

For simplicity of description, we will use $\mathbf{Q}$ and $\mathbf{P}$ instead of $\mathbf{Q}_{j}$ and $\mathbf{P}_{j}$ by a slight abuse of notation. Now our task is to estimate the noise-free group matrix $\mathbf{Q}$ from its noisy version $\mathbf{P}$ as accurately as possible. Ideally, the estimate $\widehat{\mathbf{Q}}$ should satisfy
\begin{equation}
\|\mathbf{P}-\widehat{\mathbf{Q}}\|_{F}^{2}=\tau^{2}\tag{15}
\end{equation}
where $\|\cdot\|_{F}$ is the Frobenius norm (see footnote 1) and $\tau$ is the standard deviation of the noise.

The similarity between patches within the noise-free image $\mathbf{x}$ leads to a high correlation between them, which means that $\mathbf{Q}$ is a low-rank matrix. Fig. 2 illustrates the low-rank property of $\mathbf{Q}$ by displaying the singular values of group matrices of the Lena image at different noise levels, where each point is the $i$th singular value averaged over all group matrices. The estimate of $\mathbf{Q}$ can be obtained by the LRA in the least square sense. Therefore, we can estimate $\mathbf{Q}$ from $\mathbf{P}$ by solving the following optimization problem:
\begin{equation}
\widehat{\mathbf{Q}}=\arg\min_{\mathbf{Z}}\|\mathbf{P}-\mathbf{Z}\|_{F}^{2} \quad {\rm s.t.} \quad \textrm{rank}(\mathbf{Z})=k\tag{16}
\end{equation}
where $\textrm{rank}(\cdot)$ denotes the rank of the matrix $\mathbf{Z}$.

Fig. 2. Singular values of group matrices of Lena image with different noise levels.

In SVD domain, $\mathbf{P}$ can be represented as
\begin{equation}
\mathbf{P}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}.\tag{17}
\end{equation}
Let
\begin{equation}
\mathbf{P}_{k}=\mathbf{U}\boldsymbol{\Sigma}_{k}\mathbf{V}^{T}\tag{18}
\end{equation}
where $\boldsymbol{\Sigma}_{k}$ is obtained from the matrix $\boldsymbol{\Sigma}$ by setting all but the first $k$ diagonal elements (singular values) to zero
\begin{equation}
\boldsymbol{\Sigma}_{k}=\textrm{diag}(\sigma_{1},\ldots,\sigma_{k},0,\ldots,0).\tag{19}
\end{equation}
$\mathbf{P}_{k}$ is the solution of (16), which is a classical result given by the Eckart–Young–Mirsky theorem [41], [42].

Theorem 1 (Eckart–Young–Mirsky)

For any real matrix $\mathbf{P}$, if the matrix $\mathbf{Q}$ is of rank $k$, then
\begin{equation}
\|\mathbf{P}-\mathbf{Q}\|_{F}^{2}\geq\sum_{i=k+1}^{n}\sigma_{i}^{2}\tag{20}
\end{equation}
where $\sigma_{i}\ (i=1,\dots,n)$ are the singular values of $\mathbf{P}$, and equality is attained when $\mathbf{Q}=\mathbf{P}_{k}$ as defined by (18).

This theorem shows that $\mathbf{P}_{k}$ is the optimal solution of (16) in the Frobenius norm sense. Thus, we have
\begin{equation}
\widehat{\mathbf{Q}}=\mathbf{P}_{k}.\tag{21}
\end{equation}
The key issue for this method is to determine the value of $k$. By comparing (15) with (20), we find that $\mathbf{P}_{k}$ is the ideal estimate of $\mathbf{Q}$ when $\sum_{i=k+1}^{n}\sigma_{i}^{2}$ equals $\tau^{2}$. Therefore, $k$ can be determined by the following criterion:
\begin{equation}
\sum_{i=k}^{n}\sigma_{i}^{2}>\tau^{2}\geq\sum_{i=k+1}^{n}\sigma_{i}^{2}.\tag{22}
\end{equation}
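
The following sketch implements the SVD-based denoising of one group matrix along the lines of (16)-(22). One caveat: the threshold is exposed as a generic noise_energy argument; the paper equates it with $\tau^{2}$ through (15), and the scaling used in the toy call below (noise variance times the number of matrix entries) is an assumption rather than a detail taken from the paper.

```python
import numpy as np

def denoise_group_lra(P, noise_energy):
    """Low-rank estimate of a noisy group matrix; rank k chosen by criterion (22)."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    tail = np.cumsum((s ** 2)[::-1])[::-1]     # tail[i] = sum_{j >= i} sigma_j^2
    # (22): keep the largest k with sum_{i >= k} sigma_i^2 > noise_energy.
    k = int(np.count_nonzero(tail > noise_energy))
    s_k = np.zeros_like(s)
    s_k[:k] = s[:k]                            # Sigma_k as in (19)
    return U @ np.diag(s_k) @ Vt, k            # Q_hat = P_k as in (21)

# Toy usage: a rank-2 group matrix of 64-pixel patches corrupted by noise.
rng = np.random.default_rng(0)
Q = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 11))
P = Q + 0.2 * rng.standard_normal(Q.shape)
Q_hat, k = denoise_group_lra(P, noise_energy=(0.2 ** 2) * P.size)
print(k, np.linalg.norm(Q_hat - Q, 'fro') < np.linalg.norm(P - Q, 'fro'))
```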

C. Aggregation

So far, we have estimated each group matrix by applying the LRA defined by (21). The denoised patches can then be obtained by rearranging the column vectors of each denoised group matrix. Because the $L$ nearest neighbors of each patch are taken to construct a group matrix, a single patch might belong to several groups, and multiple estimates of this patch can be obtained. Thus, we aggregate the different estimates of this patch to obtain its denoised version by the following averaging process:
\begin{equation}
\widehat{\mathbf{x}}_{i}=\frac{1}{n}\sum_{j=1}^{n}\widehat{\mathbf{x}}_{i,j}\tag{23}
\end{equation}
where $\widehat{\mathbf{x}}_{i}$ is the denoised version of a patch $\mathbf{y}_{i}$, and $\widehat{\mathbf{x}}_{i,j}\ (j=1,\ldots,n)$ denote the $n$ different estimates of $\mathbf{y}_{i}$.

The next step is to synthesize the denoised image from the denoised patches. Since the patches are sampled with overlapping regions to avoid block artifacts at patch boundaries, multiple estimates are obtained for each pixel. Thus, these estimates of each pixel need to be aggregated to reconstruct the final denoised image. The common way of combining such multiple estimates is to perform a weighted averaging of them; this weighted averaging can also suppress noise further. The simplest form of aggregation is uniformly weighted averaging, which assigns the same weight to all estimates. However, uniform weights lead to an oversmoothed result. In general, adaptive weights derived from various biased and unbiased estimators, such as variance-based weights, SURE-based weights, and exponential weights [10], lead to better results. Different from these adaptive weights, in this paper we exploit weights that depend on the rank $k$ of each group matrix, owing to their simplicity. For the $j$th group matrix $\widehat{\mathbf{Q}}_{j}$, our weight is defined by
\begin{equation}
w_{j}=
\begin{cases}
1-\dfrac{k}{L+1}, & k<L+1 \\
\dfrac{1}{L+1}, & k=L+1.
\end{cases}\tag{24}
\end{equation}

If $k<L+1$, the patches in the group matrix are linearly correlated. The higher the degree of correlation among patches, the smaller the rank $k$ of the group matrix and the better the estimate yielded by the LRA; thus, this estimate is assigned a higher weight. If $k=L+1$, there is no correlation among the patches, and the simplest uniform weight is used. Based on the weights defined in (24), the denoised estimate of the $i$th pixel of the image can be expressed as
\begin{equation}
\hat{x}_{i}=\frac{1}{W}\sum_{j\in\Gamma(x_{i})}w_{j}\hat{x}_{i,j}\tag{25}
\end{equation}
where $W$ is a normalizing factor defined by
\begin{equation}
W=\sum_{j}w_{j}.\tag{26}
\end{equation}
$\Gamma(x_{i})$ denotes the index set of all similar group matrices containing the pixel $x_{i}$, which is described as
\begin{equation}
\Gamma(x_{i})=\{j\,|\,x_{i}\in\mathbf{Q}_{j},\ j=1,\dots,C\}\tag{27}
\end{equation}
and $\hat{x}_{i,j}$ denotes the denoised estimate of the $i$th pixel in the $j$th similar group matrix $\widehat{\mathbf{Q}}_{j}$. Once all pixels are estimated by (25), the final denoised image is obtained by reshaping the estimates of all pixels.
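
A sketch of the aggregation step, assuming each denoised group matrix is returned together with the top-left positions of its member patches and its estimated rank; the rank-dependent weight of (24) is accumulated per pixel and normalized as in (25)-(27). The data layout and function name are illustrative assumptions.

```python
import numpy as np

def aggregate(image_shape, patch_size, groups):
    """groups: list of (Q_hat, coords, k), where Q_hat is a denoised m x (L+1)
    group matrix, coords holds the top-left corners of its member patches,
    and k is the rank estimated for that group."""
    accum = np.zeros(image_shape)
    weights = np.zeros(image_shape)
    p = patch_size
    for Q_hat, coords, k in groups:
        L_plus_1 = Q_hat.shape[1]
        # Rank-dependent weight (24): lower rank means stronger correlation,
        # hence a more reliable LRA estimate and a higher weight.
        w = 1.0 - k / L_plus_1 if k < L_plus_1 else 1.0 / L_plus_1
        for col, (r, c) in enumerate(coords):
            patch = Q_hat[:, col].reshape(p, p)
            accum[r:r + p, c:c + p] += w * patch
            weights[r:r + p, c:c + p] += w
    return accum / np.maximum(weights, 1e-12)   # normalize by the total weight W

# Toy usage: one group covering two overlapping 4x4 patches of an 8x8 image.
Q_hat = np.ones((16, 2))
denoised = aggregate((8, 8), 4, [(Q_hat, [(0, 0), (2, 2)], 1)])
print(denoised.max())                           # 1.0
```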

D. Back Projection

Although most of the noise can be removed using the denoising procedures described above, a small amount of residual noise still exists in the denoised image. This residual stems from the fact that noise in the original noisy image affects the accuracy of the patch grouping, which leads to inaccurate groups. The grouping errors in turn affect the SVD-based denoising. There is another reason for the residual noise as well. Ideally, based on the discussion in Section III-B, the optimal estimate $\widehat{\mathbf{Q}}$ satisfies
\begin{align}
\|\mathbf{P}-\widehat{\mathbf{Q}}\|_{F}^{2}=\|\mathbf{P}-\mathbf{Q}\|_{F}^{2}
&\Longrightarrow \|\mathbf{P}-\widehat{\mathbf{Q}}\|_{F}^{2}=\|\mathbf{N}\|_{F}^{2}\notag\\
&\Longrightarrow \sum_{i=k+1}^{n}\sigma_{i}^{2}=\tau^{2}.\tag{28}
\end{align}
Unfortunately, the left side of (28) is not usually equal to the right side; in most cases, $\tau^{2}>\sum_{i=k+1}^{n}\sigma_{i}^{2}$. Therefore, we need to further improve the denoising performance of our method.

A common way to further improve the performance of a denoising method, as adopted by the clustering-based denoising method using locally learned dictionaries (K-LLD) [43] and by SAIST, is to develop an iterative version of the basic denoising method. While this iterative strategy has been widely used in the literature, it has a very high computational cost, which limits its scope of application. An alternative, exploited by BM3D and LPG-PCA, is a two-stage approach in which the basic estimate of the noisy image yielded by the denoising method is used as a reference image to perform improved grouping and parameter estimation.

In this paper, unlike the iteration-based or reference-based strategies, we make use of a two-stage strategy with a back projection step to further suppress the residual noise. Back projection is an efficient method that uses the residual image to improve the denoised result [44], [45]. In fact, the use of residuals to improve estimates dates back at least to [46], in which the idea is termed twicing. The concept is also known by several other names, such as Bregman iterations, $\ell_{2}$-boosting, and biased diffusion; Milanfar [47] provides a good overview of these methods. The basic idea of back projection is to generate a new noisy image by adding filtered noise back to the denoised image
\begin{equation}
\widetilde{\mathbf{y}}=\widehat{\mathbf{x}}_{0}+\delta(\mathbf{y}-\widehat{\mathbf{x}}_{0})\tag{29}
\end{equation}
where $\delta\in(0,1)$ is a constant projection parameter and $\widehat{\mathbf{x}}_{0}$ is the denoised result produced by the first stage. Note that as $\delta\rightarrow 0$, $\widetilde{\mathbf{y}}\rightarrow\widehat{\mathbf{x}}_{0}$; on the contrary, as $\delta\rightarrow 1$, $\widetilde{\mathbf{y}}\rightarrow\mathbf{y}$. For simplicity, in our experiments we set $\delta=0.5$, a tradeoff between 0 and 1.

Now we can achieve an improved result over $\widehat{\mathbf{x}}_{0}$ by denoising $\widetilde{\mathbf{y}}$ with the three processing steps proposed in Sections III-A–III-C, i.e., patch grouping, SVD-based denoising, and aggregation, respectively. It is necessary to point out that the noise variance of $\widetilde{\mathbf{y}}$, denoted by $\widetilde{\tau}^{2}$, needs to be updated in the SVD-based denoising step. We employ the estimator presented in [26] to determine $\widetilde{\tau}$, which is written as
\begin{equation}
\widetilde{\tau}=\gamma\sqrt{\tau^{2}-\|\mathbf{y}-\widehat{\mathbf{x}}_{0}\|_{F}^{2}}\tag{30}
\end{equation}
where $\gamma$ is a scaling factor.
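
A sketch of the back projection step (29) and the noise-level update (30). Taking (30) literally with a Frobenius norm over the whole image would make the argument negative, so the sketch uses the per-pixel residual energy and clamps at zero; this reading, like the function names, is an assumption rather than a detail confirmed by the paper.

```python
import numpy as np

def back_project(y, x_hat0, delta=0.5):
    """Add a fraction delta of the residual back to the first-stage estimate (29)."""
    return x_hat0 + delta * (y - x_hat0)

def updated_noise_std(y, x_hat0, tau, gamma=0.65):
    """Noise level assumed for the second stage, following (30) with a per-pixel
    residual energy and a clamp at zero (our reading of the formula)."""
    residual_energy = np.mean((y - x_hat0) ** 2)
    return gamma * np.sqrt(max(tau ** 2 - residual_energy, 0.0))

# Toy usage: y is the noisy image and x_hat0 a stand-in first-stage output.
rng = np.random.default_rng(0)
y = 255.0 * rng.random((64, 64))
x_hat0 = y - 5.0 * rng.standard_normal(y.shape)
y_tilde = back_project(y, x_hat0, delta=0.5)
print(updated_noise_std(y, x_hat0, tau=30.0))
```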

To summarize, the complete procedure of our proposed method is algorithmically described in Algorithm 1.

Algorithm 1 Proposed Denoising Algorithm

SECTION IV

EXPERIMENTAL RESULTS

To demonstrate the efficacy of the proposed denoising algorithm, in this section we report experimental results on ten natural grayscale images of size $512\times512$. These images have been commonly used to validate many state-of-the-art denoising methods. The noisy images are generated by adding zero-mean white Gaussian noise at different levels to the test images. The noise level $\tau$ ranges from 10 to 50, and the intensity value of each pixel ranges from 0 to 255.

A. Evaluation Criteria

Two objective criteria, namely, the peak signal-to-noise ratio (PSNR) and the feature-similarity (FSIM) index [48], are adopted to provide quantitative quality evaluations of the denoising results. PSNR is the most widely used quality measure in the literature, even though it is often inconsistent with human perception. FSIM measures the similarity between two images by combining the phase congruency feature and the gradient magnitude feature, based on the fact that the human visual system understands an image mainly according to its low-level features. Together, these criteria comprehensively reflect the performance of the denoising methods.
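
For reference, PSNR for images with a peak value of 255 can be computed as follows; this is the standard definition rather than code from the paper (FSIM is considerably more involved and is omitted here):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: an estimate off by 10 gray levels everywhere gives
# 10*log10(255^2/100), roughly 28.1 dB.
```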

B. Denoising Performance

To quantitatively evaluate the denoising performance of our method, we compare it with five state-of-the-art image denoising methods: 1) PLOW [11]; 2) K-SVD [21]; 3) LPG-PCA [24]; 4) SAIST [26]; and 5) BM3D-SAPCA [23]. All of these methods utilize the self-similarity of natural images to suppress noise. They contain some control parameters that should be tuned according to the noise level of the image; in our experiments, we use the default parameter settings suggested by the respective authors. The source codes of these methods can be downloaded from the respective authors' websites. In addition, the proposed method was implemented in the MATLAB programming language due to its simplicity (see footnote 2). In our experiments, we empirically set $L=85$, $\delta=0.5$, and $\gamma=0.65$ for all noise levels. Depending on the amount of noise present in the image, we set the patch size to $9\times 9$ if $\tau<20$, $10\times 10$ if $20\leq\tau<40$, and $11\times 11$ if $\tau\geq 40$. The influence of the different parameters is evaluated in Section IV-C.

In Table I, we quantify the performance of the six competing algorithms on the test images with different noise levels in terms of PSNR and FSIM. From Table I we can observe that BM3D-SAPCA, which is considered to be the state of the art in image denoising, achieves the highest PSNR values on average and slightly outperforms our method. However, our method performs better on images with highly repetitive patterns such as Elaine, Zelda, and Barbara, because it sufficiently exploits the nonlocal redundancies in these images through the patch grouping procedure described in Section III-A. For example, for the Elaine image, on average, our method is superior to BM3D-SAPCA by 0.14 dB, to SAIST by 0.28 dB, to LPG-PCA by 0.58 dB, to PLOW by 0.47 dB, and to K-SVD by 0.43 dB. The FSIM results of the different algorithms are also tabulated in Table I. It can be observed that our method has higher FSIM measures than the other methods except BM3D-SAPCA. In short, the quantitative results of our method are competitive with BM3D-SAPCA and SAIST, and clearly superior to LPG-PCA, PLOW, and K-SVD.

Table I Comparison of the PSNR (dB) and FSIM Results of Different Denoising Methods on Test Images With Different Noise Levels. The Best Results Are Highlighted in Bold

In terms of visual quality, our method is also comparable and even superior to the state-of-the-art denoising methods. Fig. 3 shows the denoising results for the Lena image at a noise level of $\tau=30$. As can be seen, the results of the proposed method are visually close to those of SAIST and LPG-PCA, and better than those of BM3D-SAPCA, PLOW, and K-SVD, especially in edge and texture regions. The visual comparisons are further illustrated in Fig. 4, which shows zoomed-in denoising results for the noisy Barbara image by the different methods. It can be observed that the denoised images produced by SAIST, LPG-PCA, and the proposed method are visually very similar: some edges and textures are better preserved, and fewer artifacts are introduced. We note that although BM3D-SAPCA has higher PSNR and FSIM measures than our method, its denoised results contain more noticeable artifacts around edges and in smooth regions than ours. The main reason is that BM3D-SAPCA exploits an orthogonal transform to represent similar image patches and reduces noise by thresholding the representation coefficients, which produces clearly visible artifacts. (This phenomenon was also discussed in [49].)

Fig. 3. Visual comparisons of denoising results on Lena image corrupted by AWGN with standard deviation 30. (a) Original image. (b) Noisy image. (c) PLOW [11]. (d) K-SVD [21]. (e) LPG-PCA [24]. (f) SAIST [26]. (g) BM3D-SAPCA [23]. (h) Proposed method.
Fig. 4. Visual comparisons of denoising results on Barbara image corrupted by AWGN with standard deviation 30. (a) Original image. (b) Noisy image. (c) PLOW [11]. (d) K-SVD [21]. (e) LPG-PCA [24]. (f) SAIST [26]. (g) BM3D-SAPCA [23]. (h) Proposed method.

As shown in Fig. 5, we also calculated the absolute difference images between the original Barbara image and the denoised versions produced by the six denoising algorithms. The mean absolute error (MAE) values of PLOW, K-SVD, LPG-PCA, SAIST, BM3D-SAPCA, and our method are 4.73, 5.31, 5.23, 4.75, 4.44, and 4.35, respectively. The MAE value produced by our method is lower than those of the other denoising algorithms. To further demonstrate the performance of our method, we apply it to some real noisy images (see footnote 3). Fig. 6 displays the denoised images yielded by PLOW and our method. Our method reduces the noise effectively while preserving fine features. In short, our denoising results are both quantitatively and visually comparable with the state of the art.

Fig. 5. Zoomed absolute difference between the original Barbara image and the denoised version. (a) PLOW [11] (MAE = 4.73). (b) K-SVD [21] (MAE = 5.31). (c) LPG-PCA [24] (MAE = 5.23). (d) SAIST [26] (MAE = 4.75). (e) BM3D-SAPCA [23] (MAE = 4.44). (f) Proposed method (MAE = 4.35).
Fig. 6. Visual comparisons of denoising results on real noisy images with unknown noise characteristics.

C. Influence of Parameters

In our method, there are four tuning parameters: 1) the patch size; 2) the number of similar patches in each group matrix, $L$; 3) the projection factor $\delta$; and 4) the scaling factor $\gamma$. The patch size plays an important role in the proposed denoising algorithm. On the one hand, a patch size that is too large cannot capture the varying local geometry and also leads to a high computational cost. On the other hand, a patch size that is too small reduces the denoising performance of the proposed method. To study the influence of the patch size, we set $L=85$, $\delta=0.5$, $\gamma=0.65$, and run our method on the Barbara image with different patch sizes and noise levels. The PSNR results are listed in Table II. It can be observed that choosing a patch size from $9\times 9$ to $11\times 11$ in our experiments is a reasonable tradeoff between accuracy and speed.

Table II PSNR (dB) Results of the Proposed Denoising Method on Barbara Image With Different Patch Sizes and Noise Levels

The projection parameter $\delta$ controls the amount of residual image added back to the output of the first stage. To analyze the effect of this parameter, we run our algorithm with different values, $\delta=0.1,0.2,\ldots,0.9$. Fig. 7(a) shows the denoising performance of the proposed algorithm applied to the noisy Barbara image ($\tau=30$) as a function of the parameter $\delta$. As can be seen, the highest PSNR is reached when $\delta=0.5$. Similar curves for $\gamma$ and $L$ are shown in Fig. 7(b) and (c), respectively. The best denoising result is obtained with $\gamma=0.65$. In addition, we find that our algorithm is insensitive to $L$ in the range [70, 100]; thus, we choose $L=85$ as a tradeoff between accuracy and speed.

Fig. 7. PSNR results on Barbara image ($\tau=30$) as a function of (a) varying $\delta$ with a patch size of $10\times 10$, $L=85$, and $\gamma=0.65$, (b) varying $\gamma$ with a patch size of $10\times 10$, $\delta=0.5$, and $L=85$, and (c) varying $L$ with a patch size of $10\times 10$, $\delta=0.5$, and $\gamma=0.65$.

D. Analysis of Iterative Denoising

Noise reduces the accuracy of the patch grouping, and the resulting grouping errors cause the residual image of the first stage to contain noticeable edge and texture details. In the second denoising stage, the new noisy image produced by back projection contains more structural details than the denoised image of the first stage and has a lower noise level than the original noisy image, which improves the accuracy of the patch grouping and reduces the error of the LRA in SVD domain. Therefore, a better result is reached at the end of the second denoising round. The experimental results above demonstrate that the two-stage denoising scheme based on back projection is very effective in suppressing noise.

Intuitively, the denoised results might be further improved if the proposed denoising scheme were iterated more than twice. Fig. 8 shows the evolution of the PSNR over the iterations. We note that further iterations do not significantly improve the denoising performance. In theory, the truncated SVD shrinkage used in our method guarantees an optimal approximation of the noisy group matrix in the least square sense, so most of the noise is suppressed within the first two iterations.

Fig. 8. Evolution of the PSNR with iterations for different images ($\tau=30$).

E. Computational Cost

To evaluate the computational cost of the six denoising methods, we compare their running times on the ten test images with different noise levels. All source codes were run with their default settings, and all experiments were performed on a platform with an Intel Core i7-870 2.93-GHz CPU and 4 GB of memory. To denoise a $512\times 512$ grayscale image, SAIST, K-SVD, BM3D-SAPCA, LPG-PCA, and PLOW take, on average, roughly 197, 393, 925, 997, and 1875 s, respectively. The computational cost of the proposed method is quite low in comparison: for the test images, our MATLAB implementation requires only 77 s on average. There are two main computational components in our algorithm: grouping similar patches to form group matrices, and calculating the SVD of each group matrix. In general, the patch grouping step takes approximately 23% of the execution time, whereas 75% of the time is spent estimating group matrices by the LRA in SVD domain. The execution times of the various algorithms are presented in Table III. It can be seen that the proposed method provides the fastest running speed among the six denoising algorithms.

SECTION V

DISCUSSION

The nonlocal self-similarity of natural images plays an important role in image denoising. The best-known denoising method based on nonlocal self-similarity is BM3D, which often produces state-of-the-art denoising results. BM3D-SAPCA is an improved version of BM3D that exploits PCA and shape-adaptive image patches, and it achieves remarkable performance. Our method utilizes the nonlocal self-similarity to construct low-rank group matrices that can be easily estimated by the LRA.

The main differences between these methods are threefold.

  1. The basis functions of the image representations are different. BM3D uses fixed 3-D basis functions (joint wavelet and cosine bases) that are less adapted to edges and textures. To improve the denoising performance, BM3D-SAPCA applies PCA to the shape-adaptive patch group, but this leads to a high computational cost. Our method uses an adaptive basis derived by SVD, which better preserves the local geometric structure than the bases used by BM3D and BM3D-SAPCA.
  2. BM3D uses the Euclidean distance between transform coefficients to identify similar square patches, which improves the robustness of the block matching. In BM3D-SAPCA, the usual square patches are replaced by shape-adaptive patches for block matching, but this leads to a complex aggregation process with a high computational cost. Different from BM3D and BM3D-SAPCA, the proposed method computes the similarity metric as the Euclidean distance directly in the spatial domain, owing to its simplicity.
  3. In BM3D and BM3D-SAPCA, the second denoising stage is applied directly to the original noisy image, which is grouped into 3-D data arrays based on the patch similarities computed from the denoised image of the first stage. However, the first-stage denoising might contain grouping errors due to the effect of noise, which yields an inaccurate basic estimate of the noisy image; using this basic estimate as the pilot signal is therefore not ideal, because it degrades the accuracy of the second denoising stage. Unlike BM3D and BM3D-SAPCA, the second denoising stage in our method is applied to a new noisy image obtained by adding part of the residual image to the basic estimate, i.e., back projection. The new noisy image contains more structural details than the output of the first stage, which improves the accuracy of the patch grouping and of the LRA in SVD domain.

Besides, existing methods based on adaptive representations, such as K-SVD and ASVD, often need to learn a set of adaptive bases from given training images. Unfortunately, the learning process is computationally expensive. The proposed method is intrinsically simpler than K-SVD and ASVD, and it does not need to be trained for each image separately or on a given training data set, which avoids the high computational cost of the training process.

SECTION VI

CONCLUSION

In this paper, we have presented a simple and efficient method for image denoising, which takes advantage of nonlocal redundancy and the LRA to attenuate noise. The nonlocal redundancy is implicitly used by the block-matching technique to construct low-rank group matrices. After factorization by SVD, each group matrix is efficiently approximated by preserving only a few of the largest singular values and the corresponding singular vectors, owing to the optimal energy compaction property of SVD. In fact, the small singular values have little effect on the approximation of the group matrix when it has a low-rank structure. The experimental results demonstrate the advantages of the proposed method in comparison with current state-of-the-art denoising methods.

The computational complexity of the proposed algorithm is lower than that of most existing state-of-the-art denoising algorithms, but higher than that of BM3D. The fixed transform used by BM3D is less complex than SVD, but it is also less adapted to edges and textures. The main computational cost of our algorithm is the calculation of the SVD of each patch group matrix. As each group matrix can potentially be processed independently, our method is suitable for parallel processing; in practice, a parallel implementation could speed it up and make it feasible for real-time or near real-time image denoising. In addition, while developed for grayscale images, our method can be extended to color image and video denoising by taking into account shape-adaptive patches and the redundancy across color components and frames. This further work will be studied in the future.

Acknowledgment

The authors would like to thank Chatterjee and Milanfar [11], Elad and Aharon [21], Dabov et al. [23], Zhang et al. [24], and Dong et al. [26] for sharing their source codes and Prof. J. Wang with Shandong University and the anonymous reviewers for their helpful comments and suggestions to improve the quality of this paper.

Footnotes

This work was supported in part by the National Natural Science Foundation of China under Grant 61202150, Grant 61272245, Grant 61373078, Grant 61332015, and Grant 61472220; the National Natural Science Foundation of China Joint Fund with Guangdong under Grant U1201258; the China Post-Doctoral Science Foundation under Grant 2013M531600; and the Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province.

This paper was recommended by Associate Editor A. Kaup.

1. For an $m\times n$ matrix $\mathbf{A}$ with elements $a_{ij}\ (i=1,\dots,m,\ j=1,\dots,n)$, the Frobenius norm is defined as the square root of the sum of the squares of its elements, i.e., $\|\mathbf{A}\|_{F}=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2}}$.

2. A demo will be available at http://qguo.weebly.com when this paper is published.

3. Available at http://users.soe.ucsc.edu/%7epriyam/PLOW/. In our experiments, we use their grayscale versions.


Authors

Qiang Guo

Qiang Guo received the B.S. degree in information and computing science from Shandong University of Technology, Zibo, China, in 2002, and the M.S. and Ph.D. degrees in computer science from Shanghai University, Shanghai, China, in 2005 and 2010, respectively.

He is an Associate Professor with Shandong Provincial Key Laboratory of Digital Media Technology, School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China. His research interests include image restoration, sparse representation, and object detection.

Caiming Zhang

Caiming Zhang received the B.S. and M.S. degrees in computer science from Shandong University, Jinan, China, in 1982 and 1984, respectively, and the Ph.D. degree in computer science from Tokyo Institute of Technology, Tokyo, Japan, in 1994.

He was a Post-Doctoral Fellow with the University of Kentucky, Lexington, KY, USA, from 1998 to 1999. He is currently a Professor with the School of Computer Science and Technology, Shandong University, and a Distinguished Professor with the School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan. His research interests include computer aided geometric design, computer graphics, information visualization, and medical image processing.

Yunfeng Zhang

Yunfeng Zhang received the B.S. degree in computational mathematics and application software from Shandong University of Technology, Jinan, China, in 2000, and the M.S. degree in applied mathematics and the Ph.D. degree in computational geometry from Shandong University, Jinan, in 2003 and 2007, respectively.

He is a Professor with Shandong Provincial Key Laboratory of Digital Media Technology, School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan. His research interests include computer aided geometric design, image processing, computational geometry, and function approximation.

Hui Liu

Hui Liu received the B.S., M.S., and Ph.D. degrees in computer science from Shandong University, Jinan, China, in 2001, 2004, and 2008, respectively.

She is a Professor with the Shandong Provincial Key Laboratory of Digital Media Technology, School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan. Her research interests include computer aided geometric design, medical image processing, and shape modeling.
