MRWM: A Multiple Residual Wasserstein Driven Model for Image Denoising

Residual histograms can provide valuable information for vision research. However, current image restoration methods have not fully exploited the potential of multiple residual histograms, especially their role as overall regularization constraints. In this paper, we propose a novel multiple residual Wasserstein driven model (MRWM) that organically combines multiple residual Wasserstein constraints with various natural image priors for image denoising. Specifically, using the Wasserstein distance derived from optimal transport theory, the multiple residual histograms of the observed images are forced to be as close as possible to the reference residual histogram, thereby improving the accuracy of residual estimation. Furthermore, a concrete MRWM is constructed that unifies the multiple residual Wasserstein distribution approximation with the image total variation prior to carry out image denoising. The resulting alternating iterative algorithm of histogram matching and Chambolle dual projection has few parameters and is easy to implement. Finally, our experiments confirm that, compared with several representative image denoising algorithms, MRWM achieves better performance in objective evaluation and better preserves details such as image edges, making the results look more natural.


I. INTRODUCTION
Image denoising, which aims to reconstruct a latent clean image x from a noisy degraded image y, is one of the classic but still active low-level vision research tasks [1], [2], [3], [4], [5], [6], [7], [8], [9]. A widely used data model is y = x + n, where n is additive white Gaussian noise. Existing image denoising methods can be divided into two categories: model-based methods and deep learning-based methods. The first class typically estimates the clean image by exploiting various image priors, while the second mainly employs neural networks to learn a mapping from noisy images to clean images [5], [6], [10], [11]. Here we briefly review these two classes of image denoising methods in turn.

A. MODEL-BASED METHODS
Because the gradients of natural images have heavy-tailed distributions, sparse priors are widely applied to image denoising [12]. Sparse coding has been shown to be effective in image denoising by representing image patches as sparse linear combinations of atoms in an over-complete dictionary [13]. Later, using superpixel segmentation, SSLRR [14] presented a low-rank approximation image denoising method, and a local low-rank denoising method via tensor decomposition [15] was introduced. By utilizing intra- and inter-patch correlation, LIIC [16] performs image denoising with low-rank regularization. SLG [17] proposes a structure-based low-rank model for noise removal: it exploits manifold structure information to embed a graph-kernel-norm regularizer into a low-rank approximation model, and a weighted singular-value thresholding algorithm solves the model effectively and achieves excellent denoising performance.
(The associate editor coordinating the review of this manuscript and approving it for publication was Huaqing Li.)
As one of the milestones in the history of image denoising, BM3D [18] fully explores the nonlocal self-similarity (NSS) of natural images. The combined use of NSS and sparse priors has produced some outstanding work [12], [19], [20]. By forcing the gradient histogram of the denoised image to be as close as possible to a reference gradient histogram of the clean image, GHP [21] presents a texture-enhanced image denoising algorithm. Based on the observation that in some practical problems singular values have clear physical meaning and should be treated differently, WNNM [22] studies the weighted nuclear norm minimization problem applied to image restoration, in which different singular values can be weighted adaptively.
Recently, ATVBH [23] presented a higher-order TV image restoration method that combines first-order and second-order total variation with adaptive parameter estimation. CAS [24] fully exploits the local and non-local correlation of natural image content to achieve near-optimal sparse representations that minimize signal uncertainty. For highly correlated groups of image data, CAS employs a content-adaptive transform to obtain the sparsest representation, resulting in outstanding denoising performance. By combining a thresholding function with image TV regularization, a modified TV regularization method, M-TVRM [25], was proposed for salt-and-pepper noise (SPN) removal: a thresholding function based on SPN characteristics detects noisy pixels, and the image TV regularization restores them.
By exploiting the fact that many natural images have a low-dimensional patch manifold structure [26], [27], [28], [29], LDMM [1] and G-LDMM [30] were applied to image denoising and inpainting via the point integral method and the weighted non-local Laplacian (WNLL), respectively. The combination of the low-dimensional manifold prior and residual distribution approximation produced W-LDMM [31], which is well suited to natural image restoration. Evidently, various kinds of image prior information play an important role in these methods.

B. DEEP LEARNING-BASED METHODS
In recent years, deep learning-based image denoising methods have developed rapidly. Deep convolutional neural networks have achieved great success in low-level vision tasks, including image denoising [32], [33], [34]. An improved encoder-decoder network with symmetric skip connections [35] was constructed for image denoising. By adding batch normalization to the residual learning framework, DnCNN [36] builds denoising networks that perform significantly better than traditional denoising algorithms. To endow the network with memory ability, a densely connected denoising neural network [37] was constructed.
After these, a convolutional neural network based on multilevel wavelet denoising [38] is proposed. A fast and flexible network FFDNet [39] is put forward to process images with nonuniform noise. By solving the fractional optimal control problem, FOCNet [6] develops an advanced Gaussian denoising neural network. Employing graph convolution operation, a neural network architecture is proposed to create neurons with non-local receptive fields for image denoising [40].
To address the problem that feature scaling in image denoising loses some visually informative details, AGSN [41] presents a fast and accurate denoising method based on an attention-guided scaling network. By utilizing attention-guided adversarial training, AGSN enhances the reconstruction quality of images with challenging noisy textures. In DER-Net [42], a multilevel network that uses GPU memory efficiently is proposed for biological-image denoising; composed of U-Net encoder-decoder structures, DER-Net has a strong ability to recover the details of biological images. These deep learning methods achieve excellent denoising performance, but they also incur a high computational cost and depend on powerful hardware platforms.

C. MOTIVATION
With the continuous improvement of modern imaging technology, multiple images of the same content can be obtained almost instantaneously on a single imaging device. For example, a mobile phone can take dozens of images in a second, while a high-speed camera can capture far more in an instant. However, during acquisition, transmission, storage, and processing, these images are sometimes inevitably contaminated by noise. Therefore, denoising multiple noisy images is a realistic and meaningful research topic. One can consider estimating a latent clean image x from multiple noisy observations {y^µ, µ = 1, 2, · · · , M} acquired in an instant by an imaging facility, where M is the number of contaminated images. In other words, the observed versions of the multiple noisy images can be written as

y^µ = x + n^µ, µ = 1, 2, · · · , M, (1)

where n^µ is additive white Gaussian noise with mean zero and standard deviation σ. As in the literature [43], [44], the noise standard deviation σ can be estimated from the noisy images through the finest-scale wavelet coefficients and median filtering. Different from most image restoration methods, which focus on mining the prior information of images, the multiple residual Wasserstein driven model (MRWM) proposed in this paper is designed primarily to regularize the multiple residual images n^µ = y^µ − x, (µ = 1, 2, · · · , M), i.e., the differences between the degraded images y^µ and the latent image x.
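As a concrete illustration of the σ estimation mentioned above, the following is a minimal sketch of the standard robust rule that combines the finest-scale (diagonal Haar) wavelet coefficients with a median-based estimator. It is an assumption-laden stand-in for the procedure of [43], [44], not the authors' exact implementation.

```python
import numpy as np

def estimate_sigma_mad(y):
    """Estimate the std of additive Gaussian noise from one image via the
    median absolute deviation (MAD) of the finest-scale diagonal Haar
    wavelet coefficients (a common robust rule; sketch, not the paper's code)."""
    y = np.asarray(y, dtype=float)
    # Diagonal detail (HH) coefficients of one Haar level; for pure
    # N(0, sigma^2) noise these are again N(0, sigma^2) distributed.
    hh = (y[0::2, 0::2] - y[0::2, 1::2] - y[1::2, 0::2] + y[1::2, 1::2]) / 2.0
    # MAD estimator: sigma ~= median(|HH|) / 0.6745.
    return float(np.median(np.abs(hh)) / 0.6745)

def estimate_sigma_multi(ys):
    """With several noisy copies y^mu, the per-image estimates can be averaged."""
    return float(np.mean([estimate_sigma_mad(y) for y in ys]))
```

Image content leaks slightly into the HH band, so on natural images this estimate is a mild over-estimate; on pure noise it is nearly unbiased.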
Although the Gaussian noise contained in the images is random, its distribution is known. Therefore, we can make full use of the Wasserstein distance to draw the estimated residual histograms as close as possible to the reference Gaussian noise histogram, so as to improve the residual estimation accuracy. As is well known, training a GAN-based denoising network [45], [46], [47] amounts to minimizing a weighted sum of content loss and adversarial loss. By contrast, the proposed MRWM is a model-based image denoising method that ably combines multiple residual Wasserstein constraints with image prior regularization. The key idea of MRWM is to reduce the difference between the estimated residual histograms and the reference Gaussian noise histogram through the multiple residual Wasserstein distance.

D. CONTRIBUTION
The contributions of this paper can be summarized as follows:
(1) A novel unified framework of the multiple residual Wasserstein driven model (MRWM) is proposed. The framework can effectively combine the multiple residual Wasserstein distribution approximation with existing image priors to perform image denoising.
(2) A specific MRWM is constructed that combines the multiple residual Wasserstein distance with the classic image total variation prior to improve the quality of denoised images.
(3) The proposed alternating iterative algorithm of histogram matching and Chambolle dual projection has few parameters, is easy to implement, and achieves excellent image denoising performance.
The rest of this paper is structured as follows. Section II introduces the basics of Wasserstein distance. Section III presents the unified framework of multiple residual Wasserstein driven model. Section IV proposes a specific multiple residual Wasserstein driven model. In Section V, we show the model solution process and algorithm design. Numerical experimental results are given in Section VI. Finally, Section VII concludes this paper.

II. PRELIMINARIES ON WASSERSTEIN DISTANCE
The Wasserstein distance, which originates from optimal transport theory, is an effective tool to measure the difference between two histograms. It can be defined as the least cost that must be paid to convert one histogram into another. For two known probability distributions p and q defined on the real line R, the Wasserstein distance between p and q is expressed as the solution to the Monge problem [48]:

W(p, q) = ( inf_φ E_{x∼p} [ (x − φ(x))² ] )^{1/2}, (2)

where the random variable x follows the distribution p, while the transformed variable φ(x) obeys the distribution q. The infimum in Eq. (2) is taken over all deterministic functions φ : R → R that map any random variable x to the variable φ(x). The Wasserstein distance is thus a statistical metric between two known probability distributions, and it equals zero if and only if p and q are the same distribution. For a probability distribution p on R, its cumulative distribution function can be defined as

F_p(t) = ∫_{−∞}^{t} p(x) dx, (3)

where F_p is called the histogram equalization transform [49]. Moreover, the percentile function of the probability distribution p is denoted F_p^{−1}. In the comprehensive introduction to optimal transport theory [48], a fundamental conclusion is that the optimal φ in problem (2) has the explicit closed-form solution

φ(x) = F_q^{−1}(F_p(x)). (4)

Since the Monge problem (2) is actually an expected value, the Wasserstein distance can be further expressed as

W(p, q) = ( E_{x∼p} [ (x − F_q^{−1}(F_p(x)))² ] )^{1/2}.

By treating x = (x₁, x₂, · · · , x_n)^T as n independent samples drawn from the distribution p, and representing the histogram h_q as a discrete approximation of the distribution q, the equivalent discrete definition of the Wasserstein distance can be introduced as follows:

Ŵ(h_x, h_q) = min_{φ̂} ( (1/n) Σ_{i=1}^{n} (x_i − φ̂(x_i))² )^{1/2}, (5)

where the function φ̂ converts x_i to ξ_i = φ̂(x_i), such that the transformed samples ξ = (ξ₁, ξ₂, · · · , ξ_n)^T satisfy the histogram h_q. Similar to the continuous case above, the optimal φ̂ for Eq. (5) also admits a closed-form solution [48]:

φ̂(x_i) = F_{h_q}^{−1}(F_{h_x}(x_i)), (6)

where the cumulative distribution function F_{h_x} and the percentile function F_{h_q}^{−1} are derived from the histograms h_x and h_q, respectively.
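For equal-size sample sets, the minimizer of the discrete problem in Eq. (5) pairs the samples in sorted order, so the distance reduces to an L2 distance between order statistics. A minimal sketch under that assumption:

```python
import numpy as np

def wasserstein_1d(x, xi):
    """Discrete 1-D quadratic Wasserstein distance between two equal-size
    sample sets. The optimal transport map on the line pairs sorted
    samples, so Eq. (5) becomes an L2 distance of order statistics."""
    x = np.sort(np.ravel(np.asarray(x, dtype=float)))
    xi = np.sort(np.ravel(np.asarray(xi, dtype=float)))
    return float(np.sqrt(np.mean((x - xi) ** 2)))
```

The distance is zero exactly when the two sample sets have the same empirical histogram, matching the metric property stated above.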
If x is an image, then φ̂ is a histogram matching operator, which can also be applied to moving object detection and image contrast enhancement. For a known image x = (x₁, x₂, · · · , x_n)^T, φ̂ ensures that the output samples ξ = (ξ₁, ξ₂, · · · , ξ_n)^T match the given histogram h_q. Eqs. (5) and (6) constitute the theoretical basis of the image denoising model proposed in this paper.
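The closed-form operator of Eq. (6) can be sketched by rank assignment: each x_i is sent to the reference order statistic at its own empirical CDF position. This is a minimal illustration assuming the reference histogram is given as an equal-size sample set.

```python
import numpy as np

def histogram_match(x, ref):
    """Histogram matching operator (Eq. (6)): xi_i = F_ref^{-1}(F_x(x_i)).
    Implemented by assigning to each x_i the reference order statistic
    of the same rank (assumes len(x) == len(ref))."""
    x = np.ravel(np.asarray(x, dtype=float))
    ref_sorted = np.sort(np.ravel(np.asarray(ref, dtype=float)))
    ranks = np.argsort(np.argsort(x))  # rank of each x_i among the samples
    return ref_sorted[ranks]
```

The output has exactly the histogram of `ref` while preserving the ordering of `x`, which is the property exploited by the residual update of the proposed model.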

III. THE UNIFIED FRAMEWORK OF MULTIPLE RESIDUAL WASSERSTEIN DRIVEN MODEL
Whereas most image restoration models regularize only image prior information, the proposed model is designed mainly to regularize the residual images n^µ = y^µ − x, (µ = 1, 2, · · · , M), i.e., the differences between the noisy observations and the latent clean image.
We know that for each noisy sample image, its residual y^µ − x should theoretically obey a Gaussian distribution. We make full use of the distribution approximation effect of the Wasserstein distance to drive each residual histogram h_{y^µ−x}, (µ = 1, 2, · · · , M) as close as possible to the reference Gaussian noise histogram (ground truth) h_g. Based on this, we propose the following unified framework of the multiple residual Wasserstein driven model (MRWM) for image denoising:

arg min_x { R(x) + Σ_{µ=1}^{M} β_µ Ŵ(h_{y^µ−x}, h_g) }, (7)

where the first term R(x) represents prior knowledge about the original clean image x, Ŵ(h_{y^µ−x}, h_g) is the Wasserstein distance between h_{y^µ−x} and h_g, and β_µ is the Wasserstein scale parameter.

FIGURE 1. The stability analysis of the reference Gaussian noise histogram in terms of pixel size and noise level. From the first to the third row, the randomly generated Gaussian noise histograms have pixel sizes of 512 × 512, 256 × 256, and 128 × 128, respectively. In the histograms of the first two columns, the middle two, and the last two, the corresponding Gaussian noise standard deviations are 25, 50, and 100, respectively. The stability of the noise histogram is mainly affected by the pixel size rather than the noise level.
Eq. (7) above organically combines image prior regularization with multiple residual Wasserstein distance constraints. Popular choices of R(x) include probability models of image patches [13], [50], heavy-tailed distribution models in the gradient domain [51], [52], and low-dimensional manifold priors for natural images [1], [31]. Existing image priors can thus be easily embedded into framework (7) for image denoising.
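To make the framework concrete, the following sketch evaluates an objective of the form (7) with R(x) chosen as the total variation and with the reference histogram h_g approximated by one sampled Gaussian draw. All helper names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def tv(x):
    """Anisotropic total variation, one common choice for R(x)."""
    return float(np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum())

def w1d(a, b):
    """Sorted-sample form of the discrete quadratic Wasserstein distance."""
    return float(np.sqrt(np.mean((np.sort(np.ravel(a)) - np.sort(np.ravel(b))) ** 2)))

def mrwm_objective(x, ys, betas, sigma, rng):
    """Value of the framework objective (7) with R(x) = TV(x); the
    reference histogram h_g is approximated by a Gaussian sample."""
    g = rng.normal(0.0, sigma, size=x.size)  # reference noise samples
    return tv(x) + sum(b * w1d(y - x, g) for y, b in zip(ys, betas))

# Tiny demo on three synthetic noisy copies of a flat image.
rng = np.random.default_rng(0)
x = np.zeros((16, 16))
ys = [x + rng.normal(0, 25, x.shape) for _ in range(3)]
val = mrwm_objective(x, ys, [0.82] * 3, 25.0, rng)
```

Minimizing such an objective in x is what Sections IV-V address with a splitting scheme; this snippet only shows how the two kinds of terms coexist.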

IV. A SPECIFIC MULTIPLE RESIDUAL WASSERSTEIN DRIVEN MODEL
A. THE PROPOSED MRWM
Without loss of generality, in this paper we choose the classic image total variation (TV) prior regularization and triple residual Wasserstein distance constraints (M = 3) to demonstrate the effectiveness of the proposed MRWM framework. We therefore mainly focus on the following specific multiple residual Wasserstein driven model (MRWM) for image denoising:

arg min_x { ‖∇x‖_1 + Σ_{µ=1}^{3} β_µ Ŵ(h_{y^µ−x}, h_g) }, (8)

where ∇x denotes the discrete gradient of the clean image, and ‖∇x‖_1 represents the sum of the lengths of the gradient vectors at each point.
In model (8), the Wasserstein distance drives the estimated residual histograms toward the reference residual histogram, thereby improving residual estimation. The combination of the multiple residual Wasserstein distribution approximation and the image TV prior can improve restoration performance. Subsequently, the stability of the reference Gaussian noise histogram h_g is analyzed.
After that, we carry out the solution of the MRWM and the corresponding numerical experiments.

B. THE STABILITY ANALYSIS OF THE REFERENCE GAUSSIAN NOISE HISTOGRAM
Here, we analyze the stability of the reference Gaussian noise histogram h_g from the visual and numerical perspectives, respectively. First, owing to space constraints, we present only 18 randomly generated Gaussian noise histograms in Figure 1. Concretely, the histograms shown in the three rows are derived from Gaussian noise fields whose pixel sizes are 512 × 512, 256 × 256, and 128 × 128, respectively. The histograms in the first two columns, the middle two, and the last two correspond to Gaussian noise with σ = 25, σ = 50, and σ = 100, respectively. Figure 1 shows that the histograms in the first two rows are quite stable, while those in the third row are less so. This implies that the stability of the reference Gaussian noise histogram is mainly affected by the number of noise (image) pixels rather than the noise level. Indeed, a larger noise field contributes more random samples to the histogram, which improves its stability.
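The pixel-size effect described above can be reproduced numerically: measuring the bin-wise variability of repeated noise histograms shows that larger noise fields yield more stable histograms. A small sketch (the spread measure and bin layout are our own illustrative choices):

```python
import numpy as np

def hist_spread(n_pixels, sigma, n_trials=10, bins=50, rng=None):
    """Average bin-wise std of normalized histograms over repeated
    Gaussian-noise draws; smaller values mean a more stable histogram."""
    if rng is None:
        rng = np.random.default_rng(0)
    edges = np.linspace(-4 * sigma, 4 * sigma, bins + 1)
    hists = [np.histogram(rng.normal(0, sigma, n_pixels), bins=edges,
                          density=True)[0] for _ in range(n_trials)]
    return float(np.std(np.stack(hists), axis=0).mean())

large = hist_spread(512 * 512, 25.0)  # as many samples as a 512x512 image
small = hist_spread(128 * 128, 25.0)  # as many samples as a 128x128 image
```

With 16 times fewer samples, the bin-wise spread grows by roughly a factor of four (the usual 1/sqrt(n) behavior), consistent with the visual observation in Figure 1.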
Then, the stability of the reference noise histogram is analyzed numerically. Figure 2 presents the histograms of ten randomly generated Gaussian noise fields (512 × 512), all with zero mean and σ = 25. The experimental results show that the average mean and standard deviation over the 10 histograms are −0.0035 and 25.0020, respectively. Meanwhile, the variance of these ten standard deviations is 2.6966 × 10⁻⁴. Based on the above discussion, it can be concluded that the reference Gaussian noise histogram h_g in model (8) is stable.

V. MODEL SOLUTION AND ALGORITHM DESIGN
A. MODEL SOLUTION

First, three auxiliary variables ξ¹, ξ², ξ³ are introduced, and problem (8) is split into

arg min_{x, ξ¹, ξ², ξ³} { ‖∇x‖_1 + Σ_{µ=1}^{3} (β_µ/2) ‖y^µ − x − ξ^µ‖² }, s.t. h_{ξ^µ} = h_g, µ = 1, 2, 3. (9)

Next, by fixing x and ξ¹, ξ², ξ³ (all three fixed simultaneously) in turn, the minimization problem (9) is solved alternately. Specifically, when x in (9) is fixed, the following optimization problem needs to be solved:

arg min_{ξ¹, ξ², ξ³} Σ_{µ=1}^{3} (β_µ/2) ‖y^µ − x − ξ^µ‖², s.t. h_{ξ^µ} = h_g, µ = 1, 2, 3. (10)

The minimizers ξ¹, ξ², ξ³ of (10) can be obtained by the histogram matching operations

ξ¹_i = φ̂((y¹ − x)_i), (11)
ξ²_i = φ̂((y² − x)_i), (12)
ξ³_i = φ̂((y³ − x)_i), (13)

where the subscript i denotes the i-th component. Here the histogram matching operator φ̂ makes the histograms of the outputs ξ¹, ξ², ξ³ conform to the reference Gaussian noise histogram h_g. On the other hand, when the variables ξ¹, ξ², ξ³ are fixed, Eq. (9) becomes

arg min_x { ‖∇x‖_1 + Σ_{µ=1}^{3} (β_µ/2) ‖y^µ − x − ξ^µ‖² }. (14)

The last three quadratic terms in Eq. (14) can be combined into one term, so Eq. (14) reduces to

arg min_x { ‖∇x‖_1 + ((β₁ + β₂ + β₃)/2) ‖x − (Σ_{µ=1}^{3} β_µ (y^µ − ξ^µ))/(β₁ + β₂ + β₃)‖² }. (15)

For simplicity of notation, let θ = β₁ + β₂ + β₃ and z = (1/θ) Σ_{µ=1}^{3} β_µ (y^µ − ξ^µ). Thus, Eq. (15) is simplified to

x̂ = arg min_x { ‖∇x‖_1 + (θ/2) ‖x − z‖² }. (16)

Eq. (16) is evidently a TV-L2 problem, which can be solved by the noted Chambolle duality projection algorithm [53]. Consequently, the solution of Eq. (16) is stated as follows:

x̂ = z − (1/θ) div p, (17)

where the vector field p can be obtained by the following fixed-point iteration: initialize p = 0 and iterate with step size τ ≤ 1/8 to ensure convergence; readers may refer to [53] for further details.
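The Chambolle dual projection step used to solve Eq. (16) can be sketched as follows. This is an illustrative re-implementation of the classic fixed-point scheme of [53] in our notation (x = z − (1/θ) div p), not the authors' code; discretization choices (forward differences, Neumann boundaries) are assumptions.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary conditions."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]; d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] -= py[:, -2]
    return d

def chambolle_tv(z, theta, tau=0.125, n_iter=80):
    """Approximately solve Eq. (16): argmin_x ||grad x||_1 + (theta/2)||x-z||^2
    via Chambolle's dual fixed-point iteration (convergent for tau <= 1/8).
    The primal solution is recovered as x = z - (1/theta) div p."""
    px = np.zeros_like(z); py = np.zeros_like(z)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - theta * z)
        denom = 1.0 + tau * np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / denom
        py = (py + tau * gy) / denom
    return z - div(px, py) / theta
```

Each iteration projects the dual field p toward the unit ball while tracking the scaled data term θz; the output is a TV-smoothed version of z whose smoothing strength grows as θ decreases.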

B. ALGORITHM DESIGN
To solve the proposed multiple residual Wasserstein driven model, we design the optimization method described above, which is summarized as Algorithm 1.
In essence, the algorithm alternates between the histogram matching operation and Chambolle's duality projection. Histogram matching makes the output residuals more consistent with the Gaussian distribution from the perspective of probability distribution, while the Chambolle duality projection draws the output image closer to the original image. The algorithm thus exerts a dual effect on the residual distribution and the output image.
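The alternation just described can be sketched end to end as below. This is a structural illustration only: the initialization, iteration counts, and the sampled reference histogram are simplifications of Algorithm 1, and the inner TV-L2 solver is a compact Chambolle iteration, not the authors' implementation.

```python
import numpy as np

def _match(res, ref_sorted):
    """Histogram matching: impose the reference order statistics on res."""
    flat = res.ravel()
    out = np.empty_like(flat)
    out[np.argsort(flat)] = ref_sorted
    return out.reshape(res.shape)

def _div(px, py):
    d = np.zeros_like(px)
    d[0] += px[0]; d[1:-1] += px[1:-1] - px[:-2]; d[-1] -= px[-2]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] -= py[:, -2]
    return d

def _tv_l2(z, theta, tau=0.125, n_iter=40):
    """Chambolle projection for argmin_x ||grad x||_1 + (theta/2)||x - z||^2."""
    px = np.zeros_like(z); py = np.zeros_like(z)
    for _ in range(n_iter):
        r = _div(px, py) - theta * z
        gx = np.zeros_like(z); gy = np.zeros_like(z)
        gx[:-1] = r[1:] - r[:-1]; gy[:, :-1] = r[:, 1:] - r[:, :-1]
        denom = 1.0 + tau * np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / denom; py = (py + tau * gy) / denom
    return z - _div(px, py) / theta

def mrwm(ys, sigma, beta=0.82, n_iter=30, seed=0):
    """Alternate the residual step (histogram matching toward h_g) and the
    image step (TV-L2 via Chambolle), following the structure of Algorithm 1."""
    rng = np.random.default_rng(seed)
    ref = np.sort(rng.normal(0.0, sigma, ys[0].size))  # sampled reference h_g
    theta = beta * len(ys)
    x = np.zeros_like(ys[0])                           # simplified initialization
    for _ in range(n_iter):
        xis = [_match(y - x, ref) for y in ys]         # residual update, Eqs. (11)-(13)
        z = sum(beta * (y - xi) for y, xi in zip(ys, xis)) / theta
        x = _tv_l2(z, theta)                           # image update, Eqs. (16)-(17)
    return x
```

On synthetic multi-copy data this loop steadily pushes the residuals y^µ − x toward the Gaussian reference while the TV step smooths the reconstruction.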

VI. NUMERICAL EXPERIMENTS
In this section, the experimental parameters are first described. Then the experimental results on 12 test images and dataset BSD68 [54] are presented successively. Finally, the implementation efficiency of the experiment is analyzed.

A. PARAMETRIC DESCRIPTION
In the proposed MRWM (8), there are three model parameters β₁, β₂ and β₃. Considering that the three noisy images have equal status in the restoration process, we take the same value for all three, namely β₁ = β₂ = β₃. Most model-based image restoration methods contain several model parameters, and finding optimal parameters by manual tuning is a daunting task. In fact, the MRWM has only one free parameter, which makes the experiments easy to conduct. The general principle for setting regularization parameters in image restoration is to ensure that the solution obtained by the model satisfies the discrepancy principle. Specifically, the optimal parameter is tuned so that the variances of the output residuals y^µ − x, (µ = 1, 2, 3) are as close as possible to the noise variance σ². In the experiments, we take the test image ''plane'' as an example to illustrate how the PSNR changes with β_i (i = 1, 2, 3); the curve is shown in Figure 3. Combined with experimental experience, we finally set the optimal parameters to β_i = 0.82, (i = 1, 2, 3). The initial input x in Algorithm 1 is the identity matrix of the same size as the noisy image. In the experiments, we set the iteration number J = 300. In the fixed-point iteration, the parameter τ = 1/8, which ensures convergence.

Algorithm 1 Multiple Residual Wasserstein Driven Model (MRWM) for Image Denoising
1. Initialize: k = 1, x = eye(size(y¹)).
2. Iterate on k = 1, 2, · · · , J:
3. Update ξ¹, ξ², ξ³ by the histogram matching operations ξ^µ = φ̂(y^µ − x), µ = 1, 2, 3.
4. Compute θ = β₁ + β₂ + β₃.
5. Compute z = (1/θ) Σ_{µ=1}^{3} β_µ (y^µ − ξ^µ).
6. Fixed-point iteration: initialize p = 0 and iterate with step size τ ≤ 1/8.
7. Implement Chambolle's duality projection: x = z − (1/θ) div p.
8. Output the denoised image x.
We employ the 12 test images shown in Figure 4 and the commonly used test dataset BSD68 [54] to illustrate the effectiveness of the MRWM. In the experiments, the images to be processed contain Gaussian noise with σ = 25. The experimental hardware platform is dual-core CPU (1.80 GHz and 2.30 GHz), 64.0 GB RAM. The operating system is Windows 10.0, and the software environment is Matlab R2017b.
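For reproducibility, the PSNR metric reported throughout the experiments can be computed as below (standard definition for 8-bit images; helper name is ours).

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB for images on a 0..peak scale."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float('inf') if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

Higher is better; a 0.3-0.7 dB average gap of the kind discussed in Section VI-C corresponds to a small but consistently visible quality difference.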

B. EXPERIMENTAL RESULTS ON 12 TEST IMAGES
The experimental results on the 12 test images shown in Figure 4 consist of two parts: the comparison of multi-sample and single-sample experiments; the comparison of MRWM with several representative denoising methods.
First, Table 1 compares the denoising results of MRWM-1 (i.e., β₁ = 0.82, β₂ = β₃ = 0 in (8)), MRWM-2 (i.e., β₁ = β₂ = 0.82, β₃ = 0 in (8)), and MRWM-3 (i.e., β₁ = β₂ = β₃ = 0.82 in (8)). To investigate the case of denoising four images (namely M = 4 in Eq. (1)), we also add the corresponding experimental result MRWM-4 in Table 1. Figure 5 shows the PSNR and SSIM statistics of MRWM-1, MRWM-2 and MRWM-3. As the number of image samples increases, the PSNR and SSIM of the recovered images increase significantly. In addition, Figure 6 provides a visual comparison with local magnification, which shows that increasing the number of image samples markedly improves the visual quality of the restored image. These results demonstrate that the proposed multiple residual histogram approximation via the Wasserstein distance is effective for image denoising.
Then, Table 2 shows the experimental results of MRWM and five representative Gaussian denoising methods: BM3D [18], NCSR [20], GHP [21], WNNM [22], and W-LDMM [31]. Figure 7 gives the average PSNR and SSIM curves of these six methods on the 12 test images. MRWM is superior to the competing methods in terms of average PSNR and SSIM.
We know that the denoising performance of the classic TV-L2 model is not as good as that of BM3D. However, the experimental results in this paper show that MRWM outperforms the representative denoising methods above. Moreover, during the experiments we found that the effect of total variation alone is limited, since it regularizes only the image itself. For the problem of estimating the original image from multiple noisy images, the multiple residual Wasserstein distance constraints play the more important role.
In addition, enlarged partial details of some images are shown in Figure 8 and Figure 9. As can be seen in Figure 8, compared to the original image, the fine particles around the starfish obtained by BM3D, NCSR, GHP, WNNM and W-LDMM are polished away. The proposed MRWM better preserves these details, making the denoised images look more natural and visually pleasing. Figure 9 shows that the competing methods tend to over-smooth image edges, whereas MRWM retains sharper edge information, an important image detail feature.
The proposed MRWM mainly handles grayscale images. However, with the help of the YUV color space [55], [56], MRWM can also be used for color image denoising. The basic idea is to first convert the RGB channels of the noisy color image into the YUV color space, then employ the MRWM of this paper to denoise the Y component, yielding Ŷ. The denoised Ŷ is merged with the original U and V components to obtain the ŶUV representation, which is finally transformed back to the RGB channels to get the restored image. Figure 10 shows two examples of color image denoising using this idea.
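The luminance-only pipeline can be sketched as follows, assuming the BT.601 RGB-to-YUV matrix (one common convention; the papers' references [55], [56] may use a slightly different one). Using the exact matrix inverse guarantees a lossless round trip for the unmodified chrominance.

```python
import numpy as np

# BT.601 RGB -> YUV matrix; its inverse guarantees an exact round trip.
_M = np.array([[0.299, 0.587, 0.114],
               [-0.14713, -0.28886, 0.436],
               [0.615, -0.51499, -0.10001]])
_M_INV = np.linalg.inv(_M)

def denoise_color(rgb, denoise_gray):
    """Color denoising via YUV: denoise only the luminance channel Y with
    a grayscale method (e.g. MRWM), then merge with the original U, V."""
    yuv = rgb @ _M.T
    yuv[..., 0] = denoise_gray(yuv[..., 0])  # grayscale denoiser on Y only
    return yuv @ _M_INV.T
```

Usage: `denoise_color(noisy_rgb, my_gray_denoiser)`, where `my_gray_denoiser` stands in for the MRWM of this paper.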
We utilize the two metrics EP and TP from [57] and [58] to evaluate the edge preservation and texture preservation of the denoised images, respectively. Specifically, the edge preservation metric of the m-th region of interest (ROI) in an image is defined as

EP_m = Γ(ΔI_m − Δ̄I_m, ΔÎ_m − Δ̄Î_m) / sqrt( Γ(ΔI_m − Δ̄I_m, ΔI_m − Δ̄I_m) · Γ(ΔÎ_m − Δ̄Î_m, ΔÎ_m − Δ̄Î_m) ), (19)

where I_m and Î_m are the corresponding matrices of the m-th ROI in the original image and the denoised image, respectively, and Δ denotes a Laplacian operator; ΔI is thus a highpass-filtered form of I, acquired by a 3 × 3 pixel approximation of the Laplacian, and Δ̄I is the empirical mean of ΔI. The operator Γ represents the correlation within the ROI:

Γ(I₁, I₂) = Σ_{(i,j)∈ROI} I₁(i, j) I₂(i, j). (20)

Thus, the edge preservation over the M selected ROIs is expressed as

EP = (1/M) Σ_{m=1}^{M} EP_m. (21)

EP takes a larger value when the edges in the ROIs remain sharp. On the other hand, texture preservation TP_m in an ROI is defined in terms of σ²_m and (σ̂_m)², the variances of the m-th ROI in the original and denoised images, and µ_ori and µ_den, the corresponding mean values (Eq. (22); see [58] for the exact expression). The texture preservation over the M selected ROIs is then obtained as

TP = (1/M) Σ_{m=1}^{M} TP_m. (23)

The larger the TP value, the better the texture preservation of the restored image. Figure 11 shows the regions of interest (ROI) of three images used for our EP and TP evaluation. Table 3 presents the EP and TP results of the three methods MRWM-1, MRWM-2 and MRWM-3 on the three images Monarch, Cameraman and Starfish. In the proposed MRWM, increasing the number of residual Wasserstein regularization terms effectively improves the edge and texture preservation of the restored images.
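The EP metric for a single ROI can be sketched as a normalized correlation between mean-removed Laplacians, as described above. The 3 × 3 Laplacian stencil and zero-padded borders are our assumptions; [57] may use a different discretization.

```python
import numpy as np

def _laplacian(img):
    """3x3 Laplacian high-pass filtering (zero-padded borders)."""
    p = np.pad(img, 1)
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def edge_preservation(roi_orig, roi_den):
    """EP of one ROI: normalized correlation between the mean-removed
    Laplacians of the original and denoised regions (cf. Eq. (19))."""
    a = _laplacian(np.asarray(roi_orig, float)); a -= a.mean()
    b = _laplacian(np.asarray(roi_den, float)); b -= b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

By Cauchy-Schwarz the value is at most 1, attained when the denoised ROI keeps the original high-frequency structure; smoothing drives it below 1.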
Finally, we compare the proposed MRWM with the multiple-image denoising method t-SVD [59]. The 12 test images shown in Figure 4 are again employed for the comparison. Table 4 reports the corresponding Gaussian denoising results, including PSNR and SSIM values. In terms of PSNR, MRWM is superior to t-SVD on all 12 images, while the two methods each have their own advantages in SSIM; for the average values of both metrics, MRWM outperforms t-SVD. Meanwhile, Figure 12 shows enlarged local results of the two methods on test image 3: t-SVD tends to blur the edge details of the image, whereas the proposed MRWM better recovers the texture information. Therefore, MRWM, based on image prior regularization and multiple residual distribution constraints, is effective for image denoising.

C. EXPERIMENTAL RESULTS ON THE DATASET BSD68
To further demonstrate the denoising capacity of the proposed MRWM, in addition to BM3D [18] and WNNM [22] we employ three more state-of-the-art methods: the recent ATVBH [23] and the deep learning-based DnCNN [36] and FFDNet [39]. Moreover, we use the Berkeley segmentation dataset (BSD68) [54], which contains 68 natural images and is widely utilized for evaluating Gaussian denoising methods, to test the performance of MRWM. Table 5 shows the average PSNR and SSIM results of the six denoising methods on BSD68. Regarding average PSNR and SSIM on BSD68, MRWM surpasses ATVBH by a large margin and significantly outperforms BM3D and WNNM. According to the literature [60], [61], few denoising methods exceed BM3D by more than 0.3 dB on average. In contrast, the proposed MRWM outperforms BM3D by about 0.6 dB in average PSNR on the 68 images of BSD68. It should be mentioned that this gap (0.6 dB) is very close to the PSNR bound (0.7 dB) estimated over BM3D in [61].
It can also be found from Table 5 that the denoising performance of MRWM is slightly inferior to the deep learning-based methods DnCNN and FFDNet; this is a common shortcoming of current model-based denoising methods. However, our experimental results show that MRWM outperforms DnCNN and FFDNet, both numerically and visually, on some images of BSD68. Three specific examples are discussed here. Figure 13 illustrates the visual results of the six methods on the image ''test001'' in BSD68: ATVBH, BM3D, WNNM, DnCNN and FFDNet tend to polish the detailed texture of the original image, while the proposed MRWM recovers more realistic texture on the stone sculptures. Figures 14-15 show the visual comparisons of the six methods on images ''test008'' and ''test021'' in BSD68, respectively. As can be seen from Figure 14, the tiger tail recovered by ATVBH, BM3D, WNNM, DnCNN and FFDNet is partially lost, while MRWM better presents the complete tail. It can be found from Figure 15 that there are some white spots in a local region of the original image; ATVBH, BM3D, WNNM, DnCNN and FFDNet recover very little of this white-point information, while MRWM recovers sufficient white-point detail, which is more consistent with the true characteristics of the image.
The above quantitative and qualitative evaluations on the dataset BSD68 confirm that the proposed MRWM can not only effectively remove noise, but also restore the detailed texture features of images. The Gaussian denoising performance obtained by MRWM is attributed to the combined effect of multiple residual Wasserstein distribution approximation and image prior regularization.

D. IMPLEMENTATION EFFICIENCY
In image denoising, besides restoration quality, the implementation efficiency of a method also needs to be considered. Although deep learning-based methods such as DnCNN and FFDNet require little time in the testing phase, they often take a long time to train the required neural network, and such training generally relies on a GPU. Here, a 481 × 321 image from BSD68 is taken as an example to illustrate the running efficiency of the model-based denoising methods. The running time of our MRWM (16 s) is longer than that of BM3D (3 s), but shorter than those of ATVBH (26 s) and WNNM (601 s). It should be noted that BM3D is mainly implemented in the C language. Overall, MRWM remains competitive in the implementation efficiency of image denoising.

VII. CONCLUSION
Different from most model-based image denoising methods, which focus primarily on exploiting the prior information existing in the image itself, the core of the MRWM proposed in this paper is to exploit the multiple residual distribution approximation of the degraded images. The effective combination of Wasserstein-driven residual estimation and the image total variation prior contributes to the improvement of the denoising effect. For this realistic and meaningful problem, the proposed MRWM algorithm is simple, yet its denoising performance is strong. The multiple residual distribution approximation also provides new ideas for other image restoration tasks such as deblurring and inpainting.
A limitation of MRWM is that it cannot be directly applied to the removal of non-additive noise, such as Poisson noise or multiplicative noise. The removal of such noise has long been of interest and value. One strategy is to transform the degraded image into one contaminated by additive white Gaussian noise, and then make full use of MRWM for image restoration.
RUI-QIANG HE received the Ph.D. degree in applied mathematics from the School of Mathematics and Statistics, Xidian University, Xi'an, China, in 2020. He is currently working with the Mathematics Department, Xinzhou Teachers University, China. His research interests include inverse problems in image processing and mathematical models and algorithms for image processing.
WANG-SEN LAN received the M.S. degree from the Shanxi University of Finance and Economics, Taiyuan, China, in 2004. He is currently a Professor with the Mathematics Department, Xinzhou Teachers University, China. His research interests include data analysis and its application in image processing.
FANG LIU is currently pursuing the Ph.D. degree with the School of Science, North University of China. She is also an Associate Professor with the Mathematics Department, Xinzhou Teachers University, China. Her research interests include image processing, modeling, and simulation of complex systems. VOLUME 10, 2022