Blind Universal Bayesian Image Denoising With Gaussian Noise Level Learning

Blind and universal image denoising consists of using a unique model that denoises images with any level of noise. It is especially practical as noise levels do not need to be known when the model is developed or at test time. We propose a theoretically-grounded blind and universal deep learning image denoiser for additive Gaussian noise removal. Our network is based on an optimal denoising solution, which we call fusion denoising. It is derived theoretically with a Gaussian image prior assumption. Synthetic experiments show our network’s generalization strength to unseen additive noise levels. We also adapt the fusion denoising network architecture for image denoising on real images. Our approach improves real-world grayscale additive image denoising PSNR results for training noise levels and further on noise levels not seen during training. It also improves state-of-the-art color image denoising performance on every single noise level, by an average of $0.1dB$ , whether trained on or not.


Majed El Helou, Student Member, IEEE, and Sabine Süsstrunk, Fellow, IEEE

Index Terms: Additive Gaussian noise removal, Bayesian estimation theory, deep learning, CNN image denoiser optimality.

I. INTRODUCTION
Image denoising is a fundamental image restoration task applied in all image processing pipelines. An image denoiser can also be part of deep network models to improve the training of high-level vision tasks [27]. However, being an ill-posed inverse problem, denoising is challenging [14].
After the development of the best analytical solution, BM3D [8], [18], little improvement in denoising performance was achieved until the advent of deep learning denoisers [59]. Recent Convolutional Neural Network (CNN) based methods achieve state-of-the-art image denoising performance and are even faster than traditional optimization-based approaches [55]. The increased capacity of deep CNN models also addresses the limitation of previous multi-layer perceptron methods when it comes to denoising different levels of noise [5]. Well-designed CNN architectures can also outperform adversarial training methods in image restoration tasks [45].
Neural networks can be deep and wide and thus have large capacity to model complex functions [56], [61], by leveraging network regularization or normalization [21] and residual learning [19]. However, the complex functions modeled by the networks are not interpretable and have little connection to stochastic denoising. This is a limitation for training general models for denoising different noise levels. Denoisers are blind when they require no information about the noise level at test time, and universal when a single model can handle all noise levels. Blind universal models are important since knowing the noise level, at test time or ahead of training, is not a practical scenario for most applications.

Manuscript received August 5, 2019; revised January 27, 2020; accepted February 20, 2020. Date of current version March 9, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lisimachos P. Kondi. (Corresponding author: Majed El Helou.) The authors are with the School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland (e-mail: majed.elhelou@epfl.ch). Digital Object Identifier 10.1109/TIP.2020.2976814
We first mathematically derive a blind and universal denoising function under the theoretical assumption that the image prior is Gaussian. Our denoising function, which is optimal in stochastic expectation, is referred to as fusion denoising because it fuses the input with a prior weighted using the signal-to-noise ratio. It is optimized for additive Gaussian noise removal. Our experimental results show that the state-of-the-art denoiser DnCNN [59] can model an optimal fusion denoising function. However, it only models it for noise levels that are seen by the network during training. For unseen levels, our synthetic experiment's fusion network, called Fusion Net, far outperforms DnCNN. We show our improved generalization results on synthetic data.
The assumption that the image prior is Gaussian does not necessarily apply to real-world images. Building on the foundations of our theoretical solution, we adapt our Fusion Net by designing a second network that learns a fusion function for additive Gaussian noise removal. We call this new network Blind Universal Image Fusion Denoiser (BUIFD). BUIFD improves state-of-the-art denoising performance on noise levels seen in training for grayscale and color images on the standard Berkeley test sets (BSD68 and CBSD68) [41]. Furthermore, we show that the generalization to unseen noise levels obtained in our synthetic experiment extends to the denoising of the grayscale BSD68 test set. Indeed, the denoising performance on noise levels not trained on improves by multiple PSNR points. We present an extended denoising evaluation that covers other test datasets and other traditional and learning-based denoising methods.
Our main contributions are: (1) we theoretically derive an optimal fusion denoising function and integrate it into a deep learning architecture (Fusion Net) to evaluate the optimality of deep networks on a theoretical additive Gaussian noise removal task with known prior, (2) we show on synthetic data that the integration of the auxiliary fusion loss into our Fusion Net improves the network's generalization power, bringing it closer to the optimal solution, and (3) we develop a blind universal image fusion denoiser (BUIFD) network adapted to real images, and show that it outperforms the state of the art for Gaussian noise removal on multiple standard image processing test sets.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/.
The paper is organized as follows. After a review of related work, we first lay the ground for our theoretical experiments. Our experiment allows us to assess the optimality of the networks on training noise levels and the generalization of trained networks to unseen Gaussian noise levels, in comparison to the optimal Bayesian solution. To improve generalization, we then extend the Bayesian framework solution into our network designed for real images (BUIFD), whose exact prior is unknown. Experimental results on standard denoising benchmarks show that our denoising network outperforms the state of the art, especially on unseen noise levels.

II. RELATED WORK
Image denoising approaches in the literature can be divided into classical methods and the more recent deep-learning-based methods. One common aspect is, however, the leveraging of image priors to improve denoising results. For practical reasons, it is important for a denoiser to be blind and universal since the noise levels in noisy images might not be constant or known.
Noise Modeling: Additive white Gaussian noise is not necessarily the best model in practical scenarios such as denoising raw images [3]. Nevertheless, a large part of the image denoising literature focuses on Gaussian denoising since it remains a fundamental problem. Images with noise following different, potentially data-dependent, distributions can be transformed into images with Gaussian noise, and transformed back [31], [38]. In addition, a Gaussian denoising solution can serve as a proximal [26], [36] for image regularizers. It can be a substitute for the costly step in half-quadratic splitting (HQS) optimization, typically responsible for non-differentiable regularization in image processing. This approach is taken in the recent HQS method that leverages the denoiser for image restoration [60]. We thus work with the assumption of an additive white Gaussian noise model.
Image Denoisers: Having to know the exact noise level is a serious limitation in practice for denoisers, and having to know it ahead of time, before training, is even more limiting. A fixed and known noise level is also a limitation when denoising images with spatially-varying noise level [61]. Not having a universal denoising model means that multiple models need to be trained and stored for different noise levels, and that noise level knowledge is required at test time. The recent method [60] that generalizes to image restoration tasks is a non-universal non-blind denoiser, where 25 denoising networks are used for noise levels below 50, and even training parameters are chosen based on the noise level. Similarly, the method of Remez et al. [39], which reaches PSNR results on par with the state of the art, is another non-universal non-blind example. To leverage better priors, images are first classified into a set of classes and every single class has its specific deep network. The method is also not blind and is trained per noise level. Zhang et al. [62] present a universal non-blind network for multiple super-resolution degradations by denoising, deblurring, and super-resolving images. They report that although a blind version is more practical, their blind approach fails to perform consistently well since it cannot generalize.
Blind Universal Denoisers: The state-of-the-art Gaussian denoiser DnCNN is both universal and blind [59]. It is a deep network that is jointly trained on randomly-sampled noise level patches to generalize denoising to a range of noise levels. It has not been outperformed yet by other methods, whether blind or not [16], [48]. Only the recent FFDNet [61], by the authors of DnCNN [59], improves on DnCNN for noise levels 50 and 75 by 0.06 and 0.15 dB respectively, on the Berkeley BSD68 set, while performing similarly or worse for other levels. It is, however, not a blind network as it requires a noise level map as input. Lefkimmiatis [26] recently studied universal denoising, building on prior work for modeling patch similarity in CNNs [25]. His methods are, strictly speaking, not universal as two networks are trained separately, one for low (≤ 30) and one for high noise levels (∈ [30, 55]). They are thus non-blind since a noise-level-based choice must be made at inference time. Furthermore, the published results do not outperform the blind DnCNN denoising results. We thus conduct evaluation comparisons of our BUIFD method with the state-of-the-art DnCNN and the classic BM3D approach [8], [9], which is the best non-learning-based denoiser. It leverages image self-similarities by jointly filtering similar image patches. The authors also present a blind version of the BM3D algorithm, and we compare to both blind and non-blind versions.
Our proposed image denoiser BUIFD learns to disentangle its features to predict two intermediate results: a prior and a noise level. They serve as inputs to the fusion part of the network, responsible for the final denoising. Disentangling the feature space is fundamental for interpretability [6], partial transfer learning [57], domain translation [54], domain adaptation [58], specific attribute manipulation [12], [28], [63] and multi-task networks [2]. In our case, it is fundamental for our theoretical denoising function since the different representations serve as its inputs.

A. Theoretical Framework
Although some specific applications can have a more accurate modeling [24], [49], an additive white Gaussian noise model is often assumed in denoising tasks, as it models common acquisition channels [52]. We thus assume that the additive independent and identically distributed noise $n$ follows a Gaussian distribution $\mathcal{N}(0, \sigma_n^2)$, and is uncorrelated with the data $x$. The noise standard deviation $\sigma_n$ is called noise level. In a Bayesian framework, the conditional probability distribution of the noiseless data $x$ given a noisy observation $y$ (where $y = x + n$) is given by the relation

$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}, \tag{1}$$

where $X$ and $Y$ are the random variables corresponding respectively to $x$ and $y$. We are interested in the conditional distribution as we search for the Maximum A Posteriori Probability (MAP) estimate $\hat{x}$ of $x$. The MAP estimate is

$$\hat{x} = \arg\max_{x}\, p_{X|Y}(x|y). \tag{2}$$

We also model the data prior on $x$ as a Gaussian distribution $\mathcal{N}(\bar{x}, \sigma_x^2)$ centered at $\bar{x}$ [40]. We later modify this assumption in Sec. III-D to the practical case of real-world images. The conditional probability of $y$ given a noiseless $x$ value is

$$p_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(y - x)^2}{2\sigma_n^2}\right), \tag{3}$$

and the probability distribution of $y$ is the convolution of those of $x$ and $n$, given in the Gaussian case by

$$p_Y(y) = (p_X * p_N)(y) = \frac{1}{\sqrt{2\pi(\sigma_x^2 + \sigma_n^2)}} \exp\left(-\frac{(y - \bar{x})^2}{2(\sigma_x^2 + \sigma_n^2)}\right), \tag{4}$$

where $*$ is the convolution operator. With these probability distribution functions, we can obtain an expression for the conditional distribution of $x$ given its noisy observation $y$ by substituting Eq. (3) and Eq. (4) into Eq. (1). $p_{X|Y}(x|y)$ can also be written in the following form of a Gaussian in $x$, given an observation $y$

$$p_{X|Y}(x|y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \tag{5}$$

By matching the expanded expression of $p_{X|Y}(x|y)$ with Eq. (5) for all possible $x$ values, we obtain the expressions for $\mu$ and $\sigma$

$$\mu = \frac{\sigma_x^2\, y + \sigma_n^2\, \bar{x}}{\sigma_x^2 + \sigma_n^2}, \qquad \sigma^2 = \frac{\sigma_x^2\, \sigma_n^2}{\sigma_x^2 + \sigma_n^2}. \tag{6}$$

For the Gaussian shown in Eq. (5), the MAP estimator is also the conditional expected value (mode and mean being equal) and it is hence given by

$$\hat{x} = \mathbb{E}[X \mid Y = y] = \int x\, p_{X|Y}(x|y)\, dx, \tag{7}$$

which, using Eq. (5), can be directly derived to be

$$\hat{x} = \frac{S}{1 + S}\, y + \frac{1}{1 + S}\, \bar{x}, \tag{8}$$

where $S \triangleq \sigma_x^2 / \sigma_n^2$ stands for Signal-to-Noise Ratio (SNR). We call this operation fusion denoising as it fuses the prior and the noisy image, based on the SNR.
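As an illustration of fusion denoising, the following minimal sketch draws scalar samples from a Gaussian prior, adds Gaussian noise, and applies the SNR-weighted fusion of the noisy value with the prior mean. The function names, the chosen parameter values, and the use of plain Python are assumptions made for this example; it is not the authors' implementation.

```python
import random

def fusion_denoise(y, prior_mean, sigma_x, sigma_n):
    """Fuse a noisy observation y with the prior mean, weighted by the SNR."""
    snr = sigma_x ** 2 / sigma_n ** 2            # S = sigma_x^2 / sigma_n^2
    return (snr * y + prior_mean) / (1.0 + snr)  # S/(1+S) * y + 1/(1+S) * prior_mean

def simulate_mse(prior_mean, sigma_x, sigma_n, num_samples=20000, seed=0):
    """Return the empirical MSE of the noisy input and of the fused estimate."""
    rng = random.Random(seed)
    err_noisy = 0.0
    err_fused = 0.0
    for _ in range(num_samples):
        x = rng.gauss(prior_mean, sigma_x)       # clean value drawn from the prior
        y = x + rng.gauss(0.0, sigma_n)          # additive Gaussian noise
        x_hat = fusion_denoise(y, prior_mean, sigma_x, sigma_n)
        err_noisy += (y - x) ** 2
        err_fused += (x_hat - x) ** 2
    return err_noisy / num_samples, err_fused / num_samples

# Illustrative values only (assumed, not taken from the paper).
mse_noisy, mse_fused = simulate_mse(prior_mean=128.0, sigma_x=25.0, sigma_n=15.0)
posterior_var = (25.0 ** 2 * 15.0 ** 2) / (25.0 ** 2 + 15.0 ** 2)
print(mse_noisy, mse_fused, posterior_var)
```

Under these assumptions, the empirical MSE of the fused estimate should land close to the posterior variance $\sigma_x^2\sigma_n^2/(\sigma_x^2+\sigma_n^2)$ and below the noise variance.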
Image denoising models are typically trained to maximize PSNR or equivalently minimize Mean Squared Error (MSE) loss. This means that with close-to-optimal convergence of a neural network model (MSE loss $\to 0^+$), its output tends towards the minimum MSE estimator (MMSE). With our Gaussian modeling, this leads to the MAP estimator $\hat{x}$ of Eq. (8). Thus, an MSE reconstruction loss in a neural network leads it to the estimator $\hat{x}$, iff $S$ and $\bar{x}$ are correctly predicted and correctly used in the fusion with the noisy input $y$, as in Eq. (8). The optimal fusion, used as reference in our experimental evaluation in Sec. IV-B, is given the exact $S$ and $\bar{x}$ values for Eq. (8).
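The link between an MSE loss and the conditional mean can be made explicit with a standard decomposition, sketched here for a generic estimate $c$ of the clean value given an observation $y$ (the symbol $c$ is introduced only for this illustration):

$$\mathbb{E}\big[(X - c)^2 \mid Y = y\big] = \operatorname{Var}(X \mid Y = y) + \big(\mathbb{E}[X \mid Y = y] - c\big)^2 .$$

The first term does not depend on $c$, so the expected squared error is minimized by $c = \mathbb{E}[X \mid Y = y]$. For a Gaussian posterior, the mean and the mode coincide, which is why the MMSE and MAP estimators agree in this setting.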

B. Fusion Net Architecture
We incorporate the basic structure of the optimal fusion solution into the architecture of a neural network, which we call Fusion Net. We build the main blocks of our Fusion Net based on the blind DnCNN introduced in [59] and illustrated in Fig. 1(a). In Fig. 1, the noise-predicting CNN of DnCNN (Fig. 1(a)), the prior-predicting CNN, and the one predicting $f(S)$ (where $f(S) \triangleq \frac{1}{1+S}$) in our Fusion Net (Fig. 1(b)), all leverage the same DnCNN architecture design. The CNNs all consist of a sequence of convolution layers, rectified linear units (ReLU) [34] and batch normalization blocks [21]. Note that $f(S)$ decreases as the SNR increases and grows with the noise level. It is the factor multiplying the prior in Eq. (8). To summarize, the $f(S)$ CNN predicts $\frac{1}{1+S}$ where $S$ is the SNR of the input image (determined by the noise level and the image model used in our theoretical settings), and the prior CNN predicts the prior mean $\bar{x}$ of Sec. III-A.
Unlike the DnCNN that predicts the noise values in the input noisy image, then subtracts them from the noisy input to yield the final denoised output, our network learns optimal fusion denoising given by the function in Eq. (8), as illustrated in Fig. 1(b). The same depth and capacity of the DnCNN are retained to learn separately the image prior and the SNR function, $f(S)$, that is required for the weighted fusion of the prior and the noisy input image. Note that SNR learning also contains a form of prior knowledge, but of variance rather than of expectation. We subtract the prior from our noisy input image and multiply the result, pixel-wise, with the SNR function. This yields the noise prediction given a noisy input, which we subtract from the latter to obtain the denoised output. This architecture is mathematically equivalent to Eq. (8). However, the wiring of Fig. 1(b) allows us to clearly have a residual learning connection and to keep the parallelism between the two aforementioned networks.
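The equivalence between the residual wiring and the direct weighted fusion can be checked numerically. The sketch below is only an illustration on a few hand-picked pixel values; the function names and the flat-list image representation are assumptions, and the prior and $f(S)$ values stand in for what the two CNN branches would predict.

```python
def fusion_residual(y, prior, f_s):
    """Residual wiring: predict the noise as (y - prior) * f(S), then subtract it from y."""
    noise_pred = [(yi - pi) * fi for yi, pi, fi in zip(y, prior, f_s)]
    return [yi - ni for yi, ni in zip(y, noise_pred)]

def fusion_direct(y, prior, snr):
    """Direct form: S/(1+S) * y + 1/(1+S) * prior, applied per pixel."""
    return [(s * yi + pi) / (1.0 + s) for yi, pi, s in zip(y, prior, snr)]

# Hypothetical per-pixel values (stand-ins for the branch outputs).
y = [120.0, 64.0, 200.0]
prior = [100.0, 70.0, 180.0]
snr = [4.0, 1.0, 0.25]
f_s = [1.0 / (1.0 + s) for s in snr]   # f(S) = 1 / (1 + S)

residual_out = fusion_residual(y, prior, f_s)
direct_out = fusion_direct(y, prior, snr)
print(residual_out, direct_out)
```

Both forms give the same denoised values; the residual form simply exposes the intermediate noise prediction.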

C. Fusion Net Feature Disentangling
To mimic the optimal fusion between image prior and noisy image based on the SNR, as in Eq. (8), both the architecture and loss function are adapted. For the fusion, the network needs to predict the image prior $\bar{x}$ and $f(S)$ per pixel (Fig. 1(b)). We obtain, with close-to-zero MSE reconstruction loss of our Fusion Net, that the ground-truth target and the network output are approximately equal

$$\bar{x} \cdot f(S) + y \cdot (1 - f(S)) \approx a \cdot b + y \cdot (1 - b), \tag{9}$$

where $a$ and $b$ are the outputs of intermediate layers in the Fusion Net, and $y$ is the noisy input. Specifically, $a$ is the output of the final layer of the prior CNN in Fig. 1(b), and $b$ the output of the last layer of the $f(S)$ CNN in the same figure. After gradient descent convergence, when the MSE reconstruction loss is close to zero, we get the approximate equality of the left and right terms in Eq. (9). We can view this equation as a first-degree polynomial in the variable $y$. As Eq. (9) holds for all $y$ in the training dataset $D_T$, we can apply coefficient equating, where the coefficients are $\{a \cdot b,\ 1 - b\}$ and $\{\bar{x} \cdot f(S),\ 1 - f(S)\}$. We thus obtain the approximate equality between $a$ and $\bar{x}$ and between $b$ and $f(S)$. The network intermediate outputs $\{a, b\}$ are therefore, respectively, equal to the prior and the SNR function $\{\bar{x}, f(S)\}$, with close-to-zero MSE reconstruction loss $\forall y \in D_T$. This extends to other $y$ outside the dataset assuming that the latter is general enough.

We can further incorporate optimal denoising information in the Fusion Net, under the theoretical settings described in Sec. III-A, through explicit SNR learning with a dedicated loss term. The fusion representations, i.e., the prior $\bar{x}$ and $f(S)$, are thus further enforced through a penalty term for predicting $f(S)$ in the loss function. The full loss function $\mathcal{L}_f$ of the Fusion Net is given by

$$\mathcal{L}_f = \lVert \tilde{x} - x \rVert_2^2 + \alpha\, \lVert b - f(S) \rVert_2^2, \tag{10}$$

where $\alpha$ is a weight parameter, the first term is the MSE reconstruction loss similar to that of the DnCNN, and the second term is a reconstruction loss for $f(S)$. Following Eq. (9), $\tilde{x} = a \cdot b + y \cdot (1 - b)$ is the denoised output of the Fusion Net. The Fusion Net therefore minimizes the reconstruction loss over the denoised image by learning to predict the image prior and the SNR function values separately. Unlike the DnCNN residual learning network, which only leverages ground-truth noise-free images during training, the Fusion Net also leverages explicit SNR information.
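A two-term loss of this kind can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes mean squared error for both terms, represents images as flat lists, and uses hypothetical argument names (`a` and `b` for the two branch outputs, `f_target` for the ground-truth $f(S)$ values). The default weight of 0.1 is the value reported in the experimental setup.

```python
def mse(pred, target):
    """Mean squared error between two equally sized lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def fusion_net_loss(a, b, y, x, f_target, alpha=0.1):
    """Reconstruction loss on the fused output plus a weighted loss on the f(S) prediction."""
    # Denoised output built from the branch outputs: a * b + y * (1 - b), per pixel.
    denoised = [ai * bi + yi * (1.0 - bi) for ai, bi, yi in zip(a, b, y)]
    return mse(denoised, x) + alpha * mse(b, f_target)

# Hypothetical values: the branches predict the prior and f(S) exactly.
y = [10.0, 20.0]
a = [0.0, 0.0]
b = [0.5, 0.5]
x = [5.0, 10.0]
loss_exact = fusion_net_loss(a, b, y, x, f_target=[0.5, 0.5])
loss_off = fusion_net_loss(a, b, y, x, f_target=[0.7, 0.5])
print(loss_exact, loss_off)
```

With exact branch predictions both terms vanish; a mismatch in $f(S)$ alone is penalized only through the second term, scaled by the weight.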

D. Denoising Non-Gaussian Images
Here, our main objectives are to (1) design a Blind Universal Image Fusion Denoiser (BUIFD) for real images, by adapting the theoretical fusion strategy integrated in our Fusion Net, (2) evaluate the denoising performance of BUIFD on training noise levels, and (3) assess the generalization to unseen noise levels with real images.
Since a real image cannot be modeled with a simple Gaussian prior, our image fusion denoising network used for real images (BUIFD), shown in Fig. 1(c), is adapted from the theoretical Fusion Net, shown in Fig. 1(b), by modifying the fusion part. We replace the optimal mathematical fusion by a product fusion step followed by trainable convolution layers. We use three convolution layers to learn the data-dependent fusion function. The optimal fusion function $F$ is to be applied on the noisy input image $y$, the prior prediction, and the noise level prediction

$$\hat{x} = F\big(y,\ f_P(y, \theta_P),\ f_N(y, \theta_N)\big), \tag{11}$$

where the prior-predicting and noise-level-predicting network functions are respectively $f_P$ and $f_N$, with their corresponding learned parameters $\theta_P$ and $\theta_N$, and the denoised estimate is $\hat{x}$.
Intuitively, the prior-predicting network ($f_P$) is used to predict, for each pixel, the expected value of the unknown real-world distribution out of which the intensity of that pixel is sampled. The noise-level-predicting network ($f_N$) predicts the noise level, which is used to control the weighted average between prior and observation. When the noise level is low, the actual observation can be given more weight, and when the noise level is high, the current observation is less reliable and the fusion resorts more to the use of the prior estimation.
The optimal fusion $F$ can be approximated by $\hat{F}$ modeled with three convolution layers. However, we expect $F$ to contain pixel-wise inter-input multiplications similar to the ones of Eq. (8). Since such pixel-wise multiplications cannot be replicated with convolutions, we pass two additional inputs into the convolution layers that model $\hat{F}$. These two additional inputs are given by

$$f_P(y, \theta_P) \odot f_N(y, \theta_N), \qquad y \odot \big(1 - f_N(y, \theta_N)\big), \tag{12}$$

where $\odot$ is pixel-wise multiplication. They are concatenated with the inputs of $F$ given in Eq. (11), yielding five different inputs that are sent to $\hat{F}$. The two additional inputs reduce the learning burden of the convolution layers and improve the denoising performance. Note that we normalize $f_N(\cdot, \cdot)$ to $[0, 1]$. We call this pixel-wise multiplication step and the concatenation of the additional inputs the product fusion (shown in the pipeline of Fig. 1(c)). These two fusion steps, namely the product fusion and the three convolution layers, form $\hat{F}$ and realize point (1) above. The BUIFD's optimization loss is given by

$$\mathcal{L} = \lVert \hat{F}(C) - x \rVert_2^2 + \lVert f_N(y, \theta_N) - N \rVert_2^2, \tag{13}$$

where $C$ is the concatenation of the inputs listed in Eq. (11) and Eq. (12), namely, $\{y,\ f_P(y, \theta_P),\ f_N(y, \theta_N),\ f_P(y, \theta_P) \odot f_N(y, \theta_N),\ y \odot (1 - f_N(y, \theta_N))\}$, $x$ is the ground-truth original image, and $f_N(y, \theta_N)$ and $N$ are respectively the predicted and ground-truth noise level values, normalized to $[0, 1]$. We discuss the relation between BUIFD (Fig. 1(c)) and our theoretical Bayesian network Fusion Net (Fig. 1(b)) in detail in the following section.
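The product fusion step can be illustrated with a short sketch that assembles the five per-pixel inputs passed to the fusion convolution layers. The function name, the flat-list image representation, and the clamping of the noise level prediction to $[0, 1]$ are assumptions made for this example; the text only states that the prediction is normalized to that range.

```python
def product_fusion(y, prior_pred, level_pred):
    """Build the five per-pixel inputs handed to the fusion convolution layers."""
    # Assumption: clamp the normalized noise level prediction to [0, 1].
    level = [min(max(n, 0.0), 1.0) for n in level_pred]
    prior_weighted = [p * n for p, n in zip(prior_pred, level)]    # prior (.) level
    input_weighted = [yi * (1.0 - n) for yi, n in zip(y, level)]   # y (.) (1 - level)
    return [y, prior_pred, level, prior_weighted, input_weighted]

# Hypothetical two-pixel example.
channels = product_fusion(y=[100.0, 50.0],
                          prior_pred=[90.0, 60.0],
                          level_pred=[0.25, 1.5])
print(channels)
```

When the predicted level is low, the weighted input carries most of the signal; when it saturates at 1, only the weighted prior remains, mirroring the role of the SNR weights in the Bayesian fusion.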

E. Relation With the Bayesian Framework
The Fusion Net in Fig. 1(b) explicitly models the relation with the Bayesian solution in the theoretical experiments. We discuss in what follows the relation between BUIFD (Fig. 1(c)) and the Bayesian solution of Eq. (8). We first note that a Gaussian prior does not perfectly model real images, and thus, we expect that the real-image BUIFD network (Fig. 1(c)) deviates from the Fusion Net (Fig. 1(b)), from which it is inspired, to adapt to real images. However, as addressed in Sec. III-D, the relation between BUIFD and the Bayesian framework is strongly pertinent.
First, the product fusion of Eq. (12) explicitly creates the same components as in the Bayesian equation Eq. (8). This product fusion weighs noisy input and learned prior based on SNR, as in the Bayesian fusion. The fusion layers consist of only three convolutional layers with no non-linearities, to ensure that mostly an additive fusion of our Bayesian terms takes place, with local smoothing, and that the relation with the Bayesian solution is preserved as much as possible.
Second, we do not predict an image prior in the sense of a pixel intensity probability distribution, but only the expected mean of that unknown distribution. In the literature, priors are often probability distributions of image gradients, but our definition is quite distinctive. Our prior is, per pixel, the expected value of the distribution out of which the pixel's intensity was sampled. Even with noise-free images, one cannot exactly know that distribution (nor its mean), per pixel, to assess how much this definition is still respected in the BUIFD network with real images. However, all other Bayesian components are consistent, and the empirical results as well. Our improvement of 3.30 dB at unseen noise level 70 in the theoretical experiment is paralleled by an improvement of about 3 dB at noise level 75 in the real image BSD68 experiment.
We hope our methodology motivates future work to analyze deep network optimality on theoretical experiments that are designed such that an optimal solution is known, and that it motivates deep network design inspired from Bayesian solutions.

A. Fusion Net Experimental Setup
The networks are trained (and tested) with data generated synthetically according to the theoretical assumption of a Gaussian image prior as defined in Sec. III. We train the networks for 50 epochs with mini-batches of size 128. We use the Adam optimizer [22] with an initial learning rate of 0.001 that is decayed by a factor of 10 every 30 epochs, the remaining parameters being set to the default values. The weight $\alpha$ in Eq. (10) is set to 0.1. We train the networks with multiple levels of noise. The standard deviation of the additive Gaussian noise is chosen uniformly at random within the interval [5, 25] during the training. At the end of every epoch, the noise components are re-sampled following the same procedure, while the ground-truth images are kept unchanged. For the testing phase, the networks are evaluated on test images where the added noise is also Gaussian, with a given standard deviation.
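The training schedule and the noise sampling described above can be sketched as follows. This is an illustration under assumptions, not the authors' training code: the function names are hypothetical, the learning rate is modeled as a step decay, and one noise level is drawn per training sample, which the text does not specify.

```python
import random

def learning_rate(epoch, base_lr=1e-3, decay=10.0, step=30):
    """Step decay: divide the base learning rate by `decay` every `step` epochs."""
    return base_lr / (decay ** (epoch // step))

def add_training_noise(clean, rng, low=5.0, high=25.0):
    """Draw a noise level uniformly in [low, high] and add Gaussian noise to `clean`."""
    sigma = rng.uniform(low, high)   # assumption: one level per training sample
    noisy = [v + rng.gauss(0.0, sigma) for v in clean]
    return noisy, sigma

rng = random.Random(0)
clean = [128.0] * 16                 # hypothetical flattened ground-truth patch
for epoch in range(50):
    lr = learning_rate(epoch)
    # Noise is re-sampled every epoch; the ground-truth patch stays the same.
    noisy, sigma = add_training_noise(clean, rng)
print(learning_rate(0), learning_rate(30))
```

With 50 epochs and a step of 30, this schedule applies a single decay, from 0.001 to 0.0001, at epoch 30.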

B. Fusion Net Evaluation
PSNR results of DnCNN, our Fusion Net, as well as the optimal upper bound are presented in Table I. The optimal upper bound denoising performance is that of the optimal mathematical solution in Eq. (8). We can see that both the DnCNN and the Fusion Net perform similarly on the training noise levels (left half of the table), and very close to optimal. To validate that the results are indeed statistically similar, we analyze the distribution of PSNR values across the test set. This shows that the Fusion Net, despite the modeling that mimics optimal denoising fusion and the additional training information to learn SNR values, performs similarly to the DnCNN. The latter therefore has enough capacity and learns an optimal denoising. This, however, only holds for the noise levels seen during training by the networks, shown in the left half of Table I. The confidence in the null hypothesis decreases with increasing test noise levels. With a significance level above 0.053, the null hypothesis would even be rejected for noise level 25.
The evaluation results on noise levels larger than 25, which are not trained on by any of the networks, are reported in the right half of Table I. For these larger noise levels, the null hypothesis is very clearly rejected as there is a growing performance gap between DnCNN and our Fusion Net. The p-value quickly drops to zero when there is a PSNR gap, since variances are very small in our results. The Fusion Net generalizes better to unseen noise levels, even performing close to optimal up to noise level 60. The further we increase the noise level, the larger the performance gap between the Fusion Net and the DnCNN. Although both networks perform well for the training noise levels, the Fusion Net learns a more general model and clearly outperforms DnCNN on unseen noise levels.
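For reference, the PSNR metric used in these comparisons can be computed as in the sketch below. The function name and the flat-list representation are assumptions for illustration, and the peak value of 255 is an assumed default that corresponds to 8-bit images; the peak used for the synthetic data is not stated in the text.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf               # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical example: a constant error of 25.5 gray levels.
reference = [100.0, 150.0, 200.0]
estimate = [125.5, 175.5, 225.5]
print(psnr(reference, estimate))
```

Because PSNR is a monotonically decreasing function of the MSE, maximizing PSNR and minimizing MSE are equivalent objectives.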

C. Real-Image Experimental Setup
We use the referenced implementation by the authors of DnCNN (https://github.com/SaoYan/DnCNN-PyTorch) and the same datasets. As mentioned in Sec. III-D, the architecture of our prior-predicting network is identical to that of DnCNN. All the network details are available in [59] and we omit the repetition. The same network depth and feature layers are thus used in the prior-predicting network (18 main blocks) in Fig. 1(c). The noise level network is a shallower one consisting of 5 blocks similar to the ones ... [7], [43] for grayscale training and the 432 color Berkeley images for color training, as in [59]. The same architectures are retained for grayscale and color networks.

D. Real-Image Evaluation
Grayscale denoising evaluation is carried out over the standard Berkeley 68 image test set (BSD68) [41] taken from [32]. Table II reports the results of our fusion approach and of the state-of-the-art blind DnCNN, when they are both trained with noise levels up to 55 or up to 75. Note that for our fusion approach that is trained up to noise level 55, we map the maximum network prediction of 1, during training, to 55 and not to the maximum test noise level, for a fairer comparison. The results of the blind version of BM3D as well as those of the non-blind BM3D, which is given the correct test noise level at inference time, are also reported for reference. We restrict all noisy test images to the range [0, 255], as having negative intensities, or values exceeding 255, is not a configuration encountered in practice.
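The two conventions mentioned above, restricting noisy test images to [0, 255] and mapping a normalized noise level prediction to the training range, can be sketched as follows. The function names and the linear mapping are assumptions made for illustration.

```python
def clip_to_range(pixels, low=0.0, high=255.0):
    """Restrict pixel intensities to [low, high]."""
    return [min(max(p, low), high) for p in pixels]

def to_noise_level(normalized_pred, max_train_level=55.0):
    """Map a prediction in [0, 1] to a noise level, with 1 mapped to the maximum training level."""
    # Assumption: the mapping is linear in the normalized prediction.
    return normalized_pred * max_train_level

noisy_pixels = clip_to_range([-3.0, 100.0, 260.5])
level_55 = to_noise_level(1.0)          # network trained up to level 55
level_75 = to_noise_level(0.5, 75.0)    # network trained up to level 75
print(noisy_pixels, level_55, level_75)
```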
Fig. 3 shows our intermediate feature results, the prior and the noise level values, along with denoising results. The denoised image is created by fusing the noisy input image with the network-derived prior and the noise level values. The fusion is carried out by the product fusion step and the three convolution layers. As in practical scenarios, the denoised outputs are clipped to [0, 255], as are the noisy input images. Our results better remove the noise compared with those of DnCNN over low-frequency regions, and details are better reconstructed over the high-frequency content. We note that, at high noise levels, there is a smudging effect most visible around low-frequency regions (Fig. 3 (k) and (l)), which creates blurry and noisy edges. These are created by both networks, but are more salient in our result (k) as it is less noisy than (l). The higher the noise level, i.e., the standard deviation of the Gaussian noise, the larger the number of averaged samples needs to be for the statistical mean to converge to zero. This makes the local mean of the noise across small patches vary around zero from region to region, randomly, and causes the smudging-like or wave-like effect (notice over low-frequency regions how almost all these artifacts have a curved shape, rather than a linear one, which is modeled by the various different mean values around them).
As seen in Table II, our fusion approach improves the PSNR at every single noise level starting from 15–20, which includes seen levels for both training ranges. Comparing DnCNN$_{75}$ and BUIFD$_{75}$, which are trained on all noise levels, we also note with our approach an improvement of up to 0.7 dB and an average improvement of 0.36 dB. We outperform even the non-blind version of BM3D by an average of 0.25 dB with our version trained on all noise levels, and we perform just as well as the non-blind BM3D when training only up to level 55. Comparing the results of DnCNN$_{55}$ and of BUIFD$_{55}$ in Table II, for unseen noise levels in the range (55, 75], we see that the generalization of the fusion approach to unseen noise levels indeed applies to real images. The improvement of multiple PSNR points for level 75 is consistent with that obtained in our synthetic experiment in Table I.
The results in Table III illustrate denoising images with spatially-varying noise level, without re-training the networks. Noise is added across an image with a level that increases linearly with rows. For the non-blind BM3D, we input the average noise level as a guide. The BUIFD network can handle spatially-varying noise, which neither the prior nor the noise level predicting network branches are trained on. It outperforms DnCNN on all noise setups, whether the networks are trained on the full range or only up to level 55.
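A spatially-varying noise setup of this kind can be simulated as in the sketch below, where the noise level grows linearly from the first to the last row and the average level is returned as the guide for a non-blind method. The function names, the list-of-rows image representation, and the example levels are assumptions; the exact level ranges of Table III are not reproduced here.

```python
import random

def row_noise_levels(num_rows, sigma_top, sigma_bottom):
    """Noise levels increasing linearly from the first row to the last row."""
    if num_rows == 1:
        return [sigma_top]
    step = (sigma_bottom - sigma_top) / (num_rows - 1)
    return [sigma_top + r * step for r in range(num_rows)]

def add_row_varying_noise(image, sigma_top, sigma_bottom, seed=0):
    """Add Gaussian noise whose standard deviation depends on the row index."""
    rng = random.Random(seed)
    levels = row_noise_levels(len(image), sigma_top, sigma_bottom)
    noisy = [[p + rng.gauss(0.0, s) for p in row] for row, s in zip(image, levels)]
    average_level = sum(levels) / len(levels)   # guide for a non-blind denoiser
    return noisy, average_level

# Hypothetical 5x4 constant image and illustrative noise levels.
image = [[128.0] * 4 for _ in range(5)]
noisy_image, guide_level = add_row_varying_noise(image, 10.0, 50.0)
print(row_noise_levels(5, 10.0, 50.0), guide_level)
```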
For color image denoising, we use the standard color version of BSD68 (CBSD68) for testing. Noise is simulated and added to each test image before running it through a denoising method. PSNR results are reported in Table IV. The high inter-channel correlation between the RGB color channels [13] allows all methods to perform significantly better in terms of denoising PSNR on color images compared with grayscale images. We note that this advantage of having multiple correlated channels as in color imaging is not always available, for instance with single-wavelength imaging [29]. We hypothesize that this correlation also enables the networks to implicitly learn the noise level prediction. High correlation implies that the network sees multiple approximately equal data samples with different noise instances drawn from the same distribution. Thus, it more easily learns an estimate of the noise variance. Each of the two networks therefore performs more or less the same when trained up to noise level 55 and when trained up to noise level 75. Our fusion approach, however, consistently outperforms CDnCNN on every single noise level for both training noise ranges. Our average improvement over CDnCNN is about 0.1 dB. We also note that the networks outperform, on average, even the non-blind CBM3D by about 0.5 dB for CDnCNN and 0.6 dB for our CBUIFD.
Sample image denoising results for grayscale and color images are illustrated in Figs. 4 and 5 and Figs. 6 and 7, respectively, for the non-blind BM3D and the blind networks DnCNN and BUIFD trained on the full range of noise levels. The main trade-off seen between the results of BM3D and those of DnCNN is on detail reconstruction. The non-blind BM3D achieves good PSNR reconstruction but at the expense of blurring the results. This causes a loss of details (visible on the large rock in Fig. 4, and the zoom-in insert in Fig. 5) and of edge sharpness (visible on the borders of the lake in the zoom-in insert in Fig. 4). The DnCNN results suffer less from blurring, but the noise removal is not optimal in certain areas such as smooth surfaces (visible on the inner area of the lake in the zoom-in insert in Fig. 4). Our approach achieves a good performance in terms of this trade-off. BUIFD achieves good PSNR results, with significantly less blurring than the non-blind BM3D (see Fig. 5 for example).
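The hypothesis that correlated color channels make the noise level easier to learn can be illustrated with an idealized toy case. The sketch below assumes three channels that carry exactly the same signal with independent Gaussian noise, which is a stronger assumption than the approximate equality of real RGB channels; the signal range, image size, and seed are hypothetical. It shows only why repeated noisy views of the same value expose the noise variance, not how the networks estimate it.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 25.0  # true noise standard deviation on the [0, 255] scale

# Hypothetical signal shared by all three channels (perfect correlation, an idealization).
signal = rng.uniform(50.0, 200.0, size=(128, 128))
channels = np.stack([signal + rng.normal(0.0, sigma, signal.shape) for _ in range(3)])

# With an identical signal in every channel, the per-pixel sample variance across
# channels is an unbiased estimate of the noise variance; averaging over pixels
# makes the estimate stable.
per_pixel_var = channels.var(axis=0, ddof=1)
sigma_hat = np.sqrt(per_pixel_var.mean())
```

With a single channel there is no second view of the same pixel, so this shortcut is unavailable, which is consistent with the remark on single-wavelength imaging.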

E. Extended Benchmark Comparisons
We present further denoising experiments on different benchmark datasets, and compare the results of different denoising approaches on these datasets. We report blind denoising results for noise levels 10 to 80 (with a step size of 10) on the BSD68, Set5, Set14, Sun_Hays80, Urban100, and Manga109 datasets. Set5 and Set14 are made up of, respectively, 5 and 14 traditionally-used images for testing image processing algorithms. Most of their images are smaller than 512 × 512. The Sun_Hays80 dataset is made up of the high-resolution version of the 80 images presented in [46], with sizes smaller than 1024 × 1024. The Urban100 dataset is a collection of 100 high-resolution images taken from Flickr using urban keywords [20]. The Manga109 dataset consists of 109 professional artist drawings [33], of size 827 × 1170. We present in Table V the denoising results of the blind non-learning methods BM3D, EPLL [64], KSVD [1], and WNNM [17]. These methods were developed for Gaussian denoising and, to enforce the blind setting, are given the default noise level set by the non-blind BM3D (set to 25). Table V also reports the learning-based methods DnCNN [59] and BUIFD. All methods are evaluated on denoising the luminance of the images with added Gaussian noise levels ranging from 10 to 80. We also evaluate another learning-based method with the same training hyper-parameters as those of DnCNN, namely, the MemNet architecture [47], and extend our fusion technique to that architecture and call it BUIFD(M). It is constructed following Fig. 1(c), with the exception that the MemNet architecture replaces that of DnCNN for the prior-predicting CNN. All the learning-based methods in this section are trained up to noise level 55. Table V shows the PSNR and SSIM metrics for each method, and we highlight in bold the best-PSNR and best-SSIM method between DnCNN and BUIFD, and between MemNet and BUIFD(M). A sample visual result is shown in Fig. 8, taken from Set14.
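For reference, the PSNR metric reported in these comparisons can be computed as in the minimal sketch below. It uses the standard definition with a peak value of 255 and clips the estimate to [0, 255], in line with the clipping note in the table captions; the function name `psnr_db` and the toy inputs are hypothetical, and the exact evaluation script of the paper may differ.

```python
import numpy as np

def psnr_db(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    # Clip the estimate to the valid intensity range before scoring.
    estimate = np.clip(np.asarray(estimate, dtype=np.float64), 0.0, peak)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy check: a constant error of 5 gray levels gives an MSE of 25.
reference = np.full((8, 8), 100.0)
score = psnr_db(reference, reference + 5.0)
```

Because the scale is logarithmic, the roughly 0.1 dB average gain reported for color denoising corresponds to a reduction of about 2% in mean squared error.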

V. CONCLUSION
We define a theoretical framework under which we derive an optimal denoising solution that we call fusion denoising. We integrate it into a deep learning architecture and compare it with the optimal mathematical solution and with the state-of-the-art blind universal denoiser DnCNN. Our synthetic experimental results show that our Fusion Net generalizes far better to higher unseen noise levels.
We learn a data-dependent fusion function to adapt our fusion denoising network to real images. Our universal image fusion denoising network BUIFD improves the state-of-the-art real image denoising performance both on training noise levels and on unseen noise levels.

Fig. 1. (a) Schematic of the DnCNN residual learning approach for denoising. The network predicts the noise in an image. (b) Our Fusion Net that explicitly learns the SNR function for optimal fusion of the noisy image with the learned prior, following Eq. (8). (c) Our real-image fusion denoiser, BUIFD, where fusion is carried out with a pixel-wise product stage followed by three convolution layers for learning a general fusion function (Sec. III-D).
-A. The training data is composed of over 200k patches of size 40 × 40 pixels. Image pixel intensities for the training data are drawn at random from $\mathcal{N}(127, 25^2)$, following the Gaussian image prior assumption. All values are normalized to [0, 1] before training through division by 255, and, when noise is added, all values outside the interval are clipped to the interval's closer bound. For the testing data, 256 images of size 256 × 256 pixels are used, and they are created with the same procedure as that of the training data.
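The synthetic data procedure described above, together with a closed-form fusion of the noisy image and the prior, can be sketched as follows. The data generation follows the text (intensities from $\mathcal{N}(127, 25^2)$, division by 255, clipping after noise is added). The fusion step is the textbook posterior mean for a Gaussian prior with Gaussian noise; treating it as equivalent to the paper's Eq. (8) is an assumption of this sketch, as are the function names, the seed, and the small patch count.

```python
import numpy as np

rng = np.random.default_rng(0)
PRIOR_MEAN = 127.0 / 255.0  # Gaussian image prior mean on the [0, 1] scale
PRIOR_STD = 25.0 / 255.0    # Gaussian image prior standard deviation on the [0, 1] scale

def make_patches(num_patches, noise_std, size=40):
    """Draw clean patches from the Gaussian prior and add clipped Gaussian noise.

    `noise_std` is given on the [0, 255] scale, as in the paper's noise levels.
    """
    clean = rng.normal(PRIOR_MEAN, PRIOR_STD, (num_patches, size, size))
    noise = rng.normal(0.0, noise_std / 255.0, clean.shape)
    noisy = np.clip(clean + noise, 0.0, 1.0)
    return clean, noisy

def gaussian_posterior_mean(noisy, noise_std):
    """Fuse the noisy pixel with the prior mean, weighted by the signal-to-noise ratio."""
    snr = PRIOR_STD ** 2 / (noise_std / 255.0) ** 2
    return (snr * noisy + PRIOR_MEAN) / (1.0 + snr)

clean, noisy = make_patches(num_patches=100, noise_std=25.0)
denoised = gaussian_posterior_mean(noisy, noise_std=25.0)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

At noise level 25 the signal-to-noise ratio equals 1, so the fused estimate is the plain average of the noisy pixel and the prior mean, and `mse_denoised` comes out at roughly half of `mse_noisy`. This closed form needs the true noise level as input, which a blind network has to predict instead.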

Fig. 2. Training losses of the different learning-based methods. Per epoch, we plot with a full black curve the overall loss (i.e., reconstruction loss) of the base methods DnCNN and MemNet, in (a) and (b) respectively. The same reconstruction loss with our fusion method is plotted with a dotted red curve, the noise-level loss computed on the corresponding intermediate output (i.e., the output of the noise level CNN) is plotted with a dotted blue curve, and the overall loss for the fusion methods (the sum of the former two losses) is plotted with a dotted green curve. Note the abrupt small improvement in loss reduction at epoch 30, which is when the learning rate is exponentially decayed. We can see that the different learned functions converge by the end of training (logs shown for the methods with upper training noise level 55).

Fig. 3. Left to right: original and noisy images, prior and noise level predictions of BUIFD, our fused denoising result and the DnCNN denoised image. Our denoising result is created by fusing the noisy image, the prior and the noise level values, for instance (e) is F((b), (c), (d)). All the networks are trained on noise levels in [0, 55]. Whether the noise level is seen (25), or not seen (75), during training, our denoised results show better noise removal (sky in (e-f), window, wall and arms in (k-l)). We show the PSNR in dB and the SSIM [50] between parentheses for the different results. Best viewed on screen.

Fig. 4. Grayscale image denoising example from BSD68. All networks are trained on all noise levels [0, 75] and we test on noise level 25. Non-blind BM3D loses edge details due to blur smoothing. The network results are sharper, with the better PSNR being that of BUIFD$_{75}$. Best viewed on screen.

Fig. 5. Grayscale image denoising example from BSD68. All networks are trained on all noise levels [0, 75] and we test on noise level 45. Non-blind BM3D results are very smoothed, and details are lost. DnCNN preserves more details, but at the expense of PSNR. Our blind approach preserves details and outperforms the non-blind BM3D in terms of PSNR. Best viewed on screen.

Fig. 6. Color image denoising example from CBSD68. All networks are trained on the full range of noise levels [0, 75] and we test on noise level 25. Best viewed on screen.

Fig. 7. Color image denoising example from CBSD68. All networks are trained on the full range of noise levels [0, 75] and we test on noise level 45. Best viewed on screen.

Fig. 8. Sample visual result from Set14, with PSNR (dB)/SSIM values. The top row shows non-blind results with the traditional methods KSVD, BM3D, EPLL and WNNM, as the noise level is 25, which is the default set when the noise level is unknown. The bottom row shows the results with the different learning methods.

TABLE I
TEST SET PSNR (dB) RESULTS FOR THE NOISE STANDARD DEVIATIONS GIVEN IN THE TOP ROW. THE NETWORKS ARE TRAINED ON NOISE LEVELS RANDOMLY CHOSEN IN [5, 25]. NOISE LEVELS IN THE RIGHT HALF OF THE TABLE ARE NOT SEEN DURING TRAINING. WE ALSO REPORT THE OPTIMAL BAYESIAN DENOISING (OPTIMAL FUSION). THE BOTTOM ROW SHOWS THE INDEPENDENT TWO-SAMPLE T-TEST RESULTS BETWEEN DNCNN AND OUR FUSION NET. THE TWO-TAILED p-VALUES VALIDATE THE NULL HYPOTHESIS OF EQUAL AVERAGE PSNR BETWEEN DNCNN AND THE FUSION NET ON TRAINING NOISE LEVELS, WITH SIGNIFICANCE LEVEL 0.05

TABLE II
PSNR (dB)/SSIM COMPARISONS OF GRAYSCALE IMAGE DENOISING ON THE BSD68 STANDARD TEST SET. WE COMPARE THE NON-BLIND BM3D, THE BLIND BM3D, DNCNN, AND OUR BUIFD. DNCNN$_\sigma$ OR BUIFD$_\sigma$ INDICATES THAT THE NETWORK SEES NOISE LEVELS ONLY UP TO $\sigma$ DURING THE TRAINING. BOLD INDICATES THE BEST BLIND RESULT, FOR EACH RANGE OF TRAINING NOISE LEVELS, AND THAT BEST RESULT IS SELECTED BEFORE ROUNDING. NOTE: SMALL DEVIATIONS IN REPORTED PSNR VALUES COMPARED WITH THE LITERATURE, NOTABLY ON HIGHER NOISE LEVELS, ARE DUE TO CLIPPING NOISY INPUTS (AND OUTPUTS) TO [0, 255], AS A PRACTICAL CONSIDERATION
set. A two-sided T-test (independent two-sample T-test) is used to evaluate the null hypothesis that the PSNR results of both networks have similar expected values. This test is chosen as we have the exact same sample sizes defined by the test dataset, and the variances of PSNR results are very similar. The T-test results are given in the bottom row of Table I, and the null hypothesis holds for all configurations in the left half of the table (for a 0.05 significance level, i.e., a p-value ≥ 0.05).
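For concreteness, the test statistic of an independent two-sample T-test with equal sample sizes and a pooled variance can be written as below. This is the textbook form and is given here as an assumption, since the text does not spell out the formula; the symbols $\bar{P}_1$, $\bar{P}_2$ (mean PSNR of each network over the $n$ test images), $s_1^2$, $s_2^2$ (unbiased sample variances), and $\nu$ (degrees of freedom) are introduced only for this illustration.

$$
t = \frac{\bar{P}_1 - \bar{P}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}, \qquad \nu = 2n - 2 .
$$

The two-tailed p-value is the probability that a Student-$t$ variable with $\nu$ degrees of freedom exceeds $|t|$ in absolute value; the null hypothesis of equal expected PSNR is retained when this p-value is at least 0.05.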

TABLE III
WE EVALUATE PSNR VALUES, WITH SPATIALLY-VARYING NOISE LEVEL, ON THE BSD68 TEST SET. THE NOISE LEVEL INCREASES LINEARLY WITHIN THE IMAGE OVER THE RANGE $[\sigma_c - 10, \sigma_c + 10]$. THE NON-BLIND BM3D IS GIVEN THE CENTRAL NOISE LEVEL $\sigma_c$

TABLE IV
PSNR (dB)/SSIM COMPARISONS OF COLOR IMAGE DENOISING, SIMILAR TO TABLE II, ON THE CBSD68 STANDARD TEST SET. BOLD INDICATES THE BEST BLIND RESULT, FOR EACH RANGE OF TRAINING NOISE LEVELS, AND THAT BEST RESULT IS SELECTED BEFORE ROUNDING
from a specified range (details in Sec. IV-D), and is the same for all pixels in a given training patch. We use the training hyper-parameters of DnCNN both for training it and for training BUIFD; the hyper-parameters are not tweaked for BUIFD. The noise level predictor is jointly trained within BUIFD, so both network branches always see the same training data (with the same simulated noise distributions) as each other in the experiments of Sec. IV-D. We use the 400 Berkeley images

TABLE V
PSNR/SSIM EVALUATION OF THE BLIND BM3D, EPLL, KSVD, WNNM, DNCNN, BUIFD, MEMNET, AND BUIFD(M). BOLD INDICATES THE BEST BLIND DENOISING RESULT IN TERMS OF PSNR OR SSIM BETWEEN EACH PAIR OF LEARNING METHODS, FOR EACH GAUSSIAN NOISE LEVEL. WE CLIP NOISY IMAGES TO [0, 255], AS A PRACTICAL CONSIDERATION