BoostNet: A Boosted Convolutional Neural Network for Image Blind Denoising

Deep convolutional neural networks and generative adversarial networks have recently attracted the attention of researchers because they are more effective than conventional representation-based methods. However, they face a serious trade-off between removing noise, avoiding artifacts, and preserving low-contrast features and high-frequency details. In particular, deep convolutional neural networks may fail to remove strong noise in regions with higher noise levels while completely erasing low-contrast features and high-frequency details. By contrast, compared with conventional deep convolutional neural networks, generative adversarial networks may strike a better balance between erasing different types of noise and recovering texture details. However, they often generate fake details and unexpected artifacts in the image owing to the instability of their discriminator during training. In this study, we explore an innovative strategy for handling these serious problems in image denoising. With this strategy, we propose a novel boosting generative adversarial network (BoostNet) that not only combines the advantages of a generative adversarial sub-network and a deep convolutional neural network but also avoids the serious problems caused by corruption and training instability. BoostNet is developed by integrating a stand-alone deep convolutional neural network and a robust generative adversarial network into an ensemble network, which effectively boosts the denoising performance. We conducted several experiments using challenging datasets of additive white Gaussian noise and real-world noisy images. The experimental results show that our proposed method is superior to other state-of-the-art denoisers in terms of quantitative metrics and visual quality. Our source code and datasets for BoostNet are available at https://github.com/ZeroZero19/BoostNet.git.


I. INTRODUCTION
Image denoising plays a vital role in the image processing field because it is a crucial step in numerous practical applications [9], [13], [24], [40], [42], [43]. The development of deep convolutional neural networks (DCNNs) [1], [4], [16], [26] has led to a notable change in image denoising. However, despite the dramatic achievements of DCNNs, these methods cannot recover important details when the noise level is overestimated. DCNNs have mainly focused on minimizing the mean squared reconstruction error, which results in high peak signal-to-noise ratios. Nevertheless, DCNN-based methods are unable to tackle the problem of losing details: high-frequency details can be completely corrupted by severe random noise at high levels. For this reason, many researchers have recently been focusing on algorithms based on generative adversarial networks (GANs), which can effectively recover the high-frequency details of the original image. These methods can generate images that look more natural than those obtained by any deep convolutional neural network. GAN-based methods are mainly built on the win-lose strategy of game theory. A GAN comprises two networks, a generator and a discriminator, which compete with each other in a repeated zero-sum game. The generator is trained to fool the discriminator, whereas the discriminator avoids being fooled by distinguishing between a fake image created by the generator and a real image. Although a GAN can generate acceptably natural images, the main challenge in training a GAN is stabilizing the discriminator, which is highly unstable and frequently results in mode collapse.
GAN is based on a compelling assumption that the discriminator is optimal at each step. Arjovsky et al. [3] showed that the loss function of the discriminator frequently approaches either zero or volatile gradients. Consequently, traditional GAN-based methods are unable to find the optimal solution, whereas the training of the generator quickly stops because the loss of the discriminator dramatically decreases to zero.
To improve the performance of GANs, researchers have paid attention to new loss functions and advanced discriminators, such as LSGAN [29] and WGAN [2]. These loss functions offer promising paths toward an optimal solution for the GAN. Other researchers have focused on employing integral probability metrics (IPMs) to prevent the discriminator from converging too quickly [34], as in WGAN [2], WGAN-GP [12], Sobolev GAN [32], and Fisher GAN [33]. In practice, IPM constraints can somewhat improve the denoising performance of a GAN. However, the instability problem of GANs has not been completely solved. This is because the loss function of the discriminator cannot accurately measure the distance between the images produced by the generator and the corresponding ground-truth images in the training space of the discriminator. Recent discriminators use popular probability-distance functions that allow a sequence of distributions to converge more easily. Nevertheless, these distance functions also make the discriminator highly unstable. This problem becomes more obvious when the discriminator is trained on high-dimensional training samples. In a high-dimensional space, the distance function does not precisely reflect the similarity between a fake image and its ground-truth image. This is illustrated in Figure 1, in which the output feature vectors generated from the generator and the clean images are perfectly mapped into the two classes of real and fake samples in the training space of the discriminator. Nevertheless, the distance between a pair of real and fake images is not proportional to the similarity between them, as described in detail in [29]. Thus, in high-dimensional spaces, the discriminator frequently generates inaccurate and unstable estimates of the density ratio during training, and the generator is unable to learn the multimodal structure of the target distribution.
Compared to traditional DCNNs, GAN-based methods can capture better texture details; however, they also create more edge and texture artifacts in some local areas of the image and have relatively lower peak signal-to-noise ratios. Figure 1 shows the feature vector generated by the DCNN, denoted by G_S, and the feature vector generated by the generator, denoted by G_F. Here, although G_F is classified into the class of real images, it is not close to the coordinates of the clean image owing to the instability of the discriminator. By contrast, because DCNNs often use the mean squared error (MSE) loss function to precisely measure the similarity between the denoised and clean images, G_S, when mapped into the training space of the discriminator, lies closer to the clean image than G_F. However, G_S can be classified into the class of fake images owing to the over-smoothing problem. Overcoming the drawbacks of GAN-based methods is extremely important in practical applications, particularly in medical imaging, where the peak signal-to-noise ratio is the most significant criterion for evaluating the reconstruction performance.
In general, DCNN-based and GAN-based methods have both advantages and disadvantages. In this paper, we address the problems of GANs and DCNNs through the development of a novel method, namely, BoostNet, which combines the advantages of DCNN-based and GAN-based methods while avoiding their disadvantages. BoostNet consists of three hierarchical generators: G_S, G_F, and G_T. Here, G_F is used to tackle the problem of losing edges and texture, whereas G_S focuses on increasing the peak signal-to-noise ratio (PSNR). In addition, G_T utilizes the denoised images from G_S and G_F as inputs and generates a synthesized image that is close to the ground-truth clean image.
BoostNet plays an important role in the development of mobile robots because mobile robots frequently operate in airports, supermarkets, and other public spaces. In such environments, images are degraded by unknown noise, which is the most challenging problem for face recognition and detection systems on mobile robots. Moreover, the performance of deep learning-based methods used on mobile robots drops drastically when images contain realistic random noise. Thus, BoostNet is highly applicable to face recognition [40] and other related tasks of mobile robots. BoostNet is also suitable for medical image reconstruction and segmentation. Notably, fluorescence microscopy plays a vital role in recent biological research [48]; however, fluorescence microscopy images are deteriorated by Poisson-Gaussian noise, and BoostNet is helpful for restoring fluorescence microscopy images damaged by such noise. BoostNet also possesses potential benefits for denoising magnetic resonance images, because it can outperform other state-of-the-art methods in terms of noise suppression and detail preservation, including tissue margins, low-contrast lesions, and other important edges. Our contributions are summarized as follows:
• BoostNet was proved to be effective in eliminating unpredictable artifacts caused by the instabilities of the discriminator in a GAN-based method. BoostNet can retain the most important details while removing sophisticated noise.
• In comparison with state-of-the-art DCNNs, BoostNet not only achieves a similarly high peak signal-to-noise ratio but also preserves high-frequency details more effectively.
• To further improve the reconstruction performance, we developed a new perceptual loss based on the feature maps of the densely connected convolutional network (DenseNet) [17].
• BoostNet exhibits high reconstruction performance on both noisy synthetic images degraded by additive white Gaussian noise (AWGN) and real-world noisy images.
It generates more realistic images than regular DCNNs and GANs.

II. RELATED WORK
A. DEEP CONVOLUTIONAL NEURAL NETWORK-BASED DENOISERS
Zhang et al. [45] proposed the DnCNN method to tackle the problem of blind Gaussian denoising over a wide variety of noise levels. DnCNN can produce visually good results within smooth regions, but it fails to restore high-frequency details. Other DCNN-based methods, such as TNRD [7], Noise2Noise [24], MWCNN [27], BM3D-Net [39], MemNet [44], and FFDNet [46], have also been proposed, which exploit the effect of convolutional network depth to increase the denoising accuracy. Noise2Noise can restore noisy images by training only on corrupted examples, without using clean data, specific image priors, or likelihood models of the corruption. FFDNet can outperform other DCNN-based methods, such as DnCNN and MWCNN, if an adequately high noise level is manually set. FFDNet can deal with different types of noise, such as JPEG lossy compression noise and structured noise. Nonetheless, FFDNet performs poorly in removing real non-AWGN noise and other types of unknown noise. Inspired by FFDNet, CBDNet [14] employed a noise estimation sub-network to predict a real-world noise map. Notably, benefitting from the noise estimation sub-network, CBDNet consistently outperformed FFDNet in eliminating real-world noise. Nevertheless, CBDNet tends to over-smooth the noise and remove details in complex local areas of an image because its realistic noise model does not completely match the real-world noise model. State-of-the-art denoising methods, such as FFDNet and CBDNet, cannot continuously modulate restoration levels or restore noisy images at unseen levels. Thus, He et al. [16] proposed a new DCNN-based method using adaptive feature modification layers (AdaFM layers), which can generate denoised images at continuous and unseen levels. This method can handle an arbitrary restoration level without training a new learning model.

B. GENERATIVE ADVERSARIAL NETWORK-BASED DENOISERS
To deal with the problems of DCNN-based methods, Lin et al. [25] proposed an advanced GAN-based method to solve two principal tasks: removing real-world noise and preserving details. This method adopted a visual attention map to help the generator and discriminator networks focus on local noise areas. Similarly, Chen et al. [6] also used a GAN to generate paired training images by learning the noise distributions and estimating a similar type of noise. Despite the lack of ground-truth data, this method can improve the denoising performance in real-world applications. Although both GAN-based denoisers can learn complex distributions, they are unable to solve the challenging problem of the instabilities of the discriminator, as mentioned in [31]. The discriminator occasionally generates sharp gradients around the real details of the denoised image. Our proposed BoostNet was developed to solve this vital problem, and we present the method in the next section.

III. PROPOSED METHOD
In this section, the proposed BoostNet is presented in detail. Our BoostNet comprises five sub-networks: two generators, G_S and G_F; a discriminator, D_F; a boosting network, G_T; and a noise estimation sub-network denoted EstNet. Figure 2 shows the role of each sub-network and how they work together. Both generators, G_S and G_F, are DCNNs for denoising images. In addition, G_F is trained together with D_F to improve its ability to denoise the corrupted image and preserve details. The advantage of G_F is its ability to tackle the problem of losing high-frequency features, such as edges and texture, by constantly competing against D_F in a repeated zero-sum game. By contrast, G_S is trained independently and does not compete with any network. The strategy of G_S is to avoid the effects of discriminator instability and focus only on increasing the peak signal-to-noise ratio. In general, G_S and G_F follow different strategies and use different criteria to evaluate the reconstruction performance, and neither is strictly better than the other. For this reason, G_T is employed to help G_S and G_F cooperate more effectively and boost the reconstruction performance. G_T uses the denoised images from G_S and G_F as inputs and generates a synthesized image whose PSNR is close to that of G_S and whose details are recovered more accurately than those of G_F. Figure 1 shows the position of G_T, which is closest to the position of the clean image.
EstNet is used to suppress the problems of overestimating and underestimating the noise level. This is necessary because DCNN-based denoisers only work well at a specific, known noise level: they over-fit to such noise levels through their memorizing ability. However, they may not work well with unknown noise in natural images, which is significantly more challenging. When the noise is unknown, a DCNN-based denoiser faces difficulties in balancing the trade-off between removing noise and preserving high-frequency features. For this reason, EstNet plays a key role in improving the accuracy of G_S, G_F, and G_T, which are DCNN-based denoisers. To eliminate unknown realistic noise, EstNet provides an effective noise estimate map to the generators G_S and G_F. In addition, EstNet allows the user to manually adjust the noise level to improve the reconstruction performance.
Most existing denoisers focus only on increasing the PSNR, which is the common measure used to evaluate and compare denoising accuracy. However, this metric is extremely limited in evaluating the ability to preserve high-frequency details and low-contrast features, which are also important in realistic applications such as medical imaging and object segmentation. For this reason, PSNR alone does not sufficiently evaluate the denoising performance. Consequently, in this study, we use two generators, G_S and G_F, which follow different optimization strategies. Similar to state-of-the-art DCNN-based denoisers, such as DnCNN [45] and FFDNet [46], G_S follows an optimization strategy that increases the PSNR value as much as possible, which is one of the most important criteria for selecting a good denoising model. By contrast, G_F is used to balance the trade-off between noise alleviation and detail preservation. For this reason, G_F is trained to compete constantly against D_F in a repeated zero-sum game. In addition, D_F plays a key role in driving the denoised image generated by G_F toward natural images.
Because G_S and G_F employ different minimization strategies for denoising images, each generator has its own advantages and disadvantages. Although G_S can achieve high PSNR values, it also produces an over-smoothing effect that leads to the loss of high-frequency details and low-contrast features. By contrast, compared to G_S, G_F can preserve more textures and edges but has significantly lower PSNR values owing to the instability of D_F, which potentially generates unreal details and features. Therefore, both G_S and G_F tend to converge to different local minima instead of the global minimum. For this reason, we further develop G_T, which utilizes the denoised images from G_S and G_F as inputs and generates a synthesized image. G_T is trained so that the synthesized image is close to the best denoised image, which corresponds to the global minimum cost. Each sub-network is explained in detail in the following sections.

A. REALISTIC NOISE MODEL
To tackle the problem of real-world image denoising, a noise estimation sub-network is employed to model unknown noise, including Poisson noise associated with photon sensing and Gaussian noise generated in the sensor. Our model of generic signal-dependent noise observation can be expressed as follows:

z(u) = y(u) + σ(y(u)) υ(u),  (1)

where u ∈ U is the pixel position in the domain U, z : U → R is the observed image, y : U → R is the unknown original image, υ : U → R is the zero-mean independent random noise, and σ : R → R+ is the standard deviation of the overall noise component. Because the current challenging datasets of real-world images are insufficient for training the realistic noise model represented by Eq. (1), the model of generic signal-dependent noise observation can be simplified using a noise level map M(σ), where σ is the noise level.
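To make the signal-dependent observation model concrete, the following numpy sketch synthesizes a noisy patch z from a clean patch y as z(u) = y(u) + σ(y(u))·υ(u). The affine form chosen for σ(·) (a Poisson-like term plus a sensor-noise term) is an illustrative assumption, not the model fitted in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_std(y, a=0.01, b=0.0004):
    # Hypothetical signal-dependent standard deviation sigma(y):
    # the a*y term mimics Poisson-like photon noise, b mimics sensor noise.
    return np.sqrt(a * y + b)

def observe(y):
    # z(u) = y(u) + sigma(y(u)) * upsilon(u),
    # with upsilon zero-mean, unit-variance random noise.
    upsilon = rng.standard_normal(y.shape)
    return y + noise_std(y) * upsilon

clean = rng.uniform(0.0, 1.0, size=(50, 50))  # a 50x50 clean patch in [0, 1]
noisy = observe(clean)
```

Because σ depends on y, brighter pixels receive proportionally stronger noise, which is what distinguishes this model from plain AWGN.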
In general, EstNet estimates a denoised image ŷ from a noisy image z. After separating the noise from z, we can estimate the noise level σ̂(z) based on the amount of separated noise. Here, z is generated by adding noise to its ground-truth counterpart y. Ground-truth images are only available during training, where σ(z) is acquired by applying a Gaussian filter to y. To train EstNet for blind denoising, the range of noise levels is set to σ ∈ [0, 75], and all training images are divided into training patches of size 50 × 50. EstNet is a modified deep residual network [15]. To train on N image pairs y_i and z_i, with i = 1, ..., N, and build a Gaussian noise estimation model, EstNet is optimized by minimizing the following loss function:

L_Est = (1 / 2N) Σ_{i=1}^{N} ||ŷ_i − y_i||²,  (2)

where ŷ_i denotes the i-th denoised image generated by EstNet, and y_i is the corresponding ground-truth image. Because BoostNet was also developed to eliminate real-world noise, which is more complex than artificial Gaussian noise, we adopted the asymmetric loss and the total variation loss for the real-world noise estimation task, as described in detail in [14].
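A minimal numpy sketch of the patch-wise L2 objective used to fit EstNet follows. The batch handling and the 1/2N normalization are a plausible reading of the objective, stated here as assumptions.

```python
import numpy as np

def estnet_loss(denoised, clean):
    # L2 training objective over a batch of N patch pairs:
    # (1 / 2N) * sum_i ||y_hat_i - y_i||^2
    n = len(denoised)
    return sum(np.sum((dh - y) ** 2) for dh, y in zip(denoised, clean)) / (2 * n)

rng = np.random.default_rng(1)
clean = [rng.uniform(0.0, 1.0, (50, 50)) for _ in range(4)]
# Stand-in for EstNet outputs: clean patches plus a small residual error.
denoised = [c + 0.01 * rng.standard_normal(c.shape) for c in clean]
loss = estnet_loss(denoised, clean)
```

The loss is zero exactly when every ŷ_i matches its ground truth, and grows with the squared residual otherwise.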

B. GENERATORS FOR DENOISING IMAGES
Deeper networks perform well, but they frequently face the vanishing gradient problem. Therefore, our consecutive generators for denoising images are built with the noise estimation network trained in the first stage, whereas G_F and G_S are trained in the second stage. In practice, a network trained over a wide range of noise levels lacks flexibility: a deep learning network attains competitive denoising performance only when the noise level is specifically estimated. By contrast, G_F and G_S take the noise level map robustly estimated by EstNet as an input; thus, the training parameters of G_F and G_S are insensitive to the noise level. As a result, these sub-networks can invariably outperform the noise estimation network in the reconstruction and refinement of noisy images. The introduction of a noise level map as an input naturally leads to the expectation that the performance of the model will be high when the noise level map matches the ground-truth noise level of the noisy input; training these models on numerous data units based on Eq. (1) reinforces this expectation. In addition, the noise level map should also play a role in regulating the trade-off between noise reduction and the preservation of details. Because the noise level is estimated by EstNet, G_F and G_S are non-blind denoising networks that produce denoised images from a noisy image z. Specifically, both G_F and G_S are developed from a modified deep residual network [15], similar to EstNet; however, G_F and G_S have an additional input, namely, the noise level map M. Similar to the training process of EstNet, N image pairs z_i and y_i, with i = 1, ..., N, are used for training.
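How the noise level map M reaches G_F and G_S can be sketched as a simple channel-wise stacking of the noisy image and the map. The channels-first layout and the single-channel grayscale case are assumptions made for illustration.

```python
import numpy as np

def make_generator_input(z, noise_level_map):
    # G_F and G_S take the noisy image z together with the noise level map M,
    # stacked along the channel axis (channels-first layout assumed here).
    assert z.shape == noise_level_map.shape
    return np.stack([z, noise_level_map], axis=0)

z = np.random.default_rng(2).uniform(0.0, 1.0, (50, 50))
M = np.full_like(z, 25.0 / 255.0)  # uniform map for AWGN with sigma = 25
x = make_generator_input(z, M)
```

Because M is just another input channel, the same trained weights serve every noise level, which is why the generators are insensitive to the level itself.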

1) TRAINING OF G_F
The training of G_F is complicated because it is designed not only to preserve more textures and edges but also to achieve a high PSNR. The optimal solution to the following minimization problem must be determined:

θ̂_{G_F} = arg min_{θ_{G_F}} L_total(G_F),  (3)

where θ̂_{G_F} is the optimized training parameter of G_F. In this study, a perceptual loss L_total(G_F) is specifically designed to model the distinct and desirable characteristics of denoised images; it is a weighted combination of three loss components, namely, the MSE loss from G_F, the perceptual loss from DenseNet [17], and the adversarial loss from D_F. More details on the individual loss functions are provided in the following.
To improve the denoising performance of G_F, D_F is employed to differentiate between the denoised image produced by G_F, considered a fake image, and the ground-truth image, considered a real image. In other words, at each step, G_F attempts to produce a reconstruction that fools D_F, whereas D_F tries to avoid being fooled. By training D_F to maximize the probability of correctly assigning true or false labels to images, we address the following adversarial min-max problem:

min_{θ_{G_F}} max_{θ_{D_F}} E_y[log D_F(y)] + E_z[log(1 − D_F(G_F(z)))],  (4)

where θ_{D_F} is the training parameter of D_F. In addition to the content losses delineated below, the generative component of our GAN also contributes to the perceptual loss. It encourages the network to favor solutions that lie on the manifold of natural images by attempting to fool the discriminator network. The generative loss L_Ad is determined from the probabilities of the discriminator D_F(G_F(z)) over all training samples as follows:

L_Ad(G_F) = Σ_{i=1}^{N} −log D_F(G_F(z_i)),  (5)

where D_F(G_F(z)) is the probability that the denoised image G_F(z) is a clean ground-truth image. Because the adversarial loss on its own is insufficient for a correct reconstruction, we utilize two kinds of content loss to attain a better reconstruction from the generator G_F. The content losses are the MSE loss and the DenseNet loss, the latter computed from the features extracted by the dense convolutional network [17]. We use the MSE loss, the most widely used function for reconstructing a noisy image. The pixel-wise MSE loss applied to G_F is calculated as follows:

L_MSE(G_F) = (1 / WHC) Σ_{i=1}^{C} ||y_i − G_F(z_i)||²,  (6)

where W and H are the dimensions of z, and C is the number of patches taken from z. Minimizing the MSE loss L_MSE(G_F) allows G_F to maximize the PSNR value. However, optimizing the MSE alone leads to the loss of high-frequency content, for example, the texture and details of the image.
To circumvent the loss of image texture and details, the DenseNet loss function, which is closer to perceptual similarity, is used to optimize the performance of G_F. In fact, our DenseNet loss function is better than the VGG loss function presented in [38] because the feature map extracted from the DenseNet network contains far more diversified features and richer patterns than that extracted from the VGG network. We built the DenseNet network with four dense blocks and 162 convolution layers, as described in [17]. The features are extracted only from the feature map of the third dense block, in which low- and high-level features are properly accumulated; this is because each layer of the DenseNet network stores the collective information from all preceding layers. The DenseNet loss applied to G_F is computed as:

L_Q(G_F) = (1 / K) ||φ(y) − φ(G_F(z))||²,  (7)

where φ denotes the feature map extracted from the DenseNet network and K = W_u H_u C_u, with W_u, H_u, and C_u denoting the dimensions of that feature map. In total, we formulate the perceptual loss computed for G_F as the weighted sum of the DenseNet loss L_Q(G_F), the MSE loss L_MSE(G_F), and the adversarial loss L_Ad(G_F) as follows:

L_total(G_F) = L_MSE(G_F) + τ_Q L_Q(G_F) + τ_Ad L_Ad(G_F),  (8)

where τ_Q and τ_Ad are regularization parameters. In this study, τ_Q = 1 and τ_Ad = 0.001.
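The weighted combination of the three loss terms with τ_Q = 1 and τ_Ad = 0.001 can be sketched as below. The DenseNet feature extractor φ is stood in for by arbitrary arrays, and the exact reductions (mean vs. sum) are assumptions; only the weighting structure follows the text.

```python
import numpy as np

TAU_Q, TAU_AD = 1.0, 0.001  # regularization weights used in this study

def mse_loss(fake, real):
    return float(np.mean((fake - real) ** 2))

def densenet_loss(feat_fake, feat_real):
    # (1/K) * ||phi(y) - phi(G_F(z))||^2 with K = W_u * H_u * C_u
    return float(np.sum((feat_fake - feat_real) ** 2) / feat_fake.size)

def adversarial_loss(d_prob_fake):
    # -log D_F(G_F(z)), summed over the batch (epsilon guards log(0))
    return float(np.sum(-np.log(d_prob_fake + 1e-12)))

def total_loss_gf(fake, real, feat_fake, feat_real, d_prob_fake):
    return (mse_loss(fake, real)
            + TAU_Q * densenet_loss(feat_fake, feat_real)
            + TAU_AD * adversarial_loss(d_prob_fake))

rng = np.random.default_rng(3)
fake, real = rng.uniform(0, 1, (50, 50)), rng.uniform(0, 1, (50, 50))
feat_fake, feat_real = rng.standard_normal((2, 12, 12, 32))
d_prob_fake = rng.uniform(0.1, 0.9, 8)  # discriminator outputs for 8 samples
loss = total_loss_gf(fake, real, feat_fake, feat_real, d_prob_fake)
```

The small τ_Ad keeps the adversarial term from dominating the pixel- and feature-level reconstruction terms.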

2) TRAINING OF G_S
The training of G_S is similar to that of G_F except for the absence of D_F. The training of G_S is based on the minimization problem below:

θ̂_{G_S} = arg min_{θ_{G_S}} L_total(G_S),  (9)

where L_total(G_S) is the weighted sum of the DenseNet loss L_Q(G_S) and the MSE loss L_MSE(G_S). L_total(G_S) can be formulated as follows:

L_total(G_S) = L_MSE(G_S) + τ_Q L_Q(G_S),  (10)

where τ_Q = 1. In addition, L_MSE(G_S) and L_Q(G_S) can be computed using Eq. (6) and Eq. (7), respectively.

C. BOOSTING NETWORK
Here, G_T is developed to produce a reconstruction model in the form of an ensemble of two weak reconstruction models, G_S and G_F. G_T generates a synthesized image that achieves a high PSNR, preserves the true details, and removes the fake details and textures caused by the instability of the discriminator D_F. To exploit the results from G_S and G_F and boost the reconstruction performance, the denoised images G_F(z_i) and G_S(z_i) are concatenated and fed as an input to G_T to generate a better denoised image. In particular, G_T is trained by minimizing the following cost function:

θ̂_{G_T} = arg min_{θ_{G_T}} L_total(G_T),  (11)

where L_total(G_T) is the weighted sum of L_Q(G_T) and L_MSE(G_T). L_total(G_T) can be formulated as follows:

L_total(G_T) = L_MSE(G_T) + τ_Q L_Q(G_T),  (12)

where τ_Q = 1. Moreover, L_MSE(G_T) and L_Q(G_T) are presented in Eq. (6) and Eq. (7), respectively.
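The ensemble step can be sketched as follows: the two weak reconstructions are stacked channel-wise and mapped to a single image. Here G_T is stood in for by a fixed convex combination; the real G_T is a trained deep network, so the weights below are purely illustrative.

```python
import numpy as np

def boost_input(denoised_s, denoised_f):
    # G_T receives the two weak reconstructions concatenated channel-wise.
    return np.stack([denoised_s, denoised_f], axis=0)

def toy_gt(x, w=(0.5, 0.5)):
    # Stand-in for G_T: a fixed convex combination of the two inputs.
    # The real G_T is a deep residual network; these weights are illustrative.
    w = np.asarray(w).reshape(2, 1, 1)
    return np.sum(w * x, axis=0)

rng = np.random.default_rng(4)
g_s = rng.uniform(0, 1, (50, 50))  # high-PSNR but over-smoothed estimate
g_f = rng.uniform(0, 1, (50, 50))  # detail-preserving but artifact-prone estimate
fused = toy_gt(boost_input(g_s, g_f))
```

Even this trivial ensemble illustrates the idea: the fused output can inherit complementary strengths that neither input attains alone.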

IV. MODEL ARCHITECTURE
A. GENERATORS
In this section, our neural network architecture, comprising four underlying sub-networks, EstNet, G_F, G_S, and G_T, is presented in detail. The architectures of these sub-networks are quite similar; each is a modification of the deep residual network [15] built from a set of residual learning blocks (ResBlocks). Each ResBlock is composed of convolution layers (Conv) and batch normalization layers (BN), which address random corruptions and allow a deeper generator to be built effectively without losing information. Figure 3 shows the architecture of each ResBlock. In particular, a 2D convolution with a filter size of 3 × 3 and a stride of 2 is applied in each ResBlock.

VOLUME 9, 2021
EstNet, G_F, and G_S each include a down-sampling layer in which the input image is first reshaped into four down-sampled sub-images. This layer is useful for enlarging the receptive field and accelerating the training process. In addition, an up-scaling layer is placed after the last convolution layer to rescale the feature map back to the original resolution and generate the denoised image. Moreover, G_F and G_S also accept the noise estimate obtained from EstNet, in the form of a noise level map, as an additional input.
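The reversible reshaping of an image into four half-resolution sub-images (and back) can be written directly in numpy; this is a sketch of the operation described above for the single-channel case.

```python
import numpy as np

def downshuffle(img):
    # Reshape an H x W image into four H/2 x W/2 sub-images by taking the
    # four pixel phases (the reversible down-sampling layer described above).
    h, w = img.shape
    assert h % 2 == 0 and w % 2 == 0
    return np.stack([img[0::2, 0::2], img[0::2, 1::2],
                     img[1::2, 0::2], img[1::2, 1::2]], axis=0)

def upshuffle(sub):
    # Inverse operation: rebuild the full-resolution image from the phases.
    _, h2, w2 = sub.shape
    out = np.empty((2 * h2, 2 * w2), dtype=sub.dtype)
    out[0::2, 0::2], out[0::2, 1::2] = sub[0], sub[1]
    out[1::2, 0::2], out[1::2, 1::2] = sub[2], sub[3]
    return out

img = np.arange(64, dtype=float).reshape(8, 8)
sub = downshuffle(img)
```

Because the operation is a pure rearrangement, no information is lost: every convolution on the sub-images sees a 2x larger effective receptive field at a quarter of the spatial cost.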
The network architecture of G_T is slightly different from that of the other generators: it comprises neither a down-sampling layer nor an up-scaling layer. Instead, G_T takes the denoised images from G_F and G_S as inputs. These input images are concatenated at the input layer of G_T before passing through a large number of convolution layers. After the last convolution layer, a cross-channel pooling layer consisting of 1 × 1 convolution kernels is applied to reduce the number of channels in the feature map and generate the final denoised image.
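A 1 × 1 convolution that collapses C channels into one output channel is simply a weighted sum across channels at each spatial position, which the following sketch makes explicit (the channel count and kernel values are illustrative).

```python
import numpy as np

def cross_channel_pool(feat, kernel):
    # A 1x1 convolution collapsing C channels to one output channel is a
    # weighted sum across channels at every spatial position.
    # feat: (C, H, W), kernel: (C,)  ->  output: (H, W)
    return np.einsum('c,chw->hw', kernel, feat)

rng = np.random.default_rng(7)
feat = rng.standard_normal((64, 50, 50))  # feature map after the last conv
kernel = rng.standard_normal(64) / 64.0   # illustrative 1x1 kernel weights
out = cross_channel_pool(feat, kernel)
```

This is why 1 × 1 kernels are a cheap way to mix channel information without touching spatial structure.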

B. DISCRIMINATOR
The discriminator, D_F, is designed to distinguish the ground-truth images from the denoised images created by G_F. The instability of the discriminator poses a considerable challenge when training recent generative adversarial networks. In high-dimensional spaces, the discriminator often generates inaccurate and unstable estimates of the density ratio during training, and the generator is then incapable of learning the multimodal structure of the target distribution. A clear description of the instability of the discriminator is presented in [31]. Thus, to stabilize the training of the discriminator network, we utilize a weight normalization method termed spectral normalization [31]. Here, D_F is constructed from six ResBlocks. To stabilize its training, D_F is regulated by precisely constraining the spectral norm of each convolution layer.
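The core of spectral normalization is dividing each weight matrix by its largest singular value, typically estimated by power iteration. The sketch below shows this for a plain matrix (in practice the convolution kernel is flattened to a matrix first, and only a few power iterations are run per training step; the iteration count here is chosen for accuracy, not speed).

```python
import numpy as np

def spectral_norm(w, n_iters=500):
    # Estimate the largest singular value sigma(W) of a weight matrix by
    # power iteration, as spectral normalization does during training.
    rng = np.random.default_rng(0)
    u = rng.standard_normal(w.shape[0])
    v = None
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    return float(u @ w @ v)

def normalize_weight(w):
    # Spectrally normalized weight W / sigma(W): its largest singular value
    # becomes ~1, bounding the Lipschitz constant of the layer.
    return w / spectral_norm(w)

rng = np.random.default_rng(5)
w = rng.standard_normal((16, 8))
w_sn = normalize_weight(w)
```

Constraining every layer to unit spectral norm bounds the Lipschitz constant of the whole discriminator, which is what damps the sharp, unstable gradients described above.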

V. DATASETS
A. TRAINING DATASETS
In this section, we use the challenging Waterloo Exploration database [28] to train our denoising networks. The dataset includes 4,744 natural images, from which training patches of size 50 × 50 were randomly cropped from randomly sampled images. We also augmented these patches by rescaling with various scale factors and by random flipping. Pairs of noisy and ground-truth patches were generated by adding AWGN with noise level σ ∈ [0, 75] to the ground-truth patches.
To demonstrate the application of BoostNet to enhancing biological research [20], [21], we also adopted the fluorescence microscopy denoising (FMD) dataset [48] and an extended fluorescence microscopy denoising (EFMD) dataset to evaluate denoising methods. The EFMD dataset includes 60,000 real noisy microscopy images from the FMD dataset and 15,000 real noisy images of Drosophila from our laboratory. These images cover a wide variety of imaging modalities, including confocal, two-photon, and wide-field imaging. Furthermore, the images show different biological samples, comprising cells, zebrafish, Drosophila, and mouse brain tissues. The noisy images have five different noise levels, and the corresponding ground-truth images were collected using high-quality commercial microscopes. In these datasets, each noisy image at noise level l_i, with i = 1, 2, 4, 8, 16, was created by averaging i noisy raw images.
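The frame-averaging construction of the l_i noise levels can be simulated directly; the sketch below (with synthetic raw frames) shows the resulting noise reduction, which scales as 1/sqrt(i) for independent noise.

```python
import numpy as np

def image_at_level(raw_stack, i):
    # A noisy image at level l_i is the average of i raw captures;
    # averaging i frames shrinks the noise standard deviation by sqrt(i).
    return raw_stack[:i].mean(axis=0)

rng = np.random.default_rng(6)
clean = np.full((32, 32), 0.5)
raw = clean + 0.1 * rng.standard_normal((16, 32, 32))  # 16 simulated raw frames

residual_std = {i: float(np.std(image_at_level(raw, i) - clean))
                for i in (1, 2, 4, 8, 16)}
```

This is why l_16 images look far cleaner than l_1 images even though every frame comes from the same microscope.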
All of our networks were trained effectively with the Adam algorithm [18], which was used to minimize the different cost functions. The learning rate was initially set to 1e-3 and progressively reduced to 1e-5. We set the weight decay of each convolution layer to zero, and the number of epochs was set to 100 for each network. Our models were trained using the PyTorch open-source machine-learning library [35]. The evaluation was performed on a computer with a 3.3 GHz Intel(R) Core(TM) i7 CPU, 32 GB of RAM, and an Nvidia GTX 1080 Ti graphics card.
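The text gives only the endpoints of the learning-rate schedule (1e-3 down to 1e-5 over 100 epochs). One plausible realization, assumed here since the decay curve is not specified, is a smooth exponential decay:

```python
INITIAL_LR, FINAL_LR, EPOCHS = 1e-3, 1e-5, 100

def lr_at(epoch):
    # Assumed schedule: exponential decay from 1e-3 at epoch 0
    # to 1e-5 at the final epoch; the paper only states the endpoints.
    return INITIAL_LR * (FINAL_LR / INITIAL_LR) ** (epoch / (EPOCHS - 1))

schedule = [lr_at(e) for e in range(EPOCHS)]
```

Step-wise drops (e.g., dividing by 10 every few dozen epochs) would satisfy the same endpoints; the exponential form is just one common choice.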
The BSD68 and Set12 datasets are mainly used to evaluate PSNR on natural gray images, and were used to compare the performance of BoostNet with that of its competitors. The BSD68 dataset consists of 68 images of size 481 × 321, whereas the Set12 dataset includes only 12 images of size 512 × 512.
For color images, we used the CBSD68, Kodak24, and McMaster datasets. CBSD68 is the color version of the BSD68 dataset, whereas the Kodak24 dataset includes 24 color images with a size of 768 × 512. The McMaster dataset, used for testing noisy color images, comprises 18 images with a size of 512 × 512.
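All of these benchmarks are scored with PSNR. For reference, a minimal implementation on 8-bit-range images is shown below; it is illustrative and not the exact evaluation script used in our experiments.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(estimate, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, an error of exactly one gray level at every pixel gives about 48.13 dB, which calibrates the sub-dB gaps reported between denoisers in Tables 1–3.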
We used RNI15 [22] to evaluate our method and its competitors on the denoising of real-world noisy images with more sophisticated noise. The competitors consist of FFDNet and CBDNet [14], a state-of-the-art method for the blind denoising of real photographs. This dataset comprises 15 noisy images and does not include any ground-truth images; therefore, we compared the competing methods only through qualitative evaluation.
Finally, the FMD and EFMD datasets were randomly split into training and test images. The test set of the FMD dataset consisted of 240 images at five specific noise levels, as described in [48]; similarly, the test set of the EFMD dataset consisted of 1,000 images at five noise levels.

VI. EXPERIMENTAL RESULTS AND ANALYSIS
To demonstrate the effectiveness of BoostNet, we conducted two main experiments with seven challenging databases to analyze the advantages and disadvantages of BoostNet. The first experiment tested the sub-networks on natural images corrupted by blind Gaussian noise at a wide variety of noise levels. This experiment is necessary because it uses sufficient and reliable datasets in which clean ground-truth images are available for measuring the quality and accuracy of each algorithm; here, the sub-networks and their roles were assessed and analyzed separately. In the second experiment, the proposed method was tested on a well-known dataset used for denoising natural images with unknown noise.

A. EXPERIMENTS ON AWGN REMOVAL
To demonstrate the effectiveness of BoostNet, in this section we compare the results of state-of-the-art denoisers on noisy images corrupted by AWGN. We first conducted an experiment on the BSD68 dataset to compare the PSNR of BoostNet with that of the competing algorithms, including MLP [5], TNRD [7], BM3D [8], WNNM [11], MWCNN [27], DnCNN [45], and FFDNet [46]. BM3D, WNNM, and MLP were implemented with single-threaded CPU computation, whereas TNRD, DnCNN, MWCNN, FFDNet, and BoostNet were run on a GPU. The quantitative evaluations are presented in Table 2, which shows that BoostNet considerably outperforms the competing denoisers for a wide variety of noise levels on the BSD68 dataset. These results indicate that BoostNet removes unknown noise more effectively than the competing methods and better preserves the details and texture of an image. Moreover, BoostNet can extract more discriminative features than DnCNN, FFDNet, and the other denoising methods, which tend to smooth out the details and textures in the image, as shown in Figure 4.
We also demonstrate the effectiveness of BoostNet on the Set12 dataset, considering the same seven denoising methods: MLP [5], TNRD [7], BM3D [8], WNNM [11], MWCNN [27], DnCNN [45], and FFDNet [46]. A comparison of the algorithms is shown in Table 1, with the best results marked in red. Table 1 shows that BoostNet is superior to the other state-of-the-art denoisers, including BM3D, WNNM, MLP, TNRD, DnCNN, and FFDNet, for a wide variety of noise levels. Both DnCNN and FFDNet not only fail to eliminate intense noise in areas with a high noise level but also smooth out the details of regions with a low noise level. BoostNet has slightly lower PSNR values than MWCNN at some noise levels because MWCNN can extract a large number of repetitive structures using its wavelet multi-scale convolutional neural network. However, MWCNN is significantly slower at test time because its wavelet multi-scale network has many more training parameters. In fact, increasing the number of ResBlocks in BoostNet can also improve its denoising capacity and achieve results similar to those of MWCNN; however, our GPU graphics card might run out of memory if the number of ResBlocks is increased further. Overall, the quality of the denoised images generated by BoostNet is similar to that of MWCNN.
To evaluate BoostNet on color image denoising, we used the CBSD68, Kodak24, and McMaster datasets. On these datasets, BoostNet is compared with the state-of-the-art denoisers CBM3D [8], DnCNN, and FFDNet, as shown in Table 3. CBM3D was implemented with single-threaded CPU computation, whereas the other methods were run on a GPU. Evidently, BoostNet consistently outperforms the other competing methods. The average PSNR of BoostNet is 0.27 dB and 0.23 dB higher than those of FFDNet and DnCNN on the CBSD68 dataset, respectively. On the Kodak24 dataset, the average PSNR of BoostNet is 0.4 dB and 0.94 dB higher than those of FFDNet and DnCNN, respectively. Similar results are observed on the McMaster dataset, where BoostNet is better than CBM3D, FFDNet, and DnCNN at different noise levels. Moreover, Figures 5, 6, 7, 8, and 9 show the qualitative evaluation of our sub-networks and FFDNet, the state-of-the-art denoiser for removing additive Gaussian noise. Figure 9 also shows the qualitative evaluation of BoostNet and FFDNet on JPEG-compressed images from the LIVE database [37], on which BoostNet again achieves the highest PSNRs and remarkably outperforms FFDNet. These visual comparisons indicate that BoostNet not only achieves the highest PSNRs but also preserves high-frequency details more effectively than FFDNet, which smooths out image details and textures and generates local blurry regions. These results are extremely important for high-level computer vision tasks, such as object detection and object segmentation.
In particular, applications in medical imaging, such as magnetic resonance imaging or X-ray imaging, strictly require denoising methods to eliminate image noise and preserve essential information. For example, a computer-aided diagnosis system cannot precisely detect potential cancer tumors if the margins of different tissues are smoothed out, or low-contrast lesions are eliminated. In these cases, BoostNet provides the best performance in terms of noise suppression and detail preservation.

B. EXPERIMENTS ON REAL-WORLD NOISY IMAGES
To further demonstrate the effectiveness of BoostNet compared to its competing methods, we evaluated it on one of the most challenging datasets of real noisy images, namely, the RNI15 dataset [22].
First, we trained a realistic noise model, EstNet, to provide the real-noise estimation map for G_S and G_F. Because the accuracy of noise estimation strongly affects the performance of the denoising algorithms, EstNet was trained with a new dataset of real-world noise to precisely estimate the noise map for G_S and G_F. However, training a realistic noise model with a deep learning network is challenging because the network requires a large number of noisy/clean training pairs, whereas most real-world noise datasets lack clean ground-truth images. Moreover, denoising algorithms trained only on images corrupted by Gaussian noise might not perform well when tested on real digital camera images. For these reasons, collecting a large number of synthetic noisy images and real noisy photographs is the most difficult task in building a realistic noise model. Inspired by CBDNet [14], we utilized the dataset of synthetic noisy images used for training CBDNet to build our own realistic noise model, EstNet. The construction of this dataset is explained in detail in [14], and the training process of EstNet is similar to that of the noise estimation sub-network presented in [14]. After EstNet is trained, G_S and G_F can use its noise estimation map as an input. Second, to evaluate BoostNet and its competitors, we used the RNI15 dataset for testing, which comprises 15 real noisy images without any clean ground-truth images and is thus highly challenging for recent state-of-the-art denoising algorithms. The results are shown in Figures 10 and 11.
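The wiring of EstNet's output into the denoising sub-networks can be sketched as below. This is an illustrative assumption in the style of FFDNet/CBDNet-like conditioning: the module shapes, channel counts, and function names are hypothetical stand-ins, not taken from the BoostNet code.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins; the real EstNet and G_S/G_F architectures differ.
est_net = nn.Conv2d(3, 1, 3, padding=1)    # predicts a 1-channel noise map
sub_net = nn.Conv2d(4, 3, 3, padding=1)    # consumes image + noise map

def denoise_with_noise_map(noisy):
    """Concatenate the estimated per-pixel noise map with the noisy input
    channel-wise before passing it to a denoising sub-network."""
    noise_map = est_net(noisy)
    return sub_net(torch.cat([noisy, noise_map], dim=1))
```

Conditioning on an estimated noise map is what lets a single network adapt to spatially variant, unknown noise instead of assuming one global noise level.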
In the RNI15 dataset, the ''Frog'', ''Windows'', ''Pattern3'', and ''Boy'' images show clear evidence that BoostNet consistently outperforms the other denoisers in handling low-contrast structured noise. It is obvious from these images that BoostNet not only successfully removes unknown noise but also precisely preserves low-contrast edges, lines, and patterns. In addition, BoostNet generates more natural-looking images than the other denoisers. By contrast, FFDNet and CBDNet tend to over-smooth the noise and erase details in various local areas of the image. This is because these methods are built on a conventional deep convolutional neural network, which often overfits when trained on a large number of images with various noise levels; hence, they may fail to remove unseen random noise from a natural image. Because EstNet cannot precisely estimate the distribution of real-world noise, these networks tend to over-smooth the real noise and simultaneously wipe out local low-contrast details.
The ability of BoostNet to balance natural noise suppression and detail preservation can be seen more clearly on the ''Audrey Hepburn'' and ''Movie'' images, which are corrupted by JPEG lossy compression and video noise, respectively. BoostNet performs much more effectively than FFDNet and CBDNet in removing these types of noise while preserving high-frequency edges.
The ''Postcards'' and ''Flowers'' images are used to test genuinely strong noise. For these images, BoostNet is the best denoiser because it successfully controls the trade-off between noise removal and texture preservation, whereas FFDNet, to a great extent, smooths out a portion of the lines and edges and makes them blurry.
Notably, the results for the ''Bears'', ''Pattern1'', and ''Dog'' images demonstrate that BoostNet is the most effective method at recovering high-frequency details and textures while accurately eliminating noise. By contrast, FFDNet and CBDNet erase these high-frequency details and textures to a significant extent.
Finally, the ''Glass'' image presents the most difficult noise for state-of-the-art denoising methods. The image includes coffee and milk-foam regions, both of which are corrupted by spatially variant noise and show extremely low-contrast edges under strong unknown noise. Once again, CBDNet and FFDNet fail to recover these edges, whereas BoostNet preserves them extremely well.

C. EXPERIMENTS ON FLUORESCENCE MICROSCOPY IMAGES
Table 4 demonstrates that BoostNet is the most reliable approach for denoising fluorescence microscopy images on the FMD dataset. On this dataset, BoostNet is compared with WNNM [11], DnCNN, FFDNet [46], and Noise2Noise [23]. WNNM was applied using single-threaded CPU computation, whereas the other methods were implemented on a GPU. BoostNet obtains a higher PSNR and structural similarity index measure (SSIM) than G_F and conserves the essential details more efficiently than G_S. Figure 12 presents the qualitative evaluations of the competing algorithms: BoostNet captures perceptually important textures and essential details, whereas the other methods produce misleading artifacts and residual noise. Such textures and details play a vital role in analyzing biological effects and results.
Poisson-Gaussian noise causes severe deterioration in fluorescence microscopy images of Drosophila cells. Thus, we conducted experiments on the EFMD dataset to compare the state-of-the-art methods, including FFDNet [46], DnCNN [45], CBDNet, and Noise2Noise, all implemented on a GPU. Table 5 shows that BoostNet outperforms its competing methods at all noise levels, and Figure 13 shows that BoostNet is more reliable than G_S and G_F in balancing the elimination of a wide variety of noise with the reconstruction of features.
Table 6 presents the running times of the state-of-the-art methods, including CDnCNN-B [45], TNRD [7], FFDNet [46], CBDNet [14], and BoostNet, for denoising color images with a size of 512 × 512. Although BoostNet significantly outperforms FFDNet and CBDNet in accuracy, its network includes more trainable parameters; as a consequence, BoostNet is relatively slower than CBDNet and FFDNet. This running-time limitation matters in real-time computer vision and image processing applications.
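Running-time comparisons such as those in Table 6 are sensitive to asynchronous GPU execution, so kernels must be synchronized before reading the clock. The helper below is an illustrative sketch, not the authors' benchmark script.

```python
import time
import torch

def timed_denoise(model, image, device="cpu", warmup=2, runs=5):
    """Average wall-clock time per forward pass of a denoiser.
    On CUDA devices, torch.cuda.synchronize() is required for honest timings,
    because kernel launches return before the GPU work finishes."""
    model = model.to(device).eval()
    image = image.to(device)
    with torch.no_grad():
        for _ in range(warmup):        # warm up caches / cuDNN autotuning
            model(image)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(image)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```

Timing single-threaded CPU methods (BM3D, WNNM) against GPU methods, as in Table 6, is only an apples-to-apples comparison of deployed latency, not of algorithmic cost.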

VII. CONCLUSION
Inspired by state-of-the-art deep-learning-based denoisers, we developed an effective and accurate method, called BoostNet, which can boost the denoising performance of traditional DCNN-based and GAN-based methods while simultaneously avoiding their drawbacks. Extensive experimental results demonstrated that BoostNet handles both unknown Gaussian noise and real-world noise better than other state-of-the-art methods. In this paper, we demonstrated several key contributions to solving the current problems of image denoising. First, our proposed method handles the common problems of state-of-the-art DCNN-based denoisers: although these denoisers can achieve high PSNR scores, they often eliminate important details of the image. BoostNet is better because it not only obtains high PSNR scores but also effectively preserves high-frequency details and low-contrast features. Second, we proved that BoostNet can address a key problem of GANs, namely, the instability of the discriminator. The experimental results showed that BoostNet removes noise, retains good features, and eliminates the unreal features generated by a GAN. In general, BoostNet takes all of the advantages of DCNN-based and GAN-based methods while avoiding the disadvantages caused by training instability.
BoostNet possesses potential benefits for improving the performance of image denoising in specific research areas, such as low-dose computed tomography imaging, magnetic resonance imaging, fluorescence microscopy imaging, and natural imaging, in which a reliable denoiser must be assessed in terms of both noise suppression and detail preservation. Our method can also be used to enhance the performance of image super-resolution and image deblurring. In future work, we aim to improve the reconstruction performance of BoostNet by using a new GAN-based method in which the instability of the discriminator is handled more effectively than by spectral normalization. Moreover, BoostNet is limited in terms of computational cost, which is extremely important in real-time image processing applications; thus, we intend to reduce the computational cost of BoostNet in a future study.