Image Denoising With Generative Adversarial Networks and its Application to Cell Image Enhancement

This paper proposes an image denoising training framework based on Wasserstein Generative Adversarial Networks (WGAN) and applies it to cell image denoising. Cell image denoising is a challenging task with high requirements on the recovery of feature details. Current popular convolutional neural network (CNN) based denoising methods suffer from a blurriness issue: denoised images are blurry on texture details, which is fatal for cell image denoising. In this paper, to solve the blurriness issue, we first theoretically analyze its cause. Subsequently, we propose an image denoising training framework with WGAN based adversarial learning. This framework solves the blurriness issue by guiding the denoising network to find the distribution space of real clean images rather than the distribution space of blurry images, and by introducing feature information. Experimental results show that this training framework effectively solves the blurriness issue and achieves better denoising performance than state-of-the-art denoising methods. The application of this framework to cell image denoising also achieves satisfactory performance: the recovered cell images are clear on feature details.


I. INTRODUCTION
Image denoising is a classic topic in low-level vision as well as an essential preprocessing step in many high-level vision tasks. In general, a given noisy image can be modeled as y = x + v, where x is the noise-free image and v represents the noise. The task of image denoising is to recover the noise-free image x by removing the noise v from the given noisy image y. Existing denoising methods can be divided into two categories: image prior based methods [1]-[12], which obtain the denoised image by processing the noisy image according to some prior knowledge about images; and discriminative learning methods [13]-[24], which train a model to learn the mapping from a given noisy image to the denoised image using a large number of pairs of noisy and clean images. Nowadays, discriminative learning methods are the most popular, because they can automatically exploit more statistical characteristics of images through training and thus achieve better denoising performance. Among discriminative learning methods, CNN based methods are currently the most popular: the CNN's sparse connections and weight sharing make CNN based models easy to train and able to avoid overfitting.
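The degradation model y = x + v can be illustrated with a short NumPy sketch. The image here is a synthetic stand-in, and sigma = 25 is one of the noise levels used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-bit grayscale "clean" image x (synthetic stand-in).
x = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# Additive white Gaussian noise v with standard deviation sigma = 25,
# giving the classic degradation model y = x + v.
sigma = 25.0
v = rng.normal(0.0, sigma, size=x.shape)
y = x + v

# Denoising aims to recover x from y; a trivial baseline only clips y
# back into the valid intensity range.
y_clipped = np.clip(y, 0, 255)
print(y.shape, round(v.std(), 1))
```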
In the literature, CNN based methods can be further divided into four categories. The first models the noise v explicitly and uses the trained model to process the given noisy images y. Usually, the noise is modeled as additive white Gaussian noise. For example, DnCNN [13] and FFDNet [25] model the noise as additive white Gaussian noise and directly denoise the noisy images using their models trained for removing Gaussian noise. CBDNet [26] models the noise on the raw data of in-camera imaging sensors as heteroscedastic Gaussian noise and trains its denoising network according to this explicitly defined model. Zhou et al. [27] try to find the relationship between Gaussian noise and real-world noise, and a way to apply a Gaussian-noise-trained network to remove real-world noise. This kind of method can quickly generate many image pairs for training, but the trained model cannot remove sophisticated real-world noise well. The second category comprises prior knowledge based methods, which train a denoising network according to some statistical laws. For instance, NOISE2NOISE [28] trains its denoising network using many pairs of independently degraded images of the same scenes. This method is based on the statistical law that an L2 loss guides a network to find the mean of all potential solutions (the mean solution has weaker noise intensity). Furthermore, NOISE2VOID [29] offers a simpler solution in which a denoising network is trained using only single noisy images of different scenes. Relying on the local similarity of images, it takes the mean of the pixels surrounding a noisy target pixel as the corresponding clean pixel. This kind of method removes the need for many image pairs to train a denoising network.
Still, their denoising performance is limited by the prior knowledge they rely on.
The third category is generative methods, which remove noise in two phases: noise modeling and supervised denoising. In the noise modeling phase, the real-world noise is modeled using real-world noisy images, and many clean-noisy image pairs are then synthesized for supervised denoising. As in the above methods, supervised denoising uses the generated image pairs to train a denoising network to learn the mapping. For example, GCBD [30] models the real-world noise using Generative Adversarial Networks [31] and synthesizes many clean-noisy image pairs by adding its generated noise to a clean image dataset. Compared with the first kind of method, this kind can model real-world noise more accurately, but its training process is also more complicated, so it is not widely used.
The last category is to build a training dataset containing many clean-noisy image pairs representative of real-world noise. This kind of method can be further divided into two subcategories according to how the clean images are obtained. The first obtains the clean images by taking pictures at a low ISO value and carefully post-processing them, while, on the contrary, the noisy images are taken at a high ISO value. The representative dataset is RENOIR [32]. The other obtains the clean images differently: multiple photos of a static scene are taken first, and the clean image is obtained by computing the mean of the multiple images and post-processing the result.
Meanwhile, one of the multiple images is taken as the noisy image. Representative datasets are Nam [33], PolyU [34] and SIDD [35]. Such datasets can readily be used to train a denoising network to remove sophisticated real-world noise.
To enable the denoising network to quickly and efficiently learn the mapping from a noisy image to the denoised image, the above four kinds of methods almost always use the pixel-wise Mean Squared Error (MSE) as their loss function. However, the denoised images of these MSE based networks are always blurry on texture details, which is fatal for noisy images with important feature details. In this paper, we analyze the cause of this blurriness problem and propose a solution.
The major contributions of this paper can be summarized as follows:
• We theoretically analyze the cause of the blurriness problem;
• We propose an image denoising training framework with WGAN based adversarial learning;
• Experimental results prove that the framework can effectively solve the blurriness problem and achieve better denoising performance than the state-of-the-art denoising methods;
• The framework is successfully applied to cell image denoising; the recovered cell images are clear on feature details.
The remainder of the paper is organized as follows. Section II and Section III introduce the problem this paper aims to solve and the related work, respectively. Section IV presents our proposed method. In Section V, extensive experiments are reported to validate the effectiveness of our method. In Section VI, our method is applied to cell image denoising. Section VII gives several concluding remarks.

II. PROBLEM STATEMENT
Taking the MSE between the output image and the corresponding clean image as the training loss can quickly and efficiently guide a denoising network to find the mapping from a noisy image to the denoised image. But because image denoising is a task with multiple solutions, and the MSE guides the denoising network to the mean of the potential solutions, the denoised images of an MSE based denoising network are always blurry on texture details. Next, taking a single pixel as an example, we introduce this problem in more detail.
Because of the randomness of the noise intensity, it is highly possible that, after pollution by noise, multiple different clean pixel intensities correspond to the same noisy pixel intensity. This means that during training, a noisy input pixel may correspond to many potential clean pixels. Here, we denote the noisy input pixel as y_0 and the corresponding clean pixels as x_0^1, x_0^2, x_0^3, ..., x_0^n. Besides, the denoising network with trainable parameters Θ is denoted as F, and the output denoised pixel as F(y_0; Θ). The MSE between the denoised pixel and the corresponding multiple clean pixels, and the expectation of the MSE, are as follows:

MSE_i = (F(y_0; Θ) − x_0^i)^2, i = 1, ..., n,

E(MSE) = (1/n) Σ_{i=1}^{n} (F(y_0; Θ) − x_0^i)^2.

Obviously, when minimizing E(MSE) by updating the parameters Θ, the optimal F(y_0; Θ) is obtained at

F(y_0; Θ) = (1/n) Σ_{i=1}^{n} x_0^i,

which means that the MSE statistically drives the denoising network to search for the mean of all corresponding potential solutions. As a result, the pixel intensities in the final solution space tend to change slowly, so the recovered images in this solution space appear blurry on texture details. This is why the denoised images of MSE based methods are always blurry on texture details.
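This argument can be checked numerically: for a toy set of candidate clean intensities (all numbers hypothetical), the output value that minimizes the expected MSE coincides with their mean.

```python
import numpy as np

# One noisy pixel y0 maps to several plausible clean intensities during
# training. Minimizing the expected MSE over the candidates is achieved
# exactly by their mean.
candidates = np.array([40.0, 90.0, 110.0, 200.0])  # hypothetical clean pixels

def expected_mse(f):
    # Mean squared error of a single predicted intensity f against all
    # candidate clean intensities.
    return float(np.mean((f - candidates) ** 2))

# Brute-force search over the 8-bit intensity range in 0.1 steps.
grid = np.linspace(0, 255, 2551)
best = grid[np.argmin([expected_mse(f) for f in grid])]

print(best, candidates.mean())  # the minimizer coincides with the mean
```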

III. RELATED WORK

A. SOLVING THE BLURRINESS PROBLEM
In the field of image denoising, there are few works that address the blurriness problem. Most existing methods [15], [36], [37] address it by introducing feature information from a separately trained high-level network to guide the training of the denoising network. For example, [15] concatenates a trained image classification network behind its denoising network during training. It feeds the denoised image and the corresponding real clean image to the trained network and uses the backpropagated loss information to train the denoising network. Because the trained network has a high capacity for extracting features, the loss information carries feature information that indicates the distance between the denoised image and the real clean image in feature space. By making use of this feature information, the denoising network can recover texture details better. However, the performance of these methods is still unsatisfactory, since the established feature space (i.e., the trained network) cannot provide sufficient feature information.
Because the high-level network is trained for a particular task, it is sensitive only to features relevant to that task. For features not relevant to its task, the trained network cannot extract them from the input images, and therefore cannot measure the distance between the denoised image and the real clean image on those features. As a result, the trained network cannot provide useful feature information to help the denoising network recover such features better.
In this paper, after analyzing the cause of the blurriness problem, we propose a method that solves it at the root by guiding the denoising network to find the distribution space of real clean images rather than the distribution space of blurry images. The proposed method is an image denoising training framework with adversarial learning based on WGAN [38]. WGAN has been proved to have a high capacity for fitting the distribution of generated data to the distribution of real data. WGAN is a variant of Generative Adversarial Nets (GAN) [31]; we employ WGAN instead of GAN because GAN is difficult to train (more details in IV-A) and WGAN solves this training difficulty well. Before introducing WGAN, we briefly introduce GAN in the next subsection.

B. GENERATIVE ADVERSARIAL NETS
GAN [31] is a training framework that generally consists of a generative subnetwork and a discriminative subnetwork. The former takes random low-dimensional noise as input and aims to generate high-dimensional images that follow the distribution of given real images, so that they can cheat the discriminative subnetwork, whereas the latter aims to accurately distinguish generated images from real images. Their roles in training can be viewed as a minimax two-player game:

min_G max_D V(D, G) = E_{x∼p_r}[log D(x)] + E_{z∼p_n}[log(1 − D(G(z)))],
where p_r and p_n represent the distribution of real images and the distribution of the input noise, respectively. The discriminative subnetwork maximizes this loss function, while, on the contrary, the generative subnetwork minimizes it. The two subnetworks are trained alternately. After training, the generative subnetwork has successfully learned the distribution p_r and can generate ideal outputs following it. This is why we introduce adversarial learning to guide the training of our denoising network: adversarial learning can help our denoising network find the distribution space of real clean images rather than the distribution space of blurry images. However, training a GAN is tricky and unstable [39]. Fortunately, WGAN solves this training problem well, and thus we apply it to our image denoising work. In the next section, we briefly introduce WGAN.
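As a toy, one-dimensional illustration of this minimax game (all networks replaced by hypothetical closed-form stand-ins), the value function can be evaluated directly; an informative discriminator attains a higher value than a blind one:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def D(samples, w, b):
    # Hypothetical 1-D discriminator: a logistic unit standing in for a
    # real discriminative subnetwork.
    return sigmoid(w * samples + b)

x_real = rng.normal(2.0, 1.0, size=1000)   # stand-in for samples from p_r
x_fake = rng.normal(-2.0, 1.0, size=1000)  # stand-in for G(z), z ~ p_n

def value(w, b):
    # V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    return float(np.mean(np.log(D(x_real, w, b)))
                 + np.mean(np.log(1 - D(x_fake, w, b))))

# D maximizes V: a discriminator that separates the samples (w = 2)
# scores higher than a blind one (w = 0, output always 0.5).
print(value(2.0, 0.0), value(0.0, 0.0))
```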

IV. PROPOSED IMAGE DENOISING TRAINING FRAMEWORK
To solve the blurriness problem in image denoising, we propose an image denoising training framework with adversarial learning based on WGAN. We also fine-tune the training details of WGAN to make it suitable for our denoising task. The overall structure of our framework is shown in Fig. 1. The framework can be divided into three parts: the generative subnetwork, MSE-based learning, and adversarial learning. The generative subnetwork learns the mapping from a noisy image to the denoised image. MSE-based learning guides the generative subnetwork to learn this mapping quickly and efficiently. Adversarial learning solves the blurriness problem inherent in MSE-based learning by guiding the generative subnetwork to find the distribution space of real clean images, not the distribution space of blurry images. Next, we first introduce WGAN and the corresponding adversarial learning. Subsequently, we describe the generative subnetwork. Finally, we enumerate the training details of this image denoising training framework.

A. WASSERSTEIN GENERATIVE ADVERSARIAL NETWORKS
In GAN, the discriminative subnetwork and the generative subnetwork are trained alternately, and training the generative subnetwork is equivalent to minimizing the Jensen-Shannon (JS) divergence between the distribution p_r of real data and the distribution p_g of generated data (from the generative subnetwork) [31]. Theoretically, by minimizing the JS divergence, the distribution p_g can be transformed into the distribution p_r. In practice, however, the training easily falls into a gradient vanishing trap, so the generative subnetwork cannot be continuously updated. The reason is that at the beginning of training there is no non-negligible intersection between the two distributions, so the JS divergence is a constant and no gradient information can be backpropagated to update the generative subnetwork [39]. Therefore, it is necessary but difficult to carefully balance the training of the two subnetworks to avoid gradient vanishing.
WGAN [38] uses the Wasserstein distance rather than the JS divergence to measure the distance between the two distributions. The Wasserstein distance has better properties than the JS divergence: even if there is no non-negligible intersection between the two distributions, it can still measure the distance between them well. Therefore, using the Wasserstein distance as the metric avoids the gradient vanishing problem. In WGAN, it is unnecessary to carefully balance the training of the two subnetworks, and training becomes easy.
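This difference between the two metrics can be checked numerically on the standard example of two point masses on a line: the JS divergence stays at the constant log 2 however far apart they are, while the 1-D Wasserstein (earth-mover) distance shrinks smoothly as they approach.

```python
import numpy as np

def js_divergence(p, q):
    # JS(p, q) = 0.5 KL(p || m) + 0.5 KL(q || m), with m = (p + q) / 2.
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

for theta in (10, 5, 1):
    p = np.zeros(11); p[0] = 1.0       # p_r: point mass at position 0
    q = np.zeros(11); q[theta] = 1.0   # p_g: point mass at position theta
    wasserstein = float(theta)         # 1-D earth-mover distance = |theta|
    # JS stays constant (log 2 ~ 0.6931) for every theta != 0, giving no
    # gradient, while the Wasserstein distance tracks the true gap.
    print(theta, round(js_divergence(p, q), 4), wasserstein)
```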
In WGAN [38], the Wasserstein distance is approximated as

W(p_r, p_g) ≈ max_D { E_{x∼p_r}[D(x)] − E_{y∼p_n}[D(G(y))] }, with D 1-Lipschitz.

To satisfy this regularization, the weights of the discriminative subnetwork D are clipped to a certain range. Subsequently, in [40], a gradient penalty term is proposed to replace weight clipping, because weight clipping can lead to gradient vanishing or exploding. The final loss function can be formulated as:

L = E_{x∼p_r}[D(x)] − E_{y∼p_n}[D(G(y))] − λ E_z[(‖∇_z D(z)‖_2 − 1)^2],

where the former two terms constitute the Wasserstein distance estimate and the third term is a gradient penalty that regularizes the discriminative subnetwork D. The z in the gradient penalty term is sampled uniformly along straight lines between the input real data x and the generated data G(y). The discriminative subnetwork maximizes the above loss function to accurately estimate the Wasserstein distance between the distribution p_g and the distribution p_r. On the contrary, the generative subnetwork minimizes the above loss function so as to minimize the Wasserstein distance and thereby make the distribution p_g fit the distribution p_r. Through training, p_g eventually fits the distribution p_r.
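The gradient penalty can be sanity-checked with a hypothetical linear critic D(z) = w · z, whose gradient with respect to z is exactly w, so the penalty (‖∇_z D(z)‖_2 − 1)^2 vanishes when ‖w‖_2 = 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear critic D(z) = w . z with ||w||_2 = 1, so its gradient
# with respect to z is w everywhere and the critic is exactly 1-Lipschitz.
w = np.array([0.6, 0.8])
x_real = rng.normal(size=(8, 2))    # batch of "real" samples x
g_fake = rng.normal(size=(8, 2))    # batch of "generated" samples G(y)

# z is sampled uniformly along straight lines between real and generated data.
eps = rng.uniform(size=(8, 1))
z = eps * x_real + (1 - eps) * g_fake

grad_z = np.broadcast_to(w, z.shape)  # gradient of w.z with respect to z
penalty = float(np.mean((np.linalg.norm(grad_z, axis=1) - 1.0) ** 2))
print(penalty)  # essentially zero: this critic already satisfies the constraint
```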

B. ADVERSARIAL LEARNING
To solve the blurriness problem, we introduce adversarial learning based on WGAN. In this adversarial learning, the discriminative subnetwork estimates the Wasserstein distance between the distribution of denoised images and the distribution of real clean images, while the generative subnetwork updates itself to minimize this Wasserstein distance, thereby making the distribution of its output denoised images fit the distribution of real clean images.

1) FREEZING INPUT
In the original WGAN, within each adversarial training iteration, the inputs to the generative subnetwork differ between the generator training step and the discriminator training step. In contrast, in our framework, the inputs remain the same, which brings in feature information when training the generative subnetwork.
In each adversarial training iteration, the discriminative subnetwork is optimized first. As is well known, CNNs have a strong ability to extract features. After being optimized, the discriminative subnetwork can well discriminate the input denoised image from the real clean image on features. When the generative subnetwork is optimized in turn, if we still take the same noisy image as its input, the output denoised image also remains the same. Because the discriminative subnetwork has just been trained to discriminate this denoised image from the real clean image on features, the loss information from the discriminative subnetwork can guide the generative subnetwork to recover these features better. In contrast, if a new input is fed to the generative subnetwork, the output denoised image is also new to the discriminative subnetwork. Because the discriminative subnetwork has not been trained to distinguish this new denoised image from the real clean image, it cannot actually discriminate them on features, and the loss from the discriminative subnetwork would not perform as it does in the former case.
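The structure of this "freezing input" scheme can be sketched as follows, with stand-in update functions in place of real training steps (everything here is hypothetical scaffolding, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch():
    return rng.normal(size=(4, 8))          # hypothetical noisy patches

def train_discriminator(y):                 # placeholder for a real D update
    return id(y)

def train_generator(y):                     # placeholder for a real G update
    return id(y)

batches, log = [], []
for _ in range(3):                          # three adversarial iterations
    y = sample_batch()                      # frozen for the whole iteration
    batches.append(y)                       # keep alive so ids stay unique
    log.append(train_discriminator(y))      # D trained first on this batch
    log.append(train_generator(y))          # same y, not a freshly drawn batch

# Each D-step/G-step pair saw the identical input batch.
print(all(log[i] == log[i + 1] for i in range(0, len(log), 2)))  # prints True
```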

2) ARCHITECTURE OF DISCRIMINATIVE SUBNETWORK
Our discriminative subnetwork is designed based on the discriminator in [37]. We make some changes to the original one.
To reduce the complexity of the network, we remove some convolution layers. Besides, we use a max-pooling layer to replace the original two fully connected layers to reduce the dimension of the input feature maps, which significantly reduces the number of parameters. It is worth noting that in our experiments, the max-pooling layer performs better than an average-pooling layer in this subnetwork; this is mainly because max-pooling can efficiently extract the features most important for correctly distinguishing the denoised image from the real clean image. According to WGAN-GP [40], batch normalization layers would invalidate the gradient penalty loss term, so we remove all batch normalization layers. Besides, as required by WGAN [38], we remove the sigmoid layer. The final architecture of our discriminative subnetwork is shown in Fig. 2. The functionality of this subnetwork can be divided into three parts. First, the subnetwork extracts and learns the features of the input image through seven convolution layers, each with a LeakyReLU activation. Then, one convolution layer with a 1 × 1 kernel fuses the learned features. Finally, one max-pooling layer selects the most important feature and outputs one value.
The adversarial loss function to optimize this discriminative subnetwork can be formulated as

l_D = (1/N) Σ_{i=1}^{N} [D(x_i; Φ) − D(F(y_i; Θ); Φ)] − λ (1/N) Σ_{i=1}^{N} (‖∇_{z_i} D(z_i; Φ)‖_2 − 1)^2,

where F(y_i; Θ) represents the denoised image from the generative subnetwork; D(·; Φ) represents the discriminative subnetwork with trainable parameters Φ; the first two terms estimate the Wasserstein distance between the distribution of denoised images F(y_i; Θ) and the distribution of real clean images x_i; the last term is a gradient penalty for the regularization of the discriminative subnetwork; z_i is sampled uniformly along the straight line between the real clean image x_i and the denoised image F(y_i; Θ); and λ is the weight of the gradient penalty term.

3) ADVERSARIAL LEARNING TO SOLVE BLURRINESS
The reason why our adversarial learning can solve the blurriness problem is twofold. First, by exploiting WGAN's great capacity for distribution fitting, this adversarial learning helps the denoising network (i.e., the generative subnetwork) find the distribution space of real clean images rather than the distribution space of blurry images. As discussed in II, the MSE loss function guides the denoising network to a solution space in which pixel intensities change slowly, and this is the root reason why the denoised images of MSE based denoising networks are blurry on texture details. In this adversarial learning, the generative subnetwork is driven to learn the distribution of real clean images so that the distribution of its output denoised images fits the distribution of the real clean images.
In other words, this adversarial learning corrects the over-smooth solution space toward the solution space of real clean images. Second, like the previous works [15], [36], [37] introduced in III, this adversarial learning also provides feature information for the training of the generative subnetwork. The methods [15], [36], [37] connect a trained network behind the denoising network during training, using the feature information from the trained network to guide the denoising network to recover features better. In our adversarial learning, a network (the discriminative subnetwork) is likewise connected behind the generative subnetwork. During adversarial training, the discriminative subnetwork is trained first, which means that, from the generative subnetwork's perspective, the discriminative subnetwork is also a trained network. Therefore, like the previous works, the discriminative subnetwork can offer useful feature information to guide the training of the generative subnetwork. Moreover, the discriminative subnetwork is not trained for one specific task, so it can be sensitive to more features than the previous works.

C. GENERATIVE SUBNETWORK
We design our generative subnetwork as an end-to-end network that takes a noisy image as input and produces the corresponding denoised image as output. It is inspired by super-resolution deep learning [37], [41] and residual learning [13]. Our generative subnetwork first learns the residual image and then obtains the desired denoised image by removing the residual information from the noisy input image. This residual learning avoids the optimization difficulty of learning an identity mapping [42], because the denoising process is approximately an identity mapping, especially when the noise level is low. The architecture of our generative subnetwork is shown in Fig. 3.
This architecture can be divided into two parts: residual learning and reconstruction. In the residual learning phase, one convolution layer with a PReLU activation extracts the initial features of the noisy input image. Then, a big residual block containing 16 small residual blocks (shown in green in Fig. 3) and a convolution layer removes the latent clean features. Later, three convolution layers with two activation functions reconstruct the remaining residual features into the final residual image. In the reconstruction phase, the denoised image is obtained by removing the residual image from the noisy input. The loss function to optimize this generative subnetwork can be formulated as follows:

l = l_MSE + α · l_Adv,

where l_MSE represents the MSE loss from the MSE-based learning, l_Adv represents the adversarial loss from the adversarial learning, and α is the weight controlling the trade-off between the two loss terms. The MSE loss l_MSE can be formulated as:

l_MSE = (1/N) Σ_{i=1}^{N} ‖F(y_i; Θ) − x_i‖^2,

where {(y_i, x_i)}_{i=1}^{N} represents N noisy-clean training patch pairs and F(y_i; Θ) represents the output denoised image from the generative subnetwork that takes the noisy image y_i as input.
The adversarial loss l_Adv can be formulated as

l_Adv = −(1/N) Σ_{i=1}^{N} D(F(y_i; Θ); Φ).
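A minimal numeric sketch of the combined generator objective l = l_MSE + α · l_Adv, with random stand-in tensors in place of real network outputs (no actual networks are involved):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in tensors: denoised[i] plays F(y_i; Theta), clean[i] plays x_i, and
# critic_scores[i] plays D(F(y_i; Theta)) from the discriminative subnetwork.
N = 4
denoised = rng.normal(size=(N, 16, 16))
clean = denoised + 0.1 * rng.normal(size=(N, 16, 16))
critic_scores = rng.normal(size=N)

l_mse = float(np.mean((denoised - clean) ** 2))  # pixel-wise MSE term
l_adv = float(-np.mean(critic_scores))           # generator side of WGAN loss
alpha = 1.01                                     # weight from the training details
loss = l_mse + alpha * l_adv
print(round(l_mse, 4), round(loss, 4))
```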

D. TRAINING PROCEDURE
In this image denoising training framework, the discriminative subnetwork and the generative subnetwork are trained alternately. We present the training procedure in ALGORITHM 1. In the algorithm, (X, Y) is the training dataset consisting of many pairs of clean images X and noisy images Y; t is the iteration index and T is the total number of iterations; function F(·) is the generative subnetwork and function D(·) is the discriminative subnetwork.

2) TRAINING DETAILS
For the model trained for Gaussian denoising, the learning rate is set to 1 × 10^−4 for the first 30 epochs and 1 × 10^−5 for the later epochs. For the model trained for real-world noise removal, the learning rate is set to 1 × 10^−4 for the first 4 × 10^5 iterations and 1 × 10^−5 for the later iterations. All models are trained using the Adam optimizer with hyper-parameters β_1 and β_2 set to 0 and 0.9, respectively. Specifically, the weight λ in Eqn. 6 is set to 10 and the weight α in Eqn. 7 is set to 1.01.
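The learning-rate schedule for the Gaussian model can be written as a one-line helper; this is a sketch of the schedule described above, not the authors' code:

```python
def learning_rate(epoch: int) -> float:
    # 1e-4 for the first 30 epochs, then 1e-5 for the rest.
    return 1e-4 if epoch < 30 else 1e-5

print(learning_rate(0), learning_rate(29), learning_rate(30))
```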

B. PERFORMANCE OF PROPOSED IMAGE DENOISING TRAINING FRAMEWORK
In this part, we validate the effectiveness of our image denoising training framework. For convenience, we denote the denoising network trained using only MSE-based learning as ID-MSE, and the denoising network trained with our framework (i.e., using MSE-based learning and WGAN based adversarial learning simultaneously) as ID-MSE-WGAN. Additionally, the denoising network trained using MSE-based learning and adversarial learning without freezing input is denoted ID-MSE-WGAN(WF).
The average PSNR of ID-MSE, ID-MSE-WGAN(WF) and ID-MSE-WGAN on the test dataset Set12 at noise levels σ = 25, 35, 50 are shown in Fig. 4, Fig. 5 and Fig. 6, respectively. It can be seen that at each noise level, [...] Table 1, and the best results are shown in bold. The relationship among ID-MSE, ID-MSE-WGAN(WF) and ID-MSE-WGAN on dataset BSD68 is the same as on dataset Set12. The above results prove that our proposed image denoising training framework can help a denoising network improve its denoising performance. Furthermore, we additionally trained two denoising networks, ID-MSE-GAN and ID-MSE-P. ID-MSE-GAN is trained using MSE-based learning and GAN based adversarial learning. ID-MSE-P is trained using MSE-based learning and a perceptual loss (i.e., the loss from a trained VGG19 image classification network), as in the previous works [15], [36], [37]. The average PSNR of these two networks on the test dataset Set12 at noise level σ = 25 is 30.44 dB and 29.89 dB, respectively, both lower than that of our ID-MSE-WGAN (30.52 dB). The reason ID-MSE-WGAN outperforms ID-MSE-GAN is that the GAN based method is difficult to train and easily falls into the gradient vanishing trap. The result that ID-MSE-WGAN outperforms ID-MSE-P shows that our solution removes noise and recovers features better than the previous solutions.
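For reference, the PSNR values quoted here follow the standard definition for 8-bit images, PSNR = 10 · log10(255^2 / MSE); a minimal implementation (the test image below is a synthetic stand-in):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB between two images on a 0..peak scale.
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64))
noisy = np.clip(clean + rng.normal(0, 25, size=clean.shape), 0, 255)
print(round(psnr(clean, noisy), 2))  # roughly 20 dB at sigma = 25
```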
We also show two examples, from our ID-MSE-WGAN and from ID-MSE respectively, to visualize the effect of our method on recovering texture details; they are shown in Fig. 7. In the first example, it can easily be seen that our ID-MSE-WGAN recovers the regular texture details of the starfish well from the severely noisy starfish image, whereas the denoised image from ID-MSE is blurry on these texture details and even loses some of them. In the second example, faced with the irregular texture details on the butterfly wing, our ID-MSE-WGAN still recovers them better than ID-MSE. These two examples prove that our image denoising training framework can effectively help a denoising network improve its recovery of texture details.
The results of all methods are shown in Table 2, with the best results in bold. Note that CBDNet conducts experiments only on real-world noise removal, not on Gaussian denoising. From the table, it can easily be observed that our proposed ID-MSE-WGAN significantly surpasses the compared methods at each noise level. On Gaussian denoising, compared with the popular state-of-the-art denoising method DnCNN, our ID-MSE-WGAN gains 0.08 dB, 0.17 dB and 0.18 dB on the Set12 dataset at noise levels σ = 25, 35, 50, respectively. On real-world denoising, our ID-MSE-WGAN surpasses DnCNN by 13.48 dB. Compared with the second best method, CBDNet, our ID-MSE-WGAN still exceeds it by 3.86 dB. These results show that our proposed image denoising method achieves better performance than the state-of-the-art denoising methods.

VI. APPLICATION TO CELL IMAGE DENOISING
At present, identifying and classifying cell images remains a challenging task. Because the features of cells are very subtle, it is difficult to recognize and extract them. Worse, when the cell images are noisy, recognizing and extracting the features becomes even more difficult, because the features are impaired or even obscured by noise. In fact, cell images are easily polluted by noise (which may come from the imaging process, the image transmission process, etc.). Therefore, it is necessary to remove the noise from noisy cell images before identifying and classifying them.
Because of the importance of features in noisy cell images, the selected denoising method must recover the features well while removing the noise. However, most current denoising methods cannot meet this requirement well, as discussed in II. Fortunately, our proposed ID-MSE-WGAN can, as discussed in IV and validated in V. We now apply our proposed ID-MSE-WGAN to cell image denoising.
We apply our method to the cell dataset collected by [47] from the CellaVision blog. This dataset contains 100 cell images of size 300 × 300. We randomly select 88 of the 100 images and randomly crop 128 × 130 patches from them as our training dataset. The remaining 12 images are used as our test dataset, so the test dataset is entirely disjoint from the training dataset. We train three models to remove light, medium, and high levels of noise (σ = 25, 35, 50), respectively.
Five examples at the light noise level (σ = 25) are illustrated to show the effectiveness of our method in recovering features. In each example, we zoom in on two feature details. The first example is shown in Fig. 8. First, there is a halo, which can be observed in the ground truth and in the denoised image from our ID-MSE-WGAN; in the denoised image from ID-MSE, however, the halo is barely visible. Second, there is a fault structure. In the denoised image from our ID-MSE-WGAN, the fault structure can still be seen, but in the denoised image from ID-MSE it has been lost.
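The data preparation described above (random 88/12 split, random patch cropping, and synthesizing noisy pairs y = x + v with Gaussian v) can be sketched as follows. The helper names and the square 128-pixel patch size are our assumptions for illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for a reproducible split

def split_dataset(images, n_train=88):
    """Randomly split the 100 cell images into 88 training and 12 test images."""
    idx = rng.permutation(len(images))
    return [images[i] for i in idx[:n_train]], [images[i] for i in idx[n_train:]]

def make_noisy_patch(image, patch=128, sigma=25.0):
    """Crop a random patch x and synthesize its noisy counterpart y = x + v,
    with v drawn from N(0, sigma^2), matching the paper's noise model."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    x = image[top:top + patch, left:left + patch].astype(np.float32)
    y = x + rng.normal(0.0, sigma, size=x.shape).astype(np.float32)
    return y, x
```

Running this once per noise level (σ = 25, 35, 50) yields the three training sets used for the three models.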
The second example is shown in Fig. 9. First, there is a longitudinally curved crack feature. In the denoised image from our ID-MSE-WGAN, the curved crack can still be observed, but in the denoised image from ID-MSE it is lost. Second, there is a vesicle, which can still be clearly seen in the denoised image from our ID-MSE-WGAN; in the denoised image from ID-MSE, some features of the vesicle are so blurry that they cannot be clearly seen.
The third example is shown in Fig. 10. First, there is a transverse curved crack feature. In the denoised image from our ID-MSE-WGAN, the feature can still be clearly observed, but in the denoised image from ID-MSE it is too blurry to make out. Second, there is a moon-shaped feature, which can still be extracted from the denoised image from our ID-MSE-WGAN; in the denoised image from ID-MSE, unfortunately, it cannot be seen.
The fourth example is shown in Fig. 11. First, there is a lateral fault structure. In the denoised image from our ID-MSE-WGAN, the feature is still clear, but in the denoised image from ID-MSE the edge of the lateral fault is blurry and the feature becomes unclear. Second, there is a block. Its shape can be clearly and easily seen in the denoised image from our ID-MSE-WGAN; in the denoised image from ID-MSE, the edge of the block is blurry and its shape cannot be easily distinguished.
The fifth example is shown in Fig. 12. First, there is a small red block between the cracks. In the denoised image from our ID-MSE-WGAN, the small red block can still be clearly distinguished, but in the denoised image from ID-MSE its edge is so blurry that it blends into the surrounding area and cannot be distinguished. Second, there is another longitudinally curved crack, which our ID-MSE-WGAN recovers well but ID-MSE does not. From the above examples, it can be concluded that our ID-MSE-WGAN satisfies the requirement of removing the noise while recovering the features in cell images.

VII. CONCLUSIONS
In this paper, an image denoising training framework with WGAN-based adversarial learning is proposed. The main goal of the adversarial learning is to address the blurriness problem caused by the pixel-wise loss function. We employ WGAN instead of the original GAN because it alleviates GAN's training difficulties and improves its performance. Moreover, to tailor the WGAN to the image denoising application, the input to the generative subnetwork is frozen in each alternating adversarial training iteration. We validated our method on three datasets, Set12, BSD68, and SIDD, covering both Gaussian denoising and real-world denoising. The experimental results demonstrate that our proposed framework outperforms the state-of-the-art denoising methods and generates clearer and more realistic denoised images in terms of texture details. Given the importance of restoring feature details in cell image denoising, we also applied our framework to cell image denoising, and the results show that our method removes noise and recovers feature details well simultaneously.
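As an illustration of the loss structure such a framework relies on, the sketch below shows the standard WGAN critic objective together with a combined generator objective (pixel-wise MSE plus an adversarial term). The weighting factor `lam` and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def critic_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """WGAN critic loss: minimizing this maximizes E[D(x)] - E[D(G(y))],
    i.e. the critic's estimate of the Wasserstein distance."""
    return float(np.mean(d_fake) - np.mean(d_real))

def generator_loss(denoised: np.ndarray, clean: np.ndarray,
                   d_fake: np.ndarray, lam: float = 1e-3) -> float:
    """Generator loss: pixel-wise MSE plus the adversarial term -lam * E[D(G(y))],
    which pushes denoised outputs toward the distribution of real clean images
    instead of the blurry MSE-optimal average."""
    mse = float(np.mean((denoised - clean) ** 2))
    return mse - lam * float(np.mean(d_fake))
```

In training, the two losses are minimized alternately: several critic updates on frozen generator outputs, then one generator update against the frozen critic.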