Image De-Noising With Machine Learning: A Review

Images are susceptible to various kinds of noise, which corrupt the pictorial information they store. Image de-noising has become an integral part of the image processing workflow. It is used to attenuate noise and accentuate the specific image information stored within. Machine learning is an important tool in the image de-noising workflow in terms of its robustness, accuracy, and time requirement. This paper explores numerous state-of-the-art machine-learning-based image de-noisers, such as dictionary learning models, convolutional neural networks, and generative adversarial networks, for a range of noise types including Gaussian, impulse, Poisson, mixed, and real-world noise. The motivation, algorithm, and framework of different machine learning de-noisers are analyzed. These de-noisers are compared using PSNR as the quality assessment metric on some benchmark datasets. The best de-noising results for different noise types are discussed along with future prospects. Among the various Gaussian noise de-noisers, GCBD, BRDNet, and the PReLU network prove to be promising. CNN+LSTM and MC2RNet are the most suitable CNN-based Poisson de-noisers. For impulse noise removal, Blind CNN and CNN+PSO perform well. For mixed noise removal, WDL, EM-CNN, CNN, SDL, and Mixed CNN are prominent. De-noisers like GRDN and DDFN show accurate results in the domain of real-world de-noising.


I. INTRODUCTION
Image de-noising has played a pivotal role in recent years with the advent of many of the latest computer vision applications. Digital images are prone to noise corruption due to camera sensors, illumination level, transmission errors, timing errors of A/D converters, faulty storage memory locations, the capturing medium, transmission channel interference, and compression artifacts. In biological imaging, low-light conditions and shorter exposure times degrade image quality [1]. Image restoration is required in various fields such as medical imaging, remote sensing, underwater de-noising, and dehazing applications [2]. The different medical imaging modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), X-ray, and PET, use appropriate de-noising methods for proper diagnosis. Moreover, image pre-processing also includes a de-noising procedure prior to medical image classification or segmentation to attain higher accuracy. Remote sensing de-noising restores relevant data from synthetic aperture radar images, satellite images, hyperspectral images, and underwater images.
(The associate editor coordinating the review of this manuscript and approving it for publication was Mingbo Zhao.)

A. TYPES OF NOISES
Noise is classified based on its probability distribution function (pdf), correlation, nature, and source. The different types of noise based on pdf are Gaussian, Rayleigh, uniform, impulse, Poisson, etc. According to correlation, noise is classified into white and colored noise. White noise has a uniform power spectral density and zero autocorrelation, unlike colored noise. If an image is corrupted with white noise, all the noisy pixels are uncorrelated with each other. By nature, noise is additive or multiplicative (speckle), i.e., noise values are added to or multiplied with the reference image. By source, it is termed quantization noise or photon noise. The commonly used noise types are described as follows:

1) GAUSSIAN NOISE
It is statistical and additive in nature, follows a normal distribution with zero mean and standard deviation σ, and affects all the pixels in the image. It is caused by sensor temperature fluctuations and environmental illumination variations. It is commonly found in magnetic resonance imaging and confocal laser scanning microscopy imaging [3]. The probability distribution function of Gaussian noise is given by

p(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²))

where x is the image pixel value, µ is the mean, and σ is the standard deviation.
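As a minimal sketch of this model, additive Gaussian corruption can be simulated in numpy; the helper name `add_gaussian_noise` and the flat grey test image are illustrative choices, not from the reviewed works:

```python
import numpy as np

def add_gaussian_noise(image, sigma=25.0, mu=0.0, seed=0):
    """Corrupt a float image (range 0-255) with additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(loc=mu, scale=sigma, size=image.shape)
    return np.clip(noisy, 0.0, 255.0)

clean = np.full((64, 64), 128.0)       # flat grey test image
noisy = add_gaussian_noise(clean, sigma=25.0)
residual = noisy - clean               # approximates the noise component v
# residual.std() is close to sigma (clipping is negligible at this intensity)
```

Because every pixel is perturbed independently, the empirical standard deviation of the residual matches σ, which is the property Gaussian de-noisers exploit.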

2) IMPULSE NOISE
It is an additive noise that occurs due to faulty sensors and transmission errors. Unlike Gaussian noise, it affects only certain pixels in the image. It is divided into two types: salt-and-pepper impulse noise (SPIN) and random-valued impulse noise (RVIN). In salt-and-pepper corruption, some image pixels take either the maximum or minimum value of the image dynamic range, whereas RVIN corruption changes some image pixels to a random value, which makes its detection more difficult than salt-and-pepper noise detection. The salt-and-pepper impulse noise is given by [4]

p(x) = P_a for x = a; P_b for x = b; 0 otherwise

where a and b are the minimum and maximum pixel values of the image dynamic range, and P_a and P_b are the corresponding probabilities, which are equal for salt-and-pepper noise.
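The SPIN model above can be sketched directly in numpy; the helper name `add_salt_pepper` and the parameter values are illustrative assumptions:

```python
import numpy as np

def add_salt_pepper(image, p_a=0.05, p_b=0.05, a=0.0, b=255.0, seed=0):
    """SPIN model: each pixel becomes a (pepper) w.p. p_a and b (salt) w.p. p_b."""
    rng = np.random.default_rng(seed)
    u = rng.random(image.shape)
    noisy = image.copy()
    noisy[u < p_a] = a               # pepper: minimum of the dynamic range
    noisy[u > 1.0 - p_b] = b         # salt: maximum of the dynamic range
    return noisy

img = np.full((100, 100), 128.0)
out = add_salt_pepper(img, p_a=0.05, p_b=0.05)
# roughly 5% of pixels are now 0 and 5% are 255; the rest are untouched,
# which is what distinguishes impulse noise from Gaussian noise
```

Note that only the corrupted pixels change; the remaining pixels keep their original values, which is why median-type detection-then-filtering strategies work for this noise.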

3) POISSON OR PHOTON NOISE
The Poisson distribution is used to model photon noise caused by the random arrival of photons on the image sensor [5]. Applications of Poisson noise removal include astronomy, medical imaging, and low-light photography. The conditional probability of a Poisson-distributed image y given a clean image x is given by [6]

P(y_{i,j} | x_{i,j}) = (x_{i,j})^{y_{i,j}} exp(−x_{i,j}) / y_{i,j}!

where i and j denote pixel indices.
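A minimal simulation of this signal-dependent model, assuming an illustrative `peak` photon-count parameter of my own choosing to control the noise severity:

```python
import numpy as np

def add_poisson_noise(image, peak=30.0, seed=0):
    """Photon noise: y_ij ~ Poisson(x_ij), with intensity scaled so that the
    brightest pixel corresponds to `peak` expected photon counts."""
    rng = np.random.default_rng(seed)
    scale = peak / image.max()
    counts = rng.poisson(image * scale)   # signal-dependent noise
    return counts / scale                 # back to the original intensity range

clean = np.full((128, 128), 100.0)
noisy = add_poisson_noise(clean, peak=30.0)
# mean stays near 100, but the variance grows with the intensity,
# unlike the constant-variance Gaussian case
```

Lower `peak` values emulate shorter exposure times, which is why low-light photography is a primary application of Poisson de-noising.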

4) GAMMA NOISE
The speckle noise in ultrasound images occurs due to coherent imaging mechanisms from the scatterers [7]. It reduces image sharpness and creates difficulty in lesion diagnosis. It is modeled by the Gamma distribution, whose probability distribution function is given by

p(x) = (a^b x^{b−1} / (b − 1)!) exp(−ax) for x ≥ 0

where parameters a and b are positive integers.

5) RAYLEIGH NOISE
The noise in synthetic aperture radar (SAR) images is granular in nature and is modeled by the Rayleigh distribution [8]. Sometimes, ultrasound images are also prone to Rayleigh noise corruption. The Rayleigh distribution is given by the probability density

p(x) = (2/b)(x − a) exp(−(x − a)²/b) for x ≥ a; 0 otherwise

where a and b are the distribution parameters.

6) CAUCHY NOISE
The atmospheric and underwater acoustic signals of radar and sonar imaging are corrupted with an additive, heavy-tailed, impulse-like noise known as Cauchy noise [9]. The probability distribution function of the Cauchy distribution is given by

p(x) = γ / (π(γ² + (x − δ)²))

where γ > 0 denotes the scale parameter and δ ∈ R denotes the localization parameter.

7) MIXED NOISE
In many real-life applications, images are corrupted by more than one noise type. The mixture of Gaussian and impulse noise is found in computed tomography (CT) images and cDNA microarray imaging [10], [11]. The mixed noise in cDNA microarray imaging occurs due to photon and electronic noise interaction, dust particles on the surface of glass slides, and laser reflection. In hyperspectral images, the combination of signal-independent additive Gaussian noise and signal-dependent multiplicative Poisson noise is found [12].
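The Gaussian-plus-impulse mixture mentioned above can be sketched by composing the two corruption models; the helper name and the parameter values (σ = 10, 10% impulses) are illustrative assumptions:

```python
import numpy as np

def add_mixed_noise(image, sigma=10.0, p=0.1, seed=0):
    """Gaussian + salt-and-pepper mixture, the setting often assumed in
    mixed-noise de-noising work (e.g., CT and cDNA microarray imaging)."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, image.shape)  # additive Gaussian part
    u = rng.random(image.shape)
    noisy[u < p / 2] = 0.0                               # pepper pixels
    noisy[u > 1.0 - p / 2] = 255.0                       # salt pixels
    return np.clip(noisy, 0.0, 255.0)

img = np.full((80, 80), 120.0)
mixed = add_mixed_noise(img)
# every pixel now carries Gaussian noise, and ~10% are additionally impulses
```

Because the two components have very different statistics, a single Gaussian or impulse de-noiser handles this mixture poorly, which motivates the dedicated mixed-noise methods surveyed later.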

B. CLASSIFICATION OF IMAGE DE-NOISING TECHNIQUES
The image de-noising methods can be grouped into spatial domain techniques, transform domain techniques, fuzzy filtering-based techniques, and machine learning techniques [13], [14]. The block diagram illustrating the classification of image de-noising techniques is given in Fig.1.
Spatial domain filtering is widely used for image restoration; the filtering operation is applied directly to the image pixels. These filters are further divided into linear and non-linear filters. The most common linear filters are the mean filter, the Gaussian filter, and the Wiener filter. The basic mean filter replaces the pixel under operation with the mean value of a pre-defined neighborhood. Similarly, Gaussian filters use a Gaussian kernel with a particular mean and standard deviation. These filters suffer from over-smoothing and blurring of edges. To overcome this problem, the Wiener filter was introduced, but it is also unsuccessful when operating on sharp edges. Later, non-linear filters were introduced, in which the output is a non-linear function of the input, for edge, detail, and texture preservation. The primary examples of non-linear filters are total variation filters, anisotropic diffusion filters, the bilateral filter, and the fourth-order partial differential equation filter. The bilateral filter replaces the pixel value with a weighted neighborhood average whose weights are a function of both Euclidean distance and range difference [15], [16]. A detailed comprehensive review of impulse and Gaussian de-noising filters is given in [14].
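The over-smoothing behavior of linear spatial filters described above can be seen in a small numpy sketch; `mean_filter3` is a hand-rolled 3×3 mean filter written only for illustration:

```python
import numpy as np

def mean_filter3(img):
    """3x3 mean filter: replace each interior pixel with its neighborhood mean."""
    out = img.copy().astype(float)
    acc = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for du in range(3):
        for dv in range(3):
            acc += img[du:du + acc.shape[0], dv:dv + acc.shape[1]]
    out[1:-1, 1:-1] = acc / 9.0
    return out

rng = np.random.default_rng(0)
step = np.zeros((32, 32)); step[:, 16:] = 100.0   # sharp vertical edge
noisy = step + rng.normal(0, 5, step.shape)

smooth = mean_filter3(noisy)
# Noise variance drops in flat regions, but the edge is smeared across
# neighboring columns: the blurring problem that motivated non-linear filters.
```

Averaging 9 i.i.d. samples divides the noise standard deviation by 3 in flat regions, but the same averaging mixes pixels from both sides of the edge, which is exactly the trade-off the text describes.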
Transform domain techniques convert the image into a transform domain and then carry out mathematical operations on the transform coefficients, followed by an inverse transform to restore the de-noised image. These techniques are divided into data-adaptive and non-data-adaptive techniques based on the transform basis function. Independent component analysis (ICA) and principal component analysis (PCA) are data-adaptive transform methods. ICA has been successfully utilized for non-Gaussian de-noising. PCA is a de-correlation method that transforms the original image dataset into the PCA domain and selects the most significant principal components (the eigenvectors with the largest eigenvalues) for image restoration [17]. Wavelet-based image de-noising is a multi-resolution image analysis technique that uses different mother wavelets, such as Daubechies and Haar, to obtain wavelet coefficients. It has been used to de-noise Gaussian, salt-and-pepper, and Poisson noise using an appropriate thresholding operator [18], [19]. In recent years, the most promising non-local means, collaborative filtering method in the transform domain is block-matching and 3D filtering (BM3D) [20]. In this approach, similar 2D image patches are compiled into 3D groups by a block matching process, and collaborative Wiener filtering is performed on each 3D group in the transform domain. Improved versions of BM3D are given in [21], [22]. The curvelet filter is based on the theory of multiscale geometry (i.e., the use of position, scale, and orientation). It gives better de-noising performance on edges and borders than state-of-the-art wavelet de-noising methods [23]. It uses the ridgelet transform as a primary step, and curvelet sub-bands are formed with a filter-bank structure built from à trous wavelet filters. The 2-D contourlet transform provides spatial and directional resolution to keep contours and details intact [24].
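The wavelet thresholding idea above can be illustrated with a single-level 2-D Haar transform implemented directly in numpy; this is a minimal sketch (one level, hard-coded 3σ soft threshold), not the multi-level Daubechies schemes of the cited works:

```python
import numpy as np

def haar2d(x):
    """One-level 2-D Haar transform: approximation LL and detail bands LH, HL, HH."""
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)   # rows: low-pass
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)   # rows: high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d (the transform is orthonormal)."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    a[:, 0::2] = (ll + lh) / np.sqrt(2); a[:, 1::2] = (ll - lh) / np.sqrt(2)
    d = np.empty_like(a)
    d[:, 0::2] = (hl + hh) / np.sqrt(2); d[:, 1::2] = (hl - hh) / np.sqrt(2)
    x = np.empty((2 * a.shape[0], a.shape[1]))
    x[0::2, :] = (a + d) / np.sqrt(2); x[1::2, :] = (a - d) / np.sqrt(2)
    return x

def soft(c, t):
    """Soft-thresholding operator on wavelet coefficients."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0, 100, 64), np.ones(64))   # smooth ramp image
noisy = clean + rng.normal(0, 10, clean.shape)

ll, lh, hl, hh = haar2d(noisy)
t = 3 * 10                                               # threshold ~ 3*sigma
den = ihaar2d(ll, soft(lh, t), soft(hl, t), soft(hh, t))
# the detail bands of a smooth image are small, so thresholding removes
# mostly noise and the MSE against the clean image drops
```

A smooth image concentrates its energy in the LL band, while white noise spreads evenly over all four bands; thresholding the detail bands therefore removes noise with little damage to the signal.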
Image restoration using fuzzy-based methods considers the image as a fuzzy set with its pixel values as members. Fuzzy-based filters use fuzzy rules to design membership functions by calculating the degree of the gradient in various directions. The fuzzy impulse noise detection and reduction method calculates the gradient in eight directions for noisy pixel detection prior to filtering [25]. In histogram fuzzy de-noising filters, the membership function is derived from the input histogram [26]; these consist of a fuzzy detection phase and a cancellation phase. A detailed explanation of fuzzy-based techniques is given in [14], [27].
Image de-noising models can be grouped into analytical models (stochastic and deterministic) and machine learning-based models. In analytical models, the forward de-noising model is explicitly known to the user, and a solution approach is chosen based on certain criteria. Deterministic modeling of spatial filters is challenging for each image type, and edge deterioration and blurring are common artifacts in spatial and transform domain techniques. In machine learning models, on the other hand, the inverse model is learned with the help of image datasets containing clean and noisy image pairs. The most important question is: what is the relative advantage of the machine (deep) learning approach over analytical methods? In deep learning models, the computational burden lies in the learning phase, whereas the testing phase is a simple feed-forward pass. Analytical methods, by contrast, rely on a computationally demanding optimization process and a heuristic selection of hyper-parameters, which does not guarantee good de-noising results. It has been observed that machine learning models give superior performance compared to analytical methods, as feature learning makes a single model apt for considerable variation in the noise level.
Some de-noisers are based on analytical optimization, which involves an iterative process with a stopping criterion. Although analytical optimization is involved, such methods cannot be directly categorized in the machine learning domain, which is essentially a numerical optimization problem. Important analytical optimization methods include total variation regularization [28] and weighted nuclear norm minimization (WNNM) [29]. Variational methods seek appropriate priors such as low-rank priors, non-local self-similarity priors, sparse priors, and gradient priors. WNNM assigns weights to the singular values of an image, and analytical optimization is performed on an energy function.
In recent years, there has been a paradigm shift from analytical models to machine learning models owing to improved image quality assessment metrics. In the following sections, machine learning-based image de-noisers are explained in detail. Throughout this paper, the following convention is used: y is the noisy input image, x is the clean (ground-truth) image, v is the noise component added to x to generate y, and x̂ is the final de-noised image predicted by the de-noiser.

C. MACHINE LEARNING-BASED IMAGE DE-NOISING
Machine learning image de-noising techniques have made considerable progress with the introduction of benchmark datasets for particular applications, deep learning advancements, and increased computational power from graphics processing units (GPUs). They are broadly classified into sparsity-based dictionary learning models, multi-layer perceptron models, convolutional neural network-based models, and generative adversarial network-based models.

1) SPARSITY-BASED DICTIONARY LEARNING MODELS
In sparsity-based techniques, every image patch is represented as a linear combination of several atoms from an overcomplete dictionary D. The image is encoded with a coding vector α over the dictionary under an l1-norm sparse regularizer, i.e., min_α ‖α‖₁ s.t. x = Dα, following the generalized model given by [30], [31]:

α̂ = arg min_α ‖y − Dα‖²₂ + λ‖α‖₁

Here, λ is a sparseness-balancing regularization parameter, and ‖α‖₁ is the l1-norm of α. Another variant of the model uses ‖α‖₀ (the l0-norm of α) in place of ‖α‖₁. The K-Singular Value Decomposition (K-SVD) technique is the pioneering work that uses dictionary learning to frame the sparse representation model. The model can be learned from benchmark datasets as well as from the input image itself by K-SVD [32]. K-SVD is an iterative process in which two consecutive steps alternate: sparse coding of the examples using the current dictionary, and updating of the dictionary atoms for optimum data fitting. Other works, as in [33], [34], follow the same workflow as K-SVD with variations in dictionaries and optimization problems. Clustering-based sparse representation involves a cost function (a double-header l1 optimization problem) in which both structural clustering and dictionary learning act as regularizers. A typical sparsity-based image de-noising algorithm is given in Algorithm 1.

Algorithm 1 De-Noising Algorithm of a Sparsity-Based De-Noiser [30]
1. Input: y, the image observed in the noisy environment.
2. Find x̂ = Dα̂, where D is a sparse dictionary constructed to suit x, α̂ is the sparse coding vector, and λ is a sparseness-balancing regularization parameter, by minimizing
L = ½‖y − Dα‖²₂ + λ‖α‖₀ (‖α‖₀ may be replaced by ‖α‖₁).
3. Find the estimate of x subject to ‖x̂ − Dα̂‖²₂ ≤ ε, where ε is a small-value limiting parameter.
4. Solve the above non-deterministic polynomial (NP-hard) problem using greedy pursuit or convex relaxation.
5. Output: x̂.
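For the l1 (convex relaxation) variant of step 2, iterative soft-thresholding (ISTA) is one standard solver. Below is a minimal numpy sketch; the dictionary, sparse code, and parameter values are synthetic illustrations, not from the reviewed works:

```python
import numpy as np

def ista(y, D, lam=0.1, n_iter=200):
    """Iterative soft-thresholding for min_a 0.5*||y - D a||_2^2 + lam*||a||_1
    (the l1 relaxation; the l0 form would need greedy pursuit instead)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)             # gradient of the data-fidelity term
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 64)); D /= np.linalg.norm(D, axis=0)  # overcomplete dictionary
a_true = np.zeros(64); a_true[[3, 17, 40]] = [2.0, -1.5, 1.0]  # sparse ground truth
y = D @ a_true + rng.normal(0, 0.01, 32)                       # noisy observation
a_hat = ista(y, D, lam=0.05)
x_hat = D @ a_hat                                              # de-noised estimate
```

The soft-thresholding step is what produces an exactly sparse code: coefficients whose gradient pull is below λ/L are driven to zero at each iteration.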

2) MULTI-LAYER PERCEPTRON MODELS
The multi-layer perceptron (MLP) network, as shown in Fig. 2, is a feed-forward model that maps the input image vector (y) to the output image vector (x̂) through several intermediate hidden layers. The general equation of an MLP network with two hidden layers is given by

x̂ = b₃ + w₃ tanh(b₂ + w₂ tanh(b₁ + w₁ y))

where w denotes the weight matrices, b the vector-valued biases, and the activation function is tanh, which operates component-wise. Stochastic gradient descent is used for training with noisy and clean image pairs. The parameters of the MLP are updated by back-propagation, minimizing the mean-square error.
To increase training efficiency, data normalization, proper weight initialization, and learning rate division are performed. The noisy image is broken into overlapping patches, and each patch is de-noised separately. The MLP estimates the de-noised version of the overlapping noisy patches, and the overlapped de-noised patches are then averaged [35]. De-noising performance improves when the de-noised patches are weighted by a Gaussian window. An MLP with four hidden layers that uses time-series images has shown significant improvement in keeping details and edges intact for SAR images [36]. The trainable non-linear reaction-diffusion model [37] is a feed-forward architecture that embeds a standard non-linear diffusion model in the neural network. The number of layers in an MLP is limited by the vanishing gradient problem compared to convolutional neural networks, which restricts its performance. A multi-layer perceptron de-noising algorithm is given in Algorithm 2. This algorithm is in accordance with Fig. 3, which depicts a single hidden layer MLP.
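The two-hidden-layer form described above amounts to a short chain of matrix products and tanh non-linearities applied to a vectorized patch. A minimal forward-pass sketch with random (untrained) weights; the patch size of 17×17 and hidden width of 512 are illustrative assumptions:

```python
import numpy as np

def mlp_denoise_patch(y, params):
    """Feed-forward pass x_hat = b3 + W3*tanh(b2 + W2*tanh(b1 + W1*y))
    on a vectorized noisy patch."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(b1 + W1 @ y)        # first hidden layer
    h2 = np.tanh(b2 + W2 @ h1)       # second hidden layer
    return b3 + W3 @ h2              # linear output layer (same size as the patch)

rng = np.random.default_rng(0)
d, h = 17 * 17, 512                  # 17x17 patches, hidden width 512
params = (rng.normal(0, 0.05, (h, d)), np.zeros(h),
          rng.normal(0, 0.05, (h, h)), np.zeros(h),
          rng.normal(0, 0.05, (d, h)), np.zeros(d))
patch = rng.random(d)                # a vectorized noisy patch
out = mlp_denoise_patch(patch, params)
```

In a full pipeline this function would be applied to every overlapping patch of the noisy image and the overlapping outputs averaged, as described above.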

3) CNN-BASED DE-NOISING MODELS
VOLUME 9, 2021
FIGURE 2. Multi-layer perceptron network [35].
In recent years, convolutional neural network (CNN) based models have shown significant improvement in various image quality metrics compared to other state-of-the-art methods [39]. The success of CNN models can be attributed to their large modeling capacity and significant advancements in network training and design. The CNN is designed for grid- or matrix-like input data, taking inspiration from the visual cortex of animals. In CNN models, a convolutional kernel with learnable parameters is shared across all image positions. The convolutional kernel can be visualized as a feature extractor for a particular image restoration application. The convolutional layers are cascaded, so the extracted features become progressively and hierarchically more complex. A CNN model consists of an input layer, a series of intermediate hidden layers, and an output layer. A convolutional kernel with learnable weights is applied in each layer, followed by some activation function, and the output of each layer is fed as the input to the next one.
The output feature map j of layer l is given by

FM_j^l = A( Σ_{i∈S_j} FM_i^{l−1} * w_{ji}^l + b_j^l )

where S_j represents the selection of input feature maps, FM_i^{l−1} is the previous feature map, w_{ji}^l is the weight of the convolution kernel of the l-th layer, A is the activation function (e.g., a rectified linear unit or a sigmoid function), and b_j^l is the bias in the l-th layer. The training procedure involves optimizing parameters such as the kernels, using clean and noisy image pairs with stochastic gradient descent, the Adam algorithm, etc. The cost function is optimized during the training process; the mean square error between the clean image and its de-noised version is the fundamental cost function. Fig. 4 illustrates the basic architecture of a CNN. Algorithm 3 gives the CNN de-noising process.
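The layer equation above can be written out explicitly as nested loops over output channels, input channels, and kernel offsets. This is a deliberately naive numpy sketch ('valid' correlation, ReLU activation, all feature maps selected); real frameworks use optimized convolution routines:

```python
import numpy as np

def conv_layer(fmaps, w, b, act=lambda z: np.maximum(z, 0.0)):
    """One convolutional layer: FM_j^l = A(sum_i FM_i^{l-1} * w_ji^l + b_j^l).
    fmaps: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, c_in, k, _ = w.shape
    H, W = fmaps.shape[1] - k + 1, fmaps.shape[2] - k + 1
    out = np.empty((c_out, H, W))
    for j in range(c_out):
        acc = np.zeros((H, W))
        for i in range(c_in):                      # sum over selected input maps
            for u in range(k):
                for v in range(k):
                    acc += w[j, i, u, v] * fmaps[i, u:u + H, v:v + W]
        out[j] = act(acc + b[j])                   # bias + activation
    return out

rng = np.random.default_rng(0)
x = rng.random((1, 16, 16))                        # grayscale input layer
w1 = rng.normal(0, 0.1, (4, 1, 3, 3)); b1 = np.zeros(4)
w2 = rng.normal(0, 0.1, (4, 4, 3, 3)); b2 = np.zeros(4)
y = conv_layer(conv_layer(x, w1, b1), w2, b2)      # two cascaded layers
```

Note how the same 3×3 kernel weights are reused at every spatial position, which is the weight-sharing property that keeps CNN parameter counts small.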

4) GAN-BASED DE-NOISING MODELS
The generative adversarial network (GAN) uses generative modeling with two sub-models, termed the generator and the discriminator [41]. This network is designed to overcome the difficulty deep generative models have in learning complex probability distributions. The generator model is used for producing new plausible images from the problem domain, whereas the discriminator determines whether the generated image samples are real or fake. The discriminator model acts as an adversarial network. The main motive of the generator network is to obtain image samples that can deceive the discriminator network.

Algorithm 3 De-Noising Process of a CNN-Based De-Noiser
Layer operation: FM_j^l = A( Σ_{i∈M_j} FM_i^{l−1} * w_{ji}^l + b_j^l ), where FM_i^{l−1} represents the feature map of layer l − 1, w_{ji}^l and b_j^l are the weight and bias of layer l, A is the activation function, and M_j is the selection operator of feature maps.
Residual learning implies x̂ = y − R(y), where R represents the residual-learning CNN operator.
Loss function: L(θ) = (1/2N) Σ_{i=1}^{N} ‖R(y_i; θ) − (y_i − x_i)‖²_F, where θ denotes the CNN parameters, N is the number of images in the training dataset, y and x represent a noisy and a clean image, and R is the residual-learning operator.

Usually, the generator network maps the noisy image to the ground truth, and the discriminator network uses the loss function to find the difference between the output image of the generator and the ground truth. The discriminator decides whether the image predicted by the generator, x̂ = G(y), is real or fake. Another de-noising methodology extracts noise blocks from the input noisy images with the GAN; the generated noise blocks are then combined with clean images to form a training dataset for a CNN that produces the de-noised output [42]. The GAN objective function is

min_G max_D V(G, D) = E_{x∼p_x(x)}[log D(x)] + E_{y∼p_y(y)}[log(1 − D(G(y)))]

where p_x(x) is the real data distribution, p_y(y) is the generated data distribution (i.e., that of the input noisy image y), and E denotes the expectation. Fig. 5 shows the architecture of a GAN for image restoration. Algorithm 4 gives GAN de-noising pseudocode. TABLE 1 gives the advantages and disadvantages of different machine learning image de-noisers.
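The GAN value function can be checked numerically. At the optimal discriminator for a fixed generator, D(x) = p_x(x) / (p_x(x) + p_g(x)); when the generator has matched the data distribution (p_g = p_x), this is 0.5 everywhere and V(G, D) reaches −2 ln 2, the convergence condition at the end of training. A minimal Monte-Carlo sketch (the helper name is my own):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(G, D) = E[log D(x)] + E[log(1 - D(G(y)))],
    given discriminator outputs on real samples and on generated samples."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# If G has matched the data distribution, the optimal D outputs 0.5 everywhere:
d_real = np.full(1000, 0.5)
d_fake = np.full(1000, 0.5)
print(round(gan_value(d_real, d_fake), 4))        # -1.3863 = -2 ln 2
```

A value above −2 ln 2 indicates that the discriminator can still separate real from generated samples, so generator training continues.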

II. MACHINE LEARNING-BASED GAUSSIAN DE-NOISERS
Gaussian noise de-noisers are used in many important applications, such as MRI de-noising, optical coherence tomography imaging, and natural images. CNN-based machine learning models have shown tremendous improvement in image quality assessment metrics compared to other state-of-the-art networks. CNN-based de-noisers have excelled at Gaussian de-noising, but real-world image de-noising is still a challenging problem. Gaussian de-noisers are being designed for additive white Gaussian noise, spatially variant Gaussian noise, and blind Gaussian noise. Two important benchmark datasets, BSD-68 and Set-12, are used for the comparative analysis of Gaussian de-noisers. The CNN-based models follow discriminative learning, while GANs are generative learning models. CNN models require supervised learning, i.e., the availability of noisy and clean image pairs in training datasets. Recent works are progressing towards unsupervised learning due to the lack of clean-noisy image pairs in real-world applications. The training of deep learning models involves optimization of the loss function, which consists of a data fidelity term and a regularizer. Different de-noiser variants are designed by changing the loss function, the number of layers, the training dataset size, the activation functions, and so on.
A. METHODOLOGIES OF DICTIONARY LEARNING MODELS (GAUSSIAN NOISE)
The following sections describe some benchmark machine learning models used for image de-noising. Dictionary learning models achieve sparse representation by updating the dictionary with the training images. A fixed dictionary is limited to a specific type of image, whereas the atoms of a learned basis dictionary are empirically learned for any family of images. The learned dictionary provides more efficient image priors for Bayesian estimation than a fixed dictionary. Gaussian dictionary learning techniques include K-SVD [32], the locally learned dictionary (KLLD) [43], non-local hierarchical learning with wavelets [44], and mean-corrected atoms dictionary learning [45]. The K-SVD algorithm finds the best dictionary for N image samples by solving the following sparsity problem, in which the dictionary D is initialized with l2-normalized columns:

min_{D,α} Σ_i ‖y_i − Dα_i‖²₂ subject to ‖α_i‖₀ ≤ T₀ for all i

In the above equation, T₀ is the number of non-zero entries in the representation vector α. The two iterative steps are the sparse coding stage and the codebook (dictionary) update stage. In the sparse coding step, a pursuit algorithm computes the representation (sparse) vector α_i for each input image y_i by solving

min_{α_i} ‖y_i − Dα_i‖²₂ subject to ‖α_i‖₀ ≤ T₀

The next step is the dictionary update stage. Out of the K columns (atoms) of the dictionary, each atom is updated one at a time. The image examples that use a particular atom are retained, and the rest are discarded. The contribution of the other atoms is subtracted from the representation vectors, and the overall representation error is then minimized by singular value decomposition (SVD) to update the atom. The flowchart of the K-SVD algorithm is given in Fig. 6.
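The dictionary update stage described above can be sketched for a single atom; the helper name `ksvd_atom_update` is my own, and the sparse codes here are random for illustration (a real pipeline would compute them with a pursuit algorithm in the sparse coding stage):

```python
import numpy as np

def ksvd_atom_update(Y, D, A, k):
    """Update atom k of dictionary D (and its codes in A) by a rank-1 SVD of
    the residual restricted to the examples that actually use atom k."""
    users = np.nonzero(A[k, :])[0]          # examples whose code uses atom k
    if users.size == 0:
        return D, A
    # Residual with atom k's contribution removed, on its users only
    E = Y[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                       # best rank-1 fit: new unit-norm atom
    A[k, users] = s[0] * Vt[0, :]           # updated coefficients for its users
    return D, A

rng = np.random.default_rng(0)
Y = rng.normal(size=(16, 50))               # training patches as columns
D = rng.normal(size=(16, 32)); D /= np.linalg.norm(D, axis=0)
A = rng.normal(size=(32, 50)) * (rng.random((32, 50)) < 0.1)  # sparse codes
err0 = np.linalg.norm(Y - D @ A)
for k in range(32):                         # one full dictionary update sweep
    D, A = ksvd_atom_update(Y, D, A, k)
# each rank-1 update can only reduce (or keep) the representation error
```

Because the rank-1 SVD gives the best possible atom-coefficient pair for the retained examples, each update is guaranteed not to increase the overall representation error, which is why K-SVD converges.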
The KLLD algorithm involves clustering, dictionary selection, and coefficient calculation. In the clustering step, local features capture the local structures of the image data. The next step, dictionary selection, is an optimization according to the clustering done in the first step. In the final step, coefficients are calculated for the dictionary atoms, which are linearly combined subject to the kernel weights. Non-local hierarchical dictionary learning is achieved by sparsity and multiresolution analysis of wavelets in each decomposition level. Recently, the K-SVD algorithm was modified based on dictionary learning with mean-corrected atoms, which outperformed K-SVD in terms of PSNR. In [46], the authors propose to break up the noisy image into patches and treat the vectorized version of each patch as a signal, thereby restricting the dimensionality of each atom in the dictionary. The patch size is chosen to capture enough details of the underlying signal, and overlapping patches are used to reduce the blocking artifacts that might result at patch boundaries. By dealing with patches as signals, the K-SVD algorithm can be effectively scaled to de-noise large images. The algorithm is given in Algorithm 5.

Algorithm 4 Image De-Noising for a GAN-Based Image De-Noiser [41], [42]
1. Input noisy image (y) to generator G, p_y(y) being the noise distribution.
2. The generator generates reconstructed image data G(y) with distribution p_g(y).
3. Pass original image data (x) with distribution p_x(x) and reconstructed image data G(y) to the discriminator.
4. Discriminator D outputs the probability of the input belonging to the original data.
5. G and D play a two-player minimax game in an adversarial setup, in which G and D try to minimize and maximize a value function based on the binary cross-entropy function.
6. Training of the GAN: optimize the value function min_G max_D V(G, D).
a) Fix the learning of G: update discriminator parameters (θ_D) by gradient ascent using m data samples and m fake samples. For fixed G, V(G, D) is maximized at D(x) = p_x(x) / (p_x(x) + p_g(x)).
b) Fix the learning of D: update generator parameters (θ_G) by gradient descent using m fake samples; the resulting objective is governed by JS(p_x‖p_g), where JS is the Jensen-Shannon divergence.
7. Training of the GAN ends when JS(p_x‖p_g) becomes 0, i.e., p_x = p_g and min_G max_D V(G, D) = −2 ln 2.

B. METHODOLOGIES OF CNN-BASED MODELS (GAUSSIAN NOISE)
Algorithm 5 De-Noising Algorithm of Sparsity-Based Dictionary Learning Models [46]
1. Input: noisy image y, broken into overlapping patches, with a dictionary D suited to the patches.
2. De-noised image patches Z are obtained with the help of an optimization problem that minimizes the cost function
{Ẑ, α̂_i} = arg min_{Z,α_i} λ‖Z − y‖²₂ + Σ_i ‖Dα_i − R_i Z‖²₂ + Σ_i β_i‖α_i‖₀
3. It is solved in terms of smaller optimization problems, one per patch, where R_i extracts the i-th patch, i.e., z_i = R_i Z, and β is a parameter that depends on the noise variance.
4. The cost function minimizes the error between the restored image and the input noisy image, under the assumption that each patch in the input image can be represented as a sparse linear combination of atoms in the dictionary D.
5. The closed-form solution of the above optimization problem is
Ẑ = (λI + Σ_i R_i^T R_i)^{−1} (λy + Σ_i R_i^T Dα̂_i)
6. Output: Ẑ.

The de-noising CNN (DnCNN) is the benchmark de-noiser used for image restoration tasks like image super-resolution and JPEG image de-blocking, apart from Gaussian image de-noising [38]. The DnCNN model overcomes the disadvantages of the trainable non-linear reaction-diffusion (TNRD) model and the cascade of shrinkage fields [47], which use specific priors based on the analysis model; such priors fail to capture image structures effectively. Moreover, many handcrafted parameters are used during their stage-wise greedy training in combination with joint fine-tuning. The additive noise (v) is combined with the clean image (x) to form the noisy image (y). The DnCNN model uses residual learning with a batch normalization module. In residual learning, the CNN learns the noise component instead of the de-noised image. The residual learning model is given by R(y) ≈ v, and the desired output image is y − R(y). Batch normalization achieves faster training with mini-batch stochastic gradient descent by reducing the internal covariate shift. It is implemented by a normalization and scale-and-shift step before the non-linearity in each layer. In a network of depth l, the first layer is a convolutional layer with ReLU activation, which uses sixty-four filters (3 × 3 × number of image channels) to give sixty-four feature maps as output.
The intermediate layers are repeated units of convolution (sixty-four filters of size 3 × 3 × 64) and ReLU activation with batch normalization. The concluding layer is a convolutional layer that uses the same number of filters as the number of image channels, of size 3 × 3 × 64. The model gives the same de-noising results with both stochastic gradient descent and the Adam algorithm by optimizing the following loss function:

L(θ) = (1/2N) Σ_{i=1}^{N} ‖R(y_i; θ) − (y_i − x_i)‖²_F

where θ denotes the DnCNN parameters, N is the number of images in the training dataset, y and x represent a noisy and a clean image, and R is the residual-learning operator. There are other de-noiser variants whose basic architecture resembles the DnCNN network. Wavelet de-noising CNN (WDnCNN) [48] uses residual learning in the novel feature space of the wavelet domain; the network is trained with four decomposed wavelet sub-bands, and the architecture is the same as that of DnCNN. SCNN [49] is a residual learning-based model that uses a soft shrinkage activation function for varying noise levels of the input image.
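The residual-learning inference rule x̂ = y − R(y) can be illustrated in a few lines. Here R is stubbed out with an oracle mapping (an assumption purely for illustration; in DnCNN, R is the trained network minimizing the loss above):

```python
import numpy as np

def residual_denoise(y, R):
    """DnCNN-style inference: the network predicts the noise, not the image."""
    return y - R(y)

rng = np.random.default_rng(0)
x = np.full((8, 8), 50.0)                 # clean image
v = rng.normal(0, 5, x.shape)             # Gaussian noise component
y = x + v                                 # noisy observation

oracle_R = lambda img: img - x            # perfect residual mapping, R(y) = v
x_hat = residual_denoise(y, oracle_R)
# with the oracle mapping, x_hat recovers x exactly; a trained R only
# approximates v, and the loss L(theta) penalizes R(y) - (y - x)
```

Learning the residual v rather than x itself is easier for deep networks because the residual is closer to zero-mean and small in magnitude, which works well with batch normalization.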
IDCNN [40] is another deep convolutional neural network that follows the same residual learning architecture as DnCNN, without incorporating batch normalization. The network fails to converge with plain stochastic gradient descent because of gradient explosion, so it clips the gradient to a specific pre-defined interval (gradient clipping). It has been observed that network performance improves as the depth of the network increases from four to ten layers. In this model, a non-fixed noise mask is used during the process so that a single model can be used for different noise levels. The loss function of IDCNN is given by

L = (1/2N) Σ_{i=1}^{N} ‖x̂_i − x_i‖²₂

where x and x̂ denote the clean and estimated images, respectively. ECNDNet [50] is a residual learning model that follows the loss function given in equation (13), with the same architecture as DnCNN. The main feature of the ECNDNet network is the use of dilated convolution to increase the receptive field size, which reduces the computational cost and enhances the extraction of more context information. The batch-renormalization de-noising network (BRDNet) also uses residual learning, batch re-normalization, and dilated convolution to address the problem of internal covariate shift and extract more features [51]. The deep iterative down-up CNN (DIDN), the densely connected hierarchical de-noising network (DHDN) [52], and the multi-level wavelet CNN (MWCNN) [53] are based on the UNet [54] architecture, which was designed for semantic segmentation. The DIDN [55] also relies on receptive field size variation to improve de-noising results. It consists of four stages: initial feature extraction, down-up blocks, reconstruction, and enhancement. The initial feature maps are extracted by convolution, followed by iterative up- and down-sampling of the feature maps by the down-up blocks.
The outputs of all the down-up blocks are fed into the reconstruction block, which has convolutional and parametric rectified linear units. The concatenated output of the reconstruction block is fed into an enhancement block with a convolution unit. DHDN network uses modified UNet architecture to learn a large number of parameters, solves vanishing gradient by residual learning and dense connectivity to convolution layers. In MWCNN, multiwavelet transform is integrated into UNet architecture to increase the receptive field size by reducing the resolution of feature maps.
The fast and flexible de-noising convolutional neural network (FFDNet) is the fastest in terms of implementation time, and it can handle spatially variant Gaussian noise [56]. Its unique feature is that, unlike other networks, the mapping function takes a noise level map as part of the input. The noise level map plays a crucial role in keeping the trade-off between noise reduction and detail preservation. Conventionally, the mapping function learns de-noised images from noisy images, CNN parameters, and the Gaussian noise standard deviation; in FFDNet, the CNN parameters are not affected by variation in the Gaussian noise level. It works on downsampled sub-images, which increases the receptive field. The architecture of FFDNet has the same units as DnCNN, i.e., a convolutional operator in the first layer, repeated units of convolution, batch normalization, and ReLU activation, concluded by a convolutional layer. The Adam algorithm [57] is used for training to minimize the following loss function:

$$\ell(\Theta)=\frac{1}{2N}\sum_{i=1}^{N}\left\|F\!\left(y_i,M_i;\Theta\right)-x_i\right\|^{2},$$

where $F$ denotes the FFDNet learning function and $M$ is the noise level map. Models such as NN3D [58] and graph CNN [59] exploit non-local and local similarities through a non-local filter and graph signal processing. NN3D uses a standard pre-trained CNN in cascade with a standard non-local filter. DnCNN, IDCNN, and FFDNet focus on local features with a biased receptive field; NN3D integrates non-local features in a single modular framework to further improve de-noising performance. Similarly, graph CNN also exploits non-local similarities by incorporating a graph convolutional layer, which operates on feature maps to aggregate similar spatially adjacent and spatially distant pixels. Local and non-local pixels are averaged to produce the desired feature map. The universal de-noising network (UNet and UNLNet) is another network that integrates convolution and non-local filtering layers for both gray and color image de-noising [60]. The models PDNN [61], IRCNN [62], and DRUNet [63] integrate the observation model with the discriminative learning of a deep CNN. Model-based methods require several iterative steps to solve the optimization problem, but they can be applied to different image restoration tasks, such as de-blurring, super-resolution, and de-noising, with a single model with the help of an image degradation matrix. They combine the powerful de-noising capability of a CNN and the prior of the observation model in a single modular framework. In [61], [62], model-based optimization is merged with robust image priors via a variable splitting technique, which reduces the number of CNN parameters and improves training efficiency. The observation model is unfolded into discriminative CNN learning, which is composed of multiple de-noiser modules interleaved with back-projection (BP) modules that ensure observation consistency.
DRUNet [63] is an improved version of IRCNN; its methodology plugs a CNN-based deep denoiser prior into a half-quadratic-splitting-based iterative algorithm to solve de-blurring, super-resolution, de-noising, and color image demosaicking.
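FFDNet's noise-level-map input, described earlier, can be illustrated with a minimal numpy sketch: a factor-2 pixel-unshuffle into four sub-images plus a uniform noise map channel (the function name and shapes are illustrative, not the official implementation):

```python
import numpy as np

def ffdnet_input(noisy, sigma):
    """Build an FFDNet-style network input: reversibly downsample the
    H x W noisy image into four H/2 x W/2 sub-images (pixel-unshuffle)
    and append a uniform noise level map M = sigma as a fifth channel,
    so one set of CNN weights can serve any noise level."""
    h, w = noisy.shape
    subs = [noisy[i::2, j::2] for i in range(2) for j in range(2)]
    m = np.full((h // 2, w // 2), sigma)
    return np.stack(subs + [m], axis=0)  # shape (5, H/2, W/2)

x = np.arange(16.0).reshape(4, 4)
inp = ffdnet_input(x, sigma=25 / 255)
```

Because the map is a separate channel, changing the noise level changes only the input tensor, not the trained parameters.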
Recently, the attention-guided de-noising convolutional neural network (ADNet) [64] has outperformed previous CNNs. It is specifically designed to overcome the drawback of increasing network length: as the network gets deeper, the influence of the shallow layers on de-noising performance weakens. ADNet is divided into four major modules: a sparse block (SB), a feature enhancement block (FEB), an attention block (AB), and a reconstruction block (RB). The SB reduces the depth and improves the efficiency of the network with convolution and dilated convolution operators. It is a twelve-layer block with dilated Conv + BN + ReLU in the second, fifth, ninth, and twelfth layers and Conv + BN + ReLU in the remaining layers. The next layers (13th to 16th) form the FEB, which creates robust features by merging global and local features. The first three layers of the FEB are Conv + BN + ReLU, and the fourth layer is Conv. The output of this Conv layer is concatenated with the noisy input to further improve the representation capability, followed by a tanh activation for non-linearity. The AB consists of just one Conv layer, which compresses the features into weights used to modulate the previous layer's output. The RB is the final stage, which incorporates a subtractor for the residual learning process. The architecture of ADNet is given in Fig. 8. A fully convolutional encoder-decoder structure with skip connections has also been used for Gaussian and speckle noise removal [65].
The PReLU (parametric rectified linear unit) and edge-aware CNN de-noiser is one of the latest works and has produced good PSNR results on both BSD-68 and Set-12 compared to other networks [66]. It is an improved DnCNN network with PReLU as the activation function, which also learns the slope in the negative direction. The inclusion of principal component analysis on the feature maps of the sixteenth layer leads to the extraction of more features. The final step cascades the network with an adaptive bilateral edge-aware filter to further refine edge and texture details.
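The PReLU activation that distinguishes this network from ReLU-based designs is simple to state; the slope `a` is a learned parameter in practice, and the value below is just a common initialization:

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: identity for positive inputs and a learned slope
    `a` for negative inputs, so gradients also flow where ReLU would
    output exactly zero."""
    return np.where(x > 0, x, a * x)

y = prelu(np.array([-2.0, 0.0, 3.0]), a=0.1)  # -> [-0.2, 0.0, 3.0]
```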

C. METHODOLOGIES OF GAN-BASED MODELS (GAUSSIAN NOISE)
The GAN given in [67] uses a DenseNet CNN as the generator network to ease the vanishing-gradient problem, and Wasserstein-GAN as the loss function. The generator outputs an estimated ground-truth image from the noisy image, whereas the discriminator eliminates the difference between the generator output and the ground-truth image. The generator follows the DenseNet architecture with eight dense blocks, along with input, output, and bottleneck convolution blocks, and extracts both low-level and high-level features efficiently. The discriminator uses leaky ReLU as the activation function and layer normalization instead of batch normalization. It has eight convolutional layers and two fully connected layers, which assign a probability to generated images and ground-truth images. The value function of the de-noising GAN is given by

$$\min_{G}\max_{D\in\mathcal{D}}\;\mathbb{E}_{x\sim p_{data}(x)}\left[D(x)\right]-\mathbb{E}_{y\sim p_{y}(y)}\left[D\!\left(G(y)\right)\right],$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions. The objective is to approximate $K\cdot W\!\left(p_{data}(x),p_{y}(y)\right)$, in which $K$ is a Lipschitz constant and $W$ is the Wasserstein distance. A gradient penalty term is added so that the gradient of the discriminator does not exceed $K$, given by

$$\lambda\,\mathbb{E}_{\hat{x}\sim p_{\hat{x}}}\!\left[\left(\left\|\nabla_{\hat{x}}D(\hat{x})\right\|_{2}-1\right)^{2}\right].$$

The loss function is the combination of content loss and adversarial loss,

$$l=\lambda l_{GAN}+l_{content},\qquad(18)$$

where the content loss is given by the l1 or l2 norm, and the adversarial loss by the Wasserstein-GAN critic function. The GAN-CNN based blind de-noiser (GCBD) model [42] extracts noise blocks from clean images. Here, the GAN produces noisy blocks instead of de-noised images; the blocks generated by the GAN, together with the extracted noise blocks, are used to create a training dataset for a discriminative-learning-based CNN. The GCBD model is thus a cascade of a GAN followed by a CNN, and it can be used when paired data for the supervised training of a CNN are absent.
It gives promising results for Gaussian noise, mixed noise, and real-world noisy images. The limitation is that noise is taken only as additive white noise with zero-mean.
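The Wasserstein-GAN gradient penalty discussed above can be illustrated with a toy linear critic $D(x) = w \cdot x$, whose input gradient is simply the constant vector $w$; this is a didactic sketch, not any paper's training code:

```python
import numpy as np

def gradient_penalty(w, lam=10.0):
    """WGAN-GP penalty lam * (||grad_x D(x)||_2 - 1)^2 for a toy linear
    critic D(x) = w . x, whose input gradient is the constant vector w.
    The penalty is zero exactly when the critic has unit gradient norm,
    i.e., when it behaves like a 1-Lipschitz function."""
    grad_norm = np.linalg.norm(w)
    return lam * (grad_norm - 1.0) ** 2

# A unit-norm critic incurs no penalty; a steep one is penalized.
ok = gradient_penalty(np.array([1.0, 0.0]))     # 0.0
steep = gradient_penalty(np.array([3.0, 4.0]))  # 10 * (5 - 1)^2 = 160.0
```

For a real network the gradient is not constant, so the penalty is averaged over samples interpolated between real and generated images.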

III. MACHINE LEARNING-BASED IMPULSE DE-NOISERS A. METHODOLOGIES OF DICTIONARY LEARNING MODELS (IMPULSE NOISE)
Wang et al. have proposed an adaptive dictionary-learning-based method to preserve image structure in impulse-contaminated images with the help of a robust l1-norm data-fidelity term that aids impulse noise cancellation [68]. In this algorithm, the restoration problem is mathematically formulated as an l1-l1 minimization objective and solved under the augmented Lagrangian framework through a two-level nested iterative procedure. The algorithm has high image restoration power and produces restored images with high PSNR values. Guo et al. [69] have introduced a novel algorithm that enhances image sparsity to help remove salt-and-pepper noise with fast multiclass dictionary learning; both the sparsity regularization and the robust data fidelity are formulated as l0-l0 norm minimizations for impulse noise removal. Additionally, a numerical algorithm based on modified alternating direction minimization is derived to solve the proposed de-noising model, which excels at preserving image detail. Deka et al. [70] have proposed a novel two-stage de-noising method for removing random-valued impulse noise from an image. In the first stage, an impulse noise detection scheme identifies the pixels likely to be corrupted by impulse noise, viz., noise candidates. In the second stage, the noise candidates are reconstructed by image inpainting based on sparse representation in an iterative manner until convergence is achieved. This algorithm works well in terms of both visual and quantitative aspects.

B. METHODOLOGIES OF CNN-BASED MODELS (IMPULSE NOISE)
Chen et al. have proposed a blind CNN architecture for random-valued impulse noise (RVIN) removal [71]. This improved de-noising mechanism for RVIN suppression works on the principle of flexible noise ratio prediction (NRP), which proved better than DnCNN-based RVIN suppression by eliminating the unnecessary dependence on exact knowledge of the noise ratio. Random patches are selected from the RVIN-corrupted test image, and feature vectors that indicate whether the centre pixel is contaminated are extracted by the predictor. These feature vectors are composed of numerous statistics, viz., multiple rank-ordered absolute differences (ROADs), the clean pixel median deviation (CPMD), and the edge pixel difference (EPD). They are rapidly mapped to noisy/clean labels (1 for noisy, 0 for clean) by the pre-trained noise detector. From the ratio of noisy labels to the total number of selected patches, the predictor provides the noise ratio of the whole image. Given the predicted noise ratio output by the NRP, the DnCNN specifically trained for that noise ratio is exploited for de-noising. Under the guidance of the NRP, the proposed method can handle unknown noise ratios, and it performs well in terms of execution efficiency and image restoration. Turkmen [72] has proposed an artificial neural network for de-noising RVIN-corrupted images by detecting the noisy pixels.
The statistics used to detect the RVIN noisy centres are rank-ordered absolute differences (ROADs), and rank-ordered logarithmic difference (ROLD) values. These are the inputs to the ANN for the detection process. After the detection process is completed, the corrupted pixels are restored by the edge-preserving regularization (EPR) method, allowing edges and noise-free pixels to be preserved. This mechanism works well in the presence of high-density RVIN.
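The ROAD statistic used by these detectors is easy to compute directly; the sketch below handles an interior pixel and assumes the common choice of summing the m = 4 smallest neighbour differences:

```python
import numpy as np

def road(img, i, j, m=4):
    """Rank-Ordered Absolute Differences: the sum of the m smallest absolute
    differences between pixel (i, j) and its eight neighbours. Clean pixels
    in smooth regions score near zero; RVIN-corrupted pixels score high."""
    c = float(img[i, j])
    diffs = [abs(float(img[i + di, j + dj]) - c)
             for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    return sum(sorted(diffs)[:m])

flat = np.full((3, 3), 100.0)  # smooth patch
impulse = flat.copy()
impulse[1, 1] = 255.0          # RVIN-like outlier at the centre
```

Thresholding such scores (or feeding them, together with ROLD and similar statistics, to a learned detector) separates noisy centres from clean ones.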
Li et al. [73] have improved on densely connected convolutional networks (DenseNet) to de-noise images corrupted by impulse noise, using a CNN to learn pixel-distribution features from noisy images. The proposed method, a densely connected network for impulse noise removal (DNINR), captures pixel-level distribution information using wide and transformed network learning. This mechanism shows significantly better results in terms of edge preservation and noise suppression.
Khaw et al. [74] have used an efficient CNN with particle swarm optimization (PSO) for high-density impulse noise removal. This high-density impulse noise detection and removal model mainly consists of two parts: impulse noise removal and impulse noisy pixel detection for restoration. The deep CNN architecture facilitates the de-noising procedure to filter out noise from the noisy images. The PSO algorithm optimizes the threshold values for detecting impulse noisy pixels. The method is robust and works well on both gray and color images in terms of both qualitative and quantitative aspects.
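The PSO step in such a pipeline searches for a detection threshold that minimizes some error criterion. The sketch below is a minimal 1-D particle swarm optimizer with a toy quadratic objective standing in for the actual detection-error criterion; the function names and constants are illustrative only:

```python
import numpy as np

def pso_minimize(f, lo, hi, n=20, iters=100, seed=0):
    """Minimal 1-D particle swarm optimization: particles move under
    inertia plus attraction toward their personal best and the global
    best positions found so far."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n)             # candidate thresholds
    v = np.zeros(n)                        # particle velocities
    pbest = x.copy()                       # personal best positions
    pbest_f = np.array([f(t) for t in x])  # personal best objective values
    gbest = pbest[pbest_f.argmin()]        # global best position
    for _ in range(iters):
        r1, r2 = rng.random(n), rng.random(n)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(t) for t in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()]
    return gbest

# Toy usage: recover a known optimal threshold of 42 on [0, 255].
best = pso_minimize(lambda t: (t - 42.0) ** 2, 0.0, 255.0)
```

In the actual method, the objective would score how well a candidate threshold separates impulse pixels from clean ones on training data.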
RVIN can also be removed with a combination of a classifier and a regression CNN [75]. The classifier network separates noisy and noise-free pixels; the regression network then uses the noise-free pixels along with the original noisy input image to predict the output image. Batch normalization is embedded in both the classifier and regression networks to accelerate de-noising performance. Fig. 9 shows the overall architecture of the impulse-noise removal model. The first step is the extraction of a random patch from the noise-corrupted image, after which a classifier network predicts noisy labels. The noise-contamination determiner thus determines predicted labels (PLs) in the case of Jin et al., and extracts feature vectors (FVs) in the case of Chen and Turkmen. These feature vectors are composed of numerous statistics, viz., ROADs, CPMD, EPD, ROLD, etc. Finally, a de-noiser network de-noises the contaminated image based on the identified noisy centers.

IV. MACHINE LEARNING-BASED POISSON DE-NOISERS
Poisson noise is a special type of noise that is not additive in nature. Unlike Gaussian noise, its strength depends on the image intensity, so it is natural to define the noise power by the maximal value in the image, i.e., its peak value. Poisson de-noisers are therefore described in terms of the peak value as the measure of noise strength.
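The peak-parameterized corruption can be simulated directly; a smaller peak value means fewer expected photon counts per pixel and hence stronger noise (the function name and defaults are illustrative):

```python
import numpy as np

def add_poisson_noise(img, peak, seed=0):
    """Peak-parameterized Poisson corruption: y ~ Poisson(peak * x) / peak
    for x in [0, 1]. The noise is signal-dependent, and smaller peak
    values mean stronger noise."""
    rng = np.random.default_rng(seed)
    return rng.poisson(peak * img) / peak

# Toy usage: the corrupted image stays centred on the clean intensity.
x = np.full((64, 64), 0.5)
noisy = add_poisson_noise(x, peak=30)
```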

A. METHODOLOGIES OF DICTIONARY LEARNING MODELS (POISSON NOISE)
Giryes et al. [76] have proposed a novel method that applies the sparse-representation technique to extracted image patches, adopting the same exponential idea. The proposed algorithm uses greedy pursuit with a bootstrapping-based stopping condition and dictionary learning within the de-noising process. The stopping criterion is novel. The paper effectively migrates from the Gaussian mixture model (GMM) to a dictionary-learning-based model by resolving the difficulties involved in the conversion. The reconstruction performance of the proposed scheme is competitive with leading methods at high SNR and achieves state-of-the-art results at low SNR.
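The greedy pursuit step at the core of such dictionary-learning de-noisers can be sketched with orthogonal matching pursuit; here a fixed sparsity level k replaces the paper's bootstrapping-based stopping rule:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily pick the dictionary atom most
    correlated with the current residual, re-fit all selected coefficients
    by least squares, and repeat k times."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Toy usage: with a trivial orthonormal dictionary, a 2-sparse signal
# is recovered exactly in two greedy steps.
D = np.eye(4)
y = np.array([0.0, 3.0, 0.0, -1.0])
code = omp(D, y, k=2)
```

In a de-noiser, `D` would be a learned over-complete dictionary and `y` a noisy patch; the sparse reconstruction `D @ code` is the de-noised estimate.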

B. METHODOLOGIES OF CNN-BASED MODELS (POISSON NOISE)
Kumwilaisak et al. [77] have proposed a method (CNN+LSTM) based on deep convolutional neural and multi-directional long short-term memory networks to de-noise images corrupted by Poisson noise. The CNN layers extract image features and estimate noise bases in the images, while the multi-directional LSTM layers efficiently memorise the statistics of residual noise components, which possess long-range correlations and are sparse in the spatial domain. The Blahut-Arimoto algorithm is used to numerically derive a distortion-mutual information function for the image de-noising algorithm. The algorithm shows state-of-the-art performance in terms of objective and subjective quality. Su et al. [78] have proposed a novel method to tackle the problems caused by Poisson noise in low-light imaging: a deep multi-scale cross-path concatenation residual network (MC2RNet) that incorporates cross-path concatenation modules for de-noising. MC2RNet learns the residue between the noisy and the latent clean image to facilitate the model training procedure. The method opts for blind Poisson training over discriminative de-noising algorithms, training a single model to handle Poisson noise of different levels, and it shows better performance in terms of peak signal-to-noise ratio and visual effects. Remez et al. [79] have proposed a flexible, data-driven method to de-noise Poisson-corrupted images that reduces the heavy ad hoc engineering load of computational post-processing in contemporary de-noising procedures. They use a powerful deep CNN framework and a training mechanism that trains the same network with images having a specific peak value.
Thus, by using a supervised approach, the representation capabilities of deep CNNs, and a specific class of images for training, the authors present a comparatively simple method that shows state-of-the-art performance both qualitatively and quantitatively and is an order of magnitude faster than other methods. Remez et al. [80] have introduced a methodology that exploits a fully convolutional CNN architecture, using shallow layers to handle local noise statistics and deeper layers to recover edges and enhance textures. The de-noiser is made class-aware by exploiting semantic class information, which boosts performance, enhances textures, and reduces artifacts. The residual-learning-based Gaussian de-noiser (DnCNN) [38], discussed in section 2.2, can also be trained for Poisson noise removal by training on Poisson-corrupted data patches with relevant hyper-parameter settings [77].

V. MACHINE LEARNING-BASED MIXED NOISE DE-NOISERS
The extraction of a clean image from a mixed-noise-corrupted image is a very complex problem because of the high level of non-linearity in the noise distribution. The combination of Gaussian and impulse noise is present in many practical applications. CNN-based transfer learning models, dictionary learning models, and variational mixed-noise models are the machine learning models developed for mixed-noise removal. Comparative analysis of mixed-noise models is difficult, as mixed noise can be modeled in different ways. The expression for a noisy image pixel y obtained by corruption with Gaussian and SPIN noise is given by

$$y=\begin{cases}d_{min}, & \text{with probability } p/2\\ d_{max}, & \text{with probability } p/2\\ x+v, & \text{with probability } 1-p\end{cases}$$

where $d_{min}$ and $d_{max}$ are the minimum and maximum values of the image dynamic range, each occurring with probability $p/2$, and the AWGN $v$ is added with probability $1-p$. Similarly, the expression for a noisy image pixel y obtained by corruption with SPIN, RVIN, and Gaussian noise is given by

$$y=\begin{cases}d_{min}, & \text{with probability } p/2\\ d_{max}, & \text{with probability } p/2\\ d, & \text{with probability } r(1-p)\\ x+v, & \text{with probability } (1-r)(1-p)\end{cases}$$

where $d$ is a random pixel value occurring with probability $r(1-p)$.
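The SPIN-plus-AWGN corruption described above can be simulated directly; the function name and default dynamic range below are illustrative:

```python
import numpy as np

def add_spin_awgn(x, p, sigma, d_min=0.0, d_max=1.0, seed=0):
    """Each pixel becomes d_min or d_max with probability p/2 each (SPIN),
    and x + v with v ~ N(0, sigma^2) otherwise (probability 1 - p)."""
    rng = np.random.default_rng(seed)
    u = rng.random(x.shape)
    y = x + rng.normal(0.0, sigma, x.shape)         # AWGN branch
    y = np.where(u < p / 2, d_min, y)               # pepper
    y = np.where((u >= p / 2) & (u < p), d_max, y)  # salt
    return y

# Toy usage with sigma = 0 so the impulse fraction is easy to inspect.
x = np.full((100, 100), 0.5)
y = add_spin_awgn(x, p=0.2, sigma=0.0)
impulse_frac = np.mean((y == 0.0) | (y == 1.0))
```

Adding a third branch that replaces pixels with a random value `d` with probability r(1 - p) would give the SPIN + RVIN + Gaussian mixture.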

A. METHODOLOGIES OF DICTIONARY LEARNING MODELS (MIXED NOISE)
Dictionary learning models have been designed for mixed Gaussian noise of different standard deviations and for mixed Gaussian-impulse noise [84]. An energy minimization model with a weighted l2-l0 norm is used for mixed noise removal, covering Gaussian-Gaussian mixtures, impulse noise, and Gaussian-impulse noise. It integrates maximum likelihood estimation and sparsity over the learned dictionary, and a modified SVD is used for low-rank approximation. In the recent structured dictionary learning model [87], two structured dictionary learning models are combined: an lp-norm data-fidelity term fits the image patches, and an lq-norm regularizer enforces sparse coding. The authors in [32] propose a novel algorithm to tackle mixed Gaussian noise, the K-SVD algorithm, which generalizes the K-means clustering process to adapt dictionaries for sparse signal representations of a given set of training signals. A dictionary is sought that yields the best representation for each member of the set under strict sparsity constraints.

B. METHODOLOGIES OF CNN-BASED MODELS (MIXED NOISE)
The CNN-based transfer learning, four-stage convolutional filtering model is a mixed-noise de-noiser designed for a mixture of Gaussian and impulse noise [82]. It uses a rank-order filter in the preprocessing step: Cai's filter in the case of Gaussian and SPIN noise, and a combination of the adaptive median filter and the adaptive center-weighted median filter in the case of Gaussian, SPIN, and RVIN. Bilinear interpolation is performed on the rank-order filter output to obtain a slightly smoother version of the noisy image; its purpose is to suppress the high-frequency components introduced by rank-order filtering of the Gaussian noise. This is followed by the four-stage convolutional filtering: the first stage consists of a conv layer and ReLU activation followed by a max-pooling layer, the second and third stages consist of a conv layer and ReLU activation, and the fourth stage is a conv layer. The squared Frobenius norm is used as the loss function, and training is done by back-propagation. Another CNN model for mixed Gaussian and impulse noise has two parts: the first half removes impulse noise and the second half removes Gaussian noise [83]. It consists of an input layer and intermediate layers of convolution, batch normalization, and leaky ReLU, followed by a convolutional output layer; the Gaussian-removal part has a skip connection for residual learning. The CNN model given in [88] has Conv+ReLU+BN as its basic building block and shows the best structural-metric results for both known and unknown noise levels of mixed Gaussian-impulse noise. A CNN is also used as a regularizer in traditional variational methods for mixed noise removal [86]: the mixed noise parameters are iteratively estimated by the variational method, followed by noise classification according to the statistical parameters.
The methodology is implemented by optimizing sub-problems in four steps: regularization, synthesis, parameter estimation, and noise classification.

VI. REAL WORLD-DENOISERS
Xu et al. have constructed a benchmark dataset for de-noising real-world images [89]. The authors used different cameras with different camera settings and evaluated various de-noising methods on the new dataset as well as on previous datasets for proper comparison and analysis. Extensive experimental results demonstrate that methods designed specifically for realistic noise removal, based on sparse or low-rank theories, achieve good de-noising performance and are robust; another observation is that the proposed dataset is more challenging for the state-of-the-art methods. Kim et al. [90] propose a grouped residual dense network (GRDN), an extended and generalized architecture of the state-of-the-art residual dense network (RDN) [91]. The grouped residual dense block (GRDB), built around the core part of RDN, is used as the building module of GRDN, and cascading GRDBs aids de-noising performance significantly. Inspired by GAN modeling, the authors also built their own generator and discriminator for real-world noise modeling. Lin et al. [92] constructed a new dataset to address the low availability of proper datasets, obtained the corresponding ground truth by averaging, and extended the data through noise domain adaptation. They further proposed an attentive generative network that injects visual attention into the generative network. During training, the visual attention map learns the noise regions, so the generative network pays more attention to them, which helps balance noise removal and texture preservation. Extensive experiments show that this method performs well both qualitatively and quantitatively. Chen et al. have proposed a Deep Boosting Framework (DBF) [93] for real-world image de-noising that combines deep learning with the boosting algorithm.
The DBF replaces conventional boosting units with elaborate convolutional neural networks. The outcome is a lightweight Dense Dilated Fusion Network (DDFN) as the boosting unit, which addresses the vanishing-gradient problem caused by cascading networks during training while making efficient use of limited parameters. The method reduces the domain-shift issue with a one-shot domain transfer scheme and is a strong technique for real-world de-noising. Real-world de-noising has been tested and evaluated on different datasets such as DND and NIGHT. DND is a novel benchmark dataset consisting of realistic photos of 50 scenes taken by 4 consumer cameras. The NIGHT dataset is divided into 20 images (denoted NIGHT-A) and another 5 images (denoted NIGHT-B). Another dataset is RID, which has 20 representative scenes captured under different shooting conditions. The problems faced in real-world de-noising are as follows: (1) The noise in real-world noisy images is very complex and cannot be described by simple distributions like Gaussian or Poisson. (2) The inherent practicality of real-world noisy images makes de-noising more difficult than in the synthetic case. (3) The noise distribution may change along the in-camera imaging pipeline [94], which makes the noise distribution in a captured RGB image different from the Gaussian assumption in the RAW space. (4) Domain shift cannot be neglected in the practical scenario: it exists not only between synthetic and real-world noise, but the characteristics of real-world noise also differ across camera settings (viz., sensor or aperture size), shooting conditions (viz., light, environment, and temperature), and imaging pipelines (viz., smartphone vs. professional camera), even under the same ISO values [93]. These problems make real-world image de-noising a difficult and still challenging task.

VII. BLIND IMAGE DE-NOISERS
Noise models are defined for a particular noise type with a known probability distribution function; for example, for Gaussian noise the standard deviation of the noisy image is known and the corresponding de-noised images are calculated. In real-life scenarios, however, the noise can be the combined effect of various sources, and the noise-model input parameters may not be well defined. De-noisers that produce a de-noised image even when the noise level of the input image is not specified are therefore termed blind de-noisers. Such models are trained so that they can handle a wide range of unknown noise levels. Another approach is to estimate the noise level of the input image, but inaccurate noise approximation then yields inaccurate de-noised images. BM3D is a non-learning blind de-noiser that leverages self-similarity by jointly filtering groups of similar image patches.
The DnCNN [38] model, based on residual learning and batch normalization, is a blind Gaussian de-noiser in which a single model is trained with noise levels varying from zero to fifty-five, down-sampled images with different upscaling factors, and JPEG images with multiple quality factors. The same network can therefore be used as a blind Gaussian de-noiser, a JPEG image de-blocker, and for single-image super-resolution. Although FFDNet [56] improves on DnCNN at noise levels fifty and seventy-five on the BSD-68 dataset, it is a non-blind de-noiser because it requires a noise level map at the input. Similarly, [60] involves two sub-networks that are trained separately based on the noise-level choice at inference time, making it inappropriate for blind de-noising. The blind universal image fusion de-noiser [95] is a network that extracts features to learn an image prior and intermediate noise level values, which are fed into the fusion part of the model for final de-noising. The latest de-noisers, such as ADNet [64], BRDNet [51], SCNN [49], and PReLU [42], [66], [78], are designed for blind de-noising, and a blind de-noiser for mixed Gaussian-impulse noise has also been designed [83]. The recent research trend in computer vision is progressing towards the development of a universal, blind de-noiser for real-world de-noising.

VIII. DESCRIPTION OF DATASET AND SOFTWARE TOOLS A. SOFTWARE
The tremendous success of machine learning, and of deep learning in particular, owes much to the parallel computing power of GPUs.

B. DATASETS
Machine-learning-based methods have shown significant progress due to the availability of open-access benchmark datasets. Datasets are available for gray-scale de-noising, color image de-noising, medical image de-noising, and real-world de-noising. The training dataset is used to train the model, whereas the testing dataset images are used to assess the de-noising results. The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are the most commonly used image quality assessment metrics; many other metrics are given in [39]. The performance of de-noisers can be compared when they use a common testing dataset. For Gaussian de-noisers, the Set-12 dataset, comprising twelve scenes, and the BSD-68 dataset (Berkeley Segmentation Dataset), comprising sixty-eight natural images, are commonly used. Kodak-24, LIVE, and McMaster are also used for synthetic de-noising. RENOIR, NAM, DND, SIDD, and Xu are datasets for real-world de-noising [89]. Some of the benchmark datasets are given in [96].
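The PSNR metric used throughout the comparison tables is straightforward to compute; `data_range` is the maximum possible pixel value (255 for 8-bit images):

```python
import numpy as np

def psnr(clean, denoised, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

clean = np.zeros((8, 8))
noisy = clean + 10.0        # constant error, so MSE = 100
value = psnr(clean, noisy)  # 10 * log10(255^2 / 100) ≈ 28.13 dB
```

Higher is better; each extra dB corresponds to roughly a 21% reduction in mean squared error.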

IX. RESULT AND DISCUSSION
Among all machine learning methods, the performance of dictionary learning models is inferior in terms of PSNR. The disadvantages of dictionary learning are the heuristic selection of hyperparameters such as the sparsity level and the numbers of atoms and iterations [97]. It fails to learn invariant features such as translational, rotational, and scale invariance, and it is suited only to low-dimensional signals. Machine learning models have evolved from fully connected neural networks to CNN-based de-noisers. CNNs have various advantages over fully connected networks such as the multi-layer perceptron: spatial information remains intact when the input is multi-dimensional image data, and the number of parameters is reduced by weight sharing, since a fixed-weight kernel is used. The reduction in learning parameters, translational invariance, and locality due to the convolution operation have given CNNs an edge over fully connected models [98]. Most CNN de-noisers require large, application-oriented datasets for supervised learning. The availability of medical image datasets remains challenging, as their annotation requires manual intervention. Moreover, de-noising results become almost stagnant after the network attains a certain depth, and increasing the number of training images brings no significant change. CNN methodologies involve changes in the activation function, network depth, loss function, training dataset, etc. To address these limitations, there has been a gradual shift from discriminative CNN models to the generative learning model, the GAN. It uses two neural networks, a generator and a discriminator: the generator creates plausible images, while the discriminator constantly evaluates the generated images as real or fake. The two networks work in synchronization and act as adversaries to each other; the fundamental design of a GAN is based on indirect training of the generator by the discriminator.
This falls under the category of semi-supervised learning. The training efficiency of a GAN is higher than that of a CNN, as more features are learned in the same number of epochs [99], and GANs achieve better results with fewer training images. TABLE 2 shows de-noising results in terms of PSNR for dictionary learning and CNN-based networks. The models progressed from K-SVD and K-LLD, i.e., from dictionary learning models, to CNN-based models. The DnCNN model is the benchmark residual-learning-based Gaussian de-noiser, which has led to the further development of many de-noisers. The methodologies involve changes in the loss function, increases in receptive field size, changes in the number of layers, integration of transform- and spatial-domain methods with CNNs, and the inclusion of graph theory in CNNs. It can be inferred from TABLE 2 that the PSNR values obtained by different CNN-based methods are very close to each other. However, the ADNet [64] network, with its four modules, suppresses the effect of network length on the shallow layers and gives good PSNR results on the Set-12 dataset. In a CNN, PSNR saturates after an optimum number of layers, implying that further increases in network length do not improve de-noising performance. Apart from ADNet, BRDNet integrates residual learning with batch renormalization and dilated convolutions to enhance de-noising performance, which can be attributed to the increased receptive field size from dilated convolutions and the increased network width from concatenating two networks. It thereby overcomes the disadvantages of previous networks, namely (a) training difficulty and stagnation of results due to increased network length, and (b) mini-batch and internal covariate shift problems.
Further, the PReLU-based edge-aware filter [66] has attained the best PSNR results on both the Set-12 and BSD-68 datasets at different sigma levels. It uses parametric rectified linear units as activation, which overcome the disadvantage of ReLU by learning in the negative direction. Its success can also be attributed to its hybrid methodology, which incorporates principal component analysis and an edge-aware bilateral filter. Moreover, CNNs use supervised learning, which becomes computationally demanding as dataset size grows; therefore, generative adversarial networks are being used. The GCBD model gives promising results even in the absence of supervised training data: its PSNR matches that of DnCNN on the BSD-68 dataset for noise levels 15 and 25. TABLE 3 gives a comparative analysis of machine learning methods on the BSD-68 and Kodak-24 datasets, where no significant difference in the PSNR values of the different networks is observed. The DIDN network, designed with receptive field variation and a modification of the U-net architecture originally developed for semantic segmentation, performs well for color images too, as shown in TABLE 4. The DRUNet [63] network, which plugs a deep-learning CNN image prior into a half-quadratic-splitting-based iterative de-noising algorithm, shows good results on both gray and color images.
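The PReLU activation mentioned above differs from ReLU only in allowing a non-zero slope on the negative side. A minimal NumPy sketch; note that in the actual network `alpha` is a learnable parameter, whereas here it is a fixed scalar for illustration:

```python
import numpy as np

def prelu(x, alpha=0.25):
    # Identity for positive inputs; a small slope for negative inputs,
    # so gradients still flow where plain ReLU would output exactly zero.
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0.0, x, alpha * x)

print(prelu([2.0, -4.0]))  # positive input passes through; -4 becomes -1
```

Setting `alpha=0` recovers ordinary ReLU, which is why PReLU can only match or improve on it during training.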
Impulse de-noisers first predict the pixels affected by noise; this is followed by a noise contamination determiner and post-detection processing. Both dictionary learning and CNN-based models have been designed for impulse noise removal. However, noise ratios vary over a very large range. To overcome the lack of flexibility caused by the unknown severity of contamination, [70] uses a noise ratio predictor (NRP) that rapidly and efficiently measures the severity of corruption, i.e., the noise ratio of the image. Fig. 11 (d) shows that blind CNN achieves a higher PSNR than the ANN [72]. Blind CNN removes the noise while retaining image details, thanks to the NRP: it converts the noise mask into a noise ratio and, according to this ratio, selects the most appropriate CNN model for de-noising, rather than restoring the image by removing each detected RVIN noise pixel one by one.
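The notion of a noise ratio can be illustrated with fixed-valued (salt-and-pepper) impulse noise, which is easier to simulate than the random-valued (RVIN) case handled in [70]. A hedged NumPy sketch; the function names and the 0/255 impulse values are illustrative, not taken from the cited work:

```python
import numpy as np

def add_impulse_noise(img, ratio, rng):
    """Replace a fraction `ratio` of pixels with 0 or 255 impulses
    (fixed-valued impulse noise; RVIN instead draws impulse values
    uniformly over the intensity range, making detection harder).
    Returns the noisy image and the ground-truth corruption mask."""
    noisy = img.copy()
    mask = rng.random(img.shape) < ratio
    noisy[mask] = rng.choice(np.array([0, 255], dtype=img.dtype),
                             size=int(mask.sum()))
    return noisy, mask

def noise_ratio(mask):
    # Given a (detected) noise mask, the noise ratio is the fraction of
    # flagged pixels -- the quantity the NRP in [70] predicts blindly.
    return float(mask.mean())
```

Once the ratio is known, a method can dispatch the image to a de-noiser trained for that corruption level instead of treating every input identically.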
Poisson noise is modeled by its peak value, and its de-noisers are likewise categorized into dictionary learning and CNN-based models. There is just one dictionary learning model, which performs de-noising with a greedy pursuit algorithm and a bootstrapping-based stopping criterion. Gaussian de-noisers such as DnCNN and IRCNN are also used for Poisson de-noising with different parameter settings, although the heuristic setting of network parameters remains a big challenge. TABLE 5 gives Poisson de-noising performance on the Live1 dataset. DenoiseNet is the first residual-learning-based CNN de-noiser designed for Poisson noise. Later, the CNN+LSTM and MC2RNet models outperformed DenoiseNet. The CNN+LSTM Poisson de-noiser, which uses a CNN for feature extraction and LSTM layers to store noise components, outperforms DenoiseNet as given in TABLE 5 and TABLE 6. The inclusion of the Blahut-Arimoto algorithm to determine the number of CNN layers and the learning of residual noise statistics by the LSTM improve the de-noising results of CNN+LSTM. The deep multi-scale cross-path concatenation residual network (MC2RNet), which incorporates cross-path concatenation modules, also outperforms the CNN-based DenoiseNet, as given in TABLE 7 and TABLE 8 on the Set-10 and BSD-68 datasets respectively. Therefore, CNN+LSTM, DenoiseNet and MC2RNet are the available CNN-based Poisson de-noisers, which are few in number compared to Gaussian de-noisers. Mixed noise can be modeled mathematically in different ways; there are models designed for a mixture of impulse and Gaussian noise. The four-stage residual-learning-based mixed network [82] and a de-noiser with a two-stage cascade connection of impulse and Gaussian de-noisers are used for mixed Gaussian-impulse noise.
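The peak-value parameterization of Poisson noise can be sketched in a few lines: the clean image, scaled to [0, 1], is treated as a photon-count rate with maximum expected count `peak`, and lower peaks produce stronger, signal-dependent noise. A minimal NumPy sketch:

```python
import numpy as np

def add_poisson_noise(img, peak, rng):
    """Peak-value Poisson model: scale the clean image in [0, 1] to a
    maximum expected photon count of `peak`, draw Poisson counts, and
    rescale back. The noise variance at a pixel equals its intensity
    divided by `peak`, so corruption is signal-dependent and worst at
    low peaks."""
    img = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    return rng.poisson(img * peak) / float(peak)
```

Low peak values correspond to the short-exposure, low-light conditions mentioned in the introduction, which is why Poisson de-noisers are benchmarked per peak value rather than per Gaussian sigma.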

X. CONCLUSION AND FUTURE SCOPE
In this paper, a comprehensive study and analysis of machine learning models for the removal of different noises is provided. The de-noisers are categorized into dictionary learning models, CNN-based models and GAN-based models, and comparative PSNR results on several benchmark datasets are provided for the reader's better understanding. It has been observed that integrating analytical methods into machine learning models can further improve the results. Although numerous networks have been designed for synthetic datasets, real-world image de-noising is still a challenging problem. GAN-based de-noisers are still at a primitive stage; however, generative-learning-based GANs and deep belief networks can, unlike CNNs, perform unsupervised learning to a certain extent. The future prospects lie in the design of real-world de-noisers with unsupervised learning frameworks for practical applications. Transfer learning, the inclusion of graph theory in neural networks, prior design, and receptive field enhancement are some areas for future research.