Locally Adaptive Channel Attention-Based Network for Denoising Images

Channel attention has recently been proposed and shown to greatly improve image classification accuracy. In this paper, we show that channel attention can also greatly help a low-level vision task, image denoising, and propose channel attention-based networks for image denoising. We provide a thorough analysis of the effect of channel attention on image denoising, which shows that channel attention boosts denoising performance by making the network focus on informative channels that are more closely related to noise. We also show that channel attention adapts to image contents and noise, and propose locally adaptive channel attention to further improve image denoising quality. Experimental results show that our denoising network with global channel attention outperforms existing state-of-the-art methods in both blind and non-blind settings, and that our locally adaptive channel attention substantially improves both image quality and computation time.


I. INTRODUCTION
Image denoising is one of the most fundamental problems in computer vision and image processing. A noisy image y is generally modeled as y = x + n, where x is a noise-free image and n is noise, which is often assumed to be additive white Gaussian noise with a standard deviation σ. The goal of image denoising is to infer x from y, which is ill-posed due to the loss of information caused by noise. Image denoising has a wide range of applications such as consumer cameras, medical imaging, and other vision systems that take noisy images as input.
Recently, deep learning-based approaches have shown significant improvement over classical ones [1]-[5]. A key challenge in applying deep learning to denoising is to find an effective network architecture that maximizes denoising performance. As the representational power of a convolutional neural network (CNN) usually increases with its depth, an intuitive solution would be to use more convolutional layers. However, simply stacking more convolutional layers makes learning more difficult and causes over-fitting or performance saturation [6], [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Ziyan Wu.
To resolve this and to achieve higher denoising quality, we propose a channel attention-based network for denoising images (CANDI). Our network adopts channel attention, which was first proposed by Hu et al. [8] to enhance image classification accuracy. Specifically, Hu et al. compute channel-wise weights, or channel attention, from a given feature volume and recalibrate the channels in the feature volume using the weights. In this way, informative features can be emphasized and selectively used for more accurate image classification.
In our work, we adopt channel attention to emphasize informative features that help separate noise from image contents. More specifically, extracting features from a noisy image using a CNN is analogous to extracting different frequency components of an image using a transform such as a wavelet transform. Among different frequencies, image noise is most distinguishable in high-frequency components, so classical denoising methods focus on such components [9]. In the case of a CNN, different feature channels correspond to different frequency components, so we may selectively use informative channels using channel attention to more effectively identify noise.
Based on this motivation, we first present CANDI and examine different network configurations to find an optimal architecture for CANDI (Sec. III). We then conduct an analysis of the effect of channel attention on image denoising with respect to different image contents and noise levels (Sec. IV). Through the analysis, we verify that channel attention does select informative feature channels with high-frequency components related to noise. We also show that channel attention adapts to image contents and different noise levels. Based on the analysis, we also propose a locally adaptive channel attention-based network for denoising images (LACANDI) (Sec. V). Experimental results on natural images with Gaussian noise show that CANDI outperforms most state-of-the-art methods in both blind and non-blind settings, and LACANDI substantially improves image denoising quality as well as computation time (Sec. VI).
Our main contributions are summarized as follows:
• We propose novel channel attention-based networks for image denoising that outperform state-of-the-art methods in both non-blind and blind settings.
• We provide an analysis of the effect of channel attention on image denoising, which shows that channel attention boosts denoising performance in three aspects:
  1) It makes a network focus on informative channels more closely related to noise.
  2) It adapts a network to image contents to faithfully restore a clear image.
  3) It also adapts a network to different noise levels for effective blind denoising.
• We present locally adaptive channel attention for modeling the locally different nature of natural images.

A. IMAGE DENOISING
Early approaches usually use explicit modeling of the characteristics of natural images. Filtering-based approaches such as Gaussian filtering and bilateral filtering [10] assume that nearby pixels have similar values. Total variation [11], [12] assumes that the magnitudes of image gradients follow a Laplacian distribution. Buades et al. [13] proposed to exploit the self-similarity property of natural images. This property has been widely applied in many subsequent works such as [4], [5], [7], [14], [15]. To more faithfully capture the characteristics of natural images, learning-based approaches have been actively studied. Elad and Aharon [16] learn over-complete dictionaries for image denoising. Yang et al. [17] proposed coupled dictionaries learned from high- and low-resolution image patches for single-image super-resolution. Roth and Black [18] proposed the Fields of Experts framework that learns potential functions of Markov random fields. Zoran and Weiss [19] proposed an image prior based on a Gaussian mixture model of natural image patches. However, their performance is limited as they rely on relatively simple models compared to recent deep learning-based approaches.
For the last few years, deep learning has been actively applied to image denoising. Chen and Pock [2] proposed a nonlinear diffusion model called TNRD. Mao et al. [3] introduced a fully convolutional encoding-decoding framework for image denoising and super-resolution. Zhang et al. [1] proposed DnCNN, which adopts residual learning [6] and batch normalization [20]. Yang and Sun [4] presented BM3D-Net, which is inspired by BM3D [14]. Lefkimmiatis [21] developed a non-local operator to exploit self-similarity, and a deep network architecture consisting of several non-local operators. Plötz and Roth [5] introduced a deep learning architecture based on differentiable K-nearest neighbor selection called a neural nearest neighbors block. Liu et al. [7] presented a recurrent network based on non-local recurrent modules to exploit self-similarity. While deep learning-based approaches have shown superior results to classical ones, their performance is limited as they treat useful and less useful features in the same way.

B. CONTENT-ADAPTIVE IMAGE RESTORATION
Our work is also closely related to content-adaptive image restoration techniques, which reflect the different characteristics of different images. As different images may have different characteristics, techniques that adapt to image contents have been proposed. Saquib et al. [22] proposed to estimate parameters of a prior from a noisy image for image restoration. As a single image may have different characteristics in different local areas, locally adaptive approaches have also been proposed. Bishop et al. [23] introduced an image restoration approach that splits an image into regular grid cells and adapts a prior to each cell. Cho et al. [24] proposed to estimate parameters of a prior from local image regions. Sun et al. [25] showed that locally adapted priors can significantly improve the quality of non-blind deconvolution. On the other hand, recent deep learning-based image denoising approaches treat different channels representing different features in a fixed way regardless of image contents. Thus, they can be considered analogous to using one universal model or prior for all images in classical approaches, and bear the same limitations.

C. CHANNEL ATTENTION
Since Hu et al.'s work [8], a few follow-up works that use channel attention have been introduced. Woo et al. [26] proposed a convolutional block attention module that combines channel attention and spatial attention for high-level vision tasks. Regarding low-level vision tasks, Zhang et al. [27] and Cheng et al. [28] recently proposed single image super-resolution approaches that utilize channel attention. However, neither of them targets image denoising. Furthermore, we provide a careful analysis of the effect of channel attention on image denoising, and present locally adaptive channel attention based on it.

III. CHANNEL ATTENTION-BASED NETWORK FOR DENOISING IMAGES
In this section, we first review the channel attention module [8]. We then introduce the architecture of CANDI and explain its training. Finally, we examine different design options for CANDI. For brevity, we denote convolution, batch normalization [20] and rectified linear unit [29] by Conv, BN and ReLU, respectively, in the rest of the paper.
VOLUME 8, 2020
FIGURE 1. A channel attention module [8].

A. CHANNEL ATTENTION MODULE
A channel attention module, or a squeeze-and-excitation block, was recently proposed by Hu et al. [8] for enhancing image classification accuracy. Fig. 1 illustrates the architecture of a channel attention module. Given a feature volume, the module computes channel-wise global statistics using global average pooling, and then computes per-channel weights, or channel attention, ranging from 0 to 1 from the statistics. Each per-channel weight is then multiplied by its corresponding channel of the input feature volume to produce a re-scaled feature volume. In this way, informative channels can be emphasized while less useful ones are suppressed.

B. ARCHITECTURE OF CANDI
Fig. 2 shows the architecture of our denoising network, CANDI. CANDI takes a noisy grayscale image as input and predicts its noise map, which can be subtracted from the input image to produce a denoised result. We adopt the residual learning strategy that predicts a residual map, or a noise map in our case, as the strategy has consistently been shown to outperform direct estimation of a restored image in recent image restoration methods [1], [7].
As shown in Fig. 2, CANDI consists of a series of residual blocks except for the first and last blocks. The first block extracts shallow features from an input image. It consists of a Conv layer with 64 filters of size 3 × 3 × 1 followed by a ReLU layer for non-linearity. In the middle, we have 20 residual blocks that perform denoising in the feature space. Each residual block consists of three Conv+BN+ReLU followed by one Conv and one channel attention module. Each block has a skip connection to ease the training of a deep network. Every Conv layer in the residual blocks has 64 filters of size 3 × 3 × 64. We set the number of channels (C in Fig. 1) in the middle of the channel attention modules as 4 using the reduction rate of 16 suggested by Hu et al. [8]. Finally, the last block reconstructs a noise map from features, and has a single Conv layer of size 3 × 3 × 1.
The network architecture of CANDI is mainly inspired by DnCNN, which is a state-of-the-art CNN-based denoising method [1]. If the skip connections, the Conv layers marked in yellow in Fig. 2, and the channel attention modules are removed, CANDI reduces to DnCNN. The effects of the additional components adopted in CANDI will be examined in Sec. III-D.
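The channel attention module reviewed in Sec. III-A can be illustrated with a minimal NumPy sketch, using the CANDI setting of 64 channels and a reduction rate of 16 (4 hidden units). This is only an illustration and not the authors' PyTorch implementation: the channels-first layout, the bias terms, the random weights, and the function name `channel_attention` are assumptions made for the example.

```python
import numpy as np

def channel_attention(x, w1, b1, w2, b2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) feature volume."""
    # Squeeze: global average pooling gives one statistic per channel, shape (C,).
    s = x.mean(axis=(1, 2))
    # Excitation: bottleneck FC (C -> C/r) with ReLU, then FC (C/r -> C) with sigmoid.
    h = np.maximum(w1 @ s + b1, 0.0)
    a = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # per-channel weights in (0, 1)
    # Re-scale every channel of the input by its attention weight.
    return x * a[:, None, None], a

rng = np.random.default_rng(0)
C, r = 64, 16                                  # 64 channels, reduction rate 16
x = rng.standard_normal((C, 40, 40))           # hypothetical feature volume
w1 = rng.standard_normal((C // r, C)) * 0.1    # hypothetical (untrained) weights
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, a = channel_attention(x, w1, b1, w2, b2)
print(y.shape, a.shape)
```

Because every weight lies strictly between 0 and 1, the module can only attenuate channels relative to one another; the emphasis on informative channels comes from the relative sizes of the weights.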

C. TRAINING
To evaluate the performance of CANDI, we train a few different models including models for known specific noise levels, and a blind model for unknown noise levels. We refer to the models trained for known noise levels as CANDI, and to the blind model as CANDI-B. In the following, we describe how we train both CANDI and CANDI-B.

1) LOSS FUNCTION
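To train CANDI and CANDI-B, we use an L2 loss function. Specifically, given a training dataset $D = \{\ldots, (I^{(i)}, J^{(i)}), \ldots\}$, where $I^{(i)}$ and $J^{(i)}$ are the i-th noisy image and its corresponding ground-truth noise-free image, respectively, we minimize the following loss function:

$$\mathcal{L}(\Theta) = \sum_{i} \left\| f(I^{(i)}; \Theta) - \left(I^{(i)} - J^{(i)}\right) \right\|_2^2$$

up to a constant normalization factor, where $\Theta$ is a set of network parameters, and $f(I^{(i)}; \Theta)$ is the noise predicted by CANDI with parameters $\Theta$.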
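The L2 objective on the predicted noise can be sketched as follows. Since the network predicts a noise map, the regression target is the true noise, i.e., the noisy image minus the clean one. The unnormalized sum and the helper name `l2_noise_loss` are assumptions for illustration; the scaling actually used in training is not stated here.

```python
import numpy as np

def l2_noise_loss(predicted_noise, noisy, clean):
    """Sum of squared errors between predicted noise and true noise (noisy - clean)."""
    target_noise = noisy - clean
    return float(np.sum((predicted_noise - target_noise) ** 2))

# Toy 2x2 example: the true noise is +2 at every pixel.
clean = np.full((2, 2), 100.0)
noisy = clean + 2.0
perfect = l2_noise_loss(np.full((2, 2), 2.0), noisy, clean)   # exact noise prediction
trivial = l2_noise_loss(np.zeros((2, 2)), noisy, clean)       # predicts "no noise"
print(perfect, trivial)
```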

2) TRAINING DATA FOR CANDI
We generate training data following [2]. We use 400 images from the training and test sets of the BSD500 dataset [30], each of which consists of 200 images. We downsample each image by scaling factors of 0.9, 0.8, and 0.7, and obtain four images including the original one. Each image is then randomly cropped to 180 × 180. From each cropped image, we extract patches of size 40 × 40 with a stride of 10 × 10. We augment each patch by random horizontal and vertical flips and random rotations by multiples of 90°, and obtain two augmented versions. Through this process, we generate 476,800 patches. In our experiments, we consider three noise levels: σ = 15, 25, and 50. To train CANDI for each noise level, we add Gaussian noise of each noise level to the generated patches.

3) TRAINING DATA FOR CANDI-B
For CANDI-B, we follow the training strategy for the blind version of DnCNN [1]. We use the same 400 images from the BSD500 dataset, and perform the same procedure except for a couple of steps. First, we extract patches of 50 × 50 instead of 40×40 as done in [1]. Second, for each patch, we randomly sample a noise level σ from a uniform distribution defined on [0, 55], and add Gaussian noise of σ to the patch.
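The per-patch noise sampling used for blind training can be sketched in plain Python. The sketch assumes pixel values on an 8-bit (0 to 255) scale and no clipping after noise is added; the helper name `add_blind_noise` and the flat test patch are hypothetical.

```python
import random

def add_blind_noise(patch, rng, sigma_max=55.0):
    """Add Gaussian noise whose sigma is drawn uniformly from [0, sigma_max] per patch."""
    sigma = rng.uniform(0.0, sigma_max)
    noisy = [[p + rng.gauss(0.0, sigma) for p in row] for row in patch]
    return noisy, sigma

rng = random.Random(0)
patch = [[128.0] * 50 for _ in range(50)]   # hypothetical flat 50x50 patch
noisy, sigma = add_blind_noise(patch, rng)
print(len(noisy), len(noisy[0]), 0.0 <= sigma <= 55.0)
```

Drawing a new σ for every patch exposes the network to the whole range of noise levels during training, which is what allows a single model to be used when the noise level is unknown.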

4) TRAINING SETUP
For both CANDI and CANDI-B, we initialize the weights of all Conv layers by random normal initialization with zero mean and a standard deviation of 0.0005. We use the Adam optimizer [31] with parameters β1 = 0.9, β2 = 0.999, and ε = 10^-8. We set the learning rate to 0.001 and reduce it by half every 30 epochs. We use a mini-batch size of 64, and train the models for 100 epochs. We used PyTorch [32] to implement and train CANDI and CANDI-B. The training of each model takes four days using an Intel Xeon E5-2620 @ 2.0 GHz and an NVIDIA TITAN RTX (24GB).
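The step-wise learning rate schedule described above can be written down directly. The sketch assumes zero-indexed epochs, so that the rate is halved at the start of epochs 30, 60, and 90; the function name `learning_rate` is illustrative.

```python
def learning_rate(epoch, base_lr=1e-3, step=30, factor=0.5):
    """Learning rate at a given (zero-indexed) epoch, halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)

# Rates at a few epochs of the 100-epoch run.
schedule = {e: learning_rate(e) for e in (0, 29, 30, 60, 99)}
print(schedule)
```

Under this assumption, the last ten epochs run at one eighth of the initial rate.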

D. HYPERPARAMETERS AND NETWORK DESIGN
The network architecture is one of the most important factors for the performance of a neural network. In this section, we examine several different design options to find the optimal network architecture for CANDI. As our model is based on DnCNN [1], we begin with the architecture of DnCNN and examine different options one by one. In all the experiments, the performance of different models is evaluated using the Set12 dataset [1] for a noise level σ = 50. All models including DnCNN in this section were trained for 50 epochs using the setting described in Sec. III-C.

1) NUMBERS OF CONV LAYERS IN THE RESIDUAL BLOCK
We first conducted an experiment to find an optimal number of Conv layers in each residual block. Specifically, we prepared variants of CANDI with different numbers of Conv layers ranging from 1 to 5. We set the total number of layers of each variant as either 17 or 18 to compare them with DnCNN with 17 layers. We denote a model with y residual blocks with x Conv layers by CANDI x×y. CANDI x×y has (x × y + 2) Conv layers in total including the ones in the first and last blocks. The number of channel attention modules also varies across different models. For example, CANDI 1×15 has 15 modules while CANDI 5×3 has only three. Fig. 3 depicts a variant of CANDI (CANDI 2×8) tested in this experiment. Table 1 shows the experimental result. Among different versions of CANDI, CANDI 4×4 achieves the highest PSNR. This suggests that simply using more channel attention modules does not improve denoising quality, but the number of modules should be carefully balanced.
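The layer-count rule above can be checked with a few lines of Python. The variants 1×15, 2×8, 4×4, and 5×3 are named in the text; 3×5 is an assumption inferred from the stated constraint of 17 or 18 layers in total. The final model with four Conv layers in each of 20 residual blocks is included for reference.

```python
def total_conv_layers(x, y):
    """Conv layers in a model with y residual blocks of x Conv layers each,
    plus one Conv layer in the first block and one in the last block."""
    return x * y + 2

# (x, y) pairs; (3, 5) is inferred, the others are named in the text.
variants = [(1, 15), (2, 8), (3, 5), (4, 4), (5, 3)]
counts = [total_conv_layers(x, y) for x, y in variants]
final_model = total_conv_layers(4, 20)
print(counts, final_model)
```

Each variant has as many channel attention modules as residual blocks, so the comparison trades the number of attention modules against the number of Conv layers per block at a roughly constant depth.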

2) NETWORK DEPTH
In the next experiment, we fix the number of Conv layers in each residual block as four, and gradually increase the number of residual blocks y to find an optimal depth. Table 2 reports the result. It shows that the performance gradually increases until y reaches 20, and drops when y is 30. A possible reason for the performance drop for y = 30 is overfitting, which may be solved using a larger amount of training data. Based on this result, we fix the number of blocks as 20 in our final model.

3) THE STRUCTURE OF THE RESIDUAL BLOCK
Finally, we test a few different designs for the residual blocks. We compare six different designs shown in Fig. 4. Fig. 4(a) corresponds to a simple extension of DnCNN, which has 82 convolution layers. Fig. 4(b) corresponds to an extension of DnCNN with skip connections, each residual block of which consists of four Conv+BN+ReLU and a skip connection, but has no channel attention modules. Fig. 4(c) has a channel attention module, but no BNs. Fig. 4(d) has BNs as well as a channel attention module. Fig. 4(e) and (f) are obtained by removing ReLU and BN one by one from the last block before the channel attention module of Fig. 4(d). Fig. 4(f) corresponds to our final model presented in the main paper. Using these blocks, we prepared five variants of CANDI, each of which has 20 residual blocks. Table 3 reports the performance of the residual blocks. The simple extension of DnCNN without skip connections (Table 3(a)) did not converge during training, possibly due to the increased difficulty of training, as the network is much deeper than the original DnCNN. [...] performance. Although the reason is unclear, we conjecture that this is because features for image denoising are closely related to intensity values, and non-linear functions such as BN and ReLU can break the relationship between them.

IV. ANALYSIS ON CHANNEL ATTENTION

A. CHANNEL SELECTION
To investigate how channel attention helps image denoising, we first examine what channels are selected by the channel attention modules. To this end, we visualize channel attention weights computed from a noisy image and their corresponding feature channels at different depths (Fig. 5). The input image has Gaussian noise of a noise level σ = 25. We sampled channel attention values from the first and 11th residual blocks.
At the first residual block, features corresponding to the largest channel attention weights are less correlated with the structural content of the image, and show more random and high-frequency patterns. On the other hand, features corresponding to the smallest channel attention weights are more correlated with the structural content of the image. The features at the 11th residual block show a similar tendency too, while the tendency becomes less obvious. This verifies that channel attention emphasizes channels corresponding to high-frequency components closely related to noise.

B. CONTENT-ADAPTIVITY
A channel attention module aggregates information from different spatial locations, which may encode the global context of an input image. Thus, in the next experiment, we investigate whether the channel attention modules reflect the content of an input image to better restore a clean image as done in content-adaptive image restoration techniques [22], [24], [25].
To verify this, we first check whether channel attention modules produce different weights with respect to different image contents. Specifically, we added Gaussian noise of a noise level σ = 25 to an image (Fig. 6(a)). Then, we cropped two sub-images (marked in red and green in Fig. 6(a)) and fed them as well as the original noisy image to CANDI to capture their channel attention weights. We found that most channel attention modules produce almost identical weights regardless of image contents except for one or two modules. Fig. 6(b) visualizes the channel attention weights at the 12th residual block, which produces different weights with respect to image contents. The channel attention weights of the three images are similar overall but clearly differ from one another, showing that they are determined adaptively to image contents.
To more clearly verify the content-adaptivity of channel attention, we examine the effect of adaptively computed channel attention weights. To this end, we applied CANDI to the sub-image marked in red in Fig. 6(a) using three different channel attention weights computed from the red sub-image, from the green sub-image, and from the entire image. Then, we measured their PSNR values. The PSNR values of the original noisy image and its denoising result with different channel attention weights are 20.64, 25.27, 24.81, and 25.04 dB, respectively. The result shows that non-adaptive weights can still remove noise. Among the denoised images, the one obtained using channel attention weights from totally different contents has the lowest PSNR, while the one using channel attention weights from its own content has the highest value. This shows that the content-adaptivity of channel attention can improve denoising quality, analogously to previous content-aware priors [24], [25].
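For reference, PSNR values such as those quoted above are conventionally computed from the mean squared error between a ground-truth image and an estimate. The sketch below uses the standard definition with a peak value of 255, which assumes 8-bit intensities; the exact evaluation settings behind the reported numbers are not specified here.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images given as lists of rows."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf            # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[100.0] * 4 for _ in range(4)]
estimate = [[110.0] * 4 for _ in range(4)]   # constant error of 10 -> MSE = 100
value = psnr(reference, estimate)
print(round(value, 2))
```

On this scale, the gap of roughly 0.2 to 0.5 dB between the three sets of attention weights corresponds to a difference of a few percent in mean squared error.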

C. NOISE-ADAPTIVITY
We also investigate whether channel attention is adaptive to noise, and whether its adaptivity helps handle different levels of noise. To this end, we prepared three images of the same scene with different noise levels (σ = 15, 25, and 50). Then, we fed them into CANDI-B and captured their channel attention weights. Fig. 7 visualizes the weights of the different noise levels at different depths. As shown in the figure, different noise levels produce different channel attention weights, indicating that the channel attention modules are adaptive to noise.
To verify whether the adaptivity of channel attention helps handle different levels of noise, we conducted another experiment using the same three images used earlier. Specifically, in this experiment, we remove noise from the image with σ = 25 using three different channel attention weights computed from the images with noise levels 15, 25, and 50. Fig. 8 shows the results. The PSNRs of the original noisy image with σ = 25, and its denoising results are 20.47, 24.66, 30.22, and 26.67 dB, respectively. Qualitatively speaking, the result using the attention weights of σ = 15 (Fig. 8(b)) has remaining noise, while the result using the attention weights of σ = 50 (Fig. 8(d)) has blurry details. On the other hand, the result using the attention weights of σ = 25 (Fig. 8(c)) has clearly restored details and no remaining noise. This behavior of channel attention is analogous to the denoising strength parameters of traditional denoising algorithms such as the range sigma of the bilateral filter [10], and also shows that channel attention adapts the network to more effectively remove noise with respect to different noise levels.

V. LOCALLY ADAPTIVE CHANNEL ATTENTION-BASED NETWORK FOR DENOISING IMAGES
The analysis in Sec. IV shows that channel attention has a content-adaptive property. While different images have different types of contents, even a single image may have different types of contents in different local areas. However, the channel attention module cannot model this locally varying nature of natural images due to the global average pooling operation. Inspired by this observation, in this section, we develop a locally adaptive channel attention module that allows us to compute spatially different channel attention. A locally adaptive CANDI (LACANDI) is then obtained by simply replacing all channel attention modules in CANDI with locally adaptive channel attention modules.
To compute locally adaptive channel attention, we split an input feature volume into a regular grid, and compute channel attention for each grid cell. To this end, we modify the channel attention module. We first replace the global average pooling by local average pooling that computes the average value for each grid cell. Specifically, for an input feature volume of size $k_w W \times k_h H \times C$, where $k_w$ and $k_h$ are the numbers of cells along the horizontal and vertical axes, respectively, and $W \times H$ is the size of each grid cell, we define a local average pooling operator as a combination of mean filtering and subsampling. Mathematically, the local average pooling operator LAP is defined as:

$$z_c = \mathrm{LAP}(x_c) = D(f * x_c),$$

where $x_c$ is the c-th channel of an input feature volume x, f is a mean filter, $*$ is a convolution operator, and D is a decimation operator that subsamples the input feature value at the center of each grid cell. z is the output feature volume of the local average pooling operator, which has the size of $k_w \times k_h \times C$.
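When the mean filter has the same size as a grid cell, mean filtering followed by sampling at each cell center amounts to taking the mean over each cell. The NumPy sketch below computes it that way. It assumes a channels-first (C, height, width) layout and feature dimensions that are exact multiples of the grid size; the function name `local_average_pooling` is illustrative.

```python
import numpy as np

def local_average_pooling(x, k_h, k_w):
    """Average each channel of x, shape (C, k_h*H, k_w*W), over a k_h x k_w grid of cells."""
    C, height, width = x.shape
    assert height % k_h == 0 and width % k_w == 0, "grid must tile the feature volume"
    H, W = height // k_h, width // k_w
    # Split both spatial axes into (cell index, offset within cell) and average the offsets.
    return x.reshape(C, k_h, H, k_w, W).mean(axis=(2, 4))

# One channel, 4x4 values 0..15, pooled on a 2x2 grid of 2x2 cells.
x = np.arange(16, dtype=float).reshape(1, 4, 4)
z = local_average_pooling(x, k_h=2, k_w=2)
print(z)
```

With a 1 × 1 grid the operator reduces to the global average pooling of the original channel attention module.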
In our experiments, we use a mean filter of size $W \times H$, whose elements are $1/(WH)$, for f. We also replace the fully connected layers in the channel attention module by 1 × 1 Conv layers. Fig. 9 illustrates the network structure of a locally adaptive channel attention module. Using a locally adaptive channel attention module, we obtain a channel attention map of size $k_w \times k_h \times C$. To re-scale the input feature volume, we upsample the map to the size of the input feature volume. For upsampling, we use bilinear interpolation to introduce smooth transitions between grid cells and to avoid tiling artifacts. The upsampled channel attention map is then multiplied by the input feature volume element-wise to obtain a re-scaled feature volume.
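The upsampling and re-scaling step can be sketched in NumPy as follows. The sketch assumes a channels-first layout and a half-pixel-center alignment with clamping at the borders; the alignment convention is not stated in the text, so it is an assumption, as is the function name `upsample_bilinear`.

```python
import numpy as np

def upsample_bilinear(a, out_h, out_w):
    """Bilinearly resize a (C, k_h, k_w) attention map to (C, out_h, out_w)."""
    C, k_h, k_w = a.shape
    # Positions of output pixel centers in grid-cell coordinates, clamped to the grid.
    ys = np.clip((np.arange(out_h) + 0.5) * k_h / out_h - 0.5, 0, k_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * k_w / out_w - 0.5, 0, k_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, k_h - 1)
    x1 = np.minimum(x0 + 1, k_w - 1)
    wy, wx = ys - y0, xs - x0
    # Interpolate along the horizontal axis, then along the vertical axis.
    top = a[:, y0][:, :, x0] * (1 - wx) + a[:, y0][:, :, x1] * wx
    bottom = a[:, y1][:, :, x0] * (1 - wx) + a[:, y1][:, :, x1] * wx
    return top * (1 - wy)[None, :, None] + bottom * wy[None, :, None]

# A 1x2 attention map upsampled to 1x4: values blend smoothly between the two cells.
a = np.array([[[0.0, 1.0]]])
up = upsample_bilinear(a, 1, 4)

# Re-scale a hypothetical feature volume element-wise with the upsampled map.
features = np.ones((1, 1, 4))
rescaled = features * up
print(up)
```

Because bilinear interpolation forms convex combinations of neighboring cells, the upsampled weights stay within the range of the original attention values.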

A. TRAINING
In our experiments, we did not train LACANDI models separately, but simply reused the weights of CANDI models. This is possible because a fully connected layer in a channel attention module is equivalent to a 1 × 1 Conv layer in a locally adaptive channel attention module. Also, as we use small training images of 40 × 40, we can safely assume that the weights of CANDI models are already trained to be adaptive to small local areas. We also introduce a blind version of LACANDI, denoted by LACANDI-B. For LACANDI-B, we reused the weights of CANDI-B.
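The equivalence that makes this weight reuse possible can be checked numerically: a 1 × 1 convolution applies the same weight matrix to the channel vector at every grid cell, which is exactly what a fully connected layer does to a single pooled vector. The weights and sizes below are random, hypothetical values chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, k_h, k_w = 64, 4, 10, 10
weight = rng.standard_normal((c_out, c_in))   # FC weight matrix, reused as a 1x1 kernel
bias = rng.standard_normal(c_out)
z = rng.standard_normal((c_in, k_h, k_w))     # locally pooled statistics, one vector per cell

# 1x1 convolution: the same linear map applied at every grid cell.
conv_out = np.einsum('oc,chw->ohw', weight, z) + bias[:, None, None]

# Fully connected layer applied independently to each cell's channel vector.
fc_out = np.array([[weight @ z[:, i, j] + bias for j in range(k_w)]
                   for i in range(k_h)])      # shape (k_h, k_w, c_out)
fc_out = np.transpose(fc_out, (2, 0, 1))      # back to (c_out, k_h, k_w)

same = bool(np.allclose(conv_out, fc_out))
print(same)
```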

B. GRID SIZE
If we set $k_w = W$ and $k_h = H$ for an input image of size $W \times H$ and use a mean filter of size $W \times H$, we can compute locally adaptive channel attention weights for all pixels. Nonetheless, we empirically found that $k_w = k_h = 10$ works well in most cases. We compared these two options using the BSD68 dataset for a noise level σ = 25. Both LACANDI models with ($k_w = 10$, $k_h = 10$) and ($k_w = W$, $k_h = H$) achieve 29.34 dB, but their computation times are 0.09 seconds and 0.74 seconds, respectively, as the model with ($k_w = W$, $k_h = H$) needs a much larger amount of computation. In the rest of this paper, we use $k_w = k_h = 10$ for both LACANDI and LACANDI-B.

VI. EXPERIMENTS
In this section, we evaluate the performance of our final models: CANDI, LACANDI, CANDI-B, and LACANDI-B. For evaluation, we use three widely used benchmark datasets: BSD68 [33], Set12 [1], and Urban100 [34]. We compare our models with several state-of-the-art denoising methods: BM3D [14], TNRD [2], DnCNN [1], N3Net [5], NLRN [7], DnCNN-B [1], and GCBD [35]. Except for BM3D, all the methods are learning-based. BM3D, N3Net and NLRN exploit non-local self-similarity to effectively handle repeated structures. DnCNN-B is a blind version of DnCNN that shares the same architecture. GCBD is also a blind denoising method, which is based on generative adversarial networks. We refer the readers to our supplementary material for more details on the evaluation in this section and the analysis in Sec. IV.

A. QUANTITATIVE COMPARISON
We first quantitatively compare our models with state-of-the-art non-blind methods in terms of PSNR and structural similarity index (SSIM) [36], which is another widely used measure for image quality assessment. A higher SSIM value means that an image is more similar to the ground truth one. The PSNR and SSIM values of all the other methods are from their papers except for the SSIM values of N3Net, which are not reported in [5]. For the SSIM values of N3Net, we measured them using the trained models provided by the authors. Table 4 shows a quantitative comparison. It shows that LACANDI outperforms CANDI in all cases in terms of both PSNR and SSIM, which validates the effectiveness of the locally adaptive channel attention modules and their content-adaptivity. It also shows that CANDI and LACANDI outperform the other methods in most cases. Even LACANDI-B, a blind version of LACANDI, shows similar performance to N3Net, which is non-blind, on the Set12 and BSD68 datasets thanks to its noise-adaptivity. NLRN, which is another state-of-the-art method, is not included in this comparison because it is orders of magnitude slower than our models, as will be discussed later. Table 5 shows a quantitative comparison against state-of-the-art blind denoising methods. It shows that LACANDI-B outperforms all the other methods by a large margin.

B. QUALITATIVE COMPARISON
Fig. 11 shows a qualitative comparison against state-of-the-art methods. The figure shows that our models produce fewer artifacts than the other ones, especially on the first, third and last rows. The second and third rows show that our models preserve more details than the others. More examples can be found in the supplementary material.

C. COMPUTATION TIME
Finally, we compare the computation times of state-of-the-art methods and ours. The computation times were measured using the authors' code in the same environment as the training environment (an Intel Xeon E5-2620 @ 2.0 GHz and an NVIDIA TITAN RTX). Table 6 reports the average computation times on the BSD68 dataset, which has images of either 321 × 481 or 481 × 321. Both our models take about 0.1 seconds to denoise a single image, which shows that our models can be used in practical applications. Compared to N3Net and BM3D, our models are an order of magnitude faster. Compared to NLRN, LACANDI is more than 700 times faster. N3Net, BM3D, and NLRN perform feature matching to exploit non-local self-similarity, so they require relatively large computation times. In particular, NLRN repeatedly performs feature matching, which causes a significant amount of computation. Interestingly, LACANDI is faster than CANDI even though LACANDI computes a larger number of channel attention weights. This is because local average pooling is more GPU-friendly than global average pooling. Fig. 10 visually compares the computation times and denoising qualities of different methods measured on the BSD68 dataset for a noise level σ = 25. The figure shows that our methods outperform all the other methods except for NLRN in terms of both PSNR and SSIM, although they require relatively small amounts of computation. In terms of PSNR, our methods are worse than NLRN. Specifically, the PSNR and SSIM of NLRN are 29.41 dB and 0.8331, respectively, while those of LACANDI are 29.34 dB and 0.8338. However, ours are orders of magnitude faster.

VII. CONCLUSION
In this paper, we proposed CANDI, a novel channel attention-based network for denoising images. Then, we analyzed the effect of channel attention on image denoising, and showed that channel attention has an adaptive nature to image contents and noise. Based on this, we proposed a locally adaptive channel attention module and an image denoising network, LACANDI, based on it. Experimental results showed that both CANDI and LACANDI and their blind versions outperform state-of-the-art methods.
We believe that locally adaptive channel attention can also benefit other problems such as super-resolution, deblurring, and high-level vision problems such as segmentation. The performance of image denoising may depend on the complexity of image contents, and an analysis on this may help design a more effective network structure for denoising. LACANDI splits an image into a uniform grid, which may hinder fully exploiting locally different characteristics of natural images. To resolve this, we may adopt semantic segmentation. Exploring such possibilities would be an interesting future direction.