Progressive Back-Traced Dehazing Network Based on Multi-Resolution Recurrent Reconstruction

To alleviate the adverse impact of haze on high-level vision tasks, image dehazing has attracted great attention in computer vision research in recent years. Most existing methods fall into two categories: physical prior-based methods and non-physical data-driven methods. However, image dehazing is a challenging, ill-conditioned, and inherently ambiguous problem. Because haze is randomly distributed and its concentration varies, physical prior-based methods often suffer from color distortion and excessive brightness, while non-physical data-driven methods still struggle to recover high-frequency details. To overcome these obstacles, in this paper we propose an effective progressive back-traced dehazing network based on multi-resolution recurrent reconstruction strategies. An irregular multi-scale convolution module is proposed to extract fine-grained local structures, and a multi-resolution residual fusion module is proposed to progressively reconstruct intermediate haze-free images. We compare our method with several popular state-of-the-art methods on the public RESIDE and 2018 NTIRE Dehazing datasets. The experimental results demonstrate that our method restores satisfactory high-frequency textures and high-fidelity colors. The source code and parameters will be released on GitHub for further study.


I. INTRODUCTION
Because small floating particles such as dust and smoke absorb and scatter light, scene visibility in hazy environments is severely degraded, and the captured images inevitably suffer from color distortion, low contrast, and scene attenuation. Since many vision algorithms are designed to work on clear images, high-level vision tasks such as video surveillance, remote sensing, autonomous driving, and object detection do not work well under hazy conditions. To alleviate the adverse impact of haze on these tasks, in this paper we are committed to developing an effective image dehazing algorithm.
The associate editor coordinating the review of this manuscript and approving it for publication was Yuming Fang.
Image dehazing is a challenging, ill-conditioned, and inherently ambiguous problem. Most existing dehazing algorithms are based on the atmospheric scattering model proposed by McCartney and Hall [1]. The physical model is formulated as

$$ I_\lambda(x) = J_\lambda(x)\,t(x) + A_\lambda\bigl(1 - t(x)\bigr) \tag{1} $$
where $x$ denotes an image pixel and $\lambda$ denotes a color channel, e.g., $\lambda \in \{\text{red}, \text{green}, \text{blue}\}$. $I_\lambda(x)$ is the hazy observation, $J_\lambda(x)$ is the haze-free ground truth, $A_\lambda$ is the global atmospheric light, and $t$ is the transmission map, which represents the proportion of light that reaches the camera after attenuation by haze. Under the assumption that haze is uniformly distributed, $t$ is expressed as

$$ t(x) = e^{-\beta d(x)} \tag{2} $$

where $\beta$ is the attenuation coefficient and $d(x)$ is the scene depth.
Generally, image dehazing methods are divided into physical prior-based and non-physical data-driven methods. As an outstanding representative of the former, He et al. [2] proposed a dark channel prior (DCP) method to effectively estimate the transmission map; however, it performs poorly in some regions, such as the sky. By analyzing the difference in color distribution between clear and hazy images, Berman et al. [3] proposed a non-local image dehazing (NLD) method. More recently, with the popularity of convolutional neural networks, Zhang and Patel [4] proposed a densely connected pyramid dehazing network (DCPDN) to jointly estimate $A_\lambda$ and $t$ for the atmospheric scattering model. However, although the atmospheric scattering model is simple and effective, the random distribution and concentration of haze in reality prevent methods based on physical priors from obtaining satisfactory results: color distortion and excessive brightness often occur, as shown in Figure 1(1).
Figure 1: (1) Comparison with algorithms based on the atmospheric scattering model: these algorithms perform poorly in the sky, whereas our proposed network removes haze from the sky better. (2) Comparison with CNN-based algorithms: the other three models recover fewer details than ours.
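The sketch below, in plain Python with illustrative numbers that do not come from the paper, applies Equations (1) and (2) to a single RGB pixel and then inverts Equation (1). It also shows why dehazing is ill-posed: the inversion only works once both $A_\lambda$ and $t(x)$ are known, which is exactly what prior-based methods must estimate. The lower bound `t_min` on the transmission is an assumption added for numerical safety.

```python
import math


def transmission(depth, beta):
    """Equation (2): t(x) = exp(-beta * d(x)), assuming uniformly distributed haze."""
    return math.exp(-beta * depth)


def add_haze(clear_rgb, airlight_rgb, t):
    """Equation (1), applied per color channel: I = J * t + A * (1 - t)."""
    return [j * t + a * (1.0 - t) for j, a in zip(clear_rgb, airlight_rgb)]


def remove_haze(hazy_rgb, airlight_rgb, t, t_min=0.1):
    """Solve Equation (1) for J given A and t; t is clamped to avoid dividing by ~0."""
    t = max(t, t_min)
    return [(i - a) / t + a for i, a in zip(hazy_rgb, airlight_rgb)]


J = [0.2, 0.5, 0.3]                      # illustrative haze-free pixel
A = [0.9, 0.9, 0.9]                      # illustrative global atmospheric light
t = transmission(depth=2.0, beta=1.0)    # e^{-2}, about 0.135
I = add_haze(J, A, t)                    # the hazy pixel drifts toward A
J_rec = remove_haze(I, A, t)             # recovers J because A and t are known
print([round(v, 3) for v in I], [round(v, 3) for v in J_rec])
```

With a small transmission, each hazy channel lies much closer to $A_\lambda$ than to $J_\lambda$, which is why dense haze looks washed out and bright.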
With the popularity of deep learning, end-to-end networks have been employed to regress the final haze-free image. As a pioneering work, Cai et al. [5] proposed DehazeNet to remove haze from hazy images. More recently, the gated fusion network (GFN) [6], the enhanced Pix2pix dehazing network (EPDN) [7], and the gated context aggregation network (GCANet) [8] have been successively proposed to directly restore haze-free images. Although current non-physical data-driven methods have greatly improved dehazing performance, they still have defects in recovering high-frequency details, as shown in Figure 1(2).
In this paper, we focus on overcoming the above-mentioned obstacles. The proposed network is inspired by super-resolution and recurrent residual learning. It adopts a progressive back-traced dehazing strategy and recurrently recovers the haze-free image from low resolution to high resolution. To maximally preserve the content details of the hazy image, an irregular multi-scale convolution module is proposed to extract feature maps at cascaded scales. To maximize feature reuse during the back-traced dehazing process, a residual fusion module is proposed to integrate cascaded convolution features of different resolutions. We have evaluated the proposed network on several public dehazing benchmarks. The experiments demonstrate that, compared with popular state-of-the-art methods, our method satisfactorily recovers high-frequency details while preserving realistic colors.
The contributions of this paper are as follows:
1) We propose an end-to-end progressive back-traced dehazing network based on multi-resolution recurrent reconstruction strategies. Inspired by super-resolution, the network restores satisfactory high-frequency details and high-fidelity colors. To preserve the textures of hazy content, we propose a multi-scale convolution module with irregular kernel shapes that extracts fine-grained local structures for image restoration. For efficient reuse of hierarchical information, we propose a multi-resolution residual fusion module that progressively reconstructs intermediate haze-free images, ensuring that the network dehazes well at different resolutions.
2) We evaluate our method on public dehazing benchmarks. Compared with the latest state-of-the-art methods, our method achieves superior performance.

II. RELATED WORK
Image dehazing is a challenging ill-posed computer vision problem, and various methods have been developed to solve it. Most of them depend on an elegant physical model, the atmospheric scattering model. In recent years, with the rise of deep learning, data-driven end-to-end learning methods have become popular, and experience from image generation has inspired researchers to explore new non-physical solutions. In this section, we review a selection of representative dehazing methods and their most recent developments. More comprehensive surveys can be found in [9], [10].

A. PHYSICAL PRIOR-BASED DEHAZING METHODS
Among prior-based algorithms, the atmospheric scattering model is commonly accepted as the physical basis of haze removal, and great effort has been devoted to estimating the global atmospheric light $A_\lambda$ and the transmission map $t$.
Fattal [11] assumed that the surface reflectance of objects is uncorrelated with the transmission map, and removed haze by estimating scene reflectivity using independent component analysis. He et al. [2] discovered that patches of outdoor haze-free images often have low intensity values in at least one color channel. Based on this observation, they proposed the dark channel prior to estimate haze concentration and the transmission map. Zhu et al. [12] observed that haze concentration is positively correlated with the difference between brightness and saturation. They proposed a color attenuation prior for haze removal, together with a linear model that estimates the depth map from which the transmission map is derived. Berman et al. [3] observed that the colors of a haze-free image can be well approximated by a few hundred distinct color clusters in RGB space, and that these clusters form haze-lines in the hazy case. Based on this observation, they proposed a dehazing model that recovers both the distance map and the haze-free image.
VOLUME 8, 2020
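For reference, the dark channel prior of He et al. [2] can be sketched in a few lines of plain Python. This is a simplified illustration, not their full pipeline: the atmospheric light is assumed to be already known, the window radius of 1 is chosen only for the toy input (He et al. use much larger patches), and their refinement of the transmission map is omitted. The weight ω = 0.95, which deliberately keeps a small amount of haze, follows their paper.

```python
def dark_channel(img, radius):
    """Dark channel: min over color channels, then min over a (2r+1)x(2r+1) window.

    img is an H x W list of (r, g, b) tuples; windows are clipped at the borders.
    """
    h, w = len(img), len(img[0])
    channel_min = [[min(px) for px in row] for row in img]
    return [
        [
            min(
                channel_min[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def dcp_transmission(img, airlight, radius=1, omega=0.95):
    """Coarse transmission estimate t(x) = 1 - omega * dark_channel(I / A)."""
    normalized = [[tuple(c / a for c, a in zip(px, airlight)) for px in row] for row in img]
    dark = dark_channel(normalized, radius)
    return [[1.0 - omega * d for d in row] for row in dark]


A = (0.9, 0.9, 0.9)                                   # assumed already estimated
gray = [[(0.45, 0.45, 0.45)] * 3 for _ in range(3)]   # uniformly hazy 3x3 patch
t_gray = dcp_transmission(gray, A)

dark_pixel = [row[:] for row in gray]
dark_pixel[1][1] = (0.0, 0.4, 0.6)                    # one near-zero channel, typical of clear scenes
t_clear = dcp_transmission(dark_pixel, A)
print(t_gray[0][0], t_clear[0][0])
```

The uniformly gray patch gets a low transmission (heavy haze), while a single dark-channel pixel raises the estimate to 1 across its window, which also illustrates why the prior breaks down on bright regions like the sky that have no dark channel.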

B. PHYSICAL LEARNING-BASED DEHAZING METHODS
With the great success of convolutional neural networks in computer vision, recent dehazing methods learn the transmission map entirely from data to avoid inaccurately estimating physical parameters from a single image.
Cai et al. [5] constructed a convolutional neural network to estimate the transmission map. Ren et al. [13] used a coarse-scale network for holistic transmission map prediction and a fine-scale network for local refinement of the result. Li et al. [14] proposed to learn a newly defined transmission variable that integrates both the classic transmission map and the atmospheric light. Ren et al. [6] proposed a gated fusion network that learns intermediate confidence maps, which are used to fuse different handcrafted feature maps to restore a clear image from the corresponding hazy one.

C. NON-PHYSICAL DATA-DRIVEN DEHAZING METHODS
Because haze concentration is uncertain, the physical scattering model is not always satisfied in practice. Directly learning the nonlinear mapping between a hazy image and its clear ground truth has therefore become a dominant trend, achieving superior performance through large-scale data-driven learning.
Mei et al. [15] proposed an end-to-end encoder-decoder dehazing network that bypasses the atmospheric scattering model entirely. Chen et al. [8] proposed a gated context aggregation network to directly restore haze-free images; it reduces gridding artifacts with a new smoothed dilation technique that adds negligible extra parameters. Xu et al. [16] proposed an encoder-decoder dehazing architecture with skip connections and instance normalization.
Because of the similarities between image dehazing and image generation, many researchers have used generative adversarial networks to generate haze-free images. Zhang and Patel [4] proposed an end-to-end densely connected pyramid dehazing network based on adversarial learning. Qu et al. [7] proposed a generative adversarial network followed by a well-designed enhancer for direct dehazing.

III. OUR PROPOSED DEHAZING NETWORK
It is well known that lower-level features have higher resolution and contain more details but fewer semantics, while higher-level features carry more semantic information but have lower resolution and poorer perception of details. Although many existing state-of-the-art methods employ cascaded convolution blocks to exploit multi-scale information, their haze-removal strategies do not explicitly address the recovery of details in hazy regions. Therefore, to preserve structural details and avoid color distortion on semantic objects, we propose an effective recurrent dehazing network that progressively fuses information from hierarchical resolutions.
In this section, we describe the proposed dehazing network in detail. The idea is inspired by super-resolution and recurrent residual learning. As shown in Figure 2, the network is composed of four components: (1) a multi-scale cascade convolution pipeline; (2) a multi-resolution progressive back-traced dehazing pipeline; (3) intermediate-resolution dehazing output layers; and (4) multiple loss computation layers. For convenience of description, unless explicitly stated otherwise, every convolution in this work is followed by a ReLU activation and uses padding so that it preserves the spatial size of its input feature map.

A. MULTI-SCALE CASCADE CONVOLUTION PIPELINE
The cascade convolution pipeline generates multiple feature maps of different scales. It contains an initial convolution module, successive irregular multi-scale convolution modules, and downsampling modules.
The initial convolution module consists of two convolution layers with 3 × 3 kernels and stride 2. They aggregate informative features from the observed hazy image and increase the number of feature channels to 16.
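Because each of the two layers has stride 2, the initial module reduces the spatial resolution by roughly a factor of 4. The sketch below computes the resulting sizes with the standard convolution output-size formula; a padding of 1 is our assumption, since the text only fixes the 3 × 3 kernel and the stride of 2.

```python
def conv_out_size(n, kernel, stride, padding):
    """Output length along one spatial axis of a 2D convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def initial_module_size(n, layers=2, kernel=3, stride=2, padding=1):
    """Spatial size after the initial module; padding=1 is an assumption."""
    for _ in range(layers):
        n = conv_out_size(n, kernel, stride, padding)
    return n


print(initial_module_size(256))                            # 256 x 256 training crop
print(initial_module_size(460), initial_module_size(620))  # full RESIDE image
```

Under this assumption each layer computes ⌈n/2⌉, so a 256 × 256 training crop becomes 64 × 64 after the initial module, and a 620 × 460 image becomes 155 × 115.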
Because haze concentration varies randomly in real scenes, we propose an irregular multi-scale convolution module, following ideas from ResNet [17] and Inception_v3 [18], to preserve the structural characteristics of semantic objects. The structure of the multi-scale convolution module with irregular kernel shapes is shown in Figure 3. Specifically, two convolutions are first used for feature transformation. Then multiple irregular convolution kernels are applied in parallel to learn fine-grained information that might be smoothed out by haze. Skip connections are adopted for sufficient feature reuse. Because there might be redundancy among features of different scales, a 1 × 1 convolution is further applied to integrate the learned multi-scale features.
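Irregular (non-square) kernels need asymmetric padding to respect the convention stated at the start of Section III that convolutions preserve spatial size. The following Python sketch computes that padding for a set of hypothetical branch kernels; the specific shapes are assumptions borrowed from the Inception_v3 style of factorized 1 × n and n × 1 kernels, since the kernel shapes in Figure 3 are not listed in the text.

```python
def same_padding(kernel_h, kernel_w):
    """Zero padding (rows, cols) that keeps H x W unchanged for a stride-1 odd kernel."""
    if kernel_h % 2 == 0 or kernel_w % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    return kernel_h // 2, kernel_w // 2


def output_shape(h, w, kernel_h, kernel_w):
    """Output (H, W) of a stride-1 convolution padded with same_padding."""
    ph, pw = same_padding(kernel_h, kernel_w)
    return h + 2 * ph - kernel_h + 1, w + 2 * pw - kernel_w + 1


# Hypothetical parallel branches in the Inception-v3 style (1xn and nx1 kernels);
# the kernel shapes actually used in Figure 3 are not given in the text.
branch_kernels = [(1, 3), (3, 1), (1, 5), (5, 1), (3, 3)]
for kh, kw in branch_kernels:
    print((kh, kw), same_padding(kh, kw), output_shape(64, 64, kh, kw))
```

Because every branch keeps the same height and width, the branch outputs can be concatenated along the channel axis and integrated by the 1 × 1 convolution. In PyTorch, such a branch would correspond to `torch.nn.Conv2d(c_in, c_out, kernel_size=(kh, kw), padding=(ph, pw))`.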
We denote the irregular multi-scale convolution module at level $k \in \{1, 2, 3, 4, 5\}$ as ''$M_k$''. In this paper, the number of output channels at level $k$ is set to $16 \times 2^{k-1}$. Later, we conduct ablation studies to verify the effectiveness of the proposed irregular convolution module.
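The channel rule doubles the width at every level, starting from the 16 channels produced by the initial module. In Python it is a one-liner; `level_channels` and its `base` argument are our own illustrative names.

```python
def level_channels(k, base=16):
    """Output channels of module M_k at level k: base * 2^(k - 1)."""
    if not 1 <= k <= 5:
        raise ValueError("levels run from 1 to 5")
    return base * 2 ** (k - 1)


schedule = {k: level_channels(k) for k in range(1, 6)}
print(schedule)  # {1: 16, 2: 32, 3: 64, 4: 128, 5: 256}
```

Since the fused feature $Z_n$ keeps the channel size of $M_n$, the same schedule also gives the width of $Z_n$ at each level.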

B. MULTI-RESOLUTION PROGRESSIVE BACK-TRACED DEHAZING PIPELINE
The back-traced dehazing pipeline progressively restores the haze-free image by recurrently reconstructing intermediate results from low resolution to high resolution.
At the highest level (lowest resolution), the feature maps are transformed by an ''FT'' block to learn an initial dehazing result independently. The ''FT'' block contains a channel attention (CA) block [19] followed by four residual blocks. Their structures are described in Figure 4.
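The internals of the CA block are only cited from [19], not spelled out here. A common formulation of channel attention is squeeze-and-excitation: global average pooling, a two-layer bottleneck with ReLU and a sigmoid gate, then per-channel rescaling. The pure-Python sketch below assumes that formulation; the toy weights are arbitrary, whereas in the network they are learned.

```python
import math


def channel_attention(features, w_down, w_up):
    """Squeeze-and-excitation style channel attention (an assumed design, biases omitted).

    features: C channels, each an H x W list of floats
    w_down:   C_r rows of C weights (reduction layer)
    w_up:     C rows of C_r weights (expansion layer)
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: reduce -> ReLU -> expand -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w_up]
    # Scale: reweight every channel by its gate.
    return [[[v * g for v in r] for r in ch] for ch, g in zip(features, gates)]


feats = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C = 2, H = W = 2
neutral = channel_attention(feats, w_down=[[1.0, 0.0]], w_up=[[0.0], [0.0]])
selective = channel_attention(feats, w_down=[[1.0, 0.0]], w_up=[[10.0], [-10.0]])
```

With zero expansion weights every gate is 0.5 and all channels are halved equally; with the second set of weights the first channel is kept almost intact and the second is nearly suppressed, which is the "different importance per channel" behavior described for the RF block.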
A multi-resolution residual fusion (RF) block is then proposed to recurrently integrate convolutional features of cascaded scales with intermediate dehazed results. The structure of the ''RF'' block is shown in Figure 5. At each level $n$, information from three branches is fused in the RF block. First, the convolutional feature maps of adjacent levels, $M_n$ and $M_{n+1}$, are fused for multi-resolution information reuse. Second, channel attention is applied to $M_n$ for semantic attention; it adaptively assigns different importance to feature channels. Third, the intermediate result $F_{n+1}$ is reused recurrently, so that the higher-level dehazing result is fused into the finer-resolution processing at the lower level. The sum of these three branches is further processed by a residual block, which outputs the integrated feature $Z_n$. The channel size of $Z_n$ is kept the same as that of $M_n$.
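The three-branch fusion can be sketched as follows. This is a simplified, hypothetical illustration rather than the authors' code: it works on single-channel maps, assumes adjacent levels differ by a factor of 2 in resolution, uses nearest-neighbour upsampling for the coarser branches, and omits the channel projection that would be needed because $M_{n+1}$ has twice as many channels as $M_n$.

```python
def upsample2x(m):
    """Nearest-neighbour 2x upsampling of an H x W list of floats."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def add_maps(*maps):
    """Element-wise sum of equally sized H x W maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def rf_fuse(m_n, m_next, f_next, attention, residual_block):
    """Hypothetical single-channel sketch of the RF block at level n.

    m_n:    M_n, features at the current (finer) level
    m_next: M_{n+1}, features one level coarser (assumed half resolution)
    f_next: F_{n+1}, the intermediate result passed back from level n + 1
    """
    branches = (attention(m_n), upsample2x(m_next), upsample2x(f_next))
    return residual_block(add_maps(*branches))  # Z_n, same spatial size as M_n


def identity(x):
    """Stand-in for the learned CA and residual blocks."""
    return x


z = rf_fuse([[1.0, 1.0], [1.0, 1.0]], [[2.0]], [[3.0]], identity, identity)
print(z)  # [[6.0, 6.0], [6.0, 6.0]]
```

Applying `rf_fuse` from level 4 down to level 1, each time feeding the previous level's result back in as `f_next`, gives the coarse-to-fine back-traced recurrence.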

C. INTERMEDIATE-RESOLUTION DEHAZING OUTPUTS
In many dehazing methods, the haze-removal results are either over-smoothed or unclear. We argue that training a dehazing network with a single-scale output cannot sufficiently ensure the preservation of structural details.
Inspired by super-resolution, we adopt a multi-resolution training strategy. Specifically, at level $n \in \{2, 3, 4, 5\}$, a haze-free result $O_n$ is obtained by transforming the feature map $Z_n$ through two successive convolution layers. At the first level, since the channel size of $Z_1$ is relatively small, one convolution layer is sufficient to output its dehazing result. These intermediate dehazing results of different resolutions are all output for training, making our network adaptive to various resolutions.
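The text states that every intermediate output $O_n$ is used for training through multiple loss computation layers, but does not name the loss or say how the ground truth is resized. The sketch below assumes an L1 loss against an average-pooled ground truth with equal weights per level; all three choices are hypothetical.

```python
def avg_pool2x(img):
    """2x2 average pooling of an H x W list with even H and W."""
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, len(img[0]), 2)
        ]
        for y in range(0, len(img), 2)
    ]


def l1_loss(pred, target):
    """Mean absolute error between two equally sized H x W lists."""
    n = len(pred) * len(pred[0])
    return sum(abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt)) / n


def multi_resolution_loss(outputs, gt, weights=None):
    """Sum of per-level L1 losses between each output O_n and a resized ground truth.

    outputs maps level n to the predicted map O_n; each O_n is assumed to be the
    ground-truth size divided by a power of 2, and gt is average-pooled to match.
    """
    weights = weights or {n: 1.0 for n in outputs}
    total = 0.0
    for n, pred in outputs.items():
        target = gt
        while len(target) > len(pred):
            target = avg_pool2x(target)
        total += weights[n] * l1_loss(pred, target)
    return total


gt = [[1.0] * 4 for _ in range(4)]
outputs = {1: [[1.0] * 4 for _ in range(4)], 2: [[0.5] * 2 for _ in range(2)]}
print(multi_resolution_loss(outputs, gt))  # 0.0 + 0.5 = 0.5
```

Supervising every resolution means a coarse output that drifts from the pooled ground truth is penalized even when the finest output happens to be correct.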

IV. EXPERIMENT
To demonstrate the effectiveness of our proposed model, in this section we conduct comprehensive experiments on widely accepted public dehazing datasets. PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) are adopted as evaluation metrics. ADAM [20] is employed as the optimizer. Our experiments are implemented in PyTorch and run on a GTX 1080Ti GPU. Throughout our experiments, the training batch size is set to 22. The source code and pretrained model will be released on GitHub at https://github.com/Joyies/dehaze.
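For completeness, PSNR follows the standard definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$; a minimal Python version is shown below. SSIM involves local means, variances and covariances and is usually taken from a library, for example `skimage.metrics.structural_similarity` in scikit-image; the paper does not say which implementation was used.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).

    pred and target are flat sequences of pixel values on the same scale;
    use max_val=255 for 8-bit images.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and equally sized")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([0.5, 0.5, 0.5], [0.6, 0.6, 0.6]))  # MSE = 0.01, so about 20 dB
```

Because PSNR is logarithmic in the MSE, halving the mean squared error adds about 3 dB.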

A. DATASETS
We conduct our experiments on the RESIDE dataset [10], the O-Haze dataset [21], and the I-Haze dataset [22]. RESIDE [10] is a large-scale synthetic dataset whose images have a spatial resolution of about 620 × 460. In this experiment, we train our network on its Indoor Training Set (ITS) subset and compare performance on its Synthetic Objective Testing Set (SOTS) subset.
The ITS dataset contains 1399 clear images and 13990 hazy images. The clear images are derived from public depth datasets such as NYU2 [23] and Middlebury [24]. For each clear image, 10 synthetic hazy images are generated from its corresponding depth map. Specifically, given a clear image, a random atmospheric light $A_\lambda \in [0.7, 1.0]$ for each channel, and the corresponding depth image $d(x)$, the transmission map is computed as $t(x) = e^{-\beta d(x)}$ and a hazy image is generated through Equation (1). The value of $\beta$ is randomly selected from $[0.6, 1.8]$.
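The ITS synthesis recipe maps directly onto Equations (1) and (2). The Python sketch below generates ten hazy versions of a toy 1 × 2 image; the image, depth values and seed are illustrative, drawing $A_\lambda$ independently per channel follows the description above, and sampling both $A_\lambda$ and $\beta$ uniformly is our assumption, since the text only gives their ranges.

```python
import math
import random


def synthesize_hazy(clear, depth, rng):
    """Generate one hazy image following the ITS recipe described above.

    clear: H x W list of (r, g, b) tuples in [0, 1]
    depth: H x W list of scene depths d(x)
    """
    airlight = [rng.uniform(0.7, 1.0) for _ in range(3)]  # A_lambda, one per channel
    beta = rng.uniform(0.6, 1.8)                          # attenuation coefficient
    hazy = []
    for clear_row, depth_row in zip(clear, depth):
        row = []
        for pixel, d in zip(clear_row, depth_row):
            t = math.exp(-beta * d)                       # Equation (2)
            row.append(tuple(j * t + a * (1.0 - t) for j, a in zip(pixel, airlight)))  # Eq. (1)
        hazy.append(row)
    return hazy, airlight, beta


rng = random.Random(0)                       # fixed seed for reproducibility
clear = [[(0.2, 0.4, 0.6), (0.8, 0.1, 0.3)]]
depth = [[0.0, 3.0]]                         # zero depth keeps the first pixel haze-free
hazy_versions = [synthesize_hazy(clear, depth, rng) for _ in range(10)]  # 10 per clear image
```

At zero depth the transmission is 1 and the pixel is unchanged; at larger depths every channel is pulled toward its sampled atmospheric light.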
The SOTS dataset contains 500 paired indoor synthetic test images and 500 paired outdoor synthetic test images.
I-Haze [22] and O-Haze [21] are dehazing benchmarks with real hazy and haze-free images that were used in the 2018 NTIRE Dehazing Challenge. The I-Haze dataset contains 35 pairs of hazy and corresponding haze-free indoor images. The O-Haze dataset contains 45 different outdoor scenes depicting the same visual content recorded in haze-free and hazy conditions. Unlike most existing dehazing databases, the hazy images in these two datasets were generated using real haze produced by a professional haze machine and captured in a controlled environment. Therefore, the haze-free and hazy images are captured under the same illumination conditions.
We perform data augmentation on these images for training as follows. First, rotations and mirror flips are applied to the images. The rotation angles are R = {0, π/2, π, 3π/2}, and the mirror flips are Mirror = {NoFlip, HorizontalFlip, VerticalFlip}. As a result, 12 variants are obtained for each image. Then, a sliding window with a stride of 128 pixels is used to extract 256 × 256 crops. The resulting patches form the augmented training set.
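The augmentation steps can be sketched in Python on a toy 2 × 2 tile. The sliding-window helper assumes that only windows lying entirely inside the image are kept (no border padding), which the text does not specify. Running the sketch also shows that, because a vertical flip equals a 180° rotation followed by a horizontal flip, only 8 of the 12 rotation/flip combinations are geometrically distinct.

```python
def rot90(m):
    """Rotate a 2D list by 90 degrees counter-clockwise."""
    return [list(r) for r in zip(*m)][::-1]


def hflip(m):
    """Mirror left-right."""
    return [row[::-1] for row in m]


def vflip(m):
    """Mirror top-bottom."""
    return m[::-1]


def augment_variants(m):
    """4 rotations x {NoFlip, HorizontalFlip, VerticalFlip} = 12 variants."""
    out, cur = [], m
    for _ in range(4):  # rotation angles 0, pi/2, pi, 3pi/2
        out.extend([cur, hflip(cur), vflip(cur)])
        cur = rot90(cur)
    return out


def crop_origins(height, width, size=256, stride=128):
    """Top-left corners of size x size windows that fit entirely inside the image."""
    return [
        (y, x)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


tile = [[1, 2], [3, 4]]
vs = augment_variants(tile)
distinct = {tuple(map(tuple, v)) for v in vs}
print(len(vs), len(distinct))        # 12 8
print(len(crop_origins(460, 620)))   # 6 crops for a 620 x 460 image
```

Under the no-padding assumption, a 620 × 460 RESIDE image yields 3 × 2 = 6 crops per variant in either orientation.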

B. EXPERIMENT RESULTS
To verify the effectiveness of our proposed network, we perform comprehensive comparisons with several state-of-the-art image dehazing methods: DCP [2], AOD-Net [14], DehazeNet [5], PFF-Net [15], GFN [6], EPDN [7], and GCANet [8]. Since the experimental setup is standard, for convenience, their performances are cited from their published papers. The quantitative comparison is reported in Table 1, and some visual comparisons on real images are shown in Figure 6.
The experimental results show that our method clearly outperforms the compared methods on the RESIDE-SOTS dataset. The visual comparisons in both Figure 1 and Figure 6 show that our method achieves more satisfactory visual effects, with much clearer results than the other models.
Our model's dehazing results are closer to the real scenes. For example, DCP suffers from severe color distortion. In Figure 6, for the first and second rows, our method removes haze clearly, while the other methods show defects to varying degrees. For the third and fourth rows, the texture details of clouds in the sky are faithfully recovered by our method, whereas the other dehazing results lose some details and some remove haze excessively. In particular, the sky restored by EPDN and GCANet shows severe color deviation from the real ground truth.
GCANet achieves the second-best quantitative performance on the RESIDE dataset in our experiments, and both GCANet and our method improve substantially over the other state-of-the-art methods. Therefore, to better illustrate our model's superiority, we compare it with GCANet more carefully; some examples are shown in Figure 7. The comparison of local textures shows that our method handles details better than GCANet. We attribute this advantage to our multi-scale recurrent dehazing strategy, which is inspired by super-resolution.
On both the I-Haze and O-Haze datasets, our model also achieves relatively good performance, as shown in Table 2 and Table 3. Because very few training images are available and the training data remain insufficient even after augmentation, the improvement margin is smaller than we expected.

C. ABLATION STUDY
We have conducted ablation studies on RESIDE to verify the effectiveness of the proposed irregular multi-scale convolution module and the multi-resolution residual fusion (RF) module. Specifically, we remove the multi-scale irregular convolutions and the channel attention layer, respectively. The degraded modules are shown in Figure 8.
The ablation results are reported in Table 4, and they clearly confirm the effectiveness of the proposed modules.

V. CONCLUSION
In this paper, we have proposed an effective dehazing network inspired by super-resolution and recurrent residual learning. The proposed network progressively generates dehazing results at different resolutions in a back-traced reconstruction pipeline. To adapt to the random concentration and distribution of haze, we have introduced an irregular multi-scale convolution module to extract fine-grained local details smoothed by haze. We have also introduced a multi-resolution residual fusion module for hierarchical feature reuse. The ablation studies confirm their contribution to the final dehazing performance. We have compared our method with several popular and recent state-of-the-art methods on widely accepted public dehazing datasets. The experiments demonstrate that our method is superior and can restore haze-free images with satisfactory high-frequency details and high-fidelity colors.