Fast Single Image Defogging With Robust Sky Detection

Haze, usually caused by atmospheric conditions, is a source of unreliability for computer vision applications in outdoor scenarios. The Dark Channel Prior (DCP) has shown remarkable results in image defogging, but it has three main limitations: 1) high time consumption, 2) artifact generation, and 3) sky-region over-saturation. Recent work has therefore focused on improving processing time without losing restoration quality and on avoiding image artifacts during defogging. Hence, in this research, a novel methodology based on depth approximations through DCP, local Shannon entropy, and the Fast Guided Filter is proposed for reducing artifacts and improving image recovery on sky regions with low computation time. The performance of the proposed method is assessed using more than 500 images from three datasets: the Hybrid Subjective Testing Set from Realistic Single Image Dehazing (HSTS-RESIDE), the Synthetic Objective Testing Set from RESIDE (SOTS-RESIDE), and HazeRD. Experimental results demonstrate that the proposed approach has an outstanding performance over state-of-the-art methods in the reviewed literature, which is validated qualitatively and quantitatively through the Peak Signal-to-Noise Ratio (PSNR), the Naturalness Image Quality Evaluator (NIQE), and the Structural SIMilarity (SSIM) index on retrieved images, considering different visual ranges under distinct illumination and contrast conditions. When analyzing images with various resolutions, the method proposed in this work shows the lowest processing time under similar software and hardware conditions.


I. INTRODUCTION
Fog or haze, usually caused by atmospheric conditions, is a major source of unreliability in outdoor navigation systems, surveillance systems, and other outdoor computer vision applications [1]-[3]. Defogging can be defined as the removal of fog. Defogging algorithms have to deal with a trade-off between restoration quality, under different fog intensities and scenarios [4], [5], and time consumption [1], [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Jiachen Yang.
He et al. [7] proposed the use of the Dark Channel Prior (DCP) in defogging tasks, demonstrating that it is possible to remove fog with remarkable results [7]. Nevertheless, DCP has three inherent limitations: 1) high time consumption, 2) artifact generation, and 3) sky-region over-saturation [8], [9]. Recent research has focused on reducing DCP processing time without losing restoration quality, and on avoiding image artifacts [6], [10]. For instance, Pang et al. [11] used the DCP and the Guided Filter to avoid image artifacts by refining the obtained transmission map. Zhu and He [10] improved the image restoration by minimizing the energy function through a linear attenuation based on saturation and brightness. Nishino et al. [12] employed the maximum posterior probability to compute image depth more accurately. Recently, pixel clustering in the RGB space was used to reduce image artifacts during image defogging, assuming that the colors in a haze-free image are closely approximated by a few hundred distinct colors [13]. Furthermore, some optimization approaches have been proposed for obtaining enhanced dehazing results [14], [15]. For instance, some methods based on Artificial Intelligence (AI) have yielded promising results through the use of a Multilayer Perceptron (MLP) [16]. On the other hand, Ren et al. [17] used a multi-scale Convolutional Neural Network (CNN): a coarse-scale net to predict a transmission map, and a fine-scale net to locally refine the results. Cai et al. [18] proposed DehazeNet, a deep CNN architecture specially designed for image dehazing. Li et al. [19] proposed an end-to-end CNN-based design (AOD-Net) for improving high-level tasks on hazy images. In [20], dehazing was posed as an image-to-image translation problem using a generative adversarial network, named Enhanced Pix2pix Dehazing Network, to generate a haze-free image (without the physical scattering model). Fu et al. 
[21] employed a convolutional network architecture, called multi-feature-based bilinear CNN, to mitigate halo effects, abrupt edges, and image noise. However, despite the progress in AI-based methods, DCP-based research has continued [22]. For example, [8], [9], [23]-[26] focus on reducing the over-saturated areas generated when DCP is applied over sky regions. These works improved DCP computation by adding an image segmentation stage or implementing quadtree techniques, but their main drawback is the relatively long processing time. Furthermore, research efforts have focused on improving the performance measurement of dehazing methods [1], [27]-[29]. The main difference between the method proposed in this work and previous DCP/Fast Guided Filter (FGF) [5], [7], [30], [31] and sky-detection-based methods [8], [9], [23]-[25] is the effective combination of DCP and FGF with local Shannon entropy, resulting in a fast and efficient method with remarkable results on outdoor-image dehazing.
This research aims to recover the latent sharp image from its hazy version, overcoming the DCP limitations. The main contributions of the proposed method are:
• The improved performance of existing dark channel prior (DCP) dehazing algorithms, based on a robust sky-detection-segmentation and the Fast Guided Filter.
• A robust sky-detection-segmentation process, based on DCP and local Shannon entropy.
• Faster computation and competitive performance compared with recent deep-learning-based dehazing algorithms under similar software and hardware conditions.
As a result, the proposed method reduces typical DCP artifacts, achieves a better recovery on sky regions than the techniques in the reviewed literature by minimizing over-saturated areas, and provides a robust response to different levels of haze with an adequate trade-off between restoration quality and computation time.
The remainder of this document is organized as follows: Section II introduces the theoretical background of the involved topics. Section III is devoted to the proposed method and the data used. The obtained results and their corresponding analysis are presented in Section IV. Finally, some conclusions are given in Section V.

A. ATMOSPHERIC DICHROMATIC MODEL
Image degradation by haze is caused by particles in the atmospheric medium that absorb and scatter light [35]. The most accepted model for atmospheric degradation can be expressed as follows [35]:

y_i = x_i t_i + a (1 − t_i),    (1)

where y_i ∈ R^3 represents the RGB foggy or hazy image, x_i ∈ R^3 is the RGB fog-free (sharp) image, t_i ∈ R is the medium transmission, i ∈ N represents the pixel position of each variable, and a ∈ R^3 represents the atmospheric light color. Assuming that the atmospheric scattering is independent of the wavelength, the transmission t_i can be expressed as follows [35]:

t_i = e^{−β d_i},    (2)

where β is a homogeneous attenuation coefficient and d_i is the scene depth at each pixel i. Based on (1), it is possible to retrieve an estimate x̂_i of the fog-free image x_i from the foggy image y_i as follows [35]:

x̂_i = (y_i − a) / max(t_i, t_0) + a,    (3)

where t_0 is a lower bound on the transmission that avoids division by zero. DCP is one of the most widely-used methods for computing the unknown variables t_i and a in order to dehaze the image [7], [36].
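As a quick numeric illustration of (1)-(3), the following minimal Python sketch (assuming RGB values normalized to [0, 1] and a known depth map; the function and variable names are illustrative, not from the original implementation) synthesizes a hazy image and then inverts the model with the true transmission:

```python
import numpy as np

def add_haze(x, d, a, beta=1.0):
    """Forward model (1): y_i = x_i * t_i + a * (1 - t_i),
    with transmission t_i = exp(-beta * d_i) from eq. (2)."""
    t = np.exp(-beta * d)[..., None]      # broadcast transmission over RGB
    return x * t + a * (1.0 - t)

def recover(y, t, a, t0=0.1):
    """Recovery (3): x_hat = (y - a) / max(t, t0) + a."""
    t = np.maximum(t, t0)[..., None]
    return (y - a) / t + a

x = np.full((2, 2, 3), 0.5)               # sharp image
d = np.full((2, 2), 2.0)                  # constant scene depth
a = np.array([0.9, 0.9, 0.9])             # atmospheric light
y = add_haze(x, d, a, beta=0.5)           # hazy version
x_hat = recover(y, np.exp(-0.5 * d), a)   # exact inversion when t > t0
```

With the true transmission, the inversion is exact wherever t_i exceeds the lower bound t_0; in practice t_i must be estimated, which is what the DCP provides.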

B. THE DARK CHANNEL PRIOR
The dark channel of an image y_i is defined as [36]

dc_i = min_{j ∈ Ω_k} ( min_{c ∈ {r,g,b}} y_j^c / a^c ),    (4)

where y_i^c and a^c are the c-th color components of y_i and a, respectively, and Ω_k is a square patch of size s × s centered at pixel k.
According to [36] and [35], it is possible to establish a relation between a fog-free image and its dark channel. For non-sky regions, the dark channel usually has low-intensity values, i.e.,

dc_i → 0.    (5)
From the previously established relation, He et al. [7] proposed that the corresponding transmission t_i can be computed as

t_i = 1 − ω dc_i,    (6)

where 0 ≤ ω ≤ 1 controls the level of the desired restoration. The atmospheric light color a is computed by selecting the brightest pixel of y_i from the subset of pixels corresponding to the 0.1% brightest values of dc_i.
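A straightforward (unoptimized) Python sketch of (4) and (6), assuming a normalized RGB image; the sliding-window minimum is written naively for clarity rather than speed:

```python
import numpy as np

def dark_channel(y, a, s=15):
    """Dark channel of eq. (4): per-pixel min over RGB of y/a,
    followed by a minimum over an s x s patch."""
    norm = np.min(y / a, axis=2)          # min over color channels
    pad = s // 2
    p = np.pad(norm, pad, mode='edge')
    h, w = norm.shape
    dc = np.empty_like(norm)
    for i in range(h):                    # naive sliding-window minimum
        for j in range(w):
            dc[i, j] = p[i:i + s, j:j + s].min()
    return dc

def transmission(dc, omega=0.95):
    """Eq. (6): t_i = 1 - omega * dc_i."""
    return 1.0 - omega * dc
```

For example, a uniform image with value 0.3 and a = [1, 1, 1] has a dark channel of 0.3 everywhere, giving t = 1 − 0.95 × 0.3 = 0.715.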

1) RELATION BETWEEN dc_i AND DEPTH d_i
Substituting (2) in (6), with ω = 1, we can establish the relation between dc_i and d_i as

dc_i = 1 − e^{−β d_i};    (7)

as a consequence, when the distance d_i → 0, dc_i = 0, and when d_i → ∞, dc_i → 1. DCP-based methods usually have two main drawbacks [1]: 1) generation of saturated sky regions, i.e., sky regions with unreal colors, as shown in Fig. 1(c); and 2) a trade-off in the restoration process between retrieved-image quality (number of artifacts produced) and processing time. Fig. 1(d) shows an example of the visual artifacts that can be generated.
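The limiting behavior of (7) can be checked numerically:

```python
import numpy as np

# dc_i = 1 - exp(-beta * d_i): the dark channel is 0 at zero depth
# and tends monotonically to 1 as the scene depth grows.
beta = 1.0
d = np.array([0.0, 1.0, 10.0, 100.0])
dc = 1.0 - np.exp(-beta * d)
```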

C. LOCAL SHANNON ENTROPY
The local Shannon entropy E_k on a square window Ω_k is defined as

E_k = − Σ_{j=0}^{L−1} P_j log_2 P_j,    (8)

where L is the number of possible grey-scale values (L = 256 for an 8-bit grey-scale image), and P_j = n_j / (s × s) is the probability that the grey-scale value j appears in Ω_k, which is an s × s square window centered at pixel k; n_j is the number of pixels with the value j in Ω_k.
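A direct histogram-based sketch of (8) for an 8-bit grey-scale image; border handling via edge padding is an implementation choice, not something specified in the text:

```python
import numpy as np

def local_entropy(img, s=9):
    """Local Shannon entropy of eq. (8): E_k = -sum_j P_j log2 P_j
    over an s x s window centered at each pixel of an 8-bit image."""
    pad = s // 2
    p = np.pad(img, pad, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = p[i:i + s, j:j + s]
            counts = np.bincount(window.ravel(), minlength=256)
            pj = counts[counts > 0] / (s * s)     # P_j for occurring values
            out[i, j] = -np.sum(pj * np.log2(pj))
    return out
```

A perfectly flat window has a single grey level (P_j = 1), so its entropy is 0, which is exactly the behavior the sky-detection stage in Section III-B exploits.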

D. THE FAST GUIDED FILTER
The Fast Guided Filter (FGF) [30] is an edge-preserving linear smoothing filter defined as

q_i = a_k I_i + b_k,  ∀ i ∈ Ω_k,

where q_i is the filtering output image and I_i is the guidance image, i is the position of a pixel, and Ω_k is a local square window of size s × s indexed by k. a_k and b_k are linear coefficients assumed constant in Ω_k. Given the filtering input image p, the filter minimizes the reconstruction error between p and q, which yields

a_k = ( (1/|Ω_k|) Σ_{i ∈ Ω_k} I_i p_i − μ_k p̄_k ) / (σ_k^2 + ε),
b_k = p̄_k − a_k μ_k,

where μ_k and σ_k^2 are the mean and variance of I in Ω_k, p̄_k is the average of p in Ω_k, and ε is a regularization parameter controlling the degree of smoothness.
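A minimal grey-scale guided-filter sketch following the closed-form coefficients above. Note that the actual FGF [30] additionally subsamples the images before computing a_k and b_k to accelerate the filter; that speed-up is omitted here, and the naive box mean is for clarity only:

```python
import numpy as np

def box(x, r):
    """Mean over a (2r+1) x (2r+1) window (naive, edge-padded)."""
    s = 2 * r + 1
    p = np.pad(x, r, mode='edge')
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + s, j:j + s].mean()
    return out

def guided_filter(I, p, r=2, eps=1e-3):
    """q_i = a_k I_i + b_k with a_k = cov(I,p)/(var(I)+eps)
    and b_k = mean(p) - a_k mean(I), averaged over windows."""
    mu_I, mu_p = box(I, r), box(p, r)
    cov_Ip = box(I * p, r) - mu_I * mu_p
    var_I = box(I * I, r) - mu_I * mu_I
    a = cov_Ip / (var_I + eps)
    b = mu_p - a * mu_I
    return box(a, r) * I + box(b, r)   # average coefficients over windows
```

A constant input image is reproduced unchanged (a_k = 0, b_k = const), while edges present in the guidance image I are preserved in the output.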

III. METHODS AND DATA
A. PROPOSED METHOD
The proposed method is based on two assumptions about sky regions in hazy outdoor images: 1) the sky region is at a distance d_i → ∞ from the capture device; thus, the transmission t_i described in (2) takes the value t_i = e^{−∞}, i.e., t_i → 0, and based on (7) it can be concluded that the dark channel dc_i → 1 in sky regions, as shown in Fig. 2; and 2) sky regions are mostly smooth, so their local Shannon entropy E_i → 0 (see Section III-B).
Based on these two assumptions, a new two-stage method that computes an initial and an improved dark channel map is proposed. The aim of the first stage is to compute initial values of dc^0_i, a^0, and the sky region mask s_i. The second stage is devoted to obtaining an improved dark channel map dc^1_i, atmospheric light a^1, and a refined transmission map t^1_i using the sky detection-segmentation. Fig. 3 presents a flowchart of the proposed method, and Fig. 4 provides a stage-to-stage visual example of its application. The detection and segmentation of sky regions are explained in Section III-B. Each stage of the proposed method is detailed as follows:
• Stage 1. From an input image y_i, Fig. 4(a), obtain its dark channel dc^0_i with an initial atmospheric light a^0 and a sky mask s_i as follows:
1) Estimate the atmospheric light a^0 as in [36], Fig. 4(b).
2) Compute the dark channel dc^0_i using a^0 and (4), Fig. 4(c).
3) Detect, segment, and obtain the sky region mask s i using local Shannon entropy and dark channel criteria described in Section III-B, Fig. 4(d).
• Stage 2. Compute an improved atmospheric light a^1 and transmission map t^1_i by using the detected sky region s_i; finally, apply the scattering model as follows:
4) Estimate the atmospheric light a^1 based on the detected sky region s_i, as the average of the pixels in the input image y_i that belong to s_i. If no sky region is detected, a^1 is assigned as a^1 = [1 1 1], as shown in Fig. 4(e).
5) Compute the dark channel dc^1_i using a^1 and (4), as shown in Fig. 4(f).

6) Compute a rough transmission t^0_i based on the dark channel dc^1_i, following (6): t^0_i = 1 − ω dc^1_i. Please refer to Fig. 4(g).
7) Compute a final refined transmission t^1_i by applying the FGF (see Section II-D) to t^0_i. Please refer to Fig. 4(h).
8) Retrieve the restored image x̂_i by applying the scattering model through (3), using the refined transmission map t^1_i and the atmospheric light a^1, as shown in Fig. 4(i).
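Step 4 above can be sketched as follows (illustrative names; the boolean mask s_i comes from the sky detection of Section III-B):

```python
import numpy as np

def atmospheric_light(y, sky_mask):
    """Improved atmospheric light a^1: the mean RGB value of the
    detected sky pixels, with a fallback of [1, 1, 1] when no sky
    region is found."""
    if not sky_mask.any():
        return np.ones(3)
    return y[sky_mask].mean(axis=0)   # boolean-mask indexing -> (n, 3)
```

The [1, 1, 1] fallback corresponds to images without a visible sky, where the method degrades gracefully to a bright-white atmospheric light.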

B. SKY REGION DETECTION-SEGMENTATION
The flowchart of the sky detection-segmentation process is presented in Fig. 5. This process is divided into two stages: detecting and segmenting a baseline sky region, and refining or improving the sky region, as depicted in Fig. 6. The stages of this process are described as follows:
• Stage 1. Detect and segment a baseline sky region.
1) The input image y_i is transformed from its RGB color model into the corresponding CIELab color space ȳ_i in order to obtain accurate gradient information [38].
2) The gradient magnitude G_i is computed over ȳ_i as G_i = sqrt(Gx_i^2 + Gy_i^2), where Gx_i and Gy_i are the responses of the horizontal and vertical Sobel operators, defined by the kernels Gx = [−1 0 1; −2 0 2; −1 0 1] and Gy = Gx^T.
3) The local Shannon entropy E_i is computed over G_i using (8), as shown in Fig. 6(c).

4) The local Shannon entropy map is binarized to obtain Ē_i, assuming that E_i → 0 on sky regions, as shown in Fig. 6(d).
5) The dark channel map is binarized to obtain d̄c^1_i, considering that on sky regions dc^1_i → 1, as depicted in Fig. 6(e) and 6(f).
6) A baseline sky segmentation S_i is obtained by combining Ē_i and d̄c^1_i through the AND logical operator (∧),

S_i = Ē_i ∧ d̄c^1_i,    (19)

as shown in Fig. 6(g).
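Steps 4)-6) amount to two thresholding operations and a logical AND. A sketch with illustrative threshold values (the actual parameters were tuned empirically by the authors):

```python
import numpy as np

def baseline_sky(entropy, dark_channel, e_thr=0.5, dc_thr=0.85):
    """Binarize the entropy map (low on sky) and the dark channel
    (high on sky), then combine them with a logical AND, as in (19).
    The thresholds e_thr and dc_thr are illustrative assumptions."""
    e_bin = entropy < e_thr            # E_i -> 0 on sky regions
    dc_bin = dark_channel > dc_thr     # dc_i -> 1 on sky regions
    return e_bin & dc_bin
```

Only pixels that are simultaneously smooth (low entropy) and heavily hazed (high dark channel) survive into the baseline sky mask S_i.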
• Stage 2. Refine and improve the sky region S_i through morphological operations.
7) In order to obtain the basic structure of S_i, its morphological skeleton sk^0_i is computed, as depicted in Fig. 6(h).
8) Seeds sd_i for a region-growing process are obtained by computing the branch points and endpoints of sk^0_i [39] and combining them through the OR logical operator (∨), as shown in Fig. 6(i).

9) Compute the image e_i by binarizing G_i using an edge threshold, as illustrated in Fig. 6(j).
10) In order to avoid possible discontinuities in e_i, a morphological dilation is applied on e_i using a structural element B, obtaining de_i, as displayed in Fig. 6(k).
12) Region growing [40] is used to compute the accurate sky region. Region growing examines the neighboring pixels of the initial seed points sd_i, stopping when an edge in de_i is found, as depicted in Fig. 6(l), 6(m), and 6(n).
13) False sky regions are removed by applying a morphological opening operation [41], as shown in Fig. 6(o).
The parameters of the proposed method were tuned empirically using the images in the dataset [42]. The code of the proposed method is available online.3
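Step 13 relies on morphological opening (erosion followed by dilation), which removes connected regions smaller than the structuring element. A self-contained numpy sketch (libraries such as scipy.ndimage provide equivalent, faster routines):

```python
import numpy as np

def erode(m, s=3):
    """Binary erosion with an s x s square structuring element."""
    pad = s // 2
    p = np.pad(m, pad, mode='constant', constant_values=False)
    h, w = m.shape
    out = np.zeros_like(m)
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + s, j:j + s].all()
    return out

def dilate(m, s=3):
    """Binary dilation with an s x s square structuring element."""
    pad = s // 2
    p = np.pad(m, pad, mode='constant', constant_values=False)
    h, w = m.shape
    out = np.zeros_like(m)
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + s, j:j + s].any()
    return out

def opening(m, s=3):
    """Opening = erosion then dilation: removes regions smaller than
    the structuring element, e.g. small false sky detections."""
    return dilate(erode(m, s), s)
```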

C. SKY DETECTION-SEGMENTATION STAGE VALIDATION
Since the proposed method aims to diminish haze effects on outdoor images using a sky detection-segmentation stage, it is necessary to measure the performance of this stage; hence, the dataset in [43] was used, which is a 60-image subset, with the corresponding ground truth, of the Caltech Airplanes Side dataset [44]. The Jaccard similarity coefficient, a commonly used metric in the literature [45], was used for measuring the segmentation quality. It is defined as the size of the intersection between two finite sets divided by the size of their union, multiplied by 100:

J(A, B) = |A ∩ B| / |A ∪ B| × 100.

The Jaccard index defines a similarity percentage between 0 and 100. The median Jaccard index reached by the detection-segmentation stage in this work was 96.24%, which validates the applied method. Fig. 7 shows two examples from the used database and the corresponding results obtained through the proposed approach.
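The Jaccard similarity of two binary segmentation masks can be computed directly from the definition above:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity (in percent) between two binary masks:
    |A intersect B| / |A union B| * 100."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 100.0 * inter / union if union else 100.0
```

Identical masks score 100; disjoint non-empty masks score 0.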

D. DATA
In order to have a wide range of test images, the quantitative evaluation of the proposed algorithm is performed using the following datasets, which were also used in recent works from the reviewed literature.

• The Hybrid Subjective Testing Set (HSTS) from the Realistic Single Image Dehazing (RESIDE) dataset, composed of ten real-world images [33].
• The Synthetic Objective Testing Set (SOTS) from RESIDE, from which 500 outdoor images were used [33].
• The HazeRD dataset, composed of 14 real-world images in which haze was simulated with different visual ranges (0.05, 0.1, 0.2, 0.5, and 1 km) [34].
Only outdoor images from the above datasets are used to evaluate the proposed method, allowing an accurate simulation of fog with realistic parameters justified by the scattering theory. The parameter ω from (6) used in the experiments of our method is 0.95.

E. QUANTITATIVE EVALUATION 1) PERFORMANCE METRICS
The full-reference metrics PSNR and SSIM, as well as the no-reference metric NIQE, were used to quantitatively evaluate and compare the performance of the proposed method. These metrics are described as follows:
• The Peak Signal-to-Noise Ratio (PSNR) is a quantitative measure of the quality of a reconstruction, and one of the most widely used metrics in the dehazing literature [46]. The mean square error (MSE) between two m × n monochromatic images I and J is required to obtain the PSNR:

MSE = (1 / (m n)) Σ_{u=1}^{m} Σ_{v=1}^{n} (I_{u,v} − J_{u,v})^2;

hence, the PSNR is given by

PSNR = 10 log_10 ( MAX^2 / MSE ),

where MAX = 2^B − 1 and B is the number of bits used in the image. The higher the PSNR value, the better the restoration.
• The Structural Similarity (SSIM) index is a perceptual image-similarity metric, alternative to the mean square error (MSE) and PSNR, designed to increase correlation with subjective assessment. For an original and a reconstructed image, I and J respectively, SSIM is defined as

SSIM(I, J) = ((2 μ_I μ_J + c_1)(2 σ_IJ + c_2)) / ((μ_I^2 + μ_J^2 + c_1)(σ_I^2 + σ_J^2 + c_2)),

where μ, σ^2, and σ_IJ are the mean, the variance, and the covariance of the images, respectively, and c_1 and c_2 are small constants that stabilize the division.
• The Naturalness Image Quality Evaluator (NIQE) [47] is a no-reference image-quality score based on the construction of ''quality-aware'' features and their fit to a Multivariate Gaussian (MVG) model. The quality-aware features are derived from a Natural Scene Statistics (NSS) model. Quality is expressed as the distance between the MVG fit of the NSS features extracted from the assessed image and the MVG model obtained from a corpus of natural images.
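The two full-reference metrics can be sketched directly from their definitions. Note that the SSIM below is the single-window (global) form; practical implementations average the statistic over local windows, so values differ from library results. The constants c_1 = (0.01·MAX)^2 and c_2 = (0.03·MAX)^2 follow a common convention and are assumptions here:

```python
import numpy as np

def psnr(I, J, max_val=255.0):
    """PSNR = 10 log10(MAX^2 / MSE) between two images."""
    mse = np.mean((np.asarray(I, float) - np.asarray(J, float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(I, J, max_val=255.0):
    """Single-window (global) SSIM; real implementations average this
    statistic over local windows."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    I, J = np.asarray(I, float), np.asarray(J, float)
    mu_i, mu_j = I.mean(), J.mean()
    var_i, var_j = I.var(), J.var()
    cov = ((I - mu_i) * (J - mu_j)).mean()
    return ((2 * mu_i * mu_j + c1) * (2 * cov + c2)) / \
           ((mu_i ** 2 + mu_j ** 2 + c1) * (var_i + var_j + c2))
```

Identical images give SSIM = 1 and an infinite PSNR; a uniform offset of 1 grey level gives MSE = 1 and hence PSNR = 20 log10(255) ≈ 48.1 dB.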

IV. RESULTS AND ANALYSIS
Quantitative and qualitative tests are performed to evaluate and compare the efficiency of the proposed method, using metrics as in [11], [16], [48], [49]. The quantitative evaluation is performed through the commonly used PSNR value and SSIM index to determine the recovered-image quality, as in [5]-[7], [19], [33]. All experiments and tests were performed on a PC with a 2.6 GHz Intel Core i7-6700HQ CPU, an Nvidia GTX 950m GPU, and 16 GB of RAM. All methods except the one proposed by Qu et al. [20] are coded in MATLAB 2018 and run using only the CPU; the Qu et al. [20] method runs in Python 3 with the Caffe framework using the GPU (4 GB RAM).
Fig. 8 shows the 10 real-world images from the HSTS dataset, where column (a) contains the ground-truth images, column (b) the synthetic fogged images, and columns (c) to (k) the results of different defogging algorithms: (c) He et al. [7], (d) Pang et al. [11], (e) Zhu and He [10], (f) Berman et al. [13], (g) Ren et al. [17], (h) Cai et al. [18], (i) Li et al. [19], (j) Salazar-Colores et al. [16], and (k) Qu et al. [20]. Column (l) shows the result of the proposed method. From this figure, it can be observed that algorithms (c), (e), and (g) show unreal colors in the retrieved image, mainly in the sky region; moreover, methods (e), (f), (g), and (i) present some visual artifacts in zones with different depth values.
Fig. 9 shows 5 out of the 500 real-world outdoor images from the SOTS dataset. The ground-truth images are depicted in column (a), the fog-affected images in column (b), and the restoration results in the subsequent columns (c) to (k), following the same order of methods as in Fig. 8. Column (l) again shows the result of the proposed method.

A. QUALITATIVE EVALUATION
These results are consistent with those shown in Fig. 8: similar changes can be seen in the color of the sky regions when methods (c), (e), (f), and (g) are used; furthermore, some artifacts appear when methods (e), (f), (g), and (i) are applied.

B. QUANTITATIVE EVALUATION
As described before, the SSIM index is computed over the restored images shown in Fig. 8, obtaining the corresponding index values given in Table 1.
The results are compared against the nine methods, showing that the proposed method achieves an average SSIM index of 0.9, the second-best among the compared methods; only Qu et al. [20] shows better performance. Note that Cai et al. [18] and Li et al. [19] use deep-learning techniques, which are state-of-the-art methods, whereas the other methods show lower SSIM values. The SSIM results appear to be in good agreement with the visual evaluation presented in Fig. 8 and with the results shown in Fig. 10.
The PSNR computations over all methods using the 10 HSTS images are presented in Table 2. These results reveal that the proposed method has a significant advantage over the other methods; only the method reported by Cai et al. [18] has a similar PSNR value. The obtained experimental results corroborate that the method proposed in this work provides a similar or even better outcome than those in the state-of-the-art, including artificial-intelligence methods, considering the PSNR value and the SSIM index.
The SSIM index was computed over 500 images from SOTS to statistically evaluate and compare the proposed method against the other nine approaches. The results are presented using box-plots in Fig. 10, where each box is divided into quartiles and the red central line represents the median value. By comparing the median values of the SSIM index, it is possible to note that the proposed method is among the best three methods (highest SSIM). By comparing the interquartile ranges (the boxes), it is possible to see that our method returns the most compact interquartile range, meaning that it is robust; additionally, this method can generalize better than the AI-based methods. The points outside the whiskers represent outliers, defined as points beyond 1.5 times the interquartile range above the upper quartile or below the lower quartile. The performance of the classical DCP and FGF approach, considering the SSIM index, is consistently lower than that of the proposed method.
Similarly, the PSNR computation over the 500 images from the SOTS dataset is illustrated through box-plots in Fig. 11. In general, the PSNR values span a wider range than the SSIM index. The proposed method has the best median value, 24.27. Another dataset used for the quantitative evaluation was the HazeRD dataset [34].
The quantitative evaluation is based on the SSIM index and the PSNR over the same outdoor images with five different visual ranges: 0.05 km, 0.10 km, 0.20 km, 0.50 km, and 1 km, where 0.05 km corresponds to the highest fog density and the lowest visual range. Fig. 14 presents an example of an image with different fog intensities. Fig. 12 presents the SSIM index over the five visual ranges and their average, using the HazeRD dataset. The point-up triangle marker (△) represents the maximum SSIM index over all methods at a certain visual range; in contrast, the point-down triangle marker (▽) represents the minimum. Fig. 12 reveals that our method has an outstanding performance in the visual ranges of 0.20 km, 0.50 km, and 1 km, as well as in the average value. These results suggest that the other defogging methods, including the deep-learning techniques (e.g., Cai et al. [18], Li et al. [50], and Ren et al. [17]), have a lower SSIM performance under different illumination and contrast conditions. Although the SSIM performance of our method is not the best at 0.05 km and 0.10 km, it still exhibits a superior SSIM performance to most of the remaining methods.
Average PSNR values for the retrieved images from the HazeRD dataset are shown in Fig. 13 over the five visual ranges: 0.05 km, 0.10 km, 0.20 km, 0.50 km, and 1 km. From this figure, the proposed method shows the best performance, with the highest PSNR value indicated by the point-up triangle marker (△), at 0.20 km, 0.50 km, and 1 km.
Four real-world images (Fig. 15), widely used in the literature, were employed to evaluate the performance of the methods with the no-reference metric NIQE. From the obtained results, the proposed method achieves the second-best outcome, being outperformed only by the Zhu et al. method; however, that approach shows an inferior performance compared with the proposed method considering the PSNR and SSIM metrics. Finally, the AI-based methods perform worse than the classical methods in all cases.

C. PROCESSING TIME
In Table 4, a time-consumption comparison is performed considering all methods at different image resolutions (640 × 480, 800 × 600, 1024 × 768, 1280 × 720, and 1920 × 1080). From this table, it is worth noticing that the proposed method has the lowest processing time among all the approaches run under similar software and hardware conditions, considering all image resolutions. The Qu et al. method [20] shows a faster processing time; however, this could not be considered a fair comparison since that method is implemented in Python and executed using the GPU. Moreover, at the maximum image size considered in this work (1920 × 1080), the Qu et al. method delivers a memory error. Therefore, a very important feature of the proposed method is its capability of being implemented in conventional embedded systems, making it suitable for real-life computer-vision applications that must carry out defogging tasks online.

V. CONCLUSION
Defogging is a significant process in computer vision, which has to consider several factors such as restoration quality, different fog-intensity scenarios (visual ranges), and processing time. DCP is usually employed for this task; however, it suffers from high time consumption, artifact generation, and sky-region over-saturation; hence, recent research has focused on improving these aspects of dehazing methods. This research aimed to recover a sharp image from its hazy version while eluding the DCP limitations. The performance of the proposed method was assessed through a qualitative and quantitative analysis, applying the commonly used SSIM and PSNR metrics over retrieved images from more than 500 pictures of the HSTS, SOTS, and HazeRD databases, and comparing the obtained results against nine recently proposed approaches from the reviewed literature. The proposed method falls within the three techniques with the highest SSIM index; furthermore, it has the highest PSNR median value among all considered defogging approaches. On the other hand, considering different visual ranges, the proposed method shows a superior performance in at least three out of five distinct ranges compared with all the other defogging approaches, under different illumination and contrast conditions. In addition, the proposed method achieved the second-best performance under the no-reference NIQE metric. Finally, the proposed method has the lowest processing time (under similar software and hardware conditions) across different image resolutions, compared with all examined algorithms, which is quite relevant for many computer vision applications.