Single Image Dehazing With Unsharp Masking and Color Gamut Expansion

Image dehazing is a fundamental problem in computer vision and has attracted a prodigious amount of study. Recently, with the well-recognized success of deep learning, the field has been dominated by deep dehazing models. However, deep learning is not always a panacea, especially for practical image dehazing, because high computational complexity, expensive maintenance costs, and high carbon emissions are three noticeable problems. Computational efficiency is therefore a decisive factor in real-world circumstances. To meet this growing demand, we propose a linear-time algorithm built from three primitive parts: unsharp masking (pre-processing), dehazing, and color gamut expansion (post-processing). The first enhances sharpness according to the local variance of image intensities, the second removes haze based on the improved color attenuation prior, and the third addresses the residual effect of color gamut reduction. Extensive experimental results demonstrate that the proposed method performs comparably with popular benchmarks, notably deep dehazing models, while remaining fast and efficient, favoring real-world computer vision systems.

The total irradiance E_t(d, λ) is the sum of E_a(d, λ) and the directly transmitted irradiance. Similarly, J(x), A, and t(x) can substitute for λS_0 F_λ, λS_0, and exp[−β_sc d(x)], respectively; these three are referred to as the haze-free intensities, the atmospheric light, and the transmittance. The following simplified form of (6) therefore commonly serves as the degradation model in image dehazing:

I(x) = J(x)t(x) + A[1 − t(x)].    (7)

In (7), the hazy intensities I captured by the camera are the only data available, whereas the remainder is unknown. Accordingly, for any given pair of estimates of A and t, (7) yields a different solution for J, violating the uniqueness condition. This issue renders image dehazing ill-posed and has brought about a prodigious amount of relevant studies. Recently, researchers have adopted deep learning techniques to address the ill-posedness, as witnessed by [6], [7], [8], [9], [10], and [11]. Despite the excellent performance that deep dehazing models have delivered, they have been linked with several problems in real-world execution, such as high power consumption, high carbon emissions, and expensive maintenance costs [12].

Furthermore, for a low-level vision task such as image dehazing, deep neural networks (DNNs) are often overkill, as discussed in [13] regarding deep learning versus traditional computer vision techniques. In fact, DNNs fit well with high-level cognitive tasks, such as object classification, recognition, and localization. The data-driven nature of DNNs can also be more of a hindrance than a help because the abstract features learned by DNNs are specific to the training dataset, whose construction is highly cumbersome if statistical reliability is to be ensured. Thus, the learned features may be inappropriate for images different from those in the training set, lowering performance in general.
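As a numerical illustration of the degradation model in (7) and its inversion, the following NumPy sketch (function names are ours, not the paper's) synthesizes a hazy image from the scene radiance and recovers it again:

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Apply the degradation model I = J*t + A*(1 - t).

    J : (H, W, 3) haze-free image in [0, 1]
    t : (H, W) transmittance in (0, 1]
    A : (3,) atmospheric light
    """
    t3 = t[..., None]                  # broadcast t over the color channels
    return J * t3 + A * (1.0 - t3)

def recover_radiance(I, t, A, t0=0.1):
    """Invert the model: J = (I - A) / max(t, t0) + A."""
    t3 = np.maximum(t, t0)[..., None]  # t0 avoids division by a tiny t
    return (I - A) / t3 + A
```

When the transmittance used for recovery equals the one used for synthesis, the round trip is exact, which is precisely the uniqueness issue: any other (A, t) pair yields a different, equally valid J.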
II. RELATED WORKS

This section briefly reviews influential works in the literature based on the categorization in [14], where algorithms are divided into three categories according to their data exploitation. The first two, image processing and machine learning, are typified by low-level hand-engineered image features discovered through statistical analysis of real-world images. The last category, deep learning, exploits the powerful representation capability of DNNs to learn high-level data-driven image features. This categorization gives useful insights into (i) the complexity of dehazing algorithms and (ii) subjective/objective preferences for dehazed images.

highly challenging, the distilled knowledge from relevant image datasets may improve the dehazing performance. From a more general perspective, Tang et al. [20] investigated four haze-relevant features, including the dark channel, hue disparity, locally maximum contrast, and locally maximum saturation, at multiple scales and found the following.

Although the dark channel was the most informative feature (as discovered by He et al. [15]), other features also contributed in a complementary manner. Hence, Tang et al. [20] devised a framework for inferring the transmittance from different haze-relevant features. In [20], they employed a random forest regressor for ease of analysis and demonstration, albeit with a slow inference time. They also discussed the importance of post-processing and presented two post-processing options: adaptive atmospheric light estimation and adaptive exposure scaling.
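Two of these haze-relevant features can be computed with a few lines of NumPy. The sketch below (patch size and function names are ours, not from [20]) illustrates the dark channel and the saturation feature:

```python
import numpy as np

def dark_channel(I, patch=3):
    """Dark channel: per-pixel minimum over color channels and a local patch.

    I : (H, W, 3) image in [0, 1]; patch is the (odd) window size.
    """
    m = I.min(axis=2)                  # channel-wise minimum first
    pad = patch // 2
    mp = np.pad(m, pad, mode="edge")
    H, W = m.shape
    out = np.empty_like(m)
    for i in range(H):                 # spatial minimum over the patch
        for j in range(W):
            out[i, j] = mp[i:i + patch, j:j + patch].min()
    return out

def saturation(I, eps=1e-6):
    """HSV-style saturation: 1 - min(R,G,B) / max(R,G,B)."""
    return 1.0 - I.min(axis=2) / (I.max(axis=2) + eps)
```

For haze-free outdoor patches, the dark channel tends toward zero, whereas haze lifts it toward the atmospheric light; saturation behaves inversely, which is why the two features complement each other.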

The aforementioned approaches require significant effort in seeking (i) a good feature (or a set of features) and (ii) an efficient inference scheme. However, there is no guarantee that they will always perform as intended in all circumstances. As a result, deep learning has been applied to image dehazing to improve flexibility. Given a reliable training dataset, DNNs can estimate the transmittance and atmospheric light with high accuracy because they allow learning and augmenting image features from low to high levels of abstraction. For example, Cai et al.

The powerful learning ability of DNNs, or deep CNNs, also allows them to infer the dehazed image directly from the hazy input. In this direction, the encoder-decoder network has proved highly efficient for end-to-end learning [22], [23]. In addition, some well-known image processing schemes can be incorporated into deep learning to improve performance, as witnessed by multi-scale image fusion [22] and domain adaptation [23]. Also, inspired by the observation about the human brain that knowledge learned from one activity may benefit another, joint learning is a promising direction, typified by [24], where image dehazing benefits object detection.

Some state-of-the-art deep dehazing networks developed recently include GridDehazeNet (GDN) [25], the multi-scale boosted dehazing network (MSBD) [26], you only look yourself (YOLY) [27], and self-augmented unpaired image dehazing (D4) [28]. GDN is a supervised network comprising three modules. The pre-processing module applies different data-driven enhancement processes to the input image. The backbone module then fuses the results based on the grid network, where a channel-wise attention mechanism is adopted to facilitate the cross-scale circulation of information. Finally, the post-processing module remedies residual artifacts to improve the dehazing quality.

As haze is depth-dependent, it is generally smooth except at discontinuities. Hence, it can be viewed as a low-frequency component that obscures fine details in the captured image. This pre-processing step then enhances these obscured details by adding the scaled Laplacian image to the original, as Fig. 4 shows. Because the sharpness enhancement only applies to the luminance channel, it is necessary to convert between the RGB and YCbCr color spaces using (8) and (9) from [32]. In (8), Y, Cb, and Cr are the luminance, blue-difference chroma, and red-difference chroma components of the input image I, and Y_e denotes the output luminance with sharpness enhanced. In (9), I_e ∈ R^{H×W×3}, or I_e = {I_e^R, I_e^G, I_e^B}, is the output RGB image corresponding to {Y_e, Cb, Cr}.

Next, the Laplacian image is obtained by convolving the input luminance Y with the Laplacian operator ∇², whose definition is in (10). Meanwhile, the local variance v of the luminance intensities is calculated as the expected value of the squared deviation from the mean, as (11) illustrates. The symbol ∗ denotes the convolution operator, and U_k is an all-ones averaging kernel whose size k is an odd integer.
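As an illustration of (10) and (11), the sketch below computes the Laplacian image and the local variance with plain NumPy. The 4-neighbor Laplacian kernel and the window size k are common choices assumed here for demonstration, not necessarily the paper's exact ones:

```python
import numpy as np

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)   # 4-neighbor Laplacian

def conv2_same(img, kernel):
    """Naive 'same' 2-D convolution with edge replication (illustration only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    kf = kernel[::-1, ::-1]                        # flip for true convolution
    H, W = img.shape
    out = np.empty_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kf).sum()
    return out

def local_variance(Y, k=5):
    """Var = E[Y^2] - (E[Y])^2 over a k-by-k box, cf. (11); k is odd."""
    box = np.full((k, k), 1.0 / (k * k))           # all-ones averaging kernel
    mean = conv2_same(Y, box)
    mean_sq = conv2_same(Y * Y, box)
    return np.maximum(mean_sq - mean * mean, 0.0)  # clamp tiny negatives
```

On flat regions both quantities vanish, so the adaptive scaling leaves smooth areas untouched and amplifies only regions with texture.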
As demonstrated at the bottom left of Fig. 4, the scaling factor α is a piece-wise linear function of the local variance, whose parameters are user-defined for fine-tuning. Hence, the output luminance Y_e is obtained by (13), which scales the Laplacian image and adds it back to the input luminance. The YCbCr-to-RGB conversion in (9) then yields the output RGB image I_e.

The scene depth d is inferred from the saturation S and brightness V using a linear function. The scene depth d is then available, but there is no guarantee that it will be mostly smooth except at discontinuities. Consequently, the refinement block applies a modified hybrid median filter [35] to the scene depth to impose edge-aware smoothness. Given the refined scene depth d_r, the transmittance is obtained through t = exp[−β_sc d_r], with β_sc = 1. Generally, most image dehazing algorithms in the literature adopt two fixed limits to constrain t, expressed as t_0 ≤ t ≤ 1, with t_0 being a small positive number. The following, on the contrary, describes two NBP constraints that pose an adaptive lower limit on t.

From (7), the dehazed image (or, equivalently, the scene radiance) J can be obtained as:

J(x) = [I(x) − A]/t(x) + A.    (15)

The first NBP constraint, J ≥ 0, is relatively evident because it reduces the number of black pixels that occur after dehazing due to underflows. Hence, it is derived from (15) that:

t(x) ≥ 1 − min_{c∈{R,G,B}}[I^c(x)/A^c],    (16)

where min_{c∈{R,G,B}}(·) denotes a channel-wise minimum operation.

The second NBP constraint is inspired by [33]: the local mean intensity of J must be greater than or equal to its local standard deviation, as (17) expresses, where Y_p represents the luminance channel of J, q is a positive number adjusting the strictness, and mean_{∀y∈Ω(x)}(·) and std_{∀y∈Ω(x)}(·) denote the mean and standard deviation filters, respectively, with Ω(x) being a square patch centered at x. It is worth noting that Y_p is related to Y_e through (15), and this relation can be exploited to approximate the two terms of (17); substituting these approximations into (17) yields the second NBP constraint in (20).

Let t_NBP1 and t_NBP2 denote the expressions on the right-hand sides of (16) and (20). The NBP constraint t_NBP is then expressed as:

t_NBP = max(t_NBP1, t_NBP2),    (21)

where max(a, b) returns the greater of a and b. Thus, the transmittance t is constrained between t_NBP and unity, that is, t_NBP ≤ t ≤ 1, and the scene radiance J is recovered using (15).

adaptive limit point (ALP) to constrain the range scene-wisely. Given the luminance channel Y_p of J, ALP is calculated from the mean Ȳ_p and the cumulative distribution function CDF of Y_p, where L^CDF_k denotes the luminance value at which CDF(L^CDF_k) = k, with k ∈ R and 0 ≤ k ≤ 1.
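Under stated assumptions, the dehazing step up to this point can be sketched as follows. The depth coefficients below are the ones published by Zhu et al. [19] and stand in for the improved prior's own fitted values; the depth refinement and the second NBP constraint (20) are omitted, so only the first constraint (16) is enforced:

```python
import numpy as np

def dehaze_sketch(I, A, w=(0.121779, 0.959710, -0.780245), beta=1.0, eps=1e-6):
    """Simplified dehazing: linear depth from brightness V and saturation S,
    t = exp(-beta * d), the first NBP floor t >= 1 - min_c(I_c / A_c),
    and recovery J = (I - A)/t + A.

    I : (H, W, 3) hazy image in [0, 1];  A : (3,) atmospheric light.
    """
    V = I.max(axis=2)                              # HSV brightness
    S = 1.0 - I.min(axis=2) / (V + eps)            # HSV saturation
    d = w[0] + w[1] * V + w[2] * S                 # linear depth estimate
    t = np.exp(-beta * np.maximum(d, 0.0))         # raw transmittance
    t_nbp = 1.0 - (I / (A + eps)).min(axis=2)      # first NBP lower bound
    t = np.clip(np.maximum(t, t_nbp), eps, 1.0)    # t_NBP <= t <= 1
    J = (I - A) / t[..., None] + A                 # scene radiance, (15)
    return J, t
```

Because t never drops below the NBP floor, the recovered radiance cannot underflow: every channel of J stays non-negative, which is exactly the point of the first constraint.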

It is worth noting that over-enhancement is avoidable by assigning higher gains to smaller luminance values, and ALP can be exploited for that purpose, as (24) shows.

The first block of color space conversion in Fig. 6 proportional to the ratio between Y_f and Y_p, as expressed by the color gain g_3.
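A simplified NumPy illustration of applying a luminance-ratio gain g_3 = Y_f / Y_p to the RGB channels follows; the paper's exact expression for g_3 may include additional normalization, so this is a sketch of the idea only:

```python
import numpy as np

def apply_luminance_gain(J, Yp, Yf, eps=1e-6):
    """Scale the RGB channels by the per-pixel ratio g3 = Yf / Yp so that
    the color follows the enhanced luminance.

    J  : (H, W, 3) dehazed image;  Yp, Yf : (H, W) luminance before/after.
    """
    g3 = Yf / (Yp + eps)                 # per-pixel gain
    return np.clip(J * g3[..., None], 0.0, 1.0)
```

Scaling all three channels by the same factor preserves the hue while transferring the luminance enhancement, which is why such gains are a common way to propagate a tone adjustment back into RGB.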

Moreover, an additional weight g_4 is adopted to maximize the TMQI; its expression in (32) was determined through experiments.

This section presents a comparative evaluation of the proposed method against nine state-of-the-art benchmarks selected from the three image dehazing categories discussed in Section II. These nine are proposed by He et al. [15],

Above all, it can be observed that the nine benchmark methods are ineffective in recovering image details, as witnessed by the traffic light and the man's face in the red-cropped and blue-cropped regions. This common drawback can be explained as follows. Dehazing is fundamentally the subtraction of haze from the input image, and the subtraction degree depends on the transmittance. However, estimating a transmittance with rich details is challenging because spatial filtering usually attenuates high-frequency information. Although an outstanding guided filter [41] has been adopted to refine the transmittance estimate, the best guidance image in single image dehazing is the input image itself. Accordingly, the lack of an informative guidance image constrains the refinement.

The proposed method, in contrast, effectively removes haze while enhancing the sharpness and the color gamut, as witnessed by the man's face and the facial skin color in the blue-cropped region. This definite advantage is attributed to the pre-processing (unsharp masking) and post-processing (color gamut expansion) steps. The intermediate results in Fig. 8 show that the former has improved image details to such an extent that the contours of distant objects have become noticeable. Meanwhile, the latter, as claimed, has successfully remedied the post-dehazing problem of color gamut reduction.

Fig. 9 shows more qualitative comparison results on real-world hazy images. It can be observed from the first row that the result by He et al. [15] is satisfactory, albeit with the post-dehazing false enlargement of the train's headlight.

Similar observations also emerge from the second to fourth rows of Fig. 9. The dark channel assumption of the method of He et al. [15] does not hold for the sky region, causing severe color distortion in the second row. As with the interpretation of the results in the first row, the method of Tarel and Hautiere [17] suffers from halo artifacts, and the method of Zhu et al. [19] suffers from a loss of dark details. Results by deep-learning-based methods, on the contrary, do not exhibit any unpleasant artifacts, which is attributed to the powerful representation capability and flexibility of CNNs. Compared with these benchmarks, the proposed method exhibits a comparable or even better performance.

This section presents an objective assessment of the proposed method against the nine benchmarks on public image datasets. It is worth noting that there are numerous metrics for image quality assessment, such as the conventional peak signal-to-noise ratio (PSNR) and the structural similarity

VOLUME 10, 2022

Table 4 summarizes the processing time of the ten methods at different image resolutions, ranging from VGA (640×480) to 8K UHD (7680×4320). As the source codes of the nine benchmarks are publicly available, we used them and adopted the parameter configurations provided by their authors. The measurements were conducted in MATLAB R2019a and Python 3.9.9.

The proposed method ranks second overall, notably without batch processing or parallel computing but with simply sequential computing. Although it is slower than the fastest model, that of Liu et al. [25], Tables 2 and 3 demonstrate that it outperforms this model under FSIMc and TMQI. Also, compared with the fast sequential method of Zhu et al. [19], it achieved an approximately 2.2× speedup for two main reasons. Firstly, the proposed method skips the atmospheric light estimation. Secondly, it makes only six calls to three different O(N) spatial filters: one call to a 3×3 Laplacian filter in (13), four calls to the box filter in (11) and (20), and one call to the modified hybrid median filter in the scene depth refinement step. On the contrary, the method of Zhu et al. [19] needs to estimate the atmospheric light and makes eighteen calls to the box filter inside the fast guided filter. This difference accounts for the big gap between the processing times of the two methods.
Hence, the definite advantage of a low computational cost is attributed to the elegant partition of image dehazing into three essential steps: pre-processing, dehazing, and post-processing, each of which can be implemented using traditional computer vision techniques.
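The O(N) behavior of the box filter, independent of window size, follows from the integral-image (2-D cumulative sum) trick: each output pixel is obtained from four lookups, regardless of the radius. A minimal sketch:

```python
import numpy as np

def box_filter(img, r):
    """Box mean over a (2r+1) x (2r+1) window in O(N) time.

    The integral image ii lets each window sum be read off with four
    array lookups, so the cost does not depend on the radius r.
    """
    n = 2 * r + 1
    pad = np.pad(img, ((r + 1, r), (r + 1, r)), mode="edge")
    ii = pad.cumsum(axis=0).cumsum(axis=1)               # integral image
    s = ii[n:, n:] - ii[:-n, n:] - ii[n:, :-n] + ii[:-n, :-n]
    return s / float(n * n)
```

This is why each of the six filter calls above contributes only a constant number of passes over the image, keeping the whole pipeline linear in the pixel count.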

The proposed method consists of three steps that operate in a complementary manner. To verify the individual contribution of each step, we conduct ablation studies considering three variants of our algorithm, created by dropping the pre-processing step, the post-processing step, and both, respectively. Table 5