Edge-Aware Interactive Contrast Enhancement

Contrast enhancement (CE) is required in many applications. Many studies have been conducted to perform CE automatically, but most of them do not consider the various personal preferences for contrast. We propose an edge-aware interactive contrast enhancement algorithm that enables a user to adjust image contrast easily according to his or her preference. A user provides a parameter for controlling the global brightness and two types of scribbles to darken or brighten local regions in an image. Then, the proposed algorithm generates an edge-aware mask by propagating the scribbles to nearby regions and restores an enhanced image through a neural network, called e-IceNet. The user can provide annotations iteratively until he or she obtains a desired image. We train e-IceNet on guidance images so that it yields reliable results for diverse input images. We also propose two differentiable losses to train e-IceNet effectively and reliably. Extensive experiments demonstrate that the proposed e-IceNet enables users to enhance images satisfactorily with simple scribbles and can also produce enhanced images automatically.

… to personal preferences, but using such tools takes much effort. Recently, learning-based algorithms [7], [8], [9], [10] have been developed, which learn mappings from low-contrast images to high-contrast ones using big training datasets. However, CE is a non-trivial task, partly due to the non-linear relationship between input and output images. Furthermore, CE is made even more challenging by the fact that people have different preferences for images; CE is a subjective process.

In this regard, the conventional algorithms [5], [6], [7], [8], [9], [10] have the limitation that they cannot satisfy various personal preferences. To overcome it, Ko and Kim [11] recently developed the IceNet algorithm, which enhances image contrast after accepting user annotations. However, their algorithm demands meticulous interactions: users should pay attention to the boundaries of regions for controlling brightness. Otherwise, unwanted results may be obtained, as illustrated in Figures 1(b) and 1(d).

To overcome this problem, we develop an edge-aware interaction system, which allows a user to specify desired regions roughly without painstaking annotations but can generate accurate masks, as shown in Figures 1(c) and 1(e). Specifically, a user provides a parameter for controlling the global brightness and two types of scribbles to darken or brighten local regions in an image. Since the scribbles represent only rough locations for controlling brightness, we propagate them to nearby regions in an edge-aware manner, by employing multiple random walkers (MRW) [12], and generate accurate masks. Then, we restore an enhanced image through the proposed edge-aware interactive CE network …

Conventional CE algorithms adopt various transformation functions [18], [19], [20], [21], [22], among which gamma correction and logarithmic mapping are well-known parametric curves for mapping input pixel values to output ones. On the other hand, some CE algorithms [23], [24], [25], [26] have been developed based on retinex theory [27]. These conventional methods produce promising results, but their performance usually depends on careful parameter tuning, and it is difficult to find reliable parameters for diverse input images. To address this problem, algorithms based on edge-preserving filters … have been developed.

To meet various preferences, professional software provides CE tools, but using these tools takes a lot of effort and training. To reduce such effort, simple interactive methods have been developed. Stoel et al. [35] proposed an interactive histogram equalization scheme for medical images, which allows a user to specify a region of interest (RoI) and applies the equalization to that region. Grundland and Dodgson [36] proposed an interactive tone adjustment method: when a user selects key tones in an image, it preserves those tones but adjusts the other tones while maintaining the overall tonal balance. Lischinski …

The proposed algorithm yields an enhanced image according to simple user annotations, as shown in Figure 2. By inspecting an image I, a user provides an exposure level η for controlling the global brightness and two types of scribbles: blue and red scribbles mean that the user wants to brighten and darken the corresponding local regions, respectively. Then, the proposed algorithm generates an edge-aware mask M and reconstructs an enhanced image J through e-IceNet.

… Instead, we need only rough scribbles for l_b and l_d. Then, we propagate them in an edge-aware manner by adopting the MRW system [12], which simulates the interactions of multiple agents on a graph. Note that MRW was originally developed for unsupervised segmentation; we extend it to accommodate user scribbles.

… complexity. Then, we divide the patch into superpixels. We compute the standard deviation of each of the R, G, and B channels of all superpixels and calculate the average standard deviation σ over the three channels. If σ is smaller than a threshold (= 10), we double the patch size to consider a broader region. Initially, we set K = 32. Next, we estimate the initial distributions of the three agents. Finally, …
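The patch-size adaptation can be summarized in a few lines. The following is a minimal Python sketch, assuming an RGB image, a scribble-centered square crop, and skimage's SLIC for superpixels; the crop scheme, the superpixel count (borrowed from the N = min{4K, 250} rule used later for graph construction), and the stopping bound are our assumptions.

```python
import numpy as np
from skimage.segmentation import slic

def adapt_patch_size(image, center, K=32, sigma_thr=10.0):
    """Double the patch around a scribble until it contains enough texture.

    `image` is an (H, W, 3) RGB array; `center` is the (row, col) of a
    scribble. The excerpt states only that the patch is doubled while the
    average superpixel standard deviation sigma is below the threshold (= 10).
    """
    H, W = image.shape[:2]
    while True:
        r, c = center
        patch = image[max(r - K, 0):min(r + K, H), max(c - K, 0):min(c + K, W)]
        labels = slic(patch, n_segments=min(4 * K, 250), start_label=0)
        # Per-superpixel standard deviation of each of the R, G, B channels.
        stds = [patch[labels == s].std(axis=0) for s in np.unique(labels)]
        sigma = float(np.mean(stds))  # average over superpixels and channels
        if sigma >= sigma_thr or 2 * K >= max(H, W):
            return K  # patch is textured enough (or cannot grow further)
        K *= 2  # too flat: consider a broader region
```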

First, we over-segment an image into N superpixels [39]. The number of superpixels is set to N = min{4K, 250}. We construct the graph G = (V, E), where V = {v_1, . . . , v_N} is the set of nodes (or superpixels) and E = {e_ij} is the set of edges. Each edge e_ij, connecting neighboring nodes v_i and v_j, is assigned a weight

w_ij = exp(− Σ_l λ_l d_l(v_i, v_j)),   (1)

where d_l denotes the l-th distance metric between the features of nodes v_i and v_j. Table 1 summarizes the distance metrics. Here, the MFF feature is extracted from the proposed MFF module in Figure 5.
We employ three agents for the three classes in {l_b, l_d, l_u}: b-agent, d-agent, and u-agent. We set the initial distributions of b-agent and d-agent uniformly over the nodes overlapping with the brightening and darkening scribbles, respectively. Also, the initial distribution of u-agent is set uniformly over the boundary nodes, excluding those similar to the scribbled nodes. Specifically, we allocate uniform probabilities to the nodes v at the image boundaries that satisfy the conditions …

The three initial distributions are refined by MRW iterations. In MRW, each agent travels on the graph G according to the transition probability a_ij to move from node v_j to node v_i. We obtain a_ij by normalizing w_ij in (1), a_ij = w_ij / Σ_k w_kj. We then construct the transition matrix A = [a_ij].
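For concreteness, here is a minimal NumPy sketch of the edge weighting in (1) and the column-stochastic normalization. The Euclidean distances and the particular feature list are assumptions standing in for the metrics of Table 1, which this excerpt does not fully specify.

```python
import numpy as np

def transition_matrix(features, neighbors, lambdas=(2.5, 1.0, 1e3)):
    """Build the MRW transition matrix A = [a_ij] over N superpixels.

    features  : list of (N, d_l) arrays, one per distance metric in Table 1
                (e.g. color, spatial, and MFF features; our assumption).
    neighbors : iterable of (i, j) index pairs of adjacent superpixels.
    lambdas   : metric weights (values from the paper's hyper-parameter study).
    """
    N = features[0].shape[0]
    W = np.zeros((N, N))
    for i, j in neighbors:
        # w_ij = exp(-sum_l lambda_l * d_l(v_i, v_j)), Euclidean d_l assumed.
        d = sum(lam * np.linalg.norm(f[i] - f[j])
                for lam, f in zip(lambdas, features))
        W[i, j] = W[j, i] = np.exp(-d)
    # Column-normalize: a_ij = w_ij / sum_k w_kj is the probability of
    # moving from node v_j to node v_i.
    return W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)
```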

For simplicity, we describe the MRW process from the viewpoint of b-agent; d-agent and u-agent are processed in the same manner. Let

p_b^θ = [p_{b,1}^θ, . . . , p_{b,N}^θ]^T

be the distribution of b-agent, in which p_{b,i}^θ is the probability that b-agent is found at node v_i at iteration θ. The random movement of b-agent is determined recursively by

p_b^{θ+1} = (1 − ε) A p_b^θ + ε r_b^θ,

where r_b^θ is the restart distribution. With probability 1 − ε, b-agent moves according to the transition matrix A. On the other hand, with probability ε, it is forced to restart with the distribution r_b^θ. To make the three agents interact with one another, we propose the restart rule, which determines the restart distributions r_b^θ, r_d^θ, and r_u^θ by considering p_b^θ, p_d^θ, and p_u^θ jointly. Specifically, we set … where δ is a cooling factor [12] and r_b^0 = p_b^0 is set. Here, we increase the probability that b-agent stays at its scribbled nodes by adding the initial probability p_{b,i}^0. Also, b-agent is enforced to restart with higher likelihoods at nodes at which it is more probable than the other two agents. This makes the three agents repel one another and form their own clusters. We perform the MRW iterations until the agents yield the stationary distributions π_b, π_d, and π_u.
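For illustration, here is a minimal NumPy sketch of the three-agent simulation, assuming a column-stochastic A from (1). The restart rule below (a repulsion term where an agent dominates the other two, cooled by δ, plus the initial scribble distribution) is one plausible instantiation of the verbal description above; the exact restart formula and the value of ε are not given in this excerpt.

```python
import numpy as np

def mrw_stationary(A, p_b0, p_d0, p_u0, eps=0.2, delta=0.99,
                   n_iters=1000, tol=1e-8):
    """Run the three-agent MRW until the stationary distributions emerge.

    A    : (N, N) column-stochastic transition matrix.
    p_*0 : (N,) initial distributions of the b-, d-, and u-agents.
    eps  : restart probability (epsilon); its value is an assumption.
    """
    ps = [p_b0.copy(), p_d0.copy(), p_u0.copy()]
    p0s = [p_b0, p_d0, p_u0]
    for theta in range(n_iters):
        new_ps = []
        for k in range(3):
            # Where does agent k dominate the other two agents?
            others = np.maximum.reduce([ps[j] for j in range(3) if j != k])
            # Assumed restart: cooled repulsion + initial scribble distribution.
            r = (delta ** theta) * np.maximum(ps[k] - others, 0.0) + p0s[k]
            r /= r.sum()
            new_ps.append((1.0 - eps) * (A @ ps[k]) + eps * r)
        diff = max(np.abs(new_ps[k] - ps[k]).sum() for k in range(3))
        ps = new_ps
        if diff < tol:
            break
    return ps  # [pi_b, pi_d, pi_u]
```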
These mask values form the edge-aware mask M.

Gamma correction is widely used for CE … [40], [41]. It is important to select an appropriate gamma value by considering personal preferences as well as contextual information in an image. We hence determine a gamma value for each pixel in the image I using the MFF, LFE, and GME modules in Figure 5.

In general, coarse-scale feature maps provide global contexts, whereas fine-scale ones convey detailed local contexts. Because both global and local contexts are important for CE [10], we extract multi-scale contexts through the MFF module in Figure 6. Based on the U-Net architecture [42], MFF includes seven residual blocks [43] and convolutional layers. Each residual block consists of two convolutional layers and a residual connection.
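For reference, a minimal PyTorch sketch of such a residual block follows. The channel width, kernel size, and activation are assumptions; the excerpt specifies only the two-convolution-plus-skip structure.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A residual block as used in MFF: two convolutional layers plus a
    residual connection (channel width and activation are assumptions)."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection
```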

In addition to the contextual information extracted by MFF, we obtain user-specific information for controlling local brightness by feeding the HSV image and the edge-aware mask into the LFE module. Then, in the GME module, we mix the contextual information and the user-specific information using three convolutional layers, yielding a feature map F. Also, we produce a brightness vector w from η via two fully-connected layers. Note that the brightness vector w encodes the user preference for the global brightness. Finally, based on the brightness vector w, we convert the feature vector F(x) at pixel x to a gamma value γ(x), given by

γ(x) = 10 · ϕ(⟨F(x), w⟩),

where ⟨·, ·⟩ is the inner product and ϕ(·) is the sigmoid function. Hence, we have 0 < γ(x) < 10.

… To take advantage of both approaches, we construct guidance images adaptively according to the exposure level η and the edge-aware mask M. We first derive a reversed-S-shaped curve S from a conventional transformation function T by combining T(l) with its inverse T^{-1}(l), where l denotes an input intensity level and T^{-1} is the inverse function of T. Note that a typical transformation function T for CE is concave, with steep and flat slopes near the minimum and maximum intensity levels, respectively. By combining T and T^{-1}, the reversed-S curve S has steep slopes at both the minimum and maximum levels, as shown in Figure 8, so it can enhance both under- and over-exposed regions effectively. Also, we can control the overall output brightness by changing η. Various transformation functions can be used for T; in the default mode, we adopt AGCWD [22] to generate T adaptively according to the intensity distribution of an image. To enable e-IceNet to control local brightness, we add the edge-aware mask M to the input intensity V,

Ṽ = V + λM,

where λ is a parameter controlling the impact of M, which is fixed to 10. Then, we obtain the guidance image G by transforming each pixel in Ṽ via S, i.e., G(x) = S(Ṽ(x)).
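To make the construction concrete, here is a minimal NumPy sketch. The additive mask term Ṽ = V + λM follows the text; the blend S(l) = η·T(l) + (1 − η)·T^{-1}(l) is an assumed form of the reversed-S curve, since the excerpt states only that S combines T and T^{-1} and that η controls the overall brightness. The clipping and the sign convention of M are also assumptions.

```python
import numpy as np

def guidance_image(V, M, T, T_inv, eta, lam=10.0):
    """Construct the guidance image G from intensity V and edge-aware mask M.

    V      : (H, W) input intensity (V channel), values in [0, 255].
    M      : (H, W) edge-aware mask; assumed positive to brighten and
             negative to darken (sign convention not given in the excerpt).
    T, T_inv : transformation function (e.g. AGCWD) and its inverse, both
             element-wise maps on [0, 255].
    eta    : exposure level controlling the overall output brightness.
    lam    : impact of M (fixed to 10 in the paper).
    """
    V_tilde = np.clip(V + lam * M, 0.0, 255.0)  # V~ = V + lambda * M
    # Assumed reversed-S blend of T and its inverse, weighted by eta.
    return eta * T(V_tilde) + (1.0 - eta) * T_inv(V_tilde)
```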

We train e-IceNet by minimizing a weighted sum of two losses, … Here, L_g is the guidance loss, which is the mean square error between the output image W in (7) and the guidance image G … A minimal sketch of the guidance term is given below.
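The following PyTorch sketch implements the guidance term under the stated MSE definition; the second loss and the weighting of the sum are not specified in this excerpt, so they are omitted.

```python
import torch.nn.functional as F

def guidance_loss(W_out, G):
    """Guidance loss L_g: mean square error between the e-IceNet output W
    and the guidance image G (both (B, C, H, W) tensors)."""
    return F.mse_loss(W_out, G)
```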

We conducted user studies to assess the interactive CE performance of the proposed e-IceNet. First, using 10 images in DICM [20], we asked 15 participants to provide annotations to e-IceNet and IceNet and to vote for the better results. In total, 150 votes (10 images × 15 participants) were cast. The proposed e-IceNet won significantly more votes: it was preferred in 76% of the tests, while IceNet was preferred in only 24%. This is because e-IceNet generates more accurate masks from simple scribbles and provides more natural CE results. In contrast, as shown in Figure 1, IceNet does not provide sufficiently accurate masks. Moreover, in Figure 9, IceNet over-enhances images even at a middle exposure level η = 0.65.

For a broader subjective assessment, we collected 50 images by choosing the first 10 indexed images from each of the test sets of NPE [23], LIME [25], MEF [45], DICM [20], and VV [46]. Then, we conducted another user study with the participants, designed as follows: …

Figure 11 compares qualitative results, in which the result of e-IceNet was obtained by a participant. We see that e-IceNet yields a more natural result by bringing out details without over-enhancement.

… In this test, the proposed e-IceNet automatically generated …

CNN-based interactive segmentation techniques [47], [48], [50], [51], [52] have been developed. However, they train the networks to generate segmentation masks for objects of specific classes in normally exposed images, yielding unreliable masks for unknown classes in under-exposed images. Figure 12 compares the results of the state-of-the-art interactive segmentation techniques [47], [48] with those of the proposed mask generation. For a fair comparison, we provide the same annotations. The proposed algorithm yields better results on such under-exposed images.

2) MULTI-SCALE FEATURE FUSION
We analyze the impacts of MFF. Figure 13 compares the results of e-IceNet trained without and with MFF. In this test, the output images are generated at η = 0.7 without scribbles. It is observed that e-IceNet with MFF restores more pleasing images with less noticeable artifacts than e-IceNet without MFF.

We analyze the impacts of transformation functions for guidance images. Figure 14 compares enhanced results …, respectively. Note that the differences between (b) and (c) can be seen more clearly in the background (e.g., cloud and stone).

Figure 16 shows CE results of IceNet and e-IceNet using the same masks, generated by the proposed MRW simulation. In this test, the results are obtained at low exposure levels to emphasize the differences between the two algorithms. Even when provided with accurate masks, IceNet generates visually annoying rippling artifacts near the mask boundaries. This is because IceNet is trained to encourage smooth variations between adjacent pixels. On the other hand, e-IceNet yields much better results near the boundaries.

… obtained by AGCWD directly. These results hence indicate that e-IceNet yields more reliable results than AGCWD, even though it uses the AGCWD results as the guidance images.

Furthermore, e-IceNet can accommodate user scribbles to achieve personalized CE.

We use the cubic function in (11) to estimate an initial exposure level. Figure 18 shows that the performance increases up to degree three and then saturates, meaning that the cubic function is an effective choice. Also, the blue circles correspond to the PSNRs obtained using the average luminance as input, as in [11]. They are poorer than the PSNRs obtained using the entropy. Thus, we adopt the entropy as input; a minimal sketch of this initialization appears at the end of this section.

… As the patch size gets larger, the running time increases, but the proposed mask generation is fast enough for practical applications. Thus, we set K = 32.

Let us discuss the impacts of the hyper-parameters in Figure 3 and Table 1. In this test, we measure the segmentation performance by the intersection-over-union (IoU) scores on 100 images in the VidSeg dataset [54]. We compute the average IoU scores by changing each parameter in Table 7. In this test, we generate a segmentation mask from one scribble with the initial patch size K = 64. The performance increases up to λ1 = 2.5, λ2 = 1.0, λ3 = 10^3, and σ = 10, respectively, and then decreases.

Figure 19 shows how two participants enhanced the same input image differently during the user study, which indicates that people have diverse preferences for contrast. Also, by comparing Figure 19(c) with Figure 19(d), we see that the visual quality is affected significantly by user scribbles. The proposed e-IceNet allows the user to specify desired regions …
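To illustrate the entropy-based exposure initialization discussed above, here is a minimal NumPy sketch: the image entropy is fed to the cubic of (11). The cubic coefficients are fit offline and are not given in this excerpt, and clipping η to [0, 1] is our assumption.

```python
import numpy as np

def image_entropy(gray):
    """Shannon entropy (in bits) of an 8-bit grayscale image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def initial_exposure(gray, coeffs):
    """Estimate the initial exposure level eta via the cubic of Eq. (11).

    `coeffs` holds the four cubic coefficients (highest degree first),
    fit offline; their values are not given in this excerpt.
    """
    return float(np.clip(np.polyval(coeffs, image_entropy(gray)), 0.0, 1.0))
```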