Refocusing Metric of Light Field Image Using Region-Adaptive Multi-Scale Focus Measure

Compared with conventional photography, the newly emerging light field image capturing technique has dramatically extended the potential capabilities of post processing. Among these new capabilities, refocusing is of particular interest. In this paper, we first investigate a region-adaptive multi-scale focus measure (RA-MSFM) that measures the focus of light field images more robustly and accurately. It is especially superior when measuring focus in flat areas, where previous methods struggle. We then design a novel refocusing measure metric that employs the RA-MSFM as its core technique. Using the metric, the refocusing capability of a given light field image as a whole can be expressed as a single number by combining the focus score maps of each refocused image in the focal stack. The focus score maps are generated using the proposed RA-MSFM, in which different multi-scale factors are adaptively selected depending on the region type, such as texture-rich or flat areas, using a multi-layer perceptron network. Different from most light field image metrics, which assess image quality, our metric assesses refocusing capability. Our experiments show that not only does the proposed refocusing metric have a high correlation with subjective evaluations given in the form of mean opinion scores, but it also produces all-in-focus images with 0.7∼4.6 dB higher PSNR compared to previous state-of-the-art methods. The proposed refocusing metric can be used to measure refocusing loss in practical applications such as compression, tone mapping, denoising, and smoothing.

microscopy, robotics, and medical imaging, among others [1]. The light field (LF) camera allows effective reverse ray tracing from an already recorded image, so that the image itself can be adjusted in post processing [2]. That is, focus, exposure, viewing angle, and depth of field can be adjusted after the picture is taken [3]. The availability of depth information over an entire scene also facilitates users in adjusting other aspects of the image, such as controlling the depth of field [4].

The associate editor coordinating the review of this manuscript and approving it for publication was Senthil Kumar.

In this regard, one would like to measure how much in focus a pixel is in a refocused image. The general refocusing capability of an LF image can be thought of as the overall degree of focusing [10]. The tenengrad of gradient variance (GRA7) [11], eigenvalue-based (STA2) [3], and image contrast (MIS3) [14] methods achieve the overall highest performance among the four families. Noise robustness tests in [9] indicate that the STA2 and GRA7 methods handle noise very well. However, it should also be noted that only a few FM methods have been developed for LF images. Surh [15] proposed an FM employing a ring difference filter (RDF), which maintains high robustness and confidence by utilizing a relatively large window of neighboring pixels and placing a ring-shaped gap to ignore certain regions within that window. The RDF focus measure is especially useful in depth estimation for LF images. Rizkallah [17] proposed a metric that decides whether a certain pixel in a focal stack is in focus by thresholding the pixel gradient. This metric is used to evaluate compression loss by counting the number of in-focus pixels. It is a very simple metric that judges whether a pixel is in focus by hard thresholding on the gradient value, but it does not perform well in flat in-focus areas.
Chantara [18] proposed an FM based on the Summation of the Modified Laplacian, which is sensitive to noise. This FM is used to select the focus area of an LF focal stack.

These existing FM schemes work well in high-frequency areas such as rich textures or high-sharpness areas, but encounter difficulties when dealing with in-focus flat areas [9]. A multi-scale concept [21], [22], [23] appears promising for improving accuracy in those flat areas. Thus, in this paper, we propose a region-adaptive multi-scale focus measure (RA-MSFM), which plays an important role in our refocusing measure methodology. That is, for each refocused image in a focal stack, the RA-MSFM is employed to generate a pixel-wise focus map. An overall refocusing capability score of a whole LF image is calculated by combining the pixel-wise focus maps of the focal stack.

In this paper, we design a new refocusing measure for LF images. The contributions of this paper lie in introducing 1) an assessment metric for LF refocusing capability, different from existing metrics that mostly target image quality; 2) the RA-MSFM method, which achieves higher accuracy in both texture-rich and flat areas; and 3) an appropriate focal stack range and step size determined by mathematical analysis instead of simply predefining their values, as most research does.
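To make the focus-measure idea concrete, the Summation-of-Modified-Laplacian family mentioned above can be sketched as below. This is a minimal NumPy illustration, not the exact operator of [18]; the window size and the edge-padding choice are assumptions for illustration.

```python
import numpy as np

def modified_laplacian(img):
    """ML(x, y) = |2I - I_left - I_right| + |2I - I_up - I_down|."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    lx = np.abs(2 * p[1:-1, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:])
    ly = np.abs(2 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1])
    return lx + ly

def focus_map(img, win=5):
    """Per-pixel focus level: ML summed over a win x win window (SML-style)."""
    ml = modified_laplacian(img)
    k = win // 2
    p = np.pad(ml, k, mode="edge")
    h, w = ml.shape
    out = np.zeros((h, w))
    for dy in range(win):          # sliding-window sum over the padded map
        for dx in range(win):
            out += p[dy:dy + h, dx:dx + w]
    return out
```

A flat region yields a focus level of zero everywhere, which is exactly why such single-scale measures struggle to tell an in-focus flat area from an out-of-focus one.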

To evaluate the proposed RA-MSFM method, we take two approaches: rendering all-in-focus images and comparing them with other state-of-the-art approaches, and carrying out subjective experiments on the focus level of the ROI to analyze correlation. To evaluate the proposed refocusing measure metric for a whole LF image, another subjective experiment is carried out on in-focus pixel coverage. The proposed refocusing measure can be used in many image processing tasks which may result in refocusing loss, such as compression, tone curve mapping, noise reduction, blurring, and smoothing. Compression minimizes the data size of an image at the cost of image quality degradation, so the consequential information loss may reduce refocusing capability. Tone curve mapping extends the dynamic range of one region while

A focus map is a collection of the focus levels of the pixels in an image. The focus level indicates how much in focus a given pixel is: a higher focus level indicates better focus. With conventional methods [9], [18], the focus level of a pixel I(x, y), denoted by F(x, y), can be measured by a selected focus measure operator FM.

Fig. 1(a). To simulate realistic camera noise, we also applied Gaussian noise to the original light field images in the experiment. A refocused image in a focal stack is referred to by its focal stack index k, k = 1, . . . , K, where K is the total number of images in the focal stack. The horizontal axis in Fig. 1 indicates the focal stack image index, and the vertical axis shows the focus level of the pixel at the center of its window (shown in yellow in Fig. 1). Note that the more in focus a pixel is, the higher its focus level. The focus level of a pixel at (x, y) differs between the kth and (k + 1)th images in the focal stack.
In an ideal case, with increasing focal stack index, the focus level should increase and then decrease after a peak, showing a hill shape. Comparing the focus level curves generated by the five operators in Fig. 1(a), RDF, GRA7, and STA2 show better resistance to noise than the others, since their focus level curves are closer to the desired smooth hill shape. Meanwhile, LAP2 was found to be the most sensitive to noise, since its rises and falls are out of order. As for accuracy, RDF and STA2 show better performance than GRA7, since the peak of GRA7's curve does not match the most in-focus image index marked by the black dashed line. A focus measure performance test on an in-focus flat area is shown in Fig. 1(b). None of the five operators show a smooth hill-shaped focus level curve for the in-focus area. To compare complexity, we analyzed the average computation time of the five focus operators and found that STA2 consumes the most time. In the overall tradeoff between noise robustness, accuracy, and complexity, RDF was found to be better than GRA7 and STA2. As such, we select RDF to generate the pixel-wise focus map.

An image varies greatly in terms of its spatial contents, so its processing is better made adaptive, especially when measuring focus with respect to whether a pixel is in a textured or flat area. To deal with this issue and achieve an accurate focus map, we investigate a region-adaptive multi-scale focus measure (RA-MSFM). An appropriate degree of down-scaling depends strongly on the local region; therefore, one single fixed scale-down factor cannot cover all cases.
For example, the regions of an image can be roughly divided into four types: in-focus flat area (p1), out-of-focus flat area (p2), out-of-focus texture-rich area (p3), and in-focus texture-rich area (p4), as shown in Fig. 2. In this work, an adaptive selection scheme is designed using a multi-layer perceptron network which selects an appropriate scale-down factor. For this, we down-scale the original image (having resolution

the focus level of a pixel in an image is re-written as: where t is the classified label for a pixel I(x, y) and Pr(t) is the probability of the label t provided as the network output.

The ground truth scale-down factor of each pixel is prepared among four scale factors (1, 1/4, 1/16, 1/64) by selecting the one that reflects its actual focus level best. The focus level is calculated as in (1) using a selected focus measure operator FM, for example, RDF [15]. A high FM value of a pixel indicates that the pixel is in focus; conversely, a low value indicates an out-of-focus pixel. Accordingly, for a pixel located in an in-focus region such as p1 and p4, its ground truth scale-down factor is selected as the one giving the maximum focus level. For a pixel located in an out-of-focus region (such as p2 and p3), the one giving the minimum focus level is selected as the ground truth. Fig. 2 illustrates the ground truth scale-down factor (indicated by a bold rectangle) for the pixels p1∼p4.

A focus map showing the focus level of each pixel is generated with our RA-MSFM. Fig. 4 compares the focus maps generated by three well-known methods and the proposed RA-MSFM. Unlike the existing methods, the proposed RA-MSFM is shown to work well even in flat areas by selecting an appropriate scale-down factor, yielding accurate focus levels in in-focus flat areas.
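The ground-truth selection rule above (maximum focus level for in-focus regions, minimum for out-of-focus ones) can be sketched as follows. Average pooling stands in for the paper's down-scaling step and gradient energy stands in for the RDF focus measure; both substitutions are assumptions for illustration only.

```python
import numpy as np

def downscale(img, f):
    """Average-pool by factor f (stand-in for the paper's scale-down step)."""
    h, w = img.shape[0] // f * f, img.shape[1] // f * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def grad_energy(img):
    """Mean squared gradient, a simple stand-in focus measure."""
    gx = np.diff(img, axis=1, prepend=img[:, :1])
    gy = np.diff(img, axis=0, prepend=img[:1, :])
    return float(np.mean(gx ** 2 + gy ** 2))

def ground_truth_scale(region, in_focus, factors=(1, 2, 4, 8)):
    """Pick the down-scale factor whose focus level best reflects the region:
    argmax for in-focus regions (p1, p4), argmin for out-of-focus (p2, p3)."""
    levels = [grad_energy(downscale(region, f)) for f in factors]
    idx = int(np.argmax(levels)) if in_focus else int(np.argmin(levels))
    return factors[idx]
```

For a noisy (texture-like) patch, pooling suppresses gradient energy, so the argmax rule keeps the full resolution while the argmin rule picks the strongest down-scaling, mirroring how the two region types are labeled.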

A focal stack is a collection of the same image focused on multiple planes; thus, refocusing can be understood in a

We compute the focus score S_k(x, y) by normalizing

In this section, we propose a refocusing capability measure for the given 4D light field image and its rendered 2D refocused images.

VOLUME 10, 2022

where b_k(x, y) represents whether the given pixel at (x, y) is

The overall refocusing pixel coverage (RPC) of a light field image, RPC_LF, is defined as below.

Light field images are captured using a plenoptic camera, which typically has a microlens array placed in front of an image sensor to record incoming light rays from many directions [24]. This architecture allows differentiating as many directions as there are pixels behind each microlens. The light field can be used to digitally reconstruct an image corresponding to a different camera focus, which we call the refocused image [25], [26]. A light ray of the 4D light field can be parameterized by two parallel planes, uv and xy, known as the directional and spatial dimensions. The camera aperture is positioned along the uv plane, while xy indicates the sensor plane [19]. As illustrated in Fig. 7, the sensor plane is located at a distance F from the aperture plane, and the light ray L(x, y, u, v) reaches the position x on the sensor plane. For the refocus plane RP_k, its distance from the aperture plane is F′ = αF, where α = F′/F is defined as the relative depth [27]. The dashed light ray converging on a position x on the refocus plane RP_k is denoted by L_α(x, y, u, v). Since it also reaches x_k on the sensor plane, L_α(x, y, u, v) = L(x_k, y_k, u, v). By similar triangles in Fig. 7,

x_k = x/α + u(1 − 1/α),  y_k = y/α + v(1 − 1/α).
The refocused image I_α(x, y) is rendered by integrating all directional rays at a specific position on the sensor plane:

I_α(x, y) = Σ_(u,v) L(x + u(1 − 1/α), y + v(1 − 1/α), u, v).  (10)

Here, L_(u,v)(x, y) = L(x, y, u, v) denotes a 2D view from the point (u, v). Equation (10) shows that digital refocusing is realized by shifting each view by a factor (u(1 − 1/α), v(1 − 1/α)). If we denote the shift offset as (Δx, Δy), then Δx = u(1 − 1/α) and Δy = v(1 − 1/α).

Regarding the refocusing parameter α, a value less than 1 indicates a refocus plane close to the aperture plane (that is, F′ < F), and a value larger than 1 means a refocus plane far from the aperture plane (that is, F′ > F). In general, α can assume any real value, and each value corresponds to one refocused image in the focal stack, that is, one refocus plane. The number of images in the focal stack, K, is determined by α. If K is high, the large and redundant focal stack demands huge hardware consumption, as shown in Fig. 8(a); if K is low, the generated focal stack cannot cover the whole refocusing range, as shown in Fig. 8(b), and the measured refocusing capability will be smaller than the real refocusing capability. Therefore, it is necessary to decide an appropriate K value. The K value depends on the range α_max − α_min and the step size Δα = α_(k+1) − α_k.
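The shift-and-add refocusing described by (10) can be sketched as below. Integer-pixel shifts via np.roll are a simplification for brevity; real implementations interpolate sub-pixel shifts, and the shift-direction convention here is an assumption.

```python
import numpy as np

def refocus(views, alpha):
    """Shift-and-add refocusing of a 4D light field.
    views[u, v] is the 2D sub-aperture view L_(u,v)(x, y); each view is
    shifted by (u*(1 - 1/alpha), v*(1 - 1/alpha)) and the results averaged."""
    U, V, H, W = views.shape
    uc, vc = (U - 1) / 2, (V - 1) / 2   # center the (u, v) coordinates
    out = np.zeros((H, W))
    for iu in range(U):
        for iv in range(V):
            du = (iu - uc) * (1 - 1 / alpha)
            dv = (iv - vc) * (1 - 1 / alpha)
            # integer-pixel shift for brevity (sub-pixel in practice)
            out += np.roll(views[iu, iv], (round(dv), round(du)), axis=(0, 1))
    return out / (U * V)
```

Note that alpha = 1 gives zero shift for every view, so the refocused image reduces to the plain average of the sub-aperture views; varying alpha sweeps the focal plane and produces the focal stack.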

There have been some studies [31], [32], [33], [34] on setting the α value range and step size. One work [31] investigated setting these values based on the image content depth, but since content always varies, the depth of each image also varies, so it is difficult to give a consistent definition this way. Another work [32] investigated this based on the plenoptic camera's focal length and microlens diameter, but that approach is limited to typical cameras. Yet another work [33] simply used a fixed range with a step size found experimentally. In this paper, we study the problem mathematically and propose a method to define the range and sampling step size of the parameter α.
Similarly, using the constraint on |Δy| above,

So, the minimal α value depends on M, N, and P. For the case of α > 1, we always have 0 < 1 − 1/α < 1; thus,

Since |Δx| can approach |u|_max but cannot equal |u|_max, to reduce redundancy we set |u|_max − 1 as the maximum |Δx| value; thus, |Δx| ≤ |u|_max − 1,

So, the maximal α value depends on P.

2) SETTING FOCAL STACK STEP SIZE

In order to cover as wide a refocusing range as possible with the minimal number of images in the focal stack, we should define a proper value for Δα. The light rays L_(u,v)(x_k, y_k) and L_(u,v)(x_(k+1), y_(k+1)) converge to a position x of I^(k) and I^(k+1), respectively.
and d = |x_k − x_(k+1)| is the distance between the two light rays. If d > 1, refocusing possibility is compromised due to the omission of some light rays. If d ≤ 1, the whole refocusing range can be covered; therefore, d = |x_k − x_(k+1)| ≤ 1 is an essential constraint, as shown below.
To ensure that all u values meet the constraint, it is required that |u| = |u|_max = (P − 1)/2. If d is too small, there will be many images in the focal stack, which causes redundancy and high computational cost; thus, d = 1 is selected as an appropriate value.
In the case of 0 < α < 1, we have α_(k+1) < α_k; thus,

In the same way, when α > 1, we have α_(k+1) > α_k; thus,

The refocusing plane positions using the step size decided in this paper are shown in Fig. 8(c). With the proposed α setting, the generated focal stack covers the whole refocusing range with only the minimum number of images.
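From Δx = u(1 − 1/α) with the worst-case ray |u| = |u|_max, the constraint d = |x_k − x_(k+1)| ≤ 1 gives |u|_max · |1/α_k − 1/α_(k+1)| ≤ 1, so with d = 1 consecutive focal planes are uniformly spaced in 1/α. A small sketch of generating such a focal stack (the bound values in the usage are placeholders):

```python
import numpy as np

def focal_stack_alphas(alpha_min, alpha_max, u_max):
    """Refocusing parameters satisfying d = u_max * |1/a_k - 1/a_{k+1}| = 1,
    i.e. uniform spacing of 1/u_max in the 1/alpha domain."""
    inv = np.arange(1 / alpha_max, 1 / alpha_min + 1e-12, 1 / u_max)
    return np.sort(1.0 / inv)   # ascending alpha values
```

For example, with alpha in [0.5, 2.0] and |u|_max = 8, this yields 13 focal planes that cover the range with no gap larger than d = 1 and no redundant slice.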

In this section, we evaluate the performance of 1) the pro-

To evaluate our refocusing capability metric, we also con-

Fig. 9 shows the all-in-focus images generated by the different methods [15], [17], [18]. To ensure a fair experiment, the window size is set to 5 × 5 for all the focus measure operators tested. The AIF images rendered using our RA-MSFM are seen to be better than the images generated by the existing methods (note the sky, wall, and face). While the AIF images rendered by the proposed method are very clean, like the ground truth AIF images, the existing methods show unpleasant artifacts in the flat areas.

Table 2 illustrates the correlations between the reference ground truth image and the rendered AIF images. The test datasets I01∼I12 are shown in Fig. 10(a). Peak signal-to-noise ratio (PSNR) and root mean squared error (RMSE) are used to evaluate the correlation between the images. In checking similarity between the reference and the rendered images, a high PSNR indicates that the rendered AIF image is very similar to the reference AIF image. So, in this evaluation of all-in-focus images, a high PSNR indicates a high-quality focus measure method. Since the RMSE quantifies the difference between the reference and the rendered image, a smaller RMSE indicates better performance of the given focus measure. The experimental results reveal that the proposed method has the highest PSNR and the lowest RMSE. The AIF image rendered using the proposed method has on average 0.7 dB, 4.6 dB, and 1.5 dB higher PSNR than Chantara's [18], Rizkallah's [17], and Surh's [15] methods, respectively. The RMSE results also show that the proposed method produces 4.6%, 35.3%, and 13.8% smaller error than the existing methods, respectively.

Fig. 10(a), the focus capability level of each ROI is scored from 0 ∼ 1.

Among the 21 rendered images in the focal stack, the best in-focus one is scored as 1, while the most blurred (that is, out-of-focus) one is given 0. In subjective test II, the refocusing pixel coverage range is marked and the ratio of the focused area to the whole image is scored from 0 ∼ 100%, as shown in Fig. 10

Table 4 shows the correlations between the objective and subjective focus scores for the datasets I01∼I12 with Gaussian noise of σ = 0.001. The Laplacian-based Chantara's method is found to be the most sensitive to noise.

Table 5 shows the average performance over the 12 LF images for the original and noise-added versions. In the case of the original images, the proposed method achieves 2% ∼ 4% higher correlation than Chantara's, 5% ∼ 8% higher than Surh's, and 10% ∼ 15% higher than Rizkallah's. In the case of the noisy images, the Laplacian-based Chantara's method shows the worst performance. The comparison data in Table 4, Table 5, and the measured focus level for the ROI in Fig. 11 show that the proposed method is superior to the existing methods not only for the original images but also for noisy images.
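The AIF rendering and its PSNR/RMSE evaluation can be sketched as below. Per-pixel argmax compositing over the focus maps is a common baseline and is our assumption here, not necessarily the paper's exact rendering rule.

```python
import numpy as np

def all_in_focus(stack, focus_maps):
    """Compose an AIF image by taking, per pixel, the focal-stack slice
    with the highest focus level."""
    idx = np.argmax(focus_maps, axis=0)                  # (H, W) best slice index
    return np.take_along_axis(stack, idx[None], axis=0)[0]

def rmse(ref, img):
    """Root mean squared error between reference and rendered image."""
    d = np.asarray(ref, float) - np.asarray(img, float)
    return float(np.sqrt(np.mean(d ** 2)))

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    e = rmse(ref, img)
    return float("inf") if e == 0 else float(20 * np.log10(peak / e))
```

A higher PSNR (lower RMSE) against the reference AIF image then scores the underlying focus measure, exactly as in the Table 2 comparison.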

Apart from subjective test I on the focus score of a given local area, subjective test II addresses the refocusing capability metric regarding refocusing pixel coverage. In subjective test II, subjects are asked to mark all the in-focus pixels in each image of the focal stack.
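The coverage counted in subjective test II has a direct objective analogue. A minimal sketch, assuming the in-focus indicator b_k(x, y) is obtained by thresholding the focus map (the threshold value is a placeholder):

```python
import numpy as np

def refocusing_pixel_coverage(focus_maps, thresh):
    """Fraction of pixels that are in focus in at least one focal-stack slice.
    focus_maps has shape (K, H, W); thresh is a hypothetical cutoff."""
    b = np.asarray(focus_maps) > thresh   # b_k(x, y): in-focus indicator
    covered = b.any(axis=0)               # union over the focal stack
    return float(covered.mean())
```

A value near 1 means almost every pixel can be brought into focus somewhere in the stack, matching the intuition behind RPC.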

The subjective refocusing capability is measured in terms of the in-focus pixel percentage over the whole image. The comparative subjective and objective refocusing capability evaluation is shown in Fig. 12 using the I01∼I12 datasets.
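The PLCC used in the correlation analysis is the standard Pearson linear correlation coefficient between the objective scores and the subjective scores; a self-contained sketch:

```python
import numpy as np

def plcc(objective, subjective):
    """Pearson linear correlation coefficient between metric scores and MOS."""
    x = np.asarray(objective, float)
    y = np.asarray(subjective, float)
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))
```

PLCC is 1 for a perfect positive linear relationship between metric and subjective scores, and -1 for a perfect inverse one.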

The proposed metric RPC is closer to the subjective score than the state-of-the-art Rizkallah's metric [17]. The correlation analysis results with PLCC and RMSE are shown in Table 6, which also shows that the proposed metric achieves 12% higher correlation and 5% lower error than the other

Another potential future extension of this work is the design of a perceptual focus measure that is more closely related to how the human visual system evaluates images.