Hardware-Friendly Laplacian-Based Multi-Focus Image Fusion in DCT Domain for Visual Sensor Network

Visual sensor network (VSN) requires a multi-focus image or video frame fusion technique involving focus measure computation in the DCT-domain to generate an all-in-focus image. Such techniques are implemented on resource-constrained on-board systems requiring hardware-friendly implementations. In this article, we first show that components of the Laplacian matrix are related to the discrete cosine transform (DCT) basis. The relation is that the eigenvalues of the Laplacian with proper boundary condition form the diagonal elements of the diagonal matrix generated by the DCT operation on the Laplacian. Exploiting this relation, we propose a focus measure which works on the DCT coefficients reflecting the spatial-domain Laplacian operation. Certain simplifications allow our focus measure computation through hardware-friendly integer multiplication and summation, where matrix multiplication involves just N scalar multiplications for an $N\times N~2\text{D}$ signal. Finally, we propose an approach which suitably fuses multi-focus images or video frames in DCT based image or video coding framework through detection of properly focused area and neighborhood consistency analysis. We show that our proposed approach is hardware-friendly, computationally simple, and is fast enough for VSN. Through experimental results, we show that our approach outperforms the relevant state-of-the-art in multi-focus image fusion for VSN both quantitatively and subjectively. We also show that our approach is effective in comparison to the state-of-the-art and a few latest generic multi-focus image fusion techniques in terms of quantitative and subjective evaluations.


I. INTRODUCTION
A visual sensor network (VSN) is an intelligent system [1], [2] that generates huge sensory data from the environment through geographically distributed camera nodes and then collaboratively processes the data [3] to send useful information for different applications related to surveillance, monitoring, etc. [4]-[6]. The cameras have limited depth of field, and hence they capture reliable (focused) information of objects only at their focus ranges and blur the rest of the scene [7]. The on-board system for the collaborative processing required to produce all-in-focus images/video frames is resource-constrained [8], [9], allowing only very simple hardware-friendly processing of the huge captured data [10]. Therefore, generating an all-in-focus image/video frame in real time, especially in a wireless sensor network [11], [12], is a challenging task.
The associate editor coordinating the review of this manuscript and approving it for publication was Sudhakar Radhakrishnan.
To handle and transmit huge visual data, VSN adopts an image/video compression or video coding technique. Many such techniques use the discrete cosine transform (DCT) in view of its efficient energy compaction property [13]-[15]. Thus, a simple multi-focus image fusion technique that works and generates an all-in-focus image/frame in the DCT domain would be preferable for VSN, as it can be directly incorporated in the DCT based compression framework. Many such approaches have been proposed [10], [16]-[23], which are discussed in Section III. Among these, a few attempts have been made to formulate the DCT domain equivalent of efficient spatial domain focus measures that utilize spatial variance [18] or spatial frequency [19] in a local spatial neighbourhood. Other attempts have also been made in the DCT domain that represent operations in spatial neighborhoods [20]-[23], which are discussed in Section III as well. For example, Amin-Naji et al. [20] proposed the use of the variance and energy of the Laplacian response in the DCT domain for multi-focus image fusion in VSN. Though these approaches produce state-of-the-art results, they are comparatively expensive, as detailed in Section VI-C: such solutions require floating point squaring and multiplication operations, which are not as hardware-friendly as addition.
It is straightforward to assume that when an area of an image is not focused, it is blurred. Such an area contains substantially more low frequency content than high frequency content. On the other hand, when an area (except smooth areas) is focused, it is comparatively sharper, contributing to high frequency content. Therefore, different operators which compute spatial frequencies can provide focus measures. The gradient is one such operator to measure image sharpness indicating spatial frequency. Its equivalent in the DCT domain has been used in [19] to get a spatial frequency based focus measure. The Laplacian operator is another well known operator [24], which is good at determining the sharpness in an image. A plethora of focus measures in the spatial domain based on the Laplacian operator have been proposed and are found to be effective [7], [25]-[28]. Subbarao et al. [26], [27] have proposed the energy of the Laplacian of an image to measure the focus for an auto focusing application. Nayar et al. [25] have proposed the sum of modified Laplacian (SML) of an image as an efficient focus measure. Malik et al. [7] have proposed an auto focus algorithm, where the SML of an image is used to determine the focus information.
Inspired by the aforesaid requirement of hardware friendliness, in this article, we formulate a computationally simple hardware-friendly multi-focus image fusion approach for VSN using a non-trivial equivalent of the spatial domain Laplacian operator in the DCT domain.
The major contributions of the paper are as follows:
1) It is shown that an $N \times N$ Laplacian matrix (formed using the Laplacian operator with proper boundary condition) is related to the $N \times N$ DCT matrix in such a way that the eigenvalues of the Laplacian matrix form the diagonal elements of the diagonal matrix generated by the DCT operation on the Laplacian matrix.
2) The above property of the Laplacian matrix is exploited to propose a focus measure, which can be directly computed on the DCT coefficients such that it requires $N$ fixed point multiplications. In comparison, a generic matrix multiplication requires $N^3$ floating point multiplications.
3) An approach, which fuses the multi-focus images or video frames in a DCT based image or video coding framework using our Laplacian based focus measure, is proposed. The proposed approach is not only computationally less expensive and hardware-friendly, but also produces state-of-the-art results of multi-focus image fusion for VSN.
The paper is organized as follows. Section II shows how the properties of the Laplacian with proper boundary condition can be exploited to provide a computationally simple solution in the DCT domain, which can be used in multi-focus fusion for VSN. Section III describes the related literature on multi-focus fusion for VSN. Section IV proposes a hardware-friendly, computationally simple focus measure for VSN. Section V describes the step-by-step procedure of the proposed multi-focus image fusion method. Comparative results of multi-focus image fusion for VSN using different techniques, including our proposed one, are presented in Section VI. Finally, Section VII concludes the paper.

II. LAPLACIAN RESPONSE THROUGH EFFICIENT DCT DOMAIN COMPUTATION
In this section, we first discuss the relation of the $N \times N$ Laplacian operator with the DCT. Later, we mathematically show how such a relation can be useful for developing a hardware-friendly, computationally simple solution in the DCT domain to get the Laplacian response.

A. LAPLACIAN RESPONSE FOR A BLOCK BASED OPERATION
Let us first consider a digital double derivative having the smallest length, [−1 2 −1], which is obtained using the digital single derivative [−1 1] twice at consecutive locations. We shall then extend the operator [−1 2 −1] to get a 2D spatial Laplacian operator for block based operation. The digital double derivative (1D Laplacian) operation for a digital 1D signal F having N samples is as follows

$$r(x) = -f(x-1) + 2f(x) - f(x+1) \tag{1}$$

As the operation at the signal sample x is basically high pass filtering, its response provides the high frequency content at that signal sample. If we want to extend the Laplacian response from one sample to N samples at a time, we can represent it in matrix form as follows

$$R = PF \tag{2}$$

where R is an N × 1 column vector, P is an N × N matrix defined as

$$P = \begin{bmatrix} \times & \times & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & \times & \times \end{bmatrix} \tag{3}$$

and F is a column vector of size N × 1 given by

$$F = \begin{bmatrix} f(0) & f(1) & \cdots & f(N-1) \end{bmatrix}^T \tag{4}$$

As the 1D Laplacian operator of the smallest length requires one preceding sample and one succeeding sample for computation at a signal sample, the values of R at r(0) and r(N − 1) cannot be computed. Hence, we use '×' in the P operator to indicate the same, which essentially means that the values at the '×' locations would not be employed. As stated in [29], the operator P is related to the DCT basis functions.
VOLUME 8, 2020
If F is a 2D signal of size N × N, for a location (x, y), we have

$$\frac{\partial^2 F}{\partial x^2} = -F(x-1,y) + 2F(x,y) - F(x+1,y), \qquad \frac{\partial^2 F}{\partial y^2} = -F(x,y-1) + 2F(x,y) - F(x,y+1) \tag{5}$$

The combination $\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is known as the Laplacian operator. For a 2D signal we will get two responses, one for the row-wise operation ($R_r$) and another for the column-wise operation ($R_c$), as follows

$$R_r = PF, \qquad R_c = FP^T \tag{6}$$

where F (the signal) is an N × N matrix. As mentioned already, '×' in P remains unemployed. However, depending on the underlying purpose, certain values at '×' can be more suited than others. In our case, we require the following:
• The values at '×' must yield high frequency upon operation on F, as P is meant to provide high frequency content.
• The values at '×' must allow efficient computation of $R_r$ and $R_c$ in the DCT domain, that is, on the DCT coefficients of F (see Subsection II-B).
To meet the said requirements, we propose the following operator L for an N × N signal block because of its interesting properties, which will be shown in the discussion that follows.

$$L = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix} \tag{7}$$
It is straightforward that L is obtained by replacing [× ×] of P in the first row by [1 −1] and in the last row by [−1 1], both being digital derivatives. The operator L, which contains the digital derivatives and the 1D Laplacian, yields the high frequency content of the signal in the relevant direction. We see that $L = L^T$, and L can be defined by singular value decomposition as follows

$$L = U \Lambda V^T \tag{8}$$

where both U and V are orthogonal matrices and $\Lambda$ has elements $\lambda_{i,j} = \lambda_{i,i}\delta_{i,j}$, $i, j = 1, 2, \ldots, N$, with $\delta_{i,j}$ being the Kronecker delta and $\lambda_{i,i}$ the singular values of L. Here, U and V are such that $\lambda_{i,i} > \lambda_{k,k}$, $\forall i < k$, $i, k = 1, 2, \ldots, N$. In our case, U = V so that $UV^T = I$ and $V^T U = I$. Let us represent $u_{i,j}$ as the $i$-th row and $j$-th column element of the matrix U, and let D denote the N × N DCT matrix with elements $d_{i,k}$. Then

$$\sum_{k=1}^{N} d_{i,k}\, u_{k,j} = \begin{cases} \pm 1, & j = N+1-i \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

The 0 value indicates that the corresponding eigenvectors are orthogonal to the DCT basis vectors, which is the case except at the index $(i, N+1-i)$. At index $(i, N+1-i)$, we get a value 1 when an eigenvector and a DCT basis vector are the same, and a value −1 when they have a 180° phase difference. Due to the above properties, E = DU is an anti-diagonal matrix which has values 1/−1 at index $(i, N+1-i)$. This implies $E^T = U^T D^T$ and, as U = V, we can write $E^T = V^T D^T$. The transpose of an anti-diagonal matrix is another anti-diagonal matrix, so E and $E^T$ are two anti-diagonal matrices. Based on these, we can show that $A = EE^T$ is a diagonal matrix whose diagonal element is $a_{i,i} = e_{i,N+1-i} \times e^T_{N+1-i,i}$. The corresponding proof is provided in Appendix A. Applying (9) and the above result for $a_{i,i}$, we can say that $EE^T = I$. Now, applying the DCT to both sides of (8), we have

$$\mathrm{DCT}(L) = D L D^T = D U \Lambda V^T D^T \tag{10}$$

As E = DU and $E^T = V^T D^T$, we can rewrite the above expression as

$$\mathrm{DCT}(L) = E \Lambda E^T \tag{11}$$

In the above expression, $\Lambda$ is a diagonal matrix. As $\Lambda$ is a diagonal matrix and $E^T$ is an anti-diagonal matrix, we can show that $B = \Lambda E^T$ is an anti-diagonal matrix where the anti-diagonal element is $b_{i,N+1-i} = \lambda_{i,i} \times e^T_{i,N+1-i}$. The corresponding proof is provided in Appendix B. Now, let us denote $\mathrm{DCT}(L) = E \Lambda E^T$ as M, where E and $E^T$ are two anti-diagonal matrices and $\Lambda$ is a diagonal matrix. Then, applying $EE^T = I$, we can say that M is a diagonal matrix whose diagonal element $m_{i,i}$ is given by

$$m_{i,i} = e_{i,N+1-i}\, \lambda_{N+1-i,N+1-i}\, e^T_{N+1-i,i} = \lambda_{N+1-i,N+1-i} \tag{12}$$

The corresponding proof is provided in Appendix C. Again, $\lambda_{i,i} > \lambda_{k,k}$, $\forall i < k$, $i, k = 1, 2, \ldots, N$. So, we can say that the DCT operation on L generates a diagonal matrix M whose diagonal terms are the eigenvalues of L arranged from minimum to maximum starting from the first diagonal element.
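The diagonalization claimed in this subsection can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the orthonormal DCT-II matrix for D, uses hypothetical helper names (`dct_matrix`, `laplacian_operator`, `matmul`, `transpose`), and compares the diagonal of $DLD^T$ against the closed form $2 - 2\cos(k\pi/N)$, which is the standard eigenvalue expression for this boundary condition rather than a formula quoted from the text.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed normalization): row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def laplacian_operator(n):
    """Operator L: [-1 2 -1] rows, with [1 -1] as first row and [-1 1] as last row."""
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        l[i][i] = 2.0
        if i > 0:
            l[i][i - 1] = -1.0
        if i < n - 1:
            l[i][i + 1] = -1.0
    l[0][0] = 1.0
    l[n - 1][n - 1] = 1.0
    return l

def matmul(a, b):
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

N = 8
D = dct_matrix(N)
L = laplacian_operator(N)
M = matmul(matmul(D, L), transpose(D))  # DCT(L) = D L D^T

diag = [M[i][i] for i in range(N)]
# Largest off-diagonal magnitude; it should vanish up to rounding error.
off = max(abs(M[i][j]) for i in range(N) for j in range(N) if i != j)
```

Under these assumptions, `off` is at the level of floating point rounding and `diag` increases with the index, in line with the ordering of the eigenvalues from minimum to maximum.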

B. EFFICIENT DCT DOMAIN COMPUTATION OF LAPLACIAN FOR A BLOCK BASED OPERATION
Any filtering operation on a 2D signal (image block), like the ones used in the computation of our focus measure, requires row-wise and column-wise operations as shown in expression (6). In matrix representation, the row-wise operation can be expressed as C = AB, where any element $c_{i,j}$ is computed as

$$c_{i,j} = \sum_{k=1}^{N} a_{i,k}\, b_{k,j}$$

so that computing C requires $N^3$ multiplications and $N^2(N-1)$ additions. Further, we will have the same amount of computation for the column-wise operation and the additive response computation. Multiplication operations, which are less hardware-friendly than additions, should be kept to a minimum. The focus measure in [19], computed by the filtering operation on an image block, reduces the computations required by one matrix operation, thus involving $N^3$ multiplications instead of $2N^3$. Our choice of operator L, which is defined in (7), is such that DCT(L) is a diagonal matrix as shown in (12). For an image F, let us denote $F_D = \mathrm{DCT}(F)$. To compute the additive response from $T_1 = \mathrm{DCT}(LF)$, we require $N^2$ multiplications due to the operator L. So, there is a reduction in the number of multiplications from $N^3$ to $N^2$. As our DCT domain focus measure is an additive response on a matrix multiplication operation, we can further reduce the required computations. Since $T_1 = \mathrm{DCT}(LF) = M F_D$ with M diagonal and $m_{i,i} \geq 0$, the additive response of $T_1$ is

$$\sum |T_1| = \sum_{i=1}^{N}\sum_{j=1}^{N} \left| m_{i,i}\, F_D(i,j) \right| = \sum_{i=1}^{N} m_{i,i} \sum_{j=1}^{N} \left| F_D(i,j) \right| \tag{13}$$

Similarly, the additive response of $T_2 = \mathrm{DCT}(FL) = F_D M$ is

$$\sum |T_2| = \sum_{j=1}^{N} m_{j,j} \sum_{i=1}^{N} \left| F_D(i,j) \right| \tag{14}$$

It is clear that the above solutions reduce the number of multiplications from $2N^3$ to $2N$, making the computation more hardware-friendly and less computationally expensive.
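To make the saving in multiplications concrete, the sketch below compares the additive responses computed directly from DCT(LF) and DCT(FL) with the shortcut that multiplies each eigenvalue once by a sum of absolute DCT coefficients. It is an illustration under assumptions: the orthonormal DCT-II is used, the block values are arbitrary test data, the additive response is taken as the sum of absolute values, and all helper names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed normalization)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def laplacian_operator(n):
    """Operator L with [1 -1] and [-1 1] as first and last rows."""
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        l[i][i] = 2.0
        if i > 0:
            l[i][i - 1] = -1.0
        if i < n - 1:
            l[i][i + 1] = -1.0
    l[0][0] = 1.0
    l[n - 1][n - 1] = 1.0
    return l

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def abs_sum(a):
    """Additive response: sum of absolute values over all matrix elements."""
    return sum(abs(v) for row in a for v in row)

N = 8
D, L = dct_matrix(N), laplacian_operator(N)
DT = transpose(D)

def dct2(a):
    """2D DCT of a block: D a D^T."""
    return matmul(matmul(D, a), DT)

# Arbitrary deterministic test block (made-up values, not image data).
F = [[((7 * i + 13 * j) % 11) - 5 for j in range(N)] for i in range(N)]

# Direct route: filter in the spatial domain, then transform.
direct_t1 = abs_sum(dct2(matmul(L, F)))
direct_t2 = abs_sum(dct2(matmul(F, L)))

# Shortcut: one multiplication per eigenvalue on the DCT coefficients.
F_D = dct2(F)
m = [2.0 - 2.0 * math.cos(k * math.pi / N) for k in range(N)]  # diagonal of DCT(L)
row_abs = [sum(abs(v) for v in F_D[i]) for i in range(N)]
col_abs = [sum(abs(F_D[i][j]) for i in range(N)) for j in range(N)]
short_t1 = sum(m[i] * row_abs[i] for i in range(N))
short_t2 = sum(m[j] * col_abs[j] for j in range(N))
# Adding the row and column sums first needs only N multiplications in total.
combined = sum(m[k] * (row_abs[k] + col_abs[k]) for k in range(N))
```

The two routes agree up to rounding error, and the `combined` line shows why the total can be obtained with N multiplications when the row and column sums that share a weight are added before multiplying.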

III. RELATED WORKS ON MULTI-FOCUS IMAGE FUSION FOR VSN
A resource-constrained VSN system demands a simple multi-focus image fusion solution generating an all-in-focus image/frame from different multi-focus images/frames captured from geographically distributed camera nodes. As mentioned before, a DCT domain solution that is hardware-friendly, fast, and computationally less intensive is preferable, as it can be incorporated into an image compression or video coding framework.
Tang [16] has proposed two methods in the DCT domain, DCT+Average (DCTav) and DCT+Contrast (DCTcm) for multi-focus image fusion. DCT+Average is the simpler one, where all the DCT coefficients are averaged to generate the fused all-in-focus image. This technique does not exploit the relationship between focused and defocused regions based on the DCT coefficients. Degree of focus which is not used in DCTav, is a measure of sharpness that reflects the high frequency components of an image block in the DCT domain. Considering the importance of focus, a contrast based on the mean amplitude over a frequency band in an image block is proposed. In the fused image, the DC coefficients are averaged, whereas the AC coefficients corresponding to larger contrast value are chosen. Including the above proposals, a number of novel approaches have been proposed and listed in [17], where the author has shown multiple ways to exploit the DCT coefficients for image fusion. In DCT+AC-Max (DCTma), the fused image is generated by choosing the AC coefficients of larger magnitude among the input images. For the DC coefficient in the fused image, the DC values of the input images are averaged. In DCT+Contrast-Modified (DCTch) and DCT+AC-Max-Modified (DCTah), the DC along with the lowest frequency coefficients are averaged in the fused image. The rest of the coefficients from input images are chosen based on high contrast value for DCTch and based on the magnitude of AC coefficients for DCTah. In DCT+Energy-Max (DCTe), the author has proposed the energy of the frequency band of the DCT coefficients as the measure, and the DCT coefficients corresponding to higher frequency band energy are selected in the fused image. But none of the above proposed techniques include spatial information of the image.
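The two simplest selection rules described above can be sketched as follows. This is an illustrative reading of the textual description of DCT+Average and DCT+AC-Max, not code from [16] or [17]; the function names, the toy 2 × 2 blocks, and the tie handling (a tie keeps the coefficient from the first image) are assumptions.

```python
def fuse_dct_average(block_a, block_b):
    """DCT+Average: every fused coefficient is the mean of the two inputs."""
    return [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(block_a, block_b)]

def fuse_dct_ac_max(block_a, block_b):
    """DCT+AC-Max: keep the larger-magnitude AC coefficient, average the DC term."""
    fused = [[a if abs(a) >= abs(b) else b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(block_a, block_b)]
    fused[0][0] = (block_a[0][0] + block_b[0][0]) / 2.0  # DC coefficient
    return fused

# Two toy "DCT blocks" with made-up values; entry [0][0] plays the role of DC.
A = [[10.0, -4.0], [1.0, 0.5]]
B = [[20.0, 3.0], [-2.0, 0.0]]
avg = fuse_dct_average(A, B)
ac_max = fuse_dct_ac_max(A, B)
```

Both rules act on each coefficient independently, which is why, as noted above, they do not take spatial information of the image into account.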
Haghighat et al. [18], [30] have shown that the variance in a local spatial neighbourhood is the same as the variance of normalized DCT coefficients. Such a formulation generates the same result as in the spatial domain without doing spatial domain computations. Based on this, the authors have proposed two techniques, DCT+Variance (DCTv) and DCT+Variance+CV (DCTvc) [18]. The former approach fuses images based on the variance of normalized DCT coefficients, whereas DCTvc adds a consistency verification to DCTv [18] for fusion. Cao et al. [19] formulated the computation of the spatial frequency in the DCT domain and used it for multi-focus image fusion. Based on the spatial frequency in the DCT domain, they proposed DCT+SF (DCTs) and DCT+SF+CV (DCTsc) [19]. The former approach fuses images based on the spatial frequency in the DCT domain, whereas the latter adds a consistency verification [19] to DCTs for fusion. Later, Vakaimalar et al. [21] suggested Min-Max normalization on the DCT coefficients while performing fusion based on spatial frequency. Amin-Naji et al. [22] proposed a multi-focus image fusion technique which fuses the images based on the geometric mean of the largest five eigenvalues obtained through the singular value decomposition (SVD) on the input blocks in the DCT domain. In [20], the authors proposed four approaches, namely DCT+EOL, DCT+VOL, DCT+Corr and DCT+Corr_Eng, for multi-focus fusion. In DCT+EOL and DCT+VOL, the authors proposed the use of the energy of Laplacian and the variance of Laplacian for focus measure computation. A focus measure based on the correlation coefficient between image blocks in the DCT domain is shown to perform well in multi-focus image fusion [23]. The authors in [20] proposed DCT+Corr and DCT+Corr_Eng, which use the correlation coefficient and the energy correlation coefficient between source blocks and artificially blurred blocks as the focus measure.
Further, considering existing consistency verification in the approaches, the authors showed that their techniques were superior in performance. Moreover, the authors also suggested repeated consistency verification for better consistency, and therefore, better performance.

IV. PROPOSED FOCUS MEASURE
In an all-in-focus system, the crucial part is to fuse the most informative and relevant information, that is, the focused areas in multiple images. An appropriate focus measure determines the focused area based on sharpness quantification [25]. As mentioned already, the sum of modified Laplacian (SML) is such an efficient focus measure which is known to perform well.
The application of $\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ on a 2D signal F is known as the Laplacian operation, which involves double derivatives on F in both the x and y directions. If such an operation generates responses of opposite polarities in the two directions, the overall response gets diminished. As the magnitudes of the responses in the two directions quantify the amount of focus, Nayar et al. [25] proposed the sum of modified Laplacian (SML) to measure focus as follows

$$SML = \left|\frac{\partial^2 F}{\partial x^2}\right| + \left|\frac{\partial^2 F}{\partial y^2}\right| \tag{15}$$

We extend it for use with an N × N 2D signal and denote the measure as $SML_L$, defined using our N × N operator L. Note that $SML_L$ is the additive response of $R_r$ and $R_c$, which are defined in (6), where P is to be replaced by L. In this case (using L), the responses $R_r$ and $R_c$ are as follows

$$R_r = LF \tag{16}$$

$$R_c = FL^T \tag{17}$$

where $R_r$ represents $\frac{\partial^2 F}{\partial x^2}$ and $R_c$ represents $\frac{\partial^2 F}{\partial y^2}$ for the N × N block signal F. Now, let us define SML using the L operator for an N × N block in the DCT domain as $SML^{DCT}_L$, given by

$$SML^{DCT}_L = \sum \left( \left|\mathrm{DCT}(R_r)\right| + \left|\mathrm{DCT}(R_c)\right| \right) \tag{18}$$

where the summation $\sum(.)$ is over all the matrix elements, yielding a single value. Combining (16), (17), and $L = L^T$, we have

$$SML^{DCT}_L = \sum \left( \left|\mathrm{DCT}(LF)\right| + \left|\mathrm{DCT}(FL)\right| \right) \tag{19}$$

Now, DCT(LF) is $T_1$ in (13) and DCT(FL) is $T_2$ in (14).
Combining expressions (13), (14) and (19), we have

$$SML^{DCT}_L = \sum_{i=1}^{N}\sum_{j=1}^{N} \left( m_{i,i} + m_{j,j} \right) \left| F_D(i,j) \right| \tag{20}$$

From the above expression, we see that the focus measure is nothing but a weighted sum of the absolute values of the DCT block coefficients, where the weights are the eigenvalues of L. From our formulation in (20), we infer that the SML based focus measure computed in the spatial domain can be directly represented in the DCT domain. Moreover, as discussed in Section II-B, the formulation of (20) reduces the hardware-unfriendly multiplication operations from $2N^3$ to just $N$, since the row and column sums of absolute coefficients that share the weight $m_{i,i}$ can be added before multiplying. Fixed point operations are always more hardware-friendly than floating point operations. To allow fixed point operation in the DCT domain, Binary DCT [31], optimized Integer DCT [32], and efficient Integer DCT [15], [33] are employed in compression/coding frameworks. Inspired by this, we propose to compute the focus measure on the integer valued DCT coefficients. To further improve hardware-friendliness, we modify expression (20) and propose the following as our focus measure (FM)

$$FM = \sum_{i=1}^{N}\sum_{j=1}^{N} \left( \lambda^o_{N+1-i,N+1-i} + \lambda^o_{N+1-j,N+1-j} \right) \left| F^o(i,j) \right| \tag{21}$$

where $\Lambda^o$ and $F^o$ are the nearest integer values of $\Lambda$ and $F_D$, respectively. To summarize, our focus measure is hardware-friendly for the following reasons.
1) For an N × N block, the focus measure computation requires $N$ integer multiplications compared to the $N^3$ floating point multiplications required in DCT+SF [19] and DCT+SF+CV [19].
2) In our formulation of (21), the weights and modified DCT coefficients are all integer valued. Equation (21) can be represented as follows

$$FM = \sum_{i=1}^{N}\sum_{j=1}^{N} w^o_{i,j} \left| F^o(i,j) \right| \tag{22}$$

where $M^o$ is the diagonal matrix holding the nearest integer values of M defined in (12), and $W^o$ is the weight matrix with elements $w^o_{i,j} = m^o_{i,i} + m^o_{j,j}$. The weights $w^o_{i,j}$ for $i \leq 2$ and $j \leq 2$ correspond to lower frequencies, whereas the weights $w^o_{i,j}$ for $i > 2$ and $j > 2$ correspond to high frequencies.
3) All the weights are integer-valued, and the weights either increase or remain the same (monotonically non-decreasing function) with increasing frequency.
As mentioned already, it is well understood that when an image area is not focused, it is blurred. Such areas contain substantially more low frequency content than high frequency content. On the other hand, when an area (except a smooth area) is focused, it is comparatively sharper, comprising high frequency content. Our focus measure takes exactly these aspects into account. The first two properties of $W^o$ basically suggest that the influence of blurriness is discarded, and the third property shows that sharpness is included, with the high frequencies getting higher weights.
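The integer focus measure described in this section can be sketched as below. This is a sketch under stated assumptions rather than the authors' code: the weights are taken as the nearest integers of the closed-form eigenvalues $2 - 2\cos(k\pi/N)$ (a standard result for this operator, not quoted from the text), the weight of coefficient (i, j) is assumed to be the sum of the i-th and j-th integer eigenvalues, and the function names are hypothetical.

```python
import math

def integer_eigen_weights(n):
    """Nearest-integer eigenvalues of L, ordered from lowest to highest frequency."""
    return [int(round(2.0 - 2.0 * math.cos(k * math.pi / n))) for k in range(n)]

def focus_measure(dct_block):
    """Weighted sum of absolute integer DCT coefficients (assumed weight form).

    Uses one integer multiplication per index k, i.e. N multiplications per block.
    """
    n = len(dct_block)
    weights = integer_eigen_weights(n)
    coeffs = [[int(round(v)) for v in row] for row in dct_block]  # integer coefficients
    total = 0
    for k in range(n):
        row_abs = sum(abs(v) for v in coeffs[k])             # row k
        col_abs = sum(abs(coeffs[i][k]) for i in range(n))   # column k
        total += weights[k] * (row_abs + col_abs)
    return total

# Toy 8x8 coefficient blocks with made-up values.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 100.0
high_freq = [[0.0] * 8 for _ in range(8)]
high_freq[7][7] = 10.0
mixed = [[0.0] * 8 for _ in range(8)]
mixed[2][5] = -3.4
```

With these assumptions, a block that holds only a DC term scores zero while a block with energy at the highest frequency scores high, which matches the intent of discarding blur and rewarding sharpness.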

V. PROPOSED MULTI-FOCUS IMAGE FUSION
As mentioned earlier, VSN adopts image compression or video coding for the huge visual data to be transmitted. Figure 1 shows a schematic diagram of an all-in-focus system for VSN [16], [18], [19], where the framework considers a DCT-based compression technique on 8 × 8 image/video frame blocks. Similar to [18], [19], let us consider the JPEG encoder framework, where the input image is divided into 8 × 8 non-overlapping blocks. Then, for an 8-bit image, levels in the image are shifted from [0, 255] to [−128, 127], after which the block DCT transformation is performed. The DCT coefficients are quantized with a pre-defined quantization matrix. The quantized DCT coefficients are then scanned in a zigzag fashion for the application of run-length coding. Finally, the coded DCT coefficients are sent in the form of a bitstream to the receiving end. At the receiving end, the order of the processes is reversed, that is, decoding of the bitstream, dequantization, inverse DCT operation, and level shifting are carried out in that order. In such a compression framework for VSN applications [18], [19], a multi-focus image fusion operation on DCT coefficients is best suited.
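The block-level part of this pipeline (block splitting, level shift, and block DCT, together with their inverses) can be sketched as follows. It is a simplified illustration, not a JPEG implementation: quantization, zigzag scanning and run-length coding are omitted, the orthonormal DCT-II is assumed, image dimensions are assumed to be multiples of 8, and all function names are hypothetical.

```python
import math

BLOCK = 8

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed normalization)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

D = dct_matrix(BLOCK)
DT = transpose(D)

def split_blocks(image):
    """Split an image into 8x8 blocks keyed by (row offset, column offset)."""
    h, w = len(image), len(image[0])
    return {(r, c): [row[c:c + BLOCK] for row in image[r:r + BLOCK]]
            for r in range(0, h, BLOCK) for c in range(0, w, BLOCK)}

def forward_block(pixels):
    """Shift 8-bit levels from [0, 255] to [-128, 127], then apply the 2D DCT."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return matmul(matmul(D, shifted), DT)

def inverse_block(coeffs):
    """Inverse 2D DCT followed by the reverse level shift."""
    shifted = matmul(matmul(DT, coeffs), D)
    return [[int(round(v)) + 128 for v in row] for row in shifted]

# Made-up 16x8 test image and a constant block for a quick check.
image = [[(16 * r + 3 * c) % 256 for c in range(8)] for r in range(16)]
blocks = split_blocks(image)
first = blocks[(0, 0)]
restored = inverse_block(forward_block(first))
flat = forward_block([[136] * BLOCK for _ in range(BLOCK)])
```

Without quantization the round trip recovers the pixel block, and a constant block maps to a single DC coefficient, so any fusion rule applied between `forward_block` and `inverse_block` operates purely on DCT coefficients.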
For simplicity, we will explain the fusion of two images/video frames; an extension to multiple images/video frames is straightforward. Our proposed approach consists of three modules, namely, estimation of a decision map for fusion through our proposed focus measure, refinement of the decision map based on a spatial neighborhood relation, and finally, the fusion based on the refined decision map. The modules of the proposed approach are elaborated below.
FIGURE 2. Initial and refined decision maps generated in our fusion approach to get the fused output.

A. COMPUTATION OF FOCUS MEASURE
The received encoded bitstreams of the two images are first decoded and then dequantized. For the input images, say A and B, let us consider the $(x, y)$-th blocks as $A_{x,y}$ and $B_{x,y}$, respectively. Then, our focus measure (FM) is computed for both the blocks using (21). Let us denote the resulting FM values as $FM^A_{x,y}$ and $FM^B_{x,y}$ for the blocks $A_{x,y}$ and $B_{x,y}$, respectively. Note that a higher value of FM for a block means that it is more focused (less out of focus).

B. ESTIMATION OF INITIAL DECISION MAP
In the decision map, for each block, a decision is assigned in the form of +1/−1, where +1 indicates that at the time of fusion the block $A_{x,y}$ needs to be selected, whereas −1 indicates that $B_{x,y}$ needs to be selected. We generate the initial decision map ($D_M$) as follows,

$$D_M(x,y) = \begin{cases} +1, & FM^A_{x,y} > FM^B_{x,y} \\ -1, & \text{otherwise} \end{cases}$$

Note that the above decision rule makes no provision for an ambiguous case, that is, a case where there may be confusion about which class a block belongs to. Moreover, the map $D_M$ takes the decision based only on the focus measure. So, this initial decision map may suffer from spatial inconsistencies. For example, it may happen that a block belongs to a particular class, whereas most of its surrounding blocks belong to the other class. Thus, a refinement is required, considering local spatial consistency.
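The initial decision map amounts to a per-block comparison of the two focus measure values, as in the following minimal sketch. The function name is hypothetical, and sending ties to image B is an assumption made only so that every block receives a +1/−1 label.

```python
def initial_decision_map(fm_a, fm_b):
    """Return +1 where the block of A has the higher focus measure, else -1.

    Ties are assigned to B here; this tie rule is an assumption of the sketch.
    """
    return [[1 if a > b else -1 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(fm_a, fm_b)]

# Made-up focus measure values for a 2x3 grid of blocks.
fm_a = [[12, 3, 7], [0, 9, 5]]
fm_b = [[4, 8, 7], [2, 1, 6]]
d_m = initial_decision_map(fm_a, fm_b)
```

Each entry depends only on its own block, which is why isolated wrong labels can appear and a neighborhood-based refinement is needed.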

C. REFINED DECISION MAP
Inconsistencies in the initial decision map may lead to the generation of unwanted artifacts in the output image. The common consistency verification adopted in a few recent DCT based all-in-focus systems is the majority filter [10], [18], [19], [34]. In it, the majority in the neighborhood decides the class of the center block being operated on. Sometimes [18] such an operation is carried out twice sequentially for better consistency in the decision map, and we also do so, but in a different manner. In our case, we do a moving summation operation on the initial decision map to generate a modified decision map, followed by another moving summation operation on the sign values of the modified decision map to generate the final refined decision map. Therefore, the refined decision map $R_M$ is obtained as follows

$$D'_M(x,y) = \sum_{p=-1}^{1}\sum_{q=-1}^{1} D_M(x+p,\, y+q), \qquad R_M(x,y) = \sum_{p=-1}^{1}\sum_{q=-1}^{1} \mathrm{sgn}\left( D'_M(x+p,\, y+q) \right)$$

where we consider the neighboring (8-neighbor) blocks around the current central block of operation to incorporate the local spatial influence. The above operations in our refined decision map are computationally simple, as elaborated in Section VI-C. Figure 2 shows examples of the decision maps.
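The two-pass refinement described above can be sketched as two moving sums with a sign operation in between. The sketch makes assumptions that are not stated in the text: the window contains the central block together with its 8 neighbors, positions outside the map are simply skipped at the borders, and the function names are hypothetical.

```python
def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def moving_sum(grid):
    """Sum each entry with its 8 neighbours; positions outside the grid are skipped."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            out[x][y] = sum(grid[p][q]
                            for p in range(max(0, x - 1), min(rows, x + 2))
                            for q in range(max(0, y - 1), min(cols, y + 2)))
    return out

def refine_decision_map(initial):
    """Moving sum, then a second moving sum on the signs of the first result."""
    modified = moving_sum(initial)
    signs = [[sign(v) for v in row] for row in modified]
    return moving_sum(signs)

# A 5x5 map that selects A everywhere except one isolated block in the centre.
initial = [[1] * 5 for _ in range(5)]
initial[2][2] = -1
refined = refine_decision_map(initial)
```

In this example the isolated −1 decision is overruled by its neighborhood, so every entry of `refined` is positive. Only additions and sign checks are involved, which is in keeping with the computational simplicity noted above.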

D. FUSION
The inconsistencies in the initial decision map have now been handled by the refinement operation. Then, based on the refined decision map, we fuse the images by choosing each block either from A or from B. So, the fusion is as follows,

$$\mathrm{Fused}_{x,y} = \begin{cases} A_{x,y}, & R_M(x,y) > 0 \\ B_{x,y}, & \text{otherwise} \end{cases}$$

Finally, the resulting DCT coefficients are quantized and coded, after which they are sent in the form of a bitstream to the receiving end, as mentioned earlier.
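Block selection from the refined map can be sketched in a few lines. The function name is hypothetical, the blocks are represented by placeholder strings instead of 8 × 8 coefficient arrays, and the rule that a non-positive map value selects B is an assumption of this sketch.

```python
def fuse_blocks(blocks_a, blocks_b, refined_map):
    """Pick the block of A where the refined map is positive, else the block of B."""
    return [[a if r > 0 else b for a, b, r in zip(row_a, row_b, row_r)]
            for row_a, row_b, row_r in zip(blocks_a, blocks_b, refined_map)]

# Placeholder blocks: strings stand in for 8x8 arrays of DCT coefficients.
blocks_a = [["A00", "A01"], ["A10", "A11"]]
blocks_b = [["B00", "B01"], ["B10", "B11"]]
refined_map = [[9, -3], [-6, 4]]
fused = fuse_blocks(blocks_a, blocks_b, refined_map)
```

Because whole blocks of DCT coefficients are copied, no inverse transform is needed before the fused blocks are quantized and coded.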

VI. EXPERIMENTAL RESULTS AND DISCUSSION
To analyze the effectiveness of our approach, we compare the proposed approach with existing techniques applicable to VSN, such as DCTav [16], [17], DCTcm [16], [17], DCTma [17], DCTe [17], DCTah [17], DCTch [17], DCTv [18], DCTvc [18], DCTs [19], DCTsc [19], DCT+SVD [22], DCT+SVD+CV [22], DCT+EOL [20], DCT+EOL+CV [20], DCT+VOL [20], DCT+VOL+CV [20], DCT+Corr_Eng [20] and DCT+Corr_Eng+CV [20]. For our approach, we present results using only the initial decision map (Proposed) and using the refined decision map (Proposed+CV). We further compare our approach with a few of the latest generic multi-focus image fusion techniques that represent the state-of-the-art, such as MFCNN [37], MFGFDF [38], MFRW [39], MADCNN [40] and ECNN [41]. First, we perform quantitative and subjective evaluation with multi-focus image fusion techniques for VSN. For quantitative evaluation, we perform no-reference evaluation, where outputs are evaluated with standard no-reference measures, as the ground truths are not available. Then, we perform a computation time evaluation on three image sizes with recent state-of-the-art techniques. For subjective evaluation, we present the visual comparison of the output images generated by the different techniques. We further present the initial and refined decision maps generated by the recent state-of-the-art techniques and our approach. Finally, we discuss the hardware friendliness and computational simplicity of the proposed approach, which make it suitable for VSN. In addition, we evaluate our approach quantitatively and qualitatively, comparing it to the state-of-the-art and the latest generic multi-focus image fusion techniques.

A. QUANTITATIVE EVALUATION
As the ground truths of actual multi-focus images are not available, most of the techniques rely on evaluating the algorithm based on standard no-reference measures [28], [42]-[49]. Therefore, we evaluate the performance of the approaches on the standard multi-focus image database provided by the authors in [35] and on Lytro [36], based on the following standard no-reference quality measures for image fusion:
• FMI ($Q_{FMI}$) [43], [44]: The FMI of Haghighat et al. is an improved version of the widely accepted mutual information based measures for image fusion given by [50] and [51]. FMI measures the amount of information transferred from each of the input images to the fused image.
• Xydeas and Petrovic measure ($Q^{AB/F}$) [45]: Xydeas and Petrovic have proposed a no-reference quality measure, which includes the universal quality index of Wang and Bovik [52]. The measure computes the amount of gradient information (edge) transferred from each of the input images to the fused image. The measured value ranges from 0 to 1, with 0 indicating the worst result and 1 the best fused result.
• Piella and Heijmans measure ($Q_W$) [46]: Piella and Heijmans have proposed a no-reference quality measure, which also includes the universal quality index of Wang and Bovik [52]. The measure computes the amount of salient information transferred from each of the input images to the fused image without introducing distortion. The measure also includes the similarity index [53] and human visual system sensitive edge information. The higher the value of the measure, the better the fused output.
Table 1 presents the average results obtained for the 18 pairs of images from the database provided in [35]. One image from each pair is shown in Figure 3. We also present the average results obtained for the 20 pairs of images from the Lytro database [36]. One image from each pair is shown in Figure 4. We consider four individual images, namely, Girl, Grass, Temple and Lytro-16, to present their quantitative performance in Tables 3, 4, 5 and 6. The results show that our 'Proposed+CV' outperforms the rest in terms of all the six measures, except in terms of $Q_W$ and $Q^{AB/F}$ in Table 5 and in terms of $Q_W$ in Table 6, where our 'Proposed+CV' performs very close to the best. The above quantitative evaluation on two databases and four individual multi-focus image pairs shows the superiority of our approach.
FIGURE 5. Source images: The 'Newspaper' pair of images from the database given by [35] and multi-focus image fusion with different techniques.
FIGURE 6. Source images: The 'Book' pair of images from the database given by [35] and multi-focus image fusion with different techniques.
All the techniques are run on the Matlab® platform in a system with an Intel® Core(TM) i5-4590 CPU @ 3.30 GHz having 16 GB RAM. Now, we present the computation time comparison on three image sizes with recent state-of-the-art techniques in Table 7. The result shows that our approach with the initial decision map (Proposed) is faster than the rest of the recent state-of-the-art techniques. It also shows that Proposed+CV is faster than all the other techniques shown here except for our approach with the initial decision map (Proposed). It further shows that the superiority of our approach is more evident with the increase in image size.
TABLE 7. Comparison of CPU processing time (in seconds) of a few recent state-of-the-art multi-focus image fusion techniques (best in bold and second best in italic).

B. SUBJECTIVE EVALUATION
In Figures 5, 6, 7, 8, 9 and 10, we perform subjective evaluation of six standard image pairs, namely, Newspaper, Book, Grass, Temple, Girl, and Pepsi, respectively, from the standard multi-focus image fusion database [35]. We compare our approach with the best performing and recent/state-of-the-art techniques like DCTv [18], DCTvc [18], DCTs [19], DCTsc [19], DCT+SVD [22], DCT+SVD+CV [22], DCT+EOL [20], DCT+EOL+CV [20], DCT+VOL [20], DCT+VOL+CV [20], DCT+Corr_Eng [20] and DCT+Corr_Eng+CV [20]. 'Newspaper' is one of the critical image pairs, where our 'Proposed' performs as well as or better than other techniques without consistency verification (CV). Further, the 'Proposed+CV' performs as well as or better than other techniques with or without consistency verification (CV). 'Book' is another critical image pair, where we can see that our approach performs the fusion properly. We can see that our approach with consistency verification (CV) is better than a few approaches like DCTvc, DCTsc, etc., and produces fewer artifacts (see the text 'terms' in the image). A similar performance can be seen from the results of the other test image pairs, where our approach performs as well as or better than the rest of the techniques.
We present comparative subjective results of the initial and final maps generated by recent/state-of-the-art techniques including ours in Figure 11. Note that the final decision maps of our approach and DCT+SVD+CV are binary, as both perform fusion by selecting a block from either of the two images, whereas DCT+EOL+CV, DCT+VOL+CV and DCT+Corr_Eng+CV have gray final maps, as the selection is either a block in one of the images or an average of the two. The figure shows that our decision maps determine the focused area with limited noise, and that our initial and final maps are as good as or better than those of the state-of-the-art techniques.

C. HARDWARE-FRIENDLY SOLUTION
Quantitative and subjective evaluations show that our approach is as good as or better than the other techniques. But as discussed in Sections I and II, VSN demands a computationally simple, hardware-friendly solution, which is our main motivation.
As shown in Section IV, our focus measure computation requires $N$ fixed-point multiplications, as opposed to the $N^3$ floating-point multiplications required by DCTs and DCTsc on an $N \times N$ block. Moreover, our focus measure does not require the $N^2$ floating-point squaring operations needed to calculate $F_D^2$ (see expression (21), where $F_D$ represents the DCT-transformed version of the spatial-domain signal $F$), as required in DCTs or DCTv. On the other hand, techniques like DCT+EOL, DCT+VOL and DCT+Corr_Eng require six non-diagonal matrix multiplication operations. Further, DCT+VOL and DCT+EOL require floating-point element-wise squaring operations, and DCT+Corr_Eng requires floating-point element-wise multiplications, divisions and square root operations. DCT+SVD requires singular value decomposition, whose computational complexity is in the order of $N^2$. Thus, in comparison to the existing state-of-the-art techniques, our approach requires only $N$ fixed-point multiplications. Such a reduction is a huge benefit in terms of energy consumption, and therefore, suits resource-constrained VSN.
FIGURE 8. Source images: The 'Temple' pair of images from the database given by [35] and multi-focus image fusion with different techniques.
Apart from the focus measure, the state-of-the-art fusion techniques involve consistency verification through a decision map. The refinement of the decision map is performed through a filtering operation, and the final decision map is a non-binary map for techniques like DCTsc, DCT+EOL+CV, DCT+VOL+CV and DCT+Corr_Eng+CV. On the other hand, our modified decision map computation (see Section V-B) is a filtering operation that requires integer multiplication and addition on the initial decision map.
The refined decision map computation requires a similar filtering operation on the sign information of the modified decision map. The last part of the process is fusion, where techniques like DCTsc, DCT+EOL+CV, DCT+VOL+CV, and DCT+Corr_Eng+CV either select a block from one of the two images or average the two blocks to generate the fused image. In contrast, our Proposed+CV (see Section V-D) only requires a move operation based on the sign bit of the decision map elements. Therefore, each module of our proposed approach is computationally simple and is very much hardware-friendly, especially when compared with techniques like DCTs, DCTsc, DCTv, DCTvc, DCT+SVD, DCT+SVD+CV, DCT+EOL, DCT+EOL+CV, DCT+VOL, DCT+VOL+CV, DCT+Corr_Eng, and DCT+Corr_Eng+CV.
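The integer-only pipeline described above can be illustrated with a minimal Python sketch. It is not the implementation of Sections V-B to V-D: the 3 × 3 all-ones kernel, the zero padding, the tie rule in `sign`, and the choice of the initial map as a signed difference of integer focus measures are all assumptions made for illustration, and the helper names `box_sum_3x3` and `fuse_by_sign` are hypothetical.

```python
def box_sum_3x3(grid):
    """Integer 3x3 neighbourhood sum with zero padding (kernel is an assumption)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[r][c] += grid[rr][cc]
    return out


def sign(value):
    """Return +1 for non-negative values and -1 otherwise (tie rule is an assumption)."""
    return 1 if value >= 0 else -1


def fuse_by_sign(blocks_a, blocks_b, focus_a, focus_b):
    """Select each block from A or B using only the sign of a filtered decision map."""
    # Hypothetical initial map: signed difference of integer focus measures.
    initial = [[fa - fb for fa, fb in zip(row_a, row_b)]
               for row_a, row_b in zip(focus_a, focus_b)]
    # Modified map: integer filtering of the initial map.
    modified = box_sum_3x3(initial)
    # Refined map: the same filtering applied to the sign information.
    refined = box_sum_3x3([[sign(v) for v in row] for row in modified])
    # Fusion: a move operation steered by the sign of each refined entry.
    fused = [[a if d >= 0 else b for a, b, d in zip(row_a, row_b, row_d)]
             for row_a, row_b, row_d in zip(blocks_a, blocks_b, refined)]
    return fused, refined


# Toy example: image A is better focused everywhere except one noisy block.
focus_a = [[5, 5, 5], [5, 1, 5], [5, 5, 5]]
focus_b = [[2, 2, 2], [2, 3, 2], [2, 2, 2]]
blocks_a = [["A"] * 3 for _ in range(3)]
blocks_b = [["B"] * 3 for _ in range(3)]
fused, refined = fuse_by_sign(blocks_a, blocks_b, focus_a, focus_b)
```

In this toy case the isolated block that initially favours image B is outvoted by its neighbours, so every block is taken from image A, and no floating-point arithmetic or block averaging is involved.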
To summarize our observations, quantitative and subjective evaluations show that our approach is as good as or better than state-of-the-art techniques. Computation time comparison shows the superiority of our approach. The hardware-friendly solution in DCT domain indicates that our proposed technique is best suited for resource-constrained VSN compared to other state-of-the-art techniques.

D. COMPARISON WITH GENERIC MULTI-FOCUS IMAGE FUSION
VSN demands real-time performance with hardware-friendly implementation in the DCT domain. There is a plethora of techniques [37]- [40], [54]- [56] proposed for generic multi-focus image fusion, which are not DCT based. These techniques are generally computationally intensive and require considerably more time to process. A few of the techniques also require a graphics processing unit (GPU). Thus, most of these techniques are not suitable for VSN.
We perform no-reference based quantitative evaluation of a few of the latest generic multi-focus image fusion techniques that represent the state-of-the-art, such as MFCNN [37], MFGFDF [38], MFRW [39], MADCNN [40] and ECNN [41]. The results are presented in Tables 8 and 9, analogous to Tables 1 and 2, respectively. From the tables we see that, barring one existing technique for each database, no single technique performs the best with respect to more than one no-reference quality measure. Our approach gives the best $Q^{AB/F}$ performance for the database provided in [35] and the best $Q_{FMI}$ performance for the Lytro database. We also show quantitative evaluation on an individual image from Lytro in Table 10. We see similar performance (except that MFRF performs better in two of two measures). Hence, we can infer that the performance of our approach is comparable to the state-of-the-art in generic multi-focus image fusion, in spite of being fast, hardware-friendly and computationally simple, and most importantly, suitable for VSN.
FIGURE 10. Source images: The 'Pepsi' pair of images from the database given by [35] and multi-focus image fusion with different techniques.
We also perform subjective comparison with the generic techniques and present the results in Figures 12, 13 and 14. The results show that all the generic multi-focus image fusion techniques perform similarly. The results also show that our 'Proposed+CV' performs similarly to the generic multi-focus image fusion techniques.
We compare the computation time (in terms of CPU time) of our approach with that of generic multi-focus image fusion techniques such as MFCNN [37], MFGFDF [38] and MFRW [39]; the results are presented in Table 11. The techniques including ours are run on the Matlab® platform in a system with an Intel® Core(TM) i5-4590 CPU @ 3.30 GHz having 16 GB RAM. MADCNN and ECNN are not considered as their codes are not available for the said platform.
The results show that the time taken by the generic techniques is considerably high. It is also evident that, on the Matlab platform with the said specification, our approach can fuse video frames of size 256 × 256, 512 × 512 and 1024 × 1024 at 62 fps, 16 fps and 4 fps, respectively.

VII. CONCLUSION
The paper has shown that the DCT basis function and the block Laplacian operator are related in such a way that the DCT operation on the block Laplacian operator with proper boundary condition generates a diagonal matrix. The property is exploited to propose a novel focus measure, which can directly operate on the DCT coefficients to detect the focused region of an image. Such a solution is well suited for visual sensor network (VSN). Therefore, we propose an approach for fusing multi-focus images or video frames in a DCT-based image or video coding framework for VSN.
The quantitative and qualitative evaluations show that our proposed approach outperforms all the techniques designed for VSN. Moreover, being simple, fast and hardware-friendly, our approach is suitable for resource-constrained VSN. In addition, our approach performs similarly to the state-of-the-art generic techniques for multi-focus image fusion.
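The diagonalization property summarized above can be checked numerically with a short Python sketch. It assumes a 1-D second-difference Laplacian with symmetric (Neumann) boundary rows, a positive-definite sign convention, and an orthonormal DCT-II matrix; the exact operator, sign and normalization used in the paper may differ, and the helper names are hypothetical.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def neumann_laplacian(n):
    """Second-difference matrix with symmetric boundary rows (assumes n >= 2)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] = 2.0
        if i > 0:
            lap[i][i - 1] = -1.0
        if i < n - 1:
            lap[i][i + 1] = -1.0
    lap[0][0] = lap[n - 1][n - 1] = 1.0  # boundary condition (an assumption)
    return lap


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


N = 8
C = dct2_matrix(N)
D = matmul(matmul(C, neumann_laplacian(N)), transpose(C))

# Eigenvalues of this particular Laplacian: 2 - 2 cos(pi k / N).
eigenvalues = [2.0 - 2.0 * math.cos(math.pi * k / N) for k in range(N)]
max_off_diagonal = max(abs(D[i][j]) for i in range(N) for j in range(N) if i != j)
max_diagonal_error = max(abs(D[k][k] - eigenvalues[k]) for k in range(N))
```

Under these assumptions `D` is diagonal up to rounding error, so applying the Laplacian in the DCT domain reduces to one scalar multiplication per coefficient along each dimension.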

APPENDIX A
To Prove: If $A = XZ$, where $X$ and $Z$ are two anti-diagonal matrices of size $N \times N$, then $A$ is a diagonal matrix whose diagonal element $a_{i,i} = x_{i,N+1-i} \times z_{N+1-i,i}$.
Proof: For anti-diagonal matrices $X$ and $Z$, let $g_{i,j}$ denote an element $x_{i,j}$ or $z_{i,j}$; by definition, $g_{i,j} = 0$ whenever $j \neq N+1-i$. To avoid confusion, let us assume that every anti-diagonal entry $g_{i,N+1-i}$ is nonzero. So, the above matrices have values only at the $(i, N+1-i)$ index points. Now, for $A = XZ$, $a_{i,j} = \sum_{n=1}^{N} x_{i,n} \times z_{n,j}$. We know that $x_{i,n} = 0$ except at the index $(i, N+1-i)$. Therefore, the only term of $a_{i,j}$ that can be nonzero is $x_{i,N+1-i} \times z_{N+1-i,j}$. But $z_{N+1-i,j} = 0$ except at $j = N+1-(N+1-i) = i$. This implies that we can have nonzero values only if $i = j$. Therefore, $a_{i,j} = 0$ except for $i = j$, which implies $A$ is a diagonal matrix having $a_{i,i} = x_{i,N+1-i} \times z_{N+1-i,i}$. The proof holds for all values of the anti-diagonal entries $g_{i,N+1-i}$.
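As a quick sanity check of the statement above, the following Python sketch multiplies two small anti-diagonal matrices and compares the result with the predicted diagonal. The matrices and helper names are hypothetical, and the code uses 0-based indices, so the entry $(i, N+1-i)$ of the text becomes `[i][n - 1 - i]`.

```python
def anti_diagonal(values):
    """Build an anti-diagonal matrix whose row i holds values[i] at column n-1-i."""
    n = len(values)
    matrix = [[0] * n for _ in range(n)]
    for i, value in enumerate(values):
        matrix[i][n - 1 - i] = value
    return matrix


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


x_vals = [1, 2, 3, 4]  # anti-diagonal entries of X, listed by row
z_vals = [5, 6, 7, 8]  # anti-diagonal entries of Z, listed by row
n = len(x_vals)

A = matmul(anti_diagonal(x_vals), anti_diagonal(z_vals))

# Predicted diagonal: a[i][i] = x[i][n-1-i] * z[n-1-i][i].
predicted = [x_vals[i] * z_vals[n - 1 - i] for i in range(n)]
off_diagonal_is_zero = all(A[i][j] == 0 for i in range(n) for j in range(n) if i != j)
```

Here `A` has `predicted` on its diagonal and zeros elsewhere, in agreement with the proof.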

APPENDIX B
To Prove: If $B = YZ$, where $Y$ is a diagonal matrix and $Z$ is an anti-diagonal matrix of size $N \times N$, then $B$ is an anti-diagonal matrix, whose anti-diagonal element $b_{i,N+1-i} = y_{i,i} \times z_{i,N+1-i}$.
Proof: To avoid confusion, let us assume that every diagonal term of the diagonal matrix $Y$ is nonzero. Thus, $y_{i,j} = 0$ except for $i = j$. Now, for $B = YZ$, any element $b_{i,j} = \sum_{n=1}^{N} y_{i,n} \times z_{n,j}$.
As $y_{i,n} = 0$ except for $n = i$, the only term of $b_{i,j}$ that can be nonzero is $y_{i,i} \times z_{i,j}$. On the other hand, $z_{i,j} = 0$ except at the index $(i, N+1-i)$. Therefore, $b_{i,j}$ will have a nonzero value only at the index $(i, N+1-i)$. This implies $B$ is an anti-diagonal matrix having $b_{i,N+1-i} = y_{i,i} \times z_{i,N+1-i}$. The proof holds for any values of the diagonal terms.
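The result above means that the product $YZ$ can be formed with $N$ scalar multiplications instead of a full matrix product. The following Python sketch, with hypothetical values and helper names and 0-based indices, compares this closed form against the dense product.

```python
def closed_form_anti_diagonal(y_diag, z_anti):
    """N scalar products: entry i is y[i][i] * z[i][n-1-i], as in Appendix B."""
    return [y * z for y, z in zip(y_diag, z_anti)]


def dense_product(y_diag, z_anti):
    """Reference: build Y and Z explicitly and multiply them the long way."""
    n = len(y_diag)
    Y = [[y_diag[i] if i == j else 0 for j in range(n)] for i in range(n)]
    Z = [[z_anti[i] if j == n - 1 - i else 0 for j in range(n)] for i in range(n)]
    return [[sum(Y[i][k] * Z[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


y_diag = [2, 3, 4, 5]  # diagonal entries of Y
z_anti = [5, 6, 7, 8]  # anti-diagonal entries of Z, listed by row
n = len(y_diag)

B = dense_product(y_diag, z_anti)
b_anti = closed_form_anti_diagonal(y_diag, z_anti)

matches = all(B[i][j] == (b_anti[i] if j == n - 1 - i else 0)
              for i in range(n) for j in range(n))
```

The dense product needs on the order of $N^3$ multiplications, whereas the closed form needs only $N$; both give the same anti-diagonal matrix in this example.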

APPENDIX C
To Prove: If $C = XYZ$, where $X$ and $Z$ are two anti-diagonal matrices and $Y$ is a diagonal matrix of size $N \times N$, then $C$ is a diagonal matrix such that the element $c_{i,i} = x_{i,N+1-i} \times y_{N+1-i,N+1-i} \times z_{N+1-i,i}$.
Proof: From the proof in Appendix B, $B = YZ$ is an anti-diagonal matrix with $b_{i,N+1-i} = y_{i,i} \times z_{i,N+1-i}$. Again, $X$ is an anti-diagonal matrix and from Appendix A, we can say that $C$ is a diagonal matrix, where $C = XB$ and $c_{i,i} =$