Multi-Focus Color Image Fusion Algorithm Based on Super-Resolution Reconstruction and Focused Area Detection

Multi-focus image fusion is an image processing technique that generates an integrated image by merging multiple images of the same scene that have different focus areas. For most fusion methods, the detection of the focus area is a critical step. In this paper, we propose a multi-focus image fusion algorithm based on a dual convolutional neural network (DualCNN), in which the focus area is detected from super-resolved images. Firstly, the source image is input into a DualCNN to restore the details and structure from its super-resolved image, as well as to improve the contrast of the source image. Secondly, the bilateral filter is used to reduce noise in the super-resolved image, and the guided filter is used to detect the focus area of the image and refine the decision map. Finally, the fused image is obtained by weighting the source images according to the decision map. Experimental results show that our algorithm retains image details well and maintains spatial consistency. Compared with existing methods in multiple groups of experiments, our algorithm achieves better visual perception according to both subjective evaluation and objective indexes.


I. INTRODUCTION
Due to the limitation of the depth of field in the optical lens, objects at different distances in the same scene cannot be fully focused by cameras. The area within the depth of field is usually a sharp focus area, while the area outside the depth of field is usually a blurry defocus area [1]. Multi-focus image fusion technology is used to extract different focus areas from multiple images in the same scene to synthesize a clear image.
The fusion process can also be regarded as a way of improving image quality [2], since clear all-focus images are better suited to human visual perception and to post-processing.
The associate editor coordinating the review of this manuscript and approving it for publication was Huazhu Fu.
Generally, existing image fusion algorithms can be divided into pixel-level fusion, feature-level fusion and decision-level fusion [3]. Pixel-level image fusion directly processes the individual pixels of the images to generate the fusion result. Feature-level image fusion works by extracting edge, texture and structure information from an image. The region-based image fusion algorithm is an important method of feature-level fusion [4]. Decision-level image fusion makes use of the feature information from feature-level fusion according to the specific requirements and then makes the optimal decision directly according to certain criteria and the credibility of each decision [5]. Pixel-level image fusion is the most fundamental of the three levels, and it is the focus of this paper.
Pixel-level image fusion algorithms are generally divided into three categories: methods based on the transform domain, methods based on the spatial domain and methods based on sparse representation [6]. Transform-domain methods are generally implemented in three steps: multi-scale decomposition of the image, fusion of the coefficients generated by the transformation, and multi-scale reconstruction from the fused coefficients. Representative algorithms include image fusion based on the Laplacian pyramid transform [7], discrete wavelet transform [8], curvelet transform [9], contourlet transform [10] and shearlet transform [11]. Spatial-domain methods include image fusion based on max-min filtering [12], image block matching [13], the guided filter [14], the dense scale-invariant feature transform [15] and boundary discovery [16]. Methods based on sparse representation include multi-focus image fusion based on sparse representation [17], convolutional sparse representation [18], dictionary sparse representation [19], and multi-scale transform combined with sparse representation [20]. Although these methods can improve the fusion quality to a certain extent, they still cannot obtain the optimal fusion result because of artifacts caused by low contrast and block effects.
In recent years, with the significant improvement in computing power, convolutional neural networks (CNNs) have been widely used in image fusion, image segmentation, image classification, image recognition, image denoising and other fields. CNNs demonstrate powerful automatic feature learning through their multi-layer network structure [21]-[25]. As an end-to-end model, a CNN can learn features at multiple levels by varying its depth. A shallow network has convolutional layers with a small receptive field and can only learn features of local areas, such as the structure of an image. A deeper network has a larger receptive field and can learn more abstract features, such as image detail textures. Liu et al. [21] pioneered the use of convolutional neural networks in image fusion. They designed a dual network with two-channel weights to classify the focused and defocused pixels of the source image and obtain the initial decision map, which they then segmented and optimized with post-processing algorithms such as consistency verification. Tang et al. [26] proposed a pixel-level convolutional neural network to identify focused and unfocused pixels. They also created a multi-focus image training set that simulates different focus situations through masks with 12 geometric models. Mustafa et al. [27] proposed a multi-focus image fusion algorithm with a multi-scale convolutional neural network framework. In addition to the basic convolutional layer, they introduced a set of 3 convolutional layers with different kernel sizes and added skip connections between convolutional layers to extract more features from multi-focus images. Ma et al. [28] proposed a boundary sensing method based on the residual network, which extracts features directly from the source image through a dual network and refines the decision map by post-processing.
Because multi-focus image fusion synthesizes the focused areas of multiple images of the same scene, the quality of the fused image depends largely on the successful detection of the focus area. The authors of [29] proposed a multi-focus image fusion algorithm based on multi-scale weighted gradients. This algorithm uses a large scale to solve the fusion problems caused by anisotropic blur and registration error, and a small scale to determine the focus area boundary. In [30], Ma et al. proposed a two-scale multi-focus image fusion algorithm based on an enhanced random walk algorithm. This method runs a random walk algorithm at two scales to detect focus areas. Although it improves the accuracy of detecting focused and defocused areas, a small number of incorrectly recognized areas still appear when processing multi-focus images with complex backgrounds. In [15], the authors proposed a multi-focus image fusion algorithm based on the dense scale-invariant feature transform. This algorithm uses a sliding window to extract local features of an image and detect its focused areas. Although this method can improve the fusion quality, artifacts may still be generated. In order to effectively eliminate noise and reduce artificial textures, the authors of [31] introduced the guided filter for focus area detection. Because the guided filter preserves the structure and edges of an image, the fusion algorithm proposed in [31] can effectively enhance image details. We therefore also employ a guided filter for focus area detection in this work.
The above methods usually detect the focus area from the original images, which often leads to a large boundary error in the focus area. The problem is more serious for color image fusion. In order to better detect the focus area, many scholars apply super-resolution to each source image to recover the image details lost during acquisition and then detect the focus area, so as to obtain a better fusion result. In [32], Aymaz et al. were the first to introduce an image super-resolution algorithm into multi-focus image fusion. They used a super-resolution algorithm based on bi-cubic interpolation to retain more details of the original image in image fusion. In [33], Yang et al. proposed a multi-resolution image fusion algorithm based on super-resolution via a convolutional neural network, which uses an end-to-end CNN mapping to improve the quality of the source image. In general, image super-resolution can be implemented by two types of methods: simple interpolation methods and neural-network-based methods. Simple interpolation methods in common use include nearest-neighbor interpolation, bi-cubic interpolation and bilinear interpolation [32]. Bi-cubic interpolation, for example, interpolates using the relationship between adjacent pixels. This method is particularly good at retaining edge details, but as a linear model it is still insufficient to restore high-frequency details. Neural-network-based methods learn a non-linear feature mapping from the low-resolution input image to the high-resolution target image, which can recover more high-frequency details. With the development of convolutional neural networks, CNNs have been widely used in image super-resolution. In [34], the authors proposed a deep learning method for single-image super-resolution (named SRCNN).
SRCNN has a lightweight structure, which makes it fast enough for practical on-line use. However, its structure is relatively simple: only three convolutional layers are used for feature extraction, so some image details are lost. In [35], in order to extract more detailed image information, a 20-layer deep residual network was introduced into super-resolution image reconstruction. The residual structure not only speeds up the convergence of model training, but also gives a deep network structure with a wider receptive field. Although residual learning can better retain high-frequency details, it still suffers from low-frequency errors in the structure. In order to better restore the image, we use the DualCNN in [36] for image super-resolution reconstruction. DualCNN is divided into two branches: the deep network is used to estimate the details, and the shallow network is used to estimate the structure. By learning the non-linear feature mapping between input and output, a high-quality image is finally generated.
In multi-focus image fusion, the resolution of the source images has a great influence on the quality of the fused image. Low-resolution images lead to poor fusion effect in the image fusion processing. The super-resolution reconstruction can not only make the image contain more image information, but also improve the quality of the image. Simultaneously, the performance of multi-focus image fusion is affected by the accuracy of image focus area detection. The higher the sharpness of the image is, the more accurately the focus area can be identified, which can improve the fusion effect. Therefore, we use a super-resolution image to improve the quality of source images. Color images contain rich colors and texture information. However, the defocus areas in color images have poor edge and texture information. Through super-resolution reconstruction, we can restore the edge and texture information in defocus area of color images, and can obtain better fusion performance.
To obtain a better color fusion image, we propose a color image fusion algorithm based on image super-resolution reconstruction. Firstly, we use DualCNN for super-resolution reconstruction of the color images. Secondly, in order to suppress the noise generated by super-resolution, we introduce bilateral filtering to denoise the reconstructed super-resolution images and optimize the image edges. Then, difference images are generated from the super-resolution reconstructed images and the source images. The initial focus area is detected by combining these difference images with the guided filter, which gives the initial decision map. The final decision map is then obtained by a small area removal strategy and refinement with the guided filter, and the fused image is obtained from it. The main contributions of the proposed method can be summarized as follows.
(1) Super-resolution reconstruction based on DualCNN is applied to enhance the detail information and improve the resolution of the color images, which improves their clarity and increases the accuracy of focus area recognition, so that better fused images are generated.
(2) The bilateral filter is used to suppress the noise in the super-resolution images, and guided filtering is used to enhance the image details in focus area detection, which maintains the edge structure and the spatial consistency of the images. Compared with other algorithms, the proposed method effectively avoids the artifacts generated after fusion. Our algorithm is very efficient and suitable for big data applications.
The rest of this article is arranged as follows. In section 2, we introduce some related work used in our algorithm. In section 3, we illustrate the framework of our algorithm. In section 4, we compare the proposed algorithm with other algorithms on color multi-focus images and gray-scale multi-focus images separately. Finally, we give some conclusions in section 5.

II. RELATED WORK

A. CONVOLUTIONAL NEURAL NETWORK
Generally, the framework of CNN is divided into 3 parts, namely the convolutional layer, the pooling layer and the fully connected layer [37]. A classic convolutional neural network framework is shown in FIGURE 1. Among all the layers, the convolution layer is the core layer for designing a neural network, which uses the convolution kernel to extract different levels of features from an image.
A CNN converts the input image into a set of feature maps by applying a set of filters in the convolution layer, and each subsequent convolution layer transforms one set of feature maps into another. This process can be described as follows:

$$y_j = b_j + \sum_i k_{ij} * x_i \tag{1}$$

where $x_i$ is the $i$-th feature map of the input, $y_j$ is the $j$-th feature map of the output, $k_{ij}$ is the convolution kernel between $x_i$ and $y_j$, $b_j$ is the bias, and $*$ is the convolution operator. Conventionally, super-resolution can be performed with simple nearest-neighbor or bi-cubic interpolation, but artificial noise often appears in the recovered images. Because of the advantages of CNNs in feature extraction, we choose a CNN for super-resolution reconstruction.
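The convolution-layer mapping described above, in which each output feature map is a bias plus the sum of the input maps convolved with their kernels, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `conv2d_valid` and `conv_layer`, the array layouts, stride 1 and "valid" borders are choices made here, not details given in the paper.

```python
import numpy as np

def conv2d_valid(x, k):
    """True 2-D convolution (kernel flipped), stride 1, 'valid' borders."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    flipped = k[::-1, ::-1]  # flipping turns correlation into convolution
    out = np.zeros((out_h, out_w))
    for u in range(kh):
        for v in range(kw):
            out += flipped[u, v] * x[u:u + out_h, v:v + out_w]
    return out

def conv_layer(xs, kernels, biases):
    """Compute y_j = b_j + sum_i (k_ij * x_i) for every output map j.

    xs:      (C_in, H, W) input feature maps x_i
    kernels: (C_in, C_out, kh, kw) kernels k_ij
    biases:  (C_out,) biases b_j
    """
    c_in, c_out = kernels.shape[:2]
    return np.stack([
        biases[j] + sum(conv2d_valid(xs[i], kernels[i, j]) for i in range(c_in))
        for j in range(c_out)
    ])
```

Deep learning frameworks usually implement cross-correlation (no kernel flip) under the name "convolution"; since the kernels are learned, the two conventions are interchangeable in practice.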

B. BILATERAL FILTER AND GUIDED FILTER
The bilateral filter is a widely used non-linear filter that can effectively reduce superimposed noise while maintaining the edge details of the image [38]. The output pixel value is obtained by a weighted combination of neighboring pixels, as shown in formula (2):

$$BF(I)_p = \frac{1}{W_p} \sum_{q \in N(p)} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)\, I_q \tag{2}$$

where $I$ is the input image, $BF(I)_p$ is the filtered value at pixel $p$, and $N(p)$ is the neighborhood of pixel $p$, whose size is $(2N + 1) \times (2N + 1)$. $G_{\sigma_s}$ and $G_{\sigma_r}$ are the spatial-domain weight, based on geometric distance, and the intensity-domain weight, based on pixel difference, respectively. $\sigma_s$ is the spatial proximity factor and $\sigma_r$ is the intensity similarity factor; these two factors determine the filtering effect. The weights are calculated as shown in formulas (3) and (4):

$$G_{\sigma_s}(\lVert p - q \rVert) = \exp\left(-\frac{\lVert p - q \rVert^2}{2\sigma_s^2}\right) \tag{3}$$

$$G_{\sigma_r}(\lvert I_p - I_q \rvert) = \exp\left(-\frac{\lvert I_p - I_q \rvert^2}{2\sigma_r^2}\right) \tag{4}$$

$W_p$ is the normalization factor, obtained by summing the product of the spatial-domain and intensity-domain weights over the neighborhood, as in formula (5):

$$W_p = \sum_{q \in N(p)} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert) \tag{5}$$
In general, the larger the value of $\sigma_s$, the more blurred the flat areas of the image, and the larger the value of $\sigma_r$, the more blurred the edge areas. The effect of the bilateral filter is therefore controlled by $\sigma_s$ and $\sigma_r$. Because the fusion of multi-focus images is usually disturbed by noise, we use the bilateral filter to denoise the multi-focus images after super-resolution while maintaining good edge details. In order to obtain higher-quality images, $\sigma_s$ and $\sigma_r$ are set to 3 and 0.6, respectively, and $N$ in the neighborhood size is set to 3.
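A brute-force version of the bilateral filter can make the roles of the two weights concrete. The sketch below uses the standard Gaussian form of the spatial and intensity weights and the parameter values quoted above ($N = 3$, $\sigma_s = 3$, $\sigma_r = 0.6$) as defaults. It assumes a single-channel floating-point image, presumably scaled to [0, 1], and reflective border padding; the function name `bilateral_filter` and both of these choices are assumptions made here, not details from the paper.

```python
import numpy as np

def bilateral_filter(img, n=3, sigma_s=3.0, sigma_r=0.6):
    """Brute-force bilateral filter over a (2n+1) x (2n+1) neighborhood."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, n, mode="reflect")  # border handling is an assumption
    num = np.zeros_like(img)  # weighted sum of neighbor intensities
    den = np.zeros_like(img)  # normalization factor W_p
    for dy in range(-n, n + 1):
        for dx in range(-n, n + 1):
            # neighbor I_q for every pixel p at offset (dy, dx)
            shifted = padded[n + dy:n + dy + h, n + dx:n + dx + w]
            g_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            g_r = np.exp(-((img - shifted) ** 2) / (2.0 * sigma_r ** 2))
            weight = g_s * g_r
            num += weight * shifted
            den += weight
    return num / den
```

Because every output pixel is a convex combination of its neighbors, the result always stays within the intensity range of the input. A small `sigma_r` makes the intensity weight vanish across strong edges, which is what preserves them.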
The guided filter is a linear filter that retains image details well and whose computational cost does not depend on the filter size [14], [39]-[41]. Assuming that the output image is $q$ and the guide image is $I$, the local linear relationship between them can be described as in formula (6):

$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k \tag{6}$$

where $i$ and $k$ are pixel indices, and $a_k$ and $b_k$ are the coefficients of the linear function when the neighborhood is centered at $k$. $\omega_k$ is a local window of size $(2r + 1) \times (2r + 1)$ centered at pixel $k$. The coefficients are found by minimizing the error between the output $q$ and the input image $p$ to be filtered, that is:

$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( (a_k I_i + b_k - p_i)^2 + \varepsilon a_k^2 \right) \tag{7}$$

where $\varepsilon$ is a regularization parameter. $a_k$ and $b_k$ can be calculated by linear regression as:

$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \varepsilon} \tag{8}$$

$$b_k = \bar{p}_k - a_k \mu_k \tag{9}$$

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of the guide image $I$ in the local window $\omega_k$, $|\omega|$ is the number of pixels in $\omega_k$, and $\bar{p}_k$ is the mean of $p$ in $\omega_k$.
In our experiments, the guided filter process is denoted by $G_{r,\varepsilon}(p, I)$, which is determined by the window radius $r$ and the regularization parameter $\varepsilon$; we set $r$ and $\varepsilon$ to 3 and 0.3, respectively.
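The guided filter can be sketched with box means computed from an integral image, which is why its cost does not grow with the window radius. The code below is a minimal single-channel illustration, not the authors' implementation. The names `box_mean` and `guided_filter`, the clipping of windows at the image border, and the final averaging of the coefficients over all windows covering a pixel (the usual last step of the guided filter, not spelled out in the text above) are assumptions.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = x.shape
    ii = np.zeros((h + 1, w + 1))          # integral image with zero border
    ii[1:, 1:] = x.cumsum(axis=0).cumsum(axis=1)
    ys, xs = np.arange(h), np.arange(w)
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    total = ii[y1][:, x1] - ii[y0][:, x1] - ii[y1][:, x0] + ii[y0][:, x0]
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return total / area

def guided_filter(p, I, r=3, eps=0.3):
    """Filter input p using guide I: q = mean(a) * I + mean(b)."""
    p = np.asarray(p, dtype=float)
    I = np.asarray(I, dtype=float)
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2        # sigma_k^2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p   # numerator of a_k
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    # average the coefficients of all windows that cover each pixel
    return box_mean(a, r) * I + box_mean(b, r)
```

With a large `eps` the coefficient `a` shrinks toward zero and the output approaches a plain box blur of `p`; with a small `eps` the output follows the edges of the guide image.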

III. PROPOSED ALGORITHM
In order to overcome the inaccurate detection of the focus area caused by low resolution in color image fusion, we propose a color image fusion algorithm based on super-resolution reconstruction.
The steps of the proposed algorithm are shown in FIGURE 2. Given two source images, DualCNN is first used to recover the super-resolution source images from the two components of structure and detail. Since the super-resolution images usually contain slight noise, the bilateral filter is employed for denoising. After filtering, the difference between the super-resolution image and the source image is used to obtain an initial difference image, which is refined by the guided filter to obtain a refined difference image. Secondly, we use the pixel-max rule to obtain the initial decision map (IDM) from the refined difference images. Then, the small region removal strategy and the guided filter are used to optimize the IDM to get the final decision map (FDM). Finally, the final fused image is obtained as the sum of the source images weighted by the FDM.
Ideally, the focus area has more high-frequency content than the defocus area, so focus area detection is an essential process in image fusion. Our algorithm mainly includes the following steps: image super-resolution, focus area detection and image fusion.

A. IMAGE SUPER-RESOLUTION
In order to better recover the structure and details of the image, we use DualCNN for super-resolution reconstruction, whose network model is shown in FIGURE 3.
The structure and details of the DualCNN used in this paper follow those in [36]. In FIGURE 4, the red dotted block on the left is the Net-S network structure. It has 3 convolution layers with filter sizes of 9 × 9, 1 × 1 and 5 × 5, and feature-map depths of 64, 32 and 1, respectively. The green dotted block on the right of FIGURE 4 is the network structure of Net-D. It contains 20 convolutional layers, each with 3 × 3 filters and a feature-map depth of 64.
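The difference between the two branches can be quantified by their receptive fields. The short calculation below assumes stride-1 convolutions with no pooling or dilation, which the text does not state explicitly; the helper name `receptive_field` is introduced here for illustration.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions (no pooling/dilation)."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each k x k layer widens the field by k - 1 pixels
    return rf

NET_S_KERNELS = [9, 1, 5]   # shallow structure branch
NET_D_KERNELS = [3] * 20    # deep detail branch

print(receptive_field(NET_S_KERNELS))  # 13
print(receptive_field(NET_D_KERNELS))  # 41
```

Under these assumptions each Net-S output pixel depends on a 13 × 13 input patch, while each Net-D output pixel depends on a 41 × 41 patch.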
In FIGURE 3, let $X$, $S$ and $D$ be the ground-truth (high-resolution) image, the output of Net-S and the output of Net-D, respectively. When training the network, the error between the final output of the network and the ground-truth image should be minimized, which defines the loss function of the network:

$$L_X = \lVert \phi(S) + \varphi(D) - X \rVert_2^2 \tag{10}$$

where $\phi(S) + \varphi(D)$ is the output of DualCNN, and $\phi(\cdot)$ and $\varphi(\cdot)$ are the same as the functions used in [36]. Since DualCNN has two branches, training is unstable if only the loss function of formula (10) is used. We therefore use two separate loss functions to regularize the two branches. The loss functions of Net-S and Net-D are shown in formulas (11) and (12):

$$L_S = \lVert S - S_{gt} \rVert_2^2 \tag{11}$$

$$L_D = \lVert D - D_{gt} \rVert_2^2 \tag{12}$$

where $S_{gt}$ and $D_{gt}$ are the ground-truth values for the outputs of Net-S and Net-D. Combining equations (10)-(12), the overall loss function of DualCNN can be defined as:

$$L = \alpha L_X + \lambda L_S + \gamma L_D \tag{13}$$

where $\alpha$, $\lambda$ and $\gamma$ are regularization parameters, which are set to 1, 0.001 and 0.01, respectively. The training data set is taken from ImageNet. During training, 300 selected high-quality images (the ground-truth images $X$) are each processed 5 times by Gaussian blurring, generating a total of 1500 low-resolution images, namely $S_{gt}$. The low-resolution image $S_{gt}$ is up-sampled by bi-cubic interpolation and used as the network input. The ground-truth value $D_{gt}$ is obtained as the difference between the ground-truth image $X$ and the structure $S_{gt}$. During training, we set the learning rate to 0.0001. We used the stochastic gradient descent optimizer to minimize the loss function, set a reasonable regularization term in the loss function to prevent the model from over-fitting, and compared the PSNR between the super-resolution images generated during training and the original images to evaluate the performance of the trained model.
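The combined training objective and the PSNR check can be written compactly. The sketch below assumes that each loss term is a squared L2 error, that the weights 1, 0.001 and 0.01 apply to the composition term, the Net-S term and the Net-D term in that order, and that $\phi$ and $\varphi$ are identity mappings. These are assumptions for illustration; the paper defers the exact functions to [36]. The names `dualcnn_loss` and `psnr` are hypothetical.

```python
import numpy as np

def sq_norm(a):
    """Squared L2 norm of an array."""
    return float(np.sum(np.square(a)))

def dualcnn_loss(S, D, X, S_gt, D_gt, alpha=1.0, lam=0.001, gamma=0.01,
                 phi=lambda s: s, varphi=lambda d: d):
    """Weighted sum of the composition, structure and detail losses."""
    l_x = sq_norm(phi(S) + varphi(D) - X)   # final output vs. ground truth
    l_s = sq_norm(S - S_gt)                 # Net-S output vs. structure target
    l_d = sq_norm(D - D_gt)                 # Net-D output vs. detail target
    return alpha * l_x + lam * l_s + gamma * l_d

def psnr(ref, img, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = float(np.mean(np.square(np.asarray(ref) - np.asarray(img))))
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

With the detail target defined as `D_gt = X - S_gt`, a network that reproduces both targets exactly drives all three terms to zero at once, so the two auxiliary terms are consistent with the main objective.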

B. FOCUS AREA DETECTION AND IMAGE FUSION
Let $I_1$ and $I_2$ denote the two color source images. Firstly, the structure and details of the source images $I_1$ and $I_2$ are restored by using DualCNN. Then, a bilateral filter is used for denoising, which gives the images $Sr_1$ and $Sr_2$. The initial difference images $IDI_1$ and $IDI_2$ are calculated by

$$IDI_1(x, y) = |Sr_1(x, y) - I_1(x, y)|, \quad IDI_2(x, y) = |Sr_2(x, y) - I_2(x, y)| \tag{14}$$

where $Sr_1$ and $Sr_2$ are the super-resolution images after bilateral filtering, and $|\cdot|$ is the absolute value operation. Secondly, we use $Sr_1$ and $Sr_2$ as guide images to enhance the high-frequency information in the initial difference images $IDI_1$ and $IDI_2$, obtaining the refined difference images $RDI_1$ and $RDI_2$:

$$RDI_1 = G_{r,\varepsilon}(IDI_1, Sr_1), \quad RDI_2 = G_{r,\varepsilon}(IDI_2, Sr_2) \tag{15}$$

Then, in order to retain more information about the image, the pixel-value-max rule is used to get the initial decision map (IDM), as shown in formula (16):

$$IDM(x, y) = \begin{cases} 1, & RDI_1(x, y) > RDI_2(x, y) \\ 0, & \text{otherwise} \end{cases} \tag{16}$$

Although previous focus area detection methods are effective, outliers and isolated regions often appear in the initial decision map (IDM). In our method, we use the small area removal strategy in [21] to optimize the initial decision map. In this paper, the region threshold is set to $0.01 \times H \times W$, where $H$ and $W$ are the height and width of the source image. After small area removal, we get the refined decision map (IDMR). The initial fusion image $I_{CF}$ is obtained by using IDMR, as shown in formula (17):

$$I_{CF}(x, y) = IDMR(x, y)\, I_1(x, y) + (1 - IDMR(x, y))\, I_2(x, y) \tag{17}$$
Finally, the final decision map is obtained by $FDM = G_{r,\varepsilon}(IDMR, I_{CF})$. The final decision map (FDM) and the source images are then combined by the pixel-wise weighted fusion rule to obtain the final fused image $I_F$, as shown in formula (18):

$$I_F(x, y) = FDM(x, y)\, I_1(x, y) + (1 - FDM(x, y))\, I_2(x, y) \tag{18}$$
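The decision-map stage of this subsection can be sketched as three small functions: the pixel-max rule, small area removal and decision-map weighted fusion. This is an illustrative reading of the text, not the authors' code. It assumes that a pixel is assigned to the first source image when its refined difference value is strictly larger, that regions are 4-connected, that small regions of both classes are flipped when their area is below the threshold, and that the decision map weights the first image while its complement weights the second. The function names are hypothetical, and the guided-filter refinement of the decision map is left out.

```python
from collections import deque
import numpy as np

def initial_decision_map(rdi1, rdi2):
    """Pixel-max rule: 1 where the first refined difference image is larger."""
    return (rdi1 > rdi2).astype(np.uint8)

def remove_small_regions(mask, min_area):
    """Flip every 4-connected region (of 0s or 1s) smaller than min_area."""
    h, w = mask.shape
    out = mask.copy()
    seen = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            if seen[y, x]:
                continue
            value = mask[y, x]
            region, queue = [(y, x)], deque([(y, x)])
            seen[y, x] = True
            while queue:  # breadth-first flood fill on the original mask
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] \
                            and mask[ny, nx] == value:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
                        region.append((ny, nx))
            if len(region) < min_area:
                for ry, rx in region:
                    out[ry, rx] = 1 - int(value)
    return out

def fuse(i1, i2, decision_map):
    """Weighted fusion: d * I1 + (1 - d) * I2, broadcast over color channels."""
    d = np.asarray(decision_map, dtype=float)
    if i1.ndim == 3:
        d = d[..., None]
    return d * i1 + (1.0 - d) * i2
```

For an image of height `H` and width `W`, the threshold quoted in the text corresponds to `remove_small_regions(idm, 0.01 * H * W)`. The same `fuse` function serves for both the initial fusion with the binary map and the final fusion with the smoothed, real-valued map.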

IV. EXPERIMENTAL RESULTS
In order to effectively evaluate the fusion performance of the proposed algorithm on images with different degrees of focus, we test our method on the color multi-focus image fusion test set known as the Lytro database (as shown in FIGURE 5) and on commonly used gray-scale multi-focus images (as shown in FIGURE 7). The performance is compared, both subjectively and objectively, with nine other representative multi-focus fusion algorithms. The compared algorithms include the following:
(1) A multi-focus image fusion algorithm based on a convolutional neural network (CNN) proposed in [21].
(2) A multi-focus image fusion algorithm based on the dense scale-invariant feature transform (DSIFT) proposed in [15].
(3) A multi-focus image fusion algorithm using a boosted random walk with two-scale focus maps (BRW) proposed in [30].
(4) Image fusion based on the guided filter (GFF) proposed in [14].
(5) A multi-focus image fusion algorithm based on multi-scale weighted gradient fusion (MWGF) proposed in [29].
(6) A multi-focus image fusion algorithm based on image matching (IM) proposed in [13].
(7) An image fusion algorithm based on sparse representation (SR) proposed in [20].
We make both subjective and objective evaluations of the fused images generated by the different algorithms. For the objective evaluation, we use eight objective evaluation indicators: indexes based on information theory (feature mutual information $Q_{FMI}$ [31], normalized mutual information $MI$ [43], Tsallis entropy $Q_{TE}$ [43] and nonlinear correlation information entropy $Q_{NCIE}$ [43]), an index based on structural similarity $Q_Y$ [43], an index based on human perception $Q_{CB}$ [43], and indexes based on image features (the edge-information metric $Q_G$ [43] and the phase-consistency metric $Q_{PC}$ [43]). The larger the value of an index, the better the fused image. We also compare the computational efficiency of the algorithms in terms of runtime.

A. COLOR MULTI-FOCUSED IMAGE
In order to test the proposed algorithm, the Lytro color multi-focus image fusion test data set [19] is fused with the above image fusion algorithms. FIGURE 5 shows eight pairs of color source images from the test data set. In FIGURE 6.6(d), the upper boundary of the image has obvious artificial texture. The fused image in FIGURE 6.7 fully fuses the focus areas of the two source images and retains the detail information of the image. FIGURES 6.8(d) and 6.8(g) do not show the outline of the small "monkey" well; the spatial consistency is poor and some spatial information is lost.
In general, MGF smooths the high-frequency details of the image. Although SR, IM and GFF can improve the quality of the fused image, they cannot avoid the block effect on edge areas. PCNN and CNN can effectively separate the pixels in the focus area from those in the non-focus area through an end-to-end non-linear mapping. However, in smooth areas where the focus area and the defocus area meet, PCNN and CNN often make classification errors. DSIFT, BRW and MWGF are three representative methods for measuring the focus area; they locate the focus areas relatively accurately but handle edges poorly. Our method classifies these smooth areas well and retains more detailed texture.
To show the fusion effect of the different fusion algorithms, TABLE 1 and TABLE 2 give all objective evaluation indexes of each fusion algorithm. For every index, a larger value means better performance, and the bold value indicates the best result. From TABLE 1 and TABLE 2, we can see that our method has the best objective evaluation indexes, except that it is slightly weaker on $Q_{CB}$ in FIGURE 6.1 and $Q_Y$ in FIGURE 6.6. So, our algorithm is superior to the other algorithms in both visual effect and objective evaluation indexes.

B. GRAY MULTI-FOCUSED IMAGE
Our algorithm not only has a good fusion effect on color multi-focus images, but can also be applied to gray-scale multi-focus images. Here we use 4 pairs of widely used multi-focus images as the source images to compare all the algorithms. The gray-scale multi-focus images are shown in FIGURE 7. In order to verify the effect of the fusion algorithm on gray-scale images, we use the same algorithm for gray-scale multi-focus images as for color multi-focus images. The comparison of the four groups of images is shown in FIGURES 8-13, which show that our algorithm retains more image texture and that the fused image has a better visual effect.
In order to better show the performance of the proposed algorithm, we zoom in on some areas. FIGURE 9 and FIGURE 10 are enlarged views of the green-framed and red-framed parts of FIGURE 8. FIGURE 9(a) and FIGURE 9(c) have artifacts on the edge of the Pepsi cup, and the fused image is blurred. FIGURE 10(f) and FIGURE 10(g) have breakpoints at the edges of the Desk. FIGURE 9(j) and FIGURE 10(j) show that the fused image obtained by our method has a better visual effect. TABLE 3 gives the objective evaluation indicators for all algorithms in FIGURE 8. TABLE 3 shows that our algorithm has the best values for $Q_{PC}$, $Q_Y$, $Q_{CB}$, $Q_{FMI}$ and $Q_{NCIE}$, which means that our algorithm can effectively maintain edge details. Although the proposed algorithm does not have the best result for every evaluation indicator, it has six good objective values and handles details satisfactorily.
FIGURE 11(i) shows that MWGF and MGF introduce block effects on the right side of the fused image, which leads to poor fusion quality. There are some spots in FIGURE 11(f1) and FIGURE 11(g1), which are pixels misclassified by IM and SR. FIGURE 11(b) and FIGURE 11(c) show that DSIFT and BRW have a good fusion effect, but some texture details are missing. FIGURE 11(h) shows that PCNN introduces a small amount of artificial texture in the corner of the Book. FIGURE 11(d) shows that GFF smooths the details of the image too much, which gives the fused image a poor visual effect. Compared with other algorithms, the CNN-based algorithm and our algorithm retain the image details well and obtain better visual effects. and $Q_{FMI}$, which means that our algorithm can effectively maintain the spatial consistency of the image. Compared with DSIFT and CNN, the evaluation indexes of our method are slightly worse in $MI$, $Q_{CB}$, $Q_{TE}$ and $Q_{NCIE}$, but the overall evaluation shows that our algorithm produces a relatively satisfactory fusion result.
Although the proposed algorithm has no advantage in terms of computation compared with GFF and MGF, it has better image clarity than the other algorithms.
FIGURE 12(f) shows that IM cannot detect the focus area accurately, which leads to some texture details being lost to the left of the small Clock. FIGURE 12(a) and FIGURE 12(c) show that CNN and BRW introduce slight artificial texture in the fused image. Our algorithm fully extracts the focus area between the small Clock and the large Clock without information loss and achieves satisfactory fusion results. TABLE 5 gives the objective evaluation indicators for all algorithms in FIGURE 12. TABLE 5 shows that our algorithm has the best values for $MI$, $Q_G$, $Q_Y$ and $Q_{FMI}$, which means that our algorithm can accurately identify the focus area of the image. Compared with BRW and IM, the evaluation indexes of our method are slightly worse in $Q_{PC}$, $Q_{CB}$, $Q_{TE}$ and $Q_{NCIE}$, but the overall evaluation shows that our algorithm produces a relatively satisfactory fusion result.
FIGURE 13 gives the fused images and difference images of the Desk for all image fusion algorithms. Compared with other algorithms, our algorithm eliminates the texture artifacts in the lower-left corner of the image and correctly detects the focus and defocus areas. TABLE 6 gives the objective evaluation indicators for all algorithms in FIGURE 13. TABLE 6 shows that our algorithm has the best values for $Q_{TE}$, $Q_{PC}$, $Q_Y$ and $Q_{FMI}$. Although $Q_G$ and $Q_{CB}$ are not the best among the compared algorithms, they are comparable to the best values.

C. DECISION DIAGRAM
In order to evaluate the performance of the proposed algorithm, we compare the decision maps of different algorithms. In this section, we select the five groups of color multi-focus images in FIGURE 5 (a1-a5) and FIGURE 5 (b1-b5) in Section 4.1, and the three groups of gray-scale multi-focus images in FIGURE 7 (a1-c1) and FIGURE 7 (b1-c1) in Section 4.2. The algorithms for comparison are CNN, DSIFT, BRW, GFF, MWGF, PCNN and our algorithm. All the decision maps are shown in FIGURE 14 (a-g). From top to bottom, they are the decision maps of CNN, DSIFT, BRW, GFF, MWGF, PCNN and the proposed method. FIGURE 14(d) shows that GFF loses more detailed information than the other algorithms in both the defocus and focus areas. FIGURE 14(e) shows that MWGF performs well in most cases, but often damages the boundaries of some targets during fusion. As shown in FIGURE 14(a) and FIGURE 14(f), CNN and PCNN can effectively segment the focus area of a color multi-focus image, but they still produce some incorrectly segmented pixels in some gray-scale multi-focus images. As shown in FIGURE 14(b), DSIFT incorrectly detects focus or defocus areas at the edges of the image. In FIGURE 14(c), although BRW identifies the focus and defocus areas well, some defocused area in the center of the color multi-focus image is still incorrectly identified. It can be seen from FIGURE 14(g) that our algorithm effectively detects the focused and defocused areas for both color and gray-scale multi-focus images, which demonstrates the performance of the proposed algorithm.
Because color multi-focus images contain more image information and higher contrast than gray-scale multi-focus images, they can recover more high-frequency details after super-resolution reconstruction. Super-resolution reconstruction plays a very important role in focus area detection and in generating high-quality fused images. Therefore, our algorithm is more effective when applied to color multi-focus images.

D. COMPARISON ON A LARGER DATA SET
In order to fully demonstrate the effectiveness of the algorithm, we compare the proposed method with the other nine fusion methods on a larger data set. The data set contains the Lytro color multi-focus image data in [19], the multi-focus image data in [16] and other multi-focus image data in [15] and [37]. We calculate the mean value of every evaluation index for every tested fusion method over all the test data, as shown in FIGURE 15.
FIGURE 15(a) plots the mean objective evaluation indexes $MI$, $Q_G$, $Q_{PC}$ and $Q_Y$ over all the test images for the proposed algorithm and the nine representative multi-focus fusion algorithms. It can be seen from FIGURE 15(a) that MGF and MWGF have no advantage over the other algorithms, while the proposed algorithm obtains the best values for $MI$, $Q_G$, $Q_{PC}$ and $Q_Y$. FIGURE 15(b) plots the mean objective evaluation indexes $Q_{CB}$, $Q_{FMI}$, $Q_{TE}$ and $Q_{NCIE}$ over all the test images for the same algorithms. There is little difference between the compared algorithms under $Q_{NCIE}$ and $Q_{TE}$. For the other two indexes in FIGURE 15(b), $Q_{CB}$ and $Q_{FMI}$, the proposed algorithm has the best value among all the fusion methods. FIGURE 15 shows that the proposed algorithm performs better than the other algorithms.

V. CONCLUSION
This paper proposes a novel image fusion algorithm based on focus area detection via image super-resolution reconstruction with DualCNN. The method uses DualCNN to restore the details of the source images and combines it with the bilateral filter and the guided filter, which maintains the spatial consistency of the fused image well. The experimental results show that the proposed algorithm detects the focus area better than other algorithms and produces a clearer fused image. The fused image not only has better visual quality than those of the other algorithms, but the method also has an advantage in terms of computational efficiency.