3M: A Multi-Scale and Multi-Directional Method for Multi-Focus Image Fusion

Based on an analysis of multi-scale-based and sparse representation-based multi-focus image fusion methods, we argue that a multi-scale fusion method should exploit the directivity of the high/low-frequency sub-bands while keeping the computational complexity low, thereby ensuring both effectiveness and efficiency. We therefore propose a novel multi-focus image fusion algorithm based on multi-scale and multi-directional dictionaries. In the proposed method, the source images are decomposed by a multi-scale transform (MST) to obtain high- and low-frequency sub-bands. For the high-frequency part, a fusion rule based on direction contrast is presented. For the low-frequency part, the image patches are divided into strong and weak information patches: the strong information patches are merged with a fusion approach based on directional dictionaries, while the weak information patches are fused by weighted averaging. Finally, the fused image is obtained by the inverse MST. Experimental results show that the proposed approach extracts more of the important information from the source images and greatly improves the computational efficiency.


I. INTRODUCTION
It is well known that optical sensors can only focus on targets at the same depth of field (DOF), and out-of-focus targets appear blurred. Such images are called single-focus images. Since the demand for images in which all targets are clear keeps growing in follow-up machine vision processing and in the fields of military and public safety, the development of multi-focus image fusion technology, which obtains an all-in-focus image from single-focus images, becomes increasingly important.
Numerous studies have approached multi-focus image fusion from different points of view [1]-[3]. In recent years, there has been an increasing amount of literature on multi-scale decomposition methods, owing to their good multi-resolution and multi-directional characterization of the high-frequency content [4], [5]. As early as the 1990s, [6], [7] used the discrete wavelet transform (DWT) to process multi-focus images, better extracting detailed information such as edge contours from the source images. However, the traditional wavelet transform can only capture high-frequency information in three directions (horizontal, vertical and diagonal). Therefore, the dual-tree complex wavelet transform (DTCWT) [8], curvelet transform (CVT) [9], non-subsampled contourlet transform (NSCT) [10], non-subsampled shearlet transform (NSST) [11], fast finite shearlet transform (FFST) [12] and other MSTs were proposed successively. These MSTs provide multiple directions and directional selectivity in the high-frequency part, and therefore achieve good results.
We find that MST-based fusion methods usually have the following characteristics: (1) most fusion strategies use the 'max-absolute' rule for the high-frequency part; (2) the weighted average rule is used for the low-frequency part [13], [14]. We believe that the high-frequency fusion rule does not take the surrounding information into account, while the low-frequency fusion rule lacks self-adaptability and does not make use of direction information. This kind of method is abbreviated as MST(L,M,R,A) in this paper, where MST is the multi-scale transform used, L is the decomposition level, M means 'max', R is the negative power of the threshold value and A means 'average'.
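As a point of reference, the two baseline MST(L,M,R,A) rules described above can be sketched in a few lines of NumPy. The function names are ours, introduced only for illustration:

```python
import numpy as np

def fuse_high_max_abs(h_a, h_b):
    """'Max-absolute' rule: at each position keep the high-frequency
    coefficient with the larger magnitude (baseline rule (1))."""
    return np.where(np.abs(h_a) >= np.abs(h_b), h_a, h_b)

def fuse_low_average(l_a, l_b):
    """Weighted-average rule for the low-frequency sub-band,
    with equal weights, the common baseline choice (rule (2))."""
    return 0.5 * (l_a + l_b)
```

As the text notes, neither rule looks at neighboring coefficients, which is exactly the limitation the direction-contrast rule later addresses.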
Along with the multi-scale methods, several attempts have been made at adaptive sparse representation (SR) based on dictionary learning [15]-[17]. For example, in 2010, Yang and Li applied SR to multi-focus image fusion, fusing the sparse representation coefficients [18]. In 2014, Liu and Wang proposed the ASR model, in which a set of compact sub-dictionaries is used instead of a single redundant dictionary; each sub-dictionary is adaptively learned from numerous high-quality image patches selected in a given direction [19]. In 2018, a fusion scheme based on image cartoon-texture decomposition and sparse representation was proposed: a spatial-domain morphological structure preservation method is used for the cartoon components, while the texture components are handled by sparse representation with dictionaries of strong representation ability [20].
Improved approaches based on the combination of MST and SR have also been attempted. In 2013, Wang et al. proposed a fusion method combining NSCT and SR [21]. In 2015, Liu et al. exploited the complementary advantages of MST and SR and proposed a general image fusion framework (NSST is not included) [22]. In 2017, Qiu et al. proposed a fusion method based on NSST and SR [23]. This kind of method is abbreviated in this paper as MST(L,Ms,R,D), where Ms means the absolute values of coefficients as the activity-level measurement for the high-frequency part, and D means the dictionary coefficient max rule for the low-frequency part.
Compared with MST(L,M,R,A), MST(L,Ms,R,D) methods have the following characteristics: (1) the high-frequency part has multi-scale structure, directionality and a fast algorithm; (2) the fusion rule for the low-frequency part is adaptive, but lacks directivity.
Based on the above analysis, we propose a novel multi-focus image fusion approach based on multi-scale and multi-directional dictionaries. The proposed method makes better use of the complementarity of the MST-based and SR-based fusion approaches, and makes up for the shortcomings of the MST(L,Ms,R,D) methods by making full use of the direction information of the high/low-frequency parts.
To be more specific, there are three main contributions in this new method.
1) A novel fusion rule for the high-frequency sub-bands. This rule is based on direction contrast and uses a direction weight for each direction of each scale. Since the high-frequency sub-bands obtained by the MST decomposition have multi-directional characteristics, making full use of the direction information in the high-frequency fusion rule greatly improves the fusion results.
2) In the low-frequency part, image patches are divided into strong information patches and weak information patches. The many patches with small variance are called weak information patches, and the fusion rule for them can simply be the weighted average. For non-subsampled MSTs, this approach greatly decreases the number of image patches to be processed, reducing the computational cost.

3) A new fusion method for strong information patches in the low-frequency part. These patches with direction information are adaptively and sparsely represented by directional dictionaries, and the 'L1-max' fusion rule is performed on the sparse representation coefficients. In this method, the strong information patches are divided into several finer direction groups, so the fusion rule makes the best use of the direction information of the low-frequency image patches.
The rest of this paper is organized as follows. Section 2 combines the high- and low-frequency fusion rules into a complete multi-focus image fusion model. Section 3 discusses the high-frequency sub-band fusion strategy based on direction contrast, including the determination of the MST decomposition level. In Section 4, the learning method for the directional dictionaries is introduced, the concepts of strong/weak information patches of the low-frequency sub-band are defined, and the fusion rules for these two kinds of patches are proposed. The fusion performance of the proposed method is verified by seven sets of experiments in Section 5. Section 6 summarizes the main content of this paper.

II. MULTI-FOCUS IMAGE FUSION METHOD BASED ON MST DIRECTION
Suppose there are M source images {I_1, I_2, ..., I_M} of size X × Y. The proposed method is summarized in Algorithm 1.
To the best of our knowledge, this is the first attempt to combine multi-direction information in the high-frequency part with directional dictionaries in the low-frequency part for multi-focus image fusion.
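Although Algorithm 1 is not reproduced here, its overall structure (decompose each source, fuse the two bands with separate rules, invert the transform) can be sketched with a toy one-level transform. The 2x2 average-pooling "MST" and the global variance gate below are simplifications introduced purely for illustration; they are not the paper's actual transform or its patch-wise low-frequency rule:

```python
import numpy as np

def toy_mst(img):
    """Stand-in one-level 'MST': 2x2 average pooling gives the
    low-frequency band; the upsampled residual is the high band."""
    low = img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))
    high = img - np.kron(low, np.ones((2, 2)))
    return low, high

def toy_inverse_mst(low, high):
    """Exact inverse of toy_mst."""
    return np.kron(low, np.ones((2, 2))) + high

def fuse(img_a, img_b, var_thresh=1e-3):
    """Skeleton of the pipeline: max-absolute for the high band
    (the paper uses direction contrast) and a variance-gated choice
    for the low band (the paper uses directional dictionaries)."""
    la, ha = toy_mst(img_a)
    lb, hb = toy_mst(img_b)
    hf = np.where(np.abs(ha) >= np.abs(hb), ha, hb)
    if max(la.var(), lb.var()) > var_thresh:   # 'strong information'
        lf = np.where(np.abs(la - la.mean()) >= np.abs(lb - lb.mean()), la, lb)
    else:                                      # 'weak information'
        lf = 0.5 * (la + lb)
    return toy_inverse_mst(lf, hf)
```

With identical inputs the pipeline is an identity, which is a useful sanity check for any such decompose-fuse-invert scheme.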

III. HIGH FREQUENCY SUB-BAND FUSION RULE
The key problems of high-frequency fusion are determining the number of MST decomposition levels and fully utilizing the directivity of the high-frequency sub-bands.

A. MST DECOMPOSITION LEVEL
The determination of the decomposition level is key to MST-based fusion methods [24]. If the decomposition level is too small, such as 1 or 2, many spatial details will not be separated from the source images; if it is too large, more than 4, the high-frequency sub-band fusion becomes more sensitive to noise and mis-registration [22]. On the basis of [22], the decomposition level is set to 3 or 4 in this paper, making the method more robust to mis-registration.

B. HIGH FREQUENCY SUB-BAND FUSION RULE BASED ON DIRECTION CONTRAST
After MST decomposition, the energy of the high-frequency part of the source images is concentrated in a few coefficients [25]. These sparse high-frequency coefficients represent detail information, such as edges and textures, at different scales and in different directions. It should be pointed out that, due to the vanishing moment property of the MSTs, at the same position the coefficient energy of a clear region is greater than that of a fuzzy region [4]. Therefore, the 'max-absolute' fusion rule is commonly used for high-frequency fusion. This rule is simple, but the correlation between image pixels is not taken into account, which can easily lead to the loss of redundant information from the source images and the introduction of artificial information. It cannot extract the detailed information from the source images well, and it reduces the contrast of the fused image.
Considering the correlation between image pixels, a fusion rule based on traditional contrast was proposed in [26]. Traditional image contrast is defined as C = (H_P − H_B) / H_B, where H_P is the local gray level of the image, and H_B and H_P − H_B are respectively the low-frequency and high-frequency components after the image transform. Traditional contrast takes into account the correlation between pixels at individual image points, but it ignores the correlation between the coefficients in a region and the directivity of the high-frequency sub-bands. In order to comprehensively consider the correlation between regional pixels and make better use of contrast at different scales and in different directions, the direction contrast is defined through a regional window B_{p,q} (a 0-1 matrix) and a direction weight ω. The direction weight accords with the directional characteristics of the high-frequency sub-bands, and every direction at every scale of the high-frequency sub-bands has its own weight.
Let α_{p,q} be the direction angle of the q-th high-frequency sub-band at the p-th scale. The elements of the N × N regional window matrix B_{p,q} are defined as 0, except for the elements located between s_1 = tan(α_{p,q}) r(n) + 1 and s_2 = tan(α_{p,q}) r(n) − 1, where the value is set to 1; here n runs from 1 to N. The direction weight of the q-th directional sub-band at the p-th scale can then be defined as the normalized window ω_{p,q} = B_{p,q} / Σ_{i,j} B_{p,q}(i, j). Let H_{p,q}^{I_m} be the high-frequency sub-bands. Convolving them with ω_{p,q} yields the enhanced sub-bands H̃_{p,q}^{I_m} = H_{p,q}^{I_m} * ω_{p,q}, which enhances the directional features of H_{p,q}^{I_m} and reduces noise. The direction contrast can now be defined as C^{I_m}(x, y) = Σ_{p,q} Σ_{(i,j)∈N(x,y)} |H̃_{p,q}^{I_m}(i, j)| / |L^{I_m}(x, y)|, where N(x, y) is the N × N neighborhood window centered at pixel (x, y) and L^{I_m} is the low-frequency sub-band. The direction contrast associates the low-frequency sub-band with the high-frequency sub-bands, and is obtained by adding up the contrast information of all the high-frequency sub-bands. The contrast information of the source image I_m is thus clearly reflected by the matrix C^{I_m}.
The high-frequency fusion rule is established on the direction contrast matrices {C^{I_m}}_{m=1}^{M}: at each position, the fused high-frequency coefficients H_{p,q}^{F} are taken from the source image with the largest direction contrast. This high-frequency fusion method more comprehensively considers the correlation between regional pixels and the contrast at different scales and in different directions, and thus better extracts the effective information from the high-frequency sub-bands of the source images.
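For a single sub-band, the direction-contrast computation can be illustrated as follows. The 3x3 directional line window and its normalization are our simplified stand-ins for the paper's B_{p,q} and ω; the function names are ours:

```python
import numpy as np

def directional_window(angle_deg, n=3):
    """0/1 window whose nonzero entries lie along the given direction,
    a simplified stand-in for the regional window matrix B_{p,q}."""
    B = np.zeros((n, n))
    c = n // 2
    if abs(angle_deg % 180 - 90) < 45:         # near-vertical: walk rows
        t = 1.0 / np.tan(np.deg2rad(angle_deg))
        for r in range(n):
            col = int(round(c + t * (r - c)))
            if 0 <= col < n:
                B[r, col] = 1.0
    else:                                      # near-horizontal: walk columns
        t = np.tan(np.deg2rad(angle_deg))
        for col in range(n):
            r = int(round(c + t * (col - c)))
            if 0 <= r < n:
                B[r, col] = 1.0
    return B

def direction_contrast(high, low, angle_deg, eps=1e-8):
    """Smooth |high| with the normalized directional window (the
    direction weight), then take the ratio to the low-pass band."""
    w = directional_window(angle_deg)
    w = w / w.sum()
    n = w.shape[0]
    hp = np.pad(np.abs(high), n // 2, mode='edge')
    smoothed = np.zeros_like(high, dtype=float)
    for i in range(high.shape[0]):
        for j in range(high.shape[1]):
            smoothed[i, j] = np.sum(hp[i:i + n, j:j + n] * w)
    return smoothed / (np.abs(low) + eps)
```

In a full implementation this would be summed over all scales p and directions q, and the fused coefficient at each position taken from the source with the largest contrast.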

IV. LOW FREQUENCY SUB-BAND FUSION RULE
The two key problems of low-frequency fusion are the classification of strong/weak information patches and the fusion method for strong information patches based on directional dictionaries.

A. STRONG/WEAK INFORMATION PATCHES
In MST-based multi-focus fusion methods, the decomposition level should be set to 3 or 4, so a large amount of basic information reflecting the overall characteristics of the image still remains in some low-frequency patches. Figure.1 and Figure.2 show the percentage of low-frequency image patches whose variance is less than 10^-3 after six MSTs (DWT, DTCWT, CVT, NSCT, NSST, FFST), respectively. We can see that for the MSTs with down-sampling, such as DWT, DTCWT and CVT, most of the low-frequency image patches contain useful information. Due to the down-sampling, the size of the low-frequency part is much smaller than that of the source images, which makes the information in the low-frequency sub-band highly concentrated. In contrast, for the non-subsampled MSTs, such as NSCT, NSST and FFST, the size of the low-frequency sub-band is the same as that of the original images, and the useful information is scattered. Therefore, the low-frequency sub-bands of this kind of MST contain many patches with small variance.
The percentage of patches with variance greater than 10^-3 that have at least one of the four kinds of direction information is shown in Figure.3 and Figure.4. It is observed that the percentage of patches with direction information ranges from zero to 90%, depending on the MST used.
This analysis suggests that, to make full use of the direction information while reducing the computational cost, the image patches of the low-frequency part should be divided into weak/strong information patches depending on the variance of each patch. Given a threshold, if the variance of an image patch is less than the threshold, it is a weak information patch (a smooth image patch); otherwise, it is a strong information patch (a patch containing a lot of useful information). Since the smooth weak information patches contain little detailed information, the simple weighted average rule can be applied directly. For the strong information patches, the sparse representation fusion rule based on directional dictionaries is used to better utilize their direction information. This divide-and-conquer strategy greatly improves both the fusion results and the computational efficiency.
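The strong/weak split can be sketched as a single scan over the low-frequency sub-bands of all sources. The default patch size mirrors the paper's H = 8 and the threshold its δ = 10^-R, but the helper itself is illustrative:

```python
import numpy as np

def classify_patches(low_bands, patch=8, step=4, thresh=1e-3):
    """Slide a window over the low-frequency sub-bands of all sources;
    a position is 'strong' if the maximum patch variance across the
    sources exceeds the threshold, otherwise 'weak'."""
    H, W = low_bands[0].shape
    strong, weak = [], []
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            v = max(b[i:i + patch, j:j + patch].var() for b in low_bands)
            (strong if v > thresh else weak).append((i, j))
    return strong, weak
```

Only the (usually much shorter) `strong` list then goes through the expensive dictionary-based fusion; the `weak` positions get a plain weighted average.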

B. DIRECTIONAL DICTIONARIES LEARNING
In order to make better use of the direction information of strong information patches, a fusion method based on directional dictionaries is proposed.
As is well known, each scale of the high-frequency part of an MST-decomposed image contains different directional sub-bands, indicating the direction information of the high-frequency part of the image. For example, DWT contains three directions: horizontal, diagonal (45°, 135°) and vertical. Similarly, FFST also contains four different parts. Figure.5 shows the frequency-domain segmentation of FFST, in which the frequency plane is divided into different regions, horizontal cones and vertical cones. The number of direction angles of the high-frequency sub-bands at each scale is greater than or equal to 4; consequently, the number of main directions of the low-frequency sub-band should be less than or equal to 4. In order to consider the directivity of the low-frequency sub-band more comprehensively, we learn directional dictionaries with four direction angles in the low-frequency part. Thus, a low-frequency image patch with a direction can be better represented in its corresponding directional dictionary, making the dictionary learning process more efficient.
To be specific, we randomly sample 60,000 image patches from thirty natural images that do not include the images used in the experiments. The smooth image patches with small variance are removed to obtain 53,896 image patches with structural information. According to the principle of supervised classification and consistent gradient information, these patches are divided into four groups, one per direction. A directional dictionary is learned from each of the four groups, and the number of atoms of each sub-dictionary is set to 256. In order to represent image patches with no clear single direction, such as cross points, we also learn a full dictionary from all the image patches.
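One plausible way to realize the gradient-based grouping into four direction groups is a magnitude-weighted histogram of gradient orientations. This is our sketch of the idea, not the paper's exact supervised classification procedure:

```python
import numpy as np

def dominant_direction(patch, n_bins=4):
    """Assign a patch to one of four direction groups (0°, 45°, 90°,
    135°) via the magnitude-weighted histogram of its gradient
    orientations; in a full system, patches with no clear winner
    would fall back to the full dictionary."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 180.0)
    bins = np.round(ang / 45.0).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return int(np.argmax(hist))
```

Each of the four resulting groups would then feed the K-SVD learning of one 256-atom sub-dictionary.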
In this paper, the K-SVD method is used to learn the directional dictionaries [27], [28]. The resulting dictionaries are shown in Figure.6: Figure.6(b) is the full dictionary, and Figure.6(c)-(f) are the four directional dictionaries.

C. LOW FREQUENCY SUB-BAND FUSION ALGORITHM
Let {L_1, L_2, ..., L_M} be the low-frequency sub-bands. For the M patches at the same position, if the maximum variance among them is higher than the threshold, they are regarded as a set of strong information patches, and the patch with the largest variance is taken as the reference patch. According to the reference patch, a directional sub-dictionary is selected and used to sparsely represent the M patches. The coefficients are then fused with the 'L1-max' rule.
With the learned sub-dictionaries D_t, t = 0, 1, 2, 3, 4, the fusion algorithm for strong/weak patches in the low-frequency part is as follows.
1) For each source image L_m of {L_1, L_2, ..., L_M}, apply the sliding window technique to extract all possible patches of size H × H; l_m^k denotes the patch at position k in L_m. If the maximum variance of the M patches at position k is below the threshold, they are weak information patches and are merged by the weighted average rule. Otherwise, perform Step 2 to Step 4, as illustrated in the accompanying figure.
2) Rearrange each patch l_m^k into a column vector c_m^k and remove its mean: ĉ_m^k = c_m^k − c̄_m^k · 1, where c̄_m^k is the mean value over all the elements of c_m^k and 1 denotes an all-one vector of the same length. Pick out c_J^k, which has the largest variance among ĉ_1^k, ĉ_2^k, ..., ĉ_M^k; the dictionary D is then adaptively selected from {D_t}_{t=0}^{4} based on the gradient information of this reference patch [19].
3) Compute the sparse vectors α_1^k, α_2^k, ..., α_M^k of the centered patches over the selected dictionary D, with ||ĉ_m^k − D α_m^k||_2 ≤ ε, where ε > 0 is the error tolerance. Merge the M obtained sparse vectors by the 'L1-max' rule, keeping α_J^k with J = argmax_m ||α_m^k||_1, and set c_F^k = D α_J^k + c̄_J^k · 1.
4) For each c_F^k, reshape it into an H × H patch l_F^k. Let L_F denote the fused low-frequency coefficients, and plug l_F^k into its original position k in L_F. As the patches overlap, the value of each pixel in L_F is averaged over its accumulation times.
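Steps 2 to 4 above can be sketched with a minimal orthogonal matching pursuit (OMP) and the 'L1-max' coefficient selection. The tiny `omp` helper below is a simplified stand-in for a production sparse coder, and the function names are ours:

```python
import numpy as np

def omp(D, y, n_nonzero=4):
    """Tiny orthogonal matching pursuit: greedily pick atoms of D,
    re-fitting the selected support by least squares each step."""
    residual, idx = y.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        sol, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ sol
    coef[idx] = sol
    return coef

def fuse_strong_patches(D, patches):
    """'L1-max' rule: sparse-code every mean-removed source patch on
    the selected (sub-)dictionary, keep the coefficient vector with
    the largest l1 norm, and carry over the winner's mean."""
    means = [p.mean() for p in patches]
    codes = [omp(D, p - m) for p, m in zip(patches, means)]
    j = int(np.argmax([np.abs(c).sum() for c in codes]))
    return D @ codes[j] + means[j]
```

Here `patches` are the vectorized patches at one position k across the M sources; the returned vector would be reshaped to H × H and accumulated into L_F as in Step 4.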

V. EXPERIMENTS
A. EXPERIMENTAL PREPARATION
1) SOURCE IMAGES
As shown in Figure.8, seven pairs of source images are employed to verify the effectiveness and efficiency of the proposed fusion algorithm [29].
2) COMPARED METHODS
The compared methods include the traditional MST(L,M,R,A) methods, the MST(L,Ms,R,D) methods proposed in [22], [23], the MST(L,dM,R,dD) method proposed in this paper, and the ASR method proposed in [19]. Here L is the decomposition level, M means 'max', A means 'average', D means the dictionary coefficient max rule for the low-frequency part, R is the negative power of the threshold value, dD means the low-frequency fusion rule based on directional dictionaries, dM means the high-frequency fusion rule based on direction contrast, Ms means the absolute values of coefficients as the activity-level measurement for the high-frequency part, and ASR denotes the SR-based method.
The image filters for all the DTCWT methods are 'LeGall 5-3' and 'Qshift-06'; we use 'pyrexc' as the pyramid filter and 'vk' as the directional filter for all the NSCT methods, with the four decomposition levels set to 3, 3, 4 and 4, respectively; for all the NSST methods, the sizes of the shearing filter matrices at the four decomposition levels are set to 30, 30, 36 and 36, and the error tolerance is ε = 0.1 [22]. The patch size is H = 8 and the neighborhood window size is N = 9. All experiments are performed in MATLAB R2017a on a computer with an Intel(R) Xeon(R) Silver 4110 CPU.

3) SELECTION OF THE NEGATIVE POWER R
This section verifies the validity of adding the variance threshold. We compare MST(L,dM,R,dD) with MST(L,dM,0,dD), where MST(L,dM,R,dD) is the fusion method proposed in this paper, R is the negative power of the variance threshold δ = 10^-R, R = 2, 3, 4; in the MST(L,dM,0,dD) fusion method, every low-frequency image patch is represented on the directional dictionaries.
In the experiment, the 'Disk' images with different fuzzy regions are used to evaluate the fusion performance under different variance thresholds. In terms of visual effects, the fused images in Figure.9 show almost no difference and are all relatively clear.
Beyond the visual effects, we also compare the comprehensive evaluation indexes of the fused results. The evaluation indexes of the methods based on NSCT are listed in Table 1, where bold fonts mean better fusion performance. It can be seen that, at the same MST and decomposition level, most of the evaluation index values obtained by NSCT(L,dM,3,dD) are optimal, so adding the variance threshold does not degrade the fusion quality. In general, MST(L,dM,3,dD) has the best fusion performance: the low-frequency sub-band patches with variance less than 10^-3 should be merged by the weighted average rule, and those with variance greater than 10^-3 by the fusion rule based on directional dictionaries, which enhances the fusion result.

4) EVALUATION METRICS
In general, image fusion results can be evaluated subjectively and objectively. On the one hand, the subjective approach is psycho-visual; since the fusion results usually show little visual difference, it is hard to evaluate them properly by eye alone, so two reference-based metrics, peak signal-to-noise ratio (PSNR) [30] and root mean square error (RMSE) [31], which calculate the difference between the original image and the fused image, are used as auxiliaries. On the other hand, six commonly used objective evaluation metrics are selected to quantitatively measure the fusion results: the mutual information metric (MI) [32], the human perception based fusion metric (Q_CB) [33], the gradient based fusion metric (Q_G) [34], the phase congruency based fusion metric (Q_P) [35], the structural similarity based fusion metric (Q_Y) [36] and the standard deviation (SD) [37]. MI measures the amount of information obtained from the source images; Q_CB reflects the major features of the human visual system (visual information fidelity); Q_G evaluates the amount of gradient information extracted from the source images and measures the clarity of the fused image; Q_P reflects salient image features such as edges and corners; Q_Y measures the amount of structural information of the source images preserved in the fused image; SD measures the richness of the texture information of the fused image. Except for RMSE, larger values mean better fusion results; the closer RMSE is to zero, the more successful the method.
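The two reference-based metrics have standard closed forms, sketched here for 8-bit images:

```python
import numpy as np

def rmse(ref, fused):
    """Root mean square error between a reference image and the
    fused image; 0 means a perfect match."""
    diff = ref.astype(float) - fused.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(ref, fused, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher is better);
    infinite for identical images."""
    e = rmse(ref, fused)
    return float('inf') if e == 0 else float(20.0 * np.log10(peak / e))
```

The six no-reference fusion metrics (MI, Q_CB, Q_G, Q_P, Q_Y, SD) are considerably more involved and are not reproduced here.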

B. DIFFERENT ASPECTS OF THE ALGORITHM PERFORMANCE
Ablation studies are conducted to assess the influence of R on fusion efficiency and the effectiveness of the high/low-frequency fusion rules.

1) THE INFLUENCE OF R ON FUSION EFFICIENCY
After the source images are decomposed by a subsampled MST, such as DWT, CVT or DTCWT, if the low-frequency sub-band contains few weak information patches, the thresholding step takes a little more time while leaving the results unchanged. Therefore, the efficiency improvement brought by the threshold mainly occurs for non-subsampled MSTs, such as NSCT, NSST and FFST.
In order to verify the efficiency improvement, the runtimes of MST(L,dM,3,dD) and MST(L,dM,0,dD) are compared. The 'Disk', 'Balloon' and 'Leaf' source images with different fuzzy regions are used to evaluate the computational efficiency with and without the variance threshold. The runtimes are shown in Table 2, Table 3 and Table 4, respectively. The bold fonts in these three tables indicate that, for the same MST and the same decomposition level, the proposed algorithm greatly reduces the fusion time.
For the down-sampled MSTs at the same decomposition level, Table 2 shows that, except for DTCWT(L,dM,3,dD) versus DTCWT(L,dM,0,dD), the computation time of MST(L,dM,3,dD) is slightly more than that of MST(L,dM,0,dD), but the difference is less than 0.5 seconds; the reason is that the low-frequency sub-band patches of the three-level DTCWT decomposition contain a small number of weak information patches. From Table 3 and Table 4, the computation time of MST(L,dM,3,dD) is less than that of MST(L,dM,0,dD) in most cases. For the non-subsampled MSTs at the same decomposition level, the computation time of MST(L,dM,3,dD) is much smaller than that of MST(L,dM,0,dD).
In terms of computational complexity, the complexity of MST(L,dM,3,dD) is O(T N^3 + T N^2 XY + XY + 1 + K(1 + H^2 + 256^2 + 1)), where T is the number of high-frequency sub-bands and K is the number of strong information patches. In summary, as long as the low-frequency sub-band contains weak information patches, the proposed algorithm MST(L,dM,3,dD) improves the computational efficiency. The low computational complexity of the proposed algorithm is thus verified.

2) FUSION PERFORMANCE OF HIGH-FREQUENCY FUSION RULE
Based on the same low-frequency fusion rule, different high-frequency fusion strategies are compared in Figure.11(c)-(i) and Figure.12(c)-(i). In Figure.12, the first two images are the high-frequency information of the source images, and the last two images are the high-frequency fusion results of the 'M' and 'dM' methods, respectively. From Table 5-10, the evaluation indexes of MST(L,dM,R,D) are better than those of MST(L,M,R,D) in 93.75% of cases, and the evaluation indexes of MST(L,dM,3,dD) are better than those of MST(L,M,3,dD) in 93.75% of cases. It follows that direction contrast is extremely effective for high-frequency fusion. The high-frequency sub-bands mainly contain rich edge and contour information, and the direction contrast extracts more contour and edge information from the source images. All in all, the high-frequency fusion rule based on direction contrast helps to extract more effective information from the high-frequency sub-bands of the source images. The necessity of directionality for high-frequency fusion is verified.

3) FUSION PERFORMANCE OF LOW-FREQUENCY FUSION RULE
Based on the same high-frequency fusion rule, different low-frequency fusion strategies are compared. The comparison of MST(L,M,R,D) with MST(L,M,3,dD) and of MST(L,dM,R,D) with MST(L,dM,3,dD) shows the different effects of the single-dictionary and directional-dictionaries-based rules. In Figure.11, the first two images are the low-frequency information of the source images, and the last two images are the low-frequency fusion results of the 'D' and 'dD' methods, respectively. From Table 5-10, most of the evaluation indexes of MST(L,M,3,dD) are better than those of MST(L,M,R,D), and most of those of MST(L,dM,3,dD) are better than those of MST(L,dM,R,D). It can be seen that the low-frequency sub-band coefficients chosen by the fusion rule based on directional dictionaries are very effective. The low-frequency sub-band is an approximation of the source image at low resolution and contains a large amount of its basic information. Using the low-frequency fusion rule based on directional dictionaries, the low-frequency information can be better extracted, which greatly improves the quality of the fused image. The necessity of directionality for low-frequency fusion is verified.

C. ALGORITHM FUSION EFFECTIVENESS
Based on the same MST and the same decomposition level, the subjective visual results (Figure.12-17) and the objective evaluation indexes (Table 5-10) of the different fusion methods are compared. The influence of the decomposition level is discussed only for the down-sampled CVT and the non-subsampled FFST; the other MSTs are analyzed only at the four-level decomposition.

1) EXPERIMENTS BASED ON DWT
In terms of visual effects, the blurred regions of the two source images (Figure.12(a) and (b)) become clearer in Figure.12(c)-(i). According to the quantitative assessments in Table 5, the fusion result obtained by DWT(4,dM,3,dD) has four optimal values and two sub-optimal values; in particular, DWT(4,dM,3,dD) takes first place for Q_CB, Q_G, Q_Y and PSNR, with clear advantages over the other methods. The subjective and objective evaluations together indicate that the DWT(4,dM,3,dD) method improves the clarity of the fused image, verifying the superiority of the proposed algorithm: the fusion strategy (dM,3,dD) improves DWT-based multi-focus image fusion results.

2) EXPERIMENTS BASED ON DTCWT
In terms of visual comparison, Figure.13(c)-(i) all combine the effective information of the source images Lab A and Lab B. However, except for Figure.13(h), obtained by the method proposed in this paper, there are shadows around the head of the person in the other results. The better fusion result in Figure.13(h) verifies that the fusion strategy DTCWT(4,dM,3,dD) can better extract useful information from the source images. Moreover, it can be seen from Table 6 that the fusion result of DTCWT(4,dM,3,dD) has six optimal values and one sub-optimal value. These comprehensive evaluations further confirm the effectiveness of the (dM,3,dD) strategy in DTCWT-based multi-focus image fusion.

3) EXPERIMENTS BASED ON CVT
Compared with the source images Figure.14(a) and Figure.14(b), the fused images Figure.14(c)-(o) successfully preserve the focused portions of the two source images. However, careful observation reveals that the results based on the three-level CVT decomposition in Figure.14(c)-(h) show obvious fusion unevenness on the rightmost side of the large clock. Referring further to Table 7, the CVT(dM,3,dD) method has three optimal values (Q_CB, Q_P, PSNR), corresponding to high visual fidelity, a better description of edges and corners, and low noise. The (dM,3,dD) fusion strategy thus has obvious advantages for CVT-based multi-focus image fusion.

4) EXPERIMENTS BASED ON NSCT
In terms of visual effects, Figure.15(c)-(i) have a better visual effect than the source images. According to Table 8, the NSCT(4,dM,3,dD) method has seven optimal values and one sub-optimal value. The obvious superiority of NSCT(4,dM,3,dD) verifies that this method obtains the largest amount of information from the source images. The combination of NSCT and (dM,3,dD) leads to better NSCT-based multi-focus fusion results.

5) EXPERIMENTS BASED ON NSST
It can be seen from Figure.16 that Figure.16(c)-(i) extract the useful information of the source images, so the fused images are all clear. According to the quantitative assessments in Table 9, NSST(4,dM,3,dD) has six optimal values and one sub-optimal value. The NSST(4,dM,3,dD) method better extracts the feature information of the source images, such as edges and corners, and retains the most structural information. Both the fusion results in Figure.16 and the evaluation indexes in Table 9 verify that the (dM,3,dD) fusion strategy can effectively extract the structural information of the source images in NSST-based multi-focus image fusion.

6) EXPERIMENTS BASED ON FFST
It can be found that the fusion result of FFST(L,dM,3,dD) shows almost no artifacts or uneven fusion in the clothes and hair of the girl, and the contrast between the trunk and the girl is strong. According to Table 10, the FFST(L,dM,3,dD) method has six optimal values and one sub-optimal value. From Figure.17 and Table 10, the combination of FFST and the (dM,3,dD) fusion strategy helps to improve the image quality of FFST-based multi-focus image fusion.
The comparison of MST(L,M,R,A), ASR and MST(L,Ms,R,D) with our MST(L,dM,3,dD) verifies that the fusion algorithm proposed in this paper is superior to the MST-based method, the SR-based method, and their simple combination MST(L,Ms,R,D). The evaluation indexes of MST(L,dM,3,dD) are better than those of MST(L,M,R,A) in 93.75% of cases, which indicates that the proposed algorithm improves MST-based multi-focus fusion; they are better than those of the ASR algorithm of [19] in 71.88% of cases, which indicates that the proposed algorithm is superior to the SR-based fusion algorithm ASR; and they are better than those of MST(L,Ms,R,D) of [22], [23] in 96.88% of cases, which indicates that the proposed algorithm outperforms the algorithms combining MST and SR proposed in [22], [23]. Therefore, the general multi-focus fusion method proposed in this paper is superior to the traditional MST-based methods, the SR-based method ASR, and the combined MST-SR algorithms. It is verified that a good multi-scale fusion method should consider the directivity of the high/low-frequency sub-bands.

VI. CONCLUSION
This paper proposes a novel general multi-focus image fusion method based on multi-scale and multi-directional analysis. In this method, a fusion rule based on direction contrast is used for the high-frequency sub-bands, with the direction weight defined according to the directions of the high-frequency MST sub-bands. The low-frequency MST sub-band patches are classified according to their variance: the weak information patches are merged by the weighted average rule, and the strong information patches are merged based on directional dictionaries. Six MSTs (DWT, DTCWT, CVT, NSCT, NSST, FFST) are included in the experiments, and seven fusion rules are compared. The experimental results show that the proposed algorithm performs best in subjective visual effects and comprehensive evaluation indexes. The full use of directionality better extracts the detail, contour and edge information of the source images. With a suitable variance threshold, the computational efficiency is also greatly improved, especially for the non-subsampled MSTs.