An Infrared and Visible Image Fusion Algorithm Based on LSWT-NSST

To address the image distortion, edge blurring, and Gibbs phenomena of the traditional wavelet transform and the loss of subtle features in the Non-Subsampled Shearlet Transform (NSST), and considering the physical characteristics of infrared and visible images, this paper proposes an infrared and visible image fusion algorithm based on the Lifting Stationary Wavelet Transform (LSWT) and NSST. First, because LSWT is fast to compute and retains the advantages of the traditional wavelet transform (WT), it is used to decompose the infrared and visible images into low-frequency coefficients and multi-scale, multi-directional high-frequency coefficients. Second, NSST multi-scale decomposition is applied to the high- and low-frequency sub-bands to extract the target and detail features of the image, yielding new high- and low-frequency sub-bands. Third, different fusion rules are designed according to the physical characteristics that the low- and high-frequency coefficients represent. For the low-frequency sub-band, the Discrete Cosine Transform (DCT) and Local Spatial Frequency (LSF) are introduced, and an LSF adaptive weighted fusion rule is applied in the DCT domain. For the high-frequency sub-band, a fusion strategy based on improved regional contrast is adopted in accordance with the spectral characteristics of human vision. Finally, the Inverse Lifting Stationary Wavelet Transform (ILSWT) is used to reconstruct the fused coefficients and obtain the final fused image. To verify the advantages of the proposed algorithm, 9 classic and advanced IR and VI fusion algorithms are selected for subjective and objective comparison. For the objective evaluation, a comprehensive ranking index is designed based on 9 classical indicators. Simulation experiments with 10 IR and VI fusion algorithms show that the proposed algorithm has better performance and flexibility. The results show that the proposed algorithm produces fused images with clear edges, prominent targets, and good visual perception, and that it outperforms state-of-the-art image fusion algorithms.


I. INTRODUCTION
Image fusion registers images of the same target acquired by different sensors and then applies an algorithm to remove redundant information and integrate complementary information, producing a fused image better suited to human visual perception [1]. Image fusion technology has advanced rapidly in recent years and plays a key role in image segmentation and computer vision [2]-[4]. With the development of image fusion technology and the reduction of hardware costs, higher reliability and comprehensiveness are expected of image fusion. Therefore, infrared and visible image fusion technology has been extensively researched [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Yong Yang.
Infrared and visible sensors collect complementary information about the same scene. Infrared sensors capture rich thermal radiation information, which clearly reveals the outlines of hidden targets but cannot capture fine detail. The visible sensor characterizes the object through spectral reflection, which yields images that are more consistent with the human visual system. However, the image quality is limited by the environment and depth-of-field conditions, especially at night and in low visibility. Fusing infrared and visible images provides more comprehensive information, combining the high resolution of the visible-light target with the clearly revealed hidden infrared target. The technology overcomes the limitation of single-sensor information acquisition and improves the visual effect of the image. Thus, image fusion has long been a research hot-spot and plays a key role in target tracking and detection, medical imaging, military reconnaissance, remote sensing, face recognition, and space exploration [6]-[12].
Currently, image fusion algorithms are mainly divided into pixel-based and region-based methods. According to the processing domain, they are also divided into transform-domain and spatial-domain methods. Multi-scale fusion in the transform domain is the mainstream framework, whose core idea is to map the source image into a multi-scale transform domain. Classical multi-scale transform algorithms include the Wavelet Transform (WT) [13], Curvelet Transform (CT) [14], Non-Subsampled Shearlet Transform (NSST) [15], Sparse Representation (SR) [16], Non-Subsampled Contourlet Transform (NSCT) [17], and multi-resolution singular-value decomposition [18].
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Spatial-domain image filtering is another important approach, which operates directly on single pixels or pixel neighborhoods of the source image. Typical methods are non-local means filtering [19], guided filtering [20], global filtering [21], and bilateral filtering. The key to region-based infrared and visible image fusion is to extract distinctive features in the infrared regions, which can be achieved by image segmentation or saliency detection, so that regions with strong infrared radiation are effectively obtained [22], [23].
In recent years, deep learning has developed rapidly in various fields and has been widely studied and applied in image fusion [24], [25]. By learning the weight parameters and loss function in the training layer and verification layer, this approach can obtain rich image information with good results. The work in [26] uses a convolutional sparse representation method to extract the features of the detail layer and uses them to fuse infrared and visible-light images; it achieves a better and more adaptable fusion effect than the traditional sparse method. The work in [27] applied a CNN to image fusion for the first time and proposed a multi-focus image fusion algorithm within a deep learning framework. The deep convolutional neural network is trained to extract the focused or de-focused areas of the source image, and the fused image is generated through fusion-decision post-processing, which is more robust. The work in [28] uses a CNN to address two key issues of image fusion: activity level measurement and weight distribution. It then fully considers the infrared and visible-light imaging methods and applies a local-similarity post-processing strategy to adjust the fusion decision map, obtaining a good fusion effect. Deep learning methods demand substantial hardware resources and time-consuming computation. However, with the rapid development of GPU hardware, parallel and accelerated computing capabilities have greatly improved and hardware costs have fallen considerably. Traditional image fusion technology retains advantages for certain types of problems. Therefore, how to effectively integrate deep learning with traditional algorithms is also a research hotspot.
Deep learning networks, however, usually have complicated structures. When there are many training layers, training is very time-consuming and prone to serious over-fitting. Moreover, the high demand for hardware resources makes these methods unsuitable for small computing platforms and limits their applicability and real-time performance.
Fusion methods based on the transform domain usually have better fusion performance, but their basis functions and decomposition scales are relatively fixed. Within this framework, selecting the optimal basis function to better express the source image and designing effective fusion rules for the decomposed sub-bands to improve the fusion quality are challenging research problems, and the computational complexity is also high. Fusion methods based on the spatial domain avoid the forward and inverse transforms of transform-domain methods. The initial fusion decision map is usually obtained by computing the activity level of the image, but the final fusion decision map requires subsequent optimization. Such algorithms require little computation and are simple to implement, but their fusion performance is poor. The limitation of deep-learning-based fusion is that a large number of training images is needed, and real data of this kind are difficult to obtain. If simulated data are used to classify pixels and obtain a fusion decision map, noise is often introduced. Therefore, an end-to-end unsupervised deep network is needed to complete image fusion and obtain high-quality fused images. However, the existing training data have no standard reference image, which also makes end-to-end image fusion difficult. The classification of image fusion methods is not absolute. With the continuous development of related technologies, the various approaches show a noticeable trend toward cross-fusion.
The Stationary Wavelet Transform (SWT), an improved wavelet transform, can effectively preserve image texture and edge information. However, it can only represent image details in three directions (horizontal, vertical, and diagonal), and it is weak for continuous regions. Its performance is poor especially when the source image has complex details and continuous curves. NSCT has multi-scale directional anisotropy and shift invariance, which can effectively remove Gibbs effects [29]. However, NSCT has a complicated structure and a high computation cost.
NSST has strong flexibility and multi-directionality and is more efficient than NSCT. It also preserves the edges and curves of the image well, but it is weak at capturing subtle image features.
These multi-scale fusion methods have the disadvantages of a large computation burden, high complexity, poor real-time performance, and high memory requirements in image reconstruction. The Lifting Stationary Wavelet Transform (LSWT) [30] has all the advantages of the traditional wavelet and better performance than SWT. It also has fast computation speed, low memory requirements, and significant local characteristics in the frequency and spatial domains. Its shift invariance can effectively reduce image distortion, but LSWT performs poorly on continuous curves.
Image fusion strategies greatly affect the quality of the fused image. A multi-scale transform decomposes the source image into low- and high-frequency coefficients. The low-frequency part gathers the main energy and represents the approximate information of the source image. The high-frequency part represents the edge and contour details of the source image. Traditional image fusion strategies mostly obtain high- and low-frequency fusion decision maps by filtering the source image or by performing calculations on the decomposition coefficients, and they have achieved good fusion results. However, these strategies ignore the physical characteristics of the high- and low-frequency sub-bands, which causes a loss of detail and degrades the fused image.
The Discrete Cosine Transform (DCT) concentrates the key features of the source image in a small number of DCT coefficients, which compacts information and energy during image processing [31], [32]. Local Spatial Frequency (LSF) effectively reflects the regional characteristics of the source image and is often used as a key parameter and index in image fusion algorithms [33].
Inspired by the above discussion, and by integrating the advantages of multi-scale frequency-domain transforms with the characteristics of LSWT and NSST, this paper proposes an infrared and visible image fusion algorithm based on LSWT-NSST. NSST is used to capture continuous curves and edges to compensate for the deficiency of LSWT; LSWT is used to capture subtle image features to compensate for the deficiency of NSST. Different image fusion rules are designed according to the physical characteristics of infrared and visible light and the information represented by the high- and low-frequency sub-bands. In the low-frequency sub-bands, an LSF adaptive weighted fusion rule is used in the DCT domain. In the high-frequency sub-bands, a fusion strategy based on improved regional contrast is adopted according to the spectral characteristics of human vision. A comprehensive ranking algorithm is designed based on 9 classic indicators for objective evaluation, which makes the overall evaluation of the fused image more comprehensive and reduces the influence of subjective judgment. Compared with the 9 advanced fusion algorithms, the main contributions of the proposed algorithm based on LSWT-NSST are as follows:
(1) LSWT and NSST are classical algorithms with good fusion quality. Combining the advantages of the two algorithms improves the fusion result, and the computational efficiency of LSWT improves the running efficiency while maintaining image quality.
(2) By combining the physical characteristics of infrared and visible light with the representation characteristics of the high- and low-frequency sub-bands, different image fusion rules are designed. In the low-frequency part, the LSF adaptive weighted fusion rule is employed in the DCT domain, which improves the target and detail characteristics of the fused image. In the high-frequency part, an improved regional contrast fusion strategy is proposed based on the visual characteristics of the human eye; it is better suited to human vision, especially in image regions with high saliency such as edge contours.
(3) Nine classic indicators are selected, and a comprehensive ranking index is designed that jointly considers the rankings of the different algorithms in terms of image gray-scale, frequency, etc. It therefore makes the evaluation across indicators more comprehensive. In addition, more consideration is given to the macro visual effect, which reduces the influence of subjective human judgment.
(4) The algorithm improves three aspects: image decomposition, fusion rules, and index evaluation. Combining these three improvements increases the performance of the proposed algorithm and yields a better image fusion result.
The remainder of this paper is organized as follows: The second part introduces the related background and the theory of LSWT-NSST. The third part introduces the infrared and visible image fusion algorithm based on LSWT-NSST in detail. The fourth part presents the experimental configuration and simulation, and subjectively and objectively analyzes the IR and VR image fusion effects. The final part concludes this work.

II. RELATED THEORIES

A. LSWT ALGORITHM
In image processing, multi-scale decomposition in the transform domain is more widely used, and is more universal and stable, than models based on the spatial domain. The traditional wavelet transform, a classic multi-scale transform in image processing, is a non-redundant decomposition and is not shift invariant [34]. The Lifting Wavelet Transform (LWT) overcomes the shortcomings of the traditional wavelet: it no longer relies on the traditional wavelet convolution operation and can construct compactly supported biorthogonal multi-wavelets in the spatial domain. For image decomposition, the high-frequency component of LWT is obtained by a simple polynomial interpolation, and the low-frequency component is obtained by a scale-function construction that maintains some overall characteristics of the image. Therefore, the LWT algorithm is easy to implement and fast to compute. However, LWT is not shift invariant, so the fused image suffers from the Gibbs effect and serious distortion.
To overcome the shortcomings of WT and the second-generation LWT, this paper adopts LSWT as the multi-scale transform. The parity-splitting step is canceled, and the filters are extended by zero-filling the corresponding LWT filter coefficients, which makes LSWT shift invariant. LSWT keeps the advantages of LWT, has outstanding local characteristics in the spatial and frequency domains, and is shift invariant, which effectively avoids image distortion. The decomposition and reconstruction process is shown in Figure 1, where P and U represent the prediction operator and update operator, respectively. $a_{l+1}$ and $d_{l+1}$ are the low- and high-frequency coefficients of the input signal $a_l$ at the $(l+1)$-th layer after decomposition by the LSWT algorithm. $P_{l+1}$ and $U_{l+1}$ are the prediction and update filter coefficients of LSWT, as defined in equations (1)-(2),
where $p_i$ ($i = 0, 1, \cdots, m-1$) and $u_j$ ($j = 0, 1, \cdots, n-1$) are the prediction and update filter coefficients of LWT, respectively; m and n are the numbers of coefficients of the prediction operator P and the update operator U, respectively.
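The predict/update structure described above can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: the base filters `p = (0.5, 0.5)` and `u = (0.25, 0.25)`, the circular boundary handling, and the function names are assumptions chosen for illustration, and the neighbour distance `2**(level - 1)` stands in for the zero-filled filters of equations (1)-(2), which are not reproduced in this text.

```python
import numpy as np

def lswt_level(a, level, p=(0.5, 0.5), u=(0.25, 0.25)):
    """One undecimated lifting level (assumed filters, circular borders).

    No even/odd split is performed, so both outputs keep the input length.
    """
    s = 2 ** (level - 1)  # neighbour distance grows with the level (zero-filling)
    prediction = p[0] * np.roll(a, s) + p[1] * np.roll(a, -s)
    d = a - prediction                       # high-frequency (detail) coefficients
    update = u[0] * np.roll(d, s) + u[1] * np.roll(d, -s)
    a_next = a + update                      # low-frequency (approximation) coefficients
    return a_next, d

def ilswt_level(a_next, d, level, u=(0.25, 0.25)):
    """Invert one level; undoing the update step recovers the input exactly."""
    s = 2 ** (level - 1)
    return a_next - (u[0] * np.roll(d, s) + u[1] * np.roll(d, -s))

def lswt(signal, levels=3):
    """Multi-level decomposition: final approximation plus one detail band per level."""
    a, details = np.asarray(signal, dtype=float), []
    for level in range(1, levels + 1):
        a, d = lswt_level(a, level)
        details.append(d)
    return a, details

def ilswt(a, details):
    """Multi-level reconstruction, from the coarsest level back to the finest."""
    for level in range(len(details), 0, -1):
        a = ilswt_level(a, details[level - 1], level)
    return a
```

Because every sub-band has the same length as the input and the filters act identically at every position, a circular shift of the input simply shifts the coefficients, which is the shift invariance the text attributes to LSWT.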

B. NSST ALGORITHM
The Shearlet Transform inherits the advantages of WT and can realize an optimal sparse representation of the image. Its multi-scale decomposition uses down-sampled pyramid filtering, and its directional decomposition uses shear filtering with a shifting window on the pseudo-polar grid. The sub-sampling operation in the multi-directional shear filter deprives the Shearlet Transform of shift invariance and causes spectral aliasing.
To avoid this defect and retain the advantages of multi-scale decomposition, Easley et al. proposed the NSST algorithm [35]. NSST uses a Non-Subsampled Laplacian Pyramid (NSLP) for multi-scale division. Using a two-dimensional convolution, NSST maps the shear filter from the pseudo-polar grid to the Cartesian system, which avoids sub-sampling and makes NSST shift invariant. The decomposition process of NSST is shown in Figure 2.
The NSST image decomposition includes two main steps:
(1) Non-subsampled multi-scale subdivision. The source image is decomposed by the first layer of the NSLP to obtain the low-frequency coefficients $f_a^1$ and the high-frequency coefficients $f_d^1$. The $(k+1)$-th layer of the NSLP decomposition operates on the low-frequency component of the $k$-th layer. Therefore, after image f is decomposed by k layers of the NSLP, one low-pass sub-band and k high-pass sub-bands are obtained.
(2) Localization of direction. A Shearlet filter is used to localize the direction of the high-frequency coefficients. Together, the NSLP and Shearlet filters make the NSST algorithm multi-scale, multi-directional and shift invariant, so that it can effectively characterize the details of the source image and avoid Gibbs and eclipse phenomena.
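Step (1) can be made concrete with a small sketch of a non-subsampled pyramid. It covers only the multi-scale stage, not the shear (directional) filtering, and it is an illustration under stated assumptions: the 5-tap kernel `[1, 4, 6, 4, 1] / 16`, the circular borders, and the function names are hypothetical choices, not the pyramid filters used in [35].

```python
import numpy as np

# Assumed low-pass kernel (B3-spline); the actual NSLP filters may differ.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def smooth(img, level):
    """Separable low-pass filtering with the kernel dilated by 2**level.

    The image is never down-sampled; instead the kernel taps are spread apart.
    """
    step = 2 ** level
    out = img
    for axis in (0, 1):
        out = sum(w * np.roll(out, (k - 2) * step, axis=axis)
                  for k, w in enumerate(KERNEL))
    return out

def nslp_decompose(img, levels):
    """Return one low-pass band and `levels` high-pass bands, all full size."""
    approx, highs = np.asarray(img, dtype=float), []
    for level in range(levels):
        low = smooth(approx, level)
        highs.append(approx - low)   # detail lost by smoothing at this scale
        approx = low                 # next layer works on the low-pass band
    return approx, highs

def nslp_reconstruct(approx, highs):
    """The bands telescope, so summing them restores the image."""
    return approx + sum(highs)
```

With `levels = k` the sketch returns exactly one low-pass sub-band and k high-pass sub-bands of the same size as the source image, matching the counts stated in step (1).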

C. DISCRETE COSINE TRANSFORM
DCT is a linear orthogonal transform commonly used in image processing. Its outstanding advantage is that it decorrelates the data and concentrates the energy of the image in a few low-frequency components in the DCT domain. The DCT coefficients correspond to the low-, mid-, and high-frequency components of the image from the upper-left to the lower-right. The high-frequency coefficients are usually smaller than the low-frequency coefficients, so the energy is mainly concentrated in the low-frequency components.
Let $f(x, y)$ be a two-dimensional $M \times N$ image; its DCT is defined in equation (3).
The two-dimensional $M \times N$ inverse DCT (IDCT) is defined in equations (4)-(6),
where $F(u, v)$ is the two-dimensional DCT of the image, M and N are the width and height of the image, and $C(u)$ and $C(v)$ are the compensation coefficients. The low-frequency coefficients of the DCT reflect slow changes of the pixels, i.e., the image frame. The high-frequency coefficients reflect rapid changes of the pixels, i.e., the image details.
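As an illustration of the energy-compaction property, the sketch below builds the standard orthonormal DCT-II matrix and applies it along both axes. Equations (3)-(6) are not reproduced in this text, so the normalization used here (a factor $\sqrt{2/n}$ with compensation coefficient $1/\sqrt{2}$ for the zero-frequency row) is the common textbook form and is assumed, not confirmed, to match the paper's definition; the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D of size n x n, so that D @ D.T is the identity."""
    u = np.arange(n)[:, None]    # frequency index
    x = np.arange(n)[None, :]    # sample index
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    d[0, :] /= np.sqrt(2.0)      # compensation coefficient for u = 0
    return d

def dct2(f):
    """Two-dimensional DCT: transform the rows and the columns."""
    rows, cols = f.shape
    return dct_matrix(rows) @ f @ dct_matrix(cols).T

def idct2(F):
    """Two-dimensional inverse DCT (the transpose inverts an orthonormal matrix)."""
    rows, cols = F.shape
    return dct_matrix(rows).T @ F @ dct_matrix(cols)

def low_frequency_energy_share(F, size=2):
    """Fraction of total energy held by the upper-left size x size coefficients."""
    return float(np.sum(F[:size, :size] ** 2) / np.sum(F ** 2))
```

For a smooth block such as `np.add.outer(np.arange(8.0), np.arange(8.0))`, `low_frequency_energy_share(dct2(block))` is close to 1, which is the concentration of energy in the upper-left (low-frequency) coefficients described above.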

III. INFRARED AND VISIBLE IMAGE FUSION ALGORITHM BASED ON LSWT-NSST

A. BASED ON THE COMBINED TRANSFORMATION THEORY OF LSWT AND NSST
The key to infrared and visible image fusion is to effectively extract and fuse the complementary information of the multi-source images. By integrating the advantages of the LSWT and NSST algorithms, an infrared and visible image fusion algorithm based on LSWT-NSST is proposed. The multi-scale and multi-directional characteristics of the NSST algorithm compensate for the limitation of the three-directional LSWT decomposition and retain more source image information through the redundancy of multi-scale decomposition. The high-frequency coefficients of the LSWT decomposition are sparse, and its wavelet basis can fully reflect the texture characteristics of the source image in multiple directions and angles. The LSWT-NSST image fusion algorithm can compensate for the lack of subtle image features in NSST and greatly improve efficiency. Its block diagram is shown in Figure 3. The steps of the LSWT-NSST image fusion algorithm are as follows.
Step 1: Select an infrared image (IR) and a visible-light image (VR) with identical resolution.
Step 2: Perform the LSWT multi-scale decomposition of the infrared image and visible-light image to be fused to obtain a corresponding low-frequency sub-band (LL-Sub-IR and LL-Sub-VR) and multiple high-frequency sub-bands (LH-Sub-IR and LH-Sub-VR). The number of LSWT decomposition layers is set to 3.
Step 3: Perform the NSST decomposition on the sub-bands obtained in Step 2 to obtain new low-frequency sub-bands (LNLL-Sub-IR, LNLL-Sub-VR, LNHL-Sub-IR and LNHL-Sub-VR) and new high-frequency sub-bands (LNLH-Sub-IR, LNLH-Sub-VR, LNHH-Sub-IR and LNHH-Sub-VR).
Step 4: Perform the DCT conversion on the newly obtained LNLL-Sub-IR, LNLL-Sub-VR, LNHL-Sub-IR and LNHL-Sub-VR. The key features of low-frequency sub-band images are concentrated on a small part of the coefficients in the DCT domain. The window scale of the DCT domain is set to ω = 4.
Step 5: Calculate the LSF value of each low-frequency sub-band coefficient in the DCT domain. The calculation results are: LNLL-Sub-IR-DLV, LNLL-Sub-VR-DLV, LNHL-Sub-IR-DLV and LNHL-Sub-VR-DLV. This calculation further enhances the regional characteristics of the low-frequency sub-band in the DCT domain. The window scale of the LSF domain is set to ω = 3.
Step 6: According to the characteristics of infrared and visible-light images, design the fusion rules for the high- and low-frequency coefficients. The low-frequency sub-band image adopts the LSF adaptive weighted fusion rule in the DCT domain; the high-frequency sub-band image adopts a fusion strategy based on improved regional contrast. Details are shown in Section III-B (INFRARED AND VISIBLE IMAGE FUSION RULES).
Step 7: Fuse the low-frequency coefficients in the DCT domain and the high-frequency coefficients in the NSST domain according to the fusion rules designed in Step 6.
Step 8: Perform the IDCT on the fused low-frequency coefficients of LNLL-Sub-IR-DLV, LNLL-Sub-VR-DLV, LNHL-Sub-IR-DLV and LNHL-Sub-VR-DLV to transform the energy back to the NSST domain and obtain the low-frequency sub-band of the NSST domain.
Step 9: Perform the inverse NSST on the low-frequency sub-band obtained in the NSST domain to convert the fused coefficients to the LSWT domain and obtain the low-frequency coefficients of the LSWT domain.
Step 10: Perform the inverse NSST on the fused high-frequency coefficients of LNLH-Sub-IR, LNLH-Sub-VR, LNHH-Sub-IR and LNHH-Sub-VR to convert the fused coefficients to the LSWT domain and obtain the high-frequency coefficients of the LSWT domain.
Step 11: Perform ILSWT on the new high- and low-frequency sub-bands to obtain the final fused image F.

B. INFRARED AND VISIBLE IMAGE FUSION RULES
Designing reasonable fusion rules is a key part of an image fusion algorithm and is very important for the fusion effect, since the rules determine which pixels or coefficients are selected and merged into the final image. The fusion rules should capture the important details and salient features of the source images accurately and comprehensively, and the fused image should be as well adapted to human visual perception as possible.
The physical properties of infrared and visible images are quite different. Infrared light imaging uses the infrared rays reflected by the target or the thermal radiation generated by the target to detect the target, and the infrared image sensor senses the temperature information of the measured object. The higher the object's temperature, the stronger the infrared spectrum signal, the brighter the infrared image, and the clearer the target. The infrared light image reflects the temperature characteristics of the target, its anti-environmental interference ability is strong, and it has good target detection ability. However, its target resolution is low, the background is fuzzy, and the ability to express details is relatively low. The visible light image sensor mainly uses the visible light information reflected by the object to image the object and can receive the visible spectrum information around the target scene. The visible light image reflects the background and contour information of the image. It has a high spatial resolution, obvious contrast, rich edge and structure texture information, and can better describe scene information. However, the image quality is poor in low visibility and poor lighting conditions.
The image's low-frequency coefficient is also called the approximate sub-band, which contains the main information of the infrared and visible-light images, represents the approximate components and average characteristics of the source image, and concentrates most of the energy of the infrared and visible-light images. The image's high-frequency coefficient is also called the detail sub-band, which describes the detailed information of the infrared and visible-light image, which directly affects the resolution and clarity of the fused image.
Traditional image fusion rules usually adopt the largest absolute value coefficient or weighted average. The former has high fusion efficiency but ignores the average characteristics of the image, which causes image distortion. The latter can reduce the information loss, but lacks the sharpness of the image.
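The two traditional rules mentioned above can be written in a few lines. This is a generic sketch of the maximum-absolute-value and weighted-average rules on coefficient arrays; the function names and the default weight `w = 0.5` are illustrative assumptions.

```python
import numpy as np

def fuse_max_abs(coeff_a, coeff_b):
    """Keep, at each position, the coefficient with the larger absolute value."""
    coeff_a = np.asarray(coeff_a, dtype=float)
    coeff_b = np.asarray(coeff_b, dtype=float)
    return np.where(np.abs(coeff_a) >= np.abs(coeff_b), coeff_a, coeff_b)

def fuse_weighted_average(coeff_a, coeff_b, w=0.5):
    """Blend the two coefficient arrays with a fixed weight w on the first one."""
    coeff_a = np.asarray(coeff_a, dtype=float)
    coeff_b = np.asarray(coeff_b, dtype=float)
    return w * coeff_a + (1.0 - w) * coeff_b
```

The first rule discards the weaker source entirely at every position, while the second attenuates strong coefficients by averaging them with weak ones, which corresponds to the distortion and loss of sharpness noted in the text.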
Therefore, combining the characteristics of infrared and visible-light images and the characteristics of their high and low-frequency coefficients to design reasonable fusion rules can fully extract the spectral information of the visible light image and the thermal target information of the infrared image to improve the quality of image fusion effectively.

1) LOW-FREQUENCY COEFFICIENT FUSION RULES
After the source image has been decomposed by the LSWT and NSST algorithms, the corresponding low-frequency sub-bands and a series of high-frequency sub-bands of various scales and directions are obtained. Because of the non-linear characteristics of the human visual system, a single pixel is closely related to its neighboring pixels. A single pixel carries no information by itself; it is meaningful only in association with its neighborhood, which is what human vision perceives. Since the low-frequency sub-band concentrates most of the energy of the infrared and visible-light images, the key to low-frequency coefficient fusion is to preserve as much of the important information of the source image as possible, that is, to extract and preserve the features of the essential regions of the image.
DCT concentrates information into key features according to the frequency energy. LSF is a commonly used image region representation that can effectively select and extract the optimal image features. It has a strong ability to represent regional details and consists of the Local Row Frequency (LRF) and Local Column Frequency (LCF). Inspired by [41], this paper applies the DCT to the low-frequency coefficients to obtain the regional characteristics of the DCT domain. Calculating the LSF feature matrix of the DCT coefficients further identifies the low-frequency DCT coefficients and enhances the key features of the DCT domain. The LSF value of the DCT coefficients is calculated by equations (7)-(9),
where ω is the window size of LSF, and $DCT(i, j)$ is the DCT coefficient at position $(i, j)$. LRF and LCF are the local row frequency and local column frequency, respectively. According to the difference between infrared and visible-light imaging systems and their sensitivity to local intensity and details, a quantitative similarity (matching degree) between image regions is designed. The similarity of the LSF feature regions is defined in equation (10).
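A minimal sketch of a local spatial frequency map follows. Equations (7)-(9) are not reproduced in this text, so the sketch assumes the usual spatial-frequency form: LRF and LCF are root-mean-square horizontal and vertical first differences inside an ω × ω window, and LSF is the square root of the sum of their squares. The edge-replicated borders, the odd window size, and the function names are assumptions.

```python
import numpy as np

def box_mean(x, w):
    """Mean over a w x w window centred on each position (w odd, edges replicated)."""
    r = w // 2
    padded = np.pad(x, r, mode="edge")
    rows, cols = x.shape
    total = np.zeros((rows, cols), dtype=float)
    for di in range(w):
        for dj in range(w):
            total += padded[di:di + rows, dj:dj + cols]
    return total / (w * w)

def local_spatial_frequency(coeffs, w=3):
    """LSF map of a 2-D coefficient array under the assumed definition."""
    x = np.asarray(coeffs, dtype=float)
    row_diff = np.zeros_like(x)
    col_diff = np.zeros_like(x)
    row_diff[:, 1:] = x[:, 1:] - x[:, :-1]   # difference along each row
    col_diff[1:, :] = x[1:, :] - x[:-1, :]   # difference along each column
    lrf_sq = box_mean(row_diff ** 2, w)      # squared local row frequency
    lcf_sq = box_mean(col_diff ** 2, w)      # squared local column frequency
    return np.sqrt(lrf_sq + lcf_sq)
```

A flat region gives an LSF of zero, while a region with strong variation between neighbouring coefficients gives a large LSF, which is why the map can serve as a regional activity measure for the fusion rule.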
Based on the above discussion, the low-frequency coefficient fusion rule is designed by comparing the similarity with a matching threshold, where $(i, j)$ is the position of the DCT coefficients; $FC_{ij}$ is the fused DCT coefficient; and $S_{A,B}(i, j)$ is the similarity of the regional pixels of A and B. A larger $S_{A,B}(i, j)$ indicates that A and B are more similar to each other. The matching threshold T is set to 0.85 in this paper.
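The threshold-based rule can be sketched as follows. The exact similarity measure and weights of the paper's fusion equations are not reproduced in this text, so everything beyond the threshold T = 0.85 is an assumption for illustration: the similarity is taken as $2\,LSF_A\,LSF_B / (LSF_A^2 + LSF_B^2)$, dissimilar regions take the coefficient with the larger LSF, and similar regions are blended with LSF-proportional weights. The function name and the small constant `eps` are hypothetical.

```python
import numpy as np

def fuse_low_frequency(dct_a, dct_b, lsf_a, lsf_b, threshold=0.85, eps=1e-12):
    """Fuse DCT-domain coefficients of sources A and B using their LSF maps.

    Assumed rule: select where the regions differ, weight where they match.
    """
    dct_a, dct_b = np.asarray(dct_a, float), np.asarray(dct_b, float)
    lsf_a, lsf_b = np.asarray(lsf_a, float), np.asarray(lsf_b, float)

    # Assumed similarity in [0, 1]; it equals 1 when the two LSF values agree.
    similarity = 2.0 * lsf_a * lsf_b / (lsf_a ** 2 + lsf_b ** 2 + eps)

    # Adaptive weights proportional to the regional activity of each source.
    weight_a = lsf_a / (lsf_a + lsf_b + eps)
    weighted = weight_a * dct_a + (1.0 - weight_a) * dct_b

    # Selection branch for regions that do not match.
    selected = np.where(lsf_a >= lsf_b, dct_a, dct_b)

    return np.where(similarity >= threshold, weighted, selected)
```

Blending only where the two regions are similar avoids averaging away a salient structure that is present in just one of the sources.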

2) HIGH-FREQUENCY COEFFICIENT FUSION RULES
The multi-scale and multi-directional high-frequency coefficients obtained by the LSWT and NSST decomposition represent characteristics of the source image such as edges and contours. The high-frequency sub-images decomposed by LSWT contain only three directions: horizontal, vertical and diagonal. The multi-directional NSST decomposition overcomes the insufficient directionality of LSWT and further extracts the multi-scale and multi-directional details of the high-frequency sub-bands. A high-frequency coefficient with a large absolute value represents a sudden change of pixels, and a region with large sudden changes contains rich details. However, if there is noise in the source image, selecting coefficients purely by their absolute value will introduce artificial noise and reduce the quality of the fused image.
According to the multi-resolution selection mechanism of human eye imaging, human vision is highly sensitive to the local contrast of the image. The contrast of the local direction is used to design the high-frequency coefficient fusion rule, which can fully consider the characteristics of infrared and visible-light. The definition is shown in equation (14).
where $C_0(i, j)$ is the low-frequency coefficient and $C_{k,m}(i, j)$ is the high-frequency coefficient at the $k$-th scale and $m$-th direction after the NSST decomposition. Since the contrast of a single pixel has no meaning for the image, using it directly in the fusion rule severs the similarity between pixels and introduces noise. Therefore, the concept of local region contrast is introduced, which is defined in equations (15)-(16),
where the low-frequency region mean coefficient is the mean of $C_0(i, j)$ over the image area block, and M and N are the width and height of the block. The significance of the coefficients varies across the high-frequency region. Because the regional contrast is calculated by the region average method, it ignores the fact that the human eye pays more attention to significant regions. An improved local area contrast is therefore designed, as defined by equations (17)-(19),
where $I_{i,j}$ is the high-frequency coefficient at location $(i, j)$; $D_{k,m}(i, j)$ is the gradient at pixel $(i, j)$ at scale k and direction m; and E is the regional gradient energy, which reflects both the degree of change and the edge sharpness of the image. By combining E with the area contrast, high-frequency areas with high saliency are given greater weight, and the visual result is better suited to human vision. Based on the above analysis, the high-frequency fusion rule is shown in equation (20),
where $F_{k,m}(i, j)$ is the high-frequency coefficient of the fused image F at the $k$-th scale and $m$-th direction.
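The idea of weighting regional contrast by gradient energy can be sketched as follows. Equations (14)-(20) are not reproduced in this text, so the concrete form is an assumption: regional contrast is taken as the ratio of the windowed mean magnitude of the high-frequency band to that of the low-frequency band, the gradient energy is the windowed mean of the squared gradient, their product is used as the improved contrast, and the fused coefficient comes from the source with the larger value. All function names, the 3 × 3 window, and `eps` are hypothetical.

```python
import numpy as np

def box_mean(x, w=3):
    """Mean over a w x w window centred on each position (w odd, edges replicated)."""
    r = w // 2
    padded = np.pad(x, r, mode="edge")
    rows, cols = x.shape
    total = np.zeros((rows, cols), dtype=float)
    for di in range(w):
        for dj in range(w):
            total += padded[di:di + rows, dj:dj + cols]
    return total / (w * w)

def improved_contrast(high, low, w=3, eps=1e-12):
    """Assumed improved regional contrast: region contrast times gradient energy."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    region_contrast = box_mean(np.abs(high), w) / (box_mean(np.abs(low), w) + eps)
    grad_rows, grad_cols = np.gradient(high)          # finite-difference gradient
    gradient_energy = box_mean(grad_rows ** 2 + grad_cols ** 2, w)
    return gradient_energy * region_contrast

def fuse_high_frequency(high_a, low_a, high_b, low_b, w=3):
    """Take each coefficient from the source whose improved contrast is larger."""
    contrast_a = improved_contrast(high_a, low_a, w)
    contrast_b = improved_contrast(high_b, low_b, w)
    return np.where(contrast_a >= contrast_b,
                    np.asarray(high_a, dtype=float),
                    np.asarray(high_b, dtype=float))
```

Under this assumed form, a sharp edge produces both a high regional contrast and a high gradient energy, so it outweighs a smooth region of similar mean magnitude, which reflects the intent of giving salient high-frequency areas greater weight.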
To verify the effectiveness of the improved regional contrast fusion strategy for high-frequency coefficients, the "Lake" image pair from the fourth part is selected as the verification data set. (To control the length of the text, only the "Lake" image pair is compared and analyzed; the comparison results on the other data sets are similar.) The high-frequency coefficients of the LSWT multi-scale decomposition are subjectively analyzed under three fusion strategies (directional contrast, regional contrast and improved regional contrast) in the horizontal, vertical and diagonal directions. We then use these three fusion strategies to reconstruct the final fused image. The three reconstructed images are compared and analyzed using the objective evaluation indices of image fusion described in the fourth part of the article. For simplicity, the fusion strategies of directional contrast, regional contrast and improved regional contrast are abbreviated as the DCS, ACS and IACS algorithms, respectively. The experimental results are shown in Figure 4 and Table 1.
The first, second, and third rows of Figure 4 show the high-frequency sub-bands of the LSWT decomposition of the "Lake" infrared and visible-light images, fused with the DCS, ACS and IACS strategies and reconstructed in the diagonal, horizontal and vertical directions, respectively. The fourth row shows the final fused images obtained with the DCS, ACS and IACS strategies. In the horizontal direction, the red box marks in Figures (a) After improving the regional contrast, the IACS fusion strategy gives better fusion quality than the DCS and ACS strategies.
The bold items in Table 1 mark the maximum among the three algorithms for the ten evaluation indicators. A larger value indicates a better fusion effect. Table 1 shows that the IACS algorithm ranks first in AV, MI, SD, EN, AG and $Q_{CB}$, and second in SF. The difference between its SF value and that of the ACS algorithm is 0.531, which is not large. The reason is that the single-pixel contrast of DCS breaks the local correlation of the image, and the regional mean of ACS smooths the local features, both of which introduce noise. SF reflects the spatial activity of the image, and noise increases SF. For example, the SF values of the DCS and ACS algorithms are 12.6594 and 12.6812, respectively. $Q_{CB}$ and MS-SSIM are visual evaluation indicators; on them, IACS leads the ACS algorithm and the DCS algorithm by 5.57, 6.05 and 6.29, 8.14 percentage points, respectively. This can also be verified from Figures (j), (k) and (l). A comprehensive analysis of the objective evaluation indicators shows that IACS has the best overall evaluation index and the best fusion effect.
The multi-scale decomposition method conforms to the multi-resolution physiological mechanism of human vision. In the high-frequency coefficients, the human eye readily recognizes highly salient or abruptly changing areas of the image. Thus, combining the characteristics of infrared and visible-light images, this paper proposes an improved regional contrast fusion strategy for high-frequency sub-bands. From the subjective and objective analysis above, the fusion quality ranking is IACS > ACS > DCS. In general, the images fused by the IACS algorithm show the best fusion effect, with prominent infrared targets, clear contours and rich visible-light scene details.

IV. EXPERIMENTAL SIMULATION AND RESULT ANALYSIS

A. EXPERIMENTAL CONDITIONS AND SETTINGS
The experimental environment is as follows: the host has an Intel(R) Core(TM) i7 processor with a main frequency of 1.99 GHz and 8 GB of memory. The experimental simulation platform is MATLAB R2019b.
To objectively evaluate the performance of the fusion algorithm, this paper selects experimental materials from infrared and visible image libraries. Four classic infrared and visible image pairs were selected for the fusion experiments: "Kayak," "UNCamp," "Bristol Queen's Road" and "Lake," shown in Figure 5. To verify the effectiveness of the proposed algorithm, the four selected pairs have different sizes, from small to large: 256 × 256, 280 × 360, 496 × 632 and 576 × 768, and their clear regions lie at different positions. To verify the advantages of the proposed algorithm more comprehensively, the 21 image pairs from [47] are also selected for experimental analysis as a classic infrared and visible-light image set. The purpose of the fusion algorithm is to make the fused image reflect the real scene clearly and objectively and conform to human vision.
As classic algorithms for multi-scale image fusion, NSST and NSCT give better results than other traditional frequency-domain methods. As classic spatial-domain fusion algorithms, CBF and GF have achieved good results. In particular, CBF smooths images through a nonlinear combination of neighborhood pixel values and has strong edge preservation. LLRR, DCLSF and SD are improved multi-scale fusion algorithms proposed in recent years for multifocus and infrared and visible-light image fusion, and they greatly improve the fusion effect in both subjective and objective terms. CNN1 and CNN2 are two deep learning image fusion algorithms proposed in recent years, in which a deep convolutional neural network is trained to extract deep features of the source images. Their fused images have clear targets and rich detail, and their performance is greatly improved over traditional image fusion methods. Therefore, this paper selects these 9 algorithms for performance comparison, which is suitable for evaluating the advantages of our algorithm.

B. MAIN PARAMETER SETTINGS
The parameter settings of an image fusion algorithm directly determine its fusion performance. Generally, the more layers a multi-scale algorithm decomposes, the richer the detail expressed by the image. However, increasing the number of decomposition layers often leads to a sharp increase in computation. Therefore, when using the LSWT and NSST algorithms for the first-layer and second-layer multi-scale decomposition, both the fusion performance and the amount of computation must be considered when selecting the parameters. For the first-level decomposition with LSWT, [30] and [44] are referred to, and the decomposition scale of LSWT is determined to be 3 through experimental analysis. For the second-level decomposition with NSST, [15], [26] and [33] are referred to, and the number of multi-scale filter directions for the high-frequency sub-bands is determined to be [4 4 4] through experimental analysis. The corresponding number of multi-scale filter directions for the low-frequency sub-band is [3 3 3].
In the experiments, we found that as the decomposition scale of the LSWT and NSST algorithms increases, the fusion effect improves but the computation grows and becomes very time-consuming. After careful consideration, we set the decomposition level of both LSWT and NSST to 3. The low-frequency sub-band of the LSWT domain contains a small amount of high-frequency information, and the high-frequency sub-band of the LSWT domain also contains a small amount of low-frequency information, so the scale should not be too large in the second-layer decomposition. The DCT window and the LSF window are chosen with reference to [38], with the same settings. For the fusion threshold of the low-frequency sub-band LSF domain and the contrast area of the high-frequency sub-band, the adjustable range of these two parameters is small and they are easy to set, so general settings are used.
Parameter setting is vital for the image fusion algorithm. Based on the above analysis, the parameters of the proposed algorithm are set as follows: the LSWT decomposition level is 3 and the NSST decomposition level is 3. Considering the different features represented by the high and low frequencies, for the LSWT multi-scale decomposition coefficients the number of multi-scale filter directions is set to [4 4 4] for the high frequencies and [3 3 3] for the low frequencies. After the three-layer decomposition, the high-frequency coefficients of the LSWT domain have insufficient directivity. The high-frequency directional filter of NSST compensates for this and captures more edge and contour information. After the three-layer decomposition, the low-frequency sub-band of the LSWT domain contains a lot of detailed information and its coefficients fuse well, but it also contains some high-frequency information, so filtering it with the NSST low-frequency directional filter yields better details and edge features. The DCT window size, which greatly affects the fusion performance, is set to 4 × 4. The LSF window is set to 3 × 3, because a larger window would produce a lot of redundant information. The fusion threshold for the LSF low-frequency sub-band is set to T = 0.85. The region contrast window for the high-frequency sub-band is set to 3 × 3.
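The settings listed above can be collected in a single configuration object, which makes it easier to check that the direction lists match the decomposition depth. The following is only an illustrative sketch: the class name `FusionParams` and all field names are hypothetical and do not come from the original MATLAB implementation; only the numeric values are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionParams:
    """Hypothetical container for the parameter values stated in the text."""
    lswt_levels: int = 3                 # LSWT decomposition level
    nsst_levels: int = 3                 # NSST decomposition level
    dirs_high: tuple = (4, 4, 4)         # filter directions, high-frequency sub-bands
    dirs_low: tuple = (3, 3, 3)          # filter directions, low-frequency sub-band
    dct_window: tuple = (4, 4)           # DCT block size
    lsf_window: tuple = (3, 3)           # local spatial frequency window
    lsf_threshold: float = 0.85          # fusion threshold T for the low-frequency sub-band
    contrast_window: tuple = (3, 3)      # region contrast window, high-frequency sub-band

    def validate(self) -> None:
        # One direction entry is expected per NSST decomposition level.
        if len(self.dirs_high) != self.nsst_levels:
            raise ValueError("dirs_high must have one entry per NSST level")
        if len(self.dirs_low) != self.nsst_levels:
            raise ValueError("dirs_low must have one entry per NSST level")
        if not 0.0 < self.lsf_threshold <= 1.0:
            raise ValueError("lsf_threshold must lie in (0, 1]")


params = FusionParams()
params.validate()
```

Keeping the values immutable (`frozen=True`) is a design choice of this sketch, so that every run of a comparison uses the same settings.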

C. EVALUATION OF THE EFFECT OF IMAGE FUSION
The rationality of the image fusion algorithm determines the quality of the fused image, and the fusion quality is itself an important index. The measurement standards for a fused image vary with the application purpose or scene. Therefore, the comprehensive use of multiple evaluation criteria gives a better judgment of the fusion effect. Current evaluation methods are divided into subjective and objective methods.

1) SUBJECTIVE EVALUATION
The subjective method evaluates the differences between the source images and the fused image directly with the human visual system, mainly in terms of image registration and clarity. It is suitable for images with significant differences and is relatively simple and intuitive, so it is an important method for judging the quality of fused images. However, it is susceptible to the observer's knowledge level and subjective judgment, which makes a standard scale difficult to set and makes the assessment one-sided. Therefore, the comprehensive use of subjective and objective methods gives a more accurate judgment of the fusion effect.

2) OBJECTIVE EVALUATION
The objective evaluation method quantifies image quality and effectively reduces the influence of human subjective factors. A quantifiable performance parameter is used to determine the pros and cons of each image fusion algorithm. Ten objective evaluation indicators are used in this paper: Average Value (AV), Mutual Information (MI), Standard Deviation (SD), Spatial Frequency (SF), Information Entropy (EN) [41], Average Gradient (AG) [42], $Q_{CB}$ [43], edge strength coefficient ($Q_{ABF}$) [45], Multi-Scale Structural Similarity (MS-SSIM) [46] and Comprehensive Ranking Index (CRI).

a: AVERAGE VALUE
AV is the average brightness of the fused image. A larger AV implies a brighter image. It is defined in equation (21).
where F(i, j) is the pixel value of the fused image at position (i, j), and M and N are the width and height of the image.

b: STANDARD DEVIATION
SD measures the dispersion of the pixel values around the mean of the image. A larger SD implies that the image has higher contrast, a wider gray value distribution, and more image information. The standard deviation is defined in equation (22).
where F(i, j) is the gray value of the fused image at (i, j) and µ is its mean gray value.
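As a minimal sketch of the two indicators above, the following pure-Python functions compute the mean and the standard deviation of a gray image stored as a list of rows. They assume the standard population definitions (normalization by M·N); the function names are illustrative and not taken from the paper's MATLAB code.

```python
import math


def average_value(img):
    """Mean gray value of a 2-D image given as a list of equal-length rows."""
    m, n = len(img), len(img[0])
    return sum(sum(row) for row in img) / (m * n)


def standard_deviation(img):
    """Population standard deviation of the gray values around the image mean."""
    m, n = len(img), len(img[0])
    mu = average_value(img)
    squared = sum((p - mu) ** 2 for row in img for p in row)
    return math.sqrt(squared / (m * n))


fused = [[0, 0], [255, 255]]
print(average_value(fused), standard_deviation(fused))
```

For this two-level toy image, half the pixels are 0 and half are 255, so both the mean and the standard deviation equal 127.5.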

c: INFORMATION ENTROPY
EN measures the information richness of the fused image. A larger EN indicates that the fused image carries more information and has higher quality. The entropy is defined in equation (23).
where L is the total number of gray levels of the image, and $P_i$ is the ratio of the number of pixels with gray value i to the total number of pixels in the image.
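A short sketch of the entropy computation, assuming the usual Shannon form with base-2 logarithm over the normalized gray-level histogram. Gray levels that do not occur contribute nothing, so only observed values are counted; the function name is illustrative.

```python
import math
from collections import Counter


def information_entropy(img):
    """Shannon entropy (bits) of the gray-level histogram of a 2-D image."""
    pixels = [p for row in img for p in row]
    total = len(pixels)
    counts = Counter(pixels)  # histogram over the gray values that occur
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(information_entropy([[0, 0], [255, 255]]))
```

An image with two equally frequent gray values carries exactly one bit per pixel, while a constant image has zero entropy.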

d: SPATIAL FREQUENCY
SF measures the overall activity of the fused image in the spatial domain. A larger SF corresponds to more image texture and edge information and a higher quality of the fused image. It is composed of the spatial Row Frequency (RF) and the spatial Column Frequency (CF) and is defined in equations (24)-(26).
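The following sketch computes SF from RF and CF, assuming the common definition in which RF and CF are root-mean-square first differences along rows and columns (normalized by M·N) and SF is the square root of RF² + CF². The exact normalization used by the authors is an assumption here.

```python
import math


def spatial_frequency(img):
    """Spatial frequency of a 2-D image given as a list of equal-length rows."""
    m, n = len(img), len(img[0])
    # Row frequency: squared differences between horizontally adjacent pixels.
    row_sq = sum((img[i][j] - img[i][j - 1]) ** 2
                 for i in range(m) for j in range(1, n))
    # Column frequency: squared differences between vertically adjacent pixels.
    col_sq = sum((img[i][j] - img[i - 1][j]) ** 2
                 for i in range(1, m) for j in range(n))
    rf = math.sqrt(row_sq / (m * n))
    cf = math.sqrt(col_sq / (m * n))
    return math.sqrt(rf ** 2 + cf ** 2)


print(spatial_frequency([[0, 1], [0, 1]]))
```

Because every first difference contributes, isolated noisy pixels raise SF, which is why a high SF alone does not guarantee better fusion.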

e: MUTUAL INFORMATION
MI measures the amount of information shared between the source images and the fused image. A larger MI implies stronger correlation, higher retention of the source images, and lower image distortion. The mutual information is defined in equation (27).
where L is the total number of gray levels of the image. $P_{AB}$ and $P_F$ are the normalized histograms of the source images A, B and the fused image F. $P_{AB,F}$ is the joint normalized gray histogram of the source images and the fused image.
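A minimal sketch of the mutual information computation from normalized histograms. It assumes the widely used fusion form in which the MI between each source image and the fused image is computed separately and the two values are added; whether equation (27) combines the sources in exactly this way is an assumption. All function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(src, fused):
    """MI (bits) between two equally sized gray images given as lists of rows."""
    a = [p for row in src for p in row]
    f = [p for row in fused for p in row]
    total = len(a)
    joint = Counter(zip(a, f))        # joint gray-level histogram
    pa, pf = Counter(a), Counter(f)   # marginal histograms
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / total
        mi += pxy * math.log2(pxy / ((pa[x] / total) * (pf[y] / total)))
    return mi


def fusion_mutual_information(ir, vi, fused):
    """Assumed fusion metric: MI(IR, F) + MI(VI, F)."""
    return mutual_information(ir, fused) + mutual_information(vi, fused)


ir = [[0, 0], [1, 1]]
vi = [[0, 1], [0, 1]]
print(fusion_mutual_information(ir, vi, ir))
```

In the toy call the fused image equals the infrared image, so it shares one bit with the infrared source and nothing with the statistically independent visible source.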

f: AVERAGE GRADIENT
AG measures the degree of change in image detail and texture. A higher AG implies that the fused image has more prominent texture and detail changes and contains more content. The average gradient of the image is defined in equation (28).
where M and N are the width and height of the image, and $I_x$ and $I_y$ are the differences in the x and y directions, respectively.
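The sketch below computes the average gradient with forward differences, assuming the common form that averages sqrt((I_x² + I_y²) / 2) over the (M−1)·(N−1) positions where both differences exist. The normalization and the difference scheme are assumptions, since several variants are in use.

```python
import math


def average_gradient(img):
    """Average gradient of a 2-D image using forward differences."""
    m, n = len(img), len(img[0])
    total = 0.0
    for i in range(m - 1):
        for j in range(n - 1):
            dx = img[i][j + 1] - img[i][j]   # difference in the x direction
            dy = img[i + 1][j] - img[i][j]   # difference in the y direction
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((m - 1) * (n - 1))


print(average_gradient([[0, 2], [2, 4]]))
```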

g: $Q_{CB}$
$Q_{CB}$ [43] is an evaluation index based on human visual perception. This method uses the Contrast Sensitivity Function (CSF) to calculate the local contrast of each image. Suppose that the input source images are $I_A$ and $I_B$ and the fused image is $I_F$; then its definition is given in equations (29)-(30).
where $\lambda_A$ and $\lambda_B$ are the saliency maps of the source images $I_A$ and $I_B$, respectively. $Q_{AF} \in [0, 1]$ and $Q_{BF} \in [0, 1]$ are the fidelity of the information transferred from the source images $I_A$ and $I_B$ to the fused image $I_F$, respectively. $Q_{CB}$ is the average value of the entire fusion quality map $Q_C$.

h: EDGE INTENSITY COEFFICIENT
$Q_{ABF}$ [45] quantifies the amount of edge information that the fused image retains from the source images. $Q_{ABF} \in [0, 1]$; the closer the value is to 1, the more edge information of the source images is retained in the fused image, and the better the fusion quality. The definition is given in equation (31).
where (i, j) is the pixel position, and M and N give the size of the image. $Q_{AF}$ and $Q_{BF}$ represent the edge strength of the source images A and B relative to the fused image F, respectively. $\omega_A(i, j)$ and $\omega_B(i, j)$ are the weights of $Q_{AF}$ and $Q_{BF}$, respectively.

i: MULTI-SCALE STRUCTURAL SIMILARITY
MS-SSIM [46] is an indicator based on the human visual system. It is suited to extracting structural information from the scene, and it approximates perceived image quality well. MS-SSIM extends SSIM to measure the structural similarity between images at multiple scales. The larger the value, the better the fused image. The definition is given in equation (32).
where l is the brightness comparison between images A and B, c is the contrast comparison, s is the structure comparison, α, β and γ adjust the relative importance of brightness, contrast and structure, and S is the image scale.

j: COMPREHENSIVE RANKING INDEX
To evaluate the overall fusion effectiveness of the proposed algorithm, a Comprehensive Ranking Index (CRI) was designed based on the above nine objective evaluation indices. The brightness, contrast, amount of information and details of the fused image are considered together. The design idea is as follows:
a. Calculate the ranking of each fusion algorithm on each index in turn. The ranking is $S^{j}_{a_i}$, i = 1, 2, ..., Z, j = 1, 2, ..., R.
b. Calculate the single index score. If the ranking is $S^{j}_{a_i}$, the single score is $Z - S^{j}_{a_i} + 1$.
c. Calculate the comprehensive index score as the weighted sum of the single index scores.
d. Normalize the index score. The definition is given in equation (33).
where R is the number of indicators, and Z is the number of algorithms.
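Steps a to d can be sketched as follows. The sketch assumes equal weights for all indices, larger values ranking better, and normalization by R·Z. With R = 9 and Z = 10, a total score of 80 gives 80/90 ≈ 0.8889, which is consistent with the CRI value reported for the "Kayak" pair, but the exact weights and normalization of equation (33) remain an assumption. Function and variable names are illustrative.

```python
def comprehensive_ranking_index(scores):
    """Return {algorithm: CRI} from {algorithm: [value of index 1, ..., index R]}.

    Assumes larger index values are better and all indices have equal weight.
    Ties are broken by the insertion order of the algorithms (stable sort).
    """
    algos = list(scores)
    z = len(algos)                        # number of algorithms
    r = len(next(iter(scores.values())))  # number of indices
    totals = {a: 0 for a in algos}
    for j in range(r):
        # Step a: rank the algorithms on index j (rank 1 is best).
        ordered = sorted(algos, key=lambda a: scores[a][j], reverse=True)
        for rank, a in enumerate(ordered, start=1):
            # Steps b and c: single score Z - rank + 1, summed over the indices.
            totals[a] += z - rank + 1
    # Step d: normalize by the maximum attainable total R * Z.
    return {a: totals[a] / (r * z) for a in algos}


example = {"A": [3, 3], "B": [2, 1], "C": [1, 2]}
print(comprehensive_ranking_index(example))
```

In the toy example, algorithm A ranks first on both indices and reaches the maximum CRI of 1.0, while B and C each collect one second place and one third place and tie at 0.5.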

3) EXPERIMENTAL RESULTS AND DISCUSSION
The experiment verifies the effectiveness of the algorithm in this paper through the individual experimental analysis of four groups of classic infrared and visible-light images and overall fusion experiment of 21 pairs of classic infrared and visible-light image sets. In order to better compare and analyze the performance of the algorithms, we marked the top three indicators in bold. The value in bold blue indicates that it ranks first in the indicator and the fusion quality is the best, the value in bold green indicates that it ranks second in the indicator and the fusion quality is second, and the value in bold red indicates that it ranks third in the indicator and the fusion quality is third.
(1) The first set of source images is "Kayak", with a resolution of 256 × 256. It is a night view of a city, with visible and infrared scenes that include pedestrians, vehicles and buildings. The detailed information of the IR and VI images is extracted to determine the performance of the fusion algorithm. The source images and the images fused by all algorithms are shown in Figures 6 (a)-(k). In (c), (d) and (h) of Figure 6, the global contrast of the image is low. Figures (c), (d), (e) and (g) have dotted artificial noise at the lights. In Figures (f) and (j), the bright light of the car lights has disappeared and too much infrared information is extracted, which distorts the visual effect. The street lamps in Figures (c), (d), (e), (f), (g), (i) and (j) are obviously darker, and there is noise or distortion around the street lamps in several figures. In contrast, CNN2, based on deep learning, and the proposed LSWT-NSST algorithm have visually similar fusion effects that are hard to tell apart, and both are better than the other 8 fusion algorithms. Thus, these two algorithms effectively extract the important features of the infrared and visible images, and the resulting fused images are clearer and have better visual effects than those of the other methods, as shown in Figures (k) and (l). Table 2 lists the fusion quality indicators of the fusion methods for the first image pair. On these 10 objective evaluation indicators, the proposed algorithm ranks first for AV, SF, EN and CRI, second for SD and AG, and third for MI, $Q_{CB}$, $Q_{ABF}$ and MS-SSIM. In particular, its CRI is 0.8889, which leads the DCLSF, CNN1 and CNN2 algorithms by 14.45, 16.67 and 26.67 percentage points, respectively.
The deep learning fusion frameworks CNN1 and CNN2 have advantages on certain indicators such as MI, SD and MS-SSIM, but the proposed algorithm ranks higher as a whole, which indicates that it performs best overall.
(2) The second set of source images is "Bristol Queen's Road", with a resolution of 280 × 360. The person sheltered by trees is clearly visible in the IR image, but the details of the surrounding environment are blurred. The details of trees and fences in the VI image are very clear and the contrast is higher, which suits human vision better, but the person is not visible.
Table 3 lists the fusion quality indicators of the fusion methods for the second image pair. On these 10 objective evaluation indicators, the proposed algorithm ranks first for indicators AG, SF and CRI, second for indicators MI and AG, and third for indicators SD, EN, $Q_{ABF}$ and MS-SSIM. Although it ranks fourth for $Q_{CB}$, the difference from the first place is only 0.011, which is very small. Its CRI is 0.8556, which leads the second to fourth places by 7.78, 23.34 and 25.56 percentage points, respectively. The deep learning fusion frameworks CNN1 and CNN2 have advantages on certain indicators such as MI, SD, EN, $Q_{ABF}$ and MS-SSIM. The visual effects of CNN2 and the proposed algorithm are better than those of the other algorithms. From the overall evaluation and analysis, the proposed algorithm has the best effect.
Table 4 lists the fusion quality indicators of the fusion methods for the third image pair. On these 10 objective evaluation indicators, the proposed algorithm ranks first for AV, EN, MS-SSIM and CRI, second for SF, AG and $Q_{CB}$, and third for MI, SD and $Q_{ABF}$. Its CRI is 0.8333, which leads the second to fourth places by 8.89, 14.44 and 27.77 percentage points, respectively. The CBF algorithm has much larger SF and AG values than the other algorithms because it introduces a lot of artificial noise, as shown in the figure. The deep learning fusion frameworks CNN1 and CNN2 have advantages on certain indicators such as SD, $Q_{CB}$ and $Q_{ABF}$.
The effect of most fusion algorithms is not subjectively ideal. However, the proposed algorithm, CNN2 and DCLSF produce clear contours, high contrast and better results than the other algorithms.
The comprehensive analysis shows that, compared with the other algorithms, the infrared and visible images fused by the proposed algorithm have rich scenes, clear target contours and strong contrast, but there are distortions in some parts, so the visual effect must be further improved. Table 5 lists the fusion quality indicators of the fusion methods for the fourth image pair. On these 10 objective evaluation indicators, the proposed algorithm ranks first for AV, EN and CRI, second for AG, third for SF, $Q_{ABF}$ and MS-SSIM, and fourth for MI, SD and $Q_{CB}$. In particular, its CRI is 0.8222, which leads the second to fourth places by 5.55, 10.00 and 15.55 percentage points, respectively, and is nearly 20 percentage points ahead of the other algorithms. The deep learning fusion frameworks CNN1 and CNN2 have advantages on certain indicators such as MI, SD and MS-SSIM. Overall, the proposed algorithm is evaluated as better than the other algorithms in terms of fusion quality.
(5) Comprehensive experiment. Twenty-one pairs of classic infrared and visible-light images are selected for experimental verification.
Figure 10 shows eight pairs of images randomly selected from the classic 21 pairs of infrared and visible-light images, together with the images fused by the proposed algorithm. The fused images in Figure 10 have clear targets and outlines, contain more detailed information and less noise, and show a good overall fusion effect. Table 6 gives the average value of each evaluation index over the 21 classic infrared and visible-light image pairs for the different fusion algorithms. Table 7 gives the EN index of each of the 21 image pairs for the different fusion algorithms, and Figure 11 plots these EN values. Table 6 shows that the proposed algorithm is in the top three on all ten evaluation indexes. It ranks first in AV and CRI, second in SF, EN, AG and $Q_{CB}$, and third in MI, SD, $Q_{ABF}$ and MS-SSIM. On the CRI index in particular, it leads the second to fourth places by 4.45, 15.56 and 24.45 percentage points. This shows that the proposed algorithm has a better overall effect on the 21 image pairs. Table 7 shows that the EN index of the proposed algorithm is in the top three overall on the 21 image pairs, but it ranks fifth on image13 and image17, where it differs from the first place by 0.5229 and 0.3885, respectively. The gap is not large. Figure 11 also shows that the EN index of the proposed algorithm, marked by the red pentagon, is at the forefront across the 21 image pairs as a whole. The EN index reflects the average information and texture richness of the image.
It can be concluded that the images fused by the proposed algorithm contain more information and richer texture than those of the other algorithms, and the overall fusion effect is the best.
(6) Comparison of calculation efficiency. The running time cannot be ignored in the objective evaluation of an image fusion method. In this paper, the four pairs of IR and VI images of different sizes in Figure 5 are used for the calculation cost analysis; each algorithm is run 10 times on each pair and the average time is taken. The running efficiency is shown in Table 8. To facilitate comparison, the longest average running time among all methods for each image pair is shown in bold red and the shortest in bold blue. Table 8 shows that the average running time of each algorithm increases as the image size increases.
The GF algorithm has the shortest running time on the four image pairs; on the largest pair, "Lake", it runs for only 1.0273 seconds. The CNN1 algorithm has the longest running time on the four image pairs; on the largest pair, "Lake", it runs for 208.2961 seconds. On the "Kayak" image pair, the running time of the proposed algorithm is shorter only than that of CNN1 and longer than those of the other 8 algorithms. On the "UNCamp" image pair, the running time of the proposed algorithm is shorter than those of the deep learning CNN1 and CNN2 algorithms and longer than those of the other 7 algorithms. On the "Road" image pair, the efficiency of the proposed algorithm is greatly improved, and its running time is lower than those of the CNN1, LLRR, CNN2 and DCLSF algorithms. On the "Lake" image pair, the efficiency of the proposed algorithm continues to improve, which is far lower than the 124.987 seconds for CNN1 algorithm and 100.0626 seconds for LLRR algorithm. As the image size continues to increase, the running efficiency of the proposed algorithm continues to improve, which is an obvious advantage over the other multi-scale decomposition algorithms and the deep learning algorithms.
However, even on large images, the running time of the proposed algorithm is only medium and much longer than those of the CBF, GF, SD and NSST algorithms. The reason is that CBF and GF are spatial fusion algorithms whose processing is pixel-based and involves no multi-scale decomposition, so they take less time. Although SD and NSST are multi-scale decomposition algorithms, SD performs only a two-level decomposition, far fewer levels than the proposed algorithm. The LSWT-NSST algorithm builds on NSST multi-scale decomposition; to obtain a better fusion effect, its decomposition scale is larger than that of the traditional NSST algorithm, so its running time is longer than that of the NSST algorithm. NSCT, LLRR and DCLSF are multi-scale decomposition algorithms and are relatively complicated. The LLRR and DCLSF algorithms introduce many mathematical calculations to improve the fusion rules. The LSWT used in the proposed algorithm runs faster than the SWT used in the DCLSF algorithm. Because NSST is introduced on top of LSWT, the performance is mediocre on small images, but the efficiency is significantly improved on large or extra-large images. CNN1 and CNN2 are recently popular deep learning fusion algorithms. Their fusion quality is high, but they run slowly on the CPU, as shown for the "Lake" image pair. However, with the development of GPUs and other hardware acceleration technologies and the reduction of costs, the running efficiency of deep learning will continue to improve.
The proposed algorithm takes advantage of the fast calculation, low memory use, reduced storage space and easy inverse transformation of LSWT, which improves its running efficiency. It not only guarantees the quality of the fused image but also reduces the running time. Because the running efficiency of the proposed algorithm is only medium, its running speed needs to be further improved.

V. CONCLUSION
This paper proposes an infrared and visible image fusion algorithm based on LSWT and NSST, which effectively utilizes the complementary advantages of LSWT and NSST multi-scale decomposition. First, the LSWT algorithm is used to perform a multi-scale decomposition of the IR and VI images to obtain low and high-frequency sub-band coefficients. Second, the high and low-frequency sub-bands of the LSWT domain are used in the multi-scale decomposition of NSST to further extract the target features and detailed features of the source image. The NSST algorithm can compensate for the insufficiency of the LSWT algorithm in decomposing the continuous curves and edges of the image; the LSWT algorithm can compensate for the disadvantage of the NSST algorithm in decomposing subtle features of the image. Through the high and low-frequency coefficients of the NSST domain, the target features of the low-frequency sub-bands of the LSWT domain and detailed features of the high-frequency sub-bands of the LSWT domain can be enhanced. Third, by combining the characteristics of IR and VI images and the characteristics of high and low-frequency coefficient representation, we design different fusion strategies for image fusion rules. The low-frequency part introduces the DCT algorithm and LSF features; then, adaptive weighted fusion rules are designed by LSF to enhance the regional characteristics of the DCT. The high-frequency part combines the imaging mechanism of human vision to design an improved regional contrast fusion strategy. Finally, IDCT, INSST and ILSWT algorithms are used to generate the final fusion image.
This paper conducts individual fusion experiments on four groups of classic infrared and visible-light images and overall fusion experiments on 21 pairs of classic infrared and visible-light images. Nine classic and advanced image fusion algorithms are selected, and the subjective and objective effects of their fused images are compared with those of the proposed algorithm. For the objective evaluation, nine classic evaluation indices are selected and a comprehensive ranking index is designed, which takes image brightness, chroma, contrast, etc. into account together. The experimental results were comprehensively analyzed from the subjective and objective aspects. In terms of visual perception, the fused images of the proposed algorithm have prominent targets, clear edges and high contrast. The ten objective evaluation indices are also higher overall than those of the other algorithms, and the running efficiency is moderate. In summary, both subjective vision and objective evaluation show that the images fused by the proposed algorithm have the best effect and high quality, and that it is an effective IR and VI image fusion algorithm.
Although the proposed algorithm has a better fusion effect and higher evaluation indices than the other algorithms, its running time is a limitation, which reduces its performance to a certain extent. With the continuous development of deep learning technology, deep learning has achieved remarkable results in computer vision fields such as target detection, image recognition and image noise reduction, and its application in image fusion will also become more popular. Traditional image fusion methods retain certain advantages in some fields. In future research, the authors will combine advanced deep network models with traditional image processing algorithms to further extract multi-dimensional features of the image and perform unsupervised end-to-end image fusion. In terms of running efficiency, suitable GPU acceleration or FPGA real-time processing technology will be adopted to further improve the running efficiency and real-time performance of the algorithm. By improving the algorithm, we hope to obtain better fusion results.